intro-to-neural-networks

Hands-on Tutorial on Neural Networks

This is a hands-on tutorial on using Keras to learn how feed-forward neural networks work. It is a part of the Artificial Intelligence course at the University of Missouri-St. Louis.


Activity 1. Practice Python3

Learning Objectives:

Video Lectures:

Notebooks:

What to submit?

  1. Describe the three most important ideas discussed in the lectures. Mentioning the topic or idea isn’t enough, you need to describe the actual specific idea.
  2. Describe two concepts discussed in the lectures that you did not understand. Mentioning the topic or idea isn’t enough, you need to describe precisely what you did not understand.
  3. Mention one question that comes to your mind that you feel like asking.
  4. HTML of your Jupyter Notebook where you practiced Python fundamentals discussed in the Video lectures.

What NOT to submit?

Activity 2. Practice Numpy, Matplotlib, and Pandas

Learning Objectives:

Video Lectures:

Notebooks:

What to submit?

  1. Describe the three most important ideas discussed in the lectures. Mentioning the topic or idea isn’t enough, you need to describe the actual specific idea.
  2. Describe two concepts discussed in the lectures that you did not understand. Mentioning the topic or idea isn’t enough, you need to describe precisely what you did not understand.
  3. Mention one question that comes to your mind that you feel like asking.
  4. HTML of your Jupyter Notebook where you practiced the concepts and the code discussed in the lectures.

Activity 3. Univariate Linear Regression

Learning Objectives:

Video Lectures:

Notebooks:

Tasks:

  1. Pick a dataset from the UCI ML database.
  2. Perform univariate linear regression on the dataset (not the ‘pima-diabetes’ dataset).

Note: When selecting variables (columns) for performing linear regression, it is important to choose continuous variables and not binary variables. Before feeding the data to the regression model, you may need to normalize/standardize your dataset.
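
The note above can be illustrated with a minimal sketch. It assumes two continuous columns have already been loaded into NumPy arrays; the synthetic `x` and `y` below are hypothetical stand-ins for real dataset columns, and `np.polyfit` is used as a simple least-squares fit rather than the Keras model from the lectures.

```python
import numpy as np

# Hypothetical stand-in for two continuous columns of a dataset
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 100.0, size=200)
y = 3.0 * x + 20.0 + rng.normal(0.0, 5.0, size=200)

def standardize(values):
    """Rescale a column to zero mean and unit standard deviation."""
    return (values - values.mean()) / values.std()

x_std = standardize(x)
y_std = standardize(y)

# Least-squares fit of y_std = w * x_std + b (a degree-1 polynomial)
w, b = np.polyfit(x_std, y_std, deg=1)

# Mean absolute error of the fitted line on the standardized data
predictions = w * x_std + b
mae = np.mean(np.abs(predictions - y_std))
print(f"w = {w:.3f}, b = {b:.3f}, MAE = {mae:.3f}")
```

Because both columns are standardized, the fitted intercept is essentially zero and the slope equals the correlation between the two columns.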

What to submit?

  1. Describe the three most important ideas discussed in the lectures. Mentioning the topic or idea isn’t enough, you need to describe the actual specific idea.
  2. Describe two concepts discussed in the lectures that you did not understand. Mentioning the topic or idea isn’t enough, you need to describe precisely what you did not understand.
  3. Mention one question that comes to your mind that you feel like asking.
  4. HTML of your complete Jupyter Notebook.

Activity 4. Logistic Regression

Learning Objectives:

Video Lectures:

Notebooks:

Tasks:

  1. Pick a dataset from the UCI ML database.
  2. Perform logistic regression on the dataset (not the ‘pima-diabetes’ dataset).

Note: When selecting variables (columns) for performing logistic regression, it is important to select a binary variable (i.e. the values of this variable must be 0 or 1, nothing else) as the output variable.
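
As a rough sketch of the requirement above, the following checks that the output column is strictly binary and then fits a one-input logistic regression with plain gradient descent. The data is synthetic and hypothetical, and the hand-written training loop is only an illustration, not the Keras workflow used in the lectures.

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical data: one continuous input and a 0/1 output
x = rng.normal(0.0, 1.0, size=300)
y = (x + rng.normal(0.0, 0.5, size=300) > 0).astype(float)

# The output column must contain only 0 and 1, nothing else
assert set(np.unique(y)) <= {0.0, 1.0}

def sigmoid(z):
    """Map any real value to the (0, 1) range."""
    return 1.0 / (1.0 + np.exp(-z))

w, b = 0.0, 0.0
learning_rate = 0.1
for _ in range(1000):
    p = sigmoid(w * x + b)
    # Gradients of the mean binary cross-entropy loss
    grad_w = np.mean((p - y) * x)
    grad_b = np.mean(p - y)
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

# Threshold the predicted probabilities at 0.5 to obtain class labels
predicted_labels = (sigmoid(w * x + b) >= 0.5).astype(float)
accuracy = np.mean(predicted_labels == y)
print(f"w = {w:.3f}, b = {b:.3f}, accuracy = {accuracy:.3f}")
```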

What to submit?

  1. Describe the three most important ideas discussed in the lectures. Mentioning the topic or idea isn’t enough, you need to describe the actual specific idea.
  2. Describe two concepts discussed in the lectures that you did not understand. Mentioning the topic or idea isn’t enough, you need to describe precisely what you did not understand.
  3. Mention one question that comes to your mind that you feel like asking.
  4. HTML of your complete Jupyter Notebook.

Activity 5. Using Neural Networks for Binary Classification

Learning Objectives:

Video Lectures:

Notebooks:

Readings:

Tasks:

  1. Pick a dataset from the UCI ML database.
  2. Build a neural network classifier for a dataset of your choice.
  3. Evaluate your model using accuracy, precision, and recall.
  4. Compare the accuracy of your model with the baseline accuracy.
  5. Compare the performance of the neural network with a logistic regression model.

Note: The neural network classifier should be more accurate than a basic logistic regression model. This is because a neural network model has more parameters (weights and biases) to learn the patterns in the data.
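
The evaluation steps above can be sketched as follows. The helper name `evaluate_binary_predictions` and the label arrays are hypothetical, and "baseline accuracy" is taken here to mean the accuracy of always predicting the most frequent class, which is an assumption about the definition used in the lectures.

```python
import numpy as np

def evaluate_binary_predictions(y_true, y_pred):
    """Compute accuracy, precision, recall, and a majority-class baseline."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = int(np.sum((y_pred == 1) & (y_true == 1)))  # true positives
    fp = int(np.sum((y_pred == 1) & (y_true == 0)))  # false positives
    fn = int(np.sum((y_pred == 0) & (y_true == 1)))  # false negatives
    accuracy = float(np.mean(y_pred == y_true))
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    # Assumed baseline: always predict the most frequent class
    positive_rate = float(np.mean(y_true == 1))
    baseline = max(positive_rate, 1.0 - positive_rate)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "baseline": baseline}

# Hypothetical true labels and thresholded model predictions
y_true = [1, 0, 0, 0, 1, 0, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]
report = evaluate_binary_predictions(y_true, y_pred)
print(report)
```

In this toy example the model is only useful if its accuracy exceeds the baseline, since the baseline can be reached without learning anything from the inputs.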

What to submit?

  1. Describe why it is crucial to consider ‘random baseline accuracy’ when evaluating classification models.
  2. Why should any model’s accuracy be higher than the ‘random baseline accuracy’?
  3. HTML of your complete Jupyter Notebook.

Activity 6. Overfitting vs Generalization

Learning Objectives: overfitting, underfitting, and generalization

Video Lectures:

Tasks: Complete the following steps for a standard tabular classification dataset of your choice, where the output variable is a binary variable.

  1. Shuffle the rows (see example code below)
    # Shuffle the rows of the dataset in place
    import numpy as np
    np.random.shuffle(dataset)
    
  2. The next step is to split the rows into a training set and a validation set. For small datasets, selecting a random 30% of the rows as the validation set and leaving the rest as the training set works well. For larger datasets, smaller percentages can be enough. This split yields four numpy arrays: XTRAIN, YTRAIN, XVALID, and YVALID (see example code below).
    # Split into training and validation: 30% validation set and 70% training set
    # Assumption: the output column is the last column; adjust the indices for your dataset
    index_30percent = int(0.3 * len(dataset[:, 0]))
    print(index_30percent)
    XVALID = dataset[:index_30percent, :-1]  # all input columns
    YVALID = dataset[:index_30percent, -1]   # output column
    XTRAIN = dataset[index_30percent:, :-1]  # all input columns
    YTRAIN = dataset[index_30percent:, -1]   # output column
    
  3. Normalize the data. First compute the ‘mean’ and ‘standard deviation’ of each input column, using only the XTRAIN array, not XVALID, and normalize XTRAIN with them. XVALID should then be normalized using the same mean and standard deviation obtained from XTRAIN.
  4. If your model is trained using the training data (XTRAIN and YTRAIN) how does it perform on the validation set (XVALID and YVALID)?
    # Learn the model from the training set
    model.fit(XTRAIN, YTRAIN, ...)
    
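    # Evaluate on the training set (should deliver high accuracy)
    # evaluate() returns the loss followed by any metrics given to compile();
    # this assumes the model was compiled with metrics=['accuracy']
    P = model.predict(XTRAIN)
    loss, accuracy = model.evaluate(XTRAIN, YTRAIN)
    # Evaluate on the validation set
    P = model.predict(XVALID)
    loss, accuracy = model.evaluate(XVALID, YVALID)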
  5. Build a neural network model to overfit the training set (to get almost 100% accuracy, or as high as possible) and then evaluate it on the validation set. To obtain high accuracy on the training set, one can build a larger neural network (with more layers and more neurons per layer) and train it for as long as possible.
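
The shuffle, split, and normalization steps above can be combined into one minimal sketch. The synthetic `dataset` is hypothetical, and the output column is assumed to be the last column; the names `XTRAIN_NORM` and `XVALID_NORM` are introduced here only for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
# Hypothetical dataset: 3 input columns followed by one binary output column
inputs = rng.normal(loc=[10.0, -5.0, 100.0], scale=[2.0, 1.0, 30.0], size=(100, 3))
outputs = rng.integers(0, 2, size=(100, 1))
dataset = np.hstack([inputs, outputs])

# Shuffle the rows in place
rng.shuffle(dataset)

# 30% validation set, 70% training set
index_30percent = int(0.3 * len(dataset))
XVALID, YVALID = dataset[:index_30percent, :-1], dataset[:index_30percent, -1]
XTRAIN, YTRAIN = dataset[index_30percent:, :-1], dataset[index_30percent:, -1]

# Statistics come from the training inputs only
mean = XTRAIN.mean(axis=0)
std = XTRAIN.std(axis=0)

# The same training statistics are applied to both sets
XTRAIN_NORM = (XTRAIN - mean) / std
XVALID_NORM = (XVALID - mean) / std

print(XTRAIN_NORM.mean(axis=0))  # close to 0 for every column
print(XVALID_NORM.mean(axis=0))  # near 0, but generally not exactly 0
```

Reusing the training statistics keeps the validation rows truly unseen: nothing about their distribution leaks into the preprocessing.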

Submission Requirements: In the notebook where you practice, also answer the following:

  1. Does your model perform better (in terms of accuracy) on the training set or the validation set? Is this a problem? How can this problem be avoided?
  2. Why can over-training be a problem?
  3. What is the difference between generalization, overfitting, and underfitting?
  4. Why should you not normalize XVALID separately, i.e. why should we use the parameters from XTRAIN to normalize XVALID?

Activity 8. Fine-tuning Hyper-parameters of the Model

In this activity, the task is to learn how to design and train a model that does well on the unseen (validation) dataset. The weights and biases of a neural network model are its parameters. Settings such as the number of layers, the number of neurons in each layer, the number of epochs, the batch size, the activation functions, the choice of optimizer, and the choice of loss function are the hyperparameters of a model. When training a model for a new dataset, an extremely important question is: what combination of hyperparameters yields the maximum accuracy on the validation set? Remember, when experimenting with activation functions, the activation of the last layer should not change: it should always be sigmoid for binary classification and ReLU or linear for regression.

The task in this activity is to try as many hyperparameter combinations as possible to obtain the highest possible accuracy on the validation set. For a classification dataset of your choice, the first step is to create a notebook where you can train the model using the training set and evaluate it on the validation set. Then, the objective is to find the optimal (best) hyperparameters that maximize the accuracy (or minimize the MAE) on the validation set.

Below are the hyperparameters to optimize:

  1. The number of layers in the neural network (try 1, 2, 4, 8, 16, etc.).
  2. The number of neurons in each layer (try 2, 4, 8, 16, 32, 64, 128, 256, 512, etc.).
  3. Various batch sizes (8, 16, 32, 64, etc.).
  4. Various numbers of epochs (2, 4, 8, …, 5000, etc.).
  5. Various optimizers (rmsprop, sgd, nadam, adam, gd, etc.).
  6. Various activation functions for the intermediate layers (relu, sigmoid, elu, etc.).
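
One way to organize such a search is to enumerate every combination of a small grid, sketched below. The `search_space` values are a reduced illustration, and `train_and_evaluate` is a hypothetical placeholder that returns a made-up score so the loop can run; in a real notebook it would build, train, and evaluate a model on the validation set.

```python
import itertools

# A reduced, illustrative search space
search_space = {
    "num_layers": [1, 2, 4],
    "neurons_per_layer": [8, 32],
    "batch_size": [16, 64],
    "optimizer": ["adam", "sgd"],
}

def grid(space):
    """Yield one dict per combination of hyperparameter values."""
    names = list(space)
    for values in itertools.product(*(space[name] for name in names)):
        yield dict(zip(names, values))

def train_and_evaluate(config):
    """Hypothetical placeholder returning a made-up score, NOT a real result.

    Replace the body with code that trains a model using `config`
    and returns its accuracy on the validation set.
    """
    return -abs(config["num_layers"] - 2) - abs(config["neurons_per_layer"] - 32) / 32

best_config, best_score = None, float("-inf")
for config in grid(search_space):
    score = train_and_evaluate(config)
    if score > best_score:
        best_config, best_score = config, score

print(best_config)
```

The number of combinations grows multiplicatively with each added hyperparameter, so it usually helps to start with a coarse grid and refine around the best settings.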

Activity 9. Early Stopping

Assumption: You already know (tentatively) what hyperparameters are good for your dataset
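
As a rough illustration of the idea, the sketch below applies a patience rule to a list of validation losses: training stops once the loss has failed to improve for `patience` consecutive epochs. The function name and the loss values are hypothetical; Keras offers the same behavior through its `EarlyStopping` callback, which is not shown here.

```python
def early_stopping_epoch(val_losses, patience=3):
    """Return (stop_epoch, best_epoch) as 0-based indices.

    Training stops at the first epoch where the validation loss has not
    improved on its best value for `patience` consecutive epochs.
    """
    best_loss = float("inf")
    best_epoch = 0
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch, best_epoch
    # Never triggered: training ran through every epoch
    return len(val_losses) - 1, best_epoch

# Hypothetical validation losses, one per epoch
val_losses = [0.90, 0.70, 0.60, 0.62, 0.61, 0.65, 0.70]
stop_epoch, best_epoch = early_stopping_epoch(val_losses, patience=3)
print(stop_epoch, best_epoch)
```

The weights worth keeping are those from `best_epoch`, not from the epoch at which training stopped.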

Activity 10. Iterative Feature Selection & Removal

What to submit?

Contact:

Badri Adhikari
Associate Professor
Department of Computer Science
University of Missouri-St. Louis