Hands-on Tutorial on Neural Networks
This is a hands-on tutorial on using Keras to learn how feed-forward neural networks work. It is a part of the Artificial Intelligence course at the University of Missouri-St. Louis.
Activity 1. Practice Python3
Learning Objectives:
- Learn how to use Google Colab (or a local Jupyter Notebook installation)
- Practice Python3 fundamentals for machine learning
Video Lectures:
Notebooks:
What to submit?
- Describe the three most important ideas discussed in the lectures. Mentioning the topic or idea isn’t enough, you need to describe the actual specific idea.
- Describe two concepts discussed in the lectures that you did not understand. Mentioning the topic or idea isn’t enough, you need to describe precisely what you did not understand.
- Mention one question that comes to your mind that you feel like asking.
- HTML of your Jupyter Notebook where you practiced Python fundamentals discussed in the Video lectures.
What NOT to submit?
- Your notebook ‘.ipynb’ file
Activity 2. Practice Numpy, Matplotlib, and Pandas
Learning Objectives:
- Practice Numpy
- Practice Plotly (note the 3D plot)
- Practice Pandas for data analysis
- Learn data cleaning and data normalization
Video Lectures:
Notebooks:
What to submit?
- Describe the three most important ideas discussed in the lectures. Mentioning the topic or idea isn’t enough, you need to describe the actual specific idea.
- Describe two concepts discussed in the lectures that you did not understand. Mentioning the topic or idea isn’t enough, you need to describe precisely what you did not understand.
- Mention one question that comes to your mind that you feel like asking.
- HTML of your Jupyter Notebook where you practiced the concepts and the code discussed in the lectures.
Activity 3. Univariate Linear Regression
Learning Objectives:
- learn univariate linear regression
- practice univariate linear regression
- apply linear regression modeling to a new dataset
Video Lectures:
Notebooks:
Tasks:
- Pick a dataset from the UCI ML database.
- Perform univariate linear regression on the dataset (not the ‘pima-diabetes’ dataset).
Note: When selecting variables (columns) for performing linear regression, it is important to choose continuous variables and not binary variables. Before feeding the data to the regression model, you may need to normalize/standardize your dataset.
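The note above can be illustrated with a minimal sketch. It assumes synthetic data in place of a UCI dataset and fits the line with NumPy's least-squares `polyfit` rather than with Keras; the variable names and the data-generating numbers are illustrative only.

```python
import numpy as np

# Synthetic stand-in for one continuous input column and one continuous output column
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 100.0, size=200)
y = 3.0 * x + 20.0 + rng.normal(0.0, 5.0, size=200)

# Standardize the input so that it has zero mean and unit standard deviation
x_mean, x_std = x.mean(), x.std()
x_scaled = (x - x_mean) / x_std

# Fit y = w * x_scaled + b by ordinary least squares (polyfit returns slope first)
w, b = np.polyfit(x_scaled, y, deg=1)

# Mean absolute error of the fitted line on the same data
predictions = w * x_scaled + b
mae = np.mean(np.abs(y - predictions))
print(f"w = {w:.3f}, b = {b:.3f}, MAE = {mae:.3f}")
```

Because the input was standardized, the slope in the original units is `w / x_std`. A binary input column would give only two distinct x values, which is why a continuous column is the better choice here.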
What to submit?
- Describe the three most important ideas discussed in the lectures. Mentioning the topic or idea isn’t enough, you need to describe the actual specific idea.
- Describe two concepts discussed in the lectures that you did not understand. Mentioning the topic or idea isn’t enough, you need to describe precisely what you did not understand.
- Mention one question that comes to your mind that you feel like asking.
- HTML of your complete Jupyter Notebook.
Activity 4. Logistic Regression
Learning Objectives:
- learn logistic regression
- practice logistic regression on a dataset with more than one input variable
Video Lectures:
Notebooks:
Tasks:
- Pick a dataset from the UCI ML database.
- Perform logistic regression on the dataset (not the ‘pima-diabetes’ dataset).
Note: When selecting variables (columns) for performing logistic regression, it is important to select a binary variable (i.e. the values of this variable must be 0 or 1, nothing else) as the output variable.
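As a rough illustration of the note above, the sketch below checks that the output column is binary and then fits a logistic regression with plain gradient descent in NumPy. The synthetic data, the learning rate, and the number of steps are assumptions made for the example; the course notebooks may use Keras instead.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in: two continuous input columns and one binary output column
X = rng.normal(size=(400, 2))
y = (X[:, 0] + 2.0 * X[:, 1] > 0).astype(float)

# The output variable must contain only 0 and 1
assert set(np.unique(y)) <= {0.0, 1.0}

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Gradient descent on the mean binary cross-entropy loss
w = np.zeros(X.shape[1])
b = 0.0
learning_rate = 0.5
for _ in range(500):
    p = sigmoid(X @ w + b)            # predicted probability of class 1
    grad_w = X.T @ (p - y) / len(y)   # gradient with respect to the weights
    grad_b = np.mean(p - y)           # gradient with respect to the bias
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

# Threshold the probabilities at 0.5 to obtain class predictions
accuracy = np.mean((sigmoid(X @ w + b) >= 0.5) == y)
print(f"training accuracy = {accuracy:.3f}")
```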
What to submit?
- Describe the three most important ideas discussed in the lectures. Mentioning the topic or idea isn’t enough, you need to describe the actual specific idea.
- Describe two concepts discussed in the lectures that you did not understand. Mentioning the topic or idea isn’t enough, you need to describe precisely what you did not understand.
- Mention one question that comes to your mind that you feel like asking.
- HTML of your complete Jupyter Notebook.
Activity 5. Using Neural Networks for Binary Classification
Learning Objectives:
- learn the difference between a logistic regression model and a neural network model
- practice building neural networks
- practice training a neural network model to perform binary classification
Video Lectures:
Notebooks:
Readings:
Tasks:
- Pick a dataset from the UCI ML database.
- Build a neural network classifier for a dataset of your choice
- Evaluate your model using accuracy, precision, and recall
- Compare the accuracy of your model with the baseline accuracy
- Compare the performance of the neural network with a logistic regression model
Note: The neural network classifier should generally be more accurate than a basic logistic regression model. This is because a neural network model has more parameters (weights and biases) with which to learn the patterns in the data.
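The evaluation tasks above can be sketched as follows. The helper names `binary_metrics` and `majority_baseline_accuracy` are hypothetical, and the baseline is taken here to mean the accuracy of always predicting the most frequent class, which is an assumption; the lectures may define the baseline differently.

```python
import numpy as np

def binary_metrics(y_true, y_pred):
    """Accuracy, precision, and recall for 0/1 labels and 0/1 predictions."""
    y_true = np.asarray(y_true).astype(int)
    y_pred = np.asarray(y_pred).astype(int)
    tp = int(np.sum((y_pred == 1) & (y_true == 1)))  # true positives
    fp = int(np.sum((y_pred == 1) & (y_true == 0)))  # false positives
    fn = int(np.sum((y_pred == 0) & (y_true == 1)))  # false negatives
    accuracy = float(np.mean(y_pred == y_true))
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall}

def majority_baseline_accuracy(y_true):
    """Accuracy obtained by always predicting the most frequent class."""
    positive_rate = float(np.mean(np.asarray(y_true).astype(int)))
    return max(positive_rate, 1.0 - positive_rate)

# Tiny hand-made example: 3 positives out of 10 rows
y_true = np.array([1, 0, 0, 1, 0, 0, 0, 1, 0, 0])
y_pred = np.array([1, 0, 0, 0, 0, 1, 0, 1, 0, 0])

metrics = binary_metrics(y_true, y_pred)
baseline = majority_baseline_accuracy(y_true)
print(metrics, baseline)
```

In this example the predictions reach 0.8 accuracy against a 0.7 baseline, so the gain over the baseline is much smaller than the raw accuracy suggests.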
What to submit?
- Describe why it is crucial to consider ‘random baseline accuracy’ when evaluating classification models.
- Why should any model’s accuracy be higher than the ‘random baseline accuracy’?
- HTML of your complete Jupyter Notebook.
Activity 6. Overfitting vs Generalization
Learning Objectives:
- learn to identify when a model exhibits overfitting, underfitting, or generalization
- practice building neural networks
- learn the purpose of splitting a dataset into the training set and validation set
Video Lectures:
Tasks:
Complete the following steps for a standard tabular classification dataset of your choice, where the output variable is a binary variable.
- Shuffle the rows (see example code below)
# Shuffle the rows of the dataset in place (dataset must be a numpy array)
import numpy as np
np.random.shuffle(dataset)
- The next step is to split the rows into training and validation set. For small datasets, selecting a random 30% of the rows as the validation set and leaving the rest as the training set works well. For larger datasets, smaller percentages can be enough. This splitting yields four numpy arrays - XTRAIN, YTRAIN, XVALID, and YVALID (see example code below).
# Split into training and validation, 30% validation set and 70% training
# Assumption: the output variable is the last column; adjust the column indices for your dataset
index_30percent = int(0.3 * len(dataset))
print(index_30percent)
XVALID = dataset[:index_30percent, :-1]  # all input columns
YVALID = dataset[:index_30percent, -1]   # output column
XTRAIN = dataset[index_30percent:, :-1]  # all input columns
YTRAIN = dataset[index_30percent:, -1]   # output column
- Normalize the data. Compute the ‘mean’ and ‘standard deviation’ using only the XTRAIN array, not XVALID; then normalize XVALID using the mean and standard deviation obtained from XTRAIN.
- If your model is trained using the training data (XTRAIN and YTRAIN) how does it perform on the validation set (XVALID and YVALID)?
# Learn the model from the training set
model.fit(XTRAIN, YTRAIN, ...)
# Evaluate on the training set (should deliver high accuracy)
# Assumes the model was compiled with metrics=['accuracy'], so evaluate() returns [loss, accuracy]
P = model.predict(XTRAIN)
loss, accuracy = model.evaluate(XTRAIN, YTRAIN)
# Evaluate on the validation set
P = model.predict(XVALID)
loss, accuracy = model.evaluate(XVALID, YVALID)
- Build a neural network model to overfit the training set (to get almost 100% accuracy or as high as it is possible) and then evaluate it on the validation set. To obtain high accuracy on the training set, one can build a larger neural network (with more layers and more neurons per layer) and train as long as possible.
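The normalization step in the list above can be sketched as follows. The random arrays are stand-ins for the XTRAIN and XVALID arrays produced by the split, and the guard against constant columns is an addition for the example, not something stated in the lectures.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for the input arrays produced by the 70/30 split
XTRAIN = rng.normal(loc=50.0, scale=10.0, size=(70, 3))
XVALID = rng.normal(loc=50.0, scale=10.0, size=(30, 3))

# Compute the statistics per column from the training inputs only
mean = XTRAIN.mean(axis=0)
std = XTRAIN.std(axis=0)
std[std == 0] = 1.0  # guard: avoid division by zero for constant columns

# Apply the SAME training statistics to both arrays
XTRAIN_norm = (XTRAIN - mean) / std
XVALID_norm = (XVALID - mean) / std

print(XTRAIN_norm.mean(axis=0), XVALID_norm.mean(axis=0))
```

The normalized training columns have mean 0 and standard deviation 1 by construction, while the validation columns are only approximately so, since they are scaled with statistics they did not contribute to.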
Submission Requirements:
In the notebook where you practice, also answer the following:
- Does your model perform better (in terms of accuracy) on the training set or validation set? Is this a problem? How to avoid this problem?
- Why can over-training be a problem?
- What is the difference between generalization, overfitting, and underfitting?
- Why should you not normalize XVALID separately, i.e. why should we use the parameters from XTRAIN to normalize XVALID?
Activity 8. Fine-tuning Hyper-parameters of the Model
In this activity, the task is to learn how to design and train a model that does well on the unseen (validation) dataset. The weights and biases of a neural network model are its parameters. Settings such as the number of layers, the number of neurons in each layer, the number of epochs, batch size, activation functions, choice of optimizer, choice of loss function, etc. are the hyperparameters of a model. When training a model for a new dataset, an extremely important question is: what combination of hyperparameters yields the maximum accuracy on the validation set? Remember, when playing with activation functions, the activation of the last layer should not change; it should always be sigmoid for binary classification and ReLU or linear for regression. The task in this activity is to try as many hyperparameter combinations as possible to obtain the highest possible accuracy on the validation set. For a classification dataset of your choice, the first step is to create a notebook where you can train the model using the training set and evaluate on the validation set. Then, the objective is to find the optimal (best) hyperparameters that maximize the accuracy (or minimize MAE) on the validation set.
Below are the hyperparameters to optimize:
- The number of layers in the neural network (try 1, 2, 4, 8, 16, etc.).
- The number of neurons in each layer (try 2, 4, 8, 16, 32, 64, 128, 256, 512, etc.).
- Various batch sizes (8, 16, 32, 64, etc.).
- Various number of epochs (2, 4, 8, …, 5000, etc.).
- Various optimizers (rmsprop, sgd, nadam, adam, gd, etc.)
- Various activation functions for the intermediate layers (relu, sigmoid, elu, etc.)
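One possible way to organize the search over the hyperparameters listed above is an exhaustive grid, sketched below. The `grid_search` helper and `toy_evaluate` are hypothetical names; `toy_evaluate` returns a made-up score and is only a placeholder for a function that would build, train, and evaluate a Keras model and return its validation accuracy.

```python
import itertools

def grid_search(grid, evaluate):
    """Try every combination in `grid` and return the best configuration and score."""
    names = list(grid)
    best_config, best_score = None, float("-inf")
    for values in itertools.product(*(grid[name] for name in names)):
        config = dict(zip(names, values))
        score = evaluate(config)  # higher is better (e.g. validation accuracy)
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score

# A small subset of the values suggested in the list above
grid = {
    "num_layers": [1, 2, 4],
    "neurons_per_layer": [8, 32, 128],
    "batch_size": [16, 64],
    "optimizer": ["rmsprop", "adam"],
}

def toy_evaluate(config):
    # Placeholder score, NOT a real training run: replace with code that trains
    # a model using `config` and returns the accuracy on the validation set.
    return -abs(config["num_layers"] - 2) - abs(config["neurons_per_layer"] - 32) / 32

best_config, best_score = grid_search(grid, toy_evaluate)
print(best_config, best_score)
```

The number of combinations grows multiplicatively (3 x 3 x 2 x 2 = 36 here), so with real training runs it is usually practical to vary one or two hyperparameters at a time.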
Activity 9. Early Stopping
Assumption: You already know (tentatively) what hyperparameters are good for your dataset
- Find a regression dataset of your choice and split into training and validation set
- There are two objectives in this activity:
a. Implement automatic stopping of training if the validation loss does not improve for a certain number of epochs
b. Implement automatic saving of the best model (best on the validation set)
- Define callbacks as follows (and fix the obvious bugs):
from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint
# File name must be in quotes
callback_a = ModelCheckpoint(filepath = your_model.hdf5, monitor='val_loss', save_best_only = True, save_weights_only = True, verbose = 1)
# The patience value can be 10, 20, 100, etc. depending on when your model starts to overfit
callback_b = EarlyStopping(monitor='val_loss', mode='min', patience=your_patience_value, verbose=1)
- Update your model.fit() by adding the callbacks:
history = model.fit(XTRAIN, YTRAIN, validation_data=(XVALID, YVALID), epochs=?, batch_size=?, callbacks = [callback_a, callback_b])
- Before you evaluate your model on the validation set, it is important to load the “checkpoint-ed” model:
# File name must be in quotes
model.load_weights(your_model.hdf5)
- Plot the learning curves and demonstrate that model checkpointing helps to obtain higher accuracy on the validation set
- At the end of your notebook, answer the following questions:
a. Almost always, training with early stopping finishes faster (because it stops early). Approximately, how long does it take for your training to finish with and without early stopping?
b. With model checkpointing, your checkpointed model will almost always be more accurate on the validation set. What is the MAE on the validation set with and without model checkpointing?
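The two objectives of this activity (stopping after a number of epochs without improvement, and keeping the best model) can be illustrated without Keras. The sketch below is a simplified simulation of the patience logic on a hand-written list of validation losses; it is not the Keras implementation, and the function name `simulate_early_stopping` is hypothetical.

```python
def simulate_early_stopping(val_losses, patience):
    """Return (stopped_epoch, best_epoch, best_loss) for a sequence of validation losses.

    Epochs are counted from 0. Training stops once the loss has failed to improve
    for `patience` consecutive epochs; the best epoch is the one a checkpoint would keep.
    """
    best_loss = float("inf")
    best_epoch = -1
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            # Improvement: this is where a checkpoint of the model would be saved
            best_loss, best_epoch = loss, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch, best_epoch, best_loss
    # Patience was never exhausted: training ran for all epochs
    return len(val_losses) - 1, best_epoch, best_loss

# Made-up validation losses: improvement until epoch 3, then overfitting
val_losses = [0.90, 0.70, 0.55, 0.50, 0.52, 0.51, 0.53, 0.60, 0.65]
stopped_epoch, best_epoch, best_loss = simulate_early_stopping(val_losses, patience=3)
print(stopped_epoch, best_epoch, best_loss)
```

In this example the weights at the stopping epoch are worse than those at the best epoch, which is why the checkpointed weights should be loaded before evaluating.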
Activity 10. Iterative Feature Selection & Removal
- As of now, it is assumed that given a dataset (of your choice) you can build a model that can do reasonably well on the validation set, i.e. you have a good idea of the network architecture needed, the number of epochs needed, model checkpointing, the approximate MAE or accuracy that one might expect, etc.
- Here we will train a model using the training set and evaluate on the validation set; you are free to choose your own dataset (even your project dataset is fine)
- In this activity you will implement a simple Recursive Feature Elimination (RFE) technique to remove redundant or insignificant input features
- Expected output 1: Plot the significance (importance) of each feature after training your model using one feature at a time:
a. X-axis represents the feature that was used as the input
b. Y-axis is accuracy or MAE of the validation set
- Observing these MAE/accuracy values, we can rank the features by their importance (how informative each one is)
- Next, iteratively remove one feature at a time (starting with the least significant feature) and repeat the training noting the accuracy/MAE on the validation set
- Expected output 2: Plot to report your findings:
a. X-axis represents feature removal, for example, second entry is after removing feature1, and third entry is after removing feature1 and feature2
b. Y-axis is accuracy or MAE of the validation set
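The steps above can be sketched as follows. To keep the example small and fast, a least-squares linear model stands in for the neural network, and the dataset is synthetic; `fit_and_score` is a hypothetical helper that you would replace with code that trains your own model and returns the validation MAE.

```python
import numpy as np

def fit_and_score(Xtr, ytr, Xva, yva):
    """Fit a least-squares linear model (stand-in for a neural network); return validation MAE."""
    A_tr = np.column_stack([Xtr, np.ones(len(Xtr))])  # add a bias column
    A_va = np.column_stack([Xva, np.ones(len(Xva))])
    coef, *_ = np.linalg.lstsq(A_tr, ytr, rcond=None)
    return float(np.mean(np.abs(A_va @ coef - yva)))

# Synthetic data: only features 0 and 1 carry signal, features 2 and 3 are noise
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))
y = 4.0 * X[:, 0] + 2.0 * X[:, 1] + rng.normal(scale=0.1, size=500)
XTRAIN, XVALID = X[:350], X[350:]
YTRAIN, YVALID = y[:350], y[350:]
num_features = X.shape[1]

# Expected output 1: validation MAE when training on one feature at a time
single_mae = [
    fit_and_score(XTRAIN[:, [j]], YTRAIN, XVALID[:, [j]], YVALID)
    for j in range(num_features)
]

# Least significant feature first (highest single-feature MAE first)
removal_order = sorted(range(num_features), key=lambda j: single_mae[j], reverse=True)

# Expected output 2: validation MAE after each removal (first entry uses all features)
remaining = list(range(num_features))
removal_mae = [fit_and_score(XTRAIN, YTRAIN, XVALID, YVALID)]
for j in removal_order[:-1]:  # always keep at least one feature
    remaining.remove(j)
    removal_mae.append(
        fit_and_score(XTRAIN[:, remaining], YTRAIN, XVALID[:, remaining], YVALID)
    )

print("single-feature MAE:", single_mae)
print("removal order:", removal_order)
print("MAE after each removal:", removal_mae)
```

The lists `single_mae` and `removal_mae` hold the Y-axis values for the two expected plots. For a classification dataset, the same loop applies with accuracy in place of MAE and the sort order reversed.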
What to submit?
- An HTML version of your Colab notebook.
- A small report with results on the “Iterative feature removal & selection”.
Badri Adhikari
Associate Professor
Department of Computer Science
University of Missouri-St. Louis