A hands-on introduction to feed-forward neural networks using TensorFlow and Keras


This is a mini crash course on using TensorFlow to develop feed-forward neural networks. It contains 10 scaffolded activities. If you already have some background in TensorFlow, Keras, and/or machine learning, you may also be interested in taking the machine learning crash course that Google recently released. You may also find it helpful to refer often to this recipe for supervised learning development.

Activity 1: Practice Python3


In this activity, the task is to learn how to use Google Colab and practice Python3. If you are programming in Python for the first time, please also practice Python3 on online platforms such as codewars.org. If you fear learning new things (including Python3), you are welcome to take the Learning How to Learn course, the world’s most popular online course.

Activity 2: Practice Numpy, Matplotlib, and Pandas


In this activity, the task is to practice NumPy, Matplotlib, Plotly, and Pandas for basic data analysis, along with techniques for data cleaning and data normalization.
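
As a starting point, here is a minimal sketch of one cleaning step and two common normalization schemes using Pandas. The column names and values are made up for illustration; they do not come from any dataset used in this course.

```python
import numpy as np
import pandas as pd

# Tiny illustrative table; the values are made up for this sketch.
df = pd.DataFrame({
    "height_cm": [150.0, 160.0, np.nan, 180.0, 170.0],
    "weight_kg": [50.0, 60.0, 65.0, 90.0, 70.0],
})

# Data cleaning: drop rows that contain missing values.
clean = df.dropna()

# Standardization (z-score): subtract the column mean, divide by the column std.
standardized = (clean - clean.mean()) / clean.std(ddof=0)

# Min-max normalization: rescale each column to the range [0, 1].
minmax = (clean - clean.min()) / (clean.max() - clean.min())

print(standardized.round(2))
print(minmax.round(2))
```

Dropping rows is only one way to handle missing values; depending on the dataset, filling them in may be a better choice.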

Activity 3: Create your own IMDB movie dataset


Please find at least 100 movies on IMDB. You can choose movies that you have already watched or any other movies that you would like to watch. For each movie you will need to note certain features of the movie (see below for the list of features). You can create a table of all your movies, save it in a Google Sheet or a Microsoft Excel file, and export a .csv file. Your table should look like the one below. Here is an example .csv file. For neural network training experiments the first column (Movie name) is not needed, but you may want to keep it in your original version so you can later fix errors if you find any. To convert this dataset into a classification problem, you can sort the rows by the last column (i.e., the rating column) and set the ratings in the first half of the rows to 0 (zero) and those in the bottom half to 1 (one). This way, your output column (rating) will be a binary variable. Alternatively, you can pick a dataset from the UCI ML database.

Features

  1. Duration
  2. Release year
  3. Number of ratings
  4. Number of reviews
  5. Number of critic reviews
  6. Rating

Name         Duration_Min  ReleaseYear  NumberOfRatings  NumberOfReviews  NumberOfCriticReviews  Rating
Gladiator    155           2000         1338076          2692             214                    8.5
300          117           2006         731880           2212             476                    7.6
I am legend  101           2007         685532           1559             331                    7.2
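
The conversion to a binary output column described above could be sketched as follows. This is only an illustration on the three example rows. It assumes the rows are sorted from lowest to highest rating, so that 0 means "lower-rated half" and 1 means "higher-rated half". The variable names `movies`, `half` and `dataset` are choices made for this sketch.

```python
import pandas as pd

# The three example rows from the table; a real dataset needs 100+ rows.
movies = pd.DataFrame({
    "Name": ["Gladiator", "300", "I am legend"],
    "Duration_Min": [155, 117, 101],
    "ReleaseYear": [2000, 2006, 2007],
    "NumberOfRatings": [1338076, 731880, 685532],
    "NumberOfReviews": [2692, 2212, 1559],
    "NumberOfCriticReviews": [214, 476, 331],
    "Rating": [8.5, 7.6, 7.2],
})

# Sort by rating (lowest first) and label the first half 0, the rest 1.
movies = movies.sort_values("Rating").reset_index(drop=True)
half = len(movies) // 2
movies["Rating"] = (movies.index >= half).astype(int)

# Drop the name column before training; keep it in the original file.
dataset = movies.drop(columns=["Name"]).to_numpy()
print(movies[["Name", "Rating"]])
```

With an odd number of rows, as here, the extra row lands in the class labeled 1.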

Activity 4: Logistic regression


In this activity, the goal is to practice logistic regression on a dataset with more than one input variable. When selecting a variable (column) to predict, it is important to select a binary variable as the output variable; in other words, its values must be 0 or 1, nothing else. Before feeding the data to the model, it is often important to normalize/standardize the input data, and classification may not work well without it. Here, the task is to perform logistic regression on your classification dataset.
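
To make the model itself concrete, here is a from-scratch NumPy sketch of logistic regression trained by gradient descent. It is not the Keras route this course uses: in Keras the same model is a single Dense unit with a sigmoid activation. The synthetic data, the learning rate and the number of steps are all assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (an assumption for this sketch): two inputs, binary output.
X = rng.normal(size=(200, 2))
y = (X[:, 0] + 2.0 * X[:, 1] > 0).astype(float)

# Standardize the inputs before fitting.
X = (X - X.mean(axis=0)) / X.std(axis=0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Logistic regression: p = sigmoid(X @ w + b), fitted by gradient descent
# on the binary cross-entropy loss.
w = np.zeros(X.shape[1])
b = 0.0
learning_rate = 0.5
for _ in range(500):
    p = sigmoid(X @ w + b)
    grad_w = X.T @ (p - y) / len(y)
    grad_b = np.mean(p - y)
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

# Threshold the predicted probabilities at 0.5 to get class labels.
predictions = (sigmoid(X @ w + b) >= 0.5).astype(float)
accuracy = np.mean(predictions == y)
print(f"training accuracy: {accuracy:.2f}")
```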

Activity 5: Binary classification using neural networks


In this activity, the goal is to practice training a neural network model to perform binary classification. A neural network classifier can often be more accurate than a basic logistic regression model because it has more parameters (weights and biases) with which to learn the patterns in the data, although this is not guaranteed for every dataset. A binary classifier can be evaluated using metrics such as accuracy, precision, and recall. Interpreting the accuracy of a binary classifier can be tricky because the baseline accuracy, i.e., the accuracy obtained by always predicting the more frequent class, is already at least 50%. A good classifier should reach an accuracy that is much higher than this baseline. The tasks in this activity are: (i) Build a neural network classifier for your dataset, (ii) Evaluate your model using accuracy, precision, and recall, (iii) Compare the accuracy of your model with the baseline accuracy, and (iv) Compare the performance of the neural network with a logistic regression model.
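
The three metrics and the baseline can be computed by hand, as in the sketch below. It assumes the model's sigmoid outputs have already been thresholded (for example at 0.5) into 0/1 predictions, and it takes the baseline to be the accuracy of always predicting the more frequent class. The function names and the example labels are made up for illustration.

```python
import numpy as np

def binary_metrics(y_true, y_pred):
    """Accuracy, precision and recall for 0/1 labels and 0/1 predictions."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = np.sum((y_pred == 1) & (y_true == 1))  # true positives
    fp = np.sum((y_pred == 1) & (y_true == 0))  # false positives
    fn = np.sum((y_pred == 0) & (y_true == 1))  # false negatives
    accuracy = np.mean(y_pred == y_true)
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    return accuracy, precision, recall

def baseline_accuracy(y_true):
    """Accuracy of always predicting the most frequent class."""
    positive_rate = np.mean(np.asarray(y_true) == 1)
    return max(positive_rate, 1.0 - positive_rate)

# Made-up labels and predictions, for illustration only.
y_true = [1, 0, 0, 0, 1, 0, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]

accuracy, precision, recall = binary_metrics(y_true, y_pred)
print(accuracy, precision, recall, baseline_accuracy(y_true))
```

In this example the accuracy is 0.8 while the baseline is 0.7, so the classifier is only modestly better than always predicting 0.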

Activity 6: Overfitting vs generalization


In this activity, the goal is to learn the concepts of overfitting, underfitting, and generalization, and the purpose of splitting a dataset into a training set and a validation set. For a standard tabular classification dataset of your choice, where the output variable is a binary variable, the first step is to shuffle the rows (see example code below). The next step is to split the rows into a training set and a validation set. For small datasets, selecting a random 30% of the rows as the validation set and leaving the rest as the training set works well. For larger datasets, a smaller percentage can be enough. This splitting yields four NumPy arrays: XTRAIN, YTRAIN, XVALID, and YVALID (see example code below). When normalizing the data, it is important to compute the ‘mean’ and ‘standard deviation’ from the XTRAIN array only, not from XVALID. XVALID should be normalized using the mean and standard deviation obtained from XTRAIN. Then the main question to ask is: if a model is trained using the training data (XTRAIN and YTRAIN), how does it perform on the validation set (XVALID and YVALID)? In this activity there are two tasks: (i) Build a neural network model to overfit the training set (to get almost 100% accuracy, or as high as possible) and then evaluate it on the validation set, and (ii) Evaluate the accuracy of the model on the training set and the validation set and discuss your findings. To obtain high accuracy on the training set, one can build a larger neural network (with more layers and more neurons per layer) and train for as long as possible.

# Shuffle the rows of the dataset (a 2D NumPy array, one row per example)
import numpy as np
np.random.shuffle(dataset)
# Split into training and validation: 30% validation set and 70% training set
index_30percent = int(0.3 * len(dataset))
print(index_30percent)
# Assumes the output column is the last column; adjust the indices for your data
XVALID = dataset[:index_30percent, :-1]
YVALID = dataset[:index_30percent, -1]
XTRAIN = dataset[index_30percent:, :-1]
YTRAIN = dataset[index_30percent:, -1]
# Learn the model from the training set (replace ... with your own settings)
model.fit(XTRAIN, YTRAIN, ...)
# Evaluate on the training set (should deliver high accuracy)
# evaluate() returns the loss followed by the metrics given to model.compile();
# the unpacking below assumes metrics=['accuracy']
P = model.predict(XTRAIN)
loss, accuracy = model.evaluate(XTRAIN, YTRAIN)
# Evaluate on the validation set
P = model.predict(XVALID)
loss, accuracy = model.evaluate(XVALID, YVALID)
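
The example code above does not show the normalization step described in the text. A minimal sketch of it follows, assuming the output column is the last column of `dataset`. The synthetic array only stands in for a real dataset, and the names `XTRAIN_NORM` and `XVALID_NORM` are choices made for this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for a tabular dataset: 5 input columns plus a binary
# output in the last column (an assumption made for this sketch).
dataset = np.hstack([rng.normal(loc=50.0, scale=10.0, size=(100, 5)),
                     rng.integers(0, 2, size=(100, 1))])

rng.shuffle(dataset)  # shuffles the rows in place
n_valid = int(0.3 * len(dataset))
XVALID, YVALID = dataset[:n_valid, :-1], dataset[:n_valid, -1]
XTRAIN, YTRAIN = dataset[n_valid:, :-1], dataset[n_valid:, -1]

# Compute the statistics from the training inputs only.
mean = XTRAIN.mean(axis=0)
std = XTRAIN.std(axis=0)

# Apply the same training statistics to both sets.
XTRAIN_NORM = (XTRAIN - mean) / std
XVALID_NORM = (XVALID - mean) / std

print(XTRAIN_NORM.mean(axis=0).round(3))  # close to 0 by construction
print(XVALID_NORM.mean(axis=0).round(3))  # near 0, but not exactly 0
```

The validation columns do not end up with a mean of exactly 0, and that is expected: the validation set is treated like unseen data, which has to be scaled with whatever the model saw during training.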

In the notebook where you practice, also answer the following:

  1. Does your model perform better (in terms of accuracy) on the training set or the validation set? Is this a problem? How can you avoid this problem?
  2. Why can overtraining be a problem?
  3. What is the difference between generalization, overfitting, and underfitting?
  4. Why should you not normalize XVALID separately, i.e., why should you use the parameters from XTRAIN to normalize XVALID?

Activity 7: Learning curves


This activity assumes that you have successfully completed all previous activities. It also requires some focus. Learning curves are key to debugging and diagnosing a model’s performance. The goal in this activity is to plot learning curves and to interpret various learning curves. For a regression dataset of your choice, the first step is to shuffle the dataset. The next step is to split the dataset into the four arrays: XTRAIN, YTRAIN, XVALID, and YVALID. The next step is to train a neural network model using model.fit(). However, this time, XVALID and YVALID will also be passed as arguments to the model.fit() method, so that it can evaluate the model on the validation set at the end of each epoch (see code block below). It is extremely important to understand that the model.fit() method does NOT use the validation dataset to perform the learning; it only uses it to evaluate the model after each epoch. When calling the model.fit() method we can also save its output in a variable, usually named history. This variable can be used to plot learning curves (see code block below). The task in this activity is to plot many learning curves in various scenarios. In particular, it is of interest to observe and analyze what the learning curves look like in various settings. The following article discusses learning curves in more detail.

import matplotlib.pyplot as plt
# Do the training (specify the validation set as well)
# epochs = 100 is only an example value; with the default of 1 epoch the curve is a single point
history = model.fit(XTRAIN, YTRAIN, validation_data = (XVALID, YVALID), epochs = 100, verbose = 1)
# Check what's in the history
print(history.params)
print(history.history.keys())
# Plot the learning curves; 'loss' and 'val_loss' are always recorded
# To plot 'accuracy' or 'mae' instead, pass that metric to model.compile() first
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.ylabel('Loss')
plt.xlabel('epoch')
plt.legend(['training data', 'validation data'], loc='upper right')
plt.show()

Produce learning curves that represent the following cases:

  1. the validation set is too small relative to the training set - for example, only 1% or 2% of the total rows of data.
  2. the training set is too small compared to the validation set - for example, only 1% or 2% of the total rows of data.
  3. a good learning curve (practically good)
  4. an overfitting model
  5. a model that shows that further training is required
  6. an underfit model that does not have sufficient capacity (also may imply that the data itself is difficult)
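
When comparing many runs, it can help to summarize each curve with a few numbers in addition to looking at the plot. The sketch below is one possible way to do that, not a standard Keras feature. It works on a dictionary shaped like `history.history`; the helper name `summarize_history` and the loss values are hypothetical.

```python
import numpy as np

def summarize_history(history_dict):
    """Report the epoch with the lowest validation loss and the final gap."""
    train_loss = np.asarray(history_dict["loss"])
    val_loss = np.asarray(history_dict["val_loss"])
    best_epoch = int(np.argmin(val_loss))              # 0-based epoch index
    final_gap = float(val_loss[-1] - train_loss[-1])   # validation minus training
    still_improving = best_epoch == len(val_loss) - 1  # best epoch is the last one
    return best_epoch, final_gap, still_improving

# Hypothetical loss values shaped like an overfitting run: the validation
# loss bottoms out early and then rises while the training loss keeps falling.
overfit = {"loss": [0.9, 0.6, 0.4, 0.2, 0.1],
           "val_loss": [1.0, 0.7, 0.6, 0.8, 1.1]}

# Hypothetical loss values shaped like a run that needs further training:
# both losses are still falling at the last epoch.
undertrained = {"loss": [1.0, 0.8, 0.6, 0.5],
                "val_loss": [1.1, 0.9, 0.7, 0.6]}

print(summarize_history(overfit))
print(summarize_history(undertrained))
```

A large final gap with an early best epoch points toward overfitting, while a best epoch equal to the last epoch suggests that training longer may still help.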

Activity 8: Fine-tuning hyper-parameters of your model


In this activity, the task is to learn how to design and train a model that does well on the unseen (validation) dataset. The weights and biases of a neural network model are its parameters. Settings such as the number of layers, the number of neurons in each layer, the number of epochs, the batch size, the activation functions, the choice of optimizer, the choice of loss function, etc. are the hyperparameters of a model. When training a model for a new dataset, an extremely important question is: what combination of hyperparameters yields the maximum accuracy on the validation set? Remember, when experimenting with activation functions, the activation of the last layer should not change: it should always be sigmoid for binary classification and ReLU or linear for regression. The task in this activity is to try as many hyperparameter combinations as possible to obtain the highest possible accuracy on the validation set. For a classification dataset of your choice, the first step is to create a notebook where you can train the model using the training set and evaluate it on the validation set. Then, the objective is to find the optimal (best) hyperparameters that maximize the accuracy (or minimize the MAE) on the validation set.

Below are the hyperparameters to optimize:

  1. The number of layers in the neural network (try 1, 2, 4, 8, 16, etc.).
  2. The number of neurons in each layer (try 2, 4, 8, 16, 32, 64, 128, 256, 512, etc.).
  3. Various batch sizes (8, 16, 32, 64, etc.).
  4. Various numbers of epochs (2, 4, 8, …, 5000, etc.).
  5. Various optimizers (rmsprop, sgd, nadam, adam, etc.)
  6. Various activation functions for the intermediate layers (relu, sigmoid, elu, etc.)
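
One convenient way to explore these settings is to wrap model construction in a function whose arguments are the hyperparameters. The sketch below assumes a binary classification dataset. The function name `build_model` and its argument names are choices made for this illustration, not part of Keras.

```python
from tensorflow import keras

def build_model(n_inputs, n_layers, n_units, activation="relu", optimizer="adam"):
    """Binary classifier whose hidden layers are set by hyperparameters."""
    model = keras.Sequential()
    model.add(keras.Input(shape=(n_inputs,)))
    for _ in range(n_layers):
        model.add(keras.layers.Dense(n_units, activation=activation))
    # The output layer stays fixed: one sigmoid unit for binary classification.
    model.add(keras.layers.Dense(1, activation="sigmoid"))
    model.compile(optimizer=optimizer, loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model

# Example: 5 input columns, 2 hidden layers with 8 neurons each.
model = build_model(n_inputs=5, n_layers=2, n_units=8)
print(model.count_params())
```

A search can then loop over the values listed above, call `build_model` for each combination, train with `model.fit(..., validation_data=(XVALID, YVALID))` using the chosen batch size and number of epochs, and record the last value of `history.history['val_accuracy']`. Trying every combination quickly becomes expensive, so it is reasonable to vary one or two hyperparameters at a time.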

Activity 9: Early stopping


Assumption: You already know (tentatively) what hyperparameters are good for your dataset

Activity 10: Iterative feature removal & selection


Datasets referred to


  1. Pima diabetes dataset
  2. Wine quality dataset

Optional Activities


Univariate linear regression

In this activity, the goal is to practice univariate linear regression. When selecting variables (columns) for performing linear regression, it is important to choose continuous variables and not binary variables. Before feeding the data to the regression model, it is often important to normalize/standardize the input data, and regression may not work well without it. Here, the task is to perform univariate linear regression on a dataset of your choice (other than the ‘pima-diabetes’ dataset).
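
As a reference point for this activity, a least-squares line can be fitted directly with NumPy, as sketched below. The synthetic data (a line with slope 3 and intercept 2 plus noise) is an assumption made for illustration. A Keras model with a single linear unit should approach a similar line when trained on the same data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (an assumption for this sketch): y = 3x + 2 plus a little noise.
x = rng.uniform(0.0, 10.0, size=100)
y = 3.0 * x + 2.0 + rng.normal(scale=0.5, size=100)

# Least-squares fit of a degree-1 polynomial: returns the slope, then the intercept.
slope, intercept = np.polyfit(x, y, deg=1)

# Evaluate the fit with the mean absolute error (MAE).
predictions = slope * x + intercept
mae = np.mean(np.abs(predictions - y))
print(f"slope={slope:.2f}, intercept={intercept:.2f}, MAE={mae:.2f}")
```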

Learning with missing values & noisy data

Linear regression with at least two input variables (Chapter 18)

In this activity, the goal is to practice linear regression on a dataset with more than one input variable. When selecting variables (columns) for performing linear regression, it is important to choose continuous variables and not binary variables. Before feeding the data to the regression model, it is often important to normalize/standardize the input data, and regression may not work well without it. Here, the task is to perform linear regression on a dataset of your choice (other than the ‘pima-diabetes’ dataset).
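
With several input variables, the same least-squares idea can be sketched using `np.linalg.lstsq`. The synthetic data and its coefficients are assumptions made for illustration. A column of ones is appended so that the last fitted coefficient plays the role of the intercept (bias).

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic data (an assumption for this sketch):
# y = 1.5*x1 - 2.0*x2 + 4.0 plus noise.
X = rng.normal(size=(200, 2))
y = 1.5 * X[:, 0] - 2.0 * X[:, 1] + 4.0 + rng.normal(scale=0.3, size=200)

# Standardize the inputs (mean 0, standard deviation 1 per column).
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

# Append a column of ones so the last coefficient acts as the intercept.
A = np.column_stack([X_std, np.ones(len(X_std))])
coefficients, *_ = np.linalg.lstsq(A, y, rcond=None)

predictions = A @ coefficients
mae = np.mean(np.abs(predictions - y))
print("weights:", coefficients[:2].round(2), "intercept:", round(coefficients[2], 2))
print(f"MAE: {mae:.2f}")
```

Because the inputs were standardized, the fitted weights are expressed per standard deviation of each input rather than per original unit.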

Regression using NN and evaluation

In this activity, the goal is to practice training a neural network model to perform regression, i.e., to predict continuous values. A neural network regression model can often be more accurate than a basic linear regression model because it has more parameters (weights and biases) with which to learn the patterns in the data, although this is not guaranteed for every dataset. A regression model can be evaluated using metrics such as mean absolute error (MAE). This activity has five tasks: (i) Build a neural network regression model for a dataset of your choice, (ii) Evaluate your model using MAE, (iii) Compare the MAE of your model with that of a linear regression model, (iv) Assess whether your model is biased towards predicting either larger values or smaller values more correctly, and (v) Experiment with various loss functions such as mae, mse, mean_squared_logarithmic_error, and logcosh to find out which delivers the lowest MAE.
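
For task (iv), one simple check is to compute the MAE separately for the smaller and the larger true values. The sketch below is one possible approach rather than a prescribed method. The helper name `mae_by_target_half` and the example numbers are made up for illustration.

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean absolute error between true values and predictions."""
    return float(np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred))))

def mae_by_target_half(y_true, y_pred):
    """MAE for the smaller half and the larger half of the true values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    order = np.argsort(y_true)          # indices from smallest to largest target
    half = len(y_true) // 2
    low, high = order[:half], order[half:]
    return mae(y_true[low], y_pred[low]), mae(y_true[high], y_pred[high])

# Made-up targets and predictions, for illustration only.
y_true = [1.0, 2.0, 3.0, 10.0, 20.0, 30.0]
y_pred = [1.5, 2.0, 2.5, 8.0, 16.0, 24.0]

overall = mae(y_true, y_pred)
mae_low, mae_high = mae_by_target_half(y_true, y_pred)
print(overall, mae_low, mae_high)
```

In this example the error on the larger targets is much higher than on the smaller ones. Larger values naturally allow larger absolute errors, so it can also be worth comparing the errors relative to the size of the target.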