Facial Expression Recognition - GraphLab MXNet CNN

Kaggle - Challenges in Representation Learning: Facial Expression Recognition Challenge

https://www.kaggle.com/c/challenges-in-representation-learning-facial-expression-recognition-challenge/data

The data consists of 48x48 pixel grayscale images of faces. The faces have been automatically registered so that the face is more or less centered and occupies about the same amount of space in each image. The task is to categorize each face, based on the emotion shown in the facial expression, into one of seven categories (0=Angry, 1=Disgust, 2=Fear, 3=Happy, 4=Sad, 5=Surprise, 6=Neutral).
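
The numeric codes map to emotion names as listed above; a small lookup dict keeps later output readable (the name EMOTIONS is our addition, not part of the competition files):

EMOTIONS = {0: 'Angry', 1: 'Disgust', 2: 'Fear', 3: 'Happy',
            4: 'Sad', 5: 'Surprise', 6: 'Neutral'}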

train.csv contains two columns, "emotion" and "pixels". The "emotion" column contains a numeric code, ranging from 0 to 6 inclusive, for the emotion present in the image. The "pixels" column contains, for each image, a quoted string of space-separated pixel values in row-major order. test.csv contains only the "pixels" column; the task is to predict the emotion column.
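
A minimal sketch of decoding one "pixels" string into an image array (parse_pixels is a hypothetical helper, not part of this notebook):

import numpy as np

def parse_pixels(pixel_string):
    # 2304 = 48*48 space-separated values, in row-major order
    vals = np.array(pixel_string.split(), dtype=np.uint8)
    return vals.reshape(48, 48)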

The training set consists of 28,709 examples. The public test set used for the leaderboard consists of 3,589 examples. The final test set, which was used to determine the winner of the competition, consists of another 3,589 examples.

This dataset was prepared by Pierre-Luc Carrier and Aaron Courville, as part of an ongoing research project. They have graciously provided the workshop organizers with a preliminary version of their dataset to use for this contest.

In [1]:
%matplotlib inline
In [2]:
import graphlab as gl
import matplotlib.pyplot as plt
import numpy as np
from timeit import default_timer as timer
import os

Read data

In [3]:
trainfile = 'train.txt'
testfile = 'test.txt'
pri_testfile = 'pri_test.txt'
In [8]:
# Read training labels into a list (each line of the file is "filename label";
# the images themselves are loaded separately below)
trainlabels = []
traindata = []

with open(trainfile, 'r') as f:
    for line in f:
        currfile, currlabel = line.split()
        trainlabels.append(currlabel)
In [9]:
# Read public test labels into a list
testlabels = []
testdata = []
with open(testfile, 'r') as f:
    for line in f:
        currfile, currlabel = line.split()
        testlabels.append(currlabel)
In [10]:
# Read private test labels into a list
pri_testlabels = []
pri_testdata = []
with open(pri_testfile, 'r') as f:
    for line in f:
        currfile, currlabel = line.split()
        pri_testlabels.append(currlabel)
In [11]:
# Verify sizes: the *data lists are empty (only the labels were read); the images are loaded next
print len(traindata), len(trainlabels), len(testdata), len(testlabels), len(pri_testdata), len(pri_testlabels)
0 28707 0 3589 0 3589
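
The label files and image directories used here are assumed to have been produced from the Kaggle CSVs by a separate preprocessing step that is not shown in this notebook. A hypothetical sketch of that step (file layout and names are assumptions):

import csv
import numpy as np
from scipy.misc import imsave  # era-appropriate; any image writer would do

with open('fer2013/train.csv') as fin, open(trainfile, 'w') as fout:
    reader = csv.DictReader(fin)
    for i, row in enumerate(reader):
        img = np.array(row['pixels'].split(), dtype=np.uint8).reshape(48, 48)
        fname = 'Training/%05d.jpg' % i  # hypothetical path matching load_images below
        imsave(fname, img)
        fout.write('%s %s\n' % (fname, row['emotion']))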
In [4]:
# Load the images
trainimgs = gl.image_analysis.load_images('Training/', random_order=False)
testimgs = gl.image_analysis.load_images('PublicTest/', random_order=False)
pri_testimgs = gl.image_analysis.load_images('PrivateTest/', random_order=False)
[INFO] graphlab.cython.cy_server: GraphLab Create v2.1 started. Logging: /tmp/graphlab_server_1509241980.log
This non-commercial license of GraphLab Create for academic use is assigned to rsridha2@uci.edu and will expire on December 03, 2017.
Unsupported image format. Supported formats are JPG and PNG	 file: /Users/rahulsridhar/Documents/Courses - Spring/CS 216/Project/Facial Expression/Code/Training/.DS_Store
Read 16878 images in 5.00012 secs	speed: 3375.52 file/sec
Unsupported image format. Supported formats are JPG and PNG	 file: /Users/rahulsridhar/Documents/Courses - Spring/CS 216/Project/Facial Expression/Code/PublicTest/.DS_Store
Unsupported image format. Supported formats are JPG and PNG	 file: /Users/rahulsridhar/Documents/Courses - Spring/CS 216/Project/Facial Expression/Code/PrivateTest/.DS_Store
In [5]:
# Verify data shapes
print trainimgs.shape, testimgs.shape, pri_testimgs.shape
(28707, 2) (3589, 2) (3589, 2)
In [12]:
# Attach the labels as a new column (assumes load_images with random_order=False
# returns rows in the same order as the label files)
trainimgs['label'] = trainlabels
testimgs['label'] = testlabels
pri_testimgs['label'] = pri_testlabels
In [13]:
# Randomly permute rows of the data
np.random.seed(0)
train_idx = np.random.permutation(len(trainimgs))
test_idx = np.random.permutation(len(testimgs))
pri_test_idx = np.random.permutation(len(pri_testimgs))
In [14]:
# Create a column for the random permutation
trainimgs['idx'] = train_idx
testimgs['idx'] = test_idx
pri_testimgs['idx'] = pri_test_idx
In [15]:
train_idx
Out[15]:
array([17226, 27381, 27735, ...,  9845, 10799,  2732])
In [16]:
pri_testimgs.print_rows(2)
+-------------------------------+----------------------+-------+------+
|              path             |        image         | label | idx  |
+-------------------------------+----------------------+-------+------+
| /Users/rahulsridhar/Docume... | Height: 48 Width: 48 |   0   | 3372 |
| /Users/rahulsridhar/Docume... | Height: 48 Width: 48 |   0   | 263  |
+-------------------------------+----------------------+-------+------+
[3589 rows x 4 columns]

In [17]:
# Sort the datasets based on the random permutation
trainimgs_rand = trainimgs.sort('idx')
testimgs_rand = testimgs.sort('idx')
pri_testimgs_rand = pri_testimgs.sort('idx')
In [18]:
pri_testimgs_rand.print_rows(5)
+-------------------------------+----------------------+-------+-----+
|              path             |        image         | label | idx |
+-------------------------------+----------------------+-------+-----+
| /Users/rahulsridhar/Docume... | Height: 48 Width: 48 |   3   |  0  |
| /Users/rahulsridhar/Docume... | Height: 48 Width: 48 |   6   |  1  |
| /Users/rahulsridhar/Docume... | Height: 48 Width: 48 |   0   |  2  |
| /Users/rahulsridhar/Docume... | Height: 48 Width: 48 |   2   |  3  |
| /Users/rahulsridhar/Docume... | Height: 48 Width: 48 |   5   |  4  |
+-------------------------------+----------------------+-------+-----+
[3589 rows x 4 columns]

In [19]:
# Remove the columns that are not required
trainimgs_rand.remove_columns(['idx', 'path'])
testimgs_rand.remove_columns(['idx', 'path'])
pri_testimgs_rand.remove_columns(['idx', 'path'])
Out[19]:
+----------------------+-------+
|        image         | label |
+----------------------+-------+
| Height: 48 Width: 48 |   3   |
| Height: 48 Width: 48 |   6   |
| Height: 48 Width: 48 |   0   |
| Height: 48 Width: 48 |   2   |
| Height: 48 Width: 48 |   5   |
| Height: 48 Width: 48 |   6   |
| Height: 48 Width: 48 |   0   |
| Height: 48 Width: 48 |   6   |
| Height: 48 Width: 48 |   3   |
| Height: 48 Width: 48 |   4   |
+----------------------+-------+
[3589 rows x 2 columns]
Note: Only the head of the SFrame is printed.
You can use print_rows(num_rows=m, num_columns=n) to print more rows and columns.
In [20]:
# Save the random permutations - for future use
np.save('GraphLabOutput/train_idx', train_idx)
np.save('GraphLabOutput/test_idx', test_idx)
np.save('GraphLabOutput/pri_test_idx', pri_test_idx)

Resize the data

In [21]:
training_data = trainimgs_rand
test_data = testimgs_rand
pri_test_data = pri_testimgs_rand
In [22]:
print training_data.shape, test_data.shape, pri_test_data.shape#, validation_data.shape
(28707, 2) (3589, 2) (3589, 2)
In [23]:
# Decode the images into raw pixel arrays (they are already 48x48 grayscale,
# so this resize is effectively just a decode step)
training_data['image'] = gl.image_analysis.resize(training_data['image'], 48, 48, 1, decode=True)
test_data['image'] = gl.image_analysis.resize(test_data['image'], 48, 48, 1, decode=True)
pri_test_data['image'] = gl.image_analysis.resize(pri_test_data['image'], 48, 48, 1, decode=True)
In [24]:
# Convert the labels to int
training_data['label'] = training_data['label'].astype(int) 
#validation_data['label'] = validation_data['label'].astype(int) 
test_data['label'] = test_data['label'].astype(int) 
pri_test_data['label'] = pri_test_data['label'].astype(int) 
In [25]:
# Verify that there are 7 unique labels
print np.unique(training_data['label'])
#print np.unique(validation_data['label'])
print np.unique(test_data['label'])
print np.unique(pri_test_data['label'])
[0 1 2 3 4 5 6]
[0 1 2 3 4 5 6]
[0 1 2 3 4 5 6]

MXNet - Another deep learning framework

In [1]:
from graphlab import mxnet as mx
[INFO] graphlab.mxnet.base: CUDA support is currently not available on this platform. GPU support is disabled.
This non-commercial license of GraphLab Create for academic use is assigned to rsridha2@uci.edu and will expire on December 03, 2017.
[INFO] graphlab.cython.cy_server: GraphLab Create v2.1 started. Logging: /tmp/graphlab_server_1509319486.log
In [27]:
# Load training data mean
training_im_mean = np.load("Training_Img_Mean.npy")
print training_im_mean.shape
print "Mean min max", np.min(training_im_mean), np.max(training_im_mean)
(48, 48)
Mean min max 0.419400353 0.693280081443
In [28]:
training_im_mean = np.reshape(training_im_mean, (48, 48, 1)) # Reshape mean 
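
Training_Img_Mean.npy is loaded from disk; its computation is not part of this notebook. The printed min/max in [0, 1] suggest pixels were scaled by 255 before averaging, so a plausible sketch looks like the following (compute_image_mean is hypothetical; it assumes decoded 48x48 grayscale images):

import numpy as np

def compute_image_mean(image_sarray):
    # image_sarray: SArray of decoded 48x48 single-channel graphlab.Image objects
    acc = np.zeros((48, 48), dtype=np.float64)
    for img in image_sarray:
        acc += img.pixel_data.reshape(48, 48) / 255.0
    return acc / len(image_sarray)

# e.g. np.save('Training_Img_Mean.npy', compute_image_mean(training_data['image']))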
In [29]:
# Wrap the public test data in an MXNet iterator (the training mean is subtracted from each image)
testdataiter = mx.io.SFrameImageIter(test_data, data_field=['image'],
                            label_field='label',
                            data_name='data',
                            label_name='sm_label', mean_nd = training_im_mean)
In [30]:
# Wrap the private test data in an MXNet iterator
pri_testdataiter = mx.io.SFrameImageIter(pri_test_data, data_field=['image'],
                            label_field='label',
                            data_name='data',
                            label_name='sm_label', mean_nd = training_im_mean)
In [39]:
# Define the network as an MXNet symbol: five conv layers (each with batch norm
# + ReLU, some with max pooling and dropout) followed by three fully connected layers

# Conv layers
data = mx.symbol.Variable('data')
conv1= mx.symbol.Convolution(data = data, name='conv1', num_filter=64, kernel=(3,3), stride=(2,2), pad=(2,2))
bn1 = mx.symbol.BatchNorm(data = conv1, name="bn1")
act1 = mx.symbol.Activation(data = bn1, name='relu1', act_type="relu")
#mp1 = mx.symbol.Pooling(data = act1, name = 'mp1', kernel=(2,2), stride=(2,2), pool_type='max')
mp1 = act1

conv2= mx.symbol.Convolution(data = mp1, name='conv2', num_filter=64, kernel=(3,3), stride=(2,2), pad=(2,2))
bn2 = mx.symbol.BatchNorm(data = conv2, name="bn2")
act2 = mx.symbol.Activation(data = bn2, name='relu2', act_type="relu")
#mp2 = mx.symbol.Pooling(data = act2, name = 'mp2', kernel=(2,2), stride=(2,2), pool_type='max')
mp2 = act2

conv3= mx.symbol.Convolution(data = mp2, name='conv3', num_filter=64, kernel=(3,3), stride=(2,2), pad=(2,2)) 
bn3 = mx.symbol.BatchNorm(data = conv3, name="bn3")
act3 = mx.symbol.Activation(data = bn3, name='relu3', act_type="relu")
mp3 = mx.symbol.Pooling(data = act3, name = 'mp3', kernel=(2,2), stride=(2,2), pool_type='max')
#mp3 = act3
mp3 = mx.symbol.Dropout(data = mp3, p = 0.5)

conv4 = mx.symbol.Convolution(data = mp3, name='conv4', num_filter=128, kernel=(3,3), stride=(2,2), pad=(2,2))
bn4 = mx.symbol.BatchNorm(data = conv4, name="bn4")
act4 = mx.symbol.Activation(data = bn4, name='relu4', act_type="relu")
#mp4 = mx.symbol.Pooling(data = act4, name = 'mp4', kernel=(2,2), stride=(2,2), pool_type='max')
mp4 = act4

conv5 = mx.symbol.Convolution(data = mp4, name='conv5', num_filter=128, kernel=(3,3), stride=(2,2), pad=(2,2))
bn5 = mx.symbol.BatchNorm(data = conv5, name="bn5")
act5 = mx.symbol.Activation(data = bn5, name='relu5', act_type="relu")
mp5 = mx.symbol.Pooling(data = act5, name = 'mp5', kernel=(2,2), stride=(2,2), pool_type='max')

# Fully connected layers
fl = mx.symbol.Flatten(data = mp5, name="flatten")

fc2 = mx.symbol.FullyConnected(data = fl, name='fc2', num_hidden=1024)
fc2 = mx.symbol.Dropout(data = fc2, p = 0.7)

fc3 = mx.symbol.FullyConnected(data = fc2, name='fc3', num_hidden=512)
fc3 = mx.symbol.Dropout(data = fc3, p = 0.7)

fc4 = mx.symbol.FullyConnected(data = fc3, name='fc4', num_hidden=7)
softmax = mx.symbol.SoftmaxOutput(data = fc4, name = 'sm')
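
A quick sanity check on the symbol before training (standard MXNet symbol API; the shapes assume batches of 128 single-channel 48x48 images):

arg_shapes, out_shapes, aux_shapes = softmax.infer_shape(data=(128, 1, 48, 48))
for name, shape in zip(softmax.list_arguments(), arg_shapes):
    print name, shape
print 'output:', out_shapes[0]  # expect (128, 7), one softmax row per image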
In [48]:
batch_size = 128
num_epoch = 15

start = timer()

# Prepare the training data iterator from SFrame
# `data_name` must match the first layer's name of the network.
# `label_name` must match the last layer's name plus "_label".
dataiter = mx.io.SFrameImageIter(training_data, data_field='image',
                            label_field='label',
                            data_name='data',
                            label_name='sm_label', batch_size=batch_size, mean_nd = training_im_mean)#,\
                            #random_flip=True)

# Train the network
adam = mx.optimizer.Adam(learning_rate=0.0001)
model = mx.model.FeedForward.create(softmax, X=dataiter,
                                    num_epoch=num_epoch,
                                    learning_rate=0.0001, wd=0.0008,
                                    momentum=0.9,
                                    eval_metric=mx.metric.Accuracy(), 
                                    optimizer=adam)
end = timer()
os.system('say "Your program has finished"')
print "Time elapsed = ", end-start, "seconds"
[INFO] graphlab.mxnet.model: Start training with [cpu(0)]
[INFO] graphlab.mxnet.model: Epoch[0] Resetting Data Iterator
[INFO] graphlab.mxnet.model: Epoch[0] Train-accuracy=0.266215
[INFO] graphlab.mxnet.model: Epoch[0] Time cost=507.476
[INFO] graphlab.mxnet.model: Epoch[1] Resetting Data Iterator
[INFO] graphlab.mxnet.model: Epoch[1] Train-accuracy=0.352257
[INFO] graphlab.mxnet.model: Epoch[1] Time cost=473.077
[INFO] graphlab.mxnet.model: Epoch[2] Resetting Data Iterator
[INFO] graphlab.mxnet.model: Epoch[2] Train-accuracy=0.395035
[INFO] graphlab.mxnet.model: Epoch[2] Time cost=457.901
[INFO] graphlab.mxnet.model: Epoch[3] Resetting Data Iterator
[INFO] graphlab.mxnet.model: Epoch[3] Train-accuracy=0.424861
[INFO] graphlab.mxnet.model: Epoch[3] Time cost=682.317
[INFO] graphlab.mxnet.model: Epoch[4] Resetting Data Iterator
[INFO] graphlab.mxnet.model: Epoch[4] Train-accuracy=0.435556
[INFO] graphlab.mxnet.model: Epoch[4] Time cost=465.843
[INFO] graphlab.mxnet.model: Epoch[5] Resetting Data Iterator
[INFO] graphlab.mxnet.model: Epoch[5] Train-accuracy=0.450035
[INFO] graphlab.mxnet.model: Epoch[5] Time cost=487.036
[INFO] graphlab.mxnet.model: Epoch[6] Resetting Data Iterator
[INFO] graphlab.mxnet.model: Epoch[6] Train-accuracy=0.459653
[INFO] graphlab.mxnet.model: Epoch[6] Time cost=473.942
[INFO] graphlab.mxnet.model: Epoch[7] Resetting Data Iterator
[INFO] graphlab.mxnet.model: Epoch[7] Train-accuracy=0.473576
[INFO] graphlab.mxnet.model: Epoch[7] Time cost=475.025
[INFO] graphlab.mxnet.model: Epoch[8] Resetting Data Iterator
[INFO] graphlab.mxnet.model: Epoch[8] Train-accuracy=0.480694
[INFO] graphlab.mxnet.model: Epoch[8] Time cost=470.468
[INFO] graphlab.mxnet.model: Epoch[9] Resetting Data Iterator
[INFO] graphlab.mxnet.model: Epoch[9] Train-accuracy=0.491285
[INFO] graphlab.mxnet.model: Epoch[9] Time cost=456.989
[INFO] graphlab.mxnet.model: Epoch[10] Resetting Data Iterator
[INFO] graphlab.mxnet.model: Epoch[10] Train-accuracy=0.498403
[INFO] graphlab.mxnet.model: Epoch[10] Time cost=467.466
[INFO] graphlab.mxnet.model: Epoch[11] Resetting Data Iterator
[INFO] graphlab.mxnet.model: Epoch[11] Train-accuracy=0.506806
[INFO] graphlab.mxnet.model: Epoch[11] Time cost=464.642
[INFO] graphlab.mxnet.model: Epoch[12] Resetting Data Iterator
[INFO] graphlab.mxnet.model: Epoch[12] Train-accuracy=0.509549
[INFO] graphlab.mxnet.model: Epoch[12] Time cost=491.030
[INFO] graphlab.mxnet.model: Epoch[13] Resetting Data Iterator
[INFO] graphlab.mxnet.model: Epoch[13] Train-accuracy=0.515694
[INFO] graphlab.mxnet.model: Epoch[13] Time cost=495.313
[INFO] graphlab.mxnet.model: Epoch[14] Resetting Data Iterator
[INFO] graphlab.mxnet.model: Epoch[14] Train-accuracy=0.524375
[INFO] graphlab.mxnet.model: Epoch[14] Time cost=461.288
Time elapsed =  7331.32261395 seconds
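
Only the predictions are saved below; checkpointing the model itself would avoid the two-hour retrain. A sketch, assuming the wrapped model exposes MXNet's usual FeedForward save/load (an assumption; the GraphLab wrapper may differ):

model.save('GraphLabOutput/cnn_48')  # writes the symbol JSON and parameter file
# later: model = mx.model.FeedForward.load('GraphLabOutput/cnn_48', num_epoch)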
In [49]:
# Make predictions on the public test data
testpred = model.predict(testdataiter)
os.system('say "Your program has finished"')
Out[49]:
0
In [50]:
# Make predictions on the private test data
pri_testpred = model.predict(pri_testdataiter)
os.system('say "Your program has finished"')
Out[50]:
0
In [51]:
# Verify shapes of predictions
print testpred.shape, pri_testpred.shape
(3589, 7) (3589, 7)
In [52]:
# Mean prediction score per class; class 1 (Disgust) is by far the rarest
# class in this dataset, hence its low average score
print np.mean(testpred, axis = 0)
print np.mean(pri_testpred, axis = 0)
[ 0.15888491  0.01580035  0.18856853  0.16486245  0.15111475  0.16397272
  0.15679605]
[ 0.15765251  0.0161378   0.18977651  0.1681392   0.1503212   0.16439429
  0.15357827]
In [53]:
print np.sum(testpred) # Each softmax row sums to 1, so the total should equal the number of test examples
print np.sum(pri_testpred) # Likewise for the private test set
3589.0
3589.0
In [54]:
# Save the predictions
testpred_argmax = np.argmax(testpred, axis = 1)
pri_testpred_argmax = np.argmax(pri_testpred, axis = 1)
np.save('GraphLabOutput/testpred_GraphLab_CNN_48', testpred_argmax)
np.save('GraphLabOutput/pri_testpred_GraphLab_CNN_48', pri_testpred_argmax)

Compute test accuracy

In [55]:
print "Public test accuracy = ", np.mean(np.equal(np.argmax(testpred, axis = 1), test_data['label']))
print "Private test accuracy = ", np.mean(np.equal(np.argmax(pri_testpred, axis = 1), pri_test_data['label']))
Public test accuracy =  0.480913903594
Private test accuracy =  0.482585678462
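
Overall accuracy hides per-class behavior, and the mean scores above already hint that class 1 (Disgust) is rarely predicted. A short per-class breakdown sketch:

pred = np.argmax(testpred, axis=1)
truth = np.array(list(test_data['label']))
for c in range(7):
    mask = (truth == c)
    print c, mask.sum(), np.mean(pred[mask] == truth[mask])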

Results - trained on 80% of training data
10 epochs - 0.001 Adam, 0.0001 wd L2, softmax, momentum 0.9, 128 batch size - training 61.4% test 39.426%; 30 mins
15 epochs - 0.001 Adam, 0.0001 wd L2, softmax, momentum 0.9, 128 batch size - training 72.34% test 44.135%; 46 mins
8 epochs - 0.0001 Adam, 0.0005 wd L2, softmax, momentum 0.9, 128 batch size, dropout 0.7 FC (thrown away) and dropout 0.7 after 3rd conv layer - training 42.92% public test private test ; 45 mins (5 conv layers; 64 64 64 128 128)

Results - trained on 100% of training data
15 epochs - 0.001 Adam, 0.0001 wd L2, softmax, momentum 0.9, 128 batch size, dropout 0.75 (thrown away) - training 72.2% test 43.6%; 56 mins
10 epochs - 0.001 Adam, 0.0001 wd L2, softmax, momentum 0.9, 128 batch size, dropout 0 (thrown away) - training 61.8% test 34%; 56 mins (3rd conv layer had only 64 filters)
20 epochs - 0.001 Adam, 0.0003 wd L2, softmax, momentum 0.9, 128 batch size, dropout 0.5 (thrown away) - training 76.67% test 43.99%; 56 mins
20 epochs - 0.001 Adam, 0.0003 wd L2, softmax, momentum 0.9, 128 batch size, dropout 0.6 (thrown away) - training 78.65% test 40.84%; 70 mins
15 epochs - 0.001 Adam, 0.0003 wd L2, softmax, momentum 0.9, 64 batch size, dropout 0.7 (thrown away) - training 69.71% test 44.72%; 57 mins
15 epochs - 0.001 Adam, 0.0003 wd L2, softmax, momentum 0.9, 128 batch size, dropout 0.7 (thrown away) - training 73.86% test 46.14%; 84 mins (4 conv layers; 64 64 64 128)
15 epochs - 0.001 Adam, 0.0005 wd L2, softmax, momentum 0.9, 128 batch size, dropout 0.7 (thrown away) - training 72.28% public test 46.89% private test 45.55%; 84 mins (5 conv layers; 64 64 64 128 128)

7 epochs - 0.001 Adam, 0.0005 wd L2, softmax, momentum 0.9, 128 batch size, dropout 0.7 (thrown away) - training 53.33% public test 47.87% private test 46.17%; 84 mins (5 conv layers; 64 64 64 128 128)
10 epochs - 0.0001 Adam, 0.0005 wd L2, softmax, momentum 0.9, 128 batch size, dropout 0.7 FC (thrown away) and dropout 0.5 after 3rd conv layer - training 48.86% public test 47.28% private test 47.62%; 74 mins (5 conv layers; 64 64 64 128 128)
10 epochs - 0.0001 Adam, 0.0005 wd L2, softmax, momentum 0.9, 128 batch size, dropout 0.7 FC (thrown away) and dropout 0.5 after 3rd conv layer - training 48.5% public test 47.23% private test 48.06%; 75 mins (5 conv layers; 64 64 64 128 128)
15 epochs - 0.0001 Adam, 0.0008 wd L2, softmax, momentum 0.9, 128 batch size, dropout 0.7 FC (thrown away) and dropout 0.5 after 3rd conv layer - training 52.4% public test 48.09% private test 48.25%; >100 mins (5 conv layers; 64 64 64 128 128)

In [ ]:
 

Plot training accuracy and error vs. number of epochs

In [15]:
# Training accuracies for the best model
# (hardcoded: this MXNet wrapper logs per-epoch accuracy rather than returning it)
accuracy = [27, 35.6, 39.7, 41.8, 43.3, 44.8, 45.8, 46.8, 47.6, 48.5]
maximum = [100]*10
error = [a - b for a, b in zip(maximum, accuracy)]
In [29]:
num_epoch = 10
epochs = np.arange(1, num_epoch+1)
plt.figure(figsize=(10, 8))
plt.plot(epochs, accuracy)
plt.xlabel('Epoch')
plt.ylabel('Training accuracy (%)')
plt.title('Training accuracy vs. Number of epochs')
Out[29]:
<matplotlib.text.Text at 0x11eced390>
In [28]:
num_epoch = 10
epochs = np.arange(1, num_epoch+1)
plt.figure(figsize=(10, 8))
plt.plot(epochs, error)
plt.xlabel('Epoch')
plt.ylabel('Training error (%)')
plt.title('Training error vs. Number of epochs')
Out[28]:
<matplotlib.text.Text at 0x11e9f0e90>
In [ ]: