Facial Expression Recognition - Ensemble

Kaggle - Challenges in Representation Learning: Facial Expression Recognition Challenge

https://www.kaggle.com/c/challenges-in-representation-learning-facial-expression-recognition-challenge/data

The data consists of 48x48 pixel grayscale images of faces. The faces have been automatically registered so that each face is more or less centered and occupies about the same amount of space in every image. The task is to categorize each face, based on the emotion shown in the facial expression, into one of seven categories (0=Angry, 1=Disgust, 2=Fear, 3=Happy, 4=Sad, 5=Surprise, 6=Neutral).

train.csv contains two columns, "emotion" and "pixels". The "emotion" column contains a numeric code, ranging from 0 to 6 inclusive, for the emotion present in the image. The "pixels" column contains, for each image, a quoted string whose contents are the space-separated pixel values in row-major order. test.csv contains only the "pixels" column, and the task is to predict the emotion column.

The training set consists of 28,709 examples. The public test set used for the leaderboard consists of 3,589 examples. The final test set, which was used to determine the winner of the competition, consists of another 3,589 examples.

This dataset was prepared by Pierre-Luc Carrier and Aaron Courville, as part of an ongoing research project. They have graciously provided the workshop organizers with a preliminary version of their dataset to use for this contest.
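
As a quick illustration of the "pixels" format, here is a minimal sketch (not part of the original pipeline) of decoding one such string into a 48x48 image; pixels_str is a hypothetical variable holding the contents of a single "pixels" cell.

In [ ]:
# Minimal sketch: decode a space-separated "pixels" string (row-major order)
# into a 48x48 uint8 image. pixels_str is assumed to hold one "pixels" cell,
# e.g. "70 80 82 ... 64"
import numpy as np
import matplotlib.pyplot as plt

img = np.array([int(p) for p in pixels_str.split()], dtype=np.uint8).reshape(48, 48)
plt.imshow(img, cmap='gray')
plt.axis('off')
plt.show()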

In [1]:
%matplotlib inline
In [2]:
import graphlab as gl
import matplotlib.pyplot as plt
import numpy as np
from timeit import default_timer as timer
import os

Import predictions of the baseline models - CNN and KNN

In [3]:
#cnn_testpred = np.load('GraphLabOutput/testpred_GraphLab_CNN.npy')
cnn_testpred = np.load('GraphLabOutput/testpred_GraphLab_CNN_48.npy')
knn_testpred = np.load('SFrame/test_37percent_120cos11.npy')
cnn_pritestpred = np.load('GraphLabOutput/pri_testpred_GraphLab_CNN_48.npy')
knn_pritestpred = np.load('SFrame/pritest_37percent_120cos11.npy')
In [4]:
# The CNN's test data was shuffled, so its predictions have to be re-ordered
# before they can be compared with the KNN predictions and the true labels
# Load the random permutations that were used on the CNN's test data
cnn_testidx = np.load('GraphLabOutput/test_idx.npy')
cnn_pri_testidx = np.load('GraphLabOutput/pri_test_idx.npy')
In [17]:
cnn_testpred = cnn_testpred.astype(int)
knn_testpred = knn_testpred.astype(int)
cnn_pritestpred = cnn_pritestpred.astype(int)
knn_pritestpred = knn_pritestpred.astype(int)
In [18]:
# Read true public test labels
testlabels = []
testfile = 'test.txt'
with open(testfile, 'r') as f:
    for line in f:
        currfile, currlabel = line.split()
        testlabels.append(currlabel)
In [19]:
# Read true private test labels
pri_testlabels = []
pri_testfile = 'pri_test.txt'
with open(pri_testfile, 'r') as f:
    for line in f:
        currfile, currlabel = line.split()
        pri_testlabels.append(currlabel)
In [20]:
# Convert the labels to int
testlabels = np.asarray(testlabels).astype(int)
pri_testlabels = np.asarray(pri_testlabels).astype(int)
In [21]:
# Re-permute cnn predictions based on the originally used random indices
cnn_testpred_perm = [cnn_testpred[i] for i in cnn_testidx]
cnn_testpred_perm = np.asarray(cnn_testpred_perm)
cnn_pri_testpred_perm = [cnn_pritestpred[i] for i in cnn_pri_testidx]
cnn_pri_testpred_perm = np.asarray(cnn_pri_testpred_perm)
In [22]:
# Verify shapes of the datasets
print cnn_testpred.shape, knn_testpred.shape, testlabels.shape, cnn_testidx.shape, cnn_pri_testidx.shape, \
    cnn_testpred_perm.shape, cnn_pri_testpred_perm.shape
(3589,) (3589,) (3589,) (3589,) (3589,) (3589,) (3589,)
In [23]:
# In what % of instances do the KNN and CNN predictions agree - approx 35%
print float(np.sum(cnn_testpred_perm == knn_testpred))/len(knn_testpred)
print float(np.sum(cnn_pri_testpred_perm == knn_pritestpred))/len(knn_pritestpred)
0.347729172471
0.353301755364
In [24]:
# Verify original models' accuracies
print "CNN Public Test Accuracy: ", float(np.sum(cnn_testpred_perm == testlabels))/len(testlabels)
print "CNN Private Test Accuracy: ", float(np.sum(cnn_pri_testpred_perm == pri_testlabels))/len(pri_testlabels)
print "KNN Public Test Accuracy: ", float(np.sum(knn_testpred == testlabels))/len(testlabels)
print "KNN Private Test Accuracy: ", float(np.sum(knn_pritestpred == pri_testlabels))/len(pri_testlabels)
CNN Public Test Accuracy:  0.480913903594
CNN Private Test Accuracy:  0.482585678462
KNN Public Test Accuracy:  0.371412649763
KNN Private Test Accuracy:  0.367511841739
In [25]:
# Stack the CNN and KNN predictions column-wise to create two feature sets -
# one for the public test set and one for the private test set
data = np.column_stack([cnn_testpred_perm, knn_testpred]).astype(int)
print data.shape
pri_data = np.column_stack([cnn_pri_testpred_perm, knn_pritestpred]).astype(int)
print pri_data.shape
(3589, 2)
(3589, 2)

Experiment 1 - Naive Bayes Ensemble

In each of the experiments below, the two columns of base-model predictions serve as features for a meta-classifier that is fit on the public test labels and then evaluated on both the public and the private test sets.

In [16]:
from sklearn.naive_bayes import GaussianNB
In [17]:
# Experiment with uniform priors
numlabels = len(np.unique(testlabels))
priors = np.asarray([1./numlabels]*numlabels)
priors[6] += 1 - np.sum(priors) # To counter precision errors (not summing to 1)
print priors
[ 0.14285714  0.14285714  0.14285714  0.14285714  0.14285714  0.14285714
  0.14285714]
In [18]:
# Fit a Naive Bayes model
#gnb = GaussianNB(priors=priors) # With uniform priors
gnb = GaussianNB()               # With priors based on the data
gnb.fit(data, testlabels)
Out[18]:
GaussianNB(priors=None)
In [19]:
# Make predictions on the public test dataset
gnb_data_pred = gnb.predict(data)
print "Public Test Accuracy: ", float(np.sum(gnb_data_pred == testlabels))/len(testlabels)
Public Test Accuracy:  0.397046531067
In [20]:
# Make predictions on the private test dataset
gnb_data_pred = gnb.predict(pri_data)
print "Private Test Accuracy: ", float(np.sum(gnb_data_pred == pri_testlabels))/len(pri_testlabels)
Private Test Accuracy:  0.400668709947

Experiment 2 - Logistic Regression Ensemble

In [21]:
from sklearn.linear_model import LogisticRegression
In [22]:
logistic = LogisticRegression(max_iter=1000, solver = 'sag', tol = 1e-10)
logistic.fit(data, testlabels)
Out[22]:
LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
          intercept_scaling=1, max_iter=1000, multi_class='ovr', n_jobs=1,
          penalty='l2', random_state=None, solver='sag', tol=1e-10,
          verbose=0, warm_start=False)
In [23]:
# Make predictions on the public test dataset
logistic_data_pred = logistic.predict(data)
print "Public Test Accuracy: ", float(np.sum(logistic_data_pred == testlabels))/len(testlabels)
Public Test Accuracy:  0.314015045974
In [24]:
# Make predictions on the private test dataset
logistic_data_pred = logistic.predict(pri_data)
print "Private Test Accuracy: ", float(np.sum(logistic_data_pred == pri_testlabels))/len(pri_testlabels)
Private Test Accuracy:  0.32348843689

Experiment 3 - Gradient Boosting Ensemble

In [26]:
from sklearn.ensemble import GradientBoostingClassifier
In [27]:
boost = GradientBoostingClassifier(n_estimators=50, learning_rate=0.1, max_depth=10, max_leaf_nodes=10, 
                                   min_samples_split=5)
#boost = GradientBoostingClassifier()
boost.fit(data, testlabels)
Out[27]:
GradientBoostingClassifier(criterion='friedman_mse', init=None,
              learning_rate=0.1, loss='deviance', max_depth=10,
              max_features=None, max_leaf_nodes=10,
              min_impurity_decrease=0.0, min_impurity_split=None,
              min_samples_leaf=1, min_samples_split=5,
              min_weight_fraction_leaf=0.0, n_estimators=50,
              presort='auto', random_state=None, subsample=1.0, verbose=0,
              warm_start=False)
In [28]:
# Make predictions on the public test dataset
boost_data_pred = boost.predict(data)
print "Public Test Accuracy: ", float(np.sum(boost_data_pred == testlabels))/len(testlabels)
Public Test Accuracy:  0.499860685428
In [29]:
# Save the predictions
np.save("GraphLabOutput/boost_data_publicpred_49_v2%.npy", boost_data_pred)
In [30]:
# Make predictions on the private test dataset
boost_data_pred = boost.predict(pri_data)
print "Private Test Accuracy: ", float(np.sum(boost_data_pred == pri_testlabels))/len(pri_testlabels)
Private Test Accuracy:  0.492059069379
In [31]:
np.save("GraphLabOutput/boost_data_privatepred_49_v2%.npy", boost_data_pred)
In [32]:
# In what % of instances does the boosted ensemble agree with the CNN / KNN predictions
print float(np.sum(cnn_pri_testpred_perm == boost_data_pred))/len(boost_data_pred)
print float(np.sum(knn_pritestpred == boost_data_pred))/len(boost_data_pred)
0.80105879075
0.47339091669

Confusion Matrix

In [33]:
from sklearn.metrics import confusion_matrix
import pylab as pl
In [34]:
# Create a confusion matrix from the predictions
cm = confusion_matrix(pri_testlabels, boost_data_pred) 
In [35]:
print cm
[[178   1  38  30 135  28  81]
 [ 21  13   3   1  11   5   1]
 [ 75   1  87  18 129 108 110]
 [ 44   1  14 565 113  51  91]
 [ 74   2  26  30 281  30 151]
 [ 12   0  11  11  31 318  33]
 [ 46   2  22  39 157  36 324]]
In [36]:
cm_norm = cm.astype(float)/cm.sum(axis = 1, keepdims=True) # Row-normalized confusion matrix
In [37]:
# Plot the confusion matrix
labels = ['Angry', 'Disgust', 'Fear', 'Happy', 'Sad', 'Surprise', 'Neutral']

# With help from stackoverflow
fig = plt.figure(figsize=(12, 12))
ax = fig.add_subplot(111)
cax = ax.matshow(cm_norm)
pl.title('Confusion matrix')
fig.colorbar(cax)
ax.set_xticklabels([''] + labels)
ax.set_yticklabels([''] + labels)
pl.xlabel('Predicted')
pl.ylabel('True')
pl.show()

'Disgust' is one of the hardest facial expressions to predict. This is unsurprising: it has the fewest training examples. 'Fear' is also quite hard - the model frequently confuses 'fear' with 'sad'.

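As a quick check of this reading of the matrix, the diagonal of the row-normalized confusion matrix gives the per-class recall; a minimal sketch using the cm_norm and labels variables defined above:

In [ ]:
# Per-class recall: diagonal of the row-normalized confusion matrix
for name, recall in zip(labels, np.diag(cm_norm)):
    print name, round(recall, 3)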

Correct vs. incorrect predictions

In [38]:
from os import listdir
from os.path import isfile, join
In [39]:
# Verify ensemble model's accuracy
num_correct = np.count_nonzero(boost_data_pred == pri_testlabels)
num_not_correct = len(pri_testlabels) - num_correct
print num_correct, num_not_correct
print float(num_correct)/len(pri_testlabels), float(num_not_correct)/len(pri_testlabels)
1766 1823
0.492059069379 0.507940930621
In [40]:
# Get correct predictions
correct = np.where(boost_data_pred == pri_testlabels)
print correct[0].shape

# Get incorrect predictions
incorrect = np.where(boost_data_pred != pri_testlabels)
print incorrect[0].shape
(1766,)
(1823,)
In [65]:
# Randomly sample indices of a few correct and incorrect predictions
np.random.seed(40)
num_samples = 4
rand_correct = np.random.randint(0, num_correct, num_samples)
rand_incorrect = np.random.randint(0, num_not_correct, num_samples)  # sample from the incorrect predictions
In [66]:
# Print the numbers
print rand_correct
print rand_incorrect 
[1350  219    7  165]
[1016 1464 1330 1729]
In [67]:
# Get the prediction indices corresponding to the random numbers (and sort them)
sorted_idx_correct = sorted(correct[0][rand_correct])
sorted_idx_incorrect = sorted(incorrect[0][rand_incorrect])
In [68]:
# Note: (0=Angry, 1=Disgust, 2=Fear, 3=Happy, 4=Sad, 5=Surprise, 6=Neutral)
# Get the labels of these correct predictions
print boost_data_pred[sorted_idx_correct]
print pri_testlabels[sorted_idx_correct]

# Get the labels of these incorrect predictions
print boost_data_pred[sorted_idx_incorrect]
print pri_testlabels[sorted_idx_incorrect]
[0 0 2 5]
[0 0 2 5]
[4 6 2 4]
[3 4 5 6]
In [69]:
# Read the corresponding images from the private data
overall_ctr = 0
correct_imgs = []
incorrect_imgs = []

# Note: pri_test.txt must not be re-opened for writing here - that would
# truncate the label file read above. We only need to walk the image folders.
for i in range(7):
    path = 'PrivateTest/'+str(i)
    onlyfiles = [fl for fl in listdir(path) if isfile(join(path, fl))]
    print len(onlyfiles), onlyfiles[0]
    for curr_file in onlyfiles:
        # To avoid reading unwanted files
        if curr_file.find(".DS_Store") != -1:
            continue
        fname = path + '/' + curr_file + ' ' + str(i) + '\n'
        # overall_ctr is the 0-based position of this image, so it lines up
        # with the indices returned by np.where above
        if overall_ctr in sorted_idx_correct:
            correct_imgs.append(fname)
        if overall_ctr in sorted_idx_incorrect:
            incorrect_imgs.append(fname)
        overall_ctr += 1

print overall_ctr
491 PrivateTest_10097204.jpg
55 PrivateTest_10754785.jpg
528 PrivateTest_10022244.jpg
879 PrivateTest_10055093.jpg
594 PrivateTest_10147544.jpg
416 PrivateTest_10106550.jpg
626 PrivateTest_10297218.jpg
3589
In [70]:
# print correct_imgs
# print incorrect_imgs
In [71]:
# Correctly predicted
emotions = ['Angry', 'Disgust', 'Fear', 'Happy', 'Sad', 'Surprise', 'Neutral']
plt.figure(figsize=(15, 10))
for i in range(len(correct_imgs)):
    img_path, img_label = correct_imgs[i].split()
    print i, "Actual vs. Predicted:", emotions[int(img_label)], "vs.",\
    emotions[boost_data_pred[sorted_idx_correct[i]]]
    plt.subplot(1, num_samples, i+1)
    curr_img = plt.imread(img_path)
    plt.imshow(curr_img, cmap='gray')    
    plt.title(i)
    plt.axis('off')
plt.show()
0 Actual vs. Predicted: Angry vs. Angry
1 Actual vs. Predicted: Angry vs. Angry
2 Actual vs. Predicted: Fear vs. Fear
3 Actual vs. Predicted: Surprise vs. Surprise
In [72]:
# Incorrectly predicted
emotions = ['Angry', 'Disgust', 'Fear', 'Happy', 'Sad', 'Surprise', 'Neutral']
plt.figure(figsize=(15, 10))
for i in range(len(incorrect_imgs)):
    img_path, img_label = incorrect_imgs[i].split()
    print i, "Actual vs. Predicted:", emotions[int(img_label)], "vs.", \
    emotions[boost_data_pred[sorted_idx_incorrect[i]]]
    plt.subplot(1, num_samples, i+1)
    curr_img = plt.imread(img_path)
    plt.imshow(curr_img, cmap='gray')    
    plt.title(i)
    plt.axis('off')
plt.show()
0 Actual vs. Predicted: Happy vs. Sad
1 Actual vs. Predicted: Sad vs. Neutral
2 Actual vs. Predicted: Surprise vs. Fear
3 Actual vs. Predicted: Neutral vs. Sad