Machine Learning: Classification Problem D

Note: math formulas may render incorrectly in this renderer; this has not been fixed yet.

Dataset: [face.zip]

https://github.com/hhgw/hhgw.github.io/tree/main/zip

This week you will train a classifier to detect whether there is a face in a small image patch. This type of face detector is used in your phone and camera whenever you take a picture!

First we need to initialize Python. Run the cell below.

%matplotlib inline
import matplotlib_inline   # setup output image format
matplotlib_inline.backend_inline.set_matplotlib_formats('svg')
import matplotlib.pyplot as plt
import matplotlib
from numpy import *
from sklearn import *
import os
import zipfile
import fnmatch
random.seed(100)
from scipy import ndimage
from scipy import signal
from scipy import stats
import skimage.color
import skimage.exposure
import skimage.io
import skimage.util
# import xgboost as xgb

Loading Data and Pre-processing

Next we need to load the images. Download faces.zip and put it in the same directory as this ipynb file. Do not unzip the file. Then run the following cell to load the images.

imgdata = {'train':[], 'test':[]}
classes = {'train':[], 'test':[]}

# the dataset is too big, so subsample the training and test sets...
# reduce training set by a factor of 4
train_subsample = 4  
train_counter = [0, 0]
# maximum number of samples in each class for test set
test_maxsample = 472
test_counter = [0, 0]

# load the zip file
filename = 'faces.zip'
zfile = zipfile.ZipFile(filename, 'r')

for name in zfile.namelist():
    # check file name matches
    if fnmatch.fnmatch(name, "faces/*/*/*.png"):
        
        # filename is : faces/train/face/fname.png
        (fdir1, fname)  = os.path.split(name)     # get file name
        (fdir2, fclass) = os.path.split(fdir1) # get class (face, nonface)
        (fdir3, fset)   = os.path.split(fdir2) # get training/test set
        # class 1 = face; class 0 = non-face
        myclass = int(fclass == "face")  

        loadme = False
        if fset == 'train':
            if (train_counter[myclass] % train_subsample) == 0:
                loadme = True
            train_counter[myclass] += 1
        elif fset == 'test':
            if test_counter[myclass] < test_maxsample:
                loadme = True
            test_counter[myclass] += 1
            
        if (loadme):
            # open file in memory, and parse as an image
            myfile = zfile.open(name)
            #img = matplotlib.image.imread(myfile)
            img = skimage.io.imread(myfile, as_gray=True) # read as grayscale
            myfile.close()
            
            # append data
            imgdata[fset].append(img)
            classes[fset].append(myclass)

        
zfile.close()
imgsize = img.shape

print(len(imgdata['train']))
print(len(imgdata['test']))
trainclass2start = sum(classes['train'])

1745
944

Each image is a 19x19 array of pixel values. Run the code below to show an example:

print(img.shape)
plt.subplot(1,2,1)
plt.imshow(imgdata['train'][0], cmap='gray', interpolation='nearest')
plt.title("face sample")
plt.subplot(1,2,2)
plt.imshow(imgdata['train'][trainclass2start], cmap='gray', interpolation='nearest')
plt.title("non-face sample")
plt.show()

(19, 19)

[Figure: a face sample (left) and a non-face sample (right)]

Run the code below to show more images!

# function to make an image montage
def image_montage(X, imsize=None, maxw=10):
    """X can be a list of images, or a matrix of vectorized images.
      Specify imsize when X is a matrix."""
    tmp = []
    numimgs = len(X)
    
    # create a list of images (reshape if necessary)
    for i in range(0,numimgs):
        if imsize is not None:
            tmp.append(X[i].reshape(imsize))
        else:
            tmp.append(X[i])
    
    # add blanks
    if (numimgs > maxw) and (mod(numimgs, maxw) > 0):
        leftover = maxw - mod(numimgs, maxw)
        meanimg = 0.5*(X[0].max()+X[0].min())
        for i in range(0,leftover):
            tmp.append(ones(tmp[0].shape)*meanimg)
    
    # make the montage
    tmp2 = []
    for i in range(0,len(tmp),maxw):
        tmp2.append( hstack(tmp[i:i+maxw]) )
    montimg = vstack(tmp2) 
    return montimg

# show a few images
plt.figure(figsize=(9,9))
plt.imshow(image_montage(imgdata['train'][::20]), cmap='gray', interpolation='nearest')
plt.show()

[Figure: montage of every 20th training image]

Each image is a 2d array, but the classifier algorithms work on 1d vectors. Run the following code to convert all the images into 1d vectors by flattening. The result should be a matrix where each row is a flattened image.

trainX = empty((len(imgdata['train']), prod(imgsize)))
for i,img in enumerate(imgdata['train']):
    trainX[i,:] = ravel(img)
trainY = asarray(classes['train'])  # convert list to numpy array
print(trainX.shape)
print(trainY.shape)

testX = empty((len(imgdata['test']), prod(imgsize)))
for i,img in enumerate(imgdata['test']):
    testX[i,:] = ravel(img)
testY = asarray(classes['test'])  # convert list to numpy array
print(testX.shape)
print(testY.shape)

(1745, 361)
(1745,)
(944, 361)
(944,)
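The flattening loop above can also be written as a single reshape. The following is a minimal sketch on random stand-in images (not the face data); the variable names `images`, `X_loop` and `X_vec` are illustrative assumptions, not part of the notebook.

```python
import numpy as np

# Stand-in for imgdata['train']: a list of equally sized 2-D arrays.
rng = np.random.default_rng(0)
images = [rng.random((19, 19)) for _ in range(5)]

# Loop version, mirroring the cell above.
X_loop = np.empty((len(images), images[0].size))
for i, im in enumerate(images):
    X_loop[i, :] = np.ravel(im)

# Vectorized version: stack into (n, 19, 19), then flatten each image into a row.
X_vec = np.stack(images).reshape(len(images), -1)

print(X_vec.shape)
print(np.array_equal(X_loop, X_vec))
```

Both versions use row-major order, so each row holds the pixels of one image in the same order.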

Detection using pixel values

Train AdaBoost and gradient boosting classifiers to classify an image patch as face or non-face (XGBoost is used for gradient boosting below). Also train a kernel SVM classifier using either an RBF or a polynomial kernel, and a Random Forest classifier. Evaluate all your classifiers on the test set.

First we will normalize the features.

from sklearn import preprocessing
scaler = preprocessing.MinMaxScaler(feature_range=(-1,1))    # make scaling object
trainXn = scaler.fit_transform(trainX)   # use training data to fit scaling parameters
testXn  = scaler.transform(testX)        # apply scaling to test data
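To illustrate why the scaler is fitted on the training data only, here is a small sketch on a toy one-column feature (the arrays `train` and `test` are made up for illustration). The minimum and maximum are learned from the training data, and the same affine map is then applied to the test data, so test values outside the training range can fall outside [-1, 1].

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

train = np.array([[0.0], [0.5], [1.0]])   # toy training feature
test = np.array([[0.25], [1.5]])          # 1.5 lies outside the training range

scaler = MinMaxScaler(feature_range=(-1, 1))
train_n = scaler.fit_transform(train)     # min/max are learned from train only
test_n = scaler.transform(test)           # same affine map, no refitting

print(train_n.ravel())
print(test_n.ravel())
```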

# --------------------
# ------AdaBoost------
# --------------------
from sklearn import model_selection
from sklearn import ensemble
from sklearn import metrics

# setup the list of parameters to try
paramgrid = {
    "learning_rate": logspace(-5, 0, 8),
    "n_estimators": array([1, 2, 5, 10, 20, 50, 100, 200, 500, 1000, 2000]),
}

adacv = model_selection.GridSearchCV(
    ensemble.AdaBoostClassifier(random_state=1), paramgrid, cv=5, n_jobs=6, verbose=10
)  # n_jobs is selected according to the number of cpu cores; verbose is the level of detail of the output


adacv.fit(trainXn, trainY)
# print the best parameters found
print("best params:", adacv.best_params_)
print("best score:", adacv.best_score_)

# predict from the model
predY = adacv.predict(testXn)

# calculate accuracy
acc = metrics.accuracy_score(testY, predY)
print("test accuracy =", acc)

Fitting 5 folds for each of 88 candidates, totalling 440 fits
[CV 5/5; 1/88] START learning_rate=1e-05, n_estimators=1........................
[CV 2/5; 1/88] START learning_rate=1e-05, n_estimators=1........................
[CV 3/5; 1/88] START learning_rate=1e-05, n_estimators=1........................
[CV 4/5; 1/88] START learning_rate=1e-05, n_estimators=1........................
[CV 1/5; 1/88] START learning_rate=1e-05, n_estimators=1........................
[CV 1/5; 2/88] START learning_rate=1e-05, n_estimators=2........................
[CV 4/5; 1/88] END learning_rate=1e-05, n_estimators=1;, score=0.862 total time=   0.0s
[CV 2/5; 1/88] END learning_rate=1e-05, n_estimators=1;, score=0.774 total time=   0.0s
[CV 5/5; 1/88] END learning_rate=1e-05, n_estimators=1;, score=0.883 total time=   0.0s
[CV 3/5; 1/88] END learning_rate=1e-05, n_estimators=1;, score=0.874 total time=   0.0s
[CV 1/5; 1/88] END learning_rate=1e-05, n_estimators=1;, score=0.679 total time=   0.0s
[CV 2/5; 2/88] START learning_rate=1e-05, n_estimators=2........................
[CV 3/5; 2/88] START learning_rate=1e-05, n_estimators=2........................
[CV 4/5; 2/88] START learning_rate=1e-05, n_estimators=2........................
[CV 5/5; 2/88] START learning_rate=1e-05, n_estimators=2........................
[CV 1/5; 3/88] START learning_rate=1e-05, n_estimators=5........................
[CV 1/5; 2/88] END learning_rate=1e-05, n_estimators=2;, score=0.679 total time=   0.1s

[CV 4/5; 88/88] START learning_rate=1.0, n_estimators=2000......................
[CV 4/5; 87/88] END learning_rate=1.0, n_estimators=1000;, score=0.989 total time=  23.8s
[CV 5/5; 88/88] START learning_rate=1.0, n_estimators=2000......................
[CV 5/5; 87/88] END learning_rate=1.0, n_estimators=1000;, score=0.986 total time=  23.9s
[CV 1/5; 88/88] END learning_rate=1.0, n_estimators=2000;, score=0.877 total time=  47.0s
[CV 2/5; 88/88] END learning_rate=1.0, n_estimators=2000;, score=0.931 total time=  46.5s
[CV 3/5; 88/88] END learning_rate=1.0, n_estimators=2000;, score=0.991 total time=  46.3s
[CV 4/5; 88/88] END learning_rate=1.0, n_estimators=2000;, score=0.986 total time=  46.4s
[CV 5/5; 88/88] END learning_rate=1.0, n_estimators=2000;, score=0.983 total time=  46.6s
best params: {'learning_rate': 1.0, 'n_estimators': 1000}
best score: 0.954727793696275
test accuracy = 0.6260593220338984

# --------------------
# ------XGBoost-------
# --------------------

import xgboost as xgb

xclf = xgb.XGBClassifier(
    objective="binary:logistic",
    eval_metric="logloss",
    random_state=1,
    use_label_encoder=False,  # deprecated since xgboost 1.7 (see the warning below); can be omitted there
)


paramgrid = {
    "learning_rate": logspace(-4, 0, 8),
    "n_estimators": array([1, 2, 5, 10, 20, 50, 100, 200, 500, 1000, 2000, 5000]),
}

xgbcv = model_selection.GridSearchCV(xclf, paramgrid, cv=3, n_jobs=6, verbose=10)
xgbcv.fit(trainXn, trainY)

print("best params:", xgbcv.best_params_)
print("best score:", xgbcv.best_score_)

predY = xgbcv.predict(testXn)
acc = metrics.accuracy_score(testY, predY)
print("test accuracy =", acc)

Fitting 3 folds for each of 96 candidates, totalling 288 fits


/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/xgboost/sklearn.py:1395: UserWarning: `use_label_encoder` is deprecated in 1.7.0.
  warnings.warn("`use_label_encoder` is deprecated in 1.7.0.")


[CV 3/3; 2/96] START learning_rate=0.0001, n_estimators=2.......................
[CV 1/3; 2/96] START learning_rate=0.0001, n_estimators=2.......................
[CV 1/3; 1/96] START learning_rate=0.0001, n_estimators=1.......................
[CV 3/3; 1/96] START learning_rate=0.0001, n_estimators=1.......................
[CV 2/3; 1/96] START learning_rate=0.0001, n_estimators=1.......................
[CV 2/3; 2/96] START learning_rate=0.0001, n_estimators=2.......................
[CV 2/3; 1/96] END learning_rate=0.0001, n_estimators=1;, score=0.895 total time=   0.1s
[CV 1/3; 3/96] START learning_rate=0.0001, n_estimators=5.......................
[CV 3/3; 1/96] END learning_rate=0.0001, n_estimators=1;, score=0.936 total time=   0.1s
[CV 2/3; 3/96] START learning_rate=0.0001, n_estimators=5.......................
[CV 1/3; 1/96] END learning_rate=0.0001, n_estimators=1;, score=0.770 total time=   0.1s
[CV 3/3; 3/96] START learning_rate=0.0001, n_estimators=5.......................
[CV 1/3; 2/96] END learning_rate=0.0001, n_estimators=2;, score=0.789 total time=   0.1s
[CV 1/3; 4/96] START learning_rate=0.0001, n_estimators=10......................
[CV 2/3; 2/96] END learning_rate=0.0001, n_estimators=2;, score=0.900 total time=   0.1s
[CV 2/3; 4/96] START learning_rate=0.0001, n_estimators=10......................
[CV 3/3; 2/96] END learning_rate=0.0001, n_estimators=2;, score=0.935 total time=   0.1s
[CV 3/3; 4/96] START learning_rate=0.0001, n_estimators=10......................
[CV 1/3; 3/96] END learning_rate=0.0001, n_estimators=5;, score=0.790 total time=   0.2s

[CV 2/3; 96/96] START learning_rate=1.0, n_estimators=5000......................
[CV 2/3; 84/96] END learning_rate=0.2682695795279725, n_estimators=5000;, score=0.985 total time=  27.7s
[CV 3/3; 96/96] START learning_rate=1.0, n_estimators=5000......................
[CV 2/3; 95/96] END learning_rate=1.0, n_estimators=2000;, score=0.976 total time=  10.6s
[CV 3/3; 84/96] END learning_rate=0.2682695795279725, n_estimators=5000;, score=0.990 total time=  27.8s
[CV 3/3; 95/96] END learning_rate=1.0, n_estimators=2000;, score=0.974 total time=  10.6s
[CV 1/3; 96/96] END learning_rate=1.0, n_estimators=5000;, score=0.830 total time=  18.6s
[CV 2/3; 96/96] END learning_rate=1.0, n_estimators=5000;, score=0.976 total time=  16.0s
[CV 3/3; 96/96] END learning_rate=1.0, n_estimators=5000;, score=0.974 total time=  15.7s
best params: {'learning_rate': 0.2682695795279725, 'n_estimators': 100}
best score: 0.9392858621525869
test accuracy = 0.6610169491525424

# --------------------
# ------SVM-----------
# --------------------

from sklearn import svm  # explicit import instead of relying on `from sklearn import *`

paramgrid = {"C": logspace(-1, 3, 20), "gamma": logspace(-3, 3, 20)}

svmcv = model_selection.GridSearchCV(
    svm.SVC(kernel="rbf"), paramgrid, cv=5, n_jobs=6, verbose=10
)

svmcv.fit(trainXn, trainY)

print("best params:", svmcv.best_params_)
print("best score:", svmcv.best_score_)

predY = svmcv.predict(testXn)

acc = metrics.accuracy_score(testY, predY)
print("test accuracy =", acc)

Fitting 5 folds for each of 400 candidates, totalling 2000 fits
[CV 5/5; 1/400] START C=0.1, gamma=0.001........................................
[CV 2/5; 1/400] START C=0.1, gamma=0.001........................................
[CV 3/5; 1/400] START C=0.1, gamma=0.001........................................
[CV 4/5; 1/400] START C=0.1, gamma=0.001........................................
[CV 1/5; 1/400] START C=0.1, gamma=0.001........................................
[CV 1/5; 2/400] START C=0.1, gamma=0.00206913808111479..........................
[CV 1/5; 2/400] END C=0.1, gamma=0.00206913808111479;, score=0.825 total time=   0.8s
[CV 2/5; 2/400] START C=0.1, gamma=0.00206913808111479..........................
[CV 1/5; 1/400] END .........C=0.1, gamma=0.001;, score=0.831 total time=   0.9s
[CV 3/5; 2/400] START C=0.1, gamma=0.00206913808111479..........................
[CV 2/5; 1/400] END .........C=0.1, gamma=0.001;, score=0.857 total time=   1.0s
[CV 4/5; 2/400] START C=0.1, gamma=0.00206913808111479..........................
[CV 4/5; 1/400] END .........C=0.1, gamma=0.001;, score=0.986 total time=   1.0s
[CV 5/5; 2/400] START C=0.1, gamma=0.00206913808111479..........................
[CV 3/5; 1/400] END .........C=0.1, gamma=0.001;, score=0.986 total time=   1.0s
[CV 1/5; 3/400] START C=0.1, gamma=0.004281332398719396.........................
[CV 5/5; 1/400] END .........C=0.1, gamma=0.001;, score=0.991 total time=   1.1s
[CV 2/5; 3/400] START C=0.1, gamma=0.004281332398719396.........................
[CV 2/5; 2/400] END C=0.1, gamma=0.00206913808111479;, score=0.874 total time=   0.7s

[CV 2/5; 399/400] START C=1000.0, gamma=483.2930238571752.......................
[CV 2/5; 398/400] END C=1000.0, gamma=233.57214690901213;, score=0.650 total time=   2.0s
[CV 3/5; 399/400] START C=1000.0, gamma=483.2930238571752.......................
[CV 3/5; 398/400] END C=1000.0, gamma=233.57214690901213;, score=0.650 total time=   2.0s
[CV 4/5; 399/400] START C=1000.0, gamma=483.2930238571752.......................
[CV 4/5; 398/400] END C=1000.0, gamma=233.57214690901213;, score=0.653 total time=   2.0s
[CV 5/5; 399/400] START C=1000.0, gamma=483.2930238571752.......................
[CV 5/5; 398/400] END C=1000.0, gamma=233.57214690901213;, score=0.653 total time=   2.0s
[CV 1/5; 400/400] START C=1000.0, gamma=1000.0..................................
[CV 1/5; 399/400] END C=1000.0, gamma=483.2930238571752;, score=0.650 total time=   2.0s
[CV 2/5; 400/400] START C=1000.0, gamma=1000.0..................................
[CV 2/5; 399/400] END C=1000.0, gamma=483.2930238571752;, score=0.650 total time=   1.9s
[CV 3/5; 400/400] START C=1000.0, gamma=1000.0..................................
[CV 3/5; 399/400] END C=1000.0, gamma=483.2930238571752;, score=0.650 total time=   2.0s
[CV 4/5; 400/400] START C=1000.0, gamma=1000.0..................................
[CV 4/5; 399/400] END C=1000.0, gamma=483.2930238571752;, score=0.653 total time=   2.0s
[CV 5/5; 400/400] START C=1000.0, gamma=1000.0..................................
[CV 5/5; 399/400] END C=1000.0, gamma=483.2930238571752;, score=0.653 total time=   2.0s
[CV 1/5; 400/400] END ...C=1000.0, gamma=1000.0;, score=0.650 total time=   2.0s
[CV 2/5; 400/400] END ...C=1000.0, gamma=1000.0;, score=0.650 total time=   2.0s
[CV 3/5; 400/400] END ...C=1000.0, gamma=1000.0;, score=0.650 total time=   1.8s
[CV 4/5; 400/400] END ...C=1000.0, gamma=1000.0;, score=0.653 total time=   0.7s
[CV 5/5; 400/400] END ...C=1000.0, gamma=1000.0;, score=0.653 total time=   0.7s
best params: {'C': 1.1288378916846888, 'gamma': 0.018329807108324356}
best score: 0.9679083094555875
test accuracy = 0.6610169491525424

# --------------------
# ---Random Forest----
# --------------------

paramsampler = {
    "max_depth": stats.randint(1, 5),
    "min_samples_split": stats.uniform(0, 0.5),
    "min_samples_leaf": stats.uniform(0, 0.5),
}


rfrcv = model_selection.RandomizedSearchCV(
    ensemble.RandomForestClassifier(n_estimators=100, random_state=1, n_jobs=6),
    param_distributions=paramsampler,
    random_state=1,
    n_iter=1000,
    cv=5,
    verbose=10,
    n_jobs=6,
)


rfrcv.fit(trainXn, trainY)

print("best params:", rfrcv.best_params_)
print("best score:", rfrcv.best_score_)
predY = rfrcv.predict(testXn)

acc = metrics.accuracy_score(testY, predY)
print("test accuracy =", acc)

Fitting 5 folds for each of 1000 candidates, totalling 5000 fits
[CV 2/5; 1/1000] START max_depth=2, min_samples_leaf=0.4985924054694343, min_samples_split=0.4662786796693294
[CV 1/5; 1/1000] START max_depth=2, min_samples_leaf=0.4985924054694343, min_samples_split=0.4662786796693294
[CV 3/5; 1/1000] START max_depth=2, min_samples_leaf=0.4985924054694343, min_samples_split=0.4662786796693294
[CV 4/5; 1/1000] START max_depth=2, min_samples_leaf=0.4985924054694343, min_samples_split=0.4662786796693294
[CV 5/5; 1/1000] START max_depth=2, min_samples_leaf=0.4985924054694343, min_samples_split=0.4662786796693294
[CV 1/5; 2/1000] START max_depth=2, min_samples_leaf=0.15116628631591988, 
[CV 1/5; 1000/1000] END max_depth=1, min_samples_leaf=0.17171004655433614, min_samples_split=0.1753085946534768;, score=0.745 total time=   0.5s
[CV 2/5; 1000/1000] END max_depth=1, min_samples_leaf=0.17171004655433614, min_samples_split=0.1753085946534768;, score=0.791 total time=   0.4s
[CV 3/5; 1000/1000] END max_depth=1, min_samples_leaf=0.17171004655433614, min_samples_split=0.1753085946534768;, score=0.883 total time=   0.4s
[CV 4/5; 1000/1000] END max_depth=1, min_samples_leaf=0.17171004655433614, min_samples_split=0.1753085946534768;, score=0.894 total time=   0.4s
[CV 5/5; 999/1000] END max_depth=2, min_samples_leaf=0.0752203373318121, min_samples_split=0.1452287642960985;, score=0.980 total time=   0.5s
[CV 5/5; 1000/1000] END max_depth=1, min_samples_leaf=0.17171004655433614, min_samples_split=0.1753085946534768;, score=0.894 total time=   0.4s
best params: {'max_depth': 4, 'min_samples_leaf': 0.0020517537799524255, 'min_samples_split': 0.0172701749254065}
best score: 0.9300859598853869
test accuracy = 0.673728813559322

Which classifier was best?

According to the test accuracy and F1 score (below), the random forest classifier works best.
Some general advantages of random forest classifiers include:
- Averaging many decorrelated trees reduces variance, making the ensemble less prone to overfitting than a single decision tree.
- Ability to handle high-dimensional data with many features.
- Ability to handle both categorical and continuous data.
- Robustness to outliers and no need for feature scaling; some implementations also handle missing values natively.
In this case, however, the advantage is small: the random forest's test accuracy (0.674) is only slightly higher than that of the other classifiers (0.626 to 0.661). All four classifiers generalize poorly, with cross-validation accuracy of 0.93 to 0.97 but test accuracy of only 0.63 to 0.67.

Error analysis

Accuracy tells only part of the story. We can also look at the different types of errors that the classifier makes:

True Positive (TP): classifier correctly said face
True Negative (TN): classifier correctly said non-face
False Positive (FP): classifier said face, but not a face
False Negative (FN): classifier said non-face, but was a face

This is summarized in the following table:

| | Actual: Face | Actual: Non-face |
|---|---|---|
| Prediction: Face | True Positive (TP) | False Positive (FP) |
| Prediction: Non-face | False Negative (FN) | True Negative (TN) |

We can then look at the true positive rate and the false positive rate.

true positive rate (TPR): proportion of true faces that were correctly detected
false positive rate (FPR): proportion of non-faces that were mis-classified as faces.
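In terms of the four counts, TPR = TP / (TP + FN) and FPR = FP / (FP + TN). As a minimal sketch, the same quantities can be read off `sklearn.metrics.confusion_matrix`; the labels and predictions below are toy values, not the notebook's results.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

y_true = np.array([1, 1, 1, 1, 0, 0, 0, 0])  # toy labels: 1 = face, 0 = non-face
y_pred = np.array([1, 0, 0, 0, 0, 0, 0, 1])  # toy predictions

# With labels=[0, 1], ravel() returns the counts in the order TN, FP, FN, TP.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()

tpr = tp / (tp + fn)  # fraction of true faces that were detected
fpr = fp / (fp + tn)  # fraction of non-faces that were flagged as faces
print(tp, fp, tn, fn, tpr, fpr)
```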

Use the code below to calculate the TPR and FPR of your classifiers.


predY_list = []
predY_list.append(adacv.predict(testXn))
predY_list.append(xgbcv.predict(testXn))
predY_list.append(svmcv.predict(testXn))
predY_list.append(rfrcv.predict(testXn))
model_list=['AdaBoost','XGBoost','SVM','RandomForest']

# print the accuracy and F1 score for each model
for i in range(0,4):
    print('------------------------------------')
    print('         model : ',model_list[i])
    print('------------------------------------')
    print("Accuracy:", metrics.accuracy_score(testY, predY_list[i]))
    print("F1 score:", metrics.f1_score(testY, predY_list[i]))

    Pind = where(predY_list[i]==1) # indices for face predictions
    Nind = where(predY_list[i]==0) # indices for non-face predictions
    
    # calculate confusion matrix
    TP = count_nonzero(testY[Pind] == predY_list[i][Pind])
    FP = count_nonzero(testY[Pind] != predY_list[i][Pind])
    TN = count_nonzero(testY[Nind] == predY_list[i][Nind])
    FN = count_nonzero(testY[Nind] != predY_list[i][Nind])

    TPR = TP / (TP+FN)
    FPR = FP / (FP+TN)
    print("Confusion matrix:")
    print("TP=", TP, end=" | ")
    print("FP=", FP)
    print("TN=", TN, end=' | ')
    print("FN=", FN)
    print("TPR=", TPR)
    print("FPR=", FPR)

------------------------------------
         model :  AdaBoost
------------------------------------
Accuracy: 0.6260593220338984
F1 score: 0.4087102177554438
Confusion matrix:
TP= 122 | FP= 3
TN= 469 | FN= 350
TPR= 0.2584745762711864
FPR= 0.006355932203389831
------------------------------------
         model :  XGBoost
------------------------------------
Accuracy: 0.6610169491525424
F1 score: 0.4952681388012618
Confusion matrix:
TP= 157 | FP= 5
TN= 467 | FN= 315
TPR= 0.3326271186440678
FPR= 0.01059322033898305
------------------------------------
         model :  SVM
------------------------------------
Accuracy: 0.6610169491525424
F1 score: 0.48881789137380194
Confusion matrix:
TP= 153 | FP= 1
TN= 471 | FN= 319
TPR= 0.3241525423728814
FPR= 0.00211864406779661
------------------------------------
         model :  RandomForest
------------------------------------
Accuracy: 0.673728813559322
F1 score: 0.5333333333333334
Confusion matrix:
TP= 176 | FP= 12
TN= 460 | FN= 296
TPR= 0.3728813559322034
FPR= 0.025423728813559324

How does the classifier make errors?

The main source of error is face images classified as non-face (false negatives); non-face images are almost never classified as face. The TPR is only 0.26 to 0.37, while the FPR is at most 0.025. In short, the classifiers are poor at recognizing faces.
A low TPR combined with a low FPR means the model rarely predicts the positive class. There are several potential reasons for this:
- Imbalanced data: The training set may contain more negative cases than positive cases, which biases the model toward predicting negative. The SVM settings that collapse to a single prediction score about 0.65 in cross-validation, which suggests that roughly 65% of the training patches belong to one class, whereas the test set is balanced (472 patches per class).
- Inadequate features: The features used to train the model may not be informative enough to distinguish between positive and negative cases, leading to poor performance in identifying positive cases.
- Overfitting: The model may fit the training data well but perform poorly on new, unseen data. The gap between cross-validation accuracy (0.93 to 0.97) and test accuracy (0.63 to 0.67) suggests that the test faces differ from the training faces.
- Model complexity: The model may be too simple or too complex for the problem at hand, leading to poor performance in identifying positive cases.
To improve the performance of the model, I may need to adjust the dataset, the feature selection, or the model complexity.
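One quick way to check the imbalance hypothesis is to count the labels per class. The helper `class_balance` below is a hypothetical name introduced for this sketch, and the labels are toy values standing in for `trainY` and `testY`.

```python
import numpy as np

def class_balance(y):
    """Return the count and the fraction of each class label in y (labels 0 and 1)."""
    counts = np.bincount(np.asarray(y), minlength=2)
    return counts, counts / counts.sum()

# Toy labels standing in for trainY / testY.
y_toy = np.array([0, 0, 0, 1])
counts, fractions = class_balance(y_toy)
print(counts, fractions)
```

On the notebook's arrays this would be called as `class_balance(trainY)` and `class_balance(testY)`. If the training set turns out to be imbalanced, one option worth trying is the `class_weight="balanced"` setting that `SVC` and `RandomForestClassifier` accept; whether it helps here is untested.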

Classifier analysis

For the AdaBoost classifier, we can interpret what it is doing by looking at which features it uses most in the weak learners. Use the below code to visualize the pixel features used.

Note: if you used GridSearchCV to train the classifier, then you need to use the best_estimator_ field to access the classifier.
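As a small illustration of the note above, the sketch below fits a `GridSearchCV` on synthetic data (16 features standing in for a 4x4 "image"; not the face data) and reads the feature importances from `best_estimator_`, which is the model refitted on the full training set with the best parameters.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in data: 16 features, reshaped later into a 4x4 grid.
X, y = make_classification(n_samples=100, n_features=16, random_state=0)

search = GridSearchCV(
    AdaBoostClassifier(random_state=0), {"n_estimators": [5, 10]}, cv=3
)
search.fit(X, y)

best = search.best_estimator_                  # the refitted classifier
fi = best.feature_importances_.reshape(4, 4)   # one importance per "pixel"
print(search.best_params_)
print(fi.shape)
```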

# adaboost classifier
fi = adacv.best_estimator_.feature_importances_.reshape(imgsize)
plt.imshow(fi, interpolation='nearest')
plt.colorbar()


[Figure: AdaBoost feature importances shown as a 19x19 image]

Similarly, we can also look at the important features for xgboost.

# xgboost classifier
fi = xgbcv.best_estimator_.feature_importances_.reshape(imgsize)
plt.imshow(fi, interpolation='nearest')
plt.colorbar()


[Figure: XGBoost feature importances shown as a 19x19 image]

Similarly for Random Forests, we can look at the important features.

# random forest classifier
fi = rfrcv.best_estimator_.feature_importances_.reshape(imgsize)
plt.imshow(fi, interpolation='nearest')
plt.colorbar()


[Figure: Random Forest feature importances shown as a 19x19 image]

Comment on which features (pixels) AdaBoost and Random Forests are using

AdaBoost uses pixels around the corners of the image and part of the face contour for classification, while Random Forests uses the nose, eyes and cheeks.
The two algorithms rank features differently because they select them differently. With its default weak learner (a depth-1 decision tree), each AdaBoost round picks the single pixel that best separates the reweighted training samples, so its importance concentrates on a small set of pixels, here along the outline of the face. A random forest considers only a random subset of pixels at each split and averages over many trees grown on bootstrap samples, so its importance is spread over broader regions of the face.
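The difference in how widely the two ensembles spread their importance can be seen on synthetic data. This is a rough sketch with made-up data and arbitrary settings (20 boosting rounds, 100 trees), not a reproduction of the face experiment.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier

# Synthetic stand-in for the pixel features (not the face data).
X, y = make_classification(n_samples=300, n_features=100, n_informative=10,
                           random_state=0)

ada = AdaBoostClassifier(n_estimators=20, random_state=0).fit(X, y)
rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Each default AdaBoost weak learner is a depth-1 tree that splits on one feature,
# so at most n_estimators features can receive nonzero importance.
ada_used = np.count_nonzero(ada.feature_importances_)
rf_used = np.count_nonzero(rf.feature_importances_)
print("features used by AdaBoost:", ada_used)
print("features used by random forest:", rf_used)
```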

For kernel SVM, we can look at the support vectors to see what the classifier finds difficult.

# svm classifier
print("num support vectors:", len(svmcv.best_estimator_.support_vectors_))
si  = svmcv.best_estimator_.support_  # get indices of support vectors

# get all the patches for each support vector
simg = [ imgdata['train'][i] for i in si ]

# make montage
outimg = image_montage(simg, maxw=20)

plt.figure(figsize=(9,9))
plt.imshow(outimg, cmap='gray', interpolation='nearest')

num support vectors: 664






[Figure: montage of the support-vector image patches]

Comment on anything you notice about what the SVM finds difficult (i.e., on the decision boundary or within the margin)

The SVM keeps 664 of the 1745 training patches (about 38%) as support vectors, so a large share of the training set lies on or inside the margin. Some general reasons an SVM can find this problem difficult:
- High dimensionality: With many dimensions (here 361 pixel features), the data becomes sparse and the search space more complex, which makes it harder to find a hyperplane that separates the classes well.
- Overfitting: The SVM may fit the training data too closely, giving a decision boundary that performs well on the training set but does not generalize to the test set.
- Noise: If the data contains a significant amount of noise, the noisy samples cause misclassifications and blur the separation between the classes.

In addition, some specific kinds of images appear hard to classify. For example, the SVM may struggle with faces wearing glasses, because the glasses obscure important facial features. Faces with deep eye sockets may also be difficult, because they alter the appearance of the face and blur the separation between the classes.
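To separate support vectors that sit exactly on the margin from those inside it, one can inspect the dual coefficients. The sketch below uses synthetic data and arbitrary values for `C` and `gamma` (assumptions for illustration only). For a binary `SVC`, `dual_coef_[0]` holds y_i * alpha_i, and a support vector with alpha_i = C lies inside the margin or is misclassified.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Synthetic stand-in for the scaled pixel features (not the face data).
X, y = make_classification(n_samples=200, n_features=20, random_state=0)

C = 1.0
clf = SVC(kernel="rbf", C=C, gamma=0.05).fit(X, y)

# For a binary SVC, dual_coef_[0] holds y_i * alpha_i for each support vector.
alpha = np.abs(clf.dual_coef_[0])
at_bound = np.isclose(alpha, C)   # alpha_i = C: inside the margin or misclassified
free = ~at_bound                  # 0 < alpha_i < C: exactly on the margin

print("support vectors per class:", clf.n_support_)
print("on the margin:", free.sum(), "| inside margin or misclassified:", at_bound.sum())

# Indices into the training set, usable like `si` in the cell above.
hard_idx = clf.support_[at_bound]
```

Applied to `svmcv.best_estimator_`, `hard_idx` would select the patches to show in a montage of the hardest examples.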

Custom kernel SVM

Now we will try to use a custom kernel with the SVM. We will consider the following RBF-like kernel based on L1 distance (i.e., cityblock or Manhattan distance),

$$ k(\mathbf{x},\mathbf{y}) = \exp \left(-\alpha \sum_{i=1}^d |x_i-y_i|\right)$$

where $x_i,y_i$ are the elements of the vectors $\mathbf{x},\mathbf{y}$, and $\alpha$ is the hyperparameter. The difference with the RBF kernel is that the new kernel uses the absolute difference rather than the squared difference. Thus, the new kernel does not “drop off” as fast as the RBF kernel using squared distance.
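The difference in decay can be checked numerically. The sketch below compares the two kernels for pairs of 1-D points at increasing distance, with both hyperparameters set to 1 (an arbitrary choice for illustration). The function names `l1_kernel` and `rbf_kernel` are hypothetical helpers for this sketch.

```python
import numpy as np
from scipy.spatial.distance import cdist

def l1_kernel(X1, X2, alpha=1.0):
    """exp(-alpha * L1 distance) between all rows of X1 and X2."""
    return np.exp(-alpha * cdist(X1, X2, metric="cityblock"))

def rbf_kernel(X1, X2, gamma=1.0):
    """exp(-gamma * squared Euclidean distance) between all rows of X1 and X2."""
    return np.exp(-gamma * cdist(X1, X2, metric="sqeuclidean"))

x = np.zeros((1, 1))
for r in [0.5, 1.0, 2.0, 4.0]:   # distance between the two 1-D points
    y = np.array([[r]])
    print(r, l1_kernel(x, y)[0, 0], rbf_kernel(x, y)[0, 0])
```

With equal hyperparameters, the L1-based kernel is larger than the RBF kernel once the distance exceeds 1, and smaller below that, so "slower drop-off" refers to the tail behaviour at large distances.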

Implement the new kernel as a custom kernel function. The scipy.spatial.distance.cdist function will be helpful.
Train the SVM with the new kernel. To select the hyperparameter $\alpha$, you need to run cross-validation “manually” by: 1) trying different values of $\alpha$, and running cross-validation to select $C$; 2) selecting the $\alpha$ with the highest cross-validation score best_score_ in GridSearchCV.

from scipy import spatial
from sklearn import svm

def mykernel(X1, X2, alpha=1.0):
    # alpha is the hyperparameter
    D = spatial.distance.cdist(X1, X2, metric="cityblock")
    # return the kernel matrix
    return exp(-alpha * D)


paramgrid = {"C": logspace(-2, 3, 20)}

alphas = logspace(-3, 0, 10)

best_score = 0
best_alpha = 0
best_params = {}
best_custom_svmcv = None
# find the best alpha and C manually
for i in alphas:
    tmpkern = lambda X1, X2, alpha=i: mykernel(X1, X2, alpha=alpha)
    custom_svmcv = model_selection.GridSearchCV(
        svm.SVC(kernel=tmpkern), paramgrid, cv=3, n_jobs=6, verbose=0
    )
    custom_svmcv.fit(trainXn, trainY)

    print("---------------------------\nalpha :", i)
    print("best params:", custom_svmcv.best_params_)
    print("best score:", custom_svmcv.best_score_)

    if custom_svmcv.best_score_ > best_score:
        best_score = custom_svmcv.best_score_
        best_alpha = i
        best_params = custom_svmcv.best_params_
        best_custom_svmcv = custom_svmcv

print("===============================\nbest alpha:", best_alpha)
predY = best_custom_svmcv.predict(testXn)
# calculate accuracy and F1 score
acc = metrics.accuracy_score(testY, predY)
print("test accuracy =", acc)
print("F1 score:", metrics.f1_score(testY, predY))

# calculate confusion matrix
Pind = where(predY == 1)  
Nind = where(predY == 0)  
TP = count_nonzero(testY[Pind] == predY[Pind])
FP = count_nonzero(testY[Pind] != predY[Pind])
TN = count_nonzero(testY[Nind] == predY[Nind])
FN = count_nonzero(testY[Nind] != predY[Nind])
TPR = TP / (TP + FN)
FPR = FP / (FP + TN)
print("Confusion matrix:")
print("TP=", TP, end=" | ")
print("FP=", FP)
print("TN=", TN, end=" | ")
print("FN=", FN)
print("TPR=", TPR)
print("FPR=", FPR)

---------------------------
alpha : 0.001
best params: {'C': 26.366508987303583}
best score: 0.952457843154651
---------------------------
alpha : 0.0021544346900318843
best params: {'C': 7.847599703514607}
best score: 0.9558942692714895
---------------------------
alpha : 0.004641588833612777
best params: {'C': 2.3357214690901213}
best score: 0.9587589434813383
---------------------------
alpha : 0.01
best params: {'C': 2.3357214690901213}
best score: 0.9581871915743484
---------------------------
alpha : 0.021544346900318832
best params: {'C': 1.2742749857031335}
best score: 0.9352766983496085
---------------------------
alpha : 0.046415888336127774
best params: {'C': 1.2742749857031335}
best score: 0.6991549901126352
---------------------------
alpha : 0.1
best params: {'C': 0.01}
best score: 0.6515763594387368
---------------------------
alpha : 0.21544346900318823
best params: {'C': 0.01}
best score: 0.6515763594387368
---------------------------
alpha : 0.46415888336127775
best params: {'C': 0.01}
best score: 0.6515763594387368
---------------------------
alpha : 1.0
best params: {'C': 0.01}
best score: 0.6515763594387368
===============================
best alpha: 0.004641588833612777
test accuracy = 0.6641949152542372
F1 score: 0.49602543720190784
Confusion matrix:
TP= 156 | FP= 1
TN= 471 | FN= 316
TPR= 0.3305084745762712
FPR= 0.00211864406779661

Does using the new kernel improve the results?

Yes, the new kernel improved the results, but only slightly. The program also runs faster.
- When using a kernel SVM classifier for face detection, the choice of kernel can have a significant impact on performance. Depending on the characteristics of the data, a custom kernel based on cityblock distance can have advantages over an RBF-like kernel based on squared difference.
- Cityblock distance measures the distance between two points by summing the absolute differences of their coordinates. Because absolute differences grow linearly rather than quadratically, a few pixels with large deviations (for example from lighting or contrast changes) contribute less to the distance than they would under squared difference. An RBF-like kernel based on squared difference weights such deviations more heavily, which can hurt generalization.
- A cityblock-based kernel may therefore capture the structure of the face data better when pixel values vary strongly because of lighting, contrast, or similar factors, which can lead to improved classification performance.
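
To make the comparison concrete, here is a minimal sketch of a cityblock-based kernel of the form exp(-alpha * L1 distance). The exact kernel used earlier in this notebook may differ; the function name `cityblock_kernel`, the toy data, and the value `alpha=0.5` are assumptions for illustration only. The sketch also checks that this form matches scikit-learn's `laplacian_kernel`.

```python
import numpy as np
from scipy.spatial.distance import cdist
from sklearn.metrics.pairwise import laplacian_kernel
from sklearn.svm import SVC

def cityblock_kernel(X1, X2, alpha=0.01):
    # hypothetical helper: exp(-alpha * sum_i |x1_i - x2_i|)
    return np.exp(-alpha * cdist(X1, X2, metric="cityblock"))

# toy data (assumption, not the face dataset)
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(40, 5))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# the same kernel is available in scikit-learn as the Laplacian kernel
K_custom = cityblock_kernel(X, X, alpha=0.5)
K_sklearn = laplacian_kernel(X, X, gamma=0.5)
kernels_match = np.allclose(K_custom, K_sklearn)
print("kernels match:", kernels_match)

# SVC accepts a callable that returns the kernel matrix
clf = SVC(kernel=lambda A, B: cityblock_kernel(A, B, alpha=0.5), C=1.0)
clf.fit(X, y)
train_acc = clf.score(X, y)
print("training accuracy on toy data:", train_acc)
```

Passing a callable keeps the grid search over `C` unchanged, while `alpha` has to be looped over by hand, as in the cell above.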

Image Feature Extraction

The detection performance is not that good. The problem is that we are using the raw pixel values as features, so it is difficult for the classifier to interpret larger structures of the face that might be important. To fix the problem, we will extract features from the image using a set of filters.

Run the code below to look at the filter output. The filters are sets of black and white boxes that respond to similar structures in the image. After applying the filters to the image, the filter response map is downsampled by a factor of 4 (the code interpolates with ndimage.zoom). Hence each filter produces a 5x5 feature response for a 19x19 patch. Since there are 4 filters, the feature vector has 100 dimensions.
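
The feature length can be checked with a short sketch. It assumes a 19x19 patch (the size used for the sliding window later in this notebook) and uses a blank array in place of a real image.

```python
import numpy as np
from scipy import ndimage

# blank 19x19 patch standing in for one training image (assumption)
patch = np.zeros((19, 19))

# zoom by 0.25 rounds 19 * 0.25 = 4.75 up to 5 pixels per side
pooled = ndimage.zoom(patch, 0.25)

n_filters = 4
feature_len = pooled.size * n_filters
print(pooled.shape, feature_len)
```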

def extract_features(imgs, doplot=False):
    # the filter layout
    lay = [array([-1,1]), array([-1,1,-1]),  
               array([[1],[-1]]), array([[-1],[1],[-1]])]
    sc=8            # size of each filter patch
    poolmode = 'i'  # pooling mode (interpolate)
    cmode = 'same'  # convolution mode
    brick = ones((sc,sc))  # filter patch
    ks = []
    for l in lay:
        tmp = [brick*i for i in l]
        if (l.ndim==1):
            k = hstack(tmp)
        else:
            k = vstack(tmp)
        ks.append(k)

    # get the filter response size
    if (poolmode=='max') or (poolmode=='maxabs'):
        # max pooling; maxpool must be supplied separately (unused while poolmode == 'i')
        tmpimg = maxpool(maxpool(imgs[0]))
    else:
        tmpimg = ndimage.zoom(imgs[0], 0.25)        
    fs = prod(tmpimg.shape)
    
    # get the total feature length
    fst = fs*len(ks)

    # filter the images
    X  = empty((len(imgs), fst))
    for i,img in enumerate(imgs):
        x = empty(fst)

        # for each filter
        for j,th in enumerate(ks):
            # filter the image
            imgk = signal.convolve(img, ks[j], mode=cmode)
            
            # do pooling
            if poolmode == 'maxabs':
                mimg = maxpool(maxpool(abs(imgk)))
            elif poolmode == 'max':
                mimg = maxpool(maxpool(imgk))
            else:
                mimg = ndimage.zoom(imgk, 0.25)
    
            # put responses into feature vector
            x[(j*fs):(j+1)*fs] = ravel(mimg)
               
            if (doplot):             
                plt.subplot(3,len(ks),j+1)
                plt.imshow(ks[j], cmap='gray', interpolation='nearest')
                plt.title("filter " + str(j))
                plt.subplot(3,len(ks),len(ks)+j+1)
                plt.imshow(imgk, cmap='gray', interpolation='nearest')
                plt.title("filtered image")
                plt.subplot(3,len(ks),2*len(ks)+j+1)
                plt.imshow(mimg, cmap='gray', interpolation='nearest')
                plt.title("image features")
        X[i,:] = x
    
    return X
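
The max-pooling branches in extract_features call a `maxpool` function that is not shown in this section. Below is a hypothetical stand-in, assuming 2x2 max pooling with stride 2 and padding of odd-sized inputs; the original function may behave differently. With this choice, two rounds of pooling take a 19x19 response to 5x5, the same size as the interpolation mode.

```python
import numpy as np

def maxpool(img):
    # hypothetical 2x2 max pooling with stride 2 (not the original function)
    img = np.asarray(img, dtype=float)
    h, w = img.shape
    # pad odd dimensions with -inf so padding never wins the max
    padded = np.pad(img, ((0, h % 2), (0, w % 2)),
                    mode="constant", constant_values=-np.inf)
    H, W = padded.shape
    # group pixels into 2x2 blocks and take the max of each block
    return padded.reshape(H // 2, 2, W // 2, 2).max(axis=(1, 3))

small = maxpool(np.array([[1, 2], [3, 4]]))
twice = maxpool(maxpool(np.zeros((19, 19))))
print(small, twice.shape)
```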

# new features
img = imgdata['train'][0]
plt.imshow(img, cmap='gray', interpolation='nearest')
plt.title("image")
plt.figure(figsize=(9,9))
extract_features([img], doplot=True);


Now let's extract image features on the training and test sets. It may take a few seconds.

trainXf = extract_features(imgdata['train'])
print(trainXf.shape)
testXf = extract_features(imgdata['test'])
print(testXf.shape)

(1745, 100)
(944, 100)

Detection using Image Features

Now train AdaBoost and SVM classifiers on the image feature data. Evaluate on the test set.

# first scale the features
scalerf = preprocessing.MinMaxScaler(feature_range=(-1,1))    # make scaling object
trainXfn = scalerf.fit_transform(trainXf)   # use training data to fit scaling parameters
testXfn  = scalerf.transform(testXf)        # apply scaling to test data

# AdaBoost
paramgrid = {
    "learning_rate": logspace(-5, 0, 8),
    "n_estimators": array([1, 2, 5, 10, 20, 50, 100, 200, 500, 1000, 2000]),
}

adacv_f = model_selection.GridSearchCV(
    ensemble.AdaBoostClassifier(random_state=1), paramgrid, cv=5, n_jobs=6, verbose=10
)


adacv_f.fit(trainXfn, trainY)

print("best params:", adacv_f.best_params_)
print("best score:", adacv_f.best_score_)

# predict from the model
predY = adacv_f.predict(testXfn)

# calculate accuracy
acc = metrics.accuracy_score(testY, predY)
print("test accuracy =", acc)

Fitting 5 folds for each of 88 candidates, totalling 440 fits
[CV 2/5; 1/88] START learning_rate=1e-05, n_estimators=1........................
[CV 1/5; 1/88] START learning_rate=1e-05, n_estimators=1........................
[CV 3/5; 1/88] START learning_rate=1e-05, n_estimators=1........................
[CV 2/5; 1/88] END learning_rate=1e-05, n_estimators=1;, score=0.725 total time=   0.0s
[CV 1/5; 1/88] END learning_rate=1e-05, n_estimators=1;, score=0.587 total time=   0.0s
[CV 4/5; 1/88] START learning_rate=1e-05, n_estimators=1........................
[CV 3/5; 1/88] END learning_rate=1e-05, n_estimators=1;, score=0.679 total time=   0.0s
[CV 5/5; 1/88] START learning_rate=1e-05, n_estimators=1........................
[CV 1/5; 2/88] START learning_rate=1e-05, n_estimators=2........................
[CV 2/5; 2/88] START learning_rate=1e-05, n_estimators=2........................
[CV 3/5; 2/88] START learning_rate=1e-05, n_estimators=2........................
[CV 4/5; 1/88] END learning_rate=1e-05, n_estimators=1;, score=0.693 total time=   0.0s
[CV 5/5; 1/88] END learning_rate=1e-05, n_estimators=1;, score=0.762 total time=   0.0s
[CV 4/5; 2/88] START learning_rate=1e-05, n_estimators=2........................
[CV 5/5; 2/88] START learning_rate=1e-05, n_estimators=2........................
[CV 1/5; 3/88] START learning_rate=1e-05, n_estimators=5........................
[CV 1/5; 2/88] END learning_rate=1e-05, n_estimators=2;, score=0.587 total time=   0.0s
[CV 2/5; 2/88] END learning_rate=1e-05, n_estimators=2;, score=0.725 total time=   0.0s
[CV 2/5; 3/88] START learning_rate=1e-05, n_estimators=5........................
[CV 3/5; 3/88] START learning_rate=1e-05, n_estimators=5........................
[CV 3/5; 2/88] END learning_rate=1e-05, n_estimators=2;, score=0.679 total time=   0.0s
[CV 5/5; 3/88] START learning_rate=1e-05, n_estimators=5........................
[CV 4/5; 2/88] END learning_rate=1e-05, n_estimators=2;, score=0.693 total time=   0.0s
[CV 5/5; 2/88] END learning_rate=1e-05, n_estimators=2;, score=0.762 total time=   0.0s

[CV 2/5; 399/400] START C=1000.0, gamma=483.2930238571752.......................
[CV 4/5; 398/400] END C=1000.0, gamma=233.57214690901213;, score=0.653 total time=   0.3s
[CV 5/5; 398/400] START C=1000.0, gamma=233.57214690901213......................
[CV 4/5; 397/400] END C=1000.0, gamma=112.88378916846884;, score=0.653 total time=   0.3s
[CV 5/5; 399/400] START C=1000.0, gamma=483.2930238571752.......................
[CV 1/5; 398/400] END C=1000.0, gamma=233.57214690901213;, score=0.650 total time=   0.3s
[CV 2/5; 400/400] START C=1000.0, gamma=1000.0..................................
[CV 3/5; 399/400] END C=1000.0, gamma=483.2930238571752;, score=0.650 total time=   0.3s
[CV 4/5; 399/400] START C=1000.0, gamma=483.2930238571752.......................
[CV 3/5; 398/400] END C=1000.0, gamma=233.57214690901213;, score=0.650 total time=   0.3s
[CV 4/5; 400/400] START C=1000.0, gamma=1000.0..................................
[CV 2/5; 399/400] END C=1000.0, gamma=483.2930238571752;, score=0.650 total time=   0.3s
[CV 5/5; 400/400] START C=1000.0, gamma=1000.0..................................
[CV 5/5; 398/400] END C=1000.0, gamma=233.57214690901213;, score=0.653 total time=   0.3s
[CV 5/5; 399/400] END C=1000.0, gamma=483.2930238571752;, score=0.653 total time=   0.3s
[CV 1/5; 400/400] START C=1000.0, gamma=1000.0..................................
[CV 2/5; 400/400] END ...C=1000.0, gamma=1000.0;, score=0.650 total time=   0.3s
[CV 3/5; 400/400] START C=1000.0, gamma=1000.0..................................
[CV 4/5; 399/400] END C=1000.0, gamma=483.2930238571752;, score=0.653 total time=   0.3s
[CV 4/5; 400/400] END ...C=1000.0, gamma=1000.0;, score=0.653 total time=   0.2s
[CV 5/5; 400/400] END ...C=1000.0, gamma=1000.0;, score=0.653 total time=   0.2s
[CV 1/5; 400/400] END ...C=1000.0, gamma=1000.0;, score=0.650 total time=   0.2s
[CV 3/5; 400/400] END ...C=1000.0, gamma=1000.0;, score=0.650 total time=   0.2s
best params: {'C': 1000.0, 'gamma': 0.008858667904100823}
best score: 0.9570200573065903
test accuracy = 0.7372881355932204
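
The second log above comes from a grid search over C and gamma with 400 candidates, but the cell that produced it is not shown. The following is only a sketch of what such a search could look like. The grid ranges are an assumption, chosen because 20 log-spaced values per parameter give 400 candidates and include the logged values; the helper name `fit_rbf_svm` and the toy data are hypothetical.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

def fit_rbf_svm(X, y, n_grid=20, cv=5):
    # hypothetical helper; grid ranges are assumptions, not the original cell
    paramgrid = {
        "C": np.logspace(-2, 3, n_grid),
        "gamma": np.logspace(-3, 3, n_grid),
    }
    search = GridSearchCV(SVC(kernel="rbf"), paramgrid, cv=cv)
    search.fit(X, y)
    return search

# toy data and a small grid so the sketch runs quickly
rng = np.random.default_rng(0)
X_demo = rng.uniform(-1, 1, size=(80, 6))
y_demo = (X_demo[:, 0] * X_demo[:, 1] > 0).astype(int)
svm_demo = fit_rbf_svm(X_demo, y_demo, n_grid=4)
print("best params:", svm_demo.best_params_)
print("best score:", svm_demo.best_score_)
```

On the real data, the call would be along the lines of `svmcv_f = fit_rbf_svm(trainXfn, trainY)`, followed by `svmcv_f.predict(testXfn)`.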

Error Analysis

Repeat the error analysis for the new classifiers.

predY_list = []
predY_list.append(adacv_f.predict(testXfn))
predY_list.append(svmcv_f.predict(testXfn))
model_list = ["AdaBoost", "SVM"]

for i in range(0, len(predY_list)):
    print("------------------------------------")
    print("         model : ", model_list[i])
    print("------------------------------------")
    print("Accuracy:", metrics.accuracy_score(testY, predY_list[i]))
    print("F1 score:", metrics.f1_score(testY, predY_list[i]))

    # confusion matrix
    Pind = where(predY_list[i] == 1)
    Nind = where(predY_list[i] == 0)
    TP = count_nonzero(testY[Pind] == predY_list[i][Pind])
    FP = count_nonzero(testY[Pind] != predY_list[i][Pind])
    TN = count_nonzero(testY[Nind] == predY_list[i][Nind])
    FN = count_nonzero(testY[Nind] != predY_list[i][Nind])
    TPR = TP / (TP + FN)
    FPR = FP / (FP + TN)
    print("Confusion matrix:")
    print("TP=", TP, end=" | ")
    print("FP=", FP)
    print("TN=", TN, end=" | ")
    print("FN=", FN)
    print("TPR=", TPR)
    print("FPR=", FPR)

------------------------------------
         model :  AdaBoost
------------------------------------
Accuracy: 0.715042372881356
F1 score: 0.6184397163120567
Confusion matrix:
TP= 218 | FP= 15
TN= 457 | FN= 254
TPR= 0.461864406779661
FPR= 0.03177966101694915
------------------------------------
         model :  SVM
------------------------------------
Accuracy: 0.7372881355932204
F1 score: 0.6545961002785515
Confusion matrix:
TP= 235 | FP= 11
TN= 461 | FN= 237
TPR= 0.4978813559322034
FPR= 0.023305084745762712
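
As a cross-check, the same quantities can be obtained from scikit-learn's `confusion_matrix`. The sketch below rebuilds label vectors from the SVM counts printed above (the arrays are synthetic, only the counts come from the output) and recomputes accuracy, F1, TPR and FPR.

```python
import numpy as np
from sklearn import metrics

# counts taken from the SVM output above
TP, FP, TN, FN = 235, 11, 461, 237

# synthetic labels and predictions that reproduce those counts
y_true = np.concatenate([np.ones(TP), np.zeros(FP), np.zeros(TN), np.ones(FN)]).astype(int)
y_pred = np.concatenate([np.ones(TP), np.ones(FP), np.zeros(TN), np.zeros(FN)]).astype(int)

# for binary labels {0, 1}, ravel() yields tn, fp, fn, tp in that order
tn, fp, fn, tp = metrics.confusion_matrix(y_true, y_pred).ravel()
tpr = tp / (tp + fn)
fpr = fp / (fp + tn)
acc = metrics.accuracy_score(y_true, y_pred)
f1 = metrics.f1_score(y_true, y_pred)
print(tn, fp, fn, tp, tpr, fpr, acc, f1)
```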

How has the classifier using image features improved?

The classifier increases TP and reduces FN considerably: the SVM on image features gets TP = 235 and FN = 237, compared with TP = 156 and FN = 316 for the custom-kernel SVM on raw pixels. The classifier recognizes faces more easily, which raises both the accuracy and the F1 score.
Training machine learning models on image feature data rather than the original image data can improve the performance of the models for a few reasons:
- Dimensionality reduction: Image feature data typically has fewer dimensions than the original image data. Here a 19x19 patch has 361 pixel values but only 100 features. This reduces the number of inputs the models have to learn from, making them more efficient and less prone to overfitting.
- Noise reduction: Image feature data is often pre-processed to remove noise and enhance relevant features. This can make the models more robust to noisy input data.
- Increased generalization: Image feature data can capture higher-level information about the images, such as edges, textures, and shapes, which can be more useful for classification than pixel-level information. This can improve the generalization of the models to new, unseen data.

Test image

Now let's try your face detector on a real image. Download the “nasa-small.png” image and put it in the same directory as your ipynb file. The code below will load the image, crop out image patches, and then extract features. (This may take a few minutes.)

fname = "nasa-small.png"

# load image
testimg = skimage.io.imread(fname, as_gray=True)
print(testimg.shape)
plt.imshow(testimg, cmap='gray')

(210, 480)






# step size for the sliding window
step = 4

# extract window patches with step size of 4
patches = skimage.util.view_as_windows(testimg, (19,19), step=step)
psize = patches.shape
# collapse the first 2 dimensions
patches2 = patches.reshape((psize[0]*psize[1], psize[2], psize[3]))
print(patches2.shape )

# histogram equalize patches (improves contrast)
patches3 = empty(patches2.shape)
for i in range(patches2.shape[0]):
    patches3[i,:,:] = skimage.exposure.equalize_hist(patches2[i,:,:])

# extract features
newXf = extract_features(patches3)

(5568, 19, 19)
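
The patch count of 5568 follows from the image size, window size and step. The sketch below reproduces it on a blank 210x480 array, using NumPy's `sliding_window_view` as a stand-in for `skimage.util.view_as_windows` (an assumption made to keep the example dependency-free).

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

# blank array with the same shape as the test image
img = np.zeros((210, 480))
win, step = 19, 4

# all 19x19 windows, then keep every 4th one along each axis
windows = sliding_window_view(img, (win, win))[::step, ::step]

# closed-form count: floor((size - win) / step) + 1 per axis
n_rows = (img.shape[0] - win) // step + 1
n_cols = (img.shape[1] - win) // step + 1
print(windows.shape, n_rows, n_cols, n_rows * n_cols)
```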

Now predict using your classifier. The extracted features are in newXf, and scaled features are newXfn.

newXfn = scalerf.transform(newXf) # apply scaling to test data


# use the SVM model to predict
prednewY = svmcv_f.predict(newXfn)

Now we will view the results on the image. Use the code below. prednewY is the vector of predictions.

# reshape prediction to an image
imgY = prednewY.reshape(psize[0], psize[1])

# zoom back to image size
imgY2 = ndimage.zoom(imgY, step, output=None, order=0)
# pad the top and left with half the window size
imgY2 = vstack((zeros((9, imgY2.shape[1])), imgY2))
imgY2 = hstack((zeros((imgY2.shape[0],9)), imgY2))
# pad right and bottom to same size as image
if (imgY2.shape[0] != testimg.shape[0]):
    imgY2 = vstack((imgY2, zeros((testimg.shape[0]-imgY2.shape[0], imgY2.shape[1]))))
if (imgY2.shape[1] != testimg.shape[1]):
    imgY2 = hstack((imgY2, zeros((imgY2.shape[0],testimg.shape[1]-imgY2.shape[1]))))
    
# show detections with image
#detimg = dstack(((0.5*imgY2+0.5)*testimg, 0.5*testimg, 0.5*testimg))
nimgY2 = 1-imgY2
tmp = nimgY2*testimg
detimg = dstack((imgY2+tmp, tmp, tmp))

# show it!
plt.figure(figsize=(9,9))
plt.subplot(2,1,1)
plt.imshow(imgY2, interpolation='nearest')
plt.title('detection map')
plt.subplot(2,1,2)
plt.imshow(detimg)
plt.title('image')
plt.axis('image')
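
Besides the detection map, the positive windows can be listed as bounding boxes. This is a minimal sketch, assuming window (r, c) of the prediction grid starts at pixel (r*step, c*step) and spans 19x19 pixels; the helper `windows_to_boxes` and the toy grid are hypothetical.

```python
import numpy as np

def windows_to_boxes(pred_grid, step=4, win=19):
    # hypothetical helper: (top, left, bottom, right) for each positive window
    rows, cols = np.nonzero(pred_grid)
    return [(int(r) * step, int(c) * step, int(r) * step + win, int(c) * step + win)
            for r, c in zip(rows, cols)]

# toy 48x116 prediction grid with two detections (assumption)
grid = np.zeros((48, 116), dtype=int)
grid[0, 0] = 1
grid[10, 20] = 1
boxes = windows_to_boxes(grid)
print(boxes)
```

With the notebook's variables, the grid would be `prednewY.reshape(psize[0], psize[1])`.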


How did your face detector do?

Of the 23 faces, 15 were successfully detected. Badges on clothing and regions with complex background textures were often mistaken for faces, while faces with less prominent eyebrows were missed.

You can try it on your own images. The faces should all be around 19x19 pixels though. We only used 1/4 of the training data. Try using more data to train it!