Extra-credit Lab - Digit Recognition

Summary

In this lab, you will build on the ideas presented in the prelab by designing and implementing a multiclass linear classifier, based on logistic regression, for handwritten digit recognition.

Downloads

Python

Part 1 - Visualize data

In this lab, we will use the MNIST dataset of handwritten digits to train our classifier. The dataset contains 40,000 training images and 10,000 test images with the corresponding labels. Each digit is a 28x28-pixel grayscale image, stored as a flattened 1D vector with pixel intensities in the range [0,255].

Assignment 1

Using the following Python script as a starting point, display the 1st digit image and its label.

import numpy as np
import os
import matplotlib.pyplot as plt

image_size = 28

# Load data
train = np.load(os.getcwd()+'/mnist_train.npy', allow_pickle=True)
test = np.load(os.getcwd()+'/mnist_test.npy', allow_pickle=True)

train_data = train[()]['data']
train_labels = train[()]['labels']

test_data = test[()]['data']
test_labels = test[()]['labels']

# Print dataset dimensions
print(train_data.shape,  test_data.shape)
print(train_labels.shape,  test_labels.shape)


# Show the 1st digit image from dataset and print its label
# ======== YOUR CODE HERE ========


# ================================
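
For reference, one possible way to fill in the block above (a minimal sketch, assuming each row of train_data is a flattened 28x28 image and train_labels is one-hot encoded, as suggested by Part 3; adjust the np.argmax call if your labels are stored as plain integers):

# One possible solution sketch (assumes one-hot labels)
first_image = train_data[0].reshape(image_size, image_size)
first_label = np.argmax(train_labels[0])   # column index of the 1 in the one-hot row

plt.imshow(first_image, cmap='gray')
plt.title('Label: ' + str(first_label))
plt.show()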

Part 2 - Binary Classification

The classification problem is just like the regression problem, except that the response we want to predict is a discrete value. In binary classification, the possible value of the response is either 1 (true) or 0 (false). In this exercise, we will build a binary classifier that outputs 1 (true) if the digit is zero, and 0 (false) otherwise.

We could use linear regression and simply ignore the fact that the response is discrete. However, it does not make sense to produce outputs smaller than 0 or larger than 1 when the possible outcome is either 1 or 0. Moreover, any outlier in the training dataset has a direct impact on the parameters. To fix these issues, we change our hypothesis so that it satisfies $0 \le h_\theta(x) \le 1$ and is less affected by outliers. Instead of the linear hypothesis $h_\theta(x) = \theta^T x$, we plug the linear term into the logistic function:

$$h_\theta(x) = g(\theta^T x) = \frac{1}{1 + e^{-\theta^T x}}$$

The "sigmoid function" or "logistic function" is defined as

$$g(z) = \frac{1}{1 + e^{-z}}$$

It maps any real-valued input to the interval (0, 1), approaching 0 as $z \to -\infty$ and 1 as $z \to +\infty$.

Intuitively, $h_\theta(x)$ gives us the probability that the output is 1. In our case, for example, if $h_\theta(x) = 0.8$, it means there is a probability of 80% that the digit is zero.

Since we changed our hypothesis $h_\theta(x)$, the cost function must be updated as well. We will use the following cost function for logistic regression:

$$J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log h_\theta(x^{(i)}) + \left(1 - y^{(i)}\right) \log\left(1 - h_\theta(x^{(i)})\right) \right]$$

This cost function works as follows. If the correct response is 1, the cost approaches 0 as our prediction approaches 1; if the prediction instead approaches 0 (the wrong direction), the cost grows toward infinity. Likewise, if the correct response is 0, the cost approaches 0 as our prediction approaches 0, and grows toward infinity as the prediction approaches 1.
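
For a quick numeric check of this behavior (the values are illustrative):

# correct response y = 1: the cost is -log(h)
print(-np.log(0.99))       # ~0.01  (prediction close to 1 -> almost no cost)
print(-np.log(0.01))       # ~4.61  (prediction close to 0 -> large cost)

# correct response y = 0: the cost is -log(1 - h)
print(-np.log(1 - 0.01))   # ~0.01
print(-np.log(1 - 0.99))   # ~4.61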

Finally, we can update the parameters by gradient descent as follows:

$$\theta_j := \theta_j - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}$$

Note that the gradient looks identical to the one for linear regression, except that $h_\theta(x)$ is now defined by the sigmoid function.

Assignment 2

Using the code below, implement:

  • the sigmoid function,
  • the cost function for logistic regression,
  • the gradient (the partial derivatives with respect to the parameters).

def sigmoid(z):
# ======== YOUR CODE HERE ========


# ================================

def grad_log(theta, X, y):
    m = X.shape[0]
# ======== YOUR CODE HERE ========


# ================================

def cost_log(theta, X, y):
    m = X.shape[0]
# ======== YOUR CODE HERE ========


# ================================
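
For reference, one possible way to complete the three functions above (a minimal vectorized sketch; the clip only guards against log(0) and is not required, and np.exp may emit harmless overflow warnings for very large |z|):

def sigmoid(z):
    # logistic function g(z) = 1 / (1 + e^(-z))
    return 1.0 / (1.0 + np.exp(-z))

def grad_log(theta, X, y):
    m = X.shape[0]
    # gradient of J(theta): (1/m) * X^T (h - y)
    h = sigmoid(X @ theta)
    return (1.0 / m) * (X.T @ (h - y))

def cost_log(theta, X, y):
    m = X.shape[0]
    # J(theta) = -(1/m) * sum( y*log(h) + (1-y)*log(1-h) )
    h = sigmoid(X @ theta)
    h = np.clip(h, 1e-12, 1 - 1e-12)   # avoid log(0)
    return -(1.0 / m) * np.sum(y * np.log(h) + (1 - y) * np.log(1 - h))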

Part 3 - Training for multiclass

We are now ready to train our classifier on the dataset. In the code below, we set the learning rate (alpha) and initialize the parameters to 0's. As we did in the Prelab, we add another dimension to our features to accommodate the intercept term (1 + 28*28 = 785 features). The response y for the digit-0 classifier will be the first column of the labels, i.e. train_labels[:,0].

Once the digit-0 classifier is trained, we can train the remaining classifiers in the same way. For example, the digit-1 classifier outputs the probability of digit 1 and can be trained with the second column of the labels, i.e. train_labels[:,1]. In the end, you will have 10 classifiers, each giving the probability of one digit. Therefore, the parameters form a 10x785 matrix (one classifier per row).

Assignment 3

  • Using the code below, train the parameters using the cost function and the gradient function implemented in Assignment 2. You may change the learning rate, the number of iterations, and the initial parameter values.
  • Plot the cost of each classifier versus the number of iterations.
alpha = 10**(-9)
num_iter = 3000
num_classes = 10

# Add a column of ones to train_x
ones = np.ones((train_data.shape[0],1))
train_X = np.hstack((ones, train_data))

# 10-by-785 matrix for the total classifier
theta = np.zeros((num_classes, train_X.shape[1]))
# ======== YOUR CODE HERE ========


# ================================
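
For reference, one possible way to complete the training loop above (a minimal sketch, assuming grad_log returns a length-785 gradient vector and cost_log returns a scalar, as in Assignment 2; the cost history is stored so it can be plotted afterwards):

cost_history = np.zeros((num_classes, num_iter))

for c in range(num_classes):
    y = train_labels[:, c]                     # one-vs-all response for digit c
    for it in range(num_iter):
        theta[c] = theta[c] - alpha * grad_log(theta[c], train_X, y)
        cost_history[c, it] = cost_log(theta[c], train_X, y)

# Plot the cost of each classifier versus iterations
for c in range(num_classes):
    plt.plot(cost_history[c], label='digit ' + str(c))
plt.xlabel('iteration')
plt.ylabel('cost')
plt.legend()
plt.show()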

Part 4 - Testing for multiclass

We determine our prediction by choosing the classifier that gives the maximum probability. When we have more than 2 classes, the overall accuracy alone can be misleading. To evaluate how accurate the classifiers are, we use a 'confusion matrix', a technique for summarizing the performance of a classification algorithm. For example, we may get an overall accuracy of 90%, but that number alone does not tell us whether all classes are being predicted equally well or whether one or two classes are being neglected. The confusion matrix counts the correct and incorrect predictions and breaks the numbers down by class.
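
As a toy illustration of why this matters, consider a hypothetical 3-class example (the numbers are made up): the overall accuracy looks fine, but the confusion matrix immediately reveals that class 2 is rarely predicted correctly.

# Hypothetical 3-class example: rows = true class, columns = predicted class
toy_true = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2])
toy_pred = np.array([0, 0, 0, 1, 1, 1, 0, 1, 2])

toy_conf = np.zeros((3, 3))
for t, p in zip(toy_true, toy_pred):
    toy_conf[t, p] += 1

print(toy_conf)
# [[3. 0. 0.]
#  [0. 3. 0.]
#  [1. 1. 1.]]
# Overall accuracy is 7/9 (~78%), but the last row shows class 2 is usually misclassified.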

Assignment 4

  • Using the code below, print out the confusion matrices for the training and test datasets.
def test_classifier(theta, X, y):
    def squared_error(a,b):
        return (a-b)**2
    def conf_matrix(y, y_hat, num_classes):
        m = np.zeros((num_classes, num_classes))
        for i in range(len(y)):
            m[int(y[i]), int(y_hat[i])] += 1.0

        for i in range(m.shape[0]):
            m[i,:] = 1./np.sum(m[i,:]) * m[i,:]
        return m
    def accuracy(y, y_hat):
        acc = 0.0
        for i in range(y.shape[0]):
            if y[i] == y_hat[i]:
                acc += 1.0
        return 1./float(y.shape[0]) * acc

    # Get probability estimates from the trained classifiers (10 x num_samples)
    y_hat = sigmoid(theta @ X.T)

    # argmax over the classes recovers the true labels from the one-hot label matrix
    y_labels = np.argmax(y.T, axis=0)
    # argmin of the squared error to 1 (equivalent to argmax of probability) gives the predicted label
    y_hat_labels = np.argmin(squared_error(y_hat, np.ones(y_hat.shape)), axis=0)
    return accuracy(y_labels, y_hat_labels), conf_matrix(y_labels, y_hat_labels, 10)


ones = np.ones((train_data.shape[0],1))
X = np.hstack((ones, train_data))

# Use your trained theta and evaluate with training data
train_acc, train_conf = test_classifier(theta, X, train_labels)

ones = np.ones((test_data.shape[0],1))
X = np.hstack((ones, test_data))

# Use your trained theta and evaluate with test data
test_acc, test_conf = test_classifier(theta, X, test_labels)

np.set_printoptions(precision=3)
np.set_printoptions(suppress=True)

print("Training Set Accuracy: " + str(train_acc))
print("Training Set Confusion Matrix")
print(train_conf)

print("Testing Set Accuracy: " + str(test_acc))
print("Testing Set Confusion Matrix")
print(test_conf)

Android

For this lab, we will implement the digit recognition system on the Android tablet. Note that we do not retrain the classifiers on the tablet. Instead, we import the trained classifiers from Python. Just like Lab6 and Lab7, we implement the system in Java.

Part 5 - Program Workflow

We use a workflow similar to Lab7, except that we load the trained classifiers in the first stage and run classification instead of the KCF tracker.

  • Load the trained classifiers from the csv file. class_flag is -2, meaning that the app is not classifying.
  • Open the camera and start previewing. The user uses the scroll bar on the right to modify the window color and size. The window represents the ROI (region of interest). class_flag is -1, meaning that the app is not classifying.
  • When the user presses the Button, class_flag is set to 0, meaning the app should start preprocessing and classification. The scroll bar is hidden since modifying the ROI size is no longer allowed.
  • When class_flag is 0, preprocess the image in the ROI and apply the classifiers to detect the digit. class_flag is then set to 1, meaning the app has completed digit recognition.
  • When class_flag is 1, report the detected digit.
  • When the user presses the Button again, class_flag is set back to -1. The app is ready to detect a new digit.

Part 6 - Import classifiers

First, export your trained classifiers (the 10-by-785 matrix) in csv format from your Python code, e.g. np.savetxt('theta.csv', theta, delimiter=','). Then, copy the csv file into the app\res\raw folder. In the Java code, we have already provided the CSVFile class. To read the csv file, use the following code when class_flag is -2.

// Declare 10-by-785 matrix to store the classifiers
theta = new Mat(10, (28 * 28 + 1), CvType.CV_32FC1);

// Input stream
InputStream is = getResources().openRawResource(R.raw.theta_log);

CSVFile csv_file = new CSVFile(is);
List list = csv_file.read();
String[] l;

for (int i = 0; i < list.size(); i++) {
	l = (String[]) list.get(i);

	for (int j = 0; j < l.length; j++) {
		// Parsing string to double
		theta.put(i,j,Double.parseDouble(l[j]));
	}
}

Part 7 - Pre-processing image

Before using the raw image from the Android camera, we need to preprocess it for our classifier. Similar to Lab7, we use the ROI to capture the target digit. Once we capture the image, it should be resized to 28-by-28 pixels to match the training dataset. Use the following code to crop the image within the ROI (mCrop) and resize it to a 28x28 patch (mResized).

// Crop ROI from image
Mat mCrop = mGray.submat((int) (myROI.y),(int) (myROI.y+myROIHeight),(int) (myROI.x), (int) (myROI.x+myROIWidth));

// Resize the image to the 28x28 size of our training data
Mat mResized = new Mat(28, 28, CvType.CV_8UC1);
Imgproc.resize(mCrop,mResized,mResized.size(), 0,0, Imgproc.INTER_AREA);

// Vectorize the image
mResized = mResized.reshape(0,28*28);
mResized.convertTo(mResized,CvType.CV_32FC1);

To increase the accuracy of the classifier, consider the intensity transformations learned in Lab6. For example, if your test image taken from the Android camera has a black digit on a white background, you should invert the pixel values, since the training images use the opposite colors. Feel free to implement your own way to preprocess the intensity of the image. We recommend using histogram transforms to force the background to be as close to black as possible, just like the training images. One such example can be seen in the code below, where the pixels are scaled to the full range [0,255] before their intensities are inverted.

double Min = Core.minMaxLoc(mResized).minVal;
double Max = Core.minMaxLoc(mResized).maxVal;
double[] val1;
for (int i = 0; i < mResized.rows() ; i++) {
    val1 = mResized.get(i,0);
    val1[0] = Math.floor((val1[0]-Min)*255/(Max-Min));
    // if needed, create negative image to make black background & white digit
    mResized.put(i,0,255 - val1[0]);
}

Tip

You can quickly dump the pixel values to the Logcat window. Use the following code to debug.
String dump = mResized.dump();
Log.d(TAG, "DEBUG: mResized"+ dump);

Part 8 - Run classifiers

Complete the digit recognition system by applying the classifiers to the image. Similar to Part 4, compute y_hat by applying the sigmoid function to theta * x and choose the maximum value among the 10 classifiers. Note that we add another dimension to our input image to accommodate the intercept term (1 + 28*28 = 785 features).

To perform the matrix multiplication, use the Core.gemm function in OpenCV. The function Core.gemm(src1, src2, alpha, src3, beta, dest, flags) performs the multiplication according to the following formula: dest = alpha * src1 * src2 + beta * src3

Assignment 5

Implement the digit recognition system described above. You must show the detected digit on the Android screen.

// Add 1 at the beginning of the image vector
Mat x_val = new Mat(28*28+1,1, CvType.CV_32FC1);

for (int i = 0; i < x_val.rows(); i++) {
    if (i == 0)
        x_val.put(i, 0, 1);
    else
        x_val.put(i, 0, mResized.get(i-1, 0));
}

// Calculate the probability vector
Mat yMat= new Mat(10,1, CvType.CV_32FC1);
float[] y_hat = new float[10];

// Using Core.gemm, compute yMat = theta * x_val
// ======== START YOUR CODE HERE ========
Core.gemm(???, ???, ???, new Mat(), 0.0, ???, 0);
// ======== END YOUR CODE HERE ========

// Copy yMat to y_hat
yMat.get(0,0, y_hat);
// Implement sigmoid function
// Hint: Use Math.exp for exponential function
for(int i=0; i< 10; i++){
// ======== START YOUR CODE HERE ========
    y_hat[i] = ??? ;
// ======== END YOUR CODE HERE ========
}

// Find the maximum element in y_hat and
// assign the detected digit in number_detected to print out on screen
// ======== START YOUR CODE HERE ========

// ======== END YOUR CODE HERE ========


Grading

Your lab will be graded as follows:

  • Prelab [2 points]

  • Lab [4 points]

    • Python:
      • Assignment 1:
        Display the 1st digit image and its label in the training set. [0.5 point]

      • Assignment 2:
        Correct implementation of the sigmoid function and of the cost and gradient functions for logistic regression. [0.5 point]

      • Assignment 3:
        Plot of the cost function of one of the classifiers over the iterations. [0.5 point]

      • Assignment 4:
        Display the confusion matrices for the training and test datasets. [0.5 point]

    • Android:
      • Assignment 5:
        Correct app behaviors. [1.0 point]
        Reasonable y_hat and digit detection. [1.0 point]