Extra-credit Prelab - Linear Regression

Summary

In this prelab, you will be introduced to the concepts of linear regression and gradient descent. You will then use these concepts to find a best-fit line for the given Boston housing price data.

Downloads

BostonHousePrices.csv (the dataset used in this prelab)

Submission Instruction

Refer to the Submission Instruction page for more information.

Part 1 - Plotting data

It is always good practice to visualize the data before starting any task. The dataset contains the median house value in $1,000s (the response, $y$) and the average number of rooms per dwelling (the feature, $x$). Use a scatter plot to display the house prices versus the number of rooms.

Assignment 1

Using the following Python script as an example, display the scatter plot of the training and test data.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Read data file
boston = pd.read_csv('BostonHousePrices.csv')

# Store the response in y and the feature in x
y = boston['Value'].to_numpy()
x = boston[['Rooms']].to_numpy()

# Create the training set and test set
train_y = y[:400]
train_x = x[:400]
test_x = x[400:]
test_y = y[400:]

# Plot the data using plt.scatter function
# ======== YOUR CODE HERE ========


# ================================
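For reference, one possible way to complete the plotting step is sketched below (a minimal example; the colors and labels are illustrative choices, not requirements of the assignment):

# A possible sketch of the plotting step: training and test points
# are drawn with separate scatter calls so they get different colors.
plt.scatter(train_x, train_y, label='Training data')
plt.scatter(test_x, test_y, label='Test data')
plt.xlabel('Average number of rooms per dwelling')
plt.ylabel('Median house value ($1,000s)')
plt.legend()
plt.show()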


Part 2 - Gradient Descent

In this section, you will fit the linear regression parameters to the dataset. We use $x$ to denote the input feature (i.e., number of rooms) and $y$ to denote the output response (i.e., house price). Our goal is to learn a function (or hypothesis) $h_\theta(x)$ so that $h_\theta(x)$ can predict the correct responses. We use a cost function $J(\theta)$ to measure the accuracy of the prediction. The cost function is the average of the squared errors between the predictions and the actual responses, defined by

$$J(\theta) = \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2,$$

where $m$ is the size of the dataset. Note that the superscript $(i)$ denotes the $i$-th data point. We choose the hypothesis to be linear, given by

$$h_\theta(x) = \theta_0 + \theta_1 x.$$

Now we need to estimate the parameters $\theta = (\theta_0, \theta_1)$ that minimize the cost function. To achieve this, we will use an iterative algorithm called gradient descent, which gently updates our parameters in accordance with the gradient of the cost function and a small step size, or learning rate, $\alpha$. Recall from calculus that the gradient gives the direction of fastest increase of a function; thus, by subtracting the gradient times $\alpha$, we move in the direction of fastest decrease, which is what we need in order to minimize the cost function $J(\theta)$. In our update equation, the current iterate is at time step $t$ while the new iterate is at time step $t+1$. The following figure shows an example of the cost function $J$ as a function of $\theta$. The derivative at the red point gives a direction toward the minimum of the cost function, i.e., decrease $\theta$ if the derivative is positive.

Denoting by $\theta^{(t)}$ the parameters at time step $t$, $\theta$ is updated by

$$\theta^{(t+1)} = \theta^{(t)} - \alpha \, \nabla_\theta J(\theta^{(t)}),$$

where $\nabla_\theta J = \left[ \frac{\partial J}{\partial \theta_0}, \frac{\partial J}{\partial \theta_1} \right]^T$.

We can rewrite the update for each parameter (note that $x_0^{(i)}$ is 1):

$$\theta_j^{(t+1)} = \theta_j^{(t)} - \alpha \, \frac{2}{m} \sum_{i=1}^{m} \left( h_{\theta^{(t)}}(x^{(i)}) - y^{(i)} \right) x_j^{(i)}, \qquad j = 0, 1,$$

where $\theta_0$ and $\theta_1$ are updated simultaneously.
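As a concrete illustration of a single update step, here is a small sketch on made-up toy numbers (the values are for demonstration only and are not from the Boston dataset):

# One gradient-descent step on toy data (illustrative values only).
import numpy as np

x = np.array([1.0, 2.0, 3.0])   # toy feature values
y = np.array([2.0, 4.0, 6.0])   # toy responses
theta0, theta1 = 0.0, 0.0       # initial parameters
alpha = 0.1                     # learning rate

h = theta0 + theta1 * x         # hypothesis h_theta(x) at every point
grad0 = (2 / len(x)) * np.sum(h - y)         # dJ/dtheta_0 (x_0 = 1)
grad1 = (2 / len(x)) * np.sum((h - y) * x)   # dJ/dtheta_1
# Simultaneous update of both parameters
theta0, theta1 = theta0 - alpha * grad0, theta1 - alpha * grad1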

Assignment 2

Using the code below, implement functions to compute the cost function, $J(\theta)$, and its partial derivatives with respect to the parameters $\theta$.

Python Hints

  • Matrix multiplication can be done with the '@' operator in Python.
  • We can rewrite the cost function in the following matrix form:

    $$J(\theta) = \frac{1}{m} (X\theta - y)^T (X\theta - y), \quad \text{where} \quad X = \begin{bmatrix} 1 & x^{(1)} \\ \vdots & \vdots \\ 1 & x^{(m)} \end{bmatrix}.$$

# Variables: x - the first array
#            y - the second array
# Description: calculate the mean squared error loss between two arrays
def mse(x, y):
    return np.mean((x-y)**2)

# Variables: theta -  linear regression parameter
#            X - features
#            y - responses
# Description: compute the cost function J, given theta and dataset
def cost_lin(theta, X, y):
# ======== YOUR CODE HERE ========


# ================================
    return J


# Variables: theta -  linear regression parameter
#            X - features
#            y - responses
# Description: computes the partial derivative of the cost with respect to parameter theta.
def grad_lin(theta, X, y):
    m = X.shape[0]
# ======== YOUR CODE HERE ========


# ================================
    return grad
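For reference, one possible completion of the two functions is sketched below, under the assumption that theta is passed as a 1-D array of length 2 (flatten or transpose as needed if you keep theta as a row vector):

# A possible sketch, assuming theta is a 1-D array so that X @ theta
# has shape (m,) and broadcasts cleanly against y.
def cost_lin(theta, X, y):
    J = mse(X @ theta, y)     # J = (1/m) * sum((X theta - y)^2)
    return J

def grad_lin(theta, X, y):
    m = X.shape[0]
    # All components of dJ/dtheta at once: (2/m) * X^T (X theta - y)
    grad = (2 / m) * (X.T @ (X @ theta - y))
    return grad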

Part 3 - Training

We are now ready to train our parameters given the dataset. In the code below, we initialize the learning rate $\alpha$ to 0.01 and set the parameters to arrays of zeros. Note that we need to add another dimension ($x_0 = 1$) to our features to accommodate the intercept parameter $\theta_0$.

Assignment 3

  • Using the code below, train the parameters $\theta$ using the cost function and the gradient function implemented in Assignment 2. You may change the learning rate, the number of iterations, and the initial parameter values (one possible training loop is sketched after the code block below).
  • Plot the cost function versus the number of iterations.
  • Use your final parameters to plot the linear fit on top of the scatter plot from Assignment 1.


# Train the linear regression model
num_iter = 15000
alpha = 1e-2

# Add a column of ones to train_x
ones = np.ones((train_x.shape[0],1))
train_X = np.hstack((ones, train_x))

# Add a column of ones to test_x
ones = np.ones((test_x.shape[0],1))
test_X = np.hstack((ones, test_x))

# Initialize parameters
init_theta = np.zeros((1, train_X.shape[1]))

# ======== YOUR CODE HERE ========


# ================================
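One possible training loop is sketched below (a minimal example under the 1-D theta convention from the Assignment 2 sketch; the plot styling is an illustrative choice):

# A possible sketch of the training loop and the two required plots.
theta = init_theta.flatten()   # 1-D theta, matching the earlier sketch
costs = []
for it in range(num_iter):
    costs.append(cost_lin(theta, train_X, train_y))
    theta = theta - alpha * grad_lin(theta, train_X, train_y)

# Cost versus iteration
plt.plot(costs)
plt.xlabel('Iteration')
plt.ylabel('Cost J')
plt.show()

# Linear fit over the scatter plot from Assignment 1
plt.scatter(train_x, train_y, label='Training data')
plt.scatter(test_x, test_y, label='Test data')
plt.plot(x, theta[0] + theta[1] * x, 'r-', label='Linear fit')
plt.legend()
plt.show()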

Grading

The prelab will be graded as follows:

  • Assignment 1 [0.5 point]

    A scatter plot of the dataset (house price vs. number of rooms). [0.5 point]

  • Assignment 2 [0.5 point]

    Correct implementation of the cost function and gradient function. [0.5 point]

  • Assignment 3 [1.0 point]

    Plot of the cost function versus iterations. [0.5 point]

    Plot of the linear fit on the scatter plot. [0.5 point]