im3_t = tensor(im3)
# Slice out part of the image and render each pixel value as a shaded cell
df = pd.DataFrame(im3_t[4:15,4:22])
df.style.set_properties(**{'font-size':'6pt'}).background_gradient('Greys')
Here, we've isolated the pixel values within our 28x28 image: black is 255, white is 0, and shades of gray are the values in between.
A good next step is a baseline model: one that is simple to implement and very easy to test, so that we can compare our "improved," more complex ideas against it and see whether they actually perform better (this is what "baseline" means).
Our baseline model will find the average value of every pixel across all the 3s, and likewise across all the 7s. We can then compare any new image of a 3 or 7 to these two averages and categorize it accordingly.
So, step 1 of the model is to get the average pixel values.
We need to first create a tensor containing all of the 3s and then one for all of the 7s:
three_tensors = [tensor(Image.open(o)) for o in threes]
seven_tensors = [tensor(Image.open(o)) for o in sevens]
len(three_tensors),len(seven_tensors)
Let's review our goal: for every pixel position, we want to compute the average intensity of that pixel over all the images.
Now let's check that the images look OK:
show_image(three_tensors[1]);
To compute those averages, we need to combine all of the images into a single three-dimensional tensor, using the stack function from PyTorch. We also cast to float and divide by 255 so that pixel values run from 0 to 1:
stacked_sevens = torch.stack(seven_tensors).float()/255
stacked_threes = torch.stack(three_tensors).float()/255
stacked_threes.shape
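The length of a tensor's shape is its rank. We can confirm that our stacked tensor is rank-3 (image index by pixel rows by pixel columns):
stacked_threes.ndim
>>> 3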
Let’s display our new “ideal” numbers:
mean3 = stacked_threes.mean(0)  # average over axis 0, the image index
show_image(mean3);
This is what our ideal 3 looks like. Let's see what our ideal 7 looks like:
mean7 = stacked_sevens.mean(0)
show_image(mean7);
Now, as a proof of concept, let's pick an arbitrary 3 and measure how far away it is from our ideal 3. We can't just add up the differences in pixel values between the two images: some differences will be negative and others positive, and they might cancel out. The model could then claim our image is a perfect 3 when in reality some parts are too dark and other parts are too light.
There are two main ways to avoid this problem (a toy example follows the list):
- Take the mean of the absolute value of the differences. This is the mean absolute difference, or L1 norm.
- Take the mean of the square of the differences, then take the square root. This is the root mean squared error (RMSE), or L2 norm.
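To see the two measures in action before applying them to real images, here is a toy sketch with made-up difference values:
diffs = tensor([0.5, -0.5, 0.9, -0.9])  # made-up per-pixel differences
diffs.mean()                # >>> tensor(0.), the positives and negatives cancel
diffs.abs().mean()          # >>> tensor(0.7000), the L1 norm
(diffs**2).mean().sqrt()    # >>> tensor(0.7280), the L2 norm; squaring weights big errors more heavily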
Measuring the distance from the ideal 3 with both measures (we first pick a_3, an arbitrary 3 from the stack, since it's used below):
a_3 = stacked_threes[1]  # an arbitrary 3 to test against
dist_3_abs = (a_3 - mean3).abs().mean()
dist_3_sqr = ((a_3 - mean3)**2).mean().sqrt()
dist_3_abs,dist_3_sqr
>>> (tensor(0.1114), tensor(0.2021))
And the distance from the ideal 7, using the same two measures:
dist_7_abs = (a_3 - mean7).abs().mean()
dist_7_sqr = ((a_3 - mean7)**2).mean().sqrt()
dist_7_abs,dist_7_sqr
>>> (tensor(0.1586), tensor(0.3021))
In both cases, the distance between the test 3 and the ideal 3 is less than the distance between the test 3 and the ideal 7, so at least our simple model gives the right prediction in this case.
We don't actually need to write out all of this code ourselves: PyTorch provides both measures as loss functions inside torch.nn.functional, which is usually imported as F. Note that F.mse_loss returns the mean squared error, so we take the square root ourselves to get RMSE:
F.l1_loss(a_3.float(),mean7), F.mse_loss(a_3,mean7).sqrt()
>>> (tensor(0.1586), tensor(0.3021))
Questions
What are arrays and tensors?
- A NumPy array is a multidimensional table of data where all items are the same type; those items can even be arrays themselves, so an array can be jagged.
- A PyTorch tensor is nearly the same thing, with the added restriction that all items must use a single basic numeric type. This means a PyTorch tensor cannot be jagged, for example (see the sketch below).
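A quick illustration of that restriction, with made-up values:
import numpy as np
np.array([[1, 2, 3], [4, 5]], dtype=object)  # fine: a jagged array of two lists
torch.tensor([[1, 2, 3], [4, 5]])            # raises ValueError, tensors must be rectangular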
How do we create a tensor/array?
data = [[1,2,3],[4,5,6]]
arr = array(data)
tns = tensor(data)
arr # numpy
>>> array([[1, 2, 3], [4, 5, 6]])
tns # pytorch
>>> tensor([[1, 2, 3], [4, 5, 6]])
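Once created, you can select rows and columns and do arithmetic directly on either structure; for example, on the tensor above:
tns[1]    # select a row
>>> tensor([4, 5, 6])
tns[:,1]  # select a column
>>> tensor([2, 5])
tns + 1   # elementwise arithmetic
>>> tensor([[2, 3, 4], [5, 6, 7]])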
What is a metric?
- A number calculated from the model's predictions and the correct labels in a dataset, which tells us how good our model is.
- In practice, we normally use accuracy as the metric (illustrated below).
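Accuracy is just the fraction of predictions that match the labels; a made-up illustration:
preds = tensor([3, 7, 7, 3])      # hypothetical model predictions
targets = tensor([3, 7, 3, 3])    # hypothetical correct labels
(preds == targets).float().mean()
>>> tensor(0.7500)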
We only calculate metrics over a validation set, so that we don't end up rewarding a model for overfitting the training set.
Luckily, the makers of MNIST have already created separate training and validation sets for us; that's what the directory called "valid" is for!
We start by creating tensors for the 3s and 7s from that directory:
valid_3_tens = torch.stack([tensor(Image.open(o))
for o in (path/'valid'/'3').ls()])
valid_3_tens = valid_3_tens.float()/255
valid_7_tens = torch.stack([tensor(Image.open(o))
for o in (path/'valid'/'7').ls()])
valid_7_tens = valid_7_tens.float()/255
valid_3_tens.shape,valid_7_tens.shape
We see two tensors: one for the 3s validation set, with 1,010 28x28 images, and one for the 7s validation set, with 1,028 28x28 images.
Finally, a function that calculates the mean absolute error between two images:
def mnist_distance(a,b): return (a-b).abs().mean((-1,-2))  # average over the last two axes, the pixel rows and columns
mnist_distance(a_3, mean3)
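Because mnist_distance averages over only the last two axes, the same function also works when a is a whole batch of images rather than a single one: PyTorch broadcasts mean3 across the batch. For example, with the validation tensor from above:
valid_3_dist = mnist_distance(valid_3_tens, mean3)  # (1010,28,28) - (28,28) broadcasts
valid_3_dist.shape
>>> torch.Size([1010])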
Stochastic Gradient Descent
Steps of any deep learning model:
- Initialize the weights
- For each image, use these weights to predict whether it appears to be a 3 or a 7.
- Based on these predictions, calculate how good the model is (its loss).
- Calculate the gradient, which measures for each weight, how changing that weight would change the loss
- Step (that is, change) all the weights based on that calculation.
- Go back to step 2 and repeat the process.
- Iterate until you decide to stop the training process (for instance, because the model is good enough or you don’t want to wait any longer).
Guidelines for each of the seven steps above:
- Initialize: We initialize the parameters to random values
- Loss: We need a function that returns a number that is small when performance is good and large when performance is bad.
- Step: A simple way to find whether a weight should increase or decrease would be to try both and see which gives the better performance. But this is slow; instead, we can use calculus to figure out directly in which direction, and by roughly how much, to change each weight. We do this by calculating gradients.
- Stop: Once we've decided how many epochs to train for, we apply that decision. For our digit classifier, we would keep training until we ran out of time or the model's accuracy started getting worse. A sketch of how these steps fit together in code follows.
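Putting the steps together, here is a minimal sketch of one pass of the training loop. The names (params, xb, yb, lr, model, loss_func) are placeholders, not defined in this section; step 1 would initialize params to random values with requires_grad_() before calling this:
def train_step(params, xb, yb, lr, model, loss_func):
    preds = model(xb, params)    # step 2: predict
    loss = loss_func(preds, yb)  # step 3: measure how good the model is
    loss.backward()              # step 4: calculate the gradients
    with torch.no_grad():
        for p in params:
            p -= p.grad * lr     # step 5: step the weights
            p.grad.zero_()       # reset gradients for the next iteration
Steps 6 and 7 are just the outer loop that calls this repeatedly until we decide to stop.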
Calculating Gradients
The magic step is the part where we calculate gradients.
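As a quick illustration of how PyTorch does this for us, here is a minimal sketch using a toy function rather than our model:
xt = tensor(3.).requires_grad_()  # tell PyTorch to track gradients for xt
yt = xt**2                        # a simple function of xt
yt.backward()                     # compute the gradient of yt with respect to xt
xt.grad
>>> tensor(6.)
The gradient is 6 because the derivative of x**2 is 2*x, which is 6 at x = 3.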