The goals of this project are to:
- Train a convolutional neural network to classify images of articles of clothing.
- Identify and prune the bottom 0.01% of weights when ranked by magnitude.
- Evaluate the performance of the pruned network and compare it to the unpruned network.
Exploring the use of weight magnitude as a pruning metric on this concrete example problem will illuminate whether it can be useful for other, more complex image recognition problems that require larger neural networks.
This project was motivated by a desire to become more familiar with PyTorch. While I had seen PyTorch code before, I had not written substantive original code with PyTorch and thus only had a tenuous grasp of how machine learning implementations are created with it.
Image Classification
Our definition of an image classification problem is as follows. Let there be a sample of labeled images $\{(x_i, y_i)\}_{i=1}^{N}$, where each $x_i \in \mathbb{R}^{h \times w}$ is an image with $h \times w$ pixels and each $y_i \in \{1, \dots, k\}$ is the index of the class the image belongs to. The goal is to learn a function $f : \mathbb{R}^{h \times w} \to \{1, \dots, k\}$ that correctly predicts the classes of images outside the sample.
A commonly employed solution for image classification problems is the convolutional neural network (CNN), a variant of the neural network that employs convolution operations on its inputs.
Convolutions
Given an $h \times w$ image $X$ and an $m \times n$ kernel $K$ (with $m \le h$ and $n \le w$), the convolution operation slides the kernel across the image and, at each position, sums the elementwise products of the kernel entries with the image entries beneath them. The $(i, j)$ entry of the output is $\sum_{a=1}^{m} \sum_{b=1}^{n} K_{a,b}\, X_{i+a-1,\, j+b-1}$, so the output has dimensions $(h - m + 1) \times (w - n + 1)$.
Since the result of a convolution operation on an image is itself a two-dimensional array, convolutions can be composed: the output of one convolutional layer can be fed as the input to the next, with each layer's kernels learned during training.
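As a concrete illustration, here is a minimal PyTorch sketch of a single convolution. The input shape matches a Fashion-MNIST image, and the kernel values are random placeholders rather than learned weights.

```python
import torch
import torch.nn.functional as F

# One pass of a 3x3 kernel over a 28x28 single-channel image
# (the shape of a Fashion-MNIST image). The random values are
# placeholders; in a trained CNN the kernel entries are learned.
image = torch.randn(1, 1, 28, 28)   # (batch, channels, height, width)
kernel = torch.randn(1, 1, 3, 3)    # (out_channels, in_channels, kernel height, kernel width)

output = F.conv2d(image, kernel)    # stride 1, no padding
print(output.shape)                 # torch.Size([1, 1, 26, 26]): (28 - 3 + 1) in each spatial dimension
```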
Pruning Based on Metrics
With many CNNs employing multiple different convolutions, and with each kernel spanning two dimensions, the storage requirements can grow quickly. It is therefore natural to want to reduce the number of parameters while maintaining the model's accuracy.
We call the individual entries of each kernel in the model "weights". We consider the problem of choosing 0.01% of the model's weights to set to 0. An informed way of doing so would be to choose based on some metric that identifies "unimportant" weights.
In this project, we consider one such metric: simply, the magnitude of the weight itself.
We prune (set to 0) the weights in the bottom 0.01% of all weights in the model when ranked by magnitude, as these weights should be the ones with the least impact on the model's output.
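A minimal sketch of this pruning procedure in PyTorch follows. The helper name `prune_by_magnitude` is my own, and it assumes every parameter in the model is eligible for pruning, which may not match the exact setup used in this project.

```python
import torch

def prune_by_magnitude(model, fraction=0.0001):
    """Zero out the `fraction` of weights with the smallest magnitudes."""
    # Collect the magnitudes of every parameter into one flat tensor
    # so that weights are ranked globally across all layers.
    magnitudes = torch.cat([p.detach().abs().flatten()
                            for p in model.parameters()])
    # Find the k-th smallest magnitude, where k is `fraction` of all weights
    # (fraction=0.0001 corresponds to the bottom 0.01% used here).
    k = max(1, int(fraction * magnitudes.numel()))
    threshold = magnitudes.kthvalue(k).values
    # Set every weight at or below the threshold to zero.
    with torch.no_grad():
        for p in model.parameters():
            p[p.abs() <= threshold] = 0.0
```

PyTorch also ships `torch.nn.utils.prune.global_unstructured` with the `L1Unstructured` method, which applies the same global magnitude ranking via masks instead of modifying the weights directly.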
To evaluate the usefulness of this metric, we must perform pruning with it on a concrete example problem. We consider the problem of classifying images from the Fashion-MNIST dataset (Xiao et al., 2017). The Fashion-MNIST dataset is a set of 70,000 28×28 grayscale images of articles of clothing, each labeled with one of 10 classes; 60,000 images are designated for training and 10,000 for testing.
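For reference, here is a sketch of how such an experiment can be set up with torchvision. The two-convolution architecture and layer sizes below are illustrative assumptions, not necessarily the exact model trained in this project.

```python
import torch.nn as nn
from torchvision import datasets, transforms

# Fashion-MNIST ships with torchvision and downloads on first use.
train_data = datasets.FashionMNIST(root="data", train=True, download=True,
                                   transform=transforms.ToTensor())
test_data = datasets.FashionMNIST(root="data", train=False, download=True,
                                  transform=transforms.ToTensor())

# A small CNN: two convolutional layers followed by a linear classifier.
model = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=3), nn.ReLU(),   # 28x28 -> 26x26
    nn.Conv2d(16, 32, kernel_size=3), nn.ReLU(),  # 26x26 -> 24x24
    nn.Flatten(),
    nn.Linear(32 * 24 * 24, 10),                  # 10 clothing classes
)
```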
Before pruning, the accuracy on the test data was 72.86%; after pruning, it was 71.31%.
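Accuracy here means the fraction of test images whose predicted class matches the true label. A minimal sketch of such an evaluation loop, assuming a model and dataset like those above:

```python
import torch
from torch.utils.data import DataLoader

def test_accuracy(model, dataset, batch_size=256):
    """Fraction of images in `dataset` whose predicted class matches the label."""
    loader = DataLoader(dataset, batch_size=batch_size)
    model.eval()                    # disable training-only behavior (dropout, etc.)
    correct = 0
    with torch.no_grad():           # no gradients needed for evaluation
        for images, labels in loader:
            preds = model(images).argmax(dim=1)   # most probable class per image
            correct += (preds == labels).sum().item()
    return correct / len(dataset)
```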
Thus, the tradeoff seen was a 1.55 percentage point decrease in accuracy in exchange for a 0.01% decrease in the number of stored weights.
Whether this tradeoff is a good one is beyond the scope of this paper; evaluating it properly would likely require expertise in computer systems and hardware that is outside my current background.