34 commits
588469b
Finished cpu version and naive version (stream) of stream_compaction …
Black-Phoenix Sep 11, 2019
424e10f
Finished Stream compact base assignment. Todo change pow to shift (se…
Black-Phoenix Sep 12, 2019
caf1e12
Finished Stream compaction (without extra credit)
Black-Phoenix Sep 12, 2019
58e386e
Finished shared memory scan and by extension speedup code enough to b…
Black-Phoenix Sep 13, 2019
a1e4fc1
First version of radix sort. TODO check version
Black-Phoenix Sep 13, 2019
a50718e
moved radix sort to src file
Black-Phoenix Sep 13, 2019
5754c22
Working version of radix sort. TODO implement checks on output
Black-Phoenix Sep 13, 2019
312aff0
Finished test cases for radix sort. TODO collect data and finish read…
Black-Phoenix Sep 13, 2019
c40fb9e
Skeleton code for NN; reformatted P1; Adds data generation code (todo…
Black-Phoenix Sep 14, 2019
e868e48
Finished simple forward pass
Black-Phoenix Sep 14, 2019
c5c6316
Finished data_gen v1 for stream compaction
Black-Phoenix Sep 15, 2019
7c209e4
Fixed memory leaks in stream_compaction gpu code; Fixed data type bug…
Black-Phoenix Sep 15, 2019
ddae0f8
Added data to generate plots; Reformated lines
Black-Phoenix Sep 15, 2019
a4c3feb
Increased max size of scan; todo check stream compaction max size; Ad…
Black-Phoenix Sep 15, 2019
f3698f3
Modified forward pass to use double precision numbers
Black-Phoenix Sep 15, 2019
30efb41
Sperated parameters from NN class
Black-Phoenix Sep 15, 2019
fb112b7
Pushed all raw data; Finished data_gen.cpp
Black-Phoenix Sep 16, 2019
3217bf4
Finished backprop (tested on xor)
Black-Phoenix Sep 17, 2019
5e8296e
Working image classification; todo improve speed to convergence
Black-Phoenix Sep 17, 2019
f847545
Added momentum gradient decent and adaptive learning rate to improve …
Black-Phoenix Sep 17, 2019
87680d4
Moved reduction to GPU
Black-Phoenix Sep 17, 2019
11753b7
Changed reset buffer to only clear end of array (improve performance)
Black-Phoenix Sep 17, 2019
62662ef
Added random rotations and achieved 100% acc on both kinds of data; A…
Black-Phoenix Sep 17, 2019
2a11a20
Finished README for Charecter Recognition
Black-Phoenix Sep 17, 2019
a770466
Updated Readme. Fixed typo in Stream compaction main file
Black-Phoenix Sep 17, 2019
c269b26
Fixed images in README
Black-Phoenix Sep 17, 2019
5df67fd
Added Plot images to repo
Black-Phoenix Sep 18, 2019
ec7bd3d
Added plots to readme
Black-Phoenix Sep 18, 2019
04f26b8
Fixed sort plots legend
Black-Phoenix Sep 18, 2019
43425ff
Update README.md
Black-Phoenix Sep 18, 2019
2db6a2d
Update README.md
Black-Phoenix Sep 18, 2019
f099406
Update README.md
Black-Phoenix Sep 18, 2019
671211a
Added 74k Code to main; Added loss plot for same
Black-Phoenix Sep 18, 2019
9d7112a
Reverted back to regular dataset
Black-Phoenix Sep 18, 2019
3 changes: 3 additions & 0 deletions Project2-Character-Recognition/CMakeLists.txt
@@ -22,6 +22,7 @@ if(${CMAKE_SYSTEM_NAME} MATCHES "Darwin")
endif()

include_directories(.)
link_directories(${CUDA_TOOLKIT_ROOT_DIR}/lib/x64)
add_subdirectory(character_recognition)

cuda_add_executable(${CMAKE_PROJECT_NAME}
@@ -32,4 +33,6 @@ cuda_add_executable(${CMAKE_PROJECT_NAME}
target_link_libraries(${CMAKE_PROJECT_NAME}
character_recognition
${CORELIBS}
cublas
curand
)
149 changes: 143 additions & 6 deletions Project2-Character-Recognition/README.md
@@ -3,12 +3,149 @@ CUDA Character Recognition

**University of Pennsylvania, CIS 565: GPU Programming and Architecture, Project 2**

* Name: Vaibhav Arcot

* [LinkedIn](https://www.linkedin.com/in/vaibhav-arcot-129829167/)

* Tested on: Windows 10, i7-7700HQ @ 2.8GHz (3.8 Boost) 32GB, External GTX 1080Ti, 11G (My personal laptop)

### Overview
This code implements a fully connected neural network in **CUDA** that identifies the character shown in an input image. The network was trained on 52 images (one per letter, upper and lower case), augmented with random rotations (± 10°). The results show that the network identifies the characters with 100% accuracy.

![Sample Network](./img/sample_network.PNG)

### Architecture

The code allows any number of hidden layers to be added (while keeping the final layer a softmax layer). For the toy problem, the network takes 225 inputs (a 15x15 image flattened into a column vector) and outputs a 52x1 vector with the probability of each class. The input layer's weight matrix has dimensions 225x98, the hidden-layer weight matrices are 98x65, 65x50, 50x30, 30x25, and 25x40, and the output layer's weight matrix is 40x52.

ReLU is the activation function between layers, softmax is applied at the final layer, and cross-entropy is the loss function.

### Dependencies
* OpenCV (to read images)
* CUDA 10
* cuBLAS (matrix multiplication)
* cuRAND (random weight initialization on the GPU)

### Neural network overview

Neural networks are multi-layer networks of neurons (the blue and magenta nodes in the chart below) that we use to classify things, make predictions, etc. Each neuron activates on features, and the cascading of said neurons allows the network to activate on more complex representations of the input data. In a neural network, the final layer does the job of a support vector machine, which draws a hyperplane to classify the data (if that is the task).
![Neural Network](./img/MLP.png)

#### Neuron
A neuron is the building block of a neural network. It takes in a value and returns a nonlinear transformation of the input. We define a layer as a stack of neurons. The nonlinearity is the crucial part: a stack of purely linear layers can be shown to collapse into a single linear layer.
![Forward pass](./img/Weighting.png)

#### Activation functions

For the hidden layers, ReLU was the activation function of choice because both the function and its derivative are monotonic, which is a nice property for the gradients. The main drawback is that the gradient is zero for negative inputs, so neurons can stop learning (this can be mitigated with leaky ReLU).

For the final layer, I decided to use a softmax activation function because this is a multi-class classification task. This way, we get a probability distribution over all classes, and the prediction is simply the class with the highest probability.
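As a rough sketch (not the exact kernels in this repo), the ReLU forward pass and its derivative can be written as simple element-wise CUDA kernels; the kernel and pointer names here are hypothetical:

```cuda
// Hypothetical element-wise ReLU kernels (a sketch, not the repo's exact code).
__global__ void reluForward(int n, const double *z, double *a) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        a[i] = z[i] > 0.0 ? z[i] : 0.0;               // a = max(0, z)
}

__global__ void reluBackward(int n, const double *z, const double *dA, double *dZ) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        dZ[i] = z[i] > 0.0 ? dA[i] : 0.0;             // gradient passes through only where z > 0
}
```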

#### Forward propagation

Inference uses the forward pass through the network. The input is fed in, and at each layer it is multiplied by the weight matrix (w), the bias (b) is added, and the activation is applied. The forward-pass equations are shown below:

![](./img/fp.png)
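As an illustration of how one such layer maps onto cuBLAS (assuming column vectors and double precision, as the repo uses), here is a hedged sketch of a single forward layer. It reuses the `reluForward` kernel sketched above; `forwardLayer` and all pointer names are hypothetical, not the repo's actual API:

```cuda
#include <cublas_v2.h>
#include <cuda_runtime.h>

// Sketch of one forward-pass layer: z = W * a_prev + b, then a = ReLU(z).
// All pointers are device pointers; W is stored column-major (out_dim x in_dim).
void forwardLayer(cublasHandle_t handle, int out_dim, int in_dim,
                  const double *d_W,      // out_dim x in_dim weight matrix
                  const double *d_a_prev, // in_dim x 1 activation from the previous layer
                  const double *d_b,      // out_dim x 1 bias
                  double *d_z,            // out_dim x 1 pre-activation (scratch)
                  double *d_a) {          // out_dim x 1 output activation
    const double one = 1.0;
    // z = b (copy the bias in, then accumulate the matrix-vector product on top of it)
    cudaMemcpy(d_z, d_b, out_dim * sizeof(double), cudaMemcpyDeviceToDevice);
    // z += W * a_prev   (cublasDgemv computes y = alpha * op(A) * x + beta * y)
    cublasDgemv(handle, CUBLAS_OP_N, out_dim, in_dim,
                &one, d_W, out_dim, d_a_prev, 1, &one, d_z, 1);
    // a = ReLU(z)
    int threads = 128, blocks = (out_dim + threads - 1) / threads;
    reluForward<<<blocks, threads>>>(out_dim, d_z, d_a);
}
```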

#### Back propagation

To actually train the network, we need to update the weights and biases. This is done using gradient descent on each parameter with respect to the final loss. The gradients are found using the chain rule.

The gradients are shown below:

![Back Prop gradients](./img/bp.png)
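For reference, with ReLU hidden layers, a softmax output, and cross-entropy loss, the standard chain-rule gradients (which the figure above depicts in the repo's notation) take roughly this form, where $a_{l-1}$ is the previous layer's activation, $z_l$ the pre-activation, and $\odot$ an element-wise product:
$$
\delta_L = \hat{y} - y, \qquad
\delta_l = \left(W_{l+1}^{T}\,\delta_{l+1}\right) \odot \mathrm{ReLU}'(z_l), \qquad
\frac{\partial \mathcal{L}}{\partial W_l} = \delta_l\, a_{l-1}^{T}, \qquad
\frac{\partial \mathcal{L}}{\partial b_l} = \delta_l
$$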

Once we have the gradients, the following equation is used to update the weights and biases:

![Gradient descent equation](./img/gradient_decent.png)

### Modifications

Besides getting the base neural network working, below are some of the modifications I made in the quest for better performance:

#### Stochastic Gradient Descent with momentum (SGD)

Regular gradient descent tends to pull the parameters around rather quickly, which can make the loss curve jagged. To combat this, gradient descent with momentum was implemented: the update equation blends the previous update with the new gradient. This adds another hyperparameter, β.

![SGD](./img/SGD.png)
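A minimal sketch of the momentum update as a per-parameter CUDA kernel, assuming the common formulation v = βv + (1 − β)∇w (the exact blend used in the repo may differ; the kernel and variable names are hypothetical):

```cuda
// Hypothetical momentum update kernel (a sketch, not the repo's exact code).
__global__ void sgdMomentumUpdate(int n, double lr, double beta,
                                  const double *grad, double *velocity, double *w) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        velocity[i] = beta * velocity[i] + (1.0 - beta) * grad[i]; // blend old and new updates
        w[i] -= lr * velocity[i];                                  // gradient descent step
    }
}
```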

#### Adaptive learning rate

During gradient descent, the learning rate has a huge impact on training performance. If the value is too high, the loss oscillates around the optimum (near the end); too low, and convergence takes too long. To combat this, an adaptive learning rate was used: the learning rate starts at an initial value and is halved every X epochs (a hyperparameter). This lets the network learn rapidly at first and slow down near the end.
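In code this is just a periodic halving of the step size inside the training loop; a sketch, with `decay_every` standing in for the "X epochs" hyperparameter:

```cuda
// Host-side training loop snippet (names hypothetical).
if (epoch > 0 && epoch % decay_every == 0)
    learning_rate *= 0.5;   // halve the learning rate on a fixed schedule
```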

#### Reduction on GPU

For the softmax layer, the output is given by:

![Softmax](./img/softmax.png)

The denominator involves a sum over all elements of the array. To compute it, I reused the upsweep phase of the work-efficient scan implemented for the other part of this assignment, so the sum is calculated on the GPU (rather than copying the data back to the CPU). The same function is also used to compute the cross-entropy loss (which also contains a summation).
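As a hedged sketch of the idea (not the repo's exact kernels), the upsweep phase builds a reduction tree in place, leaving the total sum in the last element; `n` is assumed to be a power of two:

```cuda
// One pass of the upsweep (reduce) phase of a work-efficient scan.
__global__ void upsweepStep(int n, int stride, double *d_data) {
    int k = (blockIdx.x * blockDim.x + threadIdx.x) * 2 * stride;
    if (k + 2 * stride - 1 < n)
        d_data[k + 2 * stride - 1] += d_data[k + stride - 1];
}

// Host-side driver (a sketch): one launch per level of the reduction tree.
void gpuSum(int n, double *d_data) {
    for (int stride = 1; stride < n; stride *= 2) {
        int pairs = n / (2 * stride);
        int threads = 128, blocks = (pairs + threads - 1) / threads;
        upsweepStep<<<blocks, threads>>>(n, stride, d_data);
    }
    // d_data[n - 1] now holds the total, usable in place as the softmax denominator.
}
```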

#### Random rotations

Once the network was able to learn all the characters, I tested it further by training on the same images rotated by a random angle (± 10°). Training used the rotated images, while testing used only the unrotated images. The results show the network is somewhat resilient to rotation (I did not push the limits).

#### Initializations

To initialize the weights, I went with a modified version of Glorot initialization ([link](https://jamesmccaffrey.wordpress.com/2017/06/21/neural-network-glorot-initialization/)). Weights are drawn from a normal distribution with zero mean and variance
$$
\mathrm{Var} = \frac{2}{\text{inputs}}
$$
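A sketch of how such weights might be drawn on the GPU with cuRAND (the helper and its names are hypothetical, not the repo's actual code); the standard deviation is sqrt(2 / inputs) to match the variance above:

```cuda
#include <cmath>
#include <curand.h>

// Fill an out_dim x in_dim weight matrix with N(0, 2/in_dim) samples, directly on the device.
// Note: some cuRAND generators require the sample count to be even; pad if necessary.
void initWeights(curandGenerator_t gen, double *d_W, int out_dim, int in_dim) {
    size_t n = (size_t)out_dim * in_dim;
    double stddev = std::sqrt(2.0 / in_dim);               // Var = 2 / inputs
    curandGenerateNormalDouble(gen, d_W, n, 0.0, stddev);  // mean 0, stddev sqrt(2/inputs)
}
```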

#### Vectorized entire code

Another optimization was to express all the forward- and back-propagation equations as matrix operations. This made training and inference faster, and it means no math is done on the CPU during the forward and backward passes.

#### Hyper Parameters

Here is the list of hyperparameters that had the network working at 100% accuracy (the weights and biases for this configuration are given in weights.csv):

| Parameter Name | Value |
| --------------------------- | -------------------------------- |
| Learning Rate | 0.0038 |
| SGD β | 0.65 |
| Adaptive learning rate      | halve learning rate every 52*100 epochs |
| Epochs                      | 40000 (batch size 1)             |
| Random rotation limits | ± 10° |
| Number of hidden layers | 5 |
| Dimensions of hidden layers | {98, 65, 50, 30, 25, 40} |



### Results

#### Loss vs Epochs

![Loss vs Epoch](./img/loss_vs_epoch.png)

In this plot, it is clear that the run with momentum and decay performed best. The kink in the middle was (I believe) due to suboptimal tuning of the decay rate; with a little more tuning, the rate would decay slightly faster, allowing the drop without the oscillation. The raw data is also uploaded. One important thing to mention: with the pure learning rate approach, the best it achieved (for this run) was 51 out of 52 characters. This is not always the case (I have seen it get all 52, but it requires more time). The other methods achieve 100% accuracy (shown below) on the given dataset.

![Accuracy](./img/regulat_acc.PNG)

As for speed, the code runs one forward pass in **1.13577 ms** and one backward pass in **0.505299 ms** on average (with the architecture mentioned above).



### Rotation loss plots (given more iterations)

![Loss vs Epochs for random rotations](./img/loss_vs_epoch_rand.PNG)

For the above plot, a random rotation of ± 10° was applied to the training data. The network got 52 out of 52 cases correct (shown below), which is impressive considering it took the same number of iterations!

![Rot acc](./img/rotation_matrix_acc.PNG)

### Observations & Random Thoughts

#### Neural network occasionally struggles with "L" and "h"

For some reason, with plain gradient descent my network had a hard time distinguishing between L and h. This was partially resolved by reducing the learning rate and giving it more time, and fully resolved with the SGD-with-momentum and adaptive learning rate fixes.

#### Cublas API

The cuBLAS API isn't well documented. It took me a very long time to get matrix multiplication with transposes working.
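For anyone hitting the same wall, here is a hedged sketch of computing C = Aᵀ · B with `cublasDgemm` (cuBLAS assumes column-major storage; the helper and its names are hypothetical):

```cuda
#include <cublas_v2.h>

// Sketch: C = A^T * B, where A is p x m and B is p x n (both column-major, on the device),
// so C is m x n. With op(A) = A^T, the sizes passed to cuBLAS are m, n, and k = p.
void matmulAtB(cublasHandle_t handle, int m, int n, int p,
               const double *d_A, const double *d_B, double *d_C) {
    const double one = 1.0, zero = 0.0;
    cublasDgemm(handle,
                CUBLAS_OP_T, CUBLAS_OP_N,   // transpose A, leave B as-is
                m, n, p,                    // C is m x n, shared dimension is p
                &one,
                d_A, p,                     // lda = rows of A as stored (p)
                d_B, p,                     // ldb = rows of B as stored (p)
                &zero,
                d_C, m);                    // ldc = rows of C (m)
}
```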

#### 74K Character dataset Attempt

With 1 hour left, I decided to run my network on a subset of the 74K Character dataset ([link](http://www.ee.surrey.ac.uk/CVSSP/demos/chars74k/)), which contains handwritten characters. For this dataset, the training and testing data were separate: each class has 54 training examples and 1 testing example. With 400,000 epochs (10x more than used previously), the network was able to identify 3 out of 26 uppercase handwritten characters.

Possible reasons for such poor performance include the network not being deep enough, too few training examples per class, and not training with batches (grouping multiple inputs and computing their gradients at the same time, possibly with batch normalization). Whatever the case, the network didn't perform well; the loss plot is shown below.

![74k Fail](./img/loss_vs_epoch_74k.PNG)
@@ -7,5 +7,5 @@ set(SOURCE_FILES

cuda_add_library(character_recognition
${SOURCE_FILES}
OPTIONS -arch=sm_20
OPTIONS -arch=sm_30
)