nn_models

Neural network / AI models

My experiments with various neural network models, coding their implementations from scratch using only PyTorch.

For all architectures below, I share the results of my pre-training, reaching a level of accuracy similar to that reported in the original research papers. I think that's the key difference from the multitude of other repos/articles, which usually only pre-train on toy data, so it's unknown whether their implementations really work. The architectures are listed in the order I implemented them.

Transformer - everything from scratch, following the excellent tutorial by Peter Bloem.
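The core of the Transformer is scaled dot-product self-attention, softmax(Q K^T / sqrt(d_k)) V. Below is a minimal single-head sketch in plain Python (nested lists instead of PyTorch tensors, so it runs without dependencies). It is an illustration under those assumptions, not the code from this repository; the projection matrices `w_q`, `w_k`, `w_v` are hypothetical inputs.

```python
import math


def softmax(xs):
    # Subtract the max for numerical stability before exponentiating.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a, b):
    # Plain nested-list matrix product: (n x k) @ (k x m) -> (n x m).
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m):
    return [list(row) for row in zip(*m)]


def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over x (n x d_model).

    Returns (output, attention_weights).
    """
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    d_k = len(q[0])
    # scores[i][j]: how strongly position i attends to position j.
    scores = matmul(q, transpose(k))
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    # Each output row is an attention-weighted average of the value vectors.
    return matmul(weights, v), weights
```

For example, with identity projections and two identical input vectors such as [[1, 1], [1, 1]], each position attends to both positions equally (weights of 0.5).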
GPT-2 - adapted the earlier Transformer code for masked attention, and later followed Karpathy's awesome tutorial on writing GPT-2 from scratch.
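The "masked attention" needed for GPT-2 amounts to a causal mask: before the softmax, scores for future positions are set to -inf, so each token attends only to itself and earlier tokens. A minimal plain-Python sketch of that step (an illustration, not this repository's PyTorch code):

```python
import math


def softmax(xs):
    m = max(xs)  # finite, since position 0 is never masked
    exps = [math.exp(x - m) for x in xs]  # math.exp(-inf) == 0.0
    total = sum(exps)
    return [e / total for e in exps]


def causal_mask(n):
    # 0.0 where position i may attend to j (j <= i), -inf for future positions.
    return [[0.0 if j <= i else -math.inf for j in range(n)] for i in range(n)]


def masked_attention_weights(scores):
    """Apply the causal mask to an (n x n) score matrix, then softmax each row."""
    mask = causal_mask(len(scores))
    return [softmax([s + m for s, m in zip(row, mask_row)])
            for row, mask_row in zip(scores, mask)]
```

With all-zero scores for three tokens, the rows come out as [1, 0, 0], [0.5, 0.5, 0] and [1/3, 1/3, 1/3]: the first token sees only itself, the last sees all three.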
BERT - took what I learned from the earlier GPT-2 tutorial and replicated the entire process to write BERT from scratch. This was quite a bit harder, as there aren't any tutorials that go all the way through pre-training. Most publications/videos/code just play with tiny data or don't follow through to the end to show the results. I did the entire thing and managed to get better results than the BERT pre-trained by HuggingFace (which I considered my target in this exercise).
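BERT is pre-trained with masked language modeling (MLM). In the recipe from the BERT paper, 15% of token positions are selected; of those, 80% are replaced with [MASK], 10% with a random token and 10% are left unchanged, and the loss is computed only at the selected positions. The sketch below illustrates that recipe in plain Python and is not necessarily how this repository implements it. Assumptions: each token is selected independently with probability 0.15 (so about 15% on average), the constants assume the bert-base-uncased vocabulary ([MASK] id 103, 30,522 tokens), and -100 mirrors the default `ignore_index` of PyTorch's `CrossEntropyLoss`. For brevity it does not exclude special tokens such as [CLS]/[SEP], which a real implementation should.

```python
import random

MASK_ID = 103       # assumption: [MASK] id in the bert-base-uncased vocabulary
VOCAB_SIZE = 30522  # assumption: bert-base-uncased vocabulary size
IGNORE = -100       # label ignored by the loss (PyTorch CrossEntropyLoss default)


def mask_tokens(token_ids, rng, mask_prob=0.15):
    """Return (inputs, labels) for one MLM training example."""
    inputs = list(token_ids)
    labels = [IGNORE] * len(token_ids)
    for i, tok in enumerate(token_ids):
        if rng.random() >= mask_prob:
            continue  # not selected: no loss at this position
        labels[i] = tok  # the model must predict the original token here
        r = rng.random()
        if r < 0.8:
            inputs[i] = MASK_ID  # 80%: replace with [MASK]
        elif r < 0.9:
            inputs[i] = rng.randrange(VOCAB_SIZE)  # 10%: random token
        # remaining 10%: keep the original token unchanged
    return inputs, labels


inputs, labels = mask_tokens([2023, 2003, 1037, 7099, 6251], random.Random(0))
```

Keeping 10% of the selected tokens unchanged (and randomizing another 10%) means the model cannot assume that only [MASK] positions matter, which narrows the gap between pre-training and fine-tuning inputs.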
Llama2 - modified my GPT-2 code with the small architectural changes applied in the Llama series (RMSNorm, removal of dropout, SiLU activation instead of GELU, vocabulary size and, the biggest change, RoPE/rotary positional embeddings). I kept the network size similar to GPT-2 and trained on the same data (fineweb_edu). Interestingly, with the same number of training steps, I got 0.36 HellaSwag accuracy compared to 0.306 for GPT-2. I based the Llama changes on a very nice tutorial by Sebastian Raschka, written for his book "Build a Large Language Model From Scratch".
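The Llama-style changes listed above can be sketched per vector in plain Python: RMSNorm rescales by the root mean square with no mean subtraction or bias (unlike LayerNorm), SiLU is x * sigmoid(x), and RoPE rotates pairs of query/key dimensions by an angle proportional to the token position. This is a minimal illustration, not this repository's PyTorch code. It assumes the common RoPE base of 10000 and rotation of consecutive dimension pairs (some implementations instead rotate the first half of the dimensions against the second half). Llama also uses SiLU inside a gated feed-forward block (SwiGLU), which is not shown here.

```python
import math


def rms_norm(x, weight, eps=1e-5):
    # Normalize by the root mean square only, then apply a learned per-dim scale.
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [w * v / rms for w, v in zip(weight, x)]


def silu(x):
    # SiLU (a.k.a. swish): x * sigmoid(x), written to avoid overflow for large |x|.
    if x >= 0:
        return x / (1.0 + math.exp(-x))
    e = math.exp(x)
    return x * e / (1.0 + e)


def rope(x, pos, base=10000.0):
    """Rotate consecutive pairs (x[2k], x[2k+1]) by angle pos * base**(-2k/d)."""
    d = len(x)
    out = []
    for i in range(0, d, 2):  # i = 2k
        theta = pos * base ** (-i / d)
        c, s = math.cos(theta), math.sin(theta)
        x0, x1 = x[i], x[i + 1]
        out += [x0 * c - x1 * s, x0 * s + x1 * c]
    return out


def dot(a, b):
    return sum(u * v for u, v in zip(a, b))


# The query/key score depends only on the position offset (here 2 in both cases).
q, k = [1.0, 2.0, 3.0, 4.0], [0.5, -1.0, 2.0, 0.3]
print(dot(rope(q, 5), rope(k, 3)))
print(dot(rope(q, 12), rope(k, 10)))  # same value, up to float rounding
```

Because each pair is rotated by an angle proportional to its position, the dot product between a query at position m and a key at position n depends only on m - n, which is why RoPE acts as a relative positional encoding rather than adding an absolute position vector as GPT-2 does.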
T5 - same principle as my Llama implementation, except that here I used the BERT code as a starting point (since the T5 authors use the same MLM training routine). The key difference is the encoder/decoder architecture (as in the original Transformer paper) instead of encoder-only like BERT (or decoder-only like GPT-2/Llama). The interesting part of this exercise is the comparison to BERT: with an almost identical training setup, I got quite an increase in validation accuracy (from 0.63 to 0.68).
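To make the encoder/decoder versus encoder-only/decoder-only distinction concrete, the sketch below builds the three attention masks an encoder-decoder model such as T5 uses (True means "may attend"): bidirectional self-attention in the encoder (as in BERT), causal self-attention in the decoder (as in GPT-2/Llama), and cross-attention from every decoder position to every encoder position. It is a plain-Python illustration that ignores padding, not this repository's implementation.

```python
def encoder_self_mask(src_len):
    # Bidirectional: every source position sees every other (as in BERT).
    return [[True] * src_len for _ in range(src_len)]


def decoder_self_mask(tgt_len):
    # Causal: target position i sees only positions j <= i (as in GPT-2/Llama).
    return [[j <= i for j in range(tgt_len)] for i in range(tgt_len)]


def cross_attention_mask(tgt_len, src_len):
    # Cross-attention: queries come from the decoder, keys/values from the
    # encoder output, so every target position sees the whole source.
    return [[True] * src_len for _ in range(tgt_len)]


for row in decoder_self_mask(3):
    print(row)
```

An encoder-only model uses just the first mask and a decoder-only model just the second; the encoder-decoder design adds the third, so the decoder can read the full input sequence while still generating its output left to right.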
