A personal project where I'm experimenting with building a basic Transformer-based language model from scratch. The goal is to understand the internals of modern LLMs, including tokenization, training, sampling, and the Transformer architecture itself.
- Custom Character Tokenizer: Simple character-level tokenizer (`CharTokenizer`) for converting text into token sequences and back (sketched below).
- Transformer Model: Implements a minimal Transformer with multi-head self-attention, feed-forward layers, and positional encoding (see the model sketch below).
- Training Pipeline: Training script for feeding text data into the Transformer and optimizing with cross-entropy loss (see the training sketch below).
- Text Generation: Sampling script with temperature and top-k support for generating sequences from a trained model (see the sampling sketch below).
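The sketch below shows one way a character-level tokenizer like `CharTokenizer` might be structured; the actual interface in the repo may differ, and the `encode`/`decode` method names and constructor signature here are assumptions.

```python
class CharTokenizer:
    """Character-level tokenizer: each unique character maps to one token id."""

    def __init__(self, text: str):
        # Vocabulary is the sorted set of characters seen in the training text.
        self.chars = sorted(set(text))
        self.stoi = {ch: i for i, ch in enumerate(self.chars)}
        self.itos = {i: ch for i, ch in enumerate(self.chars)}

    @property
    def vocab_size(self) -> int:
        return len(self.chars)

    def encode(self, text: str) -> list[int]:
        return [self.stoi[ch] for ch in text]

    def decode(self, ids: list[int]) -> str:
        return "".join(self.itos[i] for i in ids)


# Example usage:
# tok = CharTokenizer("hello world")
# ids = tok.encode("hello")   # list of token ids, one per character
# tok.decode(ids)             # "hello"
```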
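A minimal decoder-only Transformer along these lines could look like the following PyTorch sketch. The class names, default hyperparameters, and the choice of learned positional embeddings (rather than sinusoidal encodings) are assumptions, not the repo's exact implementation.

```python
import math
import torch
import torch.nn as nn


class MultiHeadSelfAttention(nn.Module):
    """Causal multi-head self-attention."""

    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        assert d_model % n_heads == 0
        self.n_heads = n_heads
        self.d_head = d_model // n_heads
        self.qkv = nn.Linear(d_model, 3 * d_model)
        self.proj = nn.Linear(d_model, d_model)

    def forward(self, x):
        b, t, d = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        # Reshape to (batch, heads, seq, head_dim).
        q, k, v = (z.view(b, t, self.n_heads, self.d_head).transpose(1, 2) for z in (q, k, v))
        att = (q @ k.transpose(-2, -1)) / math.sqrt(self.d_head)
        # Causal mask: each position attends only to itself and earlier positions.
        mask = torch.triu(torch.ones(t, t, dtype=torch.bool, device=x.device), diagonal=1)
        att = att.masked_fill(mask, float("-inf")).softmax(dim=-1)
        out = (att @ v).transpose(1, 2).reshape(b, t, d)
        return self.proj(out)


class Block(nn.Module):
    """Pre-norm Transformer block: attention + feed-forward, each with a residual."""

    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        self.ln1, self.ln2 = nn.LayerNorm(d_model), nn.LayerNorm(d_model)
        self.attn = MultiHeadSelfAttention(d_model, n_heads)
        self.ff = nn.Sequential(
            nn.Linear(d_model, 4 * d_model), nn.GELU(), nn.Linear(4 * d_model, d_model)
        )

    def forward(self, x):
        x = x + self.attn(self.ln1(x))
        x = x + self.ff(self.ln2(x))
        return x


class TransformerLM(nn.Module):
    """Token + positional embeddings, stacked blocks, linear head over the vocabulary."""

    def __init__(self, vocab_size, d_model=128, n_heads=4, n_layers=2, max_len=256):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, d_model)
        self.pos_emb = nn.Embedding(max_len, d_model)  # learned positional embedding (assumption)
        self.blocks = nn.Sequential(*[Block(d_model, n_heads) for _ in range(n_layers)])
        self.head = nn.Linear(d_model, vocab_size)

    def forward(self, idx):
        # idx: (batch, seq_len) token ids -> (batch, seq_len, vocab_size) logits
        pos = torch.arange(idx.shape[1], device=idx.device)
        x = self.tok_emb(idx) + self.pos_emb(pos)
        return self.head(self.blocks(x))
```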
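The training pipeline amounts to sampling input/target windows from the tokenized text (a 1-D tensor of token ids) and minimizing cross-entropy on next-token prediction. A rough sketch, with hypothetical `get_batch`/`train` helpers and placeholder hyperparameters:

```python
import torch
import torch.nn.functional as F


def get_batch(data, block_size, batch_size, device):
    # Sample random contiguous windows; targets are the inputs shifted by one token.
    ix = torch.randint(len(data) - block_size - 1, (batch_size,))
    x = torch.stack([data[i : i + block_size] for i in ix])
    y = torch.stack([data[i + 1 : i + 1 + block_size] for i in ix])
    return x.to(device), y.to(device)


def train(model, data, steps=1000, block_size=128, batch_size=32, lr=3e-4, device="cpu"):
    model.to(device).train()
    opt = torch.optim.AdamW(model.parameters(), lr=lr)
    for step in range(steps):
        x, y = get_batch(data, block_size, batch_size, device)
        logits = model(x)  # (batch, seq, vocab)
        # Cross-entropy at every position: flatten the batch and sequence dimensions.
        loss = F.cross_entropy(logits.view(-1, logits.size(-1)), y.view(-1))
        opt.zero_grad()
        loss.backward()
        opt.step()
        if step % 100 == 0:
            print(f"step {step}: loss {loss.item():.4f}")
```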
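Sampling with temperature and top-k can be sketched as below; the `generate` signature and the context-window cropping via `max_len` are assumptions about how the repo's script is organized.

```python
import torch
import torch.nn.functional as F


@torch.no_grad()
def generate(model, idx, max_new_tokens, temperature=1.0, top_k=None, max_len=256):
    # idx: (batch, seq_len) prompt token ids; returns the prompt plus generated tokens.
    model.eval()
    for _ in range(max_new_tokens):
        idx_cond = idx[:, -max_len:]          # crop to the model's context window
        logits = model(idx_cond)[:, -1, :]    # logits for the last position only
        logits = logits / temperature         # higher temperature -> flatter distribution
        if top_k is not None:
            v, _ = torch.topk(logits, top_k)
            logits[logits < v[:, [-1]]] = float("-inf")  # keep only the top-k logits
        probs = F.softmax(logits, dim=-1)
        next_id = torch.multinomial(probs, num_samples=1)
        idx = torch.cat([idx, next_id], dim=1)
    return idx
```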