gregorybchris/ngrams

N-grams

Introduction to generative language modeling using an n-gram model.

This project is an assignment for the Park Tudor data science class. See assignment.md for detailed instructions.
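For orientation, an n-gram model predicts the next token from the previous n-1 tokens by counting how often each continuation appears in the training text. The following is a minimal character-level sketch of that idea. The function names `train_ngram` and `generate` are hypothetical and are not taken from model.py; the assignment may call for a different design (for example, word-level tokens or smoothing).

```python
import random
from collections import Counter, defaultdict


def train_ngram(text: str, n: int) -> dict[str, Counter]:
    """Count which character follows each (n-1)-character context."""
    counts: dict[str, Counter] = defaultdict(Counter)
    for i in range(len(text) - n + 1):
        context = text[i : i + n - 1]
        next_char = text[i + n - 1]
        counts[context][next_char] += 1
    return counts


def generate(
    counts: dict[str, Counter], n: int, seed: str, length: int, rng: random.Random
) -> str:
    """Extend `seed` by sampling each next character in proportion to its count."""
    out = seed
    for _ in range(length):
        # The context is the last n-1 characters generated so far.
        context = out[len(out) - (n - 1) :]
        followers = counts.get(context)
        if not followers:
            break  # Unseen context: this sketch stops rather than backing off.
        chars, weights = zip(*followers.items())
        out += rng.choices(chars, weights=weights)[0]
    return out
```

With a bigram model (n = 2) trained on a short repeating string, every context has a single possible continuation, so generation simply reproduces the pattern. On real text the counts spread over many continuations and the output varies with the random seed.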

Requirements

This repo requires Python 3.12 or later. There are no additional dependencies.

Files

| Name | Description |
| -- | -- |
| assignment.md | The instructions for the assignment |
| tiny_shakespeare.txt | The dataset we use to train our language model |
| -- | -- |
| dataset.py | Utilities for loading and splitting the dataset |
| model.py | The n-gram model implementation |
| -- | -- |
| train.py | A CLI script to train the model |
| generate.py | A CLI script to generate text with the model |
| grade.py | A CLI script to grade the assignment |
| -- | -- |
| grading_utils.py | Utilities for grading, can be ignored |

Dataset

The Tiny Shakespeare dataset was downloaded from Andrej Karpathy's GitHub.
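As a rough illustration of what splitting a text dataset can look like, the sketch below holds out the tail of the text for evaluation. The function name `split_text` and the 0.9 default are assumptions made for this example; they do not describe the actual utilities in dataset.py.

```python
def split_text(text: str, train_fraction: float = 0.9) -> tuple[str, str]:
    """Split text into a leading training part and a trailing held-out part."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be strictly between 0 and 1")
    cut = int(len(text) * train_fraction)
    # Slice at a single point so the two parts never overlap.
    return text[:cut], text[cut:]
```

The text itself could be read with `pathlib.Path("tiny_shakespeare.txt").read_text(encoding="utf-8")` before being passed to a function like this.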