This repository contains the code for the master thesis "Memory Better Spent", comparing different structured sparsity formats in machine learning from an efficiency-per-parameter perspective.
The thesis has been published in the ETH Zürich Research Collection.
Sparsity is used to reduce the size of machine learning models. While unstructured sparsity techniques are widely used and offer the best tradeoff between memory requirements and model accuracy, their unstructured nature makes them suboptimal for modern accelerator hardware. Structured sparsity can offer more compute-efficient alternatives.
We can visually illustrate different structured sparsity formats, such as Low-Rank, Monarch, BlockTensorTrain and BLAST, by using them for image compression:
We develop a new structured sparsity format, Low-Rank Light. It works by parameterizing Low-Rank matrices with fewer parameters than a standard Low-Rank product.
This is an illustration of the Low-Rank Light product. Parameters are saved compared to standard Low-Rank by fixing the left part of the second factor to be the identity matrix.
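Based on the description above, the parameter saving can be sketched in a few lines of numpy. This is an illustrative reconstruction, not the thesis implementation; the variable names and sizes are made up for the example:

```python
import numpy as np

m, n, r = 64, 96, 8  # matrix size and rank (illustrative values)
rng = np.random.default_rng(0)

# Standard Low-Rank: W = U @ V, with U (m x r) and V (r x n),
# for a total of r*(m + n) parameters.
standard_params = r * m + r * n

# Low-Rank Light (as described above): fix the left r x r block of the
# second factor to the identity, so only V_prime (r x (n - r)) is learned.
U = rng.standard_normal((m, r))
V_prime = rng.standard_normal((r, n - r))
V = np.concatenate([np.eye(r), V_prime], axis=1)  # [I | V']
W = U @ V  # still a rank-r (m x n) matrix

light_params = r * m + r * (n - r)
assert light_params == standard_params - r * r  # r^2 parameters saved
```

Fixing the identity block removes exactly r² parameters while keeping the product an arbitrary rank-r matrix up to a change of basis in the first r columns.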
A series of experiments has been conducted to assess the general viability of the different structured sparsity formats in a machine learning setting.
We compare how well different matrix types can approximate a given matrix filled with random numbers drawn from a Gaussian distribution. Unstructured sparsity serves as a lower bound on the achievable approximation error. We see that different matrix types achieve different approximation errors per parameter, indicating that some formats are more parameter-efficient than others.
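For the Low-Rank case, this projection experiment can be reproduced in miniature with a truncated SVD, which gives the best rank-r approximation in the Frobenius norm (Eckart–Young). This is a small sketch of the idea, not the thesis code; the matrix size and ranks are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((128, 128))  # target matrix, i.i.d. Gaussian entries

# Truncated SVD yields the optimal rank-r approximation
U, s, Vt = np.linalg.svd(A, full_matrices=False)

for r in (8, 16, 32):
    A_r = (U[:, :r] * s[:r]) @ Vt[:r]        # rank-r reconstruction
    params = r * (A.shape[0] + A.shape[1])   # parameters of the two factors
    rel_err = np.linalg.norm(A - A_r) / np.linalg.norm(A)
    print(f"rank {r}: {params} parameters, relative error {rel_err:.3f}")
```

Plotting relative error against parameter count for each format, as done in the thesis, makes the per-parameter efficiency of the formats directly comparable.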
In our pre-training experiments we train sparse vision transformers from scratch. The dense weight matrices in the linear layers of the transformer architecture are replaced by structured sparse matrices. We observe that Low-Rank Light is more suitable for pre-training than standard Low-Rank in this case.
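Replacing a dense weight matrix with a factored one also changes how the forward pass is computed: the full matrix never needs to be materialized. A minimal numpy sketch of this substitution (names and sizes are illustrative, not the thesis code):

```python
import numpy as np

rng = np.random.default_rng(1)
d_in, d_out, r = 256, 256, 16

# Dense linear layer: y = x @ W.T, with d_in * d_out parameters
W_dense = rng.standard_normal((d_out, d_in))

# Low-Rank replacement: W ~ B @ A, with r * (d_in + d_out) parameters
A = rng.standard_normal((r, d_in))
B = rng.standard_normal((d_out, r))

x = rng.standard_normal((4, d_in))       # a small batch of inputs
y_full = x @ (B @ A).T                   # materializes the full matrix
y_factored = (x @ A.T) @ B.T             # factored form: two thin matmuls
assert np.allclose(y_full, y_factored)
```

The factored form costs O(r(d_in + d_out)) per input instead of O(d_in · d_out), which is where the compute savings of structured formats come from.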
Additionally, we fine-tune sparsified GPT-2 language models. Parameter redundancies, as seen in the projection results, again become apparent. Interestingly, standard Low-Rank matrices achieve the best scaling behaviour in this experiment.
To run the code, first follow the instructions online to install the Julia and Python programming languages.

- Clone the repository:

  ```
  git clone https://github.com/Zhurgut/MemoryBetterSpent.git
  cd MemoryBetterSpent
  ```

- Set up the Python virtual environment:

  ```
  python -m venv .venv
  source .venv/bin/activate  # or `.venv\Scripts\activate` on Windows
  ```

- Install the relevant Python packages. The `requirements.txt` file lists the package versions used during development. Follow the instructions online to install PyTorch, then run:

  ```
  pip install datasets numpy tokenizers transformers torcheval
  ```

- Also install the required Julia packages. In the Julia REPL, type:

  ```
  ] add JSON3, CSV, DataFrames, Plots, Latexify
  ```

- Training commands can be specified in `src/train_commands.jl` and executed by calling:

  ```
  julia src/train_commands.jl
  ```

Memory Better Spent © 2025 by Damian Camenisch is licensed under CC BY-NC 4.0. To view a copy of this license, visit https://creativecommons.org/licenses/by-nc/4.0/