This repository contains the code for the master thesis "Memory Better Spent", comparing different structured sparsity formats in machine learning from an efficiency-per-parameter perspective.
The thesis has been published in the ETH Zürich Research Collection.
Sparsity is used to reduce the size of machine learning models. While unstructured sparsity techniques are widely used and offer the best tradeoff between memory requirements and model accuracy, their unstructured nature makes them suboptimal for modern accelerator hardware. Structured sparsity can offer more compute-efficient alternatives.
We can visually illustrate different structured sparsity formats, such as Low-Rank, Monarch, BlockTensorTrain and BLAST, by using them for image compression:
We develop a new structured sparsity format, Low-Rank Light. It works by parameterizing Low-Rank matrices with fewer parameters than a standard Low-Rank product.
This is an illustration of the Low-Rank Light product. Parameters are saved compared to standard Low-Rank by fixing the left part of the second factor to be the identity matrix.
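Based on the description above, the parameter saving can be sketched in a few lines of numpy. This is an illustrative reconstruction, not the thesis implementation; the variable names and sizes are made up for the example:

```python
import numpy as np

m, n, r = 64, 96, 8  # matrix size and rank (illustrative values)
rng = np.random.default_rng(0)

# Standard Low-Rank: W = U @ V, with U (m x r) and V (r x n),
# for a total of r*(m + n) parameters.
standard_params = r * m + r * n

# Low-Rank Light (as described above): fix the left r x r block of the
# second factor to the identity, so only V_prime (r x (n - r)) is learned.
U = rng.standard_normal((m, r))
V_prime = rng.standard_normal((r, n - r))
V = np.concatenate([np.eye(r), V_prime], axis=1)  # [I | V']
W = U @ V  # still a rank-r (m x n) matrix

light_params = r * m + r * (n - r)
assert light_params == standard_params - r * r  # r^2 parameters saved
```

Fixing the identity block removes exactly r² parameters while keeping the product an arbitrary rank-r matrix up to a change of basis in the first r columns.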
A series of experiments has been conducted to assess the general viability of the different structured sparsity formats in a machine learning setting.
We compare how well different matrix types can approximate a given matrix filled with random numbers drawn from a Gaussian distribution. Unstructured sparsity serves as a lower bound on the achievable approximation error. We see that different matrix types achieve different approximation errors per parameter, indicating that some formats are more parameter-efficient than others.
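For the Low-Rank case, this projection experiment can be reproduced in miniature with a truncated SVD, which gives the best rank-r approximation in the Frobenius norm (Eckart–Young). This is a small sketch of the idea, not the thesis code; the matrix size and ranks are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((128, 128))  # target matrix, i.i.d. Gaussian entries

# Truncated SVD yields the optimal rank-r approximation
U, s, Vt = np.linalg.svd(A, full_matrices=False)

for r in (8, 16, 32):
    A_r = (U[:, :r] * s[:r]) @ Vt[:r]        # rank-r reconstruction
    params = r * (A.shape[0] + A.shape[1])   # parameters of the two factors
    rel_err = np.linalg.norm(A - A_r) / np.linalg.norm(A)
    print(f"rank {r}: {params} parameters, relative error {rel_err:.3f}")
```

Plotting relative error against parameter count for each format, as done in the thesis, makes the per-parameter efficiency of the formats directly comparable.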
In our pre-training experiments we train sparse vision transformers from scratch. The dense weight matrices in the linear layers of the transformer architecture are replaced by structured sparse matrices. We observe that Low-Rank Light is more suitable for pre-training than standard Low-Rank in this case.
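Replacing a dense weight matrix with a factored one also changes how the forward pass is computed: the full matrix never needs to be materialized. A minimal numpy sketch of this substitution (names and sizes are illustrative, not the thesis code):

```python
import numpy as np

rng = np.random.default_rng(1)
d_in, d_out, r = 256, 256, 16

# Dense linear layer: y = x @ W.T, with d_in * d_out parameters
W_dense = rng.standard_normal((d_out, d_in))

# Low-Rank replacement: W ~ B @ A, with r * (d_in + d_out) parameters
A = rng.standard_normal((r, d_in))
B = rng.standard_normal((d_out, r))

x = rng.standard_normal((4, d_in))       # a small batch of inputs
y_full = x @ (B @ A).T                   # materializes the full matrix
y_factored = (x @ A.T) @ B.T             # factored form: two thin matmuls
assert np.allclose(y_full, y_factored)
```

The factored form costs O(r(d_in + d_out)) per input instead of O(d_in · d_out), which is where the compute savings of structured formats come from.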
Additionally, we fine-tune sparsified GPT-2 language models. Parameter redundancies, as seen in the projection results, again become apparent. Interestingly, standard Low-Rank matrices achieve the best scaling behaviour in this experiment.
To run the code, first follow the instructions online to install the Julia and Python programming languages.

- Clone the repository:

  ```
  git clone https://github.com/Zhurgut/MemoryBetterSpent.git
  cd MemoryBetterSpent
  ```

- Set up the Python virtual environment:

  ```
  python -m venv .venv
  source .venv/bin/activate  # or `.venv\Scripts\activate` on Windows
  ```

- Install the relevant Python packages. The `requirements.txt` file lists the package versions used during development. Follow the instructions online to install PyTorch, then run:

  ```
  pip install datasets numpy tokenizers transformers torcheval
  ```

- Also install the required Julia packages. In the Julia REPL, type:

  ```
  ] add JSON3, CSV, DataFrames, Plots, Latexify
  ```

- Training commands can be specified in `src/train_commands.jl` and executed by calling:

  ```
  julia src/train_commands.jl
  ```

Memory Better Spent © 2025 by Damian Camenisch is licensed under CC BY-NC 4.0. To view a copy of this license, visit https://creativecommons.org/licenses/by-nc/4.0/