
FastFF

This repository contains experiments comparing Mixture-of-Experts (MoE) models and the Fast Feed-Forward (FFF) models introduced in the FFF and UltraFastBERT papers (author's repository).

The experiments folder contains (mostly) self-contained Jupyter notebooks with benchmarks and experiments on the architecture.

The FastFF folder contains several implementations of the FFF model, including the reference one, along with tools to extract data from models and to train them.

Installation

Use pip or another package manager to install the package from this repository:

pip install git+https://github.com/ssslakter/FastFF

Results

The main results are:

  • SMEAR gives slight improvements for the FFF model as well as for MoE, although the hierarchical structure makes FFF harder to train. Jupyter notebook

  • The data distribution between experts shifts to a single peak as the number of neurons per expert increases. Jupyter notebook

    (Figure: distribution of data between 16 experts for a classification task with 10 classes)
  • FFF can be formulated as an MoE with a sparse binary transition matrix and an additional activation function (Softplus in the reference formulation). Further experiments show that a linear activation function performs better. Jupyter notebook

  • With the matrix formulation, parallelism is utilized better than in the reference implementation, so shallow layers see a speedup. For deep layers, sequential branch selection becomes faster, since the dense matrices require a lot of memory. Jupyter notebook
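The two inference strategies from the last bullet can be illustrated with a toy sketch. This is not the repository's implementation: tree depth, sizes, and the single-neuron "experts" at the leaves (standing in for the small feed-forward networks of real FFF) are hypothetical, and hard sign-based routing is assumed. The sequential version descends the tree one matvec at a time; the matrix-style version scores every node in one batched matvec and then just reads the path off the precomputed scores, which is where the extra parallelism for shallow trees comes from.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, depth = 8, 3                    # hypothetical input width and tree depth
n_nodes = 2**depth - 1                # internal routing nodes (heap layout)
n_leaves = 2**depth                   # one "expert" per leaf

W_node = rng.standard_normal((n_nodes, d_in))   # one routing neuron per node
W_leaf = rng.standard_normal((n_leaves, d_in))  # toy 1-neuron leaf experts

def fff_sequential(x):
    """Reference-style inference: one small matvec per tree level."""
    node = 0
    for _ in range(depth):
        c = W_node[node] @ x                      # score only the current node
        node = 2 * node + (1 if c > 0 else 2)     # go to left/right child
    leaf = node - n_nodes                         # index among the leaves
    return leaf, W_leaf[leaf] @ x

def fff_parallel(x):
    """Matrix-style inference: score all nodes at once, then read off the path."""
    c = W_node @ x                                # one dense matvec for every node
    node = 0
    for _ in range(depth):
        node = 2 * node + (1 if c[node] > 0 else 2)
    leaf = node - n_nodes
    return leaf, W_leaf[leaf] @ x

x = rng.standard_normal(d_in)
assert fff_sequential(x)[0] == fff_parallel(x)[0]  # both pick the same leaf
```

The trade-off in the bullet above follows directly: `fff_parallel` does `2**depth - 1` node scores in one well-parallelized operation, which wins while the tree is shallow, but its dense node matrix grows exponentially with depth, at which point the sequential descent (only `depth` scores) becomes the faster option.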
