This repository is entirely written in Python. The workflow is driven by Jupyter notebooks that call custom function libraries containing the main objects and functions used in the project.
Clone the repository locally and create the conda environment we provide; once it is linked to your Jupyter installation, it can be used for the project. The following command installs all required packages:
```bash
conda env create -f environment.yaml
```
You first need to run FBA on the experimental dataset to get a baseline, then run ABC/active learning to find the input fluxes.
You have two options:
- Jupyter notebooks: FBA.ipynb and active_learning.ipynb (input parameters can be changed directly within Jupyter)
- Python files: fba.py and active_learning.py (for quick runs; argparse is used to pass the dataset name in the terminal, see the example commands and sketch below)
```bash
python fba.py --name iML1515_EXP
python active_learning.py --name iML1515_EXP --run 3
```
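For context, here is a minimal sketch of the kind of argparse entry point these scripts might expose; only the `--name` and `--run` flags come from the example commands above, and everything else is purely illustrative:

```python
# Illustrative argparse entry point; only --name and --run are taken from
# the example commands above, the rest is a hypothetical sketch.
import argparse

def parse_args():
    parser = argparse.ArgumentParser(description="Run on a named dataset.")
    parser.add_argument("--name", required=True,
                        help="dataset name, e.g. iML1515_EXP")
    parser.add_argument("--run", type=int, default=1,
                        help="run index (active_learning.py only)")
    return parser.parse_args()

if __name__ == "__main__":
    args = parse_args()
    print(f"dataset={args.name}, run={args.run}")
```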
Goal: from experimental medium concentrations, find input fluxes for the GEM model such that the FBA output is as close as possible to the experimental measurements (regression: growth rate; classification: growth/no growth).
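As a concrete illustration (not the project's actual code), a single FBA evaluation of a candidate set of import rates could look like this with COBRApy; the SBML file name and exchange-reaction IDs are assumptions:

```python
# Illustrative only: one FBA evaluation of candidate import rates (V-in)
# using COBRApy. File name and exchange-reaction IDs are assumptions.
import cobra

model = cobra.io.read_sbml_model("iML1515.xml")  # genome-scale model (GEM)

# Candidate import rates: uptake is encoded as a negative lower bound
# on the corresponding exchange reaction.
v_in = {"EX_glc__D_e": -10.0, "EX_o2_e": -15.0}
for rxn_id, rate in v_in.items():
    model.reactions.get_by_id(rxn_id).lower_bound = rate

solution = model.optimize()           # runs FBA
y_fba = solution.objective_value      # FBA-predicted growth rate
print(y_fba)
```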
We assume a dataset of different media compositions for which growth rates have been measured (y-true). The process is as follows:

1. Build an initial training set by randomly drawing import rates (V-in) for each media composition and running the mechanistic model (FBA) to obtain the FBA-calculated growth rates (y-fba).
2. Train an ensemble of five machine learning models (feedforward neural networks) with different architectures (hidden layers) and initialization parameters.
3. Use the ensemble to predict growth rates for new V-in values (generated from the prior) that are not yet in the training set.
4. Select the V-in values whose UCB score indicates the highest informativeness, run FBA for these V-in values, and retain the V-in values minimizing the UCB score or the difference between y-fba and y-true.
5. If the current fitness is not satisfactory, add the selected V-in values to the training set and repeat from step 2.

A simplified sketch of this loop appears below.
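This is a highly simplified sketch of the loop, assuming numpy/scikit-learn; `run_fba` is a stand-in for the mechanistic model (see the COBRApy sketch above), and the UCB weighting (fit to y-true plus 2·std) is an illustrative choice, not the project's exact acquisition rule:

```python
# Simplified sketch of the active-learning loop described above.
# run_fba() and the UCB weighting are illustrative placeholders.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
n_inputs = 4                       # number of import rates (V-in) per medium
y_true = 0.8                       # measured growth rate for one medium

def run_fba(v_in):
    # Stand-in for the mechanistic FBA call.
    return float(np.tanh(v_in.sum()))

def sample_prior(n):
    return rng.uniform(0.0, 1.0, size=(n, n_inputs))

# 1. Initial training set: random V-in plus FBA-computed growth rates (y-fba).
X = sample_prior(20)
y = np.array([run_fba(v) for v in X])

for iteration in range(10):
    # 2. Ensemble of five networks with different hidden layers / seeds.
    ensemble = [
        MLPRegressor(hidden_layer_sizes=(h,), max_iter=2000,
                     random_state=s).fit(X, y)
        for s, h in enumerate([16, 32, 64, 32, 16])
    ]
    # 3. Predict on fresh candidates drawn from the prior.
    candidates = sample_prior(500)
    preds = np.stack([m.predict(candidates) for m in ensemble])
    mean, std = preds.mean(axis=0), preds.std(axis=0)
    # 4. UCB-style acquisition: good predicted fit to y-true plus high
    #    ensemble disagreement (informativeness).
    ucb = -np.abs(mean - y_true) + 2.0 * std
    best = candidates[np.argsort(ucb)[-5:]]
    # 5. Run FBA on the selected V-in and check fitness against y-true.
    y_new = np.array([run_fba(v) for v in best])
    if np.min(np.abs(y_new - y_true)) < 1e-2:
        break                      # satisfied with current fitness
    # 6. Otherwise add the new points to the training set and repeat.
    X = np.vstack([X, best])
    y = np.concatenate([y, y_new])
```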
