Introduction

This is an implementation of DNA gene family classification. To run the code:

python train_and_eval.py

Requirements

Please install the required packages in requirements.txt:

pip install -r requirements.txt

Data Preprocessing

Data preprocessing is implemented in data_process.py; see the DNADataset class for details.

The DNA sequences are:

Sliced into subsequences of at most 512 nucleotides.
Optionally augmented with their reverse complement, before slicing and padding.
Tokenized into 5-mers, which are then one-hot encoded.
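
The steps above can be sketched in plain Python as follows. This is a minimal illustration, not the DNADataset code: the function names, the overlapping stride-1 k-mers, the restriction to the bases A, C, G and T, and the vocabulary ordering are all assumptions, and padding is left out.

```python
from itertools import product

MAX_LEN = 512       # maximum subsequence length stated above
K = 5               # k-mer size stated above
ALPHABET = "ACGT"   # assumption: sequences contain only these four bases

# Vocabulary of all 4**5 = 1024 possible 5-mers (hypothetical index order).
KMER_INDEX = {"".join(p): i for i, p in enumerate(product(ALPHABET, repeat=K))}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def slice_sequence(seq, max_len=MAX_LEN):
    """Cut a sequence into consecutive chunks of at most max_len nucleotides."""
    return [seq[i:i + max_len] for i in range(0, len(seq), max_len)]


def tokenize_kmers(seq, k=K):
    """Split into overlapping k-mers with stride 1 (the stride is an assumption)."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def one_hot(kmers):
    """Build one row per k-mer with a single 1 at its vocabulary index."""
    rows = []
    for kmer in kmers:
        row = [0] * len(KMER_INDEX)
        row[KMER_INDEX[kmer]] = 1
        rows.append(row)
    return rows


def preprocess(seq, augment=False):
    """Optionally add the reverse complement, then slice, tokenize and encode."""
    sequences = [seq, reverse_complement(seq)] if augment else [seq]
    chunks = [c for s in sequences for c in slice_sequence(s)]
    return [one_hot(tokenize_kmers(c)) for c in chunks]
```

With augment=True, each input sequence yields its own chunks followed by the chunks of its reverse complement.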

Model

A Convolutional Neural Network (CNN) is implemented in model.py.

Data splits and evaluation

The input data is split into 5 folds, each holding 20% of the sequences. The folds are iterated over 5 times, so that each fold serves as the test set once while the remaining 80% is used for training. In each iteration, the training portion is further split into 80% training and 20% validation.
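
The splitting scheme can be sketched with the standard library alone. This is an illustrative version under assumptions: the function name five_fold_splits, the seed, and the plain shuffled (unstratified) assignment are hypothetical, and the actual code may split differently.

```python
import random


def five_fold_splits(n_samples, n_folds=5, val_fraction=0.2, seed=0):
    """Yield (train, val, test) index lists, one triple per fold."""
    rng = random.Random(seed)
    indices = list(range(n_samples))
    rng.shuffle(indices)
    # Fold i takes every n_folds-th shuffled index, so fold sizes differ by at most 1.
    folds = [indices[i::n_folds] for i in range(n_folds)]
    for i, test in enumerate(folds):
        # Everything outside the current test fold is available for training.
        rest = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        rng.shuffle(rest)
        # Hold out val_fraction of the training portion for validation.
        n_val = int(len(rest) * val_fraction)
        yield rest[n_val:], rest[:n_val], test
```

For 100 sequences, each iteration produces 64 training, 16 validation and 20 test indices, and every index appears in exactly one test fold.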

The final test score is computed over the predictions for all sequences, each collected when the sequence was part of a test set. We report the accuracy and F1 score of the model on the whole dataset.
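
Pooling the per-fold test predictions before scoring could look like the sketch below. The helper names are hypothetical, and macro-averaged F1 is an assumption, since the averaging mode is not stated here.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (macro averaging is an assumption)."""
    scores = []
    for label in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def pooled_scores(fold_results):
    """Score all test predictions at once.

    fold_results is a list of (y_true, y_pred) pairs, one per test fold.
    """
    y_true = [t for truths, _ in fold_results for t in truths]
    y_pred = [p for _, preds in fold_results for p in preds]
    return accuracy(y_true, y_pred), macro_f1(y_true, y_pred)
```

Scoring the pooled predictions gives one figure for the whole dataset, which is not in general the same as averaging the five per-fold scores.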

