Implementation of a simplified version of the FastA algorithm in the context of a project for my Master's course

thanstavridis/FastA_algorithm


FastA_algorithm

This project is a simplified implementation of the FastA algorithm, designed to compare a query DNA sequence against the complete set of coding (gene) sequences of Saccharomyces cerevisiae (baker's yeast). The purpose is to identify significant matches based on word-level similarity using a lightweight scoring system and filtering criteria.
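Word-level matching of this kind is commonly done by indexing the k-mers of the query and looking up each k-mer of a target sequence. The following is a minimal sketch of that idea, not the repository's actual code; the function names `kmer_index` and `word_hits` are hypothetical.

```python
from collections import defaultdict

def kmer_index(seq, k=5):
    """Map every k-mer in seq to the list of positions where it starts."""
    index = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)
    return index

def word_hits(query, target, k=5):
    """Return (query_pos, target_pos) pairs where the same k-mer starts."""
    index = kmer_index(query, k)
    hits = []
    for j in range(len(target) - k + 1):
        # .get avoids creating empty entries in the defaultdict
        for i in index.get(target[j:j + k], ()):
            hits.append((i, j))
    return hits
```

Each hit lies on the diagonal `target_pos - query_pos`; hits that share a diagonal correspond to an ungapped stretch of similarity.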

Objective

The goal is to simulate a basic version of the FastA sequence alignment algorithm with the following parameters:

  • Scoring Scheme: +1 for each exact word match (5-mer), 0 for mismatches

  • Word size / k-mer size (k): 5

  • Minimum matched diagonal size (m): 20

  • Maximum gap allowed when joining regions (g): 3

  • Threshold for total score: 50% of the query sequence length

  • Outputs:

    • Identification of the best matching sequence based on the highest matching score

    • Identification of the best matching gene based on the highest score

    • Highest matching score
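The parameters above could be combined as in the sketch below. It rests on one possible reading of them, which the README does not confirm: hits on the same diagonal are joined when at most g query positions without a hit lie between them, a joined run counts only if it spans at least m bases, and the total score is the number of word hits (+1 each) summed over all qualifying runs. The names `diagonal_score` and `best_match` are hypothetical.

```python
from collections import defaultdict

def diagonal_score(query, target, k=5, m=20, g=3):
    """Sum word hits over diagonal runs that span at least m bases.

    Assumed interpretation: hits on one diagonal are joined when the
    gap between them is at most g query positions.
    """
    # Index the query's k-mers by start position.
    index = defaultdict(list)
    for i in range(len(query) - k + 1):
        index[query[i:i + k]].append(i)

    # Group query start positions of hits by diagonal (target_pos - query_pos).
    diagonals = defaultdict(list)
    for j in range(len(target) - k + 1):
        for i in index.get(target[j:j + k], ()):
            diagonals[j - i].append(i)

    total = 0
    for positions in diagonals.values():
        positions.sort()
        run_start = prev = positions[0]
        run_hits = 1
        # The trailing None flushes the last run.
        for pos in positions[1:] + [None]:
            if pos is not None and pos - prev - 1 <= g:
                run_hits += 1
                prev = pos
                continue
            # Run ended: count it only if it spans at least m bases.
            if prev + k - run_start >= m:
                total += run_hits
            if pos is not None:
                run_start = prev = pos
                run_hits = 1
    return total

def best_match(query, genes, k=5, m=20, g=3, threshold=0.5):
    """Return (gene_id, score) for the top gene, or None if below threshold."""
    scores = {name: diagonal_score(query, seq, k, m, g)
              for name, seq in genes.items()}
    if not scores:
        return None
    name = max(scores, key=scores.get)
    if scores[name] < threshold * len(query):
        return None
    return name, scores[name]
```

Here `genes` is assumed to be a dictionary mapping gene identifiers to sequences. Summing over every qualifying run can overcount in repetitive sequences; taking only the best run per gene would be an equally plausible variant.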

Data format

  • Query sequence: single-sequence FASTA file

  • Gene sequences (S. cerevisiae): multi-FASTA file
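Both inputs can be read with the same routine, since a single-sequence FASTA file is just a multi-FASTA file with one record. The following is a minimal standard-library sketch; the function name `read_fasta` is hypothetical, and taking the first whitespace-delimited token of each header as the record identifier is an assumption about the files used.

```python
def read_fasta(path):
    """Parse a FASTA file into a dict of {record_id: sequence}."""
    records = {}
    name = None
    chunks = []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                # A new header closes the previous record, if any.
                if name is not None:
                    records[name] = "".join(chunks)
                header = line[1:].split()
                name = header[0] if header else ""
                chunks = []
            elif name is not None:
                # Sequence lines may be wrapped; collect and join them.
                chunks.append(line.upper())
    if name is not None:
        records[name] = "".join(chunks)
    return records
```

For the query file, the single sequence can then be taken with `next(iter(read_fasta(path).values()))`.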
