seq-smith

A Rust-based sequence alignment library for Python.

Installation

You can install seq-smith using pip:

pip install seq-smith

Usage

seq-smith provides alignment functions along with helpers for building score matrices and encoding sequences. A basic global alignment looks like this:

from seq_smith import global_align, make_score_matrix, encode

# Define your alphabet
alphabet = "ACGT"

# Create a scoring matrix
score_matrix = make_score_matrix(alphabet, match_score=1, mismatch_score=-1)

# Encode sequences
seqa = encode("ACGT", alphabet)
seqb = encode("AGCT", alphabet)

# Define gap penalties
gap_open = -2
gap_extend = -1

# Perform the alignment
alignment = global_align(seqa, seqb, score_matrix, gap_open, gap_extend)

# Print the alignment score
print(f"Alignment score: {alignment.score}")

# Print the alignment fragments
for frag in alignment.fragments:
    print(frag)
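
To make the roles of the helper functions concrete, here is a conceptual, pure-Python sketch of what encoding a sequence and building a match/mismatch score matrix could look like. The names `encode_sketch` and `make_score_matrix_sketch` are hypothetical, and the use of plain lists is an assumption: the actual return types and internals of seq-smith's `encode` and `make_score_matrix` are not described here and may differ.

```python
def encode_sketch(seq: str, alphabet: str) -> list[int]:
    """Map each character to its index in the alphabet (hypothetical helper)."""
    index = {ch: i for i, ch in enumerate(alphabet)}
    # Raises KeyError for characters that are not in the alphabet.
    return [index[ch] for ch in seq]


def make_score_matrix_sketch(alphabet: str, match_score: int, mismatch_score: int) -> list[list[int]]:
    """Square matrix: match_score on the diagonal, mismatch_score elsewhere (hypothetical helper)."""
    n = len(alphabet)
    return [[match_score if i == j else mismatch_score for j in range(n)] for i in range(n)]


alphabet = "ACGT"
matrix = make_score_matrix_sketch(alphabet, match_score=1, mismatch_score=-1)
seqa = encode_sketch("ACGT", alphabet)
seqb = encode_sketch("AGCT", alphabet)

# With encoded sequences, scoring a pair of residues is a plain matrix lookup.
# Here: the ungapped, position-by-position score of the two sequences.
ungapped = sum(matrix[a][b] for a, b in zip(seqa, seqb))
print(seqa, seqb, ungapped)  # [0, 1, 2, 3] [0, 2, 1, 3] 0
```

Encoding to integer indices up front means the inner alignment loop needs no character lookups, which is one plausible reason for the separate `encode` step.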

Alignment Types

seq-smith supports the following alignment strategies:

  • Global Alignment (global_align): Uses the Needleman-Wunsch algorithm to align every residue in both sequences.
  • Local Alignment (local_align): Uses the Smith-Waterman algorithm to find the best-scoring local region of similarity.
  • Local-Global Alignment (local_global_align): Finds the best local alignment of the first sequence within the second, requiring the second sequence to be aligned globally.
  • Overlap Alignment (overlap_align): Does not penalize gaps at the start or end of either sequence, making it ideal for finding overlaps between sequences (e.g., in sequence assembly).
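
The difference between the global, local, and overlap strategies comes down to how the dynamic-programming table is initialized and where the final score is read from. The sketch below illustrates this in pure Python. It is not seq-smith's implementation: `align_score` is a hypothetical function, it uses a single linear gap penalty instead of separate gap-open and gap-extend penalties, it computes the score only (no traceback), and it leaves out the local-global variant.

```python
def align_score(a: str, b: str, mode: str = "global",
                match: int = 1, mismatch: int = -1, gap: int = -2) -> int:
    """Score-only alignment with a linear gap penalty (illustrative sketch)."""
    if mode not in ("global", "local", "overlap"):
        raise ValueError(f"unknown mode: {mode}")
    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j] = best score of an alignment ending at a[:i] and b[:j].
    H = [[0] * cols for _ in range(rows)]
    if mode == "global":
        # Global: leading gaps are penalized, so the borders accumulate gap costs.
        for i in range(1, rows):
            H[i][0] = i * gap
        for j in range(1, cols):
            H[0][j] = j * gap
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score = max(H[i - 1][j - 1] + sub, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if mode == "local":
                score = max(score, 0)  # Local: an alignment may restart anywhere.
            H[i][j] = score
            best = max(best, score)
    if mode == "global":
        return H[-1][-1]  # Must consume both sequences entirely.
    if mode == "local":
        return best       # Best-scoring cell anywhere in the table.
    # Overlap: free leading gaps, and the alignment ends where either sequence ends.
    return max(max(H[-1]), max(row[-1] for row in H))


print(align_score("GGACGTCC", "TTACGTAA", "global"))  # 0
print(align_score("GGACGTCC", "TTACGTAA", "local"))   # 4
print(align_score("AAACGT", "CGTTTT", "overlap"))     # 3
```

In the first pair, the shared core `ACGT` scores 4 locally, while the global score drops to 0 because the mismatching flanks must also be aligned. In the last pair, the suffix `CGT` of the first sequence overlaps the prefix `CGT` of the second at no cost for the unaligned ends.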

Multi-threaded Alignment

seq-smith alignment functions release the GIL while computing alignments, so other Python threads can run at the same time. To reduce the overhead of releasing and reacquiring the GIL for every alignment, seq-smith also provides *_many versions of the alignment functions for one-vs-many scenarios, where a single query sequence is aligned against a large number of target sequences.

Example:

from seq_smith import encode, global_align_many

# ... setup alphabet, score_matrix, gap penalties ...

seqa = encode("ACGT", alphabet)
seqbs = [encode("ACGT", alphabet), encode("AGCT", alphabet), encode("AAAA", alphabet)]

# Align seqa against all sequences in seqbs in parallel
alignments = global_align_many(seqa, seqbs, score_matrix, gap_open, gap_extend, num_threads=4)

for aln in alignments:
    print(aln.score)
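
For comparison, the per-call pattern that the *_many functions are meant to improve on can be sketched with a standard thread pool: each single-pair call runs in its own task, and a function that releases the GIL can then execute concurrently. In this sketch, `align_one_vs_many` and `count_matches` are hypothetical names. `count_matches` is a pure-Python stand-in that holds the GIL, so the example shows the calling pattern only, not a speedup.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Sequence, TypeVar

S = TypeVar("S")
R = TypeVar("R")


def align_one_vs_many(align_fn: Callable[[S, S], R], query: S,
                      targets: Sequence[S], num_threads: int = 4) -> list[R]:
    """Run align_fn(query, target) for every target on a thread pool."""
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        # map() yields results in the order of `targets`, not completion order.
        return list(pool.map(lambda target: align_fn(query, target), targets))


def count_matches(a: str, b: str) -> int:
    """Stand-in for a real single-pair alignment call (hypothetical)."""
    return sum(x == y for x, y in zip(a, b))


scores = align_one_vs_many(count_matches, "ACGT", ["ACGT", "AGCT", "AAAA"], num_threads=2)
print(scores)  # [4, 2, 1]
```

With this pattern every task pays the GIL release and reacquire cost separately, which is the overhead a single batched call avoids.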

For more details, see the full documentation.
