Skip to content

fix alignBioPairwise for new biopython#2224

Closed
jamesmkrieger wants to merge 1 commit intoprody:mainfrom
jamesmkrieger:jamesk/fix-alignchains-new-biopython
Closed

fix alignBioPairwise for new biopython#2224
jamesmkrieger wants to merge 1 commit intoprody:mainfrom
jamesmkrieger:jamesk/fix-alignchains-new-biopython

Conversation

@jamesmkrieger
Copy link
Contributor

This fixes some of the issues around #2205. This doesn't fix all my problems but it does fix alignChains with match_func=sameChid.

With Biopython 1.79, we always used to get full alignments with enough gaps:

In [1]: import prody

In [2]: import Bio

In [3]: Bio.__version__
Out[3]: '1.79'

In [4]: prody_ref_model = prody.parsePDB("ref_model.pdb")
@> 2243 atoms and 1 coordinate set(s) were parsed in 0.02s.

In [5]: prody_model = prody.parsePDB("centered_model.pdb")
@> 2280 atoms and 1 coordinate set(s) were parsed in 0.01s.

In [6]: target = list(prody_model.getHierView())[3]

In [7]: chain = list(prody_ref_model.getHierView())[3]

In [8]: from prody.utilities import MATCH_SCORE, MISMATCH_SCORE, GAP_PENALTY, GAP_EXT_PENALTY, ALIGNMENT_METHOD, alignBioPairwise

In [9]:             alignments = alignBioPairwise(target.getSequence(),
   ...:                                           chain.getSequence()[3:],
   ...:                                           "local",
   ...:                                           MATCH_SCORE, MISMATCH_SCORE,
   ...:                                           GAP_PENALTY,  GAP_EXT_PENALTY,
   ...:                                           max_alignments=1)

In [10]: alignments
Out[10]: 
[('KKRFEVKKWNAVALWAWDIVNCAICRNHIMDLCIECQANQASATSEECTVAWGVCNHAFHFHCISRWLKTRQVCPLDNREWE',
  '---FEVKKWNAVALWAWDIVV-------------------------------------------------------------',
  17.0,
  3,
  19)]

With Biopython 1.80 and onwards, this didn't happen anymore and that's what broke alignChains.

In [1]: from prody.utilities import MATCH_SCORE, MISMATCH_SCORE, GAP_PENALTY, GAP_EXT_PENALTY, ALIGNMENT_METHOD, alignBioPairwise

In [2]: import prody

In [3]: prody_model = prody.parsePDB("centered_model.pdb")
@> 2280 atoms and 1 coordinate set(s) were parsed in 0.02s.

In [4]: prody_ref_model = prody.parsePDB("ref_model.pdb")
@> 2243 atoms and 1 coordinate set(s) were parsed in 0.02s.

In [5]: target = list(prody_model.getHierView())[3]

In [6]: chain = list(prody_ref_model.getHierView())[3]

In [7]:             alignments = alignBioPairwise(target.getSequence(),
   ...:                                           chain.getSequence()[3:],
   ...:                                           "local",
   ...:                                           MATCH_SCORE, MISMATCH_SCORE,
   ...:                                           GAP_PENALTY,  GAP_EXT_PENALTY,
   ...:                                           max_alignments=1)

In [8]: alignments
Out[8]: [('FEVKKWNAVALWAWDIV', 'FEVKKWNAVALWAWDIV', 17.0, '3', '20')]

With the fix, this is restored.

In [1]: import prody

In [2]: prody_model = prody.parsePDB("centered_model.pdb")
@> 2280 atoms and 1 coordinate set(s) were parsed in 0.02s.

In [3]: prody_ref_model = prody.parsePDB("ref_model.pdb")
@> 2243 atoms and 1 coordinate set(s) were parsed in 0.02s.

In [4]: from prody.utilities import MATCH_SCORE, MISMATCH_SCORE, GAP_PENALTY, GAP_EXT_PENALTY, ALIGNMENT_METHOD, alignBioPairwise

In [5]: target = list(prody_model.getHierView())[3]

In [6]: chain = list(prody_ref_model.getHierView())[3]

In [7]:             alignments = alignBioPairwise(target.getSequence(),
   ...:                                           chain.getSequence()[3:],
   ...:                                           "local",
   ...:                                           MATCH_SCORE, MISMATCH_SCORE,
   ...:                                           GAP_PENALTY,  GAP_EXT_PENALTY,
   ...:                                           max_alignments=1)

In [8]: alignments
Out[8]: 
[('KKRFEVKKWNAVALWAWDIVNCAICRNHIMDLCIECQANQASATSEECTVAWGVCNHAFHFHCISRWLKTRQVCPLDNREWE',
  '---FEVKKWNAVALWAWDIV--------------------------------------------------------------',
  17.0,
  3,
  20)]
In [4]: prody_ref_amap = prody.alignChains(prody_ref_model, prody_model, overlap=15, match_func=prody.sameChid)
@> Trying to map atoms based on residue numbers and identities:
@>   Comparing Chain A from ref_model (len=1132) with Chain A from centered_model:
@>      Mapped: 1132 residues match with 100% sequence identity and 99% overlap.
@> Trying to map atoms based on residue numbers and identities:
@>   Comparing Chain B from ref_model (len=381) with Chain B from centered_model:
@>      Mapped: 346 residues match with 100% sequence identity and 99% overlap.
@> Trying to map atoms based on residue numbers and identities:
@>   Comparing Chain E from ref_model (len=709) with Chain E from centered_model:
@>      Mapped: 709 residues match with 100% sequence identity and 100% overlap.
@> Trying to map atoms based on residue numbers and identities:
@>   Comparing Chain F from ref_model (len=21) with Chain F from centered_model:
@> Trying to map atoms based on local sequence alignment:
@>   Comparing Chain F from ref_model (len=21) with Chain F from centered_model:
@>      Mapped: 21 residues match with 95% sequence identity and 26% overlap.
@> Finding the atommaps based on their coverages...
@> Identified that there exists 1 atommap(s) potentially.

In [5]: prody_ref_amap
Out[5]: [<AtomMap: (Chain F from ref_model -> Chain F from centered_model) + (Chain E from ref_model -> Chain E from centered_model) + (Chain B from ref_model -> Chain B from centered_model) + (Chain A from ref_model -> Chain A from centered_model) from ref_model (2280 atoms, 2208 mapped, 72 dummy)>]

@jamesmkrieger
Copy link
Contributor Author

This isn't right. It only handles the case where the target is longer

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant

Comments