Description
-
Create a test interval tree object that can be used to develop downstream processes without waiting for the actual interval tree implementation
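One possible shape for such a stand-in is sketched below. It is not the planned implementation: `Interval`, `StubIntervalTree`, `add`, and `overlap` are hypothetical names, coordinates are assumed to be half-open, and queries are a plain linear scan, which should be enough for small test fixtures.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Interval:
    """A half-open genomic interval [start, end) with an attached label."""
    start: int
    end: int
    label: str


@dataclass
class StubIntervalTree:
    """Hypothetical stand-in: keeps intervals in a list and scans linearly."""
    intervals: list = field(default_factory=list)

    def add(self, start, end, label):
        if start >= end:
            raise ValueError("interval must have start < end")
        self.intervals.append(Interval(start, end, label))

    def overlap(self, start, end):
        """Return every stored interval that overlaps [start, end)."""
        return [iv for iv in self.intervals if iv.start < end and start < iv.end]


# Made-up fixture data, for illustration only.
tree = StubIntervalTree()
tree.add(100, 200, "tx1.exon1")
tree.add(300, 400, "tx1.exon2")
tree.add(150, 250, "tx2.exon1")
hits = tree.overlap(200, 300)
```

If downstream code only depends on `add` and `overlap`, the real interval tree could later replace the stub without changes to callers, assuming it exposes the same two methods.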
-
Implement an interval tree constructor that takes the n GTF files and n FASTA files, plus the reference genome that was used to create these transcriptomes
- Maybe the reference genome should be optional -- I don't know what the landscape looks like in terms of reference-guided vs. reference-free methods for long-read RNA-seq
-
Create something like the current IsoformLibrary that takes the interval tree and the FASTA files and can extract "clusters" and sequences (not sure whether this will be useful, but I think it would be)
-
Write a method that classifies coordinate mismatches at the transcript level -- it will take some thought to come up with the classifications and their definitions. A single transcript might also have multiple labels
- There are a lot of places we can reference for this -- the best I can think of is the GffCompare docs, which define these categories
-
An "identical transcript" (suitable for pairwise alignment) should be defined roughly as follows: a transcript in which every exon overlaps its counterpart by at least a user-defined amount (e.g., 95%)
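The definition could be sketched as follows. Several things here are assumptions rather than settled decisions: exons are half-open `(start, end)` tuples, the two transcripts must have the same number of exons, exons are paired in coordinate order, and the overlap fraction is measured against the longer exon of each pair. The function names are hypothetical.

```python
def exon_overlap_fraction(a, b):
    """Overlap length divided by the longer exon's length (half-open tuples)."""
    overlap = min(a[1], b[1]) - max(a[0], b[0])
    if overlap <= 0:
        return 0.0
    return overlap / max(a[1] - a[0], b[1] - b[0])


def is_identical_transcript(exons_a, exons_b, min_overlap=0.95):
    """True if every paired exon overlaps by at least min_overlap.

    Assumption: transcripts with different exon counts (or no exons)
    are never considered identical.
    """
    if not exons_a or len(exons_a) != len(exons_b):
        return False
    pairs = zip(sorted(exons_a), sorted(exons_b))
    return all(exon_overlap_fraction(a, b) >= min_overlap for a, b in pairs)


# Made-up coordinates, for illustration only.
tx_a = [(100, 200), (300, 400)]
tx_b = [(102, 200), (300, 401)]   # small boundary shifts
tx_c = [(120, 200), (300, 400)]   # first exon overlaps only 80%
```

Dividing by the longer exon makes the measure symmetric, so the result does not depend on which transcript is passed first. Other denominators (shorter exon, union length) would be equally defensible and give different results near the threshold.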
-
Sequence comparison should happen only on these identical transcripts, and only over regions where two exons overlap. We should never align across splice sites, for instance
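A minimal sketch of how the alignable regions might be derived, assuming exons are half-open `(start, end)` tuples in genomic coordinates; `shared_exon_regions` is a hypothetical name. Each returned region lies inside a single exon of both transcripts, so a comparison restricted to these regions cannot span a splice site.

```python
def shared_exon_regions(exons_a, exons_b):
    """Return the half-open regions covered by an exon in both transcripts."""
    regions = []
    for a_start, a_end in exons_a:
        for b_start, b_end in exons_b:
            start, end = max(a_start, b_start), min(a_end, b_end)
            if start < end:  # keep only non-empty intersections
                regions.append((start, end))
    return sorted(regions)


# Made-up coordinates, for illustration only.
exons_a = [(100, 200), (300, 400)]
exons_b = [(102, 200), (300, 401)]
regions = shared_exon_regions(exons_a, exons_b)
```

Mapping these genomic regions back to offsets within each transcript's FASTA sequence is a separate step that is not shown here.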
-
Figure out how to report all of this information -- there will likely be multiple outputs. This requires thinking about who the users are and what they want