I have the coordinates of domains mapped to transcripts, so the coordinates are relative to the transcripts. I would now like to convert these transcript coordinates to genome coordinates. I wonder if there is some magic in gffutils that makes this reasonably easy. Hopefully an example will clarify:
Say my genome GFF file contains this transcript with 3 exons:
import gffutils
data = """\
chr1 . gene 11 30 . . . ID=g1
chr1 . mRNA 11 30 . . . ID=t1;Parent=g1
chr1 . exon 11 14 . . . ID=e1;Parent=t1
chr1 . exon 19 22 . . . ID=e2;Parent=t1
chr1 . exon 27 30 . . . ID=e3;Parent=t1"""
db = gffutils.create_db(data.replace(' ', '\t'), ":memory:", from_string=True)
As a text ideogram, this would look like:
          GGGGGGGG_g1_GGGGGGGG
          EEEE----EEEE----EEEE
1         11        21        31
I have a domain that, in transcript coordinates, starts at position 7 and ends at position 10. So it would look like this:
          GGGGGGGG_g1_GGGGGGGG
          EEEE----EEEE----EEEE
                    ||----||
1         11        21        31
I would like a function that, given the transcript ID and the domain coordinates in transcript space, returns the coordinates in genome space:
transcriptToGenome(db, txid='t1', tx_start=7, tx_end=10)
# returns something like:
chr1 . dom 21 28 . . . ID=d1;Parent=t1
chr1 . dom 21 22 . . . ID=d1.1;Parent=d1
chr1 . dom 27 28 . . . ID=d1.2;Parent=d1
Before I bang my head on it, I wonder if an easy solution already exists. Thanks!
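For reference, here is a minimal sketch of the mapping I have in mind, in case no built-in exists. It is not part of gffutils. The helper name `transcript_to_genome` is my own; it assumes 1-based inclusive coordinates, non-overlapping exons, and that any strand other than `-` (including `.`) is read left to right. The `transcriptToGenome` wrapper only relies on `db[txid]` and `db.children(..., featuretype="exon", order_by="start")`, and returns plain tuples rather than GFF lines.

```python
def transcript_to_genome(exons, tx_start, tx_end, strand="+"):
    """Map a 1-based inclusive transcript interval onto genomic blocks.

    exons: list of (start, end) genomic intervals, 1-based inclusive.
    Returns a list of (start, end) blocks in ascending genomic order.
    """
    if tx_start < 1 or tx_end < tx_start:
        raise ValueError("need 1 <= tx_start <= tx_end")
    # Walk exons in transcription order: ascending for '+', descending for '-'.
    ordered = sorted(exons, reverse=(strand == "-"))
    blocks = []
    offset = 0  # transcript bases consumed by the exons visited so far
    for ex_start, ex_end in ordered:
        length = ex_end - ex_start + 1
        # This exon covers transcript positions offset+1 .. offset+length.
        lo = max(tx_start, offset + 1)
        hi = min(tx_end, offset + length)
        if lo <= hi:
            if strand == "-":
                # On the minus strand, transcript position 1 is the exon's end.
                blocks.append((ex_end - (hi - offset) + 1,
                               ex_end - (lo - offset) + 1))
            else:
                blocks.append((ex_start + (lo - offset) - 1,
                               ex_start + (hi - offset) - 1))
        offset += length
    if tx_end > offset:
        raise ValueError("interval extends past the end of the transcript")
    return sorted(blocks)


def transcriptToGenome(db, txid, tx_start, tx_end):
    """Hypothetical wrapper: pull the exons of txid from a gffutils FeatureDB."""
    tx = db[txid]
    exons = [(e.start, e.end)
             for e in db.children(tx, featuretype="exon", order_by="start")]
    return tx.seqid, transcript_to_genome(exons, tx_start, tx_end, tx.strand)


# The example above, with the exon coordinates written out by hand:
print(transcript_to_genome([(11, 14), (19, 22), (27, 30)], 7, 10))
# [(21, 22), (27, 28)]
```

With the `db` built above, `transcriptToGenome(db, 't1', 7, 10)` should give the same two blocks together with the `chr1` seqid; the parent `dom` feature spanning 21 to 28 is then just the minimum start and maximum end of the blocks.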