Hi again @SchistoDan,
Having some trouble using gene-fetch to download 28S records.
For instance, we have https://www.ncbi.nlm.nih.gov/nuccore/KC869866.1
When I do:
```
gene-fetch -e [myemail] -k [mykey] --header detailed -ms 2000 -ns 2000 -t nucleotide -c -s 6170 -g 28S -o 28S_prorhynchida
```
It really should be picking this record (and many other sequences) up, but instead I get:
```
====== Starting Gene Fetch ======
Version: 1.0.20
Written by Dan Parsons & Ben Price, Natural History Museum London
Validating NCBI credentials: email='myemail', api_key='[mykey]'
Credential validation passed
2025-12-10 16:55:27,084 [INFO] Logging initialised. Log file: 28S_prorhynchida/gene_fetch.log
2025-12-10 16:55:27,084 [INFO] Single-taxid mode activated: using protein size threshold 500 and nucleotide size threshold 2000
2025-12-10 16:55:27,084 [INFO] Using generic search terms for lsu
2025-12-10 16:55:27,084 [INFO] Output directory: 28S_prorhynchida
2025-12-10 16:55:27,084 [INFO] Sequence type: nucleotide
2025-12-10 16:55:27,084 [INFO] Single-taxid mode activated for taxid: 6170
2025-12-10 16:55:27,084 [INFO] Maximum number of sequences to fetch: 2000
2025-12-10 16:55:27,084 [INFO] Retrieving taxonomy details from NCBI for taxID: 6170
2025-12-10 16:55:29,155 [INFO] Successfully retrieved taxonomy information from NCBI for taxID: 6170
2025-12-10 16:55:29,155 [INFO] Attempting search at family level: Prorhynchidae (taxid: 6170)
2025-12-10 16:55:29,155 [INFO] Searching nucleotide database at rank family (Prorhynchidae) with term: (lsu[Title] OR lsu[Gene] OR "lsu"[Protein Name]) AND txid6170[Organism:exp] AND 2000:60000[SLEN]
2025-12-10 16:55:30,199 [WARNING] No sequences found
2025-12-10 16:55:30,200 [WARNING] No sequences found for taxid 6170
2025-12-10 16:55:30,200 [INFO] Single taxid processing completed
```
I think this is a similar problem to the one you had with 18S, which led to the new version. The search terms `lsu[Title] OR lsu[Gene] OR "lsu"[Protein Name]` just aren't inclusive enough: we should have "28S" in there at the very least, and maybe "large subunit" too.
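As a rough illustration of what a more inclusive term could look like, here is a minimal Python sketch. `build_search_term` and the synonym list are hypothetical and are not gene-fetch's actual code; only the Entrez field tags (`[Title]`, `[Gene]`, `[Organism:exp]`, `[SLEN]`) and the length-range format are taken from the query shown in the log.

```python
# Hypothetical helper: NOT part of gene-fetch's code or API.
# It only illustrates how a broader Entrez search term for 28S could be assembled.

def build_search_term(synonyms, taxid, min_len, max_len):
    """Return an NCBI Entrez term that ORs each synonym across Title and Gene."""
    clauses = []
    for name in synonyms:
        quoted = f'"{name}"'  # quote so multi-word names are matched as phrases
        clauses.append(f"{quoted}[Title]")
        clauses.append(f"{quoted}[Gene]")
    gene_part = " OR ".join(clauses)
    return (
        f"({gene_part}) AND txid{taxid}[Organism:exp] "
        f"AND {min_len}:{max_len}[SLEN]"
    )


# Assumed synonym list for nuclear large-subunit rRNA; extend as needed.
LSU_28S_SYNONYMS = ["28S", "28S ribosomal RNA", "large subunit ribosomal RNA", "lsu"]

term = build_search_term(LSU_28S_SYNONYMS, 6170, 2000, 60000)
print(term)
```

One caveat under this assumption: a phrase such as "large subunit ribosomal RNA" also appears in the titles of some mitochondrial 16S records, so a broader term may need some downstream filtering.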
Presumably there will also be analogous problems when people search for the 12S and 16S mtDNA rRNAs, right?
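The same idea could extend to the mitochondrial rRNAs. The sketch below is hypothetical: the synonym table and `gene_clause` are not taken from gene-fetch, and the names are assumptions based on common GenBank annotations (for example, `rrnS`/`rrnL` as gene symbols in many animal mitogenome records).

```python
# Hypothetical synonym table: NOT taken from gene-fetch.
# Names are assumed common GenBank annotations for mitochondrial rRNAs.
MITO_RRNA_SYNONYMS = {
    "12S": ["12S", "12S ribosomal RNA", "small subunit ribosomal RNA", "rrnS"],
    "16S": ["16S", "16S ribosomal RNA", "large subunit ribosomal RNA", "rrnL"],
}


def gene_clause(synonyms):
    """OR together Title and Gene field matches for every synonym."""
    parts = [f'"{s}"[{field}]' for s in synonyms for field in ("Title", "Gene")]
    return "(" + " OR ".join(parts) + ")"


for marker, names in MITO_RRNA_SYNONYMS.items():
    print(marker, gene_clause(names))
```

Note that the generic "small/large subunit ribosomal RNA" phrases overlap with nuclear 18S/28S records, so whether to include them is a trade-off between recall and precision.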
Regards,
Chris L