Question Regarding Batch Effect Consideration in StringTie's Hybrid Assembly Mode #496

Dear StringTie Development Team,

First, thank you for developing and maintaining such a powerful and widely used tool.

I am planning to use StringTie's hybrid mode to assemble a comprehensive transcriptome by integrating short-read (Illumina) and long-read (Oxford Nanopore) RNA-seq data from the same set of biological samples. My question concerns the potential for batch effects between these two technologies. Because the short-read and long-read data were generated from separate library preparations and sequenced on different platforms, they inherently contain non-biological technical variation.

Could you please clarify how StringTie's hybrid mode handles such technical discrepancies? Specifically, I would like to know:
1. Does the hybrid assembly algorithm implicitly account for systematic differences in coverage or representation between the two technologies?
2. Are there recommended best practices for pre-processing or normalizing the data (e.g., the BAM files) before inputting them into StringTie to mitigate these batch effects?
3. Alternatively, is it considered better practice to perform the assemblies separately and then merge the results, rather than using a direct hybrid approach, when such strong technical biases are expected?
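
For concreteness, the two workflows I am weighing look roughly like the sketch below. It is Python that only builds the command lines (nothing is executed). File names are placeholders, and I am assuming `--mix` (short-read BAM first, long-read BAM second), `-L` for long-read input, and `--merge` behave as described in the StringTie manual; please correct me if I have any of these wrong.

```python
import shlex

# Placeholder paths (hypothetical); substitute your own sorted alignments.
SHORT_BAM = "sample_illumina.sorted.bam"
LONG_BAM = "sample_ont.sorted.bam"


def hybrid_cmd(short_bam, long_bam, out_gtf="hybrid.gtf"):
    """Option A: one hybrid assembly; --mix expects short reads first, long reads second."""
    return ["stringtie", "--mix", "-o", out_gtf, short_bam, long_bam]


def separate_then_merge_cmds(short_bam, long_bam, out_gtf="merged.gtf"):
    """Option B: assemble each technology on its own, then merge the two GTFs."""
    return [
        ["stringtie", "-o", "short.gtf", short_bam],          # short-read assembly
        ["stringtie", "-L", "-o", "long.gtf", long_bam],      # long-read assembly
        ["stringtie", "--merge", "-o", out_gtf, "short.gtf", "long.gtf"],
    ]


if __name__ == "__main__":
    print(shlex.join(hybrid_cmd(SHORT_BAM, LONG_BAM)))
    for cmd in separate_then_merge_cmds(SHORT_BAM, LONG_BAM):
        print(shlex.join(cmd))
```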

Any insights or recommendations you could provide would be greatly appreciated. Thank you for your time and for your contributions to the community.


Cheers,
Ting
