Question Regarding Batch Effect Consideration in StringTie's Hybrid Assembly Mode #496

@TingQi2020

Dear StringTie Development Team,

First, thank you for developing and maintaining such a powerful and widely-used tool.

I am planning to use StringTie's hybrid mode to assemble a comprehensive transcriptome by integrating short-read (Illumina) and long-read (Oxford Nanopore) RNA-seq data from the same set of biological samples. My question concerns potential batch effects between the two technologies. Because the short-read and long-read data were generated from separate library preparations and sequenced on different platforms, they inevitably contain non-biological, technical variation.

Could you please clarify how StringTie's hybrid mode handles such technical discrepancies? Specifically, I would like to know:

  1. Does the hybrid assembly algorithm implicitly account for systematic differences in coverage or representation between the two technologies?
  2. Are there recommended best practices for pre-processing or normalizing the data (e.g., the BAM files) before inputting them into StringTie to mitigate these batch effects?
  3. Alternatively, is it considered better practice to perform assemblies separately and then merge the results, rather than using a direct hybrid approach when such strong technical biases are expected?

Any insights or recommendations you could provide would be greatly appreciated. Thank you for your time and for your contributions to the community.

Cheers,
Ting
