# Long-Read Evidence-Driven Structural Annotation Pipeline
A Snakemake workflow to produce structural genome annotations leveraging long-read sequencing data.
## Contents

- Overview
- Features
- Requirements
- Installation
- Usage
- Configuration
- Pipeline Workflow
- Scripts & Rules
- Output
- Examples
- License & Citations
- Contact / Support
## Overview

This repository implements a Snakemake pipeline (with auxiliary scripts) to generate structural genome annotations guided by long-read sequencing data (e.g. PacBio, Oxford Nanopore). It aims to produce high-quality annotations by combining transcript evidence from long reads with conventional annotation strategies. The main structure of the pipeline and the use of long-read transcriptomics are derived from this paper.
## Features

- Modular pipeline built with Snakemake
- Integration of long-read data to inform exon/intron boundaries
- Flexible configuration for different organisms & datasets
- Support for cluster execution (e.g. SLURM)
- Scripts to assist in annotation processing and QC
## Requirements

- Snakemake (version >= X.X)
- Python (>= 3.8) + dependencies
- Linux / Unix environment
- Aligned long-read RNA (or cDNA) sequencing data (BAM or SAM)
- Reference genome (FASTA)
- (Optional) Annotation hints / protein / transcript evidence

You'll find an `envs/` folder with environment / dependency configurations.
## Installation

1. Clone the repository:

   ```bash
   git clone https://github.com/pabloati/LR_annotation.git
   cd LR_annotation
   ```

2. Create and activate a conda / mamba environment (if using):

   ```bash
   conda env create -f envs/env.yaml
   conda activate <env_name>
   ```

   (Multiple environments may be defined under `envs/`; inspect the folder and choose the appropriate one.)

3. Install any extra Python packages not handled by the environment file:

   ```bash
   pip install -r requirements.txt
   ```

   (If `requirements.txt` does not exist, you can generate one from the environment.)
## Usage

The main script to run the pipeline is `SQANTI_evidence`.
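How exactly the workflow is launched depends on your environment; the sketch below assumes the standard Snakemake entry points named in this README (the profile directory name and core counts are placeholders to adapt to your setup):

```bash
# Sketch; adjust paths, profile, and core count to your setup.
# Dry-run first to check the execution plan:
snakemake --configfile config.yaml -n

# Local execution with 8 cores:
snakemake --configfile config.yaml -j 8

# Cluster execution, assuming the SLURM settings are wrapped in a
# Snakemake profile (see profile_slurm.yaml):
snakemake --configfile config.yaml --profile profile_slurm
```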
## Configuration

The behaviour of the pipeline is controlled via:

- `config.yaml`: main configuration file (genome paths, sample IDs, parameters)
- `profile_slurm.yaml`: parameters and settings for SLURM (if using a cluster)

Edit `config.yaml` to point to your reference genome, aligned reads, and other evidence files.
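The exact schema is defined by the pipeline, so treat the following as a hypothetical sketch of the kinds of keys `config.yaml` typically holds in Snakemake workflows (every key name here is illustrative; check the shipped `config.yaml` for the real schema):

```yaml
# Hypothetical config.yaml sketch -- key names are illustrative only.
genome: /path/to/reference.fasta
samples:
  sample1: /path/to/sample1.aligned.bam
  sample2: /path/to/sample2.aligned.bam
evidence:
  proteins: /path/to/proteins.fasta   # optional
outdir: results/
threads: 8
```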
## Pipeline Workflow

Rough outline of the major steps / rules (in `rules/`):
- Preprocessing of reads / alignments
- Transcript feature extraction
- Long-read informed exon/intron boundary refinement
- Evidence merging with other annotation sources
- Final structural annotation (e.g. GFF3 output)
- QC and filtering steps
Refer to the individual rule files in rules/ for detailed logic.
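Snakemake's built-in introspection flags can help when exploring the workflow; a sketch of standard Snakemake usage (rule names are placeholders, and the graph command requires Graphviz):

```bash
# Dry-run: list the jobs that would execute, without running anything.
snakemake -n

# Render the rule dependency graph (requires Graphviz's `dot`).
snakemake --rulegraph | dot -Tpng > rulegraph.png

# Run the workflow only up to a given rule (see rules/ for names).
snakemake -j 4 --until <rule_name>
```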
## Scripts & Rules

- `scripts/`: utility scripts used by the workflow (e.g. parsing, filtering)
- `snakefile`: main workflow entry point
- `rules/`: sub-rules modularizing the steps
- `lr_annot.py`: core Python module / driver (if used in the pipeline)

You can read through them to see custom parameters, function calls, and expected behavior.
## Output

Typical outputs include:
- GFF3 / GTF annotated structural models
- Transcript / exon / intron files
- QC reports
- Intermediate alignment / feature files
Output paths and filenames are configurable via `config.yaml`.
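A quick sanity check on any GFF3 output is to tally feature types (column 3). The snippet below builds a three-line toy GFF3 so it runs stand-alone; point the same command at your real output file instead.

```shell
# Build a tiny toy GFF3 (tab-separated) so the example is self-contained.
{
  echo '##gff-version 3'
  printf 'chr1\tLR_annotation\tgene\t100\t900\t.\t+\t.\tID=gene1\n'
  printf 'chr1\tLR_annotation\tmRNA\t100\t900\t.\t+\t.\tID=mrna1;Parent=gene1\n'
  printf 'chr1\tLR_annotation\texon\t100\t300\t.\t+\t.\tParent=mrna1\n'
} > toy.gff3

# Tally feature types: skip header lines, take column 3, count occurrences.
grep -v '^#' toy.gff3 | cut -f3 | sort | uniq -c
```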
## Examples

(You may want to include a small example or test dataset to demonstrate pipeline execution. If you have one, mention it here, e.g.:)

- `example/`: folder with a toy genome + reads, config, and expected outputs
- Usage:

  ```bash
  cd example
  snakemake -j 4
  ```

- Compare the output GFF3 with the expected reference.

If you don't have an example yet, adding one in the future will help users get started.
## License & Citations

This project is released under the MIT License.

(c) 2025 Pablo A. Oti (or your name)

Please also cite the relevant tools and papers used in this pipeline, and cite this repository as:

> Oti, P. (2025). LR_annotation: Long-Read Guided Structural Annotation Pipeline. GitHub. https://github.com/pabloati/LR_annotation
## Contact / Support

For questions or issues, open an Issue on GitHub. You can also reach me at: your_email@example.com.
You might want to add:
- A Docker or Singularity container for reproducibility
- Automated tests / CI
- Support for additional evidence types
- Visualization modules
- More extensive examples & documentation