cbhindex/circos_plot

Fusion Circos Plot

This repository generates a fusion circos plot from an Excel table of fusion calls (for example MYH9::USP6). The workflow:

  1. Prepares plotting inputs through a dedicated data-flow module.
  2. Renders a circos plot through a dedicated plotting module.
  3. Saves the rendered figure as a PDF with a center summary and legend. When weighted mode is enabled, only the top-fusion partner genes are annotated on the outer ring. The center summary shows total samples, unique fusions, and up to three ranked fusions (Top, Second, Third), depending on how many unique fusions are available.
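
The "up to three ranked fusions" rule can be sketched as follows. This is a minimal illustration, not the code in fusion_circos/plotting.py: the function name `center_summary`, the exact label text, and the assumption that each row contributes one sample are all hypothetical, and the counts are invented.

```python
def center_summary(ranked):
    """Build summary lines from (fusion_name, count) pairs sorted by count, descending."""
    labels = ("Top", "Second", "Third")
    # Assumption: one fusion call per sample, so the counts sum to the sample total.
    total_samples = sum(count for _, count in ranked)
    lines = [f"Total samples: {total_samples}", f"Unique fusions: {len(ranked)}"]
    # zip stops at the shorter input, so two unique fusions yield only Top and Second.
    for label, (name, count) in zip(labels, ranked):
        lines.append(f"{label}: {name} (n={count})")
    return lines


# Invented counts, for illustration only.
summary = center_summary([("MYH9::USP6", 3), ("COL1A1::USP6", 2)])
print("\n".join(summary))
```

With only two unique fusions the Third line is simply omitted; with more than three, the extra fusions are counted but not listed.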

Data-flow sequence: Read Excel -> Resolve fusion column -> Split fusion names -> Resolve gene loci -> Aggregate frequencies -> Rank fusion keys -> Build color map.
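
The split, aggregate, and rank steps of this sequence might look roughly like the sketch below. It assumes fusion strings use the `gene1::gene2` form shown above and treats each direction as its own key; the helper names `split_fusion` and `rank_fusions` are hypothetical and the input list is invented.

```python
from collections import Counter


def split_fusion(name):
    """Split 'GENE1::GENE2' into a (gene1, gene2) tuple."""
    gene1, sep, gene2 = name.strip().partition("::")
    gene1, gene2 = gene1.strip(), gene2.strip()
    if not sep or not gene1 or not gene2:
        raise ValueError(f"not a fusion string: {name!r}")
    return gene1, gene2


def rank_fusions(names):
    """Count each directed gene pair and return pairs ordered by frequency."""
    counts = Counter(split_fusion(n) for n in names)
    # most_common keeps first-seen order among pairs with equal counts.
    return counts.most_common()


# Invented input, for illustration only.
calls = ["MYH9::USP6", "COL1A1::USP6", "MYH9::USP6", "CALU::USP6",
         "MYH9::USP6", "COL1A1::USP6"]
ranked = rank_fusions(calls)
for (gene1, gene2), count in ranked:
    print(f"{gene1}::{gene2}\t{count}")
```

Keeping `(gene1, gene2)` ordered means a reciprocal call such as `USP6::MYH9` would be counted separately from `MYH9::USP6`.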

Rendering sequence: Build base tracks -> Draw fusion links -> Annotate top-fusion partner genes (weighted mode only) -> Apply inner circle -> Add center summary -> Finalize layout -> Add legend -> Save PDF.

Color policy:

  • Chromosome sectors use a rainbow sequence ordered as Y, 1..22, X.
  • Fusion link/ribbon colors are determined by the source gene chromosome (gene1::gene2 uses gene1 chromosome color).
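
One way to express this policy is sketched below using only the standard library. The evenly spaced HSV hues, the saturation and value constants, and the bare chromosome names (`"Y"`, `"1"`, ..., `"X"`) are assumptions; the actual palette lives in fusion_circos/style.py and may differ.

```python
import colorsys

# Sector order from the policy above: Y, 1..22, X.
CHROMOSOMES = ["Y"] + [str(i) for i in range(1, 23)] + ["X"]


def rainbow_colors(names):
    """Map each sector name to a hex color, stepping evenly around the hue wheel."""
    n = len(names)
    colors = {}
    for i, name in enumerate(names):
        r, g, b = colorsys.hsv_to_rgb(i / n, 0.85, 0.9)
        colors[name] = "#{:02x}{:02x}{:02x}".format(
            round(r * 255), round(g * 255), round(b * 255)
        )
    return colors


def link_color(gene1_chrom, sector_colors):
    """A link takes the color of the source gene's chromosome (gene1 in gene1::gene2)."""
    return sector_colors[gene1_chrom]


sector_colors = rainbow_colors(CHROMOSOMES)
print(link_color("22", sector_colors))
```

Because the color depends only on gene1, the two directions of a reciprocal pair are drawn in different colors whenever the genes sit on different chromosomes.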

Repository Structure

circos_plot/
|-- fusion_plot.py                 # CLI entrypoint and orchestration
|-- interactive_fusion_plot.py     # Interactive HTML CLI entrypoint and orchestration
`-- fusion_circos/
    |-- cli.py                     # Argument parsing and primitive CLI type conversion
    |-- config.py                  # Central runtime configuration values
    |-- dataflow.py                # Data preparation pipeline before plotting
    |-- fusion_io.py               # Fusion-column resolution and fusion-string splitting
    |-- gene_locator.py            # Ensembl-based gene-locus lookup with JSON cache
    |-- genome.py                  # Genome metadata and sector construction
    |-- geometry.py                # Coordinate transforms and edge clamping
    |-- interactive/
    |   |-- cli.py                 # Argument parsing for interactive HTML export
    |   `-- plotting.py            # Plotly-based interactive circos rendering helpers
    |-- plotting.py                # Rendering helpers for tracks, links, optional top-fusion outer labels, center summary, and legend
    |-- style.py                   # Backend-agnostic style tokens and visual mappings
    `-- locus_type.py              # Core genomic interval dataclass
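
To give a feel for the two smallest modules, here is a hedged sketch of what a genomic interval dataclass and an edge-clamping helper could look like. The names `Locus` and `clamp_to_sector`, their fields, and the `margin` parameter are hypothetical and are not taken from locus_type.py or geometry.py; the coordinates are placeholders, not real gene positions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Locus:
    """A genomic interval on one chromosome (hypothetical field names)."""
    chrom: str
    start: int
    end: int

    def __post_init__(self):
        if self.start > self.end:
            raise ValueError("start must not exceed end")

    @property
    def midpoint(self):
        return (self.start + self.end) / 2


def clamp_to_sector(position, sector_size, margin=0.0):
    """Keep a position inside [margin, sector_size - margin] so links stay within the sector."""
    return max(margin, min(position, sector_size - margin))


# Placeholder coordinates, for illustration only.
locus = Locus("17", 5_000_000, 5_100_000)
print(locus.midpoint, clamp_to_sector(250, 200), clamp_to_sector(195, 200, margin=10))
```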

Environment Setup

Use Conda to create a reproducible environment.

conda env create -f environment.yml
conda activate fusion_circos

Notes:

  • numpy is pinned to <2 to avoid ABI incompatibility issues.
  • On osx-arm64, some dependencies install more reliably via pip (these are already listed under pip in environment.yml).

Verify installation:

python -c "import numpy,pandas,openpyxl,matplotlib,pycirclize,plotly,pyensembl; print('ok', numpy.__version__)"

CLI Usage

Run:

python fusion_plot.py \
  --excel_file source_data/nodular_fasciitis.xlsx \
  --excel_sheet Sheet1 \
  --fusion_column_name Fusions \
  --output_pdf nodular_fasciitis_circos.pdf \
  --save_next_to_script true \
  --connection_style ribbon \
  --frequency_weighted_links true

Note: --ensembl_release defaults to 110, so it is omitted above.
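
The command above passes booleans as the literal string `true` and the sheet as either a name or an index, so the CLI needs small type converters. The sketch below shows one way to do this with argparse. It is not the code in fusion_circos/cli.py: `parse_bool`, `parse_sheet`, the accepted spellings beyond `true`/`false`, and every default except `--ensembl_release 110` are assumptions.

```python
import argparse


def parse_bool(text):
    """Convert 'true'/'false' style strings to bool (accepted spellings are an assumption)."""
    value = text.strip().lower()
    if value in {"true", "1", "yes"}:
        return True
    if value in {"false", "0", "no"}:
        return False
    raise argparse.ArgumentTypeError(f"expected true or false, got {text!r}")


def parse_sheet(text):
    """Return an int sheet index when the text is numeric, else the sheet name."""
    try:
        return int(text)
    except ValueError:
        return text


def build_parser():
    parser = argparse.ArgumentParser(description="Fusion circos plot (sketch)")
    parser.add_argument("--excel_file", required=True)
    parser.add_argument("--excel_sheet", type=parse_sheet, default=0)
    parser.add_argument("--save_next_to_script", type=parse_bool, default=False)
    parser.add_argument("--connection_style", choices=["line", "ribbon"], default="ribbon")
    parser.add_argument("--frequency_weighted_links", type=parse_bool, default=False)
    parser.add_argument("--ensembl_release", type=int, default=110)
    return parser


args = build_parser().parse_args([
    "--excel_file", "source_data/nodular_fasciitis.xlsx",
    "--excel_sheet", "Sheet1",
    "--save_next_to_script", "true",
    "--frequency_weighted_links", "true",
])
print(args.excel_sheet, args.save_next_to_script, args.ensembl_release)
```

Using `type=` callables keeps conversion errors inside argparse, so a typo such as `--save_next_to_script ture` fails with a usage message instead of being silently treated as a truthy string.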

Interactive HTML Usage

Run:

python interactive_fusion_plot.py \
  --excel_file source_data/nodular_fasciitis.xlsx \
  --excel_sheet Sheet1 \
  --fusion_column_name Fusions \
  --output_html nodular_fasciitis_interactive.html \
  --save_next_to_script true \
  --connection_style ribbon \
  --frequency_weighted_links true

This command writes one standalone offline HTML file. You can open it directly by double-clicking in Finder/File Explorer (no running server required).

Architecture

  • fusion_plot.py: Thin orchestration entrypoint (CLI, path resolution, cache wiring, pipeline invocation).
  • fusion_circos/dataflow.py: Owns pre-render data preparation and returns DataflowResult.
  • fusion_circos/plotting.py: Owns rendering implementation (tracks, links, optional top-fusion outer labels, center summary, legend, figure save helpers).
  • interactive_fusion_plot.py + fusion_circos/interactive/: Own interactive HTML pipeline (Plotly renderer + standalone HTML export).

Arguments:

  • --excel_file (str): Path to input Excel file.
  • --excel_sheet (int | str): Sheet index or sheet name.
  • --fusion_column_name (str): Column storing fusion strings. Canonicalized fallback matching is always enabled (case/space/underscore tolerant).
  • --output_pdf (str): Output PDF path or filename.
  • --save_next_to_script (bool): Save output next to fusion_plot.py if true.
  • --connection_style (line | ribbon): Fusion link rendering style.
  • --frequency_weighted_links (bool): If true, ribbon width scales with fusion frequency; line mode stays uniform-width for both true and false. In PDF output, true also annotates only the top-fusion partner genes on the outer ring, and legend (n=...) counts are shown only when this option is true. If false, all observed fusion pairs use uniform-width links, and reciprocal directions remain visually separable.
  • --ensembl_release (int): Optional Ensembl release for locus lookup. Default is 110.
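
The case/space/underscore-tolerant matching described for --fusion_column_name could work along these lines. This is a sketch under assumptions: `canonicalize` and `resolve_column` are hypothetical names, and preferring an exact match before falling back is a design choice made here, not something confirmed for fusion_circos/fusion_io.py.

```python
import re


def canonicalize(name):
    """Lowercase and drop whitespace and underscores so 'Fusion_Name' matches 'fusion name'."""
    return re.sub(r"[\s_]+", "", str(name)).lower()


def resolve_column(requested, columns):
    """Return the actual column matching `requested`, trying an exact match first."""
    if requested in columns:
        return requested
    target = canonicalize(requested)
    matches = [c for c in columns if canonicalize(c) == target]
    if len(matches) == 1:
        return matches[0]
    if not matches:
        raise KeyError(f"no column matches {requested!r}")
    raise ValueError(f"{requested!r} is ambiguous: {matches}")


# Invented header rows, for illustration only.
print(resolve_column("Fusions", ["Sample ID", "fusions "]))
print(resolve_column("fusion_name", ["Sample ID", "Fusion Name"]))
```

Raising on an ambiguous match avoids silently picking the wrong column when a sheet contains, for example, both `Fusions` and `fusions`.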

Runtime Outputs

  • Plot PDF at the path passed to --output_pdf.
  • Interactive standalone HTML at the path passed to --output_html.
  • Runtime cache files under runtime/:
    • runtime/pyensembl_cache/
    • runtime/gene_locus_cache.json
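
A JSON-backed locus cache like runtime/gene_locus_cache.json can be sketched as below. The file layout (gene name mapped to a chrom/start/end record) and the helper names are assumptions, and a stub with placeholder coordinates stands in for the real Ensembl lookup so the example runs without downloading any data.

```python
import json
import tempfile
from pathlib import Path


def load_cache(path):
    """Read the cache file if it exists, else start with an empty cache."""
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))
    return {}


def save_cache(path, cache):
    """Write the cache as JSON, creating the parent directory if needed."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(cache, indent=2, sort_keys=True), encoding="utf-8")


def cached_lookup(gene, cache, lookup):
    """Call `lookup` only for genes that are not cached yet."""
    if gene not in cache:
        cache[gene] = lookup(gene)
    return cache[gene]


lookup_calls = []


def fake_lookup(gene):
    """Stub for the real gene-locus lookup; returns placeholder coordinates."""
    lookup_calls.append(gene)
    return {"chrom": "0", "start": 1, "end": 2}


with tempfile.TemporaryDirectory() as tmp:
    cache_path = Path(tmp) / "runtime" / "gene_locus_cache.json"
    cache = load_cache(cache_path)
    cached_lookup("USP6", cache, fake_lookup)      # miss: calls the stub
    save_cache(cache_path, cache)
    reloaded = load_cache(cache_path)
    cached_lookup("USP6", reloaded, fake_lookup)   # hit: served from the file

print(lookup_calls)
```

The second run never reaches the lookup function, which is the point of persisting the cache between invocations.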

License

This project is licensed under the Apache License 2.0. See LICENSE for details.
