vep attributions #49

MuhammedHasan · 2025-12-10T19:02:52Z

No description provided.

Copilot

Pull request overview

This PR adds variant effect attribution functionality to DECIMA, enabling computation of feature attributions for genetic variants to interpret their effects on gene expression through transcription factor motifs. The changes refactor the VEP module into separate files for better organization and add comprehensive attribution analysis capabilities.

Key changes:

Refactored VEP code from a single __init__.py into separate vep.py and attributions.py modules for better code organization
Added variant_effect_attribution function to compute attributions for variants with support for different attribution methods and transformations
Introduced VariantAttributionResult class extending AttributionResult to handle variant-specific attribution data and analysis
Added CLI command vep-attribution for variant attribution analysis
Enhanced Attribution class with subtraction operator (__sub__) for differential attribution analysis

Reviewed changes

Copilot reviewed 12 out of 14 changed files in this pull request and generated 15 comments.

Show a summary per file

File	Description
src/decima/vep/init.py	Refactored to import from new submodules instead of containing all VEP code
src/decima/vep/vep.py	New file containing variant effect prediction functions moved from `__init__.py` with extracted helper functions
src/decima/vep/attributions.py	New file with variant effect attribution functionality including the main `variant_effect_attribution` function
src/decima/utils/io.py	Added `VariantAttributionWriter` class for writing variant attribution results to HDF5 files
src/decima/core/attribution.py	Added `__sub__` method to `Attribution` class; added `VariantAttributionResult` class; refactored `AttributionResult` methods to support variant-gene pairs
src/decima/cli/vep_attribution.py	New CLI command for variant attribution analysis with comprehensive options
src/decima/cli/init.py	Registered new `vep-attribution` CLI command
tests/test_vep_attributions.py	New test file for variant attribution functionality
tests/test_vep.py	Updated import path for VEP functions
tests/test_interpret_attribution.py	Added tests for `Attribution.__sub__` method
tests/test_attribution.py	Changed assertions to use `all()` for numpy array comparisons; removed duplicate assertion
tests/test_cli.py	Added test for new `vep-attribution` CLI command
.gitignore	Added `logs/` directory to gitignore

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

src/decima/vep/vep.py

src/decima/core/attribution.py

Copilot · 2025-12-18T21:08:24Z

src/decima/vep/attributions.py

+                output_prefix=(str(output_prefix) + "_{model}").format(model=idx),
+                tasks=tasks,
+                off_tasks=off_tasks,
+                model=model,


In the recursive call at line 125, the parameter model is being passed, but it should be the individual model from the loop variable. The loop iterates over models with for idx, model in enumerate(models), so this passes the original model parameter instead of the current iteration's model, which will cause infinite recursion or incorrect behavior.

src/decima/cli/vep_attribution.py

src/decima/core/attribution.py

Copilot · 2025-12-18T21:08:26Z

src/decima/vep/attributions.py

+        df_variant (Union[pd.DataFrame, str]): Input variants. Can be a pandas DataFrame or a path
+            to a file (.tsv, .csv, or .vcf). If a file path is provided, it will be loaded.
+            Required columns/fields depend on the input format but generally include chromosome,
+            position, reference allele, alternate allele.
+        output_prefix (str): Path to the output HDF5 file where attributions will be saved.


The docstring parameter name is df_variant but the actual parameter is named variants. This inconsistency could confuse users reading the documentation.

Copilot · 2025-12-18T21:08:26Z

src/decima/cli/vep_attribution.py

+    "--metadata",
+    default=None,
+    callback=parse_metadata,
+    help=f"Path to the metadata anndata file or name of the model. If not provided, the compabilite metadata for the model will be used. Default: {DEFAULT_ENSEMBLE}.",


The word 'compabilite' should be 'compatible'.

Suggested change

help=f"Path to the metadata anndata file or name of the model. If not provided, the compabilite metadata for the model will be used. Default: {DEFAULT_ENSEMBLE}.",

help=f"Path to the metadata anndata file or name of the model. If not provided, the compatible metadata for the model will be used. Default: {DEFAULT_ENSEMBLE}.",

Copilot · 2025-12-18T21:08:26Z

src/decima/core/attribution.py

                    self.attribution_h5,
                    self._idx[gene],
-                    gene,
+                    "_".join(gene),


The code at line 996 uses "_".join(gene) which works when gene is a tuple of (variant, gene_name) from VariantAttributionResult, but would fail when gene is a string in the base AttributionResult case. The genes parameter type hint should be Union[List[str], List[Tuple[str, str]]] to accurately reflect this polymorphic behavior, or the API should be refactored to make it clearer and type-safe.

Suggested change

"_".join(gene),

gene if isinstance(gene, str) else "_".join(gene),

Copilot · 2025-12-18T21:08:27Z

src/decima/core/attribution.py

+            )
+            return attribution_alt - attribution
+
+    def _get_metadata(self, idx: List[str], metadata_anndata: Optional[str] = None, custom_genome: bool = False):


The type hint for the idx parameter is incorrect. It's declared as List[str] but should be Union[int, List[int]] since it's used to index into self.genes and can be either a single index (as called from load_attribution at line 895) or a list of indices (as called from recursive_seqlet_calling at line 989).

Suggested change

def _get_metadata(self, idx: List[str], metadata_anndata: Optional[str] = None, custom_genome: bool = False):

def _get_metadata(self, idx: Union[int, List[int]], metadata_anndata: Optional[str] = None, custom_genome: bool = False):

Copilot · 2025-12-18T21:08:27Z

src/decima/utils/io.py

+    def add(
+        self,
+        variant: str,
+        gene: str,
+        rel_pos: int,
+        seqs_ref: np.ndarray,
+        attrs_ref: np.ndarray,
+        seqs_alt: np.ndarray,
+        attrs_alt: np.ndarray,
+        gene_mask_start: int,
+        gene_mask_end: int,
+    ):
+        """Add attribution data for a variant gene pair.
+
+        Args:
+            variant: Variant name from the variants list.
+            gene: Gene name from the genes list.
+            attrs: Attribution scores, shape (4, DECIMA_CONTEXT_SIZE).
+            seqs: One-hot DNA sequence, shape (4, DECIMA_CONTEXT_SIZE).
+        """
+        super().add((variant, gene), seqs_ref, attrs_ref, gene_mask_start, gene_mask_end)
+        idx = self.idx[(variant, gene)]
+        self.h5_writer["variants"][idx] = np.array(variant, dtype="S100")
+        self.h5_writer["rel_pos"][idx] = int(rel_pos)
+        self.h5_writer["sequence_alt"][idx, :] = convert_input_type(
+            torch.from_numpy(seqs_alt),  # convert_input_type only support Tensor


This method requires 10 positional arguments, whereas overridden AttributionWriter.add requires at most 6.

Suggested change

def add(

self,

variant: str,

gene: str,

rel_pos: int,

seqs_ref: np.ndarray,

attrs_ref: np.ndarray,

seqs_alt: np.ndarray,

attrs_alt: np.ndarray,

gene_mask_start: int,

gene_mask_end: int,

):

"""Add attribution data for a variant gene pair.

Args:

variant: Variant name from the variants list.

gene: Gene name from the genes list.

attrs: Attribution scores, shape (4, DECIMA_CONTEXT_SIZE).

seqs: One-hot DNA sequence, shape (4, DECIMA_CONTEXT_SIZE).

"""

super().add((variant, gene), seqs_ref, attrs_ref, gene_mask_start, gene_mask_end)

idx = self.idx[(variant, gene)]

self.h5_writer["variants"][idx] = np.array(variant, dtype="S100")

self.h5_writer["rel_pos"][idx] = int(rel_pos)

self.h5_writer["sequence_alt"][idx, :] = convert_input_type(

torch.from_numpy(seqs_alt), # convert_input_type only support Tensor

def add(self, *args, **kwargs):

"""Add attribution data for a variant-gene pair.

This method supports the original extended calling convention:

add(

variant, gene, rel_pos,

seqs_ref, attrs_ref,

seqs_alt, attrs_alt,

gene_mask_start, gene_mask_end,

)

and can also be called in a base-class style where the first argument

is a (variant, gene) key:

add(

(variant, gene),

seqs_ref, attrs_ref,

gene_mask_start, gene_mask_end,

rel_pos=..., seqs_alt=..., attrs_alt=...,

)

"""

# Case 1: extended subclass-specific signature with 9 positional args.

if len(args) == 9:

(

variant,

gene,

rel_pos,

seqs_ref,

attrs_ref,

seqs_alt,

attrs_alt,

gene_mask_start,

gene_mask_end,

) = args

# Case 2: base-class-like positional prefix, with extras supplied via

# remaining positional args or keyword arguments.

elif len(args) >= 5:

key, seqs_ref, attrs_ref, gene_mask_start, gene_mask_end = args[:5]

try:

variant, gene = key

except (TypeError, ValueError): # not iterable of length 2

raise TypeError(

"First argument must be (variant, gene) when calling "

"VariantAttributionWriter.add with base-class-style arguments."

)

remaining = args[5:]

if len(remaining) == 3:

rel_pos, seqs_alt, attrs_alt = remaining

else:

try:

rel_pos = kwargs.pop("rel_pos")

seqs_alt = kwargs.pop("seqs_alt")

attrs_alt = kwargs.pop("attrs_alt")

except KeyError as e:

raise TypeError(

f"Missing required keyword argument {e.args[0]!r} for "

"VariantAttributionWriter.add."

)

else:

raise TypeError(

"VariantAttributionWriter.add expected either 9 positional "

"arguments or at least 5 positional arguments matching the "

"base-class signature."

)

super().add((variant, gene), seqs_ref, attrs_ref, gene_mask_start, gene_mask_end)

idx = self.idx[(variant, gene)]

self.h5_writer["variants"][idx] = np.array(variant, dtype="S100")

self.h5_writer["rel_pos"][idx] = int(rel_pos)

self.h5_writer["sequence_alt"][idx, :] = convert_input_type(

torch.from_numpy(seqs_alt), # convert_input_type only supports Tensor

Muhammed Hasan Celik added 2 commits December 10, 2025 19:02

vep attributions

0e7f1df

tutorial added and more testcases

37db685

MuhammedHasan requested review from avantikalal and Copilot December 18, 2025 21:01

Copilot started reviewing on behalf of MuhammedHasan December 18, 2025 21:01 View session

Copilot AI reviewed Dec 18, 2025

View reviewed changes

Muhammed Hasan Celik added 2 commits December 25, 2025 05:00

docs added & test fix

020e083

doc string fix

bfe97f2

MuhammedHasan merged commit fff776d into main Dec 25, 2025
8 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

vep attributions #49

vep attributions #49

Uh oh!

MuhammedHasan commented Dec 10, 2025

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

Copilot AI Dec 18, 2025

Uh oh!

Uh oh!

Uh oh!

Copilot AI Dec 18, 2025

Uh oh!

Copilot AI Dec 18, 2025

Uh oh!

Copilot AI Dec 18, 2025

Uh oh!

Copilot AI Dec 18, 2025

Uh oh!

Copilot AI Dec 18, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

	help=f"Path to the metadata anndata file or name of the model. If not provided, the compabilite metadata for the model will be used. Default: {DEFAULT_ENSEMBLE}.",
	help=f"Path to the metadata anndata file or name of the model. If not provided, the compatible metadata for the model will be used. Default: {DEFAULT_ENSEMBLE}.",

	"_".join(gene),
	gene if isinstance(gene, str) else "_".join(gene),

	def _get_metadata(self, idx: List[str], metadata_anndata: Optional[str] = None, custom_genome: bool = False):
	def _get_metadata(self, idx: Union[int, List[int]], metadata_anndata: Optional[str] = None, custom_genome: bool = False):

vep attributions #49

vep attributions #49

Uh oh!

Conversation

MuhammedHasan commented Dec 10, 2025

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Copilot AI Dec 18, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Copilot AI Dec 18, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Dec 18, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Dec 18, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Dec 18, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Dec 18, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants