Model interpretability and understanding for PyTorch
Shapley Interactions and Shapley Values for Machine Learning (see the Shapley value sketch after this list)
Zennit is a high-level framework in Python, built on PyTorch, for explaining and exploring neural networks with attribution methods such as LRP.
An open-source library for the interpretability of time series classifiers
Collection of NLP model explanations and accompanying analysis tools
Explainable AI in Julia.
A set of notebooks guiding the process of fine-grained image classification of bird species using PyTorch-based deep neural networks.
Counterfactual SHAP: a framework for counterfactual feature importance
Materials for "Quantifying the Plausibility of Context Reliance in Neural Machine Translation" at ICLR'24 🐑 🐑
This article explores the theory behind explainable car pricing using value decomposition, showing how machine learning models can break a predicted price into intuitive components such as brand premium, age depreciation, mileage influence, condition effects, and transmission or fuel-type adjustments.
Materials for the Lab "Explaining Neural Language Models from Internal Representations to Model Predictions" at AILC LCL 2023 🔍
The official repo for the EACL 2023 paper "Quantifying Context Mixing in Transformers"
Similarity-first interpretability studio for breast tumor samples: pick a case, find its closest “twins” (benign/malignant look-alikes), visualize neighborhood structure, compare feature fingerprints, and run minimal-change counterfactual edits toward a target class (a gradient-based sketch of such edits follows this list). Educational demo only, not for diagnosis.
Code and data for the ACL 2023 NLReasoning Workshop paper "Saliency Map Verbalization: Comparing Feature Importance Representations from Model-free and Instruction-based Methods" (Feldhus et al., 2023)
Efficient and accurate explanation estimation with distribution compression (ICLR 2025 Spotlight)
⛈️ Code for the paper "End-to-End Prediction of Lightning Events from Geostationary Satellite Images"
Implementation of the Integrated Directional Gradients method for deep neural network model explanations (a plain Integrated Gradients sketch follows this list).
Sum-of-Parts: Self-Attributing Neural Networks with End-to-End Learning of Feature Groups
Reproducible code for our paper "Explainable Learning with Gaussian Processes"
Robustness of Global Feature Effect Explanations (ECML PKDD 2024)
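
Several entries above center on Shapley values (the Shapley interactions library and Counterfactual SHAP). As a minimal sketch of the underlying idea, and not code from any listed repository, the following Python snippet computes exact Shapley values by enumerating all coalitions of features. The additive toy payoff is a hypothetical stand-in, and the enumeration is exponential in the number of features, so it only suits a handful of them.

from itertools import combinations
from math import factorial

def shapley_values(value_fn, n_features):
    # Exact Shapley values: for each feature i, average its marginal
    # contribution value_fn(S | {i}) - value_fn(S) over all coalitions S
    # that exclude i, weighted by |S|! * (n - |S| - 1)! / n!.
    phi = [0.0] * n_features
    n_fact = factorial(n_features)
    for i in range(n_features):
        others = [j for j in range(n_features) if j != i]
        for size in range(len(others) + 1):
            for subset in combinations(others, size):
                S = frozenset(subset)
                weight = factorial(len(S)) * factorial(n_features - len(S) - 1) / n_fact
                phi[i] += weight * (value_fn(S | {i}) - value_fn(S))
    return phi

# Hypothetical additive payoff: each present feature contributes a fixed
# weight, so the Shapley values recover those weights exactly.
weights = [2.0, -1.0, 0.5]
payoff = lambda S: sum(weights[j] for j in S)
print(shapley_values(payoff, 3))  # -> [2.0, -1.0, 0.5]

Practical libraries replace this exponential enumeration with sampling or model-specific shortcuts (e.g. TreeSHAP for tree ensembles).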
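The counterfactual entries (Counterfactual SHAP, and the minimal-change edits in the tumor-sample studio) revolve around finding a nearby input that the model assigns to a different class. Below is one common formulation as a hedged sketch, assuming a differentiable classifier: minimize cross-entropy toward the target class plus an L2 closeness penalty. The toy model, penalty weight, and optimizer settings are illustrative assumptions, not taken from those repositories.

import torch

def counterfactual_edit(model, x, target, lam=0.1, lr=0.05, steps=200):
    # Gradient-based minimal-change counterfactual: find x_cf close to x
    # (L2 penalty, weighted by lam) whose predicted class is `target`,
    # by minimizing cross-entropy toward the target class.
    x_cf = x.clone().requires_grad_(True)
    opt = torch.optim.Adam([x_cf], lr=lr)
    tgt = torch.tensor([target])
    for _ in range(steps):
        opt.zero_grad()
        logits = model(x_cf.unsqueeze(0))
        loss = torch.nn.functional.cross_entropy(logits, tgt) \
               + lam * torch.sum((x_cf - x) ** 2)
        loss.backward()
        opt.step()
    return x_cf.detach()

# Hypothetical usage with a toy classifier.
model = torch.nn.Linear(4, 2)
x = torch.randn(4)
x_cf = counterfactual_edit(model, x, target=1)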
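Finally, the Integrated Directional Gradients entry builds on path attribution. The sketch below is plain Integrated Gradients (Sundararajan et al., 2017) in PyTorch, shown only to illustrate the family of methods; it is not the directional variant from that repository, and the zero baseline, step count, and toy model are assumptions.

import torch

def integrated_gradients(model, x, baseline=None, steps=50, target=0):
    # Average the model's gradients along the straight-line path from a
    # baseline to the input, then scale by (x - baseline). This is the
    # classic Integrated Gradients estimator (Riemann-sum approximation).
    if baseline is None:
        baseline = torch.zeros_like(x)
    alphas = torch.linspace(0.0, 1.0, steps).view(-1, *([1] * x.dim()))
    path = baseline + alphas * (x - baseline)   # shape: (steps, *x.shape)
    path.requires_grad_(True)
    out = model(path)[:, target].sum()          # target logit at every step
    grads = torch.autograd.grad(out, path)[0]
    return (x - baseline) * grads.mean(dim=0)

# Hypothetical usage: for a linear model the attributions for the target
# logit come out to w_i * x_i (the bias is not attributed to any input).
model = torch.nn.Linear(3, 2)
x = torch.randn(3)
print(integrated_gradients(model, x, target=1))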