Add Donut Regression Discontinuity functionality #610

Open
drbenvincent wants to merge 12 commits into main from donut

@drbenvincent (Collaborator) commented Dec 23, 2025

Closes #609

This pull request adds support for "donut" regression discontinuity designs (RDD), a robustness technique that excludes data near the treatment threshold to mitigate issues like manipulation or heaping. The changes introduce a new donut_hole parameter to the RegressionDiscontinuity class, update data filtering logic, enhance plotting and summary outputs, extend input validation, and provide comprehensive tests and documentation for the new feature.

Regression Discontinuity Model Enhancements

  • Added a new donut_hole parameter to RegressionDiscontinuity, allowing exclusion of observations within a specified distance from the treatment threshold. Data filtering now respects both bandwidth and donut hole constraints.
  • Improved input validation to ensure donut_hole is non-negative and less than bandwidth, raising errors for invalid configurations.
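
The filtering and validation rules described in the two bullets above can be sketched in plain Python. This is a minimal illustration, not CausalPy's implementation: the function names `validate_donut_hole` and `filter_running_variable` are hypothetical, the inclusive boundaries are an assumption, and `ValueError` follows the later review fix in this thread rather than the initial commit.

```python
def validate_donut_hole(donut_hole, bandwidth):
    """Reject invalid configurations (hypothetical helper, not CausalPy's API)."""
    if donut_hole < 0:
        raise ValueError("donut_hole must be non-negative")
    if donut_hole >= bandwidth:
        raise ValueError("donut_hole must be less than bandwidth")


def filter_running_variable(x_values, threshold, bandwidth=float("inf"), donut_hole=0.0):
    """Keep observations outside the donut hole but inside the bandwidth.

    Assumption: both boundaries are inclusive; the real implementation may differ.
    """
    validate_donut_hole(donut_hole, bandwidth)
    return [x for x in x_values if donut_hole <= abs(x - threshold) <= bandwidth]


x_values = [-3.0, -1.0, -0.25, 0.0, 0.25, 1.0, 3.0]
fit_values = filter_running_variable(x_values, threshold=0.0, bandwidth=2.0, donut_hole=0.5)
```

With `donut_hole=0.0` only the bandwidth applies, so the points nearest the threshold are retained.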

User Interface and Visualization

  • Updated summary output to report the donut hole parameter and the number of observations used for model fitting.
  • Enhanced plots to visually distinguish excluded data (light gray), fit data (black), and donut hole boundaries (orange dashed lines) for both Bayesian and OLS RDD models.

Testing and Documentation

  • Added tests covering donut hole behavior, validation, and interaction with bandwidth and running variable naming.
  • Expanded glossary to define "donut regression discontinuity," "heaping," and "manipulation," and referenced relevant literature for these concepts.
  • Added a new notebook example for donut RDD and updated the documentation index.

References

  • Added new bibliography entries for donut RDD, manipulation, heaping, and general RDD literature.

📚 Documentation preview 📚: https://causalpy--610.org.readthedocs.build/en/610/

Introduces a donut_hole parameter to RegressionDiscontinuity, allowing exclusion of observations within a specified distance from the treatment threshold for robustness against manipulation or heaping. Updates plotting, summary, and input validation to support donut RDD, adds comprehensive tests for donut_hole behavior and validation, expands glossary with donut RDD concepts, and provides a new notebook demonstrating donut RDD usage. References on donut RDD, manipulation, and heaping are added to the bibliography.
@drbenvincent added the "enhancement" label (New feature or request) Dec 23, 2025
codecov bot commented Dec 23, 2025

Codecov Report

❌ Patch coverage is 97.56098% with 3 lines in your changes missing coverage. Please review.
✅ Project coverage is 94.42%. Comparing base (156d3f7) to head (d69f1ff).
⚠️ Report is 3 commits behind head on main.

Files with missing lines | Patch % | Lines
causalpy/experiments/regression_discontinuity.py | 93.02% | 1 Missing and 2 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #610      +/-   ##
==========================================
+ Coverage   94.35%   94.42%   +0.06%     
==========================================
  Files          44       44              
  Lines        7517     7626     +109     
  Branches      456      466      +10     
==========================================
+ Hits         7093     7201     +108     
  Misses        262      262              
- Partials      162      163       +1     


Raised the number of generated observations from 500 to 1000 and increased manipulation parameters to better demonstrate the donut RDD approach. Updated output and data table examples to reflect the new data generation settings.
@drbenvincent marked this pull request as ready for review December 24, 2025 04:06
Added tests for warnings when bandwidth or donut_hole filters leave few datapoints, for unrecognized model types, and for donut hole boundary lines in OLS and Bayesian plots. Also updated interrogate badge coverage from 96.3% to 96.4%.
@drbenvincent requested review from NathanielF and juanitorduz and removed request for juanitorduz January 7, 2026 14:13
drbenvincent and others added 4 commits February 9, 2026 19:49
- Conditional two-layer scatter: only show excluded/fit data distinction
  when data is actually excluded; default case shows single "data" layer
- Fix Bayesian plot legend to include all labeled artists (donut
  boundaries, threshold, scatter labels) instead of only posterior mean
- Fix malformed warning when filter_desc is empty on small datasets
- Use ValueError instead of DataException for donut_hole param validation
- Update tests to expect ValueError for donut_hole validation

Co-authored-by: Cursor <cursoragent@cursor.com>
@drbenvincent (Collaborator, Author)

Review fixes (d69f1ff)

Addressed four issues identified during review:

1. Conditional scatter plots (was: misleading labels when donut_hole=0)

Both _bayesian_plot and _ols_plot now check has_exclusion = len(self.fit_data) < len(self.data). The lightgray "excluded data" layer is only drawn when there's actual exclusion. When no data is excluded, a single black scatter labeled "data" is drawn (matching the original behavior). When exclusion is active, the two-layer approach shows "excluded data" in gray and "fit data" in black.
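
The branching described above can be sketched without any plotting library. The helper `scatter_layers` is hypothetical and returns layer descriptions instead of drawing; how the real plot methods work out which points are excluded is an assumption here.

```python
def scatter_layers(data, fit_data):
    """Return the scatter layers to draw, as plain dicts (illustrative only)."""
    has_exclusion = len(fit_data) < len(data)
    if not has_exclusion:
        # Nothing was filtered out: a single black layer labeled "data".
        return [{"label": "data", "color": "black", "points": list(data)}]
    # Assumption: excluded points are those in data that are absent from fit_data.
    excluded = [p for p in data if p not in fit_data]
    return [
        {"label": "excluded data", "color": "lightgray", "points": excluded},
        {"label": "fit data", "color": "black", "points": list(fit_data)},
    ]


all_points = [(-1.0, 0.2), (-0.1, 0.5), (0.1, 0.9), (1.0, 1.3)]
kept_points = [(-1.0, 0.2), (1.0, 1.3)]
layers_with_donut = scatter_layers(all_points, kept_points)
layers_without_donut = scatter_layers(all_points, all_points)
```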

2. Bayesian legend fix (was: donut elements invisible in legend)

Replaced the custom ax.legend(handles=..., labels=...) that only showed "Posterior mean" with ax.legend(fontsize=LEGEND_FONT_SIZE), which auto-collects all labeled artists (scatter labels, treatment threshold, donut boundaries). The label="Posterior mean" is now passed directly to plot_xY so the line is auto-discovered by the legend.
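
A small matplotlib sketch of why the bare `ax.legend(...)` call picks everything up: any artist created with a `label` is collected automatically, while unlabeled artists are skipped. The data, label strings, and the value of `LEGEND_FONT_SIZE` are made up for illustration, and a plain `ax.plot` stands in for `plot_xY`.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

LEGEND_FONT_SIZE = 12  # assumed value; the real constant lives in CausalPy

fig, ax = plt.subplots()
ax.scatter([-0.2, 0.2], [0.5, 0.9], c="lightgray", label="excluded data")
ax.scatter([-1.5, -1.0, 1.0, 1.5], [0.1, 0.3, 1.2, 1.4], c="k", label="fit data")
ax.plot([-1.5, 1.5], [0.0, 1.5], label="Posterior mean")  # stand-in for plot_xY
ax.axvline(0.0, color="k", label="treatment threshold")
ax.axvline(-0.5, color="orange", ls="--", label="donut boundary")
ax.axvline(0.5, color="orange", ls="--")  # no label, so it is left out of the legend

# With no explicit handles/labels, the legend collects every labeled artist.
legend = ax.legend(fontsize=LEGEND_FONT_SIZE)
legend_labels = [text.get_text() for text in legend.get_texts()]
plt.close(fig)
```

Labeling only one of the two boundary lines keeps the legend from listing the same entry twice.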

3. Warning message fix (was: malformed message on small datasets)

Added a guard: when filter_desc is empty (the raw data has ≤10 rows and no filtering was applied), a clean fallback message "Only N datapoints in the dataset." is used instead of the malformed "Choice of parameters...".
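
A minimal sketch of such a guard, using the standard `warnings` module. The helper names are hypothetical, the 10-row cutoff is taken from the description above and assumed to apply to filtered data as well, and the wording of the filtered-case message is a placeholder rather than CausalPy's actual text.

```python
import warnings

MIN_DATAPOINTS = 10  # cutoff taken from the "≤10 rows" wording; assumed for both cases


def small_data_message(n_datapoints, filter_desc=""):
    """Build the warning text, or return None when there is enough data."""
    if n_datapoints > MIN_DATAPOINTS:
        return None
    if not filter_desc:
        # Fallback for small raw datasets where no filter was applied.
        return f"Only {n_datapoints} datapoints in the dataset."
    # Placeholder wording, not the message CausalPy emits.
    return f"Only {n_datapoints} datapoints remain after applying {filter_desc}."


def warn_if_few_datapoints(n_datapoints, filter_desc=""):
    """Emit a UserWarning when the message helper returns text."""
    message = small_data_message(n_datapoints, filter_desc)
    if message is not None:
        warnings.warn(message, UserWarning)
    return message
```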

4. ValueError for parameter validation (was: DataException for config errors)

Changed both donut_hole validation checks from DataException to ValueError, which is standard Python for invalid parameter values. Updated corresponding tests to expect ValueError.

@read-the-docs-community

Documentation build overview

📚 causalpy | 🛠️ Build #31338425 | 📁 Comparing d69f1ff against latest (278e947)

Show files changed (12 files in total): 📝 11 modified | ➕ 1 added | ➖ 0 deleted
File Status
404.html 📝 modified
genindex.html 📝 modified
knowledgebase/glossary.html 📝 modified
notebooks/index.html 📝 modified
notebooks/its_pymc_comparative.html 📝 modified
notebooks/rd_donut_pymc.html ➕ added
notebooks/sc_pymc.html 📝 modified
notebooks/sc_pymc_brexit.html 📝 modified
notebooks/staggered_did_pymc.html 📝 modified
api/generated/causalpy.experiments.regression_discontinuity.RegressionDiscontinuity.__init__.html 📝 modified
api/generated/causalpy.experiments.regression_discontinuity.RegressionDiscontinuity.html 📝 modified
_modules/causalpy/experiments/regression_discontinuity.html 📝 modified

Development

Successfully merging this pull request may close these issues.

Feature request: Donut ("Dohnut") Regression Discontinuity (RDD)