Skip to content

Avoid mutating sklearn estimators in DiD experiments#693

Open
drbenvincent wants to merge 4 commits intomainfrom
codex/issue-664-fit-intercept
Open

Avoid mutating sklearn estimators in DiD experiments#693
drbenvincent wants to merge 4 commits intomainfrom
codex/issue-664-fit-intercept

Conversation

@drbenvincent
Copy link
Collaborator

@drbenvincent drbenvincent commented Feb 2, 2026

Summary

  • Avoid mutating user-supplied sklearn estimators by cloning when fit_intercept=False is required
  • Emit a UserWarning when the clone occurs so users are aware and can fix their code
  • Centralize fit_intercept handling in BaseExperiment._ensure_sklearn_fit_intercept_false()
  • Add integration coverage to assert original estimators are unchanged and the warning fires
  • Fix codecov/patch failure by ensuring full branch coverage of the new method

Details

DiD experiments require fit_intercept=False because the design matrix already contains an explicit intercept column (~ 1 + ...). Previously, the experiment constructor mutated the user's model in-place (model.fit_intercept = False), which was a hidden side effect. Now the method clones the estimator via sklearn.base.clone(), sets the parameter on the copy, and issues a warning explaining why.

Coverage fix

The _ensure_sklearn_fit_intercept_false() method in base.py has a defensive isinstance guard that returns early for non-sklearn models (line 70). Originally, the method was called from inside elif isinstance(self.model, RegressorMixin): blocks in both diff_in_diff.py and staggered_did.py, which made the guard unreachable — the caller had already confirmed the model was sklearn, so the isinstance check could never be True. This caused the codecov/patch check to fail.

The fix moves the _ensure_sklearn_fit_intercept_false() call to before the model-type dispatch (the if PyMCModel / elif RegressorMixin block) in both experiment classes. Since the method already has a built-in guard for non-sklearn models, calling it unconditionally is safe and semantically cleaner. Now any PyMC integration test naturally exercises the early-return path, providing full branch coverage without artificial unit tests.

A new integration test test_did_sklearn_fit_intercept_false was also added to cover the "no warning needed" path (when the model already has fit_intercept=False).

Testing

MPLCONFIGDIR=/tmp/mplconfig XDG_CACHE_HOME=/tmp conda run -n CausalPy python -m pytest -o addopts= causalpy/tests/test_integration_skl_examples.py::test_did causalpy/tests/test_integration_skl_examples.py::test_did_sklearn_fit_intercept_false causalpy/tests/test_staggered_did.py::test_staggered_did_sklearn causalpy/tests/test_staggered_did.py::test_staggered_did_sklearn_model_without_fit_intercept

Issues

@drbenvincent drbenvincent added the bug Something isn't working label Feb 2, 2026
@read-the-docs-community
Copy link

read-the-docs-community bot commented Feb 2, 2026

Documentation build overview

📚 causalpy | 🛠️ Build #31334013 | 📁 Comparing 83e476a against latest (156d3f7)


🔍 Preview build

Show files changed (4 files in total): 📝 4 modified | ➕ 0 added | ➖ 0 deleted
File Status
404.html 📝 modified
_modules/causalpy/experiments/base.html 📝 modified
_modules/causalpy/experiments/diff_in_diff.html 📝 modified
_modules/causalpy/experiments/staggered_did.html 📝 modified

@drbenvincent drbenvincent marked this pull request as draft February 2, 2026 21:29
@drbenvincent drbenvincent marked this pull request as ready for review February 9, 2026 13:06
@codecov
Copy link

codecov bot commented Feb 9, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 94.39%. Comparing base (156d3f7) to head (83e476a).

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #693      +/-   ##
==========================================
+ Coverage   94.35%   94.39%   +0.03%     
==========================================
  Files          44       44              
  Lines        7517     7544      +27     
  Branches      456      456              
==========================================
+ Hits         7093     7121      +28     
  Misses        262      262              
+ Partials      162      161       -1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

drbenvincent and others added 3 commits February 9, 2026 13:54
…tion tests

Move the _ensure_sklearn_fit_intercept_false() call from inside the
`elif isinstance(self.model, RegressorMixin)` block to before the
model-type dispatch in both DiD and StaggeredDiD. This makes the
isinstance early-return guard (line 70 in base.py) naturally reachable
by existing PyMC integration tests, fixing the codecov/patch failure.

Also add test_did_sklearn_fit_intercept_false integration test covering
the "no warning needed" path when fit_intercept is already False.

Co-authored-by: Cursor <cursoragent@cursor.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

DifferenceInDifferences mutates fit_intercept on user models

1 participant