Add PiecewiseITS experiment for known interruption dates #614

Open
drbenvincent wants to merge 17 commits into main from piecewise-its

@drbenvincent
Collaborator

@drbenvincent drbenvincent commented Dec 24, 2025

Closes #613

This pull request introduces support for Piecewise Interrupted Time Series (ITS) analysis in the codebase. The main changes include adding a new experiment class, stateful patsy transforms for specifying level and slope changes at multiple intervention points, a simulation utility for generating piecewise ITS data, and updates to the package API and documentation to expose these new features.

Piecewise ITS support and API exposure:

  • Added the PiecewiseITS experiment class to the codebase and included it in the main package API (__init__.py) and experiments API (causalpy/experiments/__init__.py). This enables users to import and use PiecewiseITS directly.

Patsy transforms for segmented regression:

  • Introduced a new module causalpy/transforms.py providing stateful patsy transforms: step for level changes and ramp for slope changes at arbitrary intervention points. These can be used in regression formulas for flexible piecewise ITS modeling, supporting both numeric and datetime time variables.
  • Exposed step and ramp transforms in the main package API (__init__.py) to allow easy access.
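
For reference, a common way to write the two transforms and the resulting model for a single interruption at time $t^*$ is sketched below. The symbols $t^*$, $\beta_k$, and $\varepsilon_t$ are notation chosen here for illustration; they are not taken from the PR's code.

```latex
\begin{aligned}
\operatorname{step}(t, t^*) &= \mathbb{1}[t \ge t^*] \\
\operatorname{ramp}(t, t^*) &= \max(0,\; t - t^*) \\
y_t &= \beta_0 + \beta_1 t + \beta_2 \operatorname{step}(t, t^*) + \beta_3 \operatorname{ramp}(t, t^*) + \varepsilon_t
\end{aligned}
```

Under this parameterization, $\beta_2$ is the level change at $t^*$ and $\beta_3$ is the change in slope after $t^*$.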

Data simulation utilities:

  • Added the generate_piecewise_its_data function to causalpy/data/simulate_data.py for simulating time series data with multiple interventions, customizable level and slope changes, and ground truth counterfactuals for testing and demonstration.
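
To make the idea concrete, here is a minimal sketch of such a simulator. The function name `simulate_piecewise_its` and every parameter are hypothetical and are not the signature of `generate_piecewise_its_data`; the sketch only assumes a linear baseline trend with additive level and slope changes.

```python
import numpy as np


def simulate_piecewise_its(n=100, interruptions=(50,), level_changes=(3.0,),
                           slope_changes=(0.2,), intercept=10.0, slope=0.1,
                           noise_sd=1.0, seed=0):
    """Hypothetical simulator, not the API added in this PR."""
    rng = np.random.default_rng(seed)
    t = np.arange(n, dtype=float)
    # Ground-truth counterfactual: the baseline trend with no interventions.
    counterfactual = intercept + slope * t
    effect = np.zeros(n)
    for t_star, level, dslope in zip(interruptions, level_changes, slope_changes):
        post = t >= t_star
        effect += level * post                  # step: level change
        effect += dslope * (t - t_star) * post  # ramp: slope change
    y = counterfactual + effect + rng.normal(0.0, noise_sd, size=n)
    return {"t": t, "y": y, "counterfactual": counterfactual, "effect": effect}
```

Returning the counterfactual and the effect alongside `y` is what lets tests compare estimates against a known ground truth.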

Documentation and notebook updates:

  • Added a reference to a new notebook piecewise_its_pymc.ipynb in the documentation index to demonstrate piecewise ITS analysis.

Pre-commit configuration:

  • Updated the pre-commit configuration to exclude the new notebook piecewise_its_pymc.ipynb from large file checks, ensuring smoother development workflow.

📚 Documentation preview 📚: https://causalpy--614.org.readthedocs.build/en/614/

@drbenvincent added the enhancement and major labels Dec 24, 2025

@drbenvincent
Collaborator Author

bugbot run

@cursor

cursor bot commented Dec 24, 2025

PR Summary

Adds a new segmented-regression ITS workflow with explicit level/slope changes at known interruptions.

  • New PiecewiseITS experiment: builds step/ramp design matrix, supports PyMC and OLS, computes counterfactual/effects, plotting, summaries, and plot-data extraction
  • New simulator generate_piecewise_its_data for synthetic piecewise ITS datasets
  • Exposes PiecewiseITS via causalpy/__init__.py and experiments/__init__.py
  • Extensive unit/integration tests covering validation, OLS/PyMC paths, plotting, controls, and datetime time columns
  • Docs: add piecewise_its_pymc.ipynb to notebooks index; pre-commit excludes that notebook; update interrogate badge
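
One plausible reading of "computes counterfactual/effects" is sketched below with plain NumPy least squares: fit the full design matrix, then predict again with the step and ramp columns zeroed out. This is an assumption about the general technique, not a description of how `PiecewiseITS` is implemented, and all variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
t = np.arange(100, dtype=float)
t_star = 60.0

# Design matrix columns: intercept, time, step, ramp.
step = (t >= t_star).astype(float)
ramp = (t - t_star) * step
X = np.column_stack([np.ones_like(t), t, step, ramp])

# Synthetic outcome with a known level change (+4.0) and slope change (-0.2).
y = 5.0 + 0.3 * t + 4.0 * step - 0.2 * ramp + rng.normal(0.0, 0.5, size=t.size)

# Ordinary least squares fit.
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
fitted = X @ beta

# Counterfactual: same coefficients, intervention columns set to zero.
X_cf = X.copy()
X_cf[:, 2:] = 0.0
counterfactual = X_cf @ beta

# Estimated causal effect at each time point.
effect = fitted - counterfactual
```

Before the interruption the effect is zero by construction; at `t_star` it equals the estimated level change, because the ramp term is still zero there.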

Written by Cursor Bugbot for commit c14f15c. This will update automatically on new commits.

@codecov

codecov bot commented Dec 24, 2025

Codecov Report

❌ Patch coverage is 93.63958% with 54 lines in your changes missing coverage. Please review.
✅ Project coverage is 93.74%. Comparing base (560b4e9) to head (0f86191).
⚠️ Report is 3 commits behind head on main.

Files with missing lines                 Patch %   Lines
causalpy/experiments/piecewise_its.py    89.25%    11 Missing and 15 partials ⚠️
causalpy/tests/test_piecewise_its.py     96.23%    18 Missing and 1 partial ⚠️
causalpy/transforms.py                   88.57%    4 Missing and 4 partials ⚠️
causalpy/data/simulate_data.py           96.66%    0 Missing and 1 partial ⚠️
Additional details and impacted files
@@           Coverage Diff            @@
##             main     #614    +/-   ##
========================================
  Coverage   93.74%   93.74%            
========================================
  Files          41       44     +3     
  Lines        6827     7676   +849     
  Branches      458      517    +59     
========================================
+ Hits         6400     7196   +796     
- Misses        267      300    +33     
- Partials      160      180    +20     

Added detailed explanations comparing Piecewise ITS to Regression Discontinuity and Regression Kink designs. Introduced new real-world scenarios for level and slope changes, multiple interventions, and level-only models. Enhanced example code and output to illustrate these cases, improving clarity and practical guidance for users.
Improved clarity and conciseness throughout the Piecewise Interrupted Time Series (ITS) notebook. Rewrote several sections for better readability, combined and streamlined example scenarios, and clarified distinctions between level and slope changes, as well as the relationship to regression discontinuity and regression kink designs.
Refactors the PiecewiseITS experiment to use flexible patsy formulas with new stateful step() and ramp() transforms for specifying level and slope changes at interventions. Adds the causalpy.transforms module with robust, datetime-aware step/ramp transforms, updates tests to cover new formula interface and transform behavior, and improves documentation and error handling. This enables more flexible modeling of multiple interventions and supports both numeric and datetime time columns.
@drbenvincent
Collaborator Author

bca3699 adds the most amazing patsy-based API for segmented/piecewise regression!

Added a new section describing the formula-based API for PiecewiseITS, including explanations of the custom step() and ramp() transforms, usage examples, and clarification on how the counterfactual is computed. This improves documentation clarity and helps users understand flexible model specification.
Implemented creation of post_impact, datapost, and post_pred attributes in PiecewiseITS for compatibility with effect_summary() from BaseExperiment. Added tests to verify effect_summary works for both OLS and PyMC models and that the new attributes are correctly created.
Added mathematical definitions for step and ramp functions using LaTeX for clarity, and moved import/setup code to the top of the notebook for better organization. Improved explanations of function arguments and removed duplicate import cell.
The introductory markdown in the piecewise_its_pymc.ipynb notebook has been significantly expanded and reorganized. The new content provides clearer explanations of when to use Piecewise ITS, the distinction between level and slope changes, the mathematical model, and its relationship to regression discontinuity and regression kink designs. Redundant sections were removed and a more structured, didactic flow was introduced.
Expanded explanations of level and slope changes in piecewise ITS, referencing a new illustrative figure. Added a code cell to display the figure, and clarified the description of multiple interventions for improved instructional clarity.
Inserted a markdown cell with a table summarizing model formulas for single and two intervention cases, covering level, slope, and combined effects. This provides clearer guidance on specifying models for each panel in the notebook.
Condenses and reorganizes introductory explanations for piecewise interrupted time series (ITS), splitting out key concepts, model details, and comparisons to related methods into clearer, more focused sections. Adds collapsible dropdowns and card formatting for scenario examples, and improves clarity and flow for users learning the model and its API.
Adds a comprehensive suite of tests for the PiecewiseITS class, including class and instance attribute checks, formula parsing, plotting, PyMC integration, counterfactuals, data generation, and error handling. Also updates the interrogate badge to reflect increased coverage.
Added detailed references and in-text citations to the piecewise_its_pymc.ipynb notebook to support methodological explanations. Updated the references.bib file with key literature on segmented regression and interrupted time series analysis. Improved clarity on model parameterization and corrected the references section to use the Sphinx bibliography directive.
@drbenvincent
Collaborator Author

Tagging @tomicapretto in case you are interested in the stateful transforms (transforms.py) added in this PR. If I understand correctly, then if these were implemented in formulae then Bambi could be a great way for users to explore piecewise regression models?

@drbenvincent drbenvincent marked this pull request as ready for review December 25, 2025 20:12
@drbenvincent
Collaborator Author

TODO: does this API work when we have a datetime rather than an integer time index?

Contributor

@JeanVanDyk JeanVanDyk left a comment

Hi! I’ve done a brief review of the changes, and I must say the notebook is extensive and really interesting—the variety of examples makes the functionality very clear. I’ve also run the notebook and the tests locally: everything works and passes as expected.

However, I noticed a few points that might need addressing before we merge:

Data Structure & Types: There are a few places where date/threshold handling is quite defensive (using multiple try/except and isinstance checks). I suspect we could simplify the entire class by standardizing these to pd.Timestamp or numeric types at the initial extraction point. This would also allow us to remove the redundant _convert_threshold_for_plotting helper.

Missing Method: It looks like the effect_summary method is currently missing. Since the global refactor, this seems to have been dropped or overlooked, but it's quite central to the experiment's output.

Overall, great work on the documentation and examples! Let me know what you think about streamlining the type handling and re-adding the summary method.

if matches:
    return matches[0]
# Fallback: try to find a time-like column
return "t"
Contributor

I noticed that the current logic merges all thresholds into a single list, regardless of the variable name (for example, step(t, 10) and step(month, 5) would result in thresholds = [10, 5]). This loses the context of which limit applies to which variable.

Is it intended to support multiple tracking variables within a single formula?

If yes: We should consider storing these in a dictionary (e.g., {"t": [10], "month": [5]}) to ensure the thresholds are applied to the correct variables later in the execution.

If no: It might be safer to check the number of unique variables found and raise a ValueError if more than one is detected. This would prevent unexpected behavior if a user provides a complex formula.
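
Both options can be sketched together, as below. The regex and the helper names `thresholds_by_variable` and `single_time_variable` are hypothetical, and the pattern only handles bare variable names with numeric literals (quoted datetime thresholds would need an extra branch); it is not the extraction logic used in the PR.

```python
import re
from collections import defaultdict

# Matches calls such as step(t, 10) or ramp(month, 5.5).
_CALL = re.compile(
    r"\b(step|ramp)\(\s*([A-Za-z_]\w*)\s*,\s*(-?\d+(?:\.\d+)?)\s*\)"
)


def thresholds_by_variable(formula):
    """Group thresholds by the variable they apply to (the 'if yes' option)."""
    found = defaultdict(list)
    for _kind, var, value in _CALL.findall(formula):
        threshold = float(value)
        if threshold not in found[var]:
            found[var].append(threshold)
    return {var: sorted(vals) for var, vals in found.items()}


def single_time_variable(formula):
    """Require exactly one time variable (the 'if no' option)."""
    grouped = thresholds_by_variable(formula)
    if len(grouped) != 1:
        raise ValueError(
            "Expected exactly one time variable in step()/ramp() terms, "
            f"found {sorted(grouped)}"
        )
    (var, thresholds), = grouped.items()
    return var, thresholds
```

With the dictionary form, `step(t, 10) + step(month, 5)` keeps each threshold attached to its own variable instead of collapsing into `[10, 5]`.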

else:
    # Numeric threshold
    post_mask = self.data[time_col] >= first_interruption

Contributor

I see we handle str to Timestamp conversion with a fallback to direct comparison. Is there a specific case where we need to compare raw strings that aren't timestamps?

If we are primarily dealing with dates or numbers, I wonder if it wouldn't be safer to convert everything to the proper type (using pd.to_datetime) right at the beginning of the pipeline?

My thinking is that it might allow us to "fail fast" if a user provides an invalid date, and it would simplify the final comparison to a single line: self.data[time_col] >= first_interruption.

I might be overlooking a specific scenario where this late-stage conversion is necessary, so I'd love to hear your thoughts on the intent here!
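
A minimal sketch of that fail-fast idea follows. The helper name `normalize_threshold` is hypothetical and is not part of the PR; it assumes thresholds are either date-like (when the time column is datetime) or real numbers.

```python
import numbers

import pandas as pd


def normalize_threshold(threshold, time_values):
    """Hypothetical helper: coerce a threshold to match the time column's type."""
    if pd.api.types.is_datetime64_any_dtype(time_values):
        # Raises on unparseable input, so a bad date fails here, not later.
        return pd.Timestamp(threshold)
    if isinstance(threshold, numbers.Real) and not isinstance(threshold, bool):
        return float(threshold)
    raise TypeError(
        f"Threshold {threshold!r} is not numeric, but the time column is not datetime"
    )


# Usage: once normalized, the comparison is a single line for either type.
df = pd.DataFrame({"date": pd.date_range("2024-01-01", periods=5, freq="D")})
cut = normalize_threshold("2024-01-03", df["date"])
post_mask = df["date"] >= cut
```

Doing the conversion once at the entry point means the later comparison no longer needs `try`/`except` or `isinstance` branches.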

        return pd.Timestamp(threshold)
    except Exception:
        return threshold  # type: ignore[return-value]
return threshold
Contributor

If we standardize the threshold types to pd.Timestamp or numeric values immediately upon extraction, this method likely becomes redundant as the data would already be in its final, usable form. I might be overlooking a specific edge case, but it seems that ensuring clean types at the entry point would allow us to simplify the class by removing this defensive logic and the repetitive try/except checks.

@drbenvincent
Collaborator Author

Thanks @JeanVanDyk. I'll take these comments into account and ping you whenever I've got an improved version.

@tomicapretto

tomicapretto commented Jan 13, 2026

Tagging @tomicapretto in case you are interested in the stateful transforms (transforms.py) added in this PR. If I understand correctly, then if these were implemented in formulae then Bambi could be a great way for users to explore piecewise regression models?

@drbenvincent, thanks for tagging me here.

We have had this issue open for a long time.
If I understand correctly, what I create with truncate in that snippet is similar to what you would create with RampTransform.

In Bambi (but also with any formula-based modeling interface), I think one could do:

def step(x, threshold):
    return 1.0 * (x >= threshold) # ensure numeric output

def ramp(x, threshold):
    return (x - threshold) * (x >= threshold)

Then you would write

formula = "y ~ x + step(x, 10)"  # level (aka intercept) changes when x>=10
formula = "y ~ x + ramp(x, 10)"  # slope changes when x>=10
formula = "y ~ x + step(x, 10) + ramp(x, 10)" # both level and slope change when x>=10

With that said, I’m not sure whether those examples provide enough motivation for a stateful transformation. A stateful transformation is useful when some aspect of the transformation (e.g., a threshold) depends on the initial (training) dataset. If the value is hardcoded in the formula, there is no need for a stateful transformation (although using one would not cause any harm).
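
The distinction can be illustrated with a small sketch. `MedianStep` is a made-up example whose threshold is learned from the training data; its memorize-then-transform shape loosely mirrors how stateful transforms work in formula libraries, but it is not patsy's or formulae's actual interface.

```python
import numpy as np


def step(x, threshold):
    """Stateless: the threshold is written in the formula, nothing to remember."""
    return 1.0 * (np.asarray(x) >= threshold)


class MedianStep:
    """Stateful: the threshold is learned from training data and then reused."""

    def __init__(self):
        self.threshold = None

    def memorize(self, x):
        self.threshold = float(np.median(x))

    def transform(self, x):
        return 1.0 * (np.asarray(x) >= self.threshold)


train = np.arange(0, 21)              # training median is 10
new = np.array([9, 11, 12, 30, 40])   # median of the new data alone is 12

stateful = MedianStep()
stateful.memorize(train)

remembered = stateful.transform(new)      # uses the training threshold
recomputed = step(new, np.median(new))    # what forgetting the state would give
hardcoded = step(new, 10)                 # fixed threshold needs no state
```

Here `remembered` and `hardcoded` agree, while `recomputed` differs at `x = 11`, which is the situation a stateful transform guards against. With a literal threshold in the formula there is nothing to guard.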


While adding the examples, I just realized one could achieve the same via the interaction operator in combination with the special identity function I, which is useful for escaping expressions:

f = "y ~ x + I(x >= 10)" # step
f = "y ~ x + x:I(x >= 10)" # ramp
f = "y ~ x + I(x >= 10) + x:I(x >= 10)" # step + ramp

@drbenvincent, just let me know if you want to have a further chat about this

@drbenvincent
Collaborator Author

Thanks for this! I think you are right, there's no need for this to be a stateful transform. It can just be a regular function.

I think this specific example

f = "y ~ x + x:I(x >= 10)" # ramp

might be problematic. Because x is not zero at 10, it creates both a step and a ramp. I recall seeing a paper which flagged that this interaction approach could be problematic. And I guess the size of the step will vary with x, which could make the interpretation of the coefficient a bit tricky. Would it have to be this?

f = "y ~ x + I(x-10):I(x >= 10)" # ramp

(Typing this at 3am after a toddler wake-up, so we'll see if this makes sense in the morning 🤣)
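
A quick numeric check of the concern, as a sketch: it compares the numeric columns the two expressions imply, setting aside how a particular formula library encodes the boolean `I(x >= 10)` term.

```python
import numpy as np

x = np.array([8.0, 9.0, 10.0, 11.0, 12.0])
post = x >= 10

# Column implied by x:I(x >= 10): x itself, switched on after the threshold.
naive_ramp = x * post

# Column implied by I(x - 10):I(x >= 10): time since the threshold.
centered_ramp = (x - 10) * post

# The naive column is the centered ramp plus a hidden step of height 10.
hidden_step = naive_ramp - centered_ramp
```

Since `naive_ramp` equals `centered_ramp + 10 * post`, a coefficient on the naive column moves the level at the threshold by ten times its value as well as changing the slope. The centered version is zero at the threshold, so it changes the slope only.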

@tomicapretto

@drbenvincent: yes, you are right about the problem and the solution, I missed it.

Anyway, I think specific keywords such as step and ramp are quite self-explanatory, and that is great for users.
