Add estimands knowledgebase page for core methods #683
ErikRingen wants to merge 4 commits into pymc-labs:main
Conversation
Documents the causal estimands, computation approaches, and key assumptions for DiD, ITS, Synthetic Control, and Regression Discontinuity. Clarifies the distinction between parametric interaction models (ATT via coefficients) and pre-post counterfactual models (time-varying unit-specific impacts). Addresses pymc-labs#681
Codecov Report ✅ All modified and coverable lines are covered by tests.

@@            Coverage Diff             @@
##             main     #683      +/-   ##
==========================================
+ Coverage   93.37%   94.35%    +0.98%
==========================================
  Files          39       44        +5
  Lines        5930     7517     +1587
  Branches      386      456       +70
==========================================
+ Hits         5537     7093     +1556
- Misses        255      262        +7
- Partials      138      162       +24
Nice. Over the weekend I had the realisation that in most (all?) cases we are basing our estimands upon coefficient estimates. But we might be better off using a g-computation type approach. That might make things more robust: for example, coefficient-based estimates will presumably break as soon as you add link functions (which we don't yet have, but should probably add some examples of in the docs), or if an effect is moderated by an interaction term with the treatment variable. Though in that case I guess the estimand needs to be the CATE, and the experimenter will most likely be aware of it. Anyway, this is not a fleshed-out proposal (one may follow in a separate issue), but early thoughts on using g-computation as the default method of estimating the estimand?
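To make the idea concrete, here is a minimal numpy sketch of g-computation (illustrative only, not CausalPy's API): with a treatment-by-covariate interaction, no single coefficient equals the average effect, but predicting every unit under treatment and under control and averaging the difference recovers it.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=n)
t = rng.integers(0, 2, size=n).astype(float)
# Outcome with a treatment-by-covariate interaction, so no single
# coefficient equals the average treatment effect.
y = 1.0 + 0.5 * x + 1.0 * t + 0.8 * t * x + rng.normal(scale=0.1, size=n)

# Fit a linear model that includes the interaction term.
X = np.column_stack([np.ones(n), x, t, t * x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

# G-computation: predict every unit under t=1 and t=0, then average the difference.
X_treated = np.column_stack([np.ones(n), x, np.ones(n), x])
X_control = np.column_stack([np.ones(n), x, np.zeros(n), np.zeros(n)])
ate = float(np.mean(X_treated @ beta - X_control @ beta))
```

Note that `ate` here equals `beta[2] + beta[3] * np.mean(x)`, i.e. the treatment coefficient alone would miss the moderation term.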
Early thought is that it would be a great idea, at least when we are using PyMC models as the backend/estimator and it is straightforward to do g-computation.
PR Summary (Cursor Bugbot, commit b8bd273): Low Risk. Updates the Knowledge Base.
drbenvincent left a comment
I think this will be an excellent addition. My own personal request would be to not hard-wrap lines.
I got a review from Claude - which picked up a few things.
I'm not 100% sure of its point 2.3. And I don't care if it thinks g-computation might surprise Epi readers 🤣 We could maybe add a short description in the glossary, but don't feel you have to do that in this PR. We are likely going to increase the mention of g-computation in the repo in the near future anyway.
Overall Assessment
This is a valuable addition to CausalPy's knowledgebase. The Lundberg et al. (2021) framework is well-chosen as an organizing principle, and the per-method sections provide a consistent structure (research question → estimand → assumptions → estimator → interpretation). The writing is clear and accessible.
However, the document contains several factual errors about how CausalPy actually computes treatment effects, particularly for Regression Discontinuity. These must be corrected before merging. Below, findings are organized by severity.
1. Errors (Must Fix)
1.1 RD estimator is mislabeled as "Coefficient-based"
Lines 190–193 and Quick Reference table (line 228)
The document states:
> Estimator: Coefficient-based. The treatment effect is estimated as the discontinuity in predicted outcomes at the cutoff, typically using local polynomials around the running variable threshold.
This is self-contradictory and factually wrong. The actual implementation in regression_discontinuity.py (lines 175–203) computes the treatment effect by:
- Creating two prediction points: `threshold - epsilon` (untreated) and `threshold + epsilon` (treated).
- Predicting outcomes at both points using the fitted model.
- Taking the difference: `pred[above] - pred[below]`.
This is prediction-based (closer to G-computation), not coefficient extraction. The effect is stored in `self.discontinuity_at_threshold`, not extracted from a coefficient vector.
Suggested fix: Change "Coefficient-based" to "Prediction-based" (or "G-computation") and update the Quick Reference table accordingly.
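The computation described above can be sketched as a standalone function (names and the toy model are illustrative, not CausalPy's actual API):

```python
import numpy as np

def discontinuity_at_threshold(predict, threshold, epsilon=1e-4):
    # Predict just below the cutoff (untreated) and just above it (treated),
    # then take the difference: a prediction-based effect, not a coefficient.
    pred_below = predict(threshold - epsilon, treated=0)
    pred_above = predict(threshold + epsilon, treated=1)
    return pred_above - pred_below

# Toy "fitted model": y = 1 + 2*x + 3*treated, so the true discontinuity is 3.
def predict(x, treated):
    return 1.0 + 2.0 * x + 3.0 * treated

effect = discontinuity_at_threshold(predict, threshold=0.5)
```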
1.2 RD does NOT use "local polynomials"
Line 191
CausalPy does not implement local polynomial regression. The implementation uses standard regression formulas via `patsy.dmatrices()` (e.g., `y ~ 1 + x + treated + x:treated`). There is an optional `bandwidth` parameter that filters the data to a window around the cutoff, but the model fitted within that window is still a standard parametric regression, not a local polynomial estimator (as used by e.g. `rdrobust`).
Suggested fix: Remove the phrase "typically using local polynomials" and replace with something like: "using a parametric regression model evaluated on both sides of the cutoff. An optional bandwidth parameter restricts the data to a window around the threshold."
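A minimal sketch of the suggested description, assuming a plain least-squares backend (the function name and toy data are illustrative, not CausalPy code):

```python
import numpy as np

def fit_rd_regression(x, y, treated, threshold, bandwidth=None):
    # Optional bandwidth: restrict the data to a window around the cutoff.
    if bandwidth is not None:
        keep = np.abs(x - threshold) <= bandwidth
        x, y, treated = x[keep], y[keep], treated[keep]
    # Standard parametric regression, mirroring y ~ 1 + x + treated + x:treated.
    X = np.column_stack([np.ones_like(x), x, treated, x * treated])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

x = np.linspace(0.0, 1.0, 201)
threshold = 0.5
treated = (x >= threshold).astype(float)
y = 1.0 + 2.0 * x + 3.0 * treated + 0.5 * x * treated  # noiseless toy data

beta = fit_rd_regression(x, y, treated, threshold, bandwidth=0.2)
```

With noiseless data the fit recovers the generating coefficients exactly, whether or not the bandwidth window is applied.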
1.3 RD estimand terminology conflicts with IV
Line 175
The document calls the RD estimand:
> Local average treatment effect at the cutoff
In the causal inference literature, "LATE" (Local Average Treatment Effect) is specifically associated with Instrumental Variable designs and refers to the complier average causal effect. CausalPy's own glossary defines LATE as:
"Also known as the complier average causal effect (CACE), is the effect of a treatment for subjects who comply with the experimental treatment assigned to their sample group. It is the quantity we're estimating in IV designs."
Using the same term for RD creates confusion. The standard RD terminology is "local treatment effect at the cutoff" or simply "the sharp RD estimand".
Suggested fix: Change "Local average treatment effect at the cutoff" to "Local treatment effect at the cutoff" (dropping "average" removes the LATE association) or use a more explicit label like "Sharp RD treatment effect at the cutoff."
2. Inaccuracies / Omissions (Should Fix)
2.1 IPW weighting schemes are incomplete
Lines 208–209
The document lists three `weighting_scheme` options:

- `"raw"`, `"robust"` → ATE
- `"overlap"` → overlap population estimand
But the actual implementation (inverse_propensity_weighting.py, lines 42–44) supports four options:

- `"raw"` → ATE
- `"robust"` → ATE
- `"overlap"` → overlap population (ATO)
- `"doubly robust"` / `"doubly_robust"` → omitted from the document
Suggested fix: Add "doubly_robust" to the list and clarify which estimand it targets.
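For reference, the textbook weight formulas behind these schemes can be sketched as follows (standard IPW formulas from the literature; CausalPy's implementation may differ in detail, and "doubly robust" is omitted here because it combines an outcome model with the weights):

```python
import numpy as np

def ipw_weights(t, p, scheme="raw"):
    """Textbook inverse-propensity weights. t = treatment indicator,
    p = estimated propensity score. Illustrative sketch only."""
    t = np.asarray(t, dtype=float)
    p = np.asarray(p, dtype=float)
    if scheme in ("raw", "robust"):
        # ATE weights: 1/p for treated units, 1/(1-p) for controls.
        return t / p + (1.0 - t) / (1.0 - p)
    if scheme == "overlap":
        # ATO weights: emphasize units with propensity near 0.5.
        return t * (1.0 - p) + (1.0 - t) * p
    raise ValueError(f"unknown weighting scheme: {scheme!r}")

w_ate = ipw_weights([1, 0], [0.25, 0.25], scheme="raw")
w_ato = ipw_weights([1, 0], [0.25, 0.25], scheme="overlap")
```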
2.2 Quick Reference table is inconsistent with implementation
Line 228
The table lists RD's computation as "Coefficient-based". Per finding 1.1, this should be "Prediction-based" or "G-computation (local)".
| Method | Empirical Estimand | Computation |
|---|---|---|
| Regression Discontinuity | Local treatment effect at cutoff | Prediction-based |
2.3 SC "Convex hull coverage" assumption may not apply to all CausalPy models
Line 152
The "convex hull coverage" assumption comes from the original Abadie et al. synthetic control framework, which requires non-negative weights summing to one. CausalPy's Bayesian SC implementation does not necessarily enforce these constraints — the Bayesian model can produce weights outside [0, 1] or that don't sum to one.
Suggested fix: Qualify this assumption, e.g., "In the standard (non-Bayesian) formulation, the treated unit must lie within the convex hull of control units. CausalPy's Bayesian implementation relaxes this constraint."
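To make the distinction concrete, the constraint in question is simply that the weights form a convex combination; a Dirichlet prior on the weights would enforce it by construction, whereas an unconstrained prior would not. A small illustrative check (not CausalPy code):

```python
import numpy as np

def is_convex_combination(weights, atol=1e-8):
    # The classic Abadie-style constraint: non-negative weights summing to
    # one, i.e. the synthetic unit lies in the convex hull of the controls.
    w = np.asarray(weights, dtype=float)
    return bool(np.all(w >= -atol) and np.isclose(w.sum(), 1.0, atol=1e-6))

ok = is_convex_combination([0.2, 0.3, 0.5])    # non-negative, sums to one
bad = is_convex_combination([-0.1, 0.6, 0.5])  # negative weight: outside the hull
```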
3. Conceptual / Pedagogical Suggestions (Nice to Have)
3.1 DiD "No interference" assumption is SUTVA, not DiD-specific
Line 83
"No interference between units" (SUTVA) is a general assumption required by all causal methods covered in the document, not unique to DiD. Listing it only under DiD implies it's specific to that method. Consider either:
- Stating SUTVA once at the top as a universal assumption, or
- Listing it under all methods for consistency.
3.2 The ASCII pipeline diagram could be improved
Lines 12–14
Theoretical Estimand --> Empirical Estimand --> Estimator
This could be improved with a proper Mermaid diagram or a styled admonition. The plain ASCII is functional but feels out of place in otherwise well-formatted documentation. For example:
:::{mermaid}
graph LR
A["Theoretical Estimand<br/>(research question)"] --> B["Empirical Estimand<br/>(ATT, ATE, LATE, ...)"]
B --> C["Estimator<br/>(model + computation)"]
:::
3.3 Cross-reference style inconsistency
Lines 86, 118, 156, 187
Links to quasi_dags use raw Markdown link syntax:
`[Difference in Differences section of quasi_dags](quasi_dags.ipynb#difference-in-differences)`

while the rest of the document uses MyST cross-reference syntax (`{doc}`, `{term}`, `{cite:t}`). The auto-generated section anchors from notebook headers are fragile: renaming a header silently breaks the link. Consider using `{ref}` with explicit labels if possible, or at minimum note this fragility.
3.4 ITS and SC could mention effect_summary() more prominently
The ITS section (line 129) mentions `effect_summary()` but SC does not, despite both supporting it. Since `effect_summary()` is the primary programmatic way users access results, it would be helpful to mention it for SC as well.
3.5 IV and ANCOVA are referenced but have limited CausalPy support
Lines 206–216, 230
The "Context-Dependence" note references IV, and the closing paragraph mentions IV, IPW, and ANCOVA. However:
- IV: `plot()` and `summary()` both raise `NotImplementedError`.
- IPW: Bayesian-only (`supports_ols = False`).
- ANCOVA (`PrePostNEGD`): Bayesian-only.
Users reading this doc might expect full feature parity. A brief note about maturity levels would help set expectations.
3.6 Minor: "G-computation" terminology
The use of "G-computation" for ITS and SC is technically defensible (predict-under-intervention), but in the epidemiology literature, G-computation (Robins, 1986) refers specifically to standardization over the joint distribution of confounders. Readers from that background may find the term surprising. A brief parenthetical — e.g., "G-computation (counterfactual prediction)" — on first use would help.
4. Summary Table
| # | Issue | Severity | Lines |
|---|---|---|---|
| 1.1 | RD estimator is NOT "Coefficient-based" | Error | 190–193, 228 |
| 1.2 | RD does NOT use "local polynomials" | Error | 191 |
| 1.3 | RD "LATE" terminology conflicts with IV/glossary | Error | 175 |
| 2.1 | IPW missing `"doubly_robust"` weighting scheme | Omission | 208–209 |
| 2.2 | Quick Reference table wrong for RD | Inaccuracy | 228 |
| 2.3 | SC convex hull assumption may not apply to Bayesian SC | Inaccuracy | 152 |
| 3.1 | SUTVA listed only under DiD | Pedagogical | 83 |
| 3.2 | ASCII diagram could be a Mermaid diagram | Style | 12–14 |
| 3.3 | Cross-reference style inconsistency | Style | 86, 118, 156, 187 |
| 3.4 | SC section doesn't mention `effect_summary()` | Omission | ~165 |
| 3.5 | IV/ANCOVA maturity not noted | Clarity | 206–216, 230 |
| 3.6 | "G-computation" may confuse epi readers | Clarity | 121, 159 |
5. What Works Well
- The Lundberg et al. framework is an excellent organizing choice and provides a principled way for users to think about estimands.
- Consistent per-method structure (question → estimand → assumptions → estimator → interpretation) makes the document scannable and easy to reference.
- Explicit identification assumptions per method is very valuable — users often skip this step.
- Interpretation notes calling out common pitfalls (e.g., DiD's coefficient vs. counterfactual plot, ITS temporal variation, RD locality) are practical and helpful.
- The Quick Reference table is a nice summary (once corrected).
- Glossary term linking via `{term}` is done correctly throughout.
- The Lundberg citation is properly included in `references.bib` and cited via `{cite:t}`.
- Fix RD estimator: change "Coefficient-based" to "Prediction-based" (implementation uses prediction at threshold +/- epsilon, not coefficient extraction)
- Remove incorrect "local polynomials" claim for RD; describe parametric regression with optional bandwidth
- Fix RD estimand: "Local treatment effect at the cutoff" (drop "average" to avoid LATE/IV confusion)
- Add "doubly_robust" to IPW weighting scheme list
- Qualify SC convex hull assumption with Dirichlet prior detail
- Move SUTVA to shared assumption note (not DiD-specific)
- Add effect_summary() mention to SC interpretation note
- Add IV/ANCOVA implementation maturity caveat
- Unwrap hard-wrapped lines per reviewer request
- Remove LATE from empirical estimand examples (IV-specific)

Co-authored-by: Cursor <cursoragent@cursor.com>
@drbenvincent I think all relevant feedback is addressed in b8bd273
Documents the causal estimands, computation approaches, and key assumptions for DiD, ITS, Synthetic Control, and Regression Discontinuity. Clarifies the distinction between parametric interaction models (ATT via coefficients) and pre-post counterfactual models (time-varying unit-specific impacts).
Towards #681
📚 Documentation preview 📚: https://causalpy--683.org.readthedocs.build/en/683/