Add PanelRegression experiment class for fixed effects estimation#628
Co-authored-by: drbenvincent <6765047+drbenvincent@users.noreply.github.com>
…, remove unused parameter Co-authored-by: drbenvincent <6765047+drbenvincent@users.noreply.github.com>
drbenvincent left a comment
- The `summary` method gives just point estimates but not the HDIs
- The `plot_coefficients` method does not include error bars for the HDIs
- summary() now filters FE dummies and calls model.print_coefficients() which shows HDI
- plot_coefficients() uses 94% HDI (matching print_coefficients) with clearer title
- Added note when FE coefficients are hidden from summary output

Co-authored-by: drbenvincent <6765047+drbenvincent@users.noreply.github.com>
Fixed in commit 4b67e10. Changes made:
The horizontal lines in the forest plot represent the HDIs, with the dot showing the posterior mean.

bugbot review
PR SummaryAdds a new
Written by Cursor Bugbot for commit 4b67e10. This will update automatically on new commits.
Codecov Report
❌ Patch coverage is
Additional details and impacted files

| Coverage Diff | main | #628 | +/- |
|---|---|---|---|
| Coverage | 94.35% | 94.57% | +0.21% |
| Files | 44 | 46 | +2 |
| Lines | 7517 | 8036 | +519 |
| Branches | 456 | 509 | +53 |
| Hits | 7093 | 7600 | +507 |
| Misses | 262 | 262 | |
| Partials | 162 | 174 | +12 |
Changed the summary print statement to use the number of rows in self.X instead of the length of self.data, ensuring the observation count reflects the actual data used in the model.
Added a check to prevent including C(time_var) in the formula when using fe_method='within', as the within transformation already accounts for time fixed effects. Updated tests to cover this validation.
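A minimal sketch of the kind of check described in the commit above. The function name `check_within_formula` and its signature are hypothetical and not taken from the PR; only the rule itself (no `C(time_var)` term together with `fe_method='within'`) comes from the commit message.

```python
from __future__ import annotations

import re


def check_within_formula(formula: str, time_var: str | None, fe_method: str) -> None:
    """Raise if the formula adds time dummies alongside the within transform.

    Hypothetical helper for illustration; not the PR's actual implementation.
    """
    if fe_method != "within" or time_var is None:
        return
    # Match C(time_var), allowing optional whitespace inside the parentheses.
    pattern = r"C\(\s*" + re.escape(time_var) + r"\s*\)"
    if re.search(pattern, formula):
        raise ValueError(
            f"Formula should not include C({time_var}) when fe_method='within': "
            "the within transformation already accounts for time fixed effects."
        )


# Accepted: dummies are allowed to spell out C(year) explicitly.
check_within_formula("y ~ treatment + C(year)", "year", "dummies")

# Rejected: the same formula with the within method.
try:
    check_within_formula("y ~ treatment + C(year)", "year", "within")
except ValueError as err:
    print(err)
```

A regular expression is used here rather than a plain substring test so that spacing variants such as `C( year )` are also caught; the real check may well be stricter or looser than this.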
Improved plotting for Bayesian panel regression by using az.plot_forest and az.plot_hdi directly, allowing for HDI customization and more accurate visualizations. Updated plot_trajectories to support HDI intervals for Bayesian models and refactored code for clarity. Updated the panel_fixed_effects.ipynb notebook to use Bayesian models, added sampling settings, and revised output and code cells to reflect Bayesian workflow.
Enhances the panel fixed effects notebook with a detailed causal inference framework, including DAG visualizations, identification assumptions, and cautionary examples illustrating when fixed effects methods succeed or fail. Adds discussion of time-varying confounders, connection to difference-in-differences, and clarifies the interpretation of results. Also improves structure, explanations, and pedagogical clarity throughout the notebook.
Reorganized code cells for clarity, added cell metadata to hide input/output in some cells, and improved section headings for better structure. Split code and output for the time-varying confounder example, and updated example numbering for consistency.
The QQ plot in PanelRegression now uses consistent colors for markers and lines to match other plots. The panel_fixed_effects notebook was reorganized and clarified, with improved explanations of panel data confounders, fixed effects, and identification assumptions, as well as updated code and output for data simulation.
|
@drbenvincent do you wanna fix the conflicts or can bugbot do it?

I'm in the process of resolving conflicts for my open PRs. Will get to this one soon :)

TODO: check the changes to codespell
Critical fixes:
- Fix summary() printing wrong OLS coefficients for fe_method='dummies' (positional zip mismatch with filtered labels)
- Fix boolean treatment columns silently skipped by _within_transform (select_dtypes excludes bool; now includes bool and casts to float)

Moderate fixes:
- Add effect_summary() stub with helpful NotImplementedError message
- Document balanced-panel limitation for two-way within transformation
- Fix _group_means to store means from original data, not demeaned data

Minor fixes:
- Fix summary() header to not say "excluding FE dummies" for within method
- Implement plot_coefficients(var_names=...) parameter (was ignored)
- Implement plot_trajectories select='extreme' and 'high_variance' strategies
- Clarify treated_units coordinate placeholder in y DataArray

Also adds 7 new tests covering the fixed functionality.

Co-authored-by: Cursor <cursoragent@cursor.com>
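The "positional zip mismatch" in the first critical fix can be reproduced in a few lines. The labels and coefficient values below are invented for illustration and do not come from the PR; the sketch only shows the general failure mode of zipping a filtered label list against an unfiltered coefficient list.

```python
# Hypothetical labels and coefficients, invented for illustration.
labels = ["Intercept", "C(unit)[T.b]", "C(unit)[T.c]", "treatment"]
coefs = [1.0, 0.3, -0.2, 2.5]

# Labels that are not fixed-effect dummies.
non_fe_labels = [name for name in labels if not name.startswith("C(")]

# Buggy pattern: zip pairs the filtered labels with the FIRST entries of the
# unfiltered coefficient list, so "treatment" receives a unit dummy's value.
buggy = dict(zip(non_fe_labels, coefs))

# Safer pattern: pair every label with its coefficient first, then filter.
fixed = {name: value for name, value in zip(labels, coefs) if name in non_fe_labels}

print("buggy:", buggy)
print("fixed:", fixed)
```

Because `zip` silently stops at the shorter input, the buggy version raises no error, which is presumably why the wrong coefficients went unnoticed.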
Code Review: 9 issues found and fixed
Commit: 83cc395
Critical Bugs Fixed
1. The sklearn
2. Boolean treatment columns were silently NOT demeaned by
Moderate Issues Fixed
3. Missing
All other experiment classes implement this abstract method from
4. Two-way within transformation only correct for balanced panels
The sequential single-pass demeaning (first by unit, then by time) is algebraically equivalent to the standard two-way within transformation only for balanced panels. For unbalanced panels, iterative alternating demeaning is needed. Added documentation in both the class docstring and the method docstring explaining this limitation.
5. When unit demeaning was applied first and then time demeaning,
Minor Issues Fixed
6. 7. 8. 9.
Tests Added
7 new tests covering the fixed functionality. All 19 tests pass. All pre-commit checks pass.
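The balanced-panel caveat for two-way demeaning (item 4 above) can be checked numerically. The following is a dependency-free sketch with made-up data and hypothetical helper names; it is not the PR's `_within_transform`, and it uses a fixed number of alternating passes rather than a convergence test.

```python
from collections import defaultdict


def demean_by(keys, values):
    """Subtract each group's mean from its members (groups given by keys)."""
    totals, counts = defaultdict(float), defaultdict(int)
    for key, value in zip(keys, values):
        totals[key] += value
        counts[key] += 1
    return [value - totals[key] / counts[key] for key, value in zip(keys, values)]


def sequential_demean(units, times, values):
    """Single pass: demean by unit, then by time."""
    return demean_by(times, demean_by(units, values))


def alternating_demean(units, times, values, n_iter=50):
    """Repeat the unit-then-time pass a fixed number of times."""
    out = list(values)
    for _ in range(n_iter):
        out = sequential_demean(units, times, out)
    return out


def max_abs_group_mean(keys, values):
    """Largest absolute group mean; zero means the groups are fully demeaned."""
    totals, counts = defaultdict(float), defaultdict(int)
    for key, value in zip(keys, values):
        totals[key] += value
        counts[key] += 1
    return max(abs(totals[k] / counts[k]) for k in totals)


# Balanced panel: 2 units x 3 periods (values invented for illustration).
units_bal = ["a", "a", "a", "b", "b", "b"]
times_bal = [1, 2, 3, 1, 2, 3]
y_bal = [1.0, 2.0, 6.0, 4.0, 8.0, 3.0]

# Unbalanced panel: unit "b" is missing period 3.
units_unb, times_unb, y_unb = units_bal[:5], times_bal[:5], y_bal[:5]

one_pass_bal = sequential_demean(units_bal, times_bal, y_bal)
one_pass_unb = sequential_demean(units_unb, times_unb, y_unb)
iterated_unb = alternating_demean(units_unb, times_unb, y_unb)

print("balanced, one pass:   ", max_abs_group_mean(units_bal, one_pass_bal))
print("unbalanced, one pass: ", max_abs_group_mean(units_unb, one_pass_unb))
print("unbalanced, iterated: ", max_abs_group_mean(units_unb, iterated_unb))
```

In the balanced case a single pass leaves both unit and time means at zero. In the unbalanced case the time pass reintroduces nonzero unit means, and repeating the passes drives them back toward zero, which is the behaviour the docstring note warns about.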
Break up the monolithic __init__ into the canonical pipeline used by all
other experiment classes on main:
self.input_validation()
self._build_design_matrices()
self._prepare_data()
self.algorithm()
- Rename _validate_inputs() -> input_validation()
- Extract _build_design_matrices() (includes within transform + patsy)
- Extract _prepare_data() (numpy -> xarray conversion)
- Extract algorithm() (model fitting)
Co-authored-by: Cursor <cursoragent@cursor.com>
Follow-up: Refactor
|
- Move expt_type from class attribute to instance attribute in __init__
- Set data.index.name on original data before assignment (not on a copy)
- Use standard "unit_0" label for treated_units coordinate
- Pass xarray directly to sklearn fit() instead of .values/.ravel()
- Use get_coeffs() instead of direct coef_ access (handles 2D arrays)
- Squeeze predict() output where 1D arrays are needed

Co-authored-by: Cursor <cursoragent@cursor.com>
Convention alignment with other experiment classes
Addressed non-conformances found by comparing
Changes
All 19 tests pass and all pre-commit hooks are clean.
Fixes codecov/patch failure by covering the 11 previously uncovered statements: OLS branches for plot_unit_effects and plot_residuals, edge cases in plot_trajectories (all-units and single-unit), and defensive ValueError guards in get_plot_data_bayesian, get_plot_data_ols, and plot_unit_effects. Co-authored-by: Cursor <cursoragent@cursor.com>
Fix codecov/patch failure — 11 uncovered statements in
|
| Test | Lines Covered | What it tests |
|---|---|---|
| test_plot_unit_effects_ols | 664-671 (5 stmts) | OLS branch of plot_unit_effects() |
| test_plot_residuals_ols | 867 | OLS branch of plot_residuals() |
| test_plot_trajectories_all_units | 738 | n_sample >= n_units branch (all units shown) |
| test_plot_trajectories_single_unit | 766 | Single-unit subplot edge case |
| test_get_plot_data_bayesian_raises_on_ols | 539 | ValueError guard when called with OLS model |
| test_get_plot_data_ols_raises_on_pymc | 567 | ValueError guard when called with PyMC model |
| test_plot_unit_effects_no_fe_labels | 641 | ValueError when no C(unit) terms in formula |
Result: panel_regression.py statement coverage went from 92% (11 missing) → 97% (0 missing). The remaining 3% is partial branch coverage (11 partial branches), which does not affect the codecov/patch line-coverage check.
time-specific unobserved heterogeneity. This is the standard approach in
difference-in-differences estimation.

**Balanced panels**: When both unit and time fixed effects are requested
Interesting note. Wonder how prevalent this is in applications?
formula: str,
unit_fe_variable: str,
time_fe_variable: str | None = None,
fe_method: Literal["dummies", "within"] = "dummies",
Maybe slight doubt about the convention here, "within" seems less descriptive to me than "de-meaned"
Specific coefficient names to plot. If ``None``, plots all
non-FE coefficients (as determined by ``_get_non_fe_labels``).
"""
coeff_names = var_names if var_names is not None else self._get_non_fe_labels()
Nice. Yeah, don't want ugly forest plots.
    Specific unit IDs to plot. If provided, ignores n_sample and select.
n_sample : int, default=10
    Number of units to sample if units not specified.
select : {"random", "extreme", "high_variance"}, default="random"

ax.legend(fontsize=8)

# Hide unused subplots
for idx in range(n_units_plot, len(axes)):
why are we creating more than required?
NathanielF left a comment
Very cool. Simple model, but lots of nice plotting.


Implementation Plan for PanelRegression - COMPLETE ✅
Phase 1: Core Implementation ✅
Phase 2: Specialized Plotting Methods ✅
Phase 3: Testing ✅
Phase 4: Documentation ✅
Phase 5: Final Integration ✅
Phase 6: Address Review Feedback ✅
Original prompt
This section details on the original issue you should resolve
<issue_title>Feature: Panel Fixed Effects (`PanelRegression` experiment class)</issue_title>
<issue_description>## Summary
Add a `PanelRegression` experiment wrapper that enables panel-aware visualization and diagnostics, with support for both dummy variable and within-transformation approaches to fixed effects.
Motivation
Panel data methods are foundational in applied econometrics. Chapter 8 of Causal Inference: The Mixtape covers fixed effects estimation, which is a workhorse for causal inference when there are unobserved time-invariant confounders.
The Mixtape code repository contains Python and R implementations of these methods.
Mixtape Coverage
`sasp.py`
`bail.py`
See also the R implementations: `sasp.R` and `bail_1.R`.
Why Panel FE Matters
Current State
Panel fixed effects already works with `LinearRegression` using patsy formula syntax:
What's missing is a dedicated experiment class that provides panel-aware visualization, diagnostics, and efficient handling of large panels.
Proposed API
Core Parameters
Two Approaches: Dummies vs Within
1. Dummy Variables (`fe_method="dummies"`)
User includes `C(unit)` in the formula explicitly:
Pros:
Cons:
2. Within Transformation (`fe_method="within"`)
User does NOT include `C(unit)`; the experiment class demeans the data:
Pros:
Cons:
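As a rough illustration of why demeaning stands in for `C(unit)` dummies, the sketch below builds noiseless data with unit-specific intercepts and recovers the slope from unit-demeaned variables. All names and numbers are invented for illustration; this is plain Python, not the proposed `PanelRegression` API.

```python
from collections import defaultdict


def demean_by_unit(units, values):
    """Within transformation: subtract each unit's own mean."""
    totals, counts = defaultdict(float), defaultdict(int)
    for unit, value in zip(units, values):
        totals[unit] += value
        counts[unit] += 1
    return [value - totals[unit] / counts[unit] for unit, value in zip(units, values)]


def slope_no_intercept(x, y):
    """OLS slope of y on x without an intercept (valid for demeaned data)."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)


def slope_with_intercept(x, y):
    """Pooled OLS slope of y on x with a single common intercept."""
    x_bar, y_bar = sum(x) / len(x), sum(y) / len(y)
    return slope_no_intercept([a - x_bar for a in x], [b - y_bar for b in y])


# Invented data: y = alpha_unit + beta * x with no noise.
units = ["a", "a", "a", "b", "b", "b"]
x = [0.0, 1.0, 2.0, 1.0, 2.0, 3.0]
alpha = {"a": 10.0, "b": -5.0}  # unobserved, time-invariant unit effects
beta = 2.0
y = [alpha[u] + beta * xi for u, xi in zip(units, x)]

pooled = slope_with_intercept(x, y)
within = slope_no_intercept(demean_by_unit(units, x), demean_by_unit(units, y))

print("pooled slope:", pooled)
print("within slope:", within)
```

Here the unit with the larger `x` values has the lower intercept, so the pooled slope is badly biased, while the within slope matches `beta` because the unit means absorb `alpha`.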
Design Matrix Comparison
`dummies`: `y ~ C(unit) + X`
`within`: `y ~ X` (on demeaned data)
For 10,000 units with 5 covariates:
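The figures from the original comparison are not reproduced here. As a back-of-the-envelope sketch, the column counts can be computed under two stated assumptions: the dummy approach uses an intercept plus treatment-coded unit dummies (one reference unit dropped), and the within approach keeps only the covariates. The helper name `design_matrix_columns` is hypothetical.

```python
def design_matrix_columns(n_units: int, n_covariates: int, fe_method: str) -> int:
    """Approximate design-matrix width under the assumptions stated above."""
    if fe_method == "dummies":
        # intercept + (n_units - 1) unit dummies + covariates
        return 1 + (n_units - 1) + n_covariates
    if fe_method == "within":
        # assumption: only the demeaned covariates remain
        return n_covariates
    raise ValueError(f"unknown fe_method: {fe_method!r}")


for method in ("dummies", "within"):
    print(method, design_matrix_columns(10_000, 5, method))
```

Under these assumptions the dummy design matrix grows linearly with the number of units while the within design matrix does not, which is the motivation for offering both methods.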
Implementation
Main Class