Add NICE method #33
base: main
Conversation
methods/catalog/nice/reproduce.py (Outdated)

    std_ae_error = ae_errors.std()

    # ============================================
    # PRINT ALL FOUR METRICS (like Table 5 in paper)
I ran these tests locally, but the printed results don’t match the values reported in the paper. Please turn these print statements into assertions using the numbers from the table (a small tolerance is acceptable).
Also, these tests use the Random Forest model, so the correct reference is Table 6.
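For example, something along these lines; this is only a sketch, and the metric keys and reference numbers below are placeholders rather than the actual Table 6 values:

    import pytest

    # Placeholder reference values; fill in the Random Forest numbers from Table 6.
    TABLE6_RF = {"sparsity": 2.3, "proximity": 0.85, "ae_error": 0.05}

    def check_against_paper(metrics):
        # pytest.approx with a small relative tolerance instead of printing
        assert metrics["sparsity"] == pytest.approx(TABLE6_RF["sparsity"], rel=0.05)
        assert metrics["proximity"] == pytest.approx(TABLE6_RF["proximity"], rel=0.05)
        assert metrics["ae_error"] == pytest.approx(TABLE6_RF["ae_error"], rel=0.05)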
methods/catalog/nice/reproduce.py (Outdated)

    elif optimization == "none":
        # None should be very plausible (it's an actual instance!)
        # But we allow some tolerance since we measure on test set
        assert avg_ae_error <= 0.02, \
I couldn’t find where the paper reports the average error rate. The only place I see something similar is in Table 7, but that value seems different from what’s being checked here. Could you point me to the exact reference?
There is an online_appendix.xlsx table in the NICE_Experiments repo (https://github.com/DBrughmans/NICE_experiments/online_appendix.xlsx) that contains raw results instead of ranks. The dataset I access through DataCatalog is normalized, while the author uses a different preprocessing workflow, so my AE error is much smaller. I'm still working on it; should our implementation use the same preprocessing as the author's?
methods/catalog/nice/reproduce.py (Outdated)

    for opt in ["none", "sparsity", "proximity", "plausibility"]:
        nice = NICE(mlmodel=model, hyperparams={"optimization": opt})

        # Measure CPU time
CPU time isn’t a reliable metric for unit tests, since it depends on the hardware and environment where the code is executed.
I have removed the CPU time assertions.
methods/catalog/nice/reproduce.py (Outdated)

    print(f" NICE({opt:<12}): {metrics['cpu_time_total_ms']:>8.2f} ms total "
          f"({metrics['cpu_time_avg_ms']:>6.2f} ms per instance)")

    # Verify expectations
I’d suggest making each of these assertions a separate unit test for better clarity.
| print(f"✓ NICE integrates correctly with {dataset_name} dataset") | ||
|
|
||
|
|
||
| if __name__ == "__main__": |
This script runs the checks manually with print statements; we should convert it into proper unit tests (e.g., using pytest) instead of relying on print output.
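Each printed check would become a test along these lines; the build_adult_setup() helper, the get_counterfactuals call, and the 0.95 threshold are assumptions for illustration:

    import pytest
    from methods import NICE

    @pytest.mark.parametrize("opt", ["none", "sparsity", "proximity", "plausibility"])
    def test_nice_returns_valid_counterfactuals(opt):
        data, model, factuals = build_adult_setup()            # hypothetical helper
        nice = NICE(mlmodel=model, hyperparams={"optimization": opt})
        counterfactuals = nice.get_counterfactuals(factuals)   # method name assumed
        # assert on the success rate instead of printing it
        success_rate = counterfactuals.notna().all(axis=1).mean()
        assert success_rate >= 0.95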
These three tests failed when I ran them. Please fix them so that they pass:
test_nice_quality[mlp-proximity]
test_nice_quality[mlp-plausibility]
nice_variants_comparison[mlp]
Please avoid having multiple assertions in a single unit test. I’d suggest keeping one assertion per test to make it clearer and easier to debug later.
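For instance, with a shared fixture that computes the metrics once (the fixture name and metric keys are only illustrative):

    def test_sparsity_is_positive(nice_metrics):
        assert nice_metrics["sparsity"] > 0

    def test_proximity_is_non_negative(nice_metrics):
        assert nice_metrics["proximity"] >= 0

    def test_ae_error_is_bounded(nice_metrics):
        assert nice_metrics["ae_error"] <= 1.0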
methods/catalog/nice/reproduce.py (Outdated)

    """
    Test that NICE produces quality counterfactuals with all metrics in expected ranges.
    """
    data = DataCatalog("adult", model_type=model_type, train_split=0.7)
Please use pytest fixtures to build the DataCatalog/ModelCatalog and the AutoEncoder once per dataset/model, then reuse them across tests. Each test can still create a fresh NICE instance and slice the same factuals for isolation.
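A possible conftest.py sketch; the import paths, the ModelCatalog signature, and the factuals slice are assumptions to illustrate the reuse pattern:

    import pytest
    from data import DataCatalog        # adjust to the benchmark's actual import path
    from models import ModelCatalog     # adjust to the benchmark's actual import path

    @pytest.fixture(scope="module", params=["forest", "mlp"])
    def catalog(request):
        # Built once per model type and shared by every test in the module
        data = DataCatalog("adult", model_type=request.param, train_split=0.7)
        model = ModelCatalog(data, model_type=request.param)   # signature assumed
        return data, model

    @pytest.fixture(scope="module")
    def factuals(catalog):
        data, _ = catalog
        return data.df_test.iloc[:100]  # same slice for all tests; size is illustrative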
zkhotanlou left a comment
Also, please fetch the changes from the main branch so that the pre-commit hooks can run successfully.
use the author's data and model
to calculate proximity
use the author's data and model
…ourse_benchmarks into feature/add-nice
NICE,
Probe,
in methods/__init__.py
Raw data: label-encoded categorical + original-scale continuous
Preprocessor: One-Hot Encoding + MinMaxScaler(-1, 1)
Fix autoencoder training on preprocessed data
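For reference, the preprocessing described in this commit corresponds roughly to the following scikit-learn setup; the column lists and wiring are illustrative, not the PR's actual code:

    from sklearn.compose import ColumnTransformer
    from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

    # Example Adult columns; the real lists come from DataCatalog.
    categorical = ["workclass", "education", "marital-status"]
    continuous = ["age", "hours-per-week"]

    preprocessor = ColumnTransformer([
        ("onehot", OneHotEncoder(handle_unknown="ignore"), categorical),
        ("scale", MinMaxScaler(feature_range=(-1, 1)), continuous),
    ])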
Implementation Details
(1) 4 variants implemented: none, sparsity, proximity, plausibility
(2) Source: Adapted from official NICE repository (https://github.com/DBrughmans/NICE)
(3) Paper: Brughmans et al. (2024) "NICE: an algorithm for nearest instance counterfactual explanations" Data Mining and Knowledge Discovery
(4) Dataset: Adult
(5) Predictive model: Random Forest and MLP
Potential Differences
(1) The original autoencoder's architecture was not provided, so we had to create our own
(2) The 200 test samples were originally chosen at random, so our samples may differ
(3) Run times will differ, but the ranking of the four variants is the same
Reproduced Results
(1) RF as predictive model (updated on 11/13/2025)
(2) MLP as predictive model
Files Added/Modified
Main implementation:
methods/catalog/nice/model.py - Main NICE wrapper class implementing RecourseMethod interface
methods/catalog/nice/reproduce.py - Comprehensive test reproducing paper results (part of Table 6)
Library components:
methods/catalog/nice/library/__init__.py - Library exports
methods/catalog/nice/library/autoencoder.py - Autoencoder for plausibility measurement
methods/catalog/nice/library/data.py - Data handling and candidate filtering
methods/catalog/nice/library/distance.py - HEOM distance metric implementation
methods/catalog/nice/library/heuristic.py - Best-first greedy search
methods/catalog/nice/library/reward.py - Three reward functions (sparsity, proximity, plausibility)
Integration:
Updated methods/__init__.py to export NICE
Updated methods/catalog/__init__.py to include NICE