Skip to content

Comments

LC2ST Module Refactoring#1727

Open
janfb wants to merge 9 commits intomainfrom
refactor-lc2st
Open

LC2ST Module Refactoring#1727
janfb wants to merge 9 commits intomainfrom
refactor-lc2st

Conversation

@janfb
Copy link
Contributor

@janfb janfb commented Jan 14, 2026

LC2ST Module Refactoring

Summary

This PR refactors the LC2ST diagnostics module to improve API clarity, error handling, and maintainability while preserving full backward compatibility.

Note 1: I used Claude Code with this refactoring. Every commit was reviewed by me and a very critical Gemini Pro reviewing agent. I believe the changes here are very useful and correct, but the reviewer should please take extra care and be extra skeptical when reviewing this.

Note 2: I started this in parallel or motivated by the flaky test spotted in #1715 , so we need to resolve potential conflicts between these PRs down the line.

Motivation

The LC2ST implementation had accumulated several issues:

  1. Confusing API workflow - Users could call methods in arbitrary order with silent failures or cryptic errors
  2. Assertions for validation - Input validation used assert statements, which are stripped in optimized builds
  3. Code duplication - Z-score normalization logic was repeated in 4+ locations
  4. Unstructured returns - get_scores() returned different types based on a boolean flag
  5. Double-normalization bug - Data was normalized twice in null hypothesis training paths

Design Choices

State Machine (LC2STState enum): Rather than documenting method call order, the refactoring enforces it through explicit state transitions. The LC2ST object progresses through INITIALIZED → OBSERVED_TRAINED/NULL_TRAINED → READY, with methods checking state before execution. Both training orders are valid and reach READY.

Structured Returns (LC2STScores dataclass): Replaces the Union[array, Tuple[array, array]] return type with a dataclass containing scores and optional probabilities. The old return_probs=True behavior is preserved but deprecated.

Parameter Rename with Deprecation: thetasprior_samples for clarity (it's samples from the prior, not parameters). The old parameter works via keyword-only argument with DeprecationWarning.

DRY Normalization: Extracted _normalize_theta() and _normalize_x() helpers. Normalization now happens exactly once in _train() and get_scores(), fixing the double-normalization bug.

Validation via Exceptions: All assert statements replaced with ValueError/TypeError with actionable messages citing actual vs expected values.

Backward Compatibility

  • Positional arguments work unchanged
  • thetas=... keyword works with deprecation warning
  • get_scores(..., return_probs=True) works with deprecation warning
  • All existing tests pass without modification

Test Improvements

Reduced parametrization from 48 combinations to 28 focused tests using pytest patterns (dataclass fixtures, composable fixture hierarchy, parametrized validation tests).

@codecov
Copy link

codecov bot commented Jan 14, 2026

❌ 1 Tests Failed:

Tests completed Failed Passed Skipped
5793 1 5792 146
View the full list of 1 ❄️ flaky test(s)
tests/torchutils_test.py::TorchUtilsTest::test_searchsorted

Flake rate in main: 48.45% (Passed 50 times, Failed 47 times)

Stack Traces | 0.006s run time
.venv/lib/python3.10....../site-packages/xdist/remote.py:289: in pytest_runtest_logreport
    self.sendevent("testreport", data=data)
.venv/lib/python3.10....../site-packages/xdist/remote.py:126: in sendevent
    self.channel.send((name, kwargs))
.venv/lib/python3.10....................................................../site-packages/execnet/gateway_base.py:912: in send
    self.gateway._send(Message.CHANNEL_DATA, self.id, dumps_internal(item))
.venv/lib/python3.10....................................................../site-packages/execnet/gateway_base.py:1629: in dumps_internal
    return _Serializer().save(obj)  # type: ignore[return-value]
.venv/lib/python3.10....................................................../site-packages/execnet/gateway_base.py:1647: in save
    self._save(obj)
.venv/lib/python3.10....................................................../site-packages/execnet/gateway_base.py:1667: in _save
    dispatch(self, obj)
.venv/lib/python3.10....................................................../site-packages/execnet/gateway_base.py:1744: in save_tuple
    self._save(item)
.venv/lib/python3.10....................................................../site-packages/execnet/gateway_base.py:1667: in _save
    dispatch(self, obj)
.venv/lib/python3.10....................................................../site-packages/execnet/gateway_base.py:1740: in save_dict
    self._write_setitem(key, value)
.venv/lib/python3.10....................................................../site-packages/execnet/gateway_base.py:1734: in _write_setitem
    self._save(value)
.venv/lib/python3.10....................................................../site-packages/execnet/gateway_base.py:1667: in _save
    dispatch(self, obj)
.venv/lib/python3.10....................................................../site-packages/execnet/gateway_base.py:1740: in save_dict
    self._write_setitem(key, value)
.venv/lib/python3.10....................................................../site-packages/execnet/gateway_base.py:1734: in _write_setitem
    self._save(value)
.venv/lib/python3.10....................................................../site-packages/execnet/gateway_base.py:1667: in _save
    dispatch(self, obj)
.venv/lib/python3.10....................................................../site-packages/execnet/gateway_base.py:1740: in save_dict
    self._write_setitem(key, value)
.venv/lib/python3.10....................................................../site-packages/execnet/gateway_base.py:1734: in _write_setitem
    self._save(value)
.venv/lib/python3.10....................................................../site-packages/execnet/gateway_base.py:1667: in _save
    dispatch(self, obj)
.venv/lib/python3.10....................................................../site-packages/execnet/gateway_base.py:1740: in save_dict
    self._write_setitem(key, value)
.venv/lib/python3.10....................................................../site-packages/execnet/gateway_base.py:1734: in _write_setitem
    self._save(value)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <execnet.gateway_base._Serializer object at 0x7effb5bda500>
obj = tensor([0.0000, 0.1111, 0.2222, 0.3333, 0.4444, 0.5556, 0.6667, 0.7778, 0.8889])

    def _save(self, obj: object) -> None:
        tp = type(obj)
        try:
            dispatch = self._dispatch[tp]
        except KeyError:
            methodname = "save_" + tp.__name__
            meth: Callable[[_Serializer, object], None] | None = getattr(
                self.__class__, methodname, None
            )
            if meth is None:
>               raise DumpError(f"can't serialize {tp}") from None
E               execnet.gateway_base.DumpError: can't serialize <class 'torch.Tensor'>

.venv/lib/python3.10....................................................../site-packages/execnet/gateway_base.py:1665: DumpError

To view more test analytics, go to the Test Analytics Dashboard
📋 Got 3 mins? Take this short survey to help us improve Test Analytics.

@janfb janfb requested a review from JuliaLinhart January 15, 2026 10:57
…tionality

- Updated parameter names in LC2ST initialization for consistency (thetas -> prior_samples).
- Modified get_scores and get_statistics_under_null_hypothesis methods to return LC2STScores objects, encapsulating probabilities and scores.
- Adjusted usage of get_scores in the tutorial and tests to reflect the new return type.
- input validation in LC2ST to prevent indexing errors.
- Updated tests to assert the presence of scores in the returned null statistics.
@janfb janfb requested review from plcrodrigues and removed request for JuliaLinhart January 22, 2026 08:46
Resolved conflicts integrating GPU support (PR #1715) into the
refactored LC2ST code:
- Added device parameter to LC2ST.__init__ with GPU detection
- Integrated NeuralNetBinaryClassifier for GPU-accelerated training
- Added skorch ValidSplit and EarlyStopping configuration
- Fixed target dtype for skorch (float32) vs sklearn (int64)
- Updated tests to use refactored cal_data fixture

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants