Description
Fenn already reduces experiment boilerplate, but time-series projects still require repeated, error-prone preprocessing:
- parsing and sorting timestamps
- handling missing / duplicate time points
- resampling to a regular cadence
- building sliding windows for sequence models (e.g., LSTMs)
- time-aware train/val/test splits (no leakage)
This issue proposes:
- a small, composable set of utilities under `fenn.timeseries` for early-stage time-series preparation, and
- a first LSTM template in the fenn templates repository to demonstrate the workflow end-to-end.
Goal
Create a minimal, well-documented `fenn.timeseries` module for common preprocessing patterns, and ship a working LSTM template that uses it.
Proposed API (some ideas)
1) Timestamp + index normalization:
- `ensure_datetime_index(df, time_col, *, tz="UTC", sort=True, drop_duplicates="last") -> pd.DataFrame`
  - converts `time_col` to datetime, sets it as the index, sorts, and resolves duplicates deterministically.
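A minimal pandas sketch of `ensure_datetime_index`, assuming naive timestamps should be read as already being in `tz` (aware ones are converted) and that `drop_duplicates` accepts `"first"`, `"last"`, or `None` (keep every row). Both conventions are assumptions, not settled behaviour.

```python
from __future__ import annotations

import pandas as pd


def ensure_datetime_index(
    df: pd.DataFrame,
    time_col: str,
    *,
    tz: str | None = "UTC",
    sort: bool = True,
    drop_duplicates: str | None = "last",
) -> pd.DataFrame:
    """Return a copy of ``df`` indexed by ``time_col`` parsed as datetimes.

    Example:
        >>> raw = pd.DataFrame({"ts": ["2024-01-02", "2024-01-01"], "y": [2.0, 1.0]})
        >>> ensure_datetime_index(raw, "ts")["y"].tolist()
        [1.0, 2.0]
    """
    out = df.copy()  # work on a copy so the caller's frame is never mutated
    ts = pd.to_datetime(out[time_col])
    if tz is not None:
        # Assumption: naive timestamps are already in `tz`; aware ones are converted to it.
        ts = ts.dt.tz_localize(tz) if ts.dt.tz is None else ts.dt.tz_convert(tz)
    out = out.drop(columns=[time_col])
    out.index = pd.DatetimeIndex(ts, name=time_col)
    if sort:
        # Stable sort: rows with equal timestamps keep their input order,
        # so "first"/"last" below refer to the order the rows arrived in.
        out = out.sort_index(kind="mergesort")
    if drop_duplicates is not None:
        # Assumption: drop_duplicates is "first", "last", or None (keep all rows).
        out = out[~out.index.duplicated(keep=drop_duplicates)]
    return out
```

The stable sort is what makes duplicate resolution deterministic: "last" always means the last row for that timestamp in the input, regardless of how the sort algorithm breaks ties.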
2) Frequency inference + resampling:
- `infer_frequency(index, *, max_samples=5000) -> str | None`
- `resample(df, rule, *, agg="mean", fill=None) -> pd.DataFrame`
  - `fill` could be `None | "ffill" | "bfill" | "interpolate"`.
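A sketch of both functions on top of `pd.infer_freq` and `DataFrame.resample`. Assumptions: `max_samples` means "inspect only the first N timestamps", `agg` is anything `Resampler.agg` accepts, and `"interpolate"` means time-weighted interpolation. With `agg="mean"` empty bins become NaN, which is what gives the `fill` step something to fill (a `"sum"` aggregation would turn gaps into zeros instead).

```python
from __future__ import annotations

import pandas as pd


def infer_frequency(index: pd.DatetimeIndex, *, max_samples: int = 5000) -> str | None:
    """Return a pandas offset alias such as "D", or None if no regular cadence is found."""
    # Assumption: max_samples caps how many leading timestamps are inspected.
    head = index[:max_samples]
    if len(head) < 3:
        return None  # pd.infer_freq raises ValueError for fewer than 3 timestamps
    # pd.infer_freq returns None for irregular, non-monotonic, or duplicated timestamps.
    return pd.infer_freq(head)


def resample(
    df: pd.DataFrame, rule: str, *, agg: str = "mean", fill: str | None = None
) -> pd.DataFrame:
    """Aggregate ``df`` (DatetimeIndex, numeric columns) onto a regular grid, then fill gaps."""
    out = df.resample(rule).agg(agg)  # with "mean", empty bins become NaN
    if fill is None:
        return out
    if fill == "ffill":
        return out.ffill()
    if fill == "bfill":
        return out.bfill()
    if fill == "interpolate":
        return out.interpolate(method="time")  # weights by time distance between points
    raise ValueError(f"unknown fill mode: {fill!r}")
```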
3) Windowing for sequence models:
- `make_windows(values, *, lookback, horizon=1, stride=1, drop_last=True) -> (X, y)`
  - output shapes: `X: (N, lookback, C)` and `y: (N, horizon, C_or_targets)` (keep it simple; document clearly).
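One way to build the windows with NumPy integer-array indexing. It assumes `y` carries all channels (selecting target columns is left to the caller) and treats `drop_last=False` as unsupported, since the proposal does not yet say how trailing samples that cannot fill a complete window should be handled.

```python
from __future__ import annotations

import numpy as np


def make_windows(
    values,
    *,
    lookback: int,
    horizon: int = 1,
    stride: int = 1,
    drop_last: bool = True,
) -> tuple[np.ndarray, np.ndarray]:
    """Cut a (T,) or (T, C) series into supervised (X, y) windows.

    Returns X with shape (N, lookback, C) and y with shape (N, horizon, C),
    where window i covers rows [i * stride, i * stride + lookback + horizon).
    """
    if lookback < 1 or horizon < 1 or stride < 1:
        raise ValueError("lookback, horizon and stride must be >= 1")
    if not drop_last:
        # Assumption: how to pad an incomplete trailing window is still undecided.
        raise NotImplementedError("drop_last=False is not specified yet")
    arr = np.asarray(values, dtype=float)
    if arr.ndim == 1:
        arr = arr[:, None]  # a univariate series becomes a single channel
    span = lookback + horizon
    n_windows = (arr.shape[0] - span) // stride + 1 if arr.shape[0] >= span else 0
    # Row indices of every window, shape (N, span); incomplete windows never appear.
    rows = np.arange(n_windows)[:, None] * stride + np.arange(span)[None, :]
    windows = arr[rows]  # shape (N, span, C)
    return windows[:, :lookback], windows[:, lookback:]
```

For example, a 10-step univariate series with `lookback=3, horizon=2` yields 6 windows: `X` has shape `(6, 3, 1)` and the first pair is `[0, 1, 2] -> [3, 4]`.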
4) Time-aware splitting (no leakage):
- `time_split(df, *, train_ratio=0.7, val_ratio=0.15) -> (train, val, test)`
  - splits by chronological order only.
These functions should be pure (no hidden global state) and should not modify inputs in place.
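For `time_split`, a sketch using positional slicing. It assumes the index is already sorted (for example by `ensure_datetime_index`) and raises on unsorted input rather than re-sorting silently; that is a design choice, not part of the proposal. Split sizes are floored, so any rounding remainder ends up in the test split, and each slice is copied so callers can modify the splits without touching `df`.

```python
from __future__ import annotations

import pandas as pd


def time_split(
    df: pd.DataFrame, *, train_ratio: float = 0.7, val_ratio: float = 0.15
) -> tuple[pd.DataFrame, pd.DataFrame, pd.DataFrame]:
    """Split a time-indexed frame into contiguous train/val/test blocks.

    Rows are never shuffled: train holds the earliest rows, test the latest.
    """
    if train_ratio <= 0 or val_ratio < 0 or train_ratio + val_ratio > 1:
        raise ValueError("need train_ratio > 0, val_ratio >= 0 and train_ratio + val_ratio <= 1")
    if not df.index.is_monotonic_increasing:
        # Design choice: refuse unsorted input instead of silently re-sorting it.
        raise ValueError("index must be sorted in time; call ensure_datetime_index first")
    n = len(df)
    n_train = int(n * train_ratio)  # floor, so the rounding remainder goes to test
    n_val = int(n * val_ratio)
    train = df.iloc[:n_train].copy()
    val = df.iloc[n_train : n_train + n_val].copy()
    test = df.iloc[n_train + n_val :].copy()
    return train, val, test
```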
Tasks
Part A — fenn.timeseries utilities (in pyfenn/fenn)
- Create package structure (or similar):
  - `fenn/experimental/timeseries/__init__.py`
  - `fenn/experimental/timeseries/preprocessing.py`
- Implement the baseline functions above (or an agreed subset if needed).
- Add type hints + docstrings with short examples.
- Add unit tests covering:
  - unsorted timestamps, duplicated timestamps
  - missing timestamps / gaps
  - resampling with each `fill` mode
  - window shapes for univariate and multivariate signals
  - split sizes and chronological ordering
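For the "split sizes and chronological ordering" tests, a reusable assertion helper might look like this. `assert_chronological_split` is a hypothetical name, not an existing fenn function; the strict `<` treats a timestamp shared by two splits as leakage.

```python
import pandas as pd


def assert_chronological_split(
    train: pd.DataFrame, val: pd.DataFrame, test: pd.DataFrame, n_total: int
) -> None:
    """Fail if rows were lost/duplicated, a split is unsorted, or splits overlap in time."""
    splits = (train, val, test)
    assert sum(len(s) for s in splits) == n_total, "rows were lost or duplicated"
    non_empty = [s for s in splits if len(s) > 0]  # an empty val split is allowed
    for s in non_empty:
        assert s.index.is_monotonic_increasing, "a split is not sorted in time"
    for earlier, later in zip(non_empty, non_empty[1:]):
        # Strict "<": a timestamp shared by two splits would leak information.
        assert earlier.index.max() < later.index.min(), "splits overlap in time"
```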
Part B — LSTM template (in pyfenn/templates)
- Add a new template folder, e.g. `lstm-timeseries` (the name can be bikeshedded later).
- The template should include:
  - a small dataset loader (either a synthetic sine wave or a tiny CSV checked into the template)
  - preprocessing using `fenn.timeseries` (datetime normalization → resample → windowing → split)
  - an LSTM model (PyTorch or Keras/TensorFlow, whichever is the project standard for templates)
  - a minimal training loop + evaluation (MAE/MSE; a simple plot is optional)
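For the dataset loader and evaluation pieces, a NumPy/pandas sketch could look like the following. `make_sine_dataset` and `regression_metrics` are hypothetical names, and the period, noise level, and daily cadence are arbitrary choices. The LSTM and training loop depend on which framework the templates standardize on, so they are not shown. A fixed seed keeps the synthetic data identical across runs, which supports the "deterministic preprocessing" criterion.

```python
from __future__ import annotations

import numpy as np
import pandas as pd


def make_sine_dataset(
    n: int = 500, *, period: int = 50, noise: float = 0.1, seed: int = 0
) -> pd.DataFrame:
    """Synthetic daily series: a sine wave plus Gaussian noise, in raw (timestamp, y) form."""
    rng = np.random.default_rng(seed)  # fixed seed, so every run yields identical data
    t = np.arange(n)
    y = np.sin(2 * np.pi * t / period) + noise * rng.standard_normal(n)
    timestamps = pd.date_range("2024-01-01", periods=n, freq="D")
    return pd.DataFrame({"timestamp": timestamps, "y": y})


def regression_metrics(y_true, y_pred) -> dict[str, float]:
    """Mean absolute error and mean squared error over all elements."""
    err = np.asarray(y_pred, dtype=float) - np.asarray(y_true, dtype=float)
    return {"mae": float(np.mean(np.abs(err))), "mse": float(np.mean(err**2))}


# Usage: score a naive persistence forecast (predict the previous value) as a baseline.
df = make_sine_dataset()
baseline = regression_metrics(df["y"].to_numpy()[1:], df["y"].to_numpy()[:-1])
print(baseline)
```

The raw `timestamp` column is deliberately left unindexed so the template can exercise the full datetime normalization → resample → windowing → split path, and the persistence score gives a reference point for the LSTM's MAE/MSE.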
Acceptance criteria
- `fenn.timeseries` functions are importable from the public API.
- Clear docstrings + at least one minimal documentation page/section (“Time-series preprocessing”).
- Tests pass in CI and cover the edge cases listed above.
- The template runs end-to-end and demonstrates:
  - deterministic preprocessing
  - correct window shapes
  - leakage-free splitting
  - a training run that produces a sensible loss curve
Where to contribute
- Core library: https://github.com/pyfenn/fenn
- Templates repository (used by `fenn pull`): https://github.com/pyfenn/templates
If you want to work on this issue, comment which part you want to take:
- Part A (utilities), Part B (template), or both (if you have time).