Add fenn.timeseries preprocessing helpers + first LSTM forecasting template #46

@blkdmr

Description

Fenn already reduces experiment boilerplate, but time-series projects still require repeated, error-prone preprocessing:

  • parsing and sorting timestamps
  • handling missing / duplicate time points
  • resampling to a regular cadence
  • building sliding windows for sequence models (e.g., LSTMs)
  • time-aware train/val/test splits (no leakage)

This issue proposes:

  1. a small, composable set of utilities under fenn.timeseries for early-stage time-series preparation, and

  2. a first LSTM template in the fenn templates repository to demonstrate the workflow end-to-end.

Goal

Create a minimal, well-documented fenn.timeseries module for common preprocessing patterns, and ship a working LSTM template that uses it.

Proposed API (some ideas)

1) Timestamp + index normalization:

  • ensure_datetime_index(df, time_col, *, tz="UTC", sort=True, drop_duplicates="last") -> pd.DataFrame

    • converts time_col to datetime, sets it as index, sorts, and resolves duplicates deterministically.
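A minimal sketch of how `ensure_datetime_index` could behave, assuming pandas and assuming that timestamps without an explicit offset should be read as UTC before conversion to `tz`. The stable sort and the `keep`-style handling of `drop_duplicates` are illustrative choices, not settled design.

```python
import pandas as pd


def ensure_datetime_index(df, time_col, *, tz="UTC", sort=True, drop_duplicates="last"):
    """Return a copy of df indexed by a tz-aware DatetimeIndex built from time_col."""
    out = df.copy()  # never mutate the caller's frame
    # utc=True reads naive timestamps as UTC and converts aware ones to UTC
    stamps = pd.to_datetime(out[time_col], utc=True)
    out = out.drop(columns=[time_col])
    out.index = pd.DatetimeIndex(stamps, name=time_col).tz_convert(tz)
    if sort:
        # mergesort is stable, so rows with equal timestamps keep their input order
        out = out.sort_index(kind="mergesort")
    if drop_duplicates is not None:
        # "first" or "last" decides which of the repeated rows survives
        out = out[~out.index.duplicated(keep=drop_duplicates)]
    return out


raw = pd.DataFrame(
    {
        "ts": ["2024-01-01 02:00", "2024-01-01 00:00", "2024-01-01 02:00"],
        "v": [1.0, 2.0, 3.0],
    }
)
clean = ensure_datetime_index(raw, "ts")
```

Because the sort is stable, `drop_duplicates="last"` keeps the row that appeared last in the input for each repeated timestamp, which is what makes the result deterministic.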

2) Frequency inference + resampling:

  • infer_frequency(index, *, max_samples=5000) -> str | None

  • resample(df, rule, *, agg="mean", fill=None) -> pd.DataFrame

    • fill could be None | "ffill" | "bfill" | "interpolate".
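One possible shape for these two helpers, sketched on top of `pd.infer_freq` and `DataFrame.resample`. Returning `None` for irregular or too-short indexes and mapping `"interpolate"` to time-weighted interpolation are assumptions made for illustration.

```python
import pandas as pd


def infer_frequency(index, *, max_samples=5000):
    """Best-effort frequency alias for a DatetimeIndex, or None if irregular."""
    sample = index[:max_samples]  # cap the work on very long series
    if len(sample) < 3:
        return None  # pd.infer_freq needs at least three timestamps
    try:
        return pd.infer_freq(sample)
    except ValueError:
        return None


def resample(df, rule, *, agg="mean", fill=None):
    """Resample to a regular cadence, optionally filling the gaps it exposes."""
    out = df.resample(rule).agg(agg)  # returns a new frame; df is untouched
    if fill is None:
        return out
    if fill == "ffill":
        return out.ffill()
    if fill == "bfill":
        return out.bfill()
    if fill == "interpolate":
        # assumption: weight by elapsed time rather than by row position
        return out.interpolate(method="time")
    raise ValueError(f"unknown fill mode: {fill!r}")


gappy = pd.DataFrame(
    {"v": [1.0, 2.0, 4.0]},
    index=pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-04"]),
)
daily = resample(gappy, "D", fill="interpolate")
```

With `fill=None` the missing day stays as `NaN`, which keeps gaps visible to the caller instead of silently inventing values.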

3) Windowing for sequence models:

  • make_windows(values, *, lookback, horizon=1, stride=1, drop_last=True) -> (X, y)
    • output shapes: X is (N, lookback, C) and y is (N, horizon, C_or_targets), where N is the number of windows and C the number of channels (keep it simple; document clearly).
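The windowing could be sketched with NumPy as follows. This version assumes the targets are the same channels as the inputs (so the last axis of `y` is `C`) and always returns complete windows only; the `drop_last` flag is left out because its semantics are not defined yet.

```python
import numpy as np


def make_windows(values, *, lookback, horizon=1, stride=1):
    """Slice a (T,) or (T, C) series into (X, y) pairs for sequence models."""
    arr = np.asarray(values, dtype=float)
    if arr.ndim == 1:
        arr = arr[:, None]  # univariate input becomes (T, 1)
    # number of start positions whose lookback + horizon span fits in the series
    n = (len(arr) - lookback - horizon) // stride + 1
    if n <= 0:
        raise ValueError("series is shorter than lookback + horizon")
    starts = np.arange(n) * stride
    X = np.stack([arr[s : s + lookback] for s in starts])
    y = np.stack([arr[s + lookback : s + lookback + horizon] for s in starts])
    return X, y


X, y = make_windows(np.arange(10), lookback=3, horizon=2)
print(X.shape, y.shape)  # (6, 3, 1) (6, 2, 1)
```

Each `y[i]` starts exactly where `X[i]` ends, so no target value ever appears inside its own input window.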

4) Time-aware splitting (no leakage):

  • time_split(df, *, train_ratio=0.7, val_ratio=0.15) -> (train, val, test)

    • split by chronological order only; whatever remains after the train and val fractions (0.15 with the defaults) becomes the test set.
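A chronological split can be sketched with plain positional slicing. This assumes the frame is already sorted by time (for example by `ensure_datetime_index`); the use of `round` for the cut points is an illustrative choice to avoid floating-point truncation surprises.

```python
import pandas as pd


def time_split(df, *, train_ratio=0.7, val_ratio=0.15):
    """Split a time-sorted frame into contiguous train, val, and test blocks."""
    if not (0 < train_ratio < 1 and 0 <= val_ratio < 1 and train_ratio + val_ratio <= 1):
        raise ValueError("ratios must be fractions that sum to at most 1")
    n = len(df)
    train_end = round(n * train_ratio)
    val_end = min(n, train_end + round(n * val_ratio))
    # copies keep the three parts independent of the input frame
    train = df.iloc[:train_end].copy()
    val = df.iloc[train_end:val_end].copy()
    test = df.iloc[val_end:].copy()
    return train, val, test


frame = pd.DataFrame(
    {"v": range(20)},
    index=pd.date_range("2024-01-01", periods=20, freq="D"),
)
train, val, test = time_split(frame)
print(len(train), len(val), len(test))  # 14 3 3
```

No shuffling happens anywhere, so every validation row is later than every training row and every test row is later than every validation row.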

These functions should be pure (no hidden global state) and should not modify inputs in place.

Tasks

Part A — fenn.timeseries utilities (in pyfenn/fenn)

  1. Create package structure (or similar):

    • fenn/experimental/timeseries/__init__.py
    • fenn/experimental/timeseries/preprocessing.py

  2. Implement the baseline functions above (or an agreed subset if needed).

  3. Add type hints + docstrings with short examples.

  4. Add unit tests covering:

    • unsorted timestamps, duplicated timestamps

    • missing timestamps / gaps

    • resampling with each fill mode

    • window shapes for univariate and multivariate signals

    • split sizes and chronological ordering

Part B — LSTM template (in pyfenn/templates)

  1. Add a new template folder, e.g. lstm-timeseries (name can be bikeshedded later).

  2. Template should include:

    • a small dataset loader (either synthetic sine wave or a tiny CSV checked into the template)

    • preprocessing using fenn.timeseries (datetime normalization → resample → windowing → split)

    • an LSTM model (PyTorch or Keras/TensorFlow, whichever is the project standard for templates)

    • a minimal training loop + evaluation (MAE/MSE; a simple plot is optional)
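For the synthetic option, the dataset loader might look like the sketch below. The function name `make_sine_frame`, its parameters, and the column names are hypothetical; the fixed seed is there so that every run of the template sees the same data.

```python
import numpy as np
import pandas as pd


def make_sine_frame(n=500, *, period=50, noise=0.1, seed=0):
    """Hypothetical loader: a noisy daily sine wave with a timestamp column."""
    rng = np.random.default_rng(seed)  # seeded generator for reproducible noise
    t = np.arange(n)
    values = np.sin(2 * np.pi * t / period) + noise * rng.normal(size=n)
    stamps = pd.date_range("2024-01-01", periods=n, freq="D", tz="UTC")
    return pd.DataFrame({"timestamp": stamps, "value": values})


data = make_sine_frame(n=100)
```

The returned frame still carries its timestamps as an ordinary column, so the template can exercise the full datetime normalization → resample → windowing → split path on it.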

Acceptance criteria

  • fenn.timeseries functions are importable from the public API.

  • Clear docstrings + at least one minimal documentation page/section (“Time-series preprocessing”).

  • Tests pass in CI and cover edge cases listed above.

  • The template runs end-to-end and demonstrates:

    • deterministic preprocessing
    • correct window shapes
    • leakage-free splitting
    • a training run that produces a sensible loss curve

Where to contribute

If you want to work on this issue, comment which part you want to take:

  • Part A (utilities), Part B (template), or both (if you have time).
