timedb

TL;DR

timedb is an opinionated schema and API built on top of PostgreSQL, designed to handle overlapping time series revisions and auditable human-in-the-loop updates.

Most time series systems assume a single immutable value per timestamp. timedb is built for domains where data is revised, forecasted, reviewed, and corrected over time.

timedb lets you:

  • ⏱️ Retain "time-of-knowledge" history through a three-dimensional time series data model;
  • ✍️ Make versioned ad-hoc updates to the time series data with annotations and tags; and
  • 🔀 Represent both timestamp and time-interval time series simultaneously.

Why timedb?

Most time series systems assume:

  • one value per timestamp;
  • immutable historical data; and
  • no distinction between when something was true vs when it was known.

These assumptions become a major drawback in situations such as:

  • forecasting, where multiple forecast revisions predict the same timestamp;
  • backtesting, where algorithms require "time-of-knowledge" history;
  • data communication, where an auditable history of updates is required;
  • human review and correction, where values are manually adjusted, annotated, or validated over time; and
  • late-arriving data and backfills, where new information must be incorporated without rewriting history.

In practice, teams work around these limitations by overwriting data, duplicating tables and columns, or encoding semantics in column names — making systems fragile, opaque, and hard to reason about.

timedb addresses this by making revisions, provenance, and temporal semantics explicit in the data model, rather than treating them as edge cases.

Installation

pip install timedb

Basic usage

import timedb as td
import pandas as pd
from datetime import datetime, timezone, timedelta

# Create database schema (requires TIMEDB_DSN or DATABASE_URL env var)
td.create()

# Create time series data
base_time = datetime(2025, 1, 1, 0, 0, tzinfo=timezone.utc)
df = pd.DataFrame({
    'valid_time': [base_time + timedelta(hours=i) for i in range(24)],
    'value': [20.0 + i * 0.3 for i in range(24)]
})

# Insert time series
result = td.insert_batch(df=df)
print(f"Inserted series: {result.series_id}")

# Read data back
df_read = td.read()
print(df_read)

See the examples/ folder for interactive Jupyter notebooks demonstrating more advanced usage.

Tables

batches_table

Field                         Type         Purpose
----------------------------  -----------  -------------------------------------------------------------------
batch_id (primary key)        UUID         Unique identifier for the batch (generated by the API)
tenant_id                     UUID         Tenant identifier for multi-tenant support
workflow_id (optional)        text         Identifier for the workflow/pipeline (NULL for manual insertions)
batch_start_time (optional)   timestamptz  When the batch started
batch_finish_time (optional)  timestamptz  When the batch finished
known_time                    timestamptz  When the data was known/available (default: now())
batch_params (optional)       jsonb        Parameters/config used for this batch (e.g., model version, API args)
inserted_at                   timestamptz  When the row was inserted (default: now())
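
For orientation, here is how an insert could populate these batch columns. This is a sketch only: insert_batch is shown in the basic usage with just df, so the workflow_id, known_time, and batch_params keyword arguments below are assumptions used to illustrate the mapping, not a confirmed signature.

import timedb as td
import pandas as pd
from datetime import datetime, timezone

df = pd.DataFrame({
    'valid_time': [datetime(2025, 1, 2, h, tzinfo=timezone.utc) for h in range(6)],
    'value': [11.2, 11.5, 11.9, 12.4, 12.1, 11.8],
})

# Hypothetical keyword arguments, mapped onto batches_table columns:
result = td.insert_batch(
    df=df,
    workflow_id='wind-forecast-hourly',                        # workflow_id
    known_time=datetime(2025, 1, 1, 12, tzinfo=timezone.utc),  # known_time
    batch_params={'model': 'gbm-v3', 'horizon_hours': 6},      # batch_params
)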

series_table

Field                    Type         Purpose
-----------------------  -----------  ----------------------------------------------------------------------
series_id (primary key)  UUID         Unique identifier for the series
name                     text         Parameter name (e.g., 'wind_power', 'temperature')
unit                     text         Canonical unit for the series (e.g., 'MW', 'degC')
labels                   jsonb        JSON object differentiating this series (e.g., {"site": "Gotland", "turbine": "T01"})
description (optional)   text         Human-readable description
inserted_at              timestamptz  When the series was created (default: now())

Uniqueness constraint: (name, labels). A series is uniquely identified by its name and labels. The unit is NOT part of the uniqueness constraint, so you cannot create two series with the same name and labels but different units.
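
A sketch of what this means in practice. td.create_series is a hypothetical helper used only for illustration; the actual entry point for creating series may differ.

# Hypothetical helper illustrating the (name, labels) uniqueness rule.
td.create_series(name='temperature', unit='degC',
                 labels={'site': 'Gotland', 'turbine': 'T01'})

# Same name + labels with a different unit conflicts with the existing
# series, because unit is not part of the uniqueness key.
td.create_series(name='temperature', unit='degF',
                 labels={'site': 'Gotland', 'turbine': 'T01'})  # raises a conflict error

# Different labels identify a distinct series, so this succeeds.
td.create_series(name='temperature', unit='degC',
                 labels={'site': 'Gotland', 'turbine': 'T02'})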


values_table

The Role column describes each field's function in the temporal data model (attribute, time dimension, or measure) rather than its SQL type.

Field                      Role            Purpose
-------------------------  --------------  --------------------------------------------------------------------------
value_id (primary key)     attribute       Unique identifier for each version of a value
batch_id (foreign key)     attribute       References the batch that produced this value (batches_table.batch_id)
tenant_id                  attribute       Tenant identifier for multi-tenant support
series_id (foreign key)    attribute       References the series (series_table.series_id)
valid_time                 time dimension  Timestamp the value is valid for
valid_time_end (optional)  time dimension  Optional interval end time; NULL means point-in-time at valid_time
value (optional)           measure         The numeric value (nullable; NULL can be a valid stored value)
annotation (optional)      attribute       Optional human annotation (whitespace-only disallowed)
metadata (optional)        attribute       Optional JSONB metadata (e.g., {"quality": "good", "source": "manual"})
tags (optional)            attribute       Optional semantic labels / quality flags (empty arrays disallowed; use NULL)
changed_by (optional)      attribute       User or service responsible for the change
change_time                time dimension  When this version row was created (default: now())
is_current                 attribute       Whether this row is the active version for its key (default: true)

metadata_table

Field                      Role            Purpose
-------------------------  --------------  --------------------------------------------------------------------
metadata_id (primary key)  attribute       Surrogate primary key for metadata rows
batch_id (foreign key)     attribute       References batch context (batches_table.batch_id)
tenant_id                  attribute       Tenant identifier for multi-tenant support
valid_time                 time dimension  Time context for the metadata (joins onto values via (batch_id, valid_time))
metadata_key               attribute       Name of the metadata field (e.g., contractId, deliveryStart)
value_number (optional)    attribute       Numeric metadata value (exactly one typed value must be set per row)
value_string (optional)    attribute       String metadata value (mutually exclusive with other typed values)
value_bool (optional)      attribute       Boolean metadata value (mutually exclusive with other typed values)
value_time (optional)      attribute       Timestamp metadata value (mutually exclusive with other typed values)
value_json (optional)      attribute       JSON metadata value (mutually exclusive with other typed values)
inserted_at                time dimension  When the metadata row was inserted (default: now())
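
The exactly-one-typed-value rule can be pictured with a small client-side check. This is a minimal sketch of the rule as described above, not part of the timedb API; the schema itself presumably enforces it with a constraint.

def validate_metadata_row(row: dict) -> None:
    """Raise unless exactly one typed value column is set."""
    typed_cols = ('value_number', 'value_string', 'value_bool',
                  'value_time', 'value_json')
    set_cols = [c for c in typed_cols if row.get(c) is not None]
    if len(set_cols) != 1:
        raise ValueError(f'exactly one typed value required, got: {set_cols}')

validate_metadata_row({'metadata_key': 'contractId',
                       'value_string': 'C-1042'})           # ok
validate_metadata_row({'metadata_key': 'deliveryStart',
                       'value_string': '2025-01-01',
                       'value_time': '2025-01-01T00:00Z'})  # raises ValueError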

Three-dimensional time series data model

Most time series databases are two-dimensional: they map a timestamp to a value. timedb adds a third dimension: time-of-knowledge (known_time).

This allows you to store overlapping forecasts and revisions while preserving the full history of how values evolved. For example:

  • A forecast made on Monday for Tuesday can coexist with a forecast made on Tuesday for the same time
  • You can query "what did we know about Tuesday on Monday?" vs "what do we know now?"

The key dimensions are:

  • valid_time: When the value is valid (the timestamp being forecasted/measured)
  • known_time: When this value became known (when the forecast was made or data arrived)
  • batch_id: Groups values that were inserted together in one batch

This design naturally supports:

  • Forecast revisions: Multiple predictions for the same valid_time from different known_times
  • Data corrections: Updates that preserve the original value with full audit trail
  • Backtesting: Reconstruct what was known at any point in the past (see the sketch below)
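
The backtesting case could look like the following sketch. td.read is shown earlier without arguments, so the as_of parameter here is an assumption used to convey the idea of filtering by known_time, not a confirmed signature.

from datetime import datetime, timezone

monday = datetime(2025, 1, 6, tzinfo=timezone.utc)

# "What did we know about Tuesday as of Monday?" -- hypothetical as_of
# parameter restricting results to rows with known_time <= monday.
df_monday_view = td.read(as_of=monday)

# "What do we know now?" -- the latest known values.
df_current = td.read()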

Additional attributes

Beyond the core time dimensions, timedb includes attributes for human-in-the-loop corrections:

  • tags: Array of strings for quality flags (e.g., ["reviewed", "corrected"])
  • annotation: Text descriptions of changes
  • metadata: JSONB field for flexible metadata storage
  • changed_by: Who made the change
  • version: Incremented on each update to track history
  • is_current: Flag indicating if this is the active version

Updates create new rows rather than overwriting, so you maintain a complete audit trail of all changes.
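
As a sketch of those semantics (td.update and its parameters are assumptions for illustration, shown only to convey the new-row-per-version behavior; see the examples/ notebooks for the actual update API):

# Hypothetical update call: writes a new version row with
# is_current = true and flips the previous row to is_current = false.
td.update(
    series_id=result.series_id,    # series from the basic-usage example
    valid_time=base_time,          # which timestamp to correct
    value=19.5,                    # corrected value
    annotation='Sensor drift corrected after calibration',
    tags=['reviewed', 'corrected'],
    changed_by='analyst@example.com',
)

# A subsequent read returns the corrected value; the original row is
# preserved (is_current = false) for the audit trail.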

Roadmap

  • Decouple the knowledge time from the batch_start_time
  • Python SDK that allows time series data manipulations, reads and writes
  • RESTful API layer that serves data to users
  • Built-in authentication and multi-tenancy support
  • Unit handling (e.g. MW, kW)
  • Handle different time zones in the API layer while always storing in UTC in the database
  • Support for Postgres range types (tsrange/tstzrange)
  • Built-in data retention, TTL, and archiving
  • Support for subscribing to database updates through the API
  • Xarray integration for multidimensional time series
  • Polars integration for lazy computations
  • Parquet file integration
  • Real-time subscriptions over WebSockets
  • Store time series with geographic coordinates. Query by spatial region (e.g., "all temperature sensors in this polygon")
  • Automatic alignment and interpolation of different time series resolutions
  • Symbolic time series + serialization
