@daniel-thom
Collaborator

This is a prototype, mostly generated by Claude. The goal is to see if we can simplify interaction with dsgrid when running Spark jobs. The current SQLAlchemy-based code requires a separate Spark session: dsgrid creates a session with pyspark, while chronify relies on an Apache Thrift Server (Hive).

We would get the following benefits by migrating:

  • dsgrid could pass an Ibis Table object to chronify for time validation, instead of only a path to a Parquet file.
  • We would drop the dependency on PyHive (the SQLAlchemy driver for Hive), whose continued support is unclear.
  • dsgrid could drop much of its special handling for Spark vs. DuckDB (DuckDB has an experimental Spark API, but it is incomplete and has an uncertain future). Ibis appears to be a better long-term solution.

We would lose the following SQLAlchemy functionality:

  • Database transactions with rollback. Ibis does not support these natively.
  • We currently allow the user to ingest rows from multiple DataFrames into an existing table. If the first DataFrame is valid but the second is not, we roll back, leaving the database in its original state. With Ibis, we have no code to delete the added rows. (It could be done with special-casing for backends that support it.)
  • This is not important for dsgrid, as we do not ingest data like this, but we need to ask other chronify users.
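The transactional ingest pattern we would lose can be sketched directly with SQLAlchemy (the table and column names are illustrative, not chronify's actual schema):

```python
# Multi-batch ingest under one transaction: if any batch fails, the whole
# ingest rolls back and the table is left in its original (empty) state.
from sqlalchemy import Column, Float, MetaData, Table, create_engine, func, insert, select

engine = create_engine("sqlite:///:memory:")
metadata = MetaData()
load_data = Table("load_data", metadata, Column("value", Float, nullable=False))
metadata.create_all(engine)

batches = [
    [{"value": 1.0}, {"value": 2.0}],  # valid batch
    [{"value": None}],                 # invalid batch: NOT NULL violation
]

try:
    with engine.begin() as conn:  # one transaction covering all batches
        for batch in batches:
            conn.execute(insert(load_data), batch)
except Exception:
    pass  # engine.begin() already rolled back on the failure

with engine.connect() as conn:
    count = conn.execute(select(func.count()).select_from(load_data)).scalar_one()
print(count)  # 0: the valid first batch was rolled back along with the bad one
```

With Ibis there is no equivalent of `engine.begin()`, so the first batch's rows would remain in the table unless we wrote backend-specific cleanup code.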

Outstanding work:

  • Some tests are failing due to time zone / DST handling with Spark.
  • Talk to other chronify users about dropping transaction support.
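The exact Spark failures are still being investigated, but one common class of DST bug is the ambiguous local hour when clocks fall back, illustrated here with the standard library rather than Spark (the time zone choice is just an example):

```python
# During the fall-back transition, one wall-clock time maps to two UTC offsets.
from datetime import datetime
from zoneinfo import ZoneInfo

tz = ZoneInfo("America/New_York")
# 01:30 on 2020-11-01 occurs twice: clocks fall back from 02:00 EDT to 01:00 EST.
first = datetime(2020, 11, 1, 1, 30, tzinfo=tz)           # first occurrence (EDT)
second = datetime(2020, 11, 1, 1, 30, fold=1, tzinfo=tz)  # second occurrence (EST)
print(first.utcoffset() != second.utcoffset())  # True: same wall clock, two offsets
```

Any code path that round-trips timestamps through a local time zone must disambiguate such hours, which is where backends (and their tests) tend to disagree.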

@codecov-commenter

codecov-commenter commented Jan 20, 2026

Codecov Report

❌ Patch coverage is 87.17625% with 203 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (ll/local_time2@4f2a3f3). Learn more about missing BASE report.

Files with missing lines                   Patch %   Lines
src/chronify/ibis/spark_backend.py          73.01%   34 Missing ⚠️
src/chronify/ibis/functions.py              77.61%   30 Missing ⚠️
src/chronify/ibis/types.py                  51.92%   25 Missing ⚠️
src/chronify/store.py                       88.00%   21 Missing ⚠️
src/chronify/time_series_mapper_base.py     86.61%   19 Missing ⚠️
src/chronify/ibis/duckdb_backend.py         79.74%   16 Missing ⚠️
src/chronify/ibis/sqlite_backend.py         80.72%   16 Missing ⚠️
src/chronify/ibis/base.py                   80.00%   14 Missing ⚠️
src/chronify/time_zone_localizer.py         82.75%    5 Missing ⚠️
tests/conftest.py                           87.80%    5 Missing ⚠️
... and 9 more
Additional details and impacted files
@@                Coverage Diff                @@
##             ll/local_time2      #62   +/-   ##
=================================================
  Coverage                  ?   91.85%           
=================================================
  Files                     ?       55           
  Lines                     ?     4962           
  Branches                  ?        0           
=================================================
  Hits                      ?     4558           
  Misses                    ?      404           
  Partials                  ?        0           


@lixiliu
Collaborator

lixiliu commented Jan 24, 2026


For rollback behavior, can't we just make a copy of the original dataframe to do the next operation so we have something to fall back on?
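The snapshot idea could look like the following sketch, shown with in-memory pandas DataFrames for brevity (the `validate` step and names are illustrative):

```python
# Snapshot-based fallback: copy the table's current state before ingesting,
# and restore the copy if any batch fails validation.
import pandas as pd


def validate(df: pd.DataFrame) -> None:
    """Illustrative validation: reject batches containing nulls."""
    if df["value"].isna().any():
        raise ValueError("null values not allowed")


table = pd.DataFrame({"value": [1.0, 2.0]})
batches = [pd.DataFrame({"value": [3.0]}), pd.DataFrame({"value": [None]})]

snapshot = table.copy()  # fallback state taken before any ingest
try:
    for batch in batches:
        validate(batch)
        table = pd.concat([table, batch], ignore_index=True)
except ValueError:
    table = snapshot  # restore the pre-ingest state
```

For tables that live in a database backend rather than in memory, the snapshot would have to be a table copy on the backend (e.g. `CREATE TABLE snapshot AS SELECT ...`), which could be expensive for large tables, so this trades memory/storage for the transactional guarantee.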
