I/O: Adapter for Delta Lake#664

Open
amotl wants to merge 1 commit into iceberg from deltalake

amotl (Member) commented Feb 20, 2026

coderabbitai bot commented Feb 20, 2026


Walkthrough

This pull request adds Delta Lake support to CrateDB Toolkit, enabling bidirectional data transfer between Delta Lake and CrateDB. Changes include a new Delta Lake integration module, cluster core updates, dependency configuration, comprehensive documentation, and test coverage.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Core Delta Lake Module: cratedb_toolkit/io/deltalake.py | New module implementing Delta Lake integration with DeltaLakeAddress dataclass for URL parsing and storage configuration, plus from_deltalake() and to_deltalake() functions for bidirectional data transfer via Polars and CrateDB APIs. |
| Cluster Integration: cratedb_toolkit/cluster/core.py | Added Delta Lake handling in StandaloneCluster.load_table() and save_table() methods to conditionally route to from_deltalake() and to_deltalake() functions for Delta Lake URLs. |
| Configuration: pyproject.toml | Added optional dependency group for Delta Lake with polars[deltalake] and updated io-opentable group to include deltalake dependency. |
| Documentation: doc/io/deltalake/index.md, doc/io/index.md | New Delta Lake I/O documentation covering installation, usage examples for multiple backends (S3, Azure, GCS, HDFS, LakeFS), and configuration options; updated main I/O index to reference new module. |
| Testing & Changelog: tests/io/test_deltalake.py, CHANGES.md | Added test suite for Delta Lake load/save operations via CLI with version handling and error scenarios; expanded changelog with Delta Lake I/O adapter entry. |
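
To make the URL-parsing role of DeltaLakeAddress more concrete, here is a minimal sketch using only the Python standard library. It is not the code from this PR: the class name TableAddress, the "s3+deltalake" scheme convention, and the treatment of every query parameter other than "version" as a storage option are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional
from urllib.parse import parse_qsl, urlsplit, urlunsplit


@dataclass
class TableAddress:
    """Hypothetical stand-in for the PR's DeltaLakeAddress dataclass."""

    location: str
    version: Optional[int] = None
    storage_options: Dict[str, str] = field(default_factory=dict)

    @classmethod
    def from_url(cls, url: str) -> "TableAddress":
        parts = urlsplit(url)
        # Query parameters carry the options; key case is preserved.
        options = dict(parse_qsl(parts.query))
        # Assumption: "version" selects a table version, the rest are
        # passed through as storage options.
        version_raw = options.pop("version", None)
        version = int(version_raw) if version_raw is not None else None
        # Assumed convention: "s3+deltalake://..." maps to "s3://...".
        scheme = parts.scheme.replace("+deltalake", "")
        location = urlunsplit((scheme, parts.netloc, parts.path, "", ""))
        return cls(location=location, version=version, storage_options=options)


address = TableAddress.from_url(
    "s3+deltalake://bucket/demo/table?version=3&AWS_REGION=eu-central-1"
)
```

With these assumptions, the location, the optional version, and the remaining options are separated once, so the read and write paths can consume them without re-parsing the URL.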

Sequence Diagrams

sequenceDiagram
    participant User as User/CLI
    participant Cluster as StandaloneCluster
    participant DLAdapter as from_deltalake()
    participant DLAddress as DeltaLakeAddress
    participant Polars as Polars
    participant CrateDB as CrateDB

    User->>Cluster: load_table(source_url, target_url)
    Cluster->>DLAdapter: from_deltalake(source_url, target_url)
    DLAdapter->>DLAddress: DeltaLakeAddress.from_url(source_url)
    DLAddress->>DLAddress: Parse URL & extract options
    DLAddress->>Polars: scan_delta(location, version, storage_options)
    Polars-->>DLAddress: LazyFrame
    DLAdapter->>Polars: load_table() → collect data
    Polars-->>DLAdapter: DataFrame
    DLAdapter->>CrateDB: polars_to_cratedb(batch_size)
    CrateDB-->>DLAdapter: Success
    DLAdapter-->>Cluster: True
    Cluster-->>User: Table loaded
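
The batched hand-off at the end of the load sequence above (the polars_to_cratedb(batch_size) step) can be illustrated without Polars or CrateDB. The following is a minimal sketch, not the PR's implementation: batched, load_in_batches, and the insert_batch callback are hypothetical names, and plain tuples stand in for DataFrame rows.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List, Sequence

Row = Sequence[object]


def batched(rows: Iterable[Row], batch_size: int) -> Iterator[List[Row]]:
    """Yield consecutive lists of at most batch_size rows."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(rows)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def load_in_batches(
    rows: Iterable[Row],
    insert_batch: Callable[[List[Row]], None],
    batch_size: int = 2,
) -> int:
    """Send rows to the (hypothetical) insert callback, batch by batch."""
    total = 0
    for batch in batched(rows, batch_size):
        insert_batch(batch)
        total += len(batch)
    return total


# A list's append method stands in for the database writer.
received: List[List[Row]] = []
count = load_in_batches([(1, "a"), (2, "b"), (3, "c")], received.append, batch_size=2)
```

Bounding the batch size keeps each insert request small regardless of how many rows the source table holds.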
sequenceDiagram
    participant User as User/CLI
    participant Cluster as StandaloneCluster
    participant DLAdapter as to_deltalake()
    participant DLAddress as DeltaLakeAddress
    participant CrateDB as CrateDB
    participant Polars as Polars
    participant DeltaLake as Delta Lake Storage

    User->>Cluster: save_table(source_url, target_url)
    Cluster->>DLAdapter: to_deltalake(source_url, target_url)
    DLAdapter->>DLAddress: DeltaLakeAddress.from_url(target_url)
    DLAddress->>DLAddress: Parse URL & extract options
    DLAdapter->>CrateDB: read_cratedb(source_url, chunk_size)
    CrateDB-->>DLAdapter: DataFrame chunks
    loop For each chunk
        DLAdapter->>Polars: write_delta(chunk, mode=overwrite/append)
        Polars->>DeltaLake: Write data with mode
        DeltaLake-->>Polars: Success
    end
    Polars-->>DLAdapter: Complete
    DLAdapter-->>Cluster: True
    Cluster-->>User: Table saved
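
The chunk loop in the save sequence above writes each chunk with a mode of overwrite or append. One plausible reading, sketched below, is that the first chunk replaces any existing table and later chunks extend it. This is an assumption, not something the diagram confirms, and save_in_chunks and write_chunk are hypothetical names rather than functions from the PR or from Polars.

```python
from typing import Callable, Iterable, List, Tuple


def save_in_chunks(
    chunks: Iterable[List[dict]],
    write_chunk: Callable[[List[dict], str], None],
    first_mode: str = "overwrite",
) -> int:
    """Write chunks in order; only the first write uses first_mode."""
    written = 0
    for index, chunk in enumerate(chunks):
        # Assumption: the first chunk replaces the target, later ones append.
        mode = first_mode if index == 0 else "append"
        write_chunk(chunk, mode)
        written += 1
    return written


# A recording callback stands in for the real Delta Lake writer.
calls: List[Tuple[int, str]] = []


def fake_writer(chunk: List[dict], mode: str) -> None:
    calls.append((len(chunk), mode))


total_chunks = save_in_chunks([[{"id": 1}, {"id": 2}], [{"id": 3}]], fake_writer)
```

Passing first_mode="append" would instead add all chunks to an existing table, which is the other behavior the diagram's "overwrite/append" label allows for.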

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~35 minutes

Poem

🐰 Hopping through Delta Lakes so grand,
CrateDB welcomes a data caravan!
Polars ferry data, left and right,
From clouds to clusters, shining bright! ✨
The toolkit bounds to greater heights! 🚀

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title clearly and concisely summarizes the main change: adding a Delta Lake adapter to the I/O subsystem. |
| Description check | ✅ Passed | The description is directly related to the changeset, explaining the purpose of adding Delta Lake import/export functionality and providing documentation reference. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00%, which is sufficient. The required threshold is 80.00%. |


amotl marked this pull request as ready for review February 20, 2026 02:31
coderabbitai[bot]

This comment was marked as resolved.

amotl force-pushed the deltalake branch 2 times, most recently from 2612f2b to 7600ec0 February 20, 2026 03:22
amotl requested review from matriv and seut February 20, 2026 03:28