
FFI for Arrow C Stream Interface #1348

@wjones127


Is your feature request related to a problem or challenge? Please describe what you are trying to do.

Enable receiving/sending a stream of Record Batches from/to another Arrow implementation. For example, datafusion-contrib/datafusion-python#21 could benefit from a way to import a RecordBatchReader into Rust so it can be used by DataFusion.

Describe the solution you'd like

It might be worth implementing the Arrow C Stream interface, which allows exporting a stream of record batches across implementation boundaries. This would enable conversion between a PyArrow RecordBatchReader and some structure on the Rust side (an iterator of record batches?).
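To make the proposal concrete, here is a minimal sketch of what the Rust side could look like. The two `#[repr(C)]` structs follow the field layout given in the Arrow C Data and C Stream Interface specifications. Everything else is hypothetical and not arrow-rs API: `StreamReader`, `released_array`, and the `toy_*` producer are illustration-only names, `ArrowSchema` is left opaque, and the toy producer omits `get_schema` and `get_last_error`, which a real producer must provide.

```rust
use std::os::raw::{c_char, c_int, c_void};

/// Field layout of `struct ArrowArray` (Arrow C Data Interface).
#[repr(C)]
pub struct ArrowArray {
    pub length: i64,
    pub null_count: i64,
    pub offset: i64,
    pub n_buffers: i64,
    pub n_children: i64,
    pub buffers: *mut *const c_void,
    pub children: *mut *mut ArrowArray,
    pub dictionary: *mut ArrowArray,
    pub release: Option<unsafe extern "C" fn(*mut ArrowArray)>,
    pub private_data: *mut c_void,
}

/// Opaque stand-in for `struct ArrowSchema`; this sketch never reads a schema.
#[repr(C)]
pub struct ArrowSchema { _opaque: [u8; 0] }

/// Field layout of `struct ArrowArrayStream` (Arrow C Stream Interface).
#[repr(C)]
pub struct ArrowArrayStream {
    pub get_schema: Option<unsafe extern "C" fn(*mut ArrowArrayStream, *mut ArrowSchema) -> c_int>,
    pub get_next: Option<unsafe extern "C" fn(*mut ArrowArrayStream, *mut ArrowArray) -> c_int>,
    pub get_last_error: Option<unsafe extern "C" fn(*mut ArrowArrayStream) -> *const c_char>,
    pub release: Option<unsafe extern "C" fn(*mut ArrowArrayStream)>,
    pub private_data: *mut c_void,
}

/// An all-zero `ArrowArray` is in the "released" state (`release` is NULL).
fn released_array() -> ArrowArray {
    // SAFETY: integers, raw pointers, and `Option<fn>` are all valid when zeroed.
    unsafe { std::mem::zeroed() }
}

/// Hypothetical consumer-side wrapper (not an arrow-rs type).
pub struct StreamReader {
    pub stream: ArrowArrayStream,
}

impl Iterator for StreamReader {
    type Item = Result<ArrowArray, c_int>;
    fn next(&mut self) -> Option<Self::Item> {
        let get_next = self.stream.get_next?;
        let mut out = released_array();
        // SAFETY: assumes the producer follows the C Stream Interface contract.
        let rc = unsafe { get_next(&mut self.stream, &mut out) };
        if rc != 0 { return Some(Err(rc)); }
        // End of stream is signalled by success plus a released array.
        if out.release.is_none() { return None; }
        Some(Ok(out)) // the caller is responsible for calling `release`
    }
}

impl Drop for StreamReader {
    fn drop(&mut self) {
        if let Some(release) = self.stream.release {
            unsafe { release(&mut self.stream) };
        }
    }
}

// Toy producer: emits `n` empty arrays so the sketch runs without Arrow itself.
unsafe extern "C" fn toy_release_array(array: *mut ArrowArray) {
    unsafe { (*array).release = None };
}

unsafe extern "C" fn toy_get_next(stream: *mut ArrowArrayStream, out: *mut ArrowArray) -> c_int {
    unsafe {
        let remaining = (*stream).private_data as *mut i64;
        out.write(released_array());
        if *remaining > 0 {
            (*out).length = *remaining; // placeholder payload: a countdown
            (*out).release = Some(toy_release_array);
            *remaining -= 1;
        }
    }
    0
}

unsafe extern "C" fn toy_release_stream(stream: *mut ArrowArrayStream) {
    unsafe {
        drop(Box::from_raw((*stream).private_data as *mut i64));
        (*stream).release = None;
    }
}

pub fn toy_stream(n: i64) -> ArrowArrayStream {
    ArrowArrayStream {
        get_schema: None, // omitted in this toy; a real producer must set it
        get_next: Some(toy_get_next),
        get_last_error: None, // likewise omitted
        release: Some(toy_release_stream),
        private_data: Box::into_raw(Box::new(n)) as *mut c_void,
    }
}
```

Wrapping `toy_stream(3)` in a `StreamReader` and iterating yields three arrays and then stops. A real implementation would convert each `ArrowArray` into a record batch instead of returning the raw struct, and would surface `get_last_error` messages alongside the error code.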

Describe alternatives you've considered

We can already use FFI to bring over individual record batches. In datafusion-contrib/datafusion-python#21, I experimented with wrapping a Python iterator and moving each batch across individually, but ran into deadlocks involving the Python GIL.

Additional context

The Arrow C Stream interface was introduced in August 2020, in apache/arrow#8052. It has been used so far to send record batch streams to DuckDB from the R and Python implementations.

Labels: arrow (changes to the arrow crate), enhancement (any new improvement worthy of an entry in the changelog), help wanted