Official support for the Apache Parquet format #410

@kazuakiyama

I'm a radio astronomer interested in using this Julia-native implementation of the Apache Arrow in-memory format for black hole imaging with the Event Horizon Telescope. First of all, thanks for developing this package! We became interested in it because the Apache Arrow and Parquet formats are considered major candidates for the next-generation radio astronomy data format.

I'm wondering whether the package plans to implement IO functions for the Apache Parquet format in the future. I read a previous issue on this topic. As far as I can tell, there is not yet any method to directly load columnar data from a Parquet file into Arrow.jl's in-memory representation, or to write it back. The only pure-Julia approach seems to be converting the on-disk data into the Apache Arrow IPC format using both Parquet.jl and Arrow.jl, and then reloading it into memory with Arrow.jl.
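For reference, the round trip described above can be sketched roughly as follows. This is only a minimal sketch: it assumes Parquet.jl's `read_parquet`/`write_parquet` and Arrow.jl's `Arrow.write`/`Arrow.Table`, and the file names and column names (`u`, `v`) are hypothetical placeholders rather than anything from our actual data.

```julia
using Parquet   # read_parquet / write_parquet
using Arrow     # Arrow.write / Arrow.Table

# Hypothetical file locations, used only for illustration.
dir = mktempdir()
parquet_path = joinpath(dir, "example.parquet")
arrow_path = joinpath(dir, "example.arrow")

# Create a small Parquet file so that the sketch is self-contained.
# The columns `u` and `v` are made-up placeholder data.
write_parquet(parquet_path, (u = [1.0, 2.0, 3.0], v = [0.5, 1.5, 2.5]))

# Step 1: read the Parquet file with Parquet.jl.
# The result is a Tables.jl-compatible table.
pq_table = read_parquet(parquet_path)

# Step 2: re-encode the same data on disk in the Arrow IPC format.
Arrow.write(arrow_path, pq_table)

# Step 3: reload the IPC file as an Arrow.jl table.
arrow_table = Arrow.Table(arrow_path)
```

The extra on-disk copy in step 2 is the part a direct Parquet reader in Arrow.jl would make unnecessary.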

This is problematic for our use case, and it is a major obstacle to our using this package and Apache's columnar formats in Julia. I think the key issues are

Given the many similarities and the overlap between the Apache Parquet and Arrow specifications, I feel it is more straightforward to request Parquet IO features in Arrow.jl than to request the missing features from the existing Julia Parquet packages. Any thoughts on this are appreciated. Thanks!
