@samimshoaib01
Contributor

The new function, readParquetFiles:

- Accepts either a single Parquet file or a directory
- Recursively discovers .parquet files when given a directory
- Reads each file using the existing readParquet
- Vertically merges the results using the existing DataFrame Semigroup / Monoid instance
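
The steps above can be sketched roughly as follows. This is not the merged implementation, only a minimal illustration of the discover-then-mconcat approach. The names findParquetFiles and readAndMerge are hypothetical, and the reader is passed in as a parameter so the sketch depends only on the directory and filepath packages rather than on the DataFrame API.

```haskell
import Control.Monad (forM)
import Data.List (sort)
import System.Directory (doesDirectoryExist, listDirectory)
import System.FilePath (takeExtension, (</>))

-- Hypothetical helper: if given a file, return it unchanged; if given a
-- directory, recursively collect every file ending in ".parquet".
findParquetFiles :: FilePath -> IO [FilePath]
findParquetFiles path = do
  isDir <- doesDirectoryExist path
  if not isDir
    then pure [path]
    else do
      -- Sort entries so the merge order is deterministic.
      entries <- sort <$> listDirectory path
      concat <$> forM entries (\entry -> do
        let full = path </> entry
        entryIsDir <- doesDirectoryExist full
        if entryIsDir
          then findParquetFiles full
          else pure [full | takeExtension full == ".parquet"])

-- Hypothetical helper: read every discovered file with the supplied reader
-- and combine the results with mconcat. Works for any Monoid, so it applies
-- to DataFrame only under the assumption that its Monoid instance appends
-- rows, as the description states.
readAndMerge :: Monoid a => (FilePath -> IO a) -> FilePath -> IO a
readAndMerge readOne path = do
  files <- findParquetFiles path
  mconcat <$> mapM readOne files
```

Assuming readParquet has a type compatible with FilePath -> IO DataFrame, a call such as readAndMerge D.readParquet "some/dataset/dir" would behave like the function described here. An empty directory yields mempty, which is one reason to lean on the Monoid instance rather than a hand-written fold.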

The existing readParquet behavior is unchanged.

readParquetFiles is re-exported from DataFrame so it is available as D.readParquetFiles.

Performance considerations

The implementation relies on existing DataFrame merge semantics (mconcat) and performs a recursive filesystem traversal for file discovery. No changes were made to Parquet decoding or in-memory column handling.

Testing

Manually tested by reading a partitioned dataset stored as nested directories of Parquet files.

If I am missing anything, please point it out. All suggestions are welcome.

@samimshoaib01
Contributor Author

Kindly check whether it's alright.

@mchav mchav merged commit 74565ef into mchav:main Jan 19, 2026
6 of 7 checks passed