feat: replace s3_client with object_store to add gcs support #12

Draft
nadim-az wants to merge 2 commits into danielbeach:main from nadim-az:feat/add-gcs-support

nadim-az commented Oct 21, 2025

This PR adds support for GCS by replacing the AWS SDK with object_store, while keeping the existing s3 client interface functions (list_objects, get_object, etc.). This should make it easy to add Azure Blob Storage, or any other storage backend supported by object_store, in the future.

Core changes made:

  • Replaced AWS SDK with object_store crate
  • Updated PyO3 to latest version
  • Deleted duplicate unit tests in *_test.rs files

The only breaking change introduced is replacing s3_path with storage_path in the analyze functions.
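
To illustrate what a storage_path argument implies, here is a minimal sketch of scheme-based dispatch using only the standard library. It is not the code in this PR: the `Backend` enum and `parse_storage_path` function are hypothetical names, and the only assumption taken from the diff is that `s3://` URLs map to S3 and `gs://` URLs map to GCS.

```rust
/// Storage backends this sketch distinguishes (hypothetical enum, not from the PR).
#[derive(Debug, PartialEq, Eq)]
pub enum Backend {
    S3,
    Gcs,
}

/// Splits a `storage_path` such as `s3://bucket/table` into its backend,
/// bucket, and key prefix. Returns an error for missing or unsupported schemes.
pub fn parse_storage_path(storage_path: &str) -> Result<(Backend, String, String), String> {
    let (scheme, rest) = storage_path
        .split_once("://")
        .ok_or_else(|| format!("missing scheme in '{storage_path}'"))?;

    let backend = match scheme {
        "s3" => Backend::S3,
        "gs" => Backend::Gcs,
        other => return Err(format!("unsupported scheme '{other}'")),
    };

    // Everything up to the first '/' is the bucket; the remainder is the prefix.
    let (bucket, prefix) = rest.split_once('/').unwrap_or((rest, ""));
    if bucket.is_empty() {
        return Err(format!("missing bucket in '{storage_path}'"));
    }

    Ok((
        backend,
        bucket.to_string(),
        prefix.trim_end_matches('/').to_string(),
    ))
}
```

In this sketch, supporting another provider would mean adding one enum variant and one match arm, which is the kind of extensibility the description is aiming for.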

@nadim-az nadim-az marked this pull request as draft October 21, 2025 19:32
@@ -1,419 +0,0 @@
#[cfg(test)]

I removed the separate unit test files because they seemed to duplicate the inline tests. Totally open to reverting if I missed a reason to keep both - let me know! 😄

nadim-az commented Oct 21, 2025

Still need to run some perf tests to see if replacing the AWS SDK with object_store introduces any overhead. Will post results here and compare them to the current estimates posted in the README for S3 and GCS.

@andrei-ionescu andrei-ionescu left a comment

I would suggest making this even more generic by adding support for all major cloud storage providers.

Comment on lines +28 to +33
// AWS credentials (for s3:// URLs)
aws_access_key_id: Option<String>,
aws_secret_access_key: Option<String>,
aws_region: Option<String>,
// GCS credentials (for gs:// URLs)
gcs_service_account_key: Option<String>,

Can this be made more generic? Hard-coding these AWS and GCS variables makes this hard to use with other cloud storage providers. I would suggest using something like a HashMap of key-value pairs that gets passed down to the table readers. delta-rs supports this; see try_from_uri_with_storage_options: https://github.com/delta-io/delta-rs/blob/main/crates/core/src/operations/mod.rs#L166.
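
A rough sketch of what that could look like, using only the standard library. `StorageConfig` and its methods are hypothetical names, not part of this PR or of delta-rs; the option keys simply reuse the field names from the lines under review. If I recall correctly, object_store also exposes `parse_url_opts`, which accepts key-value options alongside a URL, so a map like this could probably be forwarded to it, but that wiring is not shown or verified here.

```rust
use std::collections::HashMap;

/// Hypothetical provider-agnostic config: one map instead of per-cloud fields.
#[derive(Debug, Default, Clone)]
pub struct StorageConfig {
    options: HashMap<String, String>,
}

impl StorageConfig {
    pub fn new() -> Self {
        Self::default()
    }

    /// Adds one option. Keys are lowercased so lookups are case-insensitive.
    pub fn with_option(mut self, key: &str, value: &str) -> Self {
        self.options
            .insert(key.to_ascii_lowercase(), value.to_string());
        self
    }

    /// Looks up a single option by key, ignoring ASCII case.
    pub fn get(&self, key: &str) -> Option<&str> {
        self.options
            .get(&key.to_ascii_lowercase())
            .map(String::as_str)
    }

    /// Returns the options whose key starts with `prefix` (e.g. "aws_"),
    /// sorted by key so the result is deterministic.
    pub fn for_provider(&self, prefix: &str) -> Vec<(&str, &str)> {
        let prefix = prefix.to_ascii_lowercase();
        let mut out: Vec<(&str, &str)> = self
            .options
            .iter()
            .filter(|(k, _)| k.starts_with(prefix.as_str()))
            .map(|(k, v)| (k.as_str(), v.as_str()))
            .collect();
        out.sort();
        out
    }
}
```

With a shape like this, a new provider needs no new struct fields: callers pass whatever keys that provider understands, and the code that builds the store picks out the ones it recognizes.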


Arc::new(builder.build()?)
}
_ => {

Any support for other cloud providers like Azure Data Lake?
