feat: replace s3_client with object_store to add gcs support#12
feat: replace s3_client with object_store to add gcs support#12nadim-az wants to merge 2 commits intodanielbeach:mainfrom
Conversation
| @@ -1,419 +0,0 @@ | |||
| #[cfg(test)] | |||
There was a problem hiding this comment.
I removed the separate unit test files because they seemed to duplicate the inline tests. Totally open to reverting if I missed a reason to keep both - let me know! 😄
|
Still need to run some perf tests to see if replacing the AWS SDK with object_store introduces any overhead. Will post results here and compare to the current estimates posted in the readme for S3 and GCS |
andrei-ionescu
left a comment
There was a problem hiding this comment.
I would suggest to make this even more generic, adding support for all major cloud storage providers.
| // AWS credentials (for s3:// URLs) | ||
| aws_access_key_id: Option<String>, | ||
| aws_secret_access_key: Option<String>, | ||
| aws_region: Option<String>, | ||
| // GCS credentials (for gs:// URLs) | ||
| gcs_service_account_key: Option<String>, |
There was a problem hiding this comment.
Can this be made more generic? Having these AWS and GCS variables burned in there makes this hard to be used with other cloud storage. I would suggest using something sort of a HashMap with key values that will be passed down to table readers. Delta-Rs does support this, see try_from_uri_with_storage_options — https://github.com/delta-io/delta-rs/blob/main/crates/core/src/operations/mod.rs#L166.
|
|
||
| Arc::new(builder.build()?) | ||
| } | ||
| _ => { |
There was a problem hiding this comment.
Any support for other cloud providers like Azure Data Lake?
This PR adds support for gcs by replacing the AWS SDK with object_store, while still keeping the same existing s3 client interface functions (list_objects, get_object, etc.). This makes it super easy in the future to add Azure blobs support or any other storage backend supported by object_store
Core changes made:
*_test.rsfilesThe only breaking change introduced is replacing
s3_pathwithstorage_pathin the analyze functions