183: add a method to add file info to assets #200
base: main
pjhartzell
left a comment
Looks good! Just a few questions
    "jsonpath_ng>=1.5.3",
    "requests>=2.28.1",
    "s3fs>=2022.8.2",
    "multihash>=0.1.1",
Unless there's a reason we should use the multihash library, maybe we should look at using a more active one? This one was last released over 10 years ago, and I can't find its source code. The PyPI page does link to multiformats/multihash, and there are some links under the Python heading in the Implementations section that have more recent activity. I've used this one in other projects, but there may be better ones.
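For what it's worth, if sha2-256 is the only algorithm we really need, the multihash framing is small enough to sketch with the standard library alone. This is only an illustration, not a proposal for the final implementation: `sha2_256_multihash_hex` is a hypothetical name, and the sketch assumes the single-byte code (0x12) and length (0x20) prefix that the multihash format registers for sha2-256.

```python
import hashlib

# Multihash layout: <hash function code><digest length><digest bytes>.
# 0x12 is the registered multihash code for sha2-256; its digest is 32 (0x20) bytes.
SHA2_256_CODE = 0x12


def sha2_256_multihash_hex(data: bytes) -> str:
    """Return the hex-encoded sha2-256 multihash of ``data`` (hypothetical helper)."""
    digest = hashlib.sha256(data).digest()
    # Both the code and the length are below 128, so each fits in one varint byte.
    return bytes([SHA2_256_CODE, len(digest)]).hex() + digest.hex()
```

Supporting other algorithms would mean looking up their codes and handling multi-byte varints, which is where a maintained library starts to pay off.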
    if algorithm not in multihash.NAMES:
        raise ValueError(f"Algorithm '{algorithm}' not supported by multihash.")

    return str(multihash.encode(digest, multihash.NAMES[algorithm]).hex())
The hex() method returns a string. Can we safely remove the enclosing str() call?
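A quick check of the point, assuming `multihash.encode` returns a `bytes` (or `bytearray`) object; the literal below is just a stand-in for that return value:

```python
encoded = bytes([0x12, 0x20])  # stand-in for the value returned by multihash.encode
hex_value = encoded.hex()

print(type(hex_value).__name__)     # str
print(str(hex_value) == hex_value)  # True
```

If that assumption holds, the outer `str()` is a no-op.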
    except OSError as e:
        self.logger.error(
            "Failed to compute hash for %s: %s",
            path,
            e,
        )
Is this a user's expected—or desired—behavior? If we fail to create a hash, do we want an error to percolate upward or be ignored other than logging an error? I tend to fall on the side of "If I ask a function to do a thing, and something goes wrong, I want the function to fail". Unless, perhaps, I tell the function to "ignore any failures and just log an error message".
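One possible shape for that, as a rough sketch rather than a drop-in change: fail by default, and only log and continue when the caller opts in. The function name `compute_file_hash` and the `ignore_errors` flag are hypothetical, and plain `hashlib.sha256` stands in for the multihash call to keep the example self-contained.

```python
import hashlib
import logging
from typing import Optional

logger = logging.getLogger(__name__)


def compute_file_hash(path: str, ignore_errors: bool = False) -> Optional[str]:
    """Hypothetical helper: return the sha256 hex digest of the file at ``path``."""
    try:
        hasher = hashlib.sha256()
        with open(path, "rb") as f:
            # Read in 1 MiB chunks so large assets are not loaded into memory at once.
            for chunk in iter(lambda: f.read(1024 * 1024), b""):
                hasher.update(chunk)
        return hasher.hexdigest()
    except OSError as e:
        if ignore_errors:
            # Opt-in behavior: log the failure and signal it with None.
            logger.error("Failed to compute hash for %s: %s", path, e)
            return None
        # Default behavior: let the error propagate to the caller.
        raise
```

With this shape, `compute_file_hash("missing.tif")` raises, while `compute_file_hash("missing.tif", ignore_errors=True)` logs the error and returns `None`.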
This PR also closes #154
Related Issue(s):
Proposed Changes:
- `add_fileinfo_to_local_assets` (plural) is the expected primary user-facing function: for an Item, it derives the file size in bytes and a checksum for every Asset with a local href.
- `add_fileinfo_to_local_asset` (singular) is called by `add_fileinfo_to_local_assets` as it loops through the assets. It is the workhorse method that adds the STAC file extension to the Item and the metadata fields to each local asset. It supports all file extension fields, but since most of them are not easily calculated automatically, the user must calculate those values and pass them in directly (only size and checksum are automated).
- `compute_multihash` calculates the file hash using sha2-256, but is flexible and allows other supported algorithms to be used.
- `_is_local_file` is a supporting method that lets the user-facing methods decide whether to add file metadata (it is skipped when the asset href is non-local).

PR Checklist:
Use of AI: