What is the "hash" in the metadata dataset exactly, are you sure it is SHA256? #39

According to the [docs](https://py-code.org/datasets#metadata), the metadata dataset covering every file uploaded to PyPI (i.e. the parquet files listed in https://github.com/pypi-data/data/raw/main/links/dataset.txt) contains a **SHA256** hash for each file. However, the docs do not describe how this hash is calculated.

When I tried to verify that you calculate the SHA256 over the respective file itself, I ran into some issues:
* your hash is too short for a SHA256; it has the same length as a SHA1, though
* however, when I calculate the SHA1 of a downloaded file, it does not match yours (and neither does its SHA256)
* two files that share the same hash in your dataset also have identical SHA1 hashes on my end, yet my hashes differ from yours
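A check along these lines can be scripted as a minimal sketch. The function names and script name are hypothetical. Besides plain SHA1 and SHA256 over the raw file bytes, it also tries a git-blob-style SHA1 (SHA1 over the header `blob <size>\0` followed by the content, which is how git hashes file objects). That variant is only an assumed extra candidate to rule out, not something the docs state.

```python
import hashlib
import sys
from pathlib import Path


def file_digests(data: bytes) -> dict[str, str]:
    """Hex digests of the raw bytes, plus a git-blob-style SHA1 candidate."""
    # Git hashes a blob as SHA1 over "blob <size>\0" followed by the content.
    # Whether the dataset uses this scheme is an ASSUMPTION to test, not documented.
    git_header = f"blob {len(data)}\0".encode()
    return {
        "sha1": hashlib.sha1(data).hexdigest(),
        "sha256": hashlib.sha256(data).hexdigest(),
        "git_blob_sha1": hashlib.sha1(git_header + data).hexdigest(),
    }


def matching_algorithms(data: bytes, dataset_hash: str) -> list[str]:
    """Return the names of the candidate digests equal to the dataset's hash."""
    expected = dataset_hash.strip().lower()
    return [name for name, digest in file_digests(data).items() if digest == expected]


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python check_hash.py <downloaded file> <hash from the dataset>
    path, dataset_hash = sys.argv[1], sys.argv[2]
    data = Path(path).read_bytes()
    # A hex-encoded SHA1 is 40 characters long, a hex-encoded SHA256 is 64.
    print(f"dataset hash length: {len(dataset_hash.strip())}")
    print(file_digests(data))
    print("matches:", matching_algorithms(data, dataset_hash) or "none")
```

If only `git_blob_sha1` matches, that would explain why identical files share a hash in the dataset while plain SHA1 values still differ.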

Can you explain which hash you are using, and whether you are hashing the contents of the file linked via `path`?

Thank you very much for the awesome dataset!
