According to the docs, the metadata dataset covering every file uploaded to PyPI, i.e. the parquet files listed in https://github.com/pypi-data/data/raw/main/links/dataset.txt, contains a SHA256 hash for each file. However, the docs do not describe how this hash is calculated.
When trying to verify that you calculate the SHA256 over the respective file itself, I ran into some issues:
- Your hash is too short for a SHA256; it has the same length as a SHA1, though.
- However, when I calculate the SHA1 of a downloaded file, it does not match yours (neither does the SHA256).
- Two files in your dataset that share the same hash also have identical SHA1 hashes on my end, but my hashes still differ from yours.
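
For reference, here is a minimal sketch of how the comparison above can be reproduced, assuming the file has already been downloaded locally. The helper name `file_digests` and the example path are hypothetical, and the code hashes only the raw file bytes, which may not be what the dataset does:

```python
import hashlib


def file_digests(path, chunk_size=1 << 20):
    """Return the SHA1 and SHA256 hex digests of the raw bytes of a file."""
    sha1 = hashlib.sha1()
    sha256 = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read in chunks so large files do not have to fit in memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            sha1.update(chunk)
            sha256.update(chunk)
    return {"sha1": sha1.hexdigest(), "sha256": sha256.hexdigest()}


if __name__ == "__main__":
    # Hypothetical path to a file downloaded from the dataset.
    digests = file_digests("downloaded_file")
    for name, value in digests.items():
        print(name, len(value), value)
```

A SHA1 hex digest is 40 characters long and a SHA256 hex digest is 64, so printing the lengths makes it easy to see which one the dataset value resembles.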
Can you explain which hash you are using, and whether you are hashing the contents of the file that the path links to?
Thank you very much for the awesome dataset!