Context
When calculating downloads for the SDV ecosystem, we take library dependencies into account. For example, SDV downloads are subtracted from RDT downloads (since SDV depends on RDT). This avoids double counting downloads. See the formula below for this calculation.
This download adjustment needs to be extended to other ecosystems that maintain multiple related packages on PyPI. This will give us an accurate picture of how much those ecosystems are used.
Formula for SDV Ecosystem Download Count
- To calculate the download count for the SDV ecosystem:
  - Get the number of downloads for sdgym and sdv
  - Adjust the number of sdv downloads by subtracting the number of sdgym downloads
  - Get the number of downloads for rdt, copulas, ctgan, deepecho, sdmetrics
  - Adjust the downloads for rdt, copulas, ctgan, deepecho, sdmetrics by subtracting the number of sdv downloads.
    - These libraries are all direct dependencies of SDV.
  - Ensure no download count is negative (e.g. max(0, copulas_adjusted_count))
  - Sum all downloads to get the SDV (ecosystem) download count
- A Colab example.
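The steps above can be sketched in Python as follows. This is only an illustration: the raw counts are made-up numbers, the function name is hypothetical, and it assumes the raw (unadjusted) sdv count is what gets subtracted from the dependencies, which the formula does not state explicitly.

```python
# Hypothetical raw download counts; real values would come from the download data.
raw_downloads = {
    "sdgym": 100,
    "sdv": 1000,
    "rdt": 1500,
    "copulas": 1200,
    "ctgan": 900,
    "deepecho": 1100,
    "sdmetrics": 2000,
}

# Direct dependencies of SDV, as listed in the formula.
SDV_DIRECT_DEPENDENCIES = ["rdt", "copulas", "ctgan", "deepecho", "sdmetrics"]


def sdv_ecosystem_downloads(raw):
    """Return (adjusted counts per library, ecosystem total)."""
    adjusted = {"sdgym": raw["sdgym"]}
    # sdv downloads triggered by installing sdgym are removed from sdv.
    adjusted["sdv"] = max(0, raw["sdv"] - raw["sdgym"])
    for name in SDV_DIRECT_DEPENDENCIES:
        # Assumption: subtract the raw sdv count, clamped at zero.
        adjusted[name] = max(0, raw[name] - raw["sdv"])
    return adjusted, sum(adjusted.values())


adjusted_counts, ecosystem_total = sdv_ecosystem_downloads(raw_downloads)
```

With these made-up numbers, ctgan has fewer raw downloads than sdv, so its adjusted count is clamped to 0 rather than going negative.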
Problem
Download calculations for external library ecosystems currently don't account for their internal dependencies, leading to inflated download numbers. We need a system to:
- Identify all libraries within each ecosystem
- Map their interdependencies
- Apply a dependency-aware calculation similar to the one used for the SDV ecosystem
- Keep this information current as ecosystems evolve (external libraries will add/remove dependencies)
Description
When generating the Summary of downloads, the download counts should be adjusted for Gretel, ydata, and mostly. If future libraries are added, the dependencies of these libraries should be identified and taken into account.
Ecosystem Definition
A library is considered part of an ecosystem if:
- It shares the same maintainers with other packages, AND
- It has internal dependencies within the ecosystem
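One possible reading of this definition, sketched in Python. The function name and the sample packages are hypothetical. It assumes that "shares the same maintainers" means at least one maintainer in common with another candidate package, and that "internal dependencies" counts in either direction (the package depends on another candidate, or another candidate depends on it).

```python
def ecosystem_members(maintainers, dependencies):
    """Return the candidate packages that satisfy both criteria.

    maintainers: dict mapping package name -> set of maintainer names
    dependencies: dict mapping package name -> set of package names it requires
    """
    packages = set(maintainers)
    members = set()
    for package in packages:
        others = packages - {package}
        # Criterion 1: at least one maintainer in common with another candidate.
        shares_maintainer = any(maintainers[package] & maintainers[o] for o in others)
        # Criterion 2: a dependency link to another candidate, in either direction.
        depends_on = dependencies.get(package, set()) & others
        depended_on_by = {o for o in others if package in dependencies.get(o, set())}
        if shares_maintainer and (depends_on or depended_on_by):
            members.add(package)
    return members


# Hypothetical candidates, for illustration only.
example_maintainers = {
    "alpha-core": {"team"},
    "alpha-client": {"team"},
    "alpha-docs-theme": {"team"},
    "unrelated": {"someone"},
}
example_dependencies = {
    "alpha-client": {"alpha-core", "requests"},
    "alpha-docs-theme": {"sphinx"},
}
example_members = ecosystem_members(example_maintainers, example_dependencies)
```

Here alpha-docs-theme shares a maintainer but has no dependency link to the other candidates, so it is left out of the ecosystem.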
Deliverables
- In the daily workflow, determine the libraries that are in the ecosystem.
- For each ecosystem, adjust the download numbers when populating the Summary of Downloads
  - Use the above calculation as an example for how to adjust the downloads
- Ensure ecosystem downloads are clearly marked in the Google Sheet (by including (ecosystem) in the name)
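A minimal sketch of how the SDV-style adjustment might generalize to any ecosystem. All names here are hypothetical, and it assumes each package's count is reduced by the raw counts of the ecosystem packages that directly depend on it. A package with several dependents that also depend on each other could be over-subtracted, so a real implementation would need to decide how to handle that case.

```python
def adjust_ecosystem_downloads(raw, dependencies):
    """Subtract the raw downloads of direct dependents from each package.

    raw: dict mapping package name -> raw download count
    dependencies: dict mapping package name -> set of ecosystem packages it requires
    """
    # Invert the dependency map: for each package, which packages depend on it?
    dependents = {name: set() for name in raw}
    for package, required in dependencies.items():
        for dependency in required:
            if package in raw and dependency in dependents:
                dependents[dependency].add(package)
    return {
        name: max(0, count - sum(raw[d] for d in dependents[name]))
        for name, count in raw.items()
    }


def summary_row(ecosystem, raw, dependencies):
    """Build a (label, total) pair, marking the label with "(ecosystem)"."""
    adjusted = adjust_ecosystem_downloads(raw, dependencies)
    return f"{ecosystem} (ecosystem)", sum(adjusted.values())


# Hypothetical three-package ecosystem: toolkit -> client -> core.
example_raw = {"toolkit": 50, "client": 400, "core": 700}
example_deps = {"toolkit": {"client"}, "client": {"core"}}
example_row = summary_row("example", example_raw, example_deps)
```

In this example the adjusted counts are 50, 350, and 300, so the ecosystem total equals the raw count of the most depended-on package.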
Tools to get dependencies
- pipgrip
  - This requires the library to not be installed (ideal)
- PyPI API (look at PyPI users via API, gretal-ai, ydata)
- Manually define it in a YAML file?
- pipdeptree
  - This requires the library to be installed (not ideal)
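As an illustration of the PyPI API option, the sketch below reads the requires_dist field from PyPI's JSON metadata endpoint (https://pypi.org/pypi/<package>/json) and keeps only the requirements that belong to a given ecosystem. The function names are hypothetical, the name parsing is a simplified regular expression rather than a full requirement parser, and it assumes requirements that are only installed through an extra should be ignored.

```python
import json
import re
import urllib.request


def normalize(name):
    """Normalize a project name (lowercase, runs of -, _ and . become -)."""
    return re.sub(r"[-_.]+", "-", name).lower()


def fetch_requires_dist(package):
    """Fetch the declared requirements of the latest release from PyPI."""
    url = f"https://pypi.org/pypi/{package}/json"
    with urllib.request.urlopen(url, timeout=30) as response:
        metadata = json.load(response)
    # requires_dist can be null when a package declares no requirements.
    return metadata["info"].get("requires_dist") or []


def internal_dependencies(requires_dist, ecosystem):
    """Return the requirement names that are also ecosystem packages."""
    ecosystem = {normalize(name) for name in ecosystem}
    found = set()
    for requirement in requires_dist:
        spec, _, marker = requirement.partition(";")
        # Assumption: skip optional requirements gated behind an extra.
        if "extra" in marker:
            continue
        match = re.match(r"\s*([A-Za-z0-9][A-Za-z0-9._-]*)", spec)
        if match and normalize(match.group(1)) in ecosystem:
            found.add(normalize(match.group(1)))
    return found


# Made-up requirement strings in the style PyPI returns them.
sample_requirements = [
    "rdt>=1.0",
    "numpy (>=1.20) ; python_version < '3.10'",
    "Copulas[tutorials]>=0.9",
    "sdmetrics ; extra == 'test'",
]
sample_internal = internal_dependencies(
    sample_requirements, {"rdt", "copulas", "sdmetrics"}
)
```

Calling fetch_requires_dist("sdv") and passing the result to internal_dependencies would give the internal dependencies of a real package without installing it, at the cost of one network request per package.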