Update daily workflow to eliminate double counting for ydata #16

@gsheni

Problem

When calculating downloads for the SDV ecosystem, we take into account the library dependencies. For example, SDV downloads are subtracted from RDT (since SDV depends on RDT). This avoids double counting downloads.

We would like to apply similar logic when calculating downloads for the ydata ecosystem, to get a more accurate picture of ydata usage.

This issue is a sub-task of #14.
This issue is blocked until #13 is completed.

Packages

The ydata user on PyPI maintains the following packages (whose downloads we currently track):

The ydata user also maintains these packages, whose downloads we do not currently track. The reason we do not track them is listed:

  • ydata-fabric-sdk
    • This has a core dependency on ydata-core and ydata-datascience.
    • This seems to be a client for a web-based UI software product.
  • ydata-datascience
  • ydata-core
    • No release since Sept 2024, no recent commits on GitHub.
    • This seems to be the exact same package as ydata-datascience. The GitHub link and the README are the same for both.
  • ydata-profiling
  • ydata-quality
    • No release since Sept 2021, no recent commits on GitHub.

Formula for ydata ecosystem download count

  • To calculate the download count for ydata ecosystem:
    • Get the number of downloads for ydata-sdk and ydata-synthetic.
    • Adjust the downloads for ydata-sdk by subtracting the number of ydata-synthetic downloads.
      • This is because ydata-synthetic has a core dependency on ydata-sdk and has the same maintainer.
    • Ensure no download count is negative (e.g., max(0, ydata_adjusted_count)).
    • Sum all downloads to get the ydata (ecosystem) download count.
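
The steps above can be sketched in Python as follows. This is only a minimal illustration, not the workflow's actual code: the function name `ydata_ecosystem_downloads` and the shape of the input (a dict mapping package name to download count) are assumptions, and the numbers in the usage lines are made up.

```python
def ydata_ecosystem_downloads(downloads):
    """Return the adjusted ydata ecosystem download count.

    `downloads` is a hypothetical mapping of PyPI package name to its
    download count for the period being summarized.
    """
    sdk = downloads.get("ydata-sdk", 0)
    synthetic = downloads.get("ydata-synthetic", 0)

    # ydata-synthetic depends on ydata-sdk, so each ydata-synthetic install
    # is assumed to also produce a ydata-sdk download. Subtract those out,
    # and clamp at zero so the adjusted count is never negative.
    ydata_adjusted_count = max(0, sdk - synthetic)

    # Sum the adjusted ydata-sdk count and the ydata-synthetic count.
    return ydata_adjusted_count + synthetic


# Illustrative numbers only.
total = ydata_ecosystem_downloads({"ydata-sdk": 1000, "ydata-synthetic": 300})
clamped = ydata_ecosystem_downloads({"ydata-sdk": 100, "ydata-synthetic": 300})
```

With only these two packages, the result equals the larger of the two raw counts. The clamp matters when ydata-synthetic reports more downloads than ydata-sdk for the period.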

Description

  • We will have a daily workflow that summarizes the download counts for external libraries. This daily workflow should be updated so that the downloads for ydata use the above formula.

Deliverables

  • Change the daily workflow to use the above formula for calculating downloads for ydata (rather than summing the downloads)
