Add workflow to run SDGym multi-table benchmark monthly and publish results #518

R-Palazzo · 2025-12-22T17:48:26Z

Resolve #516
CU-86b7w17ze

Run benchmark action (working end-to-end)
Upload results action (waiting on benchmark results to test every case; I may create mock results to test it)

@amontanez24 From what I've experienced, we can currently run at most 4 GPU machines in the same region:

RuntimeError: GCP instance creation failed: Quota 'NVIDIA_T4_GPUS' exceeded.  Limit: 4.0 in region us-central1.

The initial plan is to have 6 instances running on January 1st (4 for single-table, 2 for multi-table), so I wanted to discuss the options with you:

Run only 4 machines (start with single-table on January 1st and multi-table on January 5th, so machines are available again)
Run 6 machines in a different location, if that's allowed.
Increase the number of machines, if possible.

There may be a better solution, which I'm happy to discuss.
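
To make the first option concrete, here is a minimal sketch of splitting the launches into waves that each stay within the regional quota. It assumes one T4 GPU per instance and that the launches are in January 2026; `plan_launches` and the job names are hypothetical and not part of SDGym.

```python
from datetime import date, timedelta


def plan_launches(jobs, gpu_quota, start, gap_days):
    """Greedily group jobs into waves that each fit within the regional GPU quota.

    `jobs` is a list of (name, gpus_needed) tuples. Waves are launched
    `gap_days` apart so the previous wave's machines are free again.
    """
    waves, current, used = [], [], 0
    for name, gpus in jobs:
        if gpus > gpu_quota:
            raise ValueError(f'{name} needs {gpus} GPUs but the quota is {gpu_quota}.')
        if used + gpus > gpu_quota:
            # The current wave is full: close it and start a new one.
            waves.append(current)
            current, used = [], 0
        current.append(name)
        used += gpus
    if current:
        waves.append(current)
    return {start + timedelta(days=i * gap_days): wave for i, wave in enumerate(waves)}


# Assumption: each instance uses exactly one NVIDIA T4 GPU.
jobs = [(f'single_table_{i}', 1) for i in range(1, 5)]
jobs += [(f'multi_table_{i}', 1) for i in range(1, 3)]
plan = plan_launches(jobs, gpu_quota=4, start=date(2026, 1, 1), gap_days=4)
for launch_date, wave in plan.items():
    print(launch_date.isoformat(), wave)
```

With a quota of 4, this puts the four single-table instances on January 1st and the two multi-table instances on January 5th, which matches the first option.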

sdv-team · 2025-12-22T17:48:31Z

Task linked: CU-86b7w17ze SDGym - Add workflow to run SDGym multi-table benchmark monthly and publish results #516

R-Palazzo · 2025-12-22T17:51:42Z

.github/workflows/run_benchmark_multi_table.yaml

+  workflow_dispatch:
+  schedule:
+    - cron: '0 5 1 * *'
+  push:


The push trigger is only for testing the workflows; I will remove it before merging.
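
For reference, the cron expression `'0 5 1 * *'` means 05:00 on the first day of every month, and GitHub Actions evaluates schedules in UTC. A small sketch of the resulting run times follows; `next_monthly_runs` is a hypothetical helper written only for this illustration.

```python
from datetime import datetime, timezone


def next_monthly_runs(after, count=3, day=1, hour=5, minute=0):
    """Return the next `count` UTC datetimes matching cron '<minute> <hour> <day> * *'."""
    runs = []
    year, month = after.year, after.month
    while len(runs) < count:
        candidate = datetime(year, month, day, hour, minute, tzinfo=timezone.utc)
        if candidate > after:
            runs.append(candidate)
        # Move on to the following month, rolling over the year if needed.
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return runs


runs = next_monthly_runs(datetime(2025, 12, 22, 17, 51, tzinfo=timezone.utc))
for run in runs:
    print(run.isoformat())
```

Starting from the time of this comment, the first scheduled run would be 2026-01-01 at 05:00 UTC. Scheduled workflows can start later than the nominal time when GitHub Actions is under load.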

R-Palazzo · 2025-12-22T17:52:20Z

sdgym/run_benchmark/utils.py

-OUTPUT_DESTINATION_AWS = 's3://sdgym-benchmark/Benchmarks/'
-UPLOAD_DESTINATION_AWS = 's3://sdgym-benchmark/Benchmarks/'
+OUTPUT_DESTINATION_AWS = (
+    's3://sdgym-benchmark/Debug/GCP_Github/'  # 's3://sdgym-benchmark/Benchmarks/'


For testing purposes, TODO: Update it before merging

R-Palazzo · 2025-12-22T17:52:29Z

sdgym/run_benchmark/utils.py

+def post_benchmark_launch_message(date_str, compute_service='AWS', modality='single_table'):
    """Post a message to the SDV Alerts Slack channel when the benchmark is launched."""
-    channel = SLACK_CHANNEL
+    channel = DEBUG_SLACK_CHANNEL


For testing purposes, TODO: Update it before merging

R-Palazzo · 2025-12-22T17:52:36Z

sdgym/run_benchmark/utils.py

+def post_benchmark_uploaded_message(folder_name, commit_url=None, modality='single_table'):
    """Post benchmark uploaded message to sdv-alerts slack channel."""
-    channel = SLACK_CHANNEL
+    channel = DEBUG_SLACK_CHANNEL


For testing purposes, TODO: Update it before merging
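
As an alternative to editing these values by hand and reverting them before merging, the debug targets could be selected with a flag. This is only a sketch: the `SDGYM_BENCHMARK_DEBUG` environment variable and `resolve_run_targets` are hypothetical, and the Slack channel values are placeholders because the real ones are not shown in this PR. The two S3 paths are the ones from the diff above.

```python
import os

PRODUCTION_DESTINATION_AWS = 's3://sdgym-benchmark/Benchmarks/'
DEBUG_DESTINATION_AWS = 's3://sdgym-benchmark/Debug/GCP_Github/'
SLACK_CHANNEL = 'production-channel-placeholder'  # placeholder, real value not shown here
DEBUG_SLACK_CHANNEL = 'debug-channel-placeholder'  # placeholder, real value not shown here


def resolve_run_targets(environ=None):
    """Pick the S3 destination and Slack channel based on a debug flag.

    `SDGYM_BENCHMARK_DEBUG` is a hypothetical environment variable.
    """
    environ = os.environ if environ is None else environ
    debug = environ.get('SDGYM_BENCHMARK_DEBUG', '').lower() in {'1', 'true', 'yes'}
    return {
        'output_destination': DEBUG_DESTINATION_AWS if debug else PRODUCTION_DESTINATION_AWS,
        'slack_channel': DEBUG_SLACK_CHANNEL if debug else SLACK_CHANNEL,
    }


print(resolve_run_targets({'SDGYM_BENCHMARK_DEBUG': '1'}))
print(resolve_run_targets({}))
```

With something like this, a test workflow could set the flag, and the production defaults would not need to be restored by hand.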

codecov · 2025-12-22T17:55:01Z

Codecov Report

❌ Patch coverage is 92.06349% with 5 lines in your changes missing coverage. Please review.
✅ Project coverage is 78.93%. Comparing base (279a867) to head (7f02321).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| sdgym/run_benchmark/utils.py | 89.65% | 3 Missing ⚠️ |
| sdgym/benchmark.py | 0.00% | 2 Missing ⚠️ |

Additional details and impacted files

@@                               Coverage Diff                                @@
##           issue-515-_benchmark_multi_table_compute_gcp     #518      +/-   ##
================================================================================
+ Coverage                                         78.76%   78.93%   +0.16%     
================================================================================
  Files                                                33       33              
  Lines                                              2793     2825      +32     
================================================================================
+ Hits                                               2200     2230      +30     
- Misses                                              593      595       +2

| Flag | Coverage Δ | |
|---|---|---|
| integration | `54.69% <0.00%> (-0.56%)` | ⬇️ |
| unit | `73.45% <92.06%> (+0.16%)` | ⬆️ |


pvk-developer · 2025-12-23T11:59:23Z

sdgym/_benchmark/benchmark.py

    compute_quality_score=True,
    compute_diagnostic_score=True,
-    compute_privacy_score=True,
+    compute_privacy_score=False,


Is this for testing purposes only, or will it stay this way from now on?

For the benchmark we don't compute the privacy_score for now:

SDGym/sdgym/run_benchmark/run_benchmark.py

Line 55 in b8b7b13

compute_privacy_score=False,

pvk-developer · 2025-12-23T12:03:48Z

sdgym/run_benchmark/utils.py

+MODALITY_TO_GDRIVE_LINK = {
+    'single_table': 'https://docs.google.com/spreadsheets/d/1W3tsGOOtbtTw3g0EVE0irLgY_TN_cy2W4ONiZQ57OPo/edit?usp=sharing',
+    'multi_table': 'https://docs.google.com/spreadsheets/d/1R13RktVvKnxRecYIge07OBpbX1vbEkE2D1_2idNAKSY/edit?usp=sharing',
+}


I know this probably won't change, but could we reuse the IDs from sdgym/run_benchmark/upload_benchmark_results.py, if that doesn't cause circular dependencies? Or maybe define them in a constants.py file. That way we would only need to change an ID in one file.

If we do this, let's move them to a constants file.
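
A minimal sketch of the constants-file idea, assuming a hypothetical sdgym/run_benchmark/constants.py that stores only the spreadsheet IDs (taken from the two links above) and derives the links from them. `MODALITY_TO_GDRIVE_FILE_ID`, `GDRIVE_LINK_TEMPLATE`, and `get_gdrive_link` are illustrative names, not existing SDGym identifiers.

```python
# Hypothetical contents of sdgym/run_benchmark/constants.py

# Spreadsheet IDs, copied from the links in MODALITY_TO_GDRIVE_LINK.
MODALITY_TO_GDRIVE_FILE_ID = {
    'single_table': '1W3tsGOOtbtTw3g0EVE0irLgY_TN_cy2W4ONiZQ57OPo',
    'multi_table': '1R13RktVvKnxRecYIge07OBpbX1vbEkE2D1_2idNAKSY',
}

GDRIVE_LINK_TEMPLATE = 'https://docs.google.com/spreadsheets/d/{file_id}/edit?usp=sharing'


def get_gdrive_link(modality):
    """Build the sharing link for a modality from its spreadsheet ID."""
    try:
        file_id = MODALITY_TO_GDRIVE_FILE_ID[modality]
    except KeyError:
        valid = ', '.join(sorted(MODALITY_TO_GDRIVE_FILE_ID))
        raise ValueError(f"Unknown modality '{modality}'. Expected one of: {valid}.") from None
    return GDRIVE_LINK_TEMPLATE.format(file_id=file_id)


# The link mapping is derived, so an ID only has to change in one place.
MODALITY_TO_GDRIVE_LINK = {
    modality: get_gdrive_link(modality) for modality in MODALITY_TO_GDRIVE_FILE_ID
}
```

Both utils.py and upload_benchmark_results.py could then import from this module, which should also avoid the circular import concern because the constants module imports nothing from either of them.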

.github/workflows/run_benchmark_multi_table.yaml

sdgym/_benchmark/credentials_utils.py

sdgym/result_explorer/result_explorer.py

amontanez24

LGTM!

sdgym/_benchmark/credentials_utils.py

R-Palazzo requested review from amontanez24 and pvk-developer December 22, 2025 17:48

R-Palazzo self-assigned this Dec 22, 2025

R-Palazzo requested a review from a team as a code owner December 22, 2025 17:48

R-Palazzo removed the request for review from a team December 22, 2025 17:50

R-Palazzo commented Dec 22, 2025

R-Palazzo added 13 commits December 23, 2025 12:39

def 516

92b6827

run workflow on push

d1bcad5

test 1

592d0eb

test 2

a03f80c

test 3

a10a828

test 4

b5de39c

test 5

39b8028

test 6

925d971

test 7

280f9ab

test 8

cbcdba3

test 9

59451c3

update upload results

0d7613c

update upload 2

9f10efc

R-Palazzo force-pushed the issue-516-add-workflows branch from 223d2b2 to 9f10efc December 23, 2025 11:48

pvk-developer reviewed Dec 23, 2025

R-Palazzo added 5 commits December 23, 2025 14:24

address comments

38922c1

lazy import

cfa1c2e

remove task for installing sdv_enterprise

fec2c0c

use invoke install-sdv-enterprise

038584e

clean runnning part

562d2d1

R-Palazzo commented Dec 23, 2025

fix tests

5b162ff

add trigger on push

83356ae

amontanez24 reviewed Dec 23, 2025

R-Palazzo added 3 commits December 23, 2025 18:18

address comments

f74afde

test single-table

375f773

test launching single and multi-table together

7f02321

R-Palazzo requested review from amontanez24 and pvk-developer December 23, 2025 17:34

amontanez24 approved these changes Dec 23, 2025

pvk-developer approved these changes Dec 29, 2025
