Add Cosmos DB benchmark agent with workflow-aligned skills by xinlian12 · Pull Request #18 · xinlian12/azure-sdk-for-java

xinlian12 · 2026-02-27T22:02:41Z

Summary

Add a Copilot agent and 5 workflow-aligned skills for running Cosmos DB benchmark and DR drill workflows.

Benchmark Infrastructure

8 shell/Python scripts for VM provisioning, setup, execution, monitoring, diagnostics, dashboards
Java changes to BenchmarkConfig, BenchmarkOrchestrator, TenantWorkloadConfig for multi-tenant orchestration
.gitignore updates and test scaffolding (no real credentials)

Agent + 5 Skills

| Skill | Purpose |
| --- | --- |
| provision | Create/reuse Cosmos DB accounts, App Insights, Azure VMs |
| setup | Install tools, clone repo (branch/PR/commit/tag), build in tmux |
| run | CHURN preset, multi-VM parallel, auto App Insights config |
| analyze | CSV analysis, run comparison, heap/thread dumps, Kusto export |
| status | Resource health, run overview, App Insights verification |

Skill-creator utility for authoring new skills
Reference docs: VM sizing, 20 operation types, presets, thresholds, Kusto schema

Design Decisions

Single vs multi-tenant is config only (CLI flags vs tenants.json)
CHURN is the only preset for now (extensible later)
Always tmux for remote operations
App Insights KQL queries are placeholders

37 files changed | No secrets committed

* bump (Azure#48138)
* Increment package versions for containerinstance releases (Azure#48139)
* mgmt, prepare 2.60.0 (Azure#48140)
* bump
* update containerregistry
* make containerregistry unreleased
* update dependency
* changelog
* Increment package versions for resourcemanager releases
* update containerregistry to 2.55.0-beta.2
* changelog

---------

Co-authored-by: Xiaofei Cao <92354331+XiaofeiCao@users.noreply.github.com>
Co-authored-by: xiaofeicao <xiaofeicao@microsoft.com>

…zure#48039)

* Move azure-ai-voicelive SDK from sdk/ai to sdk/voicelive

  Moved the azure-ai-voicelive library to its own service directory to decouple its CI pipeline from the other AI libraries.

  Changes:
  - Moved sdk/ai/azure-ai-voicelive/ to sdk/voicelive/azure-ai-voicelive/
  - Created sdk/voicelive/ci.yml with ServiceDirectory: voicelive
  - Created sdk/voicelive/pom.xml aggregator POM
  - Removed voicelive references from sdk/ai/ci.yml and sdk/ai/pom.xml
* update codeowner
* update typespec
* update typespec
* update cspell
* fix test path
* update metadata

---------

Co-authored-by: Xiting Zhang <xitzhang@microsoft.com>

Add benchmark shell scripts for VM provisioning, setup, execution, monitoring, diagnostics capture, and dashboard generation. Update BenchmarkConfig, BenchmarkOrchestrator, and TenantWorkloadConfig to support multi-tenant benchmark orchestration with per-tenant configuration overrides. Add .gitignore entries for benchmark artifacts and Copilot skills. Add test-setup and test-results directory scaffolding with READMEs and a sample tenants.json template (no real credentials). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Agent routing file dispatches to 5 skills covering the full benchmark/DR drill lifecycle:
- provision: Cosmos DB accounts, App Insights, Azure VMs
- setup: JDK/Maven install, repo clone, config generation, build
- run: CHURN preset execution, multi-VM parallel, App Insights config
- analyze: CSV metrics, run comparison, heap/thread dumps, Kusto export
- status: resource health, run overview, App Insights verification

Also includes skill-creator utility for authoring new skills.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Consolidate the benchmark agent from 5 skills down to 3, with deterministic script-driven flows replacing inline commands.

Skills:
- setup-resources: provision Azure infra (Cosmos DB, App Insights, VM) with parallel creation, capacity validation, region fallback, and verification gate
- run: clone/build/verify/execute benchmarks via single SSH session per ref, supports multiple refs for comparison, SIMPLE/EXPAND/CHURN presets
- analyze: download results to config-dir/results, generate markdown report with time-series SVG charts and multi-run comparison tables

Key changes:
- Rename provision -> setup-resources, merge setup into run, remove status
- .github/skills and .github/agents use symlinks to copilot/ (single source)
- Default region westus2, resource group rg-cosmos-benchmark-YYYYMMDD
- Config directory prompted with credential-in-repo warning
- provision-all.sh orchestrates parallel resource creation + verification
- vm-prepare-and-run.sh consolidates checkout/build/verify/run in 1 SSH session
- run-all-refs.sh loops over user-provided refs with per-ref result directories
- generate-report.py reads monitor.csv + metrics/*.csv, outputs report.md
- Remove parse_hprof.py, kusto-schema.md, generate-dashboard.py (deferred)
- Remove trigger-benchmark.sh (superseded by vm-prepare-and-run.sh)
- Merge setup-benchmark-vm.sh into provision-benchmark-vm.sh

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

- Add timestamped progress logging to validate-capacity.sh
- Fix restriction detection to handle all types (Zone, NotAvailableForSubscription)
- Replace slow per-SKU API calls with single-call alternative SKU search
- Add --find-alternatives flag to control similar SKU search
- Add restriction_reason field to JSON output
- Derive quota family dynamically from effective SKU
- Add --fallback-regions flag to find-region.sh for user-specified regions
- Implement 4-phase search: preferred exact → preferred similar → fallback exact → fallback similar
- Add [N/M] progress updates printed as each region completes
- Add --stop-on-first flag (default: true)
- Fix integration bugs: JSON path, exit code logic, stdin-based parsing
- Update SKILL.md to document new flags and search strategy

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
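The 4-phase search order described above can be sketched as follows. This is a minimal illustration, not the code in find-region.sh: `check_region`, `find_region`, and the `AVAILABLE` variable are hypothetical names, and the real capacity check is replaced by a lookup in a fixed list.

```shell
# Hypothetical stand-in for the real capacity check: succeeds when
# "<region>:<mode>" appears in the space-separated AVAILABLE list.
check_region() {
  local region="$1" mode="$2"
  case " ${AVAILABLE:-} " in
    *" ${region}:${mode} "*) return 0 ;;
    *) return 1 ;;
  esac
}

# Walk the phases in order (preferred exact, preferred similar,
# fallback exact, fallback similar) and print the first hit as
# "<region> <mode>", which mirrors a stop-on-first search.
# $1 = space-separated preferred regions, $2 = space-separated fallback regions.
find_region() {
  local preferred="$1" fallback="$2" group mode region
  for group in "$preferred" "$fallback"; do
    for mode in exact similar; do
      for region in $group; do
        if check_region "$region" "$mode"; then
          echo "$region $mode"
          return 0
        fi
      done
    done
  done
  return 1
}
```

With `AVAILABLE="eastus:similar westus3:exact"`, calling `find_region "westus2 eastus" "westus3"` prints `eastus similar`: a similar SKU in a preferred region wins over an exact match in a fallback region.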

- Add capacity validation step before resource creation that blocks unless all checks pass (VM SKU, quota, Cosmos DB, App Insights)
- Add --skip-capacity-check flag to override the gate
- Add timestamped log() function for all progress messages
- Add elapsed time tracking per resource and total provisioning time
- Fix JSON parsing to match validate-capacity.sh output format
- Update SKILL.md to document new behavior and flag

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
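A rough sketch of a timestamped `log()` helper and a blocking capacity gate is shown below. The `name=ok` / `name=fail` argument format and the `capacity_gate` function are assumptions made for illustration; the actual script parses JSON from validate-capacity.sh, which is not reproduced here.

```shell
# Prefix every progress message with a UTC timestamp.
log() {
  printf '[%s] %s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$*"
}

# Gate: returns 0 only when every check passed, or when the gate is skipped.
# $1 = "true" to skip (as --skip-capacity-check would), remaining args are
# hypothetical "name=ok" / "name=fail" results.
capacity_gate() {
  local skip="$1" failed=0 result
  shift
  if [ "$skip" = "true" ]; then
    log "capacity check skipped"
    return 0
  fi
  for result in "$@"; do
    case "$result" in
      *=ok) ;;
      *) log "capacity check failed: ${result%%=*}"; failed=1 ;;
    esac
  done
  return "$failed"
}
```

A caller would run something like `capacity_gate "$SKIP" vm_sku=ok quota=fail || exit 1` before creating any resources.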

- Wrap benchmark execution in tmux session ('bench') on VM so the process survives SSH disconnections
- Add async execution guidance to SKILL.md so the agent runs the orchestrator in background mode, keeping the user's context free
- Use scenario-based poll intervals (2min for SIMPLE, 5min for EXPAND/CHURN) instead of 10s fixed polling
- Expand monitoring section with local and VM-side status checks

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
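The scenario-based poll interval can be expressed as a small lookup. The function name `poll_interval_for` is hypothetical, and the default for unknown presets is an assumption rather than something the commit states.

```shell
# Map a preset name to a poll interval in seconds.
poll_interval_for() {
  case "$1" in
    SIMPLE) echo 120 ;;          # 2 minutes
    EXPAND|CHURN) echo 300 ;;    # 5 minutes
    *) echo 300 ;;               # assumption: unknown presets use the longer interval
  esac
}
```

An orchestrator loop could then call `sleep "$(poll_interval_for "$PRESET")"` between status checks instead of sleeping a fixed 10 seconds.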

- Detect refs like 'xinlian12/branchName' by checking if the part before the first slash matches an existing git remote
- If remote exists, fetch from that remote; otherwise treat the slash as part of the branch name on origin
- Document fork branch format in SKILL.md ref examples

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
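The remote-prefix detection described above might look like the sketch below. `resolve_ref` and `list_remotes` are hypothetical names; `list_remotes` is split out only so the remote list can be stubbed when no repository is available.

```shell
# List configured remotes, one per line.
list_remotes() {
  git remote
}

# Print "<remote> <branch>" for a ref such as "xinlian12/branchName".
# If the text before the first slash is a known remote, fetch from it;
# otherwise the whole ref is a branch name on origin.
resolve_ref() {
  local ref="$1" prefix rest
  case "$ref" in
    */*)
      prefix="${ref%%/*}"
      rest="${ref#*/}"
      if list_remotes | grep -qxF -- "$prefix"; then
        echo "$prefix $rest"
        return 0
      fi
      ;;
  esac
  echo "origin $ref"
}
```

In a checkout with remotes `origin` and `xinlian12`, `resolve_ref xinlian12/branchName` prints `xinlian12 branchName`, while `resolve_ref feature/foo` prints `origin feature/foo`.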

- Instruct agent to proactively verify the run is progressing after async launch: if the shell exits too quickly, investigate
- Add diagnosis steps: check results dirs, git state, JAR, tmux
- Document common failures table (checkout, build, startup, SSH)
- Require confirming with user before relaunching after a failure

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

- New script checks tmux session, results directories (with per-run status), git state, build status, and optionally system resources
- Supports --run-name for run-specific details (monitor samples, metrics, disk usage) and --verbose for system resource info
- Updated SKILL.md to reference check-status.sh in monitoring and troubleshooting sections
- Fix SSH stdin consumption in while-read loop with -n flag

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
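The stdin-consumption problem in the last bullet can be reproduced without a VM. In this sketch a hypothetical `greedy` function stands in for `ssh` without `-n`: like `ssh`, it drains whatever stdin it inherits, so a `while read` loop sees only its first line. Redirecting the command's stdin from `/dev/null`, which is what `ssh -n` does, lets the loop continue.

```shell
# Stand-in for ssh without -n: drains the stdin it inherits from the loop.
greedy() {
  cat >/dev/null
}

# Counts loop iterations when the inner command shares the loop's stdin.
count_without_fix() {
  local n=0 _line
  while IFS= read -r _line; do
    greedy               # swallows the remaining input lines
    n=$((n + 1))
  done
  echo "$n"
}

# Same loop, with the inner command's stdin detached (the effect of ssh -n).
count_with_fix() {
  local n=0 _line
  while IFS= read -r _line; do
    greedy </dev/null
    n=$((n + 1))
  done
  echo "$n"
}
```

Feeding three lines to each, `printf 'a\nb\nc\n' | count_without_fix` prints `1`, while `printf 'a\nb\nc\n' | count_with_fix` prints `3`.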

- SCP vm-prepare-and-run.sh, run-benchmark.sh, monitor.sh, and capture-diagnostics.sh to ~/benchmark-scripts/ on the VM
- Execute remotely via 'bash ~/benchmark-scripts/vm-prepare-and-run.sh' instead of 'bash -s' stdin piping which broke heredocs
- Update vm-prepare-and-run.sh to reference co-located scripts from ~/benchmark-scripts/ in the tmux run script

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

- run-all-refs.sh now only SCPs vm-prepare-and-run.sh (the bootstrapper) instead of all 4 scripts
- After checkout, vm-prepare-and-run.sh resolves scripts from the cloned repo (copilot/skills/.../scripts/) so they match the ref being benchmarked
- Falls back to ~/benchmark-scripts/ if the repo doesn't include the scripts yet (e.g., older branches)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

- run-all-refs.sh: --force-copy-scripts copies ALL scripts to VM (not just the bootstrapper) and passes --force-scripts to the bootstrapper
- vm-prepare-and-run.sh: --force-scripts overrides repo-first resolution, using ~/benchmark-scripts/ (the SCP'd copies) instead
- Default behavior unchanged: repo scripts used after checkout

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
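The resolution order (repo scripts first, SCP'd copies as fallback, with a force override) can be sketched as a single function. The name `resolve_scripts_dir` and its argument layout are hypothetical; the real script's variable names and existence checks may differ.

```shell
# Decide which scripts directory to use.
# $1 = scripts dir inside the cloned repo
# $2 = dir holding the SCP'd copies
# $3 = "true" to force the copies (as --force-scripts would)
resolve_scripts_dir() {
  local repo_scripts="$1" copied_scripts="$2" force="${3:-false}"
  if [ "$force" = "true" ]; then
    echo "$copied_scripts"       # explicit override wins
  elif [ -d "$repo_scripts" ]; then
    echo "$repo_scripts"         # default: scripts match the checked-out ref
  else
    echo "$copied_scripts"       # older refs that lack the scripts
  fi
}
```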

- run-all-refs.sh now starts vm-prepare-and-run.sh inside a tmux session, so checkout, build, verify AND run all survive SSH disconnection
- vm-prepare-and-run.sh Step 4 simplified: runs run-benchmark.sh directly (no nested tmux, no .run.sh heredoc generation)
- Polling and exit code logic moved to run-all-refs.sh orchestrator

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

- Write a small /tmp/bench-launch.sh on the VM that wraps vm-prepare-and-run.sh and writes the exit code
- Avoids nested quoting issues (SSH -> tmux -> bash -> args)
- Fix stale EXIT_CODE_FILE variable reference

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
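A minimal sketch of the launcher-wrapper idea follows. `write_launcher` is a hypothetical helper, and the sketch assumes paths without quotes or other shell metacharacters. The generated file runs the target script and records its exit status, so the tmux command line only has to name one file.

```shell
# Write a launcher script that runs the target and records its exit code.
# $1 = launcher path, $2 = script to run, $3 = file that receives the exit code
write_launcher() {
  local launcher="$1" target="$2" exit_file="$3"
  # Unquoted heredoc: paths are baked in now, \$ expansions run later.
  cat > "$launcher" <<EOF
#!/usr/bin/env bash
bash "$target" "\$@"
echo "\$?" > "$exit_file"
EOF
  chmod +x "$launcher"
}
```

On the VM the launcher could then be started with something like `tmux new-session -d -s bench "bash /tmp/bench-launch.sh"`, and the orchestrator would poll for the exit-code file instead of parsing nested quoted arguments.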

When run-benchmark.sh is executed from ~/benchmark-scripts/ (SCP'd copy), SCRIPT_DIR/../ doesn't point to the benchmark module. Fall back to PWD if the script's parent doesn't contain a target/ dir. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
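The fallback described above can be sketched as below. `resolve_module_dir` is a hypothetical name, and the script directory is passed in as an argument here rather than derived from the script's own path as the real script presumably does.

```shell
# Pick the benchmark module directory: the script's parent when it contains
# a target/ dir (script run from inside the repo), otherwise the current
# working directory (script run from an SCP'd copy).
resolve_module_dir() {
  local script_dir="$1" parent
  parent="$(cd "$script_dir/.." && pwd)"
  if [ -d "$parent/target" ]; then
    echo "$parent"
  else
    pwd
  fi
}
```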

…gDirectory

- --tenantsFile -> -tenantsFile (JCommander uses single dash)
- Remove --scenario and --outputDir (not valid Configuration params)
- Add -reportingDirectory for CSV metrics output

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Replace fire-and-forget async launch with a two-step workflow:

Step A: Launch orchestrator with sync mode (initial_wait: 60)
Step B: Mandatory verify via check-status.sh within 90s

Prevents the agent from telling the user 'it's running' without actually confirming tmux is alive and results directory exists.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Previously, createBenchmarks() initialized Cosmos clients sequentially in a for loop. With 50 tenants, each taking ~10-15s (connect + create DB/container + populate docs), initialization alone took ~8-10 minutes. Now submits all tenant initializations to the existing ExecutorService in parallel, collecting results via Future.get(). With 50 tenants on a 50-thread pool, initialization completes in ~15-20s instead. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

This reverts commit 3639600.

Report changes:
- Overlay all runs on one chart per metric (threads, heap, FDs, etc.)
- White background, monospace font, grid lines, color legend
- Remove pass/fail threshold verdicts (to be decided later)
- Use short run labels from run-name instead of branch/commit

Run skill changes:
- Agent must proactively poll for completion and notify user immediately
- Do not wait for user to ask 'peek' to discover run has completed

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Instead of separate Comparison + per-run JVM Metrics tables, now generates one table with baseline/peak/final sub-columns per run. Throughput also merged into a single table with count/mean/1m-rate per run. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

azure-sdk and others added 22 commits February 25, 2026 22:42

Increment package versions for identity releases (Azure#48129)

42c87e9

bump (Azure#48133)

1e7818c

Fix error about test by docker (Azure#48105)

8f82f8c

Increment package versions for compute releases (Azure#48136)

b6b42fb

Increment package versions for cosmos releases (Azure#48130)

da4aa5e

Latest spec changes and fix for openai-tsp 1.11.0 (Azure#48135)

b298aa4

* Using latest spec * Updates to README and Changelog agents * Updated projects * Diff addressed

Add explicit CODEOWNERS entries for azure-ai-agents and azure-ai-proj…

259d3f4

…ects (Azure#48086) Add dedicated entries with @glecaros and @kaylieee as additional owners. Also update azure-ai-agents-persistent to include the new owners.

Updated CryptographyClientImpl to return a versioned keyId where …

debbb5b

…applicable (Azure#47822) * Updated `CryptographyClientImpl` to return a versioned `keyId` where applicable * Updated CHANGELOG * Updated tests * Fixed formatting * Applied PR feedback

Use relative path in .vscode/mcp.json for azure-sdk-mcp.ps1 (Azure#48147

49bee32

) Replace workspaceFolder with relative path

Reverted explicit inclusion of Brotli library for Key Vault JCA (Azur…

680d5b0

…e#48143) * Reverted explicit inclusion of Brotli library for Key Vault JCA * Added missing word to .cspell

Allowing audience/scope for management endpoint to be specified in no…

411c33f

…n-public clouds (Azure#48137) * Allowing audience/scope for management endpoint to be specified in non-public clouds * Update pom.xml * Changelogs * Update pom.xml

[AutoPR azure-resourcemanager-servicefabricmanagedclusters]-generated…

347a9dc

…-from-SDK Generation - Java-5919241 (Azure#48089)

Fix java-spring-cloud-azure-starter-monitor-tests failure (Azure#48148)

f032f0c

* Fix java-spring-cloud-azure-starter-monitor-tests failure * Add tools/linting-extensions in sparse checkout * Add .vscode in sparse checkout

[VoiceLive] Update default API version to 2026-01-01-preview (Azure#4…

4e29994

…8145) Co-authored-by: Xiting Zhang <xitzhang@microsoft.com>

Sync eng/common directory with azure-sdk-tools repository (Azure#48162)

4f8ccdc

[VoiceLive] Add interim response configuration to AgentV2 sample (Azu…

81d3b4b

…re#48163) Co-authored-by: Xiting Zhang <xitzhang@microsoft.com>

Merge branch 'main' into cosmos-benchmark-agent

61a2589

Reorganize benchmark copilot: co-locate scripts with skills, consolid…

c43f5b6

…ate runtime config Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

xinlian12 force-pushed the cosmos-benchmark-agent branch from bf46a2f to c43f5b6 on February 27, 2026 22:50

Annie Liang and others added 7 commits March 2, 2026 10:31

Annie Liang and others added 14 commits March 2, 2026 21:05

Fix tilde expansion in BENCH_DIR_VM path

b538c48

Use $HOME instead of ~ in double-quoted string to ensure correct path expansion when interpolated into SSH commands. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
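The difference can be seen locally with `sh -c` standing in for the shell that `ssh` starts on the VM. This is an illustration under assumptions: the commit does not show the actual command, and `remote`, `tilde_dir`, and `home_dir` are hypothetical names. Here `$HOME` is kept literal on the local side and expanded by the receiving shell.

```shell
# Stand-in for the remote side: ssh hands the command string to a shell on
# the VM, which `sh -c` imitates locally.
remote() {
  sh -c "$1"
}

tilde_dir='~/benchmark-scripts'
home_dir='$HOME/benchmark-scripts'   # single quotes: left for the receiving shell

# Inside double quotes the receiving shell does not expand "~" ...
remote "printf '%s\n' \"$tilde_dir\""
# ... but it does expand $HOME.
remote "printf '%s\n' \"$home_dir\""
```

The first call prints the literal `~/benchmark-scripts`; the second prints the path under the receiving shell's home directory.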

Fix benchmark main class to com.azure.cosmos.benchmark.Main

33bf001

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Revert "Parallelize benchmark client creation across tenants"

aa147ec

This reverts commit 3639600.

xinlian12 wants to merge 43 commits into main from cosmos-benchmark-agent


9 participants
