Add Cosmos DB benchmark agent with workflow-aligned skills#18

Open
xinlian12 wants to merge 43 commits into main from cosmos-benchmark-agent

Conversation

@xinlian12

Summary

Add a Copilot agent and 5 workflow-aligned skills for running Cosmos DB benchmark and DR drill workflows.

Benchmark Infrastructure

  • 8 shell/Python scripts for VM provisioning, setup, execution, monitoring, diagnostics, dashboards
  • Java changes to BenchmarkConfig, BenchmarkOrchestrator, TenantWorkloadConfig for multi-tenant orchestration
  • .gitignore updates and test scaffolding (no real credentials)

Agent + 5 Skills

| Skill | Purpose |
|---|---|
| provision | Create/reuse Cosmos DB accounts, App Insights, Azure VMs |
| setup | Install tools, clone repo (branch/PR/commit/tag), build in tmux |
| run | CHURN preset, multi-VM parallel, auto App Insights config |
| analyze | CSV analysis, run comparison, heap/thread dumps, Kusto export |
| status | Resource health, run overview, App Insights verification |

  • Skill-creator utility for authoring new skills
  • Reference docs: VM sizing, 20 operation types, presets, thresholds, Kusto schema

Design Decisions

  • Single-tenant vs. multi-tenant is a configuration choice only (CLI flags vs. tenants.json)
  • CHURN is the only preset for now (extensible later)
  • Remote operations always run in tmux
  • App Insights KQL queries are placeholders

37 files changed | No secrets committed

azure-sdk and others added 22 commits February 25, 2026 22:42
* Using latest spec

* Updates to README and Changelog agents

* Updated projects

* Diff addressed
…ects (Azure#48086)

Add dedicated entries with @glecaros and @kaylieee as additional owners.
Also update azure-ai-agents-persistent to include the new owners.
* bump (Azure#48138)

* Increment package versions for containerinstance releases (Azure#48139)

* mgmt, prepare 2.60.0 (Azure#48140)

* bump

* update containerregistry

* make containerregistry unreleased

* update dependency

* changelog

* Increment package versions for resourcemanager releases

* update containerregistry to 2.55.0-beta.2

* changelog

---------

Co-authored-by: Xiaofei Cao <92354331+XiaofeiCao@users.noreply.github.com>
Co-authored-by: xiaofeicao <xiaofeicao@microsoft.com>
…zure#48039)

* Move azure-ai-voicelive SDK from sdk/ai to sdk/voicelive

Moved the azure-ai-voicelive library to its own service directory to
decouple its CI pipeline from the other AI libraries.

Changes:
- Moved sdk/ai/azure-ai-voicelive/ to sdk/voicelive/azure-ai-voicelive/
- Created sdk/voicelive/ci.yml with ServiceDirectory: voicelive
- Created sdk/voicelive/pom.xml aggregator POM
- Removed voicelive references from sdk/ai/ci.yml and sdk/ai/pom.xml

* update codeowner

* update typespec

* update typespec

* update cspell

* fix test path

* update metadata

---------

Co-authored-by: Xiting Zhang <xitzhang@microsoft.com>
…applicable (Azure#47822)

* Updated `CryptographyClientImpl` to return a versioned `keyId` where applicable

* Updated CHANGELOG

* Updated tests

* Fixed formatting

* Applied PR feedback
…e#48143)

* Reverted explicit inclusion of Brotli library for Key Vault JCA

* Added missing word to .cspell
…n-public clouds (Azure#48137)

* Allowing audience/scope for management endpoint to be specified in non-public clouds

* Update pom.xml

* Changelogs

* Update pom.xml
* Fix java-spring-cloud-azure-starter-monitor-tests failure

* Add tools/linting-extensions in sparse checkout

* Add .vscode in sparse checkout
Add benchmark shell scripts for VM provisioning, setup, execution,
monitoring, diagnostics capture, and dashboard generation.

Update BenchmarkConfig, BenchmarkOrchestrator, and TenantWorkloadConfig
to support multi-tenant benchmark orchestration with per-tenant
configuration overrides.

Add .gitignore entries for benchmark artifacts and Copilot skills.
Add test-setup and test-results directory scaffolding with READMEs
and a sample tenants.json template (no real credentials).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Agent routing file dispatches to 5 skills covering the full
benchmark/DR drill lifecycle:

- provision: Cosmos DB accounts, App Insights, Azure VMs
- setup: JDK/Maven install, repo clone, config generation, build
- run: CHURN preset execution, multi-VM parallel, App Insights config
- analyze: CSV metrics, run comparison, heap/thread dumps, Kusto export
- status: resource health, run overview, App Insights verification

Also includes skill-creator utility for authoring new skills.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…8145)

Co-authored-by: Xiting Zhang <xitzhang@microsoft.com>
…re#48163)

Co-authored-by: Xiting Zhang <xitzhang@microsoft.com>
…ate runtime config

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@xinlian12 force-pushed the cosmos-benchmark-agent branch from bf46a2f to c43f5b6 on February 27, 2026 22:50
Annie Liang and others added 7 commits March 2, 2026 10:31
Consolidate the benchmark agent from 5 skills down to 3, with deterministic
script-driven flows replacing inline commands.

Skills:
- setup-resources: provision Azure infra (Cosmos DB, App Insights, VM) with
  parallel creation, capacity validation, region fallback, and verification gate
- run: clone/build/verify/execute benchmarks via single SSH session per ref,
  supports multiple refs for comparison, SIMPLE/EXPAND/CHURN presets
- analyze: download results to config-dir/results, generate markdown report
  with time-series SVG charts and multi-run comparison tables

Key changes:
- Rename provision -> setup-resources, merge setup into run, remove status
- .github/skills and .github/agents use symlinks to copilot/ (single source)
- Default region westus2, resource group rg-cosmos-benchmark-YYYYMMDD
- Config directory prompted with credential-in-repo warning
- provision-all.sh orchestrates parallel resource creation + verification
- vm-prepare-and-run.sh consolidates checkout/build/verify/run in 1 SSH session
- run-all-refs.sh loops over user-provided refs with per-ref result directories
- generate-report.py reads monitor.csv + metrics/*.csv, outputs report.md
- Remove parse_hprof.py, kusto-schema.md, generate-dashboard.py (deferred)
- Remove trigger-benchmark.sh (superseded by vm-prepare-and-run.sh)
- Merge setup-benchmark-vm.sh into provision-benchmark-vm.sh

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Add timestamped progress logging to validate-capacity.sh
- Fix restriction detection to handle all types (Zone, NotAvailableForSubscription)
- Replace slow per-SKU API calls with single-call alternative SKU search
- Add --find-alternatives flag to control similar SKU search
- Add restriction_reason field to JSON output
- Derive quota family dynamically from effective SKU

- Add --fallback-regions flag to find-region.sh for user-specified regions
- Implement 4-phase search: preferred exact → preferred similar → fallback exact → fallback similar
- Add [N/M] progress updates printed as each region completes
- Add --stop-on-first flag (default: true)
- Fix integration bugs: JSON path, exit code logic, stdin-based parsing

- Update SKILL.md to document new flags and search strategy
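
The four-phase search order described above can be sketched roughly as follows. This is an illustrative Python model, not the logic of find-region.sh itself: `find_region` and the `has_capacity` callback are hypothetical names, and the real script's capacity checks are not reproduced here.

```python
# Phase order as listed in the commit message.
PHASES = [
    ("preferred", "exact"),
    ("preferred", "similar"),
    ("fallback", "exact"),
    ("fallback", "similar"),
]


def find_region(preferred, fallback, has_capacity, stop_on_first=True):
    """Walk the four phases in order and collect (region, match_kind) hits.

    has_capacity(region, match_kind) is a hypothetical callback standing in
    for whatever capacity check the real script performs.
    """
    pools = {"preferred": list(preferred), "fallback": list(fallback)}
    hits = []
    for pool, match_kind in PHASES:
        regions = pools[pool]
        for index, region in enumerate(regions, start=1):
            ok = has_capacity(region, match_kind)
            # [N/M] progress line, printed as each region completes.
            status = "ok" if ok else "unavailable"
            print(f"[{index}/{len(regions)}] {pool}/{match_kind} {region}: {status}")
            if ok:
                hits.append((region, match_kind))
                if stop_on_first:
                    return hits
    return hits
```

With `stop_on_first=True` (the documented default) the search returns as soon as any phase finds a usable region, so later phases only run when earlier ones come up empty.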

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Add capacity validation step before resource creation that blocks
  unless all checks pass (VM SKU, quota, Cosmos DB, App Insights)
- Add --skip-capacity-check flag to override the gate
- Add timestamped log() function for all progress messages
- Add elapsed time tracking per resource and total provisioning time
- Fix JSON parsing to match validate-capacity.sh output format
- Update SKILL.md to document new behavior and flag

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Wrap benchmark execution in tmux session ('bench') on VM so the
  process survives SSH disconnections
- Add async execution guidance to SKILL.md so the agent runs the
  orchestrator in background mode, keeping the user's context free
- Use scenario-based poll intervals (2min for SIMPLE, 5min for
  EXPAND/CHURN) instead of 10s fixed polling
- Expand monitoring section with local and VM-side status checks
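
As a rough illustration of the scenario-based polling described above, the mapping could look like this. The constant and function names are hypothetical; only the intervals (2 minutes for SIMPLE, 5 minutes for EXPAND/CHURN) come from the commit message.

```python
# Poll intervals in seconds, per scenario (values from the commit message).
POLL_INTERVAL_SECONDS = {"SIMPLE": 120, "EXPAND": 300, "CHURN": 300}


def poll_interval(scenario: str) -> int:
    """Return the poll interval for a scenario, rejecting unknown names."""
    try:
        return POLL_INTERVAL_SECONDS[scenario.upper()]
    except KeyError:
        raise ValueError(f"unknown scenario: {scenario}") from None
```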

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Detect refs like 'xinlian12/branchName' by checking if the part
  before the first slash matches an existing git remote
- If remote exists, fetch from that remote; otherwise treat the
  slash as part of the branch name on origin
- Document fork branch format in SKILL.md ref examples
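
The remote-detection rule above might be modelled like this. It is a minimal sketch in Python rather than the script's own shell code; `resolve_ref` is a hypothetical name, and the set of remotes is assumed to come from something like `git remote`.

```python
from typing import Iterable, Tuple


def resolve_ref(ref: str, remotes: Iterable[str]) -> Tuple[str, str]:
    """Split a ref into (remote, branch).

    If the text before the first slash names an existing remote, fetch from
    that remote; otherwise the slash is part of a branch name on origin.
    """
    head, sep, rest = ref.partition("/")
    if sep and rest and head in set(remotes):
        return head, rest
    return "origin", ref
```

For example, with remotes `origin` and `xinlian12`, the ref `xinlian12/branchName` resolves to the fork remote, while `feature/foo` stays a branch on origin.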

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Instruct agent to proactively verify the run is progressing after
  async launch — if the shell exits too quickly, investigate
- Add diagnosis steps: check results dirs, git state, JAR, tmux
- Document common failures table (checkout, build, startup, SSH)
- Require confirming with user before relaunching after a failure

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- New script checks tmux session, results directories (with per-run
  status), git state, build status, and optionally system resources
- Supports --run-name for run-specific details (monitor samples,
  metrics, disk usage) and --verbose for system resource info
- Updated SKILL.md to reference check-status.sh in monitoring and
  troubleshooting sections
- Fix SSH stdin consumption in while-read loop with -n flag

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Annie Liang and others added 14 commits March 2, 2026 21:05
- SCP vm-prepare-and-run.sh, run-benchmark.sh, monitor.sh, and
  capture-diagnostics.sh to ~/benchmark-scripts/ on the VM
- Execute remotely via 'bash ~/benchmark-scripts/vm-prepare-and-run.sh'
  instead of 'bash -s' stdin piping which broke heredocs
- Update vm-prepare-and-run.sh to reference co-located scripts from
  ~/benchmark-scripts/ in the tmux run script

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- run-all-refs.sh now only SCPs vm-prepare-and-run.sh (the bootstrapper)
  instead of all 4 scripts
- After checkout, vm-prepare-and-run.sh resolves scripts from the
  cloned repo (copilot/skills/.../scripts/) so they match the ref
  being benchmarked
- Falls back to ~/benchmark-scripts/ if the repo doesn't include
  the scripts yet (e.g., older branches)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- run-all-refs.sh: --force-copy-scripts copies ALL scripts to VM
  (not just the bootstrapper) and passes --force-scripts to the
  bootstrapper
- vm-prepare-and-run.sh: --force-scripts overrides repo-first
  resolution, using ~/benchmark-scripts/ (the SCP'd copies) instead
- Default behavior unchanged: repo scripts used after checkout
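
The resolution order across the last two commits (repo scripts first, SCP'd copies as fallback, `--force-scripts` as override) can be summarized in a short sketch. This is a Python model under assumptions: `resolve_scripts_dir` is a hypothetical name, and testing for `run-benchmark.sh` is only one plausible way to decide whether the repo "includes the scripts".

```python
from pathlib import Path


def resolve_scripts_dir(repo_scripts: Path, scp_scripts: Path,
                        force_scripts: bool = False) -> Path:
    """Pick the directory the benchmark scripts are loaded from."""
    if force_scripts:
        # --force-scripts: always use the SCP'd copies.
        return scp_scripts
    # Assumption: presence of run-benchmark.sh means the ref ships the scripts.
    if (repo_scripts / "run-benchmark.sh").is_file():
        return repo_scripts
    # Older branches without the scripts fall back to the SCP'd copies.
    return scp_scripts
```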

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- run-all-refs.sh now starts vm-prepare-and-run.sh inside a tmux
  session, so checkout, build, verify AND run all survive SSH
  disconnection
- vm-prepare-and-run.sh Step 4 simplified: runs run-benchmark.sh
  directly (no nested tmux, no .run.sh heredoc generation)
- Polling and exit code logic moved to run-all-refs.sh orchestrator

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Write a small /tmp/bench-launch.sh on the VM that wraps
  vm-prepare-and-run.sh and writes the exit code
- Avoids nested quoting issues (SSH -> tmux -> bash -> args)
- Fix stale EXIT_CODE_FILE variable reference

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Use $HOME instead of ~ in double-quoted string to ensure correct
path expansion when interpolated into SSH commands.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
When run-benchmark.sh is executed from ~/benchmark-scripts/ (SCP'd
copy), SCRIPT_DIR/../ doesn't point to the benchmark module. Fall
back to PWD if the script's parent doesn't contain a target/ dir.
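
A minimal Python sketch of this fallback, assuming the check is simply "does the script's parent directory contain target/"; `resolve_module_dir` is a hypothetical name and the real logic lives in run-benchmark.sh.

```python
from pathlib import Path


def resolve_module_dir(script_dir: Path, cwd: Path) -> Path:
    """Use the script's parent if it has a target/ dir, else the working dir."""
    parent = script_dir.parent
    return parent if (parent / "target").is_dir() else cwd
```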

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…gDirectory

- --tenantsFile -> -tenantsFile (JCommander uses single dash)
- Remove --scenario and --outputDir (not valid Configuration params)
- Add -reportingDirectory for CSV metrics output

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Replace fire-and-forget async launch with a two-step workflow:
Step A: Launch orchestrator with sync mode (initial_wait: 60)
Step B: Mandatory verify via check-status.sh within 90s

Prevents the agent from telling the user 'it's running' without
actually confirming tmux is alive and results directory exists.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Previously, createBenchmarks() initialized Cosmos clients sequentially
in a for loop. With 50 tenants, each taking ~10-15s (connect + create
DB/container + populate docs), initialization alone took ~8-10 minutes.

Now submits all tenant initializations to the existing ExecutorService
in parallel, collecting results via Future.get(). With 50 tenants on
a 50-thread pool, initialization completes in ~15-20s instead.
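
The submit-then-collect pattern described above can be illustrated with a Python analogue. The actual change is in Java (ExecutorService and Future.get()); here `concurrent.futures` stands in for it, `init_tenant` is a hypothetical placeholder for the per-tenant work, and the sleep is an artificial delay, not a measured cost.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def init_tenant(tenant_id):
    """Hypothetical stand-in for connect + create DB/container + populate."""
    time.sleep(0.05)
    return f"client-{tenant_id}"


def create_benchmarks(tenant_ids, pool):
    """Submit every tenant initialization, then collect results in order."""
    futures = [pool.submit(init_tenant, tenant_id) for tenant_id in tenant_ids]
    # result() blocks until that task finishes, like Future.get() in Java,
    # and re-raises any exception the task threw.
    return [future.result() for future in futures]


if __name__ == "__main__":
    with ThreadPoolExecutor(max_workers=50) as pool:
        clients = create_benchmarks(range(50), pool)
    print(len(clients), "tenants initialized")
```

With as many workers as tenants, total wall time approaches the slowest single initialization rather than the sum of all of them, which matches the improvement the commit reports.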

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Report changes:
- Overlay all runs on one chart per metric (threads, heap, FDs, etc.)
- White background, monospace font, grid lines, color legend
- Remove pass/fail threshold verdicts (to be decided later)
- Use short run labels from run-name instead of branch/commit

Run skill changes:
- Agent must proactively poll for completion and notify user immediately
- Do not wait for user to ask 'peek' to discover run has completed

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Instead of separate Comparison + per-run JVM Metrics tables, now generates
one table with baseline/peak/final sub-columns per run. Throughput also
merged into a single table with count/mean/1m-rate per run.
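
One way the merged table could be built is sketched below. This is not generate-report.py itself: the function names are hypothetical, and the definitions of baseline (first sample), peak (maximum), and final (last sample) are assumptions made for illustration.

```python
SUBCOLUMNS = ("baseline", "peak", "final")


def summarize(samples):
    """Assumed definitions: first sample, maximum, last sample."""
    return {"baseline": samples[0], "peak": max(samples), "final": samples[-1]}


def merged_table(metrics, runs):
    """Render one markdown table with baseline/peak/final columns per run.

    metrics maps metric name -> {run label -> list of samples}.
    """
    header = ["Metric"] + [f"{run} {col}" for run in runs for col in SUBCOLUMNS]
    lines = [
        "| " + " | ".join(header) + " |",
        "|" + "|".join(["---"] * len(header)) + "|",
    ]
    for metric, per_run in metrics.items():
        row = [metric]
        for run in runs:
            stats = summarize(per_run[run])
            row += [str(stats[col]) for col in SUBCOLUMNS]
        lines.append("| " + " | ".join(row) + " |")
    return "\n".join(lines)
```

Calling `merged_table({"threads": {"main": [40, 65, 50], "fix": [40, 48, 42]}}, ["main", "fix"])` yields a single row for `threads` with six value columns, three per run.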

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

9 participants