[Azure-cosmos-benchmark]Refactor azure-cosmos-benchmark: move all configs from CLI to workload config JSON by xinlian12 · Pull Request #48250 · Azure/azure-sdk-for-java

xinlian12 · 2026-03-04T20:56:02Z

Summary

Refactors the azure-cosmos-benchmark configuration system to eliminate CLI parameter sprawl, centralize all workload configuration in a single JSON file (-workloadConfig), and unify all benchmark types under a single BenchmarkOrchestrator.

Before: ~60 CLI parameters in Configuration.java, 5 separate code paths in Main.java.
After: 6 CLI params remain, single orchestrator entry point for all benchmark types.

Architecture: Before vs After

Before

Main.main()
  ├─ if (isSync)         → syncBenchmark()       [own MetricRegistry, reporter, result uploader]
  ├─ if (CtlWorkload)    → asyncCtlWorkload()    [own MetricRegistry, reporter]
  ├─ if (LinkedInCtl)    → linkedInCtlWorkload()  [own MetricRegistry, reporter]
  ├─ if (encryption)     → asyncEncryptionBenchmark() [own MetricRegistry, reporter]
  └─ else                → BenchmarkOrchestrator  [shared MetricRegistry, reporter]

After

Main.main()
  └─ BenchmarkOrchestrator.run(benchConfig)
       ├─ Sets up shared infrastructure (MetricRegistry, reporters, system properties)
       ├─ Factory dispatches based on operationType + flags (isSync, isEncryptionEnabled):
       │   ├─ Sync:       SyncReadBenchmark / SyncWriteBenchmark
       │   ├─ CTL:        AsyncCtlWorkload
       │   ├─ LinkedIn:   LICtlWorkload
       │   ├─ Encryption: AsyncEncryptionRead/Write/Query benchmarks
       │   └─ Async:      AsyncRead/Write/Query/Mixed/ReadMany benchmarks
       └─ Lifecycle loop: create → run → shutdown → settle × N cycles

All benchmark types now implement a common Benchmark interface (run() + shutdown()) and accept an injected MetricRegistry from the orchestrator.

Workload Config JSON Structure

{
  "tenantDefaults": {
    "connectionMode": "DIRECT",
    "consistencyLevel": "SESSION",
    "concurrency": "50",
    "numberOfOperations": "100000",
    "operation": "ReadThroughput",
    "numberOfPreCreatedDocuments": "1000",
    "isPartitionLevelCircuitBreakerEnabled": "true",
    "isPerPartitionAutomaticFailoverRequired": "true",
    "minConnectionPoolSizePerEndpoint": "0"
  },
  "metrics": {
    "enableJvmStats": true,
    "enableNettyHttpMetrics": false,
    "printingInterval": 10,
    "reportingDirectory": "/tmp/benchmark-output",
    "resultUpload": {
      "serviceEndpoint": "https://results-account.documents.azure.com:443/",
      "masterKey": "",
      "database": "benchresults",
      "container": "runs"
    },
    "runMetadata": {
      "testVariationName": "baseline-direct-mode",
      "branchName": "main",
      "commitId": "abc123def"
    }
  },
  "tenants": [
    {
      "id": "tenant-0",
      "serviceEndpoint": "https://cosmos-account-0.documents.azure.com:443/",
      "masterKey": "",
      "databaseId": "benchdb",
      "containerId": "benchcol",
      "operation": "ReadThroughput",
      "concurrency": 100
    }
  ]
}

How defaults work:

Each tenant inherits from tenantDefaults (e.g., both tenants get connectionMode: DIRECT)
Per-tenant values override defaults (e.g., tenant-0 gets concurrency: 100)
metrics section is global (not per-tenant)
JVM system properties (isPartitionLevelCircuitBreakerEnabled, etc.) live in tenantDefaults but apply globally

All Configurations — Before vs After

Account & Connection

Config Name	Before	After
`serviceEndpoint`	CLI `-serviceEndpoint`	`tenants[]` / `tenantDefaults`
`masterKey`	CLI `-masterKey`	`tenants[]` / `tenantDefaults`
`databaseId`	CLI `-databaseId`	`tenants[]` / `tenantDefaults`
`containerId`	CLI `-collectionId`	`tenants[]` / `tenantDefaults`
`connectionMode`	CLI `-connectionMode`	`tenants[]` / `tenantDefaults`
`consistencyLevel`	CLI `-consistencyLevel`	`tenants[]` / `tenantDefaults`
`maxConnectionPoolSize`	CLI `-maxConnectionPoolSize`	`tenants[]` / `tenantDefaults`
`preferredRegionsList`	CLI `-preferredRegionsList`	`tenants[]` / `tenantDefaults`
`applicationName`	CLI `-applicationName`	`tenants[]` / `tenantDefaults`
`manageDatabase`	CLI `-manageDatabase`	`tenants[]` / `tenantDefaults`

Workload Parameters

Config Name	Before	After
`operation`	CLI `-operation`	`tenants[]` / `tenantDefaults`
`concurrency`	CLI `-concurrency`	`tenants[]` / `tenantDefaults`
`numberOfOperations`	CLI `-numberOfOperations`	`tenants[]` / `tenantDefaults`
`useSync`	CLI `-useSync`	`tenants[]` / `tenantDefaults`
`encryptionEnabled`	CLI `-encryptionEnabled`	`tenants[]` / `tenantDefaults`

JVM System Properties

Config Name	Before	After
`isPartitionLevelCircuitBreakerEnabled`	CLI	`tenantDefaults` only
`isPerPartitionAutomaticFailoverRequired`	CLI	`tenantDefaults` only
`minConnectionPoolSizePerEndpoint`	CLI	`tenantDefaults` only

Metrics & Reporting

Config Name	Before	After
`enableJvmStats`	CLI `-enableJvmStats`	`metrics`
`printingInterval`	CLI `-printingInterval`	`metrics`
`reportingDirectory`	CLI `-reportingDirectory`	`metrics`
Result upload fields	CLI flags	`metrics.resultUpload`
Run metadata fields	CLI flags	`metrics.runMetadata`

Remaining CLI Parameters (unchanged)

Config Name	Description
`-workloadConfig`	Path to workload config JSON (required)
`-cycles`	Number of create/destroy cycles
`-settleTimeMs`	Wait between cycles
`-gcBetweenCycles`	Force GC during settle
`-suppressCleanup`	Skip DB/container cleanup
`-help`	Show help

Key Changes

Main.java: 190 → 62 lines. Single code path through BenchmarkOrchestrator
Benchmark.java (new): Common interface (run() + shutdown()) implemented by all benchmark types
BenchmarkOrchestrator.java: Expanded factory dispatches sync, CTL, encryption, LinkedIn benchmarks based on operationType + flags
SyncBenchmark, AsyncCtlWorkload, AsyncEncryptionBenchmark, LICtlWorkload: Accept injected MetricRegistry, removed self-managed reporters/JVM stats/result uploaders
BenchmarkConfig.java: Separated config parsing into focused methods (loadJvmSystemProperties, loadMetricsConfig, loadResultUploadConfig, loadRunMetadata)
Configuration.java: 1057 → 131 lines. Only CLI lifecycle params remain
TenantWorkloadConfig.java: All workload params with tenantDefaults merge
Renamed: globalDefaults → tenantDefaults, -tenantsFile → -workloadConfig
Removed: Graphite reporting, redundant TenantWorkloadConfigFromConfigurationTest.java

Verification

mvn compile -pl sdk/cosmos/azure-cosmos-benchmark -DskipTests passes cleanly

…d config JSON - Remove ~50 CLI parameters from Configuration.java; all workload/connection params now come from the workload config JSON file (TenantWorkloadConfig) - Configuration.java reduced from 1057 to 131 lines with only 6 CLI params: workloadConfig, cycles, settleTimeMs, gcBetweenCycles, suppressCleanup, help - Move metrics/reporting/result-upload configs into a nested 'metrics' section in the workload config JSON with 'resultUpload' and 'runMetadata' sub-objects - Remove all Graphite reporting support (use Application Insights instead) - Remove TenantWorkloadConfig.fromConfiguration() factory method - Make workload config file mandatory (-workloadConfig flag) - Rename -tenantsFile CLI flag to -workloadConfig - Update all consumer classes (29 files) to use TenantWorkloadConfig for workload params and BenchmarkConfig for global/reporting params Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Route sync, CTL, encryption, and LinkedIn benchmarks through the orchestrator instead of having separate code paths in Main.java. - Add Benchmark interface (run + shutdown) implemented by all benchmark types - Refactor SyncBenchmark, AsyncCtlWorkload, AsyncEncryptionBenchmark, and LICtlWorkload to accept injected MetricRegistry (like AsyncBenchmark) - Remove self-managed reporter, result uploader, and JVM stats from each benchmark — orchestrator handles all infrastructure concerns - Expand orchestrator factory to dispatch based on operationType + flags (isSync, isEncryptionEnabled) - Simplify Main.java from 5 code paths to a single orchestrator call Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

The section provides default values for tenant-level configuration, so tenantDefaults better reflects its purpose. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

sdk/cosmos/azure-cosmos-benchmark/src/main/java/com/azure/cosmos/benchmark/BenchmarkConfig.java

- Split loadGlobalSystemPropertiesFromWorkloadConfig into 4 focused methods: loadJvmSystemProperties, loadMetricsConfig, loadResultUploadConfig, loadRunMetadata - Add clarifying comment in Main.java explaining Configuration -> BenchmarkConfig -> BenchmarkOrchestrator flow Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot

Pull request overview

This PR refactors the azure-cosmos-benchmark tool to eliminate ~60 CLI parameters and centralize all workload configuration in a single JSON file (-workloadConfig). The result is a unified BenchmarkOrchestrator entry point for all benchmark types (sync, async, CTL, encryption, LinkedIn), with a new Benchmark interface and per-benchmark MetricRegistry injection from the orchestrator.

Changes:

Config system: Configuration.java reduced from ~1057 to 131 lines; all workload params moved to TenantWorkloadConfig and loaded from JSON; BenchmarkConfig now only loads from JSON (no more CLI single-tenant path).
Benchmark unification: New Benchmark interface (run() + shutdown()) implemented by all benchmark types; BenchmarkOrchestrator now dispatches sync, CTL, encryption, and LinkedIn benchmarks in addition to async ones.
Test updates: Tests migrated from building CLI strings + JCommander parsing to direct TenantWorkloadConfig construction; TenantWorkloadConfigFromConfigurationTest deleted.

Reviewed changes

Copilot reviewed 32 out of 32 changed files in this pull request and generated 4 comments.

Show a summary per file

File	Description
`Configuration.java`	Reduced to 6 CLI lifecycle params only
`TenantWorkloadConfig.java`	All workload params added; `fromConfiguration()` factory removed; `parseWorkloadConfig` renamed from `parseTenantsFile`
`BenchmarkConfig.java`	Now always requires a JSON config file; split loading into focused methods
`BenchmarkOrchestrator.java`	Factory now dispatches all benchmark types via `Benchmark` interface; `buildCosmosMicrometerRegistry` inlined
`Benchmark.java`	New interface with `run()` and `shutdown()`
`Main.java`	Simplified to single code path through `BenchmarkOrchestrator`
`SyncBenchmark.java`, `SyncReadBenchmark.java`, `SyncWriteBenchmark.java`	Accept injected `MetricRegistry`; implement `Benchmark`
`AsyncBenchmark.java`	Implements `Benchmark`; `run()`/`shutdown()` made public
`AsyncEncryptionBenchmark.java`	Accept injected `MetricRegistry`; implements `Benchmark`
`AsyncCtlWorkload.java`	Accept injected `MetricRegistry`; implements `Benchmark`
`LICtlWorkload.java`	Accept injected `MetricRegistry`; implements `Benchmark`; `setup()` merged into `run()`
`ScheduledReporterFactory.java`	Accepts `BenchmarkConfig` instead of `Configuration`
`WorkflowTest.java`	Tests migrated to direct `TenantWorkloadConfig` construction
`ReadMyWritesConsistencyTest.java`	Test migrated to direct `TenantWorkloadConfig` construction
`TenantWorkloadConfigFromConfigurationTest.java`	Deleted (no longer needed)
LinkedIn classes	`Configuration` replaced with `TenantWorkloadConfig`

Comments suppressed due to low confidence (3)

sdk/cosmos/azure-cosmos-benchmark/src/main/java/com/azure/cosmos/benchmark/encryption/AsyncEncryptionBenchmark.java:219

In AsyncEncryptionBenchmark.shutdown(), the suppressCleanup flag is not respected. Unlike AsyncBenchmark, SyncBenchmark, and AsyncCtlWorkload which all check workloadConfig.isSuppressCleanup() before deleting database/container, AsyncEncryptionBenchmark.shutdown() unconditionally deletes them when databaseCreated or collectionCreated is true. This means the encryption benchmark will always clean up even when cycles > 1 and suppressCleanup=true is set.

    public void shutdown() {
        if (this.databaseCreated) {
            cosmosAsyncDatabase.delete().block();
            logger.info("Deleted temporary database {} created for this test", this.workloadConfig.getDatabaseId());
        } else if (this.collectionCreated) {
            cosmosAsyncContainer.delete().block();
            logger.info("Deleted temporary collection {} created for this test", this.workloadConfig.getContainerId());
        }

        cosmosClient.close();
    }

sdk/cosmos/azure-cosmos-benchmark/src/main/java/com/azure/cosmos/benchmark/ScheduledReporterFactory.java:36

The ScheduledReporterFactory.create() method signature changed from accepting Configuration to accepting BenchmarkConfig, but the method is no longer called anywhere (reporter creation is now inlined in BenchmarkOrchestrator.run()). The ScheduledReporterFactory class appears to be dead code after this refactor. Verify whether it is still used, and if not, consider removing it to avoid confusion.

    public static ScheduledReporter create(final BenchmarkConfig benchConfig,
        final MetricRegistry metricsRegistry) {
        if (benchConfig.getReportingDirectory() != null) {
            return CsvReporter.forRegistry(metricsRegistry)
                .convertDurationsTo(TimeUnit.MILLISECONDS)
                .convertRatesTo(TimeUnit.SECONDS)
                .build(new File(benchConfig.getReportingDirectory()));
        } else {
            return ConsoleReporter.forRegistry(metricsRegistry)
                .convertDurationsTo(TimeUnit.MILLISECONDS)
                .convertRatesTo(TimeUnit.SECONDS)
                .build();
        }

sdk/cosmos/azure-cosmos-benchmark/src/test/java/com/azure/cosmos/benchmark/WorkflowTest.java:99

The readMyWritesCLI and writeLatencyCLI tests (lines 33–46 and 86–99) call Main.main(StringUtils.split(cmd)) with the old CLI parameters (-serviceEndpoint, -masterKey, -databaseId, -collectionId, -operation, etc.). After this refactor, Main.main() no longer accepts those parameters — it requires -workloadConfig pointing to a JSON file. These tests will fail at runtime with a ParameterException from JCommander (unknown parameters) followed by a NullPointerException in BenchmarkConfig.fromConfiguration() because workloadConfig will be null. These tests need to be either updated to write a temporary JSON config file and pass -workloadConfig to them, or removed if they are superseded by the readMyWrites/writeLatency test variants below them.

You can also share your feedback on Copilot code review. Take the survey.

Copilot · 2026-03-04T23:28:38Z

sdk/cosmos/azure-cosmos-benchmark/src/main/java/com/azure/cosmos/benchmark/Main.java

+            validateConfiguration(firstTenant);
+
+            new BenchmarkOrchestrator().run(benchConfig);
        } catch (ParameterException e) {
            System.err.println("INVALID Usage: " + e.getMessage());
            System.err.println("Try '-help' for more information.");
            throw e;
        }
    }

-    private static void validateConfiguration(Configuration cfg) {
-        switch (cfg.getOperationType()) {
+    private static void validateConfiguration(TenantWorkloadConfig workloadCfg) {
+        switch (workloadCfg.getOperationType()) {
            case WriteLatency:
            case WriteThroughput:
                break;
            default:
-                if (!cfg.isContentResponseOnWriteEnabled()) {
+                if (!workloadCfg.isContentResponseOnWriteEnabled()) {
                    throw new IllegalArgumentException("contentResponseOnWriteEnabled parameter can only be set to false " +
                        "for write latency and write throughput operations");
                }
        }

-        switch (cfg.getOperationType()) {
+        switch (workloadCfg.getOperationType()) {
            case ReadLatency:
            case ReadThroughput:
                break;
            default:
-                if (cfg.getSparsityWaitTime() != null) {
-                    throw new IllegalArgumentException("sparsityWaitTime is not supported for " + cfg.getOperationType());
+                if (workloadCfg.getSparsityWaitTime() != null) {
+                    throw new IllegalArgumentException("sparsityWaitTime is not supported for " + workloadCfg.getOperationType());
                }
        }


The validateConfiguration method in Main.java only validates firstTenant (the first element of the tenant workloads list), but the workload config JSON can define multiple tenants. Configuration constraints such as contentResponseOnWriteEnabled and sparsityWaitTime should be validated for all tenants, not just the first one. Invalid configurations in tenants 2..N will only be discovered at runtime when the benchmark starts.

Copilot · 2026-03-04T23:28:38Z

sdk/cosmos/azure-cosmos-benchmark/src/main/java/com/azure/cosmos/benchmark/Main.java

+            TenantWorkloadConfig firstTenant = benchConfig.getTenantWorkloads().get(0);

-            if (cfg.isSync()) {
-                syncBenchmark(cfg);
-            } else {
-                if (cfg.getOperationType().equals(CtlWorkload)) {
-                    asyncCtlWorkload(cfg);
-                } else if (cfg.getOperationType().equals(LinkedInCtlWorkload)) {
-                    linkedInCtlWorkload(cfg);
-                } else if (cfg.isEncryptionEnabled()) {
-                    asyncEncryptionBenchmark(cfg);
-                } else {
-                    asyncBenchmark(cfg);
-                }
-            }
+            validateConfiguration(firstTenant);


The parseWorkloadConfig method returns an empty list silently when the tenants array is absent or empty in the JSON. Then Main.java at line 31 calls benchConfig.getTenantWorkloads().get(0) which will throw an IndexOutOfBoundsException with no useful error message. The orchestrator does check if (config.getTenantWorkloads().isEmpty()) at line 61 of BenchmarkOrchestrator, but Main.java crashes before that check is reached. Either parseWorkloadConfig should throw an IllegalArgumentException when no tenants are defined, or Main.java should guard against the empty list before calling get(0).

Copilot · 2026-03-04T23:28:38Z

...s/azure-cosmos-benchmark/src/main/java/com/azure/cosmos/benchmark/BenchmarkOrchestrator.java

+        }
+
+        String roleInstance = System.getenv("APPLICATIONINSIGHTS_ROLE_INSTANCE");
+        if (roleName != null) {


In BenchmarkOrchestrator.buildCosmosMicrometerRegistry(), at line 454, there is a copy-paste bug: the condition if (roleName != null) is used to add the cloud_RoleInstance tag, but it should be if (roleInstance != null). As written, if APPLICATIONINSIGHTS_ROLE_NAME env var is not set (roleName == null), the cloud_RoleInstance tag is never added even when roleInstance has a value. Conversely, if roleName is set but roleInstance is null, then Tag.of("cloud_RoleInstance", null) may cause a NullPointerException. This same bug existed in the old Configuration.java code that was inlined here.

Suggested change

if (roleName != null) {

if (roleInstance != null) {

Copilot · 2026-03-04T23:28:39Z

...os/azure-cosmos-benchmark/src/main/java/com/azure/cosmos/benchmark/TenantWorkloadConfig.java

+    public Configuration.Environment getEnvironment() {
+        if (environment == null) return Configuration.Environment.Daily;
+        return Configuration.Environment.valueOf(environment);
+    }


In TenantWorkloadConfig.getEnvironment() (line 317-320), the method returns a Configuration.Environment type. This creates a coupling between TenantWorkloadConfig and the now-reduced Configuration class (which is meant to hold only CLI lifecycle params). The Environment enum should either be moved to TenantWorkloadConfig or to a shared location, rather than being accessed from Configuration.

Suggested change

public Configuration.Environment getEnvironment() {

if (environment == null) return Configuration.Environment.Daily;

return Configuration.Environment.valueOf(environment);

}

/**

* Benchmark execution environment (for example, Daily).

* <p>

* This enum is defined here to avoid coupling {@link TenantWorkloadConfig}

* to the CLI {@code Configuration} class.

* </p>

*/

public enum Environment {

Daily

}

public Environment getEnvironment() {

if (environment == null) {

return Environment.Daily;

}

return Environment.valueOf(environment);

}

Annie Liang and others added 2 commits March 4, 2026 12:51

Merge branch 'main' into reduceCLIArgs

42637b5

github-actions bot added the Cosmos label Mar 4, 2026

Annie Liang and others added 2 commits March 4, 2026 14:50

Rename globalDefaults to tenantDefaults in workload config

a486439

The section provides default values for tenant-level configuration, so tenantDefaults better reflects its purpose. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

xinlian12 commented Mar 4, 2026

View reviewed changes

sdk/cosmos/azure-cosmos-benchmark/src/main/java/com/azure/cosmos/benchmark/BenchmarkConfig.java Outdated Show resolved Hide resolved

xinlian12 marked this pull request as ready for review March 4, 2026 23:22

xinlian12 requested review from a team and kirankumarkolli as code owners March 4, 2026 23:22

Copilot AI review requested due to automatic review settings March 4, 2026 23:22

xinlian12 requested a review from a team as a code owner March 4, 2026 23:22

Copilot started reviewing on behalf of xinlian12 March 4, 2026 23:24 View session

Copilot AI reviewed Mar 4, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Azure-cosmos-benchmark]Refactor azure-cosmos-benchmark: move all configs from CLI to workload config JSON#48250

[Azure-cosmos-benchmark]Refactor azure-cosmos-benchmark: move all configs from CLI to workload config JSON#48250
xinlian12 wants to merge 5 commits intoAzure:mainfrom
xinlian12:reduceCLIArgs

xinlian12 commented Mar 4, 2026 •

edited

Loading

Uh oh!

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Mar 4, 2026

Uh oh!

Copilot AI Mar 4, 2026

Uh oh!

Copilot AI Mar 4, 2026

Uh oh!

Copilot AI Mar 4, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

-    public Configuration.Environment getEnvironment() {
-        if (environment == null) return Configuration.Environment.Daily;
-        return Configuration.Environment.valueOf(environment);
-    }
+    /**
+     * Benchmark execution environment (for example, Daily).
+     * <p>
+     * This enum is defined here to avoid coupling {@link TenantWorkloadConfig}
+     * to the CLI {@code Configuration} class.
+     * </p>
+     */
+    public enum Environment {
+        Daily
+    }
+    public Environment getEnvironment() {
+        if (environment == null) {
+            return Environment.Daily;
+        }
+        return Environment.valueOf(environment);
+    }

Conversation

xinlian12 commented Mar 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Architecture: Before vs After

Before

After

Workload Config JSON Structure

How defaults work:

All Configurations — Before vs After

Account & Connection

Workload Parameters

JVM System Properties

Metrics & Reporting

Remaining CLI Parameters (unchanged)

Key Changes

Verification

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Mar 4, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 4, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 4, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 4, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

xinlian12 commented Mar 4, 2026 •

edited

Loading