Releases: google/fleetbench

v2.0.7

01 Feb 00:31

v2.0.7 Pre-release
Handle missing performance counters in parallel_bench.

PiperOrigin-RevId: 861168827
Change-Id: I1f1c4e4405994281af7ac7009a2d9f156cff80c4

Fleetbench v2.1

20 Jan 17:21

Following the major architectural overhaul in v2.0, v2.1 focuses on improving data fidelity, observability, and framework stability.

Key Changes

  1. High-Fidelity Benchmark Data
    We re-implemented the field sampling logic so that the distribution of message types, enums, and nesting levels more closely matches the statistical profile of our production traffic. We also improved message type generation to better simulate cache pressure.

  2. Framework Stability & Parallel Execution
    The Multiprocessing Framework's parallel controller now gracefully handles failed benchmark worker runs, so a single workload crash no longer stalls the entire suite. It also supports passing additional flags directly to the underlying benchmark targets, allowing more granular configuration of individual workloads.

  3. Observability & Result Processing
    Cache size overrides are now included in the output context. This helps verify the system topology when running on emulators or non-standard hardware.

  4. Bug fixes and dependency updates
    We fixed several issues in the Multiprocessing Framework and updated third-party dependencies and CI environment configurations.
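The fault-tolerant execution described in item 2 can be illustrated with a minimal Python sketch. It assumes each benchmark run is a plain callable (the hypothetical `run_workload`) executed on a thread pool, and that a failure surfaces as an exception; it is not the actual parallel_bench.py implementation.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_all(workloads, run_workload, max_workers=4):
    """Run every workload, recording failures instead of aborting the suite.

    Returns (results, failures), both dicts keyed by workload name.
    """
    results, failures = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(run_workload, name): name for name in workloads}
        for future in as_completed(futures):
            name = futures[future]
            try:
                results[name] = future.result()
            except Exception as exc:  # a crashed worker is recorded, not fatal
                failures[name] = repr(exc)
    return results, failures


def fake_run(name):
    # Hypothetical stand-in for launching one benchmark binary.
    if name == "BM_Broken":
        raise RuntimeError("worker exited with status 1")
    return len(name)


results, failures = run_all(["BM_A", "BM_Broken", "BM_Swiss"], fake_run)
print(sorted(results), sorted(failures))  # ['BM_A', 'BM_Swiss'] ['BM_Broken']
```

Collecting failures per workload lets the remaining runs finish and still report which ones crashed.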

We hope these improvements help you get more accurate and reliable performance data. If you have any questions or feedback, please feel free to contact us. Happy benchmarking!

v2.0.6

01 Jan 00:25

v2.0.6 Pre-release
Upgrade GitHub Actions for Node 24 compatibility

See https://github.com/google/fleetbench/pull/32

PiperOrigin-RevId: 845386328
Change-Id: I62106e7c22ef08cd858dabde23ca2d720bfc9c3c

v2.0.5

01 Dec 00:25

v2.0.5 Pre-release
Pass additional flags to the benchmark target.

PiperOrigin-RevId: 834415354
Change-Id: Id5ddd4bfffaa08c72765a9d9badb35f0cdb30fe0

v2.0.4

01 Nov 00:21

v2.0.4 Pre-release
Replace NaN in std columns with string "NaN" for JSON compatibility.

PiperOrigin-RevId: 822745633
Change-Id: I4337e86104f4d3b1325a623146c1497aafd476f7
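Why this matters: Python's `json.dumps` writes `float("nan")` as the bare token `NaN`, which is not valid JSON and is rejected by strict parsers. A minimal sketch of the replacement, assuming results are rows of dicts and using a hypothetical `cpu_time_std` column name:

```python
import json
import math


def nan_to_string(rows, std_columns):
    """Return copies of rows with NaN values in std columns replaced by "NaN".

    A quoted "NaN" string is valid JSON, unlike the bare NaN token.
    """
    cleaned = []
    for row in rows:
        row = dict(row)  # copy so the caller's data is left untouched
        for col in std_columns:
            value = row.get(col)
            if isinstance(value, float) and math.isnan(value):
                row[col] = "NaN"
        cleaned.append(row)
    return cleaned


# Hypothetical rows; "cpu_time_std" is an illustrative column name.
rows = [{"name": "BM_Memcpy", "cpu_time_std": float("nan")},
        {"name": "BM_Proto", "cpu_time_std": 1.5}]
# allow_nan=False would raise ValueError if any bare NaN remained.
print(json.dumps(nan_to_string(rows, ["cpu_time_std"]), allow_nan=False))
```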

v2.0.3

01 Oct 00:21

v2.0.3 Pre-release
Update GitHub workflow for releases.

PiperOrigin-RevId: 810176172
Change-Id: I8d12b22776598cef5082bb9f0c4b8dcc3b11d18c

v2.0.2

01 Aug 00:25

v2.0.2 Pre-release
Internal build changes.

PiperOrigin-RevId: 789375628
Change-Id: I195c32da1afe9ad343fbf404be1014141045d719

v2.0.1

01 Jul 00:23

v2.0.1 Pre-release
Update readme and version.

PiperOrigin-RevId: 776609725
Change-Id: Idfed7de6be033c4e7ca09144ff598541c8977b3e

V2.0.0 Unlocking Deeper Performance Insights with Multi-Core Simulation and Enhanced Workloads

27 Jun 14:43

We are thrilled to announce the release of Fleetbench v2.0, a major milestone that significantly enhances our benchmarking suite's ability to accurately characterize system performance under realistic, concurrent workloads. This release introduces the new Multiprocessing Framework, two new benchmarks (gRPC and SIMD), and substantial improvements and bug fixes across the suite.

This version represents a substantial step forward in capturing system performance from diverse angles, enabling developers and performance engineers to gain granular insights into how important libraries behave in complex, multi-core environments.

🚀 New Features & Capabilities

Broadened Hardware & Environment Support

  • Runnability on Emulators and Real Hardware: Fleetbench is now rigorously tested and validated for consistent performance measurement on both emulated environments and physical hardware. This helps development and testing workflows that use platforms like QEMU predict real-world performance characteristics more accurately, enabling a smoother transition from concept to development to deployment.

Multiprocessing Framework (/fleetbench/parallel/)

The new Fleetbench Multiprocessing framework is designed for precise CPU load simulation, moving beyond simplistic single-threaded measurements to analyze system behavior under controlled, concurrent loads.

  • Core Architecture: At its heart, parallel_bench.py orchestrates parallel benchmark execution. A central controller dynamically schedules Fleetbench binaries across a configurable pool of worker threads, distributed over multiple CPU cores.

  • Adaptive Load Simulation: Load is maintained through adaptive scheduling. The controller continuously monitors real-time CPU utilization and dynamically adjusts its benchmark scheduling strategy to sustain the target CPU utilization.

  • Granular Control: We've introduced extensive customization options, including:

    • Workload Distribution Strategies: Users can define workload composition with strategies like WORKLOAD_WEIGHTED (based on aggregate workload runtime) or DCTAX_WEIGHTED (user-defined proportional weights via weights.csv), allowing for fine-tuned synthetic load generation.

    • Hyperthreading Control (x86_64): Advanced SMT state manipulation via --hyperthreading_mode enables detailed analysis of core contention and cache behavior.

    • Flexible Execution Parameters: Flags such as --duration, --num_cpus, and --workload_filter provide precise control over the benchmark environment.

  • Google Benchmark Integration: The framework seamlessly integrates with the underlying Google Benchmark library, supporting familiar flags like --benchmark_repetitions, --benchmark_filter, and --benchmark_perf_counters for detailed metric collection.
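The adaptive, weighted scheduling described above can be sketched in Python as a single controller tick: if measured utilization is below target, launch enough jobs to close the gap, choosing each one in proportion to its weight (roughly the DCTAX_WEIGHTED idea). This is an illustration under stated assumptions, not parallel_bench.py's logic: the two-column `workload,weight` layout for weights.csv, the one-CPU-per-job model, and the helper names `load_weights` and `plan_launches` are all hypothetical.

```python
import csv
import io
import random


def load_weights(csv_text):
    """Parse hypothetical 'workload,weight' rows into a dict."""
    reader = csv.reader(io.StringIO(csv_text))
    return {name: float(weight) for name, weight in reader}


def plan_launches(weights, utilization, target, num_cpus, rng):
    """Decide which workloads to start in one scheduling tick.

    utilization and target are fractions in [0, 1]. Assumes each job
    occupies one CPU, so the gap to the target is converted to a job count.
    """
    idle_cpus = int((target - utilization) * num_cpus)
    if idle_cpus <= 0:
        return []  # at or above target: launch nothing this tick
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=idle_cpus)


weights = load_weights("Proto,3\nSwissmap,1\nLibc,1\n")
print(plan_launches(weights, utilization=0.5, target=0.9, num_cpus=8,
                    rng=random.Random(0)))
```

Re-evaluating utilization on every tick is what makes the loop adaptive: if jobs run longer or shorter than expected, the next tick simply launches fewer or more.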

Usage

First, build two targets: the Fleetbench binary and the multiprocessing framework:

```shell
bazel build --config=clang --config=opt --config=haswell fleetbench:fleetbench
bazel build --config=clang --config=opt --config=haswell fleetbench/parallel:parallel_bench
```

Then run it with:

```shell
bazel-bin/fleetbench/parallel/parallel_bench --benchmark_target=bazel-bin/fleetbench/fleetbench
```

For more usage details, see the README.md, or list all flags with bazel-bin/fleetbench/parallel/parallel_bench --help.

New Benchmarks

We've expanded our suite with two crucial benchmarks that represent real-world workloads:

SIMD Benchmark

  • Purpose: Accurately measures the performance of Single Instruction, Multiple Data (SIMD) operations.

  • Workload: Based on the SIMD-heavy computational patterns from ScaNN LUT16, reflecting operations common in database query processing, cryptography, and approximate nearest neighbor search.

  • Mechanism: It calculates distance scores by indexing into query-specific Look-Up Tables (LUTs) with database item codes and accumulating the retrieved values. It leverages parallel data loading, table lookups, and accumulation to harness SIMD parallelism. The benchmark focuses entirely on the performance of this SIMD-heavy lookup-and-accumulate loop.

  • Relevance: SIMD instructions are fundamental to high performance in modern computing, accounting for a large portion of CPU instructions in our fleet and growing rapidly.
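The lookup-and-accumulate loop described above can be shown as a scalar Python reference. It assumes the usual LUT16 layout: each database item stores one 4-bit code (0 to 15) per subspace, and each query precomputes one 16-entry table per subspace. SIMD implementations typically perform many of these 16-entry lookups at once with byte-shuffle instructions; this sketch only shows the arithmetic, not the vectorization or ScaNN's code.

```python
def lut16_scores(luts, codes):
    """Score each database item by table lookup and accumulation.

    luts:  one 16-entry table per subspace, precomputed for a single query.
    codes: for each item, one 4-bit code (0..15) per subspace.
    """
    scores = []
    for item_codes in codes:
        total = 0
        for subspace, code in enumerate(item_codes):
            total += luts[subspace][code]  # look up, then accumulate
        scores.append(total)
    return scores


# Two subspaces; illustrative table contents.
luts = [list(range(16)), [2 * i for i in range(16)]]
codes = [[0, 1], [15, 15], [3, 0]]
print(lut16_scores(luts, codes))  # [2, 45, 3]
```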

gRPC Benchmark

  • Purpose: Provides a realistic assessment of kernel and scheduling performance for remote procedure calls (RPC).

  • Workload: Utilizes synthesized representative protos reflecting common request/response patterns derived from real-world fleet traffic, similar to our existing Proto Benchmark.

  • Mechanism: Built upon the open-source gRPC framework, this benchmark employs a streamlined, asynchronous callback client/server architecture operating on a local host to minimize network interference.

  • Relevance: This benchmark addresses the need for accurately evaluating Hyperscale SoC performance under realistic and complex traffic patterns and server loads.

✨ Benchmark Updates & Enhancements

Overall Suite Improvements

  • Updated Fleet Data: All v1.0 benchmarks now use more recent fleet data so that they remain representative.

  • Explicit Iteration Counts: Benchmarks now use explicit iteration counts, ensuring more consistent and reproducible results.

  • Enhanced Stability with Warmup Phases: A warmup phase has been added to benchmarks to reduce initial variance, leading to more consistent performance measurements.

  • Accurate L3 Cache Size Detection on AMD Platforms: Fleetbench now correctly aggregates the L3 cache size across all CCXs in each socket, allowing cold benchmarks to be constructed more accurately.
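On AMD parts each CCX has its own L3 slice, so reading one CPU's L3 size undercounts the socket total, while summing over every CPU overcounts it. A minimal Python sketch of the de-duplicated sum, assuming per-CPU records of the form `(socket_id, l3_id, size_bytes)`; on Linux, comparable data is exposed under /sys/devices/system/cpu/cpu*/cache/ (the `index*` entry whose `level` is 3, with its `id` and `size` files). The record format and function name are hypothetical, not Fleetbench's code.

```python
from collections import defaultdict


def l3_per_socket(cpu_caches):
    """Sum distinct L3 slices per socket.

    cpu_caches: iterable of (socket_id, l3_id, size_bytes), one per logical
    CPU. CPUs in the same CCX report the same l3_id, so each slice is
    counted once rather than once per CPU.
    """
    seen = set()
    totals = defaultdict(int)
    for socket_id, l3_id, size in cpu_caches:
        if (socket_id, l3_id) not in seen:
            seen.add((socket_id, l3_id))
            totals[socket_id] += size
    return dict(totals)


MiB = 1024 * 1024
# Hypothetical socket with two 32 MiB CCX slices, two CPUs per CCX.
cpus = [(0, 0, 32 * MiB), (0, 0, 32 * MiB), (0, 1, 32 * MiB), (0, 1, 32 * MiB)]
print(l3_per_socket(cpus))  # {0: 67108864}
```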

Dedicated Benchmark Refinements

Proto Benchmark

  • Improved Representativeness: Re-implemented logic for field sample messages (now weight-based), better cold message generation, improved enum fields, and smarter message type generation with reused types.

  • Data Synthesis: Better distinguishes between synthesized data for varint and fixed-width integers.

  • Memory Optimization: Optimized memory usage for improved emulator compatibility.
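Why the varint/fixed distinction matters for synthesized data: a protobuf varint carries 7 payload bits per byte, so its encoded size depends on the value's magnitude, whereas fixed32 and fixed64 always occupy 4 and 8 bytes. Realistic value distributions therefore likely exercise different encoding sizes and parsing work. A minimal Python sketch of the size rule (the helper name is illustrative):

```python
FIXED32_SIZE, FIXED64_SIZE = 4, 8  # fixed-width encodings never vary


def varint_size(value):
    """Bytes needed to encode an int32/int64/uint64 value as a protobuf varint.

    Negative int32/int64 values are sign-extended to 64 bits, so they
    always take the maximum of 10 bytes.
    """
    if value < 0:
        value &= (1 << 64) - 1  # two's-complement view as uint64
    size = 1
    while value >= 0x80:  # more than 7 bits left: need another byte
        value >>= 7
        size += 1
    return size


print([varint_size(v) for v in (1, 300, 2**32 - 1, -1)])  # [1, 2, 5, 10]
```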

Swissmap Benchmarks

  • Improved Capacity Sizing: More accurate Swissmap capacity sizing that incorporates fleet size-capacity parameters.

  • New InsertMiss Benchmarks: Introduced InsertMiss_Hot and InsertMiss_Cold for measuring insertion performance of non-present elements.

  • Optimized Destructor Benchmarks: Adjusted batch sizes in IntDestructor and StrDestructor benchmarks to reduce overhead from helper functions for more accurate measurements.

  • Improved Hash Function: Updated to use a low-cost hash function for better entropy with random 32-bit integer keys.
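Why entropy matters here: Swiss tables split each hash into a low 7-bit control tag (H2) and the remaining bits (H1), which roughly select the probe position. With an identity hash, small integer keys leave H1 at zero, so every key starts probing at the same place. A minimal Python sketch of a low-cost mixer (one multiply and one xor-shift) that spreads keys across H1; the constant and function are hypothetical illustrations, not the hash Fleetbench adopted, and Abseil additionally mixes in per-table state.

```python
MASK64 = (1 << 64) - 1
K = 0x9E3779B97F4A7C15  # odd 64-bit constant, about 2**64 / golden ratio


def cheap_hash(key):
    """Hypothetical low-cost mixer: one multiply plus one xor-shift."""
    h = (key * K) & MASK64
    return h ^ (h >> 32)  # fold high bits into the low bits


def split(h):
    """Swiss-table style split: (H1, H2) = (h >> 7, low 7 bits)."""
    return h >> 7, h & 0x7F


keys = range(100)
print(len({split(k)[0] for k in keys}))  # 1: identity hash, all H1 == 0
print(len({split(cheap_hash(k))[0] for k in keys}))  # spread across H1
```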

LIBC Benchmarks

  • Realistic Branching Behavior: Incorporated a more fleet-representative branching pattern for realistic branch prediction.

  • Improved memcmp & bcmp Benchmarks: Now use the same source and destination buffer to correctly account for buffer overlaps.

  • memmove and Compare Benchmarks Fix: Corrected buffer size calculation for non-overlapping destination addresses, preventing potential infinite loops.

  • Integer Overflow Protection: Added checks for maximum supported L3 cache size to enhance robustness.

πŸ› Bug Fixes

We also fixed a series of bugs across the suite to improve stability, accuracy, and reliability.

🚀 Get Started

We encourage everyone to try Fleetbench v2.0 for performance analysis and let us know what you think!

🙌 Special Thanks to Our Contributors!

This release is a testament to the power of collaborative development. We extend our deepest gratitude to everyone who contributed to Fleetbench! Your insightful feedback, diligent bug reports, and valuable code contributions have been instrumental in making this release a reality and significantly advancing the capabilities of our benchmarking suite. A big thank you to everyone! 🎊🎊🎊

v1.0.15

01 Jun 00:25

v1.0.15 Pre-release
Update swissmap benchmarks to use a low cost hash function that has e…