[pull] master from ray-project:master#834

Merged
pull[bot] merged 4 commits into garymm:master from ray-project:master
Mar 17, 2026
Conversation


@pull pull bot commented Mar 17, 2026

See Commits and Changes for more details.


Created by pull[bot] (v2.0.0-alpha.4)

Can you help keep this open source service alive? 💖 Please sponsor : )

jeffreywang-anyscale and others added 4 commits March 16, 2026 18:33
Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
…equest completion, reduce replica update overhead (#61755)

Fixes elevated P99 latency observed when scaling Ray Serve deployments
with `max_ongoing_requests=1`. The root cause is that the queue length
cache is incremented when a request is sent (`on_send_request`) but
never decremented when the request completes, causing cache entries to
get "stuck" at values >= `max_ongoing_requests`. This forces every
subsequent routing decision to fall back to blocking probe RPCs instead
of using cached values.

This regression was introduced when `on_send_request` was added in the
router refactor (commit de1494e, Aug 2025). Prior to that (Ray <=
2.10), the cache was only updated from replica-reported values (probes
and rejection protocol responses), so there was no
increment-without-matching-decrement problem.

### Changes

**1. Decrement queue length cache on request completion (primary fix)**

Implements `decrement_queue_len_cache` in `RequestRouter` to decrement
the cache entry by 1 when a request finishes. This restores the
increment/decrement symmetry that was missing since `on_send_request`
was introduced.

With `max_ongoing_requests=1`, the routing algorithm in
`_select_from_candidate_replicas` treats any cache entry >=
`max_ongoing_requests` as "needs probing". Before this fix, every routed
request would bump the cache to 1, and it would stay there until either
the 10s TTL expired or a probe happened to refresh it. This meant the
cache was nearly useless, and most routing decisions required a blocking
probe RPC (~20-40ms round trip), directly explaining the observed P99
increase.
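The increment/decrement symmetry described above can be illustrated with a minimal sketch. The class and method shapes here are hypothetical stand-ins for Ray Serve's actual `RequestRouter` internals; only the names `on_send_request`, `decrement_queue_len_cache`, and the 10s TTL come from the description above.

```python
import time


class QueueLenCacheSketch:
    """Minimal sketch of the symmetric cache update described above.

    Hypothetical stand-in for Ray Serve's router-side queue length cache;
    the real implementation differs in structure and naming.
    """

    def __init__(self, ttl_s: float = 10.0):
        self._ttl_s = ttl_s
        self._entries = {}  # replica_id -> (queue_len, last_update_time)

    def on_send_request(self, replica_id: str) -> None:
        # Increment the cached queue length when a request is routed.
        length, _ = self._entries.get(replica_id, (0, 0.0))
        self._entries[replica_id] = (length + 1, time.monotonic())

    def decrement_queue_len_cache(self, replica_id: str) -> None:
        # The fix: decrement by 1 when the request completes. Without this,
        # entries get stuck at >= max_ongoing_requests and every routing
        # decision falls back to a blocking probe RPC.
        if replica_id in self._entries:
            length, _ = self._entries[replica_id]
            self._entries[replica_id] = (max(0, length - 1), time.monotonic())

    def get(self, replica_id: str):
        # Return None for missing or stale entries; the caller then falls
        # back to probing the replica directly.
        entry = self._entries.get(replica_id)
        if entry is None or time.monotonic() - entry[1] > self._ttl_s:
            return None
        return entry[0]


cache = QueueLenCacheSketch()
cache.on_send_request("replica-1")            # cached length -> 1
cache.decrement_queue_len_cache("replica-1")  # cached length -> 0
print(cache.get("replica-1"))                 # 0: cache stays usable
```

With `max_ongoing_requests=1`, the decrement is what keeps the cached value below the threshold between requests, so the router can keep using the cache instead of probing.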

**2. Reuse existing replica wrappers in `_update_running_replicas`**

Previously, every replica update created new `RunningReplica` wrappers
for *all* replicas, even those that hadn't changed. During scaling
storms (100+ updates with 250+ replicas each), this caused O(n)
synchronous work per update on the router's event loop.

Now reuses existing wrappers for known replicas, only creating wrappers
for genuinely new ones. This reduces per-update work from
O(all_replicas) to O(new_replicas).
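The wrapper-reuse idea can be sketched as follows. `RunningReplicaSketch` and `update_running_replicas` are hypothetical simplifications of Ray Serve's `RunningReplica` and `_update_running_replicas`, shown only to illustrate the O(new_replicas) construction cost.

```python
class RunningReplicaSketch:
    """Hypothetical stand-in for Ray Serve's RunningReplica wrapper."""

    def __init__(self, replica_id: str):
        self.replica_id = replica_id


def update_running_replicas(existing: dict, new_ids: list):
    """Rebuild the replica map, reusing wrappers for replicas we already know.

    Only genuinely new replica IDs pay the wrapper-construction cost, so a
    scaling-storm update touching 250+ replicas does O(new) work instead of
    O(all).
    """
    updated = {}
    num_created = 0
    for rid in new_ids:
        if rid in existing:
            updated[rid] = existing[rid]  # reuse, no new allocation
        else:
            updated[rid] = RunningReplicaSketch(rid)
            num_created += 1
    return updated, num_created


existing = {"r1": RunningReplicaSketch("r1")}
updated, num_created = update_running_replicas(existing, ["r1", "r2"])
print(num_created)  # 1: only "r2" needed a new wrapper
```

Replicas absent from `new_ids` are dropped implicitly, since the new map is built only from the updated ID list.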

**3. Reduce replica update log noise**

The "Got updated replicas" log line previously serialized every replica
ID (250+) into a string on every update. Changed to log only the total
count and the added/removed counts, reducing both log volume and the
synchronous formatting cost on the event loop.
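A count-only log line along these lines (function name and message format are illustrative, not the actual Ray Serve code):

```python
import logging

logger = logging.getLogger("router_sketch")


def summarize_replica_update(old_ids: set, new_ids: set):
    """Log only aggregate counts, never the full (possibly 250+) ID list.

    Computing three set sizes is cheap and constant-size to format, unlike
    serializing every replica ID into the log message.
    """
    total = len(new_ids)
    added = len(new_ids - old_ids)
    removed = len(old_ids - new_ids)
    logger.info(
        "Got updated replicas: total=%d, added=%d, removed=%d",
        total, added, removed,
    )
    return total, added, removed


print(summarize_replica_update({"a", "b"}, {"b", "c", "d"}))  # (3, 2, 1)
```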

## Load Test Results

| Scale | Client | Master QPS | Master P99 Latency | Optimized QPS | Optimized P99 Latency |
|-------|--------|------------|--------------------|---------------|-----------------------|
| **Up to 100 Users** | <img src="https://github.com/user-attachments/assets/d87573da-9fb8-4c22-b93d-04ce4cac2635" width="250"> | <img src="https://github.com/user-attachments/assets/d0de09c8-5932-4ea0-9988-cb798e8d6328" width="400"> | <img src="https://github.com/user-attachments/assets/c46dc371-c19d-4e2d-8327-b78648e2c393" width="400"> | <img src="https://github.com/user-attachments/assets/fccfd88c-6cd2-4d56-bf32-da5cec54e937" width="400"> | <img src="https://github.com/user-attachments/assets/6fc9260d-f164-415d-baa9-da8a837a1d23" width="400"> |
| **Up to 200 Users** | <img src="https://github.com/user-attachments/assets/8455e66d-9945-480c-89eb-78c1d3641e4e" width="250"> | <img src="https://github.com/user-attachments/assets/5a755c67-3588-4add-bfa5-f7e974f8b547" width="400"> | <img src="https://github.com/user-attachments/assets/2a7cfe6d-c660-4433-b350-a2b406421a70" width="400"> | <img src="https://github.com/user-attachments/assets/1ecea44d-2eb0-445f-a9e2-70b61eb4eba9" width="400"> | <img src="https://github.com/user-attachments/assets/e683bffb-9c45-4755-a86c-c6f39f84568b" width="400"> |

---------

Signed-off-by: abrar <abrar@anyscale.com>
…er (#61299)

## Description
This PR implements support for elastic training on TPUs using the
`JaxTrainer` API and the elastic scaling policy.

Specifically, this PR utilizes a new TPU utility
`get_num_ready_tpu_slices` to return the number of full, ready TPU
slices in the RayCluster and then adjusts the `_count_possible_workers`
calculation when running on TPUs to scale atomically by TPU slices. This
PR also adds comprehensive unit tests and an e2e test for the new
support.

I'll split the `ray.util.tpu` change out into a separate PR, but have left it in for now so that the tests can pass.
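The slice-atomic adjustment to the worker count could look roughly like the sketch below. The function shape and parameters here are hypothetical; the actual `_count_possible_workers` and `get_num_ready_tpu_slices` in Ray differ, and this only illustrates rounding the worker count down to whole TPU slices.

```python
def count_possible_workers(
    num_ready_slices: int,
    workers_per_slice: int,
    requested_workers: int,
) -> int:
    """Hypothetical sketch of the TPU-aware worker-count adjustment.

    TPU slices must be scaled atomically: a slice is either fully used or
    not used at all, so the worker count is capped by the ready slices and
    then rounded down to a whole-slice multiple.
    """
    max_by_slices = num_ready_slices * workers_per_slice
    usable = min(requested_workers, max_by_slices)
    # Round down to an integral number of slices.
    return (usable // workers_per_slice) * workers_per_slice


# 3 ready slices of 4 hosts each, 10 workers requested:
# capped at 12, then rounded down to 8 (2 whole slices).
print(count_possible_workers(3, 4, 10))  # 8
```

The elastic policy can then grow or shrink the worker group only in slice-sized steps, matching how TPU slices become ready or unavailable.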

## Related issues
Implements milestone 3 of
#55162

---------

Signed-off-by: ryanaoleary <ryanaoleary@google.com>
Signed-off-by: Ryan O'Leary <113500783+ryanaoleary@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
@pull pull bot locked and limited conversation to collaborators Mar 17, 2026
@pull pull bot added the ⤵️ pull label Mar 17, 2026
@pull pull bot merged commit 4397fcb into garymm:master Mar 17, 2026