[pull] master from ray-project:master#834
Merged
pull[bot] merged 4 commits into garymm:master from ray-project:master on Mar 17, 2026
Conversation
Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
https://buildkite.com/ray-project/postmerge/builds/16480#019cf89a-569c-4482-86aa-ccf8f559675c/L15115 Signed-off-by: abrar <abrar@anyscale.com>
…equest completion, reduce replica update overhead (#61755)

Fixes elevated P99 latency observed when scaling Ray Serve deployments with `max_ongoing_requests=1`. The root cause is that the queue length cache is incremented when a request is sent (`on_send_request`) but never decremented when the request completes, causing cache entries to get "stuck" at values >= `max_ongoing_requests`. This forces every subsequent routing decision to fall back to blocking probe RPCs instead of using cached values.

This regression was introduced when `on_send_request` was added in the router refactor (commit de1494e, Aug 2025). Prior to that (Ray <= 2.10), the cache was only updated from replica-reported values (probes and rejection protocol responses), so there was no increment without a matching decrement.

### Changes

**1. Decrement queue length cache on request completion (primary fix)**

Implements `decrement_queue_len_cache` in `RequestRouter` to decrement the cache entry by 1 when a request finishes. This restores the increment/decrement symmetry that had been missing since `on_send_request` was introduced.

With `max_ongoing_requests=1`, the routing algorithm in `_select_from_candidate_replicas` treats any cache entry >= `max_ongoing_requests` as "needs probing". Before this fix, every routed request would bump the cache to 1, and it would stay there until either the 10s TTL expired or a probe happened to refresh it. This meant the cache was nearly useless, and most routing decisions required a blocking probe RPC (~20-40ms round trip), directly explaining the observed P99 increase.

**2. Reuse existing replica wrappers in `_update_running_replicas`**

Previously, every replica update created new `RunningReplica` wrappers for *all* replicas, even those that hadn't changed. During scaling storms (100+ updates with 250+ replicas each), this caused O(n) synchronous work per update on the router's event loop.

Now the router reuses existing wrappers for known replicas, creating wrappers only for genuinely new ones. This reduces per-update work from O(all_replicas) to O(new_replicas).

**3. Reduce replica update log noise**

The "Got updated replicas" log line previously serialized every replica ID (250+) into a string on every update. It now logs only the total count and the added/removed counts, reducing both log volume and the synchronous formatting cost on the event loop.

## Load Test Results

| Scale | Client | Master QPS | Master P99 Latency | Optimized QPS | Optimized P99 Latency |
|------|--------|------------|--------------------|---------------|-----------------------|
| **Up to 100 Users** | <img src="https://github.com/user-attachments/assets/d87573da-9fb8-4c22-b93d-04ce4cac2635" width="250"> | <img src="https://github.com/user-attachments/assets/d0de09c8-5932-4ea0-9988-cb798e8d6328" width="400"> | <img src="https://github.com/user-attachments/assets/c46dc371-c19d-4e2d-8327-b78648e2c393" width="400"> | <img src="https://github.com/user-attachments/assets/fccfd88c-6cd2-4d56-bf32-da5cec54e937" width="400"> | <img src="https://github.com/user-attachments/assets/6fc9260d-f164-415d-baa9-da8a837a1d23" width="400"> |
| **Up to 200 Users** | <img src="https://github.com/user-attachments/assets/8455e66d-9945-480c-89eb-78c1d3641e4e" width="250"> | <img src="https://github.com/user-attachments/assets/5a755c67-3588-4add-bfa5-f7e974f8b547" width="400"> | <img src="https://github.com/user-attachments/assets/2a7cfe6d-c660-4433-b350-a2b406421a70" width="400"> | <img src="https://github.com/user-attachments/assets/1ecea44d-2eb0-445f-a9e2-70b61eb4eba9" width="400"> | <img src="https://github.com/user-attachments/assets/e683bffb-9c45-4755-a86c-c6f39f84568b" width="400"> |

---------

Signed-off-by: abrar <abrar@anyscale.com>
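The increment/decrement symmetry behind the primary fix can be sketched as follows. This is a minimal illustrative model, not Ray Serve's actual implementation; the class and method names (`QueueLenCache`, `inc`, `dec`) are assumptions for illustration only.

```python
import time

# Hypothetical sketch of the queue-length cache behavior described above.
# Names are illustrative; Ray Serve's real cache lives in the router internals.
class QueueLenCache:
    def __init__(self, ttl_s: float = 10.0):
        self._entries = {}  # replica_id -> (queue_len, last_update_timestamp)
        self._ttl_s = ttl_s

    def inc(self, replica_id: str) -> None:
        # Called when a request is sent to a replica (cf. on_send_request).
        length, _ = self._entries.get(replica_id, (0, 0.0))
        self._entries[replica_id] = (length + 1, time.monotonic())

    def dec(self, replica_id: str) -> None:
        # The fix: decrement on request completion so entries don't get
        # "stuck" at values >= max_ongoing_requests.
        if replica_id in self._entries:
            length, ts = self._entries[replica_id]
            self._entries[replica_id] = (max(0, length - 1), ts)

    def get(self, replica_id: str):
        # Returns None on a miss or an expired entry, forcing the caller to
        # fall back to a blocking probe RPC.
        entry = self._entries.get(replica_id)
        if entry is None:
            return None
        length, ts = entry
        if time.monotonic() - ts > self._ttl_s:
            return None
        return length
```

Without `dec`, every routed request pins the entry at 1 until the TTL expires, which is exactly the "stuck cache" pathology described above for `max_ongoing_requests=1`.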
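The wrapper-reuse change (item 2) and the log-noise change (item 3) can be sketched together. This is a hedged approximation under assumed names: `RunningReplica` here is a stand-in stub, and the dict-based update function is illustrative, not the actual `_update_running_replicas` code.

```python
# Illustrative sketch of reusing replica wrappers across updates.
class RunningReplica:
    """Stand-in for the real wrapper; construction is the cost we avoid."""
    def __init__(self, replica_id: str):
        self.replica_id = replica_id

def update_running_replicas(existing: dict, new_ids: list) -> dict:
    """Return wrappers for new_ids, reusing any wrapper already in `existing`."""
    new_id_set = set(new_ids)
    added = [rid for rid in new_ids if rid not in existing]
    removed = [rid for rid in existing if rid not in new_id_set]
    # Create wrappers only for genuinely new replicas: O(new_replicas)
    # construction instead of O(all_replicas).
    updated = {
        rid: existing[rid] if rid in existing else RunningReplica(rid)
        for rid in new_ids
    }
    # Log counts, not every replica ID, to cut log volume and the
    # synchronous string-formatting cost on the event loop.
    print(
        f"Got updated replicas: total={len(updated)}, "
        f"added={len(added)}, removed={len(removed)}"
    )
    return updated
```

The key property is identity reuse: a replica present in both the old and new sets keeps the exact same wrapper object, so per-update work scales with churn rather than fleet size.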
…er (#61299)

## Description

This PR implements support for elastic training on TPUs using the `JaxTrainer` API and the elastic scaling policy. Specifically, it uses a new TPU utility, `get_num_ready_tpu_slices`, to return the number of full, ready TPU slices in the RayCluster, and adjusts the `_count_possible_workers` calculation when running on TPUs to scale atomically by TPU slices. This PR also adds comprehensive unit tests and an e2e test for the new support. I'll move the `ray.util.tpu` change into a separate PR, but left it in for now so that the tests could pass.

## Related issues

Implements milestone 3 of #55162

---------

Signed-off-by: ryanaoleary <ryanaoleary@google.com>
Signed-off-by: Ryan O'Leary <113500783+ryanaoleary@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
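The slice-atomic scaling idea can be sketched as below. The helper names mirror the PR description, but their signatures and internals here are assumptions: a "ready slice" is modeled as one where every host in the slice is ready, and possible workers are counted in whole-slice increments.

```python
# Hedged sketch of scaling atomically by TPU slice. Signatures are assumed;
# the real get_num_ready_tpu_slices lives in ray.util.tpu per the PR.

def get_num_ready_tpu_slices(slice_readiness: dict) -> int:
    """Count full, ready slices.

    slice_readiness maps slice name -> (ready_hosts, total_hosts);
    a slice counts only when every host in it is ready (assumed criterion).
    """
    return sum(
        1 for ready, total in slice_readiness.values() if ready == total
    )

def count_possible_workers(num_ready_slices: int, workers_per_slice: int) -> int:
    # TPU workloads span whole slices, so elastic scaling adds or removes
    # workers in multiples of the slice size, never a partial slice.
    return num_ready_slices * workers_per_slice
```

A partially ready slice contributes zero workers here, which is the point of atomic-by-slice scaling: JAX collectives assume every host in a slice participates.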
See Commits and Changes for more details.
Created by
pull[bot] (v2.0.0-alpha.4)