[pull] master from ray-project:master#826
Merged
pull[bot] merged 7 commits into garymm:master from ray-project:master on Mar 14, 2026
Conversation
…61663) It looks like we changed some log lines that were used to detect when the memory pressure monitor killed a worker (#61210), causing the memory pressure test to fail consistently, since the test conditions waited for those lines to be generated. This updates the memory pressure test to wait on the new log lines instead. Signed-off-by: Joshua Lee <joshlee@anyscale.com>
…ests (#61668) java_test targets do not produce a _deploy.jar in Bazel 7+. Add a companion java_binary (all_tests_bin) that produces all_tests_bin_deploy.jar in both Bazel 6 and Bazel 7, and update all references accordingly. The java_binary includes //cpp:counter.so and //cpp:plus.so as resources so that CrossLanguageInvocationTest.getResourceAsStream("/cpp/counter.so") finds them in the deploy jar classpath. Signed-off-by: andrew <andrew@anyscale.com>
## Description

1. Inline `ActorPoolResizingPolicy`.
2. Rebase `_ActorPool` to compute utilization based on all actors, not just running ones.
3. Allow the autoscaler to scale up while pending actors are still starting up.
4. Update tests.

---------

Signed-off-by: Alexey Kudinkin <ak@anyscale.com>
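To illustrate point 2 of the description, here is a minimal sketch of pool utilization computed over all actors rather than only running ones. The function name and signature are hypothetical stand-ins, not the actual `_ActorPool` API:

```python
def pool_utilization(active_tasks, num_running, num_pending, max_tasks_per_actor):
    # Utilization over the whole pool: pending actors count toward capacity,
    # so the autoscaler does not over-scale while actors are still starting up.
    total_slots = (num_running + num_pending) * max_tasks_per_actor
    return active_tasks / total_slots if total_slots else 0.0
```

With 4 active tasks, 2 running actors, 2 pending actors, and 2 task slots per actor, a running-only calculation would report 100% utilization and keep scaling up; counting pending actors yields 50%.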
`test_data_parallel_trainer::test_config_accelerator_type` has been timing out in CI ([Buildkite #61885](https://buildkite.com/ray-project/premerge/builds/61885#019ce2b6-7daa-4dfb-932c-d687cc33edac)). This PR deflakes the test by replacing the expensive 6-node heterogeneous cluster with a single-node `ray.init` cluster and reducing the parameter space from 6 cases to 2. This cuts runtime significantly while preserving the core coverage of the `accelerator_type` scheduling constraint. Signed-off-by: JasonLi1909 <jasli1909@gmail.com>
…#61374)

## Description

This PR reduces the number of calls to `_try_schedule_one`, which were causing the autoscaler to hang. It lowers the time complexity of fitting resource requests onto in-flight nodes by grouping requests by their shape. Currently, the v2 scheduler evaluates every individual request against every node, for a time complexity of approximately O(N^2 * M). By using `SerializeToString(deterministic=True)` to generate a deterministic hash, we cache infeasible request shapes per node. If a shape fails to fit on a given node, the scheduler now skips the expensive `_try_schedule_one` check for all subsequent identical requests on that node.

This PR includes a unit test in `test_scheduler.py` to verify that the caching logic correctly short-circuits redundant evaluations; a manual test is included in the additional information.

## Related issues

[#3794](ray-project/kuberay#3794)

## Additional information

The optimization can be verified by running the test below on a RayCluster with Autoscaler V2 enabled:

```
import ray
import time
import logging

logging.getLogger("ray").setLevel(logging.DEBUG)


@ray.remote
def ten_minute_task(task_id):
    start = time.time()
    while time.time() - start < 300:
        _ = sum([i * i for i in range(10000)])
        time.sleep(0.1)
    return task_id


def main():
    tasks = []
    for i in range(4000):
        task = ten_minute_task.remote(i)
        tasks.append(task)
    results = ray.get(tasks)


if __name__ == "__main__":
    main()
```

---------

Signed-off-by: ryanaoleary <ryanaoleary@google.com>
Signed-off-by: Ryan O'Leary <113500783+ryanaoleary@users.noreply.github.com>
Co-authored-by: Rueian <rueiancsie@gmail.com>
#61731) When a deployment starts up and replicas are scheduled but not yet RUNNING (`current_num_replicas=0`), the autoscaling policy runs with `total_num_requests=0`. The cold start fast path returns `None` (no traffic), so the core policy returns `target_num_replicas` and it flows into `_apply_scaling_factors`. The scaling formula is:

`ceil(current + factor * (desired - current))`

When `current=0`, this becomes `ceil(factor * desired)`, which amplifies the entire target as if it were growth. Combined with the delay bypass for `current==0`, this compounds every tick:

| Tick | target_in | formula | target_out |
|------|-----------|---------|------------|
| 0 | 2 | `ceil(2.0 × 2)` | **4** |
| 1 | 4 | `ceil(2.0 × 4)` | **8** |
| 2 | 8 | `ceil(2.0 × 8) → 16, clamped` | **10 (max)** |

In 3 ticks, with zero traffic, a `min_replicas=2, max_replicas=10, upscaling_factor=2.0` deployment scales to `max_replicas`.

This was introduced in #60851, which removed the cold start fallback (`return ctx.target_num_replicas` when `current==0` and no traffic) so that custom policies like `AsyncInferenceAutoscalingPolicy` could detect queue work. That change was correct for custom policies but exposed the default policy to the amplification loop.

## Fix

Skip scaling factor amplification when `current_num_replicas == 0` in `_apply_scaling_factors`. Scaling factors control the *rate of change from a baseline*; when there is no baseline, amplifying the full target as delta is incorrect. The cold start fast path already handles the `current==0` with-traffic case separately (applying `upscaling_factor` once), so this is consistent.

This preserves the async inference scale-from-zero behavior: custom policies still run, return their desired value (e.g. `1` for queue work, `0` for idle), and the delay bypass lets legitimate scale-ups through immediately.

Signed-off-by: abrar <abrar@anyscale.com>
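The amplification loop and the fix can be reproduced in a few lines of plain Python. The function below is an illustrative reconstruction of the formula described in the PR, not Serve's actual `_apply_scaling_factors`:

```python
import math

MAX_REPLICAS = 10


def apply_scaling_factors(current, desired, factor):
    # Reconstruction of: ceil(current + factor * (desired - current)),
    # clamped to max_replicas.
    if current == 0:
        # The fix: with no baseline, skip amplification and pass the
        # desired value through unchanged.
        return min(desired, MAX_REPLICAS)
    return min(math.ceil(current + factor * (desired - current)), MAX_REPLICAS)


# Pre-fix, current == 0 reduces the formula to ceil(factor * desired),
# which compounds every tick even with zero traffic:
target = 2
history = []
for _ in range(3):
    target = min(math.ceil(2.0 * target), MAX_REPLICAS)
    history.append(target)
# history is now [4, 8, 10]: max_replicas reached in 3 ticks, as in the table
```

Post-fix, `apply_scaling_factors(current=0, desired=2, factor=2.0)` simply returns 2 on every tick, so an idle deployment stays at `min_replicas`.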
See Commits and Changes for more details.
Created by pull[bot] (v2.0.0-alpha.4)
Can you help keep this open source service alive? 💖 Please sponsor : )