Add adaptive concurrency filter configuration to DestinationRule #3661
prashanthjos wants to merge 3 commits into istio:master
Conversation
Adds AdaptiveConcurrency message to the DestinationRule proto, enabling configuration of Envoy's adaptive concurrency filter via TrafficPolicy. The message structure mirrors Envoy's v3 AdaptiveConcurrency proto with GradientControllerConfig, ConcurrencyLimitCalculationParams, and MinimumRTTCalculationParams. Field names and numbers are aligned with the upstream Envoy proto.
cc2f2a8 to aac1d4b
| // Adaptive concurrency settings for dynamically adjusting the allowed number of
| // outstanding requests based on sampled latencies. This enables the Envoy
| // adaptive concurrency filter for the destination.
| AdaptiveConcurrency adaptive_concurrency = 9;
- Does this completely override the static `max_connections` circuit breaker threshold? Should we describe that here so that it's more clear what to expect if both are configured?
- Is it also worth noting that new connections that exceed the adaptive concurrency threshold won't be reflected in circuit breaker statistics, the way they would when connections exceed the `max_connections` threshold? (At least that is how I understand it.)
Normally when the circuit breaker threshold is exceeded in Envoy, you also have the `x-envoy-overloaded` header returned as part of the response. Some users might rely on that header, or maybe the client-side Envoy proxy relies on that in some way.
Should we note that the `x-envoy-overloaded` header won't be present when the adaptive concurrency threshold is exceeded?
> - Does this completely override the static `max_connections` circuit breaker threshold? Should we describe that here so that it's more clear what to expect if both are configured?
> - Is it also worth noting that new connections that exceed the adaptive concurrency threshold won't be reflected in circuit breaker statistics, the way they would when connections exceed the `max_connections` threshold? (At least that is how I understand it.)
@ericdbishop these are good points. I dug a little deeper into the Envoy code; here is their adaptive concurrency filter. When the filter blocks a request, the request never reaches the router at all:
```cpp
if (controller_->forwardingDecision() == Controller::RequestForwardingAction::Block) {
  decoder_callbacks_->sendLocalReply(config_->concurrencyLimitExceededStatus(),
                                     "reached concurrency limit", nullptr, absl::nullopt,
                                     "reached_concurrency_limit");
  return Http::FilterHeadersStatus::StopIteration;
}
```
Whereas circuit breakers are checked much later, inside the connection pool when the router tries to establish an upstream connection:

```cpp
const bool can_create_connection = host_->canCreateConnection(priority_);
if (!can_create_connection) {
  host_->cluster().trafficStats()->upstream_cx_overflow_.inc();
}

if (!host_->cluster().resourceManager(priority_).pendingRequests().canCreate()) {
  ENVOY_LOG(debug, "max pending streams overflow");
  onPoolFailure(nullptr, absl::string_view(), ConnectionPool::PoolFailureReason::Overflow,
                context);
  host_->cluster().trafficStats()->upstream_rq_pending_overflow_.inc();
  return nullptr;
}
```
So, no: adaptive concurrency does not override circuit breakers. If both are configured, a request must pass both checks independently:
- First, the adaptive concurrency filter checks whether `num_rq_outstanding < concurrency_limit`. If not, the request is rejected immediately with a 503.
- If it passes, the request continues through the filter chain to the router, where circuit breaker thresholds (`max_connections`, `max_pending_requests`, `max_requests`) are checked independently at the connection pool level.

In practice, whichever limit is more restrictive will be the binding constraint. If the adaptive concurrency limit is lower than `max_connections`, most requests will be shed by the filter before circuit breakers ever see them.
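To make the layering concrete, here is an illustrative Python sketch (not Envoy code; the function and parameter names such as `admit_request` and `adaptive_limit` are invented for this example) of how a request must pass both independent checks:

```python
def admit_request(outstanding: int, adaptive_limit: int,
                  pending: int, max_pending: int) -> str:
    """Illustrative model of the two independent admission checks."""
    # Check 1: adaptive concurrency filter, early in the HTTP filter chain.
    # A rejection here returns a 503 before the router is ever invoked, and
    # only the filter's own rq_blocked stat would be incremented.
    if outstanding >= adaptive_limit:
        return "503 reached_concurrency_limit"
    # Check 2: static circuit breaker, at the connection pool. Only a
    # rejection here touches the circuit-breaker overflow stats.
    if pending >= max_pending:
        return "503 upstream_rq_pending_overflow"
    return "forwarded"

# The more restrictive limit binds: with adaptive_limit=5 the filter sheds
# the request before the circuit breaker (max_pending=100) is consulted.
print(admit_request(outstanding=5, adaptive_limit=5, pending=0, max_pending=100))
```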
When the adaptive concurrency filter blocks a request, it increments only its own stat:
`adaptive_concurrency.gradient_controller.rq_blocked`
Since the request never reaches the router or connection pool, none of the circuit breaker stats are affected:
- No `upstream_cx_overflow_`
- No `upstream_rq_pending_overflow_`
- No `upstream_rq_retry_overflow_`

Someone monitoring only circuit breaker stats would not see that requests are being shed. This makes three key facts explicit:
- They don't override each other, they're layered
- They limit different things at different points
- They have separate stats, so operators need to monitor both
This is part of Envoy's behavior, and there is little we can do about it from Istio. In any case, I have updated the comment.
> Normally when the circuit breaker threshold is exceeded in Envoy, you also have the `x-envoy-overloaded` header returned as part of the response. Some users might rely on that header, or maybe the client-side Envoy proxy relies on that in some way.
>
> Should we note that the `x-envoy-overloaded` header won't be present when the adaptive concurrency threshold is exceeded?
This is true, the header is not added. I have updated the comment!
The notes you added to the comment look good to me, so at least users are aware of the differences. Thanks!
@ramaraochavali @keithmattix can you please help with the review?
| message MinimumRTTCalculationParams {
| // The time interval between recalculating the minimum request round-trip time.
| // Has to be positive. If set to zero, dynamic sampling of the minRTT is disabled.
| google.protobuf.Duration interval = 1;
| // The fixed value for the minRTT. This value is used when minRTT is not sampled
| // dynamically. If dynamic sampling of the minRTT is disabled (i.e. `interval` is
| // set to zero), this field must be set.
| google.protobuf.Duration fixed_value = 6;
So at least one of `fixed_value` and/or `interval` must be set? Should it be a `oneof`?
| // calculation window. This is represented as a percentage of the interval duration.
| // Defaults to 15%. This is recommended to prevent all hosts in a cluster from
| // being in a minRTT calculation window at the same time.
| google.protobuf.DoubleValue jitter = 3;
Does this really need to be a double if, as in the example, folks are setting whole-number percentages? Same goes for `buffer`.
| MinimumRTTCalculationParams min_rtt_calc_params = 3 [(google.api.field_behavior) = REQUIRED];
| }
| oneof concurrency_controller_config {
| }
| // If set to false, the adaptive concurrency filter will operate as a pass-through
| // filter. If unspecified, the filter will be enabled.
This is going to change every single Envoy deployment to use adaptive concurrency, right? That is very likely to throttle people's production traffic.
Adds support for configuring Envoy's adaptive concurrency filter through Istio's DestinationRule TrafficPolicy.
The adaptive concurrency filter dynamically adjusts the allowed number of outstanding requests to upstream hosts based on sampled latency measurements, providing automatic concurrency limiting without manual tuning.
Example usage:
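A minimal sketch of what such a configuration might look like (the camelCase field names are inferred from the proto messages discussed above, and all values are illustrative assumptions, not defaults taken from this PR):

```yaml
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: reviews-adaptive
spec:
  host: reviews.default.svc.cluster.local
  trafficPolicy:
    adaptiveConcurrency:
      gradientControllerConfig:
        minRttCalcParams:
          interval: 60s   # recalculate minRTT every minute; zero disables dynamic sampling
          jitter: 15      # percentage of the interval; spreads minRTT windows across hosts
```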