diff --git a/gateway/gateway-controller/default-policies/advanced-ratelimit.yaml b/gateway/gateway-controller/default-policies/advanced-ratelimit.yaml
index 1033bc613..186633dce 100644
--- a/gateway/gateway-controller/default-policies/advanced-ratelimit.yaml
+++ b/gateway/gateway-controller/default-policies/advanced-ratelimit.yaml
@@ -1,5 +1,5 @@
 name: advanced-ratelimit
-version: v0.1.4
+version: v0.3.0
 description: |
   Rate limiting policy supporting multiple algorithms (GCRA, Fixed Window), multi-dimensional quotas,
   weighted rate limiting, flexible key extraction, and both in-memory and Redis backends. Supports
diff --git a/gateway/gateway-controller/default-policies/api-key-auth.yaml b/gateway/gateway-controller/default-policies/api-key-auth.yaml
index 291ad3b19..d9504c36c 100644
--- a/gateway/gateway-controller/default-policies/api-key-auth.yaml
+++ b/gateway/gateway-controller/default-policies/api-key-auth.yaml
@@ -1,5 +1,5 @@
 name: api-key-auth
-version: v0.3.0
+version: v0.8.0
 description: |
   Validates API keys in incoming requests and authorizes access when a valid
   key is present. Reads the key from a configured header or query parameter.
diff --git a/gateway/gateway-controller/default-policies/azure-content-safety-content-moderation.yaml b/gateway/gateway-controller/default-policies/azure-content-safety-content-moderation.yaml
index aa64a2356..c4b73d3de 100644
--- a/gateway/gateway-controller/default-policies/azure-content-safety-content-moderation.yaml
+++ b/gateway/gateway-controller/default-policies/azure-content-safety-content-moderation.yaml
@@ -1,5 +1,5 @@
 name: azure-content-safety-content-moderation
-version: v0.3.0
+version: v0.8.0
 description: |
   Validates request and response content with Azure Content Safety moderation checks.
   Supports configurable category thresholds, optional JSONPath
diff --git a/gateway/gateway-controller/default-policies/basic-auth.yaml b/gateway/gateway-controller/default-policies/basic-auth.yaml
index a1aea6331..baf7b4e87 100644
--- a/gateway/gateway-controller/default-policies/basic-auth.yaml
+++ b/gateway/gateway-controller/default-policies/basic-auth.yaml
@@ -1,5 +1,5 @@
 name: basic-auth
-version: v0.1.0
+version: v0.8.0
 description: |
   Implements HTTP Basic Authentication to protect APIs with username and password credentials.
   Validates the Authorization header against configured credentials and sets authentication
diff --git a/gateway/gateway-controller/default-policies/basic-ratelimit.yaml b/gateway/gateway-controller/default-policies/basic-ratelimit.yaml
index 37bb5bcff..1eff25ccf 100644
--- a/gateway/gateway-controller/default-policies/basic-ratelimit.yaml
+++ b/gateway/gateway-controller/default-policies/basic-ratelimit.yaml
@@ -1,5 +1,5 @@
 name: basic-ratelimit
-version: v0.3.0
+version: v0.8.1
 description: |
   Enforces request rate limits by restricting how many requests are allowed
   within one or more configured time windows.
diff --git a/gateway/gateway-controller/default-policies/cors.yaml b/gateway/gateway-controller/default-policies/cors.yaml
index 845b9154e..bdfc1a36e 100644
--- a/gateway/gateway-controller/default-policies/cors.yaml
+++ b/gateway/gateway-controller/default-policies/cors.yaml
@@ -1,5 +1,5 @@
 name: cors
-version: v0.3.0
+version: v0.8.0
 description: |
   Applies Cross-Origin Resource Sharing (CORS) rules by handling preflight
   requests and adding CORS headers to responses. Controls cross-origin access
diff --git a/gateway/gateway-controller/default-policies/jwt-auth.yaml b/gateway/gateway-controller/default-policies/jwt-auth.yaml
index c72398507..d957fc808 100644
--- a/gateway/gateway-controller/default-policies/jwt-auth.yaml
+++ b/gateway/gateway-controller/default-policies/jwt-auth.yaml
@@ -1,5 +1,5 @@
 name: jwt-auth
-version: v0.3.0
+version: v0.8.0
 description: |
   Validates JWT access tokens in API requests using configured JWKS providers.
   Verifies signature and claims such as expiry, issuer, audience, scopes, and
diff --git a/gateway/gateway-controller/default-policies/model-round-robin.yaml b/gateway/gateway-controller/default-policies/model-round-robin.yaml
index f6564c5a8..52c958f6f 100644
--- a/gateway/gateway-controller/default-policies/model-round-robin.yaml
+++ b/gateway/gateway-controller/default-policies/model-round-robin.yaml
@@ -1,5 +1,5 @@
 name: model-round-robin
-version: v0.3.0
+version: v0.8.0
 description: |
   Distributes requests across configured AI models in round-robin order to
   balance traffic and reduce overloading on any single model.
diff --git a/gateway/gateway-controller/default-policies/pii-masking-regex.yaml b/gateway/gateway-controller/default-policies/pii-masking-regex.yaml
index c7cc7ef74..e66452ddc 100644
--- a/gateway/gateway-controller/default-policies/pii-masking-regex.yaml
+++ b/gateway/gateway-controller/default-policies/pii-masking-regex.yaml
@@ -1,5 +1,5 @@
 name: pii-masking-regex
-version: v0.3.0
+version: v0.8.0
 description: |
   Masks or redacts Personally Identifiable Information (PII) in request and
   response payloads using configured regex patterns. Supports reversible
diff --git a/gateway/gateway-controller/default-policies/prompt-decorator.yaml b/gateway/gateway-controller/default-policies/prompt-decorator.yaml
index 1495763fe..9c8d9b869 100644
--- a/gateway/gateway-controller/default-policies/prompt-decorator.yaml
+++ b/gateway/gateway-controller/default-policies/prompt-decorator.yaml
@@ -1,5 +1,5 @@
 name: prompt-decorator
-version: v0.3.0
+version: v0.8.0
 description: |
   Applies configured prompt decorations to request payloads before upstream
   processing.
diff --git a/gateway/gateway-controller/default-policies/prompt-template.yaml b/gateway/gateway-controller/default-policies/prompt-template.yaml
index 0e91b9263..df3d3c70f 100644
--- a/gateway/gateway-controller/default-policies/prompt-template.yaml
+++ b/gateway/gateway-controller/default-policies/prompt-template.yaml
@@ -1,5 +1,5 @@
 name: prompt-template
-version: v0.3.0
+version: v0.8.0
 description: |
   Applies configured prompt templates to request payloads to transform
   prompts before upstream processing.
diff --git a/gateway/gateway-controller/default-policies/remove-headers.yaml b/gateway/gateway-controller/default-policies/remove-headers.yaml
index f4616fe86..46d850008 100644
--- a/gateway/gateway-controller/default-policies/remove-headers.yaml
+++ b/gateway/gateway-controller/default-policies/remove-headers.yaml
@@ -1,5 +1,5 @@
 name: remove-headers
-version: v0.3.0
+version: v0.8.0
 description: |
   Removes configured headers from requests and/or responses. Header matching
   is case-insensitive, and removing a non-existent header is ignored.
diff --git a/gateway/gateway-controller/default-policies/semantic-cache.yaml b/gateway/gateway-controller/default-policies/semantic-cache.yaml
index 78a26f319..85c057f90 100644
--- a/gateway/gateway-controller/default-policies/semantic-cache.yaml
+++ b/gateway/gateway-controller/default-policies/semantic-cache.yaml
@@ -1,5 +1,5 @@
 name: semantic-cache
-version: v0.3.0
+version: v0.8.0
 description: |
   Caches LLM responses using semantic similarity over request embeddings to
   reduce repeated upstream calls. Returns cached responses on similarity hits
diff --git a/gateway/gateway-controller/default-policies/semantic-prompt-guard.yaml b/gateway/gateway-controller/default-policies/semantic-prompt-guard.yaml
index 89465bb2a..b1f6b04bd 100644
--- a/gateway/gateway-controller/default-policies/semantic-prompt-guard.yaml
+++ b/gateway/gateway-controller/default-policies/semantic-prompt-guard.yaml
@@ -1,5 +1,5 @@
 name: semantic-prompt-guard
-version: v0.1.0
+version: v0.8.0
 description: |
   Blocks or allows prompts based on semantic similarity to configured allow/deny phrase
   embeddings. The incoming prompt is embedded via the configured embedding provider
diff --git a/gateway/gateway-controller/default-policies/set-headers.yaml b/gateway/gateway-controller/default-policies/set-headers.yaml
index fcb347468..84f76303c 100644
--- a/gateway/gateway-controller/default-policies/set-headers.yaml
+++ b/gateway/gateway-controller/default-policies/set-headers.yaml
@@ -1,5 +1,5 @@
 name: set-headers
-version: v0.3.0
+version: v0.8.0
 description: |
   Sets configured headers on requests and/or responses. If the same header is
   set multiple times, the latest configured value overwrites earlier values.
diff --git a/gateway/gateway-controller/default-policies/token-based-ratelimit.yaml b/gateway/gateway-controller/default-policies/token-based-ratelimit.yaml
index c52dca974..6c2903748 100644
--- a/gateway/gateway-controller/default-policies/token-based-ratelimit.yaml
+++ b/gateway/gateway-controller/default-policies/token-based-ratelimit.yaml
@@ -1,5 +1,5 @@
 name: token-based-ratelimit
-version: v0.3.0
+version: v0.8.1
 description: |
   Enforces token-based rate limits for LLM traffic by resolving token
   extraction paths from provider templates and delegating enforcement to the
diff --git a/gateway/it/features/basic-ratelimit.feature b/gateway/it/features/basic-ratelimit.feature
new file mode 100644
index 000000000..24bda7ed5
--- /dev/null
+++ b/gateway/it/features/basic-ratelimit.feature
@@ -0,0 +1,318 @@
+# --------------------------------------------------------------------
+# Copyright (c) 2025, WSO2 LLC. (https://www.wso2.com).
+#
+# WSO2 LLC. licenses this file to you under the Apache License,
+# Version 2.0 (the "License"); you may not use this file except
+# in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+# --------------------------------------------------------------------
+
+@basic-ratelimit
+Feature: Basic Rate Limiting
+  As an API developer
+  I want a simple rate limiting policy
+  So that I can easily protect my APIs without complex configuration
+
+  Background:
+    Given the gateway services are running
+
+  Scenario: Enforce basic rate limit on API resource
+    Given I authenticate using basic auth as "admin"
+    When I deploy this API configuration:
+      """
+      apiVersion: gateway.api-platform.wso2.com/v1alpha1
+      kind: RestApi
+      metadata:
+        name: basic-ratelimit-test-api
+      spec:
+        displayName: Basic RateLimit Test API
+        version: v1.0
+        context: /basic-ratelimit/$version
+        upstream:
+          main:
+            url: http://sample-backend:9080/api/v1
+        operations:
+          - method: GET
+            path: /limited
+            policies:
+              - name: basic-ratelimit
+                version: v0
+                params:
+                  limits:
+                    - requests: 5
+                      duration: "1h"
+      """
+    Then the response should be successful
+    And I wait for the endpoint "http://localhost:8080/basic-ratelimit/v1.0/limited" to be ready
+
+    # Send 4 requests - all should succeed (readiness check used ~1)
+    When I send 4 GET requests to "http://localhost:8080/basic-ratelimit/v1.0/limited"
+    Then the response status code should be 200
+
+    # Send 1 more request to exhaust the quota (total ~6 requests including readiness)
+    When I send a GET request to "http://localhost:8080/basic-ratelimit/v1.0/limited"
+    Then the response status code should be 429
+    And the response body should contain "Rate limit exceeded"
+
+  Scenario: Rate limit headers are returned
+    Given I authenticate using basic auth as "admin"
+    When I deploy this API configuration:
+      """
+      apiVersion: gateway.api-platform.wso2.com/v1alpha1
+      kind: RestApi
+      metadata:
+        name: basic-ratelimit-headers-api
+      spec:
+        displayName: Basic RateLimit Headers API
+        version: v1.0
+        context: /basic-ratelimit-headers/$version
+        upstream:
+          main:
+            url: http://sample-backend:9080/api/v1
+        operations:
+          - method: GET
+            path: /check
+            policies:
+              - name: basic-ratelimit
+                version: v0
+                params:
+                  limits:
+                    - requests: 100
+                      duration: "1h"
+      """
+    Then the response should be successful
+    And I wait for the endpoint "http://localhost:8080/basic-ratelimit-headers/v1.0/check" to be ready
+
+    When I send a GET request to "http://localhost:8080/basic-ratelimit-headers/v1.0/check"
+    Then the response status code should be 200
+    And the response header "X-RateLimit-Limit" should be "100"
+    And the response header "X-RateLimit-Remaining" should exist
+    And the response header "X-RateLimit-Reset" should exist
+
+  Scenario: Multiple limits enforce most restrictive limit
+    Given I authenticate using basic auth as "admin"
+    When I deploy this API configuration:
+      """
+      apiVersion: gateway.api-platform.wso2.com/v1alpha1
+      kind: RestApi
+      metadata:
+        name: basic-ratelimit-multi-limits-api
+      spec:
+        displayName: Basic RateLimit Multi Limits API
+        version: v1.0
+        context: /basic-ratelimit-multi/$version
+        upstream:
+          main:
+            url: http://sample-backend:9080/api/v1
+        operations:
+          - method: GET
+            path: /health
+          - method: GET
+            path: /resource
+            policies:
+              - name: basic-ratelimit
+                version: v0
+                params:
+                  limits:
+                    - requests: 10
+                      duration: "1h"
+                    - requests: 5
+                      duration: "24h"
+      """
+    Then the response should be successful
+    And I wait for the endpoint "http://localhost:8080/basic-ratelimit-multi/v1.0/health" to be ready
+
+    # 24h limit (5) is more restrictive than 1h limit (10)
+    # Send 5 requests - should succeed (5/5 for 24h, 5/10 for 1h)
+    When I send 5 GET requests to "http://localhost:8080/basic-ratelimit-multi/v1.0/resource"
+    Then the response status code should be 200
+
+    # 6th request should be blocked by 24h limit
+    When I send a GET request to "http://localhost:8080/basic-ratelimit-multi/v1.0/resource"
+    Then the response status code should be 429
+    And the response body should contain "Rate limit exceeded"
+
+  Scenario: Per-route rate limiting with basic-ratelimit
+    Given I authenticate using basic auth as "admin"
+    When I deploy this API configuration:
+      """
+      apiVersion: gateway.api-platform.wso2.com/v1alpha1
+      kind: RestApi
+      metadata:
+        name: basic-ratelimit-per-route-api
+      spec:
+        displayName: Basic RateLimit Per Route API
+        version: v1.0
+        context: /basic-ratelimit-per-route/$version
+        upstream:
+          main:
+            url: http://sample-backend:9080/api/v1
+        operations:
+          - method: GET
+            path: /health
+          - method: GET
+            path: /route1
+            policies:
+              - name: basic-ratelimit
+                version: v0
+                params:
+                  limits:
+                    - requests: 3
+                      duration: "1h"
+          - method: GET
+            path: /route2
+            policies:
+              - name: basic-ratelimit
+                version: v0
+                params:
+                  limits:
+                    - requests: 3
+                      duration: "1h"
+      """
+    Then the response should be successful
+    And I wait for the endpoint "http://localhost:8080/basic-ratelimit-per-route/v1.0/health" to be ready
+
+    # Each route has its own quota (basic-ratelimit uses routename as key)
+    # Send 3 requests to route1 - should succeed (uses route1's quota)
+    When I send 3 GET requests to "http://localhost:8080/basic-ratelimit-per-route/v1.0/route1"
+    Then the response status code should be 200
+
+    # route1's 4th request should be rate limited
+    When I send a GET request to "http://localhost:8080/basic-ratelimit-per-route/v1.0/route1"
+    Then the response status code should be 429
+
+    # route2 has its own separate quota - should still work
+    When I send 3 GET requests to "http://localhost:8080/basic-ratelimit-per-route/v1.0/route2"
+    Then the response status code should be 200
+
+    # route2's 4th request should also be rate limited
+    When I send a GET request to "http://localhost:8080/basic-ratelimit-per-route/v1.0/route2"
+    Then the response status code should be 429
+
+  Scenario: 429 response includes Retry-After header
+    Given I authenticate using basic auth as "admin"
+    When I deploy this API configuration:
+      """
+      apiVersion: gateway.api-platform.wso2.com/v1alpha1
+      kind: RestApi
+      metadata:
+        name: basic-ratelimit-retry-after-api
+      spec:
+        displayName: Basic RateLimit Retry After API
+        version: v1.0
+        context: /basic-ratelimit-retry/$version
+        upstream:
+          main:
+            url: http://sample-backend:9080/api/v1
+        operations:
+          - method: GET
+            path: /health
+          - method: GET
+            path: /resource
+            policies:
+              - name: basic-ratelimit
+                version: v0
+                params:
+                  limits:
+                    - requests: 3
+                      duration: "1h"
+      """
+    Then the response should be successful
+    And I wait for the endpoint "http://localhost:8080/basic-ratelimit-retry/v1.0/health" to be ready
+
+    # Exhaust the rate limit (limit=3)
+    When I send 3 GET requests to "http://localhost:8080/basic-ratelimit-retry/v1.0/resource"
+    Then the response status code should be 200
+
+    # Next request should be rate limited with Retry-After header
+    When I send a GET request to "http://localhost:8080/basic-ratelimit-retry/v1.0/resource"
+    Then the response status code should be 429
+    And the response header "Retry-After" should exist
+
+  # Scenario: Rate limit scope based on policy attachment level
+  #   Given I authenticate using basic auth as "admin"
+  #   When I deploy this API configuration:
+  #     """
+  #     apiVersion: gateway.api-platform.wso2.com/v1alpha1
+  #     kind: RestApi
+  #     metadata:
+  #       name: basic-ratelimit-scope-api
+  #     spec:
+  #       displayName: Basic RateLimit Scope API
+  #       version: v1.0
+  #       context: /basic-ratelimit-scope/$version
+  #       upstream:
+  #         main:
+  #           url: http://sample-backend:9080/api/v1
+  #       policies:
+  #         - name: basic-ratelimit
+  #           version: v0
+  #           params:
+  #             limits:
+  #               - requests: 5
+  #                 duration: "1h"
+  #       operations:
+  #         - method: GET
+  #           path: /health
+  #           policies:
+  #             - name: basic-ratelimit
+  #               version: v0
+  #               params:
+  #                 limits:
+  #                   - requests: 100
+  #                     duration: "1h"
+  #         - method: GET
+  #           path: /resource-a
+  #         - method: GET
+  #           path: /resource-b
+  #           policies:
+  #             - name: basic-ratelimit
+  #               version: v0
+  #               params:
+  #                 limits:
+  #                   - requests: 3
+  #                     duration: "1h"
+  #         - method: GET
+  #           path: /resource-c
+  #     """
+  #   Then the response should be successful
+  #   And I wait for the endpoint "http://localhost:8080/basic-ratelimit-scope/v1.0/health" to be ready
+
+  #   # Resource B has its own route-level policy (Limit: 3)
+  #   # Send 3 requests to B -> Should succeed
+  #   When I send 3 GET requests to "http://localhost:8080/basic-ratelimit-scope/v1.0/resource-b"
+  #   Then the response status code should be 200
+
+  #   # 4th request to B -> Should fail (Limit 3 exhausted)
+  #   When I send a GET request to "http://localhost:8080/basic-ratelimit-scope/v1.0/resource-b"
+  #   Then the response status code should be 429
+
+  #   # Resource A and C fall back to API-level policy (Limit: 5, Shared)
+  #   # Send 2 requests to A -> Should succeed
+  #   When I send 2 GET requests to "http://localhost:8080/basic-ratelimit-scope/v1.0/resource-a"
+  #   Then the response status code should be 200
+
+  #   # Send 2 requests to C -> Should succeed (Total 4/5)
+  #   When I send 2 GET requests to "http://localhost:8080/basic-ratelimit-scope/v1.0/resource-c"
+  #   Then the response status code should be 200
+
+  #   # Send 1 request to A -> Should succeed (Total 5/5)
+  #   When I send a GET request to "http://localhost:8080/basic-ratelimit-scope/v1.0/resource-a"
+  #   Then the response status code should be 200
+
+  #   # Send 1 request to C -> Should fail (Total 6/5, Limit 5 exhausted)
+  #   When I send a GET request to "http://localhost:8080/basic-ratelimit-scope/v1.0/resource-c"
+  #   Then the response status code should be 429
+
+  #   # Verify B is still rate limited (independent of A/C bucket)
+  #   When I send a GET request to "http://localhost:8080/basic-ratelimit-scope/v1.0/resource-b"
+  #   Then the response status code should be 429
diff --git a/gateway/it/features/ratelimit.feature b/gateway/it/features/ratelimit.feature
index 4911f4811..c784fc73d 100644
--- a/gateway/it/features/ratelimit.feature
+++ b/gateway/it/features/ratelimit.feature
@@ -353,6 +353,62 @@ Feature: Rate Limiting
     Then the response status code should be 429
     And the response body should contain "Rate limit exceeded"
 
+  Scenario: Response cost overage clamps quota to zero
+    Given I authenticate using basic auth as "admin"
+    When I deploy this API configuration:
+      """
+      apiVersion: gateway.api-platform.wso2.com/v1alpha1
+      kind: RestApi
+      metadata:
+        name: ratelimit-response-clamp-api
+      spec:
+        displayName: RateLimit Response Clamp API
+        version: v1.0
+        context: /ratelimit-response-clamp/$version
+        upstream:
+          main:
+            url: http://echo-backend:80
+        operations:
+          - method: GET
+            path: /anything
+          - method: POST
+            path: /anything
+            policies:
+              - name: advanced-ratelimit
+                version: v0
+                params:
+                  quotas:
+                    - name: response-token-quota
+                      limits:
+                        - limit: 20
+                          duration: "1h"
+                      costExtraction:
+                        enabled: true
+                        sources:
+                          - type: response_body
+                            jsonPath: "$.json.custom_cost"
+                            default: 0
+      """
+    Then the response should be successful
+    And I wait for the endpoint "http://localhost:8080/ratelimit-response-clamp/v1.0/anything" to be ready
+
+    # custom_cost=50 exceeds remaining=20 on first request.
+    # Expected clamp behavior: consume remaining quota, return 200, remaining becomes 0.
+    When I send a POST request to "http://localhost:8080/ratelimit-response-clamp/v1.0/anything" with body:
+      """
+      {"custom_cost": 50}
+      """
+    Then the response status code should be 200
+    And the response header "X-RateLimit-Remaining" should be "0"
+
+    # Next request must be blocked because previous overage clamped quota to zero.
+    When I send a POST request to "http://localhost:8080/ratelimit-response-clamp/v1.0/anything" with body:
+      """
+      {"custom_cost": 1}
+      """
+    Then the response status code should be 429
+    And the response body should contain "Rate limit exceeded"
+
   Scenario: API-level rate limiting with apiname key extraction
     Given I authenticate using basic auth as "admin"
     When I deploy this API configuration:
diff --git a/gateway/it/features/token-based-ratelimit.feature b/gateway/it/features/token-based-ratelimit.feature
index dd70e898a..f84c5a91d 100644
--- a/gateway/it/features/token-based-ratelimit.feature
+++ b/gateway/it/features/token-based-ratelimit.feature
@@ -1284,3 +1284,331 @@ Feature: Token-Based Rate Limiting
     Then the response status code should be 200
     When I delete the LLM provider template "shared-template"
     Then the response status code should be 200
+
+  Scenario: Empty prompt/completion limits with total-only limit still enforces rate limiting
+    Given I authenticate using basic auth as "admin"
+
+    # Create template with all token extraction paths
+    When I create this LLM provider template:
+      """
+      apiVersion: gateway.api-platform.wso2.com/v1alpha1
+      kind: LlmProviderTemplate
+      metadata:
+        name: empty-limits-template
+      spec:
+        displayName: Empty Limits Template
+        promptTokens:
+          location: payload
+          identifier: $.json.usage.prompt_tokens
+        completionTokens:
+          location: payload
+          identifier: $.json.usage.completion_tokens
+        totalTokens:
+          location: payload
+          identifier: $.json.usage.total_tokens
+        requestModel:
+          location: payload
+          identifier: $.json.model
+        responseModel:
+          location: payload
+          identifier: $.json.model
+      """
+    Then the response status code should be 201
+
+    # Create provider with explicit empty prompt/completion limit arrays and only total limit configured
+    Given I authenticate using basic auth as "admin"
+    When I create this LLM provider:
+      """
+      apiVersion: gateway.api-platform.wso2.com/v1alpha1
+      kind: LlmProvider
+      metadata:
+        name: empty-limits-provider
+      spec:
+        displayName: Empty Limits Provider
+        version: v1.0
+        context: /empty-limits
+        template: empty-limits-template
+        upstream:
+          url: http://echo-backend-multi-arch:8080/anything
+          auth:
+            type: api-key
+            header: Authorization
+            value: test-api-key
+        accessControl:
+          mode: deny_all
+          exceptions:
+            - path: /chat/completions
+              methods: [POST, GET]
+        policies:
+          - name: token-based-ratelimit
+            version: v0
+            paths:
+              - path: /chat/completions
+                methods: [POST]
+            params:
+              promptTokenLimits: []
+              completionTokenLimits: []
+              totalTokenLimits:
+                - count: 5
+                  duration: "1m"
+              algorithm: fixed-window
+              backend: memory
+      """
+    Then the response status code should be 201
+    And I wait for the endpoint "http://localhost:8080/empty-limits/chat/completions" to be ready
+
+    # Must use application/json content-type for the echo backend to parse the body
+    Given I set header "Content-Type" to "application/json"
+
+    # First request consumes the entire total token quota
+    When I send a POST request to "http://localhost:8080/empty-limits/chat/completions" with body:
+      """
+      {
+        "model": "gpt-4",
+        "usage": {
+          "prompt_tokens": 0,
+          "completion_tokens": 5,
+          "total_tokens": 5
+        }
+      }
+      """
+    Then the response status code should be 200
+
+    # Next request should be blocked by total token quota
+    When I send a POST request to "http://localhost:8080/empty-limits/chat/completions" with body:
+      """
+      {
+        "model": "gpt-4",
+        "usage": {
+          "prompt_tokens": 0,
+          "completion_tokens": 1,
+          "total_tokens": 1
+        }
+      }
+      """
+    Then the response status code should be 429
+
+    # Cleanup
+    Given I authenticate using basic auth as "admin"
+    When I delete the LLM provider "empty-limits-provider"
+    Then the response status code should be 200
+    When I delete the LLM provider template "empty-limits-template"
+    Then the response status code should be 200
+
+  Scenario: Empty completion/total limits with prompt-only limit still enforces rate limiting
+    Given I authenticate using basic auth as "admin"
+
+    # Create template with all token extraction paths
+    When I create this LLM provider template:
+      """
+      apiVersion: gateway.api-platform.wso2.com/v1alpha1
+      kind: LlmProviderTemplate
+      metadata:
+        name: prompt-only-empty-limits-template
+      spec:
+        displayName: Prompt Only Empty Limits Template
+        promptTokens:
+          location: payload
+          identifier: $.json.usage.prompt_tokens
+        completionTokens:
+          location: payload
+          identifier: $.json.usage.completion_tokens
+        totalTokens:
+          location: payload
+          identifier: $.json.usage.total_tokens
+        requestModel:
+          location: payload
+          identifier: $.json.model
+        responseModel:
+          location: payload
+          identifier: $.json.model
+      """
+    Then the response status code should be 201
+
+    # Create provider with only prompt limits configured
+    Given I authenticate using basic auth as "admin"
+    When I create this LLM provider:
+      """
+      apiVersion: gateway.api-platform.wso2.com/v1alpha1
+      kind: LlmProvider
+      metadata:
+        name: prompt-only-empty-limits-provider
+      spec:
+        displayName: Prompt Only Empty Limits Provider
+        version: v1.0
+        context: /prompt-only-empty-limits
+        template: prompt-only-empty-limits-template
+        upstream:
+          url: http://echo-backend-multi-arch:8080/anything
+          auth:
+            type: api-key
+            header: Authorization
+            value: test-api-key
+        accessControl:
+          mode: deny_all
+          exceptions:
+            - path: /chat/completions
+              methods: [POST, GET]
+        policies:
+          - name: token-based-ratelimit
+            version: v0
+            paths:
+              - path: /chat/completions
+                methods: [POST]
+            params:
+              promptTokenLimits:
+                - count: 5
+                  duration: "1m"
+              completionTokenLimits: []
+              totalTokenLimits: []
+              algorithm: fixed-window
+              backend: memory
+      """
+    Then the response status code should be 201
+    And I wait for the endpoint "http://localhost:8080/prompt-only-empty-limits/chat/completions" to be ready
+
+    Given I set header "Content-Type" to "application/json"
+
+    # First request consumes the entire prompt token quota
+    When I send a POST request to "http://localhost:8080/prompt-only-empty-limits/chat/completions" with body:
+      """
+      {
+        "model": "gpt-4",
+        "usage": {
+          "prompt_tokens": 5,
+          "completion_tokens": 0,
+          "total_tokens": 5
+        }
+      }
+      """
+    Then the response status code should be 200
+
+    # Next request should be blocked by prompt token quota
+    When I send a POST request to "http://localhost:8080/prompt-only-empty-limits/chat/completions" with body:
+      """
+      {
+        "model": "gpt-4",
+        "usage": {
+          "prompt_tokens": 1,
+          "completion_tokens": 0,
+          "total_tokens": 1
+        }
+      }
+      """
+    Then the response status code should be 429
+
+    # Cleanup
+    Given I authenticate using basic auth as "admin"
+    When I delete the LLM provider "prompt-only-empty-limits-provider"
+    Then the response status code should be 200
+    When I delete the LLM provider template "prompt-only-empty-limits-template"
+    Then the response status code should be 200
+
+  Scenario: Empty prompt/total limits with completion-only limit still enforces rate limiting
+    Given I authenticate using basic auth as "admin"
+
+    # Create template with all token extraction paths
+    When I create this LLM provider template:
+      """
+      apiVersion: gateway.api-platform.wso2.com/v1alpha1
+      kind: LlmProviderTemplate
+      metadata:
+        name: completion-only-empty-limits-template
+      spec:
+        displayName: Completion Only Empty Limits Template
+        promptTokens:
+          location: payload
+          identifier: $.json.usage.prompt_tokens
+        completionTokens:
+          location: payload
+          identifier: $.json.usage.completion_tokens
+        totalTokens:
+          location: payload
+          identifier: $.json.usage.total_tokens
+        requestModel:
+          location: payload
+          identifier: $.json.model
+        responseModel:
+          location: payload
+          identifier: $.json.model
+      """
+    Then the response status code should be 201
+
+    # Create provider with only completion limits configured
+    Given I authenticate using basic auth as "admin"
+    When I create this LLM provider:
+      """
+      apiVersion: gateway.api-platform.wso2.com/v1alpha1
+      kind: LlmProvider
+      metadata:
+        name: completion-only-empty-limits-provider
+      spec:
+        displayName: Completion Only Empty Limits Provider
+        version: v1.0
+        context: /completion-only-empty-limits
+        template: completion-only-empty-limits-template
+        upstream:
+          url: http://echo-backend-multi-arch:8080/anything
+          auth:
+            type: api-key
+            header: Authorization
+            value: test-api-key
+        accessControl:
+          mode: deny_all
+          exceptions:
+            - path: /chat/completions
+              methods: [POST, GET]
+        policies:
+          - name: token-based-ratelimit
+            version: v0
+            paths:
+              - path: /chat/completions
+                methods: [POST]
+            params:
+              promptTokenLimits: []
+              completionTokenLimits:
+                - count: 5
+                  duration: "1m"
+              totalTokenLimits: []
+              algorithm: fixed-window
+              backend: memory
+      """
+    Then the response status code should be 201
+    And I wait for the endpoint "http://localhost:8080/completion-only-empty-limits/chat/completions" to be ready
+
+    Given I set header "Content-Type" to "application/json"
+
+    # First request consumes the entire completion token quota
+    When I send a POST request to "http://localhost:8080/completion-only-empty-limits/chat/completions" with body:
+      """
+      {
+        "model": "gpt-4",
+        "usage": {
+          "prompt_tokens": 0,
+          "completion_tokens": 5,
+          "total_tokens": 5
+        }
+      }
+      """
+    Then the response status code should be 200
+
+    # Next request should be blocked by completion token quota
+    When I send a POST request to "http://localhost:8080/completion-only-empty-limits/chat/completions" with body:
+      """
+      {
+        "model": "gpt-4",
+        "usage": {
+          "prompt_tokens": 0,
+          "completion_tokens": 1,
+          "total_tokens": 1
+        }
+      }
+      """
+    Then the response status code should be 429
+
+    # Cleanup
+    Given I authenticate using basic auth as "admin"
+    When I delete the LLM provider "completion-only-empty-limits-provider"
+    Then the response status code should be 200
+    When I delete the LLM provider template "completion-only-empty-limits-template"
+    Then the response status code should be 200