
Conversation

@balakumarpg commented Nov 10, 2025

Why:
Kubernetes ConfigMaps have a hard size limit of 1MiB. The rule-generator writes all GlobalRules, ClusterRules, and Rules into a single ConfigMap, which is then mounted as a volume into the rule-evaluator. As a result, the total size of all alert definitions in a GMP-enabled GKE cluster cannot exceed 1MB.

This change overcomes that hard limit.

… config maps are created with a max size of 950KB, and the rule evaluator loads all ConfigMaps into files for the rules generator
google-cla bot commented Nov 10, 2025

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up-to-date status, view the checks section at the bottom of the pull request.

@bwplotka (Collaborator)

Thanks!

So this makes sense on the operator side, but how should rule-eval consume this? rule-eval currently only reads one ConfigMap, and there's no easy way (or is there?) to tell the rule-eval pod to load a dynamic number of those files? 🤔

We can't change the Deployment dynamically in practice within the security constraints we need to work with for the managed GMP solution at the moment. That would be the only solution, right?

Create one ConfigMap per rule type (rules, clusterrules, globalrules) to work
around the 1MB Kubernetes ConfigMap size limit. Each type stores all resources
of that type in a single ConfigMap with retry logic and error tracking.

Changes:
- Implement one ConfigMap per type approach (rules, clusterrules, globalrules)
- Add retry logic with exponential backoff for ConfigMap operations
- Update deployment to mount 3 ConfigMaps via projected volumes
- Fix all Dockerfiles to use awk instead of yq for version extraction
- Add comprehensive tests for ConfigMap creation and recovery
- Update documentation with architecture details

Total capacity increased from 1MB to 3MB (1MB per type).
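The projected-volume mount described above can be sketched roughly as follows. This is an illustration only; the volume and ConfigMap names (`rules-generated`, `clusterrules-generated`, `globalrules-generated`) are assumptions, not necessarily those used in the PR:

```yaml
# Sketch: mount the three per-type ConfigMaps as one projected volume,
# so rule-evaluator still sees a single directory of rule files.
# ConfigMap names below are illustrative assumptions.
volumes:
  - name: rules
    projected:
      sources:
        - configMap:
            name: rules-generated
        - configMap:
            name: clusterrules-generated
        - configMap:
            name: globalrules-generated
containers:
  - name: evaluator
    volumeMounts:
      - name: rules
        mountPath: /etc/rules
        readOnly: true
```

With a projected volume, the number of mounted files can grow per ConfigMap without changing how the evaluator discovers them, since all keys land in one mount path.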
@balakumarpg force-pushed the fix-rule-evaluator-config-map-size-issue branch from 9dfd927 to e873885 on November 23, 2025 at 22:06
@balakumarpg (Author)

> Thanks!
>
> So this makes sense on the operator side, but how should rule-eval consume this? rule-eval currently only reads one ConfigMap, and there's no easy way (or is there?) to tell the rule-eval pod to load a dynamic number of those files? 🤔
>
> We can't change the Deployment dynamically in practice within the security constraints we need to work with for the managed GMP solution at the moment. That would be the only solution, right?

Thanks for the valuable input. Please check now; I have made some changes based on your comments.

@bwplotka (Collaborator)

Nice, it sounds like the plan would be to do projections and split by at least 3. Does that solve your use case? Do you have a more or less equal distribution of rules across those types?

We could then do a projection of 10 and split up to 10? Would that be reasonable?

@bwplotka (Collaborator)

Also before we add some complexity, have you tried compression option? https://github.com/GoogleCloudPlatform/prometheus-engine/blob/main/doc/api.md#monitoring.googleapis.com/v1.ConfigSpec
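For context, the compression option is configured on the OperatorConfig. Based on the linked ConfigSpec documentation, enabling it should look roughly like this (field layout as I understand the linked API doc; treat this as a sketch rather than a verified manifest):

```yaml
apiVersion: monitoring.googleapis.com/v1
kind: OperatorConfig
metadata:
  namespace: gmp-public
  name: config
features:
  config:
    # gzip-compress the generated configuration contents
    compression: gzip
```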

@balakumarpg (Author)

> Also before we add some complexity, have you tried compression option? https://github.com/GoogleCloudPlatform/prometheus-engine/blob/main/doc/api.md#monitoring.googleapis.com/v1.ConfigSpec

Yes; even after compression it is 1.3 MB, and that is only the GlobalRules.

@balakumarpg (Author)

> Nice, it sounds like the plan would be to do projections and split by at least 3. Does that solve your use case? Do you have a more or less equal distribution of rules across those types?
>
> We could then do a projection of 10 and split up to 10? Would that be reasonable?

They are not equally distributed. We currently use only GlobalRules, but we can use Rules and ClusterRules as well. If the 1MB problem is solved, or the limit is extended to 3MB, we can live with that for a while and distribute our alerts across these three types.

Fix all 8 golangci-lint violations and manifest comment format:
- Add periods to comments (godot)
- Use integer range for Go 1.22+ (intrange)
- Use t.Context() in tests instead of context.Background() (usetesting)
- Update manifest comments to match regeneration format

Files changed:
- pkg/operator/rules.go: Add periods to function comments, use range loop
- pkg/operator/rules_test.go: Add period to type comment, use t.Context()
- manifests/rule-evaluator.yaml: Update ConfigMap mount comments
Restore status update logic that was accidentally removed during the
ConfigMap-per-type refactoring. This ensures Rule/ClusterRules/GlobalRules
objects have their MonitoringStatus properly updated with success/failure
conditions.

Also fix golangci-lint errors and update tests to use newFakeClientBuilder
for proper status subresource support.

Changes:
- pkg/operator/rules.go: Restore status tracking and patchMonitoringStatus calls
- pkg/operator/rules.go: Fix linter errors (godot, intrange, usetesting)
- pkg/operator/rules_test.go: Use newFakeClientBuilder in new tests
- pkg/operator/rules_test.go: Fix linter errors, remove unused imports
- manifests/rule-evaluator.yaml: Update ConfigMap mount comments

Fixes failing TestRulesStatus and TestEnsureRuleConfigs tests.
All tests now pass successfully.
Update e2e tests to work with the new ConfigMap-per-type architecture.
The rules are now split into three ConfigMaps (rules, clusterrules,
globalrules) instead of a single rules-generated ConfigMap.

Changes:
- e2e/ruler_test.go: Aggregate data from all three ConfigMaps in test
- pkg/operator/rules.go: Update controller to watch all three ConfigMaps
- pkg/operator/rules.go: Rename constants and update predicates

This fixes the TestAlertmanager/rules-create timeout failure.

Also includes previous fixes for status updates, linter errors, and
manifest formatting.
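Aggregating data from all three ConfigMaps, as the updated e2e test does, amounts to merging their `data` sections into one view. A hypothetical sketch (plain maps standing in for the ConfigMap objects; not the PR's actual test code):

```go
package main

import (
	"fmt"
	"sort"
)

// mergeRuleFiles combines the data sections of the per-type ConfigMaps
// (rules, clusterrules, globalrules) into a single map, with entries
// from later maps overwriting earlier ones on key collisions.
func mergeRuleFiles(maps ...map[string]string) map[string]string {
	out := map[string]string{}
	for _, m := range maps {
		for k, v := range m {
			out[k] = v
		}
	}
	return out
}

func main() {
	rules := map[string]string{"rules.yaml": "groups: []"}
	cluster := map[string]string{"clusterrules.yaml": "groups: []"}
	global := map[string]string{"globalrules.yaml": "groups: []"}

	merged := mergeRuleFiles(rules, cluster, global)
	keys := make([]string, 0, len(merged))
	for k := range merged {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	fmt.Println(keys) // [clusterrules.yaml globalrules.yaml rules.yaml]
}
```

Because the projected volume already places all three ConfigMaps' keys in one directory, the test only needs this merged view to assert on the complete set of generated rule files.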