Skip to content

Comments

CNTRLPLANE-2241: Add KMS encryption mode into library-go encryption controllers#2086

Open
ardaguclu wants to merge 3 commits intoopenshift:masterfrom
ardaguclu:kms-mode
Open

CNTRLPLANE-2241: Add KMS encryption mode into library-go encryption controllers#2086
ardaguclu wants to merge 3 commits intoopenshift:masterfrom
ardaguclu:kms-mode

Conversation

@ardaguclu
Copy link
Member

@ardaguclu ardaguclu commented Jan 23, 2026

This PR adds the implementation proposed in openshift/enhancements#1900.

This PR introduces KMS as a new mode in encryption controllers to be functioning along side with the other modes. Basically, the idea is to track the KMS configuration to detect any configuration changes. So that key_controller can create new Secret to initiate migration. KMS Secret will not contain any confidential data. It will simply store the KMS configuration.

For Tech Preview v1, we will only support static unix domain socket (i.e. unix:///var/run/kmsplugin/kms.sock). Consequently, in this version, we don't support KMS -> KMS migrations.

Ref: Previous attemps:

Notes:

  • The changes here will be backported to 4.21
  • Substantial portion of the changed lines are unit and integration tests.

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Jan 23, 2026
@openshift-ci-robot
Copy link

openshift-ci-robot commented Jan 23, 2026

@ardaguclu: This pull request references CNTRLPLANE-2241 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set.

Details

In response to this:

This PR adds the implementation proposed in openshift/enhancements#1900.

This PR introduces KMS as a new mode in encryption controllers to be functioning along side with the other modes. Basically, the idea is to track the KMS configuration to detect any configuration changes. So that key_controller can create new Secret to initiate migration. KMS Secret will not contain any confidential data. It will simply store the KMS configuration.

For Tech Preview v1, we will only support static unix domain socket (i.e. unix:///var/run/kmsplugin/kms.sock). Consequently, in this version, we don't support KMS -> KMS migrations.

Ref: Previous attemps:

Notes:

  • The changes here will be backported to 4.21
  • Most of the changes are obsessively added unit tests

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci-robot
Copy link

openshift-ci-robot commented Jan 23, 2026

@ardaguclu: This pull request references CNTRLPLANE-2241 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set.

Details

In response to this:

This PR adds the implementation proposed in openshift/enhancements#1900.

This PR introduces KMS as a new mode in encryption controllers to be functioning along side with the other modes. Basically, the idea is to track the KMS configuration to detect any configuration changes. So that key_controller can create new Secret to initiate migration. KMS Secret will not contain any confidential data. It will simply store the KMS configuration.

For Tech Preview v1, we will only support static unix domain socket (i.e. unix:///var/run/kmsplugin/kms.sock). Consequently, in this version, we don't support KMS -> KMS migrations.

Ref: Previous attemps:

Notes:

  • The changes here will be backported to 4.21
  • Substantial portion of the changed lines are unit tests.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@ardaguclu ardaguclu changed the title CNTRLPLANE-2241: Add KMS encryption mode into library-go encryption controllers WIP: CNTRLPLANE-2241: Add KMS encryption mode into library-go encryption controllers Jan 23, 2026
@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jan 23, 2026
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Jan 23, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: ardaguclu
Once this PR has been reviewed and has the lgtm label, please assign dgrisonnet for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot
Copy link

openshift-ci-robot commented Jan 23, 2026

@ardaguclu: This pull request references CNTRLPLANE-2241 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set.

Details

In response to this:

This PR adds the implementation proposed in openshift/enhancements#1900.

This PR introduces KMS as a new mode in encryption controllers to be functioning along side with the other modes. Basically, the idea is to track the KMS configuration to detect any configuration changes. So that key_controller can create new Secret to initiate migration. KMS Secret will not contain any confidential data. It will simply store the KMS configuration.

For Tech Preview v1, we will only support static unix domain socket (i.e. unix:///var/run/kmsplugin/kms.sock). Consequently, in this version, we don't support KMS -> KMS migrations.

Ref: Previous attemps:

Notes:

  • The changes here will be backported to 4.21
  • Substantial portion of the changed lines are unit and integration tests.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@ardaguclu ardaguclu changed the title WIP: CNTRLPLANE-2241: Add KMS encryption mode into library-go encryption controllers CNTRLPLANE-2241: Add KMS encryption mode into library-go encryption controllers Jan 23, 2026
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jan 23, 2026
@ardaguclu ardaguclu changed the title CNTRLPLANE-2241: Add KMS encryption mode into library-go encryption controllers WIP: CNTRLPLANE-2241: Add KMS encryption mode into library-go encryption controllers Jan 23, 2026
@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jan 23, 2026
@ardaguclu ardaguclu force-pushed the kms-mode branch 2 times, most recently from 377c3cb to 0b007f9 Compare January 27, 2026 07:41
@ardaguclu ardaguclu changed the title WIP: CNTRLPLANE-2241: Add KMS encryption mode into library-go encryption controllers CNTRLPLANE-2241: Add KMS encryption mode into library-go encryption controllers Jan 27, 2026
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jan 27, 2026
@ardaguclu
Copy link
Member Author

ardaguclu commented Jan 27, 2026

/hold
These PRs are prerequisite of this;

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jan 27, 2026
@ardaguclu
Copy link
Member Author

/uncc @dgrisonnet
/cc @benluddy @bertinatto @p0lyn0mial

@openshift-ci openshift-ci bot requested review from benluddy and bertinatto and removed request for dgrisonnet January 27, 2026 15:54
@openshift-ci openshift-ci bot changed the title CNTRLPLANE-2241: Add KMS encryption mode into library-go encryption controllers WIP: CNTRLPLANE-2241: Add KMS encryption mode into library-go encryption controllers Feb 6, 2026
@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Feb 6, 2026
@ardaguclu ardaguclu force-pushed the kms-mode branch 3 times, most recently from d0321f6 to b8208c8 Compare February 9, 2026 08:16
@ardaguclu
Copy link
Member Author

/retest

@ardaguclu
Copy link
Member Author

ardaguclu commented Feb 9, 2026

/retitle CNTRLPLANE-2241: Add KMS encryption mode into library-go encryption controllers

@openshift-ci openshift-ci bot changed the title WIP: CNTRLPLANE-2241: Add KMS encryption mode into library-go encryption controllers CNTRLPLANE-2241: Add KMS encryption mode into library-go encryption controllers Feb 9, 2026
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Feb 9, 2026
Copy link
Contributor

@p0lyn0mial p0lyn0mial left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

left a few more questions.
overall it lgtm

// Since in v1 we don't support kms -> kms migrations, we can use empty key.
// Because we don't need to compare secrets to detect change.
secret := crypto.ModeToNewKeyFunc[state.KMS]()
keyId := strings.Split(provider.KMS.Name, "-")[0]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’m still wondering about encapsulation here.
I think other parts of the code won’t even be aware that the name has changed.

Only the code in encryptionconfig/config.go will encode and decode the new name, right?

Do we need to ensure that the name doesn’t already contain "-"?
For simplicity, we could add a comment in the key generation function describing this requirement.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only the code in encryptionconfig/config.go will encode and decode the new name, right?

That is correct. Rest of the code only deals with KeyState. They do not even know about the details of EncryptionConfiguration.

Do we need to ensure that the name doesn’t already contain "-"?

Kube-apiserver already fails, if it encounters a provider name without -. Because uniqueness has broken.

For simplicity, we could add a comment in the key generation function describing this requirement.

I've added a comment about the reasons behind this provider name generation mechanism. Please let me know your thoughts. Thanks.

func getKeyIDFromProviderName(providerName string) string {
// We just need to obtain the keyID to find our backed secret
// e.g. "1-secrets"
return strings.Split(providerName, "-")[0]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How should this handle an unexpected provider name?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although it may not be possible, it is better to handle the invalid provider name gracefully. I've updated the code. Please let me know your thoughts.

},
})
case state.KMS:
// In order to preserve the uniqueness, we should insert resource name
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are there going to be side effects from this naming scheme visible in things like metric labels?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

func createKMSProviderName(keyID, resource string) string {
// Ideally we should have used keyId simply in kms provider name.
// However, this is an upstream constraint that every provider name must be unique.
// To maintain uniqueness while still allowing access to the keyId, we generate provider name in this format.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, this is at least less strange than smuggling identity names in phony AES-GCM provider configs. Later, I would be interested in coming up with alternatives that let the controllers correlate provider configs to their source secret without smuggling.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with you. It was error-prone and not nice. Since now we carry KMSConfigurations in backed Secrets, I think we won't need to carry anything in provider_names.

Comment on lines +361 to +362
// Moreover, we don't trigger new key when external reason is changed.
// Because it would lead to duplicate providers which is not allowed.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC, even if we did roll out a new encryption config in response to the "reason" update, we're not even capable of triggering a KEK rotation unless a specific provider happened to expose that capability independently.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. That is correct.

Comment on lines 12 to 15
const (
DefaultEndpoint = "unix:///var/run/kmsplugin/kms.sock"
DefaultTimeout = 10 * time.Second
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a real need to export these constants? IMO, duplicating the values in tests is good because it requires that changes in observable behavior are accompanied by changes to test expectations. And other code (doing related things like setting up volume mounts) should be observing effective values rather than using hardcoded defaults.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only reason is to be accessed by key_controller. If we move these constants in there, we don't need to export them.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could we just inline these in key_controller?

if currentMode == state.KMS {
ks.KMSConfiguration = &apiserverv1.KMSConfiguration{
APIVersion: "v2",
Name: fmt.Sprintf("%d", keyID), // this will be updated by inserting resource name
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

given that the name will be changed only in the EncryptionConfiguration and will not be visible to the KeyState, is this comment still relevant?

maybe we should add a comment about the expected key format, which must include the keyID (at least for now)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment is not relevant anymore. I've removed it.

ks = state.KeyState{
Key: apiserverconfigv1.Key{
Name: keyId,
// Since in v1 we don't support kms -> kms migrations, we can use empty key.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if I understand this comment.

For KMS the Secret doesn't matter and can/will always be empty - because the actual encryption key material is in the external KMS instance.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In v2, we'll have to compare this field (or KMSConfiguration annotation) to detect configurational changes. But it sounds like this comment creates confusions. So I've removed it.

@ardaguclu
Copy link
Member Author

Unrelated?
/test unit

func getKeyIDFromProviderName(providerName string) (string, error) {
// We just need to obtain the keyID to find our backed secret
// e.g. "1_secrets"
parsed := strings.Split(providerName, "_")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we use strings.SplitN ?

secret := CreateEncryptionKeySecretWithRawKeyWithMode(targetNS, grs, keyID, emptyKey, "KMS")
kmsConfig := &apiserverconfigv1.KMSConfiguration{
APIVersion: "v2",
Name: fmt.Sprintf("%d_%s", keyID, grs[0].Resource),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't this be fmt.Sprintf("%d", keyID) ?
the resource suffix is added later.

})
case state.KMS:
// In order to preserve the uniqueness, we should insert resource name
kmsCopy := key.KMSConfiguration.DeepCopy()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we check key.KMSConfiguration for nil ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although this is not possible, it makes sense gracefully handling the empty KMSConfiguration by skipping it with a message. Please let me know your thoughts.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, that would match the existing pattern. good idea.

}

if ks.KMSConfiguration != nil {
ks, err := json.Marshal(ks.KMSConfiguration)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: ks shadows the outer ks, we could rename to ksJSON

})
case state.KMS:
if key.KMSConfiguration == nil {
klog.Infof("skipping key %s in KMS mode as its KMSConfiguration is nil", key.Key.Name)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we add resource and index too?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added resource name in the log.

Index is not the index in EncryptionConfiguration providers. It is an index in the Backed Secrets. We have keyID already as discriminator. Would you still want me to insert index name?

@ardaguclu
Copy link
Member Author

All tests pass on my local. According to the junit results, all the tests passed in the CI jobs too https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/pr-logs/pull/openshift_library-go/2086/pull-ci-openshift-library-go-master-unit/2023755244520869888/artifacts/test/artifacts/junit_report.xml. It looks like a transient failure;
/test unit

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Feb 17, 2026

@ardaguclu: all tests passed!

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

return providers
}

func createKMSProviderName(keyID, resource string) string {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just a comment: as of today there a no restrictions on length for the provider name. the only restriction i found is that we are not allowed to use :

if currentMode == state.KMS {
// We are here because Encryption Mode is not changed

// For now in v1, we don't support configurational changes. Therefore,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

optional: someone reading v1 might get confused, maybe it's better to specify TP as well

AESGCM Mode = "aesgcm"
SecretBox Mode = "secretbox" // available from the first release, see defaultMode below
Identity Mode = "identity" // available from the first release, see defaultMode below
KMS Mode = "KMS" // only supports KMS v2
Copy link
Member

@bertinatto bertinatto Feb 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: all the others are lowercase ("KMS")

keyMode := state.Mode(s.Annotations[encryptionSecretMode])
switch keyMode {
case state.AESCBC, state.AESGCM, state.SecretBox, state.Identity:
case state.AESCBC, state.AESGCM, state.SecretBox, state.Identity, state.KMS:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you also need to add a check to test if keyMode is KMS and key.KMSConfiguration is nil?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

jira/valid-reference Indicates that this PR references a valid Jira ticket of any type.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants