NE-2021: Support dual-stack IngressController on AWS#1940

Open
alebedev87 wants to merge 2 commits into openshift:master from alebedev87:NE-2021-dual-stack-support-ingresscontroller

@alebedev87
Contributor

This enhancement enables automatic dual-stack (IPv4 and IPv6) IP address types for IngressController publishing services on AWS clusters using Network Load Balancers (NLB). This is a day-0 feature that automatically configures itself based on the cluster-wide IP family configuration.

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Feb 10, 2026
@openshift-ci-robot

openshift-ci-robot commented Feb 10, 2026

@alebedev87: This pull request references NE-2021 which is a valid jira issue.

Details

In response to this:

This enhancement enables automatic dual-stack (IPv4 and IPv6) IP address types for IngressController publishing services on AWS clusters using Network Load Balancers (NLB). This is a day-0 feature that automatically configures itself based on the cluster-wide IP family configuration.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci openshift-ci bot requested review from Miciah and frobware February 10, 2026 23:36
@alebedev87 alebedev87 force-pushed the NE-2021-dual-stack-support-ingresscontroller branch 2 times, most recently from c178e30 to 2d2a626 Compare February 11, 2026 10:32
This enhancement enables automatic dual-stack (IPv4 and IPv6) IP address
types for IngressController publishing services on AWS clusters using
Network Load Balancers (NLB). This is a day-0 feature that automatically
configures itself based on the cluster-wide IP family configuration.
@alebedev87 alebedev87 force-pushed the NE-2021-dual-stack-support-ingresscontroller branch from 2d2a626 to 61dffff Compare February 11, 2026 10:58
@alebedev87
Contributor Author

alebedev87 commented Feb 11, 2026

/retitle NE-2021: Support dual-stack IngressController on AWS

@openshift-ci openshift-ci bot changed the title NE-2021: Add enhancement proposal for AWS dual-stack IngressController support NE-2021: Support dual-stack IngressController on AWS Feb 11, 2026
@Miciah
Contributor

Miciah commented Feb 11, 2026

/assign


AWS NLBs support dual-stack IP address type, allowing
services to be accessible via both IPv4 and IPv6 addresses. However,
OpenShift Ingress does not currently support this capability. As IPv6 adoption increases and organizations require
Contributor

"OpenShift Ingress" is ambiguous. Does this cover our gateway controller? I suggest using a more specific term or phrasing, such as "OpenShift does not support dual-stack NLBs with the IngressController API", and adding a non-goal if this does not cover our gateway controller.

Contributor Author

Done.

@candita
Contributor

candita commented Feb 11, 2026

/cc
/assign @davidesalerno @rfredette

Contributor

@sadasu sadasu left a comment

Thanks for detailing the IngressController considerations for day-0 dual-stack support. A few comments inline.

- For `DualStackIPv6Primary`: set `service.spec.ipFamilies: ["IPv6", "IPv4"]`
and `service.spec.ipFamilyPolicy: PreferDualStack`
- The AWS cloud provider (cloud-provider-aws) will read these service
fields and configure the NLB with dual-stack support accordingly.
Contributor

Although we are not making any updates to the Ingress API, I found this comment to be useful.

Looking at the documentation for dual-stack configuration on k8s services (https://kubernetes.io/docs/concepts/services-networking/dual-stack/#services): since the second IP family in service.spec.ipFamilies is mutable, should we set service.spec.ipFamilyPolicy to RequireDualStack, so that we don't end up in a situation with ["IPv6", "IPv4"] where the second IP family "IPv4" is removed and we are left with single-stack IPv6 configured on the LB service for the NLB?

Contributor Author

> Although we are not making any updates (openshift/api#2663) to the Ingress API, I found this comment (openshift/api#2663 (comment)) to be useful.

Right, this is the best we have for the moment to understand how CCM will interface with the user for the dual-stack implementation.

> Looking at the documentation for dual-stack configuration on k8s services (https://kubernetes.io/docs/concepts/services-networking/dual-stack/#services): since the second IP family in service.spec.ipFamilies is mutable, should we set service.spec.ipFamilyPolicy to RequireDualStack, so that we don't end up in a situation with ["IPv6", "IPv4"] where the second IP family "IPv4" is removed and we are left with single-stack IPv6 configured on the LB service for the NLB?

The Kubernetes doc doesn't specify that the secondary ipFamily cannot be removed if the policy is RequireDualStack. Also, if we consider a use case independent of the IngressController publishing service, the user can change the policy too. However, in the context of this EP, we are talking about a service which is managed by the ingress-operator, so the ipFamilies and ipFamilyPolicy fields will be enforced by the operator (unsolicited changes will be stomped).

Overall, I haven't fully made up my mind about which policy the ingress operator has to enforce. It's difficult to decide without an actual implementation that can be tested. However, now that I think about it, RequireDualStack can be a good choice because of its more deterministic failure mode: I suppose that the ingress hostname will not be added to the service's status if something goes sideways with the CCM load balancer provisioning. Let me change it to RequireDualStack.

Member

> The Kubernetes doc doesn't specify that the secondary ipFamily cannot be removed if the policy is RequireDualStack

I just wanted to add that if the policy is RequireDualStack, ipFamilies must have both IPv4 and IPv6 entries, and we can't change the order either, as the doc says. The output below was tested with a dual-stack kind cluster.

$ kubectl get svc/cryostat -o yaml | yq .spec.ipFamilyPolicy
RequireDualStack

$ kubectl get svc/cryostat -o yaml | yq .spec.ipFamilies
- IPv4
- IPv6

$ kubectl patch svc/cryostat --type=merge -p '{"spec":{"ipFamilies": ["IPv4"]}}'
The Service "cryostat" is invalid: spec.ipFamilyPolicy: Invalid value: "RequireDualStack": must be 'SingleStack' to release the secondary IP family

$ kubectl patch svc/cryostat --type=merge -p '{"spec":{"ipFamilies": ["IPv6", "IPv4"]}}'
The Service "cryostat" is invalid: 
* spec.clusterIPs[0]: Invalid value: "10.96.17.234": expected an IPv6 value as indicated by `ipFamilies[0]`
* spec.clusterIPs[1]: Invalid value: "fd00:10:96::37f9": expected an IPv4 value as indicated by `ipFamilies[1]`

I also agree with using the RequireDualStack policy, which opens fewer doors for user misconfiguration.
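
The mapping this thread converges on can be sketched in Go. This is a minimal illustration under stated assumptions, not the ingress-operator's implementation: the `IPFamily` type, the `serviceIPSettings` struct, and the `desiredServiceIPSettings` helper are hypothetical names, and leaving both fields empty for `IPv4` (or an unset value) is an assumption about the fallback.

```go
package main

import "fmt"

// IPFamily mirrors the three values discussed in this thread. The type and
// constant names are illustrative, not the openshift/api definitions.
type IPFamily string

const (
	IPv4                 IPFamily = "IPv4"
	DualStackIPv4Primary IPFamily = "DualStackIPv4Primary"
	DualStackIPv6Primary IPFamily = "DualStackIPv6Primary"
)

// serviceIPSettings holds the two Service spec fields under discussion.
type serviceIPSettings struct {
	IPFamilies     []string
	IPFamilyPolicy string
}

// desiredServiceIPSettings is a hypothetical helper that returns the
// ipFamilies order and ipFamilyPolicy the operator would enforce.
func desiredServiceIPSettings(f IPFamily) serviceIPSettings {
	switch f {
	case DualStackIPv4Primary:
		return serviceIPSettings{
			IPFamilies:     []string{"IPv4", "IPv6"},
			IPFamilyPolicy: "RequireDualStack",
		}
	case DualStackIPv6Primary:
		return serviceIPSettings{
			IPFamilies:     []string{"IPv6", "IPv4"},
			IPFamilyPolicy: "RequireDualStack",
		}
	default:
		// Assumption: for IPv4 or an unset value, both fields stay empty
		// and the API server applies its defaults.
		return serviceIPSettings{}
	}
}

func main() {
	for _, f := range []IPFamily{IPv4, DualStackIPv4Primary, DualStackIPv6Primary} {
		s := desiredServiceIPSettings(f)
		fmt.Printf("%s: ipFamilies=%v ipFamilyPolicy=%q\n", f, s.IPFamilies, s.IPFamilyPolicy)
	}
}
```

Keeping the mapping in one pure function makes the enforced state easy to unit test, which matches the test strategy of checking the operator logic that reads the IP family.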

to create both Route53 Alias A and Alias AAAA records when the cluster IP
family is dual-stack. The IP family is passed to the provider at
initialization time, similar to [AWS region](https://github.com/openshift/cluster-ingress-operator/blob/8afaffbf8ddbe65565bad52eea6267b615eceec2/pkg/dns/aws/dns.go).

Contributor

Similarly, for DualStackIPv4Primary, when service.spec.ipFamilies is set to ["IPv4", "IPv6"], do we have to consider the day-2 scenario where the cluster's IP family configuration remains DualStackIPv4Primary but the second IP family in service.spec.ipFamilies is removed, leaving us with just IPv4? Are we allowing this (by setting ipFamilyPolicy to PreferDualStack)?
If yes, then we should also make sure to remove the AAAA DNS entry in Route53.

Contributor Author

As I mentioned in the previous comment, I don't see any confirmation that the RequireDualStack policy will stop the user (the cluster admin in this case) from removing a secondary ipFamily. However, the operator will enforce the desired state to match the IP family specified in the Infrastructure CR.

Member

I can quickly confirm that in #1940 (comment). It also looks like we are leaning towards the RequireDualStack policy, so this scenario won't be supported...?
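
The record selection described in the quoted proposal text (Alias A and Alias AAAA records for a dual-stack cluster, A only otherwise) can be sketched as below. The `aliasRecordTypes` helper is a hypothetical name, and passing the IP family as a plain string is an assumption made to keep the example self-contained; it does not reflect the DNS provider's actual interface.

```go
package main

import "fmt"

// aliasRecordTypes is a hypothetical helper that returns the Route53 alias
// record types to manage for the wildcard domain, given the cluster IP family.
func aliasRecordTypes(ipFamily string) []string {
	switch ipFamily {
	case "DualStackIPv4Primary", "DualStackIPv6Primary":
		// Dual-stack clusters get both an Alias A and an Alias AAAA record.
		return []string{"A", "AAAA"}
	default:
		// IPv4 (or unset): no Alias AAAA record is created.
		return []string{"A"}
	}
}

func main() {
	for _, f := range []string{"IPv4", "DualStackIPv4Primary", "DualStackIPv6Primary"} {
		fmt.Println(f, aliasRecordTypes(f))
	}
}
```

Because the operator enforces the dual-stack service state, the record set depends only on the cluster IP family and not on what a user patches into the service.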

about the need to recreate the service manually. Also, the message highlights the fact
that CLB does not support the cluster-wide dual-stack IP family.

4. When the user proceeds with a service recreation, the ingress-operator creates the
Contributor

I see 2 other alternatives:

  1. reporting error and ingress operator going into a Degraded=True state
  2. warning user that this change to CLB on Day-2 is not allowed and continue as DualStack.

I am sure these were considered and the current option was picked as the best outcome for the customer. Could you please add a brief reasoning for that choice?

Contributor Author

> warning user that this change to CLB on Day-2 is not allowed and continue as DualStack.

This is what I described in the step before:

> Also, the message highlights the fact that CLB does not support the cluster-wide dual-stack IP family.

> reporting error and ingress operator going into a Degraded=True state

Right, this is an alternative. My reasoning was:

  1. This is a recorded user intention: to use the CLB type even though a warning was given.
  2. This behavior is similar to what the ingress operator does with another Infrastructure field: ResourceTags. Azure doesn't support custom tags on their load balancer, so the ingress operator doesn't do anything about it.
  3. We can improve it in the future by adding an ipFamily field to the IngressController API (CRD validation). For the moment (taking deadlines into account), this is the minimum effort we can make to get a foot into the "dual stack support" territory.

Contributor

I think letting the user know that the CLB isn't dual stack is appropriate.

One thing I don't see here is anything mentioning going back to Progressing=False once the user manually recreates the service. I'd expect this to self-heal once the user took corrective action.

Contributor Author

> One thing I don't see here is anything mentioning going back to Progressing=False once the user manually recreates the service. I'd expect this to self-heal once the user took corrective action.

Right, I didn't go into too much detail on this one, as it's existing behavior which doesn't need to be implemented. Let me explain that part though, to make things clearer.
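
The warning behavior discussed in this thread can be sketched roughly as follows. Everything here is an assumption made for illustration: the `clbDualStackWarning` helper, the `"Classic"` load balancer type string, and the message wording are hypothetical, and the sketch does not model how the operator sets or clears its status conditions.

```go
package main

import "fmt"

// clbDualStackWarning is a hypothetical helper. It returns a warning message
// and true when a Classic Load Balancer is requested on a cluster whose
// IP family is one of the dual-stack variants.
func clbDualStackWarning(ipFamily, lbType string) (string, bool) {
	dualStack := ipFamily == "DualStackIPv4Primary" || ipFamily == "DualStackIPv6Primary"
	if dualStack && lbType == "Classic" {
		// Illustrative wording only; the real status message may differ.
		return "CLB does not support the cluster-wide dual-stack IP family; " +
			"the service must be recreated manually to switch load balancer type", true
	}
	return "", false
}

func main() {
	if msg, warn := clbDualStackWarning("DualStackIPv4Primary", "Classic"); warn {
		fmt.Println("warning:", msg)
	}
	if _, warn := clbDualStackWarning("DualStackIPv4Primary", "NLB"); !warn {
		fmt.Println("NLB on a dual-stack cluster: no warning")
	}
}
```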

@openshift-ci
Contributor

openshift-ci bot commented Feb 12, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from miciah. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

- Clarify that dual-stack support is for IngressController publishing services specifically
- Add non-goal about Gateway API not being covered
- Change ipFamilyPolicy to RequireDualStack for more deterministic failure mode

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
@alebedev87 alebedev87 force-pushed the NE-2021-dual-stack-support-ingresscontroller branch from d99d94b to cc616c3 Compare February 12, 2026 11:11
@openshift-ci
Contributor

openshift-ci bot commented Feb 12, 2026

@alebedev87: all tests passed!

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

The feature is day-0 only; only fresh installs are supported.

**Downgrade to version without feature:**
- On clusters installed with the `AWSDualStackInstall` feature gate and dual-stack IP family:
Contributor

Reading this section made me think of docs.redhat.com/en/documentation/openshift_container_platform/4.21/html-single/config_apis/index#status-featuregates and specifically

> The enabled/disabled values for a particular version may change during the life of the cluster as various .spec.featureSet values are selected. Operators may choose to restart their processes to pick up these changes, but remembering past enable/disable lists is beyond the scope of this API and is the responsibility of individual operators.

That implies to me that it is possible to remove dual stack support from an AWS OCP cluster without a complete OCP downgrade, which has cascading effects on the resources, particularly if they have IPv6 configured to be the primary/default.

My suspicion is that most of the operations listed here remain the same - the IngressController will need to reconcile and recreate the relevant services, as well as updating the DNS records. That includes not reading the IP family information, since presumably that will be behind the feature gate.

@nrb
Contributor

nrb commented Feb 12, 2026

This looks accurate from my understanding.

- On clusters installed with the `AWSDualStackInstall` feature gate and dual-stack IP family:
- The older ingress-operator version will not read the IP family from
the Infrastructure CR and will fall back to IPv4-only configuration.
- The IngressController's publishing service will be reconciled and the `ipFamilies` and

This seems to contradict what we state at line 317: Kubernetes doesn't allow switching the ipFamilies on an existing service (https://kubernetes.io/docs/concepts/services-networking/dual-stack/#services).

Member

I think this is a good point. This downgrade path from a dual-stack cluster to an older IPv4 cluster requires conversion from dual-stack networking to IPv4-only networking (i.e. similar to a day-2 conversion). This is something we do not plan to support, right?

@sadasu @alebedev87 @nrb

**installer** is the OpenShift installer responsible for creating the
cluster and configuring the Infrastructure CR.

1. The cluster administrator installs an OpenShift cluster on AWS with

The proposal handles DualStackIPv4Primary, DualStackIPv6Primary, and IPv4 (or unset), but the Infrastructure CR type definition likely also includes a single-stack IPv6 value. The Non-Goals section states "Implementing single-stack IPv6 IngressControllers" is a non-goal, but I think that it will be better to explicitly clarify what the operator does when it encounters ipFamily: IPv6: does it error, set a degraded status condition, fall back to IPv4, or ignore it?

Member

Currently, I think the Infrastructure CR type definition only defines DualStackIPv4Primary, DualStackIPv6Primary, and IPv4.

https://github.com/openshift/api/blob/fca93aff74172d801b89f6c0881a910fb79931da/config/v1/types_infrastructure.go#L494-L507

Maybe we can make it clear in this enhancement that those are the only supported values.

**Test Strategy:**
- Unit tests for operator logic that reads IP family from Infrastructure
CR and configures services accordingly.
- E2E tests verifying:
@davidesalerno davidesalerno Feb 17, 2026

The E2E test list should also include negative and edge-case scenarios that are described in the proposal body but not covered here.

Specifically:

  • Downgrade behavior validation (dual-stack cluster downgraded to a version without the feature -> IPv4-only connectivity preserved, expected status).

cluster and configuring the Infrastructure CR.

1. The cluster administrator installs an OpenShift cluster on AWS with
the `AWSDualStackInstall` feature gate enabled and [a dual-stack IP family](https://github.com/openshift/installer/pull/9930),
Member

nit: we can reference this link instead: https://github.com/openshift/installer/blob/b0514c8e022d8445e57f303852487d3cd59c4a0a/pkg/types/aws/platform.go#L134-L144

Alternatively, we can use openshift/installer#10207. The openshift/installer#9930 is a test PR (for reviewers to do local testing), which won't be merged at all.


4. The ingress-operator does not create an Alias AAAA record for the wildcard domain.

5. AWS provisions a standard IPv4-only NLB.
Member

If the ipFamily is IPv4, CLB is also allowed. Should we make it clear that the type of LB will be based on what the user configures?

- Unit tests for operator logic that reads IP family from Infrastructure
CR and configures services accordingly.
- E2E tests verifying:
- On clusters installed with `AWSDualStackInstall` feature gate:
Member

Suggested change
- On clusters installed with `AWSDualStackInstall` feature gate:
- On clusters installed with `AWSDualStackInstall` feature gate and a dual-stack IP family (i.e. the `platform.aws.ipFamily` field in the install-config):

nit: 😁

Comment on lines +355 to +356
- DNS alias records of the wildcard domain point to the AWS NLB hostname.
- AWS NLB hostname resolves to both IPv4 and IPv6 addresses.
Member

Since we are using Route53 alias records, those A/AAAA alias records should point directly to the NLB IP addresses, right?

The current wording seems to describe CNAME records instead?

Comment on lines +370 to +372
- When the installer's `AWSDualStackInstall` feature gate is enabled,
the ingress operator will automatically configure the dual-stack IP address type for
publishing services of IngressControllers.
Member

nit: The ingress operator should only configure dual-stack IP address if the infrastructure status set ipFamily to one of the dual-stack variants, right?

Comment on lines +447 to +448
8. Verify DNS alias is published: `dig <wildcard-domain>` should show
alias to AWS NLB hostname.
Member

It should point to IP addresses of AWS NLB since it is an alias record, right?

Labels

jira/valid-reference Indicates that this PR references a valid Jira ticket of any type.

9 participants