Skip to content

Comments

[WIP] NE-2118: Enable AAAA filtering via CoreDNS template plugin#1936

Draft
grzpiotrowski wants to merge 2 commits intoopenshift:masterfrom
grzpiotrowski:NE-2118-coredns-custom-ipv6-template-plugin
Draft

[WIP] NE-2118: Enable AAAA filtering via CoreDNS template plugin#1936
grzpiotrowski wants to merge 2 commits intoopenshift:masterfrom
grzpiotrowski:NE-2118-coredns-custom-ipv6-template-plugin

Conversation

@grzpiotrowski
Copy link

Adds enhancements/dns/coredns-custom-ipv6-template-plugin.md for configuring CoreDNS template plugins through the DNS operator API.

This enhancement introduces a templates field in the DNS operator API to enable AAAA query filtering and custom IPv6 response generation. The primary use case is filtering AAAA queries in IPv4-only clusters where dual-stack applications query for both A and AAAA records, causing CoreDNS to forward unresolvable AAAA queries to upstream resolvers.

The proposal was initially drafted using the /add-enhancement ai-helper command and reviewed and edited by the author.

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Feb 4, 2026

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@openshift-ci-robot
Copy link

openshift-ci-robot commented Feb 4, 2026

@grzpiotrowski: This pull request references NE-2118 which is a valid jira issue.

Details

In response to this:

Adds enhancements/dns/coredns-custom-ipv6-template-plugin.md for configuring CoreDNS template plugins through the DNS operator API.

This enhancement introduces a templates field in the DNS operator API to enable AAAA query filtering and custom IPv6 response generation. The primary use case is filtering AAAA queries in IPv4-only clusters where dual-stack applications query for both A and AAAA records, causing CoreDNS to forward unresolvable AAAA queries to upstream resolvers.

The proposal was initially drafted using the /add-enhancement ai-helper command and reviewed and edited by the author.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Feb 4, 2026
@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Feb 4, 2026
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Feb 4, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign knobunc for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Copy link
Contributor

@alebedev87 alebedev87 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Initial review. I will need more time to think about the API and the place of template plugin in zone blocks.

Comment on lines 30 to 31
In IPv4-only clusters, dual-stack applications query for both A and AAAA
records. CoreDNS forwards unresolvable AAAA queries to upstream resolvers,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
In IPv4-only clusters, dual-stack applications query for both A and AAAA
records. CoreDNS forwards unresolvable AAAA queries to upstream resolvers,
In IPv4-only clusters, applications may query for both A and AAAA
records. CoreDNS forwards unresolvable AAAA queries to upstream resolvers,

Stub resolvers like the one from glibc will try to send both DNS queries even on a single stack IPv4 Kubernetes clusters. getaddrinfo() function from glibc does exactly this it tries to shield the user space application from details of different ip families and returns a list of addrinfo. From the man page:

The getaddrinfo() function combines the functionality provided by the gethostbyname(3) and getservbyname(3) functions into a single interface, but unlike the latter functions, getaddrinfo() is reentrant and allows programs to eliminate IPv4-versus-IPv6 dependencies.

Specifying hints as NULL is equivalent to setting ai_socktype and ai_protocol to 0; ai_family to AF_UNSPEC;

hints.ai_family = AF_UNSPEC; /* Allow IPv4 or IPv6 */

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Comment on lines +32 to +33
adding latency per query. Filtering AAAA queries at CoreDNS eliminates
this delay and reduces upstream DNS load.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also we can mention the fact that this allows a central cluster wide configuration. As opposed to more tedious (from customers) configuration which needs to be done in every pod (dnsConfig can be used in the pod spec to set option no-aaaa).

Comment on lines 41 to 43
* As a cluster administrator in an IPv4-only environment, I want to filter AAAA
queries so that I can eliminate IPv6 lookup delays and reduce upstream DNS
load.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be good if we can phrase it in a way to highlight the fact that the cluster admin wants to make it a central (single for the whole cluster) configuration.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rephrased now as suggested.

Comment on lines +48 to +49
* As an SRE, I want operator conditions and metrics for template configuration
so that I can monitor DNS optimization effectiveness.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good point. I didn't think of it, the template plugin provides some built in metrics.

* Add `templates` field to DNS operator API with extensible design for future
expansion
* Enable AAAA filtering (primary use case) and custom response generation
* Support IN class, AAAA records, NOERROR/NXDOMAIN responses initially
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not quite sure about NXDOMAIN response code. This may be disruptive for dns clients, I can think of a dns client which may interpret AAAA's NXDOMAIN response as "don't try A query and go to the next search domain". To be double checked on a real cluster. Also, the initial request was for NOERROR code only, going beyond this is possible but we need to have a strong reason.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, I was trying to take the extendability of the API into account but didn't think this one through and definitely not a place for the NXDOMAIN in the current goals. Removed that now.

Comment on lines 169 to 172
// generateResponse generates a custom DNS response with an answer section.
// This is useful for static DNS mappings or dynamic response generation.
// +optional
GenerateResponse *DNSGenerateResponseAction `json:"generateResponse,omitempty"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose that this is just an example of how we can extend the API to do answer stanza, right? We will have to remove it from the final version of the EP but for now we can keep it to help us design an extensible API.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes this wouldn't be in the scope now. Though with the future extensibility in mind.

Comment on lines 263 to 268
### Important Limitations and Warnings

**Dual-Stack Clusters**: AAAA filtering is designed for single-stack IPv4
clusters. In dual-stack clusters, the operator automatically excludes
cluster.local from filtering to preserve IPv6 service connectivity. Review
`AAAAFilterDualStackWarning` condition when zone "." is configured.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting point. How do you think the DNS operator can detect a dual-stack setup? And how this will translate into Corefile instructions?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The operator can get the Network config and check if the ClusterNetwork has a IPv6 CIDR.
If we detect one, then we would have a fallthrough inserted conditionally in the corefile kubernetes plugin.

      template IN AAAA . {
          match "^(.*\.)?cluster\.local\.$"
          fallthrough # This would skip to the next plugin (kubernetes)
      }

      template IN AAAA . {
          rcode NOERROR
      }

      kubernetes cluster.local in-addr.arpa ip6.arpa {
          pods insecure
          fallthrough in-addr.arpa ip6.arpa
      }

Actually more to consider here since the order of executing the plugins doesn't depend on the corefile order but rather on the coredns plugin.cfg.


The API uses discriminated unions for actions and typed enums for record types/classes/response codes. This enables future expansion:

- **New actions**: Add fields to `DNSTemplateAction` (e.g., `Rewrite`, `Redirect`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which template plugin's stanza rewrite or redirect map to? Or you meant to use template field as an umbrella for rewrite plugin too?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't seem to make much sense looking at it again. Putting rewrite and redirect or any other plugins for that matter under the template field would be a confusing design.
If there are any other plugin implemented in the future I reckon they should be all kept separated.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there are any other plugin implemented in the future I reckon they should be all kept separated.

Not necessary, we may try to combine similar plugins under the same DNS API. However as of now, it may complicate the design without an explicit requirement for it. The rewrite pluigin is something which we considered as an implementation for AAAA filtering however it appeared to be 1) trying to cover some legacy stack usecases (pre Happy eyeballs dns clients which send AAAA query first), 2) more "hacky" (may cause problems for some dns clients) while template plugin seems to give more standard (predictable for dns clients) api.

Comment on lines 296 to 297
**Zone Specificity**: More specific zones take precedence (e.g.,
`tools.corp.example.com` > `corp.example.com` > `.`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So dns.spec.templates will be applied only for the default block? I need to think about it more but just to understand - what was your reasoning?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After giving it another though. I think adding the template to all servers is what we makes most sense at least for the first iteration (simple, "in one place" filtering).

An alternative would be add another API field to DNS.spec.servers to explicitly configure each upstream server.

**Integration Tests**: Operator workflow (apply/update/delete templates,
ConfigMap regeneration, conditions), dual-stack detection

**E2E Tests** (labeled `[OCPFeatureGate:DNSTemplatePlugin]`):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should not forget origin tests needed for the featuregate promotion.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, added mention of the origin tests.

// breaking IPv6 service connectivity.
//
// +optional
Templates []DNSTemplate `json:"templates,omitempty"`
Copy link
Contributor

@alebedev87 alebedev87 Feb 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One thing I missed in the first iteration. Since it's a slice, what the fallthrough behavior we are anticipating here?

Seems like without regex matching having multiple templates may be not useful:

fallthrough Continue with the next template instance if the template’s ZONE matches a query name but no regex match. If there is no next template, continue resolution with the next plugin.

Another phrase attracted my attention too:

Without fallthrough, when the template’s ZONE matches a query but no regex match then a SERVFAIL response is returned.

Does it mean that we should consider adding regex to the API to not get SERVFAIL? Something to be checked.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think without the match in the implementation we won't be affected by the SERVFAIL scenario as the template would execute when the zone matches and if it doesn't match, then it would skip over to the next template.
api spec:

spec:
  templates:
    - name: some-filter
      zones: ["some.example.com"]
      queryType: AAAA
      action:
        returnEmpty:
          rcode: NOERROR

    - name: global-filter
      zones: ["."]
      queryType: AAAA
      action:
        returnEmpty:
          rcode: NOERROR

corefile:

.:5353 {
    template IN AAAA some.example.com {
        rcode NOERROR
    }

    template IN AAAA . {
        rcode NOERROR
    }

    kubernetes cluster.local ...
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the template would execute when the zone matches and if it doesn't match, then it would skip over to the next template.

Did you test this?

@Miciah
Copy link
Contributor

Miciah commented Feb 11, 2026

/assign

@candita
Copy link
Contributor

candita commented Feb 11, 2026

/cc
/assign @alebedev87 @bentito

Copy link
Contributor

@alebedev87 alebedev87 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another look. Main things which may need another iteration:

  • Special treatment of dual stack clusters, I would like to avoid it to simplify the implementation.
  • Action API field and it's forward thinking design to support multiple actions.
  • Template ordering by zone, I would prefer to avoid any complex interpretation of the API on the operator side.
  • Template fallthrough, I'm still not quite convinced we need it to support multiple templates.

Apart from this, if other comments are addressed I think we are fine to convert from a draft to a real EP.

Comment on lines +36 to +37
without maintaining external DNS infrastructure. The CoreDNS template plugin
supports both use cases but lacks operator API integration.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
without maintaining external DNS infrastructure. The CoreDNS template plugin
supports both use cases but lacks operator API integration.
without maintaining external DNS infrastructure. The CoreDNS [template plugin](https://coredns.io/plugins/template/) supports both use cases but lacks operator API integration.

Comment on lines +62 to +67
user configures AAAA filtering with zone "." on a dual-stack cluster, the
operator automatically generates an exclusion template that allows cluster.local
AAAA queries to reach the kubernetes plugin (preserving IPv6 service
connectivity) while filtering external AAAA queries. Example: Template with
`match "^(.*\.)?cluster\.local\.$"` and `fallthrough` is added before
the user's filter template.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Everything which follows "Protect dual-stack clusters with cluster.local exclusions"is providing too many details for Goals section. They would better belong in Proposal or Implementation Details section.

## Motivation

In IPv4-only clusters, applications may query for both A and AAAA
records. CoreDNS forwards unresolvable AAAA queries to upstream resolvers,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The word "unresolvable" caught my attention. I think I understand what you meant. A use case when AAAA query goes to upstream resolvers configured for the default zone, that is, which were not resolved by the kubernetes plugin. However 1) we may have additional servers whose blocks precede the default zone, 2) the interpretation of the word can catch attestation. I think saying simply this would avoid any misinterpretation:

Suggested change
records. CoreDNS forwards unresolvable AAAA queries to upstream resolvers,
records. CoreDNS forwards AAAA queries to upstream resolvers,

Comment on lines 126 to 131
// name is a required unique identifier for this template.
// Must be a valid DNS subdomain as defined in RFC 1123.
// +kubebuilder:validation:Required
// +kubebuilder:validation:MaxLength=64
// +kubebuilder:validation:Pattern=`^[a-z0-9]([-a-z0-9]*[a-z0-9])?$`
Name string `json:"name"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if necessary.

If we won't use it in the implementation (or API, as a discriminant for instance), it won't make sense to keep at it as this will be additional work for us and for users.

* Support IN class, AAAA records, NOERROR/NXDOMAIN responses initially
* Validate templates before applying to CoreDNS
* Provide operator conditions for template status
* Protect dual-stack clusters with automatic cluster.local exclusions
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for clarifying. You are right that we should think about use cases in which users can harm themselves. However 1) the template API is optional; 2) this restriction can work against us in the future as it'll limit our freedom in where to put the template plugin (in Corefile) and how to extend the API with regexp matching and new actions. So, unless you have any strong reason I didn't find, I think that we can keep the API and implementation simplier and do something later of an explicit user requirement will come up.


### API Extensibility Design

The API uses discriminated unions for actions and typed enums for query types/classes/response codes. This enables future expansion:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I mentioned in the comment for the action API, we need to design the API thinking about not blocking future use cases which may include multiple actions on a plugin (e.g. rcode + authority).

Comment on lines +374 to +378
**Template Ordering Within Plugin:**
Templates are ordered by zone specificity (most specific first):
- `app.corp.example.com` (most specific)
- `corp.example.com` (less specific)
- `.` (catch-all, least specific)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought template <zone(s)> {} stanza can accept multiple zones (all zones from Zones slice): template IN AAAA zone1.com zone2.com {}.

Comment on lines 296 to 297
**Zone Specificity**: More specific zones take precedence (e.g.,
`tools.corp.example.com` > `corp.example.com` > `.`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After giving it another though. I think adding the template to all servers is what we makes most sense at least for the first iteration (simple, "in one place" filtering).

An alternative would be add another API field to DNS.spec.servers to explicitly configure each upstream server.

// breaking IPv6 service connectivity.
//
// +optional
Templates []DNSTemplate `json:"templates,omitempty"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the template would execute when the zone matches and if it doesn't match, then it would skip over to the next template.

Did you test this?

| External DNS servers | Operational overhead, doesn't integrate with operator model |
| Direct Corefile editing | Bypasses validation, operator overwrites changes, fragile |
| CoreDNS rewrite plugin | Designed for query rewriting, not response generation/filtering |
| Application-level config | Requires modifying all apps/images, not feasible at scale |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean options no-aaaa from /etc/resolv.conf which can be configured in the pod spec? If so, can you please detail this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants