[WIP] NE-2118: Enable AAAA filtering via CoreDNS template plugin#1936
[WIP] NE-2118: Enable AAAA filtering via CoreDNS template plugin#1936grzpiotrowski wants to merge 2 commits intoopenshift:masterfrom
Conversation
|
Skipping CI for Draft Pull Request. |
|
@grzpiotrowski: This pull request references NE-2118 which is a valid jira issue. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
alebedev87
left a comment
There was a problem hiding this comment.
Initial review. I will need more time to think about the API and the place of template plugin in zone blocks.
| In IPv4-only clusters, dual-stack applications query for both A and AAAA | ||
| records. CoreDNS forwards unresolvable AAAA queries to upstream resolvers, |
There was a problem hiding this comment.
| In IPv4-only clusters, dual-stack applications query for both A and AAAA | |
| records. CoreDNS forwards unresolvable AAAA queries to upstream resolvers, | |
| In IPv4-only clusters, applications may query for both A and AAAA | |
| records. CoreDNS forwards unresolvable AAAA queries to upstream resolvers, |
Stub resolvers like the one from glibc will try to send both DNS queries even on a single stack IPv4 Kubernetes clusters. getaddrinfo() function from glibc does exactly this it tries to shield the user space application from details of different ip families and returns a list of addrinfo. From the man page:
The getaddrinfo() function combines the functionality provided by the gethostbyname(3) and getservbyname(3) functions into a single interface, but unlike the latter functions, getaddrinfo() is reentrant and allows programs to eliminate IPv4-versus-IPv6 dependencies.
Specifying hints as NULL is equivalent to setting ai_socktype and ai_protocol to 0; ai_family to AF_UNSPEC;
hints.ai_family = AF_UNSPEC; /* Allow IPv4 or IPv6 */
| adding latency per query. Filtering AAAA queries at CoreDNS eliminates | ||
| this delay and reduces upstream DNS load. |
There was a problem hiding this comment.
Also we can mention the fact that this allows a central cluster wide configuration. As opposed to more tedious (from customers) configuration which needs to be done in every pod (dnsConfig can be used in the pod spec to set option no-aaaa).
| * As a cluster administrator in an IPv4-only environment, I want to filter AAAA | ||
| queries so that I can eliminate IPv6 lookup delays and reduce upstream DNS | ||
| load. |
There was a problem hiding this comment.
Would be good if we can phrase it in a way to highlight the fact that the cluster admin wants to make it a central (single for the whole cluster) configuration.
There was a problem hiding this comment.
Rephrased now as suggested.
| * As an SRE, I want operator conditions and metrics for template configuration | ||
| so that I can monitor DNS optimization effectiveness. |
There was a problem hiding this comment.
That's a good point. I didn't think of it, the template plugin provides some built in metrics.
| * Add `templates` field to DNS operator API with extensible design for future | ||
| expansion | ||
| * Enable AAAA filtering (primary use case) and custom response generation | ||
| * Support IN class, AAAA records, NOERROR/NXDOMAIN responses initially |
There was a problem hiding this comment.
I'm not quite sure about NXDOMAIN response code. This may be disruptive for dns clients, I can think of a dns client which may interpret AAAA's NXDOMAIN response as "don't try A query and go to the next search domain". To be double checked on a real cluster. Also, the initial request was for NOERROR code only, going beyond this is possible but we need to have a strong reason.
There was a problem hiding this comment.
Right, I was trying to take the extendability of the API into account but didn't think this one through and definitely not a place for the NXDOMAIN in the current goals. Removed that now.
| // generateResponse generates a custom DNS response with an answer section. | ||
| // This is useful for static DNS mappings or dynamic response generation. | ||
| // +optional | ||
| GenerateResponse *DNSGenerateResponseAction `json:"generateResponse,omitempty"` |
There was a problem hiding this comment.
I suppose that this is just an example of how we can extend the API to do answer stanza, right? We will have to remove it from the final version of the EP but for now we can keep it to help us design an extensible API.
There was a problem hiding this comment.
Yes this wouldn't be in the scope now. Though with the future extensibility in mind.
| ### Important Limitations and Warnings | ||
|
|
||
| **Dual-Stack Clusters**: AAAA filtering is designed for single-stack IPv4 | ||
| clusters. In dual-stack clusters, the operator automatically excludes | ||
| cluster.local from filtering to preserve IPv6 service connectivity. Review | ||
| `AAAAFilterDualStackWarning` condition when zone "." is configured. |
There was a problem hiding this comment.
Interesting point. How do you think the DNS operator can detect a dual-stack setup? And how this will translate into Corefile instructions?
There was a problem hiding this comment.
The operator can get the Network config and check if the ClusterNetwork has a IPv6 CIDR.
If we detect one, then we would have a fallthrough inserted conditionally in the corefile kubernetes plugin.
template IN AAAA . {
match "^(.*\.)?cluster\.local\.$"
fallthrough # This would skip to the next plugin (kubernetes)
}
template IN AAAA . {
rcode NOERROR
}
kubernetes cluster.local in-addr.arpa ip6.arpa {
pods insecure
fallthrough in-addr.arpa ip6.arpa
}
Actually more to consider here since the order of executing the plugins doesn't depend on the corefile order but rather on the coredns plugin.cfg.
|
|
||
| The API uses discriminated unions for actions and typed enums for record types/classes/response codes. This enables future expansion: | ||
|
|
||
| - **New actions**: Add fields to `DNSTemplateAction` (e.g., `Rewrite`, `Redirect`) |
There was a problem hiding this comment.
Which template plugin's stanza rewrite or redirect map to? Or you meant to use template field as an umbrella for rewrite plugin too?
There was a problem hiding this comment.
This doesn't seem to make much sense looking at it again. Putting rewrite and redirect or any other plugins for that matter under the template field would be a confusing design.
If there are any other plugin implemented in the future I reckon they should be all kept separated.
There was a problem hiding this comment.
If there are any other plugin implemented in the future I reckon they should be all kept separated.
Not necessary, we may try to combine similar plugins under the same DNS API. However as of now, it may complicate the design without an explicit requirement for it. The rewrite pluigin is something which we considered as an implementation for AAAA filtering however it appeared to be 1) trying to cover some legacy stack usecases (pre Happy eyeballs dns clients which send AAAA query first), 2) more "hacky" (may cause problems for some dns clients) while template plugin seems to give more standard (predictable for dns clients) api.
| **Zone Specificity**: More specific zones take precedence (e.g., | ||
| `tools.corp.example.com` > `corp.example.com` > `.`) |
There was a problem hiding this comment.
So dns.spec.templates will be applied only for the default block? I need to think about it more but just to understand - what was your reasoning?
There was a problem hiding this comment.
After giving it another though. I think adding the template to all servers is what we makes most sense at least for the first iteration (simple, "in one place" filtering).
An alternative would be add another API field to DNS.spec.servers to explicitly configure each upstream server.
| **Integration Tests**: Operator workflow (apply/update/delete templates, | ||
| ConfigMap regeneration, conditions), dual-stack detection | ||
|
|
||
| **E2E Tests** (labeled `[OCPFeatureGate:DNSTemplatePlugin]`): |
There was a problem hiding this comment.
We should not forget origin tests needed for the featuregate promotion.
There was a problem hiding this comment.
Right, added mention of the origin tests.
| // breaking IPv6 service connectivity. | ||
| // | ||
| // +optional | ||
| Templates []DNSTemplate `json:"templates,omitempty"` |
There was a problem hiding this comment.
One thing I missed in the first iteration. Since it's a slice, what the fallthrough behavior we are anticipating here?
Seems like without regex matching having multiple templates may be not useful:
fallthroughContinue with the next template instance if the template’s ZONE matches a query name but no regex match. If there is no next template, continue resolution with the next plugin.
Another phrase attracted my attention too:
Without
fallthrough, when the template’s ZONE matches a query but no regex match then a SERVFAIL response is returned.
Does it mean that we should consider adding regex to the API to not get SERVFAIL? Something to be checked.
There was a problem hiding this comment.
I think without the match in the implementation we won't be affected by the SERVFAIL scenario as the template would execute when the zone matches and if it doesn't match, then it would skip over to the next template.
api spec:
spec:
templates:
- name: some-filter
zones: ["some.example.com"]
queryType: AAAA
action:
returnEmpty:
rcode: NOERROR
- name: global-filter
zones: ["."]
queryType: AAAA
action:
returnEmpty:
rcode: NOERRORcorefile:
.:5353 {
template IN AAAA some.example.com {
rcode NOERROR
}
template IN AAAA . {
rcode NOERROR
}
kubernetes cluster.local ...
}
There was a problem hiding this comment.
the template would execute when the zone matches and if it doesn't match, then it would skip over to the next template.
Did you test this?
|
/assign |
|
/cc |
There was a problem hiding this comment.
Another look. Main things which may need another iteration:
- Special treatment of dual stack clusters, I would like to avoid it to simplify the implementation.
- Action API field and it's forward thinking design to support multiple actions.
- Template ordering by zone, I would prefer to avoid any complex interpretation of the API on the operator side.
- Template fallthrough, I'm still not quite convinced we need it to support multiple templates.
Apart from this, if other comments are addressed I think we are fine to convert from a draft to a real EP.
| without maintaining external DNS infrastructure. The CoreDNS template plugin | ||
| supports both use cases but lacks operator API integration. |
There was a problem hiding this comment.
| without maintaining external DNS infrastructure. The CoreDNS template plugin | |
| supports both use cases but lacks operator API integration. | |
| without maintaining external DNS infrastructure. The CoreDNS [template plugin](https://coredns.io/plugins/template/) supports both use cases but lacks operator API integration. |
| user configures AAAA filtering with zone "." on a dual-stack cluster, the | ||
| operator automatically generates an exclusion template that allows cluster.local | ||
| AAAA queries to reach the kubernetes plugin (preserving IPv6 service | ||
| connectivity) while filtering external AAAA queries. Example: Template with | ||
| `match "^(.*\.)?cluster\.local\.$"` and `fallthrough` is added before | ||
| the user's filter template. |
There was a problem hiding this comment.
Everything which follows "Protect dual-stack clusters with cluster.local exclusions"is providing too many details for Goals section. They would better belong in Proposal or Implementation Details section.
| ## Motivation | ||
|
|
||
| In IPv4-only clusters, applications may query for both A and AAAA | ||
| records. CoreDNS forwards unresolvable AAAA queries to upstream resolvers, |
There was a problem hiding this comment.
The word "unresolvable" caught my attention. I think I understand what you meant. A use case when AAAA query goes to upstream resolvers configured for the default zone, that is, which were not resolved by the kubernetes plugin. However 1) we may have additional servers whose blocks precede the default zone, 2) the interpretation of the word can catch attestation. I think saying simply this would avoid any misinterpretation:
| records. CoreDNS forwards unresolvable AAAA queries to upstream resolvers, | |
| records. CoreDNS forwards AAAA queries to upstream resolvers, |
| // name is a required unique identifier for this template. | ||
| // Must be a valid DNS subdomain as defined in RFC 1123. | ||
| // +kubebuilder:validation:Required | ||
| // +kubebuilder:validation:MaxLength=64 | ||
| // +kubebuilder:validation:Pattern=`^[a-z0-9]([-a-z0-9]*[a-z0-9])?$` | ||
| Name string `json:"name"` |
There was a problem hiding this comment.
Not sure if necessary.
If we won't use it in the implementation (or API, as a discriminant for instance), it won't make sense to keep at it as this will be additional work for us and for users.
| * Support IN class, AAAA records, NOERROR/NXDOMAIN responses initially | ||
| * Validate templates before applying to CoreDNS | ||
| * Provide operator conditions for template status | ||
| * Protect dual-stack clusters with automatic cluster.local exclusions |
There was a problem hiding this comment.
Thanks for clarifying. You are right that we should think about use cases in which users can harm themselves. However 1) the template API is optional; 2) this restriction can work against us in the future as it'll limit our freedom in where to put the template plugin (in Corefile) and how to extend the API with regexp matching and new actions. So, unless you have any strong reason I didn't find, I think that we can keep the API and implementation simplier and do something later of an explicit user requirement will come up.
|
|
||
| ### API Extensibility Design | ||
|
|
||
| The API uses discriminated unions for actions and typed enums for query types/classes/response codes. This enables future expansion: |
There was a problem hiding this comment.
As I mentioned in the comment for the action API, we need to design the API thinking about not blocking future use cases which may include multiple actions on a plugin (e.g. rcode + authority).
| **Template Ordering Within Plugin:** | ||
| Templates are ordered by zone specificity (most specific first): | ||
| - `app.corp.example.com` (most specific) | ||
| - `corp.example.com` (less specific) | ||
| - `.` (catch-all, least specific) |
There was a problem hiding this comment.
I thought template <zone(s)> {} stanza can accept multiple zones (all zones from Zones slice): template IN AAAA zone1.com zone2.com {}.
| **Zone Specificity**: More specific zones take precedence (e.g., | ||
| `tools.corp.example.com` > `corp.example.com` > `.`) |
There was a problem hiding this comment.
After giving it another though. I think adding the template to all servers is what we makes most sense at least for the first iteration (simple, "in one place" filtering).
An alternative would be add another API field to DNS.spec.servers to explicitly configure each upstream server.
| // breaking IPv6 service connectivity. | ||
| // | ||
| // +optional | ||
| Templates []DNSTemplate `json:"templates,omitempty"` |
There was a problem hiding this comment.
the template would execute when the zone matches and if it doesn't match, then it would skip over to the next template.
Did you test this?
| | External DNS servers | Operational overhead, doesn't integrate with operator model | | ||
| | Direct Corefile editing | Bypasses validation, operator overwrites changes, fragile | | ||
| | CoreDNS rewrite plugin | Designed for query rewriting, not response generation/filtering | | ||
| | Application-level config | Requires modifying all apps/images, not feasible at scale | |
There was a problem hiding this comment.
Do you mean options no-aaaa from /etc/resolv.conf which can be configured in the pod spec? If so, can you please detail this.
Adds enhancements/dns/coredns-custom-ipv6-template-plugin.md for configuring CoreDNS template plugins through the DNS operator API.
This enhancement introduces a templates field in the DNS operator API to enable AAAA query filtering and custom IPv6 response generation. The primary use case is filtering AAAA queries in IPv4-only clusters where dual-stack applications query for both A and AAAA records, causing CoreDNS to forward unresolvable AAAA queries to upstream resolvers.
The proposal was initially drafted using the
/add-enhancementai-helper command and reviewed and edited by the author.