Skip to content

Make regex and glob pattern matching stricter#196

Closed
max-dw-i wants to merge 21 commits intokintesh:masterfrom
max-dw-i:master
Closed

Make regex and glob pattern matching stricter#196
max-dw-i wants to merge 21 commits intokintesh:masterfrom
max-dw-i:master

Conversation

@max-dw-i
Copy link

Upstream pull request.

I was trying to use glob patterns and ran into one bug and one quirk that made them impossible to use correctly:

Regex metacharacters (such as .) were not escaped in globs (which use regexes for parsing internally):

!*.example.com matched evilexample.com
!e.a.p.e.com matched example.com

Patterns do not match the entire domain, even with "match domain only" enabled:

@example.com matched evil.example.com.evil.com (user fixable with regex patterns: @^example.com$ behaves as expected, but I don't want this to be necessary)
!example.com matched evil.example.com.evil.com (no anchors in glob patterns, not fixable by the user)

I believe it is more intuitive for patterns to always match the entire string when "match domain only" is enabled.

The first commit fixes the regex metacharacter bug and the second adds anchors when matching patterns with "match domain only" is enabled. The final commit updates the tests (not 100% if the logic is correct with the matchDomainOnly/should/should not testing).

The "match domain only" change is a breaking change. You can drop that commit if you want, but please document the current behavior. The glob example in the README matches many more domains more than just the specified domain.

A weaker change might be to add anchors only when matching glob patterns, not regexes. But note that this is a breaking change too: users might've been using example.com to match subdomains or www.example.com.

If you want the patterns to behave like prefixes it's not sufficient to only add a right $ anchor, because example.com$ still matches evilexample.com

I'm not sure what the best solution here is.

raghur and others added 21 commits March 23, 2023 16:07
This will be used to handle eTLD parsing.
This fixes eTLD handling for parsing domains (for use with default
container rule addition templating) to support IANA TLDs, and while we're
at it, effective TLDs from Mozilla's public suffix list.

This may result in some hosts ending up with different containers, so this
may consitute data loss.
Could not figure out how to make 'remark-loader' work.
Replace flags '!', '@' with '{g}', '{r}' for readability.
Replace the global option 'Match domain only' with
the pattern-specific flag '{d}'.
@max-dw-i max-dw-i closed this Apr 11, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants

Comments