Skip to content

[1/3] Separate comparison logic from data loading in matchers#464

Open
yaakov-stein wants to merge 2 commits intofacebook:mainfrom
yaakov-stein:refactor_matchers
Open

[1/3] Separate comparison logic from data loading in matchers#464
yaakov-stein wants to merge 2 commits intofacebook:mainfrom
yaakov-stein:refactor_matchers

Conversation

@yaakov-stein
Copy link
Contributor

@yaakov-stein yaakov-stein commented Mar 8, 2026

Stacked PRs:

This is helpful for #355 (but makes sense as a standalone refactor as well). Changes are non-functional - all bytecode should remain exactly the same, no new features, etc.

Summary

Decomposes matcher codegen into independent concerns to enable flavor-specific data loading while sharing comparison logic across flavors.

Every matcher generator performs three operations:
A) Protocol guard : Is this the right L3/L4 protocol? (R7/R8 check)
B) Data loading (flavor-specific): Get the field value into R1 (R1+R2 for 128-bit)
C) Comparison: Does the value match the rule payload? (EQ/NE/RANGE/mask)

Previously, concerns B and C were tangled in per-protocol files (ip4.c, ip6.c, tcp.c, udp.c, icmp.c). Each file hardcoded both packet header loading and comparison, making it impossible to reuse comparison logic for non-packet flavors and hard to reuse the logic in general.

Before

bf_matcher_generate_packet()
     │
     ├── ip4.c ── OUTER: guard + load l3_hdr → INNER: offset + 32-bit compare
     ├── ip6.c ── OUTER: guard + load l3_hdr → INNER: offset + 128-bit compare
     ├── tcp.c ── OUTER: guard + load l4_hdr → INNER: offset + port compare
     ├── udp.c ── OUTER: guard + load l4_hdr → INNER: identical to tcp port
     └── icmp.c ── guard + load l4_hdr + 8-bit compare

Problems: B+C tangled, tcp/udp port comparison duplicated, adding a new flavor means either duplicating C or refactoring B+C apart.

After

bf_matcher_generate_packet()         future: _bf_cgroup_sock_addr_gen_inline_matcher()
     │                                    │
     ├── A: bf_stub_rule_check_protocol() │  A: bf_stub_rule_check_protocol()  ← SHARED
     ├── B: load from l3_hdr/l4_hdr       │  B: load from bpf_sock_addr        ← FLAVOR-SPECIFIC
     └── C: bf_cmp_*()                    └  C: bf_cmp_*()                     ← SHARED
                    │
          ┌─────────────────────┐
          │  cgen/matcher/cmp.c │
          │  bf_cmp_value()     │  EQ/NE at sizes 1, 2, 4, 16
          │  bf_cmp_masked_value│  masked EQ/NE for network prefixes
          │  bf_cmp_range()     │  16-bit port RANGE
          │  bf_cmp_bitfield()  │  TCP flag ANY/ALL
          └─────────────────────┘

Per-protocol files eliminated (ip4.c/h, ip6.c/h, tcp.c/h, udp.c/h, icmp.c/h). Loading logic absorbed into packet.c static helpers; comparison logic extracted into cmp.c.

Test plan

  • unit + e2e testing
  • BPF bytecode output is identical for all existing matchers (pure refactor)

@meta-cla meta-cla bot added the cla signed label Mar 8, 2026
@github-actions
Copy link

github-actions bot commented Mar 8, 2026

Claude: review of facebook/bpfilter #464 (b0f6a25)

Must fix

  • Protocol guard deduplication drops guards for mixed-protocol rulessrc/bpfilter/cgen/program.c:307 — Fixed: now uses layer-based bitfield per maintainer recommendation.

Suggestions

  • Public ABI change to bf_flavor_ops (dismissed) — src/libbpfilter/include/bpfilter/flavor.h:127 — Author: "Beyond scope of PR. The breakage is fine for now"
  • _bf_packet_load_and_compare re-fetches meta — Fixed: _bf_matcher_pkt_load_and_cmp now takes meta as a parameter.
  • Unchecked bf_matcher_get_meta() return value — Fixed: _bf_program_generate_rule pre-loop now skips BF_MATCHER_SET and returns -EINVAL if meta is NULL, per maintainer suggestion.
  • Commit 1 fixes a function that commit 2 immediately removes — No longer applicable: PR restructured to 2 commits with clean separation.
  • Unnecessary include in program.c — Fixed: cgen/matcher/packet.h include removed.
  • _bf_prefix_to_mask does not validate prefixlen (dismissed) — Author: "keep as is for now, moved from previous location".
  • bf_cmp_value size=4 changes bytecode for meta.marksrc/bpfilter/cgen/matcher/cmp.c:96 — The size=4 path emits MOV32_IMM + JMP_REG (2 insns) instead of the old single JMP_IMM. This is actually a correctness fix for mark values > 0x7FFFFFFF (JMP_IMM sign-extends, MOV32_IMM+JMP_REG does not), but commit 1 claims "bytecode is identical". Consider updating the commit message.
  • Flow hash/mark comparison duplicated across 3 flavorssrc/bpfilter/cgen/cgroup_skb.c, nf.c, tc.c — The post-load comparison logic for BF_MATCHER_META_FLOW_HASH (RANGE branch to bf_cmp_range, else bf_cmp_value on BPF_REG_0) is byte-for-byte identical across all three flavor files. Consider extracting a shared helper, leaving only the flavor-specific load in each file to align with the PR's goal of sharing comparison logic.
  • Protocol dedup sentinel value 0 matches IPPROTO_IP — Fixed: switched to layer-based bitfield (checked_layers |= BF_FLAG(meta->layer)), which no longer uses protocol IDs as sentinel values.
  • Commit 2 subject omits the lib component — No longer applicable: flavor.h is no longer modified in this version.
  • Commit 2 body references incorrect function names — Fixed: body now correctly references bf_stub_rule_check_protocol and bf_stub_load_header.
  • Dropped operator validation for ip6.nexthdrsrc/bpfilter/cgen/matcher/packet.c:203 — The original _bf_matcher_generate_ip6_nexthdr validated that the operator was EQ or NE, returning -EINVAL otherwise. This guard was dropped. Maintainer suggested using meta->ops for validation.
  • Document scratch register clobbering in bf_cmp_value/bf_cmp_masked_value — Fixed: cmp.h now explicitly documents which registers are clobbered for each size (e.g. "Clobbers BPF_REG_2 for size 4/8 and BPF_REG_3/BPF_REG_4 for size 16").
  • _bf_matcher_pkt_load_field missing size 8 case — Fixed: case 8 added alongside case 16 with bpf_size = BPF_DW.

Nits

  • Missing @brief tag in cmp.h Doxygen — Fixed.
  • Missing @param for meta in _bf_packet_load_and_compare Doxygen — Fixed.
  • Inconsistent Doxygen in meta.hsrc/bpfilter/cgen/matcher/meta.h:11bf_matcher_generate_meta has no Doxygen, while bf_matcher_generate_packet (same pattern) in packet.h has full docs.
  • Operator precedence in prefix mask computation — Fixed: parentheses added.
  • Copyright notice missing yearcmp.c:3, cmp.h:3 — New files omit the year that all other src/bpfilter/cgen/ files include (e.g., Copyright (c) 2022 Meta Platforms, Inc. and affiliates.).
  • Use bool for protocol dedup flags — Fixed: replaced with layer-based bitfield.
  • Non-Doxygen block comment closes on separate line — Fixed: comment promoted to Doxygen (/**), where closing on a separate line is correct per style guide.
  • Commit 2 body uses British English "centralises" — The codebase uses American English ("optimize", "reorganize"). Consider "centralizes".
  • Redundant header-base loads for same-protocol matchers — Each matcher independently calls bf_stub_load_header, reloading R6 even when consecutive matchers share the same protocol. The old per-protocol files loaded the header once. Minor bytecode size increase, not a correctness issue.
  • Commit 2 body omits the protocol-check deduplication change — The new pre-pass in _bf_program_generate_rule deduplicates protocol guards (previously each matcher emitted its own), changing bytecode structure. The commit body should mention this behavioral improvement rather than framing the commit as purely file-consolidation.
  • cmp.h Doxygen references internal static function — Fixed: header docs no longer name _bf_cmp_build_imm64.
  • Use @code{.c} instead of bare @codesrc/bpfilter/cgen/matcher/cmp.c:35 — The project consistently uses @code{.c} for C code (9+ examples in libbpfilter headers). The _bf_cmp_build_imm64 Doxygen uses bare @code without a language specifier.

@yaakov-stein yaakov-stein changed the title Separate comparison logic from data loading in matchers [1/3] Separate comparison logic from data loading in matchers Mar 8, 2026
@yaakov-stein yaakov-stein marked this pull request as ready for review March 9, 2026 19:06
@yaakov-stein yaakov-stein requested a review from qdeslandes as a code owner March 9, 2026 19:06
Copy link
Contributor

@qdeslandes qdeslandes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't review the full diff deeply, as https://github.com/facebook/bpfilter/pull/464/changes#r2915802296 will involve big-ish changes (i.e. let's not worry about the nits when there are core comments to resolve).

For the few nits already there, most of them are linked to a specific line but apply to the all change.

That being said, it's a great PR and a big step in the right direction!

@yaakov-stein yaakov-stein force-pushed the refactor_matchers branch 2 times, most recently from ab9e6dd to c979327 Compare March 11, 2026 22:58
@yaakov-stein yaakov-stein force-pushed the refactor_matchers branch 2 times, most recently from dad2d27 to 4b67dd8 Compare March 12, 2026 01:51
Copy link
Contributor

@qdeslandes qdeslandes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Few more comments, but the core of the change LGTM :)

Separate comparison logic from data loading in matcher codegen.
bf_cmp_value, bf_cmp_masked_value, bf_cmp_range, and bf_cmp_bitfield
encapsulate the BPF bytecode patterns previously duplicated across
ip4.c, ip6.c, tcp.c, udp.c, icmp.c, and meta.c. No behavioral change;
emitted bytecode is identical.
Remove ip4.c/h, ip6.c/h, tcp.c/h, udp.c/h, and icmp.c/h by absorbing
their loading logic into packet.c. Protocol guards and header loads now
go through bf_stub_rule_check_protocol and bf_stub_load_header driven
by _bf_matcher_metas, while comparisons use the shared cmp.c functions.

This centralises all packet matcher codegen in a single dispatch point,
making it straightforward to add new flavor-specific dispatch without
touching per-protocol files.
Copy link
Contributor

@qdeslandes qdeslandes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Few nit, but LGTM.

Comment on lines +313 to +319
if (meta->layer == BF_MATCHER_LAYER_3 ||
meta->layer == BF_MATCHER_LAYER_4) {
r = bf_stub_rule_check_protocol(program, meta);
if (r)
return r;
checked_layers |= BF_FLAG(meta->layer);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't have any matcher on L2 yet, but the enum value is defined, so it should be supported here.

Comment on lines +63 to +96
static int _bf_matcher_pkt_load_field(struct bf_program *program,
const struct bf_matcher_meta *meta,
int reg)
{
int bpf_size;

switch (meta->hdr_payload_size) {
case 1:
bpf_size = BPF_B;
break;
case 2:
bpf_size = BPF_H;
break;
case 4:
bpf_size = BPF_W;
break;
case 8:
case 16:
bpf_size = BPF_DW;
break;
default:
return -EINVAL;
}

EMIT(program,
BPF_LDX_MEM(bpf_size, reg, BPF_REG_6, meta->hdr_payload_offset));

if (meta->hdr_payload_size == 16) {
EMIT(program, BPF_LDX_MEM(BPF_DW, reg + 1, BPF_REG_6,
meta->hdr_payload_offset + 8));
}

return 0;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would insert the EMIT instruction in each case here, it would clarify the behaviour instead of handling 64-bits compare as a special case. This leads to very low code duplicate (in the source code, no duplication in the BPF bytecode).

Comment on lines +215 to +218

r = _bf_cmp_build_imm64(program, BPF_REG_3, BPF_REG_4, masked_hi);
if (r)
return r;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure which question you're refering to :/

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants