feat: ad-intent-mismatch v3 — WSIS scoring, URL exclusion, eviction capping by danieljchuser · Pull Request #2133 · adobe/spacecat-audit-worker

danieljchuser · 2026-03-13T12:41:01Z

Summary

Replace flat pageView threshold with WSIS (Wasted Spend Impact Score) composite scoring using CPC, pageViews, bounceRate, and engagedScrollRate
Add URL pattern exclusion for support/checkout/legal/auth pages (10 regex patterns)
Fetch paid pages from S3 for CPC/traffic enrichment data
Raise PAGE_VIEW_THRESHOLD to 5000 (was 1000), CUT_OFF_BOUNCE_RATE to 0.5 (was 0.3)
Add eviction-based opportunity capping (max 4 per recommendation type per site)
Pass enriched fields (cpc, sumTraffic, topKeyword, serpTitle, engagedScrollRate, priorityScore) to Mystique via SQS
Remove step4 (importTrafficAnalysisWeekStep4) from audit pipeline

Companion PR

This PR must be deployed together with its companion Mystique PR:

Mystique: https://git.corp.adobe.com/experience-platform/mystique/pull/1379 (branch: feat/ad-intent-mismatch-v3)
- Content validation, stealth screenshots, prompt hardening, enriched field passthrough

Deploy order: Mystique first (backward-compatible), then this audit-worker PR.

Test plan

All 128 existing + new tests pass (npm test)
100% coverage maintained on all modified files
ESLint passes
CI pipeline passes
E2E validation with Mystique companion PR in dev environment

🤖 Generated with Claude Code

…apping - Replace flat pageView threshold with WSIS (Wasted Spend Impact Score) composite scoring - Add URL pattern exclusion for support/checkout/legal/auth pages - Fetch paid pages from S3 for CPC/traffic enrichment - Raise PAGE_VIEW_THRESHOLD to 5000, CUT_OFF_BOUNCE_RATE to 0.5 - Add eviction-based opportunity capping (max 4 per recommendation type) - Pass enriched fields (cpc, sumTraffic, topKeyword, serpTitle, engagedScrollRate, priorityScore) to Mystique - Remove step4 (importTrafficAnalysisWeekStep4) from audit pipeline Related: Mystique PR for companion LLM-side changes (content validation, stealth screenshots, prompt hardening) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…tching Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Remove unreachable || [] fallback on line 576 (runner always returns predominantlyPaidPages). Add tests for || fallbacks in guidance-handler lines 101-103 (suggestions and sumTraffic defaults). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

github-actions · 2026-03-16T10:28:06Z

This PR will trigger a minor release when merged.

codecov · 2026-03-16T10:34:22Z

Codecov Report

✅ All modified and coverable lines are covered by tests.

📢 Thoughts on this report? Let us know!

alinarublea

PR Review: `feat: ad-intent-mismatch v3 — WSIS scoring, URL exclusion, eviction capping`

Overall the PR is well-structured with good test coverage, but there are a few real bugs and several concerns worth addressing.

Bugs

1. URL exclusion patterns miss no-trailing-slash URLs (Medium)
All 10 regex patterns require a trailing / (e.g., /checkout/). URLs like https://example.com/checkout (no trailing slash) won't be excluded. Fix by replacing trailing / with (\/|$):
```js
// Before
//(cart|checkout|order|payment)//i
// After
//(cart|checkout|order|payment)(/|$)/i
```
No test covers this case, so the bug is also invisible in the test suite.

2. Non-array S3 response causes unhandled TypeError (Medium)
fetchPaidPagesFromS3: if the parsed JSON is {} instead of an array, for (const page of pages) throws TypeError: pages is not iterable. Add an Array.isArray(pages) guard before the loop.

3. NaN from invalid AD_INTENT_MAX_PAGES silently sends 0 messages (Medium)
parseInt('abc', 10) → NaN, and slice(0, NaN) returns []. The audit would send no messages without any error. Should fall back to default 10 when the env var is unparseable.

4. Evicted opportunity double-setStatus('IGNORED') (Low)
After eviction, sameTypeOpportunities still contains the evicted entry. If the evicted opportunity shares the same URL as the incoming one, the URL de-duplication code will call setStatus('IGNORED') on it a second time. Filter out the evicted opportunity from sameTypeOpportunities before the URL de-duplication step.

5. Missing null-guard on auditResult.predominantlyPaidPages (Low)
The /* c8 ignore next 2 */ + || [] fallback was removed. If auditResult doesn't have this key, searchPages.length will throw. The defensive fallback should stay.

Other Concerns

WSIS formula — magic number /1000 (Low)
computePriorityScore divides by 1000 with no explanation. This is directly coupled to the > 0.01 filter cutoff — if either changes, the other must too. Both should be named constants with a comment explaining the relationship.

isExcludedPageType — /results/ is very broad
Would match /ab-test-results/, analytics dashboards, etc. Consider a more specific pattern.

S3 failure is unobservable (Low)
On S3 fetch error the audit silently returns {} with no metric/alert emitted. This makes the silent termination hard to detect in production.

save() vs saveMany() inconsistency
Eviction calls .save() directly; URL de-duplication uses Opportunity.saveMany. Should be consistent.

MAX_PAGES defined inline in function scope
Inconsistent with how other thresholds are handled via getConfig. Should be at module scope or in config.

Positive Notes

Small, single-purpose helper functions (inferRecommendationType, getEvictionScore, findLowestEvictionCandidate, isExcludedPageType) are cleanly factored
Full JSDoc on all new functions
The "full pipeline integration" test at the end of handler.test.js is excellent
Eviction capping tests cover all key branches well
Exporting internal functions for unit testing is good practice

Priority Fixes Before Merge

Fix URL exclusion regex to match terminal path segments (no trailing slash)
Add Array.isArray guard in fetchPaidPagesFromS3
Handle NaN for AD_INTENT_MAX_PAGES
Re-add || [] fallback for auditResult.predominantlyPaidPages
Exclude evicted opportunity from the URL de-duplication set

🤖 Reviewed with Claude Code

- Replace read-then-evict pattern with create-first, reconcile-after approach to handle concurrent guidance callbacks safely - Only cap audit_required opportunities (max 4 per site); never remove modify_heading opportunities (those with variationChanges) - Remove newest excess audit_required opportunities and their suggestions via removeByIds (not just IGNORED status) - Use Opportunity.allBySiteIdAndStatus to filter at DB level (M1) - fetchPaidPagesFromS3 returns null for empty data instead of throwing, with separate log levels for errors vs no-data (M2) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- Fix URL exclusion regexes to match terminal path segments without trailing slash (e.g. /checkout) by replacing \/ with (\/|$) - Add Array.isArray guard in fetchPaidPagesFromS3 to handle non-array S3 responses gracefully instead of throwing TypeError - Add comment explaining the /1000 normalization in computePriorityScore Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

danieljchuser · 2026-03-16T14:42:13Z

Addressed the review in 847e2ca. Here's the breakdown:

Fixed

Bug 1 (URL exclusion trailing-slash): Valid. All 10 regex patterns now use (\/|$) instead of \/ so terminal path segments like /checkout are excluded. Test added.

Bug 2 (Non-array S3 response): Valid. Added Array.isArray(pages) guard in fetchPaidPagesFromS3 — returns null on non-array JSON instead of throwing TypeError. Test added.

Concern 1 (/1000 magic number): Added comment explaining the normalization scale and its relationship to the > 0.01 filter threshold.

Not fixing

Bug 3 (NaN from AD_INTENT_MAX_PAGES): The actual behavior is the opposite of what the review states — NaN > 0 evaluates to false, so slice(0, undefined) returns all pages, not zero. Only triggers if someone sets the env var to garbage. Not worth adding complexity for.

Bug 4 (Double-setStatus): Incorrect. The code already uses ignoredIds (line 168-169) to filter out evicted entries before reconciliation. No double-call occurs.

Bug 5 (Missing || [] fallback): paidKeywordOptimizerRunner always sets predominantlyPaidPages in its return value, so the fallback branch is unreachable. Adding it would create dead code that can't be covered by tests. Skipped.

Concern 2 (/results/ regex): With the Bug 1 fix, the pattern matches /results as a complete path segment. /ab-test-results won't match because results must be preceded by /.

Concern 3 (S3 failure unobservable): The error is logged. Adding metrics/alerting is out of scope for this PR.

Concern 4 (save vs saveMany): Each context uses the appropriate method — site.save() for a single site, Opportunity.saveMany() for bulk updates. No inconsistency.

Concern 5 (MAX_PAGES inline): Intentional — it's runtime-configurable via env var, unlike the compile-time constants at module scope.

alinarublea · 2026-03-16T14:52:41Z

Thanks for the detailed breakdown — here's where I land after checking the code:

Accepted explanations:

Bug 3 (NaN from AD_INTENT_MAX_PAGES): Correction accepted. I had the wrong mental model — parseInt('abc', 10) || 10 doesn't apply because 'abc' is truthy so it's passed to parseInt. But you're right that NaN > 0 === false means it falls through to slice(0, undefined), returning all pages. The actual behavior is benign, not zero-messages. My bug report was incorrect.
Bug 4 (double-setStatus): Accepted. The ignoredIds set correctly filters out evicted entries from sameTypeOpportunities before reconcileOpportunities is called. No double-call is possible.

Still a concern:

Bug 5 (missing || [] fallback): The argument "it's always set, so the fallback is dead code" doesn't hold — that's exactly what the removed /* c8 ignore next 2 */ comment was for. The old code explicitly acknowledged this contract ("paidKeywordOptimizerRunner always returns an array") while keeping the fallback as defense-in-depth. The new code does const searchPages = auditResult.predominantlyPaidPages and then immediately accesses searchPages.length — if the contract ever breaks, this throws a TypeError rather than failing gracefully. An explicit null-check (if (!searchPages) { return {}; }) would be both testable and safe. Please add this back.

All other concerns noted — no further action needed from my side on those.

## Summary - Short-circuit the ad-intent-mismatch guidance handler to return `ok()` immediately, preventing opportunity/suggestion creation - This allows E2E testing of the v3 pipeline without polluting dev with test opportunities ## Context The `feat/ad-intent-mismatch-v3` PR (#2133) is being E2E tested on a personal Lambda alias. This patch prevents the **currently deployed** dev Lambda from creating opportunities when it processes guidance responses triggered during testing. **This is temporary** — the `feat/ad-intent-mismatch-v3` PR will overwrite both files completely when merged. ## Test plan - [x] 2 tests pass covering the short-circuit behavior - [x] 100% coverage on guidance-handler.js - [x] ESLint passes 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

## [1.358.1](v1.358.0...v1.358.1) (2026-03-17) ### Bug Fixes * temporarily disable ad-intent-mismatch opportunity creation ([#2147](#2147)) ([73e0db6](73e0db6)), closes [#2133](#2133)

danieljchuser and others added 3 commits March 13, 2026 13:39

fix: normalize URLs to strip www. prefix for consistent Ahrefs/RUM ma…

57ec527

…tching Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Merge branch 'main' into feat/ad-intent-mismatch-v3

1e9ab37

danieljchuser self-assigned this Mar 16, 2026

danieljchuser added enhancement New feature or request ai-native labels Mar 16, 2026

alinarublea reviewed Mar 16, 2026

View reviewed changes

danieljchuser and others added 2 commits March 16, 2026 15:30

alinarublea approved these changes Mar 16, 2026

View reviewed changes

danieljchuser mentioned this pull request Mar 17, 2026

fix: temporarily disable ad-intent-mismatch opportunity creation #2147

Merged

3 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: ad-intent-mismatch v3 — WSIS scoring, URL exclusion, eviction capping#2133

feat: ad-intent-mismatch v3 — WSIS scoring, URL exclusion, eviction capping#2133
danieljchuser wants to merge 6 commits intomainfrom
feat/ad-intent-mismatch-v3

danieljchuser commented Mar 13, 2026 •

edited

Loading

Uh oh!

github-actions bot commented Mar 16, 2026

Uh oh!

codecov bot commented Mar 16, 2026

Uh oh!

alinarublea left a comment

Uh oh!

danieljchuser commented Mar 16, 2026

Uh oh!

alinarublea commented Mar 16, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

danieljchuser commented Mar 13, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Companion PR

Test plan

Uh oh!

github-actions bot commented Mar 16, 2026

Uh oh!

codecov bot commented Mar 16, 2026

Codecov Report

Uh oh!

alinarublea left a comment

Choose a reason for hiding this comment

PR Review: feat: ad-intent-mismatch v3 — WSIS scoring, URL exclusion, eviction capping

Bugs

Other Concerns

Positive Notes

Priority Fixes Before Merge

Uh oh!

danieljchuser commented Mar 16, 2026

Fixed

Not fixing

Uh oh!

alinarublea commented Mar 16, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

danieljchuser commented Mar 13, 2026 •

edited

Loading

PR Review: `feat: ad-intent-mismatch v3 — WSIS scoring, URL exclusion, eviction capping`