Skip to content

feat: ad-intent-mismatch v3 — WSIS scoring, URL exclusion, eviction capping#2133

Open
danieljchuser wants to merge 6 commits intomainfrom
feat/ad-intent-mismatch-v3
Open

feat: ad-intent-mismatch v3 — WSIS scoring, URL exclusion, eviction capping#2133
danieljchuser wants to merge 6 commits intomainfrom
feat/ad-intent-mismatch-v3

Conversation

@danieljchuser
Copy link
Collaborator

@danieljchuser danieljchuser commented Mar 13, 2026

Summary

  • Replace flat pageView threshold with WSIS (Wasted Spend Impact Score) composite scoring using CPC, pageViews, bounceRate, and engagedScrollRate
  • Add URL pattern exclusion for support/checkout/legal/auth pages (10 regex patterns)
  • Fetch paid pages from S3 for CPC/traffic enrichment data
  • Raise PAGE_VIEW_THRESHOLD to 5000 (was 1000), CUT_OFF_BOUNCE_RATE to 0.5 (was 0.3)
  • Add eviction-based opportunity capping (max 4 per recommendation type per site)
  • Pass enriched fields (cpc, sumTraffic, topKeyword, serpTitle, engagedScrollRate, priorityScore) to Mystique via SQS
  • Remove step4 (importTrafficAnalysisWeekStep4) from audit pipeline

Companion PR

This PR must be deployed together with its companion Mystique PR:

Deploy order: Mystique first (backward-compatible), then this audit-worker PR.

Test plan

  • All 128 existing + new tests pass (npm test)
  • 100% coverage maintained on all modified files
  • ESLint passes
  • CI pipeline passes
  • E2E validation with Mystique companion PR in dev environment

🤖 Generated with Claude Code

danieljchuser and others added 3 commits March 13, 2026 13:39
…apping

- Replace flat pageView threshold with WSIS (Wasted Spend Impact Score) composite scoring
- Add URL pattern exclusion for support/checkout/legal/auth pages
- Fetch paid pages from S3 for CPC/traffic enrichment
- Raise PAGE_VIEW_THRESHOLD to 5000, CUT_OFF_BOUNCE_RATE to 0.5
- Add eviction-based opportunity capping (max 4 per recommendation type)
- Pass enriched fields (cpc, sumTraffic, topKeyword, serpTitle, engagedScrollRate, priorityScore) to Mystique
- Remove step4 (importTrafficAnalysisWeekStep4) from audit pipeline

Related: Mystique PR for companion LLM-side changes (content validation, stealth screenshots, prompt hardening)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…tching

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Remove unreachable || [] fallback on line 576 (runner always returns
predominantlyPaidPages). Add tests for || fallbacks in guidance-handler
lines 101-103 (suggestions and sumTraffic defaults).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@github-actions
Copy link

This PR will trigger a minor release when merged.

@codecov
Copy link

codecov bot commented Mar 16, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.

📢 Thoughts on this report? Let us know!

@danieljchuser danieljchuser self-assigned this Mar 16, 2026
@danieljchuser danieljchuser added enhancement New feature or request ai-native labels Mar 16, 2026
Copy link
Contributor

@alinarublea alinarublea left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Review: feat: ad-intent-mismatch v3 — WSIS scoring, URL exclusion, eviction capping

Overall the PR is well-structured with good test coverage, but there are a few real bugs and several concerns worth addressing.


Bugs

1. URL exclusion patterns miss no-trailing-slash URLs (Medium)
All 10 regex patterns require a trailing / (e.g., /checkout/). URLs like https://example.com/checkout (no trailing slash) won't be excluded. Fix by replacing trailing / with (\/|$):
```js
// Before
//(cart|checkout|order|payment)//i
// After
//(cart|checkout|order|payment)(/|$)/i
```
No test covers this case, so the bug is also invisible in the test suite.

2. Non-array S3 response causes unhandled TypeError (Medium)
fetchPaidPagesFromS3: if the parsed JSON is {} instead of an array, for (const page of pages) throws TypeError: pages is not iterable. Add an Array.isArray(pages) guard before the loop.

3. NaN from invalid AD_INTENT_MAX_PAGES silently sends 0 messages (Medium)
parseInt('abc', 10)NaN, and slice(0, NaN) returns []. The audit would send no messages without any error. Should fall back to default 10 when the env var is unparseable.

4. Evicted opportunity double-setStatus('IGNORED') (Low)
After eviction, sameTypeOpportunities still contains the evicted entry. If the evicted opportunity shares the same URL as the incoming one, the URL de-duplication code will call setStatus('IGNORED') on it a second time. Filter out the evicted opportunity from sameTypeOpportunities before the URL de-duplication step.

5. Missing null-guard on auditResult.predominantlyPaidPages (Low)
The /* c8 ignore next 2 */ + || [] fallback was removed. If auditResult doesn't have this key, searchPages.length will throw. The defensive fallback should stay.


Other Concerns

WSIS formula — magic number /1000 (Low)
computePriorityScore divides by 1000 with no explanation. This is directly coupled to the > 0.01 filter cutoff — if either changes, the other must too. Both should be named constants with a comment explaining the relationship.

isExcludedPageType/results/ is very broad
Would match /ab-test-results/, analytics dashboards, etc. Consider a more specific pattern.

S3 failure is unobservable (Low)
On S3 fetch error the audit silently returns {} with no metric/alert emitted. This makes the silent termination hard to detect in production.

save() vs saveMany() inconsistency
Eviction calls .save() directly; URL de-duplication uses Opportunity.saveMany. Should be consistent.

MAX_PAGES defined inline in function scope
Inconsistent with how other thresholds are handled via getConfig. Should be at module scope or in config.


Positive Notes

  • Small, single-purpose helper functions (inferRecommendationType, getEvictionScore, findLowestEvictionCandidate, isExcludedPageType) are cleanly factored
  • Full JSDoc on all new functions
  • The "full pipeline integration" test at the end of handler.test.js is excellent
  • Eviction capping tests cover all key branches well
  • Exporting internal functions for unit testing is good practice

Priority Fixes Before Merge

  1. Fix URL exclusion regex to match terminal path segments (no trailing slash)
  2. Add Array.isArray guard in fetchPaidPagesFromS3
  3. Handle NaN for AD_INTENT_MAX_PAGES
  4. Re-add || [] fallback for auditResult.predominantlyPaidPages
  5. Exclude evicted opportunity from the URL de-duplication set

🤖 Reviewed with Claude Code

danieljchuser and others added 2 commits March 16, 2026 15:30
- Replace read-then-evict pattern with create-first, reconcile-after
  approach to handle concurrent guidance callbacks safely
- Only cap audit_required opportunities (max 4 per site); never remove
  modify_heading opportunities (those with variationChanges)
- Remove newest excess audit_required opportunities and their
  suggestions via removeByIds (not just IGNORED status)
- Use Opportunity.allBySiteIdAndStatus to filter at DB level (M1)
- fetchPaidPagesFromS3 returns null for empty data instead of
  throwing, with separate log levels for errors vs no-data (M2)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Fix URL exclusion regexes to match terminal path segments without
  trailing slash (e.g. /checkout) by replacing \/ with (\/|$)
- Add Array.isArray guard in fetchPaidPagesFromS3 to handle non-array
  S3 responses gracefully instead of throwing TypeError
- Add comment explaining the /1000 normalization in computePriorityScore

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@danieljchuser
Copy link
Collaborator Author

Addressed the review in 847e2ca. Here's the breakdown:

Fixed

Bug 1 (URL exclusion trailing-slash): Valid. All 10 regex patterns now use (\/|$) instead of \/ so terminal path segments like /checkout are excluded. Test added.

Bug 2 (Non-array S3 response): Valid. Added Array.isArray(pages) guard in fetchPaidPagesFromS3 — returns null on non-array JSON instead of throwing TypeError. Test added.

Concern 1 (/1000 magic number): Added comment explaining the normalization scale and its relationship to the > 0.01 filter threshold.

Not fixing

Bug 3 (NaN from AD_INTENT_MAX_PAGES): The actual behavior is the opposite of what the review states — NaN > 0 evaluates to false, so slice(0, undefined) returns all pages, not zero. Only triggers if someone sets the env var to garbage. Not worth adding complexity for.

Bug 4 (Double-setStatus): Incorrect. The code already uses ignoredIds (line 168-169) to filter out evicted entries before reconciliation. No double-call occurs.

Bug 5 (Missing || [] fallback): paidKeywordOptimizerRunner always sets predominantlyPaidPages in its return value, so the fallback branch is unreachable. Adding it would create dead code that can't be covered by tests. Skipped.

Concern 2 (/results/ regex): With the Bug 1 fix, the pattern matches /results as a complete path segment. /ab-test-results won't match because results must be preceded by /.

Concern 3 (S3 failure unobservable): The error is logged. Adding metrics/alerting is out of scope for this PR.

Concern 4 (save vs saveMany): Each context uses the appropriate method — site.save() for a single site, Opportunity.saveMany() for bulk updates. No inconsistency.

Concern 5 (MAX_PAGES inline): Intentional — it's runtime-configurable via env var, unlike the compile-time constants at module scope.

@alinarublea
Copy link
Contributor

Thanks for the detailed breakdown — here's where I land after checking the code:

Accepted explanations:

  • Bug 3 (NaN from AD_INTENT_MAX_PAGES): Correction accepted. I had the wrong mental model — parseInt('abc', 10) || 10 doesn't apply because 'abc' is truthy so it's passed to parseInt. But you're right that NaN > 0 === false means it falls through to slice(0, undefined), returning all pages. The actual behavior is benign, not zero-messages. My bug report was incorrect.

  • Bug 4 (double-setStatus): Accepted. The ignoredIds set correctly filters out evicted entries from sameTypeOpportunities before reconcileOpportunities is called. No double-call is possible.

Still a concern:

  • Bug 5 (missing || [] fallback): The argument "it's always set, so the fallback is dead code" doesn't hold — that's exactly what the removed /* c8 ignore next 2 */ comment was for. The old code explicitly acknowledged this contract ("paidKeywordOptimizerRunner always returns an array") while keeping the fallback as defense-in-depth. The new code does const searchPages = auditResult.predominantlyPaidPages and then immediately accesses searchPages.length — if the contract ever breaks, this throws a TypeError rather than failing gracefully. An explicit null-check (if (!searchPages) { return {}; }) would be both testable and safe. Please add this back.

All other concerns noted — no further action needed from my side on those.

danieljchuser added a commit that referenced this pull request Mar 17, 2026
## Summary

- Short-circuit the ad-intent-mismatch guidance handler to return `ok()`
immediately, preventing opportunity/suggestion creation
- This allows E2E testing of the v3 pipeline without polluting dev with
test opportunities

## Context

The `feat/ad-intent-mismatch-v3` PR (#2133) is being E2E tested on a
personal Lambda alias. This patch prevents the **currently deployed**
dev Lambda from creating opportunities when it processes guidance
responses triggered during testing.

**This is temporary** — the `feat/ad-intent-mismatch-v3` PR will
overwrite both files completely when merged.

## Test plan

- [x] 2 tests pass covering the short-circuit behavior
- [x] 100% coverage on guidance-handler.js
- [x] ESLint passes

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
solaris007 pushed a commit that referenced this pull request Mar 17, 2026
## [1.358.1](v1.358.0...v1.358.1) (2026-03-17)

### Bug Fixes

* temporarily disable ad-intent-mismatch opportunity creation ([#2147](#2147)) ([73e0db6](73e0db6)), closes [#2133](#2133)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ai-native enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants