feat: ad-intent-mismatch v3 — WSIS scoring, URL exclusion, eviction capping#2133
feat: ad-intent-mismatch v3 — WSIS scoring, URL exclusion, eviction capping#2133danieljchuser wants to merge 6 commits intomainfrom
Conversation
…apping - Replace flat pageView threshold with WSIS (Wasted Spend Impact Score) composite scoring - Add URL pattern exclusion for support/checkout/legal/auth pages - Fetch paid pages from S3 for CPC/traffic enrichment - Raise PAGE_VIEW_THRESHOLD to 5000, CUT_OFF_BOUNCE_RATE to 0.5 - Add eviction-based opportunity capping (max 4 per recommendation type) - Pass enriched fields (cpc, sumTraffic, topKeyword, serpTitle, engagedScrollRate, priorityScore) to Mystique - Remove step4 (importTrafficAnalysisWeekStep4) from audit pipeline Related: Mystique PR for companion LLM-side changes (content validation, stealth screenshots, prompt hardening) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…tching Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Remove unreachable || [] fallback on line 576 (runner always returns predominantlyPaidPages). Add tests for || fallbacks in guidance-handler lines 101-103 (suggestions and sumTraffic defaults). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
This PR will trigger a minor release when merged. |
Codecov Report✅ All modified and coverable lines are covered by tests. 📢 Thoughts on this report? Let us know! |
alinarublea
left a comment
There was a problem hiding this comment.
PR Review: feat: ad-intent-mismatch v3 — WSIS scoring, URL exclusion, eviction capping
Overall the PR is well-structured with good test coverage, but there are a few real bugs and several concerns worth addressing.
Bugs
1. URL exclusion patterns miss no-trailing-slash URLs (Medium)
All 10 regex patterns require a trailing / (e.g., /checkout/). URLs like https://example.com/checkout (no trailing slash) won't be excluded. Fix by replacing trailing / with (\/|$):
```js
// Before
//(cart|checkout|order|payment)//i
// After
//(cart|checkout|order|payment)(/|$)/i
```
No test covers this case, so the bug is also invisible in the test suite.
2. Non-array S3 response causes unhandled TypeError (Medium)
fetchPaidPagesFromS3: if the parsed JSON is {} instead of an array, for (const page of pages) throws TypeError: pages is not iterable. Add an Array.isArray(pages) guard before the loop.
3. NaN from invalid AD_INTENT_MAX_PAGES silently sends 0 messages (Medium)
parseInt('abc', 10) → NaN, and slice(0, NaN) returns []. The audit would send no messages without any error. Should fall back to default 10 when the env var is unparseable.
4. Evicted opportunity double-setStatus('IGNORED') (Low)
After eviction, sameTypeOpportunities still contains the evicted entry. If the evicted opportunity shares the same URL as the incoming one, the URL de-duplication code will call setStatus('IGNORED') on it a second time. Filter out the evicted opportunity from sameTypeOpportunities before the URL de-duplication step.
5. Missing null-guard on auditResult.predominantlyPaidPages (Low)
The /* c8 ignore next 2 */ + || [] fallback was removed. If auditResult doesn't have this key, searchPages.length will throw. The defensive fallback should stay.
Other Concerns
WSIS formula — magic number /1000 (Low)
computePriorityScore divides by 1000 with no explanation. This is directly coupled to the > 0.01 filter cutoff — if either changes, the other must too. Both should be named constants with a comment explaining the relationship.
isExcludedPageType — /results/ is very broad
Would match /ab-test-results/, analytics dashboards, etc. Consider a more specific pattern.
S3 failure is unobservable (Low)
On S3 fetch error the audit silently returns {} with no metric/alert emitted. This makes the silent termination hard to detect in production.
save() vs saveMany() inconsistency
Eviction calls .save() directly; URL de-duplication uses Opportunity.saveMany. Should be consistent.
MAX_PAGES defined inline in function scope
Inconsistent with how other thresholds are handled via getConfig. Should be at module scope or in config.
Positive Notes
- Small, single-purpose helper functions (
inferRecommendationType,getEvictionScore,findLowestEvictionCandidate,isExcludedPageType) are cleanly factored - Full JSDoc on all new functions
- The "full pipeline integration" test at the end of
handler.test.jsis excellent - Eviction capping tests cover all key branches well
- Exporting internal functions for unit testing is good practice
Priority Fixes Before Merge
- Fix URL exclusion regex to match terminal path segments (no trailing slash)
- Add
Array.isArrayguard infetchPaidPagesFromS3 - Handle
NaNforAD_INTENT_MAX_PAGES - Re-add
|| []fallback forauditResult.predominantlyPaidPages - Exclude evicted opportunity from the URL de-duplication set
🤖 Reviewed with Claude Code
- Replace read-then-evict pattern with create-first, reconcile-after approach to handle concurrent guidance callbacks safely - Only cap audit_required opportunities (max 4 per site); never remove modify_heading opportunities (those with variationChanges) - Remove newest excess audit_required opportunities and their suggestions via removeByIds (not just IGNORED status) - Use Opportunity.allBySiteIdAndStatus to filter at DB level (M1) - fetchPaidPagesFromS3 returns null for empty data instead of throwing, with separate log levels for errors vs no-data (M2) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Fix URL exclusion regexes to match terminal path segments without trailing slash (e.g. /checkout) by replacing \/ with (\/|$) - Add Array.isArray guard in fetchPaidPagesFromS3 to handle non-array S3 responses gracefully instead of throwing TypeError - Add comment explaining the /1000 normalization in computePriorityScore Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
Addressed the review in 847e2ca. Here's the breakdown: FixedBug 1 (URL exclusion trailing-slash): Valid. All 10 regex patterns now use Bug 2 (Non-array S3 response): Valid. Added Concern 1 (/1000 magic number): Added comment explaining the normalization scale and its relationship to the Not fixingBug 3 (NaN from AD_INTENT_MAX_PAGES): The actual behavior is the opposite of what the review states — Bug 4 (Double-setStatus): Incorrect. The code already uses Bug 5 (Missing || [] fallback): Concern 2 (/results/ regex): With the Bug 1 fix, the pattern matches Concern 3 (S3 failure unobservable): The error is logged. Adding metrics/alerting is out of scope for this PR. Concern 4 (save vs saveMany): Each context uses the appropriate method — Concern 5 (MAX_PAGES inline): Intentional — it's runtime-configurable via env var, unlike the compile-time constants at module scope. |
|
Thanks for the detailed breakdown — here's where I land after checking the code: Accepted explanations:
Still a concern:
All other concerns noted — no further action needed from my side on those. |
## Summary - Short-circuit the ad-intent-mismatch guidance handler to return `ok()` immediately, preventing opportunity/suggestion creation - This allows E2E testing of the v3 pipeline without polluting dev with test opportunities ## Context The `feat/ad-intent-mismatch-v3` PR (#2133) is being E2E tested on a personal Lambda alias. This patch prevents the **currently deployed** dev Lambda from creating opportunities when it processes guidance responses triggered during testing. **This is temporary** — the `feat/ad-intent-mismatch-v3` PR will overwrite both files completely when merged. ## Test plan - [x] 2 tests pass covering the short-circuit behavior - [x] 100% coverage on guidance-handler.js - [x] ESLint passes 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Summary
Companion PR
This PR must be deployed together with its companion Mystique PR:
feat/ad-intent-mismatch-v3)Deploy order: Mystique first (backward-compatible), then this audit-worker PR.
Test plan
npm test)🤖 Generated with Claude Code