feat(prerender): Phase 2 — daily batching, PageCitability writes, suggestion staleness #2146

Open
ssilare-adobe wants to merge 1 commit into main from feat/prerender-phase2-citability-writes

@ssilare-adobe
Contributor

Summary

  • Phase 2b: Increase TOP_AGENTIC_URLS_LIMIT from 200 → 2000 to match page-citability's agentic URL coverage
  • Phase 2c (batching): Skip agentic URLs already processed within the last 7 days (based on Suggestion updatedAt), cap each run at 300 URLs/day (DAILY_BATCH_SIZE), and include organic URLs only in the first batch of each 7-day cycle
  • Phase 2c (PageCitability writes): After the HTML comparison in step 3, write citability metrics to the PageCitability entity for every successfully scraped URL. The page-citability audit's 7-day staleness filter then treats those URLs as recently processed, which prevents duplicate scraping (300 sites × 300 URLs = 90k pages/day, vs 180k with both audits running)
  • Merged analyzeHtmlForPrerender + calculateCitabilityScore: Both called calculateStats(html, html, true); a single call in html-comparator.js now returns both the prerender and citability metrics
  • Phase 2d: Pass stalenessDays: 7 to syncSuggestions so suggestions outside the current daily batch are only marked OUTDATED after 7 days, aligning with the rolling cycle
  • CI: Commented out branch-deploy job to prevent dev deployments on every PR push during review

Post-deployment steps (no code changes)

  1. Phase 2a: Update the prerender job interval from every-sunday → daily in the Configuration entity via API
  2. Stop page-citability: Once this change is deployed and stable, disable the page-citability audit in Configuration. By then it will already be skipping every URL, because it sees the PageCitability records written by prerender

Test plan

  • All 159 prerender handler tests pass
  • All 86 data-access tests pass
  • Lint clean for modified files
  • Verify PageCitability records are created/updated in DynamoDB for prerender-audited URLs after deploy
  • Verify page-citability audit logs "No URLs to analyze" for sites where prerender recently ran

🤖 Generated with Claude Code

…gestion staleness

Phase 2b: increase TOP_AGENTIC_URLS_LIMIT from 200 to 2000 to match
page-citability coverage.

Phase 2c — daily batching: filter agentic URLs already processed within
the last 7 days using Suggestion updatedAt timestamps; cap to 300 URLs/day
(DAILY_BATCH_SIZE); include organic URLs only on the first batch of each
7-day cycle.
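
The batching rule above can be sketched roughly as follows. This is an illustration only, not the handler code from this PR: `selectDailyBatch`, the `{ url, updatedAt }` suggestion shape, and the way the "first batch of a cycle" is detected (no agentic URL processed inside the window) are assumptions. Only DAILY_BATCH_SIZE = 300 and the 7-day window come from the description.

```javascript
// Values taken from the PR description.
const DAILY_BATCH_SIZE = 300;
const CYCLE_DAYS = 7;
const DAY_MS = 24 * 60 * 60 * 1000;

/**
 * Hypothetical helper: pick today's URLs for the prerender audit.
 * suggestions: [{ url, updatedAt }] where updatedAt is an ISO timestamp.
 */
function selectDailyBatch(agenticUrls, organicUrls, suggestions, now = new Date()) {
  const cutoff = now.getTime() - CYCLE_DAYS * DAY_MS;

  // URLs whose suggestion was updated inside the 7-day window.
  const recentlyProcessed = new Set(
    suggestions
      .filter((s) => new Date(s.updatedAt).getTime() >= cutoff)
      .map((s) => s.url),
  );

  // Drop recently processed agentic URLs, then cap to the daily batch size.
  const agentic = agenticUrls
    .filter((url) => !recentlyProcessed.has(url))
    .slice(0, DAILY_BATCH_SIZE);

  // Assumption: a new cycle starts when nothing agentic was processed in the window.
  const isFirstBatch = !agenticUrls.some((url) => recentlyProcessed.has(url));

  return { agentic, organic: isFirstBatch ? [...organicUrls] : [], isFirstBatch };
}
```

The sketch returns agentic and organic URLs separately because the description does not say whether organic URLs count toward the 300-URL cap.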

Phase 2c — PageCitability writes: after comparing HTML in step 3, write
citability metrics to the PageCitability entity for every successfully
scraped URL. This enables the page-citability audit to detect
recently-processed URLs via its 7-day staleness filter, eliminating
duplicate scraping across both audits (300 sites × 300 URLs/day = 90k
pages vs 180k with both audits running).
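
A minimal sketch of how the write and the downstream staleness filter could interact. A plain `Map` stands in for the PageCitability entity; `writeCitability`, `urlsNeedingAnalysis`, and the `{ url, scraped, metrics }` result shape are hypothetical names for illustration, not the data-access API.

```javascript
const STALENESS_DAYS = 7; // window named in the PR description
const DAY_MS = 24 * 60 * 60 * 1000;

// Producer side (prerender): record metrics for every successfully scraped URL.
// `store` is a Map standing in for the PageCitability entity.
function writeCitability(store, results, now = new Date()) {
  for (const result of results) {
    if (!result.scraped) continue; // failed scrapes leave no record
    store.set(result.url, {
      url: result.url,
      metrics: result.metrics,
      updatedAt: now.toISOString(),
    });
  }
  return store;
}

// Consumer side (page-citability): keep only URLs with no record
// or with a record older than the staleness window.
function urlsNeedingAnalysis(store, urls, now = new Date()) {
  const cutoff = now.getTime() - STALENESS_DAYS * DAY_MS;
  return urls.filter((url) => {
    const record = store.get(url);
    return !record || new Date(record.updatedAt).getTime() < cutoff;
  });
}
```

With 300 sites at 300 URLs per day, one audit scrapes 300 × 300 = 90,000 pages; two audits scraping the same pages independently would double that to 180,000.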

Merged analyzeHtmlForPrerender and calculateCitabilityScore into a single
calculateStats call (html-comparator.js), eliminating a redundant HTML
analysis per URL.
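
The shape of that merge might look like the sketch below. The `calculateStats(html, html, true)` call is as quoted in the PR description; its return value is not documented here, so the function and the two mapping callbacks are injected as parameters, and `analyzeHtmlOnce` is a hypothetical name.

```javascript
/**
 * Hypothetical wrapper: run the HTML analysis once and derive both
 * metric sets from the same stats object.
 */
function analyzeHtmlOnce(html, calculateStats, toPrerenderMetrics, toCitabilityMetrics) {
  // One analysis pass per URL instead of one per consumer.
  const stats = calculateStats(html, html, true);
  return {
    prerender: toPrerenderMetrics(stats),
    citability: toCitabilityMetrics(stats),
  };
}
```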

Phase 2d: pass stalenessDays=7 to syncSuggestions so suggestions for URLs
outside the current daily batch are only marked OUTDATED after 7 days,
aligning with the rolling batching cycle.
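
The staleness rule can be illustrated with a small stand-alone function. This is a sketch under assumptions, not the syncSuggestions implementation: `resolveStatuses` and the `{ url, status, updatedAt }` shape are hypothetical, and only the OUTDATED status and the 7-day value come from the description.

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;

/**
 * Hypothetical helper: decide which existing suggestions become OUTDATED.
 * A suggestion missing from today's batch is only outdated once it is
 * older than `stalenessDays`.
 */
function resolveStatuses(existing, currentBatchUrls, stalenessDays, now = new Date()) {
  const cutoff = now.getTime() - stalenessDays * DAY_MS;
  const inBatch = new Set(currentBatchUrls);

  return existing.map((suggestion) => {
    // Suggestions in the current batch are refreshed by this run; leave them alone.
    if (inBatch.has(suggestion.url)) return suggestion;

    const isStale = new Date(suggestion.updatedAt).getTime() < cutoff;
    return isStale ? { ...suggestion, status: 'OUTDATED' } : suggestion;
  });
}
```

Without the staleness window, every URL outside the day's 300-URL batch would be marked OUTDATED on each run, even though it is simply waiting for its turn in the rolling cycle.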

Also commented out branch-deploy CI job to prevent dev deployments on
every PR push while this branch is in review.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>