feat(prerender): Phase 2 — daily batching, PageCitability writes, suggestion staleness #2146
Open
ssilare-adobe wants to merge 1 commit into main
…gestion staleness

Phase 2b: increase TOP_AGENTIC_URLS_LIMIT from 200 to 2000 to match page-citability coverage.

Phase 2c (daily batching): filter out agentic URLs already processed within the last 7 days using Suggestion updatedAt timestamps; cap to 300 URLs/day (DAILY_BATCH_SIZE); include organic URLs only on the first batch of each 7-day cycle.

Phase 2c (PageCitability writes): after comparing HTML in step 3, write citability metrics to the PageCitability entity for every successfully scraped URL. This lets the page-citability audit detect recently processed URLs via its 7-day staleness filter, eliminating duplicate scraping across both audits (300 sites × 300 URLs/day = 90k pages/day vs 180k with both audits running).

Merged analyzeHtmlForPrerender and calculateCitabilityScore into a single calculateStats call (html-comparator.js), eliminating a redundant HTML analysis per URL.

Phase 2d: pass stalenessDays=7 to syncSuggestions so suggestions for URLs outside the current daily batch are only marked OUTDATED after 7 days, aligning with the rolling batching cycle.

Also commented out the branch-deploy CI job to prevent dev deployments on every PR push while this branch is in review.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
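The daily batching described in the commit message can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: only DAILY_BATCH_SIZE and the 7-day window come from the description, while `selectDailyBatch`, the `{ url, updatedAt }` suggestion shape, and the rule used to detect the "first batch of the cycle" (no agentic URL has a suggestion updated inside the window) are assumptions made for the example.

```javascript
// Hypothetical sketch of the daily batching step. Only DAILY_BATCH_SIZE and
// the 7-day window are taken from the PR description; everything else is assumed.
const DAILY_BATCH_SIZE = 300;
const CYCLE_DAYS = 7;
const DAY_MS = 24 * 60 * 60 * 1000;

/**
 * @param {string[]} agenticUrls ranked agentic URLs
 * @param {string[]} organicUrls organic URLs
 * @param {{url: string, updatedAt: string}[]} suggestions existing suggestions (assumed shape)
 * @param {Date} now injected so the function is testable
 */
function selectDailyBatch(agenticUrls, organicUrls, suggestions, now = new Date()) {
  const cutoff = now.getTime() - CYCLE_DAYS * DAY_MS;

  // URLs whose suggestion was updated inside the current 7-day window.
  const recentlyProcessed = new Set(
    suggestions
      .filter((s) => Date.parse(s.updatedAt) >= cutoff)
      .map((s) => s.url),
  );

  // Drop recently processed URLs, then cap what is left at the daily limit.
  const pending = agenticUrls.filter((url) => !recentlyProcessed.has(url));
  const agentic = pending.slice(0, DAILY_BATCH_SIZE);

  // Assumption: the cycle's first batch is the one where no agentic URL
  // has been processed inside the window yet.
  const isFirstBatchOfCycle = pending.length === agenticUrls.length;

  return {
    agentic,
    organic: isFirstBatchOfCycle ? organicUrls : [],
    remaining: pending.length - agentic.length,
  };
}
```

Under these assumptions, a site with 2000 agentic URLs would be covered in seven daily runs (six full batches of 300 plus one of 200), which fits inside the 7-day cycle.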
Summary
- Phase 2b: TOP_AGENTIC_URLS_LIMIT from 200 → 2000 to match page-citability's agentic URL coverage
- Phase 2c (daily batching): filter out agentic URLs already processed within the last 7 days (via Suggestion updatedAt), cap to 300 URLs/day (DAILY_BATCH_SIZE), include organic URLs only on the first batch of each 7-day cycle
- Phase 2c (PageCitability writes): write citability metrics to the PageCitability entity for every successfully scraped URL; this enables the page-citability audit to detect recently processed URLs via its 7-day staleness filter, preventing duplicate scraping (300 sites × 300 URLs = 90k pages/day vs 180k with both audits running)
- Merged analyzeHtmlForPrerender + calculateCitabilityScore: both called calculateStats(html, html, true); now a single call in html-comparator.js returns both prerender and citability metrics
- Phase 2d: pass stalenessDays: 7 to syncSuggestions so suggestions outside the current daily batch are only marked OUTDATED after 7 days, aligning with the rolling cycle
- CI: commented out the branch-deploy job to prevent dev deployments on every PR push during review

Post-deployment steps (no code changes)

- prerender job interval from every-sunday → daily in the Configuration entity via API
- page-citability audit in Configuration: it will see all PageCitability records written by prerender and skip everything

Test plan

- PageCitability records are created/updated in DynamoDB for prerender-audited URLs after deploy

🤖 Generated with Claude Code
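For reviewers, the Phase 2d staleness rule can be illustrated with a small sketch. This is not the real syncSuggestions implementation: `findOutdatedSuggestions` and the `{ url, updatedAt }` suggestion shape are hypothetical, and the sketch assumes a suggestion's age is measured from its updatedAt timestamp. It only shows the intended effect of stalenessDays: a suggestion missing from today's batch is left alone until it has gone 7 days without an update.

```javascript
// Hypothetical sketch of the stalenessDays rule; not the actual
// syncSuggestions code. The suggestion shape is assumed.
const MS_PER_DAY = 24 * 60 * 60 * 1000;

/**
 * Returns the suggestions that would be marked OUTDATED: those whose URL
 * is absent from the current batch AND whose last update is older than
 * stalenessDays.
 */
function findOutdatedSuggestions(existing, currentBatchUrls, stalenessDays = 7, now = new Date()) {
  const inBatch = new Set(currentBatchUrls);
  const cutoff = now.getTime() - stalenessDays * MS_PER_DAY;

  return existing.filter(
    (s) => !inBatch.has(s.url) && Date.parse(s.updatedAt) < cutoff,
  );
}

// Usage: only the suggestion untouched for more than 7 days is flagged.
const flagged = findOutdatedSuggestions(
  [
    { url: 'https://example.com/a', updatedAt: '2025-01-14T00:00:00Z' }, // recent, not in batch
    { url: 'https://example.com/b', updatedAt: '2025-01-01T00:00:00Z' }, // old, not in batch
    { url: 'https://example.com/c', updatedAt: '2025-01-01T00:00:00Z' }, // old, but in batch
  ],
  ['https://example.com/c'],
  7,
  new Date('2025-01-15T00:00:00Z'),
);
console.log(flagged.map((s) => s.url));
```

Without the staleness window, every suggestion outside the 300-URL daily batch would be invalidated on each run, which would defeat the rolling cycle.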