Closed
Description
Why
Storing both original and cleaned representations enables re-extraction and auditing
Definition of Done
- New columns exist for cleaned text, cleaned HTML, language, extracted_at timestamp, and checksum
- Checksums prevent duplicate writes when content has not changed
- Large bodies are stored efficiently and streamed to the database
- Database constraints protect referential integrity
- Migration is idempotent and reversible
- Unit tests cover insert, update and no-op when checksum matches
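A minimal sketch of what an idempotent, reversible migration could look like, using Python's built-in `sqlite3` module purely for illustration. The `content` table layout, the column types, the `original_html` column, the parent `items(id)` table, and the `connect`/`upgrade`/`downgrade` helper names are all assumptions; the issue does not specify the database engine or the exact schema.

```python
import sqlite3


def connect(path):
    """Open a connection with foreign-key enforcement enabled (off by default in SQLite)."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")
    return conn


def upgrade(conn):
    """Create the content table and its checksum index; safe to run repeatedly."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS content (
            -- Assumed parent table: items(id). The primary key doubles as the
            -- index on the item identifier.
            item_id       INTEGER PRIMARY KEY
                          REFERENCES items(id) ON DELETE CASCADE,
            original_html TEXT,           -- assumed name for the original representation
            cleaned_text  TEXT,
            cleaned_html  TEXT,
            language      TEXT,
            extracted_at  TEXT,           -- ISO 8601 timestamp stored as text
            checksum      TEXT NOT NULL
        )
        """
    )
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_content_checksum ON content(checksum)"
    )
    conn.commit()


def downgrade(conn):
    """Reverse the migration; also safe to run repeatedly."""
    conn.execute("DROP INDEX IF EXISTS idx_content_checksum")
    conn.execute("DROP TABLE IF EXISTS content")
    conn.commit()
```

The `IF NOT EXISTS` and `IF EXISTS` guards are what make repeated runs harmless, and the foreign key on `item_id` is one way to satisfy the referential-integrity requirement.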
Tasks
- Database migration to add content table and needed columns
- Define data access functions to upsert by item_id and checksum
- Stream large payloads to avoid excessive memory usage
- Compute checksum from normalized text
- Add index on item identifier and on checksum
- Write unit tests for persistence behavior
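The checksum and upsert tasks above can be sketched roughly as follows, again with `sqlite3` as a stand-in database. The normalization rule (Unicode NFC plus whitespace collapsing), the choice of SHA-256, the reduced `content` schema, and the function names `normalize`, `compute_checksum`, and `upsert_content` are assumptions made for illustration, not decisions recorded in this issue.

```python
import hashlib
import sqlite3
import unicodedata


def normalize(text):
    """Assumed normalization: Unicode NFC, then collapse runs of whitespace."""
    return " ".join(unicodedata.normalize("NFC", text).split())


def compute_checksum(text):
    """SHA-256 hex digest of the normalized text."""
    return hashlib.sha256(normalize(text).encode("utf-8")).hexdigest()


def upsert_content(conn, item_id, cleaned_text):
    """Insert or update a row; return 'insert', 'update', or 'noop'."""
    digest = compute_checksum(cleaned_text)
    row = conn.execute(
        "SELECT checksum FROM content WHERE item_id = ?", (item_id,)
    ).fetchone()

    if row is not None and row[0] == digest:
        # Content unchanged: skip the write entirely.
        return "noop"

    with conn:  # commits on success, rolls back on error
        if row is None:
            conn.execute(
                "INSERT INTO content (item_id, cleaned_text, checksum) "
                "VALUES (?, ?, ?)",
                (item_id, cleaned_text, digest),
            )
            return "insert"
        conn.execute(
            "UPDATE content SET cleaned_text = ?, checksum = ? WHERE item_id = ?",
            (cleaned_text, digest, item_id),
        )
        return "update"


def make_demo_db():
    """Hypothetical minimal schema so the sketch runs on its own."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE content ("
        "item_id INTEGER PRIMARY KEY, cleaned_text TEXT, checksum TEXT NOT NULL)"
    )
    return conn
```

The three return values map directly onto the insert, update, and no-op cases that the unit tests are expected to cover. Because the checksum is taken over normalized text, inputs that differ only in whitespace are treated as unchanged under this particular normalization. The sketch reads the whole body into memory, so it does not address the streaming task.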
Labels
enhancement (New feature or request)