Closed
Description
Why
The fetcher must retrieve the source document for each saved item. This requires robust network handling and content normalization.
Definition of Done
- Given an item identifier and a URL, the fetcher downloads the document with timeouts and redirects handled
- Compressed responses are supported. Character encoding is detected and normalized to UTF-8
- robots.txt rules and common anti-bot headers are respected where feasible
- Failures are mapped to retryable vs non-retryable categories
- Unit tests stub network calls and cover timeouts, redirects, bad certificates, and content encodings
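A minimal sketch of the compression and character-encoding criteria, using only the Python standard library. The function names `decompress`, `detect_charset`, and `decode_body` are hypothetical, not taken from the project. Brotli (`br`) is deliberately left out because it is not in the standard library, and the precedence order (byte-order mark, then the `Content-Type` charset, then a `<meta charset>` in the first 1024 bytes, then a UTF-8 fallback) is an assumption loosely modeled on browser encoding sniffing.

```python
import codecs
import gzip
import re
import zlib
from email.message import Message
from typing import Optional, Tuple

# Byte-order marks take precedence over any declared charset.
_BOMS = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

# Matches both <meta charset="..."> and the older http-equiv form.
_META_CHARSET = re.compile(
    rb"""<meta[^>]+charset\s*=\s*["']?\s*([A-Za-z0-9_.:-]+)""", re.IGNORECASE
)


def decompress(raw: bytes, content_encoding: Optional[str]) -> bytes:
    """Undo gzip/deflate Content-Encoding; Brotli is not handled (not in stdlib)."""
    codings = [c.strip().lower() for c in (content_encoding or "").split(",") if c.strip()]
    # Codings are listed in the order they were applied, so undo them in reverse.
    for coding in reversed(codings):
        if coding in ("gzip", "x-gzip"):
            raw = gzip.decompress(raw)
        elif coding == "deflate":
            try:
                raw = zlib.decompress(raw)  # zlib-wrapped stream, per the HTTP spec
            except zlib.error:
                raw = zlib.decompress(raw, -zlib.MAX_WBITS)  # raw deflate, sent by some servers
        elif coding != "identity":
            raise ValueError(f"unsupported content-encoding: {coding}")
    return raw


def detect_charset(body: bytes, content_type: Optional[str]) -> Tuple[str, int]:
    """Return (Python codec name, number of BOM bytes to skip)."""
    for bom, name in _BOMS:
        if body.startswith(bom):
            return name, len(bom)
    candidates = []
    if content_type:
        header = Message()
        header["Content-Type"] = content_type
        candidates.append(header.get_content_charset())  # lower-cased, or None
    match = _META_CHARSET.search(body[:1024])
    if match:
        candidates.append(match.group(1).decode("ascii"))
    for label in candidates:
        if not label:
            continue
        try:
            return codecs.lookup(label).name, 0
        except LookupError:
            continue  # unknown label: fall through to the next source
    return "utf-8", 0  # assumed default when nothing is declared


def decode_body(raw: bytes, content_encoding: Optional[str], content_type: Optional[str]) -> str:
    """Decompress and decode a response body to text (encode as UTF-8 to store)."""
    body = decompress(raw, content_encoding)
    charset, skip = detect_charset(body, content_type)
    # errors="replace" keeps a mislabelled page usable instead of failing the fetch.
    return body[skip:].decode(charset, errors="replace")
```

Because these functions take raw bytes and header values rather than a live response, the unit tests can feed them gzip, deflate, and mislabelled bodies without any network stub.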
Tasks
- Add an HTTP client with connect, request, and total timeouts
- Enable automatic redirect following with a safe maximum
- Set User-Agent and Accept-Encoding headers
- Implement content decoding and character set detection
- Classify errors: network, server, client, permanent (not found)
- Return a typed result consumed by the extractor and the job runner
- Write unit tests using a local stub server
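One possible shape for the typed result and the error classification, sketched in Python against `urllib` exceptions. The names (`ErrorKind`, `FetchSuccess`, `FetchFailure`, `classify_status`, `classify_exception`) and the mapping choices, such as treating 429 as a retryable server error and a bad certificate as a non-retryable client error, are assumptions for illustration, not decisions recorded in this issue.

```python
import enum
import ssl
import urllib.error
from dataclasses import dataclass
from typing import Optional, Union


class ErrorKind(enum.Enum):
    NETWORK = "network"      # DNS failure, refused/reset connection, timeout
    SERVER = "server"        # 5xx, plus 429 Too Many Requests (assumed transient)
    CLIENT = "client"        # other 4xx, bad certificate, malformed URL, redirect loop
    NOT_FOUND = "not_found"  # 404 / 410: treated as permanently gone


# Assumed policy: only network and server failures are worth retrying.
RETRYABLE = frozenset({ErrorKind.NETWORK, ErrorKind.SERVER})


@dataclass(frozen=True)
class FetchSuccess:
    item_id: str
    final_url: str  # URL after following redirects
    status: int
    text: str       # decoded body, ready to be encoded as UTF-8


@dataclass(frozen=True)
class FetchFailure:
    item_id: str
    url: str
    kind: ErrorKind
    detail: str

    @property
    def retryable(self) -> bool:
        return self.kind in RETRYABLE


# What the fetcher hands to the extractor and the job runner.
FetchResult = Union[FetchSuccess, FetchFailure]


def classify_status(status: int) -> Optional[ErrorKind]:
    """Map an HTTP status to an error kind, or None if it is not an error."""
    if status in (404, 410):
        return ErrorKind.NOT_FOUND
    if status == 429 or 500 <= status <= 599:
        return ErrorKind.SERVER
    if 400 <= status <= 499:
        return ErrorKind.CLIENT
    return None


def classify_exception(exc: BaseException) -> ErrorKind:
    """Map an exception raised while fetching to an error kind."""
    # HTTPError subclasses URLError, so it must be checked first.
    if isinstance(exc, urllib.error.HTTPError):
        # A 3xx here means urllib gave up on a redirect loop; retrying will not help.
        return classify_status(exc.code) or ErrorKind.CLIENT
    if isinstance(exc, urllib.error.URLError):
        if not isinstance(exc.reason, BaseException):
            return ErrorKind.CLIENT  # e.g. "unknown url type": retrying will not help
        exc = exc.reason  # unwrap the socket/SSL error that urllib wrapped
    # Checked before OSError because SSLCertVerificationError is an OSError subclass.
    if isinstance(exc, ssl.SSLCertVerificationError):
        return ErrorKind.CLIENT
    if isinstance(exc, OSError):  # covers TimeoutError and ConnectionError
        return ErrorKind.NETWORK
    return ErrorKind.CLIENT  # e.g. ValueError for an unsupported content-encoding
```

Under this sketch, a job runner could re-queue only failures whose `retryable` property is true, while the extractor consumes `FetchSuccess.text`. Keeping classification in pure functions lets the stub-server tests assert on categories directly.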
Labels: enhancement (New feature or request)