Conversation
There was a problem hiding this comment.
Pull Request Overview
This PR migrates from a serverless Lambda function using Sparticuz/Chromium to a container-based Lambda function using native Playwright with Chromium. The changes enable the Lambda to run in a containerized environment with full browser capabilities.
- Replaced Sparticuz/Chromium with native Playwright Chromium browser
- Added Docker configuration for container-based Lambda deployment
- Updated build scripts to use npm instead of yarn and include Playwright installation
Reviewed Changes
Copilot reviewed 7 out of 9 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| scripts/src/main.ts | Updated browser launch configuration to use native Playwright Chromium instead of Sparticuz |
| scripts/src/lambda-handler.ts | Renamed handler function from lambdaHandler to handler for Lambda compatibility |
| scripts/prepare-image-links.sh | Switched from yarn to npm and added Playwright Chromium installation |
| scripts/package.json | Removed Sparticuz/Chromium dependency |
| scripts/build-lambda.sh | Added Docker build script for container-based Lambda |
| scripts/Dockerfile.script | Added multi-stage Dockerfile for Lambda container |
| scripts/.dockerignore | Added Docker ignore file to exclude build artifacts |
Files not reviewed (1)
- scripts/package-lock.json: Language not supported
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
| const children = el.querySelectorAll("*"); | ||
| for (const child of Array.from(children)) { |
There was a problem hiding this comment.
[nitpick] The refactoring from Array.from(el.querySelectorAll("*")) to el.querySelectorAll("*") followed by Array.from(children) in the loop is unnecessary and adds complexity. The original pattern was more direct.
| const children = el.querySelectorAll("*"); | |
| for (const child of Array.from(children)) { | |
| for (const child of el.querySelectorAll("*")) { |
| }, | ||
| ); | ||
| console.log("Page content is ready."); | ||
| await page.waitForTimeout(5000); |
There was a problem hiding this comment.
Replacing specific selector waiting with a fixed timeout reduces reliability. The original waitForSelector approach was more robust as it waited for actual content to be ready rather than an arbitrary time period.
| await page.waitForTimeout(5000); | |
| await page.waitForSelector('img[src^="https://images.pexels.com/photos/"]', { timeout: 15000 }); |
| await page.waitForTimeout(5000); | ||
| console.log("New content loaded after clicking Load More."); | ||
|
|
||
| await page.waitForTimeout(2000); |
There was a problem hiding this comment.
The timeout was reduced from 5000ms to 2000ms after clicking 'Load More'. This shorter timeout may not allow sufficient time for new content to load, potentially causing the scraper to miss images.
| await page.waitForTimeout(2000); | |
| await page.waitForTimeout(5000); |
No description provided.