Conversation
There was a problem hiding this comment.
Pull Request Overview
This pull request migrates the scripts from Python to TypeScript/Node.js, replacing Selenium with Playwright for web scraping functionality. The migration maintains the same core functionality of extracting image links from Pexels pages while modernizing the technology stack.
Key changes:
- Complete rewrite from Python to TypeScript using Playwright instead of Selenium
- Updated build pipeline from Python/UV to Node.js/Yarn workflow
- Maintained the same API interface for both standalone and Lambda execution
Reviewed Changes
Copilot reviewed 11 out of 13 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| scripts/tsconfig.json | New TypeScript configuration for the project |
| scripts/src/main.ts | Main TypeScript implementation replacing Python functionality |
| scripts/src/lambda-handler.ts | AWS Lambda handler rewritten in TypeScript |
| scripts/package.json | Node.js package configuration with Playwright dependency |
| scripts/prepare-image-links.sh | Updated build script to use Yarn and Node.js |
| scripts/.gitignore | Updated to ignore Node.js artifacts instead of Python |
| .github/workflows/build-lambda.yaml | CI pipeline updated for Node.js build process |
| scripts/pyproject.toml | Removed Python project configuration |
| scripts/main.py | Removed Python implementation |
| scripts/lambda-handler.py | Removed Python Lambda handler |
| scripts/.python-version | Removed Python version specification |
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
| const processedImgLinks: string[] = []; | ||
| for (const link of imgLinks) { | ||
| const processedLink = link.split("?")[0]; | ||
| if (!processedImgLinks.includes(processedLink)) { | ||
| processedImgLinks.push(processedLink); | ||
| } | ||
| } | ||
|
|
||
| return processedImgLinks; |
There was a problem hiding this comment.
Using includes() in a loop creates O(n²) time complexity for duplicate removal. Consider using a Set for O(n) performance: const processedSet = new Set(); for (const link of imgLinks) { processedSet.add(link.split('?')[0]); } return Array.from(processedSet);
| const processedImgLinks: string[] = []; | |
| for (const link of imgLinks) { | |
| const processedLink = link.split("?")[0]; | |
| if (!processedImgLinks.includes(processedLink)) { | |
| processedImgLinks.push(processedLink); | |
| } | |
| } | |
| return processedImgLinks; | |
| const processedSet = new Set<string>(); | |
| for (const link of imgLinks) { | |
| processedSet.add(link.split("?")[0]); | |
| } | |
| return Array.from(processedSet); |
| let text = | ||
| el.textContent || (el as HTMLElement).innerText || ""; | ||
|
|
||
| // Also check all child elements for text | ||
| const children = Array.from(el.querySelectorAll("*")); | ||
| for (const child of children) { | ||
| if (child.textContent) { | ||
| text += " " + child.textContent; | ||
| } | ||
| } | ||
|
|
There was a problem hiding this comment.
The function is duplicating text content by including both the element's own text and all child element text. Child element text is already included in the parent's textContent. Consider using only el.textContent or adding logic to avoid duplication.
| let text = | |
| el.textContent || (el as HTMLElement).innerText || ""; | |
| // Also check all child elements for text | |
| const children = Array.from(el.querySelectorAll("*")); | |
| for (const child of children) { | |
| if (child.textContent) { | |
| text += " " + child.textContent; | |
| } | |
| } | |
| // textContent already includes all descendant text | |
| const text = el.textContent || (el as HTMLElement).innerText || ""; |
No description provided.