Add PDF upload with text extraction for Finnish learning content #7

Draft
Copilot wants to merge 3 commits into main from copilot/add-pdf-upload-functionality

Copilot AI commented Jan 19, 2026

Currently, content can only be loaded by pasting text or selecting preloaded markdown files. This adds PDF upload capability with automatic text extraction and reformatting.

Changes

API & Processing

  • /api/pdf-upload endpoint handles file uploads with validation (10 MB max file size, 100,000-character max extracted text length)
  • PDF text extraction via pdf-parse with reformatting logic:
    • Fixes hyphenated line breaks using \S+ regex (supports Finnish diacritics: ä, ö, å)
    • Normalizes whitespace and line endings
    • Preserves paragraph structure
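
The reformatting steps listed above could look roughly like the following. This is a minimal sketch, not the code from this PR: the function name `reformatExtractedText` and the exact regular expressions are assumptions made for illustration, and it operates on a plain string such as the text returned by the PDF extraction step.

```typescript
// Hypothetical helper illustrating the reformatting steps; not the PR's actual code.
function reformatExtractedText(raw: string): string {
  return raw
    // Normalize Windows (\r\n) and old Mac (\r) line endings to \n.
    .replace(/\r\n?/g, "\n")
    // Rejoin words hyphenated across a line break: "päi-\nvää" becomes "päivää".
    // \S matches any non-whitespace character, so ä, ö and å need no special casing.
    .replace(/(\S+)-\n(\S+)/g, "$1$2")
    // Blank lines mark paragraph boundaries, so split on them first.
    .split(/\n\s*\n/)
    // Within a paragraph, collapse every whitespace run (including single newlines).
    .map((paragraph) => paragraph.replace(/\s+/g, " ").trim())
    // Drop paragraphs that were only whitespace.
    .filter((paragraph) => paragraph.length > 0)
    // Put paragraphs back together separated by one blank line.
    .join("\n\n");
}

const sample = "Hyvää päi-\nvää kaikille.\r\n\r\nTämä on   toinen\nkappale.";
console.log(reformatExtractedText(sample));
```

One limitation of this approach: a genuine compound hyphen that happens to fall at the end of a line (for example "linja-" followed by "auto") is also joined, so the hyphen is lost. Telling the two cases apart would need a dictionary or similar heuristic, which this sketch does not attempt.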

UI

  • Tab interface in ContentSelector: "Content Library" | "Upload PDF"
  • PdfUploader component with drag-and-drop support
  • Client-side validation with error states
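
As an illustration of the client-side validation, the check below rejects a file before it is uploaded. It is a sketch under stated assumptions: `FileLike`, `validatePdfFile`, and the error messages are hypothetical names invented here, and the limits are inlined copies of the values this PR documents (10 MB, `application/pdf`) so the snippet stands alone.

```typescript
// Hypothetical structural type so the sketch does not depend on DOM typings;
// a browser File object has these three properties.
interface FileLike {
  name: string;
  size: number; // size in bytes
  type: string; // MIME type reported by the browser
}

// Inlined copies of the documented limits, so the example is self-contained.
const MAX_FILE_SIZE_MB = 10;
const MAX_FILE_SIZE_BYTES = MAX_FILE_SIZE_MB * 1024 * 1024;
const ALLOWED_MIME_TYPES: readonly string[] = ["application/pdf"];

// Returns an error message for the UI, or null when the file passes all checks.
function validatePdfFile(file: FileLike): string | null {
  if (!ALLOWED_MIME_TYPES.some((allowed) => allowed === file.type)) {
    return "Only PDF files are supported.";
  }
  if (file.size === 0) {
    return "The selected file is empty.";
  }
  if (file.size > MAX_FILE_SIZE_BYTES) {
    return `File is larger than ${MAX_FILE_SIZE_MB} MB.`;
  }
  return null;
}
```

The browser-reported MIME type is easy to spoof, so a check like this only improves feedback for the user; the endpoint still has to validate the upload itself.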

Configuration

  • All limits centralized in src/config/constants.ts:
export const PDF_UPLOAD_LIMITS = {
    MAX_FILE_SIZE_MB: 10,
    MAX_FILE_SIZE_BYTES: 10 * 1024 * 1024,
    MAX_TEXT_LENGTH: 100000,
    ALLOWED_MIME_TYPES: ['application/pdf'] as const,
} as const;
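
To show how code elsewhere might consume these constants, here is a small sketch. The constant object is copied from above so the snippet runs on its own; `AllowedMimeType`, `isAllowedMimeType`, and `checkExtractedText` are hypothetical names, and rejecting over-long text (rather than truncating it) is an assumption, since the description does not say which behavior the endpoint uses.

```typescript
// Copied from src/config/constants.ts as shown in this PR.
const PDF_UPLOAD_LIMITS = {
  MAX_FILE_SIZE_MB: 10,
  MAX_FILE_SIZE_BYTES: 10 * 1024 * 1024,
  MAX_TEXT_LENGTH: 100000,
  ALLOWED_MIME_TYPES: ['application/pdf'] as const,
} as const;

// Because of `as const`, the allowed MIME types can be derived as a union type.
type AllowedMimeType = (typeof PDF_UPLOAD_LIMITS.ALLOWED_MIME_TYPES)[number];

// Hypothetical type guard: narrows an arbitrary string to AllowedMimeType.
function isAllowedMimeType(value: string): value is AllowedMimeType {
  const allowed: readonly string[] = PDF_UPLOAD_LIMITS.ALLOWED_MIME_TYPES;
  return allowed.some((mime) => mime === value);
}

type TextCheck = { ok: true } | { ok: false; error: string };

// Hypothetical check on extracted text. Assumption: over-long text is rejected.
function checkExtractedText(text: string): TextCheck {
  if (text.trim().length === 0) {
    return { ok: false, error: "No extractable text found in the PDF." };
  }
  if (text.length > PDF_UPLOAD_LIMITS.MAX_TEXT_LENGTH) {
    return {
      ok: false,
      error: `Extracted text exceeds ${PDF_UPLOAD_LIMITS.MAX_TEXT_LENGTH} characters.`,
    };
  }
  return { ok: true };
}
```

Note that `text.length` counts UTF-16 code units, which matches the character count for ordinary Finnish text, including ä, ö, and å in their precomposed form.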

Screenshot

PDF Upload Interface

Dependencies

  • pdf-parse@1.1.1 - PDF text extraction (no known vulnerabilities)
  • @types/pdf-parse - TypeScript definitions

Warning

Firewall rules blocked me from connecting to one or more addresses

I tried to connect to the following addresses, but was blocked by firewall rules:

  • fonts.googleapis.com
    • Triggering command: /usr/local/bin/node /usr/local/bin/node /home/REDACTED/work/finnish-learner/finnish-learner/node_modules/next/dist/compiled/jest-worker/processChild.js (dns block)
    • Triggering command: /usr/local/bin/node /usr/local/bin/node /home/REDACTED/work/finnish-learner/finnish-learner/node_modules/next/dist/server/lib/start-server.js (dns block)

If you need me to access, download, or install something from one of these locations, you can either:

Original prompt

This section details the original issue you should resolve

<issue_title>Add ability to upload pdf and get content from there</issue_title>
<issue_description>Currently, the only way to get content to read or learn from is to paste text or select preloaded content. I want to enhance this feature by adding the ability to get text from a PDF document.
The app should be able to extract the text and reformat it so that it is suitable for the app to read.

Also consider the file size and other major scenarios that could make the app crash. Add relevant limits and keep them in a constants file where they can be calibrated in the future.</issue_description>

<agent_instructions>Keep the implementation neat and properly separated, but not overly complicated.</agent_instructions>

Comments on the Issue (you are @copilot in this section)


vercel bot commented Jan 19, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Review Updated (UTC)
finnish-learner Error Error Jan 19, 2026 2:38pm

Co-authored-by: MaheeGamage <26676091+MaheeGamage@users.noreply.github.com>
Co-authored-by: MaheeGamage <26676091+MaheeGamage@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Add ability to upload PDF and extract content" to "Add PDF upload with text extraction for Finnish learning content" on Jan 19, 2026
Copilot AI requested a review from MaheeGamage January 19, 2026 14:40
Development

Successfully merging this pull request may close these issues.

Add ability to upload pdf and get content from there

2 participants