txt2phrases — Feature Enhancement Proposal
Enhance txt2phrases to support more flexible input handling and compatibility with research workflows such as pygetpapers.
With this update, the library would be able to process research papers stored in varied directory structures automatically, convert PDFs to text, and accept either a single file or a whole folder as input.
Proposed Enhancements
1. pygetpapers Output Compatibility
- Goal: Enable `txt2phrases` to automatically detect and process the directory structure generated by `pygetpapers`.
- Why: The directory layout that `pygetpapers` produces differs from the input format `txt2phrases` currently expects.
- Expected Behavior: `txt2phrases` should walk nested folders recursively and locate the `.pdf` or `.txt` files to process.
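A minimal sketch of the expected behavior, assuming a `pygetpapers`-style layout in which each paper sits in its own subfolder of the project directory (for example a folder named after the paper's ID). The function name `find_paper_documents` and the preference for `.txt` over `.pdf` are hypothetical choices for illustration, not part of the current `txt2phrases` API.

```python
from pathlib import Path

# Hypothetical preference order: a ready-made .txt beats a .pdf that still needs conversion.
PREFERRED_SUFFIXES = (".txt", ".pdf")


def find_paper_documents(project_dir):
    """Hypothetical helper: map each paper folder to one document to process.

    Assumes every paper lives in its own subfolder of project_dir. Each
    subfolder is searched recursively, so extra nesting is tolerated.
    Top-level files and folders with no .txt/.pdf are skipped.
    """
    root = Path(project_dir)
    selected = {}
    for paper_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        for suffix in PREFERRED_SUFFIXES:
            matches = sorted(
                p for p in paper_dir.rglob(f"*{suffix}") if p.is_file()
            )
            if matches:
                # First match in sorted order keeps the result deterministic.
                selected[paper_dir.name] = matches[0]
                break
    return selected
```

Returning one document per paper folder avoids extracting keywords twice when a paper has both a PDF and an already-converted text file.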
2. PDF → TXT Conversion Method
- Goal: Add a built-in method to convert `.pdf` files into `.txt` for downstream keyword extraction.
- Why: Users should be able to process PDF research papers directly, without extracting the text manually.
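One possible shape for the conversion method, sketched with the third-party `pypdf` package (`PdfReader` and `page.extract_text()`); the choice of `pypdf` is an assumption, since the proposal does not name a PDF backend, and `pdf_to_txt` is a hypothetical name. It only reads an existing text layer, so scanned PDFs would still need OCR.

```python
from pathlib import Path


def pdf_to_txt(pdf_path, txt_path=None):
    """Hypothetical helper: write the text layer of a PDF to a .txt file.

    pypdf (pip install pypdf) is imported lazily so the rest of the module
    works without it. Returns the path of the written .txt file.
    """
    pdf_path = Path(pdf_path)
    if pdf_path.suffix.lower() != ".pdf":
        raise ValueError(f"expected a .pdf file, got {pdf_path.name}")
    # Default output: same folder and stem, with a .txt extension.
    txt_path = Path(txt_path) if txt_path else pdf_path.with_suffix(".txt")

    from pypdf import PdfReader  # optional dependency, imported only when needed

    reader = PdfReader(pdf_path)
    # Image-only pages may yield no text; treat them as empty and join pages with newlines.
    pages = [page.extract_text() or "" for page in reader.pages]
    txt_path.write_text("\n".join(pages), encoding="utf-8")
    return txt_path
```

Writing the `.txt` next to the source PDF means a later run can pick up the converted file instead of re-parsing the PDF.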
3. File and Folder Input Support
- Goal: Allow `txt2phrases` to accept either a single file or an entire directory as input.
- Why: This gives flexibility both to users who want to analyze one document and to those who want to batch-process an entire dataset.
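A minimal sketch of how a single entry point could handle both cases, assuming `.txt` and `.pdf` are the supported types; `collect_inputs` and `SUPPORTED_SUFFIXES` are hypothetical names, not existing `txt2phrases` functions.

```python
from pathlib import Path

# Hypothetical set of file types txt2phrases would accept.
SUPPORTED_SUFFIXES = {".txt", ".pdf"}


def collect_inputs(path):
    """Hypothetical helper: normalize a file or folder argument to a list of files.

    A single supported file becomes a one-element list; a directory is searched
    recursively and its supported files are returned in sorted order.
    """
    path = Path(path)
    if path.is_file():
        if path.suffix.lower() not in SUPPORTED_SUFFIXES:
            raise ValueError(f"unsupported file type: {path.suffix}")
        return [path]
    if path.is_dir():
        return sorted(
            p for p in path.rglob("*")
            if p.is_file() and p.suffix.lower() in SUPPORTED_SUFFIXES
        )
    raise FileNotFoundError(path)
```

Because both branches return a list, the downstream extraction loop can stay the same whether the user passed one paper or a whole dataset.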