Paper Pulse is a minimalistic yet powerful LLM-based system for academic paper discovery, classification, and summarization. It automates the pipeline of fetching papers from various sources (ArXiv, Hugging Face, etc.), filtering them based on user intent, analyzing them with LLMs, and delivering structured reports via email.
- Multi-Source Fetching: Currently supports ArXiv, Hugging Face Daily Papers, and NeurIPS 2025. Work is underway to support all major ML conferences and additional data sources.
- Intent Parsing Agent: Converts natural language descriptions (e.g., "I am interested in jailbreaking attacks on LLMs") into structured search profiles with optimized keywords.
- Intelligent Filtering:
  - Layer 1 (Keyword): Fast pre-filtering using Trie/Set matching.
  - Layer 2 (LLM): Deep semantic relevance scoring and reasoning by LLMs.
- Hybrid Ranking: Sorts papers by a mix of LLM relevance scores and recency.
- Deep Analysis: Downloads PDFs to extract full text and generate structured summaries (Context, Innovation, Methodology, Experiments).
- Email Delivery: Sends beautifully formatted Markdown reports directly to your inbox.
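
As a rough illustration of how Layer 1 filtering, the relevance cutoff, and hybrid ranking could fit together, here is a minimal Python sketch. It is not the project's actual implementation: the `Paper` class, the function names, the 7-day recency half-life, and the 0.3 recency weight are all hypothetical, and `llm_score` stands in for the Layer 2 score that an LLM call would produce. The sketch uses plain set matching on single-word keywords; multi-word phrases would need a trie or a phrase matcher.

```python
import re
from dataclasses import dataclass
from datetime import date


@dataclass
class Paper:
    title: str
    abstract: str
    published: date
    llm_score: float = 0.0  # Layer 2 relevance in [0, 1]; assumed to come from an LLM call


def keyword_prefilter(papers, keywords):
    """Layer 1: keep papers whose title or abstract shares a token with the keyword set."""
    wanted = {k.lower() for k in keywords}
    kept = []
    for paper in papers:
        text = f"{paper.title} {paper.abstract}".lower()
        tokens = set(re.findall(r"[a-z0-9]+", text))
        if tokens & wanted:
            kept.append(paper)
    return kept


def hybrid_score(paper, today, recency_weight=0.3, half_life_days=7.0):
    """Blend LLM relevance with a recency bonus that halves every `half_life_days`."""
    age_days = max((today - paper.published).days, 0)
    recency = 0.5 ** (age_days / half_life_days)
    return (1.0 - recency_weight) * paper.llm_score + recency_weight * recency


def rank(papers, today, threshold=0.6):
    """Drop papers below the relevance threshold, then sort by hybrid score, best first."""
    relevant = [p for p in papers if p.llm_score >= threshold]
    return sorted(relevant, key=lambda p: hybrid_score(p, today), reverse=True)
```

Running the cheap keyword pass first means only the surviving papers need an LLM call, which is where the token cost lies. The `threshold=0.6` default mirrors the documented default of `RELEVANCE_THRESHOLD`.
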
- Python 3.9+
- OpenAI API Key (or compatible LLM endpoint)
- Clone the repository:

      git clone https://github.com/yangjunx21/paper-pulse.git
      cd paper-pulse

- Set up a virtual environment:

      python -m venv .venv
      source .venv/bin/activate  # On Windows use: .venv\Scripts\activate
      pip install -r requirements.txt

- Configure Environment: Create a `.env` file in the root directory:

      OPENAI_API_KEY=sk-your-key-here
      OPENAI_MODEL=gpt-4o
      # OPENAI_BASE_URL=... (optional)

      # Email Settings (Required for email delivery)
      # EMAIL_HOST: e.g. smtp.gmail.com or smtp.163.com
      EMAIL_HOST=smtp.gmail.com
      # EMAIL_PORT: 587 or 465, depending on your provider
      EMAIL_PORT=587
      EMAIL_USERNAME=your-email@gmail.com
      EMAIL_PASSWORD=your-app-password
      EMAIL_SENDER=your-email@gmail.com
      EMAIL_RECEIVER=target-email@example.com

Note: If you don't need email notifications, you can skip the `EMAIL_*` configuration. The generated reports will be saved locally in the `reports/` directory.

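
To make the role of the `EMAIL_*` variables concrete, the following Python sketch shows one way they could drive delivery with the standard library. It is an assumption-laden illustration, not the project's code: the function names and the subject line are hypothetical, and the report is sent as a plain-text body. The one firm convention it relies on is that port 465 expects TLS from the first byte, while port 587 starts unencrypted and upgrades via STARTTLS.

```python
import os
import smtplib
from email.message import EmailMessage


def build_report_email(markdown_body, env=None):
    """Assemble a plain-text email that carries the Markdown report."""
    env = os.environ if env is None else env
    msg = EmailMessage()
    msg["Subject"] = "Paper Pulse report"  # hypothetical subject line
    msg["From"] = env["EMAIL_SENDER"]
    msg["To"] = env["EMAIL_RECEIVER"]
    msg.set_content(markdown_body)
    return msg


def send_report(msg, env=None):
    """Send via SMTP: implicit TLS on port 465, STARTTLS upgrade on any other port."""
    env = os.environ if env is None else env
    host, port = env["EMAIL_HOST"], int(env["EMAIL_PORT"])
    factory = smtplib.SMTP_SSL if port == 465 else smtplib.SMTP
    with factory(host, port) as server:
        if port != 465:
            server.starttls()
        server.login(env["EMAIL_USERNAME"], env["EMAIL_PASSWORD"])
        server.send_message(msg)
```

Passing a plain dictionary as `env` makes it possible to check the message construction without touching real credentials or a mail server.
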
Let the "Intent Agent" help you build a search profile.
Step 1: Build a Profile

Run the interactive builder to define your research interests.

    ./scripts/build_intent_profile.sh "my_research_focus"
    # Follow the prompts to describe what you are looking for.

Step 2: Run the Pipeline

Execute the pipeline using the profile you just created.

    # Set your profile name as an environment variable
    export PROFILE_NAME="default"
    ./scripts/run_with_intent.sh

You can customize parameters in `scripts/run_with_intent.sh` or via environment variables (e.g., `DATE_RANGE_START`).

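
For readers who want to inspect a generated profile from Python, here is a small sketch of how one might be loaded. Everything about the file layout is assumed rather than taken from the project: it supposes each profile is stored as `<name>.json` under `config/intent_profiles/` and has a `keywords` list, and `load_profile` is a hypothetical helper.

```python
import json
import os
from pathlib import Path


def load_profile(name=None, profiles_dir="config/intent_profiles"):
    """Load a profile JSON file; the <name>.json layout and field names are assumptions."""
    name = name or os.environ.get("PROFILE_NAME", "default")
    path = Path(profiles_dir) / f"{name}.json"
    with path.open(encoding="utf-8") as fh:
        profile = json.load(fh)
    # Normalise keywords (trim, lowercase, de-duplicate) so matching can be case-insensitive.
    profile["keywords"] = sorted({k.strip().lower() for k in profile.get("keywords", [])})
    return profile
```

When `name` is omitted, the helper falls back to the `PROFILE_NAME` environment variable, matching how the run script is told which profile to use.
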
You can also run the CLI directly for one-off searches.

    python -m paper_agent.cli \
      --topics "mechanistic interpretability" "sparse autoencoders" \
      --date 2025-11-20 \
      --sources arxiv huggingface_daily \
      --max-results 10 \
      --send-email

The repository is organized as follows:

    paper-pulse/
    ├── config/                 # Configuration and profiles
    │   └── intent_profiles/    # JSON profiles generated by the Intent Agent
    ├── paper_agent/            # Core package
    │   ├── llm/                # Prompts and LLM client wrappers
    │   ├── fetchers/           # Source adapters (ArXiv, HF, etc.)
    │   ├── parsers/            # PDF and text processing
    │   ├── pipeline.py         # Main processing logic
    │   └── intent_agent.py     # Profile generation logic
    ├── scripts/                # Helper scripts
    │   ├── build_intent_profile.sh
    │   └── run_with_intent.sh
    └── reports/                # Generated Markdown reports (local copies)

You can tweak the pipeline behavior via CLI arguments or the .env file. Key environment variables:
| Variable | Description | Default |
|---|---|---|
| `OPENAI_API_KEY` | Your LLM API key. | Required |
| `PAPER_PULSE_LANG` | Language for summaries (e.g., "Chinese", "English"). | English |
| `ENABLE_PDF_ANALYSIS` | Crucial! Set to `true` to download PDFs, extract full text, and generate deep summaries. | `false` |
| `RELEVANCE_THRESHOLD` | Minimum LLM score (0.0-1.0) to include a paper in the report. | 0.6 |
| `EMAIL_*` | SMTP settings for report delivery. | Optional |

💡 Tip: Set `ENABLE_PDF_ANALYSIS=true` for much richer insights (Methodology, Experiments, etc.), but note that it will consume more tokens and time.
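
The documented variables and defaults can be summarized as a small settings loader. This is a sketch under stated assumptions rather than the project's own configuration code: the `Settings` class and `load_settings` function are hypothetical, and treating `ENABLE_PDF_ANALYSIS` as case-insensitive is a guess about how the flag is parsed.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    api_key: str
    language: str
    enable_pdf_analysis: bool
    relevance_threshold: float


def load_settings(env=None):
    """Read the documented variables, applying the defaults listed in the table."""
    env = os.environ if env is None else env
    api_key = env.get("OPENAI_API_KEY", "")
    if not api_key:
        raise ValueError("OPENAI_API_KEY is required")
    threshold = float(env.get("RELEVANCE_THRESHOLD", "0.6"))
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("RELEVANCE_THRESHOLD must be between 0.0 and 1.0")
    return Settings(
        api_key=api_key,
        language=env.get("PAPER_PULSE_LANG", "English"),
        # Assumption: only the literal value "true" (any casing) enables PDF analysis.
        enable_pdf_analysis=env.get("ENABLE_PDF_ANALYSIS", "false").strip().lower() == "true",
        relevance_threshold=threshold,
    )
```

Validating the threshold up front surfaces a mistyped value such as `6` before any papers are fetched or any tokens are spent.
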
Contributions are welcome! If you have any questions, suggestions, or find any issues, please feel free to open an issue or submit a pull request.
If you find this project helpful, please give it a ⭐️ Star!
📧 Contact: yangjunx21@gmail.com
If you find this project useful, please cite:

    @misc{yang2025paperpulse,
      title  = {Paper Pulse: An LLM-Based Academic Paper Discovery and Analysis System},
      author = {Junxiao Yang},
      year   = {2025},
      url    = {https://github.com/yangjunx21/Paper-Pulse}
    }
