Paper Pulse 🚀

Paper Pulse is a minimalistic yet powerful LLM-based system for academic paper discovery, classification, and summarization. It automates the pipeline of fetching papers from various sources (ArXiv, Hugging Face, etc.), filtering them based on user intent, analyzing them with LLMs, and delivering structured reports via email.

✨ Key Features

- Multi-Source Fetching: Currently supports ArXiv, Hugging Face Daily Papers, and NeurIPS 2025. Work is underway to support all major ML conferences and additional data sources.
- Intent Parsing Agent: Converts natural language descriptions (e.g., "I am interested in jailbreaking attacks on LLMs") into structured search profiles with optimized keywords.
- Intelligent Filtering:
  - Layer 1 (Keyword): Fast pre-filtering using Trie/Set matching.
  - Layer 2 (LLM): Deep semantic relevance scoring and reasoning by LLMs.
- Hybrid Ranking: Sorts papers by a mix of LLM relevance scores and recency.
- Deep Analysis: Downloads PDFs to extract full text and generate structured summaries (Context, Innovation, Methodology, Experiments).
- Email Delivery: Sends beautifully formatted Markdown reports directly to your inbox.
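
The keyword pre-filter and hybrid ranking described above can be sketched roughly as follows. This is an illustrative sketch, not the project's actual code: the `Paper` class, the function names, the 0.3 recency weight, and the 7-day half-life are all assumptions, and plain substring matching over a keyword set stands in for the Trie/Set matcher.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Paper:
    # Hypothetical record; the real project's data model may differ.
    title: str
    abstract: str
    published: date
    llm_score: float = 0.0  # relevance in [0, 1], filled in by the LLM layer


def keyword_prefilter(papers, keywords):
    """Layer 1: keep papers whose title or abstract mentions any keyword."""
    lowered = {k.lower() for k in keywords}
    kept = []
    for paper in papers:
        text = f"{paper.title} {paper.abstract}".lower()
        if any(k in text for k in lowered):
            kept.append(paper)
    return kept


def hybrid_score(paper, today, recency_weight=0.3, half_life_days=7.0):
    """Blend LLM relevance with an exponentially decaying recency term."""
    age_days = max((today - paper.published).days, 0)
    recency = 0.5 ** (age_days / half_life_days)  # 1.0 today, 0.5 after one half-life
    return (1.0 - recency_weight) * paper.llm_score + recency_weight * recency


def rank(papers, today):
    """Sort papers from highest to lowest hybrid score."""
    return sorted(papers, key=lambda p: hybrid_score(p, today), reverse=True)
```

Running the cheap keyword pass first means the LLM only scores papers that already mention a profile keyword, which keeps token usage down.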

🚀 Quick Start

Prerequisites

Python 3.9+
OpenAI API Key (or compatible LLM endpoint)

Installation

Clone the repository:

git clone https://github.com/yangjunx21/paper-pulse.git
cd paper-pulse

Set up a virtual environment:

python -m venv .venv
source .venv/bin/activate  # On Windows use: .venv\Scripts\activate
pip install -r requirements.txt

Configure Environment: Create a .env file in the root directory:

OPENAI_API_KEY=sk-your-key-here
OPENAI_MODEL=gpt-4o
# OPENAI_BASE_URL=... (optional)

# Email Settings (Required for email delivery)
# SMTP host of your provider, e.g. smtp.gmail.com or smtp.163.com
EMAIL_HOST=smtp.gmail.com
# SMTP port, typically 587 or 465
EMAIL_PORT=587
EMAIL_USERNAME=your-email@gmail.com
EMAIL_PASSWORD=your-app-password
EMAIL_SENDER=your-email@gmail.com
EMAIL_RECEIVER=target-email@example.com

Note: If you don't need email notifications, you can skip the EMAIL_* configuration. The generated reports will be saved locally in the reports/ directory.
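
For orientation, here is a minimal sketch of how the EMAIL_* settings above could be used to send a report with Python's standard library. It is not the project's own delivery code: the function names are hypothetical, and the choice of implicit TLS on port 465 versus STARTTLS on any other port is an assumption based on common SMTP conventions.

```python
import os
import smtplib
from email.message import EmailMessage


def build_report_email(markdown_report, subject, env=os.environ):
    """Build a plain-text email carrying the Markdown report."""
    msg = EmailMessage()
    msg["Subject"] = subject
    msg["From"] = env["EMAIL_SENDER"]
    msg["To"] = env["EMAIL_RECEIVER"]
    msg.set_content(markdown_report)
    return msg


def send_report(msg, env=os.environ):
    """Send the message using the EMAIL_* settings (assumed conventions)."""
    host = env["EMAIL_HOST"]
    port = int(env.get("EMAIL_PORT", "587"))
    if port == 465:
        # Port 465 conventionally expects TLS from the first byte.
        with smtplib.SMTP_SSL(host, port) as server:
            server.login(env["EMAIL_USERNAME"], env["EMAIL_PASSWORD"])
            server.send_message(msg)
    else:
        # Port 587 conventionally starts in plain text and upgrades via STARTTLS.
        with smtplib.SMTP(host, port) as server:
            server.starttls()
            server.login(env["EMAIL_USERNAME"], env["EMAIL_PASSWORD"])
            server.send_message(msg)
```

Providers such as Gmail generally require an app-specific password rather than the account password, which is why the example .env uses `your-app-password`.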

📖 Usage

1. Intent-Based Mode (Recommended)

Let the "Intent Agent" help you build a search profile.

Step 1: Build a Profile. Run the interactive builder to define your research interests.

./scripts/build_intent_profile.sh "my_research_focus"
# Follow the prompts to describe what you are looking for.

Step 2: Run the Pipeline. Execute the pipeline using the profile you just created.

# Set your profile name as an environment variable (the name used in Step 1)
export PROFILE_NAME="my_research_focus"
./scripts/run_with_intent.sh

You can customize parameters in scripts/run_with_intent.sh or via environment variables (e.g., DATE_RANGE_START).

2. CLI Mode (Manual)

You can also run the CLI directly for one-off searches.

python -m paper_agent.cli \
  --topics "mechanistic interpretability" "sparse autoencoders" \
  --date 2025-11-20 \
  --sources arxiv huggingface_daily \
  --max-results 10 \
  --send-email
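
If you want to launch the same command from a Python script (for example, from a scheduler), the argument list can be assembled as in the sketch below. The helper `build_cli_command` is hypothetical and not part of the repository; it only uses the flags shown in the command above.

```python
import shlex
import sys


def build_cli_command(topics, date, sources, max_results, send_email=False):
    """Assemble the argument list for `python -m paper_agent.cli`."""
    cmd = [sys.executable, "-m", "paper_agent.cli"]
    cmd += ["--topics", *topics]
    cmd += ["--date", date]
    cmd += ["--sources", *sources]
    cmd += ["--max-results", str(max_results)]
    if send_email:
        cmd.append("--send-email")
    return cmd


if __name__ == "__main__":
    cmd = build_cli_command(
        topics=["mechanistic interpretability", "sparse autoencoders"],
        date="2025-11-20",
        sources=["arxiv", "huggingface_daily"],
        max_results=10,
        send_email=True,
    )
    # Print a shell-quoted version of the command for inspection.
    print(shlex.join(cmd))
    # To actually run it from the repository root:
    #   import subprocess; subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems with multi-word topics such as "mechanistic interpretability".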

📂 Project Structure

paper-pulse/
├── config/              # Configuration and profiles
│   └── intent_profiles/ # JSON profiles generated by the Intent Agent
├── paper_agent/         # Core package
│   ├── llm/             # Prompts and LLM client wrappers
│   ├── fetchers/        # Source adapters (ArXiv, HF, etc.)
│   ├── parsers/         # PDF and text processing
│   ├── pipeline.py      # Main processing logic
│   └── intent_agent.py  # Profile generation logic
├── scripts/             # Helper scripts
│   ├── build_intent_profile.sh
│   └── run_with_intent.sh
└── reports/             # Generated Markdown reports (local copies)

🛠 Configuration

You can tweak the pipeline behavior via CLI arguments or the .env file. Key environment variables:

| Variable | Description | Default |
| --- | --- | --- |
| `OPENAI_API_KEY` | Your LLM API key. | Required |
| `PAPER_PULSE_LANG` | Language for summaries (e.g., "Chinese", "English"). | English |
| `ENABLE_PDF_ANALYSIS` | Crucial! Set to `true` to download PDFs, extract full text, and generate deep summaries. | `false` |
| `RELEVANCE_THRESHOLD` | Minimum LLM score (0.0-1.0) to include a paper in the report. | `0.6` |
| `EMAIL_*` | SMTP settings for report delivery. | Optional |

💡 Tip: Set ENABLE_PDF_ANALYSIS=true for much richer insights (Methodology, Experiments, etc.). Note that it consumes more tokens and takes longer to run.
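
As an illustration of how these variables and their documented defaults fit together, the following sketch reads them from the environment and applies the relevance threshold. The `Settings` class and both functions are hypothetical names, and the parsing rules (for example, treating only the string "true" as enabled, case-insensitively) are assumptions rather than the project's confirmed behavior.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    language: str
    enable_pdf_analysis: bool
    relevance_threshold: float


def load_settings(env=os.environ):
    """Read pipeline settings, falling back to the documented defaults."""
    threshold = float(env.get("RELEVANCE_THRESHOLD", "0.6"))
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("RELEVANCE_THRESHOLD must be between 0.0 and 1.0")
    pdf_flag = env.get("ENABLE_PDF_ANALYSIS", "false").strip().lower()
    return Settings(
        language=env.get("PAPER_PULSE_LANG", "English"),
        enable_pdf_analysis=(pdf_flag == "true"),
        relevance_threshold=threshold,
    )


def passes_threshold(score, settings):
    """A paper is reported only if its LLM score meets the threshold."""
    return score >= settings.relevance_threshold
```

With the default threshold of 0.6, a paper scored 0.55 by the LLM would be dropped from the report, while one scored 0.6 or higher would be kept.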

🤝 Contribution

Contributions are welcome! If you have any questions, suggestions, or find any issues, please feel free to open an issue or submit a pull request.

If you find this project helpful, please give it a ⭐️ Star!

📧 Contact: yangjunx21@gmail.com

🖊️ Citation

If you find this project useful, please cite:

@misc{yang2025paperpulse,
  title  = {Paper Pulse: An LLM-Based Academic Paper Discovery and Analysis System},
  author = {Junxiao Yang},
  year   = {2025},
  url    = {https://github.com/yangjunx21/Paper-Pulse}
}

📄 License

MIT License
