Paper Pulse 🚀

Figure: Paper Pulse abstract.

Paper Pulse is a minimalistic yet powerful LLM-based system for academic paper discovery, classification, and summarization. It automates the full pipeline: fetching papers from multiple sources (ArXiv, Hugging Face, etc.), filtering them by user intent, analyzing them with LLMs, and delivering structured reports by email.

✨ Key Features

  • Multi-Source Fetching: Currently supports ArXiv, Hugging Face Daily Papers, and NeurIPS 2025. Work is underway to support all major ML conferences and additional data sources.
  • Intent Parsing Agent: Converts natural language descriptions (e.g., "I am interested in jailbreaking attacks on LLMs") into structured search profiles with optimized keywords.
  • Intelligent Filtering:
    • Layer 1 (Keyword): Fast pre-filtering using Trie/Set matching.
    • Layer 2 (LLM): Deep semantic relevance scoring and reasoning by LLMs.
  • Hybrid Ranking: Sorts papers by a mix of LLM relevance scores and recency.
  • Deep Analysis: Downloads PDFs to extract full text and generate structured summaries (Context, Innovation, Methodology, Experiments).
  • Email Delivery: Sends beautifully formatted Markdown reports directly to your inbox.
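
The two filtering layers and the hybrid ranking listed above can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's implementation: the `Paper` class, the function names, the 0.7/0.3 weighting, and the 30-day half-life are all hypothetical, plain substring matching stands in for the Trie/Set matcher, and the Layer 2 LLM score is assumed to be already computed in the range 0.0 to 1.0.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Paper:
    # Hypothetical record type; field names are assumptions for this sketch.
    title: str
    abstract: str
    published: date
    llm_score: float = 0.0  # Layer 2 relevance in [0, 1], assumed precomputed


def keyword_prefilter(papers, keywords):
    """Layer 1: keep papers whose title or abstract contains any keyword."""
    lowered = {k.lower() for k in keywords}
    kept = []
    for p in papers:
        text = f"{p.title} {p.abstract}".lower()
        if any(k in text for k in lowered):
            kept.append(p)
    return kept


def hybrid_score(paper, today, relevance_weight=0.7, half_life_days=30.0):
    """Blend LLM relevance with an exponentially decaying recency term."""
    age_days = max((today - paper.published).days, 0)
    recency = 0.5 ** (age_days / half_life_days)
    return relevance_weight * paper.llm_score + (1 - relevance_weight) * recency


def rank(papers, today, threshold=0.6):
    """Drop papers below the relevance threshold, then sort by hybrid score."""
    relevant = [p for p in papers if p.llm_score >= threshold]
    return sorted(relevant, key=lambda p: hybrid_score(p, today), reverse=True)
```

With these assumed weights, two papers with the same LLM score are ordered by publication date, and a paper published today gets the full recency bonus.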

Figure: Paper Pulse framework.

🚀 Quick Start

Prerequisites

  • Python 3.9+
  • OpenAI API Key (or compatible LLM endpoint)

Installation

  1. Clone the repository:

    git clone https://github.com/yangjunx21/paper-pulse.git
    cd paper-pulse
  2. Set up a virtual environment and install dependencies:

    python -m venv .venv
    source .venv/bin/activate  # On Windows use: .venv\Scripts\activate
    pip install -r requirements.txt
  3. Configure Environment: Create a .env file in the root directory:

    OPENAI_API_KEY=sk-your-key-here
    OPENAI_MODEL=gpt-4o
    # OPENAI_BASE_URL=... (optional)
    
    # Email Settings (Required for email delivery)
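    # Use your provider's SMTP host, e.g. smtp.gmail.com or smtp.163.com
    EMAIL_HOST=smtp.gmail.com
    # Use one port: 587 (STARTTLS) or 465 (implicit SSL/TLS)
    EMAIL_PORT=587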
    EMAIL_USERNAME=your-email@gmail.com
    EMAIL_PASSWORD=your-app-password
    EMAIL_SENDER=your-email@gmail.com
    EMAIL_RECEIVER=target-email@example.com

    Note: If you don't need email notifications, you can skip the EMAIL_* configuration. The generated reports will be saved locally in the reports/ directory.
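
To show how the EMAIL_* settings above fit together, here is a minimal sketch using Python's standard `smtplib` and `email.message` modules. It is not the project's mailer: the function names `build_report_email` and `send_report` are hypothetical, and the choice of implicit TLS on port 465 versus STARTTLS on any other port is an assumption based on common SMTP conventions.

```python
import smtplib
from email.message import EmailMessage


def build_report_email(env, subject, markdown_body):
    """Build a plain-text message from EMAIL_* settings held in a mapping."""
    msg = EmailMessage()
    msg["Subject"] = subject
    msg["From"] = env["EMAIL_SENDER"]
    msg["To"] = env["EMAIL_RECEIVER"]
    msg.set_content(markdown_body)
    return msg


def send_report(env, msg):
    """Send msg, using implicit TLS on port 465 and STARTTLS otherwise."""
    host = env["EMAIL_HOST"]
    port = int(env["EMAIL_PORT"])
    if port == 465:
        server = smtplib.SMTP_SSL(host, port, timeout=30)
    else:
        server = smtplib.SMTP(host, port, timeout=30)
    with server:
        if port != 465:
            server.starttls()  # upgrade the plain connection before login
        server.login(env["EMAIL_USERNAME"], env["EMAIL_PASSWORD"])
        server.send_message(msg)
```

A call such as `send_report(os.environ, build_report_email(os.environ, "Paper Pulse report", text))` would send the report once the variables are loaded into the process environment.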

📖 Usage

1. Intent-Based Mode (Recommended)

Let the "Intent Agent" help you build a search profile.

Step 1: Build a Profile. Run the interactive builder to define your research interests.

./scripts/build_intent_profile.sh "my_research_focus"
# Follow the prompts to describe what you are looking for.

Step 2: Run the Pipeline. Execute the pipeline using the profile you just created.

# Set the profile name from Step 1 as an environment variable
export PROFILE_NAME="my_research_focus"
./scripts/run_with_intent.sh

You can customize parameters in scripts/run_with_intent.sh or via environment variables (e.g., DATE_RANGE_START).

2. CLI Mode (Manual)

You can also run the CLI directly for one-off searches.

python -m paper_agent.cli \
  --topics "mechanistic interpretability" "sparse autoencoders" \
  --date 2025-11-20 \
  --sources arxiv huggingface_daily \
  --max-results 10 \
  --send-email
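
For readers wondering how the flags in the command above are interpreted, the following is a minimal sketch that assumes an `argparse`-based parser. The real `paper_agent.cli` may be built differently; `build_parser`, the default values, and which flags are required are assumptions made for illustration. Only the flag names come from the example above.

```python
import argparse
from datetime import date


def build_parser():
    """Hypothetical parser mirroring the flags shown in the CLI example."""
    parser = argparse.ArgumentParser(prog="paper_agent.cli")
    # One or more topic strings; quoting keeps multi-word topics together.
    parser.add_argument("--topics", nargs="+", required=True)
    # ISO date (YYYY-MM-DD); invalid values produce a usage error.
    parser.add_argument("--date", type=date.fromisoformat)
    # One or more source identifiers; the default here is an assumption.
    parser.add_argument("--sources", nargs="+", default=["arxiv"])
    parser.add_argument("--max-results", type=int, default=10)
    # Boolean switch: present means True, absent means False.
    parser.add_argument("--send-email", action="store_true")
    return parser
```

Under this sketch, each quoted string after `--topics` becomes one list element, so `"mechanistic interpretability"` stays a single topic rather than two words.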

📂 Project Structure

paper-pulse/
├── config/              # Configuration and profiles
│   └── intent_profiles/ # JSON profiles generated by the Intent Agent
├── paper_agent/         # Core package
│   ├── llm/             # Prompts and LLM client wrappers
│   ├── fetchers/        # Source adapters (ArXiv, HF, etc.)
│   ├── parsers/         # PDF and text processing
│   ├── pipeline.py      # Main processing logic
│   └── intent_agent.py  # Profile generation logic
├── scripts/             # Helper scripts
│   ├── build_intent_profile.sh
│   └── run_with_intent.sh
└── reports/             # Generated Markdown reports (local copies)

🛠 Configuration

You can tweak the pipeline behavior via CLI arguments or the .env file. Key environment variables:

| Variable | Description | Default |
|---|---|---|
| OPENAI_API_KEY | Your LLM API key. | Required |
| PAPER_PULSE_LANG | Language for summaries (e.g., "Chinese", "English"). | English |
| ENABLE_PDF_ANALYSIS | Crucial! Set to true to download PDFs, extract full text, and generate deep summaries. | false |
| RELEVANCE_THRESHOLD | Minimum LLM score (0.0-1.0) to include a paper in the report. | 0.6 |
| EMAIL_* | SMTP settings for report delivery. | Optional |

💡 Tip: Setting ENABLE_PDF_ANALYSIS=true yields much richer insights (Methodology, Experiments, etc.), but it consumes more tokens and time.
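
As an illustration of how the variables and defaults documented above might be read, here is a minimal sketch. The `Settings` class, `load_settings`, and the accepted boolean spellings are hypothetical and not taken from the project; only the variable names and default values come from the configuration table.

```python
import os
from dataclasses import dataclass


def _parse_bool(value):
    """Treat common truthy spellings as True (the accepted set is an assumption)."""
    return value.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class Settings:
    language: str
    enable_pdf_analysis: bool
    relevance_threshold: float


def load_settings(env=None):
    """Read pipeline settings from a mapping, falling back to documented defaults."""
    env = os.environ if env is None else env
    threshold = float(env.get("RELEVANCE_THRESHOLD", "0.6"))
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("RELEVANCE_THRESHOLD must be between 0.0 and 1.0")
    return Settings(
        language=env.get("PAPER_PULSE_LANG", "English"),
        enable_pdf_analysis=_parse_bool(env.get("ENABLE_PDF_ANALYSIS", "false")),
        relevance_threshold=threshold,
    )
```

Validating the threshold at load time means a value such as 6 fails immediately instead of silently filtering out every paper.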

🤝 Contribution

Contributions are welcome! If you have any questions, suggestions, or find any issues, please feel free to open an issue or submit a pull request.

If you find this project helpful, please give it a ⭐️ Star!

📧 Contact: yangjunx21@gmail.com

🖊️ Citation

If you find this project useful, please cite:

@misc{yang2025paperpulse,
  title  = {Paper Pulse: An LLM-Based Academic Paper Discovery and Analysis System},
  author = {Junxiao Yang},
  year   = {2025},
  url    = {https://github.com/yangjunx21/Paper-Pulse}
}

📄 License

MIT License
