A two-stage LLM pipeline for generating and evaluating novel scientific research questions.
IdeaMiner is a two-stage pipeline for generating and evaluating novel scientific research questions using LLM agents. It covers a broad taxonomy of academic disciplines and produces ranked, deduplicated research questions scored on novelty, feasibility, and significance.
Visit our official platform to explore AI-generated research ideas across disciplines – no setup required.
Browse and save ideas from your personal library. Each card shows the research question along with its key topic tags.
Quick-action buttons let you skip, dislike, like, copy, or navigate between ideas with a single click.
```mermaid
flowchart TD
    A["Config File<br>field · keywords · research_type · granularity"]
    A --> B["Step 1 · Generator<br>agents/step_1_generator.py"]
    B --> C["30 Raw Research Questions<br>data/raw_questions/*.json"]
    C --> D["Step 2 · Evaluator<br>agents/step_2_evaluator.py"]
    D --> E["Deduplication<br>Embedding-based Cosine Similarity"]
    E --> F["Group-Based Scoring<br>novelty · feasibility · significance"]
    F --> G["Ranked Questions<br>data/evaluated_questions/"]
```
Step 1 – Generation (`agents/step_1_generator.py`):
Each config file specifies a scientific field, a set of keywords, a research type, and a granularity level. The generator prompts an LLM to produce 30 diverse and novel research questions.
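The config fields map directly onto the generation prompt. A minimal sketch of how such a prompt might be assembled (the `build_prompt` helper and its wording are illustrative, not the project's actual prompt):

```python
def build_prompt(config, n_questions=30):
    """Turn a config dict (field, keywords, research_type,
    granularity_level) into a generation prompt for the LLM."""
    return (
        f"You are a research ideation assistant for {config['field']}.\n"
        f"Propose {n_questions} diverse, novel research questions about "
        f"{', '.join(config['keywords'])}, suited to {config['research_type']} "
        f"research at a {config['granularity_level']} level of granularity."
    )

config = {
    "field": "Life Sciences",
    "keywords": ["Genomics", "CRISPR", "Epigenetics"],
    "research_type": "Experiment",
    "granularity_level": "Microscopic",
}
print(build_prompt(config))
```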
Step 2 – Evaluation (`agents/step_2_evaluator.py`):
The evaluator first deduplicates questions using embedding-based cosine similarity, then scores the remaining questions across multiple rounds using a group-based approach. Each group is assessed by one or more LLM models that can invoke a web_search tool to ground their evaluations in current literature.
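The deduplication step can be sketched with plain NumPy, assuming an embedding vector has already been computed for each question (the greedy `deduplicate` helper below is illustrative, not the evaluator's exact code):

```python
import numpy as np

def deduplicate(questions, embeddings, threshold=0.85):
    """Greedily keep questions whose embedding has cosine similarity
    below `threshold` against every already-kept question."""
    # Normalize rows so a dot product equals cosine similarity
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    kept, kept_vecs = [], []
    for q, v in zip(questions, normed):
        if all(float(v @ k) < threshold for k in kept_vecs):
            kept.append(q)
            kept_vecs.append(v)
    return kept

questions = [
    "How does CRISPR editing alter methylation?",
    "How does CRISPR editing change methylation?",  # near-duplicate
    "What drives enhancer-promoter looping?",
]
# Toy 2-D embeddings: the first two vectors point almost the same way
emb = np.array([[1.0, 0.0], [0.99, 0.14], [0.0, 1.0]])
print(deduplicate(questions, emb))  # the near-duplicate is dropped
```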
```
IdeaMiner/
├── agents/
│   ├── step_1_generator.py      # Question generation agent
│   └── step_2_evaluator.py      # Question evaluation and ranking agent
├── utils/
│   ├── langchain_agent.py       # Async LangChain agent with tool support
│   ├── langchain_tools.py       # web_search and paper_search tools
│   ├── langchain_utils.py       # Custom embeddings with HuggingFace tokenizer support
│   └── tools.py                 # Standalone Semantic Scholar search function
├── configs/
│   └── subject.py               # Academic discipline taxonomy and config generator
├── sh/
│   ├── 1_gen.sh                 # Batch generation script
│   └── 2_eval.sh                # Batch evaluation script
├── assets/                      # Images for README and documentation
├── data/
│   ├── raw_questions/           # Output of Step 1 (git-ignored)
│   └── evaluated_questions/     # Output of Step 2 (git-ignored)
├── logs/                        # Runtime logs (git-ignored)
├── .env.example                 # Environment variable template
├── requirements.txt             # Python dependencies
└── LICENSE                      # MIT License
```
This project uses StructAI as its core utility library, which provides the `LLMAgent`, `load_file`, `save_file`, and other helpers used throughout the codebase.
```bash
pip install -r requirements.txt
cp .env.example .env
# Edit .env and fill in your API keys
```

Required variables:
| Variable | Description |
|---|---|
| `LLM_API_KEY` | API key for your OpenAI-compatible LLM provider |
| `LLM_BASE_URL` | Base URL of the API (default: `https://api.openai.com/v1`) |
| `TAVILY_API_KEYS` | Comma-separated Tavily search API keys (or use `TAVILY_API_KEY`) |
Optional variables:
| Variable | Description |
|---|---|
| `SEMANTIC_SCHOLAR_API_KEY` | Increases the Semantic Scholar API rate limit |
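A minimal sketch of how these variables might be read at startup (the `load_settings` helper is hypothetical; it only illustrates the defaulting and comma-splitting behavior described above):

```python
import os

def load_settings(env=os.environ):
    """Read the environment variables described above.
    LLM_BASE_URL falls back to the OpenAI default; TAVILY_API_KEYS
    may hold several comma-separated keys (TAVILY_API_KEY as fallback)."""
    keys = env.get("TAVILY_API_KEYS") or env.get("TAVILY_API_KEY", "")
    return {
        "llm_api_key": env["LLM_API_KEY"],
        "llm_base_url": env.get("LLM_BASE_URL", "https://api.openai.com/v1"),
        "tavily_api_keys": [k.strip() for k in keys.split(",") if k.strip()],
    }

settings = load_settings({"LLM_API_KEY": "sk-test", "TAVILY_API_KEYS": "tvly-a, tvly-b"})
print(settings["tavily_api_keys"])  # ['tvly-a', 'tvly-b']
```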
The `configs/subject.py` script generates random experiment configs and writes them to `configs/`:

```bash
python configs/subject.py
```

Or write your own JSON config:
```json
{
  "field": "Life Sciences",
  "keywords": ["Genomics", "CRISPR", "Epigenetics"],
  "research_type": "Experiment",
  "granularity_level": "Microscopic"
}
```

```bash
# Step 1: Generate questions for all configs
./sh/1_gen.sh

# Step 2: Evaluate and rank the generated questions
./sh/2_eval.sh
```

```bash
# Generate questions for a single config
python agents/step_1_generator.py --config_path configs/my_config.json

# Evaluate a single raw question file
python agents/step_2_evaluator.py \
  --input_file data/raw_questions/my_config.json \
  --output_dir data/evaluated_questions/my_config/ \
  --field "Life Sciences" \
  --models gpt-4o-mini \
  --comparison_rounds 3 \
  --group_size 5
```

| Parameter | Default | Description |
|---|---|---|
| `--similarity_threshold` | `0.85` | Cosine similarity threshold for duplicate removal |
| `--filter_batch_size` | `50` | Questions per filtering batch |
| `--comparison_rounds` | `3` | Number of scoring rounds per question |
| `--group_size` | `5` | Questions per scoring group |
| `--models` | `gpt-4o-mini` | Space-separated list of scorer models |
| `--max_concurrent_tasks` | `32` | Maximum parallel async scoring tasks |
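How `--group_size` and `--comparison_rounds` interact can be sketched as follows (the grouping logic here is illustrative, not the evaluator's exact code): each round reshuffles the question pool and splits it into chunks of `group_size`, so every question is scored once per round.

```python
import random

def make_groups(questions, group_size=5, comparison_rounds=3, seed=0):
    """Yield (round_index, group) pairs: each round shuffles the pool
    and chunks it, so every question appears exactly once per round."""
    rng = random.Random(seed)
    for r in range(comparison_rounds):
        pool = questions[:]
        rng.shuffle(pool)
        for i in range(0, len(pool), group_size):
            yield r, pool[i:i + group_size]

qs = [f"Q{i}" for i in range(12)]
groups = list(make_groups(qs, group_size=5, comparison_rounds=3))
# 3 rounds x ceil(12/5) = 3 chunks -> 9 groups in total
print(len(groups))  # 9
```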
After evaluation, each output directory contains:
| File | Description |
|---|---|
| `filtered_questions.json` | Questions after deduplication |
| `evaluation_results.json` | Full results including per-model scores |
| `ranked_questions.json` | Questions sorted by consensus score (best first) |
| `summary.json` | Statistics and top-10 questions |
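The consensus ranking can be reproduced with a few lines (a sketch, assuming `total` is the mean of the three criteria rounded to two decimals; the `rank_questions` helper is illustrative, not the evaluator's code):

```python
def rank_questions(records):
    """Sort scored questions by the mean of novelty, feasibility,
    and significance, then assign 1-based ranks (best first)."""
    for rec in records:
        s = rec["average_scores"]
        s["total"] = round((s["novelty"] + s["feasibility"] + s["significance"]) / 3, 2)
    ranked = sorted(records, key=lambda r: r["average_scores"]["total"], reverse=True)
    for i, rec in enumerate(ranked, start=1):
        rec["rank"] = i
    return ranked

records = [
    {"question": "A", "average_scores": {"novelty": 8.2, "feasibility": 7.5, "significance": 8.8}},
    {"question": "B", "average_scores": {"novelty": 6.0, "feasibility": 9.0, "significance": 7.0}},
]
ranked = rank_questions(records)
print(ranked[0]["question"], ranked[0]["average_scores"]["total"])  # A 8.17
```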
Each ranked question includes:
```json
{
  "question": "...",
  "background": "...",
  "average_scores": {
    "novelty": 8.2,
    "feasibility": 7.5,
    "significance": 8.8,
    "total": 8.17
  },
  "rank": 1
}
```

- GitHub Issues: Please open an issue for bug reports or feature requests
- WeChat Mini Program:
If you find this work helpful, please consider starring this repo. Thanks for your support!
MIT License. See LICENSE for details.




