MicroGrowAgents

Agent-based system for AI-driven microbial cultivation and growth media design

Part of the CultureBotAI initiative led by Dr. Marcin Joachimiak at Lawrence Berkeley National Laboratory.

Overview
Key Achievements
Key Features
Agents & Skills
- Core Agents (28)
- Skills (50)
Cofactor Analysis Data Sources
Experimental Analysis & Optimization
Data Integrity & Provenance
Installation
Quick Start
Core Capabilities
Advanced Usage
Chemistry Modules
Repository Structure
Development
Tools, APIs & Datasets
Contributing
Citation
Contact

Overview

MicroGrowAgents bridges the microbial cultivation gap through AI-powered multi-agent systems that integrate knowledge graphs, machine learning, and experimental automation. The platform combines specialized agents (LiteratureAgent, AnalogyReasoningAgent, GenomeFunctionAgent, MediaFormulationAgent) operating on KG-Microbe (864,000+ validated species) to design optimized growth media for previously uncultured microorganisms.

📚 Documentation Quick Links:

docs/STATUS.md - Current project state (start here)
docs/AGENTS_SKILLS_TOOLS.md - Complete reference for agents, skills, and tools
docs/OPTIMIZATION_GUIDE.md - Complete guide to data-driven v14 design
docs/AUDIT_REPORT_BBOP_SKILLS.md - Audit compliance report (78% passing)
CLAUDE.md - Guidance for Claude Code development

Key Achievements

🤖 MP_plus v10: Schema-driven media recommendation system with 15 evidence-based ingredient suggestions for Methylorubrum extorquens AM1 under lanthanide depletion stress
🧬 Genome-Guided Design: 57 Bakta-annotated genomes (667K features) for auxotrophy detection and organism-specific media formulation
📚 Knowledge Integration: 864,363 validated species across bacteria, archaea, fungi, and protozoa (GTDB + LPSN + NCBI)
🔬 Multi-Modal Reasoning: Literature mining (245+ papers), metabolic modeling (FBA/gap-filling), chemical similarity (208K+ embeddings), and experimental design
✅ Validated Outputs: 100% precision in organism extraction, complete toxicity transparency, schema-compliant output generation
🔒 Data Integrity: SHA256 checksums for all input data with cryptographic reproducibility tracking
📋 Audit Compliance: 78% compliance (7/9 PASS) against bbop-skills criteria for local-first agentic systems
📊 Citation Coverage: 90.5% (143/158 DOIs) with automated PDF retrieval and validation

Key Features

🧪 Media Concentration Predictions: Predict concentration ranges for media ingredients using ML-based regression
🔬 Advanced Chemistry Calculations:
- Osmotic Properties: Osmolarity, osmolality, water activity, growth categories
- Redox Properties: Eh (redox potential), pE, electron balance, redox state classification
- Nutrient Ratios: C:N:P ratios, Redfield deviation, limiting nutrient identification, trace metal analysis
- Thermodynamic Properties: Gibbs free energy calculations (via eQuilibrator API)
📊 Sensitivity Analysis: Sweep ingredient concentrations to determine pH and salinity effects
🔍 Media Comparison: Compare ingredient compositions across different media
🌐 External APIs: Integration with PubChem, ChEBI, and eQuilibrator for chemical data enrichment
📈 Visualization: Generate plots for osmotic properties, nutrient ratios, and sensitivity analysis
🤖 MP_plus Media Recommendation System: Multi-agent workflow generating complete media formulations with:
- Literature-Based Discovery (Category 11): Organism-specific ingredient mining from 245+ papers
- Analogy-Based Discovery (Category 7): Structural similarity search using 208K+ chemical embeddings
- Genome-Guided Discovery (Categories 1-5): Metabolic modeling, auxotrophy detection, transporter analysis
- Toxicity Flagging (Tier 2D): Transparent safety assessment (SAFE/CAUTION/WARNING)
- Output Formats: YAML, TSV, CSV, JSON with complete provenance and validation
- See data/designs/MP_plus/MP_plus_v10/ for example outputs
🧬 Genome Function Interpretation: Organism-specific media design using 57 Bakta-annotated genomes (667K features) with:
- Auxotrophy Detection: Automatic identification of biosynthetic pathway gaps
- Enzyme Analysis: EC number queries with wildcard support (1.1.*.* finds all CH-OH oxidoreductases)
- Cofactor Requirements: Detection of essential cofactors that cannot be biosynthesized
- Transporter Analysis: Concentration refinement based on nutrient uptake genes
- See docs/GENOME_FUNCTION.md for Claude Code agent examples
📚 Sheet Query System: Query extended information sheets with:
- 4 Query Types: Entity lookup, cross-reference, publication search, filtered queries
- 3 Output Formats: Markdown tables, JSON, evidence-rich reports
- Full-Text Search: Search within publication markdown files with excerpts
- Cross-References: Automatic linking between entities and publications
- See docs/SHEET_QUERY_SYSTEM.md for complete guide
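The wildcard EC queries under Genome Function Interpretation can be illustrated with a minimal sketch. It assumes that * (or the IUBMB placeholder -) stands for any value at one level of the four-level EC hierarchy, so 1.1.*.* selects EC sub-class 1.1. The function names and the annotation list are hypothetical; this is not GenomeFunctionAgent's API.

```python
def ec_matches(pattern: str, ec_number: str) -> bool:
    """True if ec_number matches pattern level by level.

    A '*' (or the IUBMB placeholder '-') in the pattern matches any value
    at that level of the four-level EC hierarchy.
    """
    pattern_parts = pattern.split(".")
    ec_parts = ec_number.split(".")
    if len(pattern_parts) != 4 or len(ec_parts) != 4:
        return False  # EC numbers always have exactly four levels
    return all(p in ("*", "-") or p == e for p, e in zip(pattern_parts, ec_parts))


def filter_ec(pattern: str, ec_numbers: list[str]) -> list[str]:
    """Keep the EC numbers from an annotation list that match the pattern."""
    return [ec for ec in ec_numbers if ec_matches(pattern, ec)]


# Illustrative annotations (not taken from a real genome)
annotations = ["1.1.1.1", "1.1.2.7", "1.2.1.2", "2.6.1.1"]
print(filter_ec("1.1.*.*", annotations))  # ['1.1.1.1', '1.1.2.7']
```

Matching level by level, rather than with a regular expression or glob, keeps "1.1.*" from accidentally matching "1.10.1.1" and rejects patterns that do not have four levels.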

Agents & Skills

MicroGrowAgents provides 28 specialized agents and 50 skills for microbial cultivation and media design.

Core Agents (28)

Knowledge & Reasoning:

KGReasoningAgent - Query KG-Microbe knowledge graph (1.5M nodes, 5.1M edges)
LiteratureAgent - Literature mining and evidence extraction
AnalogyReasoningAgent - Chemical similarity search (208K+ embeddings)
SheetQueryAgent - Query extended information sheets

Genome Analysis:

GenomeFunctionAgent - Genome-guided media design (57 genomes, 667K features)
LanthanideGenesAgent - Lanthanide-dependent gene analysis
TransporterAgent - Nutrient transporter annotation and analysis

Media Design & Optimization:

MediaFormulationAgent - Multi-source media recommendation
GenMediaConcAgent - ML-based concentration prediction
CofactorMediaAgent - Cofactor requirement analysis
AlternateIngredientAgent - Alternative ingredient suggestions
MediaRoleAgent - Ingredient metabolic role classification
MaxProOptBlockAgent - MaxPro optimal blocking design generation
ReconcileAgent - Experimental vs prediction reconciliation
EnsembleOptimizationAgent - Response surface modeling and Bayesian optimization
DesignRecommendationAgent - Interpret experimental results to recommend next design
ExperimentalInterpretationAgent - Generate evidence-based biological interpretations with inline citations

Metabolic Modeling:

MetabolicSourceAgent - Metabolic source identification
GapMindAgent - GapMind pathway gap analysis integration
GEMsemblerAgent - Genome-scale metabolic model reconstruction
GrowthCodonAgent - Codon usage bias-based growth prediction
MediaMatchAgent - MediaDive database integration

Chemistry & Properties:

ChemistryAgent - Advanced chemistry calculations (osmotic, redox, nutrient ratios)
MediapHCalculator - pH prediction and buffer design
SensitivityAnalysisAgent - Parameter sweep and sensitivity analysis

Data Management:

SQLAgent - Database queries and management
IngredientCooccurrenceAgent - Ingredient co-occurrence analysis
IngredientEffectsEnrichmentAgent - Ingredient effects enrichment
CSVAllDOIsEnrichmentAgent - DOI-based literature enrichment
PDFEvidenceExtractor - PDF evidence extraction
EvidenceExtractionOrchestrator - Multi-source evidence orchestration

Skills (50)

Analysis Skills (20)

analyze_cofactors - Cofactor requirements from genome annotations
analyze_genome - Genome function interpretation (enzymes, auxotrophies, transporters)
analyze_lanthanide_genes - Lanthanide-dependent gene analysis
analyze_transporters - Transporter system analysis
analyze_carbon_sources - Carbon source utilization analysis
analyze_nitrogen_sources - Nitrogen source analysis
analyze_phosphate_sources - Phosphate source analysis
analyze_sulfur_sources - Sulfur source analysis
analyze_sensitivity - pH and salinity sensitivity analysis
analyze_cooccurrence - Ingredient co-occurrence patterns
analyze_metabolic_requirements - Metabolic requirement analysis
analyze_gaps - Metabolic pathway gap analysis
analyze_limitations - Growth-limiting factor identification
analyze_electron_balance - Electron donor/acceptor balance
check_carbon_sources - Carbon source validation
compare_auxotrophy_methods - Compare auxotrophy detection methods
compare_gap_fba - Compare gap analysis with FBA
annotate_transporters - Annotate transporter systems
growth_prediction_dashboard - Interactive growth prediction dashboard
interpret_experimental_results - Generate evidence-based biological interpretations

Prediction & Design Skills (12)

predict_concentration - Predict ingredient concentration ranges
predict_growth - Growth prediction from media composition
predict_growth_cub - Codon usage bias-based growth prediction
predict_growth_hybrid - Hybrid growth prediction (multiple methods)
predict_transport_requirements - Predict transport requirements
recommend_media - Media formulation recommendation
recommend_media_quick - Quick media recommendation
design_maxpro_optblock - MaxPro OptBlock experimental design
optimize_growth_conditions - Ensemble optimization and Bayesian experiment design
find_alternates - Find alternative ingredients
classify_role - Classify ingredient metabolic roles
reconstruct_model - Reconstruct genome-scale metabolic model

Query & Search Skills (5)

query_knowledge_graph - Query KG-Microbe
query_database - SQL database queries
search_literature - Literature search and extraction
search_mediadive - MediaDive database search
sheet_query - Query extended information sheets

Chemistry & Validation Skills (5)

calculate_chemistry - Calculate osmotic, redox, nutrient properties
validate_media - Media formulation validation
validate_formulation_comprehensive - Comprehensive formulation validation
validate_ingredient - Ingredient validation and normalization
export_results - Export results to multiple formats

Workflow Skills (6)

recommend_media_workflow - Comprehensive media recommendation workflow
recommend_media_comprehensive - Extended comprehensive workflow
optimize_medium_workflow - Medium optimization workflow
ingredient_report_workflow - Detailed ingredient analysis report
initialize_database - Database initialization and validation
export_results - Multi-format export utility

See src/microgrowagents/agents/ and src/microgrowagents/skills/ for complete documentation.

Cofactor Analysis Data Sources

The CofactorMediaAgent integrates 6 major biological databases and specialized literature:

Primary Databases

ChEBI - Chemical identifiers for 44 cofactors (DOI: 10.1093/nar/gkv1031)
KEGG - 30+ biosynthesis pathway definitions (DOI: 10.1093/nar/gkac963)
BRENDA - EC-to-cofactor relationships (DOI: 10.1093/nar/gky1048)
ExplorEnz - Enzyme Commission nomenclature (DOI: 10.1093/nar/gkn582)

Knowledge Graph Integration

KG-Microbe (1.5M nodes, 5.1M edges) - Enzyme-substrate relationships and pathway context
Queries via KGReasoningAgent for multi-source evidence integration

Reference Files

src/microgrowagents/data/cofactor_hierarchy.yaml - 44 cofactors across 5 categories
src/microgrowagents/data/ec_to_cofactor_map.yaml - 68 EC pattern mappings
data/processed/ingredient_cofactor_mapping.csv - 13 MP medium cofactor providers

See docs/cofactor_data_sources.md for detailed methodology and citations.

Example: Cofactor Analysis for M. extorquens AM-1

Generate cofactor requirements table from Bakta genome annotations:

# Using Python API
uv run python -c "
from microgrowagents.agents import CofactorMediaAgent
from pathlib import Path

agent = CofactorMediaAgent(Path('data/processed/microgrow.duckdb'))
result = agent.run(
    query='Analyze cofactor requirements',
    organism='SAMN31331780',  # M. extorquens AM-1
    base_medium='MP'
)

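# Save results (create the output directory first if it does not exist)
import pandas as pd
out_dir = Path('outputs/cofactor_analysis')
out_dir.mkdir(parents=True, exist_ok=True)
df = pd.DataFrame(result['data']['cofactor_table'])
df.to_csv(out_dir / 'cofactor_table_Methylorubrum_extorquens_AM1.csv')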
"

Results for M. extorquens AM-1 (from 110 EC numbers):

15 cofactors identified
4 already supplied by MP medium: TPP, Biotin, Fe-S clusters, Mg
11 not supplied by MP medium: PLP, THF, Coenzyme Q, NAD+, NADP+, ATP, CTP, GTP, UTP, CoA, SAM
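The split above between cofactors the medium already provides and those it lacks is a set comparison. Here is a minimal sketch using the reported values. The variable names are illustrative and do not reflect CofactorMediaAgent's internals.

```python
# Cofactors inferred from the genome annotations (copied from the results above)
identified = {
    "TPP", "Biotin", "Fe-S clusters", "Mg",
    "PLP", "THF", "Coenzyme Q", "NAD+", "NADP+",
    "ATP", "CTP", "GTP", "UTP", "CoA", "SAM",
}
# Cofactors that MP medium already provides (copied from the results above)
in_mp_medium = {"TPP", "Biotin", "Fe-S clusters", "Mg"}

supplied = identified & in_mp_medium      # required and already in the medium
not_supplied = identified - in_mp_medium  # required but absent from the medium

print(len(identified), len(supplied), len(not_supplied))  # 15 4 11
print(sorted(not_supplied))
```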

Generated tables available at:

CSV: outputs/cofactor_analysis/cofactor_table_Methylorubrum_extorquens_AM1.csv
TSV: outputs/cofactor_analysis/cofactor_table_Methylorubrum_extorquens_AM1.tsv

Experimental Analysis & Optimization

MicroGrowAgents provides a dual-mode pipeline for analyzing experimental growth data in absolute and relative terms, plus response surface modeling and Bayesian optimization.

Features

📊 Dual-Mode Analysis: Absolute (raw OD600) and Relative (vs baseline) analysis pipelines
🔬 Hierarchical Clustering: Identify groups of similar growth conditions (276 replicates, 6 clusters)
🗺️ Response Surface Modeling: Gaussian Process modeling with multi-objective optimization and Pareto frontiers
🤖 Ensemble Optimization: Gaussian Process, Polynomial, and Random Forest ensemble models
🎯 Bayesian Optimization: Adaptive experiment design with Expected Improvement acquisition
📈 Effect Analysis: ANOVA, main effects plots, Sobol sensitivity indices
✅ Schema-Driven Validation: Automatic validation of all analysis outputs with source data traceability
🏷️ Output Labeling: All outputs labeled with source experimental data ID for full provenance
🔍 Evidence-Based Interpretation: Automated biological interpretation with inline citations and bibliography

Quick Start: Analyze Experimental Data

Analyze experimental plate data with dual-mode analysis (absolute + relative):

# Run BOTH absolute and relative analyses (recommended)
just analyze-experimental data/experimental/plate_designs_v10_maxprooptblock_long__results

# Run only absolute analysis (raw OD600)
just analyze-experimental-absolute data/experimental/plate_designs_v10_maxprooptblock_long__results

# Run only relative analysis (fold-change vs control)
just analyze-experimental-relative data/experimental/plate_designs_v10_maxprooptblock_long__results

# Run clustering on results
just cluster-experimental outputs/plate_designs_v10_maxprooptblock_long__results_experimental_analysis_absolute/v10_maxprooptblock_long__results_replicate_statistics_absolute.tsv outputs/plate_designs_v10_maxprooptblock_long__results_experimental_analysis_clustering_absolute absolute

# Validate all outputs
just validate-experimental plate_designs_v10_maxprooptblock_long__results

Analysis Modes:

Absolute Analysis: Raw OD600 measurements showing actual biomass achieved
- Answers: "Which conditions grew best overall?"
- Use for: Identifying highest-performing conditions, comparing to literature values
Relative Analysis: Fold-change, difference, and percent change vs control baseline
- Answers: "Which variations improved over baseline media?"
- Use for: Identifying growth enhancements, normalizing across experiments
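The three relative metrics can be written out directly. This is a minimal sketch that assumes one measurement is compared with a single control baseline value (for example, the mean of the control wells at the same timepoint). It does not show how the pipeline aggregates replicates, and the function name is hypothetical.

```python
def relative_metrics(value: float, control: float) -> dict[str, float]:
    """Fold-change, difference and percent change of one value vs the control baseline."""
    if control == 0:
        raise ValueError("control baseline must be non-zero")
    return {
        "fold_change": value / control,                        # 1.0 = no change
        "difference": value - control,                         # same units as the measurement
        "percent_change": 100.0 * (value - control) / control,
    }


# Hypothetical OD600 values for one condition and the control baseline
print(relative_metrics(value=0.60, control=0.40))
```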

Pipeline Steps:

Statistical Processing → v10_maxprooptblock_long__results_replicate_statistics_{mode}.tsv
Exploratory Visualization → v10_maxprooptblock_long__results_growth_curves.pdf
Hierarchical Clustering → v10_maxprooptblock_long__results_clustered_heatmap_growth.pdf
Response Surface Modeling → response_surfaces/surface_3d_{measurement}_{mode}.pdf (optional)
Output Validation → All files verified with proper source data ID labeling

Output Directories:

Absolute analysis: outputs/{source_data_id}_experimental_analysis_absolute/
Relative analysis: outputs/{source_data_id}_experimental_analysis_relative/
Clustering results: outputs/{source_data_id}_experimental_analysis_clustering_{mode}/

Response Surface Modeling

The experimental analysis pipeline includes optional response surface modeling using Gaussian Processes to understand ingredient-measurement relationships and multi-objective optimization:

# Response surfaces run automatically with analyze-experimental (enabled by default)
just analyze-experimental data/experimental/plate_designs_v13_latinhypercube_long__results

# Disable response surfaces for faster analysis
python scripts/run_dual_analysis.py data/experimental/plate_designs_v10_maxprooptblock_long__results --disable-response-surfaces

# Standalone response surface analysis
python scripts/analyze_response_surfaces.py \
    outputs/plate_designs_v13_latinhypercube_long__results_experimental_analysis_absolute/ \
    --mode absolute \
    --measurements OD600 Nd_uM

Capabilities:

🗺️ 3D Surface Plots: Visualize ingredient-measurement relationships
🎯 Pareto Frontiers: Multi-objective optimization (e.g., maximize OD600 while minimizing Nd consumption)
🔮 Predictions: Predict measurements over entire design space
📊 Contour Maps: Identify optimal ingredient combinations
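A Pareto frontier keeps only the conditions that no other condition beats on every objective. Here is a minimal sketch for the OD600 versus Nd-consumption example, using hypothetical data. The pipeline derives its frontier from Gaussian Process predictions, which this brute-force O(n²) filter does not model.

```python
def pareto_front(points: list[tuple[str, float, float]]) -> list[str]:
    """Return IDs of conditions that no other condition dominates.

    Each point is (condition_id, od600, nd_consumption). OD600 is maximized and
    Nd consumption is minimized; a condition is dominated if another one is at
    least as good on both objectives and strictly better on one.
    """
    front = []
    for cid, od, nd in points:
        dominated = any(
            od2 >= od and nd2 <= nd and (od2 > od or nd2 < nd)
            for _, od2, nd2 in points
        )
        if not dominated:
            front.append(cid)
    return front


# Hypothetical conditions: (ID, OD600, Nd consumed in uM)
conditions = [("A", 0.90, 3.0), ("B", 0.70, 1.0), ("C", 0.60, 2.0), ("D", 0.95, 4.0)]
print(pareto_front(conditions))  # ['A', 'B', 'D']  (C is dominated by B)
```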

Use Cases:

v13+ designs with variable Neodymium for lanthanide-dependent growth analysis
Understanding growth-lanthanide relationships (MxaF vs XoxF-MDH pathways)
Identifying optimal conditions for multiple objectives simultaneously

Measurement Types:

OD600 (Optical Density): Bacterial biomass (higher = more growth)
- Absolute mode: Raw OD600 values
- Relative mode: Fold-change vs control baseline
Nd_uM (Neodymium concentration): Lanthanide depletion marker
- Values relative to baseline media WITH bacterial growth
- Negative values: More Nd consumption than control (higher bacterial uptake)
- Positive values: Less Nd consumption than control (lower bacterial uptake)
- Used to distinguish lanthanide-dependent vs independent growth pathways

Outputs (per mode):

response_surfaces/surface_predictions_{measurement}_{mode}.csv - Predictions over design space
response_surfaces/surface_3d_{measurement}_{mode}.pdf/png - 3D surface plots
response_surfaces/pareto_frontier_{mode}.csv - Pareto-optimal conditions (joint analysis)
response_surfaces/pareto_frontier_{mode}.pdf/png - Pareto frontier visualization
response_surfaces/optimization_report_{mode}.txt - Model parameters and best conditions

Optimization Workflow

Build response surface models and suggest next experiments using ensemble modeling:

# Using Python skill with source data ID (recommended)
uv run python -m microgrowagents.skills.simple.optimize_growth_conditions \
    --data outputs/experimental_analysis \
    --source-data-id plate_designs_v10_maxprooptblock_long__results \
    --output-dir outputs/optimization \
    --strategy hybrid \
    --n-suggestions 69

# Or via direct file path
uv run python -m microgrowagents.skills.simple.optimize_growth_conditions \
    --data outputs/experimental_analysis/v10_maxprooptblock_long__results_replicate_statistics.tsv \
    --output-dir outputs/optimization \
    --strategy hybrid

What it does:

Trains ensemble models (Gaussian Process + Polynomial + Random Forest)
Analyzes ingredient effects and interactions
Uses Bayesian optimization to suggest next experiments
Generates v12 design files compatible with pipetting infrastructure

Optimization Strategies:

Bayesian Optimization: Expected Improvement acquisition (exploitation)
Local Search: Perturbation around best observed conditions
Uncertainty Sampling: Explore high-uncertainty regions (exploration)
Hybrid: 70% local search + 15% uncertainty + 15% space-filling

Key Results: M. extorquens AM-1 Lanthanide Study

v10 Design - 69 conditions tested (4 replicates each, 3 timepoints):

Top Performer: MPOB_040

Max OD600: 0.95 (highest overall)
Strategy: Predominantly C1 methylotrophy (67.9 mM methanol, low succinate)
Challenge: 98% crash at 48h due to methanol depletion

Most Stable: MPOB_053

Max OD600: 0.66 (sustained growth)
Strategy: Mixed C1 + C4 metabolism (19.9 mM methanol, 58.7 mM succinate)
Result: Stable across all timepoints

Key Finding: 40-60 mM succinate provides metabolic backup when methanol depletes, preventing culture crash while maintaining high peak growth.

v13 Design - Lanthanide-Dependent Growth Pathways:

Variable Neodymium (0-5 µM) to test MxaF vs XoxF-MDH pathways
Response surface modeling identifies Pareto-optimal conditions
High OD600 at low Nd → lanthanide-independent pathway (MxaF-MDH)
High OD600 at high Nd → lanthanide-dependent pathway (XoxF-MDH)
Multi-objective optimization balances growth AND Nd utilization

See outputs/optimization/MPOB_040_CRASH_ANALYSIS.md for detailed v10 analysis.

Schema & Validation

All analysis outputs conform to validation standards with automatic source data ID labeling:

Validator: src/microgrowagents/utils/analysis_output_validator.py
Documentation: docs/EXPERIMENTAL_ANALYSIS_PIPELINE.md

Source Data Traceability:

The system automatically generates output prefixes from source data directories:

Input Directory:  plate_designs_v10_maxprooptblock_long__results
       ↓
Output Prefix:    v10_maxprooptblock_long__results_
       ↓
Output Files:     v10_maxprooptblock_long__results_replicate_statistics.tsv
                  v10_maxprooptblock_long__results_growth_curves.pdf
                  v10_maxprooptblock_long__results_clustered_heatmap_growth.pdf
                  v10_maxprooptblock_long__results_cluster_assignments_growth.csv

Prefix Generation:

Removes plate_designs_ from source directory name
Adds trailing underscore
Applies to all outputs: statistical, visualization, and clustering files
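The two prefix rules can be sketched in a few lines (str.removeprefix requires Python 3.9+). The function name is hypothetical, and the sketch assumes that plate_designs_ is stripped only when it appears at the start of the directory name.

```python
from pathlib import Path


def output_prefix(source_dir: str) -> str:
    """Derive the output filename prefix from a source data directory."""
    name = Path(source_dir).name                # ignore parent dirs and a trailing slash
    name = name.removeprefix("plate_designs_")  # rule 1: drop the leading plate_designs_
    return f"{name}_"                           # rule 2: add a trailing underscore


prefix = output_prefix("data/experimental/plate_designs_v10_maxprooptblock_long__results")
print(prefix)                              # v10_maxprooptblock_long__results_
print(prefix + "replicate_statistics.tsv")
```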

Every output file is:

✅ Labeled with source experimental data ID for full traceability
✅ Named consistently across all analysis types
✅ Validated for existence and proper formatting
✅ Documented with file counts and metadata

Example: Complete Output Set

For source data plate_designs_v10_maxprooptblock_long__results, the pipeline generates:

Statistical Analysis (outputs/plate_designs_v10_maxprooptblock_long__results_experimental_analysis_{mode}/):

v10_maxprooptblock_long__results_processed_data_raw.tsv
v10_maxprooptblock_long__results_processed_data_{mode}.tsv (absolute or relative)
v10_maxprooptblock_long__results_replicate_statistics_{mode}.tsv
v10_maxprooptblock_long__results_control_statistics.tsv

Visualization (outputs/plate_designs_v10_maxprooptblock_long__results_experimental_analysis_{mode}/):

v10_maxprooptblock_long__results_growth_curves.pdf/.png
v10_maxprooptblock_long__results_dose_response_curves.pdf/.png
v10_maxprooptblock_long__results_heatmaps.pdf/.png
v10_maxprooptblock_long__results_pca_ingredient_space.pdf/.png
v10_maxprooptblock_long__results_pca_measurement_space.pdf/.png
v10_maxprooptblock_long__results_replicate_variability.pdf/.png
v10_maxprooptblock_long__results_summary_statistics.pdf/.png

Clustering (outputs/plate_designs_v10_maxprooptblock_long__results_experimental_analysis_clustering_{mode}/):

v10_maxprooptblock_long__results_clustered_heatmap_growth.pdf/.png
v10_maxprooptblock_long__results_cluster_assignments_growth.csv
v10_maxprooptblock_long__results_cluster_descriptions_growth.txt
v10_maxprooptblock_long__results_cluster_summary_growth.pdf

Response Surfaces (optional, outputs/plate_designs_v10_maxprooptblock_long__results_experimental_analysis_{mode}/response_surfaces/):

surface_predictions_{measurement}_{mode}.csv
surface_3d_{measurement}_{mode}.pdf/.png
pareto_frontier_{mode}.csv (multi-objective optimization)
pareto_frontier_{mode}.pdf/.png
optimization_report_{mode}.txt

Evidence-Based Interpretation

Generate publication-ready biological interpretations with inline citations and bibliography:

from microgrowagents.agents.analysis import ExperimentalInterpretationAgent

# Initialize agent with version identifier
agent = ExperimentalInterpretationAgent(source_version="v10")

# Run interpretation workflow
result = agent.run()

What it generates:

INTERPRETATION_REPORT.md - Clean biological interpretation
- Executive summary with key findings
- Factor-by-factor analysis (phosphate, nitrogen, carbon sources)
- Metabolic insights (carbon utilization, nutrient stoichiometry)
- Evidence-based hypotheses with testable predictions
- Recommendations for next design iteration
- Optimal media formulation based on results
INTERPRETATION_EVIDENCE.md - Evidence companion file
- Data Evidence (E1-E#) with specific file references:
  - E1: Control statistics from v10_..._control_statistics.tsv
  - E2: Top 10 conditions from v10_..._replicate_statistics.tsv
  - E3: Clustering patterns from v10_..._cluster_descriptions_growth.txt
  - E4: Boundary effects from DesignRecommendationAgent analysis
- Literature Evidence (L1-L#) with DOIs:
  - L1: M. extorquens metabolism (Chistoserdova et al. 2003)
  - L2: PQQ-dependent MDH (Anthony & Williams 2003)
  - L3: Rare earth elements (Pol et al. 2014)
- Each evidence entry includes: source file, full path, section, data snippet
INTERPRETATION_REPORT_evidence.md - Citation-based report
- Same content as main report but with inline citations [E1], [E2], [L1], [L2]
- Complete bibliography with file references and data snippets
- Publication-ready format
interpretation_metadata.json - Execution metadata
- Timestamp, directories used, summary statistics

Example output:

================================================================================
ExperimentalInterpretationAgent - Evidence-Based Interpretation
================================================================================

Step 1: Locating analysis directories...
  ✓ Analysis directory: outputs/plate_designs_v10_.../
  ✓ Clustering directory: outputs/plate_designs_v10_..._clustering/

Step 2: Validating data files...
  ✓ Required data files present

Step 3: Generating interpretation reports...
  - Analyzing experimental data...
  - Extracting evidence snippets...
  - Generating biological interpretation...
  - Creating citation-based report...

Step 4: Interpretation complete!

Summary:
  Conditions analyzed: 10
  Clusters identified: 6
  Boundary effects: 3
  Evidence snippets: 4
  Literature references: 3

Key Features:

📚 Complete traceability: Every claim cites specific data files and sections
🔬 Biological insights: Factor-by-factor interpretation with metabolic context
📊 Data snippets: Actual values from analysis files included in bibliography
📖 Literature support: DOI-linked references with key findings
✅ Publication-ready: Three formats (clean, evidence, citation-based)

See: docs/EXPERIMENTAL_INTERPRETATION_AGENT.md for complete documentation

Data Integrity & Provenance

MicroGrowAgents implements comprehensive data integrity and provenance tracking for reproducibility:

Input Data Checksums

All input data files have SHA256 checksums recorded, so any later modification or corruption can be detected (cryptographic reproducibility):

# Verify input data integrity
just verify-data-integrity

# Generate checksums for new data
python scripts/generate_checksums.py data/raw/

Checksums stored in:

Global checksums: data/checksums.txt
Per-analysis checksums: outputs/*/input_data_checksums.json

Automatic tracking:

Every analysis records checksums of input files
Verification detects any data modifications or corruption
Complies with bbop-skills Criterion 4 (cryptographic reproducibility)
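Here is a sketch of the verification step using Python's hashlib. The checksum file format is an assumption. This sketch reads sha256sum-style lines ("<hex digest>  <path>") and resolves paths relative to the checksum file's directory. The actual layout of data/checksums.txt, and what just verify-data-integrity runs, may differ.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 so large inputs need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(checksum_file: Path) -> list[str]:
    """Return recorded paths whose current hash no longer matches.

    Assumes sha256sum-style lines: "<hex digest>  <relative path>".
    """
    mismatches = []
    base = checksum_file.parent
    for line in checksum_file.read_text().splitlines():
        if not line.strip():
            continue
        expected, rel_path = line.split(maxsplit=1)
        target = base / rel_path.strip().lstrip("*")  # '*' marks binary mode in sha256sum
        if not target.exists() or sha256_of(target) != expected:
            mismatches.append(rel_path)
    return mismatches
```

An empty list means every recorded file still hashes to its stored value. Missing files are reported as mismatches rather than skipped.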

Artifact Cleanup Policy

Three-tier retention model for efficient storage management:

Archival (Keep indefinitely):

Published experimental designs (v10, v13, etc.)
Validated analysis results with interpretations
Response surface models

Temporary (30 days):

Experimental analysis outputs
Clustering results
Intermediate optimization runs

Ephemeral (7 days):

Test outputs
Debugging artifacts
Temporary visualizations

Cleanup Commands:

# Archive old outputs (moves to archive/ directory)
just archive-outputs

# Clean old outputs (>30 days)
just clean-old-outputs

# Clean ephemeral artifacts (>7 days)
just clean-ephemeral

Storage Impact:

Steady-state: ~185MB with cleanup (~96% reduction)
Unmanaged: ~4GB/year

See: docs/ARTIFACT_CLEANUP_POLICY.md for complete retention policies

Audit Compliance

Overall Compliance: 78% (7/9 PASS) against bbop-skills criteria:

✅ PASS (7 criteria):

Provenance tracking (.claude/provenance/)
Model tracking (explicit model IDs in all outputs)
Reasoning/code separation (markdown interpretations + code artifacts)
Validation (LinkML schemas, output validators)
Error-correction (DOI validation + corrections)
RAG (KG-Microbe, literature corpus, genome annotations)
Artifact cleanup (automated retention policies)

⚠️ PARTIAL (1 criterion):

Documentation/automation (needs enhancement)

❌ FAIL (1 criterion):

MCP integration (not yet adopted, under consideration)

See: docs/AUDIT_REPORT_BBOP_SKILLS.md for complete audit findings

Citation Coverage

DOI Validation: 90.5% (143/158 DOIs) with evidence

PDFs: 92 (58.2%)
Abstracts: 44 (27.8%)
Missing: 15 (9.5%)

Automated Workflows:

# Validate DOIs
uv run python scripts/doi_validation/validate_failed_dois.py

# Apply corrections
uv run python scripts/doi_corrections/apply_doi_corrections.py

# Download PDFs
uv run python scripts/pdf_downloads/download_all_pdfs_automated.py

See: notes/DOI_CORRECTIONS_FINAL_UPDATED.md for correction history

Installation

Prerequisites

Python 3.10 or higher
uv package manager

Quick Install

# Clone the repository
git clone https://github.com/CultureBotAI/MicroGrowAgents.git
cd MicroGrowAgents

# Install dependencies using uv
uv sync

# Verify installation
uv run python run.py --help

Quick Start

Generate Media Concentrations

Predict concentration ranges for a specific medium:

# Get MP medium concentrations
uv run python run.py gen-media-conc "MP medium"

# Get concentrations for custom ingredients
uv run python run.py gen-media-conc "glucose,NaCl,KH2PO4" --mode ingredients

# Export to JSON
uv run python run.py gen-media-conc "MP medium" --format json --output mp_medium.json

Sensitivity Analysis

Analyze how ingredient concentration variations affect pH and salinity:

# Basic sensitivity analysis
uv run python run.py sensitivity "MP medium"

# With osmotic property calculations
uv run python run.py sensitivity "MP medium" --calculate-osmotic

# With all advanced properties
uv run python run.py sensitivity "MP medium" \
    --calculate-osmotic \
    --calculate-redox \
    --calculate-nutrients \
    --plot

# Custom parameters
uv run python run.py sensitivity "glucose,NH4Cl,KH2PO4" \
    --calculate-redox \
    --ph 6.5 \
    --temperature 37

Advanced Chemistry Analysis

Calculate osmotic properties for a medium:

from microgrowagents.chemistry.osmotic_properties import (
    calculate_osmolarity,
    calculate_water_activity
)

ingredients = [
    {"name": "NaCl", "concentration": 150.0, "molecular_weight": 58.44, "formula": "NaCl"},
    {"name": "KCl", "concentration": 5.0, "molecular_weight": 74.55, "formula": "KCl"}
]

# Calculate osmolarity
osm_result = calculate_osmolarity(ingredients, temperature=25.0)
print(f"Osmolarity: {osm_result['osmolarity']:.1f} mOsm/L")

# Calculate water activity
aw_result = calculate_water_activity(ingredients, temperature=25.0)
print(f"Water Activity: {aw_result['water_activity']:.4f}")
print(f"Growth Category: {aw_result['growth_category']}")

Core Capabilities

1. Media Concentration Generation (`gen-media-conc`)

Predicts LOW, DEFAULT, and HIGH concentration ranges for media ingredients:

# Query by medium name
uv run python run.py gen-media-conc "MP medium"

# Query by ingredient list
uv run python run.py gen-media-conc "PIPES,NaCl,glucose" --mode ingredients

# With chemical data enrichment
uv run python run.py gen-media-conc "MP medium" --enrich pubchem

Output includes:

Predicted concentration ranges (mM)
Molecular weights
Chemical formulas
Confidence scores

2. Sensitivity Analysis (`sensitivity`)

Performs parameter sweep analysis by varying each ingredient between LOW and HIGH concentrations:

# Basic analysis (pH and salinity)
uv run python run.py sensitivity "MP medium"

# With advanced chemistry properties
uv run python run.py sensitivity "MP medium" --calculate-osmotic --calculate-nutrients

# Export results
uv run python run.py sensitivity "MP medium" --format json --output results.json

# Generate visualization
uv run python run.py sensitivity "MP medium" --plot --plot-output analysis.png

Calculates:

pH changes
Salinity (TDS and NaCl-equivalent)
Ionic strength
Optional: Osmotic properties, redox potential, nutrient ratios
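The sweep described above is a one-at-a-time design: each ingredient moves from LOW to HIGH while the others stay at DEFAULT. This minimal sketch tracks ionic strength (I = 0.5 × Σ c_i × z_i²) because it needs no external data. The ingredient table, charges, and concentrations are hypothetical. The real command also tracks pH and salinity, which this sketch omits.

```python
# Hypothetical ion table: ingredient -> [(charge z, ions per formula unit)].
# Assumes complete dissociation and ignores speciation (e.g. phosphate pKa values).
IONS = {
    "NaCl": [(+1, 1), (-1, 1)],
    "MgSO4": [(+2, 1), (-2, 1)],
    "KH2PO4": [(+1, 1), (-1, 1)],
}


def ionic_strength(conc_mM: dict[str, float]) -> float:
    """I = 0.5 * sum(c_i * z_i^2), in mM."""
    total = 0.0
    for name, c in conc_mM.items():
        for charge, count in IONS[name]:
            total += c * count * charge ** 2
    return 0.5 * total


def sweep(default: dict[str, float], low: dict[str, float],
          high: dict[str, float], steps: int = 5) -> dict[str, list[tuple[float, float]]]:
    """Vary one ingredient at a time from LOW to HIGH, holding the rest at DEFAULT."""
    results = {}
    for name in default:
        span = high[name] - low[name]
        values = [low[name] + i * span / (steps - 1) for i in range(steps)]
        results[name] = [(v, ionic_strength({**default, name: v})) for v in values]
    return results


default = {"NaCl": 10.0, "MgSO4": 1.0, "KH2PO4": 2.0}
low = {"NaCl": 0.0, "MgSO4": 0.5, "KH2PO4": 1.0}
high = {"NaCl": 20.0, "MgSO4": 2.0, "KH2PO4": 4.0}
results = sweep(default, low, high)
print(results["NaCl"])  # [(0.0, 6.0), (5.0, 11.0), (10.0, 16.0), (15.0, 21.0), (20.0, 26.0)]
```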

3. Advanced Chemistry Properties

Osmotic Properties

Calculate osmolarity, osmolality, and water activity:

uv run python run.py sensitivity "MP medium" --calculate-osmotic

Provides:

Osmolarity (mOsm/L)
Osmolality (mOsm/kg)
Water activity (aw)
Growth category classification:
- most_bacteria (aw > 0.98)
- halotolerant (0.90 < aw ≤ 0.98)
- halophiles (aw ≤ 0.90)
Van't Hoff dissociation factors
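For dilute media, these quantities are linked by simple relations. The sketch below assumes ideal behaviour: an osmotic coefficient of 1, and osmolarity treated as approximately equal to osmolality. It uses Raoult's law for water activity and the growth-category thresholds listed above. Because of the ideal assumption, it overestimates osmolarity for real electrolytes. It is not the module's implementation, which also offers Robinson-Stokes and Bromley methods.

```python
MOLAR_MASS_WATER = 0.018015  # kg/mol


def osmolarity_mosm(solutes: list[tuple[float, float]]) -> float:
    """Sum of van't Hoff factor i times concentration (mM) -> mOsm/L."""
    return sum(i * conc_mm for conc_mm, i in solutes)


def water_activity_raoult(osmolality_mosm_kg: float) -> float:
    """Ideal Raoult's law: aw = mole fraction of water in 1 kg of water."""
    n_water = 1.0 / MOLAR_MASS_WATER        # ~55.51 mol per kg
    n_solute = osmolality_mosm_kg / 1000.0  # osmoles per kg
    return n_water / (n_water + n_solute)


def growth_category(aw: float) -> str:
    """Thresholds as listed in this README."""
    if aw > 0.98:
        return "most_bacteria"
    if aw > 0.90:
        return "halotolerant"
    return "halophiles"


# NaCl 150 mM + KCl 5 mM, ideal dissociation (i = 2 for both)
osm = osmolarity_mosm([(150.0, 2.0), (5.0, 2.0)])
aw = water_activity_raoult(osm)  # treats mOsm/L as mOsm/kg (dilute approximation)
print(f"{osm:.0f} mOsm/L, aw = {aw:.4f}, {growth_category(aw)}")
```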

Example output:

{
  "osmotic_properties": {
    "osmolarity": 342.5,
    "osmolality": 339.8,
    "water_activity": 0.9938,
    "growth_category": "most_bacteria",
    "confidence": {"osmolarity": 0.85, "water_activity": 0.78}
  }
}

Redox Properties

Calculate redox potential (Eh), pE, and electron balance:

uv run python run.py sensitivity "glucose,NH4Cl" --calculate-redox --ph 7.0

Calculates:

Eh (redox potential in mV)
pE (electron activity)
Redox state classification (oxidizing, reducing, intermediate)
Electron donor/acceptor balance
Standard redox couples (O2/H2O, NO3^-/NO2^-, SO4^2-/H2S, etc.)

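Uses the Nernst equation (potentials in mV, 25°C):

Eh = E0' + (59.16/n) × log10([oxidized]/[reduced])
pH correction for a couple that takes up m protons per n electrons: Eh = E0 + (59.16/n) × log10([oxidized]/[reduced]) - 59.16 × (m/n) × pH, where E0 is the standard potential at pH 0 (E0' is this expression evaluated at pH 7)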
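The Nernst relation can be sketched in a few lines of Python. This sketch assumes you already know E0' (at pH 7), n, m, and the oxidized/reduced concentrations of the dominant couple. It does not show how the module selects that couple from an ingredient list. The slope ln(10)·RT/F depends on temperature and reduces to 59.16 mV at 25°C.

```python
import math

R = 8.314462618   # J/(mol K)
F = 96485.33212   # C/mol


def nernst_slope_mv(temp_c: float = 25.0) -> float:
    """ln(10)*RT/F in mV; about 59.16 mV at 25 degC."""
    return math.log(10) * R * (temp_c + 273.15) / F * 1000.0


def redox_potential_mv(e0_prime_mv: float, n: int, oxidized: float, reduced: float,
                       ph: float = 7.0, protons: int = 0, temp_c: float = 25.0) -> float:
    """Eh for Ox + m H+ + n e- -> Red, starting from E0' at pH 7.

    protons is m; leaving it at 0 skips the pH term.
    """
    s = nernst_slope_mv(temp_c)
    eh = e0_prime_mv + (s / n) * math.log10(oxidized / reduced)
    # shift E0' from its pH 7 reference to the requested pH
    return eh - s * (protons / n) * (ph - 7.0)


def pe_from_eh(eh_mv: float, temp_c: float = 25.0) -> float:
    """pE = Eh / (ln(10)*RT/F)."""
    return eh_mv / nernst_slope_mv(temp_c)


print(round(pe_from_eh(245.3), 2))  # 4.15
```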

Example output:

{
  "redox_properties": {
    "eh": 245.3,
    "pe": 4.15,
    "redox_state": "oxidizing",
    "electron_balance": {
      "total_donors": 240.0,
      "total_acceptors": 220.0,
      "balance": 8.3
    }
  }
}

Nutrient Ratios

Calculate C:N:P ratios and identify limiting nutrients:

uv run python run.py sensitivity "glucose,NH4Cl,KH2PO4" --calculate-nutrients

Analyzes:

C:N:P molar ratios
Limiting nutrient prediction
Redfield ratio deviation (marine standard: 106:16:1)
Trace metal ratios (Fe:P, Mn:P, Zn:P)
Deficiencies and excesses

Limiting nutrient criteria:

P-limited: C:P > 150 or N:P > 20
N-limited: C:N > 20 or N:P < 10
C-limited: C:N < 6.6
Balanced: Near Redfield ratio
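The criteria translate directly into code. The sketch below assumes two things the README does not specify: that the rules are evaluated in the order listed (the README does not say how overlapping conditions are resolved), and that anything matching none of them counts as balanced. The returned labels are illustrative.

```python
def cnp_ratios(c_mol: float, n_mol: float, p_mol: float) -> dict[str, float]:
    """Molar C:N, C:P and N:P ratios."""
    return {"c_n": c_mol / n_mol, "c_p": c_mol / p_mol, "n_p": n_mol / p_mol}


def limiting_nutrient(r: dict[str, float]) -> str:
    """Apply the listed criteria, checked in the order P, N, C."""
    if r["c_p"] > 150 or r["n_p"] > 20:
        return "P-limited"
    if r["c_n"] > 20 or r["n_p"] < 10:
        return "N-limited"
    if r["c_n"] < 6.6:
        return "C-limited"
    return "balanced"


ratios = cnp_ratios(60.0, 9.0, 0.6)  # same totals as the example output
print({k: round(v, 2) for k, v in ratios.items()}, limiting_nutrient(ratios))
```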

Example output:

{
  "nutrient_ratios": {
    "c_mol": 60.0,
    "n_mol": 9.0,
    "p_mol": 0.6,
    "c_n_ratio": 6.67,
    "c_p_ratio": 100.0,
    "n_p_ratio": 15.0,
    "limiting_nutrient": "balanced",
    "redfield_deviation": 3.2,
    "trace_metals": {
      "fe_p_ratio": 0.015,
      "deficiencies": ["Co", "Mo"],
      "excesses": []
    }
  }
}

4. Media Comparison

Compare ingredient compositions between two media:

uv run python run.py compare-media "MP medium" "LB medium"

Shows:

Common ingredients
Unique ingredients to each medium
Concentration differences
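At its core, the comparison uses set operations on ingredient names plus a per-ingredient concentration difference. The sketch below assumes both media are given as name-to-concentration mappings in the same unit. It leaves out ingredient name normalization. The compositions shown are hypothetical, not the real MP or LB recipes.

```python
def compare_media(a: dict[str, float], b: dict[str, float]) -> dict:
    """Common and unique ingredients plus concentration differences (b - a)."""
    common = sorted(a.keys() & b.keys())
    return {
        "common": common,
        "only_in_a": sorted(a.keys() - b.keys()),
        "only_in_b": sorted(b.keys() - a.keys()),
        # positive values mean the ingredient is more concentrated in medium b
        "differences": {name: b[name] - a[name] for name in common},
    }


# Hypothetical compositions in mM, for illustration only
medium_a = {"NaCl": 20.0, "KH2PO4": 10.0, "NH4Cl": 15.0}
medium_b = {"NaCl": 170.0, "KH2PO4": 2.0, "glucose": 10.0}
print(compare_media(medium_a, medium_b))
```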

5. Media Formulation Recommendation (`recommend-media` workflow)

Recommend new media formulations using AI-powered multi-agent orchestration:

from microgrowagents.skills.workflows import RecommendMediaWorkflow

# Initialize workflow
workflow = RecommendMediaWorkflow()

# Recommend organism-specific medium
result = workflow.run(
    query="Recommend medium for methanotrophic bacteria",
    organism="Methylococcus capsulatus",
    temperature=42.0,
    pH=6.8,
    carbon_source="methane",
    oxygen="aerobic",
    goals="defined,selective",
    output_format="markdown"
)
print(result)

Features:

Multi-source Evidence Integration: Combines KG-Microbe, literature, and MP database
Organism-Specific: Tailored to target organism metabolic requirements
Complete Formulation: Ingredient list with concentrations, roles, and confidence scores
Chemical Compatibility: Validates precipitation and antagonism risks
Alternative Ingredients: Provides substitutes with rationales
Comprehensive Rationale: Human-readable explanations for all decisions

Example Goals:

minimal - Fewest ingredients, core nutrients only
defined - All ingredients chemically defined, no undefined supplements
complex - Rich nutrients, may include vitamins and cofactors
cost_effective - Prioritizes inexpensive, common ingredients
high_yield - Optimized for biomass/product formation
selective - Includes selective agents or unusual nutrients

Output includes:

Complete ingredient list with concentrations and ranges
Predicted pH, ionic strength, and other properties
Essential nutrient roles coverage
Chemical compatibility notes
Alternative ingredient suggestions
Evidence from KG-Microbe, literature, and database
Confidence scoring based on evidence quality

See .claude/skills/recommend-media.md for detailed documentation and examples.

6. Genome Function Interpretation

Organism-specific media design using Bakta-annotated genomes (57 genomes, 667,502 features):

Key Capabilities:

Auxotrophy Detection: Automatically identify biosynthetic pathway gaps
Enzyme Queries: EC number searches with wildcard support (e.g., 1.1.*.*)
Cofactor Analysis: Determine essential cofactors that cannot be biosynthesized
Transporter Analysis: Find nutrient uptake genes for concentration refinement
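Wildcard EC queries such as 1.1.*.* or 1.1.2.* can be matched level by level. The sketch below makes two assumptions: a trailing * matches all remaining levels (so 1.1.* behaves like 1.1.*.*), and an interior * matches exactly one level. The agent's own matching rules may differ.

```python
def ec_matches(ec: str, pattern: str) -> bool:
    """Level-by-level EC match; '*' is a wildcard, a trailing '*' matches the rest."""
    ec_parts = ec.split(".")
    pat_parts = pattern.split(".")
    for i, pat in enumerate(pat_parts):
        if i >= len(ec_parts):
            return False  # pattern is more specific than the EC number
        if pat == "*":
            if i == len(pat_parts) - 1:
                return True  # trailing wildcard covers all remaining levels
            continue
        if pat != ec_parts[i]:
            return False
    return len(pat_parts) == len(ec_parts)


annotated = ["1.1.1.1", "1.1.2.7", "1.10.3.2", "2.7.1.1"]  # illustrative EC list
print([ec for ec in annotated if ec_matches(ec, "1.1.*")])  # ['1.1.1.1', '1.1.2.7']
```

Matching on whole levels rather than string prefixes keeps 1.1.* from accidentally matching subclass 1.10.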

Python Examples:

from pathlib import Path

from microgrowagents.agents.genome_function_agent import GenomeFunctionAgent
from microgrowagents.agents.kg_reasoning_agent import KGReasoningAgent

db_path = Path('data/processed/microgrow.duckdb')

# Find enzymes in EC subclass 1.1 (oxidoreductases acting on CH-OH donors)
agent = KGReasoningAgent(db_path)
result = agent.run('genome_enzymes SAMN00114986 1.1.*')
print(f"Found {result['data']['count']} enzymes")

# Detect auxotrophies
agent = GenomeFunctionAgent(db_path)
result = agent.detect_auxotrophies(query='detect auxotrophies', organism='SAMN00114986')
print(f"Detected {result['data']['summary']['auxotrophies_detected']} auxotrophies")

Claude Code Agent Examples:

See docs/GENOME_FUNCTION.md for detailed examples including:

Analyzing organism metabolic capabilities
Comparing metabolic profiles of different organisms
Designing organism-specific defined media
Auxotrophy-guided media optimization
Metabolic engineering context analysis

Automatic Integration:

Genome analysis is automatically integrated into:

MediaFormulationAgent: Adds nutrients for detected auxotrophies
GenMediaConcAgent: Refines concentrations based on transporter presence/affinity
KGReasoningAgent: Adds genome_enzymes, genome_auxotrophies, genome_transporters queries
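Conceptually, auxotrophy-driven supplementation compares the EC numbers that a biosynthetic pathway needs with those annotated in the genome, then adds the pathway product for any gap. The sketch below is a simplification, not the agents' implementation. The pathway table is a small illustrative subset of histidine biosynthesis, and the 0.1 mM supplement level is an arbitrary placeholder. It also treats a single missing annotation as a true gap, whereas tools such as GapMind also weigh alternative routes and annotation confidence.

```python
# Illustrative subset of histidine biosynthesis EC numbers (not the full pathway)
PATHWAYS = {
    "L-histidine": {"2.4.2.17", "4.2.1.19", "2.6.1.9", "1.1.1.23"},
}


def detect_auxotrophies(genome_ecs: set[str], pathways: dict[str, set[str]]) -> dict[str, set[str]]:
    """Return nutrient -> missing EC numbers for every pathway with a gap."""
    gaps = {}
    for nutrient, required in pathways.items():
        missing = required - genome_ecs
        if missing:
            gaps[nutrient] = missing
    return gaps


def supplement(base_medium: dict[str, float], gaps: dict[str, set[str]],
               default_mm: float = 0.1) -> dict[str, float]:
    """Add each missing pathway product at a placeholder concentration (mM)."""
    medium = dict(base_medium)
    for nutrient in gaps:
        medium.setdefault(nutrient, default_mm)
    return medium


genome = {"2.4.2.17", "4.2.1.19", "2.6.1.9"}  # histidinol dehydrogenase (1.1.1.23) absent
gaps = detect_auxotrophies(genome, PATHWAYS)
print(gaps)
print(supplement({"NaCl": 20.0}, gaps))
```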

7. Claude Code Skills

The MicroGrowAgents skills framework provides 18 Claude Code skills:

Simple Skills (12 skills)

Cofactor Analysis Skill:

from microgrowagents.skills.simple import AnalyzeCofactorsSkill

skill = AnalyzeCofactorsSkill()
result = skill.run(
    organism="SAMN31331780",  # M. extorquens AM-1
    base_medium="MP",
    output_format="markdown"
)
print(result)

Genome Analysis Skill:

from microgrowagents.skills.simple import AnalyzeGenomeSkill

skill = AnalyzeGenomeSkill()
result = skill.run(
    query="Find all methanol dehydrogenases",
    organism="SAMN31331780",
    analysis_type="enzymes",
    ec_pattern="1.1.2.*",
    output_format="markdown"
)
print(result)

Knowledge Graph Query Skill:

from microgrowagents.skills.simple import QueryKnowledgeGraphSkill

skill = QueryKnowledgeGraphSkill()
result = skill.run(
    query="Find media for Methylococcus capsulatus",
    query_type="organism_media",
    output_format="markdown"
)
print(result)

Other Simple Skills:

PredictConcentrationSkill - Predict ingredient concentrations
FindAlternatesSkill - Find alternative ingredients
AnalyzeSensitivitySkill - Sensitivity analysis for pH/salinity
ClassifyRoleSkill - Classify ingredient metabolic roles
SearchLiteratureSkill - Search scientific literature
QueryDatabaseSkill - SQL queries on MP medium database
CalculateChemistrySkill - Calculate osmotic/redox/nutrient properties
AnnotateTransportersSkill - Annotate transporter systems in genomes
PredictTransportRequirementsSkill - Predict transport requirements for medium ingredients

Workflow Skills (3 workflows)

RecommendMediaWorkflow - Comprehensive media formulation recommendation
OptimizeMediumWorkflow - Medium optimization for specific goals
IngredientReportWorkflow - Detailed ingredient analysis reports

Utility Skills (3 utilities)

InitializeDatabaseSkill - Database initialization and validation
ExportResultsSkill - Export results to JSON/CSV/Excel
ValidateIngredientSkill - Ingredient validation and normalization

See src/microgrowagents/skills/ for complete skill documentation.

8. Integration Scripts

Standalone integration scripts for specific analyses:

# MP medium with osmotic properties
uv run python scripts/analyze_mp_medium_osmotic.py --plot --output-json results.json

# Generate visualization plots
uv run python scripts/analyze_mp_medium_osmotic.py --plot --plot-output mp_osmotic.png

Advanced Usage

Combining Multiple Property Calculations

Calculate all advanced properties simultaneously:

uv run python run.py sensitivity "MP medium" \
    --calculate-osmotic \
    --calculate-redox \
    --calculate-nutrients \
    --ph 7.0 \
    --temperature 30 \
    --format json \
    --output complete_analysis.json

Pipeline Mode

Use gen-media-conc output as input to sensitivity:

# Step 1: Generate concentration predictions
uv run python run.py gen-media-conc "MP medium" --format json > predictions.json

# Step 2: Run sensitivity analysis on predictions
uv run python run.py sensitivity --input-file predictions.json --calculate-osmotic

Python API

Use MicroGrowAgents programmatically:

from microgrowagents.agents.sensitivity_analysis_agent import SensitivityAnalysisAgent

# Initialize agent
agent = SensitivityAnalysisAgent(db_path="data/microgrowdb.db")

# Run analysis with advanced properties
result = agent.run(
    query="MP medium",
    mode="medium",
    calculate_osmotic=True,
    calculate_redox=True,
    calculate_nutrients=True,
    temperature=37.0
)

# Access results
baseline = result["baseline"]
print(f"pH: {baseline['ph']}")
print(f"Osmolarity: {baseline['osmotic_properties']['osmolarity']} mOsm/L")
print(f"Limiting nutrient: {baseline['nutrient_ratios']['limiting_nutrient']}")

Chemistry Modules

Osmotic Properties

Module: microgrowagents.chemistry.osmotic_properties

Functions:

calculate_osmolarity(ingredients, temperature=25.0) - Calculate osmolarity and osmolality
calculate_water_activity(ingredients, temperature=25.0, method="raoult") - Calculate water activity
estimate_van_hoff_factor(formula, charge, name) - Estimate dissociation factor

Methods:

Raoult's law (dilute solutions)
Robinson-Stokes (concentrated solutions)
Bromley equation (high ionic strength)

Redox Properties

Module: microgrowagents.chemistry.redox_properties

Functions:

calculate_redox_potential(ingredients, ph, temperature=25.0) - Calculate Eh and pE
calculate_electron_balance(ingredients) - Calculate electron donor/acceptor balance

Constants:

Standard redox potentials (E0' at pH 7)
Electron equivalents for common compounds

Nutrient Ratios

Module: microgrowagents.chemistry.nutrient_ratios

Functions:

calculate_cnp_ratios(ingredients) - Calculate C:N:P ratios and limiting nutrients
calculate_trace_metal_ratios(ingredients) - Calculate trace metal requirements
parse_elemental_composition(formula) - Parse chemical formulas
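C:N:P totals depend on a formula parser like parse_elemental_composition. Below is a standalone sketch of such a parser that handles parentheses and hydrate dots. It is illustrative only and is not the module's code. It does not handle charges, isotopes, or non-integer stoichiometry. To get the C, N, and P totals that feed the ratios, multiply each element count by the ingredient's molar concentration and sum across ingredients.

```python
import re
from collections import Counter

# element symbol + optional count, or "(", or ")" + optional multiplier
TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)|(\()|(\))(\d*)")


def _parse_group(text: str) -> Counter:
    stack = [Counter()]
    pos = 0
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise ValueError(f"cannot parse {text!r} at position {pos}")
        element, count, open_paren, _close_paren, group_count = m.groups()
        if element:
            stack[-1][element] += int(count) if count else 1
        elif open_paren:
            stack.append(Counter())
        else:  # closing parenthesis: fold the inner group into its parent
            if len(stack) == 1:
                raise ValueError(f"unbalanced ')' in {text!r}")
            inner = stack.pop()
            multiplier = int(group_count) if group_count else 1
            for el, n in inner.items():
                stack[-1][el] += n * multiplier
        pos = m.end()
    if len(stack) != 1:
        raise ValueError(f"unbalanced '(' in {text!r}")
    return stack[0]


def parse_formula(formula: str) -> dict[str, int]:
    """Element counts for formulas like 'KH2PO4', 'Ca(NO3)2' or 'MgSO4·7H2O'."""
    total = Counter()
    for part in re.split(r"[·*.]", formula):  # hydrate separators
        m = re.match(r"(\d+)(.*)", part)
        multiplier, body = (int(m.group(1)), m.group(2)) if m else (1, part)
        for el, n in _parse_group(body).items():
            total[el] += n * multiplier
    return dict(total)


print(parse_formula("MgSO4·7H2O"))
```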

References:

Redfield ratio (marine): C:N:P = 106:16:1
Soil microbial biomass: C:N:P ≈ 60:7:1

Thermodynamic Properties

Module: microgrowagents.chemistry.thermodynamic_properties

Functions:

calculate_gibbs_free_energy(reactants, products, ph=7.0) - Calculate ΔG
calculate_formation_energy(compound) - Calculate ΔGf°

Data Sources:

eQuilibrator API (biochemical thermodynamics)
Component Contribution method
pH and ionic strength corrections
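The relation behind a reaction ΔG calculation is ΔG = ΔG°' + RT ln Q. The following is a minimal sketch. It assumes ΔG°' is already known (for example, from eQuilibrator) and uses concentrations in place of activities, so it omits the ionic strength correction listed above. The reaction A → B and its numbers are hypothetical. This is not the module's implementation.

```python
import math

R_KJ = 8.314462618e-3  # kJ/(mol K)


def reaction_gibbs(dg0_prime_kj: float,
                   reactants: dict[str, tuple[float, int]],
                   products: dict[str, tuple[float, int]],
                   temp_c: float = 25.0) -> float:
    """dG = dG0' + RT ln Q, with concentrations (M) standing in for activities.

    reactants/products map a name to (concentration in M, stoichiometric coefficient).
    """
    ln_q = sum(nu * math.log(c) for c, nu in products.values())
    ln_q -= sum(nu * math.log(c) for c, nu in reactants.values())
    return dg0_prime_kj + R_KJ * (temp_c + 273.15) * ln_q


# Hypothetical reaction A -> B with dG0' = -10 kJ/mol and [B] = 10 x [A]
dg = reaction_gibbs(-10.0, {"A": (1e-3, 1)}, {"B": (1e-2, 1)})
print(f"ΔG = {dg:.2f} kJ/mol")
```

A tenfold excess of product costs about RT ln 10 ≈ 5.7 kJ/mol at 25°C, so a modestly favourable ΔG°' can be offset by accumulated product.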

Repository Structure

docs/ - MkDocs documentation
- AGENTS_SKILLS_TOOLS.md - Complete reference for all agents, skills, and tools
- OPTIMIZATION_GUIDE.md - Complete guide to data-driven v14 design
- OPTIMIZATION_QUICK_REFERENCE.md - One-page command reference
- EXPERIMENTAL_INTERPRETATION_AGENT.md - Evidence-based interpretation
- ARTIFACT_CLEANUP_POLICY.md - Retention policies and cleanup
- AUDIT_REPORT_BBOP_SKILLS.md - Audit compliance report (1,579 lines)
- STATUS.md - Current project state (start here)
src/microgrowagents/ - Source code
- agents/ - Agent implementations
- chemistry/ - Chemistry calculation modules
- database/ - Database utilities
- api_clients/ - External API clients
- skills/ - Claude Code skills framework
tests/ - Pytest test suite (86 tests, >90% coverage)
scripts/ - Integration and analysis scripts
- doi_validation/ - DOI validation scripts
- doi_corrections/ - DOI correction utilities
- pdf_downloads/ - Automated PDF retrieval
- enrichment/ - Data enrichment
- schema/ - Schema management
data/ - Database and cache files
- raw/ - Source data with checksums
- corrections/ - DOI correction definitions
- results/ - Validation and processing logs
notes/ - Research notes and documentation (27+ files)
.claude/ - Claude Code configuration
- provenance/ - Session manifests and action logs
- skills/ - Claude Code skills definitions

Development

Running Tests

# Run all tests
just test

# Run specific test file
uv run pytest tests/test_chemistry/test_osmotic_properties.py -v

# Run with coverage
uv run pytest --cov=microgrowagents --cov-report=html

Type Checking

just mypy

Code Formatting

just format

Documentation

# Serve documentation locally
just _serve

# Build documentation
mkdocs build

Documentation Website

https://CultureBotAI.github.io/MicroGrowAgents

Test Coverage

Osmotic Properties: 21/21 tests, 20 doctests
Redox Properties: 27/27 tests
Nutrient Ratios: 27/27 tests
Sensitivity Analysis: 11/11 integration tests
Total: 86 tests passing across all modules

Tools, APIs & Datasets

MicroGrowAgents integrates multiple external tools, APIs, and datasets for comprehensive microbial cultivation analysis.

External APIs

Chemical Data:

PubChem - Chemical structure and property data, molecular formulas, identifiers
ChEBI - Chemical Entities of Biological Interest ontology (DOI: 10.1093/nar/gkv1031)
eQuilibrator - Biochemical thermodynamics, Gibbs free energy calculations

Biological Databases:

KEGG - Pathway definitions, biosynthesis pathways (DOI: 10.1093/nar/gkac963)
BRENDA - Enzyme information, EC-to-cofactor relationships (DOI: 10.1093/nar/gky1048)
ExplorEnz - Enzyme Commission nomenclature (DOI: 10.1093/nar/gkn582)
UniProt - Protein sequences and functional annotations
NCBI - Genome sequences, taxonomy, literature (PubMed)

Specialized Tools (Planned):

NIST WebBook - Inorganic thermodynamic data

Knowledge Graphs & Datasets

KG-Microbe (Primary Knowledge Graph):

1.5M nodes, 5.1M edges - Comprehensive microbial knowledge integration
864,363 validated species - Bacteria, archaea, fungi, protozoa
Sources: GTDB (Genome Taxonomy Database), LPSN (List of Prokaryotic names with Standing in Nomenclature), NCBI Taxonomy
Content: Organism metadata, growth requirements, media formulations, enzyme-substrate relationships

Genome Annotations:

57 Bakta-annotated genomes - 667,502 features total
Includes: Methylorubrum extorquens AM1, Methylococcus capsulatus, and other model organisms
Features: EC numbers, GO terms, gene products, cofactor requirements, transporter systems

Chemical Embeddings:

208,000+ chemical embeddings - Morgan fingerprints and molecular descriptors
Use: Analogy-based reasoning, chemical similarity search, alternative ingredient discovery

MP Medium Database:

158 ingredients - Complete MP medium ingredient properties
68 columns - 47 data properties + 21 organism context fields
158 unique DOIs - 90.5% citation coverage (143/158 with evidence)
92 PDFs, 44 abstracts - Full-text evidence for ingredient recommendations

Literature Corpus:

245+ papers - Microbial cultivation and growth media design
Extended information sheets - Structured metadata extraction
Full-text search - PDF evidence extraction and excerpt retrieval

External Software Tools

Metabolic Modeling:

GapMind - Metabolic pathway gap analysis (Morgan Price lab)
GEMsembler - Genome-scale metabolic model reconstruction
COBRApy - Constraint-based reconstruction and analysis (FBA)

Genome Annotation:

Bakta - Rapid & standardized bacterial genome annotation
NCBI BLAST - Sequence similarity search

Experimental Design:

MaxPro OptBlock - Maximum projection optimal blocking design (custom implementation)
Latin Hypercube Sampling - Space-filling experimental designs
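A Latin hypercube spreads runs evenly over each factor's LOW-to-HIGH range. The sketch below shows the basic, unoptimized construction in plain NumPy. It is not the project's MaxPro OptBlock code. For production use, scipy.stats.qmc.LatinHypercube is a maintained alternative. The bounds shown are hypothetical.

```python
import numpy as np


def latin_hypercube(n_runs: int, lows, highs, rng=None) -> np.ndarray:
    """Latin hypercube design: each factor's range is cut into n_runs equal
    strata and every stratum is sampled exactly once."""
    rng = np.random.default_rng(rng)
    lows = np.asarray(lows, dtype=float)
    highs = np.asarray(highs, dtype=float)
    unit = np.empty((n_runs, lows.size))
    for j in range(lows.size):
        strata = rng.permutation(n_runs)                      # shuffle stratum order per factor
        unit[:, j] = (strata + rng.random(n_runs)) / n_runs   # random point inside each stratum
    return lows + unit * (highs - lows)                       # scale [0, 1) to LOW..HIGH


# Hypothetical LOW/HIGH bounds (mM) for three ingredients, 8 runs
design = latin_hypercube(8, lows=[100.0, 2.0, 0.1], highs=[200.0, 10.0, 1.0], rng=0)
print(design.round(2))
```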

Growth Prediction:

GrowthCodon - Codon usage bias-based growth prediction
MediaDive - Media database and search tool

Python Libraries & Dependencies

Core Scientific Computing:

numpy - Numerical operations
pandas - Data manipulation and analysis
scipy - Statistical functions, optimization
scikit-learn - Machine learning (GP regression, Random Forest, PCA)

Chemistry & Thermodynamics:

rdkit - Chemical informatics and molecular fingerprints
equilibrator-api - Biochemical thermodynamics

Visualization:

matplotlib - Plotting and visualization
seaborn - Statistical visualization
plotly - Interactive dashboards

Database & Knowledge Graphs:

duckdb - Embedded analytical database
sqlalchemy - Database ORM
linkml - Linked data modeling language

Optimization & Modeling:

scikit-optimize - Bayesian optimization
SALib - Sensitivity analysis (Sobol indices)
statsmodels - Statistical modeling and ANOVA

Development:

pytest - Testing framework
mypy - Static type checking
ruff - Linting and formatting
uv - Fast Python package manager

Data Provenance

Citations and documentation for datasets and tools are in the following files:

See data/raw/mp_medium_ingredient_properties.csv for ingredient data with DOI citations
See docs/STATUS.md for citation coverage metrics
See notes/DOI_CORRECTIONS_FINAL_UPDATED.md for DOI validation and corrections
See docs/cofactor_data_sources.md for cofactor analysis sources

Contributing

Contributions are welcome! Please:

Fork the repository
Create a feature branch
Write tests for new functionality
Ensure all tests pass (just test)
Submit a pull request

License

BSD 3-Clause License. See LICENSE for details.

Credits

This project was generated from the monarch-project-copier template.

Citation

If you use MicroGrowAgents in your research, please cite this repository.

Contact

Principal Investigator: Dr. Marcin P. Joachimiak

Institution: Lawrence Berkeley National Laboratory
Project: CultureBotAI Initiative
GitHub: CultureBotAI

For questions or issues:

Open an issue on GitHub Issues
See CLAUDE.md for development guidance
