Transform computational notebooks from code-first to article-first. Write what you want to analyze in natural language; let AI generate the code.
Digital Article inverts the traditional computational notebook paradigm. Instead of writing code to perform analysis, you describe your analysis in natural language, and the system generates, executes, and documents the code for you—automatically creating publication-ready scientific methodology text.
Traditional notebook:

[Code: Data loading, cleaning, analysis]
[Output: Plots and tables]

Digital Article:

[Prompt: "Analyze gene expression distribution across experimental conditions"]
[Generated Methodology: "To assess gene expression patterns, data from 6 samples..."]
[Results: Plots and tables]
[Code: Available for inspection and editing]
- Natural Language Analysis: Write prompts like "create a heatmap of gene correlations" instead of Python code
- Intelligent Code Generation: LLM-powered code generation using AbstractCore (supports LMStudio, Ollama, OpenAI, and more)
- Auto-Retry Error Fixing: System automatically debugs and fixes generated code (up to 3 attempts)
- Scientific Methodology Generation: Automatically creates article-style explanations of your analysis
- Rich Output Capture: Matplotlib plots, Plotly interactive charts, Pandas tables, and text output
- Publication-Ready PDF Export: Generate scientific article PDFs with methodology, results, and optional code
- Transparent Code Access: View, edit, and understand all generated code
- Persistent Execution Context: Variables and DataFrames persist across cells (like Jupyter)
- Workspace Isolation: Each notebook has its own data workspace
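The persistent execution context can be illustrated with a minimal sketch (this is not Digital Article's actual implementation): a single namespace dictionary shared across `exec()` calls is enough to make variables survive from one cell to the next, Jupyter-style.

```python
# Minimal sketch (not the project's real implementation): a shared namespace
# dict passed to exec() makes variables persist across "cells".
namespace: dict = {}

def run_cell(code: str, namespace: dict) -> None:
    """Execute one cell's code against the shared namespace."""
    exec(code, namespace)

run_cell("x = 21", namespace)
run_cell("y = x * 2", namespace)  # 'x' is still visible from the first cell
print(namespace["y"])  # 42
```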
- Domain Experts (biologists, clinicians, social scientists): Perform sophisticated analyses without programming expertise
- Data Scientists: Accelerate exploratory analysis and documentation
- Researchers: Create reproducible analyses with built-in methodology text
- Educators: Teach data analysis concepts without syntax barriers
- Anyone who wants to think in terms of what to analyze rather than how to code it
- Python 3.11+
- Node.js 18+
- LMStudio or Ollama (for local LLM) OR OpenAI API key
```bash
# Clone repository
git clone https://github.com/lpalbou/digitalarticle.git
cd digitalarticle

# Set up Python environment
python -m venv .venv
source .venv/bin/activate  # On macOS/Linux
pip install -e .           # Installs from pyproject.toml

# Set up frontend
cd frontend
npm install
cd ..
```

To run the application locally (e.g., on your Mac/PC):

```bash
# Terminal 1: Backend
da-backend

# Terminal 2: Frontend
da-frontend
```

Then open http://localhost:3000
When running locally, the config.json at the root should use relative paths:
```json
{
  "llm": { "provider": "ollama", "model": "gemma3n:e2b" },
  "paths": {
    "notebooks_dir": "data/notebooks",
    "workspace_dir": "data/workspace"
  }
}
```

Full setup guide: See Getting Started
We provide Monolithic (single container) and 3-Tier (microservices) deployment options.
Best for quick deployment or platforms that accept a single Dockerfile (Render, Railway).
1. Copy the appropriate Dockerfile to the root:

   ```bash
   # For Standard CPU (Linux/Intel)
   cp docker/monolithic/Dockerfile Dockerfile

   # For NVIDIA GPU
   cp docker/monolithic/Dockerfile.nvidia Dockerfile
   ```

2. Build and run:

   ```bash
   docker build -t digital-article .
   docker run -p 80:80 -v ./data:/app/data digital-article
   ```
Docker on macOS runs inside a Linux VM and cannot access the Neural Engine or GPU directly, an architectural limitation of the Apple Virtualization Framework. As a result, the built-in Ollama falls back to CPU-only inference, which is significantly slower (e.g., 34s vs 7s per generation).
To get native performance (7s) with Docker:
1. Run Ollama natively on your Mac (installs globally):

   ```bash
   ollama serve
   ```

2. Run the Standard CPU container but point it to your host's Ollama:

   ```bash
   # 1. Copy the generic CPU Dockerfile
   cp docker/monolithic/Dockerfile Dockerfile

   # 2. Build
   docker build -t digital-article .

   # 3. Run with host.docker.internal
   docker run -p 80:80 \
     -v ./data:/app/data \
     -e OLLAMA_BASE_URL=http://host.docker.internal:11434 \
     digital-article
   ```

Note: This "Hybrid Approach" gives you container isolation for the app while leveraging your Mac's full hardware acceleration for AI.
Best for development and production flexibility.
```bash
docker compose up -d
```

Note on Docker Paths:
When running in Docker, the container uses environment variables to override paths, so they point to absolute paths inside the container (e.g., /app/data/notebooks). You do not need to change config.json manually for Docker; the image handles it.
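As a rough sketch of that override order (the environment-variable name below is illustrative, not necessarily the one the image uses): an environment variable, when set, wins over the value in `config.json`.

```python
# Hypothetical sketch of the path-resolution order described above: an
# environment variable set by the Docker image overrides config.json.
import json
import os

def resolve_notebooks_dir(config_path: str = "config.json") -> str:
    override = os.environ.get("NOTEBOOKS_DIR")  # illustrative variable name
    if override:
        return override  # e.g. /app/data/notebooks inside the container
    with open(config_path) as f:
        return json.load(f)["paths"]["notebooks_dir"]
```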
Full Docker guide: See docker/README.md
Digital Article requires an LLM provider to generate code from prompts. The system provides flexible configuration options:
- Click the Settings button in the header to select your provider and model
- Changes persist across sessions and apply to all new notebooks
- Configuration is saved to config.json in the project root
- Each notebook can use a different provider/model if needed
- New notebooks automatically inherit the global configuration
- Notebook-specific settings override global defaults during execution
- The status footer at the bottom shows the current provider, model, and context size
- Real-time updates when configuration changes
- Click the footer's Settings button for quick access to configuration
All configuration works seamlessly when accessing Digital Article from remote machines (e.g., http://server-ip:3000). The settings modal and status footer use relative API paths for proper remote connectivity.
Prompt:
Load gene_expression.csv and show the distribution of expression values
Generated Code:
```python
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

df = pd.read_csv('data/gene_expression.csv')

plt.figure(figsize=(10, 6))
sns.histplot(df.values.flatten(), bins=50, kde=True)
plt.title('Distribution of Gene Expression Values')
plt.xlabel('Expression Level')
plt.ylabel('Frequency')
plt.show()

print(f"Dataset shape: {df.shape}")
print(f"Mean expression: {df.values.mean():.2f}")
print(f"Std expression: {df.values.std():.2f}")
```

Generated Methodology:
To assess the overall distribution of gene expression levels, the dataset
containing 20 genes across 6 experimental conditions was examined. The
analysis revealed a mean expression level of 15.3 ± 4.2 across all genes,
with a right-skewed distribution indicative of heterogeneous expression
patterns.
Cell 1: "Load patient_data.csv and show basic statistics"
Cell 2: "Create a scatter plot of age vs blood_pressure colored by gender"
Cell 3: "Perform t-test comparing blood pressure between genders"
Cell 4: "Generate a summary table with mean values by gender"
Each cell builds on the previous context, with variables persisting across cells.
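For instance, the code generated for Cell 3 might look something like the sketch below (the column names and data are stand-ins; in the real workflow, `df` would still be in the execution context from Cell 1 rather than recreated):

```python
# Illustrative code for "Perform t-test comparing blood pressure between
# genders"; in the real workflow 'df' would persist from Cell 1.
import pandas as pd
from scipy import stats

df = pd.DataFrame({  # stand-in for the patient_data.csv loaded in Cell 1
    "gender": ["M", "M", "F", "F", "M", "F"],
    "blood_pressure": [120, 130, 110, 115, 125, 112],
})
male = df.loc[df["gender"] == "M", "blood_pressure"]
female = df.loc[df["gender"] == "F", "blood_pressure"]
t_stat, p_value = stats.ttest_ind(male, female)
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")
```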
```
Frontend (React + TypeScript)
        ↓ HTTP/REST
Backend (FastAPI)
        ↓
Services Layer
 ├─ LLMService (AbstractCore → LMStudio/Ollama/OpenAI)
 ├─ ExecutionService (Python code execution sandbox)
 ├─ NotebookService (orchestration)
 └─ PDFService (scientific article generation)
        ↓
Data Layer
 ├─ Notebooks (JSON files)
 └─ Workspaces (isolated data directories)
```
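In plain Python, the flow through the services layer can be sketched as follows (a simplified illustration, not the project's real code; `generate_code` stands in for the LLMService call via AbstractCore):

```python
# Sketch of the pipeline above: prompt -> LLM code generation -> sandboxed
# execution -> a cell record the NotebookService would persist as JSON.
def execute(code: str) -> dict:
    """Stand-in for ExecutionService: run code in a fresh namespace."""
    namespace: dict = {}
    exec(code, namespace)
    return {k: v for k, v in namespace.items() if not k.startswith("__")}

def run_prompt(prompt: str, generate_code) -> dict:
    code = generate_code(prompt)   # LLMService (AbstractCore)
    outputs = execute(code)        # ExecutionService
    return {"prompt": prompt, "code": code, "outputs": outputs}

result = run_prompt("compute 2 + 2", generate_code=lambda p: "answer = 2 + 2")
print(result["outputs"]["answer"])  # 4
```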
Detailed architecture: See Architecture Documentation
- FastAPI - Modern Python web framework
- AbstractCore - LLM provider abstraction
- Pandas, NumPy, Matplotlib, Plotly - Data analysis and visualization
- Pydantic - Data validation and serialization
- ReportLab/WeasyPrint - PDF generation
- React 18 + TypeScript - UI framework with type safety
- Vite - Lightning-fast dev server and build tool (runs on port 3000)
- Tailwind CSS - Utility-first styling
- Monaco Editor - Code viewing
- Plotly.js - Interactive visualizations
- Axios - HTTP client
Digital Article is built on the belief that analytical tools should adapt to how scientists think, not the other way around. Key principles:
- Article-First: The narrative is primary; code is a derived implementation
- Transparent Generation: All code is inspectable and editable
- Scientific Rigor: Auto-generate methodology text suitable for publications
- Progressive Disclosure: Show complexity only when needed
- Intelligent Recovery: Auto-fix errors before asking for user intervention
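The intelligent-recovery loop can be sketched in a few lines (`fix_code` stands in for the LLM call that repairs failing code; this is a hedged illustration, not the project's actual implementation):

```python
# Minimal sketch of the auto-retry loop described above (up to 3 attempts);
# 'fix_code' stands in for the LLM call that repairs failing code.
def run_with_retries(code: str, fix_code, max_attempts: int = 3):
    for attempt in range(1, max_attempts + 1):
        namespace: dict = {}
        try:
            exec(code, namespace)
            return namespace, attempt
        except Exception as err:
            if attempt == max_attempts:
                raise
            code = fix_code(code, err)  # ask the LLM for a corrected version

ns, attempts = run_with_retries("x = oops", lambda code, err: "x = 1")
print(attempts)  # 2 (first attempt failed, the "fixed" code succeeded)
```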
Full philosophy: See Philosophy Documentation
- Getting Started Guide - Installation and first analysis
- Architecture Documentation - System design and component breakdown
- Error Handling System - Intelligent error recovery and auto-retry system
- Philosophy - Design principles and motivation
- Roadmap - Planned features and development timeline
Version: 0.3.12 (Beta)
Working Features:
- ✅ Natural language to code generation
- ✅ Code execution with rich output capture
- ✅ Auto-retry error correction (up to 3 attempts)
- ✅ Scientific methodology generation
- ✅ Matplotlib and Plotly visualization support
- ✅ Pandas DataFrame capture and display
- ✅ Multi-format export (JSON, HTML, Markdown)
- ✅ Scientific PDF export
- ✅ File upload and workspace management
- ✅ Persistent execution context across cells
Known Limitations:
- ⚠️ Single-user deployment only (no multi-user authentication)
- ⚠️ Code execution in same process as server (not production-safe)
- ⚠️ JSON file storage (not scalable to many notebooks)
- ⚠️ No real-time collaboration
- ⚠️ LLM latency makes it unsuitable for real-time applications
Production Readiness: This is a research prototype suitable for single-user or small team deployment. Production use requires:
- Containerized code execution
- Database storage (PostgreSQL)
- Authentication and authorization
- Job queue for LLM requests
- See Architecture - Deployment Considerations
"Load RNA-seq counts and perform differential expression analysis between treatment and control"
"Create a volcano plot highlighting significantly differentially expressed genes"
"Generate a heatmap of top 50 DE genes with hierarchical clustering"
"Analyze patient outcomes by treatment group with survival curves"
"Test for significant differences in biomarkers across cohorts"
"Create a forest plot of hazard ratios for different risk factors"
"Load the dataset and identify missing values and outliers"
"Perform PCA and visualize the first two principal components"
"Fit a linear model predicting outcome from predictors and show coefficients"
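As an illustration, a prompt like "Perform PCA and visualize the first two principal components" might produce code along these lines (a sketch using NumPy's SVD on synthetic stand-in data; the real generated code would operate on your loaded dataset):

```python
# Illustrative PCA via SVD on centered data (synthetic stand-in dataset).
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))   # stand-in for a loaded dataset
Xc = X - X.mean(axis=0)         # center each column
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
pcs = Xc @ Vt[:2].T             # project onto the first two components
print(pcs.shape)  # (100, 2)
```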
| Feature | Digital Article | Jupyter | ChatGPT Code Interpreter | Observable |
|---|---|---|---|---|
| Natural language prompts | ✅ Primary | ❌ | ✅ | ❌ |
| Code transparency | ✅ Always visible | ✅ | | |
| Local LLM support | ✅ | ❌ | ❌ | ❌ |
| Auto-error correction | ✅ 3 retries | ❌ | ❌ | |
| Scientific methodology | ✅ Auto-generated | ❌ | ❌ | ❌ |
| Publication PDF export | ✅ | ❌ | ❌ | |
| Persistent context | ✅ | ✅ | ✅ | |
| Self-hosted | ✅ | ✅ | ❌ | ❌ |
Near Term (Q2 2025):
- Enhanced LLM prompt templates for specific domains
- Version control integration (git-style cell history)
- Improved error diagnostics and suggestions
- Additional export formats (LaTeX, Quarto)
Medium Term (Q3-Q4 2025):
- Collaborative editing (real-time multi-user)
- Database backend (PostgreSQL)
- Containerized code execution (Docker)
- Template library (common analysis workflows)
Long Term (2026+):
- LLM-suggested analysis strategies
- Active learning from user corrections
- Integration with laboratory information systems
- Plugin architecture for domain-specific extensions
Full roadmap: See ROADMAP.md
We welcome contributions! Areas where help is needed:
- Testing: Try the system with your data and report issues
- Documentation: Improve guides, add examples
- LLM Prompts: Enhance code generation quality
- UI/UX: Improve the interface
- Domain Templates: Add analysis workflows for specific fields
See CONTRIBUTING.md for development guidelines.
MIT License - see LICENSE file for details.
If you use Digital Article in your research, please cite:
```bibtex
@software{digital_article_2025,
  title = {Digital Article: Natural Language Computational Notebooks},
  author = {Laurent-Philippe Albou},
  year = {2025},
  url = {https://github.com/lpalbou/digitalarticle}
}
```

- AbstractCore for LLM provider abstraction
- LMStudio and Ollama for local LLM serving
- FastAPI and React communities for excellent frameworks
- Inspired by literate programming (Knuth), computational essays (Wolfram), and Jupyter notebooks
- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Email: contact@abstractcore.ai
We're not building a better notebook. We're building a different kind of thinking tool—one that speaks the language of science, not just the language of code.
