A powerful compiler that converts human-readable text into a structured DSL (Domain-Specific Language), supporting both controlled scripts and natural language input with LLM enhancement.
- **Dual Input Modes**: Controlled scripts with explicit directives (`@task`, `@tool`, etc.) and free-form natural language with LLM-powered structuring
- **Multi-format Output**: YAML, JSON, and Protocol Buffers
- **Advanced Processing**: Lexical analysis, semantic validation, optimization
- **LLM Integration**: Support for multiple LLM providers (DashScope, OpenAI)
- **Debug & Analysis**: Save intermediate DSL code generated by the LLM for debugging and optimization
- **CLI & Library**: Both a command-line tool and a Python library interface
- **Structured Representation**: Complex conditionals, tool calls, agent invocations, and flow control
- Python 3.12+
- uv package manager
```bash
# Install uv if you haven't already
curl -LsSf https://astral.sh/uv/install.sh | sh

# Clone the repository (replace with actual repository URL)
git clone <your-repository-url>
cd human-text

# Install dependencies and create virtual environment
# Domestic mirror sources are preconfigured; users in mainland China get faster downloads
uv sync

# Activate the virtual environment
source .venv/bin/activate  # On Windows: .venv\Scripts\activate

# Install in development mode
uv pip install -e .
```

Domestic mirror sources are preconfigured in the `[tool.uv]` section of `pyproject.toml`, supporting the following mirrors:

- Tsinghua University mirror (primary): https://pypi.tuna.tsinghua.edu.cn/simple
- Aliyun mirror: https://mirrors.aliyun.com/pypi/simple/
- Tencent Cloud mirror: https://mirrors.cloud.tencent.com/pypi/simple/
- Baidu mirror: https://mirror.baidu.com/pypi/simple/
- Douban mirror: https://pypi.douban.com/simple/

To use a custom mirror, edit the `[tool.uv]` section in `pyproject.toml`, or create `~/.config/uv/uv.toml` in your home directory.
```python
from dsl_compiler import compile, CompilerConfig

# Create configuration
config = CompilerConfig(
    llm_enabled=True,
    output_format="yaml"
)

# Compile a file
result = compile("input.txt", config)
print(result.to_yaml())

# Compile from string
source_code = """
@task data_processing
Process user data from database
Validate and clean the data
Generate comprehensive report

@var user_id = 12345

@tool data_validator
Tool for validating data integrity
"""
result = compile(source_code, config)
```

```bash
# Basic compilation
uv run dslc input.txt -o output.yaml

# Different output formats
uv run dslc input.txt -f json -o output.json

# Disable LLM for faster processing
uv run dslc input.txt --no-llm

# Syntax validation only
uv run dslc validate input.txt

# Show configuration
uv run dslc config --show

# Or use the traditional Python module syntax
uv run python -m dsl_compiler.cli input.txt -o output.yaml
```

@task task_name
Task description
Detailed steps and instructions...
@var variable_name = value
@var user_id = 12345
@var debug_mode = true
@var config_file = "settings.json"
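Variable values like those above are plain literals. As an illustrative sketch only (not the compiler's actual implementation), such `@var` lines could be parsed into typed Python values like this:

```python
import json
import re

def parse_var_directive(line: str):
    """Parse an `@var name = value` line into (name, typed value)."""
    match = re.match(r"@var\s+(\w+)\s*=\s*(.+)", line.strip())
    if not match:
        raise ValueError(f"Not a valid @var directive: {line!r}")
    name, raw = match.group(1), match.group(2).strip()
    try:
        # json.loads handles ints, floats, true/false, and quoted strings
        return name, json.loads(raw)
    except json.JSONDecodeError:
        return name, raw  # fall back to the bare string

print(parse_var_directive('@var user_id = 12345'))      # ('user_id', 12345)
print(parse_var_directive('@var debug_mode = true'))    # ('debug_mode', True)
```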
@tool tool_name
Tool description and usage instructions
@agent AgentName(param1=value1, param2=value2)
@next target_task
@if condition_expression
Actions when condition is true
@else
Actions when condition is false
@endif
@task order_validation
Validate customer order
@tool check_order
Order validation tool
@if result.valid == false
Order is invalid, terminate process
@next END
@else
Proceed with order processing
@next process_payment
@endif
The above compiles to:

```yaml
version: "1.0"
tasks:
  - id: order_validation
    title: Order validation
    body:
      - type: text
        content: "Validate customer order"
        line_number: 2
      - type: tool_call
        tool_call:
          name: check_order
          description: "Order validation tool"
        line_number: 4
      - type: conditional
        conditional:
          branches:
            - condition: "result.valid == false"
              actions:
                - type: text
                  content: "Order is invalid, terminate process"
                - type: jump
                  jump:
                    target: END
            - condition: null  # else branch
              actions:
                - type: text
                  content: "Proceed with order processing"
                - type: jump
                  jump:
                    target: process_payment
        line_number: 6
```

Copy `dsl_compiler/env.example` to `.env` and configure:
```bash
# Output format
DSL_OUTPUT_FORMAT=yaml

# LLM configuration
DSL_LLM_ENABLED=true
DSL_LLM_PROVIDER=dashscope
DSL_LLM_API_KEY=your_api_key_here
DSL_LLM_MODEL=qwen-turbo
DSL_LLM_SAVE_INTERMEDIATE=false
DSL_LLM_INTERMEDIATE_DIR=

# Performance settings
DSL_MAX_FILE_SIZE=10485760
DSL_PARSE_TIMEOUT=60

# Debug settings
DSL_DEBUG=false
DSL_LOG_LEVEL=INFO
```

| Option | Default | Description |
|---|---|---|
| `output_format` | `yaml` | Output format (yaml/json/proto) |
| `llm_enabled` | `true` | Enable LLM enhancement |
| `llm_provider` | `dashscope` | LLM provider |
| `llm_save_intermediate` | `false` | Save intermediate DSL code |
| `llm_intermediate_dir` | `null` | Directory for intermediate files |
| `strict_mode` | `true` | Strict validation mode |
| `compact_mode` | `false` | Compact output format |
| `max_file_size` | 10MB | Maximum file size |
| `parse_timeout` | 60s | Parse timeout |
The compiler supports multiple LLM providers for natural language processing:
```bash
# DashScope (default)
export DSL_LLM_PROVIDER=dashscope
export DSL_LLM_API_KEY=your_dashscope_key
export DSL_LLM_MODEL=qwen-turbo
```

```bash
# OpenAI
export DSL_LLM_PROVIDER=openai
export DSL_LLM_API_KEY=your_openai_key
export DSL_LLM_MODEL=gpt-3.5-turbo
```

### Save intermediate results
To debug and analyze the LLM conversion process, you can save the intermediate DSL code generated by the LLM:
```bash
# Enable intermediate result saving
export DSL_LLM_SAVE_INTERMEDIATE=true

# Specify the save directory (optional; defaults to the llm_intermediate subdirectory under the source file's directory)
export DSL_LLM_INTERMEDIATE_DIR=./intermediate_results
```

Once enabled, each LLM conversion generates a timestamped `.dsl` file containing:

- the original DSL code
- generation time and source information
- the LLM provider and model used

Example generated file name: `level_2_medium_natural_llm_generated_20250714_162839.dsl`
#### Configuration Example

```python
from dsl_compiler import CompilerConfig

config = CompilerConfig(
    llm_enabled=True,
    llm_save_intermediate=True,              # Enable intermediate result saving
    llm_intermediate_dir="./debug_results",  # Specify the save directory
    debug=True,                              # Enable debug mode to view saved information
)
```

The compiler follows a multi-stage pipeline:
```
Input Text → Preprocessor → Lexer → Parser → Semantic Analyzer
                                                     ↓
Output ← Serializer ← Optimizer ← Validator ← LLM Augmentor
```
- Preprocessor: BOM removal, line normalization, tab expansion
- Lexer: Tokenization with indentation tracking
- Parser: AST construction with directive parsing
- Semantic Analyzer: Symbol table building, type checking, scope validation
- LLM Augmentor: Natural language enhancement (optional)
- Validator: DAG validation, reference checking, conflict detection
- Optimizer: Dead code elimination, constant folding, text compression
- Serializer: Multi-format output generation
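Conceptually, the stages above compose like a chain of functions over an evolving representation. The toy sketch below mirrors the stage names only; the bodies are placeholders, not the real implementations (the optional LLM Augmentor stage is omitted):

```python
from functools import reduce

# Each stage takes the current representation and returns the next one.
def preprocess(text):  return text.replace("\ufeff", "").replace("\t", "    ")
def lex(text):         return text.split()                  # token stream (toy)
def parse(tokens):     return {"type": "ast", "tokens": tokens}
def analyze(ast):      return {**ast, "checked": True}      # semantic analysis
def validate(ast):     return {**ast, "valid": True}
def optimize(ast):     return ast                           # no-op here
def serialize(ast):    return str(ast)

PIPELINE = [preprocess, lex, parse, analyze, validate, optimize, serialize]

def compile_text(source: str) -> str:
    # Thread the source through every stage in order
    return reduce(lambda rep, stage: stage(rep), PIPELINE, source)

print(compile_text("@task demo"))
```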
```yaml
version: "1.0"
tasks:
  - id: "data_processing"
    title: "Data Processing Task"
    body:
      - type: "text"
        content: "Process user data"
        line_number: 2
```

```json
{
  "version": "1.0",
  "tasks": [
    {
      "id": "data_processing",
      "title": "Data Processing Task",
      "body": [
        {
          "type": "text",
          "content": "Process user data",
          "line_number": 2
        }
      ]
    }
  ]
}
```

```protobuf
syntax = "proto3";

package dsl;

message DSLWorkflow {
  string version = 1;
  map<string, string> metadata = 2;
  repeated Task tasks = 3;
}
```

```
src/dsl_compiler/
├── __init__.py           # Main interface
├── config.py             # Configuration management
├── compiler.py           # Main compiler logic
├── preprocessor.py       # Text preprocessing
├── lexer.py              # Lexical analyzer
├── parser.py             # Syntax parser
├── semantic_analyzer.py  # Semantic analysis
├── llm_augmentor.py      # LLM enhancement
├── validator.py          # Validation engine
├── optimizer.py          # Code optimization
├── serializer.py         # Output serialization
├── cli.py                # Command-line interface
├── models.py             # Data models
├── exceptions.py         # Exception classes
└── requirements.txt      # Dependencies
```
```bash
# Install development dependencies
pip install pytest pytest-asyncio black flake8 mypy

# Run tests
python -m pytest tests/

# Run with coverage
python -m pytest --cov=src/dsl_compiler tests/

# Format code
black src/

# Lint code
flake8 src/

# Type checking
mypy src/
```

The compiler provides detailed error information:
```python
from dsl_compiler import compile
from dsl_compiler.exceptions import CompilerError, ValidationError

try:
    result = compile("input.txt")
except ValidationError as e:
    print(f"Validation error: {e}")
    print(f"Rule: {e.rule}")
    print(f"Suggestions: {e.suggestions}")
except CompilerError as e:
    print(f"Compilation error: {e}")
    print(f"File: {e.source_file}")
    print(f"Line: {e.line}")
```

- Dead Code Elimination: Remove unreachable code blocks
- Constant Folding: Evaluate constant expressions at compile time
- Text Compression: Optimize text content while preserving meaning
- Structure Optimization: Flatten unnecessary nesting
- Duplicate Removal: Eliminate redundant definitions
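As a concrete illustration of dead-code elimination: actions that follow an unconditional jump can never execute. The minimal sketch below works over the action-list shape shown in the YAML output above (an assumed structure, not the optimizer's real code):

```python
def eliminate_dead_actions(actions):
    """Drop actions that follow an unconditional jump: they are unreachable."""
    result = []
    for action in actions:
        result.append(action)
        if action.get("type") == "jump":
            break  # everything after the jump can never run
    return result

actions = [
    {"type": "text", "content": "Order is invalid, terminate process"},
    {"type": "jump", "jump": {"target": "END"}},
    {"type": "text", "content": "never reached"},  # dead code
]
print(eliminate_dead_actions(actions))  # keeps only the first two actions
```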
- **LLM Call Failures**
  - Check API key configuration
  - Verify network connectivity
  - Check LLM service status
- **Parse Errors**
  - Validate directive format
  - Check file encoding (should be UTF-8)
  - Review detailed error messages
- **Performance Issues**
  - Disable LLM with the `--no-llm` flag
  - Reduce file size
  - Adjust timeout settings

```bash
# Enable debug output
python -m dsl_compiler.cli input.txt --debug

# Set environment variable
export DSL_DEBUG=true
```

- Fork the repository
- Create a feature branch
- Make your changes
- Add tests for new functionality
- Run the test suite
- Submit a pull request
Complete documentation is available in the doc/ directory:
- 🔧 Architecture Documentation - Compiler internals and API reference
- 🚀 Development Guide - Environment setup, toolchain, and development workflow
For quick start, see the sections above. For detailed development information, visit the documentation directory.
MIT License
- **LLM intermediate result saving**: Added saving of the intermediate DSL code generated by the LLM, making it easy to debug and analyze the conversion process
  - Added configuration options `llm_save_intermediate` and `llm_intermediate_dir`
  - Automatically generates timestamped `.dsl` files containing complete metadata
  - Supports configuration via the environment variables `DSL_LLM_SAVE_INTERMEDIATE` and `DSL_LLM_INTERMEDIATE_DIR`
- **Enhanced configuration management**: Added the LLM intermediate-result-saving options to `config.py`
- **Improved debugging experience**: In debug mode, the intermediate result save path is displayed
- **Documentation update**: Updated the README and related documents with instructions for using the intermediate result saving feature
  - Updated the environment variable configuration example file `env.example`
  - Added a detailed explanation of the LLM intermediate result saving feature to the README
  - Removed references to non-existent documents to keep documentation links accurate
- Complete LLM Augmentor Refactoring: Transformed from complex JSON structure analysis to direct DSL code output, significantly simplifying the processing pipeline
- Critical Error Fix: Resolved the "Expecting value: line 1 column 1 (char 0)" error in LLM response parsing
- Direct Natural Language to DSL Conversion: Implemented complete natural language content detection and conversion workflow
- Intelligent Content Detection: Automatically identifies natural language content that requires LLM enhancement
- Multi-LLM Provider Support: Enhanced integration with DashScope (Alibaba Cloud) and OpenAI APIs
- Response Cleaning Mechanism: Added Markdown code block cleaning and JSON extraction functionality
- Usage Examples and Documentation: Added `example_llm_usage.py` comprehensive usage guide
- Enhanced Error Handling: Added response validation, fallback mechanisms, and detailed error information
- Code Extraction Logic: Implemented algorithm for accurately extracting DSL code from LLM responses
- Re-parsing Workflow: Generated DSL code is reprocessed through the complete compiler pipeline
- Configuration Validation: Strengthened LLM configuration validation and error handling
- Fixed JSON parsing failures causing compilation interruption
- Resolved null value handling issues in natural language detection logic
- Fixed parsing errors caused by inconsistent LLM response formats
- Improved node handling logic in AST structure conversion
- Updated LLM integration usage instructions and configuration examples
- Added complete configuration guides for DashScope and OpenAI
- Provided practical examples of natural language to DSL conversion
- Enhanced troubleshooting and debugging guides