otoTree/human-text

Human-Text DSL Compiler

A powerful compiler that converts human-readable text into a structured DSL (Domain-Specific Language), supporting both controlled scripts and free-form natural language input with LLM enhancement.

Features

  • Dual Input Modes:

    • Controlled scripts with explicit directives (@task, @tool, etc.)
    • Free-form natural language with LLM-powered structuring
  • Multi-format Output: YAML, JSON, and Protocol Buffers

  • Advanced Processing: Lexical analysis, semantic validation, optimization

  • LLM Integration: Support for multiple LLM providers (DashScope, OpenAI)

  • Debug & Analysis: Save intermediate DSL code generated by LLM for debugging and optimization

  • CLI & Library: Both command-line tool and Python library interface

  • Structured Representation: Complex conditionals, tool calls, agent invocations, and flow control

Quick Start

Prerequisites

  • Python 3.12+
  • uv package manager

Installation

# Install uv if you haven't already
curl -LsSf https://astral.sh/uv/install.sh | sh

# Clone the repository (replace with actual repository URL)
git clone <your-repository-url>
cd human-text

# Install dependencies and create virtual environment
# Mirror sources for mainland China are preconfigured; users there get faster downloads
uv sync

# Activate the virtual environment
source .venv/bin/activate  # On Windows: .venv\Scripts\activate

# Install in development mode
uv pip install -e .

Mirror Sources for Mainland China

The project ships with mainland-China mirror sources preconfigured in the [tool.uv] section of pyproject.toml.

To use a custom mirror, edit the [tool.uv] section of pyproject.toml, or create ~/.config/uv/uv.toml in your home directory.

Basic Usage

Python Library

from dsl_compiler import compile, CompilerConfig

# Create configuration
config = CompilerConfig(
    llm_enabled=True,
    output_format="yaml"
)

# Compile a file
result = compile("input.txt", config)
print(result.to_yaml())

# Compile from string
source_code = """
@task data_processing
    Process user data from database
    Validate and clean the data
    Generate comprehensive report

@var user_id = 12345
@tool data_validator
    Tool for validating data integrity
"""

result = compile(source_code, config)

Command Line Interface

# Basic compilation
uv run dslc input.txt -o output.yaml

# Different output formats
uv run dslc input.txt -f json -o output.json

# Disable LLM for faster processing
uv run dslc input.txt --no-llm

# Syntax validation only
uv run dslc validate input.txt

# Show configuration
uv run dslc config --show

# Or use the traditional Python module syntax
uv run python -m dsl_compiler.cli input.txt -o output.yaml

Syntax Guide

Basic Directives

Task Definition

@task task_name
    Task description
    
    Detailed steps and instructions...

Variable Declaration

@var variable_name = value
@var user_id = 12345
@var debug_mode = true
@var config_file = "settings.json"

Tool Definition

@tool tool_name
    Tool description and usage instructions

Agent Invocation

@agent AgentName(param1=value1, param2=value2)

Flow Control

@next target_task

@if condition_expression
    Actions when condition is true
@else
    Actions when condition is false
@endif
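As an aside, directive lines of this shape can be recognized with a single regular expression. The sketch below is purely illustrative (a hypothetical helper, not the project's actual lexer):

```python
import re

# Matches lines such as "@task data_processing" or "@var user_id = 12345".
DIRECTIVE_RE = re.compile(r"^@(?P<name>\w+)(?:\s+(?P<rest>.*))?$")

def parse_directive(line: str):
    """Split a directive line into (directive, argument), or return None for plain text."""
    match = DIRECTIVE_RE.match(line.strip())
    if not match:
        return None  # not a directive; treated as free-form content
    return match.group("name"), (match.group("rest") or "").strip()

print(parse_directive("@task order_validation"))  # ('task', 'order_validation')
print(parse_directive("@var user_id = 12345"))    # ('var', 'user_id = 12345')
print(parse_directive("Plain description text"))  # None
```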

Advanced Features

Conditional Statements

@task order_validation
    Validate customer order
    
    @tool check_order
        Order validation tool
    
    @if result.valid == false
        Order is invalid, terminate process
        @next END
    @else
        Proceed with order processing
        @next process_payment
    @endif

Structured Output Example

The above compiles to:

version: "1.0"
tasks:
  - id: order_validation
    title: Order validation
    body:
      - type: text
        content: "Validate customer order"
        line_number: 2
      - type: tool_call
        tool_call:
          name: check_order
          description: "Order validation tool"
        line_number: 4
      - type: conditional
        conditional:
          branches:
            - condition: "result.valid == false"
              actions:
                - type: text
                  content: "Order is invalid, terminate process"
                - type: jump
                  jump:
                    target: END
            - condition: null  # else branch
              actions:
                - type: text
                  content: "Proceed with order processing"
                - type: jump
                  jump:
                    target: process_payment
        line_number: 6

Configuration

Environment Variables

Copy dsl_compiler/env.example to .env and configure:

# Output format
DSL_OUTPUT_FORMAT=yaml

# LLM configuration
DSL_LLM_ENABLED=true
DSL_LLM_PROVIDER=dashscope
DSL_LLM_API_KEY=your_api_key_here
DSL_LLM_MODEL=qwen-turbo
DSL_LLM_SAVE_INTERMEDIATE=false
DSL_LLM_INTERMEDIATE_DIR=

# Performance settings
DSL_MAX_FILE_SIZE=10485760
DSL_PARSE_TIMEOUT=60

# Debug settings
DSL_DEBUG=false
DSL_LOG_LEVEL=INFO

Configuration Options

Option                  Default     Description
output_format           yaml        Output format (yaml/json/proto)
llm_enabled             true        Enable LLM enhancement
llm_provider            dashscope   LLM provider
llm_save_intermediate   false       Save intermediate DSL code
llm_intermediate_dir    null        Directory for intermediate files
strict_mode             true        Strict validation mode
compact_mode            false       Compact output format
max_file_size           10MB        Maximum file size
parse_timeout           60s         Parse timeout
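For reference, the defaults above can be sketched as a plain dataclass. The field names follow the table, but this is illustrative only; the real CompilerConfig may differ:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class CompilerDefaults:
    """Mirror of the documented defaults; illustrative, not the real class."""
    output_format: str = "yaml"            # yaml / json / proto
    llm_enabled: bool = True
    llm_provider: str = "dashscope"
    llm_save_intermediate: bool = False
    llm_intermediate_dir: Optional[str] = None
    strict_mode: bool = True
    compact_mode: bool = False
    max_file_size: int = 10 * 1024 * 1024  # 10 MB, matching DSL_MAX_FILE_SIZE
    parse_timeout: int = 60                # seconds, matching DSL_PARSE_TIMEOUT

defaults = CompilerDefaults()
print(defaults.output_format, defaults.max_file_size)
```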

LLM Integration

The compiler supports multiple LLM providers for natural language processing:

DashScope (Alibaba Cloud)

export DSL_LLM_PROVIDER=dashscope
export DSL_LLM_API_KEY=your_dashscope_key
export DSL_LLM_MODEL=qwen-turbo

OpenAI

export DSL_LLM_PROVIDER=openai
export DSL_LLM_API_KEY=your_openai_key
export DSL_LLM_MODEL=gpt-3.5-turbo
Save Intermediate Results

To debug and analyze the LLM conversion process, you can save the intermediate DSL code generated by the LLM:

# Enable intermediate result saving
export DSL_LLM_SAVE_INTERMEDIATE=true

# Specify the save directory (optional; defaults to the llm_intermediate subdirectory under the source file's directory)
export DSL_LLM_INTERMEDIATE_DIR=./intermediate_results

Once enabled, each LLM conversion generates a timestamped .dsl file that includes:

  • The original DSL code
  • Generation time and source information
  • The LLM provider and model used

Example generated file name: level_2_medium_natural_llm_generated_20250714_162839.dsl

Configuration Example

from dsl_compiler import CompilerConfig

config = CompilerConfig(
    llm_enabled=True,
    llm_save_intermediate=True,              # Enable intermediate result saving
    llm_intermediate_dir="./debug_results",  # Specify the save directory
    debug=True  # Enable debug mode to see where results are saved
)

Architecture

The compiler follows a multi-stage pipeline:

Input Text → Preprocessor → Lexer → Parser → Semantic Analyzer
                                              ↓
Output ← Serializer ← Optimizer ← Validator ← LLM Augmentor

Components

  • Preprocessor: BOM removal, line normalization, tab expansion
  • Lexer: Tokenization with indentation tracking
  • Parser: AST construction with directive parsing
  • Semantic Analyzer: Symbol table building, type checking, scope validation
  • LLM Augmentor: Natural language enhancement (optional)
  • Validator: DAG validation, reference checking, conflict detection
  • Optimizer: Dead code elimination, constant folding, text compression
  • Serializer: Multi-format output generation
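To make the stage order concrete, here is a toy sketch of the pipeline as a chain of functions. The stage functions are hypothetical placeholders that only record the order of execution, not the project's actual internals:

```python
# Each toy stage tags the value it receives, to show the order of execution.
def make_stage(name):
    return lambda trace: trace + [name]

STAGE_NAMES = ["preprocess", "lex", "parse", "semantic_analyze",
               "llm_augment", "validate", "optimize", "serialize"]
STAGES = [make_stage(name) for name in STAGE_NAMES]

def run_pipeline(source: str):
    """Thread the input through every stage, in pipeline order."""
    trace = []  # records which stages ran, and in what order
    for stage in STAGES:
        trace = stage(trace)
    return trace

print(run_pipeline("@task demo"))
```

In the real compiler each stage transforms the previous stage's output (text, tokens, AST, and so on) rather than a trace list, but the ordering is the same.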

Output Formats

YAML (Default)

version: "1.0"
tasks:
  - id: "data_processing"
    title: "Data Processing Task"
    body:
      - type: "text"
        content: "Process user data"
        line_number: 2

JSON

{
  "version": "1.0",
  "tasks": [
    {
      "id": "data_processing",
      "title": "Data Processing Task",
      "body": [
        {
          "type": "text",
          "content": "Process user data",
          "line_number": 2
        }
      ]
    }
  ]
}
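Because the JSON output is plain data, it is easy to post-process with standard tooling. A stdlib-only sketch, using the sample document above:

```python
import json

# Sample compiler output, copied from the JSON example above.
output = '''
{
  "version": "1.0",
  "tasks": [
    {
      "id": "data_processing",
      "title": "Data Processing Task",
      "body": [
        {"type": "text", "content": "Process user data", "line_number": 2}
      ]
    }
  ]
}
'''

workflow = json.loads(output)
task_ids = [task["id"] for task in workflow["tasks"]]
print(task_ids)  # ['data_processing']
```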

Protocol Buffers

syntax = "proto3";
package dsl;

message DSLWorkflow {
  string version = 1;
  map<string, string> metadata = 2;
  repeated Task tasks = 3;
}

Development

Project Structure

src/dsl_compiler/
├── __init__.py          # Main interface
├── config.py            # Configuration management
├── compiler.py          # Main compiler logic
├── preprocessor.py      # Text preprocessing
├── lexer.py             # Lexical analyzer
├── parser.py            # Syntax parser
├── semantic_analyzer.py # Semantic analysis
├── llm_augmentor.py     # LLM enhancement
├── validator.py         # Validation engine
├── optimizer.py         # Code optimization
├── serializer.py        # Output serialization
├── cli.py               # Command-line interface
├── models.py            # Data models
├── exceptions.py        # Exception classes
└── requirements.txt     # Dependencies

Running Tests

# Install development dependencies
pip install pytest pytest-asyncio black flake8 mypy

# Run tests
python -m pytest tests/

# Run with coverage
python -m pytest --cov=src/dsl_compiler tests/

Code Quality

# Format code
black src/

# Lint code
flake8 src/

# Type checking
mypy src/

Error Handling

The compiler provides detailed error information:

from dsl_compiler import compile
from dsl_compiler.exceptions import CompilerError, ValidationError

try:
    result = compile("input.txt")
except ValidationError as e:
    print(f"Validation error: {e}")
    print(f"Rule: {e.rule}")
    print(f"Suggestions: {e.suggestions}")
except CompilerError as e:
    print(f"Compilation error: {e}")
    print(f"File: {e.source_file}")
    print(f"Line: {e.line}")

Performance Features

  • Dead Code Elimination: Remove unreachable code blocks
  • Constant Folding: Evaluate constant expressions at compile time
  • Text Compression: Optimize text content while preserving meaning
  • Structure Optimization: Flatten unnecessary nesting
  • Duplicate Removal: Eliminate redundant definitions
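As an illustration of constant folding in general (not this compiler's optimizer), Python's own ast module can fold literal sub-expressions at "compile time":

```python
import ast

def fold_constants(expr: str) -> str:
    """Evaluate constant sub-expressions ahead of time and re-emit the source."""
    tree = ast.parse(expr, mode="eval")

    class Folder(ast.NodeTransformer):
        def visit_BinOp(self, node):
            self.generic_visit(node)  # fold children first, bottom-up
            if isinstance(node.left, ast.Constant) and isinstance(node.right, ast.Constant):
                # Both operands are literals: compute the result now.
                value = eval(compile(ast.Expression(node), "<fold>", "eval"))
                return ast.copy_location(ast.Constant(value), node)
            return node

    return ast.unparse(Folder().visit(tree))

print(fold_constants("2 + 3 * 4"))      # 14
print(fold_constants("x * (60 * 60)"))  # x * 3600
```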

Troubleshooting

Common Issues

  1. LLM Call Failures

    • Check API key configuration
    • Verify network connectivity
    • Check LLM service status
  2. Parse Errors

    • Validate directive format
    • Check file encoding (should be UTF-8)
    • Review detailed error messages
  3. Performance Issues

    • Disable LLM with --no-llm flag
    • Reduce file size
    • Adjust timeout settings

Debug Mode

# Enable debug output
python -m dsl_compiler.cli input.txt --debug

# Set environment variable
export DSL_DEBUG=true

Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Add tests for new functionality
  5. Run the test suite
  6. Submit a pull request

📚 Documentation

Complete documentation is available in the doc/ directory:

For quick start, see the sections above. For detailed development information, visit the documentation directory.

License

MIT License

Changelog

v0.1.1 (2025-07-14)

✨ New Features

  • LLM intermediate result saving: added the ability to save intermediate DSL code generated by the LLM, making it easy to debug and analyze the conversion process
  • Added the configuration options llm_save_intermediate and llm_intermediate_dir
  • Timestamped .dsl files are generated automatically and contain complete metadata
  • Both options can also be set via the environment variables DSL_LLM_SAVE_INTERMEDIATE and DSL_LLM_INTERMEDIATE_DIR

🛠️ Improvements

  • Enhanced configuration management: added the LLM intermediate result saving options to config.py
  • Improved debugging experience: in debug mode, the intermediate result save path is displayed
  • Documentation update: updated the README and related documents with instructions for the intermediate result saving feature

📖 Documentation Updates

  • Updated the environment variable example file env.example
  • Added a detailed explanation of the LLM intermediate result saving feature to the README
  • Removed references to non-existent documents to keep documentation links accurate

v0.1.0 (2025-07-14)

🚀 Major Updates

  • Complete LLM Augmentor Refactoring: Transformed from complex JSON structure analysis to direct DSL code output, significantly simplifying the processing pipeline
  • Critical Error Fix: Resolved the "Expecting value: line 1 column 1 (char 0)" error in LLM response parsing
  • Direct Natural Language to DSL Conversion: Implemented complete natural language content detection and conversion workflow

✨ New Features

  • Intelligent Content Detection: Automatically identifies natural language content that requires LLM enhancement
  • Multi-LLM Provider Support: Enhanced integration with DashScope (Alibaba Cloud) and OpenAI APIs
  • Response Cleaning Mechanism: Added Markdown code block cleaning and JSON extraction functionality
  • Usage Examples and Documentation: Added example_llm_usage.py comprehensive usage guide

🛠️ Improvements

  • Enhanced Error Handling: Added response validation, fallback mechanisms, and detailed error information
  • Code Extraction Logic: Implemented algorithm for accurately extracting DSL code from LLM responses
  • Re-parsing Workflow: Generated DSL code is reprocessed through the complete compiler pipeline
  • Configuration Validation: Strengthened LLM configuration validation and error handling

🐛 Bug Fixes

  • Fixed JSON parsing failures causing compilation interruption
  • Resolved null value handling issues in natural language detection logic
  • Fixed parsing errors caused by inconsistent LLM response formats
  • Improved node handling logic in AST structure conversion

📖 Documentation Updates

  • Updated LLM integration usage instructions and configuration examples
  • Added complete configuration guides for DashScope and OpenAI
  • Provided practical examples of natural language to DSL conversion
  • Enhanced troubleshooting and debugging guides
