Cendol is a C11 compiler implemented in Rust. It is a learning project to understand the process of building a compiler from scratch, focusing on high-performance compiler architecture and comprehensive C11 standard compliance.
- Full C11 Preprocessor: Complete preprocessor with macro expansion, conditional compilation, file inclusion, and built-in macros (`__FILE__`, `__LINE__`, etc.)
- Lexer: Tokenization of C11 source code with proper handling of literals, keywords, and operators
- Parser: Comprehensive C11 syntax parsing using Pratt parsing for expressions and recursive descent for statements
- Semantic Analysis: Type checking, symbol resolution, and semantic validation
- Code Generation: Compiles to native object code using Cranelift backend
- Linker Integration: Automatic invocation of system linker (clang) to produce executables
- Rich Diagnostics: Error reporting with source location tracking
- No Trigraph Support: Trigraphs (three-character sequences like `??=`, `??<`, etc.) are not supported, both for simplicity and because modern C code rarely uses them.
- No Digraph Support: Digraphs (two-character sequences like `<:`, `:>`, `<%`, `%>`, `%:`, `%:%:`) are not supported. This compiler targets modern C and does not implement legacy digraph tokens.
- No K&R Function Declarations: Functions declared with an empty parameter list (e.g., `int foo()`) are treated as having no parameters (equivalent to `int foo(void)`), not as unprototyped functions that accept any number of arguments. Calling such a function with arguments results in a semantic error. This removes support for K&R (Kernighan & Ritchie) no-prototype semantics.
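The parser entry in the list above mentions Pratt parsing for expressions. As a rough illustration of that technique (not Cendol's actual parser), the sketch below parses a toy token stream using binding powers. Every name in it (`Token`, `Expr`, `Parser`, `infix_power`, `eval`) is hypothetical, and the operator set is deliberately tiny.

```rust
// Minimal Pratt parser over a toy token set.
// Hypothetical names throughout; this is not taken from Cendol's source.

#[derive(Debug, Clone, Copy, PartialEq)]
enum Token {
    Num(i64),
    Plus,
    Minus,
    Star,
    LParen,
    RParen,
}

#[derive(Debug, PartialEq)]
enum Expr {
    Num(i64),
    Neg(Box<Expr>),
    Binary(Box<Expr>, Token, Box<Expr>),
}

struct Parser<'a> {
    tokens: &'a [Token],
    pos: usize,
}

impl<'a> Parser<'a> {
    fn new(tokens: &'a [Token]) -> Self {
        Parser { tokens, pos: 0 }
    }

    fn peek(&self) -> Option<Token> {
        self.tokens.get(self.pos).copied()
    }

    fn advance(&mut self) -> Option<Token> {
        let tok = self.peek();
        if tok.is_some() {
            self.pos += 1;
        }
        tok
    }

    // (left, right) binding powers; right > left makes an operator left-associative.
    fn infix_power(tok: Token) -> Option<(u8, u8)> {
        match tok {
            Token::Plus | Token::Minus => Some((1, 2)),
            Token::Star => Some((3, 4)),
            _ => None,
        }
    }

    // Parse an expression whose operators all bind at least as tightly as `min_bp`.
    fn parse_expr(&mut self, min_bp: u8) -> Result<Expr, String> {
        // Prefix position: a literal, unary minus, or a parenthesized expression.
        let mut lhs = match self.advance() {
            Some(Token::Num(n)) => Expr::Num(n),
            Some(Token::Minus) => Expr::Neg(Box::new(self.parse_expr(5)?)),
            Some(Token::LParen) => {
                let inner = self.parse_expr(0)?;
                match self.advance() {
                    Some(Token::RParen) => inner,
                    other => return Err(format!("expected ')', found {:?}", other)),
                }
            }
            other => return Err(format!("unexpected token {:?}", other)),
        };

        // Infix position: keep folding operators while they bind tightly enough.
        while let Some(op) = self.peek() {
            let Some((left_bp, right_bp)) = Self::infix_power(op) else {
                break;
            };
            if left_bp < min_bp {
                break;
            }
            self.advance();
            let rhs = self.parse_expr(right_bp)?;
            lhs = Expr::Binary(Box::new(lhs), op, Box::new(rhs));
        }
        Ok(lhs)
    }
}

// Evaluate the tree so the grouping chosen by the parser can be checked.
fn eval(expr: &Expr) -> i64 {
    match expr {
        Expr::Num(n) => *n,
        Expr::Neg(inner) => -eval(inner),
        Expr::Binary(l, Token::Plus, r) => eval(l) + eval(r),
        Expr::Binary(l, Token::Minus, r) => eval(l) - eval(r),
        Expr::Binary(l, Token::Star, r) => eval(l) * eval(r),
        Expr::Binary(..) => unreachable!("only +, -, * are built as binary nodes"),
    }
}
```

Calling `Parser::new(&tokens).parse_expr(0)` on the tokens for `1 + 2 * 3` groups the multiplication first, and `8 - 3 - 2` groups to the left. A real C expression parser would need many more precedence levels plus postfix, ternary, and assignment operators.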
Cendol follows a traditional multi-phase compiler architecture optimized for performance:
- Preprocessing Phase: Transforms C source with macro expansion and includes
- Lexing Phase: Converts preprocessed tokens to lexical tokens
- Parsing Phase: Builds a flattened Abstract Syntax Tree (AST)
- Semantic Analysis Phase: Performs type checking and symbol resolution
- MIR Generation: Lowers AST to Mid-level Intermediate Representation
- Code Generation: Generates native machine code via Cranelift
- Linking: Links object files to create the final executable
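The parsing phase above is described as building a flattened AST. One common way to do that, shown here only as a sketch under assumptions, is to store all nodes in a single vector and link them by index rather than by pointer. The `NodeId`, `Node`, and `Ast` types are hypothetical and may not match Cendol's real data structures.

```rust
// A flattened AST: nodes live in one Vec and refer to each other by index.
// Hypothetical types for illustration; not Cendol's actual representation.

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct NodeId(u32);

#[derive(Debug)]
enum Node {
    IntLit(i64),
    Add(NodeId, NodeId),
    Mul(NodeId, NodeId),
}

#[derive(Default)]
struct Ast {
    nodes: Vec<Node>,
}

impl Ast {
    // Append a node and return the index that other nodes can use to refer to it.
    fn push(&mut self, node: Node) -> NodeId {
        let id = NodeId(self.nodes.len() as u32);
        self.nodes.push(node);
        id
    }

    // Walk the tree by following indices instead of pointers.
    fn eval(&self, id: NodeId) -> i64 {
        match &self.nodes[id.0 as usize] {
            Node::IntLit(v) => *v,
            Node::Add(l, r) => self.eval(*l) + self.eval(*r),
            Node::Mul(l, r) => self.eval(*l) * self.eval(*r),
        }
    }
}
```

Building `2 + 3 * 4` takes five `push` calls (three literals, one `Mul`, one `Add`), all landing in the same contiguous allocation. Layouts like this tend to be cache-friendly and avoid one heap allocation per node, which is the usual motivation for flattening.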
- Rust 2024 edition or later
- Cargo
- Clang (used as the system linker)
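Since Clang appears above as the system linker, the sketch below shows one way a Rust program might hand object files to `clang` using `std::process::Command`. The function names and the exact argument layout are assumptions made for illustration; Cendol's real linker invocation may pass different flags.

```rust
use std::path::Path;
use std::process::Command;

// Build (but do not run) a `clang` invocation that links the given object files.
// Assumed argument layout: clang <objects...> -o <output>
fn link_command(objects: &[&Path], output: &Path) -> Command {
    let mut cmd = Command::new("clang");
    cmd.args(objects).arg("-o").arg(output);
    cmd
}

// Run the linker; Ok(true) means clang exited successfully.
// An Err is returned if clang could not be started at all (e.g. not installed).
fn link(objects: &[&Path], output: &Path) -> std::io::Result<bool> {
    let status = link_command(objects, output).status()?;
    Ok(status.success())
}
```

Separating command construction from execution keeps the argument logic testable on machines where `clang` is not installed.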
To build the compiler, run:
    cargo build

For release build with optimizations:

    cargo build --release

To compile a C file to an executable:

    cargo run -- -o <output_file> <input_file>

Options:
- `-E`: Preprocess only, output preprocessed source to stdout
- `-P`: Suppress line markers in preprocessor output
- `-C`: Retain comments in preprocessor output
- `-I <path>`: Add include search path
- `-D <name>[=<value>]`: Define preprocessor macro
- `--verbose`: Enable verbose diagnostic output
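The `-D <name>[=<value>]` option takes an optional value. As a hedged sketch of how such an argument could be split, the hypothetical `parse_define` function below separates the name from the value. It assumes that a bare name defines the macro as `1`, which is what GCC and Clang do; whether Cendol follows that convention is not stated here. Function-like macro definitions are out of scope for this sketch.

```rust
// Split the argument of `-D <name>[=<value>]` into (name, value).
// Assumption: a bare `NAME` means `NAME=1`, following GCC/Clang behaviour.
// Hypothetical helper; not taken from Cendol's source.
fn parse_define(arg: &str) -> Result<(String, String), String> {
    // Only the first '=' separates name and value; later ones belong to the value.
    let (name, value) = match arg.split_once('=') {
        Some((n, v)) => (n, v),
        None => (arg, "1"),
    };

    // A macro name must be a C identifier: [A-Za-z_][A-Za-z0-9_]*
    let mut chars = name.chars();
    let valid = match chars.next() {
        Some(c) if c == '_' || c.is_ascii_alphabetic() => {
            chars.all(|c| c == '_' || c.is_ascii_alphanumeric())
        }
        _ => false,
    };
    if !valid {
        return Err(format!("invalid macro name: {name:?}"));
    }

    Ok((name.to_string(), value.to_string()))
}
```

With this sketch, `DEBUG=1` yields the pair `("DEBUG", "1")`, `NAME=` yields an empty value, and an argument such as `1BAD` is rejected.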
Preprocess a file:
    cargo run -- -E test.c

Define macros and include paths:

    cargo run -- -D DEBUG=1 -I /usr/include test.c

Comprehensive design documentation is available in the design-document/ directory:
- Main Architecture - Overall compiler design and goals
- Preprocessor Design - Preprocessing phase details
- Lexer Design - Tokenization strategy
- Parser Design - AST construction
- Semantic Analysis - Type checking and validation
This is a learning project, but contributions are welcome! Areas of interest include:
- Additional C11 language features
- Performance optimizations
- Testing and bug fixes
- Documentation improvements
This project is AI-friendly and welcomes contributions from developers using AI tools. We encourage the use of AI for code generation, debugging, and documentation to enhance productivity.
See the LICENSE file for details.