Version 27 Roadmap: "Amazing Grace" - Self-Hosting UA Compiler #1
Open
el-dockerr wants to merge 2 commits into main
Version 27 Roadmap: "Amazing Grace"
Self-Hosting UA Compiler
Goal: Compile the UA compiler using UA itself, eliminating the dependency on C compilers and achieving complete self-hosting autonomy.
Target Architectures for Self-Hosting:
Excluded Architectures:
Current State Assessment
What We Have (Version 26 "Awesome Ada")
Compiler Pipeline:
- (`@IF_ARCH`, `@IF_SYS`, `@IMPORT`)
Instruction Set:
- `VAR`/`SET`/`GET` variable system
- `BUFFER` memory allocation
- `LDS` string literal loading
- `LOADB`/`STOREB` byte-level memory access
Standard Libraries (all in UA):
- `std_io`: Console I/O (print, read)
- `std_string`: String utilities (strlen, parse_int, to_string)
- `std_math`: Integer math (pow, factorial, max, abs)
- `std_arrays`: Byte array helpers
- `std_array`: Fixed-size arrays
- `std_vector`: Dynamic vectors
- `std_iostream`: File I/O (fopen, fread, fwrite, fclose)
What We're Missing for Self-Hosting
Critical Missing Components:
- (`malloc`, `realloc`, `free`)
- `-arch`, `-sys`, `-o`, `--run` flags
Phase-by-Phase Self-Hosting Roadmap
Phase 1: Foundation — Dynamic Memory & Core Data Structures
Objective: Establish the fundamental building blocks required by any compiler.
Task 1.1: Dynamic Memory Allocator (`std_malloc`)
- `malloc(size)`: Allocate N bytes from heap
  - `mmap` (Linux/macOS, syscall 9 on x86-64) or `VirtualAlloc` (Win32)
- `free(ptr)`: Release allocated memory
- `realloc(ptr, new_size)`: Resize allocation
- `calloc(count, size)`: Zero-initialized allocation
  - `malloc` + zero-fill loop
Dependencies: Core MVIS opcodes (`LDI`, `MOV`, `ADD`, `SUB`, `CMP`, `JZ`, `STORE`, `LOAD`, `SYS`)
Estimated Complexity: Medium (200-300 lines UA)
Target Architectures: x86-64, x86-32, ARM64, RISC-V
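The allocator interface in Task 1.1 can be modeled outside UA to check the bookkeeping before it is written in assembly. The following is a minimal Python sketch, not the planned implementation: it assumes a fixed-size arena in place of pages obtained from `mmap`/`VirtualAlloc` and a first-fit free list. The `Arena` class and its fields are hypothetical names introduced only for illustration.

```python
class Arena:
    """Toy heap: a fixed bytearray stands in for pages from mmap/VirtualAlloc."""

    def __init__(self, size):
        self.mem = bytearray(size)
        self.brk = 8            # address 0 is reserved so that 0 can signal failure
        self.sizes = {}         # live block address -> block size (assumed layout)
        self.free_blocks = []   # released (address, size) pairs, reused first-fit

    def malloc(self, size):
        # First try to reuse a released block that is large enough.
        for i, (addr, block_size) in enumerate(self.free_blocks):
            if block_size >= size:
                del self.free_blocks[i]
                self.sizes[addr] = block_size
                return addr
        # Otherwise carve a new block from the unused tail of the arena.
        if self.brk + size > len(self.mem):
            return 0
        addr = self.brk
        self.brk += size
        self.sizes[addr] = size
        return addr

    def free(self, addr):
        if addr in self.sizes:
            self.free_blocks.append((addr, self.sizes.pop(addr)))

    def calloc(self, count, size):
        addr = self.malloc(count * size)
        if addr:
            for i in range(count * size):   # malloc + zero-fill loop
                self.mem[addr + i] = 0
        return addr

    def realloc(self, addr, new_size):
        # Allocate a new block, copy the surviving bytes, release the old block.
        new_addr = self.malloc(new_size)
        if new_addr:
            n = min(self.sizes[addr], new_size)
            self.mem[new_addr:new_addr + n] = self.mem[addr:addr + n]
            self.free(addr)
        return new_addr
```

A real `std_malloc` would also need block splitting and coalescing to limit fragmentation; this sketch leaves both out.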
Task 1.2: Hash Table (`std_hashtable`)
- `ht_create(bucket_count)`: Allocate and initialize table
- `ht_hash(key_string)`: Hash function (DJB2 or FNV-1a)
- `ht_insert(table, key, value)`: Insert or update entry
- `ht_lookup(table, key)`: Return value or 0 if not found
- `ht_delete(table, key)`: Remove entry
- `ht_destroy(table)`: Free all memory
Dependencies: `std_malloc`, `std_string` (strlen, string comparison)
Estimated Complexity: Medium-High (300-500 lines UA)
Critical for: Parser, precompiler import tracking
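As a reference for the expected behavior of the hash table, here is a short Python sketch using DJB2 (one of the two hash functions named above) with separate chaining. Chaining and the 32-bit truncation are assumptions; the roadmap does not state how collisions are resolved or how wide the hash is.

```python
def ht_hash(key):
    """DJB2: start at 5381, then hash = hash * 33 + byte, truncated to 32 bits."""
    h = 5381
    for byte in key.encode("utf-8"):
        h = (h * 33 + byte) & 0xFFFFFFFF
    return h


def ht_create(bucket_count):
    # Each bucket is a list of [key, value] pairs (separate chaining, assumed).
    return [[] for _ in range(bucket_count)]


def ht_insert(table, key, value):
    bucket = table[ht_hash(key) % len(table)]
    for entry in bucket:
        if entry[0] == key:
            entry[1] = value        # key already present: update in place
            return
    bucket.append([key, value])


def ht_lookup(table, key):
    for stored_key, value in table[ht_hash(key) % len(table)]:
        if stored_key == key:
            return value
    return 0                        # 0 means "not found", as in the task list


def ht_delete(table, key):
    bucket = table[ht_hash(key) % len(table)]
    bucket[:] = [entry for entry in bucket if entry[0] != key]
```

Because `ht_lookup` returns 0 for a missing key, a stored value of 0 cannot be distinguished from absence; a symbol table that stores addresses may need a separate "contains" check.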
Task 1.3: Dynamic Array / Vector (`std_dynarray`)
- `std_vector` to support:
  - `da_create(initial_capacity, element_size)`: Allocate vector
  - `da_push(array, element_ptr)`: Append element, resize if needed
  - `da_get(array, index)`: Return pointer to element at index
  - `da_set(array, index, element_ptr)`: Overwrite element
  - `da_size(array)`: Return current element count
  - `da_capacity(array)`: Return allocated capacity
  - `da_destroy(array)`: Free memory
Dependencies: `std_malloc`
Estimated Complexity: Low-Medium (150-250 lines UA)
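The dynamic array can be prototyped as a flat byte buffer indexed by `index * element_size`. The Python sketch below assumes capacity doubling on overflow (the roadmap only says "resize if needed") and passes elements as `bytes` objects instead of pointers; the `DynArray` class is a hypothetical stand-in for the UA struct.

```python
class DynArray:
    def __init__(self, capacity, element_size):
        self.element_size = element_size
        self.capacity = capacity
        self.size = 0
        self.data = bytearray(capacity * element_size)


def da_create(initial_capacity, element_size):
    return DynArray(max(1, initial_capacity), element_size)


def _slot(array, index):
    # Translate an element index into a byte range, with bounds checking.
    if not 0 <= index < array.size:
        raise IndexError("index out of range")
    start = index * array.element_size
    return slice(start, start + array.element_size)


def da_push(array, element):
    if len(element) != array.element_size:
        raise ValueError("element has the wrong size")
    if array.size == array.capacity:
        array.data.extend(bytes(len(array.data)))   # double the storage (assumed policy)
        array.capacity *= 2
    start = array.size * array.element_size
    array.data[start:start + array.element_size] = element
    array.size += 1


def da_set(array, index, element):
    if len(element) != array.element_size:
        raise ValueError("element has the wrong size")
    array.data[_slot(array, index)] = element


def da_get(array, index):
    return bytes(array.data[_slot(array, index)])


def da_size(array):
    return array.size


def da_capacity(array):
    return array.capacity
```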
Phase 2: Advanced String & I/O Libraries
Objective: Provide robust string processing and file I/O required for compiler operations.
Task 2.1: Advanced String Operations (`std_string` extensions)
- `strcmp(str1, str2)`: Compare strings, return -1/0/+1
- `strcasecmp(str1, str2)`: Case-insensitive comparison
- `strcat(dest, src)`: Concatenate (assumes dest has space)
- `strcpy(dest, src)`: Copy string
- `strncpy(dest, src, n)`: Copy up to N characters
- `strstr(haystack, needle)`: Find substring
- `strchr(str, char)`: Find first occurrence of char
- `strdup(str)`: Allocate and copy string (uses malloc)
- `trim_whitespace(str)`: Remove leading/trailing spaces, tabs, newlines
- `split_string(str, delimiter)`: Split into array of substrings
Dependencies: `std_malloc`
Estimated Complexity: Medium (200-300 lines UA)
Critical for: Lexer, precompiler, argument parsing
Task 2.2: Formatted String Output (`std_format`)
- `sprintf(buffer, format_string, ...)`: Basic printf-style formatting
  - `%d` (decimal), `%x` (hex), `%s` (string), `%c` (char), `%%` (literal %)
- `format_error(line, col, message)`: Generate compiler error message
  - `"Error at line {line}, column {col}: {message}\n"`
Dependencies: `std_string`, numeric conversion utilities
Estimated Complexity: Medium (200-300 lines UA)
Critical for: Error reporting, diagnostics, output generation
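The formatting subset is small enough to specify by example. The sketch below is a Python model of the five specifiers listed in Task 2.2; as a simplification it returns the formatted string rather than writing into a caller-supplied buffer, and passing unknown specifiers through unchanged is an assumption.

```python
def sprintf(fmt, *args):
    """Expand %d, %x, %s, %c and %% in fmt and return the resulting string."""
    out = []
    values = iter(args)
    i = 0
    while i < len(fmt):
        ch = fmt[i]
        if ch != "%" or i + 1 == len(fmt):
            out.append(ch)              # ordinary character or trailing '%'
            i += 1
            continue
        spec = fmt[i + 1]
        if spec == "%":
            out.append("%")
        elif spec == "d":
            out.append(str(int(next(values))))
        elif spec == "x":
            out.append(format(int(next(values)), "x"))
        elif spec == "s":
            out.append(str(next(values)))
        elif spec == "c":
            out.append(chr(next(values)))
        else:
            out.append("%" + spec)      # unknown specifier: copied through (assumed)
        i += 2
    return "".join(out)


def format_error(line, col, message):
    return sprintf("Error at line %d, column %d: %s\n", line, col, message)
```

For example, `format_error(3, 7, "unknown opcode")` follows the message template given in the task list.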
Task 2.3: Buffered File I/O (`std_bufio`)
- `buf_fopen(path, mode)`: Open file with internal read/write buffer
  - `std_iostream.fopen`
- `buf_fread_line(fd)`: Read one line (up to newline), return pointer
- `buf_fread_all(fd)`: Read entire file into dynamically-allocated buffer
- `buf_fwrite(fd, data, size)`: Buffered write
- `buf_fflush(fd)`: Force buffer write to disk
- `buf_fclose(fd)`: Flush and close
Dependencies: `std_iostream`, `std_malloc`
Estimated Complexity: Medium (250-400 lines UA)
Phase 3: Compiler Infrastructure — Lexer in UA
Objective: Rewrite the lexer (tokenizer) entirely in UA.
Task 3.1: Token Structure & Token Array Management
- `token_create(type, text, value, line, col)`: Allocate token
- `token_array_create()`: Create dynamic token array
- `token_array_push(arr, token)`: Append token to array
Dependencies: `std_malloc`, `std_dynarray`
Estimated Complexity: Low (100-150 lines UA)
Task 3.2: Character Classification & Scanning
- `is_digit(char)`: Returns 1 if '0'-'9'
- `is_alpha(char)`: Returns 1 if 'a'-'z', 'A'-'Z'
- `is_alnum(char)`: Returns 1 if alpha or digit
- `is_whitespace(char)`: Returns 1 if space, tab, CR
- `is_hex_digit(char)`: Returns 1 if '0'-'9', 'a'-'f', 'A'-'F'
- `to_upper(char)`: Convert to uppercase
- `to_lower(char)`: Convert to lowercase
Dependencies: None (pure MVIS logic)
Estimated Complexity: Low (50-100 lines UA)
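These predicates reduce to range checks on ASCII codes, which maps directly onto `CMP` and conditional jumps. A Python model of the intended results, operating on integer character codes and returning 1 or 0 as the task list specifies, might look like this:

```python
def is_digit(c):
    return 1 if 0x30 <= c <= 0x39 else 0            # '0'-'9'


def is_alpha(c):
    return 1 if 0x41 <= c <= 0x5A or 0x61 <= c <= 0x7A else 0   # 'A'-'Z', 'a'-'z'


def is_alnum(c):
    return 1 if is_alpha(c) or is_digit(c) else 0


def is_whitespace(c):
    return 1 if c in (0x20, 0x09, 0x0D) else 0      # space, tab, CR (newline excluded)


def is_hex_digit(c):
    return 1 if is_digit(c) or 0x41 <= c <= 0x46 or 0x61 <= c <= 0x66 else 0


def to_upper(c):
    return c - 0x20 if 0x61 <= c <= 0x7A else c     # only 'a'-'z' are shifted


def to_lower(c):
    return c + 0x20 if 0x41 <= c <= 0x5A else c     # only 'A'-'Z' are shifted
```

Note that, following the list above, `is_whitespace` does not treat a line feed as whitespace, which suggests the lexer handles line ends separately.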
Task 3.3: Lexer Core — Tokenization Logic
- `tokenize(source_code_ptr, source_length)`: Main entry point
- (`;` to end-of-line)
- (`R0`-`R15`)
- (`0x` hex, `0b` binary)
- (`"..."` with escape sequences)
- (`,`, `:`, `(`, `)`, `#`)
- (`:` suffix detection)
- `tests/hello.ua`, validate token sequence
- `tests/calc.ua`, validate numeric literals
Dependencies: `token_array`, `std_hashtable`, `std_string`, `std_bufio`
Estimated Complexity: High (600-800 lines UA)
Critical Success Factor: Lexer must match C version's behavior exactly
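To make the tokenization rules concrete, the following Python sketch covers the lexical elements named in Task 3.3: `;` comments, registers `R0`-`R15`, decimal/`0x`/`0b` literals, quoted strings with escapes, the listed punctuation, and labels detected by a `:` suffix. It is an illustration under assumptions, not a port of the C lexer: the token kind names, the supported escape set, and the decision to drop newlines are all hypothetical.

```python
import re

TOKEN_RE = re.compile(r'''
    (?P<COMMENT>;[^\n]*)                              # ';' to end-of-line
  | (?P<NEWLINE>\n)
  | (?P<SKIP>[ \t\r]+)
  | (?P<NUMBER>0[xX][0-9a-fA-F]+|0[bB][01]+|[0-9]+)
  | (?P<STRING>"(?:\\.|[^"\\\n])*")
  | (?P<LABEL>[A-Za-z_][A-Za-z0-9_]*:)                # identifier with ':' suffix
  | (?P<IDENT>[A-Za-z_][A-Za-z0-9_]*)
  | (?P<PUNCT>[,:()\#])
''', re.VERBOSE)

REGISTER_RE = re.compile(r"R(?:1[0-5]|[0-9])$")
ESCAPES = {"n": "\n", "t": "\t", "0": "\0", "\\": "\\", '"': '"'}   # assumed set


def unescape(body):
    out, i = [], 0
    while i < len(body):
        if body[i] == "\\":
            i += 1
            out.append(ESCAPES.get(body[i], body[i]))
        else:
            out.append(body[i])
        i += 1
    return "".join(out)


def parse_number(text):
    lower = text.lower()
    if lower.startswith("0x"):
        return int(text, 16)
    if lower.startswith("0b"):
        return int(text, 2)
    return int(text, 10)


def tokenize(source):
    """Return a list of (kind, value, line, col) tuples."""
    tokens, pos, line, line_start = [], 0, 1, 0
    while pos < len(source):
        m = TOKEN_RE.match(source, pos)
        col = pos - line_start + 1
        if m is None:
            raise SyntaxError("Error at line %d, column %d: unexpected character" % (line, col))
        kind, text = m.lastgroup, m.group()
        pos = m.end()
        if kind == "NEWLINE":
            line += 1
            line_start = pos
        elif kind in ("SKIP", "COMMENT"):
            pass
        elif kind == "NUMBER":
            tokens.append(("NUMBER", parse_number(text), line, col))
        elif kind == "STRING":
            tokens.append(("STRING", unescape(text[1:-1]), line, col))
        elif kind == "LABEL":
            tokens.append(("LABEL", text[:-1], line, col))
        elif kind == "IDENT" and REGISTER_RE.match(text):
            tokens.append(("REGISTER", text, line, col))
        else:
            tokens.append((kind, text, line, col))
    return tokens
```

Since the success criterion is exact agreement with the C lexer, a model like this is only useful if its output is diffed against the C version's token dump on the same inputs.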
Phase 4: Compiler Infrastructure — Parser in UA
Objective: Rewrite the parser to generate IR from tokens, entirely in UA.
Task 4.1: Instruction Structure & IR Array Management
- `instr_create_label(name, line, col)`: Create label IR entry
- `instr_create_opcode(opcode, operands, count, line, col)`: Create instruction
- `ir_array_create()`: Dynamic IR instruction array
- `ir_array_push(arr, instr)`: Append instruction to IR
Dependencies: `std_malloc`, `std_dynarray`
Estimated Complexity: Low (100-150 lines UA)
Task 4.2: Operand Parsing
- `parse_operand(token)`: Convert token to Operand struct
Dependencies: Token structures
Estimated Complexity: Low (100-150 lines UA)
Task 4.3: Parser Core — IR Generation
- `parse(token_array)`: Main entry point, returns IR array
- (`MOV Rd, Rs` → register, register)
- `OPERAND_LABEL_REF` to addresses
- `label(param1, param2):` syntax
- `tests/test_func.ua`, validate function parameter capture
- `tests/test_jl_simple.ua`, validate forward label references
Dependencies: `ir_array`, `parse_operand`, `std_hashtable`, `std_string`
Estimated Complexity: High (700-1000 lines UA)
Critical Success Factor: IR must be identical to C parser output
Phase 5: Compiler Infrastructure — Precompiler in UA
Objective: Rewrite precompiler (directive processing, file inclusion) in UA.
Task 5.1: Directive Detection & Parsing
- `is_directive(line)`: Returns 1 if line starts with `@`
- `parse_directive(line)`: Extract directive name and arguments
  - `@IF_ARCH x86` → directive=IF_ARCH, arg=x86
  - `@IMPORT "lib/std_io.ua"` → directive=IMPORT, arg=lib/std_io.ua
  - `@DEFINE FOO 42` → directive=DEFINE, name=FOO, value=42
- `tests/test_precompiler.ua`
Dependencies: `std_string`
Estimated Complexity: Low (100-150 lines UA)
Task 5.2: Conditional Compilation Logic
- `total_depth`: nesting level counter
- `active_depth`: how many levels have satisfied conditions
- `@IF_ARCH` / `@IF_SYS` evaluation:
  - `-arch` / `-sys`
- `@ENDIF` handling
- `active_depth == total_depth`
Dependencies: `strcmp`, `strcasecmp`
Estimated Complexity: Medium (200-300 lines UA)
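The two counters in Task 5.2 are enough to handle arbitrarily nested conditionals without a stack: a line is kept only while `active_depth == total_depth`. The Python sketch below shows one way the counters could be updated. The function name `filter_conditionals` and the case-insensitive match against the `-arch`/`-sys` values are assumptions made for illustration.

```python
def filter_conditionals(lines, arch, system):
    """Keep only the lines whose enclosing @IF_ARCH/@IF_SYS blocks all match."""
    total_depth = 0     # nesting level counter
    active_depth = 0    # how many enclosing levels have satisfied conditions
    kept = []
    for line in lines:
        words = line.split()
        head = words[0].upper() if words else ""
        if head in ("@IF_ARCH", "@IF_SYS"):
            wanted = arch if head == "@IF_ARCH" else system
            matches = len(words) > 1 and words[1].lower() == wanted.lower()
            # A block can only become active if everything around it is active.
            if active_depth == total_depth and matches:
                active_depth += 1
            total_depth += 1
        elif head == "@ENDIF":
            if total_depth == 0:
                raise ValueError("@ENDIF without matching @IF")
            if active_depth == total_depth:
                active_depth -= 1
            total_depth -= 1
        elif active_depth == total_depth:
            kept.append(line)
    if total_depth != 0:
        raise ValueError("missing @ENDIF")
    return kept
```

With `arch="x86"` and `system="linux"`, a block guarded by `@IF_ARCH arm64` is skipped in full, including any nested `@IF_SYS linux` block inside it, because the inner condition is never allowed to raise `active_depth`.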
Task 5.3: File Import & Macro Expansion
- `@IMPORT` handling:
- `@DEFINE` macro table:
- `@ARCH_ONLY` / `@SYS_ONLY` guards:
- `@DUMMY` stub diagnostic
Dependencies: `std_bufio`, `std_hashtable`, `std_string`, path utilities
Estimated Complexity: High (500-700 lines UA)
Critical Success Factor: Must handle recursive imports and namespace prefixing identically to C version
Phase 6: Backend Code Generation in UA (Single Architecture)
Objective: Rewrite one backend (x86-64) entirely in UA as proof-of-concept.
Task 6.1: Code Buffer Management (`std_codebuffer`)
- `cb_create(initial_capacity)`: Allocate code buffer (e.g., 4 KB)
- `cb_emit_byte(buf, byte)`: Append byte, resize if needed
- `cb_emit_word(buf, word)`: Append 16-bit value (little-endian)
- `cb_emit_dword(buf, dword)`: Append 32-bit value
- `cb_emit_qword(buf, qword)`: Append 64-bit value
- `cb_size(buf)`: Return current size
- `cb_get_bytes(buf)`: Return pointer to byte array
- `cb_destroy(buf)`: Free memory
Dependencies: `std_malloc`
Estimated Complexity: Low (150-200 lines UA)
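The multi-byte emitters can all be built on `cb_emit_byte` by shifting out one byte at a time, lowest byte first. The Python sketch below illustrates that layering. It assumes little-endian order for the 32-bit and 64-bit variants as well (the list states it only for the 16-bit one) and a doubling growth policy; the `CodeBuffer` class is a hypothetical stand-in for the UA struct.

```python
class CodeBuffer:
    def __init__(self, initial_capacity=4096):
        self.data = bytearray(initial_capacity)
        self.size = 0


def cb_create(initial_capacity):
    return CodeBuffer(max(1, initial_capacity))


def cb_emit_byte(buf, byte):
    if buf.size == len(buf.data):
        buf.data.extend(bytes(len(buf.data)))   # grow by doubling (assumed policy)
    buf.data[buf.size] = byte & 0xFF
    buf.size += 1


def _emit_le(buf, value, width):
    # Emit `width` bytes of `value`, least significant byte first.
    for i in range(width):
        cb_emit_byte(buf, (value >> (8 * i)) & 0xFF)


def cb_emit_word(buf, word):
    _emit_le(buf, word, 2)


def cb_emit_dword(buf, dword):
    _emit_le(buf, dword, 4)


def cb_emit_qword(buf, qword):
    _emit_le(buf, qword, 8)


def cb_size(buf):
    return buf.size


def cb_get_bytes(buf):
    return bytes(buf.data[:buf.size])
```

Masking each shifted value with `0xFF` also gives the two's-complement encoding for negative displacements, which the jump encoder needs for backward branches.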
Task 6.2: x86-64 Backend — Register Mapping & Encoding
- `48 89 C8`
Dependencies: None (pure logic)
Estimated Complexity: Medium (200-300 lines UA)
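The byte sequence `48 89 C8` is the standard x86-64 encoding of `MOV RAX, RCX`: a REX.W prefix, opcode `89` (`MOV r/m64, r64`), and a ModR/M byte in register-direct mode. The Python sketch below shows how those three bytes are derived for any register pair. How UA's `R0`-`R15` map onto hardware registers is not stated in this document, so the function takes x86-64 register names directly; `encode_mov_r64_r64` is a hypothetical name.

```python
# Hardware register numbers as defined by the x86-64 instruction encoding.
REG64 = {
    "rax": 0, "rcx": 1, "rdx": 2, "rbx": 3, "rsp": 4, "rbp": 5, "rsi": 6, "rdi": 7,
    "r8": 8, "r9": 9, "r10": 10, "r11": 11, "r12": 12, "r13": 13, "r14": 14, "r15": 15,
}


def encode_mov_r64_r64(dst, src):
    """Encode MOV dst, src using opcode 0x89 (MOV r/m64, r64)."""
    d, s = REG64[dst], REG64[src]
    # REX = 0100WRXB: W=1 selects 64-bit operands, R extends the reg field
    # (source here), B extends the r/m field (destination here).
    rex = 0x48 | ((s >> 3) << 2) | (d >> 3)
    # ModR/M: mod=11 (register direct), reg=source, r/m=destination.
    modrm = 0xC0 | ((s & 7) << 3) | (d & 7)
    return bytes([rex, 0x89, modrm])
```

`encode_mov_r64_r64("rax", "rcx")` yields `48 89 C8`. Byte-identical output with the C backend also depends on the C backend choosing opcode `89` over the equivalent `8B` form, which should be checked against `backend` sources before relying on this sketch.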
Task 6.3: x86-64 Backend — Instruction Encoding
- `MOV Rd, Rs` → `MOV r64, r64`
- `LDI Rd, imm` → `MOV r64, imm64`
- `ADD Rd, Rs` → `ADD r64, r64`
- `SUB Rd, Rs` → `SUB r64, r64`
- `MUL Rd, Rs` → `IMUL r64, r64`
- `DIV` → complex (needs RDX:RAX setup, IDIV)
- `LOAD Rd, Rs` → `MOV r64, [r64]`
- `STORE Rs, Rd` → `MOV [r64], r64`
- `LOADB Rd, Rs` → `MOVZX r64, BYTE PTR [r64]`
- `STOREB Rs, Rd` → `MOV BYTE PTR [r64], r8`
- `JMP label` → `JMP rel32` (two-pass: compute offset)
- `JZ label` → `JE rel32`
- `JNZ label` → `JNE rel32`
- `JL label` → `JL rel32`
- `JG label` → `JG rel32`
- `CALL label` → `CALL rel32`
- `RET` → `RET`
- `PUSH Rs` → `PUSH r64`
- `POP Rd` → `POP r64`
- `CMP Ra, Rb` → `CMP r64, r64`
- `AND`, `OR`, `XOR`, `NOT`, `SHL`, `SHR`
- `INC`, `DEC`
- `INT #imm` → `INT imm8`
- `SYS` → `SYSCALL`
- `NOP` → `NOP` (0x90)
- `HLT` → `HLT` (0xF4)
- `LDS Rd, "string"` → allocate string in .rodata, `LEA r64, [RIP+offset]`
- `VAR name, init` → allocate in .data section
- `SET name, Rs` → `MOV [name], r64`
- `GET Rd, name` → `MOV r64, [name]`
- `BUFFER name, size` → allocate N bytes in .bss
- `.text` / `.data` / `.bss` / `.rodata` section tracking
- `tests/calc.ua`
Dependencies: `cb_emit_*`, label symbol table, IR structures
Estimated Complexity: Very High (1500-2500 lines UA)
Critical Success Factor: Generated x86-64 code must be byte-identical to C backend
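The "two-pass: compute offset" note on `JMP rel32` is the part of the backend most likely to produce off-by-N bugs, so a small model helps. In the Python sketch below, pass 1 assigns a byte offset to every label and pass 2 emits code, computing each `rel32` relative to the end of the 5-byte jump instruction (opcode `E9` plus a 4-byte displacement). The tuple-based input format is a hypothetical simplification of the IR.

```python
import struct


def assemble(items):
    """items: list of ('label', name), ('jmp', name) or ('raw', bytes)."""
    # Pass 1: walk the program and record the offset of every label.
    offsets, pc = {}, 0
    for kind, arg in items:
        if kind == "label":
            offsets[arg] = pc
        elif kind == "jmp":
            pc += 5                     # E9 + rel32
        else:
            pc += len(arg)
    # Pass 2: emit bytes; rel32 = target - address of the next instruction.
    out = bytearray()
    for kind, arg in items:
        if kind == "jmp":
            rel = offsets[arg] - (len(out) + 5)
            out += b"\xE9" + struct.pack("<i", rel)
        elif kind == "raw":
            out += arg
    return bytes(out)
```

A forward jump over one 5-byte instruction gets displacement `05 00 00 00`, and a backward jump to offset 0 from an instruction ending at offset 11 gets `F5 FF FF FF` (that is, -11). Fixed-width `rel32` keeps pass 1 trivial; using short `rel8` forms would require iterating until instruction sizes stabilize.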
Phase 7: Output Emitters in UA
Objective: Generate PE/ELF/Mach-O executable files in UA.
Task 7.1: ELF Emitter for Linux x86-64
- `emit_elf_executable(code_buf, output_path, entry_point, arch)`:
  - `.text` section (code bytes)
  - `.data` section (initialized variables)
  - `.rodata` section (string literals)
- `tests/hello.ua`
- `readelf -h output`
Dependencies: `std_bufio`, `std_codebuffer`, binary struct packing
Estimated Complexity: High (600-900 lines UA)
Reference: Current C implementation in `emitter_elf.c`
Task 7.2: PE Emitter for Windows x86-64
- (`.text`, `.data`, `.idata` for imports)
- `emit_pe_executable(code_buf, output_path, entry_point)`:
- `tests/hello.ua` targeting Win32
- `dumpbin /headers output.exe` (or `objdump` on Linux)
Dependencies: `std_bufio`, `std_codebuffer`
Estimated Complexity: High (700-1100 lines UA)
Reference: Current C implementation in `emitter_pe.c`
Task 7.3: Mach-O Emitter for macOS ARM64/x86-64
- (`mach_header_64`)
- (`LC_SEGMENT_64`, `LC_MAIN`, `LC_DYSYMTAB`)
- (`__TEXT`, `__DATA` segments)
- `emit_macho_executable(code_buf, output_path, entry_point, arch)`:
  - (`CPU_TYPE_ARM64`, `CPU_TYPE_X86_64`)
- `tests/hello.ua` on macOS ARM64
- `otool -h output`
Dependencies: `std_bufio`, `std_codebuffer`
Estimated Complexity: High (600-900 lines UA)
Reference: Current C implementation in `emitter_macho.c`
Phase 8: Integration & Command-Line Driver
Objective: Build the UA compiler's main driver that ties all stages together.
Task 8.1: Command-Line Argument Parser (`std_args`)
- `args_parse(argc, argv)`: Parse arguments into Config struct:
  - `<input.ua>`: positional argument
  - `-arch <arch>`: mandatory
  - `-o <output>`: optional
  - `-sys <system>`: optional
  - `--run`: boolean flag
  - `-v`, `--version`: print version and exit
Dependencies: `std_string`, `split_string`
Estimated Complexity: Medium (200-300 lines UA)
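The argument rules above translate into a single left-to-right scan over `argv`. The Python sketch below models that scan. The dictionary keys standing in for the Config struct fields, the error messages, and the choice to stop parsing as soon as `-v`/`--version` is seen are assumptions.

```python
def args_parse(argv):
    """Parse a UA-style command line into a config dictionary."""
    config = {"input": None, "arch": None, "sys": None, "output": None,
              "run": False, "version": False}
    value_flags = {"-arch": "arch", "-sys": "sys", "-o": "output"}
    i = 1                                   # argv[0] is the program name
    while i < len(argv):
        arg = argv[i]
        if arg in value_flags:
            if i + 1 >= len(argv):
                raise ValueError("missing value for " + arg)
            config[value_flags[arg]] = argv[i + 1]
            i += 1                          # skip the value just consumed
        elif arg == "--run":
            config["run"] = True
        elif arg in ("-v", "--version"):
            config["version"] = True
            return config                   # caller prints the version and exits
        elif arg.startswith("-"):
            raise ValueError("unknown flag " + arg)
        elif config["input"] is None:
            config["input"] = arg           # first positional argument
        else:
            raise ValueError("unexpected argument " + arg)
        i += 1
    if config["input"] is None:
        raise ValueError("no input file given")
    if config["arch"] is None:
        raise ValueError("-arch is mandatory")
    return config
```

For instance, `args_parse(["ua", "ua.ua", "-arch", "x86", "-sys", "linux", "-o", "ua"])` fills `input`, `arch`, `sys` and `output` and leaves `run` false.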
Task 8.2: Compiler Pipeline Integration (`main.ua`)
- `main(argc, argv)`:
  - `source_code` string
  - `preprocessed_code`
  - `token_array`
  - `ir_array`
  - `generate_x86_64(ir_array)` → CodeBuffer
  - `generate_x86_32(ir_array)` → CodeBuffer
  - `generate_arm64(ir_array)` → CodeBuffer
  - `generate_riscv(ir_array)` → CodeBuffer
  - `emit_elf_executable(code_buf, output_path)`
  - `emit_pe_executable(code_buf, output_path)`
  - `emit_macho_executable(code_buf, output_path)`
- `tests/hello.ua` with UA compiler (self-hosted!)
- `tests/calc.ua`, verify output matches C compiler's output
Dependencies: All previous components
Estimated Complexity: Medium (300-500 lines UA)
Milestone: When this works, UA is self-hosting for one architecture (x86-64)!
Phase 9: Multi-Architecture Support
Objective: Extend self-hosting to all target architectures (x86-32, ARM64, RISC-V).
Task 9.1: x86-32 Backend in UA
- `backend_x86_32.c` logic to UA
- `INT 0x80` (Linux) or kernel32 dispatcher (Win32)
- `tests/hello.ua` as 32-bit ELF
Dependencies: x86-64 backend in UA (leveraging similar logic)
Estimated Complexity: High (1200-1800 lines UA)
Task 9.2: ARM64 Backend in UA
- `backend_arm64.c` logic to UA
- `SVC #0`
- `tests/hello.ua` on ARM64 Linux
Dependencies: ARM architecture knowledge
Estimated Complexity: Very High (1500-2200 lines UA)
Task 9.3: RISC-V Backend in UA
- `backend_risc_v.c` logic to UA
- `ECALL`
- `tests/hello.ua` on RISC-V Linux
Dependencies: RISC-V architecture knowledge
Estimated Complexity: Very High (1300-2000 lines UA)
Phase 10: Testing, Validation & Documentation
Objective: Ensure the self-hosted compiler is production-ready.
Task 10.1: Comprehensive Test Suite
- `.ua` file in `tests/` with both C compiler and UA compiler
- `.ua` source compiled by C version and UA version → identical binary output
Dependencies: Full self-hosted compiler
Estimated Complexity: High (testing infrastructure + test cases)
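Since the acceptance criterion is identical binary output from both compilers, the core of the test harness is a byte-for-byte comparison that reports where two outputs first diverge. A minimal Python sketch of that comparison follows; the function names and the idea of passing two output paths are hypothetical, and how each compiler is invoked is left out.

```python
def first_difference(a, b):
    """Return the offset of the first differing byte, or -1 if a and b are equal."""
    shared = min(len(a), len(b))
    for i in range(shared):
        if a[i] != b[i]:
            return i
    # Equal over the shared prefix: they differ only if the lengths differ.
    return -1 if len(a) == len(b) else shared


def compare_outputs(path_from_c, path_from_ua):
    """Compare the binaries produced by the C compiler and the UA compiler."""
    with open(path_from_c, "rb") as f:
        reference = f.read()
    with open(path_from_ua, "rb") as f:
        candidate = f.read()
    return first_difference(reference, candidate)
```

Reporting the first differing offset, rather than a plain pass/fail, makes it possible to map a mismatch back to a header field or a specific instruction in the emitted code.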
Task 10.2: Bootstrap Process Documentation
- `ua_c`
- `ua_c` → `ua_ua_gen1`
- `ua_ua_gen1` → `ua_ua_gen2`
- `ua_ua_gen1` == `ua_ua_gen2`, bootstrap complete ✅
- `bootstrap.sh` script:
Task 10.3: Self-Hosting User Guide
- `README.md`:
  - `ua ua.ua -arch x86 -sys linux -o ua`
- `docs/self-hosting.md`:
  - (`src_ua/` directory layout)
- `docs/stdlib-reference.md`:
  - `std_malloc`, `std_hashtable`, `std_dynarray`, `std_format`, `std_bufio`, `std_codebuffer`, `std_args`
Phase 11: Performance Optimization & Refinement (Post-Self-Hosting)
Objective: Improve compiler speed and output quality.
Task 11.1: Optimized Memory Allocator
- `free()`
Task 11.2: Register Allocation Optimization
Task 11.3: Peephole Optimization
- (`MOV RAX, RAX`)
Task 11.4: Parallel Compilation Support
- (`.ua` files in parallel)
Release Checklist for Version 27
Pre-Release:
- `tests/*.ua` programs compile and run correctly with self-hosted compiler
- (`README.md`, `docs/self-hosting.md`, `docs/stdlib-reference.md`)
Release Artifacts:
- `ua` binary (x86-64 Linux)
- `ua.exe` binary (x86-64 Windows)
- `ua` binary (ARM64 macOS)
- `main.ua` + all `std_*.ua` libraries + backend UA files
- `tests/` directory
- `docs/` directory
Announcement:
Risk Assessment
Estimated Effort
Note: Estimates assume one experienced developer working full-time. Parallelizable work (e.g., multiple backends, testing) can reduce calendar time.
Success Metrics for Version 27
- `tests/*.ua` programs compile and execute correctly.
Post-Version 27 Roadmap (Future Work)
Version 28 "Brilliant Babbage" (Optimization & Usability):
- (`ua install <package>`)
Version 29 "Clever Curry" (Concurrency & Parallelism):
Version 30 "Daring Dijkstra" (Tooling Ecosystem):
- (`ua-objdump`)
Closing Statement
Version 27 "Amazing Grace" represents the ultimate validation of the UA project: a compiler that needs no external dependencies, no C toolchain, no complex build system. Just UA compiling UA, creating a perfectly autonomous, self-sustaining ecosystem.
When complete, a developer can take a single UA binary and a text editor to any supported platform, modify the compiler's source code in UA assembly, and rebuild the compiler using itself. This is the Ouroboros realized — a language that creates itself, cycles through itself, and emerges stronger with each generation.
Let us honor Ada Lovelace's vision of an analytical engine capable of self-directed computation. Let UA compile itself, and hand it over to the most awesome woman, Grace Hopper.
Document Version: 1.0
Created: March 1, 2026
Target Release: Q4 2026