SYSTEMS PROJECT CASE STUDY

C Compiler

Built an OCaml-based C compiler lowering source through lexer → AST → semantic analysis → TACKY IR → x86-64 assembly, with lexical scoping, control flow, and structured IR lowering.

OCaml, Compilers, x86-64, IR
Role: Solo builder
Timeline: Spring 2026
Stack: OCaml, Recursive Descent
Outcome: Compiles nested unary expressions (~(-2)) to correct exit codes
Pipeline preview

System breakdown

01 Frameworks hide the machine

Problem

Calling clang is not understanding compilation. I built a minimal pipeline to internalize how source becomes stack frames, temporaries, and real x86-64 instructions.

What it handles

  • Lexer → parser → AST for nested unary ops
  • TACKY IR before codegen
  • Stack-backed lowering with fixups
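
The front end listed above can be sketched in a few dozen lines. This is a minimal illustration, not the project's actual source: the token and AST constructor names, and the functions `lex`, `parse_exp`, and `parse`, are assumptions made for this example, and it covers only the expression grammar (integer constants, `-`, `~`, and parentheses).

```ocaml
(* Hypothetical token and AST types; names are illustrative only. *)
type token = Tilde | Minus | LParen | RParen | Int of int

type exp =
  | Constant of int
  | Negate of exp
  | Complement of exp

(* Lexer: turn a string such as "~(-2)" into a token list.
   Simplification: a real C lexer must treat "--" as one decrement token. *)
let lex (s : string) : token list =
  let n = String.length s in
  let rec go i acc =
    if i >= n then List.rev acc
    else
      match s.[i] with
      | ' ' | '\t' | '\n' -> go (i + 1) acc
      | '~' -> go (i + 1) (Tilde :: acc)
      | '-' -> go (i + 1) (Minus :: acc)
      | '(' -> go (i + 1) (LParen :: acc)
      | ')' -> go (i + 1) (RParen :: acc)
      | '0' .. '9' ->
        (* Consume the whole run of digits. *)
        let j = ref i in
        while !j < n && s.[!j] >= '0' && s.[!j] <= '9' do incr j done;
        let v = int_of_string (String.sub s i (!j - i)) in
        go !j (Int v :: acc)
      | c -> failwith (Printf.sprintf "lex error: unexpected %C" c)
  in
  go 0 []

(* Recursive descent: one function for the single grammar rule
   exp ::= INT | "-" exp | "~" exp | "(" exp ")"
   Returns the parsed expression and the remaining tokens. *)
let rec parse_exp (toks : token list) : exp * token list =
  match toks with
  | Int n :: rest -> (Constant n, rest)
  | Minus :: rest ->
    let e, rest' = parse_exp rest in
    (Negate e, rest')
  | Tilde :: rest ->
    let e, rest' = parse_exp rest in
    (Complement e, rest')
  | LParen :: rest ->
    (match parse_exp rest with
     | e, RParen :: rest' -> (e, rest')
     | _ -> failwith "parse error: expected ')'")
  | _ -> failwith "parse error: expected an expression"

(* Parse a whole string and reject trailing tokens. *)
let parse (s : string) : exp =
  match parse_exp (lex s) with
  | e, [] -> e
  | _ -> failwith "parse error: trailing tokens"
```

With these definitions, `parse "~(-2)"` yields `Complement (Negate (Constant 2))`, and an unbalanced input such as `"~(-2"` raises `Failure` at the parse stage.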

Tools used

OCaml, Recursive Descent, TACKY IR, x86-64
02 Pipeline architecture

Technical deep dive

TACKY IR flattens nested expressions into temporaries before instruction selection, which makes it the key abstraction between the AST and the AT&T-syntax assembly.
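
To make the flattening step concrete, here is a minimal sketch of an AST-to-TACKY lowering pass. The type and constructor names (`Unary`, `Return`, `Const`, `Var`) and the function `lower` are assumptions made for this illustration and may not match the project's own definitions; the `tmp.N` naming follows the example used elsewhere on this page.

```ocaml
(* Hypothetical AST, redefined here so the sketch stands alone. *)
type exp = Constant of int | Negate of exp | Complement of exp

(* Hypothetical three-address IR: every operand is a constant or a named temp. *)
type unop = Neg | Not
type value = Const of int | Var of string
type instr =
  | Unary of unop * value * value   (* operator, source, destination *)
  | Return of value

(* Lower one return expression to a flat instruction list. *)
let lower (e : exp) : instr list =
  let counter = ref 0 in
  let fresh () =
    let name = Printf.sprintf "tmp.%d" !counter in
    incr counter;
    name
  in
  (* [go] returns the value holding the result of [e], plus the
     instructions emitted so far in reverse order. *)
  let rec go e acc =
    match e with
    | Constant n -> (Const n, acc)
    | Negate inner -> unary Neg inner acc
    | Complement inner -> unary Not inner acc
  and unary op inner acc =
    (* Lower the operand first, so inner expressions get lower temp numbers. *)
    let src, acc = go inner acc in
    let dst = Var (fresh ()) in
    (dst, Unary (op, src, dst) :: acc)
  in
  let result, acc = go e [] in
  List.rev (Return result :: acc)
```

For `Complement (Negate (Constant 2))` this produces `Unary (Neg, Const 2, Var "tmp.0")`, then `Unary (Not, Var "tmp.0", Var "tmp.1")`, then `Return (Var "tmp.1")`. Nesting depth no longer matters to later passes, because each instruction has at most one source and one destination.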

What it handles

  • Recursive descent: ~(-(~2))
  • TACKY temps → -4(%rbp) slots
  • movl mem,mem fixups via %r10d
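
The last two bullets can be sketched as three small passes over the IR. This is an illustration under stated assumptions, not the project's code: all type names, constructor names, and functions (`select`, `assign_slots`, `fixup`, `compile`) are hypothetical, every temporary is assumed to be a 4-byte int, and the function prologue and epilogue are omitted.

```ocaml
(* Hypothetical TACKY types, redefined so the sketch stands alone. *)
type unop = Neg | Not
type value = Const of int | Var of string
type tacky = Unary of unop * value * value | Return of value

(* Hypothetical assembly AST. [Pseudo] is a temp not yet given a location. *)
type operand = Imm of int | Pseudo of string | Slot of int | Reg of string
type asm = Mov of operand * operand | Unop of unop * operand | Ret

let operand_of = function Const n -> Imm n | Var v -> Pseudo v

(* Pass 1: instruction selection, still using pseudo-registers. *)
let select = function
  | Unary (op, src, dst) ->
    [ Mov (operand_of src, operand_of dst); Unop (op, operand_of dst) ]
  | Return v -> [ Mov (operand_of v, Reg "eax"); Ret ]

(* Pass 2: give each pseudo-register a 4-byte slot below %rbp.
   Also returns the number of stack bytes the frame needs. *)
let assign_slots (prog : asm list) : asm list * int =
  let slots = Hashtbl.create 16 in
  let place = function
    | Pseudo name ->
      (match Hashtbl.find_opt slots name with
       | Some off -> Slot off
       | None ->
         let off = -4 * (Hashtbl.length slots + 1) in
         Hashtbl.add slots name off;
         Slot off)
    | other -> other
  in
  let rewrite = function
    | Mov (a, b) ->
      (* Explicit lets keep slot numbering in source-before-destination order. *)
      let a = place a in
      let b = place b in
      Mov (a, b)
    | Unop (op, a) -> Unop (op, place a)
    | Ret -> Ret
  in
  let prog' = List.map rewrite prog in
  (prog', 4 * Hashtbl.length slots)

(* Pass 3: x86-64 has no memory-to-memory movl, so route it through %r10d. *)
let fixup = function
  | Mov ((Slot _ as src), (Slot _ as dst)) ->
    [ Mov (src, Reg "r10d"); Mov (Reg "r10d", dst) ]
  | i -> [ i ]

(* AT&T syntax: source operand first, destination second. *)
let show_operand = function
  | Imm n -> Printf.sprintf "$%d" n
  | Slot off -> Printf.sprintf "%d(%%rbp)" off
  | Reg r -> "%" ^ r
  | Pseudo name -> failwith ("unassigned pseudo-register " ^ name)

let show = function
  | Mov (a, b) -> Printf.sprintf "movl %s, %s" (show_operand a) (show_operand b)
  | Unop (Neg, a) -> "negl " ^ show_operand a
  | Unop (Not, a) -> "notl " ^ show_operand a
  | Ret -> "ret"

let compile (prog : tacky list) : string list * int =
  let selected = List.concat_map select prog in
  let slotted, frame_bytes = assign_slots selected in
  (List.map show (List.concat_map fixup slotted), frame_bytes)
```

Feeding in the TACKY for `return ~(-2);` gives a frame of 8 bytes and the body `movl $2, -4(%rbp)`, `negl -4(%rbp)`, `movl -4(%rbp), %r10d`, `movl %r10d, -8(%rbp)`, `notl -8(%rbp)`, `movl -8(%rbp), %eax`, `ret`. Keeping the fixup as its own pass means instruction selection can emit the naive `movl` between two temporaries without knowing where they will end up.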

Tools used

OCaml, TACKY IR, x86-64 AT&T, Make
03 Incremental correctness

Build process

Started with return 42, added unary ops, introduced TACKY when AST→asm became unmaintainable, then debugged stack slots on real hardware.

What it handles

  • return 42 → negation → complement
  • driver.ml end-to-end orchestration
  • Verified exit codes on Apple Silicon via x86_64 target

Tools used

OCaml, Make, clang, x86-64

Results & lessons

Results

Verified on metal

  • ./mycc tests/unary.c → out.s → clang -arch x86_64 → ./unary exits 1
  • Syntax errors caught at parse stage for malformed programs
  • Roadmap: binary ops, locals, control flow, function calls
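
The expected exit code can be checked independently of the compiler with a tiny reference evaluator. This is a sketch with hypothetical names (`eval`, `exit_code`), assuming 32-bit two's-complement ints and a shell that reports only the low 8 bits of the value returned from `main`.

```ocaml
(* Hypothetical AST, redefined so the sketch stands alone. *)
type exp = Constant of int | Negate of exp | Complement of exp

(* Evaluate with 32-bit two's-complement arithmetic, mirroring negl/notl. *)
let rec eval = function
  | Constant n -> Int32.of_int n
  | Negate e -> Int32.neg (eval e)
  | Complement e -> Int32.lognot (eval e)

(* Assumption: the process exit status keeps only the low 8 bits. *)
let exit_code (e : exp) : int =
  Int32.to_int (Int32.logand (eval e) 0xFFl)
```

For `~(-2)`: negating 2 gives -2, whose bit pattern is all ones except the lowest bit, so the complement is 1, matching the observed exit status. The same function gives 252 for `~(-(~2))`, since that expression evaluates to -4.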

Lessons Learned

IR is not optional

The moment expressions nest, you need an intermediate form. TACKY paid for itself immediately: the flattening logic stayed separate from instruction selection and stack layout.

  • Compiler engineering is mostly incremental constraint management
  • Test each stage in isolation before expanding language surface
  • Reading assembly output is the fastest debug loop

return ~(-2);
  → AST: Return(Complement(Negate(2)))
  → TACKY: tmp.0 = -2; tmp.1 = ~tmp.0; return tmp.1
  → Asm: negl/notl on stack slots, with %r10d fixups