MiniTorch-OCaml
Built a PyTorch-inspired autodiff engine in OCaml — graph-based tensor ops and reverse-mode differentiation through explicit graph traversal.
- Role: Solo builder
- Timeline: Spring 2026
- Stack: OCaml, Autodiff
- Outcome: Reverse-mode AD with numerically verified gradients
System breakdown
Frameworks hide the graph
Problem
Calling loss.backward() in PyTorch teaches the API, not the mechanism. To learn the mechanism, I implemented the computation graph, reverse-mode gradients, and a training loop in OCaml, leaning on its static type system.
What it handles
- Graph nodes with value + grad
- Numerical gradient checking
- Training loops that actually decrease loss
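As a rough illustration of "graph nodes with value + grad", here is a minimal scalar sketch. The names `node`, `leaf`, `add`, and `mul` are hypothetical and are not taken from the project; the real engine works on tensors and may store its graph differently.

```ocaml
(* Hypothetical scalar node; illustrative only, not the project's actual API. *)
type node = {
  value : float;                  (* result computed in the forward pass *)
  mutable grad : float;           (* accumulated d(output)/d(this node) *)
  parents : (node * float) list;  (* (input node, local derivative w.r.t. it) *)
}

(* A leaf has no inputs: a parameter or a constant. *)
let leaf v = { value = v; grad = 0.0; parents = [] }

(* Each op computes its value and records its inputs with local derivatives. *)
let add a b =
  { value = a.value +. b.value; grad = 0.0; parents = [ (a, 1.0); (b, 1.0) ] }

let mul a b =
  { value = a.value *. b.value; grad = 0.0;
    parents = [ (a, b.value); (b, a.value) ] }

(* Forward pass for z = x * y + x, which builds the graph as a side effect. *)
let x = leaf 2.0
let y = leaf 3.0
let z = add (mul x y) x
```

Storing the local derivative on each edge keeps the backward pass generic: it only has to multiply and accumulate, without knowing which op produced a node.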
Subsystems
Technical deep dive
Reverse-mode AD over custom tensor ops: the forward pass records the graph, and the backward pass propagates gradients back to the parameters.
What it handles
- Tensor ops → graph nodes
- Reverse traversal (backprop)
- Model layer + shape checks
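The reverse traversal can be sketched on scalars as follows. This is an assumption-laden toy, not the project's code: `node`, `topo_order`, and `backward` are hypothetical names, and the visited set is a plain list, which is fine for small graphs but slow for large ones.

```ocaml
(* Hypothetical scalar graph node; illustrative only. *)
type node = {
  value : float;
  mutable grad : float;
  parents : (node * float) list;  (* (input node, local derivative) *)
}

let leaf v = { value = v; grad = 0.0; parents = [] }
let add a b =
  { value = a.value +. b.value; grad = 0.0; parents = [ (a, 1.0); (b, 1.0) ] }
let mul a b =
  { value = a.value *. b.value; grad = 0.0;
    parents = [ (a, b.value); (b, a.value) ] }

(* Depth-first post-order. Consing onto [order] leaves the output first and
   every node ahead of its own inputs. Uses physical equality (memq). *)
let topo_order root =
  let visited = ref [] in
  let order = ref [] in
  let rec visit n =
    if not (List.memq n !visited) then begin
      visited := n :: !visited;
      List.iter (fun (p, _) -> visit p) n.parents;
      order := n :: !order
    end
  in
  visit root;
  !order

(* Seed d(root)/d(root) = 1, then apply the chain rule edge by edge. *)
let backward root =
  root.grad <- 1.0;
  List.iter
    (fun n ->
      List.iter (fun (p, local) -> p.grad <- p.grad +. local *. n.grad) n.parents)
    (topo_order root)

(* z = x * y + x, so dz/dx = y + 1 and dz/dy = x. *)
let x = leaf 2.0
let y = leaf 3.0
let z = add (mul x y) x
let () = backward z
```

Gradients are accumulated with `+.` instead of assigned because `x` feeds two ops here, and both contributions have to be summed.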
Gradient verification
Build process
Numerical gradcheck validates every op before stacking layers — the fastest way to catch subtle autodiff bugs.
What it handles
- Gradcheck harness per op
- Loss curves over training steps
- Small networks to prove learning
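A per-op gradcheck harness could look roughly like this. It is a sketch under stated assumptions: the function under test is modelled as `float array -> float`, the names `numeric_grad` and `gradcheck` are hypothetical, and the step size and tolerance are illustrative defaults, not the project's settings.

```ocaml
(* Central-difference estimate of df/dx_i for each input component. *)
let numeric_grad ?(eps = 1e-5) f x =
  Array.mapi
    (fun i xi ->
      let bump d =
        let x' = Array.copy x in
        x'.(i) <- xi +. d;
        f x'
      in
      (bump eps -. bump (-.eps)) /. (2.0 *. eps))
    x

(* True when every analytic component agrees with the numeric estimate,
   using a tolerance that scales with the size of the gradient. *)
let gradcheck ?(eps = 1e-5) ?(tol = 1e-6) f analytic x =
  let num = numeric_grad ~eps f x in
  let ana = analytic x in
  let close a n = abs_float (a -. n) <= tol *. (1.0 +. abs_float n) in
  Array.length ana = Array.length num
  && List.for_all2 close (Array.to_list ana) (Array.to_list num)

(* Example op: f(x0, x1) = x0 * x1 + x0. *)
let f x = x.(0) *. x.(1) +. x.(0)
let good_grad x = [| x.(1) +. 1.0; x.(0) |]  (* correct derivative *)
let bad_grad x = [| x.(1); x.(0) |]          (* forgets the "+ x0" term *)

let passes = gradcheck f good_grad [| 2.0; 3.0 |]
let catches_bug = not (gradcheck f bad_grad [| 2.0; 3.0 |])
```

The `bad_grad` case is the kind of bug the forward pass never reveals: the op returns the right value, and only the comparison against finite differences shows that a term is missing.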
Training & gradcheck
Verified learning
Gradient checks against finite differences, then training loops that show loss decreasing — proof the engine learns, not just runs.
What it handles
- Numeric vs analytic gradients
- Loss curve monitoring
- Small network training
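To make "loss decreasing" concrete, here is a toy training loop that records the loss at every step. It is only a sketch: the data, the linear model, the learning rate, and the names `loss_and_grad` and `train` are assumptions made for illustration, and the gradients are written by hand to keep it short where the engine would obtain them from its backward pass.

```ocaml
(* Toy data drawn from y = 2x + 1; the model is y_hat = w * x + b. *)
let xs = [| 0.0; 1.0; 2.0; 3.0 |]
let ys = Array.map (fun x -> 2.0 *. x +. 1.0) xs
let n = float_of_int (Array.length xs)

(* Mean squared error and its gradient with respect to (w, b). *)
let loss_and_grad w b =
  let loss = ref 0.0 and gw = ref 0.0 and gb = ref 0.0 in
  Array.iteri
    (fun i x ->
      let err = (w *. x +. b) -. ys.(i) in
      loss := !loss +. err *. err /. n;
      gw := !gw +. 2.0 *. err *. x /. n;
      gb := !gb +. 2.0 *. err /. n)
    xs;
  (!loss, !gw, !gb)

(* Plain gradient descent; returns the final parameters and per-step losses. *)
let train ~lr ~steps =
  let w = ref 0.0 and b = ref 0.0 in
  let losses = Array.make steps 0.0 in
  for step = 0 to steps - 1 do
    let loss, gw, gb = loss_and_grad !w !b in
    losses.(step) <- loss;
    w := !w -. lr *. gw;
    b := !b -. lr *. gb
  done;
  (!w, !b, losses)

let w, b, losses = train ~lr:0.05 ~steps:500

let () =
  Printf.printf "first loss %g, last loss %g, w = %.3f, b = %.3f\n"
    losses.(0) losses.(Array.length losses - 1) w b
```

Keeping the whole loss history, and not only the final value, is what allows the curve to be plotted or checked after the run.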
Results & lessons
Results
Demonstrated learning
- Gradient checking passes on core ops
- Training loss decreases over iterations
- Model summaries and prediction outputs match expectations
Lessons Learned
Types help, math still wins
The hard part was translating calculus into abstractions that compose. OCaml caught structural bugs early; numerical gradient checks caught the subtle ones.
- Build the graph IR before the nn.Module ergonomics
- Every new op needs a gradient test, not just a forward test