Project Overview
Build the first native TOML 1.0 parser for the Mojo programming language ecosystem. Mojo projects use TOML extensively for configuration (pixi.toml, mojoproject.toml) but currently must rely on Python interop to parse these files. A native parser fills this infrastructure gap, benefits the whole community, and enables pure-Mojo tooling.
Current State
No TOML parser exists in the Mojo ecosystem:
- Confirmed via awesome-mojo, Modular stdlib, and modular-community recipes
- Existing parsers: CSV, JSON, FASTQ
- Current workaround: Python interop with tomli
- Repository created: https://github.com/DataBooth/mojo-toml
Available reference implementation:
- mojo-dotenv provides proven parser architecture (character-by-character parsing, escape handling, string processing)
- Can validate against Python tomli library for correctness
Technical Design
Architecture
Follow proven mojo-dotenv pattern with three-layer architecture:
Layer 1: Lexer (lexer.mojo)
- Tokenise TOML input into stream of tokens
- Token types: KEY, VALUE, TABLE, ARRAY, STRING, INTEGER, FLOAT, BOOLEAN, DATETIME, WHITESPACE, COMMENT, NEWLINE, EQUALS, BRACKET, BRACE, COMMA, DOT
- Handle escape sequences in strings
- Track line/column positions for error messages
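As a rough illustration of the lexer layer, the sketch below tokenises a small TOML subset while tracking line and column positions. It is written in Python rather than Mojo, and the names `Token`, `tokenise`, and the catch-all `BARE` token kind are assumptions made for this example; they are not taken from mojo-toml or mojo-dotenv.

```python
from dataclasses import dataclass

@dataclass
class Token:
    kind: str   # e.g. "BARE", "STRING", "EQUALS", "NEWLINE"
    text: str   # decoded text for strings, raw text otherwise
    line: int   # 1-based position of the token's first character
    col: int

PUNCTUATION = {"=": "EQUALS", ".": "DOT", ",": "COMMA",
               "[": "BRACKET", "]": "BRACKET", "{": "BRACE", "}": "BRACE"}
ESCAPES = {"n": "\n", "t": "\t", '"': '"', "\\": "\\"}  # subset of TOML's escapes

def tokenise(src: str) -> list[Token]:
    tokens: list[Token] = []
    i, line, col = 0, 1, 1
    while i < len(src):
        ch = src[i]
        if ch == "\n":
            tokens.append(Token("NEWLINE", ch, line, col))
            i, line, col = i + 1, line + 1, 1
        elif ch in " \t":
            i, col = i + 1, col + 1              # whitespace is dropped in this sketch
        elif ch == "#":
            while i < len(src) and src[i] != "\n":
                i, col = i + 1, col + 1          # a comment runs to the end of the line
        elif ch in PUNCTUATION:
            tokens.append(Token(PUNCTUATION[ch], ch, line, col))
            i, col = i + 1, col + 1
        elif ch == '"':
            start = col
            i, col = i + 1, col + 1              # skip the opening quote
            out = []
            while i < len(src) and src[i] not in '"\n':
                if src[i] == "\\":
                    esc = src[i + 1:i + 2]
                    if esc not in ESCAPES:
                        raise ValueError(f"bad escape at {line}:{col}")
                    out.append(ESCAPES[esc])
                    i, col = i + 2, col + 2
                else:
                    out.append(src[i])
                    i, col = i + 1, col + 1
            if i >= len(src) or src[i] == "\n":
                raise ValueError(f"unterminated string at {line}:{start}")
            tokens.append(Token("STRING", "".join(out), line, start))
            i, col = i + 1, col + 1              # skip the closing quote
        else:
            # Bare run: a key, number, or boolean; classification is left to the parser.
            j = i
            while j < len(src) and src[j] not in ' \t\n#=.,[]{}"':
                j += 1
            tokens.append(Token("BARE", src[i:j], line, col))
            i, col = j, col + (j - i)
    return tokens
```

A context-free lexer like this one would split `3.14` at the dot, so a real TOML lexer needs to know whether it is in key or value position before deciding how to treat `.`.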
Layer 2: Parser (parser.mojo)
- Convert token stream into structured data
- Build nested Dict[String, Value] structure
- Validate TOML 1.0 syntax rules
- Handle dotted keys (a.b.c = value)
- Process table headers [section] and array tables [[array]]
- Detect duplicate keys (TOML spec requirement)
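The nesting logic for dotted keys, table headers, and duplicate detection can be sketched as follows. This is a Python illustration under simplifying assumptions, not the planned Mojo code: `set_dotted` and `open_table` are hypothetical helper names, and the sketch does not enforce every TOML 1.0 rule (for example, it does not reject a `[table]` header that is defined twice).

```python
def set_dotted(table: dict, path: list[str], value) -> None:
    """Store value under a dotted key, e.g. a.b.c = 1 -> path ["a", "b", "c"]."""
    for part in path[:-1]:
        child = table.setdefault(part, {})   # create intermediate tables on demand
        if not isinstance(child, dict):
            raise ValueError(f"key {part!r} is already a non-table value")
        table = child
    if path[-1] in table:
        raise ValueError(f"duplicate key {'.'.join(path)!r}")
    table[path[-1]] = value

def open_table(root: dict, path: list[str], is_array: bool) -> dict:
    """Return the table that a [path] or [[path]] header makes current."""
    table = root
    for part in path[:-1]:
        child = table.setdefault(part, {})
        if isinstance(child, list) and child:
            child = child[-1]                # step into the newest [[array]] entry
        if not isinstance(child, dict):
            raise ValueError(f"key {part!r} is not a table")
        table = child
    last = path[-1]
    if is_array:
        entries = table.setdefault(last, [])
        if not isinstance(entries, list):
            raise ValueError(f"key {last!r} is not an array of tables")
        entries.append({})                   # each [[header]] starts a new entry
        return entries[-1]
    child = table.setdefault(last, {})
    if not isinstance(child, dict):
        raise ValueError(f"key {last!r} is not a table")
    return child
```

A short usage example, equivalent to a `[server]` table containing `tls.enabled = true` followed by two `[[peer]]` entries:

```python
doc = {}
set_dotted(open_table(doc, ["server"], False), ["tls", "enabled"], True)
open_table(doc, ["peer"], True)["host"] = "a"
open_table(doc, ["peer"], True)["host"] = "b"
# doc == {"server": {"tls": {"enabled": True}}, "peer": [{"host": "a"}, {"host": "b"}]}
```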
Layer 3: Public API (__init__.mojo)
- parse(content: String) -> Dict[String, Value] - Parse TOML string
- load(path: Path) -> Dict[String, Value] - Parse TOML file
- loads(content: String) -> Dict[String, Value] - Alias for parse
- TomlError - Custom error type with line/column info
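To make the "line/column info" requirement concrete, here is one possible shape for the error type, sketched in Python rather than Mojo. Only the name `TomlError` and the idea of carrying a line and column come from the plan above; the constructor signature and the `render` helper are assumptions for illustration.

```python
class TomlError(Exception):
    """Parse error carrying a 1-based line and column."""

    def __init__(self, message: str, line: int, column: int):
        super().__init__(f"{message} (line {line}, column {column})")
        self.message, self.line, self.column = message, line, column

    def render(self, source: str) -> str:
        """Return the message plus the offending line and a caret under the column."""
        lines = source.splitlines()
        text = lines[self.line - 1] if 0 < self.line <= len(lines) else ""
        return f"{self}\n{text}\n{' ' * (self.column - 1)}^"
```

Keeping the position as separate fields, rather than only embedding it in the message, lets callers such as editors or CLI tools format the location however they prefer.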
Data Structure
TOML values map to Mojo types:
- String → String
- Integer → Int
- Float → Float64
- Boolean → Bool
- Datetime → String (ISO 8601 format, v0.1.0)
- Array → List[Value]
- Table → Dict[String, Value]
Use variant/union type for Value:
struct Value:
    var data: Variant[String, Int, Float64, Bool, List[Value], Dict[String, Value]]
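For readers more familiar with Python, the same recursive union can be written as a type alias, and consumers would typically dispatch on the active member. The sketch below is illustrative only: the `Value` alias and the `toml_type` function are hypothetical and do not describe the Mojo API.

```python
from typing import Union

# Recursive union mirroring the Variant above (datetimes are plain strings in v0.1.0).
Value = Union[str, int, float, bool, list["Value"], dict[str, "Value"]]

def toml_type(value) -> str:
    """Name the TOML type held by a parsed value."""
    if isinstance(value, bool):      # check bool first: bool is a subclass of int in Python
        return "Boolean"
    if isinstance(value, int):
        return "Integer"
    if isinstance(value, float):
        return "Float"
    if isinstance(value, str):
        return "String"
    if isinstance(value, list):
        return "Array"
    if isinstance(value, dict):
        return "Table"
    raise TypeError(f"not a TOML value: {type(value).__name__}")
```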
TOML 1.0 Features (v0.1.0 Scope)
Core syntax:
- [x] Key-value pairs: key = "value"
- [x] Comments: # comment
- [x] Basic strings: "hello"
- [x] Literal strings: 'raw string'
- [x] Multiline basic strings: """text"""
- [x] Multiline literal strings: '''text'''
- [x] Integers: 42, +17, -5, 1_000
- [x] Floats: 3.14, 1e10, 6.022e23
- [x] Booleans: true, false
- [x] Datetime: 1979-05-27T07:32:00Z (parse as string v0.1.0)
- [x] Arrays: [1, 2, 3]
- [x] Inline tables: {name = "value"}
- [x] Tables: [section]
- [x] Array of tables: [[array]]
- [x] Dotted keys: a.b.c = "value"
- [x] Duplicate key detection
Out of scope for v0.1.0:
- Native datetime parsing (return ISO 8601 string)
- Writer/serialiser functionality
- Schema validation
- Custom formatting
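The integer forms listed under core syntax (42, +17, -5, 1_000) come with validation rules in the TOML 1.0 spec: leading zeros are not allowed, and each underscore must have a digit on both sides. A minimal Python sketch of that check for decimal integers follows; the function name `parse_decimal_int` is hypothetical, and prefixed forms such as hexadecimal are deliberately left out.

```python
import re

# Optional sign, then either a lone 0 or a non-zero digit followed by
# digits that may each be preceded by a single underscore.
_DEC_INT = re.compile(r"[+-]?(0|[1-9](_?[0-9])*)")

def parse_decimal_int(text: str) -> int:
    """Parse a TOML decimal integer, rejecting forms the spec disallows."""
    if not _DEC_INT.fullmatch(text):
        raise ValueError(f"invalid TOML integer: {text!r}")
    return int(text.replace("_", ""))
```

Inputs such as `1__000`, `007`, and `_1` raise `ValueError` here, which matters because Python's own `int()` would accept `007`.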
Testing Strategy
Validation Approach
Use Python tomli as reference implementation:
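from pathlib import Path
from python import Python
from testing import assert_equal
from toml import parse

fn test_parser() raises:
    var content = Path("test.toml").read_text()
    # Mojo implementation
    var mojo_result = parse(content)
    # Python reference
    var py = Python.import_module("tomli")
    var py_result = py.loads(content)
    # Compare results (py_result is a PythonObject, so it must first be
    # converted to Dict[String, Value]; that conversion is not shown here)
    assert_equal(mojo_result, py_result)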
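One detail to plan for in this comparison: tomli returns Python `datetime`, `date`, and `time` objects for TOML date-time values, while v0.1.0 of this parser returns them as strings. A small normalisation step on the Python side can bridge the gap. The helper below is a sketch with a hypothetical name (`normalise`), not part of tomli.

```python
import datetime

def normalise(value):
    """Recursively replace date/time objects with ISO 8601 strings."""
    if isinstance(value, (datetime.datetime, datetime.date, datetime.time)):
        return value.isoformat()
    if isinstance(value, dict):
        return {key: normalise(item) for key, item in value.items()}
    if isinstance(value, list):
        return [normalise(item) for item in value]
    return value
```

Python's `isoformat()` renders a UTC offset as `+00:00` rather than `Z`, so the comparison would also need to agree on one spelling of the offset.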
Test Suite Structure
- tests/test_basic.mojo - Basic key-value pairs
- tests/test_strings.mojo - All string types and escapes
- tests/test_numbers.mojo - Integer and float parsing
- tests/test_arrays.mojo - Array syntax and nesting
- tests/test_tables.mojo - Table headers and nesting
- tests/test_inline.mojo - Inline tables
- tests/test_datetime.mojo - Datetime formats
- tests/test_errors.mojo - Invalid TOML detection
- tests/test_tomli_compat.mojo - Validate against tomli
- tests/test_spec.mojo - TOML 1.0 spec compliance
Test Files
Use real-world TOML files:
- fixtures/pixi.toml - Mojo project config
- fixtures/mojoproject.toml - Mojo package config
- fixtures/cargo.toml - Rust project (stress test)
- fixtures/pyproject.toml - Python project
- fixtures/spec_examples.toml - TOML spec examples
Implementation Phases
Phase 1: Foundation (Week 1)
Goal: Project structure + basic lexer
Tasks:
- Initialise pixi project with dependencies
- Create src/toml/ module structure
- Implement lexer with basic token types
- Write lexer tests
- Set up CI/CD with GitHub Actions
Deliverables:
- Tokeniser handles basic TOML syntax
- 20+ lexer tests passing
- README with project vision
Phase 2: Core Parser (Week 2)
Goal: Parse basic TOML to Dict
Tasks:
- Implement parser for key-value pairs
- Handle strings (basic, literal, multiline)
- Parse numbers (int, float)
- Parse booleans and arrays
- Implement dotted key expansion
Deliverables:
- Parse simple TOML files
- 30+ parser tests passing
- Validate against tomli
Phase 3: Tables & Nesting (Week 3)
Goal: Complete TOML 1.0 compliance
Tasks:
- Implement table headers [section]
- Handle array of tables [[array]]
- Detect duplicate keys
- Parse inline tables
- Handle datetime strings
- Complete error messages
Deliverables:
- Full TOML 1.0 spec compliance
- 50+ tests passing
- Parse pixi.toml and mojoproject.toml
Phase 4: Polish & Documentation (Week 4)
Goal: Production-ready release
Tasks:
- Performance optimisation
- Comprehensive error messages
- API documentation
- Usage examples
- Blog post: “Building Mojo’s First TOML Parser”
- Submit to awesome-mojo
- Create modular-community recipe
Deliverables:
- v0.1.0 release
- Complete documentation
- Package published
Distribution Plan
Phase 1: GitHub Repository
- Source code on DataBooth/mojo-toml
- Release v0.1.0 with .mojopkg binary
- Installation via git submodule or direct copy
Phase 2: Community Listing
- Submit to awesome-mojo curated list
- Create recipe for modular-community channel
- Users install via:
    pixi add mojo-toml
Phase 3: Ecosystem Integration
- Blog post series
- Demo projects using mojo-toml
- Integration examples with Mojo tooling
Success Criteria
- ✅ Parse all valid TOML 1.0 files
- ✅ Reject all invalid TOML (per spec)
- ✅ 100% pass rate against tomli test suite
- ✅ Parse pixi.toml and mojoproject.toml correctly
- ✅ Clear error messages with line/column info
- ✅ Published to modular-community
- ✅ Adopted by 3+ projects in community
Future Roadmap
v0.2.0 - Writer
- Serialise Dict to TOML string
- Pretty printing options
- Round-trip fidelity (parse → write → parse)
v0.3.0 - Performance
- SIMD optimisations for parsing
- Benchmarks vs Python tomli
- Memory efficiency improvements
v0.4.0 - Advanced Features
- Schema validation
- Custom datetime parsing
- Streaming parser for large files
- Error recovery mode
References
- TOML 1.0 Spec: https://toml.io/en/v1.0.0
- Python tomli: https://github.com/hukkin/tomli
- mojo-dotenv architecture: https://github.com/databooth/mojo-dotenv
- Rust toml crate: https://docs.rs/toml/latest/toml/
- Go toml library: https://github.com/BurntSushi/toml