A native C inference engine for BitNet b1.58 targeting the RP2350 with SD card weight storage. The model — 24 layers, dim 1536, 16 heads, FFN 4096, vocab 32K — fits in a 194 MB SD card image and runs end-to-end in pure C with no runtime dependencies.
What works
- SafeTensors reader + HF → binary converter
- SentencePiece BPE tokenizer
- 1.58-bit ternary mat-vec (scalar + AVX2)
- Full transformer forward pass — RoPE, GQA, gated FFN, RMSNorm
- Argmax + top-K sampler
- Raw SD card image packer — sector-aligned, DMA-friendly
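
As an illustration of the scalar ternary mat-vec listed above, here is a minimal sketch. The 2-bit packing (four weights per byte, codes 0/1/2 for 0/+1/-1), the single per-matrix `scale`, the float activations, and the function names are all assumptions made for the example, not the project's actual on-disk format or API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 2-bit encoding: 0 -> 0, 1 -> +1, 2 -> -1 (3 is unused). */
static inline int ternary_decode(uint8_t code)
{
    return (code == 1) - (code == 2);
}

/*
 * y[r] = scale * sum_c W[r][c] * x[c], with W[r][c] in {-1, 0, +1}.
 * Weights are assumed packed four per byte, lowest bits first, row-major.
 */
void ternary_matvec(const uint8_t *packed, float scale,
                    const float *x, float *y,
                    size_t rows, size_t cols)
{
    assert(cols % 4 == 0);              /* each row is a whole number of bytes */
    const size_t row_bytes = cols / 4;

    for (size_t r = 0; r < rows; r++) {
        const uint8_t *row = packed + r * row_bytes;
        float acc = 0.0f;

        for (size_t b = 0; b < row_bytes; b++) {
            const uint8_t byte = row[b];
            for (size_t k = 0; k < 4; k++) {
                const int w = ternary_decode((uint8_t)((byte >> (2 * k)) & 0x3u));
                /* No multiply: each weight adds, subtracts, or skips x. */
                if (w > 0)      acc += x[4 * b + k];
                else if (w < 0) acc -= x[4 * b + k];
            }
        }
        y[r] = acc * scale;             /* apply the per-matrix scale once per row */
    }
}
```

Because every weight is -1, 0, or +1, the inner loop needs no multiplications, which is what makes this kernel practical on a small microcontroller core.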
SD card layout
Sector 0 holds the header, the vocab starts at sector 16, and the embeddings start at sector 1732. The 24 layers are laid out at a fixed stride of 11,153 sectors each. Every matrix starts on a sector boundary, so a sector-aligned DMA read never straddles two matrices. No filesystem. No indirection.
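
The addressing this layout implies can be sketched as below. The sector numbers and the layer stride come from the description above. The 512-byte sector size, the helper names, and the `first_layer_sector` parameter are assumptions: the text does not state where layer 0 begins, so the sketch treats it as a value obtained elsewhere, for example from the header.

```c
#include <assert.h>
#include <stdint.h>

#define SECTOR_BYTES   512u    /* assumption: standard SD block size */
#define HEADER_SECTOR  0u
#define VOCAB_SECTOR   16u
#define EMBED_SECTOR   1732u
#define LAYER_COUNT    24u
#define LAYER_STRIDE   11153u  /* sectors per layer */

/* Round a byte length up to whole sectors so the next blob starts aligned. */
static inline uint32_t sectors_for_bytes(uint64_t nbytes)
{
    return (uint32_t)((nbytes + SECTOR_BYTES - 1u) / SECTOR_BYTES);
}

/*
 * Start sector of a given layer. first_layer_sector is hypothetical:
 * it stands for wherever layer 0 begins on the card.
 */
static inline uint32_t layer_sector(uint32_t first_layer_sector, uint32_t layer)
{
    assert(layer < LAYER_COUNT);
    return first_layer_sector + layer * LAYER_STRIDE;
}

/* Convert a sector index to a byte offset within the raw image. */
static inline uint64_t sector_to_byte(uint32_t sector)
{
    return (uint64_t)sector * SECTOR_BYTES;
}
```

With a fixed stride, locating any layer is one multiply and one add, so the firmware needs no lookup table and no filesystem metadata to find weights.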
Embedding quantization
| Embedding format | Size | Result |
|---|---|---|
| fp32 | 319 MB | reference baseline |
| int8 + scale | 194 MB | identical quality ✓ |
| int4 + scale | 177 MB | LM head collapses ✗ |
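
For readers unfamiliar with the "int8 + scale" scheme, the following is a minimal sketch of symmetric int8 quantization with one float scale per embedding row. The per-row granularity, the rounding rule, and the function names are assumptions for illustration; the converter may choose its scales differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Quantize one row to int8 using a single scale (assumed: per row).
 * Returns the scale so the caller can store it next to the int8 data.
 */
float quantize_row_int8(const float *row, int8_t *q, size_t dim)
{
    float max_abs = 0.0f;
    for (size_t i = 0; i < dim; i++) {
        const float a = row[i] < 0.0f ? -row[i] : row[i];
        if (a > max_abs) max_abs = a;
    }

    float scale = max_abs / 127.0f;
    if (scale == 0.0f) scale = 1.0f;    /* all-zero row: any scale works */

    for (size_t i = 0; i < dim; i++) {
        const float v = row[i] / scale;
        /* Round half away from zero, then clamp to the int8 range used. */
        int r = (int)(v >= 0.0f ? v + 0.5f : v - 0.5f);
        if (r > 127)  r = 127;
        if (r < -127) r = -127;
        q[i] = (int8_t)r;
    }
    return scale;
}

/* Reconstruct approximate float values from int8 codes and the row scale. */
void dequantize_row_int8(const int8_t *q, float scale, float *out, size_t dim)
{
    for (size_t i = 0; i < dim; i++)
        out[i] = (float)q[i] * scale;
}
```

An int4 variant of the same scheme has only 16 levels per value instead of 255, which gives a sense of how much coarser the representation in the last table row is.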
AI doesn't require powerful machines. It requires coordinated ones.