Our Flagship Project
The Atom Project
A Key-Value Cache compression framework that lets large-scale AI models run efficiently on consumer hardware — with negligible precision loss.
What it is
Big-model performance on retail hardware
The Atom Project is an infrastructure framework built around compressing the Transformer Key-Value (KV) Cache. It does two things: it converts existing models into a smaller, faster form, and it trains new models more efficiently from the ground up.
- Convert existing models into the Atom format
- Train new models faster and more affordably
- Run across the RTX 30, 40, and 50 series, not just the 3090, 4090, or 5090
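For readers new to the term, here is a rough sketch of why the Key-Value Cache is worth compressing. It shows only the standard way an uncompressed cache grows with model shape and context length; it says nothing about how Atom itself compresses it. The function name `kv_cache_bytes` and the model shape in the example are hypothetical, chosen purely for illustration.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len,
                   batch_size=1, bytes_per_value=2):
    """Bytes needed to hold an uncompressed KV cache.

    bytes_per_value=2 assumes 16-bit values (fp16 or bf16).
    """
    # Factor of 2: every layer stores one tensor of keys and one of values.
    return (2 * num_layers * num_kv_heads * head_dim
            * seq_len * batch_size * bytes_per_value)


# Hypothetical model shape, not taken from any benchmark on this page.
size = kv_cache_bytes(num_layers=28, num_kv_heads=2, head_dim=128, seq_len=4096)
print(f"{size / 2**20:.1f} MiB")  # prints "112.0 MiB"
```

The cache grows linearly with context length and batch size, which is why it tends to become a bottleneck on GPUs with limited VRAM.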
The Benchmarks
Single-Layer, Multi-Token Inference
Measured on an NVIDIA RTX 4070 Super (12 GB) · Model: DeepSeek-R1 1.5B
| Metric | Standard | Converted to Atom | Result |
|---|---|---|---|
| VRAM Usage | 3.89 GB | 674.89 MB | ≈ 5.8× less |
| Inference Time | 0.003128 s | 0.000489 s | ≈ 6.4× faster |
| Cosine Variance (accuracy) | baseline | 0.0000 | no measurable loss |
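The ratios in the Result column can be reproduced from the table's own figures, as the short sketch below shows. Two things in it are assumptions rather than published details: the VRAM ratio treats 1 GB as 1000 MB, and `cosine_deviation` (a hypothetical helper) reads "Cosine Variance" as one minus the cosine similarity between the baseline and converted outputs. The exact metric definition is not given here.

```python
import math

# Figures copied from the benchmark table.
standard_vram_gb = 3.89
atom_vram_mb = 674.89
standard_time_s = 0.003128
atom_time_s = 0.000489

# Assumption: 1 GB = 1000 MB (decimal units).
vram_ratio = (standard_vram_gb * 1000) / atom_vram_mb
speedup = standard_time_s / atom_time_s

print(f"VRAM: {vram_ratio:.1f}x less")   # prints "VRAM: 5.8x less"
print(f"Time: {speedup:.1f}x faster")    # prints "Time: 6.4x faster"


def cosine_deviation(a, b):
    """Assumed metric: 1 - cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


# Vectors pointing in the same direction deviate by roughly zero.
print(f"{abs(cosine_deviation([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])):.4f}")
```

Under this reading, a value of 0.0000 means the two outputs point in the same direction to four decimal places.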
And this is just the beginning.
Our Outlook for Atom
Scaling up, quickly
We are actively rolling out support for models with larger parameter counts, with new benchmarks to follow. What you see here is the earliest stage of what Atom can do.
“Our goal with Atom is to be the first step toward AGI — to accelerate the race to Artificial General Intelligence by making powerful models radically more efficient.”
For business inquiries regarding licensing of Atom
We are actively looking to license the Atom Project. Reach out to start the conversation.