Liquid AI · 24B (2.3B active) · Mixture of Experts
Hybrid MoE with convolution and attention layers, 2.3B active parameters
16.5K downloads
275 likes
2025-11 · 32K context
Use Cases
chat · edge · RAG
Mixture of Experts
Total experts: 64
Active experts: 4
Active params: 2.3B
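To illustrate what "4 active experts out of 64" means per token, here is a minimal sketch of generic top-k expert routing. It is not LFM2's actual router: the function name `route_token`, the softmax over the selected scores, and the random logits are all assumptions made for illustration. Only the counts 64 and 4 come from the figures above.

```python
import math
import random

TOTAL_EXPERTS = 64   # from the card
ACTIVE_EXPERTS = 4   # from the card


def route_token(router_logits, k=ACTIVE_EXPERTS):
    """Pick the k highest-scoring experts and normalize their weights.

    Hypothetical helper: a generic top-k gate, not the LFM2 implementation.
    Returns a list of (expert_index, weight) pairs whose weights sum to 1.
    """
    # Indices of the k largest router scores, best first.
    top = sorted(range(len(router_logits)),
                 key=lambda i: router_logits[i], reverse=True)[:k]
    # Numerically stable softmax over the selected scores only.
    peak = max(router_logits[i] for i in top)
    exps = [math.exp(router_logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


# Stand-in router scores for one token (random, for illustration only).
rng = random.Random(0)
logits = [rng.gauss(0.0, 1.0) for _ in range(TOTAL_EXPERTS)]
selected = route_token(logits)
for expert, weight in selected:
    print(f"expert {expert:2d} weight {weight:.3f}")
```

Only the selected experts run for a given token, which is why the active parameter count (2.3B) is much smaller than the total (24B).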
| Quant | Bits | VRAM | Quality |
|---|---|---|---|
| Q2_K | 2 | 8.2 GB | low |
| Q3_K_M | 3 | 11.3 GB | moderate |
| Q4_K_M | 4 | 12.8 GB | good |
| Q5_K_M | 5 | 15.9 GB | good |
| Q6_K | 6 | 18.9 GB | excellent |
| Q8_0 | 8 | 25.1 GB | excellent |
| F16 | 16 | 49.7 GB | lossless |
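One way to read the table is as a lookup for "the largest quantization that fits my memory budget". The sketch below encodes the rows above and picks the highest-bit option that fits. The function name `best_quant_for` and the optional `headroom_gb` parameter (spare memory for the KV cache and runtime overhead) are assumptions, not part of the card; the VRAM figures are copied from the table as listed.

```python
# (name, bits, VRAM in GB, quality) as listed in the table above.
QUANTS = [
    ("Q2_K", 2, 8.2, "low"),
    ("Q3_K_M", 3, 11.3, "moderate"),
    ("Q4_K_M", 4, 12.8, "good"),
    ("Q5_K_M", 5, 15.9, "good"),
    ("Q6_K", 6, 18.9, "excellent"),
    ("Q8_0", 8, 25.1, "excellent"),
    ("F16", 16, 49.7, "lossless"),
]


def best_quant_for(budget_gb, headroom_gb=0.0):
    """Return the highest-bit quant whose listed VRAM plus headroom fits.

    Hypothetical helper. headroom_gb is an assumed safety margin; the card
    does not say how much extra memory a given context length needs.
    Returns None when nothing fits.
    """
    fitting = [q for q in QUANTS if q[2] + headroom_gb <= budget_gb]
    return max(fitting, key=lambda q: q[1]) if fitting else None


for budget in (8, 16, 24, 48):
    choice = best_quant_for(budget)
    print(budget, "GB ->", choice[0] if choice else "nothing fits")
```

With a 16 GB budget and no headroom, the 15.9 GB Q5_K_M row fits only barely, so reserving a margin through `headroom_gb` is likely the safer choice in practice.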
About this model
LFM2 is a family of hybrid models designed for on-device deployment. LFM2-24B-A2B is the largest model in the family, scaling the architecture to 24 billion parameters while keeping inference efficient.
- Best-in-class efficiency: A 24B MoE model with only about 2.3B active parameters per token, fitting in 32 GB of RAM for deployment on consumer laptops and desktops.
- Fast edge inference: 112 tok/s decode on an AMD CPU and 293 tok/s on an H100. Fits in 32 GB of RAM.
- Predictable scaling: Quality improves log-linearly from 350M to 24B total parameters, confirming the LFM2 hybrid architecture scales reliably across nearly two orders of magnitude.
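The "nearly two orders of magnitude" figure in the scaling claim can be checked with a quick calculation: the span from 350M to 24B parameters is a factor of about 68.6, or roughly 1.84 in base-10 logarithm. The sketch below also shows what "log-linear" would mean as a formula, quality ≈ a + b · log10(parameters). The coefficients `a` and `b` are hypothetical placeholders; the card does not publish fitted values.

```python
import math

SMALLEST_PARAMS = 350e6  # 350M, from the card
LARGEST_PARAMS = 24e9    # 24B total, from the card

# Width of the parameter range in orders of magnitude (base 10).
span_decades = math.log10(LARGEST_PARAMS / SMALLEST_PARAMS)
print(f"range spans {span_decades:.2f} orders of magnitude")


def log_linear_quality(params, a, b):
    """Quality under an assumed log-linear law: a + b * log10(params).

    a and b are hypothetical; they are not taken from the model card.
    """
    return a + b * math.log10(params)


# Under such a law, every 10x increase in parameters adds the same amount b.
a, b = 0.0, 1.0  # placeholder coefficients for illustration
gain_per_decade = log_linear_quality(10e9, a, b) - log_linear_quality(1e9, a, b)
print(f"gain per 10x parameters: {gain_per_decade:.2f}")
```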