LFM2 24B

Liquid AI

Liquid AI · 24B (2.3B active) · Mixture of Experts

Hybrid MoE with convolution and attention layers, 2.3B active parameters per token

16.5K downloads · 275 likes · 2025-11 · 32K context

Use Cases

chat · edge · RAG

Mixture of Experts

Total experts: 64
Active experts: 4
Active params: 2.3B
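
As a rough illustration of what "4 active out of 64" means for a single token, here is a minimal sketch of top-k expert routing in plain Python. The random router scores, the softmax over the selected experts, and the `route` function name are assumptions made for illustration; this card does not describe how LFM2's router is actually implemented.

```python
import math
import random

NUM_EXPERTS = 64  # "Total experts" from the card
TOP_K = 4         # "Active experts" from the card


def route(logits, k=TOP_K):
    """Return [(expert_index, weight)] for the k highest-scoring experts.

    Weights are a softmax over the selected scores only, so they sum to 1.
    (Assumed normalization; the real router may differ.)
    """
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    peak = max(logits[i] for i in top)  # subtract the max for numerical stability
    exps = [math.exp(logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


# Hypothetical router scores for one token.
rng = random.Random(0)
logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]

selected = route(logits)
expert_fraction = TOP_K / NUM_EXPERTS  # share of experts that run per token
param_fraction = 2.3 / 24              # active / total parameters, in billions

print(selected)
print(f"experts used per token: {expert_fraction:.2%}")
print(f"parameters used per token: {param_fraction:.1%}")
```

The share of parameters used per token is larger than the share of experts used. A plausible reading is that some layers run for every token regardless of routing, but the card gives no breakdown.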

Quantization Options

Quant     Bits   VRAM      Quality
Q2_K      2      8.2 GB    low
Q3_K_M    3      11.3 GB   moderate
Q4_K_M    4      12.8 GB   good
Q5_K_M    5      15.9 GB   good
Q6_K      6      18.9 GB   excellent
Q8_0      8      25.1 GB   excellent
F16       16     49.7 GB   lossless
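
To choose among these options for a given machine, a small helper can pick the largest quantization that fits a memory budget. The sketch below copies the figures from the table; the `pick_quant` name and the 2 GB `headroom_gb` default (a placeholder allowance for context cache and runtime overhead) are assumptions, not values from this card.

```python
# (name, bits, memory in GB, quality label), copied from the table above.
QUANTS = [
    ("Q2_K", 2, 8.2, "low"),
    ("Q3_K_M", 3, 11.3, "moderate"),
    ("Q4_K_M", 4, 12.8, "good"),
    ("Q5_K_M", 5, 15.9, "good"),
    ("Q6_K", 6, 18.9, "excellent"),
    ("Q8_0", 8, 25.1, "excellent"),
    ("F16", 16, 49.7, "lossless"),
]


def pick_quant(budget_gb, headroom_gb=2.0):
    """Return the name of the largest listed quant that fits, or None.

    headroom_gb is a hypothetical safety margin; tune it for your runtime
    and context length.
    """
    fitting = [q for q in QUANTS if q[2] + headroom_gb <= budget_gb]
    if not fitting:
        return None
    return max(fitting, key=lambda q: q[2])[0]


for budget in (8, 16, 24, 32, 64):
    print(f"{budget} GB budget: {pick_quant(budget)}")
```

With these assumptions, a 32 GB budget selects Q8_0, which is consistent with the card's statement that the model fits in 32 GB of RAM.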

About this model

LFM2 is a family of hybrid models designed for on-device deployment. LFM2-24B-A2B is the largest model in the family, scaling the architecture to 24 billion parameters while keeping inference efficient.

  • Best-in-class efficiency: A 24B MoE model with only about 2.3B active parameters per token, fitting in 32 GB of RAM for deployment on consumer laptops and desktops.
  • Fast edge inference: 112 tok/s decode on AMD CPU, 293 tok/s on H100.
  • Predictable scaling: Quality improves log-linearly from 350M to 24B total parameters, confirming the LFM2 hybrid architecture scales reliably across nearly two orders of magnitude.
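
The "nearly two orders of magnitude" figure, and what log-linear scaling implies, can be checked with a few lines of arithmetic. In the sketch below, the `log_linear` function and its coefficients `a` and `b` are placeholders chosen only to show the shape of the relationship; the card does not publish fitted values.

```python
import math

SMALLEST = 350e6  # smallest size cited in the card
LARGEST = 24e9    # largest size cited in the card (total parameters)

ratio = LARGEST / SMALLEST
decades = math.log10(ratio)  # orders of magnitude spanned by the family
print(f"size ratio: {ratio:.1f}x, {decades:.2f} orders of magnitude")


def log_linear(n_params, a=0.0, b=1.0):
    """Quality modeled as a + b * log10(N). a and b are hypothetical."""
    return a + b * math.log10(n_params)


# Under a log-linear fit, every 10x increase in parameters adds the same
# amount of quality (b), regardless of the starting size.
gain_small = log_linear(3.5e9) - log_linear(350e6)
gain_large = log_linear(24e9) - log_linear(2.4e9)
print(f"gain per 10x at small scale: {gain_small:.3f}")
print(f"gain per 10x at large scale: {gain_large:.3f}")
```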
