How we detect your hardware, score models, and turn Hugging Face metadata into local run estimates.
How it works
canirunaimodel runs mostly in your browser. We detect your GPU, CPU, and memory, then combine that with model metadata to estimate whether a model will run locally and how comfortably. The newer Hugging Face flow starts from a public repo URL, fetches the repo's metadata, and feeds it into the same scoring pipeline.
Important: All results are estimates. Browser APIs give limited hardware information, and public model repos are often incomplete or inconsistent. Use the verdicts as a guide, not a guarantee.
Read the HF repo
For Hugging Face checks, we parse the repo id, call the public API, and inspect whatever signals the repo exposes: tags, architecture, safetensors metadata, checkpoint filenames, storage totals, and adapter hints.
// Fetch model stats from Hugging Face API
const repoId = "google/gemma-3-4b-it";
const res = await fetch(`https://huggingface.co/api/models/${repoId}`);
const data = await res.json();
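The step before the fetch, turning a pasted URL into a repo id, might look like the sketch below. `parseRepoId` and `paramsBillionsFrom` are hypothetical helper names, not the site's actual code. Reading the parameter count from a `safetensors.total` field is an assumption about what a given repo's API response contains; many repos omit it, so the helper returns `null` rather than guessing.

```javascript
// Hypothetical helper: extract "owner/name" from a Hugging Face model URL.
function parseRepoId(input) {
  let url;
  try {
    url = new URL(input);
  } catch {
    return null; // not a URL at all
  }
  if (url.hostname !== "huggingface.co") return null;
  // Path looks like /owner/name or /owner/name/tree/main/...
  const [owner, name] = url.pathname.split("/").filter(Boolean);
  // Dataset and Space URLs use the same host but are not model repos.
  if (owner === "datasets" || owner === "spaces") return null;
  return owner && name ? `${owner}/${name}` : null;
}

// Hypothetical helper: parameter count in billions, if the repo exposes one.
// Assumes the response carries `safetensors.total`; returns null otherwise.
function paramsBillionsFrom(data) {
  const total = data?.safetensors?.total;
  return typeof total === "number" ? total / 1e9 : null;
}
```

When `paramsBillionsFrom` returns `null`, the estimate has to fall back on weaker signals such as checkpoint filenames or storage totals.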
Hardware detection
We use browser APIs to fingerprint your hardware. No extensions or installs needed.
WebGL - GPU identification
We create a hidden WebGL canvas and query the renderer string to get the GPU name and vendor.
// Get GPU renderer string
const canvas = document.createElement("canvas");
const gl = canvas.getContext("webgl2");
// Both the context and the extension can be missing, so guard before reading
const ext = gl && gl.getExtension("WEBGL_debug_renderer_info");
const gpu = ext ? gl.getParameter(ext.UNMASKED_RENDERER_WEBGL) : null;
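Renderer strings usually arrive wrapped in driver and backend details, so they need cleaning before any lookup. The sketch below shows one way to do that; `cleanRenderer` is a hypothetical name, and the patterns only cover a few common string shapes, not every browser and driver combination.

```javascript
// Hypothetical helper: reduce a raw renderer string to a plain GPU name.
function cleanRenderer(raw) {
  let name = raw.trim();
  // Chromium-style strings look like "ANGLE (vendor, device, backend)".
  const angle = name.match(/^ANGLE \((.+)\)$/);
  if (angle) {
    const parts = angle[1].split(",").map((p) => p.trim());
    name = parts.length >= 2 ? parts[1] : parts[0]; // keep the device part
  }
  return name
    .replace(/^ANGLE Metal Renderer:\s*/, "") // macOS prefix
    .replace(/\s*\(0x[0-9a-f]+\)/i, "") // PCI device id
    .replace(/\s+(Direct3D|OpenGL|Vulkan).*$/i, "") // backend suffix
    .trim();
}
```

Strings that match none of the patterns pass through unchanged, so an unfamiliar format degrades to a failed lookup instead of a wrong one.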
WebGPU - Architecture info
If the browser supports WebGPU, we request an adapter to get extra device and architecture details.
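// navigator.gpu is undefined without WebGPU, and requestAdapter() may resolve to null
const adapter = navigator.gpu ? await navigator.gpu.requestAdapter() : null;
const info = adapter ? adapter.info : null;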
Navigator - CPU and RAM
We use browser hints like CPU core count and approximate device memory, plus a small runtime benchmark where needed.
const cores = navigator.hardwareConcurrency; // logical CPU cores
const ram = navigator.deviceMemory; // approximate GB; undefined in some browsers
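These hints need defensive handling: `navigator.deviceMemory` is missing in some browsers, and where it exists the reported value is coarse and capped at 8. A minimal sketch of that handling, with `readCpuAndRam` as a hypothetical helper that takes a navigator-like object so it can be exercised outside a browser:

```javascript
// Hypothetical helper: read CPU and RAM hints from a navigator-like object.
function readCpuAndRam(nav) {
  const cores = Number.isInteger(nav.hardwareConcurrency)
    ? nav.hardwareConcurrency
    : null;
  const ramGB = typeof nav.deviceMemory === "number" ? nav.deviceMemory : null;
  return {
    cores,
    ramGB,
    // The reported value is capped at 8, so 8 really means "8 GB or more".
    ramIsLowerBound: ramGB !== null && ramGB >= 8,
  };
}
```

In a page this would be called as `readCpuAndRam(navigator)`. When `ramIsLowerBound` is true or `ramGB` is `null`, the real capacity has to come from another signal.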
GPU database
Once we identify your GPU, we look it up in a built-in database of GPUs and Apple Silicon chips. Each entry contains the VRAM capacity and memory bandwidth, the two numbers that matter most for running models locally.
// Example entries from the GPU database (vram in GB, bw in GB/s)
const GPU_DB = {
  "RTX 4090": { vram: 24, bw: 1008 },
  "RTX 4060": { vram: 8, bw: 272 },
};
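The lookup itself can be sketched as a case-insensitive substring match against the database keys. `lookupGpu` is a hypothetical name and this matching strategy is an assumption; the table repeats the two example entries so the block runs on its own.

```javascript
// Two example entries (vram in GB, bw in GB/s), repeated so the sketch is self-contained.
const GPU_DB = {
  "RTX 4090": { vram: 24, bw: 1008 },
  "RTX 4060": { vram: 8, bw: 272 },
};

// Hypothetical helper: find the database entry whose key appears in the renderer string.
function lookupGpu(renderer, db = GPU_DB) {
  const haystack = renderer.toLowerCase();
  // Try longer keys first so a more specific key would win over its prefix.
  const keys = Object.keys(db).sort((a, b) => b.length - a.length);
  const key = keys.find((k) => haystack.includes(k.toLowerCase()));
  return key ? { name: key, ...db[key] } : null;
}
```

A `null` result means the GPU is not in the table, and the estimate has to rely on other signals.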
VRAM requirements
Each model has estimated memory requirements per format or quantization. The basic idea is simple: more parameters and heavier dtypes mean more memory.
VRAM (GB) = Parameters (billions) × Bits per weight / 8 + Overhead (GB)
// Adds overhead for cache + runtime
const RUNTIME_OVERHEAD = 0.5;

function makeQuants(paramsBillions) {
  return [
    { name: "Q4_K_M", vram: paramsBillions * 0.55 + RUNTIME_OVERHEAD },
    { name: "F16", vram: paramsBillions * 1.85 + RUNTIME_OVERHEAD },
  ];
}
```
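As a worked example, here is the idealized formula applied to a 7B model. `estimateVramGB` is a hypothetical name, and the default overhead of 0.5 GB simply mirrors `RUNTIME_OVERHEAD`; the per-format multipliers used in practice will not match exact bits / 8.

```javascript
// VRAM (GB) = parameters (billions) × bits per weight / 8 + overhead (GB)
function estimateVramGB(paramsBillions, bitsPerWeight, overheadGB = 0.5) {
  return (paramsBillions * bitsPerWeight) / 8 + overheadGB;
}

const fourBit = estimateVramGB(7, 4); // 7 * 4 / 8 + 0.5 = 4
const sixteenBit = estimateVramGB(7, 16); // 7 * 16 / 8 + 0.5 = 14.5
```

The same 7B model needs roughly 4 GB at 4 bits per weight and 14.5 GB at 16, which is why the quantization choice often decides whether a model fits at all.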
Scoring algorithm
The final score combines speed, memory headroom, and a small quality bonus to answer a practical question: how well will this run on your machine?
Speed score
Estimated tokens per second based on your memory bandwidth and the model footprint:
const efficiency = isAppleSilicon ? 0.65 : 0.70;
const toksPerSec = (bandwidthGBs / modelVRAM) * efficiency;
Memory headroom
How much of your available memory the model consumes:
const memPct = (modelVRAM / totalMemory) * 100;
Quality bonus
A small bonus for larger models, capped so it never dominates the score:
const qualityBonus = Math.min(15, Math.log2(paramsBillions + 1) * 2.5);
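To see how the cap behaves, the sketch below wraps the same expression in a function (the wrapper name `qualityBonusFor` is ours) and evaluates it at a few sizes.

```javascript
// Same expression as above, wrapped so it can be evaluated for several model sizes.
function qualityBonusFor(paramsBillions) {
  return Math.min(15, Math.log2(paramsBillions + 1) * 2.5);
}

const bonus7B = qualityBonusFor(7); // log2(8) = 3, so 3 * 2.5 = 7.5
const bonus63B = qualityBonusFor(63); // log2(64) = 6, so 6 * 2.5 = 15, exactly at the cap
const bonus70B = qualityBonusFor(70); // would be about 15.4, capped to 15
```

Because the bonus grows with the logarithm of the parameter count, going from 7B to 63B only doubles it, and anything larger gets the same 15 points.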
Grade scale
The score maps to a verdict:
| Status | Meaning |
|---|---|
| Runs great | Fast inference, healthy memory headroom |
| Tight fit | May run, but with little room to spare |
| Too heavy | Likely exceeds what the setup can handle comfortably |
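How the three components combine and where the verdict cutoffs sit is not spelled out here, so the sketch below is purely illustrative. The weights (50 points for speed, 35 for headroom, up to 15 for quality), the 40 tok/s saturation point, and the cutoffs of 60 and 35 are all assumptions, and `totalScore` and `verdictFor` are hypothetical names.

```javascript
// Hypothetical weights, chosen only for illustration; the maximum is 50 + 35 + 15 = 100.
function totalScore({ toksPerSec, memPct, qualityBonus }) {
  // Speed: saturates at 40 tok/s, worth up to 50 points.
  const speed = (Math.min(toksPerSec, 40) / 40) * 50;
  // Headroom: 35 points at 0% memory use, falling to 0 at 100% or more.
  const headroom = (Math.max(0, 100 - memPct) / 100) * 35;
  return speed + headroom + qualityBonus;
}

// Hypothetical cutoffs mapping a 0-100 score onto the three verdicts.
function verdictFor(score) {
  if (score >= 60) return "Runs great";
  if (score >= 35) return "Tight fit";
  return "Too heavy";
}
```

With these assumed numbers, a fast model using 18% of memory scores well above 60, while a model that needs 120% of memory gets no headroom points and lands in "Too heavy" even with the full quality bonus.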
Fit classification
Before scoring, we classify the model based on whether it fits comfortably, barely fits, or clearly exceeds available memory.
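A minimal sketch of that classification, assuming the decision is made on the share of memory the model would occupy. The 80% "comfortable" cutoff is an assumption, and `classifyFit` is a hypothetical name.

```javascript
// Hypothetical cutoffs: up to 80% of memory is comfortable, up to 100% is a squeeze.
function classifyFit(modelVramGB, totalMemoryGB, comfortablePct = 80) {
  const memPct = (modelVramGB / totalMemoryGB) * 100;
  if (memPct <= comfortablePct) return "fits";
  if (memPct <= 100) return "barely fits";
  return "exceeds";
}
```

For example, a 4.35 GB model on a 24 GB card uses about 18% of memory and fits comfortably, while a 14.5 GB model on an 8 GB card clearly exceeds it.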
Tokens/s estimation
We estimate inference speed using a bandwidth-bound model. Generating each token requires reading roughly the whole set of weights from memory, so inference is often limited more by memory bandwidth than by raw compute.
tok/s ≈ (Memory bandwidth / Model VRAM) × Efficiency
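Plugging in numbers makes the formula concrete. The sketch below uses the efficiency factors and the two example GPU bandwidths from earlier on this page, with a 7B model at the Q4_K_M multiplier; `estimateToksPerSec` is a hypothetical wrapper name.

```javascript
// tok/s ≈ (memory bandwidth in GB/s / model footprint in GB) × efficiency
function estimateToksPerSec(bandwidthGBs, modelVramGB, isAppleSilicon = false) {
  const efficiency = isAppleSilicon ? 0.65 : 0.70;
  return (bandwidthGBs / modelVramGB) * efficiency;
}

const q4Footprint = 7 * 0.55 + 0.5; // 7B at Q4_K_M plus overhead: about 4.35 GB
const on4090 = estimateToksPerSec(1008, q4Footprint); // about 162 tok/s
const on4060 = estimateToksPerSec(272, q4Footprint); // about 44 tok/s
```

The two cards differ by roughly the ratio of their memory bandwidths, which is the point of a bandwidth-bound estimate: compute throughput never enters the calculation.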
Data sources
Model information is gathered from multiple sources, including the public Hugging Face repo metadata, and is curated or inferred where those sources are incomplete.
Apple Silicon: unified memory
Apple Silicon Macs share memory between CPU and GPU. This means the model can use a single unified pool instead of separate VRAM, which is why some Macs can run models that would not fit on a small discrete GPU.
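One way to express this in code is to pick the memory budget by device type. In the sketch below, `usableMemoryGB` is a hypothetical helper, and the 0.75 share of unified memory left for the model is an assumption, since the OS and other apps need part of the same pool.

```javascript
// Hypothetical helper: memory budget for a model, in GB.
// `unifiedFraction` is an assumed share of unified memory available to the model.
function usableMemoryGB(device, unifiedFraction = 0.75) {
  return device.unified ? device.ramGB * unifiedFraction : device.vramGB;
}

const macBudget = usableMemoryGB({ unified: true, ramGB: 32 }); // 32 * 0.75 = 24
const pcBudget = usableMemoryGB({ unified: false, ramGB: 64, vramGB: 8 }); // 8
```

Under these assumptions a 32 GB Mac has a 24 GB budget, while a PC with 64 GB of system RAM and an 8 GB discrete card is limited to 8 GB.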
Privacy and transparency
Hardware detection happens in your browser. We only fetch public model metadata needed for the estimate. The verdict is still an estimate because browser APIs and public repo metadata are both imperfect.