Why

How we detect your hardware, score models, and turn Hugging Face metadata into local run estimates.

How it works

canirunaimodel runs mostly in your browser. We detect your GPU, CPU, and memory, then combine that with model metadata to estimate whether something will run locally and how comfortably. The newer Hugging Face flow starts from a public repo URL, fetches repo metadata, and folds that into the same scoring pipeline.

Important: All results are estimates. Browser APIs give limited hardware information, and public model repos are often incomplete or inconsistent. Use the verdicts as a guide, not a guarantee.

Detect → Match GPU → Score → Grade

Read the HF repo

For Hugging Face checks, we parse the repo id, call the public API, and inspect whatever signals the repo exposes: tags, architecture, safetensors metadata, checkpoint filenames, storage totals, and adapter hints.

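// Fetch public repo metadata from the Hugging Face API
const repoId = "google/gemma-3-4b-it";
const res = await fetch(`https://huggingface.co/api/models/${repoId}`);
if (!res.ok) {
  throw new Error(`Hugging Face API returned ${res.status} for ${repoId}`);
}
const data = await res.json();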
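Before that fetch runs, a pasted URL has to be reduced to an owner/name repo id, and afterwards a parameter count can often be read from the safetensors summary. A minimal sketch of both steps follows. `parseRepoId` and `paramsBillionsFromMetadata` are hypothetical helpers, and the sketch assumes the API response carries a `safetensors.total` parameter count, which is only present for repos that publish safetensors weights.

```javascript
// Hypothetical helpers for the Hugging Face flow (names are illustrative).

// Reduce a pasted URL or a bare "owner/name" string to a model repo id.
function parseRepoId(input) {
  const path = input
    .trim()
    .replace(/^(https?:\/\/)?(www\.)?huggingface\.co\//i, "") // strip the host if present
    .split(/[?#]/)[0]; // drop query string and fragment
  if (/^[a-z]+:\/\//i.test(path)) return null; // a URL on some other host
  const parts = path.split("/").filter(Boolean);
  // Dataset and Space URLs share the host but are not model repos
  if (parts.length < 2 || parts[0] === "datasets" || parts[0] === "spaces") return null;
  return `${parts[0]}/${parts[1]}`; // ignores suffixes like /tree/main
}

// Read a parameter count (in billions) from the API response, if present.
// ASSUMPTION: response shape { safetensors: { total: <number of parameters> } }.
function paramsBillionsFromMetadata(data) {
  const total = data?.safetensors?.total;
  return typeof total === "number" && total > 0 ? total / 1e9 : null;
}
```

Returning `null` instead of throwing lets the caller show a friendly "not a model repo" message, and a missing `safetensors` block falls back to other signals such as tags or checkpoint filenames.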

Hardware detection

We use standard browser APIs to identify your hardware. No extensions or installs needed.

WebGL - GPU identification

We create a hidden WebGL canvas and query the renderer string to get the GPU name and vendor.

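// Get GPU renderer string (WebGL2 first, then WebGL1)
const canvas = document.createElement("canvas");
const gl = canvas.getContext("webgl2") ?? canvas.getContext("webgl");
let gpu = null;
if (gl) {
  // The debug extension exposes the unmasked name; fall back to RENDERER if it is unavailable
  const ext = gl.getExtension("WEBGL_debug_renderer_info");
  gpu = gl.getParameter(ext ? ext.UNMASKED_RENDERER_WEBGL : gl.RENDERER);
}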
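The renderer string usually needs cleanup before it can be matched against a GPU list. In Chromium-based browsers it commonly arrives wrapped by ANGLE, for example `ANGLE (NVIDIA, NVIDIA GeForce RTX 4090 Direct3D11 vs_5_0 ps_5_0, D3D11)`. The sketch below is a hypothetical `extractGpuName` helper that handles a few common shapes; real strings vary by browser, OS, and driver, so the patterns are assumptions rather than a complete parser.

```javascript
// Hypothetical helper: pull a bare GPU name out of a WebGL renderer string.
// The patterns cover a few common shapes only; they are assumptions.
function extractGpuName(renderer) {
  let name = renderer.trim();
  // ANGLE wraps details as "ANGLE (vendor, device, backend)"; keep the device part
  const angle = /^ANGLE \((.*)\)$/.exec(name);
  if (angle) {
    const parts = angle[1].split(", ");
    name = parts.length >= 2 ? parts[1] : parts[0];
  }
  return name
    .replace(/^ANGLE Metal Renderer:\s*/, "") // macOS: "ANGLE Metal Renderer: Apple M2"
    .replace(/\s*\(0x[0-9a-f]+\)/gi, "")     // PCI device ids such as "(0x00002882)"
    .replace(/\s+Direct3D.*$/, "")           // Windows backend suffix
    .replace(/\/PCIe.*$/, "")                // OpenGL-style "/PCIe/SSE2" suffix
    .trim();
}
```

For example, `extractGpuName("ANGLE (Apple, ANGLE Metal Renderer: Apple M2, Unspecified Version)")` yields `"Apple M2"`, a string short enough to look up directly.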

WebGPU - Architecture info

If the browser supports WebGPU, we request an adapter to get extra device and architecture details.

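// navigator.gpu is undefined without WebGPU support; requestAdapter() may resolve to null
const adapter = await navigator.gpu?.requestAdapter();
const info = adapter?.info; // GPUAdapterInfo: vendor, architecture, device, description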

Navigator - CPU and RAM

We use browser hints like CPU core count and approximate device memory, plus a small runtime benchmark where needed.

const cores = navigator.hardwareConcurrency; // logical processor count
const ram = navigator.deviceMemory; // coarse GB hint; undefined outside Chromium-based browsers
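`navigator.deviceMemory` is deliberately coarse: it reports a rounded power-of-two figure in gigabytes and, in Chromium-based browsers, has historically topped out at 8. A reading at the cap therefore means "8 GB or more", not exactly 8. A minimal sketch of turning the hint into something a scorer can use, assuming an 8 GB cap; `ramHint` and `ASSUMED_DEVICE_MEMORY_CAP_GB` are hypothetical names.

```javascript
// HYPOTHETICAL: assumed upper bound of navigator.deviceMemory (historically 8 in Chromium)
const ASSUMED_DEVICE_MEMORY_CAP_GB = 8;

// Turn the raw deviceMemory hint into a RAM figure plus a "value is capped" flag.
function ramHint(deviceMemory) {
  if (typeof deviceMemory !== "number" || deviceMemory <= 0) {
    return { gb: null, atCap: false }; // API unavailable, e.g. Firefox or Safari
  }
  return { gb: deviceMemory, atCap: deviceMemory >= ASSUMED_DEVICE_MEMORY_CAP_GB };
}
```

In a browser this would be called as `ramHint(navigator.deviceMemory)`. When `atCap` is true, the figure is only a lower bound, so another signal is needed before treating the machine as an 8 GB system.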

GPU database

Once we identify your GPU, we look it up in a built-in database of GPUs and Apple Silicon chips. Each entry contains the VRAM capacity and memory bandwidth, the two numbers that matter most for running models locally.

// Example entries from the GPU database (vram in GB, bw = memory bandwidth in GB/s)
const GPU_DB = {
  "RTX 4090": { vram: 24, bw: 1008 },
  "RTX 4060": { vram: 8, bw: 272 },
};
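Matching a cleaned-up GPU name against the database can be as simple as a substring search, as long as the longest key wins. A minimal sketch, assuming database keys are short model names like those above; `matchGpu` is a hypothetical name, and the extra "RTX 4060 Ti" entry (8 GB variant, 288 GB/s) is added only to show why longest-match matters. A real lookup would also need aliases for laptop and vendor-specific naming.

```javascript
// Example entries (vram in GB, bw in GB/s); "RTX 4060 Ti" is here to show longest-match.
const GPU_DB = {
  "RTX 4090": { vram: 24, bw: 1008 },
  "RTX 4060": { vram: 8, bw: 272 },
  "RTX 4060 Ti": { vram: 8, bw: 288 },
};

// Hypothetical lookup: case-insensitive substring match where the longest key wins,
// so "RTX 4060 Ti" is not mistaken for "RTX 4060".
function matchGpu(gpuName, db) {
  const haystack = gpuName.toUpperCase();
  let best = null;
  for (const key of Object.keys(db)) {
    if (haystack.includes(key.toUpperCase()) && (best === null || key.length > best.length)) {
      best = key;
    }
  }
  return best === null ? null : { name: best, ...db[best] };
}
```

An unmatched GPU returns `null`, which the caller can treat as "unknown hardware" rather than guessing a close relative.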

VRAM requirements

Each model has estimated memory requirements per format or quantization. The basic idea is simple: more parameters and heavier dtypes mean more memory.

VRAM (GB) = Parameters (billions) x Bits per weight / 8 + Overhead

// Adds overhead for cache + runtime
const RUNTIME_OVERHEAD = 0.5;

function makeQuants(paramsBillions) {
  return [
    { name: "Q4_K_M", vram: paramsBillions * 0.55 + RUNTIME_OVERHEAD },
    { name: "F16", vram: paramsBillions * 1.85 + RUNTIME_OVERHEAD },
  ];
}
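The VRAM formula can be written directly as a function, which makes it easy to sanity-check per-format numbers. A minimal sketch: `vramFromBits` is a hypothetical name, and the 4.5 bits per weight in the second usage line is an illustrative figure for a 4-bit quantization, not a measured one.

```javascript
// Direct reading of: VRAM (GB) = Parameters (billions) x Bits per weight / 8 + Overhead
// (1e9 parameters x 1 byte each = 1 GB, so bits / 8 gives GB per billion parameters)
function vramFromBits(paramsBillions, bitsPerWeight, overheadGB = 0.5) {
  return (paramsBillions * bitsPerWeight) / 8 + overheadGB;
}

console.log(vramFromBits(4, 16));  // 8.5  (4B params at 16-bit)
console.log(vramFromBits(4, 4.5)); // 2.75 (4B params at an assumed 4.5 bits/weight)
```

The multipliers in `makeQuants` are close to, but not exactly, bits / 8: at 16 bits the formula gives 2.0 GB per billion parameters, while `makeQuants` uses 1.85 for F16, so both should be read as approximations.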

Scoring algorithm

The final score combines speed, memory headroom, and a small quality bonus to answer a practical question: how well will this run on your machine?

Speed score

Estimated tokens per second based on your memory bandwidth and the model footprint:

const efficiency = isAppleSilicon ? 0.65 : 0.70;
const toksPerSec = (bandwidthGBs / modelVRAM) * efficiency;
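Wrapped into a function, the speed estimate can be checked against the database numbers. A minimal sketch reusing the efficiency factors above and the Q4_K_M constant from `makeQuants`; `estimateTokensPerSec` is a hypothetical name, and the 100 GB/s Apple figure corresponds to a base M2.

```javascript
// Bandwidth-bound speed estimate, using the efficiency factors above.
// bandwidthGBs: memory bandwidth in GB/s; modelVRAM: model footprint in GB.
function estimateTokensPerSec(bandwidthGBs, modelVRAM, isAppleSilicon) {
  const efficiency = isAppleSilicon ? 0.65 : 0.70;
  return (bandwidthGBs / modelVRAM) * efficiency;
}

// 8B model at Q4_K_M, using makeQuants' factor: 8 * 0.55 + 0.5 = 4.9 GB
const q4Size = 8 * 0.55 + 0.5;
console.log(estimateTokensPerSec(1008, q4Size, false).toFixed(1)); // RTX 4090: 144.0
console.log(estimateTokensPerSec(100, q4Size, true).toFixed(1));   // 100 GB/s Apple chip: 13.3
```

Because speed scales with 1 / model size in this model, halving the footprint roughly doubles the estimate, which is why quantization matters so much for local inference.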

Memory headroom

How much of your available memory the model consumes:

const memPct = (modelVRAM / totalMemory) * 100;

Quality bonus

A small bonus for larger models, capped so it never dominates the score:

const qualityBonus = Math.min(15, Math.log2(paramsBillions + 1) * 2.5);
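A few sample values show how quickly the bonus saturates: each doubling of (parameters + 1) adds 2.5 points, and the 15-point cap is reached at 63B parameters, since log2(64) x 2.5 = 15. The sketch wraps the same formula in an illustrative `qualityBonus` function.

```javascript
// Quality bonus as defined above: grows with log2(params + 1), capped at 15 points
function qualityBonus(paramsBillions) {
  return Math.min(15, Math.log2(paramsBillions + 1) * 2.5);
}

for (const p of [1, 7, 13, 63, 70]) {
  console.log(`${p}B -> ${qualityBonus(p).toFixed(1)}`);
}
// 1B -> 2.5, 7B -> 7.5, 13B -> 9.5, 63B -> 15.0, 70B -> 15.0
```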

Grade scale

The score maps to a verdict:

Status	Meaning
Runs great	Fast inference, healthy memory headroom
Tight fit	May run, but with little room to spare
Too heavy	Likely exceeds what the setup can handle comfortably
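One way the three components could be combined and mapped onto these verdicts is sketched below. The weights (up to 50 points for speed, 35 for headroom, 15 for the quality bonus) and the 70/40 cutoffs are hypothetical placeholders, chosen only so the parts add up to 100; `combineScore` and `verdict` are illustrative names.

```javascript
// HYPOTHETICAL weights and cutoffs, for illustration only.
function combineScore({ toksPerSec, memPct, qualityBonus }) {
  if (memPct > 100) return 0;               // does not fit at all
  const speed = Math.min(50, toksPerSec);   // 1 point per tok/s, up to 50
  const headroom = 35 * (1 - memPct / 100); // more free memory, more points
  return speed + headroom + qualityBonus;   // bonus is already capped at 15
}

function verdict(score) {
  if (score >= 70) return "Runs great";
  if (score >= 40) return "Tight fit";
  return "Too heavy";
}

// RTX 4090 with a 4.9 GB model: ~144 tok/s, ~20% of 24 GB used, 8B quality bonus
console.log(verdict(combineScore({ toksPerSec: 144, memPct: 20.4, qualityBonus: 7.9 })));
// Runs great
```

Returning 0 for anything above 100% keeps a model that cannot load from scoring well on speed or quality alone.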

Fit classification

Before scoring, we classify the model based on whether it fits comfortably, barely fits, or clearly exceeds available memory.
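A minimal sketch of this three-way split, reusing the memory-percentage formula from the headroom step. The 80% and 100% cutoffs and the `classifyFit` name are hypothetical placeholders. One reason to keep a "barely fits" band at all is that the KV cache grows with context length, so a model close to 100% may load and then fail on longer prompts.

```javascript
// HYPOTHETICAL cutoffs: share of available memory the model may use
const COMFORTABLE_MAX_PCT = 80;
const BARELY_MAX_PCT = 100;

function classifyFit(modelVRAM, availableMemoryGB) {
  const memPct = (modelVRAM / availableMemoryGB) * 100; // same formula as memory headroom
  if (memPct <= COMFORTABLE_MAX_PCT) return "fits";
  if (memPct <= BARELY_MAX_PCT) return "barely fits";
  return "exceeds";
}

console.log(classifyFit(4.9, 24)); // fits (about 20% of a 24 GB card)
console.log(classifyFit(7.5, 8));  // barely fits (about 94% of an 8 GB card)
console.log(classifyFit(15.3, 8)); // exceeds (8B at F16 via makeQuants: 8 * 1.85 + 0.5)
```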

Tokens/s estimation

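We estimate inference speed using a bandwidth-bound model. For a dense model, generating each token requires reading roughly all of the weights from memory, so decoding is usually limited by memory bandwidth rather than raw compute.

tok/s ≈ Memory bandwidth (GB/s) / Model VRAM (GB) x Efficiency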

Data sources

Model information comes from several sources and is carefully curated or inferred:

› Hugging Face API - Repo metadata, safetensors stats, tags, files, likes, and downloads.

› Model cards, papers, and official announcements for parameter counts, architecture details, and context lengths.

› GPU specs from official vendor datasheets for VRAM and bandwidth numbers.

Apple Silicon: unified memory

Apple Silicon Macs share memory between CPU and GPU. This means the model can use a single unified pool instead of separate VRAM, which is why some Macs can run models that would not fit on a small discrete GPU.
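A minimal sketch of how the memory budget differs between the two cases. macOS typically keeps part of the unified pool for the system, so the sketch assumes a fixed share (`ASSUMED_UNIFIED_GPU_SHARE = 0.75`, a placeholder rather than a measured limit). On a discrete GPU it counts only dedicated VRAM and ignores CPU offloading. The function and field names are hypothetical.

```javascript
// HYPOTHETICAL: share of unified memory assumed usable for model weights on Apple Silicon
const ASSUMED_UNIFIED_GPU_SHARE = 0.75;

// hw: { isAppleSilicon: true, unifiedMemoryGB } or { isAppleSilicon: false, vramGB }
function modelMemoryBudgetGB(hw) {
  if (hw.isAppleSilicon) {
    return hw.unifiedMemoryGB * ASSUMED_UNIFIED_GPU_SHARE; // one shared pool
  }
  return hw.vramGB; // discrete GPU: dedicated VRAM only (no CPU offload modeled)
}

// A 14B model at Q4_K_M using makeQuants' factor: 14 * 0.55 + 0.5 ≈ 8.2 GB
const modelGB = 14 * 0.55 + 0.5;
console.log(modelGB <= modelMemoryBudgetGB({ isAppleSilicon: true, unifiedMemoryGB: 16 })); // true (12 GB budget)
console.log(modelGB <= modelMemoryBudgetGB({ isAppleSilicon: false, vramGB: 8 }));          // false (8 GB card)
```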

Privacy and transparency

Hardware detection happens in your browser. We only fetch public model metadata needed for the estimate. The verdict is still an estimate because browser APIs and public repo metadata are both imperfect.