
Llama 3.2 11B Vision

Name: Llama 3.2 11B Vision
Author: Meta

License: Llama 3.2 Community License

Meta · 11B · Dense

Multimodal vision and text model


272.7K downloads · 1.6K likes · 2024-09 · 128K context

Use Cases

chat · vision

Quantization Options

Quant	Bits	VRAM	Quality	Status
Q2_K	2	4 GB	low	—
Q3_K_M	3	5.4 GB	moderate	—
Q4_K_M	4	6.1 GB	good	—
Q5_K_M	5	7.5 GB	good	—
Q6_K	6	9 GB	excellent	—
Q8_0	8	11.8 GB	excellent	—
F16	16	23 GB	lossless	—
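As a rough illustration of how the table might be used, the sketch below picks the highest-precision quantization whose listed VRAM figure fits a given budget. The VRAM numbers are copied from the table above; the function name `largest_quant_that_fits` and the `headroom_gb` parameter (spare memory reserved for context and image inputs) are hypothetical and not part of any Ollama or Meta tooling.

```python
# VRAM figures (GB) copied from the quantization table above.
QUANT_VRAM_GB = {
    "Q2_K": 4.0,
    "Q3_K_M": 5.4,
    "Q4_K_M": 6.1,
    "Q5_K_M": 7.5,
    "Q6_K": 9.0,
    "Q8_0": 11.8,
    "F16": 23.0,
}


def largest_quant_that_fits(available_gb, headroom_gb=1.0):
    """Return the quant with the largest listed VRAM that fits, or None.

    headroom_gb is an assumed safety margin, not a value from the table.
    """
    budget = available_gb - headroom_gb
    fitting = [(vram, name) for name, vram in QUANT_VRAM_GB.items() if vram <= budget]
    if not fitting:
        return None
    # max() compares on VRAM first, so the largest fitting entry wins.
    return max(fitting)[1]


print(largest_quant_that_fits(8.0))
```

The headroom default of 1 GB is only a placeholder; actual memory use depends on context length and image size, so treat the result as a starting point.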

About this model

Llama 3.2-Vision is a collection of instruction-tuned multimodal large language models (LLMs) for image reasoning, available in 11B and 90B sizes (text + images in / text out). The models are optimized for visual recognition, image reasoning, captioning, and answering general questions about an image. They outperform many of the available open source and closed multimodal models on common industry benchmarks.

Supported Languages: For text-only tasks, English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai are officially supported. Llama 3.2 has been trained on a broader collection of languages than these 8. Note that for image+text applications, English is the only supported language.

Usage

First, pull the model:

ollama pull llama3.2-vision

Python Library

To use Llama 3.2 Vision with the Ollama Python library:

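import ollama

# Send a single user message with an image attached.
# 'image.jpg' is a path to a local image file.
response = ollama.chat(
    model='llama3.2-vision',
    messages=[{
        'role': 'user',
        'content': 'What is in this image?',
        'images': ['image.jpg']
    }]
)

# Print the full response returned by the library.
print(response)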

JavaScript Library

To use Llama 3.2 Vision with the Ollama JavaScript library:

import ollama from 'ollama'

const response = await ollama.chat({
  model: 'llama3.2-vision',
  messages: [{
    role: 'user',
    content: 'What is in this image?',
    images: ['image.jpg']
  }]
})

console.log(response)

cURL

curl http://localhost:11434/api/chat -d '{
  "model": "llama3.2-vision",
  "messages": [
    {
      "role": "user",
      "content": "What is in this image?",
      "images": ["<base64-encoded image data>"]
    }
  ]
}'
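The cURL request above expects the image as base64-encoded data. A minimal sketch of preparing that request body with the Python standard library follows; the helper name `build_chat_payload` is hypothetical, and the placeholder bytes stand in for the contents of a real image file (for example, read with `open('image.jpg', 'rb').read()`).

```python
import base64
import json


def build_chat_payload(image_bytes, prompt, model="llama3.2-vision"):
    """Build a request body shaped like the cURL example above."""
    # Base64-encode the raw image bytes and convert to a plain string for JSON.
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "model": model,
        "messages": [
            {"role": "user", "content": prompt, "images": [encoded]}
        ],
    }


# Placeholder bytes for illustration only; substitute real image file contents.
payload = build_chat_payload(b"\x89PNG placeholder bytes", "What is in this image?")
body = json.dumps(payload)
print(body)
```

The resulting `body` string can be passed to curl with `-d`, or sent by any HTTP client, to the same `/api/chat` endpoint shown above.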
