
Meta: Llama 3.2 11B Vision Instruct

llama-3.2-11b-vision-instruct

Llama 3.2 11B Vision is a multimodal model with 11 billion parameters, designed to handle tasks combining visual and textual data. It excels in tasks such as image captioning and...

Modalities

Input: text, image
Output: text
Pricing (USD per 1 million tokens)
Input: $0.294
Output: $0.294
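Because input and output are both priced per million tokens, you can estimate the cost of a request directly from its token counts. The sketch below assumes billing is linear in token count, with no minimum charge, caching discount, or per-image surcharge. This page does not say how image inputs are tokenized or billed, so treat any image cost as unknown. The function name `estimate_cost` is hypothetical.

```python
# Prices from the card above, in USD per 1 million tokens.
INPUT_PRICE_PER_M = 0.294
OUTPUT_PRICE_PER_M = 0.294


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request (hypothetical helper).

    Assumes purely linear per-token billing; image-specific pricing,
    if any, is not modeled.
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    return (
        input_tokens * INPUT_PRICE_PER_M
        + output_tokens * OUTPUT_PRICE_PER_M
    ) / 1_000_000


if __name__ == "__main__":
    # 10,000 prompt tokens plus 1,000 generated tokens
    print(f"${estimate_cost(10_000, 1_000):.6f}")
```

For example, a request with 10,000 input tokens and 1,000 output tokens comes to about $0.0032 under these assumptions. The input and output rates are the same, so the estimate depends only on the total token count.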
Model Specs
Context Window: 131,072 tokens
Max Output: 8,192 tokens
Release Date: 2024-09-25
Knowledge Cutoff: 2023-12
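The context window and the output cap together limit how long a prompt can be. The sketch below assumes that prompt tokens and generated tokens share the single 131,072-token window, which is the usual arrangement for decoder models. It counts text tokens only, because this page does not say how image inputs count against the window. The names `max_prompt_tokens` and `fits` are hypothetical.

```python
CONTEXT_WINDOW = 131_072  # total tokens per sequence, from the card
MAX_OUTPUT = 8_192        # cap on generated tokens, from the card


def max_prompt_tokens(requested_output: int) -> int:
    """Largest prompt (in tokens) that leaves room for `requested_output`.

    Assumes prompt and output share one context window; image inputs
    are not accounted for.
    """
    if not 0 < requested_output <= MAX_OUTPUT:
        raise ValueError(f"requested_output must be in 1..{MAX_OUTPUT}")
    return CONTEXT_WINDOW - requested_output


def fits(prompt_tokens: int, requested_output: int) -> bool:
    """True if the prompt plus the requested output fit in the window."""
    return prompt_tokens <= max_prompt_tokens(requested_output)


if __name__ == "__main__":
    # Reserving the full output cap leaves 131,072 - 8,192 tokens for the prompt.
    print(max_prompt_tokens(MAX_OUTPUT))
```

Under this assumption, reserving the full 8,192-token output cap leaves 122,880 tokens for the prompt.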
Capabilities
Reasoning
Tool Calling
Vision

Last Updated: 2024-09-25
