Xiaomi: MiMo-V2-Omni

mimo-v2-omni

MiMo-V2-Omni is a frontier omni-modal model that natively processes image, video, and audio inputs within a unified architecture. It combines strong multimodal perception with agentic capability - visual grounding, multi-step...

Modalities

Input

text, image, video, audio

Output

text
Pricing

Cost per 1 million tokens

Input: $0.48
Output: $2.40
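
As a rough illustration of how the per-million-token prices above translate into the cost of a single request, here is a minimal sketch. The function name `estimate_cost` and the constant names are hypothetical and not part of any provider SDK. The sketch assumes billing is linear in token count, with no caching discounts, tiering, or separate rates for non-text modalities, none of which this page specifies.

```python
from decimal import Decimal

# Listed prices in USD per 1 million tokens (taken from the pricing above).
INPUT_PRICE_PER_MILLION = Decimal("0.48")
OUTPUT_PRICE_PER_MILLION = Decimal("2.40")
TOKENS_PER_MILLION = Decimal(1_000_000)


def estimate_cost(input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate the USD cost of one request.

    Assumption: cost scales linearly with token counts; real billing
    may differ (discounts, rounding, modality-specific rates).
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = (
        Decimal(input_tokens) * INPUT_PRICE_PER_MILLION
        + Decimal(output_tokens) * OUTPUT_PRICE_PER_MILLION
    )
    return total / TOKENS_PER_MILLION


if __name__ == "__main__":
    # Example: a 500,000-token prompt with a 100,000-token response.
    print(estimate_cost(500_000, 100_000))
```

`Decimal` is used instead of floats so that small per-token amounts do not pick up binary rounding error when summed across many requests.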

Model Specs

Context Window: 262,144
Max Output: 65,536
Release Date: 2026-03-18
Knowledge Cutoff:

Capabilities

Reasoning, Tool Calling, Vision

Last Updated: 2026-03-18

Provider: