Taranker.Com Logo
Pixtral 12B 24.09 logo

Pixtral 12B 24.09

Free plan available

Multimodal AI for image-text tasks with variable image support and 128K context

About Pixtral 12B 24.09

Launched Jan 22, 2025

Categories

Industry :

Horizontal

Website

Introduction Video

Description

Multimodal AI for image-text tasks with variable image support and 128K context

Pixtral-12B-2409 is a 12-billion-parameter multimodal model by Mistral AI, combining a 12B-parameter text decoder with a 400M-parameter vision encoder. It processes interleaved text and images natively, supporting variable image sizes and a 128K-token context window for long-form document analysis or multi-image workflows. The model excels in tasks like chart understanding, OCR, and multilingual reasoning, outperforming similar-sized open models (e.g., Qwen2-VL 7B, LLaVA-OV 7B) and even larger models like Llama-3.2 90B in benchmarks like MMMU (52.5%) and MathVista (58.0%)
Pixtral 12B 24.09 website

Pixtral 12B 24.09 Key Features

  • 128K Context Window: Handles long documents or multi-image inputs.
  • Variable Image Support: Processes images at native resolution and aspect ratio via a vision encoder.
  • Multilingual & Code Capabilities: Supports 80+ coding languages and nuanced multilingual understanding.
  • Open Source: Apache 2.0 license for free modification and deployment.
  • High Accuracy: Outperforms Claude 3 Haiku and Gemini-1.5 Flash 8B in multimodal benchmarks.
  • Vision-to-Code: Generates HTML/CSS from sketches or diagrams

Pixtral 12B 24.09 Use Cases

  • Image Captioning & OCR: Generate descriptions or extract text from images/documents.
  • Data Analysis: Convert charts to Markdown tables or interactive dashboards.
  • Document QA: Answer questions from technical manuals or financial reports.
  • Academic Research: Summarize papers or analyze scientific diagrams.
  • Automation: Integrate with workflows for invoice processing or customer support

More App like this

Mistral Large 24.11 logo

Top-tier multilingual reasoning for coding, math, and enterprise...

 DeepSeek V3 logo
  • Free Plan Available

Cost-efficient open-source MoE model rivaling GPT-4o in...

Codestral 25.01 logo

State-of-the-art AI model for lightning-fast code generation...

Mistral Small 3 logo
  • Free Plan Available

Efficient, open-source AI model rivaling larger competitors...

Scroll to Top