Reka releases Reka Edge, a 7B multimodal model for efficient image and video understanding
Reka has released Reka Edge, a 7-billion parameter multimodal model designed for efficient image and video understanding. The model features a 16,384 token context window and is priced at $0.20 per million input and output tokens.
Reka Edge — Quick Specs
Reka has released Reka Edge, a 7-billion parameter multimodal vision-language model optimized for efficient performance on image understanding, video analysis, object detection, and agentic tool-use tasks.
Model Specifications
Reka Edge is a multimodal model that accepts image, video, and text inputs while generating text outputs. The model features:
- Parameter count: 7 billion
- Context window: 16,384 tokens
- Input pricing: $0.20 per million tokens
- Output pricing: $0.20 per million tokens
- Release date: March 20, 2026
Capabilities
According to the model documentation, Reka Edge is optimized for:
- Image understanding and analysis
- Video analysis and processing
- Object detection
- Agentic tool-use applications
The model is designed as an efficiency-focused alternative, with its 7B parameter count positioning it as a lightweight option compared to larger multimodal models.
Availability
Reka Edge is available through OpenRouter, which routes requests across multiple providers and handles normalization of requests and responses. The platform provides API access with support for various SDKs and frameworks.
The model's pricing structure—identical at $0.20 per million tokens for both input and output—simplifies cost calculation for deployment scenarios.
What this means
Reka Edge represents a focus on efficient multimodal inference, targeting use cases where model size and inference cost matter as much as capability. With a 7B parameter count and $0.20/M pricing, it positions itself as accessible for developers building vision-language applications with moderate compute constraints. The model's emphasis on video analysis and object detection suggests it's engineered for specific computer vision tasks rather than general-purpose multimodal reasoning.
Related Articles
Mistral Releases Mistral 3 Family: 675B-Parameter Large 3 MoE and Three Edge Models Under Apache 2.0
Mistral has released Mistral 3, including Mistral Large 3—a sparse mixture-of-experts model with 41B active and 675B total parameters—and three Ministral 3 edge models (3B, 8B, 14B). All models are released under Apache 2.0 license with multimodal capabilities and are available today on multiple platforms.
Google releases Gemini 3.1 Flash Image, claims Pro-level quality at $0.50 per 1M tokens
Google has released Gemini 3.1 Flash Image, internally codenamed "Nano Banana 2," an image generation and editing model with a 131K context window. The model is priced at $0.50 per 1M input tokens and $3 per 1M output tokens.
Mistral OCR 4 Launches With Bounding Boxes, 170 Language Support at $2-4 Per 1,000 Pages
Mistral AI released OCR 4, a compact document extraction model that returns bounding boxes, block classification, and inline confidence scores alongside text. The model supports 170 languages, scores 85.20 on OlmOCRBench, and is priced at $4 per 1,000 pages via API ($2 with batch discount) or $5 per 1,000 pages through Document AI.
Mistral Releases Voxtral TTS: 4B Parameter Text-to-Speech Model at $0.016 per 1k Characters
Mistral AI has released Voxtral TTS, a 4B parameter text-to-speech model supporting 9 languages including English, French, German, Spanish, Dutch, Portuguese, Italian, Hindi, and Arabic. The model achieves 70ms latency for typical inputs and can clone voices from as little as 3 seconds of audio, priced at $0.016 per 1,000 characters.
Comments
Loading...