Perceptron Launches Mk1 Vision-Language Model with Video Reasoning at $0.15/$1.50 per 1M Tokens
Perceptron has released Perceptron Mk1, a vision-language model designed for video understanding and embodied reasoning tasks. The model accepts image and video inputs with a 33K-token context window, is priced at $0.15 per 1M input tokens and $1.50 per 1M output tokens, and supports structured spatial annotations on demand.
Perceptron has released Perceptron Mk1 (Mark One), a multimodal vision-language model built for video understanding and embodied reasoning tasks. The model processes image and video inputs paired with natural language queries, returning either structured annotations or natural language responses.
Pricing and Context
Perceptron Mk1 is priced at $0.15 per 1M input tokens and $1.50 per 1M output tokens, with a 33K token context window. The model is available through OpenRouter's API routing service.
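At those rates, per-request costs are easy to estimate. The sketch below is a minimal back-of-envelope calculator using only the numbers stated above (the $0.15/$1.50 per 1M token prices and the 33K context window); it is not an official billing formula.

```python
# Back-of-envelope cost estimate for Perceptron Mk1 at the listed
# OpenRouter rates: $0.15 per 1M input tokens, $1.50 per 1M output tokens.

INPUT_RATE = 0.15 / 1_000_000   # USD per input token
OUTPUT_RATE = 1.50 / 1_000_000  # USD per output token
CONTEXT_WINDOW = 33_000         # stated 33K-token context window

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the approximate USD cost of a single request."""
    if input_tokens + output_tokens > CONTEXT_WINDOW:
        raise ValueError("request exceeds the 33K-token context window")
    return input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE

# e.g. a 20K-token video prompt with a 1K-token answer:
print(f"${estimate_cost(20_000, 1_000):.4f}")  # → $0.0045
```

Output tokens dominate the bill: at a 10:1 price ratio, a long generated answer can cost more than a much larger video prompt.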
Core Capabilities
According to Perceptron, Mk1 excels at multiple video understanding tasks including video question answering, summarization, and event detection. For image inputs, the model handles:
- Point-by-example grounding from multimodal prompts
- OCR and document parsing on real-world inputs
- Open vocabulary object detection and counting
- Hand pose estimation
Structured Annotation System
The model's distinctive feature is its optional structured annotation output. By default, Mk1 returns natural language text only. Users can request spatial localization through the annotation_format parameter:
- "point" for point annotations on images
- "box" for bounding boxes
- "polygon" for polygon masks
- "clip" for temporal segments (start/end timestamps) in video
Annotations are emitted inline with text only when explicitly requested.
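As a sketch, a request for bounding-box annotations through OpenRouter's OpenAI-compatible chat completions endpoint might be assembled as below. The model slug "perceptron/mk1", the example image URL, and the placement of annotation_format as a top-level request field are assumptions for illustration; consult OpenRouter's model page for the actual names.

```python
import json

# Hypothetical request body for OpenRouter's chat completions endpoint
# (POST https://openrouter.ai/api/v1/chat/completions). The model slug
# "perceptron/mk1" and the top-level "annotation_format" field are
# assumptions, not confirmed by documentation.
payload = {
    "model": "perceptron/mk1",
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Detect and box every forklift."},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/warehouse.jpg"},
                },
            ],
        }
    ],
    # One of: "point", "box", "polygon", "clip" (clip applies to video).
    "annotation_format": "box",
}

body = json.dumps(payload)
```

Omitting annotation_format would, per the default behavior described above, yield a plain natural-language response.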
Optional Reasoning Mode
Mk1 includes an optional reasoning mode that can be enabled per request. This trades increased latency for deeper analysis on complex tasks, allowing the model to show step-by-step thinking processes. OpenRouter provides access to the reasoning_details array in API responses.
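OpenRouter's per-request reasoning parameter and the reasoning_details response array are documented OpenRouter features, though their exact interaction with Mk1 is an assumption here. A sketch of enabling the mode and collecting the reasoning trace, using a simplified stand-in response rather than a captured API reply:

```python
# Sketch: enabling the optional reasoning mode per request via OpenRouter's
# "reasoning" parameter, then reading the reasoning_details array from the
# response. The response shape below is a simplified stand-in, not output
# captured from a live API call.
request = {
    "model": "perceptron/mk1",  # assumed slug
    "messages": [
        {"role": "user", "content": "When does the forklift enter the frame?"}
    ],
    "reasoning": {"enabled": True},  # trades latency for deeper analysis
}

def extract_reasoning(response: dict) -> list[str]:
    """Pull reasoning text out of a response's reasoning_details array."""
    message = response["choices"][0]["message"]
    return [d.get("text", "") for d in message.get("reasoning_details", [])]

# Simplified stand-in response:
sample = {
    "choices": [{"message": {
        "content": "The forklift enters at 00:12.",
        "reasoning_details": [
            {"type": "reasoning.text", "text": "Scanning early frames..."}
        ],
    }}]
}
print(extract_reasoning(sample))  # → ['Scanning early frames...']
```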
What This Means
Perceptron Mk1 enters a crowded multimodal model market with a focus on structured output formats and video understanding. At $1.50 per 1M output tokens, it sits in the premium tier, comparable to GPT-4 Vision pricing. The optional reasoning mode and granular annotation controls suggest the model targets developers building computer vision pipelines and video analysis applications rather than general-purpose chat interfaces. The company has not disclosed benchmark scores or a parameter count, making direct performance comparisons difficult.
Related Articles
Google releases Gemini 3.1 Flash Lite with 1M context at $0.25 per million input tokens
Google has released Gemini 3.1 Flash Lite, a high-efficiency multimodal model with a 1,048,576 token context window priced at $0.25 per million input tokens and $1.50 per million output tokens. The model supports text, image, video, audio, and PDF inputs with four thinking levels for cost-performance optimization.
Google DeepMind Releases Gemma 4 E4B with Multi-Token Prediction for 2x Faster Inference
Google DeepMind released the Gemma 4 E4B assistant model using Multi-Token Prediction (MTP) architecture that accelerates inference by up to 2x through speculative decoding. The 4.5B effective parameter model supports 128K context windows and handles text, image, and audio input with pricing not yet disclosed.
Zyphra Releases ZAYA1-8B: 8.4B Parameter MoE Model with 760M Active Parameters Matches 80B+ Models on Math Benchmarks
Zyphra has released ZAYA1-8B, a mixture-of-experts language model with 760M active parameters and 8.4B total parameters. The model scores 89.1% on AIME 2026, competitive with models exceeding 100B parameters, while maintaining efficiency for on-device deployment.
Google DeepMind Releases Gemma 4 26B A4B Assistant Model for 2x Faster Inference via Multi-Token Prediction
Google DeepMind has released a Multi-Token Prediction assistant model for Gemma 4 26B A4B that achieves up to 2x decoding speedup through speculative decoding. The model uses 3.8B active parameters from a 25.2B total parameter MoE architecture with 128 experts and a 256K token context window.