model release

Perceptron Launches Mk1 Vision-Language Model with Video Reasoning at $0.15/$1.50 per 1M Tokens

TL;DR

Perceptron has released Perceptron Mk1, a vision-language model designed for video understanding and embodied reasoning tasks. The model accepts image and video inputs with 33K context window, priced at $0.15 per 1M input tokens and $1.50 per 1M output tokens, and supports structured spatial annotations on demand.

2 min read
0

Perceptron Mk1 — Quick Specs

Context window33K tokens
Input$0.15/1M tokens
Output$1.5/1M tokens

Perceptron Launches Mk1 Vision-Language Model with Video Reasoning

Perceptron has released Perceptron Mk1 (Mark One), a multimodal vision-language model built for video understanding and embodied reasoning tasks. The model processes image and video inputs paired with natural language queries, returning either structured annotations or natural language responses.

Pricing and Context

Perceptron Mk1 is priced at $0.15 per 1M input tokens and $1.50 per 1M output tokens, with a 33K token context window. The model is available through OpenRouter's API routing service.

Core Capabilities

According to Perceptron, Mk1 excels at multiple video understanding tasks including video question answering, summarization, and event detection. For image inputs, the model handles:

  • Point-by-example grounding from multimodal prompts
  • OCR and document parsing on real-world inputs
  • Open vocabulary object detection and counting
  • Hand pose estimation

Structured Annotation System

The model's distinctive feature is its optional structured annotation output. By default, Mk1 returns natural language text only. Users can request spatial localization through the annotation_format parameter:

  • "point" for point annotations on images
  • "box" for bounding boxes
  • "polygon" for polygon masks
  • "clip" for temporal segments (start/end timestamps) in video

Annotations are emitted inline with text only when explicitly requested.

Optional Reasoning Mode

Mk1 includes an optional reasoning mode that can be enabled per request. This trades increased latency for deeper analysis on complex tasks, allowing the model to show step-by-step thinking processes. OpenRouter provides access to the reasoning_details array in API responses.

What This Means

Perceptron Mk1 enters a crowded multimodal model market with a focus on structured output formats and video understanding. The $1.50 per 1M output tokens places it in the premium tier—comparable to GPT-4 Vision pricing. The optional reasoning mode and granular annotation controls suggest the model targets developers building computer vision pipelines and video analysis applications rather than general-purpose chat interfaces. The company has not disclosed benchmark scores or parameter count, making direct performance comparisons difficult.

Related Articles

model release

Mistral OCR 4 Launches With Bounding Boxes, 170 Language Support at $2-4 Per 1,000 Pages

Mistral AI released OCR 4, a compact document extraction model that returns bounding boxes, block classification, and inline confidence scores alongside text. The model supports 170 languages, scores 85.20 on OlmOCRBench, and is priced at $4 per 1,000 pages via API ($2 with batch discount) or $5 per 1,000 pages through Document AI.

model release

Baidu Releases Unlimited-OCR, a 3B Parameter Document Parsing Model Based on Deepseek-OCR

Baidu has released Unlimited-OCR, a 3 billion parameter model for optical character recognition and document parsing. The model supports single-page and multi-page document processing with a 32,768 token context window and runs on NVIDIA GPUs using bfloat16 precision.

model release

OpenAI restricts GPT-5.6 rollout to government-approved partners, calls arrangement unsustainable

OpenAI released its GPT-5.6 model lineup to a limited group of "trusted partners" after the U.S. government requested restrictions on the rollout. The company released three models—Sol ($5/$30 per million tokens), Terra ($2.50/$15), and Luna ($1/$6)—but said the government-mandated preview "shouldn't become the long-term default."

model release

OpenAI releases GPT-5.6 in three tiers with limited government-coordinated rollout

OpenAI announced GPT-5.6, a three-tier model series launching through a limited preview coordinated with the U.S. government. The models—Sol, Terra, and Luna—are priced from $1/$6 to $5/$30 per million input/output tokens and introduce new max and ultra reasoning modes.

Comments

Loading...