AWS demonstrates two-model pipeline using Nova 2 Lite and Claude Sonnet 4.6 that cuts document processing costs by 67%
AWS published a technical demonstration showing that pairing Amazon Nova 2 Lite with Anthropic's Claude Sonnet 4.6 reduces document processing costs by approximately two-thirds compared to single-model approaches. The two-stage pipeline processed 336 scanned yearbook pages at $0.0027 per page, producing 3,122 name-to-face associations with 93% scoring at or above 0.95 confidence.
Two-model architecture delivers 67% cost reduction
AWS published a technical implementation guide demonstrating a document processing pipeline that combines Amazon Nova 2 Lite for multimodal extraction with Anthropic's Claude Sonnet 4.6 for spatial reasoning. According to AWS, this approach costs approximately two-thirds less per page than sending the entire task to a single vision-language model.
The pipeline processed 336 scanned yearbook pages containing unstructured layouts, producing 3,122 name-to-face associations. AWS reports that 93% of associations scored at or above 0.95 confidence.
Fixed per-image pricing changes cost structure
Amazon Nova 2 Lite now bills image inputs at a fixed per-image rate of 230 tokens ($0.000069 at $0.30/million input tokens), regardless of resolution or file size. This represents a significant change from previous variable pricing based on image resolution.
AWS breaks down the per-page cost at published rates:
- Image tokens (fixed): 230 tokens at $0.30/M = $0.000069
- Prompt tokens (estimated): 500 tokens at $0.30/M = $0.000150
- Output tokens (estimated): 1,000 tokens at $2.50/M = $0.0025
- Total: $0.0027 per page
The fixed image pricing makes the cost scale linearly with page count, independent of resolution.
Pipeline design: extraction then reasoning
Stage 1 uses Nova 2 Lite with reasoning set to LOW for structured extraction in a single API call. The model detects photos with bounding boxes, extracts visible names with approximate positions, and returns page metadata.
AWS reports testing showed no meaningful accuracy difference between LOW, MEDIUM, and HIGH reasoning levels for this extraction task. Constraining Nova output to names rather than full OCR keeps output at approximately 1,000 tokens per page instead of 4,500 tokens for complete text extraction.
Stage 2 calls Claude Sonnet 4.6 once per page for spatial reasoning. Claude's adaptive thinking feature adjusts reasoning depth based on input complexity. AWS reports reasoning traces ranged from 544 to 1,658 characters across the 336-page test.
Adaptive thinking handles layout variability
Claude Sonnet 4.6's adaptive thinking mode automatically adjusts reasoning based on page complexity. Simple portrait grids receive minimal reasoning, while complex layouts with group photos and shared caption blocks trigger step-by-step spatial analysis.
Adaptive thinking is enabled via the Converse API with 'thinking': {'type': 'adaptive'}. Reasoning tokens are billed as output tokens at $15.00/million for Claude Sonnet 4.6 through cross-region inference.
AWS notes that reasoning traces appear in a separate thinking content block in the API response but are not shown to end users.
What this means
The two-model approach demonstrates cost optimization through task specialization: using a cheaper model for high-volume extraction and a more capable model only for complex reasoning. The 67% cost reduction claim depends on comparing against unspecified "single-model alternatives," so actual savings will vary based on which single model is used as the baseline.
Nova 2 Lite's fixed per-image pricing eliminates resolution-based cost variability, making budget forecasting simpler for large-scale document processing workloads. For organizations processing hundreds of thousands of pages, this predictability matters more than the per-page cost reduction.
Full implementation code is available in the AWS Samples repository on GitHub. The pipeline requires Amazon Bedrock access in a region supporting both models and IAM permissions for bedrock:InvokeModel and bedrock:Converse.
Related Articles
Google adds screen selection tool to Chrome's Gemini panel, integrates computer use into Gemini 3.5 Flash API
Google has added a screen selection tool to Chrome 149's Gemini panel that allows users to capture text or images from their current tab for prompts. Separately, the company integrated computer use capabilities directly into the Gemini 3.5 Flash model API, replacing the standalone Gemini 2.5 Computer Use model.
Cursor launches iOS app for mobile code review and agent management after SpaceX acquisition
Cursor has released its first iOS app for iPhone and iPad, allowing developers to launch coding agents, review pull requests, and manage engineering work from mobile devices. The app arrives weeks after SpaceX acquired the AI coding company in June 2026.
AWS releases automated healthcare claims pipeline using Amazon Bedrock Data Automation and AgentCore
AWS has published a technical implementation guide for automating healthcare claims processing using Amazon Bedrock Data Automation and Amazon Bedrock AgentCore. The pipeline extracts data from CMS-1500 claim forms, validates against AWS HealthLake records, and generates FHIR-compliant claim resources with automated notifications.
AWS launches AgentCore Observability for Amazon Bedrock to debug production AI agents
Amazon Web Services launched AgentCore Observability for Amazon Bedrock, a debugging tool that provides visibility into AI agent execution through OpenTelemetry traces, CloudWatch metrics, and structured logs. The tool addresses silent failures in production agents including infinite reasoning loops, incorrect tool selection, and plausible but incorrect answers.
Comments
Loading...