AWS adds multimodal embeddings to Amazon Bedrock for manufacturing document retrieval
AWS released multimodal embedding capabilities for Amazon Nova on Bedrock, allowing manufacturing organizations to retrieve information from technical documents that combine text, engineering diagrams, and images. The model supports embedding dimensions of 256, 384, 1024, or 3072 and projects text, images, and multi-page documents into a shared vector space.
AWS has released multimodal embedding capabilities for Amazon Nova on Amazon Bedrock, targeting manufacturing organizations that maintain technical documentation combining text, engineering diagrams, CAD drawings, and inspection photographs.
Model specifications
Amazon Nova Multimodal Embeddings projects text, images, and document pages into a single shared vector space, so a text query can be compared directly against an embedded image or page. The model supports embedding dimensions of 256, 384, 1024, or 3072. AWS uses 1024 dimensions in its own implementation, citing it as a balance between retrieval quality and computational cost.
The model includes a DOCUMENT_IMAGE detail level designed for pages containing mixed content such as charts, tables, and annotated diagrams. For single images like CAD diagrams, a STANDARD_IMAGE mode provides faster processing.
The system supports asymmetric embedding through a purpose parameter with two values: GENERIC_INDEX for documents being indexed and GENERIC_RETRIEVAL for queries. This approach optimizes the vector space for retrieval workloads without requiring manual query formatting.
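As an illustration of how these options fit together, the sketch below builds one request body for a page being indexed and one for a text query. The JSON field names (`taskType`, `singleEmbeddingParams`, `embeddingPurpose`, `embeddingDimension`, `detailLevel`, `truncationMode`) are assumptions about the Bedrock request schema and should be checked against the current API reference. `build_index_request` and `build_query_request` are hypothetical helpers, not part of any AWS SDK.

```python
import base64

# Dimension values listed in the article.
ALLOWED_DIMENSIONS = {256, 384, 1024, 3072}
# Detail levels named in the article.
DETAIL_LEVELS = {"DOCUMENT_IMAGE", "STANDARD_IMAGE"}


def _check_dimension(dimension: int) -> None:
    if dimension not in ALLOWED_DIMENSIONS:
        raise ValueError(f"dimension must be one of {sorted(ALLOWED_DIMENSIONS)}")


def build_index_request(image_bytes: bytes, image_format: str = "png",
                        detail_level: str = "DOCUMENT_IMAGE",
                        dimension: int = 1024) -> dict:
    """Body for a page or image being indexed (field names are assumed)."""
    _check_dimension(dimension)
    if detail_level not in DETAIL_LEVELS:
        raise ValueError(f"detail_level must be one of {sorted(DETAIL_LEVELS)}")
    return {
        "taskType": "SINGLE_EMBEDDING",
        "singleEmbeddingParams": {
            "embeddingPurpose": "GENERIC_INDEX",  # document side
            "embeddingDimension": dimension,
            "image": {
                "format": image_format,
                "detailLevel": detail_level,
                "source": {"bytes": base64.b64encode(image_bytes).decode("ascii")},
            },
        },
    }


def build_query_request(query: str, dimension: int = 1024) -> dict:
    """Body for a text query (field names are assumed)."""
    _check_dimension(dimension)
    return {
        "taskType": "SINGLE_EMBEDDING",
        "singleEmbeddingParams": {
            "embeddingPurpose": "GENERIC_RETRIEVAL",  # query side
            "embeddingDimension": dimension,
            "text": {"truncationMode": "END", "value": query},
        },
    }
```

Both sides must use the same dimension for their vectors to be comparable, which is why the helpers share one validation step.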
Technical implementation
AWS tested the system on a dataset of 15 standalone technical images and five multi-page PDFs containing synthetic aerospace manufacturing data. The evaluation compared two pipelines:
Pipeline A (multimodal): Images were embedded directly, and PDF pages were embedded as document images, using Amazon Nova Multimodal Embeddings. The vectors were stored in an Amazon S3 Vectors index.
Pipeline B (text-only baseline): Text was extracted with Amazon Nova 2 Lite OCR, the extracted text was embedded, and the vectors were indexed in a separate Amazon S3 Vectors index.
AWS ran 26 manufacturing queries against both pipelines and measured retrieval quality with Recall@K, Mean Reciprocal Rank (MRR), and NDCG@K. Answers generated by both pipelines were then scored against ground truth by an LLM judge.
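For readers unfamiliar with these metrics, the following is a minimal sketch of their standard definitions under binary relevance (a result is either relevant or not). It is not AWS's evaluation code, and the function names and data layout are illustrative assumptions.

```python
import math


def recall_at_k(ranked: list, relevant: set, k: int) -> float:
    """Fraction of the relevant items that appear in the top k results."""
    if not relevant:
        return 0.0
    return len(set(ranked[:k]) & relevant) / len(relevant)


def reciprocal_rank(ranked: list, relevant: set) -> float:
    """1 / rank of the first relevant result, or 0 if none is retrieved."""
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(runs: list) -> float:
    """Average reciprocal rank over (ranked, relevant) pairs, one per query."""
    if not runs:
        return 0.0
    return sum(reciprocal_rank(r, rel) for r, rel in runs) / len(runs)


def ndcg_at_k(ranked: list, relevant: set, k: int) -> float:
    """Discounted gain of the top k, normalized by the best possible ordering."""
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, item in enumerate(ranked[:k], start=1)
        if item in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg else 0.0
```

Recall@K rewards finding relevant items anywhere in the top K, while MRR and NDCG@K also reward placing them near the top of the ranking.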
Use cases
The system addresses manufacturing scenarios where critical information appears only in visual form:
- Torque specification tables embedded in engineering drawings
- Thermal contour plots showing peak temperatures in rocket engine nozzles
- Manufacturing process flow charts that show quality hold points and cycle times as visual annotations
- Weld inspection reports pairing measurements with radiographic images
- S-N fatigue curves in material certifications
According to AWS, text-only retrieval systems miss spatial relationships in diagrams, visual patterns in inspection images, and quantitative information in plots because OCR either misreads technical content or strips spatial context.
Availability
Amazon Nova Multimodal Embeddings is available in Amazon Bedrock in the us-east-1 region. Pricing has not been disclosed. The model requires access to amazon.nova-2-multimodal-embeddings-v1:0 and works with Amazon S3 Vectors for vector storage and retrieval.
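A call to the model might look like the sketch below, which takes a Bedrock runtime client and a prepared request body. `invoke_model` is the standard boto3 Bedrock runtime call; the response shape (`embeddings[0].embedding`) is an assumption to verify against the Bedrock documentation, and `embed` is a hypothetical helper.

```python
import json

# Model ID as given in the article.
MODEL_ID = "amazon.nova-2-multimodal-embeddings-v1:0"


def embed(client, request_body: dict) -> list:
    """Send one embedding request through a Bedrock runtime client."""
    response = client.invoke_model(
        modelId=MODEL_ID,
        body=json.dumps(request_body),
        contentType="application/json",
        accept="application/json",
    )
    payload = json.loads(response["body"].read())
    # Assumed response shape: {"embeddings": [{"embedding": [...]}]}
    return payload["embeddings"][0]["embedding"]
```

With boto3 installed and model access granted, the client would be created with `boto3.client("bedrock-runtime", region_name="us-east-1")` and passed in as `client`.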
Complete implementation code is available in a companion notebook on GitHub.
What this means
Multimodal embeddings address a genuine problem in industrial settings, where OCR often fails to capture technical information that exists primarily as diagrams, plots, and annotated images. Retrieving visual content with text queries closes a significant gap in manufacturing document systems. AWS's focus on configurable dimensions and document-specific processing modes suggests the company is positioning this as infrastructure for production retrieval systems rather than a research demonstration. The real test will be whether organizations see measurably better results on their own technical documentation than they get from existing text-extraction pipelines.