
AWS adds multimodal embeddings to Amazon Bedrock for manufacturing document retrieval

TL;DR

AWS released multimodal embedding capabilities for Amazon Nova on Amazon Bedrock, allowing manufacturing organizations to retrieve information from technical documents that combine text, engineering diagrams, and images. The model offers four embedding dimensions (256, 384, 1024, and 3072) and maps text, images, and multi-page documents into a shared vector space.



AWS has released multimodal embedding capabilities for Amazon Nova on Amazon Bedrock, targeting manufacturing organizations that maintain technical documentation combining text, engineering diagrams, CAD drawings, and inspection photographs.

Model specifications

Amazon Nova Multimodal Embeddings projects text, images, and document pages into a single shared vector space. The embedding dimension is configurable to 256, 384, 1024, or 3072. AWS's reference implementation uses 1024 dimensions, which AWS recommends as a balance between retrieval quality and computational cost.
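To make the cost side of that trade-off concrete, the short Python sketch below computes the raw size of a vector index at each supported dimension. It assumes every vector component is stored as a 32-bit float and ignores metadata and index overhead; the function name and the one-million-page corpus are hypothetical.

```python
# Output sizes listed for Amazon Nova Multimodal Embeddings.
SUPPORTED_DIMENSIONS = (256, 384, 1024, 3072)

# Assumption: each vector component is stored as a 32-bit float (4 bytes).
BYTES_PER_FLOAT32 = 4


def raw_index_bytes(num_vectors: int, dimension: int) -> int:
    """Raw vector payload in bytes, ignoring metadata and index overhead."""
    if dimension not in SUPPORTED_DIMENSIONS:
        raise ValueError(f"unsupported embedding dimension: {dimension}")
    return num_vectors * dimension * BYTES_PER_FLOAT32


if __name__ == "__main__":
    corpus_pages = 1_000_000  # hypothetical corpus size
    for dim in SUPPORTED_DIMENSIONS:
        gib = raw_index_bytes(corpus_pages, dim) / 2**30
        print(f"{dim:>4} dimensions: {gib:.2f} GiB")
```

At 1024 dimensions each vector occupies 4,096 bytes (1024 × 4), one third of the 12,288 bytes needed at 3072, and brute-force similarity scoring grows linearly with dimension in the same way.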

For images, the model offers a DOCUMENT_IMAGE detail level designed for pages with mixed content such as charts, tables, and annotated diagrams. For standalone images such as CAD drawings, the STANDARD_IMAGE detail level provides faster processing.
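The choice between the two detail levels can be expressed as a small request-building helper. This is a sketch, not AWS's code: the JSON field names (schemaVersion, taskType, singleEmbeddingParams, detailLevel, and so on) reflect our reading of the Bedrock documentation for Nova Multimodal Embeddings and should be checked against the current API reference, and the helper name is hypothetical.

```python
import base64

# Assumption: schema and field names follow our reading of the Bedrock docs for
# Nova Multimodal Embeddings; verify them against the current API reference.
SCHEMA_VERSION = "nova-multimodal-embed-v1"


def image_embedding_request(image_bytes: bytes, image_format: str,
                            is_document_page: bool, dimension: int = 1024) -> dict:
    """Build a single-embedding request body for one image (hypothetical helper).

    Rendered document pages (charts, tables, annotated diagrams) use DOCUMENT_IMAGE;
    standalone images such as a single CAD drawing use the faster STANDARD_IMAGE.
    """
    detail_level = "DOCUMENT_IMAGE" if is_document_page else "STANDARD_IMAGE"
    return {
        "schemaVersion": SCHEMA_VERSION,
        "taskType": "SINGLE_EMBEDDING",
        "singleEmbeddingParams": {
            "embeddingPurpose": "GENERIC_INDEX",  # images are being indexed here
            "embeddingDimension": dimension,
            "image": {
                "format": image_format,
                "detailLevel": detail_level,
                # Raw bytes are sent base64-encoded inside the JSON body.
                "source": {"bytes": base64.b64encode(image_bytes).decode("ascii")},
            },
        },
    }
```

A rendered PDF page would be passed with is_document_page=True, while a standalone inspection photo or CAD export would use False.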

The model also supports asymmetric embedding through a purpose parameter: GENERIC_INDEX for documents being indexed and GENERIC_RETRIEVAL for queries. Because the purpose is set per request, each embedding is tuned for its role in retrieval without requiring manual query formatting.
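A similar hypothetical helper illustrates the asymmetric setup for text. Only the embeddingPurpose field differs between an indexed chunk and a query; the query string itself is passed through unchanged. The field names, including truncationMode, are assumptions to verify against the API reference.

```python
# Assumption: field names mirror our reading of the Nova Multimodal Embeddings docs.
SCHEMA_VERSION = "nova-multimodal-embed-v1"


def text_embedding_request(text: str, is_query: bool, dimension: int = 1024) -> dict:
    """Build a text embedding request (hypothetical helper).

    Indexed content uses GENERIC_INDEX and user queries use GENERIC_RETRIEVAL;
    the text itself is not rewritten or prefixed in either case.
    """
    purpose = "GENERIC_RETRIEVAL" if is_query else "GENERIC_INDEX"
    return {
        "schemaVersion": SCHEMA_VERSION,
        "taskType": "SINGLE_EMBEDDING",
        "singleEmbeddingParams": {
            "embeddingPurpose": purpose,
            "embeddingDimension": dimension,
            # Assumption: truncate from the end if the text exceeds the input limit.
            "text": {"truncationMode": "END", "value": text},
        },
    }
```

Index and query vectors must use the same dimension so they remain comparable in the shared space.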

Technical implementation

AWS tested the system on a dataset of 15 standalone technical images and five multi-page PDFs containing synthetic aerospace manufacturing data. The evaluation compared two pipelines:

Pipeline A (Multimodal): Embedded standalone images directly, and PDF pages as document images, using Amazon Nova Multimodal Embeddings, then stored the vectors in an Amazon S3 Vectors index.

Pipeline B (Text-only baseline): Extracted text with Amazon Nova 2 Lite acting as the OCR step, embedded the extracted text, and indexed it in a separate Amazon S3 Vectors index.
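Once vectors are stored, both pipelines answer a query the same way: embed it with GENERIC_RETRIEVAL and return the nearest indexed items. Amazon S3 Vectors performs that search server-side; the brute-force cosine search below is only a local stand-in for illustration, and the keys and three-dimensional toy vectors are hypothetical (real vectors would have 256 to 3072 components).

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query: Sequence[float], index: Dict[str, Sequence[float]],
          k: int) -> List[Tuple[str, float]]:
    """Brute-force nearest-neighbor search: score every item, keep the best k."""
    scored = [(key, cosine_similarity(query, vec)) for key, vec in index.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Hypothetical toy index: one standalone image and two PDF pages.
toy_index = {
    "images/bracket-cad.png": [1.0, 0.0, 0.0],
    "pdfs/weld-report/page-3": [0.6, 0.8, 0.0],
    "pdfs/material-cert/page-1": [0.0, 0.0, 1.0],
}
results = top_k([1.0, 0.0, 0.0], toy_index, k=2)
```

Because images and pages share one vector space, a single query vector can rank a CAD image and a PDF page against each other directly.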

AWS ran 26 manufacturing queries against both pipelines and measured retrieval quality with Recall@K, Mean Reciprocal Rank (MRR), and Normalized Discounted Cumulative Gain (NDCG@K). An LLM judge then scored the answers generated by each pipeline against ground truth.
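The three retrieval metrics can be computed per query and then averaged across the query set. The sketch below assumes binary relevance (an item either is or is not relevant to a query) and computes reciprocal rank over the full ranked list; the source does not say which variants AWS used, so treat these as standard textbook definitions rather than AWS's evaluation code.

```python
import math
from typing import Iterable, Sequence, Set


def recall_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the relevant items that appear in the top k results."""
    if not relevant:
        return 0.0
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / len(relevant)


def reciprocal_rank(ranked: Sequence[str], relevant: Set[str]) -> float:
    """1 / rank of the first relevant result, or 0.0 if none is retrieved."""
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            return 1.0 / rank
    return 0.0


def ndcg_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Binary-relevance NDCG: DCG of the ranking divided by DCG of an ideal ranking."""
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, item in enumerate(ranked[:k], start=1)
        if item in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg > 0 else 0.0


def mean(scores: Iterable[float]) -> float:
    """Average per-query scores, e.g. reciprocal ranks averaged into MRR."""
    values = list(scores)
    return sum(values) / len(values) if values else 0.0
```

For example, if the only relevant page for a query is ranked second, Recall@1 is 0, Recall@3 is 1, the reciprocal rank is 0.5, and NDCG@3 is 1/log2(3), roughly 0.63.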

Use cases

The system addresses manufacturing scenarios where critical information appears only in visual form:

  • Torque specification tables embedded in engineering drawings
  • Thermal contour plots showing peak temperatures in rocket engine nozzles
  • Manufacturing process flow charts with quality hold points and cycle times as visual annotations
  • Weld inspection reports pairing measurements with radiographic images
  • S-N fatigue curves in material certifications

According to AWS, text-only retrieval systems miss spatial relationships in diagrams, visual patterns in inspection images, and quantitative information in plots because OCR either misreads technical content or strips spatial context.

Availability

Amazon Nova Multimodal Embeddings is available in Amazon Bedrock in the US East (N. Virginia) Region (us-east-1). Pricing has not been disclosed. Using the model requires access to the model ID amazon.nova-2-multimodal-embeddings-v1:0, and it works with Amazon S3 Vectors for vector storage and retrieval.
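Calling the model from Python typically goes through the bedrock-runtime client in boto3, whose invoke_model method accepts modelId, body, contentType, and accept. The hypothetical helpers below build those arguments for the model ID quoted above and read the vector out of the response; the response layout (an embeddings list whose entries carry an embedding array) is an assumption based on the Bedrock documentation and should be verified.

```python
import json
from typing import Any, Dict, List

# Model ID and Region as given in the article.
MODEL_ID = "amazon.nova-2-multimodal-embeddings-v1:0"
REGION = "us-east-1"


def invoke_model_kwargs(request_body: Dict[str, Any]) -> Dict[str, str]:
    """Build keyword arguments for bedrock-runtime's invoke_model call."""
    return {
        "modelId": MODEL_ID,
        "body": json.dumps(request_body),
        "contentType": "application/json",
        "accept": "application/json",
    }


def extract_embedding(response_body: Dict[str, Any]) -> List[float]:
    """Return the first vector from a decoded response.

    Assumption: the response looks like {"embeddings": [{"embedding": [...]}, ...]}.
    """
    return response_body["embeddings"][0]["embedding"]


def embed(client: Any, request_body: Dict[str, Any]) -> List[float]:
    """Embed one input using a client such as
    boto3.client("bedrock-runtime", region_name=REGION).
    """
    response = client.invoke_model(**invoke_model_kwargs(request_body))
    # The response "body" is a stream; read() returns the raw JSON bytes.
    return extract_embedding(json.loads(response["body"].read()))
```

Taking the client as a parameter keeps the helpers free of a hard boto3 dependency and lets them be exercised with a stub client in tests.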

Complete implementation code is available in a companion notebook on GitHub.

What this means

Multimodal embeddings target a genuine problem in industrial settings, where OCR fails to capture technical information that exists primarily as diagrams, plots, and annotated images. The ability to retrieve visual content with text queries addresses a significant gap in manufacturing document systems. AWS's focus on configurable dimensions and document-specific processing modes suggests the company is positioning this as infrastructure for production retrieval systems rather than a research demonstration. The real test will be whether organizations see measurably better results on their own technical documentation than with existing text-extraction pipelines.

