AWS adds metadata filtering to AgentCore Memory, improving agent retrieval accuracy from 40% to 64%
Amazon has added metadata filtering to its AgentCore Memory service for AI agents. In AWS evaluations across 151 questions, the feature improved overall question-answering accuracy from 40% to 64%, with context-dependent questions jumping from 16% to 69% accuracy. The update allows agents to filter memory retrieval by attributes like priority, department, or time range before semantic search runs.
Amazon has added metadata filtering to its AgentCore Memory service, addressing a core problem in long-running AI agent deployments: semantic search returns contextually irrelevant results when agents accumulate weeks of interaction history.
Performance gains
According to AWS, internal evaluations across a 151-question test set based on long-term memory benchmarks showed overall question-answering accuracy improved from 40% to 64% with metadata filtering enabled. For questions requiring contextual boundaries—time-bounded lookups, priority-based filtering, or department-scoped searches—accuracy jumped from 16% to 69%.
How it works
AgentCore Memory organizes agent memory into namespaces that isolate data by entity (e.g., clients/client-123). The new metadata filtering layer adds attribute-based filters on top of namespace isolation. Teams can now scope retrieval by business dimensions like priority, status, department, or time range before similarity search executes.
The system operates through a three-phase lifecycle:
Configuration phase: Teams declare which metadata keys to index when creating a memory resource. Each indexed key specifies a type (STRING, NUMBER, or STRINGLIST), extraction instructions, and optional validation rules such as allowed values or min-max ranges.
Ingestion phase: During conversations, string-based key-value pairs attach to events. An LLM extracts structured metadata from conversation content based on schema definitions. When multiple events in a session carry the same key, the system merges values using defined resolution behaviors (e.g., LATEST_VALUE for recency-based resolution).
Retrieval phase: Agents filter memory queries using these indexed metadata fields before semantic search runs.
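The ingestion and retrieval phases can be illustrated with a small, self-contained Python sketch. It is not the AgentCore Memory API: MemoryRecord, merge_latest_value, overlap_score, and retrieve are hypothetical names, and word overlap stands in for real semantic similarity. It only shows the idea of resolving duplicate keys by recency and narrowing candidates by namespace and metadata before ranking.

```python
from dataclasses import dataclass, field


@dataclass
class MemoryRecord:
    # Hypothetical record shape, for illustration only.
    namespace: str
    text: str
    metadata: dict = field(default_factory=dict)


def merge_latest_value(event_metadata_in_order):
    """Resolve repeated keys by recency: later events overwrite earlier ones."""
    merged = {}
    for metadata in event_metadata_in_order:
        merged.update(metadata)
    return merged


def overlap_score(query, text):
    """Toy stand-in for semantic similarity: Jaccard overlap of lowercase words."""
    q, t = set(query.lower().split()), set(text.lower().split())
    union = q | t
    return len(q & t) / len(union) if union else 0.0


def retrieve(records, namespace, filters, query, top_k=3):
    """Filter by namespace and exact-match metadata first, then rank by similarity."""
    candidates = [
        r for r in records
        if r.namespace == namespace
        and all(r.metadata.get(k) == v for k, v in filters.items())
    ]
    candidates.sort(key=lambda r: overlap_score(query, r.text), reverse=True)
    return candidates[:top_k]


# Two events in one session carry "priority"; the later value wins.
session_metadata = merge_latest_value(
    [{"status": "open", "priority": "low"}, {"priority": "high"}]
)

records = [
    MemoryRecord("clients/client-123", "billing dispute about invoice", session_metadata),
    MemoryRecord("clients/client-123", "billing dispute about invoice resolved",
                 {"status": "resolved", "priority": "high"}),
    MemoryRecord("clients/client-456", "billing dispute about invoice",
                 {"status": "open", "priority": "high"}),
]

hits = retrieve(records, "clients/client-123",
                {"status": "open", "priority": "high"}, "billing dispute")
print([h.text for h in hits])
```

All three records are near-identical in wording, so similarity alone could not separate them; the namespace and metadata filters are what reduce the candidate set to a single record here.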
Enterprise use cases
The update targets multi-agent and multi-tenant architectures. In a customer support scenario, an agent can now filter for "billing issues" with status:open and priority:high from the past 30 days, rather than receiving mixed results spanning technical tickets, sales conversations, and resolved disputes.
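As a rough sketch of the customer support filter described above, the following Python applies status, priority, and a 30-day window to a list of tickets. The ticket fields, the recent_open_high helper, and the fixed NOW timestamp are assumptions made for the example, not part of the AWS service.

```python
from datetime import datetime, timedelta, timezone

# Fixed reference time so the example is deterministic (an assumption).
NOW = datetime(2025, 6, 30, tzinfo=timezone.utc)

tickets = [
    {"text": "double charge on invoice", "status": "open", "priority": "high",
     "created_at": NOW - timedelta(days=5)},
    {"text": "double charge on invoice", "status": "resolved", "priority": "high",
     "created_at": NOW - timedelta(days=7)},
    {"text": "billing address typo", "status": "open", "priority": "low",
     "created_at": NOW - timedelta(days=2)},
    {"text": "refund never arrived", "status": "open", "priority": "high",
     "created_at": NOW - timedelta(days=90)},
]


def recent_open_high(items, now, days=30):
    """Keep open, high-priority items created within the last `days` days."""
    cutoff = now - timedelta(days=days)
    return [
        t for t in items
        if t["status"] == "open"
        and t["priority"] == "high"
        and t["created_at"] >= cutoff
    ]


matches = recent_open_high(tickets, NOW)
print([t["text"] for t in matches])
```

Only the five-day-old open, high-priority ticket survives; the resolved, low-priority, and 90-day-old tickets are excluded before any similarity ranking would take place.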
For financial services, relationship managers can query "portfolio rebalancing discussions" scoped to high-priority conversations from the last week, distinguishing them from routine inquiries three months ago—even though both are semantically similar.
Technical implementation
Metadata operates across both short-term and long-term memory layers. Short-term memory attaches string-based key-value pairs to events. These tags propagate into long-term memory during extraction and consolidation.
The schema supports validation constraints: allowedValues for STRING and STRINGLIST types, maxItems for STRINGLIST, and min-max ranges for NUMBER types. Non-indexed keys are stored alongside memory records for informational purposes but are not optimized for queries.
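The validation constraints can be pictured with a minimal Python checker. The SCHEMA layout, the "min" and "max" field names, the example keys, and the validate function are hypothetical; only the type names and the allowedValues and maxItems constraints come from the description above.

```python
# Hypothetical schema layout; the real service's request format may differ.
SCHEMA = {
    "priority": {"type": "STRING", "allowedValues": ["low", "medium", "high"]},
    "topics": {"type": "STRINGLIST", "allowedValues": ["billing", "refund", "login"],
               "maxItems": 2},
    "satisfaction": {"type": "NUMBER", "min": 1, "max": 5},
}


def validate(key, value, schema=SCHEMA):
    """Return True if `value` satisfies the type and constraints declared for `key`."""
    rule = schema[key]
    kind = rule["type"]
    allowed = rule.get("allowedValues")

    if kind == "STRING":
        return isinstance(value, str) and (allowed is None or value in allowed)

    if kind == "NUMBER":
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            return False
        return rule.get("min", float("-inf")) <= value <= rule.get("max", float("inf"))

    if kind == "STRINGLIST":
        if not isinstance(value, list) or not all(isinstance(v, str) for v in value):
            return False
        if "maxItems" in rule and len(value) > rule["maxItems"]:
            return False
        return allowed is None or all(v in allowed for v in value)

    raise ValueError(f"unknown type: {kind}")


print(validate("priority", "high"))                          # within allowedValues
print(validate("topics", ["billing", "refund", "login"]))    # exceeds maxItems
print(validate("satisfaction", 6))                           # above max
```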
AWS notes that sentiment analysis can be defined in the schema without indexing, allowing the LLM to derive values from conversation content without making them filterable dimensions.
What this means
This addresses a scaling problem in production agent deployments. As conversation history grows beyond a few weeks, semantic similarity alone produces too many false positives. The 53-percentage-point improvement in context-dependent question accuracy (16% to 69%) suggests metadata filtering is becoming necessary infrastructure for agents handling multi-month interaction histories. Teams running customer support, IT helpdesk, or financial advisory agents with accumulated history are likely to see gains from layering business-specific filters over namespace isolation, though the figures come from AWS's own evaluation and results will vary by workload.
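For readers comparing the reported figures, a short calculation separates the gain in percentage points from the relative improvement. The numbers are the ones AWS reported; the gains helper is only illustrative.

```python
def gains(before_pct, after_pct):
    """Return (percentage-point gain, ratio of after to before)."""
    return after_pct - before_pct, round(after_pct / before_pct, 2)


overall = gains(40, 64)            # overall question-answering accuracy
context_dependent = gains(16, 69)  # context-dependent questions

print(overall)
print(context_dependent)
```

The overall result is a 24-point gain (1.6 times the baseline), while the context-dependent result is a 53-point gain (about 4.3 times the baseline).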