Product update: Amazon Web Services

AWS adds metadata filtering to AgentCore Memory, improving agent retrieval accuracy from 40% to 64%

TL;DR

Amazon has added metadata filtering to its AgentCore Memory service for AI agents. In AWS evaluations across 151 questions, the feature improved overall question-answering accuracy from 40% to 64%, with context-dependent questions jumping from 16% to 69% accuracy. The update allows agents to filter memory retrieval by attributes like priority, department, or time range before semantic search runs.

Amazon has added metadata filtering to its AgentCore Memory service, addressing a core problem in long-running AI agent deployments: semantic search returns contextually irrelevant results when agents accumulate weeks of interaction history.

Performance gains

According to AWS, internal evaluations across a 151-question test set based on long-term memory benchmarks showed overall question-answering accuracy improved from 40% to 64% with metadata filtering enabled. For questions requiring contextual boundaries—time-bounded lookups, priority-based filtering, or department-scoped searches—accuracy jumped from 16% to 69%.

How it works

AgentCore Memory organizes agent memory into namespaces that isolate data by entity (e.g., clients/client-123). The new metadata filtering layer adds attribute-based filters on top of namespace isolation. Teams can now scope retrieval by business dimensions like priority, status, department, or time range before similarity search executes.
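As a rough illustration of the ordering described above (namespace isolation, then attribute filters, then similarity ranking), here is a minimal in-memory sketch. It is not the AgentCore Memory API: `MemoryRecord`, `retrieve`, the exact-match filter semantics, and the toy two-dimensional embeddings are all assumptions made for the example.

```python
import math
from dataclasses import dataclass, field


@dataclass
class MemoryRecord:
    """Hypothetical stand-in for a stored memory record."""
    namespace: str
    text: str
    embedding: list[float]
    metadata: dict[str, object] = field(default_factory=dict)


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(records, namespace, filters, query_embedding, top_k=3):
    """Filter by namespace and metadata first, then rank survivors by similarity."""
    candidates = [
        r for r in records
        if r.namespace == namespace  # namespace isolation
        and all(r.metadata.get(k) == v for k, v in filters.items())  # attribute filters
    ]
    ranked = sorted(
        candidates,
        key=lambda r: cosine(r.embedding, query_embedding),
        reverse=True,
    )
    return ranked[:top_k]


records = [
    MemoryRecord("clients/client-123", "Open billing dispute",
                 [1.0, 0.0], {"priority": "high", "status": "open"}),
    MemoryRecord("clients/client-123", "Resolved billing dispute",
                 [0.9, 0.1], {"priority": "low", "status": "resolved"}),
    MemoryRecord("clients/client-456", "Open billing dispute",
                 [1.0, 0.0], {"priority": "high", "status": "open"}),
]

hits = retrieve(
    records,
    namespace="clients/client-123",
    filters={"status": "open", "priority": "high"},
    query_embedding=[1.0, 0.0],
)
for hit in hits:
    print(hit.namespace, hit.text)
```

The second record is nearly identical to the query in embedding space, but it never reaches the ranking step because the metadata filter removes it first.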

The system operates through a three-phase lifecycle:

Configuration phase: Teams declare which metadata keys to index when creating a memory resource. Each indexed key specifies a type (STRING, NUMBER, or STRINGLIST), extraction instructions, and optional validation rules such as allowed values or min-max ranges.

Ingestion phase: During conversations, string-based key-value pairs are attached to events. An LLM also extracts structured metadata from conversation content based on the schema definitions. When multiple events in a session carry the same key, the system merges the values using a defined resolution behavior (e.g., LATEST_VALUE for recency-based resolution).

Retrieval phase: Agents filter memory queries using these indexed metadata fields before semantic search runs.
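The ingestion-phase merge can be pictured with a short sketch. This assumes that LATEST_VALUE means "the most recent event wins for each key"; the `Event` class, its integer `timestamp` field, and `merge_latest_value` are hypothetical names for illustration, not part of the service.

```python
from dataclasses import dataclass


@dataclass
class Event:
    """Hypothetical conversation event carrying string key-value metadata."""
    timestamp: int            # assumed ordering field
    metadata: dict[str, str]  # string-based key-value pairs


def merge_latest_value(events: list[Event]) -> dict[str, str]:
    """Merge session metadata so the most recent value wins for each key."""
    merged: dict[str, str] = {}
    # Apply events oldest to newest; later values overwrite earlier ones.
    for event in sorted(events, key=lambda e: e.timestamp):
        merged.update(event.metadata)
    return merged


session = [
    Event(1, {"priority": "low", "department": "billing"}),
    Event(3, {"priority": "high"}),
    Event(2, {"priority": "medium", "status": "open"}),
]

merged = merge_latest_value(session)
print(merged)
```

Keys that appear only once (`department`, `status`) pass through unchanged, while `priority` resolves to the value from the latest event.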

Enterprise use cases

The update targets multi-agent and multi-tenant architectures. In a customer support scenario, an agent can now filter for "billing issues" with status:open and priority:high from the past 30 days, rather than receiving mixed results spanning technical tickets, sales conversations, and resolved disputes.

For financial services, relationship managers can query "portfolio rebalancing discussions" scoped to high-priority conversations from the last week. That separates them from routine inquiries made three months earlier, even though the two sets of conversations are semantically similar.

Technical implementation

Metadata operates across both short-term and long-term memory layers. Short-term memory attaches string-based key-value pairs to events. These tags propagate into long-term memory during extraction and consolidation.

The schema supports validation constraints: allowedValues for STRING and STRINGLIST types, maxItems for STRINGLIST, and min-max ranges for NUMBER types. Non-indexed keys are stored alongside memory records for informational purposes, but queries cannot filter on them.
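To make the constraint types concrete, here is a minimal validator sketch. The type names STRING, NUMBER, and STRINGLIST come from the article; the `KeySchema` class, its snake_case field names, and the `validate` function are assumptions for illustration and do not reflect the actual schema format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class KeySchema:
    """Hypothetical schema entry for one metadata key."""
    type: str                                  # "STRING", "NUMBER", or "STRINGLIST"
    allowed_values: Optional[list[str]] = None  # STRING and STRINGLIST
    max_items: Optional[int] = None             # STRINGLIST only
    minimum: Optional[float] = None             # NUMBER only
    maximum: Optional[float] = None             # NUMBER only


def validate(schema: KeySchema, value: object) -> bool:
    """Return True if value satisfies the type and constraints in schema."""
    if schema.type == "STRING":
        if not isinstance(value, str):
            return False
        return schema.allowed_values is None or value in schema.allowed_values

    if schema.type == "NUMBER":
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            return False
        if schema.minimum is not None and value < schema.minimum:
            return False
        if schema.maximum is not None and value > schema.maximum:
            return False
        return True

    if schema.type == "STRINGLIST":
        if not isinstance(value, list) or not all(isinstance(v, str) for v in value):
            return False
        if schema.max_items is not None and len(value) > schema.max_items:
            return False
        return schema.allowed_values is None or all(
            v in schema.allowed_values for v in value
        )

    return False  # unknown type


priority = KeySchema("STRING", allowed_values=["low", "medium", "high"])
score = KeySchema("NUMBER", minimum=1, maximum=5)
tags = KeySchema("STRINGLIST", max_items=2)

print(validate(priority, "high"))        # allowed value
print(validate(priority, "urgent"))      # not in allowed values
print(validate(score, 9))                # above the maximum
print(validate(tags, ["a", "b", "c"]))   # too many items
```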

AWS notes that a key such as sentiment can be defined in the schema without being indexed. The LLM still derives its value from conversation content, but the key does not become a filterable dimension.

What this means

This addresses a scaling problem in production agent deployments. As conversation history grows beyond a few weeks, semantic similarity alone produces too many false positives. The 53-percentage-point improvement in context-dependent question accuracy (16% to 69%) suggests metadata filtering is becoming necessary infrastructure for agents handling multi-month interaction histories. Teams running customer support, IT helpdesk, or financial advisory agents with accumulated history are the most likely to benefit from layering business-specific filters over namespace isolation, though the reported figures come from AWS's own evaluations.
