AWS launches Neuron Agentic Development for automated Trainium kernel optimization
AWS announced Neuron Agentic Development, a collection of AI agents that automate kernel optimization for Trainium and Inferentia chips. The toolkit includes five specialized skills that handle kernel writing, debugging, profiling, and analysis, accessible through coding agents in Kiro and Claude.
AWS announced Neuron Agentic Development, a toolkit of AI agents designed to automate kernel optimization for AWS Trainium and Inferentia chips. The system aims to replace manual kernel tuning with agents that write, debug, and profile Neuron Kernel Interface (NKI) kernels.
Five specialized skills
The Neuron Agentic Development package includes five core skills that follow a write → debug → profile → analyze workflow:
- neuron-nki-writing: Translates PyTorch, NumPy, or natural language into NKI code. Handles tiling strategies that respect hardware constraints (a partition dimension of 128 and PSUM free dimensions of 512/4096), along with memory access patterns and DMA sizing.
- neuron-nki-debugging: Provides systematic error resolution for compilation and execution issues. Covers environment setup with the correct --target flags, compiler error resolution for all 28 NCC error codes, and numerical validation against CPU references.
- neuron-nki-profiling: Captures execution profiles on hardware by configuring runtime inspection variables, identifying the correct Neuron Executable File Format (NEFF) file, and capturing traces with neuron-explorer, including DMA Graph Engine notifications.
- neuron-nki-profile-querying: Runs SQL queries on NEFF and NTFF files to compute performance bounds, identify bottleneck engines, and localize inefficiencies to specific source lines. Supports the neuron-explorer API server, DuckDB, and pandas analysis.
- neuron-nki-docs: Provides API signatures, error code explanations, tutorials, and architecture guides for Trainium 1, 2, and 3 throughout the development process.
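To make the tiling constraint concrete, here is a minimal sketch in plain Python of how a 2D array could be split into tiles that stay within the 128 partition limit and a 512 free-dimension limit. It is not NKI code and does not reflect how the writing skill is implemented; the function name `tile_ranges` and the choice of rows as the partition axis are assumptions made for illustration.

```python
from typing import Iterator, Tuple

PARTITION_MAX = 128  # partition-dimension limit cited in the article
FREE_MAX = 512       # one of the PSUM free-dimension limits cited in the article


def tile_ranges(
    rows: int,
    cols: int,
    p_max: int = PARTITION_MAX,
    f_max: int = FREE_MAX,
) -> Iterator[Tuple[slice, slice]]:
    """Yield (row_slice, col_slice) pairs that cover a rows x cols array.

    Assumption: rows map to the partition dimension and columns to the
    free dimension. Edge tiles are simply smaller than the limits.
    """
    for r in range(0, rows, p_max):
        for c in range(0, cols, f_max):
            yield slice(r, min(r + p_max, rows)), slice(c, min(c + f_max, cols))


# A 300 x 1000 array needs 3 row blocks and 2 column blocks.
tiles = list(tile_ranges(300, 1000))
print(len(tiles))  # 6
```

A real kernel would also have to decide how each tile is loaded and stored, which is where the memory access pattern and DMA sizing concerns come in.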
Agent orchestration
AWS provides five specialized agents that combine multiple skills:
- neuron-nki-agent: Unified entry point that automatically selects the appropriate workflow based on developer requests
- neuron-nki-writing-agent: Focuses on kernel authoring and modifications
- neuron-nki-debugging-agent: Autonomously resolves compiler errors with up to 10 iterations
- neuron-nki-docs-agent: Lightweight documentation navigator
- neuron-nki-profile-analysis-agent: Runs profiling and querying skills to identify performance bottlenecks
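The debugging agent's bounded retry behavior can be pictured as a simple compile-and-repair loop. The sketch below is illustrative only: `compile_fn` and `repair_fn` are hypothetical callables standing in for the compiler and the agent, and the article does not describe how AWS implements the loop beyond the cap of 10 iterations.

```python
from typing import Callable, Optional, Tuple

MAX_ITERATIONS = 10  # iteration cap cited in the article


def fix_until_compiles(
    source: str,
    compile_fn: Callable[[str], Optional[str]],
    repair_fn: Callable[[str, str], str],
    max_iterations: int = MAX_ITERATIONS,
) -> Tuple[str, int, bool]:
    """Compile, repair on error, and stop after max_iterations attempts.

    compile_fn returns None on success or an error message on failure.
    repair_fn takes the source and the error and returns revised source.
    Returns (final_source, attempts_used, succeeded).
    """
    for attempt in range(1, max_iterations + 1):
        error = compile_fn(source)
        if error is None:
            return source, attempt, True
        if attempt < max_iterations:
            source = repair_fn(source, error)
    return source, max_iterations, False


# Toy stand-ins: each "BUG" token is one compiler error to remove.
def fake_compile(src: str) -> Optional[str]:
    return "found BUG" if "BUG" in src else None


def fake_repair(src: str, error: str) -> str:
    return src.replace("BUG", "", 1)


fixed, attempts, ok = fix_until_compiles("BUG BUG kernel", fake_compile, fake_repair)
print(attempts, ok)  # 3 True
```

Capping the loop keeps an agent from cycling indefinitely on an error it cannot resolve, at which point the failure is returned to the developer.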
The skills are installed by adding them to a .kiro/skills or .claude/skills directory, and they work in coding environments including VS Code, Cursor, and Kiro. All skills must run on Trainium-based Amazon EC2 instances.
Technical requirements
Developers can use the toolkit on trn2.3xlarge instances, reserved through Amazon EC2 Capacity Blocks for ML, with the Neuron Deep Learning AMI (DLAMI). The DLAMI includes a pre-installed PyTorch 2.9 virtual environment at /opt/aws_neuronx_venv_pytorch_2_9. Trainium instances are available in regions including São Paulo (sa-east-1) and Melbourne (ap-southeast-4).
AWS provides example workflows demonstrating softmax and SwiGLU kernel optimization, showing how agents handle real-world inference pipeline bottlenecks.
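For readers unfamiliar with the two operations, the sketch below gives NumPy reference versions of softmax and SwiGLU, the kind of CPU reference a kernel's output could be validated against. These are the standard textbook definitions, not AWS's example kernels; the weight names `w_gate` and `w_up`, the helper `matches_reference`, and the tolerances are assumptions chosen for illustration.

```python
import numpy as np


def softmax_ref(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax: subtract the max before exponentiating."""
    shifted = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / np.sum(e, axis=axis, keepdims=True)


def silu(x: np.ndarray) -> np.ndarray:
    """SiLU (Swish) activation: x * sigmoid(x)."""
    return x / (1.0 + np.exp(-x))


def swiglu_ref(x: np.ndarray, w_gate: np.ndarray, w_up: np.ndarray) -> np.ndarray:
    """SwiGLU: SiLU of the gate projection times the up projection."""
    return silu(x @ w_gate) * (x @ w_up)


def matches_reference(candidate, reference, rtol=1e-3, atol=1e-5) -> bool:
    """Compare a kernel's output with the CPU reference within a tolerance."""
    return bool(np.allclose(candidate, reference, rtol=rtol, atol=atol))


rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8)).astype(np.float32)
w_gate = rng.standard_normal((8, 16)).astype(np.float32)
w_up = rng.standard_normal((8, 16)).astype(np.float32)

probs = softmax_ref(x)
out = swiglu_ref(x, w_gate, w_up)
print(probs.shape, out.shape)  # (4, 8) (4, 16)
```

In a validation step, `candidate` would be the array produced by the kernel on hardware, and the tolerances would be loosened for lower-precision data types.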
What this means
AWS is directly addressing the kernel optimization bottleneck that has limited Trainium adoption. By automating the expertise-intensive work of writing hardware-aware kernels, the company aims to make custom chip optimization accessible to ML engineers without deep architectural knowledge. This positions Trainium to compete more effectively with NVIDIA's CUDA ecosystem, which benefits from years of accumulated optimization tooling. The integration with Claude suggests Anthropic may be an early partner testing these capabilities.