agentic-ai
44 articles tagged with agentic-ai
Poolside releases Laguna M.1: 225B parameter MoE model scores 74.6% on SWE-bench Verified
Poolside has released Laguna M.1, a 225B total parameter Mixture-of-Experts model with 23B activated parameters per token, designed for agentic coding tasks. The model scores 74.6% on SWE-bench Verified and 63.1% on SWE-bench Multilingual, and is released under the Apache 2.0 license.
Mistral AI Launches Forge for Enterprise Model Training on Proprietary Data
Mistral AI has launched Forge, a platform that allows enterprises to train custom AI models on their proprietary data including codebases, compliance policies, and operational documentation. The system supports both dense and mixture-of-experts architectures with pre-training, post-training, and reinforcement learning capabilities.
Cohere releases North Mini Code, a 30B-parameter sparse MoE coding model with 256K context window, free on OpenRouter
Cohere has released North Mini Code, the first model in its North family and its first agentic coding model. The sparse mixture-of-experts architecture features 30B total parameters with 3B active, a 256K-token context window, and up to 64K tokens of output. The model is available free via OpenRouter under the Apache 2.0 license.
Nex AGI Releases Nex-N2-Pro: 17B Active Parameter MoE Model with 262K Context Window
Nex AGI has released Nex-N2-Pro, a mixture-of-experts model with 17 billion active parameters from a total of 397 billion parameters. Built on the Qwen3.5 architecture, the model offers a 262,144 token context window and is available for free through OpenRouter.
Nvidia Releases Nemotron 3 Ultra: 550B Parameter MoE Model with 1M Token Context Window
Nvidia has released Nemotron 3 Ultra, a 550B parameter mixture-of-experts model with 55B active parameters and a 1M token context window. The model uses a hybrid Transformer-Mamba architecture and is available for free through OpenRouter, targeting agentic workflows and multi-step reasoning tasks.
Perplexity Computer adds hybrid inference to split tasks between local and cloud models
Perplexity announced that its Computer agentic system will gain hybrid inference in July 2026, automatically splitting tasks between local models for sensitive data and cloud-based frontier models for complex operations. The feature aims to balance privacy with computational power without requiring manual model selection.
Anthropic's Claude Opus 4.8 launches on AWS Bedrock in four regions
Anthropic's Claude Opus 4.8 is now available on Amazon Bedrock and Claude Platform on AWS. The model is designed for autonomous multi-stage tasks, agentic coding, and long-running workflows with reduced supervision.
Anthropic releases Claude Opus 4.8 with 69.2% agentic coding score, 2.5x faster performance
Anthropic released Claude Opus 4.8 on May 28, 2026, six weeks after version 4.7. The model achieves 69.2% on agentic coding benchmarks (up from 64.3%) and runs 2.5 times faster in fast mode at one-third the cost, while keeping the same pricing as version 4.7.
Anthropic releases Claude Opus 4.8 with Dynamic Workflows for multi-agent tasks
Anthropic released Claude Opus 4.8 on Thursday, its fastest upgrade cycle at 41 days since the previous Opus 4.7. The model includes a new Dynamic Workflows feature designed to manage complex tasks across hundreds of parallel subagents, with pricing unchanged from previous Opus releases.
Google launches Universal Cart, an AI agent that shops across multiple retailers in one checkout
Google announced Universal Cart at its I/O developer conference, an AI-powered shopping system that consolidates purchases from multiple retailers including Target, Shopify, Wayfair, and Etsy into a single checkout. The feature uses Gemini's agentic AI to verify product compatibility, suggest better deals, and automate routine purchases.
Google I/O 2026 announces Gemini Omni model and AI-powered search integration
Google's I/O 2026 developer conference centered entirely on AI announcements, including a new Gemini Omni model, expanded AI capabilities in Google Search, an agentic personal assistant called Spark, and the first Android XR glasses.
Google launches Antigravity 2.0 with desktop app, Go-based CLI, and SDK at $100/month
Google announced Antigravity 2.0 at I/O 2026, transforming its coding tool into a full developer platform with a revamped desktop app supporting multi-agent orchestration, a new Go-based CLI, and an SDK for custom agents. The company introduced a $100/month AI Ultra tier and confirmed Gemini CLI will shut down for consumers on June 18, 2026.
Amazon Bedrock adds programmatic tool calling to reduce latency and token usage in multi-step workflows
Amazon Bedrock now supports programmatic tool calling (PTC), a technique that allows LLMs to generate Python code for multi-step tool orchestration rather than making sequential API calls. AWS offers three implementation paths: self-hosted Docker sandboxes on ECS, managed execution via Amazon Bedrock AgentCore Code Interpreter, and Anthropic SDK-compatible proxy integration.
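The idea behind programmatic tool calling can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the Bedrock or AgentCore API: the two tools, the `GENERATED_PROGRAM` string standing in for model output, and the `run_program` helper are all hypothetical. Instead of one model round trip per tool call, the model emits a single script that chains the calls, and the host executes it once.

```python
from typing import Callable, Dict

# Hypothetical tools; a real deployment would wrap actual services.
def get_order_ids(customer: str) -> list:
    orders = {"alice": ["o1", "o2"], "bob": ["o3"]}
    return orders.get(customer, [])

def get_order_total(order_id: str) -> float:
    totals = {"o1": 20.0, "o2": 35.5, "o3": 12.0}
    return totals[order_id]

TOOLS: Dict[str, Callable] = {
    "get_order_ids": get_order_ids,
    "get_order_total": get_order_total,
}

# Stand-in for code a model might generate (an assumption for illustration).
# It chains both tools in one program, so intermediate results never have
# to travel back through the model as separate tool-call turns.
GENERATED_PROGRAM = """
ids = get_order_ids("alice")
result = sum(get_order_total(i) for i in ids)
"""

def run_program(source: str, tools: Dict[str, Callable]) -> object:
    """Execute generated code with only the listed tools in scope.

    NOTE: exec() with trimmed builtins is NOT a security boundary. A real
    setup would run this inside an isolated sandbox such as a container.
    """
    namespace = {"__builtins__": {"sum": sum}, **tools}
    exec(source, namespace)  # single namespace so nested scopes see the tools
    return namespace["result"]

if __name__ == "__main__":
    print(run_program(GENERATED_PROGRAM, TOOLS))
```

Only the final `result` needs to be returned to the model, which is where the latency and token savings described above would come from.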
Tencent Releases Hy3 Preview: Mixture-of-Experts Model with 262K Context and Configurable Reasoning
Tencent has released Hy3 preview, a Mixture-of-Experts model with a 262,144 token context window priced at $0.066 per million input tokens and $0.26 per million output tokens. The model features three configurable reasoning modes—disabled, low, and high—designed for agentic workflows and production environments.
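As a rough worked example of what the per-token prices quoted above mean for a single request, the sketch below computes the cost from input and output token counts. The function name and the example token counts are hypothetical; only the two prices come from the summary.

```python
# Prices quoted above, in US dollars per million tokens.
INPUT_PRICE_PER_M = 0.066
OUTPUT_PRICE_PER_M = 0.26

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in dollars for one request."""
    return (
        input_tokens / 1_000_000 * INPUT_PRICE_PER_M
        + output_tokens / 1_000_000 * OUTPUT_PRICE_PER_M
    )

if __name__ == "__main__":
    # Hypothetical agentic step: 200,000 tokens in, 10,000 tokens out.
    print(f"${request_cost(200_000, 10_000):.4f}")
```

With these assumed token counts the request comes to roughly $0.0158, of which about $0.0132 is input and $0.0026 is output.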
GitHub Reduces Token Usage in Copilot Agentic Workflows Running on Pull Requests
GitHub has optimized token usage in its production agentic workflows that run on every pull request. The company instrumented its own Copilot workflows to identify inefficiencies and built agents to address them, aiming to reduce accumulated API costs.
GitHub introduces dominatory analysis method for validating AI coding agents
GitHub has published a research approach for validating AI coding agents when traditional correctness testing breaks down. The company proposes dominatory analysis as an alternative to brittle scripts and black-box LLM judges for building what it calls a 'Trust Layer' for GitHub Copilot Coding Agents.
GitHub develops dominance analysis method to validate AI coding agent outputs without deterministic correctness
GitHub has published research on validating agentic AI behavior when there's no single "correct" answer. The company proposes dominance analysis as an alternative to brittle scripts or opaque LLM-as-judge approaches for building a trust layer in GitHub Copilot coding agents.
Perplexity's Mac-Native 'Personal Computer' Platform Claims $2.8B in Labor-Equivalent Work
Perplexity CEO Aravind Srinivas revealed that the company's Mac-native Personal Computer platform has performed more than $2.8B in labor-equivalent work for Pro, Max, and Enterprise subscribers since launch. The announcement follows Apple CFO Kevan Parekh citing Perplexity as an example of developers building enterprise-grade AI assistants on Mac during Apple's Q2 2026 earnings call.
Microsoft reports 20M paid Copilot users, weekly engagement now matches Outlook
Microsoft CEO Satya Nadella disclosed that M365 Copilot has reached 20 million paid enterprise seats during the company's quarterly earnings call. Weekly engagement now matches Outlook usage levels, with queries per user up 20% quarter-over-quarter.
OpenAI discontinues separate Codex line, merges coding capabilities into GPT-5.5
OpenAI will not release a separate GPT-5.5-Codex model, according to Romain Huet. The company unified its Codex coding model with the main GPT line starting with GPT-5.4, with GPT-5.5 featuring enhanced agentic coding and computer use capabilities.
Microsoft pushes agentic Copilot into Word, Excel, PowerPoint with direct document editing
Microsoft has pushed agentic Copilot features into general availability across Word, Excel, and PowerPoint. The AI assistant can now make direct edits to documents, spreadsheets, and presentations rather than just suggesting changes from a sidebar.
Xiaomi Launches MiMo-V2.5 With 1M Context Window at $0.40 per Million Input Tokens
Xiaomi released MiMo-V2.5 on April 22, 2026, a native omnimodal model with a 1,048,576 token context window. The model is priced at $0.40 per million input tokens and $2 per million output tokens, positioning it as a cost-efficient alternative for agentic applications requiring multimodal perception across image and video understanding.
Open-weight models closing gap with frontier AI, but struggle looms in specialized domains
Open-weight AI models are narrowing the performance gap with closed frontier models in current benchmarks focused on coding and terminal tasks, but industry analysts predict they'll struggle to keep pace as the field shifts toward specialized knowledge work in accounting, law, and healthcare. The gap reduction masks a more complex dynamic where benchmark correlation with real-world performance is weakening.
Roblox Assistant adds multi-step planning mode and AI-driven playtesting to automate game development
Roblox is deploying agentic features to its Assistant tool that plan, build, and test games through multi-step workflows. The enhanced Planning Mode analyzes code, asks clarifying questions, and creates editable action plans before implementation, while new AI-driven playtesting tools automatically identify and fix bugs.
Adobe launches Firefly AI Assistant that orchestrates tasks across Creative Cloud apps
Adobe is launching Firefly AI Assistant in public beta within the coming weeks, evolving from its October 2024 "Project Moonlight" preview. The assistant orchestrates workflows across Creative Cloud applications including Photoshop, Premiere, Lightroom, Illustrator, and Express, allowing users to control outputs through text prompts, buttons, and sliders.
Google AI Mode gets redesigned interface as restaurant booking expands to 8 new countries
Google has redesigned AI Mode's prompt interface with a bottom sheet layout on mobile and expanded its agentic restaurant booking feature to 8 new markets including the UK, Canada, and Australia. The update is rolling out to the stable channel on Android and iOS.
GLM-5.1 released: 754B agentic model outperforms Claude on coding benchmarks
Zhipu AI released GLM-5.1, a 754-billion-parameter model optimized for agentic engineering tasks. The model scores 58.4% on SWE-Bench Pro, outperforming Claude 3.5 Sonnet (57.3%), and demonstrates sustained reasoning capability over hundreds of iterations.
GLM-5.1 achieves 58.4% on SWE-Bench Pro with sustained agentic reasoning over hundreds of iterations
Zhipu AI has released GLM-5.1, a 754-billion parameter model designed for agentic engineering with significantly improved coding capabilities over its predecessor. The model achieves 58.4% on SWE-Bench Pro and demonstrates sustained performance improvement over hundreds of tool calls and iterations, unlike earlier models that plateau quickly.
NVIDIA Optimizes Google Gemma 4 for Local Agentic AI on RTX and Spark
NVIDIA has optimized Google's Gemma 4 models for local deployment on RTX and Spark platforms, targeting the emerging wave of on-device agentic AI. The optimization enables small, efficient models to access real-time local context for autonomous decision-making without cloud dependency.
Alibaba releases Qwen3.6-Plus with 1M token context, claims performance near Claude 4.5 Opus
Alibaba has released Qwen3.6-Plus, its third proprietary AI model in days, featuring a 1 million token context window available via Alibaba Cloud Model Studio API. The model claims improved agentic coding capabilities and partially outperforms Anthropic's Claude 4.5 Opus in Alibaba-conducted benchmarks, though it trails Claude 4.6 Opus, released in December 2025.
GitHub's Copilot team uses AI agents to automate development work
GitHub's Applied Science team deployed coding agents to automate parts of their own development workflow, testing how AI agents can handle increasingly complex programming tasks. The experiment reveals practical insights into agent-driven development patterns and limitations.
Alibaba releases Qwen 3.6 Plus Preview with 1M token context, free via OpenRouter
Alibaba's Qwen division has released Qwen 3.6 Plus Preview, a free multimodal model available via OpenRouter with a 1,000,000 token context window. The model claims stronger reasoning and more reliable agentic behavior compared to the 3.5 series, with particular strength in coding and complex problem-solving tasks.
AWS launches agentic AI movie assistant using Nova Sonic 2.0 and Bedrock AgentCore
Amazon Web Services unveiled an agentic AI system for streaming platforms combining Nova Sonic 2.0 (real-time speech model), Bedrock AgentCore, and the Model Context Protocol. The system delivers two core capabilities: context-aware movie recommendations based on mood and viewing history, and real-time scene analysis including actor identification and plot summaries.
Anthropic's unreleased Mythos model enables autonomous large-scale cyberattacks, officials warn
Anthropic is privately warning top government officials that its unreleased model "Mythos" makes large-scale cyberattacks significantly more likely in 2026. The model enables AI agents to operate autonomously with high sophistication to penetrate corporate, government and municipal systems. One official told Axios a large-scale attack could occur this year as employees unknowingly create security vulnerabilities through unsupervised agentic AI use.
ARC-AGI-3 benchmark: frontier AI models score below 1%, humans solve all 135 tasks
The ARC Prize Foundation released ARC-AGI-3, an interactive benchmark requiring AI agents to explore environments, form hypotheses, and execute plans without instructions. All 135 environments were solved by untrained humans, yet frontier models—including Gemini 3.1 Pro Preview (0.37%), GPT 5.4 (0.26%), Opus 4.6 (0.25%), and Grok-4.20 (0.00%)—scored below 1%.
NVIDIA Nemotron 3 Super now available on Amazon Bedrock with 256K context window
NVIDIA Nemotron 3 Super, a hybrid Mixture of Experts model with 120B parameters and 12B active parameters, is now available as a fully managed model on Amazon Bedrock. The model supports up to 256K token context length and claims 5x higher throughput efficiency over the previous Nemotron Super and 2x higher accuracy on reasoning tasks.
OpenAI consolidating ChatGPT, Codex, and Atlas into single macOS superapp
OpenAI is consolidating its fragmented macOS app ecosystem by merging ChatGPT, the Codex coding platform, and the Atlas browser into a single "superapp" led by Chief of Applications Fidji Simo. The unified app will feature agentic AI capabilities for autonomous task execution and team collaboration, with rollout expected over the coming months, starting with Codex enhancements.
Perplexity's Comet AI browser launches free iOS app after $200/month PC debut
Perplexity has released Comet, its AI-powered browser, as a free standalone app for iPhone users. Originally launched on PC at $200 per month, the browser now has an iOS version alongside the recently released Android version and the existing Windows and Mac versions. It combines web browsing with AI assistance for summarization, research, and task automation.
Amazon Nova 2 Lite surpasses Nova 1 Pro with 1M token context and extended thinking at 7x lower cost
Amazon Nova 2 Lite expands context window to 1 million tokens, introduces extended thinking with developer controls, and adds native tool use and web grounding. AWS claims Nova 2 Lite surpasses Nova 1 Pro on multi-step reasoning while costing 7x less and running up to 5x faster.
GitHub Copilot SDK shifts AI from text prompts to executable agent workflows
GitHub has released the Copilot SDK, positioning executable agent workflows as the successor to prompt-based AI interactions. The SDK enables developers to integrate agentic AI capabilities directly into applications rather than relying on text-based prompt-response patterns.
GitHub shifts Copilot from text prompts to programmable execution with new SDK
GitHub is positioning AI interaction as a shift from prompt-response text interfaces to programmable execution models. The company announced a GitHub Copilot SDK that enables agentic workflows to run directly within applications, marking a transition toward AI systems that take concrete actions rather than generate text responses.
OpenAI's GPT-5.4 now generally available in GitHub Copilot
OpenAI's GPT-5.4, an agentic coding model, is now generally available in GitHub Copilot. The model was tested on real-world software development scenarios and demonstrated improved coding capabilities.
OpenAI launches GPT-5.4 with native computer use capabilities for autonomous agents
OpenAI has launched GPT-5.4, its latest model with native computer use capabilities that allow it to operate computers and complete tasks across applications. The release represents a step toward autonomous AI agents that can handle complex jobs independently. The model includes advancements in reasoning, coding, and professional work with spreadsheets, documents, and presentations.
AIG deploys agentic AI system with orchestration layer for underwriting
American International Group (AIG) has deployed an agentic AI system with an orchestration layer, reporting faster-than-expected productivity gains in underwriting and portfolio management. The deployment demonstrates measurable improvements in throughput and workflow efficiency, according to recent investor disclosures.