agentic-ai

44 articles tagged with agentic-ai

June 21, 2026

model release · Poolside

Poolside releases Laguna M.1: 225B parameter MoE model scores 74.6% on SWE-bench Verified

Poolside has released Laguna M.1, a 225B total parameter Mixture-of-Experts model with 23B activated parameters per token, designed for agentic coding tasks. The model scores 74.6% on SWE-bench Verified and 63.1% on SWE-bench Multilingual, and is released under the Apache 2.0 license.

June 21, 2026 · 7:36 AM
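
Several entries in this listing report both total and active parameter counts for Mixture-of-Experts models. As a rough illustration, the share of parameters activated per token can be computed directly from those reported figures. The sketch below uses only numbers quoted in this listing; the dictionary and function names are illustrative and do not come from any vendor.

```python
# Reported (total, active) parameter counts in billions, as quoted in this listing.
REPORTED_PARAMS_B = {
    "Laguna M.1": (225, 23),
    "North Mini Code": (30, 3),
    "Nex-N2-Pro": (397, 17),
    "Nemotron 3 Ultra": (550, 55),
    "Nemotron 3 Super": (120, 12),
}


def active_fraction(total_b: float, active_b: float) -> float:
    """Return the fraction of parameters activated per token."""
    return active_b / total_b


fractions = {
    name: active_fraction(total, active)
    for name, (total, active) in REPORTED_PARAMS_B.items()
}

for name, frac in fractions.items():
    print(f"{name}: {frac:.1%} of parameters active per token")
```

Most of these models activate roughly a tenth of their parameters per token, while Nex-N2-Pro is sparser at about 4%. The ratio says nothing about quality on its own; it only indicates how much of the model is exercised per token.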

June 18, 2026

product update · Mistral AI

Mistral AI Launches Forge for Enterprise Model Training on Proprietary Data

Mistral AI has launched Forge, a platform that allows enterprises to train custom AI models on their proprietary data including codebases, compliance policies, and operational documentation. The system supports both dense and mixture-of-experts architectures with pre-training, post-training, and reinforcement learning capabilities.

June 18, 2026 · 9:06 AM

June 17, 2026

model release · Cohere

Cohere releases North Mini Code, a 30B-parameter sparse MoE coding model with 256K context window, free on OpenRouter

Cohere has released North Mini Code, the first model in its North family and its first agentic coding model. The sparse mixture-of-experts model has 30B total parameters with 3B active, a 256K-token context window, and up to 64K tokens of output. It is available free via OpenRouter under the Apache 2.0 license.

June 17, 2026 · 10:05 PM

June 8, 2026

model release · Nex AGI

Nex AGI Releases Nex-N2-Pro: 17B Active Parameter MoE Model with 262K Context Window

Nex AGI has released Nex-N2-Pro, a mixture-of-experts model with 17 billion active parameters from a total of 397 billion parameters. Built on the Qwen3.5 architecture, the model offers a 262,144 token context window and is available for free through OpenRouter.

June 8, 2026 · 6:20 PM

June 4, 2026

model release · NVIDIA

Nvidia Releases Nemotron 3 Ultra: 550B Parameter MoE Model with 1M Token Context Window

Nvidia has released Nemotron 3 Ultra, a 550B parameter mixture-of-experts model with 55B active parameters and a 1M token context window. The model uses a hybrid Transformer-Mamba architecture and is available for free through OpenRouter, targeting agentic workflows and multi-step reasoning tasks.

June 4, 2026 · 1:50 PM

June 2, 2026

product update

Perplexity Computer adds hybrid inference to split tasks between local and cloud models

Perplexity announced that its Computer agentic system will gain hybrid inference in July 2026, automatically splitting tasks between local models for sensitive data and cloud-based frontier models for complex operations. The feature aims to balance privacy with computational power without requiring manual model selection.

June 2, 2026 · 6:06 PM

May 28, 2026

model release · Anthropic

Anthropic's Claude Opus 4.8 launches on AWS Bedrock in four regions

Anthropic's Claude Opus 4.8 is now available on Amazon Bedrock and Claude Platform on AWS. The model is designed for autonomous multi-stage tasks, agentic coding, and long-running workflows with reduced supervision.

May 28, 2026 · 6:05 PM

model release · Anthropic

Anthropic releases Claude Opus 4.8 with 69.2% agentic coding score, 2.5x faster performance

Anthropic released Claude Opus 4.8 on May 28, 2026, six weeks after version 4.7. The model achieves 69.2% on agentic coding benchmarks (up from 64.3%) and runs 2.5 times faster in fast mode at one-third the cost; pricing is unchanged from version 4.7.

May 28, 2026 · 5:21 PM

model release · Anthropic

Anthropic releases Claude Opus 4.8 with Dynamic Workflows for multi-agent tasks

Anthropic released Claude Opus 4.8 on Thursday, 41 days after Opus 4.7, its fastest upgrade cycle to date. The model includes a new Dynamic Workflows feature designed to manage complex tasks across hundreds of parallel subagents, with pricing unchanged from previous Opus releases.

May 28, 2026 · 5:05 PM

May 20, 2026

product update

Google launches Universal Cart, an AI agent that shops across multiple retailers in one checkout

Google announced Universal Cart at its I/O developer conference, an AI-powered shopping system that consolidates purchases from multiple retailers including Target, Shopify, Wayfair, and Etsy into a single checkout. The feature uses Gemini's agentic AI to verify product compatibility, suggest better deals, and automate routine purchases.

May 20, 2026 · 4:05 PM

analysis

Google I/O 2026 announces Gemini Omni model and AI-powered search integration

Google's I/O 2026 developer conference centered entirely on AI announcements, including a new Gemini Omni model, expanded AI capabilities in Google Search, an agentic personal assistant called Spark, and the first Android XR glasses.

May 20, 2026 · 12:35 PM

May 19, 2026

product update

Google launches Antigravity 2.0 with desktop app, Go-based CLI, and SDK at $100/month

Google announced Antigravity 2.0 at I/O 2026, transforming its coding tool into a full developer platform with a revamped desktop app supporting multi-agent orchestration, a new Go-based CLI, and an SDK for custom agents. The company introduced a $100/month AI Ultra tier and confirmed Gemini CLI will shut down for consumers on June 18, 2026.

May 19, 2026 · 9:05 PM

product update · Amazon Web Services

Amazon Bedrock adds programmatic tool calling to reduce latency and token usage in multi-step workflows

Amazon Bedrock now supports programmatic tool calling (PTC), a technique that allows LLMs to generate Python code for multi-step tool orchestration rather than making sequential API calls. AWS offers three implementation paths: self-hosted Docker sandboxes on ECS, managed execution via Amazon Bedrock AgentCore Code Interpreter, and Anthropic SDK-compatible proxy integration.

May 19, 2026 · 3:35 PM
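
The idea behind programmatic tool calling can be sketched in a few lines. The example below is a toy illustration, not AWS's or Anthropic's implementation: the two tools, the `MODEL_GENERATED_CODE` string, and `run_generated_code` are all hypothetical stand-ins, and no Bedrock API is called. It shows a model-authored script chaining several tool calls in one execution, so only the final result has to go back to the model.

```python
# Hypothetical tools; a real agent would register its own.
def list_order_ids(customer: str) -> list[int]:
    """Return the order IDs for a customer (canned data for the sketch)."""
    return {"acme": [101, 102, 103]}.get(customer, [])


def get_order_total(order_id: int) -> float:
    """Return the total for one order (canned data for the sketch)."""
    return {101: 25.0, 102: 40.5, 103: 10.0}[order_id]


TOOLS = {"list_order_ids": list_order_ids, "get_order_total": get_order_total}

# Stand-in for the Python a model would emit in a single turn.
MODEL_GENERATED_CODE = """
ids = list_order_ids("acme")
result = sum(get_order_total(i) for i in ids)
"""


def run_generated_code(code: str, tools: dict) -> object:
    """Execute generated code with only the given tools and `sum` in scope.

    Note: exec with a trimmed namespace is NOT a security boundary. The
    article's options (Docker sandboxes, a managed code interpreter) exist
    precisely because real isolation is needed.
    """
    namespace = {"__builtins__": {"sum": sum}, **tools}
    exec(code, namespace)
    return namespace["result"]


total = run_generated_code(MODEL_GENERATED_CODE, TOOLS)
print(total)
```

With sequential tool calling, this task would take four separate tool invocations (one listing call plus three lookups), each typically passing its intermediate result back through the model. Here the intermediate values stay inside the executed script, which is where the latency and token savings described above would come from.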

May 8, 2026

model release · Tencent

Tencent Releases Hy3 Preview: Mixture-of-Experts Model with 262K Context and Configurable Reasoning

Tencent has released Hy3 preview, a Mixture-of-Experts model with a 262,144 token context window priced at $0.066 per million input tokens and $0.26 per million output tokens. The model features three configurable reasoning modes—disabled, low, and high—designed for agentic workflows and production environments.

May 8, 2026 · 11:05 PM
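
Per-token prices like these are easier to compare once turned into a per-request figure. The sketch below assumes simple linear billing at the two quoted rates, with no caching discounts, reasoning-mode surcharges, or minimum charges (none of which are described in the summary above); the function name and the example token counts are illustrative.

```python
# Quoted Hy3 preview prices, in US dollars per million tokens.
INPUT_USD_PER_MILLION = 0.066
OUTPUT_USD_PER_MILLION = 0.26


def request_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request under plain linear per-token pricing."""
    input_cost = input_tokens * INPUT_USD_PER_MILLION / 1_000_000
    output_cost = output_tokens * OUTPUT_USD_PER_MILLION / 1_000_000
    return input_cost + output_cost


# Example: a prompt that fills the full 262,144-token window, plus 2,000 output tokens.
full_context_cost = request_cost_usd(262_144, 2_000)
print(f"${full_context_cost:.4f} per request")
```

Under these assumptions a request that fills the entire context window costs under two cents, which matters for agentic loops that resend long histories on every step.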

May 7, 2026

product update · GitHub

GitHub Reduces Token Usage in Copilot Agentic Workflows Running on Pull Requests

GitHub has optimized token usage in its production agentic workflows that run on every pull request. The company instrumented its own Copilot workflows to identify inefficiencies and built agents to address them, aiming to reduce accumulated API costs.

May 7, 2026 · 11:06 PM

May 6, 2026

research · GitHub

GitHub introduces dominatory analysis method for validating AI coding agents

GitHub has published a research approach for validating AI coding agents when traditional correctness testing breaks down. The company proposes dominatory analysis as an alternative to brittle scripts and black-box LLM judges for building what it calls a 'Trust Layer' for GitHub Copilot Coding Agents.

May 6, 2026 · 9:36 PM

research · GitHub

GitHub develops dominance analysis method to validate AI coding agent outputs without deterministic correctness

GitHub has published research on validating agentic AI behavior when there's no single "correct" answer. The company proposes dominance analysis as an alternative to brittle scripts or opaque LLM-as-judge approaches for building a trust layer in GitHub Copilot coding agents.

May 6, 2026 · 9:35 PM

May 1, 2026

product update

Perplexity's Mac-Native 'Personal Computer' Platform Claims $2.8B in Labor-Equivalent Work

Perplexity CEO Aravind Srinivas revealed that the company's Mac-native Personal Computer platform has performed more than $2.8B in labor-equivalent work for Pro, Max, and Enterprise subscribers since launch. The announcement follows Apple CFO Kevan Parekh citing Perplexity as an example of developers building enterprise-grade AI assistants on Mac during Apple's Q2 2026 earnings call.

May 1, 2026 · 8:50 PM

April 29, 2026

product update · Microsoft

Microsoft reports 20M paid Copilot users, weekly engagement now matches Outlook

Microsoft CEO Satya Nadella disclosed that M365 Copilot has reached 20 million paid enterprise seats during the company's quarterly earnings call. Weekly engagement now matches Outlook usage levels, with queries per user up 20% quarter-over-quarter.

April 29, 2026 · 11:20 PM

April 25, 2026

changelog · OpenAI

OpenAI discontinues separate Codex line, merges coding capabilities into GPT-5.5

OpenAI will not release a separate GPT-5.5-Codex model, according to Romain Huet. The company unified its Codex coding model with the main GPT line starting with GPT-5.4, with GPT-5.5 featuring enhanced agentic coding and computer use capabilities.

April 25, 2026 · 12:20 PM

April 23, 2026

product update · Microsoft

Microsoft pushes agentic Copilot into Word, Excel, PowerPoint with direct document editing

Microsoft has pushed agentic Copilot features into general availability across Word, Excel, and PowerPoint. The AI assistant can now make direct edits to documents, spreadsheets, and presentations rather than just suggesting changes from a sidebar.

April 23, 2026 · 4:05 PM

April 22, 2026

model release · Xiaomi +1

Xiaomi Launches MiMo-V2.5 With 1M Context Window at $0.40 per Million Input Tokens

Xiaomi released MiMo-V2.5 on April 22, 2026, a native omnimodal model with a 1,048,576 token context window. The model is priced at $0.40 per million input tokens and $2 per million output tokens, positioning it as a cost-efficient alternative for agentic applications requiring multimodal perception across image and video understanding.

April 22, 2026 · 4:36 PM

April 20, 2026

analysis

Open-weight models closing gap with frontier AI, but struggle looms in specialized domains

Open-weight AI models are narrowing the performance gap with closed frontier models in current benchmarks focused on coding and terminal tasks, but industry analysts predict they'll struggle to keep pace as the field shifts toward specialized knowledge work in accounting, law, and healthcare. The gap reduction masks a more complex dynamic where benchmark correlation with real-world performance is weakening.

April 20, 2026 · 6:36 PM

April 16, 2026

product update

Roblox Assistant adds multi-step planning mode and AI-driven playtesting to automate game development

Roblox is deploying agentic features to its Assistant tool that plan, build, and test games through multi-step workflows. The enhanced Planning Mode analyzes code, asks clarifying questions, and creates editable action plans before implementation, while new AI-driven playtesting tools automatically identify and fix bugs.

April 16, 2026 · 4:06 PM

April 15, 2026

product update

Adobe launches Firefly AI Assistant that orchestrates tasks across Creative Cloud apps

Adobe is launching Firefly AI Assistant in public beta within the coming weeks, evolving from its October 2024 "Project Moonlight" preview. The assistant orchestrates workflows across Creative Cloud applications including Photoshop, Premiere, Lightroom, Illustrator, and Express, allowing users to control outputs through text prompts, buttons, and sliders.

April 15, 2026 · 1:20 PM

April 10, 2026

product update

Google AI Mode gets redesigned interface as restaurant booking expands to 8 new countries

Google has redesigned AI Mode's prompt interface with a bottom sheet layout on mobile and expanded its agentic restaurant booking feature to 8 new markets including the UK, Canada, and Australia. The update is rolling out to the stable channel on Android and iOS.

April 10, 2026 · 5:20 PM

April 9, 2026

model release · Zhipu AI

GLM-5.1 released: 754B agentic model outperforms Claude on coding benchmarks

Zhipu AI released GLM-5.1, a 754-billion-parameter model optimized for agentic engineering tasks. The model scores 58.4% on SWE-Bench Pro, outperforming Claude 3.5 Sonnet (57.3%), and demonstrates sustained reasoning capability over hundreds of iterations.

April 9, 2026 · 6:50 PM

April 7, 2026

model release

GLM-5.1 achieves 58.4% on SWE-Bench Pro with sustained agentic reasoning over hundreds of iterations

Zhipu AI has released GLM-5.1, a 754-billion parameter model designed for agentic engineering with significantly improved coding capabilities over its predecessor. The model achieves 58.4% on SWE-Bench Pro and demonstrates sustained performance improvement over hundreds of tool calls and iterations, unlike earlier models that plateau quickly.

April 7, 2026 · 5:51 PM

April 2, 2026

model release · NVIDIA

NVIDIA Optimizes Google Gemma 4 for Local Agentic AI on RTX and Spark

NVIDIA has optimized Google's Gemma 4 models for local deployment on RTX and Spark platforms, targeting the emerging wave of on-device agentic AI. The optimization enables small, efficient models to access real-time local context for autonomous decision-making without cloud dependency.

April 2, 2026 · 4:35 PM

model release

Alibaba releases Qwen3.6-Plus with 1M token context, claims performance near Claude 4.5 Opus

Alibaba has released Qwen3.6-Plus, its third proprietary AI model in days, featuring a 1 million token context window and available via the Alibaba Cloud Model Studio API. The model claims improved agentic coding capabilities and partially outperforms Anthropic's Claude 4.5 Opus in Alibaba-conducted benchmarks, though it trails Claude 4.6 Opus, released in December 2025.

April 2, 2026 · 3:05 PM

March 31, 2026

product update · GitHub

GitHub's Copilot team uses AI agents to automate development work

GitHub's Applied Science team deployed coding agents to automate parts of their own development workflow, testing how AI agents can handle increasingly complex programming tasks. The experiment reveals practical insights into agent-driven development patterns and limitations.

March 31, 2026 · 4:05 PM

March 30, 2026

model release

Alibaba releases Qwen 3.6 Plus Preview with 1M token context, free via OpenRouter

Alibaba's Qwen division has released Qwen 3.6 Plus Preview, a free multimodal model available via OpenRouter with a 1,000,000 token context window. The model claims stronger reasoning and more reliable agentic behavior compared to the 3.5 series, with particular strength in coding and complex problem-solving tasks.

March 30, 2026 · 6:50 PM

product update · Amazon Web Services

AWS launches agentic AI movie assistant using Nova Sonic 2.0 and Bedrock AgentCore

Amazon Web Services unveiled an agentic AI system for streaming platforms combining Nova Sonic 2.0 (real-time speech model), Bedrock AgentCore, and the Model Context Protocol. The system delivers two core capabilities: context-aware movie recommendations based on mood and viewing history, and real-time scene analysis including actor identification and plot summaries.

March 30, 2026 · 3:35 PM

March 29, 2026

model release · Anthropic +1

Anthropic's unreleased Mythos model enables autonomous large-scale cyberattacks, officials warn

Anthropic is privately warning top government officials that its unreleased model "Mythos" makes large-scale cyberattacks significantly more likely in 2026. The model enables AI agents to operate autonomously with high sophistication to penetrate corporate, government and municipal systems. One official told Axios a large-scale attack could occur this year as employees unknowingly create security vulnerabilities through unsupervised agentic AI use.

March 29, 2026 · 1:05 PM

March 26, 2026

benchmark · OpenAI

ARC-AGI-3 benchmark: frontier AI models score below 1%, humans solve all 135 tasks

The ARC Prize Foundation released ARC-AGI-3, an interactive benchmark requiring AI agents to explore environments, form hypotheses, and execute plans without instructions. All 135 environments were solved by untrained humans, yet frontier models—including Gemini 3.1 Pro Preview (0.37%), GPT 5.4 (0.26%), Opus 4.6 (0.25%), and Grok-4.20 (0.00%)—scored below 1%.

March 26, 2026 · 12:05 PM

March 23, 2026

product update · NVIDIA

NVIDIA Nemotron 3 Super now available on Amazon Bedrock with 256K context window

NVIDIA Nemotron 3 Super, a hybrid Mixture of Experts model with 120B parameters and 12B active parameters, is now available as a fully managed model on Amazon Bedrock. The model supports up to 256K token context length and claims 5x higher throughput efficiency over the previous Nemotron Super and 2x higher accuracy on reasoning tasks.

March 23, 2026 · 3:23 PM

March 20, 2026

product update · OpenAI

OpenAI consolidating ChatGPT, Codex, and Atlas into single macOS superapp

OpenAI is consolidating its fragmented macOS app ecosystem by merging ChatGPT, the Codex coding platform, and the Atlas browser into a single "superapp," an effort led by Chief of Applications Fidji Simo. The unified app will feature agentic AI capabilities for autonomous task execution and team collaboration, with rollout expected over the coming months, starting with Codex enhancements.

March 20, 2026 · 2:06 PM

March 18, 2026

product update

Perplexity's Comet AI browser launches free iOS app after $200/month PC debut

Perplexity has released Comet, its AI-powered browser, as a free standalone app for iPhone users. Originally launched on PC at $200 per month, the iOS version joins the recently released Android version and the existing Windows and Mac versions. The browser combines web browsing with AI assistance for summarization, research, and task automation.

March 18, 2026 · 6:50 PM

product update · Amazon Web Services

Amazon Nova 2 Lite surpasses Nova 1 Pro with 1M token context and extended thinking at 7x lower cost

Amazon Nova 2 Lite expands its context window to 1 million tokens, introduces extended thinking with developer controls, and adds native tool use and web grounding. AWS claims Nova 2 Lite surpasses Nova 1 Pro on multi-step reasoning while costing 7x less and running up to 5x faster.

March 18, 2026 · 3:20 PM

March 10, 2026

product update · GitHub

GitHub Copilot SDK shifts AI from text prompts to executable agent workflows

GitHub has released the Copilot SDK, positioning executable agent workflows as the successor to prompt-based AI interactions. The SDK enables developers to integrate agentic AI capabilities directly into applications rather than relying on text-based prompt-response patterns.

March 10, 2026 · 8:36 PM

product update · GitHub

GitHub shifts Copilot from text prompts to programmable execution with new SDK

GitHub is positioning AI interaction as a shift from prompt-response text interfaces to programmable execution models. The company announced a GitHub Copilot SDK that enables agentic workflows to run directly within applications, marking a transition toward AI systems that take concrete actions rather than generate text responses.

March 10, 2026 · 8:35 PM

March 5, 2026

model release · OpenAI

OpenAI's GPT-5.4 now generally available in GitHub Copilot

OpenAI's GPT-5.4, an agentic coding model, is now generally available in GitHub Copilot. The model was tested on real-world software development scenarios and demonstrated improved coding capabilities.

March 5, 2026 · 11:50 PM

model release · OpenAI

OpenAI launches GPT-5.4 with native computer use capabilities for autonomous agents

OpenAI has launched GPT-5.4, its latest model with native computer use capabilities that allow it to operate computers and complete tasks across applications. The release represents a step toward autonomous AI agents that can handle complex jobs independently. The model includes advancements in reasoning, coding, and professional work with spreadsheets, documents, and presentations.

March 5, 2026 · 6:06 PM

February 20, 2026

product update

AIG deploys agentic AI system with orchestration layer for underwriting

American International Group (AIG) has deployed an agentic AI system with an orchestration layer, reporting faster-than-expected productivity gains in underwriting and portfolio management. The deployment demonstrates measurable improvements in throughput and workflow efficiency, according to recent investor disclosures.

February 20, 2026 · 4:37 AM
