agentic-ai

31 articles tagged with agentic-ai

May 8, 2026
model release · Tencent

Tencent Releases Hy3 Preview: Mixture-of-Experts Model with 262K Context and Configurable Reasoning

Tencent has released Hy3 preview, a Mixture-of-Experts model with a 262,144-token context window, priced at $0.066 per million input tokens and $0.26 per million output tokens. The model offers three configurable reasoning modes (disabled, low, and high) and is designed for agentic workflows and production environments.
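To put the quoted Hy3 preview pricing in concrete terms, here is a minimal sketch of a per-request cost estimate. The function name `estimate_cost` is hypothetical, and the check that input and output tokens share the 262,144-token window is an assumption, not something the announcement states.

```python
# Prices quoted above, in US dollars per one million tokens.
INPUT_PRICE_PER_M = 0.066
OUTPUT_PRICE_PER_M = 0.26
CONTEXT_WINDOW = 262_144  # tokens


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in USD for a single request.

    Assumption: input and output tokens together must fit in the
    context window. The source does not say how the window is split.
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    if input_tokens + output_tokens > CONTEXT_WINDOW:
        raise ValueError("request exceeds the 262,144-token context window")
    total = input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M
    return total / 1_000_000


if __name__ == "__main__":
    # A long agentic turn: 200,000 tokens in, 4,000 tokens out.
    print(f"${estimate_cost(200_000, 4_000):.5f}")
```

Under these assumptions, a request with 200,000 input tokens and 4,000 output tokens would cost roughly $0.014, with input accounting for most of it.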

May 7, 2026
product update · GitHub

GitHub Reduces Token Usage in Copilot Agentic Workflows Running on Pull Requests

GitHub has optimized token usage in its production agentic workflows that run on every pull request. The company instrumented its own Copilot workflows to identify inefficiencies and built agents to address them, aiming to reduce accumulated API costs.

May 6, 2026
research · GitHub

GitHub introduces dominance analysis method for validating AI coding agents

GitHub has published a research approach for validating AI coding agents when traditional correctness testing breaks down. The company proposes dominance analysis as an alternative to brittle scripts and black-box LLM judges for building what it calls a 'Trust Layer' for GitHub Copilot Coding Agents.

research · GitHub

GitHub develops dominance analysis method to validate AI coding agent outputs without deterministic correctness

GitHub has published research on validating agentic AI behavior when there's no single "correct" answer. The company proposes dominance analysis as an alternative to brittle scripts or opaque LLM-as-judge approaches for building a trust layer in GitHub Copilot coding agents.

May 1, 2026
product update

Perplexity's Mac-Native 'Personal Computer' Platform Claims $2.8B in Labor-Equivalent Work

Perplexity CEO Aravind Srinivas revealed that the company's Mac-native Personal Computer platform has performed more than $2.8B in labor-equivalent work for Pro, Max, and Enterprise subscribers since launch. The announcement follows Apple CFO Kevan Parekh citing Perplexity as an example of developers building enterprise-grade AI assistants on Mac during Apple's Q2 2026 earnings call.

April 29, 2026
product update · Microsoft

Microsoft reports 20M paid Copilot users, weekly engagement now matches Outlook

Microsoft CEO Satya Nadella disclosed that M365 Copilot has reached 20 million paid enterprise seats during the company's quarterly earnings call. Weekly engagement now matches Outlook usage levels, with queries per user up 20% quarter-over-quarter.

April 25, 2026
changelog · OpenAI

OpenAI discontinues separate Codex line, merges coding capabilities into GPT-5.5

OpenAI will not release a separate GPT-5.5-Codex model, according to Romain Huet. The company unified its Codex coding model with the main GPT line starting with GPT-5.4, with GPT-5.5 featuring enhanced agentic coding and computer use capabilities.

April 23, 2026
product update · Microsoft

Microsoft pushes agentic Copilot into Word, Excel, PowerPoint with direct document editing

Microsoft has pushed agentic Copilot features into general availability across Word, Excel, and PowerPoint. The AI assistant can now make direct edits to documents, spreadsheets, and presentations rather than just suggesting changes from a sidebar.

April 22, 2026
model release · Xiaomi +1

Xiaomi Launches MiMo-V2.5 With 1M Context Window at $0.40 per Million Input Tokens

Xiaomi released MiMo-V2.5 on April 22, 2026, a native omnimodal model with a 1,048,576 token context window. The model is priced at $0.40 per million input tokens and $2 per million output tokens, positioning it as a cost-efficient alternative for agentic applications requiring multimodal perception across image and video understanding.

April 20, 2026
analysis

Open-weight models closing gap with frontier AI, but struggle looms in specialized domains

Open-weight AI models are narrowing the performance gap with closed frontier models in current benchmarks focused on coding and terminal tasks, but industry analysts predict they'll struggle to keep pace as the field shifts toward specialized knowledge work in accounting, law, and healthcare. The gap reduction masks a more complex dynamic where benchmark correlation with real-world performance is weakening.

April 16, 2026
product update

Roblox Assistant adds multi-step planning mode and AI-driven playtesting to automate game development

Roblox is deploying agentic features to its Assistant tool that plan, build, and test games through multi-step workflows. The enhanced Planning Mode analyzes code, asks clarifying questions, and creates editable action plans before implementation, while new AI-driven playtesting tools automatically identify and fix bugs.

April 15, 2026
product update

Adobe launches Firefly AI Assistant that orchestrates tasks across Creative Cloud apps

Adobe is launching Firefly AI Assistant in public beta within the coming weeks, evolving from its October 2025 "Project Moonlight" preview. The assistant orchestrates workflows across Creative Cloud applications including Photoshop, Premiere, Lightroom, Illustrator, and Express, allowing users to control outputs through text prompts, buttons, and sliders.

April 10, 2026
product update

Google AI Mode gets redesigned interface as restaurant booking expands to 8 new countries

Google has redesigned AI Mode's prompt interface with a bottom sheet layout on mobile and expanded its agentic restaurant booking feature to 8 new markets, including the UK, Canada, and Australia. The update is rolling out to the stable channel on Android and iOS.

April 9, 2026
model release · Zhipu AI

GLM-5.1 released: 754B agentic model outperforms Claude on coding benchmarks

Zhipu AI released GLM-5.1, a 754-billion-parameter model optimized for agentic engineering tasks. The model scores 58.4% on SWE-Bench Pro, outperforming Claude 3.5 Sonnet (57.3%), and demonstrates sustained reasoning capability over hundreds of iterations.

April 7, 2026
model release

GLM-5.1 achieves 58.4% on SWE-Bench Pro with sustained agentic reasoning over hundreds of iterations

Zhipu AI has released GLM-5.1, a 754-billion-parameter model designed for agentic engineering with significantly improved coding capabilities over its predecessor. The model achieves 58.4% on SWE-Bench Pro and keeps improving over hundreds of tool calls and iterations, unlike earlier models that plateau quickly.

April 2, 2026
model release · NVIDIA

NVIDIA Optimizes Google Gemma 4 for Local Agentic AI on RTX and Spark

NVIDIA has optimized Google's Gemma 4 models for local deployment on RTX and Spark platforms, targeting the emerging wave of on-device agentic AI. The optimization enables small, efficient models to access real-time local context for autonomous decision-making without cloud dependency.

model release

Alibaba releases Qwen3.6-Plus with 1M token context, claims performance near Claude 4.5 Opus

Alibaba has released Qwen3.6-Plus, its third proprietary AI model in days, featuring a 1 million token context window and available via the Alibaba Cloud Model Studio API. The model claims improved agentic coding capabilities and partially outperforms Anthropic's Claude 4.5 Opus in Alibaba-conducted benchmarks, though it trails Claude 4.6 Opus, released in December 2025.

March 31, 2026
product update · GitHub

GitHub's Copilot team uses AI agents to automate development work

GitHub's Applied Science team deployed coding agents to automate parts of their own development workflow, testing how AI agents can handle increasingly complex programming tasks. The experiment reveals practical insights into agent-driven development patterns and limitations.

March 30, 2026
model release

Alibaba releases Qwen 3.6 Plus Preview with 1M token context, free via OpenRouter

Alibaba's Qwen division has released Qwen 3.6 Plus Preview, a free multimodal model available via OpenRouter with a 1,000,000 token context window. The model claims stronger reasoning and more reliable agentic behavior compared to the 3.5 series, with particular strength in coding and complex problem-solving tasks.

product update · Amazon Web Services

AWS launches agentic AI movie assistant using Nova Sonic 2.0 and Bedrock AgentCore

Amazon Web Services unveiled an agentic AI system for streaming platforms combining Nova Sonic 2.0 (real-time speech model), Bedrock AgentCore, and the Model Context Protocol. The system delivers two core capabilities: context-aware movie recommendations based on mood and viewing history, and real-time scene analysis including actor identification and plot summaries.

March 29, 2026
model release · Anthropic +1

Anthropic's unreleased Mythos model enables autonomous large-scale cyberattacks, officials warn

Anthropic is privately warning top government officials that its unreleased model "Mythos" makes large-scale cyberattacks significantly more likely in 2026. The model enables AI agents to operate autonomously with high sophistication to penetrate corporate, government and municipal systems. One official told Axios a large-scale attack could occur this year as employees unknowingly create security vulnerabilities through unsupervised agentic AI use.

March 26, 2026
benchmark · OpenAI

ARC-AGI-3 benchmark: frontier AI models score below 1%, humans solve all 135 tasks

The ARC Prize Foundation released ARC-AGI-3, an interactive benchmark requiring AI agents to explore environments, form hypotheses, and execute plans without instructions. All 135 environments were solved by untrained humans, yet frontier models, including Gemini 3.1 Pro Preview (0.37%), GPT-5.4 (0.26%), Opus 4.6 (0.25%), and Grok-4.20 (0.00%), all scored below 1%.

March 23, 2026
product update · NVIDIA

NVIDIA Nemotron 3 Super now available on Amazon Bedrock with 256K context window

NVIDIA Nemotron 3 Super, a hybrid Mixture of Experts model with 120B parameters and 12B active parameters, is now available as a fully managed model on Amazon Bedrock. The model supports up to 256K token context length and claims 5x higher throughput efficiency over the previous Nemotron Super and 2x higher accuracy on reasoning tasks.

March 20, 2026
product update · OpenAI

OpenAI consolidating ChatGPT, Codex, and Atlas into single macOS superapp

OpenAI is consolidating its fragmented macOS app ecosystem by merging ChatGPT, the Codex coding platform, and the Atlas browser into a single "superapp," an effort led by Chief of Applications Fidji Simo. The unified app will feature agentic AI capabilities for autonomous task execution and team collaboration, with rollout expected over the coming months, starting with Codex enhancements.

March 18, 2026
product update

Perplexity's Comet AI browser launches free iOS app after $200/month PC debut

Perplexity has released Comet, its AI-powered browser, as a free standalone app for iPhone users. Originally launched on PC at $200 per month, Comet now has an iOS version alongside the recently released Android app and the existing Windows and Mac versions. The browser combines web browsing with AI assistance for summarization, research, and task automation.

product update · Amazon Web Services

Amazon Nova 2 Lite surpasses Nova 1 Pro with 1M token context and extended thinking at 7x lower cost

Amazon Nova 2 Lite expands its context window to 1 million tokens, introduces extended thinking with developer controls, and adds native tool use and web grounding. AWS claims Nova 2 Lite surpasses Nova 1 Pro on multi-step reasoning while costing 7x less and running up to 5x faster.

March 10, 2026
product update · GitHub

GitHub Copilot SDK shifts AI from text prompts to executable agent workflows

GitHub has released the Copilot SDK, positioning executable agent workflows as the successor to prompt-based AI interactions. The SDK enables developers to integrate agentic AI capabilities directly into applications rather than relying on text-based prompt-response patterns.

product update · GitHub

GitHub shifts Copilot from text prompts to programmable execution with new SDK

GitHub is positioning AI interaction as a shift from prompt-response text interfaces to programmable execution models. The company announced a GitHub Copilot SDK that enables agentic workflows to run directly within applications, marking a transition toward AI systems that take concrete actions rather than generate text responses.

March 5, 2026
model release · OpenAI

OpenAI's GPT-5.4 now generally available in GitHub Copilot

OpenAI's GPT-5.4, an agentic coding model, is now generally available in GitHub Copilot. The model was tested on real-world software development scenarios and demonstrated improved coding capabilities.

model release · OpenAI

OpenAI launches GPT-5.4 with native computer use capabilities for autonomous agents

OpenAI has launched GPT-5.4, its latest model with native computer use capabilities that allow it to operate computers and complete tasks across applications. The release represents a step toward autonomous AI agents that can handle complex jobs independently. The model includes advancements in reasoning, coding, and professional work with spreadsheets, documents, and presentations.

February 20, 2026
product update

AIG deploys agentic AI system with orchestration layer for underwriting

American International Group (AIG) has deployed an agentic AI system with an orchestration layer, reporting faster-than-expected productivity gains in underwriting and portfolio management. The deployment demonstrates measurable improvements in throughput and workflow efficiency, according to recent investor disclosures.