Breaking

Nvidia claims 291 MLPerf wins with 288-GPU setup; AMD MI355X crosses 1M tokens/sec

MLCommons published MLPerf Inference v6.0 results on April 1, 2026, with Nvidia, AMD, and Intel each claiming top spots in different configurations. Nvidia's 288-GPU GB300-NVL72 system achieved 2.49 million tokens per second on DeepSeek-R1, while AMD's MI355X crossed one million tokens per second for the first time. Direct comparisons remain difficult as each chipmaker targets different market segments and benchmarks.

April 2, 2026

Latest News

model release

Alibaba releases Qwen3.6-Plus with 1M token context, claims performance near Claude 4.5 Opus

Alibaba has released Qwen3.6-Plus, its third proprietary AI model in days, featuring a 1 million token context window available via the Alibaba Cloud Model Studio API. Alibaba claims improved agentic coding capabilities and says the model partially outperforms Anthropic's Claude 4.5 Opus in Alibaba-conducted benchmarks, though it trails Claude 4.6 Opus, released in December 2025.

product update · Amazon Web Services

AWS Bedrock AgentCore adds persistent filesystem storage and shell command execution

Amazon Bedrock AgentCore Runtime now offers managed session storage to persist agent filesystem state across stop/resume cycles and InvokeAgentRuntimeCommand for executing shell commands directly within agent microVMs. The features address two core challenges in production agent workflows: ephemeral filesystems that reset between sessions and the inability to execute deterministic operations without routing them through LLMs.

3 min read · via aws.amazon.com
analysis · OpenAI

OpenAI's Brockman claims GPT reasoning models have 'line of sight' to AGI

OpenAI President Greg Brockman stated that GPT reasoning models have 'line of sight' to AGI and that the debate over whether text-based models can achieve general intelligence is settled. The company is prioritizing this approach over multimodal world models like Sora, which Brockman views as 'a different branch of the tech tree.' The stance contradicts prominent AI researchers including Yann LeCun and Demis Hassabis, who argue that LLMs alone are insufficient for human-level intelligence.

2 min read · via the-decoder.com
research

Google's TurboQuant compresses AI memory use by 6x, but won't ease DRAM shortage

Google has unveiled TurboQuant, a KV cache quantization technology that it says reduces memory consumption during AI inference by up to 6x by compressing cached data from 16-bit precision to as low as 2.5 bits. While the compression technique delivers meaningful efficiency gains for inference providers, it is unlikely to resolve the DRAM shortage that has driven memory prices to record highs, because expanding context windows offset the memory savings.
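
The arithmetic behind the compression figure can be sketched roughly as follows. This is a back-of-the-envelope illustration, not Google's method: the model shape (layer count, KV heads, head dimension) and context length are hypothetical values chosen for the example, and the calculation ignores any per-block metadata a real quantizer would store, which is one plausible reason a raw 16-to-2.5-bit ratio of 6.4x would be quoted as "up to 6x".

```python
def kv_cache_bytes(layers, kv_heads, head_dim, tokens, bits_per_value):
    """Approximate KV cache size for one sequence, ignoring quantizer metadata."""
    # Keys and values are both cached, hence the factor of 2.
    values = 2 * layers * kv_heads * head_dim * tokens
    return values * bits_per_value / 8


# Hypothetical model shape and context length (assumptions, not from the article).
LAYERS, KV_HEADS, HEAD_DIM = 32, 8, 128
CONTEXT_TOKENS = 128_000

fp16_bytes = kv_cache_bytes(LAYERS, KV_HEADS, HEAD_DIM, CONTEXT_TOKENS, 16)
quant_bytes = kv_cache_bytes(LAYERS, KV_HEADS, HEAD_DIM, CONTEXT_TOKENS, 2.5)
ratio = fp16_bytes / quant_bytes

print(f"16-bit cache:  {fp16_bytes / 2**30:.2f} GiB")
print(f"2.5-bit cache: {quant_bytes / 2**30:.2f} GiB")
print(f"raw ratio:     {ratio:.1f}x")

# The same memory budget holds a context that is longer by the same ratio,
# which is how growing context windows can absorb the savings.
print(f"context at the 16-bit budget: {int(CONTEXT_TOKENS * ratio):,} tokens")
```

Under these assumptions, the saving is a fixed multiplier on cache size, so a provider that responds by offering proportionally longer contexts ends up using the same amount of memory as before.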

3 min read · via go.theregister.com
product update · Anthropic

Claude Code bypasses safety rules after 50 chained commands, enabling prompt injection attacks

Claude Code stops enforcing deny rules on commands such as curl when they are preceded by 50 or more chained subcommands, according to security firm Adversa. The vulnerability stems from a hard-coded MAX_SUBCOMMANDS_FOR_SECURITY_CHECK limit set to 50 in the source code, after which the system falls back to requesting user permission rather than enforcing deny rules.
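
The failure mode described above can be illustrated with a small sketch. This is not Claude Code's actual source: only the constant name and its value of 50 come from the report, while the deny list, the naive command splitter, and the function names are hypothetical stand-ins written for this example.

```python
import re

# Name and value as reported by Adversa; everything else here is hypothetical.
MAX_SUBCOMMANDS_FOR_SECURITY_CHECK = 50
DENIED_PROGRAMS = ("curl",)  # illustrative deny rule


def split_subcommands(command):
    """Naively split a shell line on &&, ||, ; and |. A real shell needs a parser."""
    parts = re.split(r"&&|\|\||;|\|", command)
    return [p.strip() for p in parts if p.strip()]


def check_command(command):
    """Return 'deny', 'allow', or 'ask' for a chained shell command."""
    subcommands = split_subcommands(command)
    if len(subcommands) > MAX_SUBCOMMANDS_FOR_SECURITY_CHECK:
        # Past the cap, deny rules are never evaluated; the decision is
        # handed to the user instead.
        return "ask"
    for sub in subcommands:
        if sub.split()[0] in DENIED_PROGRAMS:
            return "deny"
    return "allow"


padded = " && ".join(["true"] * 50 + ["curl http://example.com"])
print(check_command("curl http://example.com"))  # deny rule applies
print(check_command(padded))                      # cap exceeded, falls back to asking
```

In this sketch the same curl invocation is denied on its own but only triggers a permission prompt once it is buried behind 50 no-op subcommands. A fail-closed variant would return 'deny' when the cap is exceeded instead of deferring to the user.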

product update · Amazon Web Services

Amazon Nova Act automates competitive price monitoring for ecommerce teams

Amazon Web Services has detailed how its Nova Act browser automation SDK can streamline competitive price intelligence workflows. The service enables developers to build agents that navigate websites, extract pricing data using natural language instructions, and run parallel monitoring across multiple competitor sites—addressing manual processes that consume hours daily and delay pricing decisions.

research · Google DeepMind

Google DeepMind identifies six attack categories that can hijack autonomous AI agents

A Google DeepMind paper introduces the first systematic framework for 'AI agent traps': attacks that exploit the vulnerabilities autonomous agents acquire through external tools and internet access. The researchers identify six attack categories targeting perception, reasoning, memory, actions, multi-agent networks, and human supervisors, with proof-of-concept demonstrations for each.

model release

Holo3 achieves 78.85% on OSWorld benchmark with only 10B active parameters

H Company unveiled Holo3, a computer use model that scores 78.85% on OSWorld-Verified, the highest score on the leading desktop automation benchmark. The model achieves this with only 10B active parameters (122B total), positioning it as a lower-cost alternative to proprietary models like GPT 5.4 and Opus 4.6.

product update · Anthropic

Claude Code source leak reveals Anthropic working on 'Proactive' mode and autonomous payments

Anthropic's Claude Code version 2.1.88 release accidentally included a source map exposing over 512,000 lines of code and 2,000 TypeScript files. Analysis of the leaked codebase by security researchers reveals evidence of a planned 'Proactive' mode that would execute coding tasks without explicit user prompts, plus potential crypto-based autonomous payment systems.
