ai-research
7 articles tagged with ai-research
AI agent skills fail in real-world conditions, researchers find testing 34,000 skills
A large-scale study testing 34,198 real-world skills reveals that AI agent performance drops sharply when moving from curated benchmarks to realistic conditions. Claude Opus 4.6 saw pass rates fall from 55.4% with hand-selected skills to 38.4% in truly realistic scenarios, while weaker models like Kimi K2.5 actually performed below their no-skill baseline.
OpenAI's Brockman claims GPT reasoning models have 'line of sight' to AGI
OpenAI President Greg Brockman stated that GPT reasoning models have a 'line of sight' to AGI, calling the debate over whether text-based models can achieve general intelligence settled. The company is prioritizing this approach over multimodal world models like Sora, which Brockman views as 'a different branch of the tech tree.' The stance contradicts prominent AI researchers including Yann LeCun and Demis Hassabis, who argue LLMs alone are insufficient for human-level intelligence.
Meta's hyperagents learn to improve their own improvement mechanisms across multiple domains
Researchers at Meta, University of British Columbia, and partner institutions have developed hyperagents—AI systems that optimize both their task performance and the mechanisms controlling their self-improvement. Unlike previous self-improvement approaches locked to coding tasks, DGM-Hyperagents (DGM-H) demonstrate significant gains across four domains and can transfer improvement strategies to entirely new tasks.
Nvidia to spend $26B on open-weight AI models, filing reveals
Nvidia will invest $26 billion over the next five years to build open-weight AI models, according to a 2025 financial filing confirmed by executives. The move signals a strategic shift from chipmaker to AI frontier lab, with the company releasing Nemotron 3 Super (128B parameters) and claiming it outperforms GPT-OSS on multiple benchmarks.
Yann LeCun's AMI Labs raises $1.03B to develop world models
AMI Labs, cofounded by Turing Award winner Yann LeCun, has raised $1.03 billion at a $3.5 billion pre-money valuation. The funding will support the company's effort to develop world models, marking a major commitment to foundational AI research outside of the existing tech giants.
Apple Research Identifies 'Text-Speech Understanding Gap' Limiting LLM Speech Performance
Apple researchers have identified a fundamental limitation in speech-adapted large language models: they consistently underperform their text-based counterparts on language understanding tasks. The team terms this the 'text-speech understanding gap' and documents that speech-adapted LLMs lag behind both their original text versions and cascaded speech-to-text pipelines.
DeepMind proposes AI should assign humans busywork to maintain job skills
A new Google DeepMind research paper on AI agent delegation recommends that AI systems occasionally assign humans tasks they could easily complete themselves. The goal: preventing workforce skill atrophy as automation increases.