Breaking

GitHub Copilot CLI adds Rubber Duck for second-opinion AI suggestions

GitHub has added Rubber Duck to Copilot CLI. The feature consults a different AI model family to produce an alternative suggestion, letting developers get a second opinion on code directly from the command line.

April 6, 2026

Latest News

analysis · Anthropic

AMD AI director reports Claude Code performance degradation since March update

Stella Laurenzo, director of AI at AMD, filed a GitHub issue documenting significant performance degradation in Claude Code since early March, specifically following the deployment of thinking content redaction in version 2.1.69. Analysis of 6,852 sessions with 234,760 tool calls shows stop-hook violations increased from zero to 10 per day, while code-reading behavior dropped from 6.6 reads to 2 reads per session.

3 min read · via go.theregister.com

product update · Anthropic

Anthropic blocks Claude subscriptions for OpenClaw, citing capacity constraints

Anthropic has disallowed subscription-based pricing for users accessing Claude through open-source agentic tools like OpenClaw, effective April 4, 2026. The restriction comes as the company faces elevated service errors and struggles to balance capacity with demand. Third-party tool usage will now draw from pay-per-token rates instead of subscription limits.

3 min read · via go.theregister.com

product update · OpenAI

OpenAI launches ChatGPT app integrations with DoorDash, Spotify, Uber, and 10+ others

OpenAI has expanded ChatGPT with native app integrations allowing users to connect accounts from Spotify, DoorDash, Uber, Booking.com, Canva, Figma, Coursera, Expedia, Target, and others. Users can request actions like meal planning with DoorDash grocery delivery, playlist creation with Spotify, and hotel bookings through Booking.com directly within ChatGPT. The feature requires account authentication and data sharing; users can disconnect any integrated app from Settings.

2 min read · via techcrunch.com

model release · Google DeepMind

Google DeepMind releases Gemma 4 family: multimodal models from 2.3B to 31B parameters with 256K context

Google DeepMind released the Gemma 4 family of open-weights multimodal models in four sizes: E2B (2.3B effective parameters), E4B (4.5B effective), 26B A4B (3.8B active parameters), and 31B dense. All models support text and image input with 128K-256K context windows; E2B and E4B add native audio capabilities. Models feature reasoning modes, function calling, and multilingual support across 140+ languages.

research

Alibaba's HopChain framework fixes vision model failures in multi-step reasoning tasks

Researchers from Alibaba's Qwen team and Tsinghua University developed HopChain, a framework that automatically generates multi-step image questions to correct how vision-language models break down on complex reasoning tasks. The method improved scores on 20 of 24 tested benchmarks by forcing models to re-examine the image at each reasoning step, preventing early perceptual errors from cascading through subsequent steps.
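
The re-examination idea can be sketched as a loop that re-attaches the image at every reasoning hop, so a perception error made early can be corrected later instead of cascading. This is an illustration of the reported mechanism only; the function names and the `vlm` callable are hypothetical, not HopChain's actual API:

```python
def multi_hop_answer(vlm, image, hop_questions):
    """Answer a chain of sub-questions, re-presenting the image at each hop
    so the model re-examines it instead of trusting an earlier (possibly
    wrong) perception. `vlm` is any callable (image, prompt) -> str.
    Hypothetical sketch; not HopChain's real interface."""
    context = []
    answer = ""
    for question in hop_questions:
        prompt = "\n".join(context + [question])
        answer = vlm(image, prompt)          # image attached on every hop
        context.append(f"Q: {question}\nA: {answer}")
    return answer
```

The contrast is with a single-pass setup, where the image is encoded once and every later step reasons only over the first description of it.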

research

AI offensive cyber capabilities doubling every 5.7 months since 2024, study finds

AI offensive cybersecurity capabilities are accelerating faster than previously measured. Lyptus Research's new study finds the doubling time has compressed from 9.8 months (since 2019) to 5.7 months (since 2024), with GPT-5.3 Codex and Opus 4.6 now solving tasks at 50% success rates that would take human security experts three hours.
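
The shift in doubling time compounds quickly. A back-of-the-envelope calculation (assuming clean exponential growth, which is the trend framing reported above) shows what each regime implies over a single year:

```python
def growth_over(months: float, doubling_time_months: float) -> float:
    """Capability growth factor accumulated over `months`,
    assuming exponential growth with the given doubling time."""
    return 2 ** (months / doubling_time_months)

# Implied growth over one year under each measured trend.
pre_2024 = growth_over(12, 9.8)   # ≈ 2.3x per year
post_2024 = growth_over(12, 5.7)  # ≈ 4.3x per year
```

At a 9.8-month doubling time, measured capability grows roughly 2.3x per year; at 5.7 months, roughly 4.3x.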

research

Google study: AI benchmarks need 10+ human raters per example, not standard 3-5

A Google Research and Rochester Institute of Technology study reveals that standard AI benchmarking practices using three to five human evaluators per test example systematically underestimate human disagreement and produce unreliable model comparisons. The researchers found that at least ten raters per example are needed for statistically reliable results, and that budget allocation between test examples and raters matters as much as total budget size.
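
One way to see why small rater panels mislead: with few raters per example, genuinely contested examples often look unanimous, so measured disagreement is biased low. A small simulation (my own illustration, not the study's methodology) with raters who independently prefer one answer 70% of the time:

```python
import random

def unanimity_rate(n_raters: int, p_prefer: float,
                   trials: int = 100_000, seed: int = 0) -> float:
    """Fraction of simulated examples on which all raters agree, when each
    rater independently prefers option A with probability p_prefer."""
    rng = random.Random(seed)
    unanimous = 0
    for _ in range(trials):
        votes = [rng.random() < p_prefer for _ in range(n_raters)]
        if all(votes) or not any(votes):
            unanimous += 1
    return unanimous / trials

# On a 70/30 contested example, 3 raters look unanimous about 37% of the
# time; 10 raters almost never do (analytically p**n + (1-p)**n).
```

So a three-rater panel routinely reports "no disagreement" on examples where nearly a third of raters would dissent, which is exactly the underestimation the study describes.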

research

Alibaba's Qwen team develops algorithm that more than doubles reasoning chain length in math problems

Alibaba's Qwen team has developed Future-KL Influenced Policy Optimization (FIPO), a training algorithm that weights each token by its influence on subsequent reasoning steps rather than treating all tokens equally. Testing on Qwen2.5-32B-Base showed reasoning chains growing from ~4,000 to 10,000+ tokens, with AIME 2024 accuracy improving from 50% to 58%, outperforming Deepseek-R1-Zero-Math-32B (47%) and OpenAI's o1-mini (56%). The team plans to open-source the system.
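
The core departure from standard policy-gradient training, as described, is per-token weighting. A minimal sketch of such a loss, assuming the influence scores are given (how FIPO actually derives them from future-KL terms is not detailed in the report):

```python
def influence_weighted_pg_loss(logprobs, advantages, influence):
    """Policy-gradient loss where each sampled token is weighted by a score
    reflecting its influence on later reasoning steps, rather than uniformly.
    All arguments are equal-length lists of floats; `influence` is a
    hypothetical stand-in for FIPO's future-KL-derived weights."""
    mean_inf = sum(influence) / len(influence)
    weights = [w / mean_inf for w in influence]    # normalize to mean 1
    per_token = [-w * adv * lp
                 for w, adv, lp in zip(weights, advantages, logprobs)]
    return sum(per_token) / len(per_token)

# With uniform influence this reduces to the standard mean REINFORCE loss;
# non-uniform influence concentrates gradient signal on pivotal tokens.
```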
