Anthropic Python SDK v0.104.0 adds thinking token count estimates for streaming responses
Anthropic released version 0.104.0 of its Python SDK on May 21, 2026. The update adds support for a thinking-token-count beta feature that provides estimated token counts in thinking block deltas when streaming responses from reasoning models.
What's new
The update introduces a thinking-token-count beta feature that includes estimated token counts in thinking block deltas during streaming responses. This lets developers monitor token consumption in real time while Claude's reasoning models work through extended chains of thought.
The feature targets streaming scenarios in which Claude models with extended thinking enabled generate internal reasoning before producing their final output.
Technical details
The implementation provides token count estimates as part of the delta stream, enabling developers to:
- Track thinking token usage during active streaming
- Estimate costs for reasoning operations in real time
- Monitor and debug extended thinking processes
- Optimize prompts based on thinking token consumption
The feature is marked as beta, indicating the API may change in future releases.
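As a rough illustration of the tracking use case, the sketch below keeps a running thinking-token estimate from a sequence of stream events. It works on plain dictionaries rather than SDK objects, and the field name `estimated_tokens` is a hypothetical placeholder: the release notes summarized here do not name the field, and a beta API may rename it. The sketch also assumes the estimate is a running total per thinking block, which is not confirmed.

```python
from dataclasses import dataclass, field

# Hypothetical field name. The article does not say what the SDK calls the
# estimate, so treat this as a placeholder to adjust.
ESTIMATE_FIELD = "estimated_tokens"


@dataclass
class ThinkingTokenTracker:
    """Keeps the latest thinking-token estimate seen for each content block."""

    per_block: dict[int, int] = field(default_factory=dict)

    def observe(self, event: dict) -> None:
        # Only thinking deltas are relevant; every other event is ignored.
        if event.get("type") != "content_block_delta":
            return
        delta = event.get("delta") or {}
        if delta.get("type") != "thinking_delta":
            return
        estimate = delta.get(ESTIMATE_FIELD)
        # The feature is beta, so tolerate deltas that carry no estimate.
        if isinstance(estimate, int):
            # Assumption: the value is a running total for this block,
            # so the newest value replaces the previous one.
            self.per_block[event.get("index", 0)] = estimate

    @property
    def total(self) -> int:
        """Sum of the latest estimates across all thinking blocks."""
        return sum(self.per_block.values())


# Hand-written sample events standing in for a real stream.
sample_events = [
    {"type": "content_block_start", "index": 0},
    {"type": "content_block_delta", "index": 0,
     "delta": {"type": "thinking_delta", "thinking": "Let me", ESTIMATE_FIELD: 2}},
    {"type": "content_block_delta", "index": 0,
     "delta": {"type": "thinking_delta", "thinking": " check", ESTIMATE_FIELD: 5}},
    {"type": "content_block_delta", "index": 1,
     "delta": {"type": "text_delta", "text": "Answer"}},
]

tracker = ThinkingTokenTracker()
for sample_event in sample_events:
    tracker.observe(sample_event)
print(tracker.total)
```

With the real SDK, stream events arrive as typed objects rather than dictionaries, so they would need to be converted or read by attribute before being passed to a tracker like this one.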
Version information
- Version: 0.104.0
- Release date: May 21, 2026
- Type: Minor version update
- Full changelog: Available at github.com/anthropics/anthropic-sdk-python/compare/v0.103.1...v0.104.0
What this means
This update addresses an observability gap for developers using Claude's reasoning capabilities. Previously, tracking token usage in thinking blocks required waiting for the complete response. With streamed token count estimates, developers can monitor cost and performance in real time, which matters most for applications using extended thinking, where reasoning tokens can significantly exceed output tokens. The beta designation suggests Anthropic is still refining how thinking token metrics are calculated and surfaced to developers.
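To make the cost-monitoring point concrete, here is a minimal sketch that turns a running thinking-token estimate into a dollar figure and checks it against a budget. The function names are hypothetical, the rate is supplied by the caller rather than taken from any price list, and the sketch assumes thinking tokens are billed at the model's output-token rate.

```python
def estimate_thinking_cost(thinking_tokens: int, usd_per_million_tokens: float) -> float:
    """Convert a thinking-token estimate into dollars at a caller-supplied rate."""
    return thinking_tokens / 1_000_000 * usd_per_million_tokens


def over_budget(thinking_tokens: int, usd_per_million_tokens: float, budget_usd: float) -> bool:
    """Return True once the estimated thinking cost exceeds the budget."""
    return estimate_thinking_cost(thinking_tokens, usd_per_million_tokens) > budget_usd


# Illustrative numbers only: 40,000 estimated thinking tokens at an assumed
# rate of $15 per million tokens, checked against a $0.50 budget.
if over_budget(40_000, 15.0, 0.50):
    print("thinking budget exceeded; consider stopping the stream")
```

Because the streamed counts are estimates, a check like this is better suited to early warnings than to billing reconciliation, which should rely on the final usage reported with the completed response.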