SageMaker

6 articles tagged with SageMaker

May 20, 2026
product update · Amazon Web Services

AWS SageMaker AI adds bidirectional streaming for real-time speech transcription with vLLM

Amazon SageMaker AI has launched bidirectional streaming support for real-time inference, enabling WebSocket-based voice applications through vLLM integration. The feature uses HTTP/2 on port 8443 to bridge client connections with vLLM's Realtime API, allowing audio to stream in while transcription streams back simultaneously over a single persistent connection.
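The full-duplex pattern described above, where audio goes up while transcripts come back, can be illustrated without any network code. The following is a minimal local simulation using only Python's asyncio. It is not the SageMaker or vLLM API: the two queues stand in for the two directions of one connection, and `fake_transcriber` is a hypothetical placeholder for the server side.

```python
import asyncio
from typing import List, Optional


async def send_audio(uplink: asyncio.Queue, chunks: List[bytes]) -> None:
    """Client side: push audio chunks upstream, then signal end of stream."""
    for chunk in chunks:
        await uplink.put(chunk)
    await uplink.put(None)  # sentinel: no more audio


async def fake_transcriber(uplink: asyncio.Queue, downlink: asyncio.Queue) -> None:
    """Hypothetical stand-in for the server: emits one result per chunk received."""
    while True:
        chunk: Optional[bytes] = await uplink.get()
        if chunk is None:
            break
        await downlink.put(f"partial transcript for {len(chunk)} bytes")
    await downlink.put(None)  # sentinel: no more transcripts


async def receive_transcripts(downlink: asyncio.Queue) -> List[str]:
    """Client side: collect results while audio is still being sent."""
    results: List[str] = []
    while True:
        text: Optional[str] = await downlink.get()
        if text is None:
            return results
        results.append(text)


async def main() -> List[str]:
    # Bounded queues force the sender to wait for the consumer,
    # so sending and receiving interleave instead of running back to back.
    uplink: asyncio.Queue = asyncio.Queue(maxsize=1)
    downlink: asyncio.Queue = asyncio.Queue(maxsize=1)
    chunks = [b"\x00" * 320 for _ in range(3)]  # made-up silent audio frames
    _, _, transcripts = await asyncio.gather(
        send_audio(uplink, chunks),
        fake_transcriber(uplink, downlink),
        receive_transcripts(downlink),
    )
    return transcripts


transcripts = asyncio.run(main())
print(transcripts)
```

In a real client the two queues would be replaced by the send and receive halves of the streaming connection. The structure of three concurrent tasks stays the same.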

May 4, 2026
product update · Amazon Web Services

AWS launches agent-guided workflows in SageMaker AI to automate model fine-tuning

Amazon Web Services has released agent-guided workflows in SageMaker AI that use AI coding agents to automate model customization. The feature includes nine pre-built skills covering use case definition, data preparation, fine-tuning technique selection (SFT, DPO, RLVR), evaluation, and deployment to Amazon Bedrock or SageMaker endpoints.

product update

AWS SageMaker adds automatic instance fallback to prevent GPU capacity failures

Amazon SageMaker AI now supports capacity-aware instance pools that automatically try alternative GPU instance types when primary choices lack capacity. The feature works across endpoint creation, autoscaling, and scale-in operations, eliminating the manual retry cycles that previously left endpoints stuck in failed states.
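The fallback idea can be sketched as a priority-ordered search over candidate instance types. This is an illustration of the general technique only, not the SageMaker API: the function `pick_instance_type`, the `has_capacity` callback, and the capacity snapshot are all hypothetical.

```python
from typing import Callable, Optional, Sequence


def pick_instance_type(
    candidates: Sequence[str],
    has_capacity: Callable[[str], bool],
) -> Optional[str]:
    """Return the first candidate with capacity, in priority order, or None."""
    for instance_type in candidates:
        if has_capacity(instance_type):
            return instance_type
    return None


# Hypothetical pool (highest priority first) and capacity snapshot.
pool = ["ml.p5.48xlarge", "ml.p4d.24xlarge", "ml.g5.48xlarge"]
available = {"ml.p4d.24xlarge", "ml.g5.48xlarge"}

chosen = pick_instance_type(pool, lambda t: t in available)
print(chosen)  # ml.p4d.24xlarge
```

Here the first choice has no capacity, so the search falls through to the second entry instead of failing outright.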

April 28, 2026
model release · NVIDIA

NVIDIA Nemotron 3 Nano Omni: 30B-parameter multimodal model launches on AWS SageMaker with 131K token context

NVIDIA has launched Nemotron 3 Nano Omni, a multimodal model with 30 billion total parameters (3 billion active), on Amazon SageMaker JumpStart. The model processes video, audio, images, and text in a single inference pass, has a 131K token context window, and uses a Mamba2 Transformer Hybrid MoE architecture combining three specialized encoders.

April 17, 2026
product update · Amazon Web Services

AWS releases Nova Forge SDK data mixing guide to preserve general capabilities during fine-tuning

Amazon Web Services published a practical guide for fine-tuning Amazon Nova models using the Nova Forge SDK's data mixing capabilities. According to AWS, blending customer data with Amazon-curated datasets preserved near-baseline MMLU scores while delivering a 12-point F1 improvement on a Voice of Customer classification task spanning 1,420 leaf categories.
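Data mixing in general can be sketched as sampling from two pools at a fixed ratio. The snippet below is a generic illustration, not the Nova Forge SDK: the function `mix_datasets`, the 70/30 split, and the placeholder records are assumptions, since the summary does not state the ratio AWS used.

```python
import random
from typing import List, Sequence


def mix_datasets(
    customer: Sequence[str],
    curated: Sequence[str],
    customer_fraction: float,
    total: int,
    seed: int = 0,
) -> List[str]:
    """Build a shuffled training set with a fixed share of customer examples."""
    if not 0.0 <= customer_fraction <= 1.0:
        raise ValueError("customer_fraction must be between 0 and 1")
    rng = random.Random(seed)  # seeded so the mix is reproducible
    n_customer = round(total * customer_fraction)
    n_curated = total - n_customer
    # Sample with replacement so either pool may be smaller than its quota.
    mixed = rng.choices(customer, k=n_customer) + rng.choices(curated, k=n_curated)
    rng.shuffle(mixed)
    return mixed


# Placeholder records standing in for real training examples.
customer_data = [f"customer-{i}" for i in range(5)]
curated_data = [f"curated-{i}" for i in range(5)]

mixed = mix_datasets(customer_data, curated_data, customer_fraction=0.7, total=10)
print(sum(x.startswith("customer") for x in mixed), "customer examples of", len(mixed))
```

Keeping a share of general-purpose examples in the mix is the mechanism the guide credits for preserving baseline capabilities. The right share depends on the task and has to be found empirically.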

April 16, 2026
product update · Amazon Web Services

Amazon Nova Micro Fine-Tuned Text-to-SQL Models Now Available on Bedrock On-Demand Inference at $0.80/Month for 22,000 Q

AWS has enabled fine-tuned Amazon Nova Micro models to run on Bedrock's on-demand inference for text-to-SQL generation. According to AWS testing, a sample workload of 22,000 queries per month costs $0.80 monthly using the serverless approach, compared to higher costs with persistent model hosting. The solution uses LoRA fine-tuning on the sql-create-context dataset containing over 78,000 SQL examples.
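The quoted figures imply a per-query cost that is easy to check. A small worked calculation, using only the two numbers from the summary above (the variable and function names are illustrative):

```python
# Figures quoted in the summary (AWS's sample workload).
MONTHLY_COST_USD = 0.80
QUERIES_PER_MONTH = 22_000


def cost_per_query(monthly_cost: float, queries: int) -> float:
    """Average cost of a single query, in USD."""
    return monthly_cost / queries


per_query = cost_per_query(MONTHLY_COST_USD, QUERIES_PER_MONTH)
per_thousand = per_query * 1_000

print(f"per query: ${per_query:.7f}")
print(f"per 1,000 queries: ${per_thousand:.4f}")
```

That works out to roughly 3.6 cents per thousand queries for this sample workload.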