AWS releases Nova Forge SDK data mixing guide to preserve general capabilities during fine-tuning
Amazon Web Services published a practical guide for fine-tuning Amazon Nova models using the Nova Forge SDK's data mixing capabilities. According to AWS, blending customer data with Amazon-curated datasets preserved near-baseline MMLU scores while delivering a 12-point F1 improvement on a Voice of Customer classification task spanning 1,420 leaf categories.
AWS releases Nova Forge SDK data mixing guide to preserve general capabilities during fine-tuning
Amazon Web Services published a hands-on guide for fine-tuning Amazon Nova models using the Nova Forge SDK's data mixing capabilities, which allows developers to fine-tune on domain-specific data without losing general model capabilities.
Performance claims
According to AWS, blending customer data with Amazon-curated datasets preserved near-baseline MMLU scores while delivering a 12-point F1 improvement on a Voice of Customer classification task spanning 1,420 leaf categories. By contrast, AWS claims fine-tuning an open-source model on customer data alone caused a near-total loss of general capabilities.
Technical implementation
The guide covers a five-stage workflow: environment setup with Nova Forge SDK installation, data preparation with sanitization and validation, training configuration including SageMaker HyperPod runtime setup, model training using supervised fine-tuning with Low-Rank Adaptation (LoRA), and model evaluation against public benchmarks.
The SDK enforces token-level validation on training data to prevent conflicts with Nova's internal chat template. Special delimiters like System:, User:, and Assistant: must be sanitized before training to avoid corrupting the training signal.
Infrastructure requirements
The walkthrough uses 4 ml.p5.48xlarge GPU instances for both training and evaluation. AWS recommends starting with a short test run (max_steps=5) to validate configuration before committing to full training runs. Prerequisites include an AWS account with Amazon Nova Forge access, a provisioned SageMaker HyperPod cluster with GPU instances, an Amazon SageMaker MLflow application for experiment tracking, and appropriate IAM permissions.
Dataset example
The guide demonstrates the workflow using the MedReason dataset from Hugging Face, which contains approximately 32,700 medical question-answer pairs. The Nova Forge SDK supports JSONL, JSON, and CSV input formats and provides a JSONLDatasetLoader that converts raw data into the structured turn-based format Nova models expect during training.
What this means
Data mixing addresses a critical challenge in model fine-tuning: maintaining general capabilities while adapting to specific domains. AWS's 12-point F1 improvement claim suggests meaningful performance gains are possible without catastrophic forgetting. However, the requirement for expensive GPU infrastructure (ml.p5.48xlarge instances) and the proprietary nature of Amazon's curated datasets may limit adoption to larger organizations already invested in AWS infrastructure. The detailed sanitization requirements highlight the fragility of chat template-based training approaches.
Related Articles
OpenAI GPT-5.5 and GPT-5.4 Launch on Amazon Bedrock at Parity Pricing
OpenAI's GPT-5.5 and GPT-5.4 models are now generally available on Amazon Bedrock, with pricing matching OpenAI's first-party rates. Codex, OpenAI's coding agent used by 5 million developers weekly, is also available with pay-per-token pricing and no seat licenses.
AWS adds Policy Engine and Lambda interceptors to Bedrock AgentCore gateway for agent security controls
Amazon Web Services launched Policy Engine and Lambda interceptors for Bedrock AgentCore gateway, enabling enterprises to control which tools AI agents can access and validate requests dynamically. The Policy Engine uses Cedar declarative policy language for deterministic access decisions, while Lambda interceptors run custom code before or after each tool call for validation, token exchange, and response filtering.
AWS launches dataset management in Bedrock AgentCore for versioned agent test suites
Amazon Web Services introduced dataset management in Bedrock AgentCore, enabling developers to build versioned test suites with immutable baselines for agent evaluation. The feature supports predefined scenarios with ground truth assertions and user simulation scenarios where LLM-backed actors conduct multi-turn conversations.
ChatGPT app adds long-press gesture to switch intelligence levels mid-conversation
OpenAI added a long-press gesture to ChatGPT's mobile app that lets users select intelligence levels (Instant, Thinking, Extended) before sending a message. The update also includes a table of contents feature for conversations with 5+ responses and improvements to the GPT-5.5 Instant model.
Comments
Loading...