AWS launches hyperparameter optimization guide for Amazon Nova Forge custom model training
AWS has published a technical guide on hyperparameter optimization for Amazon Nova Forge, its platform for building custom frontier models from Amazon Nova checkpoints. The guide addresses three core challenges: catastrophic forgetting during domain specialization, learning rate calibration when mixing proprietary and curated training data, and baseline performance constraints for reinforcement fine-tuning.
Key capabilities and training pipeline
Amazon Nova Forge enables organizations to customize Amazon Nova models using three complementary techniques:
Continued pre-training (CPT) expands model knowledge through self-supervised learning on unlabeled domain text. Nova Forge offers three checkpoint options for CPT: pre-trained, mid-trained, and post-trained, each suited to different data scales and downstream requirements.
Supervised fine-tuning (SFT) customizes model behavior using 1,000–10,000 input-output demonstration pairs per task. According to AWS, quality and consistency matter more than volume. SFT with data mixing uses Amazon Nova-curated datasets in reasoning and instruction-following categories to preserve general capabilities.
Reinforcement fine-tuning (RFT) optimizes model outputs using reward signals. Nova Forge supports custom verification logic through AWS Lambda integration, enabling domain-specific quality assessment for single-turn or multi-turn conversational tasks.
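As a rough illustration of custom verification logic, the sketch below shows a Lambda-style handler that scores model responses against reference answers. The event shape (`samples`, `id`, `response`, `reference`) and the returned `rewards` structure are hypothetical; the source does not describe the request or response format Nova Forge actually uses, so treat this only as a sketch of the general idea.

```python
import json
import re


def _normalize(text: str) -> str:
    """Collapse whitespace and lowercase so trivial formatting differences are ignored."""
    return re.sub(r"\s+", " ", text).strip().lower()


def score_response(response: str, reference: str) -> float:
    """Return 1.0 when the normalized response matches the reference, else 0.0."""
    return 1.0 if _normalize(response) == _normalize(reference) else 0.0


def lambda_handler(event, context):
    # ASSUMPTION: the event carries a list of samples, each with an id,
    # a model response, and a reference answer. This schema is illustrative,
    # not the documented Nova Forge contract.
    samples = event.get("samples", [])
    rewards = [
        {"id": s["id"], "reward": score_response(s["response"], s["reference"])}
        for s in samples
    ]
    return {"statusCode": 200, "body": json.dumps({"rewards": rewards})}
```

An exact-match reward is the simplest possible verifier; a real domain-specific check would likely replace `score_response` with task-specific logic such as parsing structured output or running unit tests.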
Critical hyperparameter challenges
The guide identifies learning rate as the most sensitive hyperparameter across all customization techniques. According to AWS, deviating from service defaults when mixing Nova data with proprietary data is the most common source of training instability.
Data mixing ratios require careful calibration. The technique blends proprietary training data with Amazon Nova-curated datasets to prevent catastrophic forgetting, in which a model loses general capabilities after training on narrow domain data. AWS states that models can lose instruction-following ability, reasoning capability, and broad knowledge when this balance is incorrect.
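To make the idea of a mixing ratio concrete, here is a minimal sketch that blends a proprietary dataset with a curated one so that the curated examples make up a target fraction of the result. The function name `mix_datasets` and the `curated_fraction` parameter are hypothetical, and the 20% value in the usage line is an arbitrary illustration; the source does not state recommended ratios, and Nova Forge performs mixing through its own service configuration rather than user code like this.

```python
import random


def mix_datasets(proprietary, curated, curated_fraction, seed=0):
    """Return a shuffled list in which about `curated_fraction` of items are curated.

    All proprietary examples are kept; curated examples are sampled without
    replacement (capped at however many are available).
    """
    if not 0.0 <= curated_fraction < 1.0:
        raise ValueError("curated_fraction must be in [0, 1)")
    rng = random.Random(seed)
    # Solve n_curated / (n_curated + n_proprietary) = curated_fraction for n_curated.
    n_curated = round(len(proprietary) * curated_fraction / (1.0 - curated_fraction))
    n_curated = min(n_curated, len(curated))
    mixed = list(proprietary) + rng.sample(list(curated), n_curated)
    rng.shuffle(mixed)
    return mixed


# Illustrative usage: 80 proprietary examples plus enough curated ones to reach 20%.
proprietary = [("proprietary", i) for i in range(80)]
curated = [("curated", i) for i in range(100)]
mixed = mix_datasets(proprietary, curated, curated_fraction=0.2)
```

Keeping every proprietary example and varying only the number of curated examples is one possible design; an alternative is to fix the total size and downsample both sources.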
RFT is effective only when the model's baseline accuracy on the task falls within a workable range. If baseline accuracy is too low, the model produces too few correct responses for reward-guided learning to reinforce. If baseline accuracy is already high, additional training yields diminishing returns. AWS recommends running SFT first for low-baseline scenarios to establish foundational capabilities before RFT.
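The decision described above can be sketched as a simple gate on measured baseline accuracy. The thresholds `low=0.2` and `high=0.9` are placeholder assumptions chosen only for illustration; the source gives no numeric ranges, and the function names are hypothetical.

```python
def baseline_accuracy(outputs, references):
    """Fraction of model outputs that exactly match their reference answers."""
    if not references or len(outputs) != len(references):
        raise ValueError("outputs and references must be non-empty and equal in length")
    correct = sum(1 for out, ref in zip(outputs, references) if out == ref)
    return correct / len(references)


def choose_next_stage(accuracy, low=0.2, high=0.9):
    """Suggest a next training stage from baseline accuracy.

    ASSUMPTION: `low` and `high` are illustrative thresholds, not values
    published by AWS.
    """
    if accuracy < low:
        # Too few correct responses for reward signals to reinforce.
        return "sft_first"
    if accuracy > high:
        # Already strong; further RFT likely yields diminishing returns.
        return "skip_rft"
    return "rft"
```

In practice the accuracy would be measured on a held-out evaluation set with the same verifier that later supplies the RFT reward, so the gate and the training signal agree on what counts as correct.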
Training configuration
Nova Forge provides calibrated service defaults for each training technique that account for interactions between data distribution, mixing ratio, and training method. The platform supports secure hosting of custom models on AWS infrastructure.
Checkpoint selection determines how much existing alignment to preserve during customization. Combined with data mixing, this addresses the stability-flexibility tradeoff between learning organizational domain knowledge and retaining general model capabilities.
What this means
This guide signals AWS's positioning of Nova Forge as an enterprise-focused alternative to training custom models from scratch or using general-purpose fine-tuning APIs. The technical depth on catastrophic forgetting and learning rate sensitivity addresses real production failures that occur when organizations customize foundation models without adequate guardrails. The emphasis on service defaults and calibrated parameters suggests AWS is productizing lessons from internal model development to reduce trial-and-error costs for enterprise customers. Organizations evaluating custom model development now have documented approaches for the specific hyperparameter interactions that determine whether domain specialization succeeds or wastes compute.