product updateAmazon Web Services

Amazon Bedrock adds reinforcement fine-tuning best practices for Nova and open source models

TL;DR

Amazon Bedrock now supports Reinforcement Fine-Tuning (RFT) for customizing Amazon Nova and open source models using reward signals instead of labeled datasets. AWS reports up to 66% accuracy improvements over base models with reduced customization complexity. The approach works best for tasks with verifiable correctness (code, math) or subjective evaluation (moderation, summarization).

3 min read
0

Amazon Bedrock Adds Reinforcement Fine-Tuning Best Practices

Amazon Web Services has published comprehensive best practices for Reinforcement Fine-Tuning (RFT) on Amazon Bedrock, a technique that customizes foundation models using reward signals rather than static labeled datasets. According to AWS, RFT delivers up to 66% accuracy gains over base models while reducing customization cost and complexity.

How RFT Works

Unlike supervised fine-tuning (SFT) that trains on correct input-output pairs, RFT uses a dataset of inputs paired with a reward function. The reward function can be rule-based, a trained grader model, or an LLM acting as a judge. During training, the model generates candidate responses, the reward function scores each response, and model weights update to increase probability of high-reward outputs. This iterative cycle steers the model toward behaviors that maximize reward signals.

AWS identifies two primary categories where RFT excels:

Reinforcement Learning with Verifiable Rewards (RLVR): Tasks where correctness can be automatically verified through rules or tests. Examples include code generation (unit-test pass rates), math reasoning (exact answers), structured data extraction (schema validation), and API orchestration (successful task completion).

Reinforcement Learning with AI Feedback (RLAIF): Subjective tasks where another model evaluates quality against a rubric. Applications include content moderation, chatbots, creative writing, and summarization.

Dataset Requirements and Guidelines

Amazon Bedrock's RFT supports datasets between 100–10,000 training samples, with requirements varying by task complexity. AWS provides tiered guidance:

  • 100–200 examples: Initial experimentation to validate prompts, reward functions, and measurable improvements
  • 200–5,000 examples: Typical implementations providing stronger generalization and consistent performance across prompt variations
  • 5,000–10,000 examples: Complex reasoning tasks, specialized domains, or sophisticated reward functions requiring robustness across diverse inputs

AWS emphasizes that dataset quality fundamentally determines RFT outcomes and that training data must follow OpenAI chat completion format as JSONL files.

Mathematical Reasoning Case Study

AWS demonstrates RFT effectiveness using the GSM8K (Grade School Math 8K) dataset, showing how the approach improves mathematical problem-solving. Unlike standard fine-tuning that encourages pattern-matching, RFT can define reward functions that assign full credit for exact answers while providing partial credit for correct intermediate reasoning steps. This allows models to discover valid solution approaches with relatively small datasets (100–1000 examples) while maintaining structured output formats.

The example shows a math problem requiring multi-step reasoning with intermediate verification, where RFT can guide the model toward breaking problems into logical steps and following required formatting—capabilities that supervised fine-tuning typically struggles to achieve.

Practical Implementation

On Amazon Bedrock, both rule-based and model-based reward approaches implement as custom AWS Lambda functions that the platform invokes during the training loop. AWS guidance covers:

  • Reward function strategy and design
  • Hyperparameter tuning informed by experiments across multiple models and use cases
  • Training progress monitoring using Amazon Bedrock metrics
  • Use cases including code generation, structured extraction, and content moderation

The approach works with Amazon Nova and supported open source models available through Bedrock.

What This Means

AWS is positioning RFT as a practical alternative to supervised fine-tuning for scenarios where labeled datasets are expensive or impractical to curate. The 66% accuracy improvement claim and support for datasets as small as 100 examples could significantly lower the barrier to model customization for specialized tasks. However, AWS's emphasis on dataset quality and the requirement for well-designed reward functions suggests RFT success depends heavily on implementation details beyond dataset size. The guidance toward 200–5,000 examples for typical implementations indicates that "small dataset" claims should be interpreted conservatively for production deployments.

Related Articles

product update

Amazon Nova Act Becomes HIPAA Eligible for Healthcare Workflows

Amazon Nova Act, AWS's browser-based AI agent service, now qualifies as HIPAA eligible, allowing healthcare organizations to deploy autonomous agents for workflows involving electronically protected health information. The service automates repetitive browser tasks including claims processing, referral coordination, and prior authorization.

product update

AWS Launches Amazon Bedrock AgentCore for Deploying Production AI Agents

AWS has launched Amazon Bedrock AgentCore, a serverless runtime environment for deploying production AI agents. Turkish fulfillment company OPLOG demonstrated the platform's capabilities by building three business intelligence agents using Anthropic's Claude Sonnet, achieving a 35% reduction in sales cycles and 98% reduction in manual research time.

product update

AWS releases four multimodal evaluators for image-to-text AI tasks in Strands Evals SDK

AWS has added four multimodal evaluators to its Strands Evals SDK that judge image-to-text AI outputs by directly analyzing source images. The evaluators—Overall Quality, Correctness, Faithfulness, and Instruction Following—use multimodal large language models to detect visual hallucinations, factual errors, and instruction violations that text-only judges miss.

product update

AWS SageMaker AI adds bidirectional streaming for real-time speech transcription with vLLM

Amazon SageMaker AI has launched bidirectional streaming support for real-time inference, enabling WebSocket-based voice applications through vLLM integration. The feature uses HTTP/2 on port 8443 to bridge client connections with vLLM's Realtime API, allowing audio to stream in while transcription streams back simultaneously over a single persistent connection.

Comments

Loading...