product update · Amazon Web Services

Amazon Bedrock adds reinforcement fine-tuning with OpenAI-compatible APIs

TL;DR

Amazon Bedrock now enables reinforcement fine-tuning (RFT) across multiple model families including Amazon Nova, open-weight models like OpenAI's GPT-OSS 20B, and Qwen 3 32B. The service automates the end-to-end customization workflow using GRPO optimization, allowing models to learn from feedback on multiple responses rather than static training datasets, with support for OpenAI-compatible APIs.

Amazon Bedrock now supports reinforcement fine-tuning (RFT) across multiple model families, beginning with Amazon Nova models in December 2025 and expanding to open-weight models including OpenAI's GPT-OSS 20B and Alibaba's Qwen 3 32B in February 2026.

How Reinforcement Fine-Tuning Works

Unlike traditional supervised fine-tuning, which requires large labeled datasets of input-output pairs, RFT enables models to learn through iterative feedback loops. The model generates multiple candidate responses, receives numerical reward scores based on performance criteria, and adjusts weights to favor higher-scoring outputs.

The process mirrors training a chess player: instead of memorizing every possible move, the player learns through practice and feedback on which decisions lead to winning positions. For LLMs, this translates to more efficient customization requiring fewer prompt examples upfront.

Core RFT components include:

  • Actor model: The foundation model being customized (Nova, Llama, Qwen, or others)
  • State: Current context including prompt, conversation history, and metadata
  • Action: The model's generated response
  • Reward function: A numerical scoring function evaluating response quality, which can verify correctness automatically (particularly effective for math and code tasks)
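
These components can be sketched as plain Python types to make the roles concrete. The class and field names below are illustrative only, not Bedrock's API; Bedrock manages all of this internally:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class State:
    prompt: str          # current user prompt
    history: list[dict]  # prior conversation turns
    metadata: dict       # task metadata, e.g. a reference answer

# An action is simply the actor model's generated response text.
Action = str

# A reward function maps (state, action) to a numerical score.
RewardFn = Callable[[State, Action], float]

def exact_match_reward(state: State, action: Action) -> float:
    """Verifiable reward: 1.0 if the response matches the reference answer."""
    reference = state.metadata.get("reference_answer", "")
    return 1.0 if action.strip() == reference.strip() else 0.0
```

The exact-match scorer illustrates why math and code tasks suit RFT so well: correctness can be checked mechanically, with no human labeler in the loop.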

Bedrock's RFT implementation uses Group Relative Policy Optimization (GRPO), a reinforcement learning algorithm that scores each candidate response relative to the rest of its sampled group, with built-in convergence detection to automatically stop training at the optimal point.
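
The group-relative idea at the heart of GRPO can be shown numerically: for each prompt the model samples a group of responses, and each response's advantage is its reward minus the group mean, normalized by the group's standard deviation. This is a toy illustration of that calculation, not Bedrock's implementation:

```python
from statistics import mean, stdev

def group_relative_advantages(rewards: list[float]) -> list[float]:
    """Score each candidate relative to its group (the 'group relative' in GRPO).

    Responses above the group mean get positive advantages (reinforced);
    responses below it get negative advantages (discouraged).
    """
    mu = mean(rewards)
    sigma = stdev(rewards) if len(rewards) > 1 else 0.0
    if sigma == 0.0:
        # All candidates scored the same: no learning signal for this group.
        return [0.0] * len(rewards)
    return [(r - mu) / sigma for r in rewards]

# Four candidate responses to one prompt, as scored by a reward function:
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

Because advantages are computed within each group rather than against a learned value model, the signal is cheap to produce from nothing more than the reward scores themselves.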

Technical Workflow

The end-to-end process uses standard OpenAI SDK calls pointed at Bedrock's Mantle endpoint:

  1. Authentication: Configure the OpenAI client with Bedrock API keys generated via the aws-bedrock-token-generator library
  2. Data upload: Submit training data via the Files API in JSONL format containing messages (prompt in OpenAI message format) and optional reference_answer fields
  3. Reward function: Deploy an AWS Lambda function that scores model-generated responses
  4. Job creation: Initiate fine-tuning through the OpenAI SDK; Bedrock automatically generates candidate responses, invokes the reward function, and updates model weights
  5. Monitoring: Track progress through CloudWatch metrics and the Bedrock console showing reward trends and policy updates
  6. Inference: Call the fine-tuned model on-demand without endpoint provisioning

AWS handles batching, parallelization, resource allocation, and error recovery transparently. Customer data remains within AWS environments and is not used to train Bedrock-provided models.
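
Step 3's reward function is just a Lambda handler that returns a numerical score for each model-generated response. The event shape below (candidate response plus the record's reference_answer) is an assumption for illustration, not the documented contract:

```python
def lambda_handler(event, context):
    """Score one model-generated response against its reference answer.

    Assumed event shape (illustrative):
      {"response": "<model output>", "reference_answer": "<expected>"}
    """
    response = event.get("response", "")
    reference = event.get("reference_answer", "")
    # Verifiable reward: exact match, ignoring surrounding whitespace and
    # case. Real reward functions can be arbitrarily richer: partial credit,
    # unit tests for generated code, executing generated SQL, and so on.
    score = 1.0 if response.strip().lower() == reference.strip().lower() else 0.0
    return {"reward": score}
```

Because the scorer is an ordinary Lambda, it can call out to anything the IAM role permits, such as a test runner or a database, when evaluating responses.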

Key Advantages

RFT's online learning capability allows models to encounter novel scenarios during training, continuously improving without pre-collected labeled examples. This is particularly effective for verifiable tasks where correctness can be automatically evaluated—eliminating the need for human labeling at scale.

The approach addresses a critical limitation of traditional fine-tuning: the model learns from responses it generates during training, not only from static examples. This enables real-time adaptation and stronger performance on complex tasks including code generation, mathematical reasoning, and multi-turn conversations.
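
For verifiable domains like math, "automatically evaluated" can be as simple as extracting the model's final numeric answer and comparing it to the reference. A toy checker along those lines (not Bedrock code):

```python
import re

def math_reward(response: str, reference: str, tol: float = 1e-6) -> float:
    """Return 1.0 if the last number in the response matches the reference."""
    numbers = re.findall(r"-?\d+(?:\.\d+)?", response)
    if not numbers:
        return 0.0
    return 1.0 if abs(float(numbers[-1]) - float(reference)) <= tol else 0.0
```

A scorer like this grades any chain-of-thought answer mechanically, which is exactly what removes the human-labeling bottleneck at scale.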

Requirements

Implementation requires:

  • An AWS account with Bedrock access
  • Python with the openai, boto3, and aws-bedrock-token-generator libraries
  • Appropriate IAM roles for Lambda and Bedrock fine-tuning
  • An Amazon Bedrock API key

What This Means

Bedrock RFT lowers the barrier to enterprise-scale model customization by automating infrastructure complexity while maintaining API compatibility with OpenAI's SDK. For organizations with verifiable outputs (math problems, code, SQL queries), the efficiency gains are substantial—no need to pre-generate thousands of labeled examples. The expanding model support (Nova, Llama, Qwen) signals AWS's commitment to supporting both proprietary and open-weight models, though pricing details for RFT jobs have not been disclosed.
