Amazon Bedrock adds reinforcement fine-tuning with OpenAI-compatible APIs
Amazon Bedrock now enables reinforcement fine-tuning (RFT) across multiple model families including Amazon Nova, open-weight models like OpenAI's GPT-OSS 20B, and Qwen 3 32B. The service automates the end-to-end customization workflow using GRPO optimization, allowing models to learn from feedback on multiple responses rather than static training datasets, with support for OpenAI-compatible APIs.
Amazon Bedrock Adds Reinforcement Fine-Tuning with OpenAI-Compatible APIs
Amazon Bedrock now supports reinforcement fine-tuning (RFT) across multiple model families, beginning with Amazon Nova models in December 2025 and expanding to open-weight models including OpenAI's GPT-OSS 20B and Alibaba's Qwen 3 32B in February 2026.
How Reinforcement Fine-Tuning Works
Unlike traditional supervised fine-tuning, which requires large labeled datasets of input-output pairs, RFT enables models to learn through iterative feedback loops. The model generates multiple candidate responses, receives numerical reward scores based on performance criteria, and adjusts weights to favor higher-scoring outputs.
The process mirrors training a chess player: instead of memorizing every possible move, the player learns through practice and feedback on which decisions lead to winning positions. For LLMs, this translates to more efficient customization requiring fewer prompt examples upfront.
Core RFT components include:
- Actor model: The foundation model being customized (Nova, Llama, Qwen, or others)
- State: Current context including prompt, conversation history, and metadata
- Action: The model's generated response
- Reward function: A numerical scoring function evaluating response quality, which can verify correctness automatically (particularly effective for math and code tasks)
Bedrock's RFT implementation uses GRPO, a state-of-the-art reinforcement learning algorithm, with built-in convergence detection to automatically stop training at optimal points.
Technical Workflow
The end-to-end process leverages standard OpenAI SDK calls pointed at Bedrock's Mantle endpoint:
- Authentication: Configure the OpenAI client with Bedrock API keys generated via
aws-bedrock-token-generatorlibrary - Data upload: Submit training data via the Files API in JSONL format containing
messages(prompt in OpenAI message format) and optionalreference_answerfields - Reward function: Deploy an AWS Lambda function that scores model-generated responses
- Job creation: Initiate fine-tuning through the OpenAI SDK; Bedrock automatically generates candidate responses, invokes the reward function, and updates model weights
- Monitoring: Track progress through CloudWatch metrics and the Bedrock console showing reward trends and policy updates
- Inference: Call the fine-tuned model on-demand without endpoint provisioning
AWS handles batching, parallelization, resource allocation, and error recovery transparently. Customer data remains within AWS environments and is not used to train Bedrock-provided models.
Key Advantages
RFT's online learning capability allows models to encounter novel scenarios during training, continuously improving without pre-collected labeled examples. This is particularly effective for verifiable tasks where correctness can be automatically evaluated—eliminating the need for human labeling at scale.
The approach addresses a critical limitation of traditional fine-tuning: the model learns from responses it generates during training, not only from static examples, enabling real-time adaptation and superior performance on complex tasks including code generation, mathematical reasoning, and multi-turn conversations.
Requirements
Implementation requires an AWS account with Bedrock access, Python with openai, boto3, and aws-bedrock-token-generator libraries, appropriate IAM roles for Lambda and Bedrock fine-tuning, and an Amazon Bedrock API key.
What This Means
Bedrock RFT lowers the barrier to enterprise-scale model customization by automating infrastructure complexity while maintaining API compatibility with OpenAI's SDK. For organizations with verifiable outputs (math problems, code, SQL queries), the efficiency gains are substantial—no need to pre-generate thousands of labeled examples. The expanding model support (Nova, Llama, Qwen) signals AWS's commitment to supporting both proprietary and open-weight models, though pricing details for RFT jobs have not been disclosed.
Related Articles
Anthropic launches Claude Tag for Slack, writes 65% of its product team's code
Anthropic released Claude Tag, a beta feature that integrates Claude into Slack for Enterprise and Team customers. The company says the tool writes 65% of its product team's code and can work proactively with ambient mode enabled.
OpenAI releases GPT-5.5-Cyber with 85.6% CyberGym score, surpassing restricted Anthropic model
OpenAI released an updated GPT-5.5-Cyber model that scores 85.6% on CyberGym, surpassing Anthropic's Mythos 5 (83.8%) — the same model that triggered Trump administration export controls. The release proceeds without the political pushback that forced Anthropic to restrict foreign national access.
Anthropic launches Claude Tag for Slack: AI agent with persistent memory across team channels
Anthropic has released Claude Tag in research preview for Slack, an AI agent that maintains persistent memory across channels and can proactively participate in team conversations. Available to Claude Enterprise and Team customers, it differs from existing Slack integrations by learning organizational context over time and sharing a single identity across team members.
GitHub Copilot CLI Gets Redesigned Terminal Interface in General Availability
GitHub has released the redesigned terminal interface for GitHub Copilot CLI to general availability. The update, previewed at Microsoft Build 2026, introduces a tabbed layout for working with GitHub directly from the command line.
Comments
Loading...