Amazon Bedrock adds reinforcement fine-tuning with OpenAI-compatible APIs

TL;DR

Amazon Bedrock now enables reinforcement fine-tuning (RFT) across multiple model families including Amazon Nova, open-weight models like OpenAI's GPT-OSS 20B, and Qwen 3 32B. The service automates the end-to-end customization workflow using GRPO optimization, allowing models to learn from feedback on multiple responses rather than static training datasets, with support for OpenAI-compatible APIs.

March 25, 2026 · 5:50 PM3 min read

Amazon Bedrock Adds Reinforcement Fine-Tuning with OpenAI-Compatible APIs

Amazon Bedrock now supports reinforcement fine-tuning (RFT) across multiple model families, beginning with Amazon Nova models in December 2025 and expanding to open-weight models including OpenAI's GPT-OSS 20B and Alibaba's Qwen 3 32B in February 2026.

How Reinforcement Fine-Tuning Works

Unlike traditional supervised fine-tuning, which requires large labeled datasets of input-output pairs, RFT enables models to learn through iterative feedback loops. The model generates multiple candidate responses, receives numerical reward scores based on performance criteria, and adjusts weights to favor higher-scoring outputs.

The process mirrors training a chess player: instead of memorizing every possible move, the player learns through practice and feedback on which decisions lead to winning positions. For LLMs, this translates to more efficient customization requiring fewer prompt examples upfront.

Core RFT components include:

Actor model: The foundation model being customized (Nova, Llama, Qwen, or others)
State: Current context including prompt, conversation history, and metadata
Action: The model's generated response
Reward function: A numerical scoring function evaluating response quality, which can verify correctness automatically (particularly effective for math and code tasks)

Bedrock's RFT implementation uses GRPO, a state-of-the-art reinforcement learning algorithm, with built-in convergence detection to automatically stop training at optimal points.

Technical Workflow

The end-to-end process leverages standard OpenAI SDK calls pointed at Bedrock's Mantle endpoint:

Authentication: Configure the OpenAI client with Bedrock API keys generated via aws-bedrock-token-generator library
Data upload: Submit training data via the Files API in JSONL format containing messages (prompt in OpenAI message format) and optional reference_answer fields
Reward function: Deploy an AWS Lambda function that scores model-generated responses
Job creation: Initiate fine-tuning through the OpenAI SDK; Bedrock automatically generates candidate responses, invokes the reward function, and updates model weights
Monitoring: Track progress through CloudWatch metrics and the Bedrock console showing reward trends and policy updates
Inference: Call the fine-tuned model on-demand without endpoint provisioning

AWS handles batching, parallelization, resource allocation, and error recovery transparently. Customer data remains within AWS environments and is not used to train Bedrock-provided models.

Key Advantages

RFT's online learning capability allows models to encounter novel scenarios during training, continuously improving without pre-collected labeled examples. This is particularly effective for verifiable tasks where correctness can be automatically evaluated—eliminating the need for human labeling at scale.

The approach addresses a critical limitation of traditional fine-tuning: the model learns from responses it generates during training, not only from static examples, enabling real-time adaptation and superior performance on complex tasks including code generation, mathematical reasoning, and multi-turn conversations.

Requirements

Implementation requires an AWS account with Bedrock access, Python with openai, boto3, and aws-bedrock-token-generator libraries, appropriate IAM roles for Lambda and Bedrock fine-tuning, and an Amazon Bedrock API key.

What This Means

Bedrock RFT lowers the barrier to enterprise-scale model customization by automating infrastructure complexity while maintaining API compatibility with OpenAI's SDK. For organizations with verifiable outputs (math problems, code, SQL queries), the efficiency gains are substantial—no need to pre-generate thousands of labeled examples. The expanding model support (Nova, Llama, Qwen) signals AWS's commitment to supporting both proprietary and open-weight models, though pricing details for RFT jobs have not been disclosed.

Source: aws.amazon.com ↗

amazon-bedrock reinforcement-learning fine-tuning openai-compatible-apis amazon-nova llama qwen gpt-oss

product updateJune 23, 2026

Anthropic launches Claude Tag for Slack, writes 65% of its product team's code

Anthropic released Claude Tag, a beta feature that integrates Claude into Slack for Enterprise and Team customers. The company says the tool writes 65% of its product team's code and can work proactively with ambient mode enabled.

product updateJune 23, 2026

OpenAI releases GPT-5.5-Cyber with 85.6% CyberGym score, surpassing restricted Anthropic model

OpenAI released an updated GPT-5.5-Cyber model that scores 85.6% on CyberGym, surpassing Anthropic's Mythos 5 (83.8%) — the same model that triggered Trump administration export controls. The release proceeds without the political pushback that forced Anthropic to restrict foreign national access.

product updateJune 23, 2026

Anthropic launches Claude Tag for Slack: AI agent with persistent memory across team channels

Anthropic has released Claude Tag in research preview for Slack, an AI agent that maintains persistent memory across channels and can proactively participate in team conversations. Available to Claude Enterprise and Team customers, it differs from existing Slack integrations by learning organizational context over time and sharing a single identity across team members.

product updateJune 23, 2026

GitHub Copilot CLI Gets Redesigned Terminal Interface in General Availability

GitHub has released the redesigned terminal interface for GitHub Copilot CLI to general availability. The update, previewed at Microsoft Build 2026, introduces a tabbed layout for working with GitHub directly from the command line.

Amazon Bedrock adds reinforcement fine-tuning with OpenAI-compatible APIs

Amazon Bedrock Adds Reinforcement Fine-Tuning with OpenAI-Compatible APIs

How Reinforcement Fine-Tuning Works

Technical Workflow

Key Advantages

Requirements

What This Means

Related Articles

Anthropic launches Claude Tag for Slack, writes 65% of its product team's code

OpenAI releases GPT-5.5-Cyber with 85.6% CyberGym score, surpassing restricted Anthropic model

Anthropic launches Claude Tag for Slack: AI agent with persistent memory across team channels

GitHub Copilot CLI Gets Redesigned Terminal Interface in General Availability

Comments