Product update · Amazon Web Services

Amazon Bedrock adds reinforcement fine-tuning with OpenAI-compatible APIs

TL;DR

Amazon Bedrock now enables reinforcement fine-tuning (RFT) across multiple model families including Amazon Nova, open-weight models like OpenAI's GPT-OSS 20B, and Qwen 3 32B. The service automates the end-to-end customization workflow using GRPO optimization, allowing models to learn from feedback on multiple responses rather than static training datasets, with support for OpenAI-compatible APIs.



Amazon Bedrock now supports reinforcement fine-tuning (RFT) across multiple model families, beginning with Amazon Nova models in December 2025 and expanding to open-weight models including OpenAI's GPT-OSS 20B and Alibaba's Qwen 3 32B in February 2026.

How Reinforcement Fine-Tuning Works

Unlike traditional supervised fine-tuning, which requires large labeled datasets of input-output pairs, RFT enables models to learn through iterative feedback loops. The model generates multiple candidate responses, receives numerical reward scores based on performance criteria, and adjusts weights to favor higher-scoring outputs.

The process mirrors training a chess player: instead of memorizing every possible move, the player learns through practice and feedback on which decisions lead to winning positions. For LLMs, this translates to more efficient customization requiring fewer prompt examples upfront.

Core RFT components include:

  • Actor model: The foundation model being customized (Nova, Llama, Qwen, or others)
  • State: Current context including prompt, conversation history, and metadata
  • Action: The model's generated response
  • Reward function: A numerical scoring function evaluating response quality, which can verify correctness automatically (particularly effective for math and code tasks)

Bedrock's RFT implementation uses GRPO, a state-of-the-art reinforcement learning algorithm, with built-in convergence detection to automatically stop training at optimal points.
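GRPO works by scoring each candidate response relative to the other candidates generated for the same prompt, rather than against an absolute baseline. A minimal sketch of that group-relative advantage computation (illustrative only, not Bedrock's internal implementation):

```python
# Sketch of GRPO's group-relative advantage step: candidates scoring above
# their group's mean reward get positive advantage, below-average get negative.
def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each candidate's reward against its own group's statistics."""
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = var ** 0.5
    return [(r - mean) / (std + eps) for r in rewards]

# Four candidate responses to one prompt, scored by a reward function:
advs = group_relative_advantages([1.0, 0.0, 1.0, 0.0])
# The two correct candidates get ~+1.0, the two incorrect ones ~-1.0,
# so the weight update favors the higher-scoring outputs.
```

The normalized advantages are what drive the policy update: the model's weights shift toward the candidates with positive advantage within each group.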

Technical Workflow

The end-to-end process uses standard OpenAI SDK calls pointed at Bedrock's Mantle endpoint:

  1. Authentication: Configure the OpenAI client with Bedrock API keys generated via the aws-bedrock-token-generator library
  2. Data upload: Submit training data via the Files API in JSONL format containing messages (prompt in OpenAI message format) and optional reference_answer fields
  3. Reward function: Deploy an AWS Lambda function that scores model-generated responses
  4. Job creation: Initiate fine-tuning through the OpenAI SDK; Bedrock automatically generates candidate responses, invokes the reward function, and updates model weights
  5. Monitoring: Track progress through CloudWatch metrics and the Bedrock console showing reward trends and policy updates
  6. Inference: Call the fine-tuned model on-demand without endpoint provisioning
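Steps 1, 2, and 4 can be sketched as follows. The JSONL record shape follows the fields named above (messages plus an optional reference_answer); the endpoint URL, model ID, and the token-generator call are placeholders or assumptions, not documented Bedrock values:

```python
import json

# Step 2: one JSONL training record -- a prompt in OpenAI message format
# plus an optional reference_answer the reward function can check against.
record = {
    "messages": [
        {"role": "user", "content": "What is 17 * 24?"}
    ],
    "reference_answer": "408",
}
line = json.dumps(record)  # one record per line in the uploaded JSONL file

# Steps 1 and 4, sketched with the OpenAI SDK (not executed here; the
# base_url and model ID below are illustrative placeholders):
#
#   from openai import OpenAI
#   from aws_bedrock_token_generator import provide_token  # assumed helper
#
#   client = OpenAI(base_url="https://<bedrock-endpoint>/v1",
#                   api_key=provide_token())
#   train = client.files.create(file=open("train.jsonl", "rb"),
#                               purpose="fine-tune")
#   job = client.fine_tuning.jobs.create(model="<base-model-id>",
#                                        training_file=train.id)
```

Because the calls go through the standard OpenAI SDK, existing fine-tuning scripts need little more than a base URL and credential swap to target Bedrock.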

AWS handles batching, parallelization, resource allocation, and error recovery transparently. Customer data remains within AWS environments and is not used to train Bedrock-provided models.
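The reward function from step 3 is an ordinary Lambda handler. A minimal sketch for a verifiable task, where correctness can be checked by exact match against the record's reference_answer (the event field names here are assumptions for illustration, not Bedrock's documented payload schema):

```python
# Hypothetical Lambda reward function: scores a model-generated completion
# against the training record's reference answer. Event field names
# ("completion", "reference_answer") are assumed, not documented.
def lambda_handler(event, context):
    completion = event["completion"].strip()
    reference = event.get("reference_answer", "").strip()
    # Verifiable task: automatic exact-match correctness, no human labeling.
    reward = 1.0 if completion == reference else 0.0
    return {"reward": reward}
```

This is the simplest case; for tasks like code generation the handler could instead run the completion against unit tests and return a graded score.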

Key Advantages

RFT's online learning capability allows models to encounter novel scenarios during training, continuously improving without pre-collected labeled examples. This is particularly effective for verifiable tasks where correctness can be automatically evaluated—eliminating the need for human labeling at scale.

The approach addresses a critical limitation of traditional fine-tuning: the model learns from responses it generates during training, not only from static examples. This enables real-time adaptation and stronger performance on complex tasks such as code generation, mathematical reasoning, and multi-turn conversations.

Requirements

Implementation requires an AWS account with Bedrock access, Python with openai, boto3, and aws-bedrock-token-generator libraries, appropriate IAM roles for Lambda and Bedrock fine-tuning, and an Amazon Bedrock API key.
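The Python dependencies listed above can be installed in one step (assuming the PyPI package names match the library names given in the article):

```shell
# Client-side dependencies for the Bedrock RFT workflow
pip install openai boto3 aws-bedrock-token-generator
```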

What This Means

Bedrock RFT lowers the barrier to enterprise-scale model customization by automating infrastructure complexity while maintaining API compatibility with OpenAI's SDK. For organizations with verifiable outputs (math problems, code, SQL queries), the efficiency gains are substantial—no need to pre-generate thousands of labeled examples. The expanding model support (Nova, GPT-OSS 20B, Qwen 3 32B) signals AWS's commitment to supporting both proprietary and open-weight models, though pricing details for RFT jobs have not been disclosed.
