Backed by Y Combinator

Improve your models with Reinforcement Learning

University of California, Berkeley · University of Wisconsin-Madison · MIT · University of Toronto

1. Define your task

Submit prompts and create custom reward functions that evaluate model outputs on your specific tasks.
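A reward function is just code that scores a model output. Below is a minimal sketch, assuming a reward file exposes a `reward(prompt, completion)` function returning a float in [0, 1]; the function name, signature, and grading logic here are assumptions for illustration, not RunRL's documented interface.

```python
# Hypothetical reward function for a math task. The reward(prompt, completion)
# interface is an assumption; the platform's actual contract may differ.

def reward(prompt: str, completion: str) -> float:
    """Return 1.0 if the completion's final line is the expected answer."""
    expected = "56"  # in practice, look up the ground truth for this prompt
    stripped = completion.strip()
    if not stripped:
        return 0.0
    final_line = stripped.splitlines()[-1].strip()
    return 1.0 if final_line == expected else 0.0
```

Because the reward is arbitrary code, it can check anything measurable: exact answers, passing tests, format compliance, or a score from another model.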

2. Run RL

Our platform applies the reinforcement learning algorithms behind DeepSeek-R1 to optimize your model's performance.
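DeepSeek-R1 was trained with GRPO (Group Relative Policy Optimization), which scores each sampled completion relative to the other completions for the same prompt rather than using a learned value function. A minimal sketch of the group-relative advantage step (names and the exact normalization details here are illustrative, not RunRL's internals):

```python
# Sketch of GRPO-style group-relative advantages: sample several completions
# per prompt, score each with the reward function, then normalize rewards
# within the group to zero mean and unit standard deviation.
from statistics import mean, stdev

def group_advantages(rewards: list[float]) -> list[float]:
    """Normalize one prompt's group of per-completion rewards."""
    mu = mean(rewards)
    sigma = stdev(rewards) if len(rewards) > 1 else 0.0
    if sigma == 0.0:
        return [0.0 for _ in rewards]  # no learning signal if all rewards tie
    return [(r - mu) / sigma for r in rewards]
```

Completions with above-average reward in their group get positive advantages and are reinforced; below-average ones are pushed down.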

3. Get better results

Deploy your improved model, optimized against your specific reward criteria.


Specialized Models for Your Approach

Tired of tuning prompts to make generic models do what you want? Train your model to be good at your task—all you need to tell us is what's good and what's bad. We'll make the model perform reliably.

Learn how we taught a research agent to use tools, beating o3-mini on both performance and cost ›

Already know what you want?
Start training in seconds.

Integrate with your agents

Works with your existing code

Supports OpenAI, Anthropic, LiteLLM, and many more provider APIs

Continuous improvement

Let your agents self-improve based on your reward

View statistics

See exactly how much your agents have improved

Built for researchers and developers

$ pip install runrl

from runrl import RunRL

client = RunRL()
client.create_run(
    model_name="runrl/dsp",
    prompt_file="math_prompts.jsonl",
    reward_file="steganography_reward.py",
)
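The prompt_file in the call above is a JSONL file. The exact schema RunRL expects is an assumption here; a minimal sketch with one JSON object per line, each carrying a "prompt" key:

```python
# Sketch of building a prompts file like math_prompts.jsonl above.
# One-object-per-line is standard JSONL; the "prompt" key is an assumed
# field name, not a documented schema.
import json

prompts = [
    {"prompt": "What is 7 * 8?"},
    {"prompt": "Factor: x^2 - 5x + 6"},
]

# JSONL: serialize each record as one JSON object per line.
jsonl = "\n".join(json.dumps(p) for p in prompts)
```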

Specialized Enterprise Agents

Custom reward development

We'll work with you to define targets, measure your agent's performance, and help you outperform closed models.

World-class RL expertise

Work directly with our RL research team on your problems.

Integration with your stack

Seamlessly deploy your optimized agents into your existing infrastructure and workflows.

Book a 15-min Call