Create natural and adaptive conversational AI that learns from interactions.
Submit prompts and create custom reward functions that evaluate model outputs on your specific tasks.
Our platform applies the reinforcement learning algorithms behind DeepSeek-R1 to optimize your model's performance.
Deploy your improved model, optimized against your specific reward criteria.
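For example, the prompt file is typically a JSONL dataset with one prompt per line. A minimal sketch of building one in Python, assuming a simple "prompt" field (the exact schema is an assumption, not documented here):

import json

# Hypothetical example prompts; the "prompt" field name is an
# assumption, not a documented RunRL schema.
prompts = [
    {"prompt": "Compute the sum of the first 100 positive integers."},
    {"prompt": "Factor the polynomial x^2 - 5x + 6."},
]

with open("math_prompts.jsonl", "w") as f:
    for row in prompts:
        f.write(json.dumps(row) + "\n")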
Tired of tuning prompts to make generic models do what you want? Train your model to be good at your task: all you need to tell us is what counts as good and what counts as bad, and we'll make the model perform reliably.
Learn how we taught a research agent to use tools › Beating o3-mini on performance and cost
Supports OpenAI, Anthropic, LiteLLM, and many more provider APIs
Let your agents self-improve based on your reward
See exactly how much your agents have improved
from runrl import RunRL

# Connect to RunRL and launch a training run with your prompts and reward file.
client = RunRL()

client.create_run(
    model_name="runrl/dsp",
    prompt_file="math_prompts.jsonl",
    reward_file="steganography_reward.py",
)
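The reward file referenced above holds your scoring logic. A minimal sketch only, assuming a reward(prompt, completion) function that returns a float (the exact interface RunRL expects is an assumption, not shown here):

# Hypothetical reward file; the function name and signature are
# assumptions, not the documented RunRL interface.
import re

def reward(prompt: str, completion: str) -> float:
    """Score a completion between 0.0 (bad) and 1.0 (good)."""
    score = 0.0
    # Reward completions that show intermediate reasoning.
    if "step" in completion.lower():
        score += 0.5
    # Reward completions that end with a clearly marked numeric answer.
    if re.search(r"answer\s*[:=]?\s*-?\d+", completion.lower()):
        score += 0.5
    return score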
We'll work with you to define targets, measure your agents' performance, and help you outperform closed models.
Work directly with our RL research team on your problems.
Seamlessly deploy your optimized agents into your existing infrastructure and workflows.