RunRL Documentation
Learn how to train and improve language models with reinforcement learning
Quick Start
Get started with RunRL in just a few steps:
- Sign up: Create an account using Google OAuth
- Upload prompts: Prepare your training data in JSONL format
- Define rewards: Write a Python function or custom environment
- Run training: Launch RL training on GPU clusters
- Monitor progress: Track training in real time
Prompt File Format
Training prompts should be in JSONL (JSON Lines) format, with one JSON object per line:
{"prompt":[{"role":"system","content":"You are a helpful assistant."},{"role":"user","content":"What is 2+2?"}],"expected_result":"4"}
{"prompt":[{"role":"user","content":"What is the capital of France?"}],"expected_result":"Paris"}
Reward Functions
Define how to evaluate model outputs:
def reward_fn(completion, **kwargs):
    # The completion is a list of messages; the first entry holds the model's reply.
    response = completion[0].get('content', '')
    # Extra fields from the prompt file (e.g. expected_result) arrive via kwargs.
    expected = kwargs.get('expected_result')
    # Return 1.0 when the expected answer appears in the response, otherwise 0.0.
    if expected and str(expected) in response:
        return 1.0
    return 0.0
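You can exercise the reward function locally on a hand-written completion before launching a run. This is only a sanity check and assumes the same completion and keyword-argument shapes shown in the example above.

# Local test of reward_fn with a hand-written completion.
sample_completion = [{"role": "assistant", "content": "2+2 equals 4."}]
print(reward_fn(sample_completion, expected_result="4"))  # 1.0
print(reward_fn(sample_completion, expected_result="5"))  # 0.0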
Custom Environments
For advanced use cases, create complete RL environments:
class CustomEnv:
    def setup(self, **kwargs):
        # Called to initialize episode state before the first step.
        self.max_steps = kwargs.get("max_steps", 10)
        self.current_step = 0

    def step(self, action: str):
        # Advance one turn given the model's action and return the outcome.
        self.current_step += 1
        done = self.current_step >= self.max_steps
        return {
            "observation": "Environment response",
            "reward": 1.0 if "correct" in action else 0.0,
            "done": done,
        }
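As a rough check, you can drive the environment with a scripted action before handing it to the trainer. The loop below is a sketch that assumes only the setup/step interface shown above; the scripted action string is a placeholder.

# Minimal local rollout using the setup/step interface above.
env = CustomEnv()
env.setup(max_steps=3)

total_reward = 0.0
done = False
while not done:
    result = env.step("this action happens to be correct")
    total_reward += result["reward"]
    done = result["done"]

print(f"episode finished after {env.current_step} steps, total reward = {total_reward}")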