RunRL Documentation

Learn how to train and improve language models with reinforcement learning

Quick Start

Get started with RunRL in just a few steps:

  1. Sign up: Create an account using Google OAuth
  2. Upload prompts: Prepare your training data in JSONL format
  3. Define rewards: Write a Python function or custom environment
  4. Run training: Launch RL training on GPU clusters
  5. Monitor progress: Track training metrics in real time

Prompt File Format

Training prompts should be in JSONL (JSON Lines) format, with one JSON object per line:

{"prompt":[{"role":"system","content":"You are a helpful assistant."},{"role":"user","content":"What is 2+2?"}],"expected_result":"4"}
{"prompt":[{"role":"user","content":"What is the capital of France?"}],"expected_result":"Paris"}

Reward Functions

Define how to evaluate model outputs:

def reward_fn(completion, **kwargs):
    # completion holds the model's chat output as a list of message dicts,
    # e.g. [{"role": "assistant", "content": "..."}].
    response = completion[0].get('content', '')

    # Extra fields from the prompt file (such as expected_result)
    # are available as keyword arguments.
    expected = kwargs.get('expected_result')

    # Substring match: reward 1.0 if the expected answer
    # appears anywhere in the response, else 0.0.
    if expected and str(expected) in response:
        return 1.0
    return 0.0
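
Before uploading, you can sanity-check a reward function locally. The sample completion below mimics the structure the function expects and is purely illustrative:

completion = [{"role": "assistant", "content": "2 + 2 equals 4."}]
print(reward_fn(completion, expected_result="4"))  # 1.0
print(reward_fn(completion, expected_result="5"))  # 0.0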

Custom Environments

For advanced, multi-step use cases, define a complete RL environment:

class CustomEnv:
    def setup(self, **kwargs):
        # Initialize episode state; optional settings arrive as kwargs.
        self.max_steps = kwargs.get("max_steps", 10)
        self.current_step = 0

    def step(self, action: str):
        # Process one model action and end the episode after max_steps turns.
        self.current_step += 1
        done = self.current_step >= self.max_steps

        return {
            "observation": "Environment response",  # text the model sees next turn
            "reward": 1.0 if "correct" in action else 0.0,  # keyword-match reward
            "done": done,
        }
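
To smoke-test an environment, you can drive it by hand with placeholder actions. This loop is purely illustrative; it is not how RunRL invokes the class during training:

env = CustomEnv()
env.setup(max_steps=3)

result = {"done": False}
while not result["done"]:
    # During training the action would be a model completion;
    # "correct" here just triggers the keyword reward.
    result = env.step("correct")
    print(result["reward"], result["done"])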