
Overview

CLI and SDK for creating, managing, and scaling Ray clusters on Kubernetes.

Krayne wraps the KubeRay operator behind a clean, opinionated interface so ML practitioners can get distributed compute without touching Kubernetes manifests.


Why Krayne?

Running Ray on Kubernetes typically requires:

  • Writing verbose YAML manifests for the RayCluster custom resource
  • Understanding Kubernetes CRDs, pod specs, resource requests, and node selectors
  • Stitching together kubectl commands for lifecycle management
  • Manually configuring services like dashboards, notebooks, and SSH

This is a significant barrier for ML practitioners who just want distributed compute.

Krayne eliminates that friction:

  • One command to a working cluster: krayne create my-cluster gives you a Ray cluster with notebooks and SSH ready to go.
  • SDK for automation — the same operations are available as Python functions for pipelines, scripts, and notebooks.
  • No Kubernetes knowledge required — sensible defaults handle resource allocation, service configuration, and manifest generation.
  • Full escape hatch — power users can override any setting via YAML or drop down to raw KubeRay manifests.
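
One way to picture "sensible defaults plus overrides" is a recursive merge, where user-supplied settings replace only the keys they name. The sketch below is illustrative only: the default values and the `merge_overrides` helper are assumptions made for this example, not Krayne's actual defaults or internals.

```python
from copy import deepcopy

# Hypothetical defaults; this overview does not list Krayne's real default values.
DEFAULTS = {
    "namespace": "default",
    "head": {"cpus": 2, "memory": "4Gi"},
    "services": {"notebook": True, "ssh": True},
}


def merge_overrides(defaults, overrides):
    """Return a new dict in which `overrides` win, recursing into nested dicts."""
    merged = deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are mappings: merge key by key instead of replacing wholesale.
            merged[key] = merge_overrides(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


# Settings a user might load from a YAML file (shown here as a plain dict).
overrides = {"head": {"memory": "16Gi"}, "services": {"ssh": False}}
effective = merge_overrides(DEFAULTS, overrides)
print(effective)
```

Only the two overridden keys change; everything else keeps its default, and the `DEFAULTS` dict itself is left unmodified.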

How it works

graph LR
  User["You"] -->|"CLI or Python"| Krayne["Krayne"]
  Krayne -->|"builds manifest"| KubeRay["KubeRay CRD"]
  KubeRay -->|"reconciles"| Ray["Ray Cluster"]
  Ray -->|"serves"| Services["Dashboard\nNotebook\nSSH"]

Both the CLI and SDK produce the same result — a fully configured Ray cluster with dashboard, notebook, and SSH access:

# Create a GPU cluster with 2 workers
krayne create my-experiment --gpus-per-worker 1 --workers 2

# Check status
krayne describe my-experiment

# Scale up
krayne scale my-experiment --replicas 4

# Clean up
krayne delete my-experiment --force

from krayne.api import create_cluster, scale_cluster, delete_cluster
from krayne.config import ClusterConfig, WorkerGroupConfig

config = ClusterConfig(
    name="my-experiment",
    namespace="ml-team",
    worker_groups=[
        WorkerGroupConfig(replicas=2, gpus=1, gpu_type="a100")
    ],
)

# Create and wait for ready
info = create_cluster(config, wait=True)
print(f"Dashboard: {info.dashboard_url}")

# Scale up
scale_cluster("my-experiment", "ml-team", "worker", replicas=4)

# Clean up
delete_cluster("my-experiment", "ml-team")
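
To make the "builds manifest" step from the diagram more concrete, here is a minimal sketch of how a small config object might be turned into a KubeRay RayCluster manifest. The top-level field names (`headGroupSpec`, `workerGroupSpecs`, `groupName`, `rayStartParams`) follow the RayCluster custom resource; the `WorkerGroup` dataclass, the `build_manifest` function, the head resource values, and the container image tag are assumptions for illustration and are not Krayne's actual builder.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkerGroup:
    """Illustrative stand-in, not krayne.config.WorkerGroupConfig."""
    name: str = "worker"
    replicas: int = 1
    cpus: int = 2
    memory: str = "4Gi"
    gpus: int = 0


def _pod_template(container_name, image, cpus, memory, gpus=0):
    """Build a minimal pod template with identical requests and limits."""
    resources = {"cpu": str(cpus), "memory": memory}
    if gpus:
        resources["nvidia.com/gpu"] = str(gpus)
    container = {
        "name": container_name,
        "image": image,
        "resources": {"requests": dict(resources), "limits": dict(resources)},
    }
    return {"spec": {"containers": [container]}}


def build_manifest(name, namespace, groups, image="rayproject/ray:2.9.0"):
    """Return a RayCluster manifest as a plain dict (assumed image and head sizing)."""
    return {
        "apiVersion": "ray.io/v1",
        "kind": "RayCluster",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "headGroupSpec": {
                "rayStartParams": {"dashboard-host": "0.0.0.0"},
                "template": _pod_template("ray-head", image, cpus=2, memory="4Gi"),
            },
            "workerGroupSpecs": [
                {
                    "groupName": g.name,
                    "replicas": g.replicas,
                    "minReplicas": g.replicas,
                    "maxReplicas": g.replicas,
                    "rayStartParams": {},
                    "template": _pod_template(
                        "ray-worker", image, g.cpus, g.memory, g.gpus
                    ),
                }
                for g in groups
            ],
        },
    }


manifest = build_manifest("my-experiment", "ml-team", [WorkerGroup(replicas=2, gpus=1)])
print(json.dumps(manifest, indent=2))
```

The result is an ordinary dictionary, so it can be dumped to YAML or JSON and compared against a handwritten manifest.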

At a glance

| Feature | Details |
| --- | --- |
| Language | Python 3.10+ |
| CLI framework | Typer + Rich |
| Config validation | Pydantic v2 |
| K8s integration | kubernetes-client |
| CRD target | KubeRay RayCluster (ray.io/v1) |
| Architecture | Functional-first, stateless SDK |
| License | Apache 2.0 |

Architecture

graph TD
  CLI["<b>CLI</b><br/>Typer + Rich"]
  SDK["<b>SDK</b><br/>Functional API"]
  Config["<b>Config</b><br/>Pydantic models"]
  Kube["<b>KubeClient</b><br/>Protocol + manifest"]
  Output["<b>Output</b><br/>Rich formatters"]
  K8s["<b>Kubernetes API</b>"]

  CLI --> SDK
  CLI --> Output
  SDK --> Config
  SDK --> Kube
  Kube --> K8s

| Module | Responsibility |
| --- | --- |
| CLI (krayne.cli) | Parse arguments, call SDK, format output |
| SDK (krayne.api) | All business logic as free functions |
| Config (krayne.config) | Pydantic models + YAML loading |
| KubeClient (krayne.kube) | Kubernetes API calls + manifest building |
| Output (krayne.output) | Rich tables and panels for terminal display |

The CLI is a thin wrapper — every operation available from the command line is available as a Python function with the same semantics.
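
The thin-wrapper idea can be sketched as a command handler that only parses arguments and delegates to an SDK-style function. This example uses the standard library's argparse instead of Typer so that it runs without extra dependencies. The `scale_cluster` function here is a local stand-in that returns a dict rather than calling Kubernetes, and the `--namespace` and `--group` flags are assumptions for this sketch, not documented Krayne options.

```python
import argparse
import json


def scale_cluster(name, namespace, group, replicas):
    """Stand-in for an SDK function; the real one would talk to Kubernetes."""
    return {"name": name, "namespace": namespace, "group": group, "replicas": replicas}


def _handle_scale(args):
    # The handler holds no logic of its own: it forwards parsed arguments to the SDK.
    return scale_cluster(args.name, args.namespace, args.group, args.replicas)


def build_parser():
    parser = argparse.ArgumentParser(prog="krayne-sketch")
    commands = parser.add_subparsers(dest="command", required=True)
    scale = commands.add_parser("scale", help="Change the number of workers")
    scale.add_argument("name")
    scale.add_argument("--namespace", default="default")  # hypothetical flag
    scale.add_argument("--group", default="worker")       # hypothetical flag
    scale.add_argument("--replicas", type=int, required=True)
    scale.set_defaults(handler=_handle_scale)
    return parser


def main(argv=None):
    """Parse argv, run the matching handler, and return its result as JSON text."""
    args = build_parser().parse_args(argv)
    return json.dumps(args.handler(args))


print(main(["scale", "my-experiment", "--replicas", "4"]))
```

Because the handler only forwards arguments, calling `scale_cluster` directly from Python yields the same result as going through the command line.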


Key features

| Feature | Description |
| --- | --- |
| Zero-config defaults | krayne create my-cluster just works, with sensible CPU, memory, and service defaults |
| GPU support | One flag to add GPUs: --gpus-per-worker 1 --worker-gpu-type a100 |
| YAML configuration | Full cluster spec in a YAML file for version control and reproducibility |
| Local sandbox | krayne sandbox setup spins up a local k3s cluster with KubeRay for development |
| JSON output | Every command supports --output json for scripting and pipelines |
| Functional SDK | Stateless free functions: no classes to instantiate, no state to manage |
| Testable by design | KubeClient Protocol enables mock injection without patching imports |
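
The "testable by design" point relies on structural typing: any object with the right methods satisfies a Protocol, so a test can pass in an in-memory fake instead of patching imports. The sketch below shows the general pattern. The method names on `KubeClient`, the `FakeKubeClient` class, and the `client` parameter are assumptions for illustration; the real protocol in krayne.kube may look different.

```python
from typing import Any, Protocol


class KubeClient(Protocol):
    """Hypothetical method set; the real krayne.kube protocol may differ."""

    def create(self, manifest: dict[str, Any]) -> None: ...

    def delete(self, name: str, namespace: str) -> None: ...


class FakeKubeClient:
    """In-memory fake that satisfies KubeClient structurally, without inheriting from it."""

    def __init__(self) -> None:
        self.objects: dict[tuple[str, str], dict[str, Any]] = {}

    def create(self, manifest: dict[str, Any]) -> None:
        meta = manifest["metadata"]
        self.objects[(meta["namespace"], meta["name"])] = manifest

    def delete(self, name: str, namespace: str) -> None:
        self.objects.pop((namespace, name), None)


def create_cluster(name: str, namespace: str, client: KubeClient) -> dict[str, Any]:
    """Free function with the client injected as an argument (illustrative signature)."""
    manifest = {
        "apiVersion": "ray.io/v1",
        "kind": "RayCluster",
        "metadata": {"name": name, "namespace": namespace},
    }
    client.create(manifest)
    return manifest


def delete_cluster(name: str, namespace: str, client: KubeClient) -> None:
    client.delete(name, namespace)


# A test exercises the business logic against the fake; no cluster is required.
fake = FakeKubeClient()
create_cluster("demo", "ml-team", client=fake)
created = ("ml-team", "demo") in fake.objects
delete_cluster("demo", "ml-team", client=fake)
deleted = ("ml-team", "demo") not in fake.objects
print(created, deleted)
```

Injecting the client as a parameter keeps the functions stateless, which fits the functional style described in the table above.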

What's next

  • Quickstart: Install Krayne and create your first cluster in under 5 minutes.
  • Core Concepts: Understand Ray clusters, KubeRay, and the cluster lifecycle.
  • CLI Reference: Full reference for every krayne command, flag, and option.
  • Python SDK (SDK Reference): Use Krayne programmatically in scripts, notebooks, and pipelines.