Overview¶
CLI and SDK for creating, managing, and scaling Ray clusters on Kubernetes.
Krayne wraps the KubeRay operator behind a clean, opinionated interface so ML practitioners can get distributed compute without touching Kubernetes manifests.
Why Krayne?¶
Running Ray on Kubernetes typically requires writing verbose YAML manifests for the RayCluster custom resource, understanding Kubernetes CRDs, pod specs, resource requests, and node selectors, stitching together kubectl commands for lifecycle management, and manually configuring services like dashboards, notebooks, and SSH. This is a significant barrier for ML practitioners who just want distributed compute.
Krayne eliminates that friction:
- One command to a working cluster —
krayne create my-clustergives you a Ray cluster with notebooks and SSH ready to go. - SDK for automation — the same operations are available as Python functions for pipelines, scripts, and notebooks.
- No Kubernetes knowledge required — sensible defaults handle resource allocation, service configuration, and manifest generation.
- Full escape hatch — power users can override any setting via YAML or drop down to raw KubeRay manifests.
How it works¶
graph LR
User["You"] -->|"CLI or Python"| Krayne["Krayne"]
Krayne -->|"builds manifest"| KubeRay["KubeRay CRD"]
KubeRay -->|"reconciles"| Ray["Ray Cluster"]
Ray -->|"serves"| Services["Dashboard\nNotebook\nSSH"]
Both the CLI and SDK produce the same result — a fully configured Ray cluster with dashboard, notebook, and SSH access:
from krayne.api import create_cluster, scale_cluster, delete_cluster
from krayne.config import ClusterConfig, WorkerGroupConfig
config = ClusterConfig(
name="my-experiment",
namespace="ml-team",
worker_groups=[
WorkerGroupConfig(replicas=2, gpus=1, gpu_type="a100")
],
)
# Create and wait for ready
info = create_cluster(config, wait=True)
print(f"Dashboard: {info.dashboard_url}")
# Scale up
scale_cluster("my-experiment", "ml-team", "worker", replicas=4)
# Clean up
delete_cluster("my-experiment", "ml-team")
At a glance¶
| Feature | Details |
|---|---|
| Language | Python 3.10+ |
| CLI framework | Typer + Rich |
| Config validation | Pydantic v2 |
| K8s integration | kubernetes-client |
| CRD target | KubeRay RayCluster (ray.io/v1) |
| Architecture | Functional-first, stateless SDK |
| License | Apache 2.0 |
Architecture¶
graph TD
CLI["<b>CLI</b><br/>Typer + Rich"]
SDK["<b>SDK</b><br/>Functional API"]
Config["<b>Config</b><br/>Pydantic models"]
Kube["<b>KubeClient</b><br/>Protocol + manifest"]
Output["<b>Output</b><br/>Rich formatters"]
K8s["<b>Kubernetes API</b>"]
CLI --> SDK
CLI --> Output
SDK --> Config
SDK --> Kube
Kube --> K8s
| Module | Responsibility |
|---|---|
CLI (krayne.cli) |
Parse arguments, call SDK, format output |
SDK (krayne.api) |
All business logic as free functions |
Config (krayne.config) |
Pydantic models + YAML loading |
KubeClient (krayne.kube) |
Kubernetes API calls + manifest building |
Output (krayne.output) |
Rich tables and panels for terminal display |
The CLI is a thin wrapper — every operation available from the command line is available as a Python function with the same semantics.
Key features¶
| Feature | Description |
|---|---|
| Zero-config defaults | krayne create my-cluster just works — sensible CPU, memory, and service defaults |
| GPU support | One flag to add GPUs: --gpus-per-worker 1 --worker-gpu-type a100 |
| YAML configuration | Full cluster spec in a YAML file for version control and reproducibility |
| Local sandbox | krayne sandbox setup spins up a local k3s cluster with KubeRay for development |
| JSON output | Every command supports --output json for scripting and pipelines |
| Functional SDK | Stateless free functions — no classes to instantiate, no state to manage |
| Testable by design | KubeClient Protocol enables mock injection without patching imports |
What's next¶
-
Quickstart
Install Krayne and create your first cluster in under 5 minutes.
-
Core Concepts
Understand Ray clusters, KubeRay, and the cluster lifecycle.
-
CLI Reference
Full reference for every
kraynecommand, flag, and option. -
Python SDK
Use Krayne programmatically in scripts, notebooks, and pipelines.