Configuration¶
Krayne uses Pydantic v2 models for all cluster configuration. This provides type validation, sensible defaults, and clear error messages for invalid input.
Configuration sources¶
Configuration is resolved from three sources, in order of precedence (highest wins):
flowchart LR
CLI["<b>CLI Flags</b><br/>(highest priority)"]
YAML["<b>YAML File</b><br/>(--file cluster.yaml)"]
Defaults["<b>Built-in Defaults</b><br/>(lowest priority)"]
CLI --> Merge["Merge &<br/>Validate"]
YAML --> Merge
Defaults --> Merge
Merge --> Config["ClusterConfig"]
- CLI flags: individual options like --gpus-per-worker 1
- YAML file: full cluster spec via --file cluster.yaml
- Built-in defaults: sensible values so zero-config works
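The precedence order above can be sketched as a layered merge over plain dictionaries. This is a minimal illustration, not Krayne's actual merge code: the helper names deep_merge and resolve are hypothetical, and the sample values stand in for whatever the YAML loader and the CLI parser would produce.

```python
from copy import deepcopy
from typing import Any


def deep_merge(base: dict[str, Any], override: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of base with override applied; nested dicts merge key by key."""
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are mappings: merge them instead of replacing wholesale.
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


def resolve(
    defaults: dict[str, Any],
    yaml_values: dict[str, Any],
    cli_values: dict[str, Any],
) -> dict[str, Any]:
    """Apply sources from lowest to highest priority: defaults, YAML, CLI."""
    return deep_merge(deep_merge(defaults, yaml_values), cli_values)


# Illustrative inputs only. The keys mirror fields shown on this page;
# cli_values represents already-parsed flag values (hypothetical).
defaults = {"head": {"cpus": 1, "memory": "4Gi"}, "autoscaler": {"idle_timeout_seconds": 60}}
yaml_values = {"head": {"cpus": 8, "memory": "32Gi"}, "autoscaler": {"idle_timeout_seconds": 120}}
cli_values = {"head": {"cpus": 16}}

resolved = resolve(defaults, yaml_values, cli_values)
print(resolved)
```

In this sketch the CLI value wins for head.cpus, the YAML value wins for head.memory and the idle timeout, and the inputs are left unmodified. The merged dictionary is what would then be handed to validation.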
Minimal configuration¶
The only required field is name; everything else has a default.
A cluster created with nothing but a name has:
- Head: 1 CPU, 4 Gi memory (control plane only: runs_tasks=False, so Ray's num-cpus=0)
- 1 worker group named worker: autoscaling from 0 to 1 worker (0 initial), 1 CPU, 2 Gi memory each
- Autoscaling enabled by default (Ray in-tree autoscaler, Default upscaling mode, 60 s idle timeout)
- Jupyter notebook, code-server, and SSH all enabled on the head node
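To make the "only name is required" shape concrete, here is a sketch that mirrors the documented defaults using standard-library dataclasses. These classes are stand-ins written for illustration, not Krayne's Pydantic models; the class names and the ssh field name are assumptions, while the default values come from this page.

```python
from dataclasses import dataclass, field


@dataclass
class Head:
    cpus: int = 1
    memory: str = "4Gi"
    runs_tasks: bool = False  # head acts as control plane only


@dataclass
class WorkerGroup:
    name: str = "worker"
    replicas: int = 0
    min_replicas: int = 0
    max_replicas: int = 1
    cpus: int = 1
    memory: str = "2Gi"
    gpus: int = 0


@dataclass
class Autoscaler:
    enabled: bool = True
    idle_timeout_seconds: int = 60
    upscaling_mode: str = "Default"


@dataclass
class Services:
    notebook: bool = True
    code_server: bool = True
    ssh: bool = True  # field name assumed; the page only says SSH is enabled


@dataclass
class Cluster:
    name: str  # the only field without a default
    head: Head = field(default_factory=Head)
    worker_groups: list[WorkerGroup] = field(default_factory=lambda: [WorkerGroup()])
    autoscaler: Autoscaler = field(default_factory=Autoscaler)
    services: Services = field(default_factory=Services)


cluster = Cluster(name="my-cluster")
print(cluster.head)
print(cluster.worker_groups[0])
```

Constructing Cluster without a name fails, while every other field falls back to the values listed above.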
Autoscaling¶
Krayne enables Ray v2 in-tree autoscaling by default. The Ray autoscaler sidecar runs on the head pod and dynamically scales worker groups between configurable min/max bounds.
Default behavior¶
By default, worker groups start with min_replicas=0, replicas=0, max_replicas=1. The autoscaler will scale workers up when Ray tasks or actors demand resources, and scale them back down after an idle timeout.
CLI flags¶
# Create with custom autoscaling bounds
krayne create my-cluster --min-workers 0 --max-workers 10 --workers 2
# Disable autoscaling (pin replicas)
krayne create my-cluster --no-autoscaling --workers 4
YAML configuration¶
name: my-experiment
autoscaler:
enabled: true
idle_timeout_seconds: 120
upscaling_mode: Aggressive # Default, Aggressive, or Conservative
worker_groups:
- name: gpu-workers
replicas: 2
min_replicas: 0
max_replicas: 10
gpus: 1
Python SDK¶
from krayne.config import ClusterConfig, WorkerGroupConfig, AutoscalerConfig
config = ClusterConfig(
name="auto-cluster",
autoscaler=AutoscalerConfig(idle_timeout_seconds=120),
worker_groups=[
WorkerGroupConfig(replicas=2, min_replicas=0, max_replicas=10),
],
)
Disabling autoscaling¶
Set autoscaler.enabled to false (or pass --no-autoscaling on the CLI) to pin minReplicas == maxReplicas == replicas in the manifest and omit the autoscaler sidecar.
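The pinning rule can be sketched as a small pure function. This is an illustration of the behavior described here, not Krayne's manifest builder; the names ReplicaBounds and manifest_bounds are hypothetical.

```python
from typing import NamedTuple


class ReplicaBounds(NamedTuple):
    min_replicas: int
    replicas: int
    max_replicas: int


def manifest_bounds(
    autoscaling: bool, replicas: int, min_replicas: int, max_replicas: int
) -> ReplicaBounds:
    """Return the replica bounds a worker group would carry in the manifest."""
    if autoscaling:
        # Autoscaler on: keep the configured range so workers can scale.
        return ReplicaBounds(min_replicas, replicas, max_replicas)
    # Autoscaler off: collapse the range onto the requested replica count.
    return ReplicaBounds(replicas, replicas, replicas)


print(manifest_bounds(True, replicas=2, min_replicas=0, max_replicas=10))
print(manifest_bounds(False, replicas=4, min_replicas=0, max_replicas=10))
```

With autoscaling disabled, any min_replicas and max_replicas values are ignored in this sketch, which matches the "pin replicas" wording used for --no-autoscaling.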
YAML configuration¶
For complex setups, define your cluster in a YAML file:
name: my-experiment
namespace: ml-team
head:
cpus: 8
memory: 32Gi
worker_groups:
- name: cpu-workers
replicas: 4
cpus: 15
memory: 48Gi
- name: gpu-workers
replicas: 2
gpus: 1
image: rayproject/ray:2.41.0-gpu
services:
notebook: true
code_server: true
Overriding YAML values with CLI flags¶
CLI flags take precedence over YAML values:
# YAML sets workers=1, but this creates 4
krayne create my-experiment --file cluster.yaml --workers 4
Loading YAML from Python¶
from krayne.config import load_config_from_yaml
# Basic load
config = load_config_from_yaml("cluster.yaml")
# With overrides (supports dot-notation for nested fields)
config = load_config_from_yaml(
"cluster.yaml",
overrides={"namespace": "staging", "head.cpus": 32},
)
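One way to picture the dot-notation overrides is as an expansion of each dotted key into nested dictionaries before the values are merged over the YAML content. The sketch below is an assumption about the general technique, not the internals of load_config_from_yaml; expand_dotted is a hypothetical helper.

```python
from typing import Any


def expand_dotted(overrides: dict[str, Any]) -> dict[str, Any]:
    """Turn {"head.cpus": 32} into {"head": {"cpus": 32}}."""
    nested: dict[str, Any] = {}
    for dotted_key, value in overrides.items():
        *parents, leaf = dotted_key.split(".")
        node = nested
        for part in parents:
            node = node.setdefault(part, {})
            if not isinstance(node, dict):
                # e.g. both "head" and "head.cpus" were given as overrides
                raise ValueError(f"{dotted_key!r} conflicts with a non-mapping value at {part!r}")
        node[leaf] = value
    return nested


expanded = expand_dotted({"namespace": "staging", "head.cpus": 32})
print(expanded)  # {'namespace': 'staging', 'head': {'cpus': 32}}
```

Keys without a dot pass through unchanged, and several dotted keys that share a prefix end up in the same nested mapping.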
Default values rationale¶
| Default | Value | Why |
|---|---|---|
| Head CPUs | 1 | Minimum that boots the head pod (GCS, dashboard, autoscaler, postStart services). Head is clamped up to at least 1 CPU / 4Gi memory in the manifest builder. |
| Head Memory | 4Gi | Same minimum-boot rationale; clamped up if you set a smaller value. |
| Head runs_tasks | false | Ray num-cpus=0 on the head, so user tasks are routed to workers. (No GPU support on the head; GPUs belong on workers.) |
| Worker Replicas | 0 | Autoscaler manages worker count |
| Worker Min Replicas | 0 | Scale to zero when idle |
| Worker Max Replicas | 1 | Conservative default; increase for production |
| Worker CPUs | 1 | Small CPU request so the cluster fits on modest sandboxes |
| Worker Memory | 2Gi | Small memory request so the cluster fits on modest sandboxes |
| Worker GPUs | 0 | CPU-only by default; opt in via --gpus-per-worker |
| Autoscaling | enabled | KubeRay in-tree autoscaler manages worker lifecycle |
| Idle Timeout | 60s | Scale down unused workers after 60 seconds |
| Upscaling Mode | Default | One of Conservative, Default, Aggressive |
| Notebook | enabled | Jupyter notebook on port 8888 |
| Code Server | enabled | Browser IDE on port 8443 |
| SSH | enabled | SSH to the head pod on port 22 (loopback-only inside the pod) |
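The head clamping mentioned in the table (at least 1 CPU and 4Gi of memory) can be illustrated as follows. This is a sketch under stated assumptions, not the manifest builder itself: clamp_head and parse_gi are hypothetical helpers, and only the Gi and Mi suffixes are handled rather than the full Kubernetes quantity syntax.

```python
HEAD_MIN_CPUS = 1
HEAD_MIN_MEMORY_GI = 4


def parse_gi(quantity: str) -> float:
    """Convert a memory quantity such as '32Gi' or '512Mi' to GiB."""
    if quantity.endswith("Gi"):
        return float(quantity[:-2])
    if quantity.endswith("Mi"):
        return float(quantity[:-2]) / 1024
    raise ValueError(f"unsupported memory quantity in this sketch: {quantity!r}")


def clamp_head(cpus: float, memory: str) -> tuple[float, str]:
    """Raise head resources to the documented minimums; never lower them."""
    clamped_cpus = max(cpus, HEAD_MIN_CPUS)
    if parse_gi(memory) >= HEAD_MIN_MEMORY_GI:
        clamped_memory = memory  # already large enough, keep the user's value
    else:
        clamped_memory = f"{HEAD_MIN_MEMORY_GI}Gi"
    return clamped_cpus, clamped_memory


print(clamp_head(0.5, "2Gi"))
print(clamp_head(8, "32Gi"))
```

Values above the minimum pass through untouched, so the clamp only affects configurations too small to boot the head pod.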
Config validation¶
ClusterConfig uses Pydantic's extra = "forbid" mode — unknown fields in YAML or keyword arguments raise a ConfigValidationError:
from krayne.config import ClusterConfig
# This raises ConfigValidationError — "unknown_field" is not a valid field
config = ClusterConfig(name="test", unknown_field="value")
╭──── Error ─────────────────────────────────╮
│ Configuration validation error: │
│ Extra inputs are not permitted │
╰────────────────────────────────────────────╯
See Configuration Models Reference for full field definitions and types.
What's next¶
- Error Handling — how to debug and handle errors
- Configuration Models Reference — complete Pydantic model field tables