Architecture¶

Krayne is organized into five modules with a clear dependency direction. This page describes the module structure, design principles, and how the pieces fit together.

Module overview¶

graph TD
  CLI["<b>CLI</b> (cli/)<br/>Typer commands + Rich output"]
  SDK["<b>SDK</b> (api/)<br/>Functional API — all business logic"]
  Config["<b>Config</b> (config/)<br/>Pydantic models + YAML loading"]
  Kube["<b>K8s Client</b> (kube/)<br/>Protocol + manifest builder"]
  Output["<b>Output</b> (output/)<br/>Rich formatters"]

  CLI -->|"calls"| SDK
  CLI -->|"uses"| Output
  SDK -->|"reads"| Config
  SDK -->|"calls"| Kube

Dependency rule: cli/ depends on api/. api/ depends on config/ and kube/. Nothing depends on cli/. The output/ module is used only by cli/.

Modules¶

`krayne.cli` — CLI layer¶

The CLI is a thin Typer shell over the SDK. It has three responsibilities:

Parse command-line arguments
Call the corresponding SDK function
Format the result for the terminal

No business logic lives here. If a feature can't be tested through the SDK alone, it's in the wrong layer.

`krayne.api` — SDK layer¶

The core of Krayne. All cluster lifecycle operations are implemented as free functions:

create_cluster(), get_cluster(), list_clusters()
describe_cluster(), scale_cluster(), delete_cluster()
wait_until_ready(), get_cluster_services(), open_tunnel()

Each function takes explicit parameters and returns immutable dataclasses. The Kubernetes client is injected via the client parameter (defaults to None, which creates a client from kubeconfig).

`krayne.config` — Configuration layer¶

Pydantic models for cluster configuration:

ClusterConfig — top-level config
HeadNodeConfig, WorkerGroupConfig, ServicesConfig — nested models
load_config_from_yaml() — YAML loading with override support

Validation happens at this boundary. By the time data reaches the SDK, it's guaranteed to be valid.

`krayne.kube` — Kubernetes client layer¶

The only module that talks to the Kubernetes API. Contains:

KubeClient protocol — structural interface (Protocol) for K8s operations
DefaultKubeClient — implementation using the official kubernetes Python client
build_manifest() — pure function that converts ClusterConfig into a KubeRay RayCluster CRD dict

The protocol-based design means any object with the right methods works as a client — no inheritance, no registration.

`krayne.output` — Output formatters¶

Rich formatters for CLI display:

format_cluster_created() — success panel after creation
format_cluster_list() — table of clusters
format_cluster_details() — multi-table detailed view
format_json() — JSON output for any dataclass

Request flow¶

This sequence diagram shows what happens when you run krayne create my-cluster:

sequenceDiagram
  participant User
  participant CLI as CLI (Typer)
  participant SDK as SDK (api/)
  participant Config as Config (Pydantic)
  participant Manifest as Manifest Builder
  participant Client as KubeClient
  participant K8s as Kubernetes API

  User->>CLI: krayne create my-cluster
  CLI->>Config: Build ClusterConfig from flags
  Config-->>CLI: Validated config
  CLI->>SDK: create_cluster(config)
  SDK->>Manifest: build_manifest(config)
  Manifest-->>SDK: RayCluster CRD dict
  SDK->>Client: create_ray_cluster(manifest)
  Client->>K8s: POST /apis/ray.io/v1/rayclusters
  K8s-->>Client: 201 Created
  Client-->>SDK: Cluster object
  SDK-->>CLI: ClusterInfo dataclass
  CLI->>User: Rich panel output

Design principles¶

Functional-first¶

Behavior lives in free functions, not classes. Data lives in dataclasses and Pydantic models. There are no inheritance hierarchies, no abstract base classes for business logic.

Why: Functions are stateless, composable, and trivial to test. Pass input, get output, assert.

CLI = SDK¶

The CLI is a thin wrapper. Every operation available from the command line is available as a Python function with the same semantics.

Why: Users who start with the CLI can graduate to the SDK for automation without learning a new API.

Progressive disclosure¶

Every command works with zero flags. Defaults produce a working cluster. Power users override via flags or YAML.

Why: Low barrier to entry. ML practitioners shouldn't need Kubernetes expertise to get a cluster.

Explicit dependencies¶

The KubeClient is passed as a parameter, never imported directly by the SDK functions. Default None means "create from kubeconfig" — explicit, not magical.

Why: Makes the dependency visible at every call site. Makes testing straightforward.

Pydantic at the boundary¶

All configuration input is validated by Pydantic models. Return types are plain frozen dataclasses (output doesn't need re-validation).

Why: Catches invalid config early with clear error messages. Return types stay simple and fast.

Manifest generation¶

The build_manifest() function is a pure function that converts a ClusterConfig into a KubeRay RayCluster custom resource dict. It has no side effects and no I/O — it's deterministic and easy to snapshot-test.

flowchart LR
  Config["ClusterConfig"] --> Build["build_manifest()"]
  Build --> CRD["RayCluster CRD dict"]
  CRD --> K8s["Kubernetes API"]

CRD mapping:

Krayne Config	KubeRay CRD Path
`config.name`	`metadata.name`
`config.namespace`	`metadata.namespace`
`head.cpus`	`spec.headGroupSpec...resources.requests.cpu`
`head.memory`	`spec.headGroupSpec...resources.requests.memory`
`worker_groups[n].replicas`	`spec.workerGroupSpecs[n].replicas`
`worker_groups[n].gpus`	`spec.workerGroupSpecs[n]...resources.limits[nvidia.com/gpu]`

Error handling¶

All exceptions inherit from KrayneError:

classDiagram
  KrayneError <|-- ClusterNotFoundError
  KrayneError <|-- ClusterAlreadyExistsError
  KrayneError <|-- ConfigValidationError
  KrayneError <|-- ClusterTimeoutError
  KrayneError <|-- KubeConnectionError
  KrayneError <|-- KubeRayNotInstalledError
  KrayneError <|-- NamespaceNotFoundError
  KrayneError <|-- SandboxError
  SandboxError <|-- DockerNotFoundError
  SandboxError <|-- SandboxAlreadyExistsError
  SandboxError <|-- SandboxNotFoundError

The CLI catches KrayneError and renders a Rich panel. SDK users catch specific subtypes for fine-grained handling.

Testing strategy¶

Layer	Approach
Config	Unit tests: valid/invalid inputs, default merging, YAML round-trips
K8s Client	Mock `kubernetes-client`. Snapshot-test `build_manifest()` output.
SDK	Inject mock `KubeClient`. Test full lifecycle independently.
CLI	Typer `CliRunner` snapshot tests for output formatting and flag parsing.
Integration	End-to-end against a real or kind cluster. Full create/scale/delete lifecycle.

The KubeClient protocol is the key enabler: unit tests inject a MagicMock, integration tests use DefaultKubeClient against a real cluster.