Architecture

This page describes the architecture and capabilities of the Kubernetes environment in my homelab. Below you can find an introduction to the various aspects of the environment, with links to more detailed information in the respective sections of this documentation.

Cluster Resources

The cluster runs on four Turing RK1 compute modules, which together provide the following combined resources:

| Resource | Value             |
| -------- | ----------------- |
| CPUs     | 32 cores          |
| RAM      | 128 GB            |
| Disk     | 4 TB SSD storage  |

Operating System

The OS of choice for my Kubernetes environment is Talos. The reasons for this are simple:

  • Immutable
  • Declarative
  • Minimal
  • Support for my hardware

You can find more information on Talos here.

Operating Mode

Kubernetes is configured in High Availability mode, with three of the four nodes acting as control plane members. This ensures that the cluster API keeps working even during node updates or failures:

  • Control plane nodes: talos-1, talos-2, talos-3
  • Worker nodes: tba

Since the cluster consists of just four nodes in total, the control plane nodes are configured to allow regular workloads to be scheduled on them as well. As a result, the total cluster load is shared (roughly) equally among all four nodes, despite just one of them being a worker node.
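
In Talos, this is a single machine configuration setting; a minimal sketch of the relevant config patch (field names per the Talos machine config schema):

```yaml
# Talos machine configuration patch (sketch):
# allow regular workloads on the control plane nodes.
cluster:
  allowSchedulingOnControlPlanes: true
```

With this setting enabled, Talos does not apply the node-role.kubernetes.io/control-plane taint, so the Kubernetes scheduler treats all four nodes alike.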

Networking

My Container Network Interface (CNI) of choice is Cilium. It has been adopted by most major cloud providers and many end users as the de facto default CNI for Kubernetes, and comes with plenty of features useful for my homelab, for example:

  • support for cluster-wide and node-level NetworkPolicies
  • L2 announcements for LoadBalancer Services
  • kube-proxy replacement (no need to run kube-proxy at all)
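
The last two items map to a handful of Helm values. A minimal sketch, assuming a recent Cilium release installed via Helm; the API server address below is a placeholder:

```yaml
# Cilium Helm values (sketch): replace kube-proxy entirely and
# announce LoadBalancer IPs on the local L2 network.
kubeProxyReplacement: true
k8sServiceHost: 192.168.1.10  # placeholder: Kubernetes API server address
k8sServicePort: 6443
l2announcements:
  enabled: true
```

Which IP ranges are handed out and announced is then controlled through CiliumLoadBalancerIPPool and CiliumL2AnnouncementPolicy resources.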

You can find more information on Cilium here.

Storage

As outlined in the Hardware section, all cluster nodes possess very little onboard disk space (just 32 GB), but are each equipped with a 1 TB NVMe SSD. Therefore, the Kubernetes cluster makes almost no use of local disk storage and instead utilizes PersistentVolumes for persistent data.

The storage engine used in the cluster is Rook, which provides file, block, and object storage on Kubernetes, backed by Ceph.
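
For workloads, this is transparent: they simply request a PersistentVolumeClaim against a Rook-provisioned StorageClass. A minimal sketch, assuming a block StorageClass named ceph-block (the names are placeholders):

```yaml
# PersistentVolumeClaim backed by Rook/Ceph block storage (sketch).
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: example-data
spec:
  accessModes:
    - ReadWriteOnce             # block volumes are typically mounted by a single node
  storageClassName: ceph-block  # placeholder: Rook-provisioned StorageClass
  resources:
    requests:
      storage: 10Gi
```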

You can find more information on Rook here.

Deployments

The Kubernetes components running in the cluster need to be deployed somehow. This includes infrastructure workloads such as Cilium or Rook, as well as applications I am self-hosting within the cluster.

These applications are installed and managed using a GitOps approach. The GitOps engine in question is Flux. I went with Flux because it's lightweight and provides full OCI support.
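
To illustrate the OCI support: Flux can pull manifests packaged as OCI artifacts straight from a container registry and reconcile them. A sketch, with the registry URL and resource names as placeholders:

```yaml
# Flux OCIRepository + Kustomization (sketch): pull manifests
# from an OCI artifact and apply them to the cluster.
apiVersion: source.toolkit.fluxcd.io/v1beta2
kind: OCIRepository
metadata:
  name: homelab-manifests
  namespace: flux-system
spec:
  interval: 10m
  url: oci://ghcr.io/example/homelab-manifests  # placeholder registry
  ref:
    tag: latest
---
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: homelab
  namespace: flux-system
spec:
  interval: 10m
  sourceRef:
    kind: OCIRepository
    name: homelab-manifests
  path: ./
  prune: true
```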

You can find more information on Flux here.

Observability

Observing all the moving parts of a Kubernetes cluster can quickly become overwhelming. Goal number 1 for me was not to reinvent the wheel.

Since I am using Grafana Cloud's free tier, I am leveraging many of its ready-made integrations for Kubernetes, Docker, etc. via Alloy, Grafana's OpenTelemetry (OTel) collector distribution.

This also allows me to manage a fleet of Alloy instances, should I require multiple OTel collectors in the future.
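
As a sketch of what such a collector can look like when Alloy is deployed via the grafana/alloy Helm chart; the remote write URL and credentials are placeholders, and the pipeline is deliberately minimal:

```yaml
# grafana/alloy Helm values (sketch): discover pods and
# remote-write scraped metrics to Grafana Cloud.
alloy:
  configMap:
    content: |
      discovery.kubernetes "pods" {
        role = "pod"
      }

      prometheus.scrape "pods" {
        targets    = discovery.kubernetes.pods.targets
        forward_to = [prometheus.remote_write.grafana_cloud.receiver]
      }

      prometheus.remote_write "grafana_cloud" {
        endpoint {
          url = "https://prometheus-prod-xx.grafana.net/api/prom/push"  // placeholder
          basic_auth {
            username = "123456"                      // placeholder stack ID
            password = sys.env("GRAFANA_CLOUD_API_KEY")
          }
        }
      }
```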

You can find more information on observability of my homelab here.