Skip to content

Architecture

Hardware

Three Lenovo ThinkCentre M70q Tiny (Gen 3) running Talos Linux.

Node Mgmt IP Ceph IP
k8s-1 192.168.5.211 192.168.43.11
k8s-2 192.168.5.212 192.168.43.12
k8s-3 192.168.5.213 192.168.43.13
  • VIP: 192.168.5.210:6443
  • GPU: Intel i915 iGPU on all nodes — Plex and Jellyfin use it for hardware transcoding
  • OS configs: Minijinja templates in kubernetes/talos/, never edit rendered output directly

Networking

Physical

  • UDM-Pro (192.168.5.1): router, firewall, DNS controller
  • Primary LAN: 192.168.5.0/24 — management and cluster traffic
  • Ceph network: 192.168.43.0/24 — dedicated 2.5GbE for storage replication
  • VPN VLAN: 192.168.99.0/24 — VLAN 99 via Multus for qBittorrent

LAN DNS

UniFi controller manages the .internal domain for non-cluster hosts:

  • truenas.internal, unraid.internal, ai3090.internal — static entries in UniFi
  • The unifi-dns pod syncs cluster HTTPRoutes into UniFi so LAN clients resolve cluster services without touching Cloudflare

Cluster

No kube-proxy

Cilium is the eBPF replacement. Standard kube-proxy debug tools don't apply. Use cilium CLI or Hubble for network debugging.

  • CNI: Cilium — eBPF-based, completely replaces kube-proxy
  • Load balancing: Cilium L2 ARP announcements, IP pool 192.168.5.200–250
  • Routing: HTTPRoute resources, not legacy Ingress objects
  • Auth: Authentik SSO via Envoy SecurityPolicy forward-auth

Ingress — Envoy Gateway runs two instances:

Internet-facing. Cloudflared terminates the Cloudflare tunnel and forwards traffic here. All *.t0m.co requests enter through this gateway.

LAN-only. UniFi DNS points local clients directly to this gateway's LoadBalancer IP. No Cloudflare involvement.

DNS

Three layers of resolution:

flowchart LR
    Pod -->|cluster DNS| CoreDNS["CoreDNS\n10.43.0.10"]
    LAN["LAN Client"] -->|*.t0m.co| UniFi["unifi-dns\n→ UniFi Controller"]
    Internet -->|*.t0m.co| CF["external-dns\n→ Cloudflare"]
    UniFi --> EI[envoy-internal]
    CF --> Tunnel[cloudflared] --> EE[envoy-external]
Layer Service Scope
In-cluster CoreDNS Pod-to-pod, service discovery (10.43.0.10)
LAN unifi-dns Syncs HTTPRoutes → UniFi controller
External external-dns Syncs envoy-external routes → Cloudflare (proxied)

Certificates

cert-manager handles TLS via Let's Encrypt with DNS-01 challenges through Cloudflare API.

Storage

Class Backend Use
ceph-ssd (default) Rook-Ceph, Samsung SSDs All persistent workloads
openebs-hostpath Local node storage CNPG clusters, victoria-logs, actions-runner
nfs-media TrueNAS NFS Media libraries

S3-compatible object storage via RustFS (ceph-ssd backed). No ceph-rbd, no CephFS.

Backups

  • VolSync: Restic-based PVC snapshots → Backblaze B2
  • CNPG: Barman-cloud WAL archiving + base backups → Backblaze B2

Databases

PostgreSQL (CNPG)

Two clusters — don't mix them up

They use different images. pgsql-cluster is standard PostgreSQL 17. immich17 has the vectorchord extension for vector search.

Cluster Image Use
pgsql-cluster cloudnative-pg/postgresql:17 General apps (Authentik, Gatus, Homebox, Mealie, etc.)
immich17 tensorchord/cloudnative-vectorchord:17 Immich (vector search)

Read-write endpoint: <cluster>-rw.database.svc.cluster.local

Apps declare dependsOn: [{name: cnpg-cluster, namespace: database}] in their ks.yaml.

Cache

Dragonfly (Redis-compatible) at dragonfly-cluster.database.svc.cluster.local:6379:

DB Consumer
0 Default
2 Immich
3 Searxng

Secrets

All secrets live in aKeyless and sync into the cluster via ExternalSecret CRDs. Cluster-wide variables are injected through postBuild.substituteFrom: cluster-secrets in each app's ks.yaml.

GitOps

Flux CD watches this repo and reconciles on every push.

  • Entry point: ks.yaml per app — defines dependsOn, postBuild substitutions, and components
  • Components: Reusable patterns in kubernetes/components/ — volsync, cnpg, ext-auth-internal, ext-auth-external, keda

kubectl edits are ephemeral

Flux resets them on the next reconciliation. Always edit in Git, push, and reconcile.