# Bootstrap Guide
This guide walks you through bootstrapping the cluster from scratch—useful for disaster recovery or setting up a new cluster.
## Prerequisites
Before starting, ensure you have:
- Hardware: Three nodes with Talos OS images (USB or PXE boot)
- Network: Nodes connected to your home network with access to 192.168.5.0/24
- Tools: All CLI tools installed via `mise install`
- Secrets: Access to aKeyless with the `akeyless` CLI configured
- Repository: This Git repository cloned locally
Tool Setup
Install all required tools via mise:
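This is the same command named in the prerequisites; it installs the tools defined in the repository's mise configuration:

```shell
mise install
```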
Configure the aKeyless CLI:

```shell
akeyless configure --access-type api_key \
  --access-id <your-access-id> \
  --access-key <your-access-key>
```
Verify authentication:
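One plausible check, reusing the commands shown later in the Troubleshooting section:

```shell
# A token in the output confirms the CLI can reach aKeyless
akeyless auth

# Optionally fetch a known secret (path taken from the Troubleshooting section)
akeyless get-secret-value --name talos/MACHINE_TOKEN
```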
## Bootstrap Overview
The bootstrap process follows these stages (defined in `bootstrap/mod.just`):
```mermaid
graph TB
    A[1. Install Talos] --> B[2. Bootstrap Kubernetes]
    B --> C[3. Fetch Kubeconfig]
    C --> D[4. Wait for Nodes]
    D --> E[5. Apply Namespaces]
    E --> F[6. Apply Resources]
    F --> G[7. Apply CRDs]
    G --> H[8. Deploy Core Apps]
    H --> I[9. Restore CNPG Clusters]
```

## Full Bootstrap Command
To run all stages in sequence:
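The single entry point (the same command used in the scenarios later in this guide):

```shell
just bootstrap
```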
This is equivalent to running:
```shell
just bootstrap talos
just bootstrap k8s
just bootstrap kubeconfig
just bootstrap wait
just bootstrap namespaces
just bootstrap rook-ceph-external  # If using external Ceph
just bootstrap resources
just bootstrap crds
just bootstrap apps
just bootstrap cnpg
```
## Step-by-Step Bootstrap
For more control or troubleshooting, run each stage individually:
### Stage 1: Install Talos
What it does:
- Reads node configurations from `talos/nodes/*.yaml.j2`
- Renders Talos machine configs using `minijinja-cli`
- Injects secrets from aKeyless using `bootstrap/scripts/akeyless-inject.sh`
- Applies configurations to each node via `talosctl apply-config --insecure`
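The steps above can be sketched as a per-node loop. This is an illustration only, not the literal recipe from `bootstrap/mod.just`; the stdin plumbing and the assumption that node names resolve on your network are mine:

```shell
# Render each node config, inject secrets, and push it to the node (sketch)
for cfg in talos/nodes/*.yaml.j2; do
  node="$(basename "$cfg" .yaml.j2)"   # e.g. k8s-1 (assumes the name resolves; use the IP otherwise)
  minijinja-cli "$cfg" \
    | bootstrap/scripts/akeyless-inject.sh \
    | talosctl apply-config --insecure --nodes "$node" --file /dev/stdin
done
```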
Expected output:
```text
INFO Running stage... stage=talos
INFO Talos config applied node=k8s-1
INFO Talos config applied node=k8s-2
INFO Talos config applied node=k8s-3
```
If Nodes Already Have Talos
If nodes are already running Talos, the command detects this and skips:
To force reapplication, manually run:
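The forced form would mirror what Stage 1 runs per node; the node IP is taken from the Troubleshooting section and the config path is illustrative:

```shell
# Re-apply a rendered machine config to one node (path is a placeholder)
talosctl apply-config --insecure --nodes 192.168.5.211 --file <rendered-config>.yaml
```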
### Stage 2: Bootstrap Kubernetes
What it does:
- Runs `talosctl bootstrap` on the first control plane node (determined by `talosctl config info`)
- Initializes etcd and the Kubernetes control plane
- Retries if bootstrap is already in progress
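In `talosctl` terms this amounts to a single call against the first control plane node (example IP from the Troubleshooting section):

```shell
talosctl bootstrap --nodes 192.168.5.211
```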
Expected output:
Bootstrap Only Runs Once
If Kubernetes is already bootstrapped:
This is normal—it means the cluster is already initialized. The bootstrap command handles this gracefully.
### Stage 3: Fetch Kubeconfig
What it does:
- Fetches `kubeconfig` from the cluster via `talosctl kubeconfig`
- Saves to `kubeconfig` in the repository root
- Sets the context name to `main`
- Configures kubectl to use the Cilium LoadBalancer VIP (192.168.5.210)
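Roughly equivalent to the following (the output path and `--force` overwrite flag are assumptions based on the description above):

```shell
# Fetch the kubeconfig into the repo root and point kubectl at it
talosctl kubeconfig ./kubeconfig --nodes 192.168.5.211 --force
export KUBECONFIG="$PWD/kubeconfig"
```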
Expected output:
Verify:
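For example, assuming the kubeconfig from the previous step is active:

```shell
kubectl get nodes
```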
You should see nodes in NotReady state (Cilium isn't installed yet).
### Stage 4: Wait for Nodes
What it does:
- Waits for all nodes to transition from `NotReady` → `Ready`
- Ensures the cluster is stable before proceeding
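A plausible implementation of this stage, consistent with the `condition met` log lines shown in the expected output (the timeout value is an assumption):

```shell
kubectl wait nodes --all --for=condition=Ready --timeout=10m
```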
Expected output:
```text
INFO Running stage... stage=wait
INFO Waiting for nodes to be ready...
node/k8s-1 condition met
node/k8s-2 condition met
node/k8s-3 condition met
```
### Stage 5: Apply Namespaces
What it does:
- Scans `kubernetes/apps/*/namespace.yaml` files
- Extracts namespace definitions using `kustomize`
- Applies namespaces using `kubectl apply --server-side`
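A simplified sketch of this stage (the real recipe extracts definitions via `kustomize`; this loop just applies each file directly):

```shell
for ns in kubernetes/apps/*/namespace.yaml; do
  kubectl apply --server-side -f "$ns"
done
```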
Expected output:
```text
INFO Running stage... stage=namespaces
namespace/actions-runner-system created
namespace/cert-manager created
namespace/database created
namespace/default configured
...
```
### Stage 6: Apply Resources
What it does:
- Renders `bootstrap/resources.yaml.j2` using `minijinja-cli`
- Injects secrets from aKeyless via `akeyless-inject.sh`
- Applies rendered resources (ConfigMaps, Secrets, etc.)
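As a pipeline, this stage is roughly (an assumption; the actual recipe lives in `bootstrap/mod.just`):

```shell
minijinja-cli bootstrap/resources.yaml.j2 \
  | bootstrap/scripts/akeyless-inject.sh \
  | kubectl apply --server-side -f -
```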
Expected output:
```text
INFO Running stage... stage=resources
✓ Injecting secrets...
configmap/cluster-secrets created
secret/github-deploy-key created
...
```
What's in `resources.yaml.j2`:
- `cluster-secrets` ConfigMap: Contains cluster-wide variables like `SECRET_DOMAIN`, VIP addresses, etc.
- GitHub deploy keys for Flux
- Other bootstrap-time secrets
### Stage 7: Apply CRDs
What it does:
- Renders CRDs from `bootstrap/helmfile.d/00-crds.yaml` using Helmfile
- Applies Custom Resource Definitions to the cluster
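One plausible shape for this stage, rendering with Helmfile and applying the result (the exact invocation is an assumption):

```shell
helmfile --file bootstrap/helmfile.d/00-crds.yaml template \
  | kubectl apply --server-side -f -
```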
Expected output:
```text
INFO Running stage... stage=crds
customresourcedefinition.apiextensions.k8s.io/helmreleases.helm.toolkit.fluxcd.io created
customresourcedefinition.apiextensions.k8s.io/kustomizations.kustomize.toolkit.fluxcd.io created
...
```
### Stage 8: Deploy Core Apps
What it does:
- Deploys core infrastructure from `bootstrap/helmfile.d/01-apps.yaml`
- Apps deploy in strict dependency order via Helmfile
Deployment order:
- Cilium: CNI networking (eBPF, kube-proxy replacement)
- CoreDNS: Cluster DNS
- Spegel: OCI image mirror (reduces external registry load)
- cert-manager: TLS certificate automation
- external-secrets: Syncs secrets from aKeyless
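The stage itself is likely a single Helmfile sync against the apps file, with the ordering handled by release dependencies inside the Helmfile (invocation assumed):

```shell
helmfile --file bootstrap/helmfile.d/01-apps.yaml sync
```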
Expected output:
```text
INFO Running stage... stage=apps
Upgrading release=cilium, chart=cilium/cilium
Release "cilium" does not exist. Installing it now.
...
Upgrading release=coredns, chart=coredns/coredns
...
```
Verify:
```shell
kubectl get pods -n kube-system
kubectl get pods -n cert-manager
kubectl get pods -n external-secrets
```
### Stage 9: Restore CNPG Clusters
What it does:
- Waits for CNPG operator to be ready
- Attempts to apply CNPG cluster definitions from `bootstrap/cnpg/`
- If backups exist in S3, clusters restore from backup
- If backups don't exist, cluster creation fails gracefully and Flux creates fresh clusters later
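Sketched in shell, the graceful-failure behavior looks like this (an illustration, not the literal recipe):

```shell
kubectl apply --server-side -f bootstrap/cnpg/ \
  || echo "CNPG cluster creation failed (likely no backups exist); Flux will create fresh clusters"
```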
Expected output (with backups):
Expected output (no backups):
```text
INFO Running stage... stage=cnpg
ERROR: cluster.postgresql.cnpg.io/pgsql-cluster: backup not found
INFO CNPG cluster creation failed (likely no backups exist). Clusters will be created by Flux. stage=cnpg
```
This is normal! Flux will create fresh CNPG clusters if backups don't exist.
## Post-Bootstrap
After bootstrap completes, Flux takes over:
- Flux watches Git: Monitors `kubernetes/flux/` for changes
- Apps deploy automatically: Flux reconciles `kubernetes/apps/` and deploys all applications
- Renovate creates PRs: Automated dependency updates via GitHub Actions
Verify Flux is running:
Watch app deployments:
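Possible commands for both checks, assuming the standard Flux CLI and the conventional `flux-system` namespace:

```shell
# Verify Flux is running
flux check
kubectl get pods -n flux-system

# Watch app deployments reconcile
flux get kustomizations --watch
```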
## Common Bootstrap Scenarios
### Scenario 1: Fresh Cluster (No Backups)
Starting from scratch with no existing backups:
```shell
# 1. Boot nodes from Talos ISO

# 2. Run full bootstrap
just bootstrap

# 3. Wait for Flux to deploy apps (5-10 minutes)
watch kubectl get pods -A

# 4. Configure apps manually (databases, auth, etc.)
```
### Scenario 2: Disaster Recovery (With Backups)
Rebuilding after catastrophic failure:
```shell
# 1. Factory reset all nodes
just talos reset-node k8s-1  # Confirm prompt
just talos reset-node k8s-2  # Confirm prompt
just talos reset-node k8s-3  # Confirm prompt

# 2. Boot nodes from Talos ISO

# 3. Run full bootstrap
just bootstrap

# 4. CNPG clusters restore from B2 backups automatically
# 5. Flux deploys apps

# 6. Restore VolSync PVCs for stateful apps
#    Restore app data (example: Immich)
just kube volsync-restore default immich 1
```
Reset Nodes Wipe All Data
`just talos reset-node` performs a factory reset, wiping:
- All Kubernetes state
- All persistent volumes
- All Talos configuration
Always ensure you have the following before running reset commands:
- VolSync backups of application data
- CNPG backups of databases
- Access to aKeyless for secrets
### Scenario 3: Partial Bootstrap (Testing Changes)
Testing changes to bootstrap configuration:
```shell
# Re-run specific stages
just bootstrap namespaces
just bootstrap resources
just bootstrap crds
just bootstrap apps
```
## Troubleshooting Bootstrap
### Nodes Stuck in NotReady
Cause: Cilium failed to deploy
Fix:
```shell
# Check Cilium status
kubectl get pods -n kube-system | grep cilium

# View Cilium logs
kubectl logs -n kube-system daemonset/cilium

# Re-deploy Cilium via Helmfile
just bootstrap apps
```
### Bootstrap Hangs at "Waiting for nodes"
Cause: Node networking issue or Talos config error
Fix:
```shell
# Check Talos service status on each node
talosctl -n 192.168.5.211 service kubelet status
talosctl -n 192.168.5.212 service kubelet status
talosctl -n 192.168.5.213 service kubelet status

# Check node logs
talosctl -n 192.168.5.211 logs kubelet

# Reboot stuck nodes
just talos reboot-node k8s-1
```
### aKeyless Secrets Not Injecting
Cause: aKeyless CLI not authenticated
Fix:
```shell
# Re-authenticate
akeyless auth

# Test secret retrieval
akeyless get-secret-value --name talos/MACHINE_TOKEN

# Re-run resource stage
just bootstrap resources
```
### CRDs Already Exist
Error: `customresourcedefinition.apiextensions.k8s.io/kustomizations.kustomize.toolkit.fluxcd.io already exists`

Fix: This is normal! `kubectl apply --server-side` handles existing resources gracefully, and the bootstrap continues.
## Bootstrap File Reference
Key files involved in bootstrap:
| File | Purpose |
|---|---|
| `bootstrap/mod.just` | Bootstrap stage definitions |
| `bootstrap/scripts/akeyless-inject.sh` | Injects `ak://` secrets from aKeyless |
| `bootstrap/resources.yaml.j2` | Bootstrap-time ConfigMaps and Secrets |
| `bootstrap/helmfile.d/00-crds.yaml` | Flux CRDs |
| `bootstrap/helmfile.d/01-apps.yaml` | Core infrastructure apps |
| `bootstrap/cnpg/` | CNPG cluster definitions with backup recovery |
| `talos/machineconfig.yaml.j2` | Base Talos configuration |
| `talos/nodes/k8s-*.yaml.j2` | Node-specific Talos patches (k8s-1, k8s-2, k8s-3) |
## Next Steps
- Infrastructure Overview: Learn about Talos OS and node architecture
- Secrets Management: Understand aKeyless integration
- Operations Guide: Day-to-day cluster management