# k3s Registry Node Configuration
This repo uses `registry.nxtgauge.com` for backend images.
## Why
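Image pulls are performed by containerd on the k3s node itself, not from inside the cluster, so cluster DNS is not used to resolve the registry hostname.
Using a `*.svc.cluster.local` name in an image reference can therefore fail with DNS lookup errors from the node runtime.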
## Required node config
Each node must have `/etc/rancher/k3s/registries.yaml` configured with auth for the registry.
Template file:
- `ops/k3s/registries.yaml`
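
For orientation, a k3s `registries.yaml` usually has a `mirrors` section and a `configs` section that carries the credentials. The sketch below renders such a file from environment variables. The function name `render_registries_yaml`, the `https://` endpoint, and the exact layout are assumptions for illustration; the template in `ops/k3s/registries.yaml` remains the source of truth.

```bash
# Hypothetical helper (not part of this repo): prints a registries.yaml body to stdout.
# Assumes REGISTRY_USERNAME and REGISTRY_PASSWORD are set; aborts with a message if not.
# Values are inserted verbatim, so credentials containing quotes or backslashes
# would need YAML escaping that this sketch does not attempt.
render_registries_yaml() {
  local host="${1:-registry.nxtgauge.com}"
  cat <<EOF
mirrors:
  "${host}":
    endpoint:
      - "https://${host}"
configs:
  "${host}":
    auth:
      username: "${REGISTRY_USERNAME:?REGISTRY_USERNAME is required}"
      password: "${REGISTRY_PASSWORD:?REGISTRY_PASSWORD is required}"
EOF
}
```

On a node, the output could be written into place with `render_registries_yaml | sudo tee /etc/rancher/k3s/registries.yaml >/dev/null`.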
## Recommended node DNS/hosts override (prevents Cloudflare/proxy path)
Even if `registry.nxtgauge.com` is set to "DNS only" in Cloudflare, k3s nodes can still resolve it to public or IPv6 records, depending on upstream DNS servers and caches.
To keep large image pulls and pushes reliable (and avoid `413 Payload Too Large` responses from intermediate proxies), point nodes directly at the in-cluster ingress VIP:
- Traefik VIPs: `10.0.0.2`, `10.0.0.3`, `10.0.0.5`
- Recommended: pick one stable VIP (for example, `10.0.0.2`) and map `registry.nxtgauge.com` to it on every node.
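
One way to apply such a mapping is an idempotent edit of the node's hosts file. The helper below is a minimal sketch, assuming the override lives in `/etc/hosts`; the name `pin_host` is hypothetical, and `apply-registries.sh` may handle this differently.

```bash
# Hypothetical helper: pin a hostname to an IP in a hosts file, idempotently.
# Usage: pin_host <ip> <hostname> [hosts-file]   (defaults to /etc/hosts, which needs root)
pin_host() {
  local ip="$1" host="$2" file="${3:-/etc/hosts}"
  local tmp
  tmp="$(mktemp)" || return 1
  # Drop every line that lists the hostname as a whole field (fields 2..NF),
  # so repeated runs never accumulate duplicate or stale entries.
  awk -v h="$host" '{ for (i = 2; i <= NF; i++) if ($i == h) next; print }' "$file" > "$tmp"
  # Append the desired mapping.
  printf '%s %s\n' "$ip" "$host" >> "$tmp"
  # Overwrite the contents in place so the file keeps its owner and mode.
  cat "$tmp" > "$file"
  rm -f "$tmp"
}
```

For example, `sudo bash -c "$(declare -f pin_host); pin_host 10.0.0.2 registry.nxtgauge.com"` would run it as root on a node.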
## Apply to all nodes
1. Export required env vars:
```bash
export K3S_NODES="node1 node2 node3"
export REGISTRY_USERNAME="<registry-user>"
export REGISTRY_PASSWORD="<registry-pass>"
export REGISTRY_VIP_IP="10.0.0.2" # optional but recommended
```
2. Apply config and restart k3s on each node:
```bash
./ops/k3s/apply-registries.sh
```
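
Before running the script, it can help to confirm that the variables from step 1 are actually exported. The check below is a hedged sketch; `check_registry_env` is a hypothetical name and is not something this README says the script provides.

```bash
# Hypothetical pre-flight check for the variables listed in step 1.
# Returns non-zero if any required variable is unset or empty.
# Uses printenv, so it only sees exported variables (as in step 1).
check_registry_env() {
  local missing=0 name value
  for name in K3S_NODES REGISTRY_USERNAME REGISTRY_PASSWORD; do
    value="$(printenv "$name" || true)"
    if [ -z "$value" ]; then
      echo "missing required variable: $name" >&2
      missing=1
    fi
  done
  # REGISTRY_VIP_IP is optional, so only print a reminder.
  if [ -z "$(printenv REGISTRY_VIP_IP || true)" ]; then
    echo "note: REGISTRY_VIP_IP is unset (optional but recommended)" >&2
  fi
  return "$missing"
}
```

A typical invocation would be `check_registry_env && ./ops/k3s/apply-registries.sh`.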
## Manual steps (if needed)
On each node:
1. Copy `registries.yaml` to `/etc/rancher/k3s/registries.yaml`
2. Restart the k3s service so the new registry config is loaded:
```bash
# on server nodes
sudo systemctl restart k3s
# on agent nodes
sudo systemctl restart k3s-agent
```
3. Verify that pods can pull images:
```bash
kubectl -n nxtgauge get pods
kubectl -n nxtgauge describe pod <failing-pod>
```
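
To spot pull failures quickly, the pod listing can be filtered by status. This is a small sketch that assumes the default `kubectl get pods` table layout (STATUS in the third column); `image_pull_failures` is a hypothetical helper name.

```bash
# Hypothetical filter: reads `kubectl get pods` table output on stdin and prints
# the names of pods whose STATUS ends in ImagePullBackOff or ErrImagePull
# (this also matches init-container forms such as Init:ErrImagePull).
image_pull_failures() {
  awk 'NR > 1 && $3 ~ /(ImagePullBackOff|ErrImagePull)$/ { print $1 }'
}
```

Usage: `kubectl -n nxtgauge get pods | image_pull_failures`, then run `kubectl -n nxtgauge describe pod` on each name it prints.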
## Notes
- Ensure DNS for `registry.nxtgauge.com` resolves from every k3s node.
- If DNS is not available, use a stable node-reachable IP and update:
  - backend GitOps manifests
  - backend Woodpecker registry push target
  - `ops/k3s/registries.yaml`
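
Resolution from a node can be checked with `getent hosts`, which goes through the node's normal lookup path (including `/etc/hosts`). The helper below is a sketch under that assumption; `resolves_to` is a hypothetical name.

```bash
# Hypothetical check: reads `getent hosts <name>` style output on stdin and
# succeeds only if the expected IP appears in the first column.
resolves_to() {
  local expected="$1"
  awk -v ip="$expected" '$1 == ip { found = 1 } END { exit (found ? 0 : 1) }'
}
```

On a node: `getent hosts registry.nxtgauge.com | resolves_to 10.0.0.2 && echo ok`. A failure here suggests the node is resolving to a different record (for example a public or IPv6 address) or not resolving at all.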