Kubernetes Operator’s Playbook: Habits
Mastering Kubernetes isn't just about understanding the architecture; it is about building the right muscle memory. Whether you are spinning up a local lab on WSL or managing a multi-node production cluster, the commands you run and the habits you form dictate your success as an infrastructure engineer.
This guide covers setting up a local Minikube playground, deploying a stateful app, and building the daily routines you need to operate at a senior level.
Part 1: The Local Lab (Minikube on WSL)
WSL is a fantastic environment for local Kubernetes, but it requires navigating a few permission and networking quirks.
1. The Setup
You need Docker, kubectl, and Minikube. Run this to get the binaries in place:
Bash
# Install Docker
sudo apt update && sudo apt install -y docker.io
sudo usermod -aG docker $USER
newgrp docker
# Install Kubectl
curl -LO "https://dl.k8s.io/release/$(curl -L -s https://dl.k8s.io/release/stable.txt)/bin/linux/amd64/kubectl"
sudo install -o root -g root -m 0755 kubectl /usr/local/bin/kubectl
# Install Minikube
curl -LO https://storage.googleapis.com/minikube/releases/latest/minikube-linux-amd64
sudo install minikube-linux-amd64 /usr/local/bin/minikube
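Before moving on, it can help to confirm that all three binaries actually landed on your PATH. The following is a minimal sketch; the function name missing_tools is made up for illustration and is not part of Docker, kubectl, or Minikube.

```shell
# Hypothetical helper: print each named command that is NOT found on PATH.
missing_tools() {
    for tool in "$@"; do
        # command -v exits non-zero when the name cannot be resolved
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Usage: an empty result means everything is installed.
# missing_tools docker kubectl minikube
```

If the function prints a name, re-run the matching install step above before starting the cluster.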
2. Avoiding the WSL Traps
The Root Trap: Never start Minikube with sudo or as the root user when using the Docker driver. Minikube refuses to run the Docker driver with root privileges and exits with an error.
The PATH Trap: If you switch to a standard user via su -, your environment variables might drop standard paths, resulting in command not found errors. Fix this permanently before starting the cluster:
Bash
export PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
echo 'export PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:$PATH' >> ~/.bashrc
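One caveat with the echo approach is that every re-run appends another copy of the line to ~/.bashrc. A small guard avoids that. This is a sketch only, and append_once is a hypothetical helper name, not a standard command.

```shell
# Hypothetical helper: append a line to a file only if that exact line is absent.
append_once() {
    line=$1
    file=$2
    # -q: quiet, -x: match the whole line, -F: treat the pattern as a fixed string
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Usage (the PATH value is the one from the step above):
# append_once 'export PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:$PATH' ~/.bashrc
```

Running it twice leaves a single export line in the file.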
Start your cluster:
Bash
minikube start --driver=docker
(Note: If kubectl ever reports a connection error against localhost:8080, it has no usable context, usually because the cluster is stopped or your kubeconfig context is lost. Running minikube start restarts the cluster and restores the context.)
3. Deploying State (Grafana)
Let's deploy Grafana using a PersistentVolumeClaim (PVC) so data survives pod restarts, and a NodePort service to expose it. Create grafana-setup.yaml:
YAML
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: grafana-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: grafana
spec:
  replicas: 1
  selector:
    matchLabels:
      app: grafana
  template:
    metadata:
      labels:
        app: grafana
    spec:
      containers:
        - name: grafana
          image: grafana/grafana:latest
          ports:
            - containerPort: 3000
          volumeMounts:
            - name: grafana-storage
              mountPath: /var/lib/grafana
      volumes:
        - name: grafana-storage
          persistentVolumeClaim:
            claimName: grafana-pvc
---
apiVersion: v1
kind: Service
metadata:
  name: grafana-service
spec:
  type: NodePort
  selector:
    app: grafana
  ports:
    - port: 3000
      targetPort: 3000
      nodePort: 32000
Deploy it:
Bash
kubectl apply -f grafana-setup.yaml
4. Access and Teardown
Because WSL networking can complicate direct NodePort access from the Windows browser, port-forwarding is the most reliable method:
Bash
kubectl port-forward svc/grafana-service 3000:3000
Navigate to http://localhost:3000 (Default: admin / admin).
When you are done experimenting, clean up:
Bash
kubectl delete -f grafana-setup.yaml
minikube stop
minikube delete
Part 2: The "First 15 Minutes" Routine
Great operators don't wait for pages to alert them to cluster rot. Make it a habit to run these checks every morning to understand the state of your infrastructure.
Read the Cluster's Diary (Events):
kubectl get events --sort-by='.lastTimestamp' -A | tail -20
Events reveal the silent failures: why pods are failing to schedule or why nodes are complaining.
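On a busy cluster, Normal events can crowd out the ones that matter. A possible refinement is to ask the API server for Warning events only, using a field selector. The wrapper name recent_warnings is an assumption made for this sketch.

```shell
# Hypothetical wrapper: show the most recent Warning events across all namespaces.
# The default of 20 lines mirrors the tail -20 used above.
recent_warnings() {
    kubectl get events -A \
        --field-selector type=Warning \
        --sort-by='.lastTimestamp' | tail -n "${1:-20}"
}

# Usage:
# recent_warnings        # last 20 warnings
# recent_warnings 5      # last 5 warnings
```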
Check Resource Pressure:
kubectl top nodes
kubectl top pods -A
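Catch memory leaks or CPU bottlenecks before they trigger evictions. Note that kubectl top depends on metrics-server; on Minikube you can enable it with minikube addons enable metrics-server.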
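To turn the node listing into a quick yes/no signal, you could filter it by a memory threshold. The sketch below assumes the usual five-column layout of kubectl top nodes (name, CPU cores, CPU percent, memory bytes, memory percent). The function name hot_nodes and the default of 80 are illustrative choices, not recommendations from any tool.

```shell
# Hypothetical filter: read `kubectl top nodes` output on stdin and print
# nodes whose memory percentage (5th column) is at or above a threshold.
hot_nodes() {
    awk -v limit="${1:-80}" '
        NR == 1 { next }                 # skip the header row
        {
            pct = $5
            sub(/%/, "", pct)            # "91%" -> "91"
            if (pct + 0 >= limit + 0) print $1, $5
        }'
}

# Usage:
# kubectl top nodes | hot_nodes 80
```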
Spot Failing Pods Instantly:
kubectl get pods -A | grep -v -E 'Running|Completed'
This filters out the noise, revealing only CrashLoopBackOff, Pending, or Error states.
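One gap in a plain text filter on the status word is that a pod stuck at 1/2 containers ready still reports Running and gets hidden. A slightly stricter variant compares the READY column as well. This sketch assumes the default column order of kubectl get pods -A (NAMESPACE, NAME, READY, STATUS, ...); problem_pods is a hypothetical name.

```shell
# Hypothetical filter: read `kubectl get pods -A` output on stdin and keep the
# header plus every pod that is not Completed and not fully ready.
# Assumed columns: NAMESPACE NAME READY STATUS RESTARTS AGE
problem_pods() {
    awk '
        NR == 1 { print; next }          # keep the header row
        $4 == "Completed" { next }       # finished Jobs are fine
        {
            split($3, ready, "/")        # READY looks like "1/2"
            if ($4 != "Running" || ready[1] != ready[2]) print
        }'
}

# Usage:
# kubectl get pods -A | problem_pods
```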
Check Node Health:
kubectl get nodes -o wide
Look for any node that is not Ready, and for mismatched OS images, kernel versions, or kubelet versions across nodes.
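On larger clusters it is easy to miss a single bad row. As a rough sketch, the STATUS column can be filtered so that only nodes needing attention are printed, including cordoned nodes that show Ready,SchedulingDisabled. The function name unready_nodes is hypothetical.

```shell
# Hypothetical filter: read `kubectl get nodes` output on stdin and print
# every node whose STATUS (2nd column) is anything other than plain "Ready".
unready_nodes() {
    awk 'NR > 1 && $2 != "Ready" { print $1, $2 }'
}

# Usage:
# kubectl get nodes | unready_nodes
```

No output means every node reported plain Ready.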
Part 3: Core Debugging Muscle Memory
When a deployment fails, run through this 4-step sequence without overthinking it.
Is it internal cluster DNS/Connectivity?
Keep a Swiss-army-knife container ready to test networking from inside the cluster:
Bash
kubectl run -i --tty --rm debug --image=nicolaka/netshoot --restart=Never -- sh
Is it a network routing issue?
Bash
kubectl port-forward svc/<service-name> 8080:80 -n <namespace>
Bypass the ingress entirely. If port-forwarding works but the public URL doesn't, your ingress controller or DNS is the culprit.
What is the application complaining about?
Bash
kubectl logs <pod-name> -n <namespace> --tail=100 -f
Why isn't it running?
Bash
kubectl describe pod <pod-name> -n <namespace>
Scroll straight to the "Events" at the bottom to find failing probes, image pull errors, or resource constraints.
Part 4: The Infra Engineer Golden Rules
Commands change, but this mindset separates juniors from seniors.
- Never kubectl edit in production. Manual changes cause configuration drift and will be overwritten by your GitOps pipeline (like Argo CD or Flux). Always update the YAML and apply.
- Resource Requests & Limits are Non-Negotiable. Never deploy a pod without defining memory and CPU requests and limits. Uncapped pods can consume node resources until they cause cascading OOMKilled failures and evictions across the node.
- Respect the State. Compute (Pods) is disposable and stateless. Storage (PVCs, Databases) is fragile. Always double-check volume retention policies before tearing down deployments.
- Trust Monitoring Over Instinct. Keep your Prometheus alerts and Grafana dashboards clean. If you have to dig through raw terminal logs to discover a critical production failure, your observability stack needs immediate attention.
- Verify Your Context. Before running any destructive command (delete, drain), verify exactly where you are pointing:
Bash
kubectl config current-context
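The context check is easy to skip under pressure, so one option is to make it a gate in front of destructive commands. This is a minimal sketch under the assumption that you know the context name you intend to target. The function name require_context and the example context and node names are placeholders.

```shell
# Hypothetical guard: succeed only when the active kubectl context matches
# the name you expect; otherwise print a warning and return non-zero.
require_context() {
    expected=$1
    current="$(kubectl config current-context 2>/dev/null)"
    if [ "$current" != "$expected" ]; then
        printf 'Refusing to continue: context is "%s", expected "%s"\n' \
            "$current" "$expected" >&2
        return 1
    fi
}

# Usage (the context and node names are placeholders):
# require_context minikube && kubectl drain <node-name> --ignore-daemonsets
```

Chaining with && means the destructive command only runs when the guard passes.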