Kubernetes Operator’s Playbook: Habits


Mastering Kubernetes isn't just about understanding the architecture; it is about building the right muscle memory. Whether you are spinning up a local lab on WSL or managing a multi-node production cluster, the commands you run and the habits you form dictate your success as an infrastructure engineer.

Here is a complete guide to setting up a local Minikube playground, deploying a stateful app, and the daily routines you need to operate at a senior level.

Part 1: The Local Lab (Minikube on WSL)

WSL is a fantastic environment for local Kubernetes, but it requires navigating a few permission and networking quirks.

1. The Setup

You need Docker, kubectl, and Minikube. Run this to get the binaries in place:

Bash

# Install Docker
sudo apt update && sudo apt install -y docker.io
sudo usermod -aG docker $USER
newgrp docker

# Install kubectl
curl -LO "https://dl.k8s.io/release/$(curl -L -s https://dl.k8s.io/release/stable.txt)/bin/linux/amd64/kubectl"
sudo install -o root -g root -m 0755 kubectl /usr/local/bin/kubectl

# Install Minikube
curl -LO https://storage.googleapis.com/minikube/releases/latest/minikube-linux-amd64
sudo install minikube-linux-amd64 /usr/local/bin/minikube

2. Avoiding the WSL Traps

The Root Trap: Never start Minikube with sudo or as the root user when using the Docker driver. Minikube refuses to start and tells you that the docker driver should not be used with root privileges.

The PATH Trap: If you switch to a standard user via su -, your environment variables might drop standard paths, resulting in command not found errors. Fix this permanently before starting the cluster:

Bash

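# Restore the standard system paths for this session, keeping anything already on PATH
export PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:$PATH
# Persist the fix for future shells
echo 'export PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:$PATH' >> ~/.bashrc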

Start your cluster:

Bash

minikube start --driver=docker

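(Note: if you ever see a localhost:8080 connection refused or timeout error when running kubectl, kubectl has no usable context and is falling back to its default address, typically because the cluster is stopped or the context was lost. Running minikube start restarts the cluster and points kubectl back at it.)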
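To tell the two causes apart before restarting anything, a small check along these lines can help. This is a minimal sketch: the function name cluster_ready is hypothetical, while kubectl config current-context and minikube status are the standard commands.

```shell
# cluster_ready: hypothetical helper, not part of kubectl or minikube.
# Returns 0 only if kubectl has a current context and minikube reports a healthy cluster.
cluster_ready() {
  # This fails when no context is set, which is the case that leads to the localhost:8080 fallback.
  ctx=$(kubectl config current-context 2>/dev/null) || {
    echo "no kubectl context set; run: minikube start" >&2
    return 1
  }
  # minikube status exits non-zero when the cluster is stopped or missing.
  if ! minikube status >/dev/null 2>&1; then
    echo "context '$ctx' is set but minikube is not running; run: minikube start" >&2
    return 1
  fi
  echo "context '$ctx' is up"
}
```

Paste the function into your shell (or ~/.bashrc) and run cluster_ready whenever kubectl starts complaining.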

3. Deploying State (Grafana)

Let's deploy Grafana using a PersistentVolumeClaim (PVC) so data survives pod restarts, and a NodePort service to expose it. Create grafana-setup.yaml:

YAML

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: grafana-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: grafana
spec:
  replicas: 1
  selector:
    matchLabels:
      app: grafana
  template:
    metadata:
      labels:
        app: grafana
    spec:
      containers:
      - name: grafana
        image: grafana/grafana:latest
        ports:
        - containerPort: 3000
        volumeMounts:
        - name: grafana-storage
          mountPath: /var/lib/grafana
      volumes:
      - name: grafana-storage
        persistentVolumeClaim:
          claimName: grafana-pvc
---
apiVersion: v1
kind: Service
metadata:
  name: grafana-service
spec:
  type: NodePort
  selector:
    app: grafana
  ports:
    - port: 3000
      targetPort: 3000
      nodePort: 32000

Deploy it:

Bash

kubectl apply -f grafana-setup.yaml
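Applying the manifest only submits it; it does not tell you whether the pod came up or the claim was bound. A minimal sketch of a follow-up check, assuming the resource names from grafana-setup.yaml and a hypothetical function name wait_for_grafana:

```shell
# wait_for_grafana: hypothetical helper; resource names match grafana-setup.yaml.
wait_for_grafana() {
  # Block until the Deployment's pod is ready, or give up after two minutes.
  kubectl rollout status deployment/grafana --timeout=120s || return 1
  # The claim's phase should read "Bound" once a volume has been provisioned for it.
  phase=$(kubectl get pvc grafana-pvc -o jsonpath='{.status.phase}')
  if [ "$phase" != "Bound" ]; then
    echo "grafana-pvc is $phase, expected Bound" >&2
    return 1
  fi
  echo "grafana is rolled out and grafana-pvc is Bound"
}
```

If the claim stays Pending, kubectl describe pvc grafana-pvc usually explains why.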

4. Access and Teardown

Because WSL networking can complicate direct NodePort access from the Windows browser, port-forwarding is the most reliable method:

Bash

kubectl port-forward svc/grafana-service 3000:3000

Navigate to http://localhost:3000 (default login: admin / admin).
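If you would rather confirm from the terminal that Grafana is answering, its /api/health endpoint can be polled. A minimal sketch, assuming the port-forward is running in another terminal; the function name grafana_healthy is hypothetical:

```shell
# grafana_healthy: hypothetical helper; assumes the port-forward is active on localhost:3000.
grafana_healthy() {
  # -f makes curl fail on HTTP errors; -sS hides the progress bar but keeps error messages.
  # The health payload reports "database": "ok" when Grafana can reach its datastore.
  curl -fsS http://localhost:3000/api/health | grep -q '"database": *"ok"'
}
```

Usage: grafana_healthy && echo "Grafana is up"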

When you are done experimenting, clean up:

Bash

kubectl delete -f grafana-setup.yaml
minikube stop
minikube delete

Part 2: The "First 15 Minutes" Routine

Great operators don't wait for pages to alert them to cluster rot. Make it a habit to run these checks every morning to understand the state of your infrastructure.

Read the Cluster's Diary (Events):

kubectl get events --sort-by='.lastTimestamp' -A | tail -20

Events reveal the silent failures: why pods are failing to schedule or why nodes are complaining.

Check Resource Pressure:

kubectl top nodes
kubectl top pods -A

Catch memory leaks or CPU bottlenecks before they trigger evictions.

Spot Failing Pods Instantly:

kubectl get pods -A | grep -v -E 'Running|Completed'

This filters out the noise, revealing only CrashLoopBackOff, Pending, or Error states.
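One gap in the grep approach: a pod can report Running while its containers are not Ready (for example 0/1 behind a failing readiness probe). A minimal sketch of a stricter filter, assuming the default column layout of kubectl get pods -A; the function name unhealthy_pods is hypothetical:

```shell
# unhealthy_pods: hypothetical filter; reads `kubectl get pods -A` output on stdin.
# Columns assumed: NAMESPACE NAME READY STATUS RESTARTS AGE
unhealthy_pods() {
  awk 'NR == 1 { print; next }            # keep the header row
       $4 == "Completed" { next }          # finished Jobs are not a problem
       {
         split($3, r, "/")                 # READY looks like 1/2
         if ($4 != "Running" || r[1] != r[2]) print
       }'
}
```

Usage: kubectl get pods -A | unhealthy_pods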

Check Node Health:

kubectl get nodes -o wide

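Look for any node that is not Ready and for mismatched kubelet, OS image, or kernel versions. An unexpected reboot does not show up here directly; kubectl describe node <node-name> (conditions and recent events) is the place to look for that.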

Part 3: Core Debugging Muscle Memory

When a deployment fails, run through this 4-step sequence without overthinking it.

Is it internal cluster DNS/connectivity?

Keep a Swiss-army-knife container ready to test networking from inside the cluster:

Bash

kubectl run -i --tty --rm debug --image=nicolaka/netshoot --restart=Never -- sh

Is it a network routing issue?

Bash

kubectl port-forward svc/<service-name> 8080:80 -n <namespace>

Bypass the ingress entirely. If port-forwarding works but the public URL doesn't, your ingress controller or DNS is the culprit.

What is the application complaining about?

Bash

kubectl logs <pod-name> -n <namespace> --tail=100 -f

Why isn't it running?

Bash

kubectl describe pod <pod-name> -n <namespace>

Scroll straight to the "Events" at the bottom to find failing probes, image pull errors, or resource constraints.
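The last two steps can be bundled so that you get the pod's events and logs in one pass. A minimal sketch; the function name triage_pod is hypothetical, and it adds kubectl logs --previous, which shows output from the container instance that crashed rather than the one that just restarted:

```shell
# triage_pod: hypothetical helper; usage: triage_pod <pod-name> [namespace]
triage_pod() {
  pod=$1
  ns=${2:-default}
  echo "== events for $pod =="
  # Only the events that involve this pod, oldest first.
  kubectl get events -n "$ns" --field-selector "involvedObject.name=$pod" --sort-by=.lastTimestamp
  echo "== current logs =="
  kubectl logs "$pod" -n "$ns" --tail=50
  echo "== logs from the previous container, if any =="
  kubectl logs "$pod" -n "$ns" --tail=50 --previous 2>/dev/null || echo "(no previous container)"
}
```

For a pod stuck in CrashLoopBackOff, the previous-container section is usually where the real error message lives.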

Part 4: The Infra Engineer Golden Rules

Commands change, but this mindset separates juniors from seniors.

  • Never kubectl edit in production. Manual changes cause configuration drift and will be overwritten by your GitOps pipeline (like ArgoCD or Flux). Always update the YAML and apply.
  • Resource Requests & Limits are Non-Negotiable. Never deploy a pod without CPU and memory requests and, at minimum, a memory limit. A container with no memory limit can grow until the node itself runs short of memory, at which point OOM kills and evictions start hitting neighboring pods too.
  • Respect the State. Compute (Pods) is disposable and stateless. Storage (PVCs, Databases) is fragile. Always double-check the PersistentVolume reclaim policy (Retain vs. Delete) before tearing down deployments.
  • Trust Monitoring Over Instinct. Keep your Prometheus alerts and Grafana dashboards clean. If you have to dig through raw terminal logs to discover a critical production failure, your observability stack needs immediate attention.
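The requests-and-limits rule is easy to audit. A minimal sketch that lists pods in which no container declares a memory limit; the function name pods_without_memory_limit is hypothetical, and a pod where only some containers set a limit will not be flagged by this simple version:

```shell
# pods_without_memory_limit: hypothetical audit helper.
# Prints "namespace<TAB>pod" for every pod in which no container sets a memory limit.
pods_without_memory_limit() {
  kubectl get pods -A -o jsonpath='{range .items[*]}{.metadata.namespace}{"\t"}{.metadata.name}{"\t"}{.spec.containers[*].resources.limits.memory}{"\n"}{end}' |
    awk -F'\t' 'NF >= 2 && $3 == "" { print $1 "\t" $2 }'   # third field is empty when no limit is set
}
```

Running it as part of the morning routine keeps uncapped workloads from creeping in unnoticed.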

Verify Your Context. Before running any destructive command (delete, drain), verify exactly where you are pointing:

Bash

kubectl config current-context
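To turn that check from a habit into a guardrail, destructive commands can be routed through a wrapper that makes you retype the context name first. A minimal sketch; the function name guarded is hypothetical and not part of kubectl:

```shell
# guarded: hypothetical wrapper; runs the given command only after the operator
# retypes the name of the current kubectl context.
guarded() {
  ctx=$(kubectl config current-context) || return 1
  printf 'Current context is "%s". Type it to continue: ' "$ctx" >&2
  read -r answer
  if [ "$answer" != "$ctx" ]; then
    echo "aborted" >&2
    return 1
  fi
  # Run the original command exactly as it was passed in.
  "$@"
}
```

Usage: guarded kubectl drain <node-name> --ignore-daemonsets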
