Kube Pod Self-Healer

The problem

When apps run in Kubernetes (K8s), work runs in pods (one or more containers grouped together). Pods crash, run out of memory, fail health checks, or cannot download their container image. Operators often run the same fix commands over and over — delete the pod, let the cluster recreate it, scale up, alert the team.

What I built

Kube Pod Self-Healer is a portfolio demo of that workflow. A Go health agent watches pod status. When it sees a known failure pattern, it sends the event to a Python remediation service that applies bounded fixes — restart the pod, scale replicas, clear a cache — under least-privilege permissions. Terraform and Kind (Kubernetes in Docker) make the whole stack reproducible on a laptop.

Think of it as a tireless operator for boring, repetitive incidents so humans can focus on judgment calls.

Connection to my day job

At work I automate incident response, queue management, and deployment safety nets. This repo is where I experiment openly with self-healing: how much automation helps before it becomes dangerous, and how to keep detection separate from action so each layer stays testable.

What I learned

Efficient informers (Kubernetes watchers that cache state locally) beat raw watches for Application Programming Interface (API) load. Fix handlers should be safe to run twice and leave an audit trail. Self-healing is a spectrum — the goal is faster recovery, not removing humans from every decision.

Repo

Full source and design notes are on GitHub.