Converging a Frankenstein homelab
Engineering is not always about having the newest equipment in an expensive rack. It is about making disparate systems work together under load.
I recently spent a weekend overhauling my “Frankenstein” lab — a collection of refurbished hardware and mismatched parts I have accumulated over the past couple of years. The goal was to migrate the entire stack to a professional Layer 3 routed architecture and provide a more stable foundation for MLOps and SRE experiments. What I hoped would be a clean transition naturally turned into a multi-day deep dive into network convergence.
As we all know at this point in our careers, the “simple” move is where the real fun happens.
After a few rounds of trial and error, and a few days with my network down, I finally brought the infrastructure back under control.
1. Managing cluster quorum and state
Moving my Proxmox cluster to a new management subnet immediately broke quorum: the nodes could no longer reach each other to form a majority. Without consensus, the cluster filesystem went read-only to protect data integrity and prevent split-brain, a critical SRE lesson in state management. I had to force quorum manually and reconfigure Corosync to align with the new IP scheme.
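The read-only behavior follows from majority voting, and a few lines of Python make the arithmetic concrete. This is a minimal sketch rather than anything Corosync-specific: it assumes a hypothetical three-node cluster with one vote per node (the real node count is not stated here), and the function names are illustrative.

```python
def votes_needed(total_votes: int) -> int:
    """Smallest strict majority of the cluster's total votes."""
    return total_votes // 2 + 1


def is_quorate(reachable_votes: int, total_votes: int) -> bool:
    """A partition may write cluster state only if it holds a strict majority."""
    return reachable_votes >= votes_needed(total_votes)


# Hypothetical three-node cluster, one vote per node.
TOTAL_VOTES = 3

for reachable in range(1, TOTAL_VOTES + 1):
    state = "read-write" if is_quorate(reachable, TOTAL_VOTES) else "read-only"
    print(f"{reachable}/{TOTAL_VOTES} votes visible -> {state}")
```

A node that can only see itself holds 1 of 3 votes and stays read-only. Conceptually, forcing quorum means lowering the expected vote count until the surviving node satisfies `is_quorate(1, 1)`, which is why it should only be done when the other nodes are known to be isolated.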
2. Solving asymmetric routing
I hit the classic no-reply hurdle: packets reached their destination, but the return path was a black hole. Because my nodes and VMs now live on different subnets, replies were following the default route to the WAN gateway instead of going back through the core switch that routes between them. I had to implement host-level static routes to bypass the WAN gateway and point that traffic at the core switch. In a platform engineering context, Layer 3 is only as good as its return paths.
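The effect of a host-level static route can be sketched with a toy longest-prefix-match lookup. Every address below is hypothetical, since the actual IP scheme is not given, and `next_hop` is an illustrative helper rather than a real routing API. It ignores metrics and interfaces and only shows which next hop a reply would pick before and after the more specific route is added.

```python
import ipaddress


def next_hop(routes, destination):
    """Return the next hop of the most specific route containing destination."""
    addr = ipaddress.ip_address(destination)
    matches = [(net, hop) for net, hop in routes if addr in net]
    if not matches:
        raise LookupError(f"no route to {destination}")
    # Longest prefix wins: the same rule an IP routing table applies.
    _, hop = max(matches, key=lambda item: item[0].prefixlen)
    return hop


# Hypothetical addressing, for illustration only.
WAN_GATEWAY = "192.168.1.1"
CORE_SWITCH = "192.168.1.2"
VM_SUBNET = ipaddress.ip_network("10.0.20.0/24")
DEFAULT = ipaddress.ip_network("0.0.0.0/0")

before = [(DEFAULT, WAN_GATEWAY)]             # only a default route
after = before + [(VM_SUBNET, CORE_SWITCH)]   # plus a static route for the VM subnet

vm = "10.0.20.15"
print("before:", next_hop(before, vm))
print("after: ", next_hop(after, vm))
```

With only the default route, a reply to the VM goes to the WAN gateway. Once the static route exists, the /24 is more specific than 0.0.0.0/0, so the reply goes to the core switch while internet-bound traffic still uses the default route.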
3. SSH identity and persistence
The final boss was my development environment. Between Windows permission locks on local keys and cached remote authorities in my IDE, I spent more time than I would like to admit purging metadata and unifying fragmented SSH configs to ensure CI/CD pipelines and remote deployments stayed seamless.
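One way to catch fragmented configs before they bite is to look for the same Host pattern defined in more than one file. OpenSSH uses the first value it obtains for each parameter, so a duplicate entry can silently shadow the one you meant to use. The sketch below is a deliberately simplified parser, not the cleanup I actually ran: the fragment names and contents are hypothetical, it ignores Include and Match directives, and it does not handle the `keyword=value` form.

```python
from collections import defaultdict


def host_patterns(config_text):
    """Yield every pattern named on a Host line of ssh_config-style text."""
    for raw in config_text.splitlines():
        parts = raw.strip().split()
        # Skip blank lines and comments; keywords are case-insensitive.
        if len(parts) > 1 and not parts[0].startswith("#") and parts[0].lower() == "host":
            yield from parts[1:]


def find_conflicts(fragments):
    """Map each Host pattern to the fragments defining it, keeping only duplicates."""
    seen = defaultdict(list)
    for name, text in fragments.items():
        for pattern in host_patterns(text):
            seen[pattern].append(name)
    return {p: names for p, names in seen.items() if len(names) > 1}


# Hypothetical fragments standing in for a user config and an IDE-managed one.
fragments = {
    "user_config": "Host pve1\n    HostName 10.0.10.11\n    User root\n",
    "ide_config": "Host pve1 pve2\n    IdentityFile ~/.ssh/lab_key\n",
}

print(find_conflicts(fragments))
```

Here `pve1` is reported because both fragments define it, while `pve2` appears only once and is left alone. Merging the flagged entries into a single Host block removes the ambiguity about which settings win.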
Takeaway
The hardware might be Frankenstein, but the logic is enterprise. Building in an environment where things are not perfect forces you to understand how quorum, routing, and identity actually behave when something changes underneath them.
The lab is now fully converged, VLAN-aware, and back online to support the next round of sandbox deployments.