Troubleshooting Workload Management
I ran into an issue when enabling Workload Management in my lab: I was unable to ping the Control Plane Node IP Address. I followed a few different blog posts to get the environment set up, specifically Florian Grehl’s blog and Viktor van den Berg’s blog. I highly recommend them, as they are both excellent resources for getting started with Workload Management (among other things).
I validated that I met all of the prerequisites, so I enabled Workload Management, and an hour or so later I received the “You have successfully enabled Workload Management” message. Sweet! Time to play with some Kubernetes. I created a Namespace and went to download the CLI Tools from the Namespace Summary page, and... nothing. Well, that’s a bummer.
Everything looks good on the surface. I have the correct subnets configured for my Ingress and Egress CIDRs, and I know they work because I have other workloads using those subnets. But there’s definitely no connectivity to the Control Plane Node IP Address. Troubleshooting time!
Most of what I’m able to find suggests a communication problem between vCenter and NSX, specifically fragmented packets caused by a misconfigured MTU. I validate that all of my interface MTUs are at 1600, so no issues there. Everything looks good in NSX from what I can tell. So what can it be?
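For anyone who wants to double-check the MTU end to end rather than trusting the interface settings, here is a minimal sketch of the arithmetic behind a don't-fragment ping. It assumes plain IPv4 with no IP options and a Linux host with the iputils `ping` (where `-M do` sets the don't-fragment flag). The target address is a placeholder, not one from my lab, and this is not something I ran as part of the original troubleshooting.

```python
IPV4_HEADER = 20  # bytes, assuming no IP options
ICMP_HEADER = 8   # bytes


def icmp_payload_for_mtu(mtu: int) -> int:
    """Largest ICMP payload that fits in one unfragmented IPv4 packet."""
    return mtu - IPV4_HEADER - ICMP_HEADER


def df_ping_command(target: str, mtu: int = 1600, count: int = 3) -> list:
    """Build a Linux ping command with the don't-fragment flag set."""
    size = icmp_payload_for_mtu(mtu)
    return ["ping", "-M", "do", "-c", str(count), "-s", str(size), target]


if __name__ == "__main__":
    # 192.0.2.10 is a documentation-range placeholder address.
    print(" ".join(df_ping_command("192.0.2.10")))
```

If a ping of that size fails while a smaller one succeeds, something along the path has an MTU below 1600.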
I start some traceroutes, and see that they’re going nowhere. My Tier-0 Gateway is configured with BGP to my Sophos firewall, so I head over there to see what’s being advertised. And there it is (or rather isn’t)! None of the routes from the new Tier-1 Gateway are advertised!
I enabled Route Advertisement for NAT IPs and LB VIP Routes, and BAM! Route convergence occurs and NSX tells my Sophos about some new friends it made. I am now able to access the Control Plane CLI Tools and get going with some Workload Management!
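I made this change in the NSX UI, but the same setting can probably be scripted. The sketch below only builds the request; it does not send anything. It assumes the setting lives on the Tier-1 Gateway and is exposed through the NSX Policy API as `route_advertisement_types`, with `TIER1_NAT` and `TIER1_LB_VIP` as the values for NAT IPs and LB VIP routes. That is my recollection of the API, so check the API reference for your NSX version. The manager hostname and gateway ID are hypothetical.

```python
import json

# Advertisement types matching the two options enabled above (assumed names).
WANTED_TYPES = ["TIER1_NAT", "TIER1_LB_VIP"]


def tier1_advertisement_patch(nsx_manager: str, tier1_id: str, existing_types=()):
    """Return (url, body) for a PATCH that adds NAT and LB VIP advertisement.

    existing_types should hold whatever the gateway already advertises,
    on the assumption that the list sent in the PATCH replaces the old one.
    """
    merged = list(existing_types)
    for route_type in WANTED_TYPES:
        if route_type not in merged:
            merged.append(route_type)
    url = f"https://{nsx_manager}/policy/api/v1/infra/tier-1s/{tier1_id}"
    return url, {"route_advertisement_types": merged}


if __name__ == "__main__":
    # Hypothetical manager hostname and Tier-1 ID, for illustration only.
    url, body = tier1_advertisement_patch(
        "nsx.lab.example", "my-tier1", existing_types=["TIER1_CONNECTED"]
    )
    print("PATCH", url)
    print(json.dumps(body, indent=2))
```

After a change like this, the new prefixes should show up in the BGP routes learned on the firewall side.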