I had a long afternoon today. I spent a good part of it troubleshooting my vCenter installation, trying to work out why the hosts would connect for about 30 seconds and then show a disconnected / failed state. I am hoping that this quick note can save others from the torture I endured today.
Here is the scenario: it is a brand new installation of a SAN with 4 blade servers. I know that is generic and not very informative, but I am not out to push any products. Two of the blades are going to be used for a small environment, one blade is the backup server and vCenter manager, and the remaining blade will be used for voice equipment. I have built a handful of servers (two domain controllers, two file and print servers, one application server, two Exchange servers, and one WSUS server). All of these servers were joined to the newly created domain and seemed to be functioning fine. The backup server was not on the domain yet, so, following recommended best practice, I decided to add it to the domain and control security and other aspects of vSphere through Active Directory. This is where things started getting strange. Up to this point everything had been working great, so I could only assume the configuration was correct.
The moment I joined the server to the domain and rebooted, the hosts showed as disconnected. I also found it strange that I could no longer connect with the vSphere client from a remote location, yet I could still connect using the vSphere client locally on the server. I started investigating, and I set up DNS correctly on the hosts and on the vCenter management server. I tested that I could ping and resolve the servers, and everything worked as expected, except that the hosts were not responding at all and showed as disconnected in the vSphere client. I pursued the idea that name resolution was failing somewhere because these machines had been set up before the domain and DNS servers existed and had been using hosts files for resolution. The vCenter server name also changed slightly when it was joined to the domain.
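For anyone retracing the name-resolution check, here is a minimal Python sketch of that step. The function name `check_resolution` and the host names in the usage note are hypothetical, and the sketch only confirms that a forward lookup succeeds; a clean result here does not rule out other causes.

```python
import socket


def check_resolution(names, resolver=socket.gethostbyname):
    """Return {name: ip_address or None} for each host name.

    `resolver` is injectable so the function can be exercised without
    touching real DNS; by default it uses the system resolver.
    """
    results = {}
    for name in names:
        try:
            results[name] = resolver(name)
        except OSError:
            # socket.gaierror (lookup failure) is a subclass of OSError.
            results[name] = None
    return results
```

Run from the vCenter server, something like `check_resolution(["esx01.example.local", "vcenter.example.local"])` (placeholder names) returns a dictionary in which any name that failed to resolve maps to `None`.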
The most bizarre part of all this was that I could right-click the hosts and choose the connect option, but after about 30 seconds they would disconnect again. I was also able to perform tasks on the virtual machines *and* the host during the 30 seconds it was connected.
I found several articles online from people experiencing the same problem I was facing. The suggestions ranged from removing the server from the domain to reinstalling vCenter, which seemed a little drastic in my opinion. To be honest, and I hate to admit this, a peer of mine suggested looking at the Windows Firewall and verifying that it was off. I blew that advice off at first, which was my big mistake. It would have saved me approximately 2 hours of frustration today had I just taken 30 seconds to check the firewall settings. When the server was built, the firewall was disabled, but as soon as the server was joined to the domain, the domain profile of the firewall was re-enabled. This was blocking traffic that vCenter depends on. Once I disabled this profile, everything magically started working again and the hosts showed “connected” again in the vSphere management console. Moral of the story: ALWAYS check the little things.
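The 30-second check I skipped can be scripted. Below is a minimal Python sketch that runs `netsh advfirewall show allprofiles state` on the vCenter server and reports which firewall profiles are on. The function names are my own, and the parser assumes the English-language output layout (a "Domain Profile Settings:" header followed by a "State ON" or "State OFF" line); a localized Windows install may print something different.

```python
import re
import subprocess


def parse_profile_states(netsh_output):
    """Map each profile name (e.g. 'Domain') to True if its State is ON.

    Assumes the English output of
    `netsh advfirewall show allprofiles state`.
    """
    states = {}
    current = None
    for raw_line in netsh_output.splitlines():
        line = raw_line.strip()
        header = re.match(r"^(\w+) Profile Settings", line)
        if header:
            current = header.group(1)
            continue
        state = re.match(r"^State\s+(ON|OFF)\b", line, re.IGNORECASE)
        if state and current:
            states[current] = state.group(1).upper() == "ON"
    return states


def firewall_profile_states():
    """Query the local Windows Firewall; only works on Windows."""
    completed = subprocess.run(
        ["netsh", "advfirewall", "show", "allprofiles", "state"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_profile_states(completed.stdout)
```

Calling `firewall_profile_states()` after a domain join and seeing the `Domain` entry set to `True` would have pointed me at the culprit right away.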
If you are having intermittent communication issues between your vCenter manager and host machines, check the Windows Firewall.