PipelineFX Forum

Qube! => General => Topic started by: infinitesunrise on June 11, 2014, 10:15:25 PM

Title: [Solved] Supervisor not reporting new workers
Post by: infinitesunrise on June 11, 2014, 10:15:25 PM
I'm currently in the middle of an overhaul of my farm which includes new disk images for all my render nodes. While creating the master image for my nodes, I made sure that it was showing up correctly in the supervisor, which it was. I then removed it from the farm via qbadmin w --remove and then cleared the ban with qbadmin w --clearbanned. Everything worked as expected.

However, after cloning this image out to all of my nodes (CentOS 6.5, which I've had no trouble with in Qube previously) and tweaking each node to have the correct IP and MAC addresses, they aren't being reported in the GUI or on the supervisor. I can't get them to show up and can't get the supervisor to acknowledge that they exist. I can ping the super from the nodes and the nodes from the super just fine. Also, the worker logs on my nodes say that they're connected to the supervisor, are receiving requests for updates from it, and are answering those requests successfully. Everything seems to be working great, except that I can't see any of my new nodes in the GUI and the qbhosts command on the supervisor returns nothing.

I suspect this is an issue with my supervisor and not my nodes, because my company has two offices and I did this exact same process in the other office just last month. However, I don't have a clue how to troubleshoot it. Any help would be much appreciated!
Title: Re: Supervisor not reporting new workers
Post by: BrianK on June 12, 2014, 12:47:01 AM
First things first, make sure you're explicitly setting qb_supervisor to the name or IP address of the supervisor. This should be done on the image. At the end of the day, that setting will go into /etc/qb.conf. Along those same lines, be sure that qb_domain is the same on every machine, supervisor included. The default value for qb_domain is "qube" and should remain that way unless you are managing multiple farms on the same network.
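If you'd rather check those two settings with a script than by eye, here is a minimal sketch. It assumes qb.conf holds simple "key = value" lines with "#" comments, which you should verify against your own file. The function names parse_qb_conf and check_worker_conf and the sample values are made up for illustration and are not part of Qube.

```python
def parse_qb_conf(text):
    """Parse 'key = value' lines, ignoring blank lines and '#' comments.

    Assumption: qb.conf uses this simple format; check your own file.
    """
    settings = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip().strip('"')
    return settings


def check_worker_conf(settings, expected_domain="qube"):
    """Return a list of human-readable problems found in the settings."""
    problems = []
    if not settings.get("qb_supervisor"):
        problems.append("qb_supervisor is not set explicitly")
    # "qube" is the default qb_domain mentioned in this thread.
    domain = settings.get("qb_domain", "qube")
    if domain != expected_domain:
        problems.append(f"qb_domain is {domain!r}, expected {expected_domain!r}")
    return problems


if __name__ == "__main__":
    # Hypothetical worker config; the address and domain are invented.
    sample = """
    qb_supervisor = 192.168.1.10
    qb_domain = mycompany.local
    """
    for problem in check_worker_conf(parse_qb_conf(sample)):
        print(problem)
```

To use it on a real machine, read the contents of /etc/qb.conf into a string and pass whatever qb_domain your supervisor uses as expected_domain.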

Be sure your firewall/iptables is turned OFF. If you can't turn it off, be sure you add the correct rules for Qube to work through your firewall: http://docs.pipelinefx.com/x/zoJ

The next thing to look at is the "Banned Hosts" list in WranglerView: http://docs.pipelinefx.com/x/iYF.  You said you cleared the ban, but to be sure, check the banned hosts list.

After that, look at the worker's workerlog to see if there are any error messages. I suggest reading the workerlog from the bottom (last line) up.

Lastly, you should look in the supervisor's supelog. Look for the hostname and/or IP address of the machine that should be joining but isn't. If there's an entry there, there will likely also be a reason why it wasn't able to join.
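Reading a log from the bottom up and searching it for a hostname or IP can be sketched in a few lines. This is only an illustration: find_mentions is a made-up helper, and the sample log lines are invented and do not reflect the real workerlog or supelog format.

```python
def find_mentions(log_text, needles, limit=20):
    """Return up to `limit` lines that mention any needle, newest first.

    Matching is case-insensitive. Lines are scanned from the last line
    upward, so the most recent entries come first in the result.
    """
    lowered = [needle.lower() for needle in needles]
    hits = []
    for line in reversed(log_text.splitlines()):
        if any(needle in line.lower() for needle in lowered):
            hits.append(line)
            if len(hits) >= limit:
                break
    return hits


if __name__ == "__main__":
    # Invented log text, only to show the call; real log lines differ.
    sample_log = "\n".join([
        "startup complete",
        "host node01 (10.0.0.21) contacted supervisor",
        "unrelated entry",
        "ERROR: request from 10.0.0.21 rejected",
    ])
    for hit in find_mentions(sample_log, ["node01", "10.0.0.21"]):
        print(hit)
```

For a real check, read the log file into a string first and pass the worker's hostname and IP as the needles. Adding "error" to the needles covers the workerlog case as well.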

If you're running Qube 6.5, be sure you're running at least Qube 6.5-2 which officially supports CentOS 6.5.  Qube 6.5-3 is the most recent & recommended version as of this writing.
Title: Re: Supervisor not reporting new workers
Post by: infinitesunrise on June 12, 2014, 02:22:43 PM
;D Well I'll be! The worker's qb_domain parameter was the problem. For the longest time we've been putting our company's internal domain in that field and it's been working fine. But I checked the configuration on the supervisor and sure enough, its domain was set to just "qube". So I set it back to that on my new workers, and they showed up! I guess I'll go back and switch it on my existing workers as well, just to be safe.

Thank you!!