Topic: workers in desktop user mode can't pick up qbwrk.conf settings

tomo

I have an environment like this:

Qube version: 6.4.2
Supervisor: CentOS 6.2
Workers: Windows 7

Workers were running in proxy mode as "render", which is an AD-authenticated account. Due to a problem in 3dsMax, I had to switch all workers to Desktop User mode. Since the switch, there are 2 serious problems:

1. Workers do not pick up any settings from the centralized qbwrk.conf file at bootup unless I issue "qbadmin w --reconfig" manually. But doing that leads to the next problem.
2. Jobs stay pending forever with these reasons:

 3 workers report:  host missing jobtype: _3dsmax
 2 workers report:  no hosts available in job's host list

The only way to get the jobs dispatched to the workers again is to reboot them. Then all the jobs are happy, but we're back to problem #1.

Did anyone else have similar problems? Any fix?

BrianK

« Reply #1 on: May 02, 2013, 06:48:35 PM »
Workers should be picking up qbwrk.conf regardless of the mode in which they are running.  Confirm that you don't have autostart set to "service on boot" (which is the default).  It's possible you're starting both the service and the desktop worker.

To check this, you'll need to go to each computer (or at least a sampling of a few), open the GUI, then go to Administration > Autostart Worker: xxx > Enable Desktop User.  Do this, then restart the GUI and check that it's still set to start on user login.  If it has reverted to "service on boot", then the running user doesn't have permission to change the service.  You can try running the GUI as an Administrator (right-click the Qube icon > "Run as administrator"), or you can log in as an administrator, start Qube, and disable autostart, then log back in as the user who should be running Qube and set the autostart to desktop user.
Alternatively, assuming your workers are Windows, you can go into the Services control panel and check that the "qubeworker" service is not running and not set to start automatically.
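
For more than a handful of machines, the same Services check can be scripted. The sketch below is only an illustration: it parses the text printed by the Windows `sc query` and `sc qc` commands for the "qubeworker" service. The function names and the sample text are made up for the example, and the parsing assumes the English-language output layout of `sc`.

```python
import re
import subprocess


def parse_sc_fields(text):
    """Return the 'NAME : value' pairs printed by `sc query` / `sc qc` as a dict."""
    fields = {}
    for line in text.splitlines():
        match = re.match(r"\s*([A-Z0-9_]+)\s*:\s*(.*\S)\s*$", line)
        if match:
            fields[match.group(1)] = match.group(2)
    return fields


def service_conflicts(query_text, config_text):
    """True if the service is running or is set to start automatically."""
    state = parse_sc_fields(query_text).get("STATE", "")
    start = parse_sc_fields(config_text).get("START_TYPE", "")
    return "RUNNING" in state or "AUTO_START" in start


def check_local_worker_service(name="qubeworker"):
    """Run this on a Windows worker; it calls the real `sc` tool."""
    query = subprocess.run(["sc", "query", name], capture_output=True, text=True).stdout
    config = subprocess.run(["sc", "qc", name], capture_output=True, text=True).stdout
    return service_conflicts(query, config)


# Illustrative sample text (not captured from a real worker).
SAMPLE_QUERY = """
SERVICE_NAME: qubeworker
        TYPE               : 10  WIN32_OWN_PROCESS
        STATE              : 1  STOPPED
"""
SAMPLE_CONFIG = """
SERVICE_NAME: qubeworker
        START_TYPE         : 3   DEMAND_START
"""

print(service_conflicts(SAMPLE_QUERY, SAMPLE_CONFIG))  # False
```

A result of True on a worker that is supposed to run in desktop user mode would point to the service and the desktop worker both being started.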

Important Note: When running in desktop user mode, you *MUST* set the "disable_windows_job_object" job flag.  I strongly recommend setting this as a supervisor_job_flag on the supervisor - this will force all jobs to use that flag.

Workers that report they are missing the jobtype _3dsmax are, in fact, missing the jobtype.  You'll need to install the jobtype (and Perl) on those machines.  You can determine which machines they are by going to the workers tab, clicking a worker, then scrolling to the bottom of that worker's Properties tab.  You will see a list of available jobtypes.  Some workers will be missing the "_3dsmax" entry.
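
If the jobtype lists are copied out of the Properties tab for many workers, a few lines of script can pick out the ones lacking an entry. This is a minimal sketch on hand-entered data; the host names and jobtype lists are invented for the example and do not come from any Qube API.

```python
def workers_missing_jobtype(jobtypes_by_worker, jobtype="_3dsmax"):
    """Return the sorted names of workers whose jobtype list lacks `jobtype`."""
    return sorted(
        worker
        for worker, jobtypes in jobtypes_by_worker.items()
        if jobtype not in jobtypes
    )


# Hypothetical data, as it might be copied from each worker's Properties tab.
reported = {
    "render01": ["cmdline", "cmdrange", "_3dsmax"],
    "render02": ["cmdline", "cmdrange"],
    "render03": ["cmdline", "_3dsmax"],
}

print(workers_missing_jobtype(reported))  # ['render02']
```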

tomo

« Reply #2 on: May 02, 2013, 08:03:42 PM »
I can confirm that all Windows workers are set to start on user login and not "service on boot", and I have checked all your other suggestions (including disable_windows_job_object and that qubeworker is not running). To no avail. I have the Worker Tray in the Windows Startup folder. Should I remove it?

About the missing _3dsmax jobtype: it's already installed on all workers (along with Perl and Python). We made our own customized _3dsmax jobtype and put it in a centralized location, which is appended to "worker_template_path". I then observed 2 scenarios.

1. When workers are rebooted, I see no qbwrk.conf settings on the workers (proxy_account = qubeproxy instead of our "render" account, etc.). Jobs are dispatched just fine, but since the workers don't know about the centralized jobtype location, the customized _3dsmax jobtype is never run.

2. When I issue "qbadmin w --reconfig" manually, the workers' settings change immediately to whatever is in qbwrk.conf, so supposedly they now know about our customized jobtype. But jobs stay pending forever, just as I described. In fact, all jobtypes (even cmdline or cmdrange) stay pending forever with at least the "no hosts available in job's host list" reason.
« Last Edit: May 02, 2013, 08:06:30 PM by tomo »

tomo

« Reply #3 on: May 06, 2013, 07:53:40 AM »
There are dozens of segfault messages in /var/log/messages per day:

Code:
[root@qube ~]# grep segfault /var/log/messages
May  6 14:06:21 qube kernel: supervisor[20823]: segfault at 54 ip 00000000006aa760 sp 00007fff455cdd70 error 4 in supervisor[400000+34b000]
May  6 14:06:21 qube kernel: supervisor[21668]: segfault at 54 ip 00000000006aa760 sp 00007fff455cdd70 error 4 in supervisor[400000+34b000]
May  6 14:26:15 qube kernel: supervisor[19674]: segfault at 54 ip 00000000006aa760 sp 00007fff455ce6c0 error 4 in supervisor[400000+34b000]
May  6 14:43:45 qube kernel: supervisor[22767]: segfault at 54 ip 00000000006aa760 sp 00007fff455ce6c0 error 4 in supervisor[400000+34b000]
May  6 15:00:22 qube kernel: supervisor[21671]: segfault at 54 ip 00000000006aa760 sp 00007fff455cdd70 error 4 in supervisor[400000+34b000]
May  6 15:10:22 qube kernel: supervisor[23499]: segfault at 54 ip 00000000006aa760 sp 00007fff455cdd70 error 4 in supervisor[400000+34b000]
May  6 15:16:23 qube kernel: supervisor[23865]: segfault at 54 ip 00000000006aa760 sp 00007fff455cdd70 error 4 in supervisor[400000+34b000]
May  6 15:22:23 qube kernel: supervisor[24071]: segfault at 54 ip 00000000006aa760 sp 00007fff455cdd70 error 4 in supervisor[400000+34b000]
May  6 15:24:04 qube kernel: supervisor[24396]: segfault at 54 ip 00000000006aa760 sp 00007fff6136d130 error 4 in supervisor[400000+34b000]
May  6 15:26:02 qube kernel: supervisor[24457]: segfault at 54 ip 00000000006aa760 sp 00007fff6136d130 error 4 in supervisor[400000+34b000]
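
A quick way to see whether these are all the same crash is to group the lines by faulting address and instruction pointer. The sketch below does that for lines in the kernel format shown above. The function name and the regular expression are illustrative, and only three of the log lines are pasted in to keep it short.

```python
import re
from collections import Counter

# Matches kernel lines such as:
#   "... kernel: supervisor[20823]: segfault at 54 ip ... sp ... error 4 in ..."
SEGFAULT_RE = re.compile(
    r"kernel: (?P<prog>\S+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) error (?P<err>\d+)"
)


def summarize_segfaults(lines):
    """Count segfaults per (program, fault address, instruction pointer)."""
    counts = Counter()
    for line in lines:
        match = SEGFAULT_RE.search(line)
        if match:
            counts[(match["prog"], match["addr"], match["ip"])] += 1
    return counts


sample = [
    "May  6 14:06:21 qube kernel: supervisor[20823]: segfault at 54 ip 00000000006aa760 sp 00007fff455cdd70 error 4 in supervisor[400000+34b000]",
    "May  6 14:26:15 qube kernel: supervisor[19674]: segfault at 54 ip 00000000006aa760 sp 00007fff455ce6c0 error 4 in supervisor[400000+34b000]",
    "May  6 15:24:04 qube kernel: supervisor[24396]: segfault at 54 ip 00000000006aa760 sp 00007fff6136d130 error 4 in supervisor[400000+34b000]",
]

for key, count in summarize_segfaults(sample).items():
    print(count, *key)
```

Every line in the listing has the same fault address (54) and the same instruction pointer (00000000006aa760), which suggests one repeatable fault in the supervisor rather than random crashes. An address that low is typical of a read through a null pointer, though that is only a guess without a core dump.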

I did a simple test like this:
  • I submitted a few basic cmdline (dir) jobs. All workers were idle, and the jobs were pending with the "no hosts available in job's host list" reason.
  • I restarted the supervisor service. The jobs that had been pending forever were immediately dispatched as normal.
  • At some point a segfault occurred. The running instances kept running until they completed, but no new instances were dispatched after that, so the jobs that had been running went back to pending forever.

Seems to me the segfault is causing the problem.