Topic: all worker hosts have stopped communicating with supervisor

westernx

« on: March 11, 2010, 06:10:59 PM »
I'm having an issue where none of the Linux workers will start their worker daemons, and they won't communicate with the supervisor.

[root@vfx26 ~]# service worker restart
[root@vfx26 ~]#

Usually this prints output showing that the worker has started and the log is initialized:

journal: /var/spool/qube/worker5.jnl
INFO: redirecting output to: '/var/log/workerlog'

We have some Linux workstations and some Mac workstations. All the Mac worker daemons restart with no problem:

bash-3.2# /Applications/pfx/qube/sbin/worker restart
bash-3.2# journal: /var/spool/qube/worker5.jnl
INFO: redirecting output to: '/var/log/workerlog'

I've done the following to try to fix this:

On the supervisor:
./qbadmin worker --refresh
./qbadmin worker --clearbanned
./qbadmin worker --forcereport
./supervisor restart

On the Linux workers:
service worker restart (nothing happens)

[root@vfx26 sbin]# ./qbadmin supervisor --find
total: 1 supervisor(s)
ipaddress
10.0.0.184

And on the supervisor:

vfx-xserve:sbin root# qbadmin worker --configuration vfx26
ERROR: unable to establish tcp connection with vfx26 - unable to connect to host.
ERROR: unable to contact worker.
ERROR: unable to contact worker

But the same query against a Mac workstation works:

vfx-xserve:sbin root# qbadmin worker --configuration vfx02
items: 53
0: client_cluster=
1: client_drive_map=
2: client_host_domain=
3: client_job_flags=auto_mount
4: client_priority=-1
5: client_restrictions=
6: proxy_account=qubeproxy
7: proxy_execution_mode=proxy
8: proxy_group=
9: proxy_location=/Applications/pfx/qube//sbin/proxy
10: proxy_nice_value=0
11: proxy_password=********
 . . .  etc
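The "unable to establish tcp connection" error above suggests checking whether the Linux worker hosts accept TCP connections at all, independently of qbadmin. The following is a minimal Python sketch of such a check. It assumes the worker daemon listens on TCP port 50011 (believed to be Qube's default worker port, but not stated in this thread; verify against your own Qube configuration), and the function name worker_reachable is hypothetical.

```python
import socket

# ASSUMPTION: 50011 is the worker daemon's TCP port; confirm it in your
# Qube configuration before relying on this check.
WORKER_PORT = 50011


def worker_reachable(host: str, port: int = WORKER_PORT, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    Hypothetical helper for illustration; it only tests that something is
    listening on the port, not that the listener is a healthy Qube worker.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable host names.
        return False
```

For example, calling worker_reachable("vfx26") and worker_reachable("vfx02") from the supervisor would show whether the Linux host is refusing connections at the network level while the Mac host accepts them.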

Help!

R

westernx

Re: all worker hosts have stopped communicating with supervisor
« Reply #1 on: March 11, 2010, 10:17:30 PM »
It seems that, after reloading the supervisor, it took a while for all the worker host daemons to work again. After a couple of hours, service worker restart started working again on most of the Linux workstations. On some of them I manually removed and reinstalled the worker RPM package, which fixed the ones with broken worker daemons. So this is resolved for now.

R