Author Topic: worker lock question  (Read 7734 times)

throb

  • Full Member
  • ***
  • Posts: 14
worker lock question
« on: September 20, 2009, 03:56:48 PM »
hey all,
i rebooted my workstation and found that it was no longer rendering.  after some poking around i found this:
Locks:host.processor_all=0

this lock does not show up on any of the nodes or other workstations.  I can't find a reference to it in my local qb.conf or in qbwork.conf on the supervisor, and I can't find this exact setting anywhere in the docs.

I must have changed something somewhere because it was rendering 2 days ago just fine.  I just can't figure out where.

help is certainly appreciated.

rob

shinya

  • Administrator
  • *****
  • Posts: 229
Re: worker lock question
« Reply #1 on: September 21, 2009, 10:30:19 PM »
Hi Rob,

A lock value of "host.processor_all=0" means that none of the job slots on that
worker are locked.  The value can appear in the host's properties after somebody
issues a "qbunlock <HOST>" (or the equivalent from the GUI).  The lock setting
is saved in the supervisor's host/worker database, not in qb.conf.

In any case, I'm not sure whether the lock has anything to do with your worker
no longer taking jobs.

Try submitting a job specifically to that worker, and check the "Pending Reason"
in the "qbhosts -l <HOST>" output or in the GUI's job properties pane.  What
does it say?  You can also run "qbhostorder <JOBID>" from the command prompt
to see the pending reason for the job.

-shinya.



throb

Re: worker lock question
« Reply #2 on: September 21, 2009, 10:32:53 PM »
thanks for the info shinya.
i will poke around with that and let you know how it turns out.  wacky stuff.

rob

throb

Re: worker lock question
« Reply #3 on: September 22, 2009, 04:50:31 AM »
typing qbhosts gives me :
rnd_3ghz_4gb_01                00:1C:C0:D0:CE:FC  192.168.3.20   down    0/1   nuke, max, maya, vray  node
rnd_3ghz_4gb_02                00:1C:C0:C7:06:CD  192.168.3.21   down    0/1   nuke, max, maya, vray  node
rnd_3ghz_4gb_03                00:1C:C0:D0:CF:83  192.168.3.22   down    0/1   nuke, max, maya, vray  node
throb-PC                       00:04:4B:00:02:12  192.168.3.100  active  0/2   nuke, max, maya, vray  workstation

qbhosts -l throb-PC gives me :
throb-PC  00:04:4B:00:02:12  192.168.3.100  active  0/2

Host Details:
        Restrictions:
        Resources:
                host.processors=0/2
                host.memory=2631/8190
                host.swap=71/8388607
        Flags: 28 (auto_mount,remove_logs,load_profile)
        Description:
        Stats:
        Properties:
                host.qube_version=5.4-2
                host.processor_speed=2666
                host.architecture=
                host.proxy_mode=proxy
                host.os=winnt
                host.qube_build=bld-5-4-2009-07-01-0
                host.kernel_version=6.1
                host.cpus=2
                host.worker_mode=service
                host.processor_model=
                host.processor_make=GenuineIntel
                host.qube_class=
        Job Types:
                cmdfile
                cmdmulti
                maya
                3dsmax
                cmdline
                frame
                throbnuke
                cmdrange
                frame2
        Locks:
                host.processor_all=0

Panic Reason:


Running Subjobs:
        none

running the command you gave me :
C:\Users\throb>qbhostorder 357
total: 0/5 cpu(s)
name                           address        reason
rnd_3ghz_4gb_01                192.168.3.20   host is down, no hosts available in job's host list

throb-PC                       192.168.3.100  none

rnd_3ghz_4gb_03                192.168.3.22   host is down, no hosts available in job's host list

rnd_3ghz_4gb_02                192.168.3.21   host is down, no hosts available in job's host list

there is NO reason :)
i am a bit baffled by this honestly.  since i have a tiny farm, throb-PC is a serious chunk of my processing power here at home.  the rest of qube runs silky smooth, but this is wacky as all get out.

rob
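
For anyone repeating this check on a bigger farm, here is a rough Python sketch that pulls the per-host reasons out of "qbhostorder" output. It assumes the three-column layout pasted above (name, address, reason). The function names parse_hostorder and hosts_without_reason are made up for illustration and are not part of Qube.

```python
# Sample text copied from the "qbhostorder 357" output in this thread.
SAMPLE = """\
total: 0/5 cpu(s)
name                           address        reason
rnd_3ghz_4gb_01                192.168.3.20   host is down, no hosts available in job's host list

throb-PC                       192.168.3.100  none

rnd_3ghz_4gb_03                192.168.3.22   host is down, no hosts available in job's host list

rnd_3ghz_4gb_02                192.168.3.21   host is down, no hosts available in job's host list
"""


def parse_hostorder(text):
    """Return (name, address, reason) tuples, one per host line."""
    rows = []
    for line in text.splitlines():
        # Split on whitespace at most twice so the reason keeps its spaces.
        parts = line.split(None, 2)
        if len(parts) < 3 or parts[0] in ("total:", "name"):
            continue  # blank line, summary line, or column header
        name, address, reason = parts
        rows.append((name, address, reason.strip()))
    return rows


def hosts_without_reason(rows):
    """Hosts whose reason column is literally "none"."""
    return [name for name, _address, reason in rows if reason == "none"]


rows = parse_hostorder(SAMPLE)
for name, address, reason in rows:
    print(f"{name:<20} {address:<15} {reason}")
print("no pending reason reported:", hosts_without_reason(rows))
```

Hosts returned by hosts_without_reason are the ones the supervisor reports no pending reason for, so the cause has to be looked for elsewhere, for example in the worker and supervisor logs.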

throb

Re: worker lock question
« Reply #4 on: September 22, 2009, 09:30:27 AM »
hey neat.  now my machine is turning from active to 'down'.  !?
throb-work  00:04:4B:00:02:12  192.168.3.100  down   0/2
(i changed the name, first removing the worker and then unremoving it and doing a clearbanned)
what's interesting is that at some point the qubegui said it was running a subjob.  the gui showed that in the worker layout, but not in the job agenda layout.  totally strange as hell.  i am completely clueless since there is no reason for the qube service to be down at all.  i checked and the process is running.

however, i opened up the worker log and saw this :

[Sep 22, 2009 2:18:17] throb-work : tracking: 0 jobs.
[Sep 22, 2009 2:18:17] throb-work : supervisor has no locks recorded for this host.
[Sep 22, 2009 2:18:17] throb-work : sending host status report to the supervisor.
[Sep 22, 2009 2:18:18] throb-work : supervisor 192.168.3.1 host report - report successful.
[Sep 22, 2009 2:18:18] throb-work : variable: worker_cpus = 2
[Sep 22, 2009 2:18:18] throb-work : variable: worker_jobtypes = 3dsmax,cmdfile,cmdline,cmdmulti,cmdrange,frame,frame2,maya,throbnuke
[Sep 22, 2009 2:18:18] throb-work : variable: last_activity = 0
[Sep 22, 2009 2:18:18] throb-work : variable: firewall = 0
INFO: opened address: 0.0.0.0 port: 50011 type: udp.
INFO: opened address: 0.0.0.0 port: 50011 type: tcp.
INFO: mac address: 00:04:4B:00:02:12

opened the address 0.0.0.0 ??  what the monkey is that?
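
On the 0.0.0.0 question: that is the IPv4 wildcard address. A service that opens 0.0.0.0 is listening on every local interface, so those INFO lines most likely just mean the worker opened its listener, not that it lost its IP. A minimal Python sketch of the same idea follows. The function name wildcard_listener_demo is made up here, port 0 simply asks the OS for any free port, and none of this is a claim about how Qube itself is implemented.

```python
import socket


def wildcard_listener_demo():
    """Bind a TCP listener to 0.0.0.0 and reach it through loopback."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as srv:
        # 0.0.0.0 = "all local IPv4 interfaces"; port 0 = let the OS pick.
        srv.bind(("0.0.0.0", 0))
        srv.listen(1)
        bound_addr, port = srv.getsockname()
        # The wildcard bind includes loopback, so 127.0.0.1 can connect.
        with socket.create_connection(("127.0.0.1", port), timeout=2):
            reachable = True
    return bound_addr, port, reachable


bound_addr, port, reachable = wildcard_listener_demo()
print(f"opened address: {bound_addr} port: {port} type: tcp")
print("reachable via 127.0.0.1:", reachable)
```

The listener reports 0.0.0.0 as its own address even though clients reach it through a concrete address such as 127.0.0.1 or the machine's LAN IP.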

throb

Re: worker lock question
« Reply #5 on: September 22, 2009, 09:53:23 AM »
after a couple of qubeworker service restarts, this is always at the end of the worker log:

INFO: opened address: 0.0.0.0 port: 50011 type: udp.
INFO: opened address: 0.0.0.0 port: 50011 type: tcp.
INFO: mac address: 00:04:4B:00:02:12

so somehow qube is getting that IP, but it won't run a job that i send to the worker.
qbhosts shows the worker like this:
throb-work                     00:04:4B:00:02:12  192.168.3.100  active  0/2
no wait, i did qbhosts a few more times and now it's down.

what else do you guys need to help debug this?

rob

shinya

Re: worker lock question
« Reply #6 on: September 22, 2009, 09:41:39 PM »
Is there a firewall running on the worker/supe?

What do the ends of the workerlog and supelog say, right after you notice that it went "down"?


throb

Re: worker lock question
« Reply #7 on: September 22, 2009, 10:11:46 PM »
the worker log says :
INFO: opened address: 0.0.0.0 port: 50011 type: udp.
INFO: opened address: 0.0.0.0 port: 50011 type: tcp.
INFO: mac address: 00:04:4B:00:02:12

i will have to check and get back to you on the supe log.  tis at home.

anything off the top of your head i can try?

rob

throb

Re: worker lock question
« Reply #8 on: September 23, 2009, 08:08:58 AM »
holy crap.  welcome to n00b town.
the windows firewall was causing this; the damned network must have reconfigured itself.  argh argh argh.
the giveaway was this line in the supelog:
ERROR: unable to establish tcp connection with 192.168.3.100 - unable to connect to host.
that got me thinking :)

so that looks like it's resolved. 
qube - 1
rob -0
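
The supelog error above is a plain TCP connect failure, so the same check can be reproduced from the supervisor without Qube. A minimal sketch in Python, assuming the worker listens on port 50011 as its log shows. can_connect is a hypothetical helper name, not a Qube tool.

```python
import socket

# Values taken from the logs in this thread; adjust for your own farm.
WORKER_HOST = "192.168.3.100"
WORKER_PORT = 50011


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

Running can_connect(WORKER_HOST, WORKER_PORT) on the supervisor while the worker service is up should return True. False suggests either that the service is not listening or that something on the path, such as a host firewall, is blocking the port.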