PipelineFX Forum
Qube! => Installation and Configuration => Topic started by: throb on September 20, 2009, 03:56:48 PM
-
hey all,
i rebooted my workstation here and found that it was not rendering anymore. after some poking around i found this:
Locks:host.processor_all=0
this is not on any of the nodes or other workstations. I can't find a reference to this in my local qb.conf or qbwork.conf on the supervisor. I also can't find any reference to this exact thing in the docs.
I must have changed something somewhere because it was rendering 2 days ago just fine. I just can't figure out where.
help is certainly appreciated.
rob
-
Hi Rob,
A lock value of "host.processor_all=0" means that none of the job slots on that
worker are locked. When somebody issues a "qbunlock <HOST>" (or an equivalent
from the GUI), that value can appear in the host's properties. The lock setting
is saved in the supervisor's host/worker database, not in qb.conf.
In any case, I'm not sure whether the lock has anything to do with your worker
no longer taking jobs.
Try submitting a job specifically to that worker, and see the "Pending Reason"
in the "qbhosts -l <HOST>" output, or in the GUI's job properties pane. What
does it say? You can also go down to the command prompt and try
"qbhostorder <JOBID>" to figure out the pending reason for a job.
-shinya.
-
thanks for the info shinya.
i will poke around with that and let you know how it turns out. wacky stuff.
rob
-
typing qbhosts gives me :
rnd_3ghz_4gb_01 00:1C:C0:D0:CE:FC 192.168.3.20 down 0/1 nuke, max, maya, vray node
rnd_3ghz_4gb_02 00:1C:C0:C7:06:CD 192.168.3.21 down 0/1 nuke, max, maya, vray node
rnd_3ghz_4gb_03 00:1C:C0:D0:CF:83 192.168.3.22 down 0/1 nuke, max, maya, vray node
throb-PC 00:04:4B:00:02:12 192.168.3.100 active 0/2 nuke, max, maya, vray workstation
qbhosts -l throb-PC gives me :
throb-PC 00:04:4B:00:02:12 192.168.3.100 active 0/2
Host Details:
Restrictions:
Resources:
host.processors=0/2
host.memory=2631/8190
host.swap=71/8388607
Flags: 28 (auto_mount,remove_logs,load_profile)
Description:
Stats:
Properties:
host.qube_version=5.4-2
host.processor_speed=2666
host.architecture=
host.proxy_mode=proxy
host.os=winnt
host.qube_build=bld-5-4-2009-07-01-0
host.kernel_version=6.1
host.cpus=2
host.worker_mode=service
host.processor_model=
host.processor_make=GenuineIntel
host.qube_class=
Job Types:
cmdfile
cmdmulti
maya
3dsmax
cmdline
frame
throbnuke
cmdrange
frame2
Locks:
host.processor_all=0
Panic Reason:
Running Subjobs:
none
running the command you gave me :
C:\Users\throb>qbhostorder 357
total: 0/5 cpu(s)
name address reason
rnd_3ghz_4gb_01 192.168.3.20 host is down, no hosts available in job's host list
throb-PC 192.168.3.100 none
rnd_3ghz_4gb_03 192.168.3.22 host is down, no hosts available in job's host list
rnd_3ghz_4gb_02 192.168.3.21 host is down, no hosts available in job's host list
there is NO reason :)
i am a bit baffled by this honestly. since i have a tiny farm, throb-PC is a serious chunk of my processing power here at home. the rest of qube runs silky smooth, but this is wacky as all get out.
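On a farm with more than a handful of hosts, the qbhostorder table above gets tedious to read by eye. The following is only a rough sketch, assuming the output keeps the three-column "name address reason" layout pasted above; the function name parse_hostorder is made up for illustration and is not part of Qube.

```python
import ipaddress

# Output pasted from "qbhostorder 357" earlier in this thread.
SAMPLE = """\
total: 0/5 cpu(s)
name address reason
rnd_3ghz_4gb_01 192.168.3.20 host is down, no hosts available in job's host list
throb-PC 192.168.3.100 none
rnd_3ghz_4gb_03 192.168.3.22 host is down, no hosts available in job's host list
rnd_3ghz_4gb_02 192.168.3.21 host is down, no hosts available in job's host list
"""

def parse_hostorder(text):
    """Hypothetical helper: split each row into (name, address, reason).

    Assumes the layout shown in the thread; lines whose second column is
    not an IP address (the total line and the header) are skipped.
    """
    rows = []
    for line in text.splitlines():
        parts = line.split(None, 2)  # the reason column may contain spaces
        if len(parts) < 3:
            continue
        name, address, reason = parts
        try:
            ipaddress.ip_address(address)
        except ValueError:
            continue
        rows.append((name, address, reason.strip()))
    return rows

rows = parse_hostorder(SAMPLE)
# Hosts that report no pending reason yet still are not picking up the job.
no_reason = [name for name, _, reason in rows if reason == "none"]
print(no_reason)  # ['throb-PC']
```

A host that shows up here with reason "none" but never starts work is the one worth checking at the network level.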
rob
-
hey neat. now my machine is turning from active to 'down'. !?
throb-work 00:04:4B:00:02:12 192.168.3.100 down 0/2
(i changed the name, first removing the worker and then unremoving it and doing a clearbanned)
what's interesting is that at some point the qubegui said it was running a subjob. the gui had that in the worker layout, but not in the job agenda layout. totally strange as hell. i am completely clueless since there is no reason for the qube service to be down at all. i checked and the process is running.
however, i opened up the worker log and saw this :
[Sep 22, 2009 2:18:17] throb-work : tracking: 0 jobs.
[Sep 22, 2009 2:18:17] throb-work : supervisor has no locks recorded for this host.
[Sep 22, 2009 2:18:17] throb-work : sending host status report to the supervisor.
[Sep 22, 2009 2:18:18] throb-work : supervisor 192.168.3.1 host report - report successful.
[Sep 22, 2009 2:18:18] throb-work : variable: worker_cpus = 2
[Sep 22, 2009 2:18:18] throb-work : variable: worker_jobtypes = 3dsmax,cmdfile,cmdline,cmdmulti,cmdrange,frame,frame2,maya,throbnuke
[Sep 22, 2009 2:18:18] throb-work : variable: last_activity = 0
[Sep 22, 2009 2:18:18] throb-work : variable: firewall = 0
INFO: opened address: 0.0.0.0 port: 50011 type: udp.
INFO: opened address: 0.0.0.0 port: 50011 type: tcp.
INFO: mac address: 00:04:4B:00:02:12
opened the address 0.0.0.0 ?? what the monkey is that?
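For what it's worth, 0.0.0.0 in a "listening" context is normally the wildcard address: the socket accepts connections on every local interface rather than on one specific IP. That is standard socket behavior; that the Qube worker log means exactly this is an assumption. A minimal Python sketch of the idea (the function name is made up, and port 0 just asks the OS for any free port):

```python
import socket

def wildcard_listener_demo():
    """Bind to 0.0.0.0, then reach the listener through the loopback address."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.settimeout(2)
    server.bind(("0.0.0.0", 0))   # wildcard address, OS-chosen free port
    server.listen(1)
    port = server.getsockname()[1]

    # The listener is reachable via any local address, here 127.0.0.1.
    client = socket.create_connection(("127.0.0.1", port), timeout=2)
    conn, _peer = server.accept()
    local_addr = conn.getsockname()[0]  # the address the client actually hit

    conn.close()
    client.close()
    server.close()
    return local_addr

print(wildcard_listener_demo())  # 127.0.0.1
```

So a log line like that by itself does not say the worker has a wrong IP. It only says the worker is listening on all of its interfaces.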
-
a couple of qubeworker service restarts and this is always at the end of the worker log:
INFO: opened address: 0.0.0.0 port: 50011 type: udp.
INFO: opened address: 0.0.0.0 port: 50011 type: tcp.
INFO: mac address: 00:04:4B:00:02:12
so somehow qube is getting that IP but it won't run a job that i send to the worker
the worker says this :
throb-work 00:04:4B:00:02:12 192.168.3.100 active 0/2
no wait, i did qbhosts a few more times and now it's down.
what else do you guys need to help debug this?
rob
-
Is there a firewall running on the worker/supe?
What do the end of workerlog and supelog say, right after you notice that it went "down"?
-
the worker log says :
INFO: opened address: 0.0.0.0 port: 50011 type: udp.
INFO: opened address: 0.0.0.0 port: 50011 type: tcp.
INFO: mac address: 00:04:4B:00:02:12
i will have to check and get back to you on the supe log. tis at home.
anything off the top of your head i can try?
rob
-
holy crap. welcome to n00b town.
the damned network must have reconfigured itself because the firewall in windows was causing this. argh argh argh.
the giveaway was lines like this in the supelog:
ERROR: unable to establish tcp connection with 192.168.3.100 - unable to connect to host.
that got me thinking :)
so that looks like it's resolved.
qube - 1
rob - 0
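For anyone who lands here with the same symptom: the supervisor error above can be reproduced outside of Qube with a plain TCP connect test, run from the supervisor machine against the worker. This is only a sketch. The IP 192.168.3.100 and port 50011 are the ones from the logs in this thread, and can_connect is a made-up helper name, not a Qube tool.

```python
import socket

def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False

# Example, using the values from this thread (adjust for your own farm):
#   can_connect("192.168.3.100", 50011)
# False while the worker service is running suggests that something between
# the two machines, such as a firewall on the worker, is blocking the port.
```

If the check fails while the worker service is up and its log shows port 50011 opened, the firewall on the worker is the first thing to look at.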