PipelineFX Forum
Qube! => GUI => Topic started by: Achilles on May 04, 2010, 09:10:33 PM
-
We are now at job 350.
It started rendering on the 5 machines.
After a while, subjobs 0 and 1 show the state "pending" in the GUI, and I can't see any log there. Looking directly in /var/spool/qube/job/0/350/350_0.out:
JOB 0.7 pro[May 4, 2010 22:32:42] node1 : reporting status on work for: 350.0 199 - failed
9 - failed
There is a similar message for subjob 1. In the Qube GUI, "retry" leads to this error:
Failed to retry the following subjobs. ['350.1']
I also tried "Refresh (clear cache)", but nothing changed.
As jburk said in another thread, the CLI utilities don't think for me, so I tried to kill first and then retry:
# /usr/local/pfx/qube/bin/qbkill 350.0
killed job: 350.0
# /usr/local/pfx/qube/bin/qbkill 350.1
killed job: 350.1
# /usr/local/pfx/qube/bin/qbretry 350.1
retrying job: 350.1
# /usr/local/pfx/qube/bin/qbretry 350.0
retrying job: 350.0
After a while the subjobs were back on track.
Qube 5.5-2, GUI 5.5.3, CentOS 5.4 (supervisor, GUI, worker), Win Vista x64 (worker, GUI)
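The kill-then-retry sequence above could be wrapped in a small shell function. This is only a sketch: `kill_then_retry` is a made-up helper name, the install path is the one used in the commands above, and it assumes `qbkill` returns a non-zero exit status on failure, which I have not verified. Unlike the manual run above, it retries each subjob right after killing it rather than killing all of them first.

```shell
#!/bin/sh
# Directory holding the Qube CLI tools (path taken from the commands above;
# override QB_BIN if your install differs).
QB_BIN="${QB_BIN:-/usr/local/pfx/qube/bin}"

# kill_then_retry: hypothetical helper, not part of Qube.
# Kills each given subjob, then retries it.
kill_then_retry() {
    for id in "$@"; do
        # Assumption: qbkill exits non-zero on failure, so the retry
        # is skipped when the kill did not go through.
        "$QB_BIN/qbkill" "$id" && "$QB_BIN/qbretry" "$id"
    done
}
```

Usage would be something like `kill_then_retry 350.0 350.1`, run as a user allowed to modify the job.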
-
Now one node (Win Vista x64) is not rendering anymore, and qbkill/qbretry does not bring it back.
Last lines in job/0/350/350_3.out and also 350_3.err:
================= qube! - retry/requeue on May 4, 2010 23:14:37 ===================
================= qube! - retry/requeue on May 4, 2010 23:15:38 ===================
================= qube! - retry/requeue on May 4, 2010 23:31:03 ===================
And the supelog (last lines with 350.3 or 350_3 in them):
[May 4, 2010 23:31:03] supervisor : QB_QUEUE_UPDATE_SUBJOB: [DELETE FROM duty WHERE id IN ('267.3') LIMIT 1]
[May 4, 2010 23:31:03] supervisor : wrote stderr data to file: /var/spool/qube/job/0/350/350_3.err size: 88
[May 4, 2010 23:31:03] supervisor : wrote stdout data to file: /var/spool/qube/job/0/250/350_3.out size: 88
[May 4, 2010 23:31:03] supervisor : retrying in supervisor by root from 10.10.0.210: 350.3
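For anyone wanting to pull the same kind of excerpt, here is a rough sketch of the filter. `last_subjob_lines` is a made-up helper name, and the log file path is passed in as an argument because the location of the supelog may differ between installs. The pattern matches both the `350.3` and `350_3` spellings while skipping look-alikes such as `350.30` or `1350.3`.

```shell
#!/bin/sh
# last_subjob_lines: hypothetical helper, not part of Qube.
# Arguments: log file, job id, subjob id, optional line count (default 10).
last_subjob_lines() {
    log="$1"; job="$2"; sub="$3"; n="${4:-10}"
    # Match "<job>.<sub>" or "<job>_<sub>", not preceded or followed by a digit.
    grep -E "(^|[^0-9])${job}[._]${sub}([^0-9]|\$)" "$log" | tail -n "$n"
}
```

Usage would be `last_subjob_lines <path to supelog> 350 3`.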
-
Very strange! After running
# /usr/local/pfx/qube/sbin/qbadmin w --assignments 10.10.0.15
worker job(s): 10.10.0.15
350.3
the subjob is suddenly no longer in the "pending" state but running again!