Topic: subjob "pending" in GUI but job log on supervisor tells "failed"

Achilles

  • Sr. Member
  • ****
  • Posts: 25
Now we are at job 350.

It started rendering on the 5 machines.

After a while, subjobs 0 and 1 are in state "pending" in the GUI. I can't see any log in the GUI. Looking directly at /var/spool/qube/job/0/350/350_0.out:

Code:
JOB  0.7  pro[May 4, 2010 22:32:42] node1 : reporting status on work for: 350.0 199 - failed
9 - failed

There is a similar message for subjob 1. In the Qube GUI, "retry" leads to this error:

Code:
Failed to retry the following subjobs. ['350.1']
I also tried "Refresh (clear cache)" but nothing changed.
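
For anyone who wants to check what the supervisor recorded without going through the GUI, here is a minimal sketch that pulls the most recent "reporting status" line out of a subjob log. The function name `last_reported_status` is made up for illustration, and it assumes the log lines look like the one quoted above.

```shell
# last_reported_status is a hypothetical helper name, not a Qube tool.
# $1: path to a subjob log, e.g. /var/spool/qube/job/0/350/350_0.out
last_reported_status() {
    # Keep only the status-report lines, then show the newest one.
    grep 'reporting status on work for' "$1" | tail -n 1
}
```

Example call, using the path from this post: `last_reported_status /var/spool/qube/job/0/350/350_0.out`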

As jburk said in another thread, the CLI utilities don't think for me, so I tried to kill first and then retry:
Code:
# /usr/local/pfx/qube/bin/qbkill 350.0
killed job: 350.0
# /usr/local/pfx/qube/bin/qbkill 350.1
killed job: 350.1
# /usr/local/pfx/qube/bin/qbretry 350.1
retrying job: 350.1
# /usr/local/pfx/qube/bin/qbretry 350.0
retrying job: 350.0

After a while the subjobs were back on track.
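
The kill-then-retry sequence above could be wrapped up roughly like this. It is only a sketch: `kill_then_retry` is a made-up helper name, the tool directory is the one from this post, and it assumes qbkill and qbretry return a non-zero exit status when they fail, which I have not verified.

```shell
# Directory holding the Qube CLI tools (path as used in this post).
QB_BIN="${QB_BIN:-/usr/local/pfx/qube/bin}"

# kill_then_retry is a hypothetical helper name, not part of Qube.
# Arguments: one or more subjob IDs such as 350.0 350.1
kill_then_retry() {
    # First kill every listed subjob ...
    for subjob in "$@"; do
        "$QB_BIN/qbkill" "$subjob" || return 1
    done
    # ... then ask the supervisor to retry each of them.
    for subjob in "$@"; do
        "$QB_BIN/qbretry" "$subjob" || return 1
    done
}
```

Example call: `kill_then_retry 350.0 350.1`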

Qube 5.5-2, GUI 5.5.3, CentOS 5.4 (supervisor, GUI, worker), Windows Vista x64 (worker, GUI)

Achilles

Re: subjob "pending" in GUI but job log on supervisor tells "failed"
« Reply #1 on: May 04, 2010, 09:41:09 PM »
Now one node (Windows Vista x64) is not rendering anymore, and qbkill/qbretry does not bring it back.

Last lines in job/0/350/350_3.out, and the same in 350_3.err:
Code:
=================  qube! - retry/requeue on May 4, 2010 23:14:37  ===================


=================  qube! - retry/requeue on May 4, 2010 23:15:38  ===================


=================  qube! - retry/requeue on May 4, 2010 23:31:03  ===================

And from supelog (the last lines that mention 350.3 or 350_3):

Code:
[May 4, 2010 23:31:03] supervisor : QB_QUEUE_UPDATE_SUBJOB: [DELETE FROM duty WHERE id IN ('267.3') LIMIT 1]
[May 4, 2010 23:31:03] supervisor : wrote stderr data to file: /var/spool/qube/job/0/350/350_3.err size: 88
[May 4, 2010 23:31:03] supervisor : wrote stdout data to file: /var/spool/qube/job/0/250/350_3.out size: 88
[May 4, 2010 23:31:03] supervisor : retrying in supervisor by root from 10.10.0.210: 350.3
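
Filtering supelog for one subjob, as done above, can be sketched like this. The helper name `supelog_lines_for` is invented, and the log path is passed in as an argument because the location of supelog is not given in this thread.

```shell
# supelog_lines_for is a hypothetical helper name.
# $1: job ID, $2: subjob number, $3: path to supelog, $4: line count (default 20)
supelog_lines_for() {
    job="$1"; sub="$2"; logfile="$3"
    # Match "350.3" or "350_3" but not longer numbers such as 1350.3 or 350.30.
    grep -E "(^|[^0-9])${job}[._]${sub}([^0-9]|\$)" "$logfile" | tail -n "${4:-20}"
}
```

Example call, with the supelog path left for you to fill in: `supelog_lines_for 350 3 "$SUPELOG_PATH"`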

Achilles

Re: subjob "pending" in GUI but job log on supervisor tells "failed"
« Reply #2 on: May 04, 2010, 09:47:51 PM »
Very strange! After running

Code:
# /usr/local/pfx/qube/sbin/qbadmin w --assignments 10.10.0.15
worker job(s): 10.10.0.15
350.3

the subjob is suddenly no longer in the "pending" state but back in the running state!
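
If you want to script the same check, something like the following might do. It is a sketch only: `worker_has_subjob` is a made-up name, and it assumes `qbadmin w --assignments` prints one subjob ID per line after the header, as in the output above.

```shell
# Directory holding qbadmin (path as used in this post).
QB_SBIN="${QB_SBIN:-/usr/local/pfx/qube/sbin}"

# worker_has_subjob is a hypothetical helper name, not part of Qube.
# $1: worker host or IP, $2: subjob ID such as 350.3
# Succeeds (exit 0) if the worker's assignment list contains the subjob.
worker_has_subjob() {
    # -F: literal string, -x: whole-line match, -q: no output.
    "$QB_SBIN/qbadmin" w --assignments "$1" | grep -Fxq "$2"
}
```

Example call: `worker_has_subjob 10.10.0.15 350.3 && echo "still assigned"`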