Topic: gui retry "failed to ..." / qbretry jobid:frameid of failed job does not work

Achilles

Hi

I tried to retry a failed frame of a failed job (about 20 frames had completed) with the GUI. The GUI says "failed to retry".

So I tried "qbretry":

Code:
./qbretry 300:5
retrying work: 300:5

That looks OK, but after a refresh in the GUI the frame is "pending". Two minutes later it is still pending, and four minutes later it is still pending; nothing happens.

I then decided to kill the frame. Again the GUI won't do it (the same "failed to ..." message).

Code:
# ./qbkill 300:5
killed work: 300:5 

Now the whole job is marked as "killed".

Resubmitting the failed frame worked (with the GUI).

Qube GUI: 5.5.3
Qube: 5.5-2


jburk

  • Administrator
This may be two separate issues.

1.) The GUI may have a bug; we'll look at that ASAP.

2.) When you retry from the command line, you also need to retry one or more subjobs (if the job has no running subjobs) so that there is a running subjob to service the pending frame.

As you've found, work is designated with a colon ":", like "300:5"

Subjobs are designated with a period ".", like "300.1"
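
Putting the two notations together, a command-line retry of a frame on a job with no running subjobs might look like the sketch below. It is only a dry run: the helper name retry_frame_with_subjob is made up for illustration, and it prints the two qbretry invocations rather than running them. It assumes qbretry accepts a subjob ID in the period notation just described, and that subjob 1 is the one you want to bring back.

```shell
#!/bin/sh
# Hypothetical helper (not part of Qube): prints the qbretry commands
# needed to retry one frame plus one subjob to service it.
retry_frame_with_subjob() {
    job="$1"      # job ID, e.g. 300
    frame="$2"    # work (frame) ID, e.g. 5
    subjob="$3"   # subjob ID, e.g. 1

    # Work is addressed with a colon: jobid:frameid
    echo "qbretry ${job}:${frame}"
    # Subjobs are addressed with a period: jobid.subjobid
    echo "qbretry ${job}.${subjob}"
}

# Dry run for frame 5 of job 300, serviced by subjob 1
retry_frame_with_subjob 300 5 1
```

Once the printed commands look right, they can be run by hand from the directory that contains qbretry.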

When you retry in the GUI, it does some work behind the scenes to determine whether it also needs to retry one or more subjobs; it will automagically retry one subjob for every retried frame, up to the job's CPU limit.
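
As a rough illustration of that rule, the number of subjobs to retry by hand can be estimated as the smaller of the retried frame count and the job's CPU limit. This is a sketch of the description above, not the GUI's actual code; the helper name subjobs_to_retry is hypothetical, and it does not account for subjobs that are already running.

```shell
#!/bin/sh
# Hypothetical helper (not part of Qube): one subjob per retried frame,
# capped at the job's CPU limit.
subjobs_to_retry() {
    frames="$1"      # number of frames being retried
    cpu_limit="$2"   # the job's CPU limit

    if [ "$frames" -lt "$cpu_limit" ]; then
        echo "$frames"
    else
        echo "$cpu_limit"
    fi
}

subjobs_to_retry 1 8    # one retried frame  -> 1 subjob
subjobs_to_retry 20 8   # 20 retried frames  -> capped at 8 subjobs
```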

The command line doesn't do this. It provides a lot more functionality than the GUI, with the downside that you have to tell it to do everything; it makes no assumptions.