Author Topic: Subjob timeout issue  (Read 3966 times)

jbrandibas

  • Sr. Member
  • ****
  • Posts: 35
Subjob timeout issue
« on: January 19, 2008, 11:45:05 PM »
I have a scene I'm submitting that occasionally gets a fatal error from Maya while rendering.  The problem is that Qube doesn't know that Maya died; it sits forever with the following error stacking up in the StdErr tab:

INFO: timer of [600 secs] expired-- "nudging" maya for 'mel:' prompt at /Applications/pfx/jobtypes/maya/MelProcessor.pm line 550.


I wound up putting a timeout in the submit dialog of the jobtype, but this has the adverse side effect of causing the subjobs to fail after the timeout even when they are successfully rendering frames.  To compensate, I've now specified an exorbitant number of CPUs so that when subjobs die, there are open slots for the job to continue on.

I feel like I am digging myself deeper and deeper into a dark hole :)

Basically, what I need Qube to do is either

a) detect that Maya died while rendering the frame and retry the frame

or

b) if the frame takes longer than XX:XX:XX then kill the subjob and retry the frame.

Any idea how I can accomplish this?


Thanks,

shinya

  • Administrator
  • *****
  • Posts: 232
Re: Subjob timeout issue
« Reply #1 on: January 20, 2008, 03:12:39 AM »
Hi jbrandibas,

What kind of "fatal" error are you getting?
The jobtype should detect it if the Maya process crashes outright; what it
can't detect is a process that's merely "hung".
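
For what it's worth, the only generic way to catch a hung process is an external watchdog that kills and retries it after a time limit.  Below is a rough Python sketch of that idea -- the render command, timeout, and retry count are placeholders for illustration, not what the Maya jobtype actually does internally:

    import subprocess

    # Placeholder values -- adjust for your own scene and limits.
    RENDER_CMD = ["Render", "-s", "1", "-e", "1", "/path/to/scene.mb"]
    FRAME_TIMEOUT = 3600   # seconds allowed per frame
    MAX_RETRIES = 3

    def render_with_watchdog():
        for attempt in range(1, MAX_RETRIES + 1):
            proc = subprocess.Popen(RENDER_CMD)
            try:
                # A crashed process returns an exit code we can act on.
                # A hung process never returns, so the timeout catches it.
                code = proc.wait(timeout=FRAME_TIMEOUT)
            except subprocess.TimeoutExpired:
                proc.kill()    # kill the hung render
                proc.wait()
                print("attempt %d timed out, retrying" % attempt)
                continue
            if code == 0:
                return True    # frame rendered successfully
            print("attempt %d exited with code %d, retrying" % (attempt, code))
        return False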

I'd like to take a look at your job's error logs for further troubleshooting.
Could you send the logs from a sample job that's exhibited this issue to
support@pipelinefx.com?  Instructions are below.  Thanks!

---
In order to address your problem more completely, please send us the job log directory for the job in question.

You can locate the directory by logging into your Supervisor, and looking for the job log folder in the following location (depending upon your Supervisor platform):

Windows
\Program Files\pfx\qube\logs\job

Linux, OS X
/var/spool/qube/job

In that location you will find numbered directories; each corresponds to the number of thousands in the job ID (for IDs under 1000 the directory is 0; for job 12345 it is 12). Inside the matching directory, look for the folder named after the job ID itself.
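
If it helps, here is that rule as a small Python sketch (assuming the job ID folder is named exactly after the job ID, as described above; adjust the root path for your Supervisor platform):

    import os

    def job_log_dir(job_id, root="/var/spool/qube/job"):
        # "Thousands" directory: job 12345 -> "12", job 837 -> "0".
        thousands = job_id // 1000
        return os.path.join(root, str(thousands), str(job_id))

    print(job_log_dir(12345))   # /var/spool/qube/job/12/12345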

Zip up the entire job ID folder and reply to this email message with the zipped file.

If the zipped job ID folder turns out to be larger than 2MB, don't send it, but let us know and we will help you address the problem in an alternative fashion.