PipelineFX Forum

Qube! => Jobtypes and Applications => Topic started by: stevespo on March 02, 2010, 11:37:29 PM

Title: Error: cannot create output dir
Post by: stevespo on March 02, 2010, 11:37:29 PM
Bear with me while I try and explain the behavior we're seeing.

Qube 5.5.1, Maya2009 running on RedHat Enterprise WS R4
Render nodes read data from local drive and then write frames to a MetaLAN 4.2.1 mounted SAN drive array.

The rendering was stable and reliable when our output files were IFF (2MB).  We recently switched to EXR format (24MB) and the behavior has become very erratic.  Renders will work ok for some time (1 hour, perhaps 2 hours) and then the jobs sit with no progress.

The MetaLAN mounted directories show a small (128 byte) file for the stalled renders, which I assume is some part of the EXR header.  The permissions on the files should be root:root, but sometimes they are qubeproxy:qubeproxy.  The MetaLAN mounted device appears to be stable and accessible, but this error typically appears in one or more of the stderr logs:

INFO: testing output directory for [images]
WARN: output directory [/mnt/SAN0/sg_mayaRender/tests/test06/chunk_0009_shotCam4_1K185/] does not exist... attempting to create it...
ERROR: cannot create output dir [/mnt/SAN0/sg_mayaRender/tests/test06/chunk_0009_shotCam4_1K185/]
ERROR: SUPER::initialize() at /usr/local/pfx/jobtypes/maya/UniversalMayaRenderJob.pm line 112.
ERROR: in initializing job at /usr/local/pfx/jobtypes/maya/MayaJob.pm line 206.
INFO: reporting status [failed] to supe: qb::reportjob('failed')
INFO: HARNESS=[IPC::Run=HASH(0x1114e00)]
INFO: exiting from maya
quit -f

That directory clearly exists and has many output frames already written to it correctly.  For whatever reason, it seems that the directory is unreadable and that causes one render to fail, which seems to cause a domino effect with the other 3 render nodes.  They sit stalled indefinitely until we intervene manually.
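One way to narrow down whether the directory is really unusable at the moment of failure is to probe it from the render node, as the same user the worker runs jobs under. The following is only a sketch: the function name `probe_output_dir` is made up for illustration and is not part of Qube or the Maya jobtype. It checks that the path is a directory, records its owner, and tries to create and delete a temporary file in it.

```python
import os
import tempfile


def probe_output_dir(path):
    """Report whether `path` exists, who owns it, and whether we can write to it.

    Hypothetical helper for diagnosis only; not part of Qube.
    """
    result = {
        "exists": os.path.isdir(path),
        "uid": None,
        "gid": None,
        "writable": False,
        "error": None,
    }
    if not result["exists"]:
        return result
    try:
        st = os.stat(path)
        result["uid"], result["gid"] = st.st_uid, st.st_gid
        # Create and remove a real file instead of trusting os.access(),
        # which can be misleading on network or SAN mounts.
        with tempfile.NamedTemporaryFile(dir=path, prefix=".probe_"):
            pass
        result["writable"] = True
    except OSError as exc:
        result["error"] = str(exc)
    return result
```

Calling `probe_output_dir("/mnt/SAN0/sg_mayaRender/tests/test06/chunk_0009_shotCam4_1K185/")` on a stalled node, as both root and the proxy user, would show whether the failure is tied to a particular account.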

I realize that this is a complex environment, and my feeling is that the MetaLAN is the likely culprit, perhaps something to do with the large size of the files (24MB+) or the timing of the I/O.  I am going to remove that piece from the puzzle and see if local writes improve reliability.

Sometimes we can kill/retry the stalled jobs and processing will continue ok.  Other times we need to restart the worker, and still other times it requires a shutdown/reboot to get the rendering working again.

Any other thoughts or ideas would be appreciated.  Thanks.
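The 128-byte stub files and their ownership mentioned above can be listed with a short script, which may help correlate stalled frames with the root:root versus qubeproxy:qubeproxy difference. This is a sketch under assumptions: the name `find_stub_frames`, the 128-byte threshold, and the `.exr` suffix filter are chosen for illustration.

```python
import grp
import os
import pwd


def _user_name(uid):
    # Fall back to the numeric id if the account is unknown on this host.
    try:
        return pwd.getpwuid(uid).pw_name
    except KeyError:
        return str(uid)


def _group_name(gid):
    try:
        return grp.getgrgid(gid).gr_name
    except KeyError:
        return str(gid)


def find_stub_frames(root, max_bytes=128, suffix=".exr"):
    """Return (path, size, user, group) for frames no larger than max_bytes."""
    stubs = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.lower().endswith(suffix):
                continue
            full = os.path.join(dirpath, name)
            try:
                st = os.stat(full)
            except OSError:
                # File vanished or is unreadable; skip it.
                continue
            if st.st_size <= max_bytes:
                stubs.append(
                    (full, st.st_size, _user_name(st.st_uid), _group_name(st.st_gid))
                )
    return sorted(stubs)
```

Running `find_stub_frames("/mnt/SAN0/sg_mayaRender/tests/test06")` after a stall lists each truncated frame with its owner and group.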

Title: Re: Error: cannot create output dir
Post by: stevespo on March 03, 2010, 04:20:52 PM
We changed the output directory to the local hard drive and the same problem is occurring, so the MetaLAN mounted device is completely out of the equation.

Has anyone seen this type of behavior with Qube and Maya, perhaps specific to EXR output?


Title: Re: Error: cannot create output dir
Post by: jburk on March 03, 2010, 11:04:37 PM
That is odd, but it's not EXR-specific.  I've worked at several large houses with Qube, Maya, and large EXRs, but nothing with image sizes that exceeded 18MB.

Title: Re: Error: cannot create output dir
Post by: stevespo on March 04, 2010, 04:20:03 AM
Actually after more testing today it appears that it could be some type of memory constraint.  There have been times when Maya appears to be flushing cache constantly and throwing up all types of errors.

When we switched back to batch mode (the errors occurred in interactive rendering mode) the renders once again seem quite reliable and proceed for many hours without any trouble.

The problem with this batch mode "workaround" is that each frame now takes an extra 3 minutes because all the input files have to be read and processed in order to render a single frame.  This adds up quickly with the many thousands of frames we need to render, but it's better than restarting everything every hour and having things fail overnight.

We will probably upgrade (at least one node) to Maya2010 and see if that makes any difference.  I'll report back if I uncover anything else interesting.  Any other ideas or suggestions are appreciated.

Title: Re: Error: cannot create output dir
Post by: jburk on March 10, 2010, 06:59:02 PM
You might try using "chunks"; for example, if you set the chunk size to 5, you only take the 3 minute hit once every 5 frames.

The downside of chunks is that if the render crashes on frame 5 of a 5-frame chunk, you have to render the entire chunk again if you just hit 'retry', since the smallest unit of work is 5 frames long.

It's a balancing act between taking the scene load hit and having the "unit of work" size small enough, and it comes down to personal preference.
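The trade-off can be put in rough numbers. The sketch below assumes the 3-minute scene load quoted earlier in the thread and a hypothetical per-frame render time of 10 minutes (that figure is not from the thread; substitute your own). The function name `chunk_tradeoff` is made up for illustration.

```python
import math


def chunk_tradeoff(total_frames, chunk_size, load_minutes=3.0, render_minutes=10.0):
    """Estimate scene-load overhead and worst-case retry cost for a chunk size.

    load_minutes: per-chunk scene load cost (3 minutes in this thread).
    render_minutes: assumed per-frame render time (hypothetical value).
    """
    chunks = math.ceil(total_frames / chunk_size)
    return {
        "chunks": chunks,
        # The scene is loaded once per chunk.
        "load_overhead_minutes": chunks * load_minutes,
        # A crash on the last frame of a chunk means redoing the whole chunk.
        "worst_case_retry_minutes": load_minutes + chunk_size * render_minutes,
    }
```

For 1000 frames, a chunk size of 1 costs 3000 minutes of scene loading in total, while a chunk size of 5 costs 600 minutes, at the price of a larger unit of work to redo after a crash.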