Topic: Jobs completing but some frames not done

freelanceit

  • Jr. Member
  • Posts: 4
Jobs completing but some frames not done
« on: October 19, 2015, 01:42:03 PM »
We're using an ancient version of Qube (6.4.0b), and sometimes when jobs are submitted from Nuke 9.0v4, the job reports that it completed even though a few of the frames haven't rendered. Is this a known bug in older versions of Qube?

Thanks,
Adam.

jburk

  • Administrator
  • Posts: 493
Re: Jobs completing but some frames not done
« Reply #1 on: October 19, 2015, 03:04:04 PM »
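No, the issue isn't with Qube; this would happen regardless of the Qube version. What's most likely occurring is that the Nuke command that renders the frame is returning an exit code of 0, indicating "no errors", even though the frame was never written. Qube takes that exit code at face value and marks the work as complete.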
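
One way to catch this is to wrap the render command so that it checks the frame actually landed on disk before reporting success. A minimal Python sketch, assuming you know each frame's expected output path; `run_and_verify` and the command list are hypothetical and not part of Qube or Nuke:

```python
import os
import subprocess
import sys


def run_and_verify(cmd, expected_output):
    """Run a render command, then fail if the expected frame is missing or empty.

    cmd: argument list for the render (hypothetical, e.g. a Nuke command line).
    expected_output: path the frame should have been written to.
    Returns 0 only if the command exited 0 AND the frame exists with a
    non-zero size; otherwise returns a non-zero exit code.
    """
    result = subprocess.run(cmd)
    if result.returncode != 0:
        # The renderer itself reported a failure; pass its code through.
        return result.returncode
    # The renderer claimed success; double-check that the file really exists.
    if not os.path.isfile(expected_output) or os.path.getsize(expected_output) == 0:
        print(f"render exited 0 but {expected_output} is missing or empty",
              file=sys.stderr)
        return 1
    return 0
```

Calling `sys.exit(run_and_verify(cmd, path))` from the wrapper turns a silent "success" into a failed frame that the scheduler can see and retry.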

Is there anything in the job's stdout or stderr logs that gives a clue as to why a particular frame is not being output?

freelanceit

  • Jr. Member
  • Posts: 4
Re: Jobs completing but some frames not done
« Reply #2 on: October 19, 2015, 04:05:00 PM »
There didn't seem to be any errors in the logs. The user rendered the job locally out of Nuke and it was fine.

jburk

  • Administrator
  • Posts: 493
Re: Jobs completing but some frames not done
« Reply #3 on: October 19, 2015, 06:02:35 PM »
Are you eligible for support? If so, please open a case on our helpdesk, send the job log directories, and include the URL for this forum thread.

Are the unrendered frames on disk but the wrong size or black, or are they missing altogether? Another thing to test is whether the problem goes away when a Qube job runs with only a single job instance; if the number of missing frames goes up as more job instances run concurrently, it's likely that your file server is dropping connections.
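
To answer the "missing vs. wrong size" question across a whole frame range, a small script can list which frames are absent and which are suspiciously small. A minimal sketch, assuming a printf-style output path like the one used in a Nuke Write node; the path and the `min_bytes` threshold are placeholders to adjust. It cannot detect black frames that have a normal file size, since that would require inspecting pixels.

```python
import os


def audit_frames(pattern, first, last, min_bytes=1):
    """Classify frames in [first, last] as missing or suspiciously small.

    pattern: printf-style path such as "/renders/shot010/comp.%04d.exr"
             (hypothetical; substitute your own Write node output).
    min_bytes: frames smaller than this are flagged. The default of 1 only
               catches zero-byte files; raise it to catch truncated frames.
    Returns (missing, small) as two lists of frame numbers.
    """
    missing, small = [], []
    for frame in range(first, last + 1):
        path = pattern % frame
        if not os.path.exists(path):
            missing.append(frame)
        elif os.path.getsize(path) < min_bytes:
            small.append(frame)
    return missing, small
```

For example, `audit_frames("/renders/shot010/comp.%04d.exr", 1001, 1100, min_bytes=1024)` would report both gaps and near-empty files in one pass.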

Think of Qube as FedEx (especially for simple-to-start jobs like Nuke cmdline-based renders): once Qube gets the job to the worker and starts it, it's out of our hands. FedEx doesn't know what's in the packages they deliver, nor what you do with the contents.

freelanceit

  • Jr. Member
  • **
  • Posts: 4
Re: Jobs completing but some frames not done
« Reply #4 on: October 20, 2015, 09:32:40 AM »
Quote
Are you eligible for support?

No, our support lapsed long ago. I'm building a case for an upgrade, so I was sort of hoping someone would just say "Oh, that's fixed in patch 6.X" so we'd have another reason to upgrade.  8)

Quote
Are the unrendered frames on disk but the wrong size or black, or are they missing altogether?   Another thing to test is if this never happens when you render a Qube job with only a single job instance running; if the number of missing frames goes up the more concurrent job instances are running, it's likely that your file server is dropping connections.

The frames just didn't render.
We have about 20 operational nodes on the farm, but I'm not sure all of them were being used for the job; it was probably more like 10.

jburk

  • Administrator
  • Posts: 493
Re: Jobs completing but some frames not done
« Reply #5 on: October 20, 2015, 04:51:22 PM »
Nuke will happily render to thin air without an error, going through all the motions and then writing the frame to the bit bucket. This often happens when the destination folder for the rendered frame is inaccessible, doesn't allow writing because of permissions, or sits on a full disk.
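
Each of those three causes can be checked up front on a worker. A minimal sketch, assuming it runs on the worker under the same account the renders use; `check_destination` and the free-space threshold are hypothetical. Because `os.access` can disagree with the real ACLs on network shares, the function also attempts an actual test write.

```python
import os
import shutil
import tempfile


def check_destination(directory, min_free_bytes):
    """List reasons a frame written to `directory` might silently vanish."""
    problems = []
    # Cause 1: the folder isn't there or the share isn't reachable.
    if not os.path.isdir(directory):
        problems.append("directory does not exist or is not reachable")
        return problems
    # Cause 2: permissions. os.access is only a hint, so also try a real write.
    if not os.access(directory, os.W_OK):
        problems.append("account lacks write permission")
    try:
        with tempfile.NamedTemporaryFile(dir=directory):
            pass  # file is created, then deleted when the block exits
    except OSError as exc:
        problems.append(f"test write failed: {exc}")
    # Cause 3: not enough free space left on the destination volume.
    free = shutil.disk_usage(directory).free
    if free < min_free_bytes:
        problems.append(f"only {free} bytes free")
    return problems
```

An empty list means the folder looked writable at the moment of the check; it doesn't guarantee the connection will stay up for the whole render.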

Permissions can be hard to track down; some workers may be running under a different proxy_account than others, so check whether the failed frames all come from the same worker or workers, and whether those workers were able to write earlier or later frames.
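
Once you know which worker rendered each frame, grouping the failures makes that pattern easy to spot. A minimal sketch; the frame-to-worker dictionary is an assumed input that you'd build yourself (for example from the job's logs), not a format Qube provides directly.

```python
from collections import defaultdict


def failures_by_worker(frame_to_worker, missing_frames):
    """Group missing frames by the worker that rendered them.

    frame_to_worker: dict mapping frame number -> worker hostname (assumed input).
    missing_frames: frame numbers that never reached disk.
    Returns {worker: (failed_frames, total_frames_rendered)}, ordered with
    the worker that lost the most frames first.
    """
    totals = defaultdict(list)
    failed = defaultdict(list)
    missing = set(missing_frames)
    for frame, worker in frame_to_worker.items():
        totals[worker].append(frame)
        if frame in missing:
            failed[worker].append(frame)
    report = {w: (sorted(failed[w]), len(totals[w])) for w in totals}
    return dict(sorted(report.items(), key=lambda kv: -len(kv[1][0])))
```

A worker that shows up with failures while its other frames succeeded points toward the intermittent-connection case rather than a permanent permissions problem.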

If a worker can write frames for a while, then fails to write, then can write again, all within the same job, the usual cause is that the file server is running a desktop version of Windows, which supports only a limited number of concurrent connections and will silently drop one connection to accept another. The client on the other end has no idea that its network connection to the filesystem is dead. The telltale sign is that the number of dropped frames rises as the number of running instances across all jobs on the farm rises.
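
To check for that telltale sign, you can compute the drop rate at each farm-wide concurrency level. A minimal sketch; the per-frame records (how many instances were running when the frame rendered, and whether it was dropped) are assumed inputs that you'd have to collect yourself.

```python
from collections import defaultdict


def drop_rate_by_concurrency(records):
    """Summarize the dropped-frame rate at each farm-wide concurrency level.

    records: iterable of (running_instances, dropped) pairs, one per frame,
        where dropped is True if the frame never reached disk (assumed input).
    Returns {running_instances: fraction_dropped}, ordered by concurrency.
    """
    counts = defaultdict(lambda: [0, 0])  # level -> [dropped, total]
    for running, dropped in records:
        counts[running][1] += 1
        if dropped:
            counts[running][0] += 1
    return {level: d / t for level, (d, t) in sorted(counts.items())}
```

A rate that stays near zero at low concurrency and climbs as more instances run matches the connection-limit pattern described above.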