Topic: Blocking/Unblocking jobs restarts previously running frames

connnor.v

  • Full Member
  • ***
  • Posts: 10
Blocking/Unblocking jobs restarts previously running frames
« on: February 21, 2013, 07:55:22 PM »
I wanted to report this, and see if there were any solutions to this common issue I have observed when using Qube:

Whenever I block a running job, the frames that were previously running on that job start over once the job is unblocked again. This is true for all jobs that run on the farm.

For instance, let's say I have a job with 4 frames that each have been running for 3 hours. Now let's say that I need to block that job for whatever reason, before those 4 running frames have a chance to finish. Those frames start over from 0 once they are unblocked again. One would prefer that those frames would pick up again at the 3 hour mark once they were unblocked, but this is never the case.

Why does this happen? Can it be fixed with Qube's next version?

jburk

  • Administrator
  • *****
  • Posts: 493
Re: Blocking/Unblocking jobs restarts previously running frames
« Reply #1 on: February 22, 2013, 12:51:23 AM »
The behavior you're looking for is called "checkpointing". It's application-specific, and not many 3rd-party applications or renderers support it.  It's the same as a "save point" in a video game: if you're killed after the save point, you're taken back to the save point when you restart the game.

Checkpointing works the same way; if and when a process (be it a render, a distributed software build, whatever) is interrupted, AND it's been saving checkpoints along the way, it can optionally pick up again at the last saved checkpoint.
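
As a rough illustration of the save-point idea (a minimal sketch, not how any particular renderer or Qube does it), the Python below records its progress after every unit of work and resumes from the last record when restarted. The function name `run_with_checkpoints`, the JSON checkpoint format, and the `interrupt_at` hook used to simulate a kill are all hypothetical.

```python
import json
import os
import tempfile


def run_with_checkpoints(total_steps, checkpoint_path, interrupt_at=None):
    """Run steps 0..total_steps-1, saving a checkpoint after each one.

    Returns the list of steps executed during this call. `interrupt_at`
    is a hypothetical hook that simulates the process being killed just
    before that step starts.
    """
    start = 0
    if os.path.exists(checkpoint_path):
        # A previous run left a save point: resume from it.
        with open(checkpoint_path) as fh:
            start = json.load(fh)["next_step"]

    executed = []
    for step in range(start, total_steps):
        if interrupt_at is not None and step == interrupt_at:
            return executed  # simulated interruption
        executed.append(step)  # stand-in for the real work
        # Write to a temp file, then rename, so the checkpoint is never
        # left half-written if the process dies mid-save.
        tmp_path = checkpoint_path + ".tmp"
        with open(tmp_path, "w") as fh:
            json.dump({"next_step": step + 1}, fh)
        os.replace(tmp_path, checkpoint_path)
    return executed


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "job.ckpt")
    print(run_with_checkpoints(10, path, interrupt_at=6))  # first run, killed early
    print(run_with_checkpoints(10, path))                  # second run resumes
```

Without the checkpoint file, the second run would start again at step 0, which is what happens to a blocked frame when the application saves no checkpoints.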

I believe the Maxwell renderer supports checkpointing, and I've heard that RenderMan supports it as well, but Qube doesn't implement checkpointing on a global scale.

Instead of blocking the job, you might want to mark the running job instances (the subjobs in older Qube terminology) as "complete", using the right-mouse menu in the job instance tab (the one to the right of the Agenda/Frames tab).

This will allow the job instance to finish the frame it's currently working on. When it contacts the supervisor for another frame, the supervisor will instead instruct it to surrender the worker back to the idle pool, and the job instance will then be marked as "complete". To revive it, retry the job instance (NOT the job or the frame), which will put it back into a "pending" state, waiting for dispatch.
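
The hand-off described above can be modelled with a toy simulation. This is only a sketch of the described behavior, not Qube's API: the `Supervisor` class, its methods, and the `mark_after` hook (which stands in for a wrangler marking the instance complete while a frame is rendering) are all hypothetical names.

```python
from collections import deque


class Supervisor:
    """Toy stand-in for the supervisor; not Qube's real interface."""

    def __init__(self, frames):
        self.pending_frames = deque(frames)
        self.marked_complete = set()  # instance ids asked to wind down
        self.state = {}               # instance id -> state string

    def mark_complete(self, instance_id):
        self.marked_complete.add(instance_id)

    def retry_instance(self, instance_id):
        # Retrying the instance (not the job or the frame) re-queues it.
        self.marked_complete.discard(instance_id)
        self.state[instance_id] = "pending"

    def next_frame(self, instance_id):
        # A marked instance gets no more work and gives up its worker.
        if instance_id in self.marked_complete or not self.pending_frames:
            self.state[instance_id] = "complete"
            return None
        self.state[instance_id] = "running"
        return self.pending_frames.popleft()


def run_instance(supervisor, instance_id, mark_after=None):
    """Render frames until the supervisor stops handing them out."""
    rendered = []
    while True:
        frame = supervisor.next_frame(instance_id)
        if frame is None:
            return rendered
        if mark_after is not None and frame == mark_after:
            # Marked complete mid-frame: the current frame still finishes.
            supervisor.mark_complete(instance_id)
        rendered.append(frame)


if __name__ == "__main__":
    sup = Supervisor([1, 2, 3, 4, 5])
    print(run_instance(sup, 0, mark_after=2), sup.state[0])
    sup.retry_instance(0)
    print(run_instance(sup, 0), sup.state[0])
```

The point of the model is that no frame is ever abandoned partway: the instance stops only at a frame boundary, and the remaining frames stay queued for whoever picks them up next.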

connnor.v

Re: Blocking/Unblocking jobs restarts previously running frames
« Reply #2 on: February 22, 2013, 01:05:18 AM »
Interesting. So, the problem mostly lies with the abilities of the 3rd-party renderer that I am using (in this case mental ray). Is there any way that this "global checkpointing" could be integrated into the Qube program itself, eventually?

Additionally, thanks for the tip about completing subjob processes, which allows the running frames to finish before migrating CPUs to other jobs. I didn't know that could be done. However, in the case of chunked renders, this method wouldn't be much help, would it? Particularly if the entire job is chunked into a full frame range.

jburk

Re: Blocking/Unblocking jobs restarts previously running frames
« Reply #3 on: February 22, 2013, 02:24:25 AM »
We can't add it to Qube: you can run almost anything through Qube, and not everything supports checkpointing.

And this is the hidden cost of chunks; this is why it's best to keep your chunk sizes on the smaller side.  It's kind of like buying car insurance and deciding how big a deductible to go with: a larger deductible costs less per month, but you get hammered if you ever have to pay it.  Larger chunks spread the startup time across more frames, but you have to re-render the entire chunk if it doesn't complete.
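
To put rough numbers on that trade-off, here is a small sketch. The 120-second startup and 600-second frame time are made-up figures for illustration, and `chunk_costs` is a hypothetical helper, not anything Qube provides.

```python
def chunk_costs(startup_s, frame_s, chunk_size):
    """Return (per_frame_cost, worst_case_loss) in seconds for one chunk.

    per_frame_cost: render time plus this frame's share of the startup time.
    worst_case_loss: work thrown away if the chunk is interrupted just
    before its last frame finishes (the whole chunk must be redone).
    """
    per_frame_cost = frame_s + startup_s / chunk_size
    worst_case_loss = startup_s + frame_s * chunk_size
    return per_frame_cost, worst_case_loss


if __name__ == "__main__":
    # Assumed figures: 2 min application startup, 10 min per frame.
    for size in (1, 5, 20):
        per_frame, loss = chunk_costs(startup_s=120, frame_s=600, chunk_size=size)
        print(f"chunk={size:>2}  per-frame={per_frame:.0f}s  worst-case redo={loss:.0f}s")
```

With these assumed figures, going from a chunk size of 1 to 20 saves under two minutes per frame, while the worst-case redo grows from 12 minutes to more than three hours.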

jburk

Re: Blocking/Unblocking jobs restarts previously running frames
« Reply #4 on: February 24, 2013, 01:58:22 AM »
The next release of the WranglerView will change the blocking behavior to support this "passive" blocking style, in that it will optionally allow the currently running frame to complete.

The current "Block" job menu item has been renamed to "Block (+ Purge)" to more accurately reflect the behavior, and there is a new "Block (+ Finish Current)" menu item as well.

It will be available in WranglerView v6.4-4, due out in early to mid March.