Topic: subjob processes are stuck on "complete"

choitown

  • Newbie
  • Posts: 1
subjob processes are stuck on "complete"
« on: May 12, 2011, 10:15:25 PM »
We have a 10-worker, 80-core farm running here, and there's a strange problem where we never get the full number of subjob processes requested.

If I submit a job and request 20 subjob processes, I usually get only about 6 or 7. The processes column in the job layout tab will show "6/20". There are plenty of idle cores available, but if I look in the subjob processes tab, the subjobs listed are mostly stuck on "complete" and do not move on to "running" status. Needless to say, the render farm is close to useless if I can only utilize a fraction of the available cores.

Does anyone have any clue why this is?


ptanna

  • Jr. Member
  • Posts: 7
Re: subjob processes are stuck on "complete"
« Reply #1 on: May 17, 2011, 07:11:44 PM »
Are you setting your "Reservations" to "host.processors=1+"? You can click the browse button next to that setting and check the "All" box. I believe this tells your job to use all available cores.

jburk

  • Administrator
  • Posts: 493
Re: subjob processes are stuck on "complete"
« Reply #2 on: May 17, 2011, 11:03:05 PM »
I would check the logs for the subjobs that completed prematurely. Something is occurring during the startup phase of those subjobs that causes them to complete; I'm guessing that they have actually gone into a failed state.
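
If there are a lot of subjob logs to go through, a small script can do the first pass. This is only a rough sketch: it assumes you already have the log text for a subjob (copied from the GUI or read from disk), and the function name, the keyword list, and the sample log are all made up for illustration, not real output from the farm. It only looks at the first lines of the log, since the suspicion is that something goes wrong during startup.

```python
import re

# Hypothetical keywords; adjust to whatever your renderer and farm actually print.
ERROR_PATTERN = re.compile(r"\b(error|fail(ed|ure)?|fatal|cannot|denied)\b", re.IGNORECASE)


def find_suspect_lines(log_text, max_lines=40):
    """Return (line_number, text) pairs from the first max_lines lines
    of log_text that match ERROR_PATTERN."""
    hits = []
    for number, line in enumerate(log_text.splitlines()[:max_lines], start=1):
        if ERROR_PATTERN.search(line):
            hits.append((number, line.strip()))
    return hits


# Invented sample log, for illustration only.
sample_log = """\
starting subjob 7
loading scene /path/to/scene.ma
ERROR: license checkout failed
subjob exiting
"""

for number, text in find_suspect_lines(sample_log):
    print(f"line {number}: {text}")
```

If the "complete" subjobs all show the same suspect line near the top of their logs, that is probably the thing to chase first.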