Topic: 100% jobs with pending workers

Zameer

  • Jr. Member
  • Posts: 8
100% jobs with pending workers
« on: July 19, 2007, 04:37:44 PM »
Hi,

I'm running into situations where I have a large number of tasks, and the tasks complete quickly enough that the first "few" workers complete most of them. As a result, a large number of workers aren't being assigned work, and the job is reported as 100% but with a status of pending/running. When I look further into it, a large number of workers are pending, and it seems to take a great deal of time to cycle through these workers, which are seemingly doing nothing but are still using up CPUs.

This causes problems for two reasons. First, it uses up CPU time on empty workers, preventing those CPUs from being used elsewhere. Second, it prevents a finished task from actually being marked complete, which in turn prevents any task that depends on it from starting.

I am currently evaluating Qube, so I have only a fraction of the farm to test with. However, this kind of situation could also appear when the farm is under heavy load and only a handful of CPUs are available.

Thanks,
Zameer
 

anthony

  • Senior Software Engineer
  • Hero Member
  • Posts: 183
Re: 100% jobs with pending workers
« Reply #1 on: July 19, 2007, 09:05:12 PM »
Hey Zameer,

     Since you are on eval, it's probably better to update the supervisor to 5.2, because 5.2 has several bug fixes for the issues you mention here. It's also good to upgrade the workers, but just upgrading the supervisor will help quite a bit. As for the idling workers, how many hosts do you have?

     Thanks,
             Anthony

Zameer

Re: 100% jobs with pending workers
« Reply #2 on: July 19, 2007, 10:04:05 PM »
Hi,

In the test I use to reproduce this, I have a single Mac with 2 CPUs. I create a job that consists of 43 subjobs. When I submit it through the API, I send it with cpu=200, but that seems to get cut down to 43 (probably a good thing).

I can also reproduce this by creating a job with 1000 subjobs and a cpu setting of 200. This is run against a farm of 4 quad-core machines.

Perhaps it has to do with creating more subjobs than there are CPUs available to run them?
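
To make the suspected cause concrete, here is a toy simulation of the two test setups described in this post. It is only a sketch and not how the Qube! supervisor works internally: the function name simulate, the round-robin pulling of work, and the choice to treat the 43 subjobs as 43 equally fast work items are all assumptions made for illustration.

```python
from collections import deque


def simulate(num_subjobs, num_slots, num_items):
    """Toy model only; this is not how Qube's supervisor is implemented."""
    agenda = deque(range(num_items))       # work items still to be processed
    pending = deque(range(num_subjobs))    # subjobs waiting for a CPU slot

    # Only as many subjobs as there are free slots can start running.
    running = [pending.popleft() for _ in range(min(num_slots, len(pending)))]
    items_done = {subjob: 0 for subjob in running}

    # Running subjobs take turns pulling items until the agenda is drained.
    while agenda and running:
        for subjob in running:
            if not agenda:
                break
            agenda.popleft()
            items_done[subjob] += 1

    # Whatever is still pending never received any work.
    return items_done, len(pending)


items_done, idle = simulate(num_subjobs=43, num_slots=2, num_items=43)
print(items_done)  # {0: 22, 1: 21}
print(idle)        # 41
```

Under these assumptions the two running subjobs finish all 43 items, so the job shows 100% while 41 subjobs are still pending. The same model with 200 subjobs, 16 slots (4 quad-core machines), and 1000 items leaves 184 subjobs pending.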

As for upgrading, we're (planning on) upgrading our version of Qube! tomorrow.

Zameer

Zameer

Re: 100% jobs with pending workers
« Reply #3 on: July 26, 2007, 05:32:58 PM »
Hi!

We've upgraded to version 5.2 on all of our machines, and we're still running into the same issues. Jobs are not reaching complete status because subjobs are still waiting to be queued, and dependent jobs are not being started as a result.

Jobs in this state are also being prioritized lower than jobs with pending items in their agenda. When this occurs, new jobs receive CPU time while the old job waits for free CPU time, which keeps it in a pending state and blocks dependent tasks even longer.
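
The starvation described above can be illustrated with a small sketch. The ordering rule, the function name pick_next, and the job fields are hypothetical; they model the observed behaviour and are not taken from Qube!'s actual scheduler.

```python
def pick_next(jobs):
    """Return the job that gets the next free CPU under the assumed rule."""
    waiting = [job for job in jobs if job["pending_subjobs"] > 0]
    if not waiting:
        return None
    # Assumed rule: jobs with agenda items left sort first (False < True),
    # and ties are broken by submission order.
    return min(waiting, key=lambda job: (job["agenda_left"] == 0, job["submitted"]))


jobs = [
    {"name": "old", "submitted": 1, "agenda_left": 0, "pending_subjobs": 41},
    {"name": "new", "submitted": 2, "agenda_left": 500, "pending_subjobs": 10},
]
print(pick_next(jobs)["name"])  # new
```

With a rule like this, the older job with an empty agenda is picked only once no job with remaining agenda items is waiting, so its pending subjobs (and anything depending on it) wait for as long as new work keeps arriving.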

Zameer