Author Topic: Query to get list of eligible jobs for a worker  (Read 2495 times)

gabe

  • Jr. Member
  • **
  • Posts: 3
Query to get list of eligible jobs for a worker
« on: December 10, 2014, 12:26:59 AM »
Hello,

I'm looking for a way to query Qube (via python) to get a list of jobs a worker is eligible to process. 

For an initial test, I'm using the following method:
  jobs = qb.joborder(name=worker)

The problem is that this call seems to return all jobs that a machine could do, regardless of the job's requirements. As a result, I'm seeing jobs returned that will never run on this worker due to, in my case, memory constraints.

Is there a better API that can be used to get a list of jobs a worker could actually run (i.e., after filtering out jobs that do not meet the worker's constraints)?

Thanks,
-Gabe

BrianK

  • Hero Member
  • *****
  • Posts: 107
Re: Query to get list of eligible jobs for a worker
« Reply #1 on: December 10, 2014, 11:42:57 PM »
Quote
The problem is that this call seems to return all jobs that a machine could do, regardless of the job's requirements. As a result, I'm seeing jobs returned that will never run on this worker due to, in my case, memory constraints.

You mention that the worker will not qualify due to the job's requirements.  You then mention memory.  Memory, being a machine resource, maps to a reservation: you cannot require an amount of memory, you can only reserve it.

You can only require machine properties; you can only reserve machine resources.

Back to your question: Does that mean that a job is reserving, for example, 12GB of RAM, but only 3GB are available right now, or does that mean that the machine only has 8GB of RAM and could never run the job because it reserves 12GB of RAM?

In the case of the former - when you don't have enough memory to run the job right now - that's not a good indication of what could potentially run on this machine in the future, therefore the job is included in the list of potential jobs that this host can run.  In the case of the latter, the job should not show up in the list of jobs able to run on this machine.  My tests confirm that is the case (I'm running Qube 6.6).
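
To make the two cases concrete, here is a minimal Python sketch of the distinction. The function name and its arguments are made up for illustration and are not part of the Qube API, and this is not how the supervisor implements it; it only restates the reasoning above.

```python
def classify_memory_fit(job_reserved_mb, host_total_mb, host_free_mb):
    """Classify a job's memory reservation against one host.

    Hypothetical helper for illustration only; not a Qube API call.
    """
    if job_reserved_mb > host_total_mb:
        # The host can never satisfy this reservation,
        # so the job should not be listed for this host.
        return "never"
    if job_reserved_mb > host_free_mb:
        # Not enough free memory right now, but the job could run later,
        # so it still counts as a potential job for this host.
        return "later"
    return "now"
```

With the numbers from the question, a 12GB reservation on an 8GB machine gives "never", while a 12GB reservation on a 16GB machine with 3GB free gives "later".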

If I haven't answered your question or you are seeing something other than what I've described, please give a specific example of what you're seeing.

-Brian
« Last Edit: December 10, 2014, 11:45:30 PM by BrianK »

gabe

Re: Query to get list of eligible jobs for a worker
« Reply #2 on: December 19, 2014, 11:47:18 PM »
Hi Brian.

Going through our old tickets, it looks like one of my co-workers may have logged this issue over a year ago. Here is the ticket:

  http://pipelinefx.zendesk.com/tickets/10231

Based on the comments, it looks like it was auto-closed without being resolved. 

In my current investigation, I'm seeing workers that are in an active state without any subjobs (we call that idle), but show having 1 or more eligible jobs.  They stay in this state for minutes at a time.

Here is an example:

worker:
- cluster: /foo/bar
- restrictions: /+

Job on that worker via qb.joborder(name):
- cluster: /foo/bar
- restrictions:

Is this the same problem as what we reported in Sept 2013?

If so, is there a known solution that does not involve us having to parse through your rules and restrictions (as recommended by the old ticket)?

My end goal is still to determine if a worker has the possibility to get future work given the currently queued requests.  As it is now, it looks like the results returned by qb.joborder() are, in some cases, not what I need.

The Qube version is 6.6-2

Thanks,
-Gabe

BrianK

Re: Query to get list of eligible jobs for a worker
« Reply #3 on: December 20, 2014, 02:04:06 AM »
Quote
Hi Brian.

Going through our old tickets, it looks like one of my co-workers may have logged this issue over a year ago. Here is the ticket:

  http://pipelinefx.zendesk.com/tickets/10231

Based on the comments, it looks like it was auto-closed without being resolved. 

For the record, a work-around was offered, a bug was filed, and a fix was made.  The fix went into version 6.5-2.

Quote
In my current investigation, I'm seeing workers that are in an active state without any subjobs (we call that idle), but show having 1 or more eligible jobs.  They stay in this state for minutes at a time.

If this is for the customer who I think it's for, then they have around 700 workers.  While the farm is *very* busy, a worker sitting idle for a minute or two is not out of the ordinary.  Now if you're speaking of 10-15 minutes of being in this state, then that might be an issue & I'd want to look at that as a separate issue (through our support system - support@pipelinefx.com). 

Quote
Here is an example:

worker:
- cluster: /foo/bar
- restrictions: /+

Job on that worker via qb.joborder(name):
- cluster: /foo/bar
- restrictions:

I'm sorry Gabe, I don't mean to be obtuse, but I don't see the problem here.  You've got a worker that is in the cluster /foo/bar and will run any job in the cluster / or below it.  You have a job that's submitted to the /foo/bar cluster, which satisfies the worker's cluster and restrictions.  The worker should be able to run the job and the job should show up in qb.joborder.
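
As a rough illustration of that matching, here is a small Python sketch. It is only an assumption-laden model, not the supervisor's actual logic: the function name is hypothetical, it handles only a single exact cluster path or a trailing "/+" (this cluster or anything below it), and it treats an empty restriction as "no restriction". Other restriction syntax is not handled.

```python
def restriction_allows(worker_restriction, job_cluster):
    """Return True if a worker restriction would admit a job's cluster.

    Hypothetical helper for illustration only; not a Qube API call.
    """
    if not worker_restriction:
        # Assumption: an empty restriction means "no restriction".
        return True
    if worker_restriction.endswith("/+"):
        # "/+" -> base "", "/foo/+" -> base "/foo"
        base = worker_restriction[:-2]
        # Match the base cluster itself or anything beneath it.
        return job_cluster == base or job_cluster.startswith(base + "/")
    # No wildcard: require an exact cluster match.
    return job_cluster == worker_restriction
```

Under these assumptions, restriction_allows("/+", "/foo/bar") is true, which is the case in your example.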

What are you seeing that goes against what you expect to be seeing?

Quote
Is this the same problem as what we reported in Sept 2013?

According to changelogs, the issue in question, of qb.joborder ignoring worker restrictions, was resolved with Qube 6.5-2.

Quote
My end goal is still to determine if a worker has the possibility to get future work given the currently queued requests.  As it is now, it looks like the results returned by qb.joborder() are, in some cases, not what I need.

Again, I'm sorry I don't fully understand the problem.  What do you expect qb.joborder to return and how is that different from what it returns now?  Is it giving you false positives?  False negatives?  More than you expect?  Less? 

gabe

Re: Query to get list of eligible jobs for a worker
« Reply #4 on: December 20, 2014, 03:57:21 AM »
Thank you for getting back so quickly.

For the fix that went into v6.5-2, do you know if it went into the python layer or something on the backend?  From looking at the file timestamps I have a feeling that our python API code is very old (Aug 2012) so even though qb.ping() returns a Qube version of 6.6-2, it may not have the change you noted.  What do you think?
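
For reference, this is roughly how I'm checking the timestamp. It is plain Python, nothing Qube-specific, and the helper name is my own; it just reports when the file backing an imported module was last modified.

```python
import datetime
import os


def module_file_timestamp(module):
    """Return the last-modified time of the file backing a module.

    Returns None when the module has no __file__ attribute
    (for example, built-in modules).
    """
    path = getattr(module, "__file__", None)
    if path is None:
        return None
    return datetime.datetime.fromtimestamp(os.path.getmtime(path))


# Intended usage (requires the Qube python API to be importable):
#   import qb
#   print(qb.__file__, module_file_timestamp(qb))
```

Printing the module's __file__ as well should show which copy of the API is being picked up on the import path.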

Back to why I'm asking... All I'm trying to do is get a programmatic way to determine if a worker is needed by the farm at a particular point in time.  Currently we are using the following test:

  if len(qb.joborder(name=worker)) > 0:
      # worker is eligible for work.. so it may be needed.

From my tests, I have at least one worker that returns potential jobs (since its restriction is /+), but has not received a task for the last 5 hours and counting.  To me, this indicates qb.joborder() is not taking into account some other constraint that is limiting the queue manager from dispatching work to this worker.  As a result, our software thinks the worker is needed, but the queue manager does not actually send work its way.  This is the contradiction I'm trying to solve.  Nothing more.  If you think this is a bug, I'm more than happy to create a ticket with the support desk.

If this is expected behaviour, what else should I be checking to determine if a particular worker will actually be dispatched work?  Is qb.joborder() a good way to determine this?  Are we using the API incorrectly?  Should I be checking the state of all the jobs returned?
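
To illustrate what I mean by checking the state of the returned jobs, here is a sketch only. It assumes each job returned by qb.joborder() behaves like a dict with a "status" key, and that "pending" and "running" are the status strings worth keeping; both are assumptions on my part, and the helper name is made up. The joborder callable is passed in so the filter can be tried without a supervisor.

```python
def dispatchable_jobs(worker, joborder, wanted_statuses=("pending", "running")):
    """Filter a worker's job order down to jobs in a wanted state.

    Hypothetical helper. `joborder` is a callable with the same
    keyword interface as qb.joborder(name=...). Assumes each job is
    dict-like and carries a "status" key (an assumption, not verified).
    """
    jobs = joborder(name=worker)
    return [job for job in jobs if job.get("status") in wanted_statuses]


# Intended usage against a live farm (assumes `import qb` works):
#   if dispatchable_jobs(worker, qb.joborder):
#       # worker may actually be needed
#       ...
```

Even if this filtering works, it would not explain which other constraint keeps the queue manager from dispatching to the worker, so I'd still like to know what the right check is.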

Thanks in advance Brian.

« Last Edit: December 22, 2014, 05:46:16 PM by gabe »