Author Topic: Issues with dependencies  (Read 5545 times)

chardavo

  • Full Member
  • ***
  • Posts: 11
Issues with dependencies
« on: November 25, 2009, 01:59:31 AM »
Hi!
I'm having trouble getting the 'dependency' job property to work correctly. Here's a simple example, where jobB should wait for jobA to complete before starting:

Code: [Select]
jobA = {'name': 'jobA', 'package': {'cmdline': 'echo Running JobA 1>&2'}, 'agenda': qb.genframes(1), 'prototype': 'cmdrange'}

listOfSubmittedJobs = qb.submit([jobA])
jobAJobID = listOfSubmittedJobs[0]['id']

jobB = {'name': 'jobB', 'package': {'cmdline': 'echo running Job B 1>&2'}, 'dependency': 'link-complete-job-%s' % (jobAJobID), 'agenda': qb.genframes(1), 'prototype': 'cmdrange'}

qb.submit([jobB])

jobA runs (while jobB is blocked), but jobB never gets unblocked when jobA completes. When selecting jobB in the UI, it properly reports: "dependsup: 53296" (which is the jobID for jobA).

I've tried simply using the job name (and not the job id) for the dependency, but same problem:

Code: [Select]
jobA = {'name': 'jobA', 'package': {'cmdline': 'echo Running JobA 1>&2'}, 'agenda': qb.genframes(1), 'prototype': 'cmdrange'}
jobB = {'name': 'jobB', 'package': {'cmdline': 'echo running Job B 1>&2'}, 'dependency': 'link-complete-job-jobA', 'agenda': qb.genframes(1), 'prototype': 'cmdrange'}
qb.submit([jobB])

Any ideas? We are running qube UI 5.4.6 and qube scripts report "Qube! - version: 5.4-2  build: bld-5-4-2009-07-01-0".

Thanks!


jburk

  • Administrator
  • *****
  • Posts: 479
Re: Issues with dependencies
« Reply #1 on: November 25, 2009, 05:56:23 PM »
I'd like to see what the callbacks look like for the dependent job (jobB).  Can you post the output of 'qbjobs -l -callbacks <jobB_id>'?

Also, you may want to use job labels to link jobs instead.  That way you can submit both jobs in one operation without having to wait to get the jobID of the independent job (jobA).

Code: [Select]

jobA = {..., 'label': 'firstJob'}
jobB = {..., 'dependency': 'link-complete-job-firstJob', 'label': 'secondJob', }
jobC = {..., 'dependency': 'link-complete-job-secondJob'}

submitted = qb.submit([jobA, jobB, jobC])

for job in submitted:
    print('submitted job %(id)s:%(name)s' % job)

When you submit multiple jobs in one operation, they become part of a 'process group'; they will all have the same 'pgrp' qube attribute.  Labels are scoped within a process group, which means that they only have to be unique within a process group.  You can reuse labels across process groups - it's a way for jobs within the same process group to refer to each other ~before~ submission, which makes it easy to set up dependencies without having to submit the independent job(s) first.
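
If you have a longer chain, the labels and dependency strings can be generated rather than typed by hand. This is only a rough sketch: `chain_jobs` and the `step` prefix are hypothetical names of mine, not part of the qb API, and it assumes a simple linear chain where each job waits on the one before it.

```python
def chain_jobs(jobs, prefix='step'):
    """Return copies of `jobs` where each job waits on the one before it.

    Labels are generated as <prefix>0, <prefix>1, ... and each dependency
    uses the 'link-complete-job-<label>' form shown above.
    (Hypothetical helper, not part of the qb module.)
    """
    chained = []
    previous_label = None
    for index, job in enumerate(jobs):
        linked = dict(job)  # shallow copy so the caller's dicts are untouched
        linked['label'] = '%s%d' % (prefix, index)
        if previous_label is not None:
            linked['dependency'] = 'link-complete-job-%s' % previous_label
        previous_label = linked['label']
        chained.append(linked)
    return chained


jobs = chain_jobs([{'name': 'jobA'}, {'name': 'jobB'}, {'name': 'jobC'}])
for job in jobs:
    print('%s %s %s' % (job['name'], job['label'], job.get('dependency', '(none)')))
```

The resulting list would then be passed to qb.submit() in a single call, so that all the jobs land in the same process group and the labels resolve.
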
« Last Edit: November 25, 2009, 06:10:52 PM by jburk »

chardavo

Re: Issues with dependencies
« Reply #2 on: November 25, 2009, 11:34:11 PM »
Here's the full output of the "qbjobs -l -callbacks <jobB_id>" command, run after jobA finished. jobB is still 'blocked', apparently not realizing that jobA (jobA_id=53634) has finished:

Code: [Select]
> qbjobs -l -callbacks 53635
total: 0/1 cpu(s)       0/1 work
%    id     pid  pgrp   label  status   user      type      name  cpus  priority  cluster       groups
----------------------------------------------------------------------------------------------
  0  53635  1    53635  Job B  blocked  chardavo  cmdrange  jobB  0/1   9999      homenewhosts

Job Details:
        Time Submit: Nov 26, 2009 5:01:03
        Time Start: none
        Time Complete: none
        Source Host: 172.16.21.92
        Deadline: none
        Domain: .
        Groups:
        Omit Groups:
        Hosts:
        Omit Hosts:
        Host Order:
        Timeout: none
        Kind:
        Flags: 8 (auto_mount)
        Dependency: link-complete-job-53634
        Requirements:

        Restrictions:

        Reservations:
                host.processors=1

Package:
        cmdline: echo running Job B 1>&2


Subjob(s) Status:
        0: blocked -  (0.0.0.0) 00:00:00:00:00:00

Callbacks:
==============================================================================================

Work Status:
        [1] 1: pending - 53635.*  (0.0.0.0) 00:00:00:00:00:00


chardavo

Re: Issues with dependencies
« Reply #3 on: November 25, 2009, 11:37:08 PM »
Note: as you can see from the 'dependency' field, this is when linking to the job ID, not the job name.

It's good to know that a 'label' only needs to be unique within a group of jobs submitted together.
We have actually been treating the job "name" as having to be unique (and using the name for callback events and dependency linking), and using the "label" as a UI-only user friendly name, which may even contain spaces (and in the UI have 'label' show up before 'name'). Have we been doing that backwards?

Thanks!

chardavo

Re: Issues with dependencies
« Reply #4 on: November 26, 2009, 12:02:41 AM »
Note: I've tried my simple test using the labels and submitting the jobs together, and still no-go. Here's the code being executed:

Code: [Select]
jobA = {'name': 'jobA', 'label': 'one', 'package': {'cmdline': 'echo Running JobA 1>&2'}, 'agenda': qb.genframes(1), 'prototype': 'cmdrange'}
jobB = {'name': 'jobB', 'label': 'two', 'package': {'cmdline': 'echo running Job B 1>&2'}, 'dependency': 'link-complete-job-one', 'agenda': qb.genframes(1), 'prototype': 'cmdrange'}
qb.submit([jobA, jobB])

And jobB still stays blocked, and here's the output from the corresponding qbjobs -l -callbacks on jobB:

Code: [Select]
> qbjobs -l -callbacks 53641
total: 0/1 cpu(s)       0/1 work
%    id     pid  pgrp   label  status   user      type      name  cpus  priority  cluster  groups
-----------------------------------------------------------------------------------------
  0  53641  1    53640  two    blocked  chardavo  cmdrange  jobB  0/1   9999      /

Job Details:
        Time Submit: Nov 26, 2009 5:29:40
        Time Start: none
        Time Complete: none
        Source Host: 172.16.21.92
        Deadline: none
        Domain: .
        Groups:
        Omit Groups:
        Hosts:
        Omit Hosts:
        Host Order:
        Timeout: none
        Kind:
        Flags: 8 (auto_mount)
        Dependency: link-complete-job-one
        Requirements:

        Restrictions:

        Reservations:
                host.processors=1

Package:
        cmdline: echo running Job B 1>&2


Subjob(s) Status:
        0: blocked -  (0.0.0.0) 00:00:00:00:00:00

Callbacks:
=========================================================================================

Work Status:
        [1] 1: pending - 53641.*  (0.0.0.0) 00:00:00:00:00:00

I'm curious, are you able to reproduce this behavior on your end, or is it working fine for you?

jburk

Re: Issues with dependencies
« Reply #5 on: November 26, 2009, 04:03:46 PM »
Yes, this works on my end.  I even cut&paste'd your code in case I was missing a typo when reading it, and it worked fine (except I had to change the argument to qb.genframes() to a string - qb.genframes('1') )

Your jobB has no callbacks.  The output from 'qbjobs -l -callbacks <jobID>' should look like this:
Code: [Select]
Subjob(s) Status:
0: complete - rhel51-64 (192.168.60.108) 00:0C:29:5C:AC:4C (0 s)

Callbacks:
=====================================================================================
id: 2557^1
triggers: complete-job-one
language: dependency
-------------------------------------------------------------------------------------
link-complete-job-one
=====================================================================================

Work Status:
[1] 1: complete - 2557.0 rhel51-64 (192.168.60.108) 00:0C:29:5C:AC:4C (0 s)
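
If you need to check this for a lot of jobs, the empty "Callbacks:" section can be detected from the text. A minimal sketch, assuming the output layout shown in this thread (a "Callbacks:" line, separator rows of '=' or '-', and a "Work Status:" line ending the section); `has_callbacks` is a hypothetical helper, not a qb function, and the sample strings are trimmed copies of the output above.

```python
def has_callbacks(qbjobs_output):
    """Return True if the 'Callbacks:' section contains any entries.

    Assumes the layout seen in this thread: a 'Callbacks:' line, then
    separator rows made of '=' or '-', ending at 'Work Status:'.
    """
    in_section = False
    for line in qbjobs_output.splitlines():
        stripped = line.strip()
        if stripped == 'Callbacks:':
            in_section = True
            continue
        if not in_section:
            continue
        if stripped.startswith('Work Status:'):
            break
        # Ignore blank lines and rows that consist only of '=' or '-'.
        if stripped and stripped.strip('=-'):
            return True
    return False


blocked_output = """Subjob(s) Status:
        0: blocked -  (0.0.0.0) 00:00:00:00:00:00

Callbacks:
==========

Work Status:
        [1] 1: pending - 53641.*  (0.0.0.0) 00:00:00:00:00:00
"""

working_output = """Callbacks:
==========
id: 2557^1
triggers: complete-job-one
language: dependency
----------
link-complete-job-one
==========

Work Status:
[1] 1: complete - 2557.0
"""

print(has_callbacks(blocked_output))
print(has_callbacks(working_output))
```

A dependent job whose check comes back empty right after submission is a sign that the dependency was never turned into a callback.
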

What version is your supervisor?  Could you post the output from 'qbping' for me, please?

And do your permissions in qube allow you to submit callbacks?  Could you post the output from 'qbusers --list'?  It's located in $QBDIR/sbin.


« Last Edit: November 26, 2009, 04:25:24 PM by jburk »

jburk

Re: Issues with dependencies
« Reply #6 on: November 26, 2009, 04:12:57 PM »
Quote
We have actually been treating the job "name" as having to be unique (and using the name for callback events and dependency linking), and using the "label" as a UI-only user friendly name, which may even contain spaces (and in the UI have 'label' show up before 'name'). Have we been doing that backwards?

Perhaps.  You probably want your job names to be more human-readable, and your labels more terse.  I would avoid spaces and hyphens in labels: they make the trigger hard to read, and may break the trigger parser in some future release (we normally test for things like that, but you never know).
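
If your labels are generated from user-facing names, one option is to clean them up before submission. This is just a sketch: `sanitize_label` is a hypothetical helper, and replacing the offending characters with an underscore is my assumption about what a safe label looks like, not a documented rule.

```python
import re


def sanitize_label(label):
    """Replace each run of whitespace and hyphens with a single underscore.

    (Hypothetical helper; the underscore is an arbitrary choice.)
    """
    return re.sub(r'[\s-]+', '_', label.strip())


print(sanitize_label('Job B'))
print(sanitize_label('first-job'))
```

Note that two different names can collapse to the same label this way, so uniqueness within the process group still has to be checked separately.
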

The labels are usually only shown in the UI when you have larger process groups; it makes it simpler to figure out what job is what within a pgrp.

Job names don't have to be unique, either.  Sometimes it's handy to reuse the same job name for the same render; that way, you can easily filter for different iterations of the same task.

It might be nice if we had processGroup names.  That would be something to think about for a future release.

chardavo

Re: Issues with dependencies
« Reply #7 on: November 30, 2009, 11:11:23 PM »
Thanks for the follow-up!

Here is the output from qbping:
Code: [Select]
supervisor - active - tag: 172.16.30.73 5.2-2 bld-5-2-2007-11-05-0 linux - host - 16/50 licenses.
Am I reading that correctly, and is this a really old build (meaning we may have upgraded our client tools but not the supervisors, or maybe never restarted the supervisor after the last upgrade)?
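
For what it's worth, the comparison can be scripted. This is a rough sketch that assumes the version always appears as major.minor-patch, as it does in the two strings quoted in this thread; `parse_version` is a hypothetical helper, not part of the Qube tools.

```python
import re

# Matches e.g. '5.2-2' but not pieces of an IP address such as '172.16.30.73'.
VERSION_RE = re.compile(r'(?<![\d.])(\d+)\.(\d+)-(\d+)')


def parse_version(text):
    """Extract the first major.minor-patch version as a tuple of ints."""
    match = VERSION_RE.search(text)
    if match is None:
        raise ValueError('no version found in %r' % text)
    return tuple(int(part) for part in match.groups())


# Strings copied from this thread.
supervisor = parse_version(
    'supervisor - active - tag: 172.16.30.73 5.2-2 '
    'bld-5-2-2007-11-05-0 linux - host - 16/50 licenses.')
client = parse_version('Qube! - version: 5.4-2  build: bld-5-4-2009-07-01-0')

if supervisor < client:
    print('supervisor %s is older than client %s' % (supervisor, client))
```
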

And here is the output from qbusers:
Code: [Select]
total 4
---l jc- krmpbuicseyqg-vft- [default]
asil jcg krmpbuicseyqgpvftn Administrator
asil jcg krmpbuicseyqgpvftn qube
asil jcg krmpbuicseyqgpvftn root
asil jcg krmpbuicseyqgpvftn system

From the docs, it looks like the 'default' users don't have permissions to submit 'global' callbacks: could that be it? Or are 'global' callbacks something else?

chardavo

Re: Issues with dependencies
« Reply #8 on: November 30, 2009, 11:33:22 PM »
ok, I can confirm the supervisor is running an old (5.2) version, which probably explains why the dependencies aren't working - we'll upgrade and report back.

Can you just confirm whether the output from 'qbusers' looks good or not with respect to the 'default' user?
Thanks!

jburk

Re: Issues with dependencies
« Reply #9 on: December 01, 2009, 12:52:03 AM »
qbusers looks good; users can submit callbacks.

Yes, you'll need to upgrade your supervisor if you want to use the 'dependency' callback language.

But you could do it the old-fashioned way and build the callbacks and define the dependencies yourself.  Much simpler than upgrading (for now).

Code: [Select]
import qb

jobA = {
    'prototype': 'cmdline',
    'package': {'cmdline': 'sleep 15'},
    'requirements': 'host.os=linux',
    'label': 'firstJob'
}

jobB = {
    'prototype': 'cmdline',
    'package': {'cmdline': 'sleep 15'},
    'requirements': 'host.os=linux',
    'label': 'secondJob',
    'status': 'blocked',
    'callbacks': [
        {
            'triggers': 'complete-job-firstJob',
            'language': 'qube',
            'code': 'unblock-self'
        }
    ]
}

jobC = {
    'prototype': 'cmdline',
    'package': {'cmdline': 'date'},
    'requirements': 'host.os=linux',
    'status': 'blocked',
    'callbacks': [
        {
            'triggers': 'complete-job-secondJob',
            'language': 'qube',
            'code': 'unblock-self'
        }
    ]
}

submitted = qb.submit([jobA, jobB, jobC])

for job in submitted:
    print('submitted job %(id)s:%(name)s' % job)
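
If you already have job dicts that use the 'dependency' property, the conversion to explicit callbacks could be automated along these lines. This is only a sketch: `expand_dependency` is a hypothetical helper, it handles a single 'link-...' string like the ones in this thread, and it assumes the trigger is the dependency string with the 'link-' prefix removed (which matches the 'triggers: complete-job-one' line in the qbjobs output earlier in the thread).

```python
def expand_dependency(job):
    """Return a copy of `job` with its 'dependency' replaced by an explicit
    'unblock-self' callback and a 'blocked' initial status.

    (Hypothetical helper; handles only a single 'link-...' dependency.)
    """
    expanded = dict(job)
    dependency = expanded.pop('dependency', None)
    if not dependency:
        return expanded  # nothing to convert
    prefix = 'link-'
    if not dependency.startswith(prefix):
        raise ValueError('unsupported dependency: %r' % dependency)
    callback = {
        'triggers': dependency[len(prefix):],
        'language': 'qube',
        'code': 'unblock-self',
    }
    expanded['status'] = 'blocked'
    # Keep any callbacks the job already had and append the new one.
    expanded['callbacks'] = list(job.get('callbacks', [])) + [callback]
    return expanded


original = {'label': 'secondJob', 'dependency': 'link-complete-job-firstJob'}
converted = expand_dependency(original)
print(converted['status'])
print(converted['callbacks'][0]['triggers'])
```

The converted dicts would then go to qb.submit() together with the job they depend on, exactly as in the example above.
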