Author Topic: XSI Jobtype  (Read 16100 times)

Nikos

  • Full Member
  • ***
  • Posts: 15
XSI Jobtype
« on: June 20, 2006, 06:58:22 PM »
Historically, we have been using command-line-only submitters for various applications.
In Softimage|XSI, does your XSI JobType offer any improved performance or reliability compared to executing the standard cmdrange type in conjunction with XSIBatch?

We are using our own savers that put files into the correct location on the network. They also allow you to submit files to the renderfarm.
An example command line would be:

qbsub -name QubeRenderTest -range 1-100 -partition 10 -priority 5000 -flags expand  "c:\Softimage\XSI_5.1\Application\bin\xsibatch.bat -r -scene P:\PP_Development\Scenes\Cube\Cube.scn -startframe QB_FRAME_START -endframe QB_FRAME_END -step QB_FRAME_STEP -pass Default_Pass -continue"

Would this command be as "reliable" as your JobType script?

anthony

  • Senior Software Engineer
  • Hero Member
  • *****
  • Posts: 183
Re: XSI Jobtype
« Reply #1 on: June 20, 2006, 09:31:48 PM »
Hey Nikos,

     Like most technical questions, this one has more than one answer.  For this situation, though, I'll explain the technical differences between the two approaches.  We support both because it doesn't make sense to arbitrarily say "do it this way," especially since different studios have different secondary objectives in mind when they render.

     For the most part, the method you describe in your post is reliable.  It's been used in most rendering systems for ages and is well understood.  For the benefit of others reading this thread, I'll outline things you probably already know.

     Chunking/Partitioning model:

        Benefits:
          - well understood
          - minor optimization, since the scene is loaded only once per chunk

        Problems:
          - a single lost frame causes the loss of the entire chunk
          - the shortest possible render time is bounded by how long the slowest machine takes to finish its chunk
          - not scalable once the job has been submitted to the queue

     The second model, which is what we use for the XSI job type, is the Request/Report model.  It gets around most of the issues involved with partitioning, but it requires much tighter integration with the application.  In fact, for XSI specifically we use a plugin to control the render.

     Request/Report model:

        Benefits:
          - a single lost frame is simply requeued
          - the shortest possible render time is bounded by how long the slowest machine takes to finish its current frame
          - scalable even after the job has been submitted to the queue
          - the scene is loaded once per host, not once per chunk
          - by nature it automatically balances the load across machines; faster machines render more frames

        Problems:
          - more complex and less widely understood


     The reliability of each model also depends on outside factors.  My suggestion is to try both and see which your artists prefer.
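
     To make the difference concrete, here's a rough back-of-the-envelope sketch in plain Python (this is not Qube's API, and the per-frame times are invented) of how the two models behave when one machine is twice as slow as the other:

# Hypothetical comparison of the two dispatch models; not Qube code.
import heapq

FRAMES = list(range(1, 101))            # frames 1-100
HOSTS = {"rb-01": 1.0, "rb-02": 2.0}    # minutes per frame (rb-02 is slower)

def chunked(frames, hosts, chunks=10):
    # Chunking/partitioning: chunks are dealt out up front, round-robin.
    # The job finishes only when the slowest host finishes its chunks.
    size = len(frames) // chunks
    parts = [frames[i:i + size] for i in range(0, len(frames), size)]
    busy = {h: 0.0 for h in hosts}
    for i, part in enumerate(parts):
        host = list(hosts)[i % len(hosts)]
        busy[host] += len(part) * hosts[host]
    return max(busy.values())

def request_report(frames, hosts):
    # Request/report: a host asks for one frame, renders it, reports back,
    # then asks for the next, so faster hosts naturally take more frames.
    clock = [(0.0, h) for h in hosts]   # (time the host becomes free, host)
    heapq.heapify(clock)
    for _ in frames:
        free_at, host = heapq.heappop(clock)
        heapq.heappush(clock, (free_at + hosts[host], host))
    return max(t for t, _ in clock)

print("chunked wall time:        %.0f minutes" % chunked(FRAMES, HOSTS))
print("request/report wall time: %.0f minutes" % request_report(FRAMES, HOSTS))

     With made-up numbers like these, the chunked job sits waiting on the slow machine's pre-assigned chunks (about 100 minutes), while request/report lets the fast machine quietly absorb more frames (roughly 67 minutes).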

     Anthony

     
                               

Nikos

  • Full Member
  • ***
  • Posts: 15
Re: XSI Jobtype
« Reply #2 on: June 21, 2006, 03:22:22 PM »
Thanks for clarifying the differences between cmdline and xsi jobtype.

Right, the commandline submitter seems to be working for us.
Now, moving on to the XSI Jobtype: when we submit a job, it just remains in its "pending" state in the queue.
None of our nodes seem to pick up the job.

What is the first thing we should be on the lookout for to get this working? None of the logs seem to contain
any relevant error messages...

anthony

  • Senior Software Engineer
  • Hero Member
  • *****
  • Posts: 183
Re: XSI Jobtype
« Reply #3 on: June 23, 2006, 01:46:24 AM »
That's actually a pretty common question: why is my job pending?  qube! answers it in its "Pending Reason" field.  You can get that info with either qubic or the command line tools.  Since the command line is easier for me to type, I'll use that as an example:

qbjobs -l <jobid>

If you're looking in the GUI (qubic), open the job's details and scroll to the bottom.

You can also add pending reason as a column.

However, if I were to guess, you need to install the XSI job type on all of your workers and give them a quick restart, or push the configuration out to the hosts again and they will update themselves.
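
If that's the case, the relevant piece of the worker config is something along these lines (just a sketch assuming the default install path on Windows; check the docs for the exact file and syntax on your version):

# hypothetical excerpt from a worker's qb.conf -- the path below is the
# default jobtype install location; worker_template_path is where the
# worker looks for installed jobtypes such as "xsi"
worker_template_path = C:\Program Files\pfx\jobtypes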

A.

Nikos

  • Full Member
  • ***
  • Posts: 15
Re: XSI Jobtype
« Reply #4 on: June 26, 2006, 06:30:39 PM »
Hi Anthony, not sure why I didn't see your reply until now.
Anyway, we have installed the XSI jobtype onto every machine, and
now the subjobs simply fail. Subjob status: "Failed".

Can I just ask what files are actually required for the XSI jobtype?
All our workstations and rendernodes are running off a shared workgroup
on the network named:

\\Mothership\System\XSI\WorkGroups\PassionWorkGroup\Softimage\XSI_5.1\

Meanwhile, your JobType shows up in the plug-in manager as:
"C:\Program Files\pfx\jobtypes\xsi"

Could it have something to do with our workgroup path setup?
This is just a thought, but I don't know why the nodes would fail.

PS. They have been rebooted, had their service restarted and had their configs sent out a number of times.



anthony

  • Senior Software Engineer
  • Hero Member
  • *****
  • Posts: 183
Re: XSI Jobtype
« Reply #5 on: June 26, 2006, 07:01:59 PM »
Failures can happen for a number of reasons.  The first thing you should always check is the job logs.  They will point you toward what has failed.

To obtain the logs, you can either browse to them in qubic or run: qbout <jobid>

Thanks,
         Anthony

Nikos

  • Full Member
  • ***
  • Posts: 15
Re: XSI Jobtype
« Reply #6 on: June 27, 2006, 10:51:32 AM »
If you don't mind, I'll post the log here from one of the boxes that fails.

The last line says:
ERROR: unable to reply - couldn't imprint connection

Thanks Anthony



[Jun 27, 2006 10:12:50] RB-01 : qube! worker starting.
qube! - Copyright (C) 1999-2005 Pipelinefx L.L.C. (info@pipelinefx.com)
perl - Copyright (C) 1987-2001, Larry Wall
python - Copyright (C) 2002-2003 Python Software Foundation
zlib - Copyright (C) 1995-2002 Jean-loup Gailly and Mark Adler
pcre - Copyright (C) 1997-2003 University of Cambridge
md5 - Copyright (C) 1999-2002 Aladdin Enterprises.  All rights reserved.
openssl - Copyright (C) 1995-1998 Eric Young (eay@cryptsoft.com)
[Jun 27, 2006 10:12:50] RB-01 : sync time to supervisor host.
[Jun 27, 2006 10:12:50] RB-01 : worker running...
[Jun 27, 2006 10:12:50] RB-01 : requesting remote config.
[Jun 27, 2006 10:12:50] RB-01 : loading config: local
[Jun 27, 2006 10:12:50] RB-01 : loading config: supervisor
[Jun 27, 2006 10:12:50] RB-01 : remote worker config successful.
redirecting output to: '\\Mothership\System\Qube\logs\rb-01.workerlog'
[Jun 27, 2006 10:12:51] RB-01 : booting worker - version: 4.0-6 build: bld-4-0-2006-04-18-0 host: RB-01.
worker_address:
worker_check_interval: 1800
worker_cluster: /renderboxes/
worker_cpus: 0
worker_domain: qube
worker_flags: dedicated,auto_mount,load_profile (21)
worker_groups:
worker_heartbeat_interval: 90
worker_host_domain:
worker_idle_threads: 4
worker_journal_location: \\Mothership\System\Qube\jnl\worker.jnl
worker_lock_module:
worker_log_timeout: 10
worker_logfile: \\Mothership\System\Qube\logs\rb-01.workerlog
worker_logmode: mounted
worker_logpath: \\Mothership\System\Qube\logs
worker_lookup: local,supervisor
worker_max_clients: 8
worker_max_threads: 8
worker_memory: 512
worker_pidfile: C:\Program Files\pfx\qube\logs\workerpid
worker_port: 50011
worker_post_interval: 43200
worker_process_timeout: 500
worker_properties:
worker_resources:
worker_restrictions:
worker_ssl_certfile: C:\WINNT\qbsslcert.pem
worker_ssl_keyfile: C:\WINNT\qbsslkey.pem
worker_stats:
worker_job_types:
worker_template_path: \\Mothership\System\Qube\types
proxy_account: render
proxy_execution_mode: proxy
proxy_location: C:\Program Files\pfx\qube\sbin\proxy.exe
proxy_nice_value: 0
proxy_password: 0a0d376d2a0b5694e5112676d9d754c6c35fc4c4294e9696e59d7c41e30e20af
[Jun 27, 2006 10:12:51] RB-01 : tracking: 0 jobs.
[Jun 27, 2006 10:12:51] RB-01 : importing locks: host.processor_all=0
[Jun 27, 2006 10:12:51] RB-01 : sending host status report to the supervisor.
[Jun 27, 2006 10:12:52] RB-01 : supervisor 200.200.20.112 host report - report successful.
INFO: opened address: 0.0.0.0 port: 50011 type: tcp.
INFO: mac address: 00:0B:DB:93:0F:6C
[Jun 27, 2006 10:13:52] RB-01 : INFO: new job qualifies: 2880.1
[Jun 27, 2006 10:13:52] RB-01 : received start order for new job: 2880.1
[Jun 27, 2006 10:13:52] RB-01 : proxy command: "C:\Program Files\pfx\qube\sbin\proxy.exe" -host 127.0.0.1 -port 50011 -jobid 2880 -subid 1 -api "C:\Program Files\pfx\qube\api"
[Jun 27, 2006 10:13:52] RB-01 : INFO: using render in place of nikosg.
[Jun 27, 2006 10:13:52] RB-01 : logonuser sid: S-1-5-5-0-2478840
[Jun 27, 2006 10:13:52] RB-01 : login successful account: 'render' domain: ''
[Jun 27, 2006 10:13:52] RB-01 : sid: S-1-5-5-0-2478840
[Jun 27, 2006 10:13:52] RB-01 : WARNING: drive already assigned: W: => \\dropship\ftp
[Jun 27, 2006 10:13:52] RB-01 : WARNING: drive already assigned: X: => \\dropship\personal
[Jun 27, 2006 10:13:52] RB-01 : WARNING: bad network device: \\Ws-64-xp-nikos\DeadlineRepository
[Jun 27, 2006 10:13:52] RB-01 : WARNING: drive already assigned: Z: => \\dropship\jobs
[Jun 27, 2006 10:13:52] RB-01 : WARNING: drive already assigned: M: => \\fatone\big
[Jun 27, 2006 10:13:52] RB-01 : WARNING: drive already assigned: N: => \\fatone\D Drive
[Jun 27, 2006 10:13:52] RB-01 : WARNING: drive already assigned: P: => \\mothership\jesus_mungus_hostal
[Jun 27, 2006 10:13:52] RB-01 : WARNING: drive already assigned: S: => \\dropship\ToonZ_frames
[Jun 27, 2006 10:13:52] RB-01 : WARNING: drive already assigned: T: => \\mothership\system
[Jun 27, 2006 10:13:52] RB-01 : WARNING: drive already assigned: V: => \\dropship\Daddy Joseph
[Jun 27, 2006 10:13:52] RB-01 : running job status report sent to supervisor - loading profile: 2880.1
[Jun 27, 2006 10:13:52] RB-01 : found roaming profile path:
[Jun 27, 2006 10:13:52] RB-01 : Setting Profile Path: C:\Documents and Settings\render
[Jun 27, 2006 10:13:52] RB-01 : Setting Appication Path: C:\Documents and Settings\render\Application Data
[Jun 27, 2006 10:13:52] RB-01 : Setting Logon Server: \\DMC1
[Jun 27, 2006 10:13:52] RB-01 : pid: 484  tid: 1312
[Jun 27, 2006 10:13:53] RB-01 : received request for job details: 2880.1
[Jun 27, 2006 10:13:53] RB-01 : received status report from proxy: 2880.1 - running seq: 24
[Jun 27, 2006 10:13:53] RB-01 : gathering stats on job: 2880.1
[Jun 27, 2006 10:13:53] RB-01 : sending report to supervisor for job: 2880.1 - running seq: 29
[Jun 27, 2006 10:13:53] RB-01 : supervisor 200.200.20.112 confirmed report 2880.1
[Jun 27, 2006 10:13:53] RB-01 : sent logs 2880.1 0 - bytes.
[Jun 27, 2006 10:13:54] RB-01 : received status report from proxy: 2880.1 - failed seq: 24
[Jun 27, 2006 10:13:54] RB-01 : returning work for job: 2880 total items: 0
[Jun 27, 2006 10:13:54] RB-01 : gathering stats on job: 2880.1
[Jun 27, 2006 10:13:54] RB-01 : sending report to supervisor for job: 2880.1 - failed seq: 36
[Jun 27, 2006 10:13:54] RB-01 : sent logs 2880.1 0 - bytes.
[Jun 27, 2006 10:13:54] RB-01 : releasing resources for: 2880.1 res: 'host.processors=1'
[Jun 27, 2006 10:13:54] RB-01 : running windows cleanup for: 2880.1 - S-1-5-5-0-2478840
[Jun 27, 2006 10:13:54] RB-01 : total sids: 11
[Jun 27, 2006 10:13:55] RB-01 : INFO: using render in place of nikosg.
[Jun 27, 2006 10:13:55] RB-01 : using cached login token: 'render' domain: ''
[Jun 27, 2006 10:13:55] RB-01 : render's visible drives: A:\ C:\ D:\ M:\ N:\ P:\ S:\ T:\ V:\ W:\ X:\ Z:\
[Jun 27, 2006 10:13:55] RB-01 : WARNING: unable to umount volume: W: - not connected.
[Jun 27, 2006 10:13:55] RB-01 : WARNING: unable to umount volume: X: - not connected.
[Jun 27, 2006 10:13:55] RB-01 : WARNING: unable to umount volume: Y: - not connected.
[Jun 27, 2006 10:13:55] RB-01 : WARNING: unable to umount volume: Z: - not connected.
[Jun 27, 2006 10:13:55] RB-01 : WARNING: unable to umount volume: M: - not connected.
[Jun 27, 2006 10:13:55] RB-01 : WARNING: unable to umount volume: N: - not connected.
[Jun 27, 2006 10:13:55] RB-01 : WARNING: unable to umount volume: P: - not connected.
[Jun 27, 2006 10:13:55] RB-01 : WARNING: unable to umount volume: S: - not connected.
[Jun 27, 2006 10:13:55] RB-01 : WARNING: unable to umount volume: T: - not connected.
[Jun 27, 2006 10:13:55] RB-01 : WARNING: unable to umount volume: V: - not connected.
[Jun 27, 2006 10:13:55] RB-01 : releasing session token: S-1-5-5-0-2478840
[Jun 27, 2006 10:13:55] RB-01 : removed job 2880.1
[Jun 27, 2006 10:13:55] RB-01 : sending host status report to the supervisor.
[Jun 27, 2006 10:13:55] RB-01 : supervisor 200.200.20.112 host report - report successful.
ERROR: unable to reply - couldn't imprint connection