PipelineFX Qube 6.0-3 Core/Supervisor/Worker maintenance release is available

jburk

A new 6.0.3 maintenance release is now available for the Qube 6.0 Core/Supervisor/Worker. This is a recommended release for all customers running Qube v6.0.

Below is a list of the fixes and enhancements.

================================================

@CHANGE: Removing Mac OS X 10.4 support.

@FIX: Handle the case where database_host is defined in the supervisor as 127.0.0.1

@FIX: Python API issue where null, empty or incorrect input to commands such
   as qb.block() would apply the function to ALL jobs/work items
   
   Passing a wrong parameter or a null/empty to these routines will now raise
   an exception.
   
   BUGZID: 63565
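
To illustrate the stricter behaviour described above, here is a minimal sketch of a guard that rejects null, empty or wrongly typed job IDs instead of treating them as "all jobs". The function name `normalize_job_ids` and the accepted input types are assumptions for illustration only; this is not the actual Qube Python API code.

```python
def normalize_job_ids(job_ids):
    """Hypothetical guard: return a list of int job IDs, or raise.

    Not part of the qb module. It only shows the idea of refusing
    null/empty/incorrect input rather than acting on every job.
    """
    if job_ids is None:
        raise ValueError("job IDs must not be None")
    # Accept a single int (but not a bool, which is an int subclass).
    if isinstance(job_ids, int) and not isinstance(job_ids, bool):
        job_ids = [job_ids]
    if not isinstance(job_ids, (list, tuple)):
        raise TypeError("expected an int or a list/tuple of ints")
    if len(job_ids) == 0:
        raise ValueError("job ID list must not be empty")
    for job_id in job_ids:
        if isinstance(job_id, bool) or not isinstance(job_id, int):
            raise TypeError("invalid job ID: %r" % (job_id,))
    return list(job_ids)
```

A caller would run its input through such a guard first and only then act on the returned list, so a typo can no longer fan out to the whole farm.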

@FIX: Windows-specific bug where the background "kin" threads of
   PreForkDaemon-based daemons (supe, worker) were not always running their
   intended routines.

@TWEAK: added more useful messages to print when sending the host report to
   the supe, and when checking a job's resource requirements and reservations
   against the worker's current resources.

@FIX: Worker memory/swap resource tracking (Linux)
   
   * memory/swap resource tracking was broken, as the code was adding the
     reserved amount on top of the actually used values.  For example, if a
     subjob has 'reservations="host.memory=1000"' and is actually running and
     using 900 MB, the code was incorrectly subtracting 1900 MB from the
     available host.memory for the worker.
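
The arithmetic in the example above can be sketched as follows. The "before" function mirrors the bug as described. The "after" function is only an assumption about the corrected rule (charge the larger of the reservation and the actual usage), since the note does not spell out the new formula. The 4000 MB host total is likewise an illustrative figure.

```python
def available_memory_before_fix(total_mb, reserved_mb, used_mb):
    # Bug as described: the reservation is stacked on top of actual usage.
    return total_mb - (reserved_mb + used_mb)


def available_memory_after_fix(total_mb, reserved_mb, used_mb):
    # Assumed corrected rule: count the subjob once, at whichever figure
    # is larger (its reservation or what it really uses).
    return total_mb - max(reserved_mb, used_mb)


total = 4000      # illustrative host total, in MB
reserved = 1000   # from reservations="host.memory=1000"
used = 900        # what the subjob actually uses

print(available_memory_before_fix(total, reserved, used))  # 2100 (1900 MB subtracted)
print(available_memory_after_fix(total, reserved, used))   # 3000 (1000 MB subtracted)
```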

@FIX: Python API routine qb.rangechunk() crash with Bus error on certain
   input.
   
   The Python API qb.genchunk() routine was reported to crash on input that
   included whitespace.  It turns out that the C++ _qb_rangechunk() routine
   was crashing when it had an empty input sequence.
   
   Also fixed the regex in the Python API routine qb.rangechunk() to be more
   permissive about whitespace in sequence strings (e.g., " -1 - 10 x 1" is
   equivalent to "-1-10x1").
   
   BUGZID: 63559
   ZD: 3401
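
As a rough illustration of the two points above (tolerating whitespace and rejecting empty input instead of crashing), the sketch below parses a single "start-endxstep" term. The function `parse_range` and its regex are assumptions written for this example; they are not the regex used inside the Qube API and cover only one simple term, not full sequence strings.

```python
import re

# One term: START, optionally "-END", optionally "xSTEP", with optional
# whitespace around every token. START and END may be negative.
_RANGE_RE = re.compile(
    r"^\s*(-?\d+)\s*(?:-\s*(-?\d+)\s*(?:x\s*(\d+)\s*)?)?$"
)


def parse_range(text):
    """Hypothetical helper: return (start, end, step) for one range term."""
    if text is None or not text.strip():
        # Fail loudly on empty input rather than passing it further down.
        raise ValueError("empty range string")
    match = _RANGE_RE.match(text)
    if match is None:
        raise ValueError("cannot parse range: %r" % (text,))
    start = int(match.group(1))
    end = int(match.group(2)) if match.group(2) is not None else start
    step = int(match.group(3)) if match.group(3) is not None else 1
    if step < 1:
        raise ValueError("step must be at least 1")
    return start, end, step


print(parse_range(" -1 - 10 x 1") == parse_range("-1-10x1"))  # True
```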

@FIX: OSX: host swap "usage/total" collected and displayed accurately
   
@CHANGE: OSX: host memory usage excludes "inactive" memory

@FIX: fixed accuracy of the "host.memory" reports (used/total) of workers on
   Linux.

   BUGZID: ZD: 3308

@CHANGE: removed the not-too-useful tmp_used worker host property

@FIX: fixed accuracy of tmp_used.

@FIX: changed worker code that collects the /tmp usage to use statfs(),
   instead of crawling the dir (recursively), for efficiency.
   
   BUGZID: 63475
   ZD: 3175 2586
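
For readers who want to see why this is cheaper, the sketch below asks the filesystem for its usage in one call rather than walking the directory tree. It uses Python's `os.statvfs()` (Unix only), which is a close relative of the `statfs()` call named above; the helper name `filesystem_usage` is made up for this example and is not worker code.

```python
import os
import tempfile


def filesystem_usage(path):
    """Return (used_bytes, total_bytes) for the filesystem holding path.

    A single system call, so the cost does not depend on how many files
    live under the directory.
    """
    st = os.statvfs(path)
    total = st.f_blocks * st.f_frsize
    free = st.f_bfree * st.f_frsize
    return total - free, total


used, total = filesystem_usage(tempfile.gettempdir())
print("used %d of %d bytes" % (used, total))
```

Note the trade-off: this reports usage of the whole filesystem that contains the directory, which equals the directory's own size only when the directory has a filesystem to itself.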

@CHANGE: Modified QbWorker::remoteConfig() routine to retry up to 10
   times with random intervals, then give up.
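
The retry policy above can be sketched generically as below. This is not the QbWorker::remoteConfig() code: the helper name, the 5-second upper bound on the random delay, and the choice to raise RuntimeError on giving up are all assumptions for illustration. Only the "up to 10 attempts, random intervals, then give up" shape comes from the note.

```python
import random
import time


def retry_with_random_intervals(operation, max_attempts=10, max_delay=5.0,
                                sleep=time.sleep, rng=random.uniform):
    """Call operation() until it succeeds or max_attempts is reached."""
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except Exception as exc:
            last_error = exc
            if attempt < max_attempts:
                # Random spacing keeps many workers from retrying in step.
                sleep(rng(0.0, max_delay))
    raise RuntimeError(
        "gave up after %d attempts" % max_attempts
    ) from last_error
```

The `sleep` and `rng` parameters exist only so the delay can be stubbed out in tests.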
   
@TWEAK: Added more useful error message to print on "tcp compressed
   header" writing errors.

@FIX: fixed perl .xs code to accommodate interface change in perl 5.11
   and above.

@FIX: disabled code (temporarily) that crawls /tmp to get its size, as
   it was causing worker threads to choke on systems with large
   amounts of data in /tmp.
   
   Change affects Linux code only; other platforms (win, osx) don't have any
   similar code that collects /tmp (or similar) size info.
   
   ZD: 3175

@FIX: fixed issue where worker_max_threads specified in qbwrk.conf
   wasn't being effective.
   
   ZD: 3175

@FIX: reduced the number of retries when a host is found "busy" (retry
   busy), in effect reducing the blocking time of the thread to
   7 seconds max, and giving up after that.
   
@FIX: added code to do a final check before dispatching a subjob to a
   worker, to see if there's already another subjob trying to start
   on it.  (Check is done in the duty table for a "ghost" entry for
   the host)
   
   ZD: 3175

@FIX: Fixed default value of worker_max_clients to actually be the
   intended value, 256
   (instead of picking up the default value for worker_max_threads,
   which is 8).
   
@FIX: reverted code that attempts to automatically back off and retry
   in QbFarm::reserve().  It was causing many supe threads to stall
   for a long time, eventually maxing out the max_threads and
   resulting in "connection overflow".
   
   ZD: 3164

@FIX: removed debugging log output of "IN: QbQueue::clearGhostDutys( "

@FIX: Adding code to show why getpwuid() call failed

@CHANGE: optimization. add code to prevent supe from dispatching
   subjobs from the same job to a worker until the worker says "no".
   
   Works for most simple cases where "host.processors" is an integer value.
   
   This should prevent many of the brief "oversubscription" cases seen
   on workers (where the GUI would show workers running more subjobs than
   they have jobslots for) when jobs are first being dispatched.
   
   Changes made to the startJob(), startHost(), and startQualified()
   routines.
   
   Note that even with these changes, timing issues can/will still cause
   dispatch-a-subjob-then-get-rejected-by-worker scenarios (which is fine).
   
   ZD: 2646
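
One way to picture the "simple case" mentioned above: when the host.processors reservation is a plain integer, the number of subjobs that fit on a worker can be computed up front instead of dispatching until the worker refuses. The sketch below is a guess at that idea, not the supervisor's startJob()/startHost()/startQualified() logic; the function name and the "return None when unsure" convention are assumptions.

```python
def subjobs_that_fit(free_slots, reservations):
    """Hypothetical helper: how many subjobs fit in free_slots.

    reservations is a comma-separated string such as
    "host.memory=1000,host.processors=2". Returns an int for the simple
    integer case, or None when the count cannot be determined here and
    the worker should be left to accept or reject.
    """
    for item in reservations.split(","):
        key, sep, value = item.partition("=")
        if sep and key.strip() == "host.processors":
            value = value.strip()
            if not value.isdigit() or int(value) == 0:
                return None  # not a plain positive integer
            return free_slots // int(value)
    return None  # no host.processors reservation found


print(subjobs_that_fit(8, "host.processors=2"))  # 4
```

Because worker state can change between the calculation and the dispatch, a count like this can only reduce rejections, not eliminate them, which matches the caveat about timing issues above.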

@FIX: reverted the supervisor python engine changes introduced
   in 6.1, which were causing initializing threads to stall/crash.
   BUGZID: 63502
   ZD: 2894 3047

@FIX: don't create jobstatus_sk column in job_fact; it belongs in table
   version 6.
   
   move column creation commands into upgrade_v6 script

@CHANGE: Add section to Installation docs detailing
   Win7/Vista/Server2008 considerations - re: disabling UAC and
   Interactive Services Detection service

@NEW: datawarehouse: NEW FEATURE - add 'jobstatus' column to job_fact
   table to store terminal state of job in datawarehouse - customer
   request.
   
   * also create pfx_stats.tableversion placeholder for consistency,
      we already create a placeholder pfx_stats db if necessary.  Set
      the tableversion for the pfx_stats DB equal to 0.

@TWEAK: datawarehouse: TWEAK - allow setting the DATAWH_DIR
   environment variable to specify the location of the datawarehouse sql
   scripts during installation

@COSMETIC: datawarehouse: COSMETIC - update feedback printed during
   initial job fact table creation

@FIX: updated supervisor_flags and supervisor_log_flags to show the
   correct default values in the default qb.conf
« Last Edit: April 05, 2011, 12:26:55 PM by jburk »