Author Topic: qube supervisor on os x 10.5.8 completely broken  (Read 9577 times)

westernx

  • Hero Member
  • *****
  • Posts: 55
qube supervisor on os x 10.5.8 completely broken
« on: July 22, 2010, 09:48:29 PM »
Hello,

My primary issue is that the supervisor does not see or accept newly submitted jobs.

I had to restore my Xserve from an older Time Machine backup because of a serious crash.  First I backed up the job list in /var/spool/qube/job to a server, then I restored the HD to the older version.  I copied the job folder back to /var/spool/qube/, and I rsynced all the mysql folders from yesterday (before the crash) onto the restored system.  So I rsynced ;

/usr/local/mysql-standard-4.1.22-apple-darwin8.5.1-i386
/usr/mysql
/usr/share/mysql
/private/var/mysql

. . . to the corresponding locations in the new restored xserve HD

I start the mysqld with ;

/Library/StartupItems/MySQLCOM/MySQLCOM start

I start the qube supervisor with ;

/Library/StartupItems/supervisor/supervisor start

and I refresh all the workers ;

/Applications/pfx/qube/sbin/qbadmin worker --refresh

. . . and I restarted all the workers on our servers

I did not get any errors starting the daemons ;

But the qube supervisor is not working correctly. I logged into mysql and listed the databases ;

mysql -u root -p

mysql> show databases ;
+----------+
| Database |
+----------+
| 126qube  |
| 127qube  |
| 128qube  |
| 129qube  |
| 130qube  |
| 131qube  |
| 132qube  |
| 133qube  |
| 134qube  |
| 135qube  |
| 136qube  |
| 137qube  |
| 138qube  |
| 139qube  |
| 140qube  |
| 141qube  |
| 142qube  |
| 143qube  |
| 144qube  |
| 145qube  |
| 146qube  |
| 147qube  |
| 148qube  |
| 149qube  |
| 150qube  |
| 151qube  |
| 152qube  |
| 153qube  |
| 154qube  |
| 155qube  |
| 156qube  |
| 157qube  |
| 158qube  |
| 159qube  |
| 160qube  |
| 161qube  |
| 162qube  |
| 163qube  |
| 164qube  |
| 165qube  |
| 166qube  |
| 167qube  |
| 168qube  |
| 169qube  |
| 170qube  |
| mysql    |
| qube     |
| test     |
+----------+
48 rows in set (0.00 sec)

so I tried to repair all of them just in case ;

mysqlcheck -p -r 170qube
mysqlcheck -p -o 170qube

. . . I did this for every database entry, but the qube supervisor still will not show new jobs.  In fact, the GUI shows three jobs from over a month ago, which cannot be killed.
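
The per-database repair described above could be scripted instead of typed once per database. A minimal sketch, assuming the names are taken from the `show databases` listing; the function name `qube_check_cmds` is made up for illustration, and it only prints the commands so they can be reviewed first:

```shell
# Print (not run) a repair and an optimize command for every database
# name that ends in "qube"; other names (mysql, test) are skipped.
qube_check_cmds() {
  for db in "$@"; do
    case "$db" in
      *qube)
        printf 'mysqlcheck -p -r %s\n' "$db"   # -r: repair
        printf 'mysqlcheck -p -o %s\n' "$db"   # -o: optimize
        ;;
    esac
  done
}

qube_check_cmds 169qube 170qube mysql qube test
```

Piping the output to `sh` would run the commands; note that `mysqlcheck -p` asks for the password on every call.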

I tried reinstalling the qube supervisor too, but this did not fix the issue.

What, if anything, can I do to get the supervisor working correctly again?

Please help,

Thanks,

ryjguy7



jburk

  • Administrator
  • *****
  • Posts: 493
Re: qube supervisor on os x 10.5.8 completely broken
« Reply #1 on: July 22, 2010, 10:16:23 PM »
what does
Code: [Select]
mysql> select count(id) from qube.job;
return?

anything about database errors in your supe logs?

jburk

Re: qube supervisor on os x 10.5.8 completely broken
« Reply #2 on: July 22, 2010, 10:18:30 PM »
you can always simply run:

Code: [Select]
$QBDIR/utils/upgrade_supervisor --reset
to re-initialize the tables.  This will clear out the qube tables; I can't remember whether it also drops the nnqube databases.

westernx

Re: qube supervisor on os x 10.5.8 completely broken
« Reply #3 on: July 22, 2010, 10:22:58 PM »
mysql> select count(id) from qube.job ;
+-----------+
| count(id) |
+-----------+
|      1701 |
+-----------+
1 row in set (0.00 sec)

. . . . there should be up to job 21773

. . . in the supe log a bunch of PreForkDaemon messages

PreForkDaemon::eventloop(): Forking child in while(ready)
 procs current[14] max[128]
>>>> sleeping 1 milliseconds
PreForkDaemon::eventloop(): Forking child in while(ready)
 procs current[15] max[128]
>>>> sleeping 1 milliseconds
PreForkDaemon::eventloop(): Forking child in while(ready)
 procs current[16] max[128]
>>>> sleeping 1 milliseconds
PreForkDaemon::eventloop(): Forking child in while(ready)

. . . etc

R

westernx

Re: qube supervisor on os x 10.5.8 completely broken
« Reply #4 on: July 22, 2010, 10:25:21 PM »
. . . cool, I ran the $QBDIR/utils/upgrade_supervisor --reset command

should i restart the supe and mysqld too ?

R

westernx

Re: qube supervisor on os x 10.5.8 completely broken
« Reply #5 on: July 22, 2010, 10:29:10 PM »
. . . I guess there is no way to reload old job data from /var/spool/qube/job, is there?

R

westernx

Re: qube supervisor on os x 10.5.8 completely broken
« Reply #6 on: July 22, 2010, 10:58:17 PM »
. . . O.k., it is working again.  Two test frames rendered; one is complete, but it still shows as running at 100% complete.  Is there a way to speed up the refresh from the GUI to the supervisor?

. . . oh ya thanks for your help,

R

jburk

Re: qube supervisor on os x 10.5.8 completely broken
« Reply #7 on: July 23, 2010, 12:59:19 AM »
Go into the GUI prefs and enable the auto-refresh.

I suggest only enabling the auto-refresh for 'refresh selected' though, to cut down on the amount of data passed between the supe and client on a regular basis.

And no, it's not possible to repopulate the mysql tables from the /var/spool/qube contents.

It should have worked when you copied the old tables in /usr/local/mysql-standard-4.1.22-apple-darwin8.5.1-i386 (as long as the mysql server was not running), but you should have avoided copying the contents of:

/usr/mysql
/usr/share/mysql
/private/var/mysql

In the future, only copy the qube tables in mysql datadir (usually /usr/local/mysql-standard-4.1.22-apple-darwin8.5.1-i386/data/*qube/), and ensure that the mysql server is not running when you do this.
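
As a rough sketch of that advice, assuming the saved copy of the datadir lives in some backup directory (the path `/Volumes/backup/mysql-data` and the function name are hypothetical) and that the MySQLCOM startup script accepts `stop` the same way it accepts `start`. The function only prints the steps so they can be checked before anything is run:

```shell
# Datadir path as quoted in this thread.
DATADIR=/usr/local/mysql-standard-4.1.22-apple-darwin8.5.1-i386/data

# Print (not run) the restore steps for the backup directory given as $1.
print_qube_restore_plan() {
  backup=$1
  # Assumption: 'stop' is accepted by the same script that takes 'start'.
  echo "/Library/StartupItems/MySQLCOM/MySQLCOM stop"
  # Copy only the qube and NNNqube database directories, nothing else.
  echo "rsync -a $backup/*qube $DATADIR/"
  echo "/Library/StartupItems/MySQLCOM/MySQLCOM start"
}

print_qube_restore_plan /Volumes/backup/mysql-data
```

The `*qube` pattern matches both the `qube` directory and the numbered `NNNqube` ones, while leaving the `mysql` and `test` databases alone. It is probably sensible to have the supervisor stopped as well while the files are copied.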

But if this happens again, I'd recommend just starting over with the qube db's unless you absolutely need to resurrect the jobs for some reason.

westernx

Re: qube supervisor on os x 10.5.8 completely broken
« Reply #8 on: July 23, 2010, 08:24:16 PM »
. . . thanks a lot for your help

I still have a job from 3 days ago showing in the GUI but not actually running.  If I try to remove it using the qube GUI, I get the following ;

MYSQL>>> SELECT * FROM 170qube.21772work
MySQL Error: (1146, "Table '170qube.21772work' doesn't exist")
MYSQL>>> SELECT * FROM 170qube.21772subjob
MySQL Error: (1146, "Table '170qube.21772subjob' doesn't exist")
MYSQL>>> SELECT * FROM 170qube.21772callback
MySQL Error: (1146, "Table '170qube.21772callback' doesn't exist")

Is there a way to completely remove just this job from the GUI so it doesn't confuse others trying to render?

Thanks,

R

jburk

Re: qube supervisor on os x 10.5.8 completely broken
« Reply #9 on: July 25, 2010, 03:25:23 AM »
Delete any references to that job in the qube tables, and check for any tables left over in the 170qube database

Code: [Select]
mysql> delete from qube.job where id=21772;
mysql> delete from qube.duty where jobid=21772;
mysql> delete from qube.assignment where jobid=21772;

And check for any left over tables:

Code: [Select]
mysql> select table_schema, table_name from information_schema.tables where table_name like '21772%';
BTW: the '170' in the 170qube database name comes from

Code: [Select]
mysql> SELECT 21772 >> 7;
+------------+
| 21772 >> 7 |
+------------+
|        170 |
+------------+

(a right shift by 7 bits is an integer division by 128, so job ids 21760 through 21887 all land in 170qube)
That should about cover it
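
The same arithmetic can be done in the shell when hunting for a job's database. A minimal sketch; the function name `qube_db_for_job` is made up for illustration:

```shell
# Name of the per-job database for a Qube job id: the id shifted right by
# 7 bits (integer division by 128), followed by "qube".
qube_db_for_job() {
  echo "$(( $1 >> 7 ))qube"
}

qube_db_for_job 21772   # prints 170qube
qube_db_for_job 16128   # prints 126qube
```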

westernx

Re: qube supervisor on os x 10.5.8 completely broken
« Reply #10 on: July 28, 2010, 05:45:17 PM »
Code: [Select]
mysql> select table_schema, table_name from information_schema.tables where table_name like "21820%" ;
ERROR 1146 (42S02): Table 'information_schema.tables' doesn't exist
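
For what it's worth, `information_schema` only arrived with MySQL 5.0, so that query cannot work on the 4.1.22 server used in this thread. `SHOW TABLES ... LIKE` is a rough equivalent that 4.1 does support. A minimal sketch, assuming the per-job tables are named after the job id (as in the `170qube.21772work` errors quoted earlier); the function name is hypothetical and it only prints the statement:

```shell
# Print a MySQL 4.1-compatible statement that lists leftover per-job tables.
# Assumption: job tables are named <jobid><suffix> (e.g. 21772work) and live
# in the database named <jobid >> 7>qube.
qube_leftover_tables_sql() {
  jobid=$1
  db="$(( jobid >> 7 ))qube"
  printf 'SHOW TABLES FROM `%s` LIKE '\''%s%%'\'';\n' "$db" "$jobid"
}

qube_leftover_tables_sql 21820
```

The printed statement can be pasted at the `mysql>` prompt, or piped into `mysql -u root -p`.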

. . . . I removed the job db entry, but in the GUI the subjobs still show as running on servers in the Host/Worker Layout tab

R
« Last Edit: July 28, 2010, 06:48:56 PM by ryjguy7 »

westernx

Re: qube supervisor on os x 10.5.8 completely broken
« Reply #11 on: July 28, 2010, 08:29:40 PM »
o.k so I manually deleted each subjob id from the qube database like you recommended for the job id;
Code: [Select]
mysql> delete from qube.job where id=21820.19 ;
Query OK, 0 rows affected (0.00 sec)

mysql> delete from qube.assignment where jobid=21820.19 ;
Query OK, 0 rows affected (0.00 sec)

mysql> delete from qube.duty where id=21820.19 ;
Query OK, 1 row affected (0.00 sec)

. . . etc . . . etc

This worked for cleaning the unwanted subjobs showing in the gui that were not actually running

R
« Last Edit: July 28, 2010, 11:20:19 PM by ryjguy7 »

jburk

Re: qube supervisor on os x 10.5.8 completely broken
« Reply #12 on: July 29, 2010, 05:57:20 PM »
When you delete from the duty or assignment tables, specify "WHERE jobid=" and supply the job id without the ".nn" subjob suffix, i.e. 21820 instead of 21820.19.  This clears out all subjobs for the job; otherwise you'll have to run the same command for each and every subjob still appearing.  In those 2 tables, "id" refers to the subjob, and "jobid" is an easy way to get all the subjobs for a single job in one operation.
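
To make that concrete, here is a minimal sketch that turns either form of id into the three statements used earlier in this thread. The function name `qube_job_cleanup_sql` is made up, and it only prints the SQL rather than running it:

```shell
# Print the cleanup statements for one job. A subjob id such as 21820.19 is
# reduced to its job id (21820) so every subjob is removed in one pass.
qube_job_cleanup_sql() {
  jobid=${1%%.*}   # strip a trailing ".nn" subjob suffix, if any
  case "$jobid" in
    ''|*[!0-9]*) echo "not a numeric job id: $1" >&2; return 1 ;;
  esac
  echo "DELETE FROM qube.job WHERE id=$jobid;"
  echo "DELETE FROM qube.duty WHERE jobid=$jobid;"
  echo "DELETE FROM qube.assignment WHERE jobid=$jobid;"
}

qube_job_cleanup_sql 21820.19
```

Reviewing the printed statements before pasting them at the `mysql>` prompt is a cheap guard against deleting the wrong job.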

It's not the tidiest solution in the world, but it is effective. 

It's like using a shotgun to kill rattlesnakes; just don't shoot your foot off and everything's good.