Topic: Workers going down

Nikos

  • Full Member
  • Posts: 15
Workers going down
« on: June 21, 2006, 12:07:56 PM »
I got back to the office this morning, looked at the Host List from within qubic, and could see that a couple of our test nodes seemed to have gone down. I tried to restart the qubeworker service but got the following error message:

---------------------------------------------------
Could not start the qubeworker on Local Computer
Error 1: Incorrect function
---------------------------------------------------

Looking further into the worker log on "RB-01", I found the following:

Exec Database Error: no such table: assignment(1)
SELECT job_id, subjob_id FROM assignment
Exec Database Error: no such table: variables(1)
SELECT name, value FROM variables
Exec Database Error: no such table: resources(1)
SELECT fullname FROM resources
Exec Database Error: no such table: properties(1)
SELECT fullname FROM properties
Exec Database Error: no such table: assignment(1)
SELECT job_id, job_pid, job_serverid, job_pgrp, job_password, job_cluster, job_priority, job_globalorder, job_localorder, job_user, job_domain, job_name, job_label, job_reservations, job_groups, job_hosts, job_hostorder, job_cpus, job_restrictions, job_requirements, job_status, job_subjobstatus, job_agendastatus, job_data, job_prototype, job_kind, job_path, job_logpath, job_prototypepath, job_todo, job_lastupdate, job_timesubmit, job_timestart, job_timecomplete, job_flags, job_account, job_env, job_reason, job_timeout, subjob_id, subjob_status, subjob_data, subjob_result, subjob_count, subjob_retry, subjob_seq, server_address, procid, trid, verified, outpos, errpos, orders, timestart, started, missing, sid, jobstats_jobid, jobstats_subid, jobstats_threads, jobstats_start, jobstats_end, jobstats_maxmemory, jobstats_maxswap, jobstats_host FROM assignment
Exec Database Error: no such table: variables(1)
SELECT name, value FROM variables
Exec Database Error: no such table: locks(1)
SELECT fullname FROM locks
Exec Database Error: no such table: assignment(1)
SELECT job_reservations FROM assignment
Exec Database Error: no such table: assignment(1)
SELECT job_id, job_pid, job_serverid, job_pgrp, job_password, job_cluster, job_priority, job_globalorder, job_localorder, job_user, job_domain, job_name, job_label, job_reservations, job_groups, job_hosts, job_hostorder, job_cpus, job_restrictions, job_requirements, job_status, job_subjobstatus, job_agendastatus, job_data, job_prototype, job_kind, job_path, job_logpath, job_prototypepath, job_todo, job_lastupdate, job_timesubmit, job_timestart, job_timecomplete, job_flags, job_account, job_env, job_reason, job_timeout, subjob_id, subjob_status, subjob_data, subjob_result, subjob_count, subjob_retry, subjob_seq, server_address, procid, trid, verified, outpos, errpos, orders, timestart, started, missing, sid, jobstats_jobid, jobstats_subid, jobstats_threads, jobstats_start, jobstats_end, jobstats_maxmemory, jobstats_maxswap, jobstats_host FROM assignment
Exec Database Error: no such table: locks(1)

It looks to me as if the MySQL tables have gone screwy.
My solution was to run the "upgrader_worker -reset" command, and that seemed to take care of the problem.
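Side note on the log: "no such table: X" is the wording SQLite uses for this error, and the "(1)" appears to match SQLite's generic result code 1 (SQLITE_ERROR), so the worker's local store may be a SQLite file rather than MySQL. Assuming that, a quick read-only check like the Python sketch below could confirm which tables are gone before running the reset. The database path you pass in is up to you (no location is given in this thread), and the table list is simply copied from the log, not from any documented Qube schema.

```python
import sqlite3
from pathlib import Path

# Table names copied from the "no such table" lines in the worker log above;
# this is NOT a documented Qube schema, just what the log shows being queried.
EXPECTED_TABLES = ("assignment", "variables", "resources", "properties", "locks")


def missing_tables(db_path, expected=EXPECTED_TABLES):
    """Return the names in `expected` that are not tables in the SQLite file.

    The file is opened read-only through a URI so that a wrong path raises
    sqlite3.OperationalError instead of silently creating an empty database.
    """
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table'"
        ).fetchall()
    finally:
        conn.close()
    present = {name for (name,) in rows}
    # Keep the order of `expected` so the report lines up with the log.
    return [name for name in expected if name not in present]
```

For example, `missing_tables(r"C:\path\to\worker.db")` (hypothetical path) would list any expected tables that are absent; an empty list means all five are present. Opening the file read-only keeps the check from touching the worker's data.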

This is a bit worrying, though, because it hasn't happened just once: it has also happened on a few other nodes on various occasions.
What could be causing this corruption?
 
For your information, our test supervisor runs Windows XP Professional 64-bit, and our test nodes run Windows 2000 for now.

Thanks!
Nikos

anthony

  • Senior Software Engineer
  • Hero Member
  • Posts: 183
Re: Workers going down
« Reply #1 on: June 21, 2006, 09:55:50 PM »
Running upgrade_worker --reset was actually the correct action. As for the corruption, I need to know a little more about what you're running on that host.

Are those Windows x64 machines, or x86?

Thanks,
    Anthony

Nikos

  • Full Member
  • Posts: 15
Re: Workers going down
« Reply #2 on: June 22, 2006, 01:25:00 PM »
They are x86 machines on Windows 2000 SP4.

However, after a bit more research, it seems that random nodes went down whenever I updated qbwrk.conf.
I haven't touched the config file for a day now, and when I got back this morning all the nodes were up and running, so maybe it had something to do with that...

anthony

  • Senior Software Engineer
  • Hero Member
  • Posts: 183
Re: Workers going down
« Reply #3 on: November 27, 2007, 09:33:49 PM »
Hey Nikos,

   Just checking up on this forum post: are you still having issues?

    A.