I have run numerous tests on the farm:
- Worker group Test (nodeA, nodeB), nodeA (/rave), nodeB (/rave/1st). Worker services stopped. Jobs sent to clusters /, /rave and /rave/1st. Worker services started. The /rave/1st job runs 1st, then the /rave and finally the /.
Nothing strange there.
- Worker group Test (nodeA, nodeB, nodeC), nodeA (/rave), nodeB (/rave/1st), nodeC (/other). Worker services stopped. Jobs sent to clusters /, /rave, /rave/1st and /other. Worker services started. The /rave/1st job runs 1st, then the /other, then the /rave and finally the /.
All of which makes sense again given that /other has priority on nodeC.
- Worker group Test (nodeA, nodeB, nodeC), nodeA (/rave), nodeB (/rave/1st), nodeC (/). Jobs sent to clusters /, /rave, and /rave/1st. The /rave/1st job runs 1st but doesn't use nodeC even though it's free, and then the /rave job runs on nodeC. Odd, as I would have expected the / job to be next under FIFO ordering. And finally the / job runs.
Some oddities here: free CPUs weren't used to start with, and FIFO queueing then didn't apply to the non-clustered job (job /) even though CPUs were free.
What happened in both these cases?
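To make the expectation behind that question concrete, here is a minimal sketch of a toy model. It is not the farm scheduler's real algorithm. It assumes each node prefers jobs whose cluster path shares the most leading components with the node's own cluster, that ties fall back to submission (FIFO) order, and that the jobs were submitted in the order /, /rave, /rave/1st. The function names are made up for illustration.

```python
# Toy model only: NOT the farm scheduler's actual algorithm.
# Assumption: a node prefers jobs whose cluster shares the longest
# leading path with its own cluster; ties keep submission (FIFO) order.

def parts(cluster):
    """Split '/rave/1st' into ['rave', '1st']; '/' becomes []."""
    return [p for p in cluster.split("/") if p]

def match_depth(node_cluster, job_cluster):
    """Count the leading path components shared by node and job clusters."""
    depth = 0
    for a, b in zip(parts(node_cluster), parts(job_cluster)):
        if a != b:
            break
        depth += 1
    return depth

def expected_order(node_cluster, jobs):
    """jobs is a list of job clusters in submission order.
    Deeper matches come first; sorted() is stable, so ties stay FIFO."""
    return sorted(jobs, key=lambda job: -match_depth(node_cluster, job))

# Assumed submission order for the third test.
jobs = ["/", "/rave", "/rave/1st"]
for node, cluster in [("nodeA", "/rave"), ("nodeB", "/rave/1st"), ("nodeC", "/")]:
    print(node, expected_order(cluster, jobs))
```

Under these assumptions nodeC (/) has no preference between the three jobs, so it would pick up the / job first by FIFO. That is not what the third test showed, which is the discrepancy being asked about.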
Another oddity I noticed in general (I ran more tests than these) was that the job that starts first isn't always the one that finishes first. All the jobs executed were the same, but in most cases the /rave/1st job started first and finished last, while the / job started last but finished first. This was because a couple of frames on the /rave/1st job inexplicably took over 10 mins to complete, whereas the frames of the / job all completed in a short time. What was going on with the total completion times here?
But at least the /rave/1st always seemed to start before the /rave or the /.