Hello,
I'm having an issue where, when our farm is full, a single bad worker can cause all of the pending jobs on our farm to fail. The worker fails a subjob and becomes the only idle, unlocked worker. Because all of the other workers are busy or locked, it picks up the next pending job and fails that as well. This continues until all pending jobs on the farm have failed.

When the farm is being monitored, the fix is as simple as rebooting the problematic machine and restarting the failed jobs.

Is there a way to automatically lock a worker when it fails a certain number of subjobs within a certain period of time? Ideally the machine would automatically lock, reboot, unlock, and then retry the failed subjobs, but I'd be satisfied with a way to automatically lock for now.

Any help is greatly appreciated.

Thanks
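
P.S. To make the rule concrete, here is a rough sketch in plain Python of the "N failures within a time window" check I have in mind. It does not call any farm software; the class name, the thresholds (3 failures in 10 minutes), and the worker name are placeholders I made up, and the actual lock step would still need whatever hook or command the farm manager provides.

```python
from collections import defaultdict, deque
import time


class FailureTracker:
    """Flags a worker once it fails too many subjobs within a time window.

    Hypothetical helper for illustration only; not part of any farm API.
    """

    def __init__(self, max_failures=3, window_seconds=600):
        self.max_failures = max_failures
        self.window_seconds = window_seconds
        # worker name -> timestamps of its recent subjob failures
        self._failures = defaultdict(deque)

    def record_failure(self, worker, now=None):
        """Record one failed subjob; return True if the worker should be locked."""
        now = time.time() if now is None else now
        stamps = self._failures[worker]
        stamps.append(now)
        # Drop failures that have aged out of the window.
        while stamps and now - stamps[0] > self.window_seconds:
            stamps.popleft()
        return len(stamps) >= self.max_failures


tracker = FailureTracker(max_failures=3, window_seconds=600)
if tracker.record_failure("worker01"):
    # Placeholder: this is where the real lock call or command would go.
    print("worker01 should be locked")
```

Something would have to call record_failure each time a subjob fails (for example from a job callback or a script that polls job status), and lock the worker as soon as it returns True.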