I have a render farm that generally works well. Occasionally, though, when the queue is busy (some jobs processing, others pending, and a newly submitted job starting on recently freed processors), we run into trouble. I have seen a series of error messages like:
ERROR: unable to establish tcp connection with 10.0.180.1 - unable to connect to host.
ERROR: udp receive socket timed out.
[Nov 14, 2011 18:03:12] VENTUROUS[1812]: WARNING: work report attempt[58] failed; resending.
[Nov 14, 2011 18:03:12] VENTUROUS[1812]: WARNING: sleeping [5] seconds before retry.
I am not sure whether the specific conditions we experienced are relevant, or whether this is simply expected with large queues. My feeling is that it is a symptom of how Windows 2008 R2 is configured, and that 'port exhaustion' may be occurring. Has anyone else come across this issue, or does anyone have advice on how to tweak the configuration/registry so that it can handle the load a busy queue puts on the TCP sockets? Should I be doing this on both the workers and the supervisor, or on the supervisor only?
Our render farm has 20 workers. The farm is running version 6.2.1 of Qube and Windows 2008 R2 SP1.
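One way to test the port exhaustion theory would be to count TCP connections by state while the queue is busy. The following is only a rough sketch in Python, not anything that ships with Qube: the function names are made up for illustration, it assumes the usual Windows `netstat -an -p tcp` column layout (protocol, local address, foreign address, state), and the 49152-65535 range is assumed to be the dynamic port range in effect on the machine (the Windows Server 2008 default, which may have been changed).

```python
import subprocess
from collections import Counter


def count_tcp_states(netstat_output):
    """Count TCP rows per connection state in `netstat -an` style text."""
    states = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Assumed row layout: TCP  <local addr>  <foreign addr>  <STATE>
        if len(fields) >= 4 and fields[0].upper() == "TCP":
            states[fields[3]] += 1
    return states


def dynamic_ports_in_use(netstat_output, low=49152, high=65535):
    """Count distinct local TCP ports that fall inside the dynamic range.

    low/high are assumptions; adjust them if the range was reconfigured.
    Listening sockets inside the range are counted as well.
    """
    ports = set()
    for line in netstat_output.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[0].upper() == "TCP":
            port_text = fields[1].rsplit(":", 1)[-1]
            if port_text.isdigit() and low <= int(port_text) <= high:
                ports.add(int(port_text))
    return len(ports)


def read_netstat():
    """Run netstat on the local machine and return its text output."""
    return subprocess.check_output(["netstat", "-an", "-p", "tcp"], text=True)
```

Run on the supervisor and on a worker during a busy period, for example with `print(count_tcp_states(read_netstat()))` and `print(dynamic_ports_in_use(read_netstat()))`. A large TIME_WAIT count together with a dynamic port count close to the size of the range would support the port exhaustion theory, and comparing the two machines would hint at which side needs tuning.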