Introduction:
We have scheduled a two-hour file server outage for Wed, March 13 at 9 pm to install hardware in preparation for rolling out faster networking to the computing cluster.
If you use Condor or have long-running computational jobs, you should shut down your jobs for the duration of the outage.
Background:
Recently, traffic to our file servers has saturated the network connections between the file servers and our core infrastructure switches. When these connections are saturated, every connected computer slows down. The majority of the network traffic is bound for our computational cluster.
Method:
Offloading cluster compute traffic onto a private network will benefit the department by reducing saturation on the connections to the file servers on the biostat network. A dedicated network for cluster compute traffic also allows better management of, and higher throughput for, departmental core services and computational cluster resources.
Preliminary Data:
We are experimenting with a new 40-gigabit cluster computing network infrastructure, which has the potential to speed up each node of our computational cluster by up to 10x and some data-intensive cluster computing jobs by up to 80x. The new network infrastructure is currently in testing and has shown improved throughput on the test compute nodes.
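As a rough illustration of what an up-to-10x improvement per node could mean in practice, the sketch below computes best-case transfer times for a data set before and after such a speed-up. The 1 Gbit/s baseline link speed and the 50 GB data set size are assumptions chosen for illustration, not measurements from our network, and the calculation ignores protocol overhead and disk speed.

```python
def ideal_transfer_seconds(size_gigabytes: float, link_gigabits_per_s: float) -> float:
    """Best-case time to move data over a link, ignoring overhead and disk speed."""
    size_gigabits = size_gigabytes * 8  # 1 byte = 8 bits
    return size_gigabits / link_gigabits_per_s


# Hypothetical values for illustration only (not measured on our network).
DATASET_GB = 50.0      # assumed data set size in gigabytes
BASELINE_GBPS = 1.0    # assumed current per-node link speed in Gbit/s
SPEEDUP = 10.0         # the "up to 10x" per-node figure from the text above

before = ideal_transfer_seconds(DATASET_GB, BASELINE_GBPS)
after = ideal_transfer_seconds(DATASET_GB, BASELINE_GBPS * SPEEDUP)
print(f"baseline: {before:.0f} s, with 10x link: {after:.0f} s")
```

Under these assumed numbers the best-case transfer time drops from 400 seconds to 40 seconds. Real jobs will likely see smaller gains, since storage and software overhead also limit throughput.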
Submitted by harrison on Thu, 03/07/2019 – 14:02