Software to Keep Servers Running During Cooling Failures

Purdue has created software for Linux that will slow down processors during a cooling failure in a data center.

While a processor runs, it generates heat. The slower it runs, the less heat it generates. When the air cooling system in a data center fails, every bit of heat avoided buys time. Clocking down thousands of servers at once cuts heat output substantially.

With the software from Purdue, a server slows down sharply in order to generate as little heat as possible. Servers can then be kept running longer and could potentially avoid downtime entirely.
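To make the idea concrete, here is a minimal sketch of one way a Linux machine can be clocked down. It is not Purdue's code. It assumes the standard Linux cpufreq sysfs files (cpuinfo_min_freq and scaling_max_freq) are present, and the function name clamp_to_min_freq and its cpu_root parameter are made up for illustration.

```python
from pathlib import Path


def clamp_to_min_freq(cpu_root="/sys/devices/system/cpu"):
    """Cap every CPU's maximum frequency at its hardware minimum.

    Hypothetical helper, not Purdue's implementation. Returns a dict
    mapping CPU name (e.g. "cpu0") to the cap applied, in kHz.
    """
    applied = {}
    for cpufreq in sorted(Path(cpu_root).glob("cpu[0-9]*/cpufreq")):
        # Lowest frequency the hardware supports, as reported by the kernel.
        min_khz = (cpufreq / "cpuinfo_min_freq").read_text().strip()
        # Lowering the ceiling forces the CPU to run slow.
        # On a real system this write requires root.
        (cpufreq / "scaling_max_freq").write_text(min_khz)
        applied[cpufreq.parent.name] = int(min_khz)
    return applied
```

Run as root, clamp_to_min_freq() would cap every core on the machine. Restoring normal speed would mean writing the value from cpuinfo_max_freq back to the same file once cooling returns.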

Purdue’s supercomputing center, where the software was developed, has already survived several cooling failures without downtime.

Purdue’s situation, however, does appear to have some unique qualities. One is that the software was designed for their clusters, which number in the thousands of CPUs, so a slowdown can be activated across several thousand servers simultaneously. This has a tremendous effect on the cooling in the data center, and it is easy to do because all the servers are identical.

With that many servers, the cluster can dominate the server room as well. In a heterogeneous environment like most corporate server rooms, software like this would have to run on every platform to be effective.

Slowdown software could be most effective in large clustered environments, as well as in small or homogeneous environments. Slowdowns could be triggered by many things: cooling failures, human intervention, or even the server itself heating up.
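The three triggers listed above could be combined along these lines. This is only a sketch under assumptions: it reads the Linux thermal-zone files under /sys/class/thermal, which report millidegrees Celsius, and the function names, the flag names, and the 85 degree limit are placeholders rather than anything taken from Purdue's software.

```python
from pathlib import Path


def max_zone_temp_c(thermal_root="/sys/class/thermal"):
    """Return the hottest thermal-zone reading in degrees Celsius, or None."""
    readings = []
    for temp_file in Path(thermal_root).glob("thermal_zone*/temp"):
        # The kernel reports these values in millidegrees Celsius.
        readings.append(int(temp_file.read_text().strip()) / 1000.0)
    return max(readings) if readings else None


def should_slow_down(temp_c, limit_c=85.0, cooling_alarm=False,
                     operator_request=False):
    """Decide whether to slow down, given the three possible triggers.

    limit_c is an arbitrary placeholder; a real site would choose its own.
    """
    too_hot = temp_c is not None and temp_c >= limit_c
    return cooling_alarm or operator_request or too_hot
```

A monitoring loop might call should_slow_down(max_zone_temp_c()) every few seconds and start the slowdown the first time it returns True.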
