IPMI is a system that allows you to maintain a system where it would not otherwise be able to. However, you have to convince others that it will be useful. As obvious as it is for a professional system administrator, there are others who will not see the usefulness of IPMI. This process – making the case for a technology – comes up more than most system administrators might realize.
There are numerous situations that require having a person to be actually present at a machine to do things – such as entering the setup, accessing the UNIX/Linux maintenance shell, changing boot devices, and so forth. So how do you prove how beneficial implementing full IPMI support is?
Provide a business case – and perhaps an informal user story – to show how IPMI can reduce the need for a person to actually be present.
To really make the case for IPMI, compute the actual costs in making a trip to the data center – the hourly cost of the administrator, the driving costs in gasoline (both to and from!), and the costs associated with handling the expense report. Report the other unquantifiable costs – the cost of an administrator unavailable for other tasks during the four hour round trip, including projects being delayed and problems not getting resolved. Combine this with a user story.
For example, create a user story around a possible kernel panic. A user story requires an actual user – an individual – whose story is followed. Here is our example continued as a user story:
Alma received an email that a system (db20) was unresponsive. Checking the system found that there was no response from the network at all, and checking the KVM showed that there was a panic on the display. No keystrokes were accepted by the system, and there was no way to power-cycle the system.
So Alma sends an email stating that she will be unavailable for the rest of the day, and calls her babysitter to take come and take care of her three children for the evening. Then she gets into her car and drives the two hours to the data center and parks in a lot ($10 for one hour). She power-cycles the machine by pressing its front panel power button, and checks the system response using her laptop. She finds that the server is responding and logs in.
Then Alma checks the server: logs show no problems restarting, the system has restarted cleanly, the subsystems are all running, and the monitoring system shows all systems are good.
Alma leaves the data center and drives the two hours back home.
If Alma is paid $60,000 yearly, the cost of her time spent on this event is US$144.23. If she drove 320 miles round trip at .76 cents a mile, she gets US$243.20 as an expense – in addition to the US$10 in parking fees. This makes a total direct cost of US$397.43.
If something like this happens six times a year, then the total yearly cost is US$2384.58 – and total downtime for the server is 24 hours for an uptime rate of 99.72%.
This account doesn’t include the indirect costs – such as projects being delayed because Alma was unable to work on them, nor does it include the personal costs involved such as babysitting and time away from family. It also doesn’t include the time that HR staff spent on yet another expense report. It also doesn’t include the costs associated with the server being unavailable for three hours.
On the other hand, Polly received word that another server in the data center was unresponsive, and also found that the kernel had panicked and there was no response from the console. She then used a command line tool to access the baseboard management controller (BMC) through IPMI. With an IPMI command, she rebooted the server, and watched the response on the KVM. Checking the system over, Polly found that the server had booted cleanly, subsystems were operational, and all looked good.
If Polly is paid the same amount as Alma, and her response took 15 minutes, we get a total cost of US$7.21. Downtime was reduced by 92% (along with an associated reduction in costs tied to the server being down). If this happens to Polly six times a year, the total yearly cost is US$43.27 – and a downtime of 1.5 hours for an uptime rate of 99.98%.
Thus, IPMI and SoL would have saved Alma’s company US$2377.37 per year.
The strongest case can be made if a recent event could have been solved with the technology you are proposing. If you can point to a situation that could have been resolved through ten minutes instead of hours or days without – then the usefulness of the technology will be apparent.
With this user story and business case, the case for IPMI should be readily apparent to just about anybody. Similarly, the case can be made for other technologies in the same way.