Addressing System Failures Before They Wreak Havoc

To quote the late Rodney Dangerfield, IT systems “get no respect.” We all rely on them to accomplish our jobs, pay our bills and carry on with our day-to-day activities, without giving them a second thought – until they stop working.

A recent hardware failure at the Internal Revenue Service (on Tax Day) prevented U.S. taxpayers from filing their taxes online; earlier this year, the U.S. Customs and Border Protection (CBP) computer system experienced an outage that left thousands of travelers waiting in huge lines just to clear customs. These are just the stories that were reported; problems like these happen in enterprises all the time. They show how much damage system failures can cause and how much they can tax IT departments, which are also struggling to modernize legacy IT systems, prove the value of IT expenditure, and ensure business end-user productivity.

Aging computing infrastructure is not the only cause of system failures. The culprit can also be malware infecting computer networks, or even human error on the part of business end users.

The only real way to avoid the mayhem that can be ignited when systems fail is to work to prevent failures in the first place. This means updating legacy systems, regularly checking to make sure your anti-virus software is current, and implementing automated alerts to help you identify where problems could occur. Another way is to provide a positive experience for business end users. Endpoint device failures not only hurt productivity, but can also create huge IT problems.

Just as nine out of 10 fire disasters can be prevented when smoke detectors are used to catch them early and homeowners take precautionary measures, so too can IT take measures to avoid everyday problems, so that when the big one occurs, it can be singularly focused on putting it out.

All this aside, however, when that big one ignites, the keys to addressing it quickly are increased visibility across the corporate network and faster incident resolution.

Increased Visibility. When failures occur, IT needs to restore normal computing services as quickly as possible to minimize the impact on the business. Yet ironically, in a technology industry where innovation is the driving force, key elements of incident management have not been updated in 30 years. As IT infrastructure becomes more complex and security becomes paramount, organizations need to ensure visibility across the entire network, including the endpoints, where problems can originate.

As new software is rolled out and systems crash, IT must be able to detect where problems might be occurring and which endpoints are affected in order to resolve them.

IT teams also need to see into situations that are ripe for problems, such as overloaded systems reaching storage limits, inadequate security protocols, end-user activity on suspicious websites, or even burned-out employees who are spending too much time on their computers after hours.

Faster Incident Resolution. IT departments are responding to an uptick in business end-user IT incidents, both small and large. While they focus on putting these fires out on a daily basis, the real inferno could be building in the back end and could suddenly become one of the show-stopping failures mentioned above. IT departments can put automated systems in place that alert them to potential issues, provide automatic remediation, and shorten Mean Time to Resolution (MTTR), so that they are free to address the large issues and take a more strategic approach to prevention.
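
For readers who want to track that metric, here is a minimal Python sketch of how an MTTR figure might be computed from incident records. It assumes each incident is stored simply as an opened and a resolved timestamp; the sample data and the function name mean_time_to_resolution are hypothetical, and a real service desk tool will have its own export format and its own definition of "resolved."

```python
from datetime import datetime, timedelta

# Hypothetical incident log: (opened, resolved) timestamps for each incident.
incidents = [
    (datetime(2024, 1, 8, 9, 0), datetime(2024, 1, 8, 9, 30)),    # 30 minutes
    (datetime(2024, 1, 8, 13, 0), datetime(2024, 1, 8, 14, 30)),  # 90 minutes
    (datetime(2024, 1, 9, 10, 0), datetime(2024, 1, 9, 11, 0)),   # 60 minutes
]


def mean_time_to_resolution(records):
    """Average the time between opening and resolving each incident."""
    if not records:
        raise ValueError("no incidents to average")
    # Sum the individual resolution times, starting from a zero duration.
    total = sum((resolved - opened for opened, resolved in records), timedelta())
    return total / len(records)


print(mean_time_to_resolution(incidents))  # prints 1:00:00
```

Tracked week over week, a number like this gives IT a simple way to see whether automation is actually shortening resolution times.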

There’s no doubt that headline-grabbing system failures will continue to occur as technology becomes ever more complex and people and industries become more reliant on it to perform almost every function of everyday life. As technology becomes increasingly pervasive, businesses and governments can’t wait for a major computer crash to understand that proactively addressing potential problems and monitoring the health of the computing infrastructure – from the back end to the endpoint – can go a long way toward keeping systems online and business moving forward.