Why Proactive Incident Management is important

What are you waiting for?

Recently intercepted and decrypted conversation between Batman and Superman:

Batman: “Luthor has a bomb planted in Metropolis Central Station, set to go off in 5 minutes; I’m on my way but can’t get there in time to stop it, but you can!”

Superman: “I suppose I could, but you said it hasn’t gone off yet, right?”

Batman: “What kind of question is that!?”

Superman: “Well, I took this ITIL Foundation course, and by definition things are not incidents until they cause an issue – so until the bomb goes off, we don’t need to worry!”

Batman: “What have I told you about taking those online courses? Definitions don’t give you all the information – any condition that will cause an issue is an incident too! Now get over there and stop that bomb!”

Superman: “Okay, okay! (.09 second pause). Alright, bomb disarmed. Now, explain it to me…”

The disaster narrowly averted above (thanks to Batman) illustrates something I first wrote about in 2002: proactive incident management. I know that many see incident management as reactive only. That view has led, innumerable times, to issues that actually resulted in disaster, because of the myopic focus many seem to have on specific “ITIL” verbiage (though ITIL is far from the only topic with this problem). In fact, the scenario above is a parody of a conversation I actually had with an “ITIL expert” years ago, in which she steadfastly insisted that unless there was an actual outage, you could not call something an incident, nor use Incident Management to stop something from happening.

Having encountered similar beliefs over the years, I have finally decided to set down in writing just how silly that is.

Those who penned (and those who currently steward) the guidance found in ITIL never intended to imply that an incident is not an incident until or unless it results in the obvious degradation or loss of a service. That, frankly, would be ridiculous – almost as ridiculous as those who insist that something is not an incident unless it results in disruption. Inevitably, they cite “ITIL” as the reason – though amusingly, ITIL guidance is exactly how I can show the notion is not true. (Note that I don’t just state “ITIL says…”, because ITIL® is not a person, an entity or even an AI, and it does not actually “say” anything. Nonetheless, some always speak of ITIL as if it were an old drinking buddy, so it seems important to point this out once in a while.)

Unfortunately, while trying to emphasize the importance of quickly restoring service when something has caused disruption (especially in the 2007 rewording of the definition of an incident), the authors inadvertently deemphasized that Incident Management does have a proactive component.

Don’t just think, act

Moving away from ITIL for a moment, let’s ponder what ‘proactive’ really is. If asked, most of you would likely go with ‘preparing for something ahead of time’ as a definition. But if you google it, you will get a much different one: “creating or controlling a situation by causing something to happen rather than responding to it after it has happened.” Not what you thought, right? Let me wow you a little more by saying that if you think about it (and this is just me now), every proactive action is in fact a reaction to something.

I am guessing a few flash headaches have occurred, but consider: why would you do something proactively? The obvious response is that you wanted to do something now about something that could happen later – in other words, you are reacting to a realization or possibility and doing something now rather than when that possibility occurs.

Yes, it’s in there

So, back to ITIL then. Remember the version 2 definition of an incident, which included the words “any event which disrupts, or which could disrupt, a service”? The “or which could disrupt” part should make it pretty clear that you are expected to address now – before service is disrupted – any event that could or will impact you if you wait (recall that in ITIL-speak, an event is “any change of state that has significance for the management of a configuration item (CI) or IT service”). And by the way, that “or which could disrupt a service” verbiage is still in there – they just moved it to SO 4.2.2.

Similar statements can be found throughout the Incident Management section of the Service Operation book – but the head-slap-obvious statement is the last sentence of 4.2.5.1: “Ideally, incidents should be resolved before they have an impact on users!”

Amen, brother.

So to press my point, one more scenario: You are an air traffic controller, and note on your instruments that two planes are on an intersecting course and will collide midair in 2 minutes unless you redirect them. Should you:

  1. Immediately contact them and redirect their courses so that they do not collide, or
  2. Wait until they collide and then scramble the NTSB?

I assume the choice is obvious – but if anyone selected option 2, you need to make an appointment (today) with a good psychiatrist and seek treatment.

Tick Tock

There is a factor some of you have no doubt picked up on that matters here: time. If, in the course of monitoring, you note a condition that will lead to an issue a week from now, well then, use change management or whatever you like to handle it. If, however, what you discover will lead to an issue in the next 10 minutes (or any timeframe that requires you to respond quickly to prevent the issue), then it’s time to start working on that incident!
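
For those who think better in code, here is a rough sketch of that time-based rule. Everything in it is my own illustration rather than anything ITIL prescribes: the Condition class, the triage function, the "monitor" outcome and especially the four-hour URGENCY_WINDOW are assumptions, so substitute whatever lead time your own change process realistically needs.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Optional


@dataclass
class Condition:
    """A condition noticed while monitoring (hypothetical model, not an ITIL artifact)."""
    description: str
    disrupting_now: bool
    # Estimated time until service is disrupted; None if no disruption is expected.
    time_to_impact: Optional[timedelta] = None


# Assumed cutoff: anything expected to hit sooner than this cannot wait for a
# slower route such as change management. The article gives no number.
URGENCY_WINDOW = timedelta(hours=4)


def triage(condition: Condition, window: timedelta = URGENCY_WINDOW) -> str:
    """Return 'incident', 'change' or 'monitor' for a condition."""
    if condition.disrupting_now:
        # The classic reactive case: service is already degraded or lost.
        return "incident"
    if condition.time_to_impact is None:
        # Nothing suggests a disruption; keep watching (assumed outcome).
        return "monitor"
    if condition.time_to_impact <= window:
        # Will disrupt soon: treat it as an incident before it has an impact.
        return "incident"
    # Enough lead time to handle it through change management or similar.
    return "change"


if __name__ == "__main__":
    bomb = Condition("bomb in Central Station", False, timedelta(minutes=5))
    disk = Condition("disk fills up next week", False, timedelta(days=7))
    print(triage(bomb))
    print(triage(disk))
```

The point of the sketch is only that the deciding input is time to impact, not whether an outage has already happened.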

Treating conditions that could or will lead to issues as incidents, and addressing them before they cause impact, only makes sense; it simply reflects what we do ourselves every day. When driving to work, you stop for or go around obstacles in the road, right? You “baby-proof” your home when you have a child, turn off the power before working on electrical connections, and handle hundreds of other scenarios the same way. Again, it only makes sense that you would. So presumably (hopefully) you see the advantage of applying the same thinking to working in IT service provision.

You don’t have to be a hero to be proactive, but if you are, your customer is more likely to see you as one – and will be a lot happier. What are you waiting for?

Michael Keeling

Michael has been providing consulting and guidance in IT Operations, ITSM and SIAM to enterprise-level organizations in many industries for more than 20 years, and has an extensive background in data center and service desk operations, technical writing, mentoring, cause analysis and workflow improvement. He is known for bringing the view of a detective to these efforts, a perspective he credits to education in crime scene investigation and over 10 years designing processes and performing risk management in the private security sector prior to his career in IT. A confirmed realist who believes no project can be truly successful unless all involved parties are grounded in reality, Michael is always prepared to paint ‘the elephant in the room’ bright yellow when appropriate….