
The End of Incident Management (as we know it!)

Can you imagine an ITSM world without Incident Management?

While important, Incident Management (IM) is one of the most inefficient (and sometimes ineffective) ITSM processes. So much of the success (or failure) of IM depends on human interpretation. What is the consumer seeing? What kind of day is the consumer having? What does the consumer perceive the issue to be? How is this being communicated to the service desk agent? What perception does the service desk agent have of the consumer? How is the service desk agent interpreting what is being said? Did the agent just deal with an angry consumer and not quite move on from that call? What does the service desk agent log in the incident record?

Based on these interpretations and judgements, the IM process is triggered. Then the fun begins. Is there an existing known error record or knowledge article that will help? Is the escalation procedure set up correctly? Does what is documented in the incident record actually reflect the situation? If the information logged in the incident ticket isn't correct, known error databases and escalation procedures may not be helpful. That means follow-up calls, or a deskside visit. All the while, the consumer is not able to use the service.

Why even open an incident ticket? Think about it. We already have event logging, which in essence is the ‘black box’, recording what has happened on a device or within an application. We have monitoring systems that watch for events that meet or exceed a predefined threshold. We have orchestration engines that can automatically initiate actions in response to the events those monitoring systems detect. We have knowledge articles that can provide context and instructions for responding to incidents.
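To make the monitoring-plus-orchestration idea concrete, here is a minimal Python sketch, assuming one metric threshold per rule. The `Event` and `Threshold` types, the `evaluate` function and the `restart_service` action are hypothetical illustrations, not the API of any particular monitoring tool or orchestration engine; a real engine would call its own automation interface where this sketch simply returns a description of the action.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Event:
    source: str   # device or application that produced the reading
    metric: str   # e.g. "cpu_percent" (hypothetical metric name)
    value: float


@dataclass
class Threshold:
    metric: str
    limit: float
    action: Callable[[Event], str]  # hypothetical orchestration hook


def restart_service(event: Event) -> str:
    # Placeholder for a call into an orchestration engine (hypothetical).
    return f"restart requested on {event.source}"


def evaluate(events: list[Event], thresholds: list[Threshold]) -> list[str]:
    """Return the actions triggered by events that meet or exceed a threshold."""
    by_metric = {t.metric: t for t in thresholds}
    actions = []
    for event in events:
        rule = by_metric.get(event.metric)
        if rule is not None and event.value >= rule.limit:
            actions.append(rule.action(event))
    return actions


if __name__ == "__main__":
    thresholds = [Threshold("cpu_percent", 90.0, restart_service)]
    log = [
        Event("app-server-1", "cpu_percent", 42.0),
        Event("app-server-2", "cpu_percent", 97.5),
    ]
    print(evaluate(log, thresholds))
```

Nothing in this flow needs a person to log a ticket: the event log supplies the data, the threshold decides what matters, and the response is chosen in advance.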

With all that we already ‘know’, why then do we manually log an incident?  Can’t technology log, manage, and resolve incidents for us?

We are closer than you might think.

Many ITSM tools leverage or integrate with orchestration engines to automate responses to incidents.  But this is just scratching the surface.

Technology is converging

We’re seeing a convergence of technology that will make IM (as we know it) obsolete. We have the puzzle pieces. With the Internet of Things, we have sensors on machines detecting issues in real time and exchanging that information with other machines. Machine learning is a type of artificial intelligence that provides computers with the ability to learn without being explicitly programmed. Machine learning focuses on the development of computer programs that can teach themselves to grow and change when exposed to new data.[1] Cognitive computing involves self-learning systems that use data mining, pattern recognition and natural language processing to mimic the way the human brain works.[2] Robotic Process Automation (RPA) is computer software or a “robot” that can be configured to capture and interpret existing applications for processing a transaction, manipulating data, triggering responses and communicating with other digital systems.[3]

With this convergence in technology and the resulting capabilities, I can envision that IM – as we know it – will end. Incidents will continue to occur (or will they?). How we manage incidents (or, more precisely, how incidents will be managed) will change dramatically. The need for human intervention will be reduced and, in many cases, completely eliminated. We may not even know at the time that an incident occurred.

What does this mean for ITSM?

I see this as both good news and a call to action.

The good news: good ITSM is more important than ever. Understanding how services deliver value to the business is critical. As technology converges, well-designed service operation processes are what will enable automation.

The call to action?  Get your ITSM implementation – especially your support processes – automation-ready.  How?

  • Invest in problem management. From the reactive perspective, what are the causes of service interruptions or degradations?  What patterns can be detected?   From the proactive perspective, how can technology proactively help with a fault-tree analysis or a component failure impact analysis?  I see problem management as a way to provide input to the design and use of this technology convergence.
  • Take knowledge management beyond just knowledge articles for incident response or self-service. Those knowledge articles can become the basis for RPA. But knowledge management must evolve into a comprehensive approach that transforms the massive amount of data that will be collected and parsed into actionable information. What do you need to know about data to make it useful information?
  • Commit to the design and implementation of a formal event management process. We already know so much about incidents. But because monitoring tools are often implemented in silos (e.g. the network team have their monitoring tools, the sysadmin team have their tools, the dev team have their tools, and so on), designing an enterprise-wide event management process is often ignored. The fact is that an incident impacting one part of the IT ecosystem has a ripple effect on other parts. A formal event management process must be able to filter and correlate events from across the ecosystem. This seems like a natural fit for cognitive computing and machine learning.
  • A formalized approach for continual improvement becomes a “must do”. But here’s a twist – with cognitive computing and machine learning, the technology will be able to suggest improvements, based on pattern recognition and learning.
  • Collaborate with business colleagues to get crystal-clear on rules regarding incident management. RPA provides the ability to respond in real-time to events.  This response must be based on business rules.  What conditions require what type of (automated) response? There won’t be a human interpretation of the event, so these rules must be clearly defined and regularly reviewed for effectiveness.
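The event management and business-rules points fit together, and the Python sketch below is a minimal illustration under loudly stated assumptions: events from siloed tools are filtered by severity, correlated by the business service they affect using a hypothetical dependency map and fixed five-minute time buckets, and then matched against business rules that map a service and severity to an automated response. `SERVICE_DEPENDENCIES`, `BUSINESS_RULES`, the 1-to-5 severity scale and every function name are invented for illustration; none of them come from an ITSM product.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    tool: str       # siloed monitoring tool that reported it (network, sysadmin, dev)
    ci: str         # configuration item the event refers to
    severity: int   # assumed scale: 1 = informational ... 5 = critical
    timestamp: int  # seconds since epoch


# Hypothetical dependency map: which CIs each business service relies on.
SERVICE_DEPENDENCIES = {
    "online-store": {"web-01", "db-01", "switch-a"},
    "payroll": {"db-01", "hr-app"},
}

# Hypothetical business rules agreed with business colleagues:
# (service, minimum correlated severity, automated response).
BUSINESS_RULES = [
    ("online-store", 4, "fail over to standby site"),
    ("payroll", 3, "notify payroll on-call and open a problem record"),
]


def filter_events(events, min_severity=2):
    """Drop informational noise before correlation."""
    return [e for e in events if e.severity >= min_severity]


def correlate(events, window=300):
    """Group events by affected service and fixed time bucket (window in seconds)."""
    groups = defaultdict(list)
    for e in events:
        for service, cis in SERVICE_DEPENDENCIES.items():
            if e.ci in cis:
                groups[(service, e.timestamp // window)].append(e)
    return groups


def decide(groups):
    """Apply the business rules to each correlated group, in sorted group order."""
    responses = []
    for (service, _bucket), group in sorted(groups.items()):
        worst = max(e.severity for e in group)
        for rule_service, min_sev, response in BUSINESS_RULES:
            if rule_service == service and worst >= min_sev:
                responses.append((service, response))
    return responses


if __name__ == "__main__":
    events = [
        Event("network", "switch-a", 4, 1000),
        Event("sysadmin", "db-01", 3, 1060),
        Event("dev", "web-01", 1, 1100),
    ]
    print(decide(correlate(filter_events(events))))
```

Fixed time buckets are the simplest correlation window to show, but two related events that straddle a bucket boundary will land in different groups; a sliding window or topology-aware correlation would handle that better. Keeping the rules as plain data also makes them easy to review regularly with business colleagues.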

[1] Retrieved 10/27/2016.

[2] Retrieved 10/27/2016.

[3] Retrieved 10/27/2016.
