In the final part of this three-part blog we look at the pleasure and pain involved with Major incident management
Major Incident Management
To my mind the three areas, outside of event management, that really improve the way that major incidents are handled, are communication, learning from the incident and being transparent.
In IT, we are not an island. We are a core part of most organisations, whether they and we like it or not. We have to share what is happening, what we have learnt from the incident and what is going to happen to reduce the chance of it happening again. Otherwise, how can we even hope to be trusted?
I have seen IT teams embrace the opportunity to operate, during major incidents, with heads down in a locked room keeping information secret thereby increasing the chance of recurrence.
The trouble with being like this is that the wider organisation starts to accept that this is the way it is. So they don’t complain formally, they just moan to each other.
And in fairness to IT, how often do you see a review of a failed marketing campaign? Or a full and frank review of a failed pay run (apart from to blame IT; who ever heard finance say that they should have had business continuity plans in place because they accepted that the system was not resilient when it was implemented?)
Get it right however and the pleasure is shared amongst all concerned parties.
Communicate with your customers and users. Let them know what is going on, even if that communication is just to say that you are still working on it. Set up lines of communication. Make sure that the technicians working on the fix are left alone to get on with it, but have one person who gets updates. If you commit to telling users every 30 minutes or hour, get an update 10 mins before. Keep telling people what is happening. AND keep the service desk updated.
When the service is restored, carry out a full review. What went well and what didn’t. Keep to the facts. No emotions. What infrastructure components failed. Can resilience be done better? Should the service be resilient if everyone in the rest of the business was jumping up and down? Should processes inside or outside of IT be amended? Set actions. Set timeframes. Follow up on them.
Share the outcome with as much of the rest of the business as possible. If there has been a failure and you have a plan to mitigate the chances of this happening again, tell everyone. If you have a piece of work in place and it is being progressed, but the issue recurs sooner than you can finish the remediation, less people are going to complain than if you had done nothing.
Use the incident as a driver for good.
Service Management Life Lessons Learnt
Some key lessons that I have learned that I would share with you on removing some of the pain from Operations and Service Management.
- Don’t be afraid to bring in consultants, but find ones with real-world experience. You probably know what needs to be done, but do you have the time to think it through and implement the change, while also doing your day job? Probably not.
- Don’t be afraid to take advice and use a mentor. Other people’s thoughts are sometimes the clarity you need when the trees are hiding the woods.
- Don’t forget to ask the people on the ground what is wrong or what can improve. They have good ideas; often the best ideas.
- Ask a lot of questions before starting.
- Don’t try and do too much too quickly. Keep going for the “low hanging fruit”. It’s always there and it’s always easier to do that than try and pick all fruit in one day. Look up Tipu by Rob England.
As a consultant, remember, you don’t know it all. You can learn from each client. Admit it to them and yourself. Also, use your peers. Accept that other people can sometimes do things better, and generally will help.
Latest posts by James Gander (see all)
- Bringing DevOps and ITSM Together – itSMF NZ Conference 2017 - March 28, 2017
- The Service Desk – How do you make improvements? - January 14, 2016
- The Service Desk is not dead! - January 7, 2016