IT Chronicles reached out to executives, thought leaders, experts, practitioners, and writers about a unique initiative. ITC would donate to Second Harvest for every article submitted in December by our past contributors. Thank you to all who contribute to this food drive. We appreciate your knowledge and leadership.
Fifty years as an employee of banks in the US, UK, and EU, or as a freelance consultant.
Fifty years in IT; director, Agile, DevOps, Lean, ITSM, trainer, author, coach.
Fifty years of trying to keep the business in business by the best use of technology!
My teams have led recoveries from hurricanes, floods, significant snowstorms, 9/11, a CIO inadvertently powering off the data centre, London bomb scares, stock market crashes, hackers, lost data, cloud failures, and who knows how many software or hardware changes gone wrong.
We hardly lost a single customer to a business continuity event. In fact, in one case, we attracted praise and so many new customers that the bank went from being the third-largest in Texas to one of the biggest banks in the Southwest US.
The purpose of this blog – to help you keep your business in business every day!
What is Business Continuity Management (BCM)?
Can every person in your organisation and every vendor you rely upon answer this question every day:
- Will I be able to do my job and service customers or help staff do their roles tomorrow?
It’s that simple. Imagine going home knowing that if something happens, you are ready to address the issue. The practices, guidelines, tools, skills, and communications are tested and prepared every day.
Not as a bunch of paper in a lengthy document based upon a static impact analysis. Instead, as part of a small set of scenarios that lead people into doing what is necessary to keep the business in business. Look at your current plan: did it predict COVID, WFH, mental stress, people going global for services, political changes, the transition to AI services, distribution challenges, and all the rest of what has happened globally in the last 20 months?
What most organisations consider to be “acceptable” Business Continuity
Actual (unfortunately all too often) conversation:
- Organisation: We have an annually tested plan
- Me: Are you practising agile or DevOps, or ITSM?
- Organisation: Yes, we do Scrum, some DevOps, and all under ITSM change management
- Me: Scrum means you change something 23-26 times a year, all under ITSM change management, with DevOps giving you the ability to deliver the old or the new quickly, and every one of those changes has an impact on staff and customers.
- Organisation: blank face
- Me: As you rely upon ITSM, you also ensure that your business partners and vendors are part of your plan, AND you also ensure that you test to see what would happen if THEY had an issue.
- Organisation: More blank faces and more people look away
- Me: Finally, I can ask every person who works here, or any business partner, how the last test went, and they will tell me what happened, their role, what went wrong, the status of fixing it, and when the next test will occur to prove issues are resolved, hopefully within 2-3 sprints.
- Organisation: Someone now angry, says: well, this is not what we meant. We don’t have time for all of that.
- Me: So you don’t have time to keep your business in business. You don’t have time to ensure that customers won’t leave you within a couple of mouse clicks as they realise you are not ready to help or service them. Is this your definition of continuity in your business?
- And it goes on
Is this conversation recognisable where you work? Don’t you believe that it is time to have a different one?
Business Continuity Metrics
Please don’t give me the “SLA with our customers” pitch. Agreements are signed documents between you and someone else. Show me the signatures, and we can talk. Remember the airline that said, hey, we came back up within our 2-hour SLA? (Sorry to the people who took five days to get back home while we had planes in the wrong place, or who never got a flight at all because we had to cancel many just to get back up and running.) But hey, we were up as promised!
The primary metrics are:
- MAO: Maximum Acceptable Outage – the time it will take for your business to be adversely affected if you cannot provide a product or service.
- Example – You might keep the customers calm for a bit, but after five days of no electricity because of snow, they are going to get angry
- MBCO: Minimum Business Continuity Objective – the minimum level of services you need to provide to keep most of your customers happy.
- Example – Netflix slows down the capability of downloading or even stops some services, fully informing customers, of course.
- MTPD: Maximum Tolerable Period of Disruption – the amount of time that your business can stay in business without your primary services or products being available in either manual or automated fashion.
- Example – sometimes things are so bad that you need to say: sorry, but we are closed, and this is our plan to reimburse you.
- RTO: Recovery Time Objective – the maximum time within which a service or product should be available again following an incident.
- Example – our ATMs are down, but you can perform financial transactions in a branch, Post Office, or designated partner (like a supermarket). People will accept this but only for a certain amount of time.
- RPO: Recovery Point Objective – how much data can you afford to lose or recreate before you no longer are a viable business?
- Example – We have been hacked, well, you know the rest!
- Do we feel you have resilience and reliability?
- Is your capability of continuity better than your competition?
- Is your staff assured that you have a job for them tomorrow?
- Do your customers feel that they can rely upon you to honestly inform them there is an issue, and this is when we expect to be back in service?
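To make the primary metrics above concrete, here is a minimal Python sketch that records them for one service and checks an event against them. Everything in it is an assumption for illustration: the names ContinuityTargets and assess_event, the ATM figures, expressing RPO as a window of time rather than a volume of data, and expressing MBCO as a percentage of normal service. Your own targets should come from the business, not from this example.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class ContinuityTargets:
    """Continuity targets for one service (hypothetical structure)."""
    service: str
    mao: timedelta     # Maximum Acceptable Outage
    mtpd: timedelta    # Maximum Tolerable Period of Disruption
    rto: timedelta     # Recovery Time Objective
    rpo: timedelta     # Recovery Point Objective, assumed here to be a time window of lost data
    mbco_percent: int  # Minimum Business Continuity Objective, assumed as % of normal service


def assess_event(targets, outage, data_loss, service_level_percent):
    """Return the names of the targets that an event has breached."""
    breaches = []
    if outage > targets.rto:
        breaches.append("RTO")
    if outage > targets.mao:
        breaches.append("MAO")
    if outage > targets.mtpd:
        breaches.append("MTPD")
    if data_loss > targets.rpo:
        breaches.append("RPO")
    if service_level_percent < targets.mbco_percent:
        breaches.append("MBCO")
    return breaches


# Illustrative figures only, not taken from any real bank.
atm = ContinuityTargets(
    service="ATM network",
    mao=timedelta(hours=24),
    mtpd=timedelta(days=5),
    rto=timedelta(hours=4),
    rpo=timedelta(minutes=15),
    mbco_percent=60,
)

# A six-hour outage, five minutes of lost data, 70% of service still running.
print(assess_event(atm, timedelta(hours=6), timedelta(minutes=5), 70))  # prints ['RTO']
```

Keeping the targets in one small structure like this makes it easy to share them with staff and partners and to re-check them after every test.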
Things change: how do you prepare?
You don’t! Yeah, COVID is over; oh wait, a new variant. Darn! What will this mean to us as we still have distribution issues and staff are in a mixed working environment, and our partners are not entirely stable?
What you need are a few scenarios. Agile, DevOps, ITSM all benefit from a lean practice called Value Stream Mapping and Management (VSM).
- Pick a process or service and, using no more than 15 post-it notes, describe it holistically from beginning to customer (or staff if an internal process like onboarding)
- Note the people, tools, meetings, forms, communications, approvals for each step
- Note the time of each step: actual time to complete a task and actual time for the action to end, which includes any waiting time
- Note the number of bounces back that each task undergoes: step 2 goes back to step 1 how many times on average or a percentage. Step 3 goes back to step 2 and 1, etc.
You now understand the service or product, which leaders can revise into a new way of working, as applied to normality and in a business continuity event.
If you perform the above at least three times, you will see an incredible amount of commonality amongst your main processes. Therefore, you now have the basics for the main scenarios you need for a plan of action. Keep it simple, provide roles and responsibilities guidelines, fix the tools and data flow issues, upskill everyone, and share this with partners so they know their part in keeping you in business.
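The timings and bounce-backs noted on the post-it notes can be summarised with a few lines of Python. This is a rough sketch under stated assumptions: the Step structure, the summarise function, and the onboarding figures are all hypothetical, and "flow efficiency" and "first-pass yield" are measures commonly used in lean practice rather than terms from this article.

```python
from dataclasses import dataclass


@dataclass
class Step:
    """One post-it note on the value stream map (hypothetical structure)."""
    name: str
    work_hours: float     # actual time spent completing the task
    elapsed_hours: float  # time until the step ends, waiting included
    bounce_rate: float    # fraction of items sent back to an earlier step


def summarise(steps):
    """Summarise a value stream from its per-step timings and bounce rates."""
    work = sum(s.work_hours for s in steps)
    elapsed = sum(s.elapsed_hours for s in steps)
    if elapsed == 0:
        raise ValueError("elapsed time must be greater than zero")
    # Chance that an item gets through every step without bouncing back.
    first_pass = 1.0
    for s in steps:
        first_pass *= 1.0 - s.bounce_rate
    return {
        "work_hours": work,
        "elapsed_hours": elapsed,
        "flow_efficiency": work / elapsed,  # share of elapsed time spent working
        "first_pass_yield": first_pass,
    }


# Illustrative onboarding process with made-up numbers.
onboarding = [
    Step("Request raised", work_hours=0.5, elapsed_hours=8, bounce_rate=0.10),
    Step("Manager approval", work_hours=0.25, elapsed_hours=24, bounce_rate=0.20),
    Step("Account created", work_hours=1.0, elapsed_hours=16, bounce_rate=0.0),
]

for key, value in summarise(onboarding).items():
    print(f"{key}: {value:.3f}")
```

In this made-up example less than two hours of work takes two days to reach the new starter, which is the kind of waiting and rework a scenario should aim to remove before an event, not during one.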
Technology in Business Continuity is your best friend and worst enemy
We rely on technology, but things break. That big telephone cable you connected your business to (the internet) and that place where much of your data and applications reside (the cloud) still break. If you can survive an issue with either, you are digitally transformed, as you have changed your work based on technology. Maybe you transfer work to a partner, perhaps you have a ready backup, maybe you can revert to manual processes or even just stop doing something for a while.
- Use ITSM practices such as monitoring and alerting: these concepts apply to more than just your applications as they can help staff and customers know there has been an event.
- Keep data (customer, documents, all of it) secure, backed up, and frequently archived.
- Benefit from Change Management practices that go from Demand to Live to ensure that the way people work is as safe and straightforward as necessary while still compliant and secure.
Testing does not start when someone codes something they want to go live. Testing begins with the question: should we do this? Yes, why? No, why not? Look at your VSM map. Each step requires a test to ensure completeness and reduce the number of backward bounces. Tests will impact the tools and people involved, so make the best use of them. Tests will not always succeed, so plan how to address an issue by the next sprint.
Don’t forget to test with partners. Declare an event and see how they react.
Leaders: your role
My first CEO led by example. He rarely was in meetings or his office. He loved walking around the bank talking to staff and customers. We were used to him arriving late at night with several pizzas and watching how his technology helped the bank. No matter the visit, he always asked: how can I help you keep us in business? He wrote down the ideas, and he thanked us for being candid.
- How often do you explain the metrics of continuity such that everyone knows their role in meeting them?
- Do you explain why this is the time or process required, in terms that reflect how the staff works or against customer expectations?
- Are you ready, as a leader, to help those that may be struggling mentally, as their issues could impact your ability as a business?
- Are you ready to have people help you?
Change your attitude and behaviour. Everyone copies leaders. You now have the beginnings of a business continuity culture. Test it by asking the question this article began with: Can you service and support staff and customers tomorrow? Enjoy the transformation.