Implementing service management processes, and benefiting from them, is definitely a marathon, not a sprint, with value to be gained along the way. In my travels as a consultant, I see organizations at varied levels of maturity, facing varied struggles. We all continue to look for high availability and outage-free days, but in some organizations the daily practices surrounding the processes can prevent success. Continual service improvement is the name of the game, and knowing where to look first can help you achieve the operational success you’re looking for. This two-part blog series addresses two common processes, Change Management and Problem Management, because of their impact on service operations. In this first part, I will address Change Management.
One of the biggest culprits is often change management, or what I sometimes call change control. So, what’s the difference between change management and change control? First, you won’t find change control in the ITIL books. I call it change control when an organization holds a CAB meeting to discuss the changes coming up in the next week or two, with no real planning before the changes are deployed. This reflects an immature stage of the process, when organizations are first getting their arms around the changes being made, just before execution, in an attempt to “control” outages. Attendees are basically announcing what’s coming, making sure changes don’t collide and that everyone knows what’s ahead. If this is your current level of maturity, you’re likely trying to control change and doing a poor job of it.
I distinguish Change Management from Change Control by the word “management.” At this level of maturity, you’re looking to achieve the value of the process by managing change, and the risks associated with change, in a way that accurately predicts risk and minimizes it. To be effective, change management needs to begin early (typically when the request to change a service is first made) and right-size the rigor to the scale and risk of the proposed change. Let’s look at these two aspects of change management.
When the change management process is engaged early for the introduction of a new or significantly changed service, it includes some level of joint design work by a cross-functional team. That team looks at the architecture, the applications and data, the infrastructure, and the service level and support requirements before the project budget is established. This work is incorporated into the design of the product before final approval and chartering of the work. In this sense, the change is managed from the beginning, ensuring the service is built appropriately for its service levels (remember Service Design?).
While the service is being built and tested, deployment planning is performed, including a risk assessment. Performing the risk assessment early provides an opportunity to mitigate risks during the service transition period of the project, ensuring a higher level of success when deployment begins. Then, when the product is ready to deploy, the early involvement of a cross-functional team makes a successful deployment much more likely.
It’s also worth right-sizing the rigor behind the change management process so that each change receives an appropriate level of review. You could even swap the word rigor for integrity. The point of managing change is to lower risk, yet change management practices sometimes force too much rigor on low-risk changes and too little on high-risk ones. In my experience, both as head of the team that managed change and as a consultant, I see two common indicators of an immature process:
- Difficulty scheduling changes (dates getting pushed week to week)
- Changes spanning a long period of time, with little management
The latter is often associated with large infrastructure maintenance or refresh projects, which by their nature require a special level of planning so that people know which device(s) will be touched each day for the duration of the change. Many organizations use a blanket change with an attached list of configuration items (CIs) and then do the work at will over a period of weeks or even months. This causes two issues. First, in the event of a failure, the Service Desk or Operations may not be able to associate the resulting outage with the blanket change being performed. Second, the CMDB will not be properly updated unless each Configuration Item is actually added to the change.
This can be resolved by building a release process for sweeping infrastructure changes. This could involve opening a parent change, authorizing it for the full time frame during which work will be performed, and then opening a child change for each group of devices that indicates the schedule and the specific CIs impacted. To plan each change of this nature appropriately, a “dress rehearsal” or deployment test could be used to see how quickly the change can be executed, enabling the coordinator to predict how many CIs can be touched each day. A fixed schedule can then be established, and the child changes that are part of the release can be recorded, each inheriting the approval of the parent change. Thus, the change is approved once, but the schedule and CI updates are managed across several recorded changes.
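The parent/child release approach described above can be sketched as a small data model. This is a minimal Python sketch, not any particular ITSM tool’s schema: `ParentChange`, `ChildChange`, `plan_children`, the `minutes_per_ci` figure taken from a dress rehearsal, and the daily maintenance window in minutes are all hypothetical names and assumptions. It splits the CI list into one child change per day based on rehearsal throughput, and each child inherits the parent’s approval only while its date falls inside the parent’s authorized window.

```python
import math
from dataclasses import dataclass, field
from datetime import date, timedelta


@dataclass
class ParentChange:
    """Hypothetical parent change, authorized once for the whole release window."""
    change_id: str
    window_start: date
    window_end: date
    approved: bool = False


@dataclass
class ChildChange:
    """Hypothetical child change covering one day's batch of CIs."""
    change_id: str
    parent: ParentChange
    scheduled_for: date
    cis: list[str] = field(default_factory=list)

    @property
    def approved(self) -> bool:
        # Inherits the parent's approval, but only inside the parent's authorized window.
        return (
            self.parent.approved
            and self.parent.window_start <= self.scheduled_for <= self.parent.window_end
        )


def cis_per_day(minutes_per_ci: float, daily_window_minutes: float) -> int:
    """Daily capacity estimated from dress-rehearsal timing (at least one CI per day)."""
    return max(1, math.floor(daily_window_minutes / minutes_per_ci))


def plan_children(parent: ParentChange, cis: list[str],
                  minutes_per_ci: float, daily_window_minutes: float) -> list[ChildChange]:
    """Split the release's CIs, in order, into one child change per consecutive day."""
    per_day = cis_per_day(minutes_per_ci, daily_window_minutes)
    children = []
    for day_index, start in enumerate(range(0, len(cis), per_day)):
        children.append(ChildChange(
            change_id=f"{parent.change_id}-{day_index + 1:02d}",
            parent=parent,
            scheduled_for=parent.window_start + timedelta(days=day_index),
            cis=cis[start:start + per_day],
        ))
    return children


if __name__ == "__main__":
    release = ParentChange("CHG-100", date(2024, 3, 1), date(2024, 3, 3), approved=True)
    servers = [f"srv-{n:02d}" for n in range(1, 21)]
    for child in plan_children(release, servers, minutes_per_ci=45, daily_window_minutes=240):
        print(child.change_id, child.scheduled_for, len(child.cis), child.approved)
```

With a 240-minute nightly window and 45 minutes per CI from the rehearsal, the sketch schedules five CIs a day. Twenty CIs against a three-day parent window produce a fourth child dated outside the window, which reports `approved` as `False`, signalling that the parent’s window needs to be extended or re-approved.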
The other issue, deployment dates becoming a moving target, sometimes occurs because the tool is configured on the assumption that the deployment date is known when the record is logged. This too is a mark of an immature process, as the process (and tool configuration) should encourage changes to be logged during the deployment planning stages. At that point, testing is still under way and the actual deployment date isn’t yet known. If the tool requires users to enter planned start and end dates when saving the change, it forces them to wait until the date is known. The tool thus drives one of two behaviors: entering changes at the last minute (as in the “change control” scenario) or picking a placeholder date and sliding it until the deployment date is settled. If you’re in this situation, removing the requirement will resolve the issue. The proper time to enforce it is when the change is submitted for approval of the deployment schedule. This allows the change to be logged during the service design stage, documentation to be collected and risk to be managed early. When the product is ready to be deployed, the final deployment date is added and the change is submitted for approval.
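One way to express the suggested tool rule is to make planned dates mandatory only when a change moves to the approval stage. The sketch below assumes a hypothetical two-state record (`Status.DRAFT` for changes logged during planning, `Status.SUBMITTED` for changes whose deployment schedule is submitted for approval); `ChangeRecord` and `validate` are illustrative names, and real tools model status and mandatory fields in their own ways.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum
from typing import Optional


class Status(Enum):
    DRAFT = "draft"            # logged early, during design / deployment planning
    SUBMITTED = "submitted"    # deployment schedule submitted for approval


@dataclass
class ChangeRecord:
    summary: str
    status: Status = Status.DRAFT
    planned_start: Optional[datetime] = None   # may stay empty while testing continues
    planned_end: Optional[datetime] = None


def validate(change: ChangeRecord) -> list[str]:
    """Return validation errors; dates are mandatory only at submission for approval."""
    errors = []
    if not change.summary.strip():
        errors.append("summary is required")
    if change.status is Status.SUBMITTED:
        if change.planned_start is None or change.planned_end is None:
            errors.append("planned start and end dates are required for approval")
        elif change.planned_end <= change.planned_start:
            errors.append("planned end must be after planned start")
    return errors
```

A draft change with no dates validates cleanly, so it can be logged while testing is still under way; the same record fails validation once its status is set to `Status.SUBMITTED`, until both dates are filled in.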
CAB windows may also cause dates to slip if they are too far in advance of the deployment. If the submission window closes too many days before the meeting or scheduled change date, the required planning documentation may not yet be ready when approval is requested. The change owner may then be forced to push the deployment out a week in order to complete the documentation. This indicates a process that lacks responsiveness and is somewhat old-fashioned, dating from a time when each change had to be manually reviewed by a change analyst to ensure it was ready to come before the CAB. With more robust tools, required fields and automated business owner approvals can cut down the manual effort, shortening the window. Add the better CAB agenda management capabilities found in some tools, and the window can be shortened further, possibly running up to the end of the business day before the CAB meeting.
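The shortest window mentioned above, the end of the business day before the CAB meeting, amounts to a simple cutoff calculation. This sketch assumes a 17:00 close of business and treats only Saturdays and Sundays as non-business days (no holiday calendar); `submission_cutoff` and `is_on_time` are hypothetical helper names.

```python
from datetime import date, datetime, time, timedelta


def previous_business_day(day: date) -> date:
    """Step back to the nearest earlier weekday (no holiday calendar is assumed)."""
    day -= timedelta(days=1)
    while day.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
        day -= timedelta(days=1)
    return day


def submission_cutoff(cab_meeting: datetime,
                      close_of_business: time = time(17, 0)) -> datetime:
    """Cutoff is close of business on the business day before the CAB meeting."""
    return datetime.combine(previous_business_day(cab_meeting.date()), close_of_business)


def is_on_time(submitted_at: datetime, cab_meeting: datetime) -> bool:
    """True if the change was submitted no later than the cutoff for this CAB meeting."""
    return submitted_at <= submission_cutoff(cab_meeting)
```

For a CAB meeting on Tuesday at 10:00, the cutoff is Monday at 17:00; for a Monday meeting, it rolls back to the preceding Friday at 17:00.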
Another solution to this issue is lowering the number of changes that require full CAB approval. As suggested in my blog post “Do we still need a CAB?”, organizations should be looking at which changes can bypass the CAB because they are low risk or fully automated deployments, making it easier to manage those that do need full CAB approval.
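A hypothetical routing rule for deciding which changes still go to the full CAB might look like the following. The impact × likelihood risk score, its 1 to 5 scales, and the `low_risk_max` threshold are assumptions for illustration only; the two bypass criteria (low risk, fully automated deployment) are the ones named above. An organization would substitute its own risk assessment.

```python
from dataclasses import dataclass


@dataclass
class ProposedChange:
    change_id: str
    impact: int          # hypothetical scale: 1 (minimal) to 5 (severe)
    likelihood: int      # hypothetical scale: 1 (rare) to 5 (almost certain) chance of failure
    fully_automated: bool = False

    @property
    def risk_score(self) -> int:
        # Simple impact x likelihood matrix score, 1..25 (an assumption, not an ITIL formula).
        return self.impact * self.likelihood


def needs_full_cab(change: ProposedChange, low_risk_max: int = 6) -> bool:
    """Low-risk or fully automated changes bypass the CAB; everything else goes to it."""
    if change.fully_automated:
        return False
    return change.risk_score > low_risk_max


def cab_agenda(changes: list[ProposedChange], low_risk_max: int = 6) -> list[str]:
    """IDs of the changes that still need full CAB approval."""
    return [c.change_id for c in changes if needs_full_cab(c, low_risk_max)]
```

Building the agenda from `cab_agenda` rather than from every logged change is what lets the CAB spend its time on the higher-risk items.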
So where do you stack up? Ask yourself a few questions:
- Do we have a significant number of service interruptions during change deployment? What percentage of incidents is caused by changes, whether authorized or unauthorized?
- Do deployment dates keep slipping?
- When a change does have an impact, was the service desk aware of the change before execution so the impact could be mitigated?
- When a “blanket” change (such as an infrastructure refresh) is being performed and there is an impact, can the Service Desk access a schedule to see whether the CI with the issue was updated the prior day or night?
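The last question assumes the Service Desk can query the schedule of child changes by CI. A minimal sketch of that lookup, assuming a hypothetical `ScheduledWork` record for each CI touched by each change and a 24-hour look-back window, is:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class ScheduledWork:
    change_id: str     # hypothetical child change ID within a blanket/refresh release
    ci: str            # configuration item touched
    start: datetime    # when work on this CI was scheduled to begin


def recent_changes_for_ci(schedule: list[ScheduledWork], ci: str, incident_at: datetime,
                          lookback: timedelta = timedelta(hours=24)) -> list[ScheduledWork]:
    """Work on the affected CI that started within the look-back window, newest first."""
    window_start = incident_at - lookback
    hits = [w for w in schedule
            if w.ci == ci and window_start <= w.start <= incident_at]
    return sorted(hits, key=lambda w: w.start, reverse=True)
```

Work scheduled after the incident is excluded, so only changes that could plausibly have caused the issue are returned to the analyst.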
Even if your process is working well, surveying both the business and the people who deploy changes about the process’s effectiveness can help you build an improvement program for it.
Read the second instalment of this series tomorrow when I look at problem management.