Use Cases and Project Plans are not Enough – A CIO’s guide to Failure Mode and Effects Analysis

When many of us started in healthcare IT, it was before the days of pervasive Electronic Medical Records.  The largest employer in Kansas City wasn’t Cerner.  Paper ruled the day in hospitals.  Data Processing was used to print up massive reports so that residents and doctors could accomplish their daily tasks.  PACS systems weren’t pervasive yet.  There were far fewer dependencies on electronic systems to get through the day.

With the numerous technical epochs that have occurred in rapid succession since the days of paper, the complexity of the technology that now powers a hospital or health system has increased exponentially.  With this increase comes a corresponding increase in the risk of failure.  With every change, we increase the probability of something failing.

Many of us have developed our careers in an environment where failure is not an option.  Adverse events during system implementations have limited many careers.  One item has come up repeatedly when I do post-mortem analyses: we are dealing with systems of such complexity that failure at some point is inevitable, and we have to plan for it.

The old management trope of penalizing people for project management failures is going to continue to drive good people away from Healthcare IT.  We are going to leave in its wake people so averse to taking risk that we will continue to stagnate and not innovate.  We will continue to have security issues because we have managers unwilling to risk a potential failure when applying patches or hardening systems.

We need to learn how to innovate and deal with failure.  We don’t need to outsource our innovation to our EMR providers or vendors.  That’s risk transference.  If we’re going to provide unique and innovative care, we need to own that risk, not pass the buck.

This starts with planning for how to deal with failure.

What is FMEA?

According to the Institute for Healthcare Improvement (IHI), Failure Modes and Effects Analysis (FMEA) is a systematic, proactive method for evaluating a process to identify where and how it might fail, and to assess the relative impact of different failures, in order to identify the parts of the process that are most in need of change.

This includes a review of the steps in the process, Failure Modes (what can go wrong?), Failure Causes (why would this happen?), and Failure Effects (what would be the consequences of each one?).  When I have practiced this before, we also added staffing dependencies, system dependencies, and resultant actions.

We added these to make sure that, given the actions taken, we were appropriately staffed to address what would occur, especially if there was a failure.  We also wanted to make sure that we had good alternate mode actions in place when there was an issue.  The FMEA also made sure that we were able to stop actions that, had they continued in parallel after an adverse event, would have made it worse.  It also served as a communication tool that helped explain in plain English what we were doing to our affected customers, many of whom did not have a technology background.  Their jobs were now heavily dependent on these technologies, however, and their expertise in Alternate Mode Operations was heavily needed.
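
To make the worksheet concrete, here is a minimal sketch in Python of one FMEA row with the columns described above.  The class name, field names, and sample rows are hypothetical illustrations, not a record of the worksheet we used.  The 1-to-10 scoring and the Risk Priority Number (occurrence x detection x severity) follow the convention used in the IHI FMEA tool; the article itself does not prescribe a scoring scheme, so treat that part as an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class FailureMode:
    """One row of a hypothetical FMEA worksheet."""
    step: str              # process step under review
    failure_mode: str      # what can go wrong?
    cause: str             # why would this happen?
    effect: str            # what would be the consequences?
    occurrence: int        # 1 (remote) to 10 (almost certain)
    detection: int         # 1 (always caught) to 10 (never caught)
    severity: int          # 1 (negligible) to 10 (catastrophic)
    staffing: list[str] = field(default_factory=list)  # who must be on hand
    systems: list[str] = field(default_factory=list)   # dependent systems
    resultant_action: str = ""                         # alternate mode / recovery

    def __post_init__(self):
        # Reject scores outside the assumed 1-10 scale.
        for name in ("occurrence", "detection", "severity"):
            value = getattr(self, name)
            if not 1 <= value <= 10:
                raise ValueError(f"{name} must be between 1 and 10, got {value}")

    def rpn(self) -> int:
        """Risk Priority Number: a higher value means more in need of change."""
        return self.occurrence * self.detection * self.severity


def rank(rows):
    """Return rows ordered from highest to lowest RPN."""
    return sorted(rows, key=lambda row: row.rpn(), reverse=True)


# Illustrative rows for an imagined EMR upgrade (made-up values).
rows = [
    FailureMode(
        step="Restart interface engine",
        failure_mode="Lab results queue does not drain",
        cause="Stale connection to the lab system",
        effect="Clinicians do not see new results",
        occurrence=5, detection=2, severity=6,
        staffing=["Interface analyst"],
        systems=["Lab system"],
        resultant_action="Switch lab to printed results",
    ),
    FailureMode(
        step="Migrate medication orders",
        failure_mode="Active orders missing after cutover",
        cause="Mapping error in conversion script",
        effect="Doses delayed or missed",
        occurrence=3, detection=4, severity=9,
        staffing=["Pharmacy informaticist", "Nursing supervisor"],
        systems=["Pharmacy system"],
        resultant_action="Reconcile against pre-cutover paper MAR",
    ),
]

ranked = rank(rows)
for row in ranked:
    print(row.rpn(), row.step, "->", row.resultant_action)
```

Sorting by RPN is only one way to decide which rows to discuss first.  The staffing and resultant action columns are what turn the ranking into something the clinical and operational stakeholders can act on.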

I first learned of FMEA from the Clinical Risk Management team I worked with at Temple University Hospital, headed by Charles Conklin.  The driving reason for wanting to implement this process was to plan for service recovery in case of an adverse event during a system upgrade.  It was also the same process the clinical teams used.  If we want to be respected by our customers, we need to use the same processes they do to manage change.  Information Services is no different.

Why did we do this?

We did this because project plans are overly optimistic.  If anything, they represent a best-case scenario that doesn’t reflect actual human nature.  They also manifest a philosophy of having to stick with a plan, with no ability to deal with adversity.  We need to be able to think through a given plan, understand the complexities, and learn how to plan for when it goes awry.  We also have to plan to be able to accept and address a reasonable degree of residual risk, especially with distributed technologies.

Any large IT system is simply too large for one team to manage effectively, and no one team can assume that everything is going to be OK when you upgrade.  There are multiple teams that support an EMR, Patient Portal, or Population Health system.  There are exponentially more stakeholders than 20 years ago.  We need to include these teams in the planning process, and plan for issues with dependent systems not under the direct control of Information Services.  The days of the CIO or CMIO having control of all of the dependent bits of a major system upgrade are over and have been for years.

Implications for the New Era

As we start implementing new technologies such as intelligent systems (Artificial Intelligence, Machine Learning, Deep Learning), Internet of Medical Things (IoMT) devices, 5G, Robotic Process Automation, and Smart Hospitals, we need to change how we manage change with them.  COVID-19 and the mass implementation of Telemedicine and Telework, combined with massive cybersecurity threats, also demonstrate the need to change.  Finally, the 21st Century Cures Act Final Rule and its implications for patient access to data also require change in how we manage service delivery.

We’re not always going to be able to update 100% of devices.  We need to plan for imperfections, lack of communication, staggered updates, and errors in systems we do not control.  We have to accept residual risk.  We also need to ensure that upgrades and changes protect the integrity of data and systems.  We need to include service recovery as part of the plan.  Without FMEA, we’re not going to know where to include it.

With greater complexity comes the need to evolve.  Instead of being the head of the group that manages technology, CIOs have to adapt to become the head of the group that leads technology change as part of the overall strategic mission of the organization.  The actual technical implementations have been migrating to the cloud for years.  While there are still teams focused on that, the main focus of many IS departments is on configuration and usage of applications and services to meet the service delivery and strategic needs of the organization.

Part of leading change is understanding that the only law that applies is Murphy’s.  We need to develop initial project plans, run them through FMEA with our stakeholders, and then modify project and staffing plans.  We need to accommodate issues discovered in the process and include service recovery steps with appropriate staffing and communication plans.
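
The loop described above (draft the plan, run it through FMEA, then revise the plan and staffing) can be sketched in a few lines of Python.  This is an illustration under assumed names: `PlanStep`, `Finding`, and `revise_plan` are hypothetical and not part of any project management tool, and the sample steps are made up.

```python
from dataclasses import dataclass


@dataclass
class PlanStep:
    name: str
    staff: list[str]


@dataclass
class Finding:
    """A failure mode that stakeholders agreed needs a recovery step."""
    step_name: str             # plan step the failure mode belongs to
    recovery_action: str       # service recovery step to add
    recovery_staff: list[str]  # who must be available to carry it out


def revise_plan(plan, findings):
    """Return a new plan with a recovery step after each step that has findings."""
    by_step = {}
    for finding in findings:
        by_step.setdefault(finding.step_name, []).append(finding)

    revised = []
    for step in plan:
        revised.append(step)
        for finding in by_step.get(step.name, []):
            revised.append(
                PlanStep(
                    name=f"Recovery if '{step.name}' fails: {finding.recovery_action}",
                    staff=finding.recovery_staff,
                )
            )
    return revised


# Made-up initial plan and one FMEA finding.
initial_plan = [
    PlanStep("Take portal offline", ["Web team"]),
    PlanStep("Apply upgrade", ["Application team"]),
]
fmea_findings = [
    Finding("Apply upgrade", "Roll back to last snapshot", ["DBA", "Service desk"]),
]

revised_plan = revise_plan(initial_plan, fmea_findings)
for step in revised_plan:
    print(step.name, "|", ", ".join(step.staff))
```

The point of the sketch is that the recovery step and its staffing become first-class entries in the plan rather than something improvised after the failure.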


The world is rapidly changing around us.  The primary role of the CIO has changed from technology maintainer to strategic change manager.  The complexity of systems has exponentially increased over the past two decades, and will now include patient-facing data interfaces.  We need to plan to address diverse changes strategically.  We need to include stakeholders who have not been included previously.  The cybersecurity threats we face underscore the burning platform we are on, and demand we take action.

Most important, we need to plan to fail.  The old methods of demanding linear plans and expecting perfection don’t work in these new environments, especially given our current threat landscape.  These are Sisyphean goals that no one will meet.  Instead of causing disengagement by expecting our teams to be perfect, we need to build their resilience by teaching them to look for what can and will go wrong, and how to deal with those occurrences.

We will develop better leaders who can adapt to a rapidly changing landscape by incorporating Failure Mode and Effects Analysis into our management processes.  We don’t know what the next twenty years will bring in Healthcare IT.  However, with FMEA we can plan for them better than we did for the past twenty.