True Healthcare Data Integration: Leaving your Data Lake Behind

The following is a guest article by Mike Noshay, Chief Strategy & Marketing Officer, Verinovum

According to a recent report from Gartner, “U.S. Healthcare Payer CIOs Should Avoid Data Lake Mistakes with Clinical Data Integration,” by Mandi Bishop (October 2019), “Payer CIOs want to know whether data lakes can deliver quick wins for business leaders hungry to derive actionable insights from unstructured and nonstandard clinical data. To avoid drowning in data, CIOs must first specify goals and critically compare internal capabilities to vendor solutions.”

We agree with this assessment and believe the same holds true for other healthcare organizations as well. Below, we outline why the data lake method no longer holds up in today’s healthcare environment, and we offer as an alternative a four-step, interconnected framework that can help you make the most of your data.
1. Start with the End in Mind
The first challenge many organizations face when looking at a data lake is the realization that they cannot have confidence in the quality and completeness of all possible data for all possible use cases. It’s important to decide at the outset which use cases you will focus on, based on what’s most important to your stakeholders (whether that means the C-suite, care providers, or patients). Determine what data you are likely to need and for what purpose:

  • As a payer, are you focused on use cases that support population health or HEDIS measures?
  • As an ACO, are you focused on information that will help you in risk mitigation and Medicare Shared Savings Programs?
  • As a hospital, are you focused on value-based care programs, such as improving Star Ratings and reducing preventable hospital readmissions?
  • As a healthcare system, are you focused on MIPS/MACRA and use cases to improve quality of care?

The best way to set up your data thoughtfully and strategically is to think about your end goals when you first receive the data and work backwards from there.

Data that has been “loaded” or “integrated” into a data lake provides the illusion of an asset that you can use quickly and with a high degree of confidence. Many organizations start with a data lake and assume that “someone else” – a data scientist, perhaps – will sift through the information later to find what they need for any given use case. This kind of postprocessing, or late-binding data science, creates a never-ending cycle of data-quality remediation that is both costly and potentially insurmountable given organizational resource constraints.

As the previously mentioned Gartner report notes, “Many of our conversations with payer clients regarding their planned uses for clinical data begin the same way: ‘We just want to get the data and then see how it could be applied.’”

Yet we believe that those with the foresight to anticipate use cases at the outset, based on near- and long-term goals, have the highest chance for success. However, if you don’t quite know what all your use cases are at this point or haven’t narrowed them down, working through the other steps in the framework we’re recommending will get you there.

Best Practices in Action:
Knowing that data supporting diabetic care, for example, will be important for your organization is a great place to start. With that knowledge, you can identify and cleanse the data you’ll need before it is stored. As you add new use cases and shift priorities, you can apply similar cleansing tactics across both historical and future data assets. By continually applying these updates, you can be sure you’re working with the most comprehensive, complete, and clean data asset possible at any given point in time.
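As an illustration, here is a minimal Python sketch of cleansing lab data for a diabetic-care use case before storage. The record shape and field names are our own assumptions for the example; only the LOINC code 4548-4 for Hemoglobin A1c is an actual standard code.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical record shape: the field names are illustrative,
# not taken from any particular feed or product.
@dataclass
class LabResult:
    patient_id: str
    test_code: str            # e.g. a LOINC code
    value: Optional[float]
    unit: str

HBA1C_LOINC = "4548-4"        # LOINC code for Hemoglobin A1c

def cleanse_hba1c(records):
    """Keep only HbA1c results with a plausible value and a known unit,
    normalizing them before the record is stored for this use case."""
    clean = []
    for r in records:
        if r.test_code != HBA1C_LOINC or r.value is None:
            continue                          # not relevant to this use case
        if r.unit.strip().lower() not in {"%", "percent"}:
            continue                          # unknown unit: leave for later review
        if 3.0 <= r.value <= 20.0:            # plausible physiological range
            clean.append(LabResult(r.patient_id, r.test_code, r.value, "%"))
    return clean
```

The point is not the specific thresholds, which a clinical team would own, but that the cleansing rules are written down per use case and applied before the data lands in your analytic store.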

2. Retain Everything and Remain Flexible
Though you’ve started with the end in mind by realizing what use cases are currently most important to your organization, it’s still crucial to retain all the raw data, even if you don’t use it right away. The fact is, even if you don’t think the data you’re integrating up front has any value for current use cases, goals change.

In healthcare, our goals are a moving target. Retaining all the information and cleansing the data in support of specific use cases incrementally, noting what has been cleansed and what has not along the way, allows you to course-correct as needs arise. This gives you the agility and the flexibility necessary for success in an industry that changes so often and so quickly.

Best Practices in Action:
Our hypothetical payer has cleansed certain data up front to make it easy to look for data supporting diabetic care, such as HbA1c data for population health management. Nonetheless, it’s important that they retain all the data, even if it doesn’t seem necessary for that particular use case at the time. If the payer decides to shift focus down the road because risk mitigation or value-based care becomes more important, for example, different segments of data may become necessary and business-critical.
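One way to make “retain everything” concrete is to keep every raw record immutable and track cleansed copies separately, tagged by use case, so you always know what has been curated and what has not. This is a hypothetical sketch; the store, field names, and tagging scheme are assumptions, not a prescribed design.

```python
from datetime import datetime, timezone

class RecordStore:
    """Raw data is never modified or discarded; curation produces
    separate, annotated copies that link back to their source."""
    def __init__(self):
        self.raw = []        # immutable history of everything received
        self.curated = []    # cleansed copies, tagged by use case

    def ingest(self, record):
        self.raw.append(record)          # retain everything, untouched
        return len(self.raw) - 1         # raw index doubles as a provenance link

    def curate(self, raw_index, cleansed, use_case):
        self.curated.append({
            "source_index": raw_index,   # trace back to the raw original
            "use_case": use_case,
            "record": cleansed,
            "curated_at": datetime.now(timezone.utc).isoformat(),
        })

    def uncurated(self):
        """Raw records not yet cleansed for any use case."""
        done = {c["source_index"] for c in self.curated}
        return [r for i, r in enumerate(self.raw) if i not in done]
```

When priorities shift, the `uncurated` backlog tells you exactly which retained data is available to cleanse for the new use case.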

3. Don’t Pollute the Data
The most common mistake we see healthcare organizations make with data lakes is dumping in all the information—essentially polluting the data—with no sense of which data is important to drive outcomes. The problem is compounded by a desire to answer an infinite number of questions without a clear line of sight into which data has been cleansed and curated and which has not.

As the Gartner report we referred to previously says, “CIOs often invoke data lakes as a means to accelerate value realization from clinical data. While data lakes were designed to handle variable, voluminous and high-velocity data, architecture alone does not solve clinical data’s complexities — many of which are process-driven and require clinical expertise to resolve. Payer IT, analytics and informatics organizations typically do not have the combination of data science skills and clinical knowledge needed to normalize and derive meaning from blended clinical data sources. Solution partners offer this capability ‘unicorn’ as a service.”

We recommend being more thoughtful from the very beginning. That means realizing that, instead of focusing on cleansing and enriching thousands of data elements for hundreds of use cases, you should focus on the specific use cases that are most valuable to you (refer back to step 1). While it is acceptable – and indeed advisable – to “retain everything” as we’ve suggested, it’s important to store and retain information related to certain use cases in a “roped off” area of the lake reserved for cleansed data, while the other information is stored separately and retained for possible future use.
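A “roped off” area can be as simple as separate storage zones: one for raw input retained as-is, and one per use case for cleansed data. The directory layout and names below are purely illustrative assumptions, not a standard.

```python
from pathlib import Path
import json

# Illustrative zone layout: curated, use-case-ready data never
# mixes with raw, unvalidated input in the same area of the lake.
LAKE = Path("lake")
RAW_ZONE = LAKE / "raw"              # everything, exactly as received
CURATED_ZONE = LAKE / "curated"      # cleansed data, "roped off" by use case

def store_raw(name, payload):
    RAW_ZONE.mkdir(parents=True, exist_ok=True)
    (RAW_ZONE / f"{name}.json").write_text(json.dumps(payload))

def store_curated(use_case, name, payload):
    zone = CURATED_ZONE / use_case   # one roped-off area per use case
    zone.mkdir(parents=True, exist_ok=True)
    (zone / f"{name}.json").write_text(json.dumps(payload))
```

Analytics then reads only from the curated zones, while the raw zone preserves everything for future use cases.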

This strategy will provide you with a higher level of confidence when running analysis, thereby leading to better results.

Best Practices in Action:
Consider, for example, a payer looking for population health data on its members with co-morbid diabetes and heart disease. A primary use case would be to review members’ HbA1c, blood pressure, and LDL data over time. If the payer started by dumping all its unclean data into a data lake without considering the importance of this use case, essentially polluting the data, gaining actionable insights would be far more costly and less efficient. Furthermore, it is difficult, if not impossible, to develop a complete member/patient picture from data elements that are neither clearly defined nor standardized.

4. Do the Hard Work Up Front
Our last recommendation means shifting your mindset to focus on the content of your data rather than its structure or format. Doing the hard work up front does not mean employing a rigid data integration approach, which would delay time to value and limit flexibility downstream. Instead, consider solutions that can extract value from the content of available message streams without requiring the entire message to be pristine. Scoring quality and incrementally improving the actual clinical value of the information you have will help you deliver more accurate results.

Best Practices in Action:
Consider an instance where an available data stream has valuable demographic and lab information but has inaccurate or incomplete provider details. By interrogating quality and processing pieces of data streams that are important today, you can build value early and often, while providing a more manageable remediation process. The result of doing the hard work up front to generate clean and curated data now means that a complete view of a patient being treated, along with their completed HbA1c test and its results, can inform treatment plans and interventions. This affords a higher quality of care for patients in a more efficient manner. Bringing this full circle, the impact of clean, structured, and curated data manifests itself in the form of reduced chart chasing, reduced costs, and improved performance against value-based contract measures.
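As a sketch of this idea, the hypothetical scoring below rates each section of a message independently, so a stream with clean demographics and lab values can be used today even while incomplete provider details await remediation. The section and field names are our assumptions for illustration.

```python
# Required fields per message section (hypothetical schema).
REQUIRED = {
    "demographics": ["patient_id", "dob"],
    "lab": ["test_code", "value", "unit"],
    "provider": ["npi", "name"],
}

def section_scores(message):
    """Fraction of required fields present and non-empty, per section."""
    scores = {}
    for section, fields in REQUIRED.items():
        data = message.get(section, {})
        present = sum(1 for f in fields if data.get(f) not in (None, ""))
        scores[section] = present / len(fields)
    return scores

def usable_sections(message, threshold=1.0):
    """Sections complete enough to drive analytics now; the rest
    are retained and queued for remediation rather than discarded."""
    return [s for s, score in section_scores(message).items() if score >= threshold]
```

Rather than rejecting the whole message because one section is weak, you process what is reliable today and build value early and often.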

Finding the Right Partner
By thinking carefully and strategically about your data – how you store it, when you’ll need it, and what you’ll need it for – you are empowered to increase efficiencies and make better clinical and business decisions. Finding the right technology partner for data cleansing and curation can go a long way towards that goal.

The aforementioned Gartner report, which recognizes Verinovum as a Representative Vendor for clinical data integration, states, “Your business objectives are clearly aligned to CDI initiatives, and the IT organization is ready to enable clinical data for enterprise application use. However, it is critical to compare commercial vendor solutions to internal capabilities before diving into a data lake investment.”

The truth is that “data curation” means more than “data management.” The right data curation partner will have a technology platform that offers data integration, evaluation, curation, identity resolution, storage, and delivery, which leads to actionable business decisions that allow organizations to deliver on a variety of use cases.

Discover what your data can do.

About Mike Noshay

As Chief Strategy & Marketing Officer of Verinovum, Mike is responsible for all prospective client interactions, including client success and support, strategic planning, marketing, and business development. Mike has an extensive background in finance, business, and entrepreneurship. He was previously Director of Business Development & Innovation for Oklahoma’s Health Information Exchange, MyHealth Access Network, and is also a former Teach for America Corps member. Verinovum is a proud sponsor of Healthcare Scene.