How Much Patient Data Do We Truly Need?

As the demands placed on healthcare data increase, the drive to manage it effectively has grown as well. This has led to the collection of mammoth quantities of data — one trade group estimates that U.S. hospitals will manage 665 terabytes of data in 2015 alone — but not necessarily to better information.

The assumption that we need to capture most, if not all, of a patient’s care history digitally is clearly driving this data accumulation process. As care moves into the digital realm, the volume of data generated by the healthcare industry is climbing 48% per year, according to one estimate. I can only assume that the rate of increase will grow as providers incorporate data feeds from mHealth apps, remote monitoring devices and wearables, the integration of which is not far in the future.

The thing is, most of the healthcare big data discussions I’ve followed assume that providers must manage, winnow and leverage all of this data. Few, if any, influencers seem to be considering the possibility that we need to set limits on what we manage, much less to be developing criteria for screening out needless data points.

As we all know, not all data is created equal. One conversation I had with a physician back in the early 1990s makes the point perfectly. At the time, I asked him whether he felt it would be helpful to put a patient’s entire medical history online someday, a distant but still imaginable possibility at the time. “I don’t know what we should keep,” he said. “But I know I don’t need to know what a patient’s temperature was 20 years ago.”

On the other hand, providers may not have access to all of the data they need either. According to research by EMC, while healthcare organizations typically import three years of legacy data into a new EMR, many other pertinent records are not available. Given the persistence of paper, poor integration of clinical systems and other challenges, only 25% of relevant data may be readily available, the vendor reports.

Because this problem (arguably) gets too little attention, providers grappling with it are being forced to set their own standards. Should hospitals and clinics expand that three years of legacy data integration to five years? Ten years? The patient’s entire lifetime? And how should institutions decide? To my knowledge, there’s still no clear-cut way to make such calls.

But developing best practices for data integration is critical. Given the costs of managing needless patient data — which may include sub-optimal outcomes due to data fog — we need guidelines for setting limits on clinical data accumulation. While failing to collect relevant patient data has consequences, turning big data into astronomically big data does as well.

By all means, let’s keep our eye on how to leverage new patient-centric data sources like wearable health trackers. It seems clear that such data has a role in stepping up patient care, at least once we understand which part of it is wheat and which is chaff.

That being said, continuing to amass data at exponential rates is unsustainable and, ultimately, harmful. Sometimes, setting limits is the only way to be sure that what remains is valuable.

About the author

Anne Zieger

Anne Zieger is a healthcare journalist who has written about the industry for 30 years. Her work has appeared in all of the leading healthcare industry publications, and she's served as editor in chief of several healthcare B2B sites.


  • You have to watch out for the railroad analyst who can tell you the number of ties between New York and Chicago but not when to sell Penn Central.


  • Your article presents an interesting question about data management and visualization. First off, the amount of digital data we gather on patients will only increase; with machine learning and device interfaces, we will accumulate more data, not less, so putting a limit on the quantity of data to be accumulated would be neither feasible nor desirable.
    However, the amount of data presented to a clinician for day-to-day decision making in the care of a patient should be carefully managed. I agree that a physician would most likely not be interested in a patient’s temperature from 15 years ago during an office visit for a broken arm, but a research clinician might go back 15 years to look for, say, a long-term correlation between body temperature and falls.
    The cost of storing data keeps falling; it is the cost of making sense of all that data that is rising. To save clinicians’ time and increase effective use of data, organizations will have to decide which key data points to show and how far back the EMR system should look for them.
    Using some meaningful use core measures is a good start: for instance, showing all the medications the patient is currently taking, with an option to check a box and see all the medications the patient has been on (for the last five years, say), along with drill-down information on when each medication was prescribed and discontinued. That is very helpful when clinicians wish to dig deeper.
    When decisions must be made about overall populations, accumulated data going back several years across many patients gives clinicians more information that can be analyzed (by data scientists) and appropriately presented, allowing them to make the best decisions the data supports.
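    The medication drill-down described in the comment above amounts to a simple look-back filter over dated records. The sketch below is a minimal illustration only, not a real EMR interface; the record layout, drug names, function names and the five-year window are all assumptions made for the example:

    ```python
    from datetime import date, timedelta

    # Hypothetical medication records: (name, start_date, end_date).
    # end_date of None means the patient is still taking the drug.
    medications = [
        ("lisinopril", date(2021, 3, 1), None),
        ("amoxicillin", date(2019, 6, 10), date(2019, 6, 24)),
        ("atorvastatin", date(2012, 1, 5), date(2013, 2, 1)),
    ]

    def current_medications(meds):
        """Default view: only medications the patient is still taking."""
        return [m for m in meds if m[2] is None]

    def medication_history(meds, years=5, today=None):
        """Drill-down view: anything active within the look-back window."""
        today = today or date.today()
        cutoff = today - timedelta(days=365 * years)
        return [m for m in meds if m[2] is None or m[2] >= cutoff]
    ```

    The same cutoff logic would apply to any record type; the open question the article raises — how large `years` should be — is a clinical and institutional decision, not a technical one.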
