There’s a groundswell of opinion throughout health care that to improve outcomes, we need to share clinical data from patients’ health records with researchers who are working on cures or just better population health measures. One recommendation in the much-studied JASON report–an object of scrutiny at the Office of the National Coordinator and throughout the field of health IT–called on the ONC to convene a conference of biomedical researchers.
At this conference, presumably, the health care industry will find out what researchers could accomplish once they had access to patient data and how EHRs would have to change to meet researchers’ needs. I decided to contact some researchers in medicine and ask them these very questions–along with the equally critical question of how research itself would have to evolve to make use of the new flood of data.
Results of the interviews suggest that systems will have to change on many levels to reap the rich harvest promised for research. Doctors and their EHRs must both tighten up their methods of collecting and recording data. Researchers need to learn what are popularly called Big Data techniques. And the computer systems that store and process the data have to adapt to the enormous sizes of “omics” data they are receiving.
Byron J. Ruth is one of the researchers already cavorting in the lush fields of EHR data. He is a Lead Analyst/Programmer in the Center for Biomedical Informatics at the Children’s Hospital of Philadelphia (CHOP). Ruth assigns to EHR vendors the primary task of data exchange, a big push by the ONC. He points out that data sharing can lead to larger cohorts–regional or national data instead of data from a single institution–and that these bigger data sets mean less bias.
For instance, there are few samples of rare conditions in any one region, but a clinical decision system can store information on diseases from far-flung areas and warn doctors of risks such as Ebola outbreaks.
John Wilbanks, who promotes the sharing and advanced processing of health care data through Sage Bionetworks, reports hearing several common objections from researchers he is trying to persuade to take EHR data into account. These are all valid, but there are ways to compensate for them.
EHR data is not specific enough (except genomic data).
EHRs contain too many errors.
EHR data is aimed at treatment and billing rather than research.
Most EHRs are still incapable of generating structured, well-coded data that is useful to researchers. The ONC has made great strides in promulgating structured data exchange standards–Blue Button for structured data and Blue Button Plus for an API–but these are only beginning to be adopted in scattered places.
Wilbanks thinks EHR data is still invaluable, because it contains hard facts such as lab reports as well as expert opinions. Statistical techniques can compensate somewhat for these weaknesses, but clinicians need workflows more conducive to accurate data collection. The single change that would most reduce errors would be to keep data in the hands of the patients. They are the ones who most often discover and fix errors.
More generally, researchers’ objections reflect the challenge of using Big Data: one has to search through a diverse, inconsistently coded, dirty agglomeration of facts and use statistical techniques to do such things as eliminate outliers and find records that share common characteristics. Data scientists with these skills are entering the health field and generating useful findings, so eventually the more traditional clinical researchers will learn these techniques or hook up with those who know them.
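To make "eliminating outliers" concrete, here is a minimal sketch of one common approach, the interquartile range (IQR) rule. It is not a method attributed to any of the researchers interviewed; the function name, the cutoff multiplier, and the sample blood pressure readings are all hypothetical and chosen only for illustration.

```python
from statistics import quantiles


def drop_outliers_iqr(values, k=1.5):
    """Keep only values within [Q1 - k*IQR, Q3 + k*IQR].

    k=1.5 is a conventional default, not a clinical standard (assumption).
    """
    q1, _, q3 = quantiles(values, n=4)  # the three quartile cut points
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if low <= v <= high]


# Hypothetical systolic blood pressure readings; 1200 is probably a
# data-entry error for 120.
readings = [118, 122, 125, 119, 121, 1200, 117, 123]
cleaned = drop_outliers_iqr(readings)
```

A rule like this only flags values for removal. Deciding whether a flagged value is a typo or a genuinely unusual patient still takes clinical judgment.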
Dr. Maxim Mikheev, CTO and co-Founder of BioDatomics, highlighted the computer networking problems created by the size of genomes. He’s glad to see repositories swell with genomic data, but they are far too large to download over the networks available to most researchers. Storage is also a problem.
Ruth encountered this problem on a project called the HeartSmart Pediatric Cardiac Genomics Consortium (PCGC). They were able to continue exchanging data by upgrading their Internet connections. But Ruth and Dr. Mikheev both recognize that a more robust solution is to keep data on the system where it was generated and bring the program to the system. The National Cancer Institute has started a Cancer Genomics Cloud Pilots project that runs three data centers hosting the genomic data and running programs uploaded by researchers.
The final hurdle to data sharing is the willingness of researchers to do so. Wilbanks is dealing with this at Sage Bionetworks on a daily basis. Ruth says it is hard to achieve even within CHOP. “One of the other challenges with any kind of data sharing among researchers is that no one really trusts anyone else,” he writes. “Basing studies on other people’s work is a relatively bold move, especially if you do not have access to the data used for that previous work.” Part of the solution, Ruth says, is to record data provenance, “which can be summed up as the who, what, where, why, and how some data came to be.”
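As a rough illustration of what recording provenance could look like, the sketch below attaches a small record to a data set using the five questions Ruth lists. The class, its field names, the added timestamp, and the example values are assumptions made for this illustration, not a description of any system used at CHOP.

```python
from dataclasses import dataclass, asdict
from datetime import datetime, timezone


@dataclass(frozen=True)
class Provenance:
    """Hypothetical provenance record: who, what, where, why, and how."""
    who: str          # person or system that produced the data
    what: str         # description of the data set
    where: str        # institution or system of origin
    why: str          # purpose the data was collected for
    how: str          # method or pipeline that generated it
    recorded_at: str  # timestamp; an addition beyond Ruth's five questions


def make_provenance(who, what, where, why, how):
    """Build a record stamped with the current UTC time."""
    stamp = datetime.now(timezone.utc).isoformat()
    return Provenance(who, what, where, why, how, stamp)


# Example values are invented for illustration only.
record = make_provenance(
    who="analyst@example.org",
    what="De-identified lab results",
    where="Example Hospital EHR export",
    why="Cohort study of a rare condition",
    how="Nightly extract, units normalized",
)
record_as_dict = asdict(record)  # ready to store alongside the data
```

Keeping a record like this next to the data lets a later researcher see where the data came from before deciding whether to build on it.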