The Burden of Structured Data: What Health Care Can Learn From the Web Experience (Part 2 of 2)

The first part of this article summarized what Web developers have done to structure data, and started to look at the barriers presented by health care. This part presents more recommendations for making structured data work.

The Grand Scheme of Things
Once you start classifying things, it’s easy to become ensnared by grandiose pipe dreams and lose yourself trying to design the perfect classification system. A good system is distinguished by recognizing its limitations. That’s why microdata on the Web succeeded. In other areas, the field of ontology is littered with the carcasses of projects that reached too far. And health care ontologies always teeter on the edge of that danger.

Let’s take an everyday classification system as an example of the limitations of ontology. We all use genealogies. Imagine being able to sift information about a family quickly, navigating from father to son and along the trail of siblings. But even historical families, such as royal ones, introduce difficulties right away. For instance, children born out of wedlock should be shown differently from legitimate heirs. Modern families present even bigger headaches. How do you represent blended families, where several parents take on different kinds of responsibility for the children, or people who donated sperm or eggs for assisted reproduction?

The human condition is a complicated one not subject to easy classification, and that naturally extends to health, which is one of the most complex human conditions. I’m sure, for instance, that the science of mosquito-borne diseases moves much faster than the ICD standard for disease. ICD itself should be replaced with something that embodies semantic meaning. But whatever replaces it, constant flexibility must be the hallmark of any ontology.

Transgender people present another enormous challenge to ontologies and EHRs. They’re a test case for every kind of variation in humanity. Their needs and status vary from person to person, with no classification suiting everybody. These needs can change over time as people make transitions. And they may simultaneously need services defined for both males and females, with the mix differing from one patient to the next.

Getting to the Point
As the very term “microdata” indicates, those who wish to expose semantic data on the Web can choose just a few items of information for that favored treatment. A movie theater may have text on its site extolling its concession stand, its seating, or its accommodations for the disabled, but these are not part of the microdata given to search engines.

A big problem in electronic health records is their insistence that certain things be filled out for every patient. Any item that is of interest for any class of patient must appear in the interface, a problem known in the data industry as a Cartesian explosion. Many observers counsel a “less is more” philosophy in response. Interestingly, a recent article that complained of “bloated records” and suggested a “less is more” approach goes on to recommend including scads of new data in the record to cover behavioral and environmental information. Without mentioning the contradiction explicitly, the authors address it by hoping that better interfaces for entering and displaying information will ease the burden on the clinician.

The various problems with ontologies that I have explained cast doubt on whether EHRs can attain such simplicity. Patients are not restaurants. To really understand what’s important about a patient, whether to guide the clinician in efficient data entry or to display salient facts to her, we’ll need systems embodying artificial intelligence. Such systems always produce false positives and negatives. They also depend on continuous learning, which means they’re never perfect. I would not like to be the patient whose data gets lost or misclassified while the algorithms are being tuned.

I do believe that some improvements in EHRs can promote the use of structured data. Doctors should be allowed to enter the data in the order and the manner they find intuitive, because that order and that manner reflect their holistic understanding of the patient. But suggestions can prompt them to save some of the data in structured format, without forcing them to break their trains of thought. Relevant data will be collected, while irrelevant fields will not be shown or stored at all.

The resulting data will be less messy than what we have in unstructured text currently, but still messy. So what? That is the nature of data. Analysts will make the best use of it they can. But structure should never get in the way of the information.

About the author

Andy Oram

Andy Oram writes and edits documents about many aspects of computing, ranging in size from blog postings to full-length books. Topics cover a wide range of computer technologies: data science and machine learning, programming languages, Web performance, Internet of Things, databases, free and open source software, and more. His editorial output at O'Reilly Media included the first books ever published commercially in the United States on Linux, the 2001 title Peer-to-Peer (frequently cited in connection with those technologies), and the 2007 title Beautiful Code. He is a regular correspondent on health IT and health policy for He also contributes to other publications about policy issues related to the Internet and about trends affecting technical innovation and its effects on society. Print publications where his work has appeared include The Economist, Communications of the ACM, Copyright World, the Journal of Information Technology & Politics, Vanguardia Dossier, and Internet Law and Business.