The Burden of Structured Data: What Health Care Can Learn From the Web Experience (Part 1 of 2)

Most innovations in electronic health records, notably those tied to the Precision Medicine Initiative that has recently raised so many expectations, work by moving clinical information into some kind of structure. This might be a classification system such as ICD, or a specific record such as “medications” or “lab results” with fixed units and lists of names to choose from. There’s no arguing against the benefits of structured data, but its costs are high as well, so we should avoid repeating old mistakes. Experiences drawn from the Web may have something to teach the health care field with respect to structured data.

What Works on the Web
The Web grew out of a structured data initiative. The dream of organizing information goes back decades, and was embodied in Standard Generalized Markup Language (SGML) years before Tim Berners-Lee stole its general syntax to create HTML and present information on the Web. SGML let a firm mark in its documents that FR927 was a part number whereas SG1 was a building: authors could define any tags that met their fancy. This put semantics into documents. In other words, the meaning of text could be abstracted from the text and presented explicitly. HTML stripped out that flexibility, offering only a fixed set of tags. XML, a streamlined descendant of SGML, reintroduced SGML’s semantic goals and was even proposed as the basis for a successor to HTML (XHTML), but as a way to mark up text on the Web it found only niche uses. JSON, a later format for structured data, was used for data storage and exchange, not text markup.
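
To make the idea of author-defined semantics concrete, here is a minimal Python sketch of how a program can read meaning straight out of the tags. It uses XML rather than SGML, because Python’s standard library (xml.etree.ElementTree) parses XML but not SGML. The tag names part and building, the sample memo, and the extract_semantics helper are all hypothetical, invented for this illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the tag names <part> and <building> are invented
# for illustration, the way an SGML author could define tags at will.
DOC = """<memo>
  Ship <part>FR927</part> to the loading dock in <building>SG1</building>.
</memo>"""

def extract_semantics(xml_text):
    """Return (tag, text) pairs for every element nested inside the root.

    The tag name carries the meaning ("this is a part number"), so a
    program can act on it without interpreting the surrounding prose.
    """
    root = ET.fromstring(xml_text)
    # root.iter() yields the root itself first, so skip it.
    return [(elem.tag, elem.text) for elem in root.iter() if elem is not root]

for tag, value in extract_semantics(DOC):
    print(f"{value} is a {tag}")
```

The point is that "FR927" and "SG1" are just strings to a reader skimming the memo, but the markup tells software which one is a part and which one is a building.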

Since the Web got popular, people have been trying to reintroduce semantics into it. There was Dublin Core, then RDF, then microdata and the vocabularies published at schema.org, to list just a few. Two terms denoting structured data on the Web, the Semantic Web and Linked Data, have been enthusiastically taken up by the World Wide Web Consortium and Tim Berners-Lee himself.

But none of these structured data initiatives are widely known among the Web-browsing public, probably because they all take a lot of work to implement. Furthermore, they run into the bootstrapping problem faced by nearly all standards: few sites bother with semantics that browsers ignore, and browsers have little reason to support semantics that few sites use. If your web site uses semantics that the browser doesn’t recognize, they’re just dropped on the ground (or, even worse, the browser mangles your web pages).

Even so, recent years have seen an important form of structured data take off. When you look up a movie or restaurant on a major search engine such as Google, Yahoo!, or Bing, you’ll see a summary of the information most people want to see: local showtimes for the movie, phone number and ratings for a restaurant, etc. This is highly useful (particularly on mobile devices) and can save you the trouble of visiting the web site from which the data comes. Google calls these summaries Rich Cards and Rich Snippets.
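
The data behind such summaries is typically supplied by the site itself, embedded in the page as machine-readable markup. As a minimal sketch, assuming the schema.org vocabulary expressed as JSON-LD (one of the formats the major search engines accept), a restaurant might describe itself as shown below. The restaurant details and the helper names restaurant_jsonld and as_script_tag are hypothetical, and which fields a given search engine actually displays is up to that engine.

```python
import json

def restaurant_jsonld(name, telephone, rating, review_count):
    """Build a schema.org Restaurant description as a JSON-LD dict."""
    return {
        "@context": "https://schema.org",
        "@type": "Restaurant",
        "name": name,
        "telephone": telephone,
        "aggregateRating": {
            "@type": "AggregateRating",
            "ratingValue": rating,
            "reviewCount": review_count,
        },
    }

def as_script_tag(data):
    """Wrap the JSON-LD in the <script> element a page would embed in its HTML."""
    # Escape "</" so a value containing "</script>" cannot close the tag early;
    # "\/" is a legal JSON escape that decodes back to "/".
    body = json.dumps(data, indent=2).replace("</", "<\\/")
    return '<script type="application/ld+json">\n' + body + "\n</script>"

# Hypothetical restaurant details, for illustration only.
snippet = as_script_tag(restaurant_jsonld("Example Bistro", "+1-555-0100", 4.5, 212))
print(snippet)
```

Nothing in this block changes how the page looks to a human visitor; it exists purely so that a search engine can lift the name, phone number, and rating into its summary.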

If my memory serves me right, the basis for these snippets didn’t come from standards committees involving years of negotiation among stakeholders. Google just decided what would be valuable to its users and laid out the standard. It got adopted because it was a win-win. The movie theaters and restaurants got their information right into the viewer’s face, and the search engine became instantly more valuable and more likely to be used again. The visitors doing the search obviously benefited too. Site owners found it well worth their time to implement the standard.

Interestingly, as structure moves into metadata, HTML itself is getting less semantic. The most recent standard, HTML5, did add a few modest tags such as header and footer. But many sites are replacing meaningful HTML markup, such as p for paragraph, with two ultra-generic tags: div for a division that is set off from other parts of the page, and span for a piece of text embedded within another. Formatting is expressed through CSS, a separate language.

Having reviewed a bit of Web history, let’s see what we can learn from it and apply to health care.

Make the Customer Happy
Win-win is the key to getting a standard adopted. If your clinician doesn’t see any benefit from the use of structured data, she will carp and bristle at any attempt to get her to enter it. One of the big reasons electronic health records are so notoriously hard to use is “all those fields to fill out.” And while lists of medications or other structured data can help the doctor choose the right item, they can also make it easy for her to enter serious errors: perhaps she chose the entry next to the one she meant, or the one she really wanted isn’t offered on the list.

Doctors’ resentment gets directed against every institution implicated in the structured data explosion: the ONC and CMS, which demand quality data and other fields of information for their own inscrutable purposes; the vendor that designs the clunky system; and the hospital or clinic that forces doctors to use it. But the Web experience suggests that doctors would willingly fill out fields that help them do their jobs. The use of structured data should be negotiated, not dictated, just like other innovations such as hand-washing protocols or checklists. Is it such a radical notion to put technology at the service of the people using it?

I know it’s frustrating to offer that perspective, because many great things come from collecting data for analytics, which can turn up unexpected insights. If we fill out all those fields, maybe we’ll find a new cure! But the promised benefit is too far off and too speculative to justify the hourly drag upon the doctor’s time.

We can fall back on the other hope for EHR improvement: an interface that makes data entry so easy that doctors don’t mind using structured fields. I have some caveats to offer about that dream, which will appear in the second part of this article.

About the author

Andy Oram

Andy Oram writes and edits documents about many aspects of computing, ranging in size from blog postings to full-length books. Topics cover a wide range of computer technologies: data science and machine learning, programming languages, Web performance, Internet of Things, databases, free and open source software, and more. His editorial output at O'Reilly Media included the first books ever published commercially in the United States on Linux, the 2001 title Peer-to-Peer (frequently cited in connection with those technologies), and the 2007 title Beautiful Code. He is a regular correspondent on health IT and health policy for HealthcareScene.com. He also contributes to other publications about policy issues related to the Internet and about trends affecting technical innovation and its effects on society. Print publications where his work has appeared include The Economist, Communications of the ACM, Copyright World, the Journal of Information Technology & Politics, Vanguardia Dossier, and Internet Law and Business.