Understanding Personal Health Data: Not All Bits Are the Same (Part 2 of 4, Personal Data and Media Content)

The previous segment of this article introduced the notion that many types of data on the Internet, including personal health data, come entangled with constraints on how we can store, share, and use it. I’ll examine two more types of data–personal data and media content–in this article, and government information in the next.

Personal Data

The photos, status updates, hotel reviews, and other personal postings we upload daily constitute a huge repository of data, along with a huge market. This section talks about the melange of information that determined seekers can find about us online: usually things we voluntarily offer through Facebook, Instagram, etc., but also things that others say about us and “data exhaust” generated by our purchases and other activity that companies and governments track. When we go online, we tend to present the sides of ourselves we would like others to know about–but we don’t always understand what we’re revealing about our predilections, prejudices, and drives.

A 2012 McKinsey report suggests that social technologies offer anywhere from $900 billion to $1.3 trillion in annual value — and that’s just counting four industries (page 9 of the report).

So our personal data clearly has value. However, there are qualifications to this value. The problem is that no one is tasked with making sure the information is correct. People enter lies and distorted versions of their life events to social networks all the time. Marketers and other data-slurping companies hope that the inaccuracies work themselves out during big-data processing. But that assumes that the truth lies in there somewhere (a dubious proposition) and that sophisticated data mining techniques can eliminate inaccurate outliers.

Ownership is a curious and fascinating question for personal data. Do you “own” the data item indicating that you just purchased a shirt from Everlane? Proponents of vendor relationship management would say yes. These Internet reformers would like consumers to be in charge of the data related to their transactions, and would like companies that want to use such data for marketing or planning to pay customers. Others would argue that Everlane has just as much a right to the data as you do — you are both parties to a transaction.

As I have indicated elsewhere, ownership is a slippery concept, even for data you generate yourself. When I take photos of friends, they often ask me not to post the pictures to Facebook. I respect this, treating them as owners of their digital images. It’s interesting, incidentally, that this question of intrusive photo-taking underlies the seminal work on privacy: the 1890 Harvard Law Review article by Warren and Brandeis.

Currently, ownership is something of a Wild West where anyone who gets your personal data can use it, unless you have explicitly put it under license. So protection — the third trait of Internet data I address throughout this article — is weak and oft trampled on in personal data. I think we all want to protect personal health data from this situation, a theme I’ll return to when we get to that section of the article.

Media Content

Because I work for a publisher — and one particularly prescient in its adaptation to the wired world — I have participated in many discussions of media content. I’m talking here of things that aren’t just thrown on the open Internet, like articles on this Radar blog, but are hidden behind walls that you can enter only after paying, or at least by entering an email address and some personal information such as the size and industry of your company. Your email address is tremendously useful to the company providing the content, whether they use it to shove ads at you, sell information to vendors, or determine what future content to produce.

Is media content valuable? Certainly it is, thanks to the years of expertise and hours of effort invested by those who created and curated it. Note that in the previous section, I cited a McKinsey report. I didn’t spend hours vetting the report or checking McKinsey’s credentials. I relied on their reputation as a key source of information in the tech industry — an example of the value created by trusted content sites.

This confirms Stewart Brand’s famous dictum that information wants to be expensive. That’s why many people spend good money to access news sites and online books, and other people go to great efforts to get that content for free.

The question of ownership is resolved by copyright law, but in ways that are not entirely compatible with the Internet. For instance, many researchers would love to share their papers with all who want them, but the publishers usually own the content and place restrictions on such sharing. Luckily, many academic publishers now allow authors to place early pre-publication drafts online for free download. I can locate a free copy of most research articles by entering the title and author names into a search engine.

Indeed, when we talk about “owning” data, we fall into a trap prepared by large corporate interests who depend upon notions of Intellectual Property to maintain their income flows. I am not opposed to the exercise of copyrights, patents, and trademarks, but I worry about the extension of these carefully defined concepts to a larger context where casual references to property and (as a consequence) ownership are at best unhelpful and at worst meaningless.

Protection is also a controversial topic here. Many publishers (but definitely not my company, O’Reilly Media) make extraordinary efforts to protect data, notably through digital rights management, which I cover in other articles. It’s striking that no laws restrict you from downloading software from the Internet to make a gun, but severe laws punish not just downloading copyrighted content, but offering tools that let people break the digital rights management on that content.

Further segments of this article will continue to explore Internet information and its meaning for the health care field.

About the author

Andy Oram


Andy Oram writes and edits documents about many aspects of computing, ranging in size from blog postings to full-length books. Topics cover a wide range of computer technologies: data science and machine learning, programming languages, Web performance, Internet of Things, databases, free and open source software, and more. His editorial output at O'Reilly Media included the first books ever published commercially in the United States on Linux, the 2001 title Peer-to-Peer (frequently cited in connection with those technologies), and the 2007 title Beautiful Code. He is a regular correspondent on health IT and health policy for HealthcareScene.com. He also contributes to other publications about policy issues related to the Internet and about trends affecting technical innovation and its effects on society. Print publications where his work has appeared include The Economist, Communications of the ACM, Copyright World, the Journal of Information Technology & Politics, Vanguardia Dossier, and Internet Law and Business.

