Not So Open: Redefining Goals for Sharing Health Data in Research

The following is a guest blog post by Andy Oram, writer and editor at O’Reilly Media.

It would be hard to come away from this month’s Health Datapalooza, the largest conference focused on using data in health care, with anything but enthusiasm for open data. The whole 2,000-strong conference unfolds from a simple premise: releasing data publicly can lead to wonderful things, like discovering new cancer drugs or intervening with patients before they have to go to the emergency room.

But look more closely at the health care field, and open data is far from the norm. The demonstrated benefits of open data sets in other fields–they permit innovation from any corner and are easy to combine or “mash up” to uncover new relationships–may turn into risks in health care. There may be better ways to share data.

Let’s momentarily leave the heady atmosphere of the Datapalooza and take the subway a few stops downtown to the Health Privacy Summit, where fine points of patient consent, deidentification, and the data map of health information exchange were discussed the following day. Participants there agreed that highly sensitive information is traveling far and wide for marketing purposes, and perhaps for more nefarious uses: uncovering patients’ secrets and discriminating against them.

In addition to outright breaches–which seem to be reported at least once a week now, and can involve thousands of patients in one fell swoop–data is shared in many ways that arguably should be up to patients to decide. It flows from hospitals, doctors, and pharmacies to health information exchanges, researchers in both academia and business, marketers, and others.

Debate has raged for years between those who trust deidentification and those who claim that reidentification is too easy. This is not an arcane technicality: the whole analytics industry represented at the Datapalooza rests on the outcome. Those who defend deidentification tend to be researchers in health care and the institutions that use their results. In contrast, many computer scientists outside the health care field cite instances where people have been reidentified, usually by combining data from various public sources.

Latanya Sweeney of Harvard and MIT, who won a privacy award at this year’s summit, can be credited both with the historic 1997 reidentification of the medical records of Massachusetts Governor William Weld and with a more recent exposé of state practices. The first piece of research led to the current HIPAA deidentification regime, while the second showed that states had not learned the lessons of anonymization. Defenders of deidentification counter that no successful reidentifications have been reported against data sets that use the recommended techniques.
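To see what the computer scientists worry about, here is a toy sketch of the kind of linkage attack Sweeney pioneered. The data is entirely invented and pandas is just one convenient way to express the join; the point is only that a “deidentified” release plus a public record can be enough.

```python
# Toy illustration with invented data, not any real data set.
# A "deidentified" release can be linked to a public record (such as a
# voter roll) on quasi-identifiers like ZIP code, birth date, and sex.
import pandas as pd

# Hospital discharges with names and other direct identifiers removed
discharges = pd.DataFrame([
    {"zip": "02138", "birth_date": "1948-02-11", "sex": "M", "diagnosis": "hypertension"},
    {"zip": "02139", "birth_date": "1962-03-14", "sex": "F", "diagnosis": "asthma"},
])

# Publicly purchasable voter registration list, which still carries names
voters = pd.DataFrame([
    {"name": "Pat Example", "zip": "02138", "birth_date": "1948-02-11", "sex": "M"},
    {"name": "Lee Sample",  "zip": "02139", "birth_date": "1962-03-14", "sex": "F"},
])

# Joining on the shared quasi-identifiers reattaches names to diagnoses
reidentified = discharges.merge(voters, on=["zip", "birth_date", "sex"])
print(reidentified[["name", "diagnosis"]])
```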

I am somewhat perplexed by the disagreement, but have concluded that it cannot be resolved on technical grounds. Those who look at the current state of reidentification are satisfied that health data can be secured; those who look toward an unspecified future of improved algorithms find reasons to worry. In that respect, deidentification is like encryption: we all use encryption even though we expect that future computers will be able to break current techniques. In a summit lunchtime keynote, Adam Tanner reported his own efforts as a non-expert to identify people online, a fascinating and sometimes amusing tale he has written up in a new book, What Stays in Vegas.

But another approach has risen from the ashes left by the “privacy is dead” naysayers: regulating the use of data instead of its collection and dissemination. The idea has been around for years and surfaced most recently in a federal PCAST report on big data privacy. One of that report’s authors, Craig Mundie of Microsoft, also made the argument in an article in the March/April issue of Foreign Affairs.

A simple application of this doctrine in health care is the Genetic Information Nondiscrimination Act of 2008. A more nuanced interpretation of the doctrine could let each individual determine who gets to use his or her data, and for what purpose.

Several proposals have been aired to make it easier for patients to grant blanket permission for certain data uses; one, in a recent report commissioned by AHRQ, calls for “patient privacy bundles.” Many people look forward to economies of data in which patients can make money by selling their data (how much is my blood pressure reading worth to you?).

Medyear treats personal health data like Twitter feeds, letting you control the dissemination of individual data fields through hashtags. You could choose to share certain data with your family, some with your professional care team, and some with members of your patient advocacy network. This offers an alternative to services such as PatientsLikeMe, which use participants’ data behind the scenes.
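As a rough illustration of that model (a generic sketch, not Medyear’s actual API or data format), imagine each data point carrying tags, with the patient granting each audience access only to certain tags:

```python
# Generic sketch of field-level sharing controlled by tags.
# All names and tags here are hypothetical.
from dataclasses import dataclass, field

@dataclass
class DataPoint:
    name: str
    value: str
    tags: set = field(default_factory=set)

record = [
    DataPoint("blood_pressure", "120/80", {"#vitals"}),
    DataPoint("mood_journal", "slept poorly", {"#personal"}),
    DataPoint("a1c", "5.6%", {"#vitals", "#endocrine"}),
]

# The patient decides which tags each audience may read
grants = {
    "family": {"#personal"},
    "care_team": {"#vitals", "#endocrine"},
}

def visible_to(audience: str):
    allowed = grants.get(audience, set())
    return [dp for dp in record if dp.tags & allowed]

print([dp.name for dp in visible_to("care_team")])  # ['blood_pressure', 'a1c']
```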

Open data can also be simulated by semi-open data sets that researchers use under license, as with the Genetic Association Information Network data distributed through the Database of Genotypes and Phenotypes (dbGaP). Many CMS data sets are likewise not fully open but require a license to use.

And many data owners create relationships with third-party developers that allow them access to data. Thus, the More Disruption Please program run by athenahealth allows third-party developers to write apps accessing patient data through an API, once the developers sign a nondisclosure agreement and a Code of Conduct promising to use the data for legitimate purposes and respect privacy. These apps can then be offered to athenahealth’s clinician clients to extend the system’s capabilities.

Some speakers at the Datapalooza went even further, asking whether raw data needs to be shared at all. Adriana Lukas of London Quantified Self and Stephen Friend of Sage Bionetworks suggested that patients hold on to all their data and share just the “meanings” or “methods” they’ve found useful. The future of health analytics, it seems to me, will rest on relatively few open data sets and on a great deal of data obtained through patient consent or under license.
