Mayo Developing Tools To Extract Medical Data From All EMRs

July 17, 2011

Anne Zieger

Here’s some interesting and potentially important news: according to recent reports, Mayo Clinic investigators are putting the finishing touches on a suite of tools that can identify and sort medical data contained in any electronic medical record.

Mayo investigators are working under a federal grant, the $60 million Strategic Health IT Advanced Research Projects (SHARP) program, which is funded by the ONC.

According to a piece in Government HealthIT, the researchers have used natural language processing tools to isolate health data from about 30 digital medical records of patients with diabetes. So far, so good. When the extracted data is run through specialized systems developed with IBM’s Watson Research Center, the 30 patient records “explode” into 134 *billion* individual pieces of information, Government HealthIT reports.

Unfortunately, none of my sources explains what specific data pieces make up this total, which sounds extremely high to me. If we’re talking about just 30 patients, it’s hard for me to imagine that mundane details of care represent even multiple thousands of data points, unless you’re dealing with decades of care. (Perhaps the total includes the coding needed to extract the data. Readers, can you clarify this for me?)
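For a sense of scale, here is a quick back-of-the-envelope sketch in Python that uses only the two reported figures (134 billion pieces, about 30 records). The function names are hypothetical, and the linear projection assumes every record yields roughly the same number of pieces, which none of the sources confirms.

```python
# Back-of-the-envelope arithmetic on the reported figures.
# Both constants come from the Government HealthIT report as quoted above;
# everything else here is an illustrative assumption.

TOTAL_PIECES = 134_000_000_000  # reported "individual pieces of information"
PATIENT_RECORDS = 30            # "about 30" patient records


def pieces_per_record(total: int, records: int) -> float:
    """Average number of pieces per patient record."""
    return total / records


def linear_projection(total: int, records: int, target_records: int) -> float:
    """Scale the reported total to another record count.

    Assumes (hypothetically) that output grows linearly with the
    number of records, which the sources do not state.
    """
    return pieces_per_record(total, records) * target_records


per_record = pieces_per_record(TOTAL_PIECES, PATIENT_RECORDS)
print(f"{per_record:,.0f} pieces per record on average")
```

That works out to roughly 4.5 billion pieces per patient record, which is why the total is so hard to square with ordinary chart contents.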

While I can’t testify as to how realistic the Mayo researchers’ claims are, I have to think that if they’re on target, something very big is in the works. After all, to date I’ve heard little of tools that can effectively, fluidly extract clinical data from an entire EMR-based patient chart regardless of format or data organization. Concepts like natural language processing are far from new, but it seems they haven’t been up to the job.

Not only would such capabilities allow virtually any set of institutions to share data, a giant leap in and of itself, they would also allow providers to do unprecedented levels of clinical analysis and ultimately improve care.

On the other hand, it’s not clear how practical this approach will be. If it only takes 30 records to generate that much data, just imagine how much data a single mid-sized hospital would have to wrangle! If I’m reading things right, this technology may remain stuck at the research stage, as it’s hard to imagine most institutions could manage terabytes of new data.

Still, there’s clearly much to learn here. I’m eager to find out whether Mayo’s SHARP technology turns out to be usable in everyday clinical life.

About the author

Anne Zieger

Anne Zieger is a healthcare journalist who has written about the industry for 30 years. Her work has appeared in all of the leading healthcare industry publications, and she's served as editor in chief of several healthcare B2B sites.

6 Comments

Dr. Michael West says:

July 18, 2011 at 5:21 pm

I completely agree with your interest in figuring out where 134 billion pieces of data are coming from. Seems like the devil must lie in those details somewhere.
Dr. Michael West says:

July 18, 2011 at 5:22 pm

On the other hand, WOW!
Chuck says:

July 19, 2011 at 5:35 am

“When run through computing systems developed in partnership with IBM’s Watson Research Center, those 30 patient records explode into 134 billion individual pieces of information to be organized and stored.”

125 GB, scanned files? extracted structured data likely much less, perhaps there’s a pointer from each extracted canonical representation back to location within each document image, which might be useful for quality assurance and post editing, or, since it’s research, for a human to tweak the parsing/meaningful assignment software, though that’s a guess
Katherine Rourke says:

July 19, 2011 at 7:47 am

Thanks for your comments folks!

Dr. West:

Yeah, wow, huh? I’d love to think that we have a real breakthrough on our hands. My gut feeling, as I noted, is that what we have is an impressive but not too practical research accomplishment. But you have to start somewhere.

Chuck:

Thanks for the suggestions re: why we’re talking about such a large amount of data. Do you agree that given the volume of information, it’s unlikely that this research will be transferable to everyday providers just yet?
Chuck says:

July 19, 2011 at 8:32 am

Not necessarily. Perhaps it’s like debugging a program. Sometimes there’s code or other resources used while developing software that is stripped out before it’s shipped. At this point we’re just guessing and speculating. I’m looking forward to the actual research reports.
samantha says:

October 29, 2011 at 4:31 pm

Thought you would appreciate this young artist’s (David Foox) work in raising awareness for organ donation. His creepy cute toys stand 3.25″ tall and are one of 24 different body parts. http://organ-donors.us
