Hands-On Guidance for Data Integration in Health: The CancerLinQ Story

Andy Oram praxagora

Institutions throughout the health care field are talking about data sharing and integration. Everyone knows that improved care, cost controls, and expanded research require institutions that hold patient data to share it safely. The American Society of Clinical Oncology’s CancerLinQ, one of the leading projects using data analysis to find new cures, has tackled data sharing with a large number of health providers and discovered just how labor-intensive it is.

CancerLinQ fosters deep relationships and collaborations with the clinicians from whom it takes data. The platform quickly turns around the results of its analysis to give clinicians insights they can put to immediate use in caring for cancer patients. Issues in collecting, storing, and transmitting data intertwine with other discussion items around cancer care. Currently, CancerLinQ isolates the data from each institution and de-identifies patient information so that it can be shared among participating clinicians. CancerLinQ LLC is a wholly-owned nonprofit subsidiary of ASCO, which has registered CancerLinQ as a trademark.
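For readers curious about what de-identification involves mechanically, here is a minimal Python sketch. It is not CancerLinQ’s procedure, which the article does not describe: the field names, the `deidentify` function, and the key handling are all hypothetical. HIPAA’s Safe Harbor method lists 18 categories of identifiers to remove, and this sketch handles only a few of them.

```python
import hashlib
import hmac

# Hypothetical field names; real EHR extracts use their own schemas.
DIRECT_IDENTIFIERS = {"name", "address", "phone", "email", "ssn", "mrn"}


def deidentify(record: dict, secret_key: bytes) -> dict:
    """Return a copy of one patient record with direct identifiers removed."""
    out = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    # Replace the medical record number with a keyed hash (HMAC-SHA256): the same
    # MRN always maps to the same pseudonym, so one institution's records still
    # link together, but the MRN itself is not exposed.
    out["patient_key"] = hmac.new(
        secret_key, record["mrn"].encode("utf-8"), hashlib.sha256
    ).hexdigest()
    # Generalize a full birth date (YYYY-MM-DD) to the year alone.
    if "birth_date" in out:
        out["birth_year"] = out.pop("birth_date")[:4]
    return out


record = {"mrn": "12345", "name": "Jane Doe", "birth_date": "1961-04-09", "diagnosis": "C50.911"}
print(deidentify(record, b"institution-secret"))
```

A keyed hash rather than a plain hash is used here because medical record numbers come from a small, guessable space; without the key, an outsider cannot simply hash every possible MRN and compare.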

Help from Jitterbit

In 2015, CancerLinQ began collaborating with Jitterbit, a company devoted to integrating data from different sources. According to Michele Hazard, Director of Healthcare Solutions, and George Gallegos, CEO, Jitterbit can recognize data from 300 different sources, including electronic health records (EHRs). At the beginning, the diversity and incompatibility of EHRs were a real barrier. It took the company several months to figure out each of the first EHRs it tackled, but now it can integrate a new one quickly. Oncology care data, the key data needed by CancerLinQ, is a Jitterbit specialty.

One of the barriers raised by EHRs is licensing. The EHR vendor has to “bless” direct access to its EHR and to data imported from external sources. HIPAA and licensing agreements also make tight security a priority.

Another challenge in processing the data is finding a patient’s records at different institutions and accurately matching them to the correct patient.
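Patient matching is often done with probabilistic record-linkage techniques that weigh agreement across many fields. As a much simpler illustration, the Python sketch below links records from two institutions only when a normalized name, birth date, and sex agree exactly. The field names and functions are hypothetical, and this is not how Jitterbit or CancerLinQ match patients, which the article does not describe.

```python
import unicodedata


def normalize_name(name: str) -> str:
    """Lowercase, strip accents, and drop punctuation and spaces."""
    decomposed = unicodedata.normalize("NFKD", name)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    return "".join(ch for ch in ascii_only.lower() if ch.isalnum())


def match_key(rec: dict) -> tuple:
    # Hypothetical field names: family name, given name, birth date, sex.
    return (
        normalize_name(rec["family"]),
        normalize_name(rec["given"]),
        rec["birth_date"],
        rec["sex"].upper(),
    )


def link_records(site_a: list[dict], site_b: list[dict]) -> list[tuple[dict, dict]]:
    """Pair records from two institutions whose match keys agree exactly."""
    index: dict[tuple, list[dict]] = {}
    for rec in site_a:
        index.setdefault(match_key(rec), []).append(rec)
    return [(a, b) for b in site_b for a in index.get(match_key(b), [])]


hospital = [{"id": "A1", "family": "O'Brien", "given": "José", "birth_date": "1958-02-11", "sex": "F"}]
clinic = [
    {"id": "B7", "family": "OBrien", "given": "Jose", "birth_date": "1958-02-11", "sex": "f"},
    {"id": "B8", "family": "Obrien", "given": "Joseph", "birth_date": "1958-02-11", "sex": "M"},
]
for a, b in link_records(hospital, clinic):
    print(a["id"], "<->", b["id"])
```

Run as is, it prints `A1 <-> B7`: the apostrophe and accent differences are normalized away, while the Joseph record stays unmatched. Exact keys like this still miss typos, nicknames, and changed surnames, which is why production systems score partial agreement instead.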

Although the health care industry is moving toward the FHIR standard, and a few EHRs already expose data through FHIR, others have idiosyncratic formats and support older HL7 standards in different ways. Many don’t even have an API yet. In some cases, Jitterbit has to export the EHR data to a file, transfer it, and unpack it to discover the patient data.
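HL7 version 2 messages, still common in many EHRs, are pipe-delimited text. The hypothetical Python sketch below reads the patient-identification (PID) segment of such a message and reshapes it into a dictionary laid out like a FHIR Patient resource. It assumes the default delimiters and a well-formed PID segment; production code would use a dedicated HL7 parsing library and validate its output against the FHIR specification.

```python
# Minimal HL7 v2 reader: segments end with a carriage return, fields are separated
# by "|", components by "^". Escape sequences and repeating fields ("~") are ignored.
def parse_hl7v2(message: str) -> dict[str, list[list[str]]]:
    segments: dict[str, list[list[str]]] = {}
    for line in message.replace("\n", "\r").split("\r"):
        if line.strip():
            fields = line.split("|")
            segments.setdefault(fields[0], []).append(fields)
    return segments


# HL7 v2 administrative sex codes mapped to FHIR Patient.gender values.
GENDER = {"F": "female", "M": "male", "O": "other", "U": "unknown"}


def pid_to_fhir_patient(message: str) -> dict:
    """Build a dict shaped like a FHIR Patient resource from the first PID segment."""
    pid = parse_hl7v2(message)["PID"][0]
    # pid[n] is field PID-n, because pid[0] holds the segment name "PID".
    mrn = pid[3].split("^")[0]                          # PID-3: patient identifier
    family, given = (pid[5].split("^") + ["", ""])[:2]  # PID-5: family^given
    dob = pid[7]                                        # PID-7: birth date, YYYYMMDD
    return {
        "resourceType": "Patient",
        "identifier": [{"value": mrn}],
        "name": [{"family": family, "given": [given]}],
        "birthDate": f"{dob[:4]}-{dob[4:6]}-{dob[6:8]}",
        "gender": GENDER.get(pid[8], "unknown"),        # PID-8: administrative sex
    }


message = "\r".join([
    "MSH|^~\\&|EHR|CLINIC|||20170101||ADT^A04|1|P|2.3",
    "PID|1||12345^^^CLINIC^MR||Doe^Jane||19610409|F",
])
print(pid_to_fhir_patient(message))
```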

Lack of structure

Jitterbit had become accustomed to looking in different databases to find patient information, even when EHRs claimed to support the same standard. One doctor may put key information under “diagnosis” while another enters it under “patient problems,” and doctors in the same practice may choose different locations.
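One common way to cope with the same information landing in different fields is a per-site mapping table that translates local field names into a single canonical name. The Python sketch below assumes made-up site and field names (`site_a`, `patient problems`, and so on) and sends unmapped fields to a review list instead of guessing where they belong.

```python
# Hypothetical per-site mappings from local field names (lowercased) to canonical ones.
FIELD_MAP: dict[str, dict[str, str]] = {
    "site_a": {"diagnosis": "diagnosis", "clinical note": "free_text"},
    "site_b": {
        "patient problems": "diagnosis",
        "diagnosis": "diagnosis",  # other doctors at the same practice use this field
        "progress note": "free_text",
    },
}


def to_canonical(site: str, record: dict[str, str]) -> tuple[dict[str, list[str]], list[str]]:
    """Map one record's fields to canonical names; also return the unmapped field names."""
    mapping = FIELD_MAP[site]
    canonical: dict[str, list[str]] = {}
    unmapped: list[str] = []
    for local_name, value in record.items():
        target = mapping.get(local_name.strip().lower())
        if target is None:
            unmapped.append(local_name)  # flag for human review rather than guessing
        else:
            canonical.setdefault(target, []).append(value)
    return canonical, unmapped


print(to_canonical("site_b", {"Patient Problems": "C50.911", "Diagnosis": "C50.912", "Allergies": "none"}))
```

This prints `({'diagnosis': ['C50.911', 'C50.912']}, ['Allergies'])`: both diagnosis fields are collected under one name, and the unexpected field is surfaced for someone to look at.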

Worse still, doctors often ignore the structured fields that were meant to hold important patient details and just dictate or type those details into a free-text note. CancerLinQ anticipated this, unpacking the free text through optical character recognition (OCR) and natural language processing (NLP), a branch of artificial intelligence.
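Real clinical NLP is far beyond a few lines of code, but a pattern-based extractor shows both the idea and its limits. The hypothetical Python sketch below pulls Roman-numeral cancer stage groups (such as “stage IIIA”) out of a free-text note. It is a deliberately naive stand-in, not the OCR and NLP pipeline CancerLinQ uses, and it ignores Arabic-numeral forms such as “stage 4”.

```python
import re

# Roman-numeral stage groups such as "stage IIIA" or "Stage IV".
# IGNORECASE means "stage iiia" matches too.
STAGE_GROUP = re.compile(r"\bstage\s+(0|IV|I{1,3})([A-C])?\b", re.IGNORECASE)


def extract_stage(note: str) -> list[str]:
    """Return every stage group mentioned in a free-text note, normalized to upper case."""
    return [(m.group(1) + (m.group(2) or "")).upper() for m in STAGE_GROUP.finditer(note)]


note = "62F with invasive ductal carcinoma, Stage IIIa. No evidence of stage IV disease."
print(extract_stage(note))
```

On the sample note it returns both `IIIA` and `IV`, even though the note rules out stage IV disease. Handling negation, abbreviations, and context is precisely what NLP adds over simple pattern matching.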

It’s understandable that a doctor would avoid structured fields. Just think of the position she is in, trying to keep a complex cancer case in mind while half a dozen other patients sit in the waiting room for their turn. To use the structured field dedicated to each item of information, she would first have to remember which field to use. If she has privileges at several different institutions, that means keeping each hospital’s fields in mind.

Then she has to get access to the right field, which may take several clicks and require movement through several screens. The exact information she wants to enter may or may not be available through a drop-down menu. The exact abbreviation or wording may differ from EHR to EHR as well. And to carry through a commitment to using structured fields, she would have to go through this thought process many times per patient. (CancerLinQ itself looks at 18 Quality eMeasures today, with the plan to release additional measures each year.)

Finally, what is the point of all this? Up until recently, the information would never come back in a useful form. To retrieve it, she would have to retrace the same steps she used to enter the structured data in the first place. Simpler to dump what she knows into a free-text note and move on.

It’s worth mentioning that this Babel of health care information also harms the billing and reimbursement process, even though EHRs were designed from the start to support that very process. Insurers have to deal with the same unstructured data that CancerLinQ and Jitterbit have learned to read. The intensive manual process of extracting information adds to the cost of insurance, and ultimately of the entire health care system. The recent eClinicalWorks scandal, which resembles Volkswagen’s cheating on auto emissions and will probably spill over to other EHR vendors as well, highlights the failings of health data.

Making data useful

The key to unblocking this information logjam is deriving insights from data that clinicians can immediately see will improve their interventions with patients. This is what the CancerLinQ team has been doing. They run analytics that suggest what works for different categories of patients, then return the information to oncologists. The CancerLinQ platform also explains which items of data fed into these insights, and urges the doctors to be more disciplined about collecting and storing the data. This is a human-centered, labor-intensive process that can take six to twelve months to set up for each institution. Richard Ross, Chief Operating Officer of CancerLinQ, calls the process “trench warfare,” not because it’s contentious but because it is slow and requires determination.

Among the data behind the 18 measures currently requested by CancerLinQ, one of the most critical elements, because it drives the calculation of multiple measures, is staging information: where the cancerous tumors are and how far the cancer has progressed. Family history, treatment plan, and treatment recommendations are other examples of the information gathered.
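Staging is commonly recorded with the TNM system: T describes the primary tumor, N the spread to regional lymph nodes, and M distant metastasis. Assuming staging arrives as a TNM string, a small parser like the hypothetical one below can turn it into separate, queryable values. It covers only the common notation (an optional `c` or `p` prefix for clinical or pathologic staging) and ignores other prefixes and cancer-specific variants.

```python
import re
from typing import Optional

# T = primary tumor, N = regional lymph nodes, M = distant metastasis.
# Optional c/p prefixes mark clinical vs. pathologic staging.
TNM = re.compile(
    r"\b[cp]?T(?P<T>X|is|[0-4][a-d]?)\s*[cp]?N(?P<N>X|[0-3][a-c]?)\s*[cp]?M(?P<M>X|[01][a-c]?)\b"
)


def parse_tnm(text: str) -> Optional[dict]:
    """Return the first TNM triple found in the text, or None if there is none."""
    match = TNM.search(text)
    if match is None:
        return None
    parsed: dict = match.groupdict()
    # M1 (including M1a, M1b, ...) indicates distant metastasis.
    parsed["metastatic"] = parsed["M"].startswith("1")
    return parsed


print(parse_tnm("Path: pT2 pN1a cM0"))
print(parse_tnm("T3N0M1b"))
```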

The data collection process has to start by determining how each practice defines a cancer patient. The CancerLinQ team builds this definition into its request for data. Sometimes they submit “pull” requests at regular intervals to the hospital or clinic, whereas other times the health care provider submits the data to them at a time of its choosing.
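The article does not say how CancerLinQ encodes each practice’s definition of a cancer patient. One plausible shape, sketched hypothetically in Python below, is a small set of rules applied to ICD-10-CM diagnosis codes: codes beginning with C denote malignant neoplasms, and a practice may or may not also count carcinoma in situ (D00-D09). The minimum-encounter rule is an invented example of the kind of local choice such a definition has to pin down.

```python
from dataclasses import dataclass


@dataclass
class CohortDefinition:
    """Hypothetical practice-specific rules for who counts as a cancer patient."""

    include_in_situ: bool = False  # also count carcinoma in situ (ICD-10-CM D00-D09)?
    min_encounters: int = 1        # e.g. exclude one-off second-opinion visits

    def is_cancer_code(self, icd10: str) -> bool:
        code = icd10.strip().upper()
        if code.startswith("C"):  # ICD-10-CM C codes are malignant neoplasms
            return True
        return self.include_in_situ and code.startswith("D0")

    def select(self, patients: list[dict]) -> list[str]:
        """Return the IDs of patients who meet this definition."""
        return [
            p["id"]
            for p in patients
            if p["encounters"] >= self.min_encounters
            and any(self.is_cancer_code(code) for code in p["diagnoses"])
        ]


patients = [
    {"id": "p1", "diagnoses": ["C50.911"], "encounters": 3},
    {"id": "p2", "diagnoses": ["D05.10"], "encounters": 2},
    {"id": "p3", "diagnoses": ["E11.9"], "encounters": 5},
    {"id": "p4", "diagnoses": ["C61"], "encounters": 1},
]
print(CohortDefinition().select(patients))
print(CohortDefinition(include_in_situ=True, min_encounters=2).select(patients))
```

The same patient list yields `['p1', 'p4']` under the default rules and `['p1', 'p2']` under the stricter, in-situ-inclusive ones, which is why the definition has to be settled before data is requested.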

Some institutions enforce workflows more rigorously than others. So in some hospitals, CancerLinQ can persuade the doctors to record important information at a certain point during the patient’s visit. In other hospitals, doctors may enter data at times of their own choosing. But if they understand the value that comes from this data, they are more likely to make sure it gets entered, and that it conforms to standards. Many EHRs provide templates that make it easier to use structured fields properly.

When accepting information from each provider, the team goes through a series of steps and does a check-in with the provider at each step. The team evaluates the data in a different stage for each criterion: completeness, accuracy of coding, the number of patients reported, and so on. By providing quick feedback, they can help the practice improve its reporting.
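The staged review described above can be sketched as a sequence of checks that halts at the first failure, so the team can check in with the practice before going further. Everything here is a hypothetical illustration: the field names, the thresholds, and the format-only ICD-10 check (a real pipeline would validate codes against the official code set).

```python
import re

# Rough ICD-10-CM *format* check: letter, digit, alphanumeric, then an optional dot
# and up to four more characters. It does not confirm that a code actually exists.
ICD10_FORMAT = re.compile(r"^[A-Z][0-9][0-9A-Z](\.[0-9A-Z]{1,4})?$")


def completeness(records: list[dict], field: str) -> float:
    """Fraction of records in which the field is present and non-empty."""
    if not records:
        return 0.0
    filled = sum(1 for r in records if r.get(field) not in (None, ""))
    return filled / len(records)


def coding_accuracy(records: list[dict], field: str = "diagnosis_code") -> float:
    """Fraction of non-empty codes that are at least well-formed."""
    codes = [r[field] for r in records if r.get(field)]
    if not codes:
        return 0.0
    return sum(1 for c in codes if ICD10_FORMAT.match(c)) / len(codes)


def review_submission(records: list[dict], expected_patients: int,
                      thresholds: dict[str, float]) -> list[tuple[str, float, bool]]:
    """Run the checks one stage at a time; expected_patients must be positive."""
    stages = [
        ("patient count", lambda: len({r["patient_id"] for r in records}) / expected_patients),
        ("staging completeness", lambda: completeness(records, "stage")),
        ("coding accuracy", lambda: coding_accuracy(records)),
    ]
    report = []
    for name, check in stages:
        score = check()
        passed = score >= thresholds[name]
        report.append((name, round(score, 2), passed))
        if not passed:
            break  # check in with the practice before moving to the next stage
    return report


records = [
    {"patient_id": "p1", "stage": "IIIA", "diagnosis_code": "C50.911"},
    {"patient_id": "p2", "stage": "", "diagnosis_code": "C61"},
    {"patient_id": "p3", "stage": "IV", "diagnosis_code": "breast ca"},
    {"patient_id": "p4", "stage": "I", "diagnosis_code": "C34.90"},
]
thresholds = {"patient count": 0.9, "staging completeness": 0.8, "coding accuracy": 0.9}
print(review_submission(records, expected_patients=4, thresholds=thresholds))
```

With the sample data, the patient count passes, but only three of four records carry a stage, so the review stops at the staging-completeness step and prints `[('patient count', 1.0, True), ('staging completeness', 0.75, False)]`. The free-text code `breast ca` would be caught at the next stage once staging is fixed.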

The CancerLinQ/Jitterbit story reveals how difficult it is to apply analytics to health care data. Few organizations can afford the expertise that these two apply to extracting and curating patient data. On the other hand, CancerLinQ and Jitterbit show that effective data analysis can be done, even under the current messy conditions of electronic data storage. As the next wave of technology standards, such as FHIR, falls into place, more institutions should be able to carry out analytics that save lives.

About the author


Andy Oram

Andy is a writer and editor in the computer field. His editorial projects have ranged from a legal guide covering intellectual property to a graphic novel about teenage hackers. A correspondent for Healthcare IT Today, Andy also writes often on policy issues related to the Internet and on trends affecting technical innovation and its effects on society. Print publications where his work has appeared include The Economist, Communications of the ACM, Copyright World, the Journal of Information Technology & Politics, Vanguardia Dossier, and Internet Law and Business. Conferences where he has presented talks include O'Reilly's Open Source Convention, FISL (Brazil), FOSDEM (Brussels), DebConf, and LibrePlanet. Andy participates in the Association for Computing Machinery's policy organization, named USTPC, and is on the editorial board of the Linux Professional Institute.
