The previous section of this article introduced Apixio’s analytics for payers in the Medicare Advantage program. Now we’ll step through how Apixio extracts relevant diagnostic data.
The technology of PDF scraping
Providers usually submit SOAP notes to the Apixio web site in the form of PDFs. This comes to me as a surprise, after hearing about the extravagant efforts that have gone into new CCDs and other formats such as the Blue Button project launched by the VA. Normally provided in an XML format, these documents claim to adhere to standards and offer a relatively gentle face to a computer program. In contrast, a PDF is one of the most challenging formats to parse: words and other characters are reduced to graphical symbols, while layout bears little relation to the human meaning of the data.
Structured documents such as CCDs contain only about 20% of what CMS requires, and often are formatted in idiosyncratic ways so that even the best CCDs would be no more informative than a Word document or PDF. But the main barrier to getting information, according to Schneider, is that Medicare Advantage works through the payers, and providers can be reluctant to give payers direct access to their EHR data. This reluctance springs from a variety of reasons, including worries about security, the feeling of being deluged by requests from payers, and a belief that the providers’ IT infrastructure cannot handle the burden of data extraction. Their stance has nothing to do with protecting patient privacy, because HIPAA explicitly allows providers to share patient data for treatment, payment, and operations, and that is what they are doing giving sensitive data to Apixio in PDF form. Thus, Apixio had to master OCR and text processing to serve that market.
Processing a PDF requires several steps, integrated within Apixio’s platform:
Optical character recognition to re-create the text from a photo of the PDF.
Further structuring to recognize, for instance, when the PDF contains a table that needs to be broken up horizontally into columns, or constructs such the field name “Diagnosis” followed by the desired data.
Natural language processing to find the grammatical patterns in the text. This processing naturally must understand medical terminology, common abbreviations such as CHF, and codings.
Analytics that pull out the data relevant to risk and presents it in a usable format to a human coder.
Apixio can accept dozens of notes covering the patient’s history. It often turns up diagnoses that “fell through the cracks,” as Schneider puts it. The diagnostic information Apixio returns can be used by medical professionals to generate reports for Medicare, but it has other uses as well. Apixio tells providers when they are treating a patient for an illness that does not appear in their master database. Providers can use that information to deduce when patients are left out of key care programs that can help them. In this way, the information can improve patient care. One coder they followed could triple her rate of reviewing patient charts with Apixio’s service.
Caught between past and future
If the Apixio approach to culling risk factors appears round-about and overwrought, like bringing in a bulldozer to plant a rosebush, think back to the role of historical factors in health care. Given the ways doctors have been taught to record medical conditions, and available tools, Apixio does a small part in promoting the progressive role of accountable care.
Hopefully, changes to the health care field will permit more direct ways to deliver accountable care in the future. Medical schools will convey the requirements of accountable care to their students and teach them how to record data that satisfies these requirements. Technologies will make it easier to record risk factors the first time around. Quality measures and the data needed by policy-makers will be clarified. And most of all, the advantages of collaboration will lead providers and payers to form business agreements or even merge, at which point the EHR data will be opened to the payer. The contortions providers currently need to go through, in trying to achieve 21