The first part of this article described different approaches to quality–and in fact to different qualities. In this part, I’ll look at the problems with quality measures, and at emerging solutions.
Difficulties of assessing quality
The Methods chapter of a book from the National Center for Biotechnology Information at NIH lays out many of the hurdles that researchers and providers face when judging the quality of clinical care. I’ll summarize a few of the points from the Methods chapter here, but the chapter is well worth a read. The review showed how hard it is to measure accurately many of the things we’d like to know about.
For instance, if variations within a hospital approach (or exceed) the variations between hospitals, there is little benefit to comparing hospitals using that measure. If the same physician gets wildly different scores from year to year, the validity of the measure is suspect. When care is given by multiple doctors and care teams, it is unjust to ascribe the outcome to the patient’s principal caretaker. If random variations outweigh everything, the measure is of no use at all. One must also keep in mind practical considerations, such as making sure the process of collecting data would not cost too much.
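To make the first point concrete, here is a minimal Python sketch that splits the spread of a score into a between-hospital part and a within-hospital part. The function names and the sample scores are my own illustrations, not anything from the chapter, and the calculation is a simplified variance split (it assumes roughly equal numbers of scores per hospital), not a formal reliability statistic.

```python
from statistics import mean, pvariance


def variance_split(scores_by_hospital):
    """Return (between, within) variance for a dict of hospital -> list of scores.

    between: variance of the hospital averages
    within:  average of the variances inside each hospital
    """
    hospital_means = [mean(scores) for scores in scores_by_hospital.values()]
    between = pvariance(hospital_means)
    within = mean([pvariance(scores) for scores in scores_by_hospital.values()])
    return between, within


def share_between(scores_by_hospital):
    """Fraction of the total spread that comes from differences between hospitals."""
    between, within = variance_split(scores_by_hospital)
    total = between + within
    return between / total if total else 0.0


# Hypothetical scores: the hospitals have nearly the same average,
# but scores inside each hospital swing widely.
noisy = {"A": [70, 90, 80], "B": [72, 88, 83], "C": [69, 91, 77]}

# Hypothetical scores: the hospitals differ clearly and are internally consistent.
separated = {"A": [60, 61, 59], "B": [80, 81, 79]}

print(round(share_between(noisy), 3))
print(round(share_between(separated), 3))
```

When the share is close to zero, as in the first data set, ranking hospitals on that measure mostly ranks noise. When it is close to one, the measure at least separates the hospitals from each other.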
Many measures apply to a narrow range of patients (for instance, those with pneumonia) and therefore may be skewed for doctors with a relatively small sample of those patients. And a severe winter could elevate mortality from pneumonia, particularly if patients have trouble getting adequate shelter and heat. In general, “For most outcomes, the impacts of random variation and patient factors beyond providers’ control often overwhelm differences attributable to provider quality.” ACMQ quality measures “most likely cannot definitively distinguish poor quality providers from high quality providers, but rather may illuminate potential quality problems for consideration of further investigation.”
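The small-sample problem can be illustrated with a confidence interval. The sketch below uses the Wilson score interval, a standard textbook formula for a proportion; the choice of that interval and the patient counts are my assumptions for illustration and do not come from the chapter.

```python
from math import sqrt


def wilson_interval(events, n, z=1.96):
    """Wilson score interval for a proportion (z=1.96 gives roughly 95% coverage)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = events / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half_width = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half_width, center + half_width


# Hypothetical counts: both providers show 10% pneumonia mortality,
# but one saw 20 patients and the other saw 1,000.
small_low, small_high = wilson_interval(2, 20)
large_low, large_high = wilson_interval(100, 1000)

print(f"20 patients:   {small_low:.3f} to {small_high:.3f}")
print(f"1000 patients: {large_low:.3f} to {large_high:.3f}")
```

The two providers have the same observed rate, but the interval for the doctor with 20 patients is many times wider, so a single bad winter could move that doctor from the top of a ranking to the bottom.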
The chapter helps explain why many researchers fall back on standard of care. Providers don’t trust outcome-based measures because of random variations and factors beyond their control, including poverty and other demographics. It’s hard even to know what contributed to a death, because in the final months it may not have been feasible to complete the diagnoses of a patient. Thus, doctors prefer “process measures.”
Among the criteria for evaluating quality indicators we see, “Does the indicator capture an aspect of quality that is widely regarded as important?” and more subtly, “subject to provider or public health system control?” The latter criterion heeds physicians who say, “We don’t want to be blamed for bad habits or other reasons for noncompliance on the part of our patients, or for environmental factors such as poverty that resist quick fixes.”
The book’s authors are certainly aware of the bias created by gaming the reimbursement system: “systematic biases in documentation and coding practices introduced by awareness that risk-adjustment and reimbursement are related to the presence of particular complications.” The chapter points out that diagnosis data is more trustworthy when it is informed by clinical information, not just billing information.
One of the most sensitive–and important–factors in quality assessment is risk adjustment, which means recognizing which patients have extra problems making their care more difficult and their recovery less certain. I have heard elsewhere the claim that CMS doesn’t cut physicians enough slack when they take on more risky patients. Although CMS tries to take poverty into account, hospital administrators suspect that institutions serving low-income populations–and safety-net hospitals in particular–are penalized for doing so.
Risk adjustment criteria are sometimes unpublished. But the most perverse distortion in the quality system comes when hospitals fail to distinguish iatrogenic complications (those introduced by medical intervention, such as infections incurred in the hospital) from the conditions the patient already had on admission. CMS recognizes this risk in efforts such as penalties for hospital-acquired conditions. Unless these are flagged correctly, hospitals can end up being rewarded for treating sicker patients: patients that they themselves made sicker.
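A toy calculation shows how a mis-flagged complication distorts a risk-adjusted score. Everything in this Python sketch is hypothetical: the risk weights, the additive risk model, and the patient data are invented for illustration and are not CMS's method or any published one. Real risk adjustment typically uses fitted statistical models, but the distortion works the same way.

```python
from dataclasses import dataclass


@dataclass
class Condition:
    name: str
    present_on_admission: bool  # False means it was acquired in the hospital


# Hypothetical added mortality risk per condition (not real figures).
RISK_WEIGHTS = {"diabetes": 0.02, "heart failure": 0.05, "sepsis": 0.10}
BASE_RISK = 0.01  # hypothetical baseline risk for any admission


def expected_risk(conditions, count_hospital_acquired=False):
    """Expected mortality risk for one patient under the toy additive model.

    By default only conditions present on admission raise the expected risk.
    """
    risk = BASE_RISK
    for condition in conditions:
        if condition.present_on_admission or count_hospital_acquired:
            risk += RISK_WEIGHTS.get(condition.name, 0.0)
    return risk


def observed_to_expected(patients, deaths, count_hospital_acquired=False):
    """Ratio of observed deaths to expected deaths; above 1 looks worse than expected."""
    expected = sum(expected_risk(p, count_hospital_acquired) for p in patients)
    return deaths / expected


# 100 hypothetical patients: diabetes on arrival, sepsis acquired during the stay.
patients = [
    [Condition("diabetes", True), Condition("sepsis", False)] for _ in range(100)
]

flagged_correctly = observed_to_expected(patients, deaths=6)
flagged_wrongly = observed_to_expected(patients, deaths=6, count_hospital_acquired=True)

print(round(flagged_correctly, 2))  # sepsis does not excuse the deaths
print(round(flagged_wrongly, 2))    # sepsis is treated as if the patient arrived with it
```

With the same six deaths, the hospital looks about twice as bad as expected when the sepsis is flagged correctly, and better than expected when the sepsis is counted as part of the patients' original burden.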
Distinguishing layers of quality
Theresa Cullen, associate director of the Regenstrief Institute’s Global Health Informatics Program, suggests that we think of quality measures as a stack, like those offered by software platforms:
The bottom of the stack might simply measure whether a patient receives the proper treatment for a diagnosed condition. For instance, is the hemoglobin A1C of each diabetic patient taken regularly?
The next step up is to measure the progress of the first measure. How many patients’ A1C was under control for their stage of the disease?
Next we can move to measuring outcomes: improvements in diabetic status, for instance, or prevention of complications from diabetes.
Finally, we can look at the quality of the patient’s life–quality-adjusted life years.
Ultimately, to judge whether a quality measure is valid, one has to compare it to some other quality measure that is supposedly trustworthy. We are still searching for measures that we can rely on to prove quality, and as I have already indicated, there may be too many different “qualities” to find ironclad measures. McCallum offers the optimistic view that the US is just beginning to collect the outcomes data that should eventually give us robust quality measures. Patient ratings serve as a proxy in the interim.
When organizations claim to use quality measures for accountable care, ratings, or other purposes, they should have their eyes open about the validity of the validation measures, and how applicable they are. Better data collection and analysis over time should allow more refined and useful quality measures. We can celebrate each advance in the choices we have for measures and their meanings.