Using AI technologies to analyze medical images is looking more and more promising by the day. However, new research suggests that when AI tools have to cope with images from multiple health systems, they have a harder time than when they stick to just one.
According to a new study published in PLOS Medicine, interest is growing in analyzing medical images using convolutional neural networks, a class of deep neural networks often dedicated to this purpose. To date, CNNs have made progress in analyzing X-rays to diagnose disease, but it’s not clear whether CNNs trained on X-rays from one hospital or system will work just as well in other hospitals and health systems.
To look into this issue, the authors trained pneumonia screening CNNs on 158,323 chest X-rays, including 112,120 X-rays from the NIH Clinical Center, 42,396 X-rays from Mount Sinai Hospital and 3,807 images from the Indiana University Network for Patient Care.
In their analysis, the researchers examined the effect of pooling data from sites with different prevalences of pneumonia. One of their key findings was that when two training data sites had the same pneumonia prevalence, the CNNs performed consistently, but when a 10-fold difference in pneumonia rates was introduced between sites, their performance diverged: the CNNs performed better on internal data than on data supplied by an external organization.
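To see why a prevalence gap is dangerous, consider a toy sketch (my own illustration, not from the study, with made-up patient counts): if a model learns nothing about pneumonia itself and merely detects which site an X-ray came from, scoring each patient with that site's base rate, it can still look better than chance on pooled data while being useless within any single site.

```python
# Toy illustration of the site-prevalence confound. All numbers are invented
# for the sketch; they are not figures from the PLOS Medicine study.

def auc(scores_pos, scores_neg):
    """Rank-based AUC: P(positive outscores negative), ties counted half."""
    wins = ties = 0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1
            elif p == n:
                ties += 1
    return (wins + 0.5 * ties) / (len(scores_pos) * len(scores_neg))

# Hypothetical Site A: 1,000 patients at 1% pneumonia prevalence.
# Hypothetical Site B: 1,000 patients at 10% prevalence (a 10-fold gap).
site_a = [(0.01, y) for y in [1] * 10 + [0] * 990]   # (score, true label)
site_b = [(0.10, y) for y in [1] * 100 + [0] * 900]

# The "model" scores every patient with its site's base rate -- it has
# learned only where the image was taken, nothing about the disease.
pooled = site_a + site_b
pos = [s for s, y in pooled if y == 1]
neg = [s for s, y in pooled if y == 0]

print(round(auc(pos, neg), 3))   # pooled AUC ~0.716: looks better than chance
print(auc([s for s, y in site_a if y == 1],
          [s for s, y in site_a if y == 0]))  # within Site A: exactly 0.5
```

The pooled score flatters the model only because site identity correlates with the label; evaluated inside either site alone, it is pure chance, which is the failure mode the authors warn about when a CNN is deployed at a hospital with different prevalence.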
The research team found that in 3 out of 5 natural comparisons, the CNNs' performance on chest X-rays from outside hospitals was significantly lower than on held-out X-rays from the original hospital system. This may point to future problems when health systems try to apply imaging AI to partners' data, which is unwelcome news given the benefits AI-supported diagnosis might offer across, say, an ACO.
On the other hand, it's worth noting that the CNNs were able to determine which organization originally created a given image with extremely high accuracy and to calibrate their diagnostic predictions accordingly. In other words, it sounds as though, over time, CNNs might be able to adjust to different sets of data on the fly. (The researchers didn't dig into how this might affect their computing performance.)
Of course, it's possible that we'll develop a method for normalizing imaging data that works in the age of AI, in which case adjusting for different data attributes may no longer be necessary. However, we're at the very early stages of training AIs for image sharing, so it's anyone's guess as to what form that normalization will take.