Society is being doubly convulsed by the revolutions of software and artificial intelligence. What passes for a moderate observation is the 2011 statement by Marc Andreessen that “Software is eating the world.” At the more extreme end, people dread a future envisioned by the Terminator and Matrix movies. But an alert concern is justified. Health care is starting to grapple with the safe and effective use of these tools, and the FDA joined the discussion this past April with a proposed regulatory framework. Unfortunately, fear and confusion muddle organizations’ response to change even here.
In this article, I explain the approach to the FDA proposal that we took in a group of computer experts I’m involved with, the U.S. Technology Policy Committee (USTPC) of the Association for Computing Machinery (ACM). The ACM is one of the earliest organizations in computing, formed in 1947, just two years after ENIAC demonstrated in 1945 that digital computing was applicable to real-life problems. The organization pulls together an enormous variety of deep, well-trained professionals, so the members of the Technology Policy Committee felt we could find something to offer the FDA. We drew in a non-member, programmer and health care expert Shahid Shah, to guide us.
The Need to Reframe the Discussion
The seven people who joined the team quickly realized that the FDA proposal, which claimed to focus on AI, really dealt with issues common to all devices using software. This was one of the signs telling us that the FDA staff, while diligent, were having trouble getting their arms around these issues. Although they investigated the principles of today’s machine learning, they fell back on familiar FDA classifications of medical devices that weren’t really relevant to the question of AI. Their proposal culminated in a grab-bag of items called an Algorithm Change Protocol, which manufacturers would have to submit to the FDA for approval of a device, and then update whenever some change altered the device’s status in the eyes of the FDA. We questioned whether FDA staff could even keep up with the glut of data they were demanding, little of which would help to determine whether the software was safe.
Actually, the FDA has been looking for some time at the special requirements for software in medicine, and is already conducting a sophisticated pilot program that recognizes such issues as speed of development and the risks of change. Currently, the pilot is limited to what the FDA calls “Software as a Medical Device (SaMD),” better known to the general public as health care apps (although these apps support professional medical work and may replace more conventional devices, so they must be checked with great care). However, the pilot’s embrace of modern software practices–Agile development, continuous integration, etc.–clearly prepares it to be applied more widely in the future to devices mixing proprietary hardware with software. I reported on the status of this pilot in October 2018. In short, the FDA has a smart foundation to build on, but when taking on the additional challenge of AI, they don’t seem to recognize the two domains’ commonalities and differences.
The first recommendation in our USTPC comment, therefore, was to distinguish the traits or risks that are specific to AI from those that apply to the use of software generally. We then followed our own advice by addressing software in devices and AI in turn.
Promoting Robust Software Development
The dilemma of software in medicine is that a rigorous test and review cycle–which has repeatedly been demonstrated to catch critical flaws–runs counter to modern software development, which is characterized by frequent releases and fast responses to errors. There is ample evidence that software can introduce serious errors and security risks, particularly as more and more devices connect to hospital networks.
The best solution the field has come up with so far is to review the software development process as well as its result. That lies at the heart of the FDA’s pre-certification process mentioned earlier.
The USTPC team suggested that much of the information about a software development process can be collected automatically: how many bugs were fixed between releases, what passed through a test cycle, and so forth. If this information were submitted by the manufacturer for every release–even if the release followed another one by a few hours–it would help demonstrate that the safety of the resulting device was maintained. But the FDA would need an automated means of evaluating the information submitted, because no team of people could keep up. The FDA doesn’t care whether you use Jenkins or Travis for continuous integration: the point is that you have an explicit, transparent, and repeatable process for quality control.
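To make the idea concrete, here is a minimal sketch of what such an automated release report might look like. The field names and structure are purely illustrative–neither the FDA nor our comment defined a schema–but they show how CI metrics could be gathered and serialized for machine evaluation:

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class ReleaseReport:
    """Quality-control metrics gathered automatically from a CI pipeline.

    The fields are hypothetical examples; no official FDA schema exists.
    """
    version: str
    bugs_fixed: int
    tests_run: int
    tests_passed: int
    static_analysis_warnings: int

    def test_pass_rate(self) -> float:
        # Fraction of the automated test suite that passed for this release.
        return self.tests_passed / self.tests_run if self.tests_run else 0.0

    def to_submission(self) -> str:
        # Serialize to JSON so a regulator's automated tooling could
        # evaluate every release, however frequent, without human review.
        record = asdict(self)
        record["test_pass_rate"] = self.test_pass_rate()
        return json.dumps(record)


report = ReleaseReport(version="2.4.1", bugs_fixed=7,
                       tests_run=1200, tests_passed=1198,
                       static_analysis_warnings=3)
print(report.to_submission())
```

The choice of JSON or any particular tool is beside the point; what matters is that the same well-defined record is emitted for every release, no matter which CI system produced it.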
Pinpointing the Role of AI
Having separated out the issues that affect all software and digitized assets, our USTPC team addressed the particular requirements of artificial intelligence. We ended up questioning whether AI had any special impact on FDA regulation. It all depends on how the manufacturers use AI.
We thought it most likely that manufacturers would gather in a lot of “big data” from various sources (perhaps including from the users of their devices), run machine learning algorithms over the data, and use the results in the next release of the devices. Used in this straightforward manner, AI can be handled exactly like any other research results. So long as the device passes routine tests for safety and effectiveness, neither the FDA nor the purchasers need to care whether AI was involved.
A thornier decision arises if the devices adjust themselves automatically in the field, based on the data they collect. Although this use of AI has precedents–notably in ad auctions on the Web, which choose which annoying advertisement to show you when you visit a page–this scenario in medicine is highly unlikely. First, it would be hard to collect enough data to make a measurable impact on the model being used by the software to control its behavior. Second, the risks tower over the likely benefits of making a tweak to behavior.
We are unfortunate enough to have examples of software that makes medical devices go haywire. Our comment to the FDA cited the Therac-25 radiation machine (certainly known to the FDA already), which killed patients by delivering doses of radiation hundreds of times greater than any organism should receive. For every software-enabled device that performs a critical function, manufacturers should set bounds for reasonable behavior and install “governor” software to make sure the device stays within those bounds, no matter what crazy data is handed to it.
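As a hedged illustration of the governor idea, the sketch below assumes a hypothetical dose-setting interface and hard-codes an illustrative limit. The essential design choice is that the governor is a separate, simple layer that sits between the control logic (or its AI model) and the hardware, and that it fails safe–refusing an out-of-bounds command rather than trying to repair it:

```python
class DoseGovernor:
    """Independent bounds check between control software and hardware.

    The limit and units here are illustrative, not clinical guidance.
    """

    def __init__(self, max_dose_gy: float):
        self.max_dose_gy = max_dose_gy

    def vet(self, requested_dose_gy: float) -> float:
        # Reject nonsensical input outright rather than "correcting" it.
        if requested_dose_gy < 0:
            raise ValueError(f"negative dose requested: {requested_dose_gy}")
        if requested_dose_gy > self.max_dose_gy:
            # Fail safe: refuse the command and let a human investigate.
            raise ValueError(
                f"dose {requested_dose_gy} Gy exceeds bound "
                f"{self.max_dose_gy} Gy")
        return requested_dose_gy


governor = DoseGovernor(max_dose_gy=2.0)
print(governor.vet(1.5))  # within bounds, passed through unchanged
```

Because the governor contains no machine learning and almost no logic, it can be reviewed and tested exhaustively even when the software it constrains cannot.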
Regarding the use of AI to change behavior in the field–as opposed to using it as input to research–the FDA can take a couple of paths. It can make the issue go away entirely by banning this model of AI deployment, at least until some unforeseen time in the future when we more thoroughly understand the behavior and risks of AI technologies. Alternatively, the FDA could allow this use, contingent on the installation of governor software and careful vetting of AI algorithms.
While preparing our comment, the team discussed a few other intriguing opportunities for incorporating AI into medicine. We suggested creating pools of data to which all manufacturers contribute, because AI generally works better with more data. We discussed the potential for setting up sandbox environments for testing, but decided we did not have a clear and detailed enough description of such environments to present them to the FDA.
Buzzwords entice everyone: businesses, governments, and the general public. AI is one of those buzzwords that has periodically generated hope, hype, and fear. There is no doubt that machine learning is now bringing astonishing tasks within humanity’s reach, so we need to define what we’re talking about and how far its influence extends, in order to arrive at good policies. We’re glad the FDA has taken up the topic, and we hope our comment will contribute to progress.