The ONC has announced a challenge inviting developers to build out and leverage solutions based on an engine that generates synthetic health data.
The ONC’s Synthetic Health Data Challenge is looking for new ways to take advantage of Synthea, an open-source synthetic patient generator that creates model medical histories of fictitious patients. Synthea is built on a Generic Module Framework, which allows it to model the varied diseases and conditions that play a role in these patients' medical histories.
The engine was built by The MITRE Corporation, a not-for-profit research and development organization sponsored by the U.S. government. MITRE launched the Synthea effort using models based on the top 10 reasons patients see primary care doctors and the top 10 conditions that shorten patient lifespans. The Synthea modules create synthetic patients using clinical data and real-world statistics collected by agencies like the CDC and NIH.
The synthetic patient profiles include a full medical history, complete with medication lists, allergies, physician encounters and social determinants of health. The data can be shared using C-CDA, HL7 FHIR, CSV and other formats.
Now, ONC is offering a total of $100,000 in cash prizes for those who develop innovative tools and resources that support validation and novel uses of synthetic data for PCOR researchers and/or health IT developers.
ONC is accepting entries addressing one of two challenge categories. One category focuses on enhancements to Synthea, including the development and/or enhancement of Synthea modules and the development of solutions that enhance or address the limitations of Synthea. The other category involves proposals outlining novel uses of Synthea-generated synthetic data.
The challenge comes as part of a larger effort focused on using synthetic health data to accelerate patient-centered outcomes research (PCOR). This effort, which will extend through 2022, has key objectives that include:
- Identifying and convening a multidisciplinary panel of experts to help ONC identify use cases and further develop modules
- Developing opioid, pediatric and complex care data generation modules for Synthea to increase the number and diversity of synthetic patient health records to meet PCOR needs
- Engaging the broader research and development community to validate these synthetic health data sets, as well as demonstrate the potential uses of synthetic health records
I’m surprised this topic hasn’t attracted more attention since I wrote about it in late 2017. This seems like a very useful approach to performing population-level analyses, something that’s particularly important given the need to understand the pandemic at a high level. Let’s hope this new challenge brings some useful thinking on this topic to the fore.