Driving Collaboration Among Researchers Through BurstIQ’s Research Foundry

We’re all anxious for better data sharing among pharmaceutical companies and other institutions researching COVID-19. Although most of the barriers are cultural (a fear of being beaten to a cure or of having valuable insights stolen), sharing is still technically hard. For years, I’ve covered organizations that promote various types of collaboration, such as the tranSMART Foundation and Sage Bionetworks (the latter with a particular interest in data sharing). BurstIQ, a company founded in 2015, is making progress in this noble endeavor with its Research Foundry.

BurstIQ provides a secure data network for health care that facilitates machine learning and data sharing. Through its Research Foundry, organizations can use its analytics, license data, control the data’s distribution, and digitally sign contracts that limit the data’s use. The following figure shows some of the graphs created by analytics on Research Foundry.

Bar charts and scatter charts produced by Research Foundry

BurstIQ’s CEO, Frank Ricotta, likes to call the company’s clients and partners a “network,” reflecting its goal of giving deserving institutions access to enabling technology. The company has an international reach, with clients in many countries, and works with the OECD and several nations in the British Commonwealth. A big part of its mission, according to Ricotta, is to help under-represented communities and bring social determinants of health (SDoH) to the surface.

To that end, the company recently announced a data challenge with the American Heart Association (AHA), focused on finding disparities and SDoH related to COVID-19 infections. Researchers have access to data in the AHA’s Precision Medicine Platform (hosted on the Hitachi Vantara cloud) along with BurstIQ’s analytics.

Other recent accomplishments include a deal with Empiric Health, which helps hospitals reduce surgical costs through real-time access to data and to the analytics that this access enables, such as comparing outcomes across different physicians. Many of BurstIQ’s clients are in pharma.

Blockchain architecture

BurstIQ has joined a number of enthusiasts in using blockchain for data sharing and licensing. An organization that wants to share data can put either the data itself or a pointer to it on the blockchain. As is typical for such applications, the blockchain records the contract, and the contract governs each data access. Also typical is the use of digital signatures to protect both the privacy of personal data and the rights of the data holder.
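The general pattern of a contract recorded on a ledger and consulted on every access can be sketched as a small data model. This is a minimal illustration, not BurstIQ’s implementation: the `Contract` and `Ledger` classes, their fields (licensee, dataset, allowed purposes, expiry), and the in-memory list standing in for the blockchain are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, FrozenSet, List, Tuple


@dataclass(frozen=True)
class Contract:
    """Hypothetical data-sharing contract as it might be recorded on a ledger."""
    contract_id: str
    owner: str
    licensee: str
    dataset_id: str
    allowed_purposes: FrozenSet[str]
    expires: date


@dataclass
class Ledger:
    """Toy in-memory stand-in for a blockchain: recorded contracts plus an append-only event log."""
    contracts: Dict[str, Contract] = field(default_factory=dict)
    events: List[Tuple] = field(default_factory=list)

    def record_contract(self, contract: Contract) -> None:
        self.contracts[contract.contract_id] = contract
        self.events.append(("contract", contract.contract_id))

    def request_access(self, contract_id: str, requester: str, dataset_id: str,
                       purpose: str, today: date) -> bool:
        """Grant access only if a recorded contract covers this exact request."""
        c = self.contracts.get(contract_id)
        granted = (
            c is not None
            and c.licensee == requester
            and c.dataset_id == dataset_id
            and purpose in c.allowed_purposes
            and today <= c.expires
        )
        # Every decision, granted or denied, is logged so it can be audited later.
        self.events.append(("access", contract_id, requester, purpose, granted))
        return granted


ledger = Ledger()
ledger.record_contract(Contract("c1", "hospital-a", "pharma-b", "ds-42",
                                frozenset({"research"}), date(2021, 12, 31)))
print(ledger.request_access("c1", "pharma-b", "ds-42", "research", date(2021, 6, 1)))   # True
print(ledger.request_access("c1", "pharma-b", "ds-42", "marketing", date(2021, 6, 1)))  # False
```

Logging denials as well as grants is what would let a third party later check that the contract was honored, rather than taking the data holder’s word for it.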

Storing primary data right on the blockchain (instead of just registering pointers to data that’s stored elsewhere) increases the blockchain’s size, with potential impacts on performance. But Ricotta lists several advantages to putting the data directly on the blockchain.

First, contracts and legal requirements can be enforced in an auditable manner. For instance, consider right-to-be-forgotten laws in the European Union and elsewhere. If the person’s data is in a conventional database, the owner can’t prove that the data was deleted as required, but a blockchain can render the data inaccessible. Furthermore, particular elements of data can be protected by contract. So if some columns contain personally identifiable information, the owner can prevent another party from reading them while allowing access to other columns. In contrast, if you keep the data off-chain, you can still refuse access in the contract, but you can’t prove that the contract was followed.
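Column-level protection of the kind described above amounts to projecting each record onto the columns a contract permits and keeping a record of what was withheld. A minimal sketch, assuming the contract can be reduced to a set of allowed column names; `enforce_column_contract` is a hypothetical helper and the sample records are made up.

```python
from typing import Dict, List, Set, Tuple

Row = Dict[str, object]


def enforce_column_contract(rows: List[Row], allowed: Set[str]) -> Tuple[List[Row], Set[str]]:
    """Return only the columns a contract permits, plus the set of columns withheld.

    The withheld set is what an auditable system would log to show the
    contract was followed (for example, that PII columns were never released).
    """
    released = [{col: val for col, val in row.items() if col in allowed} for row in rows]
    withheld = {col for row in rows for col in row} - allowed
    return released, withheld


# Made-up records: "name" and "ssn" are the personally identifiable columns.
records = [
    {"name": "Jane Doe", "ssn": "000-00-0000", "age_band": "40-49", "diagnosis_code": "U07.1"},
    {"name": "John Roe", "ssn": "000-00-0001", "age_band": "60-69", "diagnosis_code": "I10"},
]
released, withheld = enforce_column_contract(records, allowed={"age_band", "diagnosis_code"})
print(released)          # [{'age_band': '40-49', 'diagnosis_code': 'U07.1'}, {'age_band': '60-69', 'diagnosis_code': 'I10'}]
print(sorted(withheld))  # ['name', 'ssn']
```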

Storing data on the chain also provides a single interface through the blockchain’s API. If you lease data from another owner, and that data is already on the blockchain, you can offer their data along with your own through the same API. This makes analytics much easier.
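The benefit of a single interface is that analytics code does not care whether a dataset is owned or leased. A rough sketch of that idea, assuming both kinds of data sit behind one catalog; the `Catalog` and `Lease` classes, dataset names, and values are hypothetical and do not reflect BurstIQ’s actual API.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, List


@dataclass
class Lease:
    """Hypothetical lease letting `lessee` read another owner's dataset until `expires`."""
    dataset_id: str
    lessee: str
    expires: date


@dataclass
class Catalog:
    """Hypothetical single access point over datasets that are already on-chain."""
    datasets: Dict[str, dict] = field(default_factory=dict)  # dataset_id -> {"owner": ..., "rows": [...]}
    leases: List[Lease] = field(default_factory=list)

    def readable_by(self, org: str, today: date) -> List[str]:
        owned = [ds_id for ds_id, ds in self.datasets.items() if ds["owner"] == org]
        leased = [lease.dataset_id for lease in self.leases
                  if lease.lessee == org and today <= lease.expires]
        return owned + leased

    def fetch(self, org: str, dataset_id: str, today: date) -> List[dict]:
        """Same call whether the dataset is owned or leased."""
        if dataset_id not in self.readable_by(org, today):
            raise PermissionError(f"{org} may not read {dataset_id}")
        return self.datasets[dataset_id]["rows"]


catalog = Catalog()
catalog.datasets["own-cohort"] = {"owner": "lab-a", "rows": [{"age": 54}, {"age": 61}]}
catalog.datasets["partner-cohort"] = {"owner": "hospital-b", "rows": [{"age": 47}]}
catalog.leases.append(Lease("partner-cohort", "lab-a", date(2021, 12, 31)))

today = date(2021, 6, 1)
combined = catalog.fetch("lab-a", "own-cohort", today) + catalog.fetch("lab-a", "partner-cohort", today)
print(sum(r["age"] for r in combined) / len(combined))  # 54.0
```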

Ricotta provides an insightful view of how to make blockchain work in health care: “Most blockchain solutions targeted towards healthcare are storing data off-chain and putting pointers on the chain. That approach has proven to be of limited value, because they don’t solve the core challenge with healthcare data: the need to simplify the mechanisms for secure data sharing and exchange. The data itself is still being stored in traditional, centralized, breach-prone systems, and exchange still requires a mess of 1:1 integrations. One of the key reasons BurstIQ is different from both traditional systems and blockchain-based systems is because we are able to securely put data on-chain and manage the ownership and sharing of that data. Blockchain is one technology that we use, but we combine it with big data and machine intelligence.”

The BurstIQ blockchain is distributed, so it doesn’t need to be stored at a single location or by any single organization. Copies and backups are easy to arrange. BurstIQ discourages organizations from asking it to act as guardian of the data; the company does not seek to be a cloud data service.

Ricotta referred to the responsibility of client organizations to be guardians of data shared by them and their partners. “An example of a data guardian would be a health information exchange, which is a guardian of data on behalf of citizens.”

Governance is also distributed. BurstIQ’s blockchain allows organizations to set up zones and share them with partners. Ricotta offered one example of where zones are useful: some modern oncology treatments take a person’s biosample (clearly very sensitive personal data) and pass it through many different companies on the way to creating a personalized treatment. These companies can set up their own consent rules and police their observance, so BurstIQ doesn’t have to handle the details of these particular processes.

Quoting Ricotta again: “The BurstIQ platform is a network of networks, rather than a single blockchain network. Therefore, data owned by one organization is not accessible to or stored by other organizations on the network. Each organization maintains direct control of (and storage of) their data within their subnetwork.”

Flexible governance also allows organizations to enforce regulations that vary from one company and one country to another, such as a requirement that the data be stored in the country where it was generated.
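Zone-level governance of this sort can be pictured as a set of rules that partners define for themselves and check before any write. A minimal sketch, assuming a zone carries a member list, the consents a data subject must have granted, and an optional storage-country requirement; the `Zone` class, the zone name, member names, consent labels, and country codes are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class Zone:
    """Hypothetical governance zone whose partners set their own rules."""
    name: str
    members: Set[str]
    required_consents: Set[str]
    residency: Optional[str] = None  # country where data must be stored, if any

    def violations(self, org: str, subject_consents: Set[str], storage_country: str) -> List[str]:
        """List every rule a proposed write would break; an empty list means allowed."""
        problems = []
        if org not in self.members:
            problems.append(f"{org} is not a member of zone {self.name}")
        missing = self.required_consents - subject_consents
        if missing:
            problems.append(f"missing consent: {sorted(missing)}")
        if self.residency is not None and storage_country != self.residency:
            problems.append(f"data must stay in {self.residency}, not {storage_country}")
        return problems


oncology = Zone(
    name="personalized-oncology",
    members={"clinic", "sequencing-lab", "manufacturer"},
    required_consents={"genomic-analysis", "therapy-manufacturing"},
    residency="DE",
)
print(oncology.violations("sequencing-lab", {"genomic-analysis", "therapy-manufacturing"}, "DE"))  # []
print(oncology.violations("sequencing-lab", {"genomic-analysis"}, "US"))
# ["missing consent: ['therapy-manufacturing']", 'data must stay in DE, not US']
```

Returning every violation rather than a single yes/no keeps the decision explainable to each partner in the zone.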

In one sense, the BurstIQ network acts as a kind of data store, and its functions could in theory be performed by a conventional, centralized database. But the blockchain here presents several advantages. Data signing is built in, whereas it would have to be a separate service tacked on to a conventional data store. People can have more confidence that neither BurstIQ nor an attacker corrupted or destroyed data. And if BurstIQ were ever to vanish, the blockchain would still be there and be accessible.
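The tamper-evidence argument rests on hash chaining: each block commits to the hash of the one before it, so altering history breaks every later link. The sketch below shows the general technique, not BurstIQ’s design. Note one labeled simplification: it uses an HMAC from Python’s standard library as a stand-in for a digital signature. A real system would use public-key signatures (Ed25519, for example), so that verifiers cannot forge entries; anyone holding an HMAC key can. `SignedChain` and `block_hash` are hypothetical names.

```python
import hashlib
import hmac
import json

GENESIS = "0" * 64


def block_hash(prev_hash: str, payload: dict) -> str:
    """Hash the previous block's hash together with a canonical JSON encoding of the payload."""
    body = json.dumps(payload, sort_keys=True).encode("utf-8")
    return hashlib.sha256(prev_hash.encode("ascii") + body).hexdigest()


class SignedChain:
    """Toy append-only chain; HMAC stands in for a real public-key signature."""

    def __init__(self, key: bytes):
        self._key = key
        self.blocks = []

    def _sign(self, digest: str) -> str:
        return hmac.new(self._key, digest.encode("ascii"), hashlib.sha256).hexdigest()

    def append(self, payload: dict) -> None:
        prev = self.blocks[-1]["hash"] if self.blocks else GENESIS
        digest = block_hash(prev, payload)
        self.blocks.append({"prev": prev, "payload": payload,
                            "hash": digest, "sig": self._sign(digest)})

    def verify(self) -> bool:
        """Recompute every link and signature; any edit to an earlier block breaks the chain."""
        prev = GENESIS
        for block in self.blocks:
            digest = block_hash(prev, block["payload"])
            if block["prev"] != prev or digest != block["hash"]:
                return False
            if not hmac.compare_digest(self._sign(digest), block["sig"]):
                return False
            prev = digest
        return True


chain = SignedChain(key=b"demo-key-not-for-production")
chain.append({"dataset": "ds-42", "op": "register", "by": "hospital-a"})
chain.append({"dataset": "ds-42", "op": "read", "by": "pharma-b"})
print(chain.verify())  # True

chain.blocks[0]["payload"]["op"] = "delete"  # simulate tampering with history
print(chain.verify())  # False
```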

I am skeptical of most blockchain solutions, because blockchains have scalability problems: they require sequential, synchronous writes and grow linearly. But for an application like BurstIQ, with a large but still circumscribed set of participants, blockchain looks feasible. BurstIQ still has to deal with the problem of identifying and trusting the participants.

API-based platform

BurstIQ offers a rich API that exposes all blockchain-related activities, such as signing, and supports standards such as FHIR and older HL7 formats. Any developer can build an app on the platform for free. There are already a number of apps, some for researchers and some for consumers. BurstIQ has also started to release software development kits (SDKs) and Jupyter notebooks to facilitate its use for analytics, public health surveillance, and predictive modeling. By offering these services to far-flung health organizations, BurstIQ is positioning its analytics platform to be of universal value to health care researchers.
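For readers unfamiliar with FHIR, its resources are structured documents (commonly JSON) with defined fields. As a rough picture of the kind of payload a FHIR-capable API exchanges, here is a minimal FHIR R4 Patient resource built in Python. Nothing here is BurstIQ-specific: the `make_patient` helper and the sample values are hypothetical, and no BurstIQ endpoint or SDK call is assumed.

```python
import json


def make_patient(patient_id: str, family: str, given: str, gender: str, birth_date: str) -> dict:
    """Build a minimal FHIR R4 Patient resource as a plain dict (hypothetical helper)."""
    # FHIR's AdministrativeGender value set allows only these codes.
    allowed_genders = {"male", "female", "other", "unknown"}
    if gender not in allowed_genders:
        raise ValueError(f"gender must be one of {sorted(allowed_genders)}")
    return {
        "resourceType": "Patient",
        "id": patient_id,
        "name": [{"family": family, "given": [given]}],  # HumanName: given is a list
        "gender": gender,
        "birthDate": birth_date,  # FHIR date: YYYY, YYYY-MM, or YYYY-MM-DD
    }


patient = make_patient("example-1", "Doe", "Jane", "female", "1970-01-01")
print(json.dumps(patient, indent=2))
```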

This article is part of the #HealthIT100in100

About the author

Andy Oram

Andy Oram writes and edits documents about many aspects of computing, ranging in size from blog postings to full-length books. Topics cover a wide range of computer technologies: data science and machine learning, programming languages, Web performance, Internet of Things, databases, free and open source software, and more. His editorial output at O'Reilly Media included the first books ever published commercially in the United States on Linux, the 2001 title Peer-to-Peer (frequently cited in connection with those technologies), and the 2007 title Beautiful Code. He is a regular correspondent on health IT and health policy for HealthcareScene.com. He also contributes to other publications about policy issues related to the Internet and about trends affecting technical innovation and its effects on society. Print publications where his work has appeared include The Economist, Communications of the ACM, Copyright World, the Journal of Information Technology & Politics, Vanguardia Dossier, and Internet Law and Business.
