Privacy concerns that anonymised patients could be “re-identified” without their consent when their data is analysed by artificial intelligence may be holding the entire global health industry back from exploiting new opportunities, according to new research.
Existing healthcare algorithms rely on huge amounts of data that has been stripped of personal information. A team of MIT researchers has now quantified the potential risk of patient re-identification and found that between 2016 and 2021 - the period examined in the study, led by MIT Principal Research Scientist Leo Anthony Celi - there were no reports of patient re-identification through publicly available health data.
MIT’s findings suggest the potential risk to patient privacy is outweighed by gains for patients, says Celi, who hopes these datasets will include a more diverse group of patients and become more widely available.
“We agree that there is some risk to patient privacy, but there is also a risk of not sharing data,” he says. “There is harm when data is not shared, and that needs to be factored into the equation.”
Celi is the senior author of the study, published in PLOS Digital Health; Kenneth Seastedt, a thoracic surgery fellow at Beth Israel Deaconess Medical Center, is the lead author. The research was funded by the National Institutes of Health through the National Institute of Biomedical Imaging and Bioengineering.
When patient data is entered into large health record databases created by hospitals and other institutions, certain types of identifying information are typically removed, including patients’ names, addresses, and phone numbers. This is intended to prevent patients from being re-identified and having information about their medical conditions made public.
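The field-level scrubbing described above can be sketched roughly as follows. The field names and record format here are illustrative assumptions only; real de-identification pipelines follow formal standards such as the HIPAA Safe Harbor list of identifiers, which covers far more than three fields.

```python
# Minimal sketch of field-level de-identification, assuming a simple
# dictionary record format. The set of fields below is a hypothetical
# example, not a complete identifier list.
IDENTIFYING_FIELDS = {"name", "address", "phone"}

def deidentify(record: dict) -> dict:
    """Return a copy of the record with directly identifying fields removed."""
    return {k: v for k, v in record.items() if k not in IDENTIFYING_FIELDS}

patient = {
    "name": "Jane Doe",
    "address": "1 Main St",
    "phone": "555-0100",
    "diagnosis": "hypertension",
    "age": 54,
}

# Only clinical fields survive; direct identifiers are dropped.
print(deidentify(patient))  # → {'diagnosis': 'hypertension', 'age': 54}
```

As the study's framing suggests, the residual risk lies not in these removed fields but in whether the remaining quasi-identifiers (such as age combined with diagnosis) could be linked back to an individual.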
However, concerns about privacy have slowed the development of more publicly available databases with this kind of information, Celi says. In the new study, he and his colleagues set out to determine what the actual risk of patient re-identification is.
Patient privacy is important, but cyber security is the biggest threat
Researchers searched PubMed, a database of scientific papers, for any reports of patient re-identification from publicly available health data, but found none. They also examined media reports from September 2016 to September 2021 and say they could not find a single instance of patient re-identification from publicly available health data. During the same time period, the health records of nearly 100 million people were stolen through data breaches, the research team noted.
“Of course, it’s good to be concerned about patient privacy and the risk of re-identification, but that risk, although it’s not zero, is minuscule compared to the issue of cyber security,” Celi says.
More widespread sharing of de-identified health data is necessary, Celi says, to help expand the representation of minority groups in the United States, who have traditionally been underrepresented in medical studies. He is also working to encourage the development of more such databases in low- and middle-income countries.
“We cannot move forward with AI unless we address the biases that lurk in our datasets,” he says. “When we have this debate over privacy, no one hears the voice of the people who are not represented. People are deciding for them that their data need to be protected and should not be shared. But they are the ones whose health is at stake; they’re the ones who would most likely benefit from data-sharing.”
Instead of asking for patient consent to share data, which he says may exacerbate the exclusion of many people who are now underrepresented in publicly available health data, Celi recommends enhancing the existing safeguards that are in place to protect such datasets.
“What we are advocating for is performing data analysis in a very secure environment so that we weed out any nefarious players trying to use the data for some other reasons apart from improving population health,” he says. “We’re not saying that we should disregard patient privacy. What we’re saying is that we have to also balance that with the value of data sharing.”