Machine learning tool joins the battle against “long COVID”
Long COVID, a condition in which symptoms persist long after the initial COVID-19 infection, has become a pandemic within the pandemic. Researchers are now using machine learning to discover why some people develop debilitating, long-lasting symptoms.
A team of US researchers has developed a machine learning tool that analyses electronic health records (EHRs) to identify common symptoms and define subtypes of long COVID. The research, published in eBioMedicine, also found strong correlations between long COVID subtypes and pre-existing conditions like diabetes and hypertension.
According to Justin Reese, a Computer Research Scientist at Berkeley Lab's Biosciences Area, this research can improve our understanding of long COVID and enable more effective treatments by helping clinicians create tailored therapies for different groups of patients. For example, the best treatment for patients experiencing nausea and abdominal pain might differ from the best treatment for those suffering from a persistent cough and other lung symptoms.
The team validated their software using EHR information from 6,469 long COVID patients who had confirmed COVID-19 infections.
“Basically, we found long COVID features in the EHR data for each long COVID patient, and then assessed patient-patient similarity using semantic similarity, which essentially allows ‘fuzzy matching’ between features – for example, ‘cough’ is not the same as ‘shortness of breath,’ but they are similar since they both involve lung problems,” says Reese. “We compare all symptoms for the pair of the patients in this way, and get a score of how similar the two long COVID patients are. We can then perform unsupervised machine learning on these scores to find different subtypes of long COVID.”
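To make the idea of "fuzzy matching" concrete, here is a minimal sketch in Python. It is an illustration only, not the team's software: the tiny symptom hierarchy `PARENT` is hypothetical (it is not the real Human Phenotype Ontology), and the scoring rules, a Jaccard overlap of ancestor terms combined with a best-match average, are assumptions chosen for simplicity. The article does not say which similarity measure the researchers used.

```python
# Toy symptom hierarchy (hypothetical; NOT the real Human Phenotype Ontology).
# Each term points to its more general parent term.
PARENT = {
    "cough": "lung problem",
    "shortness of breath": "lung problem",
    "nausea": "digestive problem",
    "abdominal pain": "digestive problem",
    "lung problem": "symptom",
    "digestive problem": "symptom",
    "symptom": None,
}


def ancestors(term):
    """Return the term together with every more general term above it."""
    found = set()
    while term is not None:
        found.add(term)
        term = PARENT[term]
    return found


def term_similarity(a, b):
    """Jaccard overlap of ancestor sets: 1.0 for identical terms,
    lower values for terms that share only distant ancestors."""
    anc_a, anc_b = ancestors(a), ancestors(b)
    return len(anc_a & anc_b) / len(anc_a | anc_b)


def patient_similarity(p, q):
    """Symmetric best-match average over two patients' symptom lists."""
    def one_way(src, dst):
        # For each symptom in src, take its closest match in dst.
        return sum(max(term_similarity(s, d) for d in dst) for s in src) / len(src)

    return (one_way(p, q) + one_way(q, p)) / 2


# Three made-up patients, for illustration only.
patient_a = ["cough"]
patient_b = ["shortness of breath"]
patient_c = ["nausea", "abdominal pain"]

print(patient_similarity(patient_a, patient_b))  # two lung symptoms
print(patient_similarity(patient_a, patient_c))  # lung vs digestive symptoms
```

In this toy setup, the two patients with lung symptoms score higher against each other than either does against the patient with digestive symptoms, even though no two patients share an identical symptom.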
Machine learning adapts to different EHR systems
The researchers applied machine learning to patient-patient similarity scores to cluster patients into groups. These groups were characterised by analysing relationships between symptoms and pre-existing diseases, as well as other demographic features like age, gender, and race.
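As a rough illustration of how similarity scores can be turned into groups, the sketch below clusters four made-up patients. The article does not name the clustering algorithm the team used, so this example substitutes a very simple one: patients are linked whenever their similarity meets a threshold, and linked patients form a cluster (single-linkage clustering via connected components). The matrix `SIM`, the function name `cluster_by_similarity`, and the threshold are all hypothetical.

```python
def cluster_by_similarity(sim, threshold):
    """Group patients so that any pair with similarity >= threshold
    ends up in the same cluster (single-linkage, connected components)."""
    n = len(sim)
    parent = list(range(n))  # union-find forest; each patient starts alone

    def find(i):
        # Walk up to the cluster representative, shortening the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if sim[i][j] >= threshold:
                parent[find(i)] = find(j)  # merge the two clusters

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Made-up patient-patient similarity scores (symmetric, 1.0 on the diagonal).
SIM = [
    [1.0, 0.8, 0.1, 0.2],
    [0.8, 1.0, 0.2, 0.1],
    [0.1, 0.2, 1.0, 0.7],
    [0.2, 0.1, 0.7, 1.0],
]

clusters = cluster_by_similarity(SIM, threshold=0.5)
print(clusters)  # [[0, 1], [2, 3]]
```

Once patients are grouped this way, each cluster can be compared against other recorded attributes, such as pre-existing conditions or age, to characterise what distinguishes it.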
According to Reese and his colleagues, the machine learning approach is adaptable to different EHR systems, which makes the tool useful for researchers who need to gather data from various medical institutions.
This research builds on previous work to create the Human Phenotype Ontology, an open-access database and research tool that provides a standardised vocabulary of the symptoms and features found across human diseases. The latest work was funded by the National COVID Cohort Collaborative.