Big data is changing the way we diagnose disease

by | Jun 11, 2024

Scientists are approaching disease and diagnosis in a new way, leveraging big data to provide better options for both clinicians and patients.
AI-generated image of big data being sorted.

Decades of data on human disease could change the way researchers and clinicians think about disease and diagnosis.

“Traditionally, when we go to the doctor’s, the first thing we do is try to anatomically define where it hurts,” said Dario Greco, a bioinformatician at Tampere University in Finland. Greco, along with post-doctoral researcher Lena Möbus, is challenging this body-centric view of disease.

For them, the wealth of molecular and genetic data now available is an opportunity. “If we forget for a moment where the disease happens, and we look at the molecular buildup […] and the symptoms, if we put all this information together, what do we get?” asked Greco.

The answer is a new categorization based on how a disease develops and how the body responds to it, rather than simply its symptoms or where it occurs. This entails understanding the underlying mechanisms, whether they stem from genetic mutations, infection, immune dysregulation, or other causes.

By considering these aspects alongside symptoms and anatomical manifestations, this approach seeks to provide a more comprehensive understanding of disease, potentially leading to more effective diagnoses and therapeutic strategies tailored to individual patients.

Turning to big data

It all began with an extensive collection of data. “In the last few years, our group built a knowledge graph,” said Greco, “[which is] basically […] data organized like a huge network.”

Their graph had about 60 million data points covering things like genes associated with disease and how they work in the body’s tissues and cells, as well as how they are associated with symptoms and pharmaceuticals.
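
To make the idea of "data organized like a huge network" concrete, here is a minimal sketch of a knowledge graph stored as typed links between entities. It is not the team's implementation: the entity names, the relation names, and the helpers `build_graph` and `neighbors` are all hypothetical placeholders chosen for illustration.

```python
from collections import defaultdict

# Toy (subject, relation, object) triples. Every name here is a made-up
# placeholder, not real biomedical data.
TRIPLES = [
    ("gene_A", "associated_with", "disease_X"),
    ("gene_A", "expressed_in", "tissue_skin"),
    ("gene_B", "associated_with", "disease_X"),
    ("gene_B", "associated_with", "disease_Y"),
    ("drug_1", "targets", "gene_B"),
    ("disease_X", "has_symptom", "symptom_rash"),
]


def build_graph(triples):
    """Index triples as node -> list of (relation, neighbor), in both directions."""
    graph = defaultdict(list)
    for subj, rel, obj in triples:
        graph[subj].append((rel, obj))
        # Store the reverse link so the graph can be walked from either end.
        graph[obj].append((f"inverse_{rel}", subj))
    return graph


def neighbors(graph, node, relation):
    """Return the sorted neighbors of `node` reached through `relation`."""
    return sorted(n for rel, n in graph.get(node, []) if rel == relation)


graph = build_graph(TRIPLES)
genes_for_x = neighbors(graph, "disease_X", "inverse_associated_with")
print(genes_for_x)  # ['gene_A', 'gene_B']
```

Storing each link with its relation type is what lets one network answer different kinds of questions, such as which genes are tied to a disease or which drugs act on a gene.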

From this web of data points, Möbus identified unique layers, or dimensions, through which the data could be analyzed — a task that required careful consideration and an understanding of the trade-offs involved with big data, specifically quantity versus quality. 

“We think [the layers] cover a reasonable amount of information for the diseases, but it’s also a matter of what data is available,” said Möbus. “We need to find a good compromise between how deep we want to look and how broadly we want to look.”

In a modern world awash with data, Greco says this is a fundamental question facing data science. “We are living in a moment in history where we are really fighting with ourselves against two completely different ways of looking at data,” he said. One approach is to “kill the noise by volume”; “on the other hand, you have the super-detailed curation of data.”

Including more data provides a broad view of human disease and what common features or treatments to watch for and try. However, using more curated data reveals how the diseases are interacting with the immune system and the body, illuminating possible targets for novel drugs or treatments.

Grouping diseases to provide better diagnoses

The team was striving for a model that struck an important and delicate balance, providing a better way to describe what is happening in the body to produce the diseases and symptoms the doctors observe.

For this, Möbus leaned toward more curated and detailed data on the genes and molecular pathways involved in disease as well as the drugs or treatments used and what genes or systems these drugs act on.

The analysis then compares each pair of conditions across these dimensions and assigns a distance value that reflects how similar they are, allowing the team to see which conditions cluster together and why.
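
The article does not say which distance measure or clustering method the team used, so the following is only a sketch under stated assumptions: Jaccard distance on sets of features, averaged over layers, followed by a simple single-linkage grouping with a cutoff. The disease profiles, the layer names, the `threshold` value, and all function names are hypothetical.

```python
from itertools import combinations

# Placeholder profiles: disease -> layer -> set of features. Not real data.
PROFILES = {
    "disease_X": {"genes": {"g1", "g2", "g3"}, "drugs": {"d1"}},
    "disease_Y": {"genes": {"g2", "g3", "g4"}, "drugs": {"d1", "d2"}},
    "disease_Z": {"genes": {"g7", "g8"}, "drugs": {"d9"}},
}


def jaccard_distance(a, b):
    """1 minus (shared features / all features); 0.0 if both sets are empty."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def disease_distance(p, q):
    """Average the Jaccard distance over every layer (an assumed weighting)."""
    layers = sorted(set(p) | set(q))
    total = sum(jaccard_distance(p.get(l, set()), q.get(l, set())) for l in layers)
    return total / len(layers)


def cluster(profiles, threshold):
    """Single-linkage grouping: link diseases whose distance is below threshold."""
    parent = {name: name for name in profiles}

    def find(x):
        # Follow parent links to the group representative, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(sorted(profiles), 2):
        if disease_distance(profiles[a], profiles[b]) < threshold:
            parent[find(a)] = find(b)

    groups = {}
    for name in profiles:
        groups.setdefault(find(name), set()).add(name)
    return sorted(sorted(g) for g in groups.values())


clusters = cluster(PROFILES, threshold=0.6)
print(clusters)  # [['disease_X', 'disease_Y'], ['disease_Z']]
```

In this toy case, the first two diseases land in one group because they share genes and a drug, regardless of any anatomical label attached to them.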

Surprisingly, the results revealed groupings that traditional diagnoses based on anatomical site miss. “The groups were mostly not reflecting organ systems or anatomical sites, but they were reflecting functional groups,” said Möbus.

For example, psoriasis, a common inflammatory condition of the skin, was grouped with chronic inflammatory bowel diseases. Despite occurring in different parts of the body, these conditions share similar immune and inflammatory responses and sometimes use the same drugs as treatments.

A defining line

There was also a clear split between cancer and non-cancerous diseases. “This was also very impressive for me to see that there is such a clear line apparently between clusters that contain solely cancerous diseases and clusters that solely contain non-cancerous diseases,” Möbus said. The split highlights the differences in how cancers occur and affect the body compared with other types of disease.

Even within the cancer group, the analysis continued to surprise the team with novel insights regarding the origins of specific cancers.

“We don’t see cancers of the head, we don’t see cancers of the gut,” explained Möbus. “We see cancers related [to] the tissue of origin.” Again, the location where a tumor appears was less important; instead, the analysis pointed out similarities in which cell types became cancerous.

“It was completely independent of the location where the primary cancer develops,” she said. “It was really based on the tissue or the cell type of origin and that was really surprising.” Understanding how different cancers originate and the common features associated with the origins provides new leads for doctors to explore regarding treatment.

The analysis also revealed some trends that on the surface seem obvious but again point to the basic ways in which diseases disrupt the body. For example, the most common genes in the analysis were immune system genes related to inflammation.

According to Greco, this makes sense because all disease is essentially tissue damage, which results in inflammation. This potentially highlights a common mechanism for all disease, and could provide a starting point for researchers struggling to understand a rare condition.

“You might infer a number of characteristics simply by looking at the similarity [rare diseases] have with more popular diseases,” Greco explained.
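
One simple way to picture this kind of inference is a nearest-neighbor lookup: find the well-characterized disease whose gene set overlaps most with the rare one, then treat its known characteristics as hypotheses to test. This is an illustrative sketch only, not the method in the paper; the data, the `annotations` field, and the function names are hypothetical.

```python
# Placeholder data: well-studied diseases with gene sets and known traits.
KNOWN = {
    "common_disease_1": {"genes": {"g1", "g2", "g3"}, "annotations": {"inflammatory"}},
    "common_disease_2": {"genes": {"g5", "g6"}, "annotations": {"metabolic"}},
}

# Genes linked to a hypothetical, poorly characterized rare disease.
rare_genes = {"g2", "g3"}


def jaccard_similarity(a, b):
    """Shared features divided by all features; 0.0 if both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def most_similar(query_genes, known):
    """Return the name of the known disease with the highest gene overlap."""
    return max(
        sorted(known),
        key=lambda name: jaccard_similarity(query_genes, known[name]["genes"]),
    )


best = most_similar(rare_genes, KNOWN)
# Borrowed traits are candidates for follow-up, not conclusions.
candidate_annotations = KNOWN[best]["annotations"]
print(best, candidate_annotations)  # common_disease_1 {'inflammatory'}
```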

What would leveraging even more data do?

Greco believes this analysis shifts focus from a disease- or body-centric view of diagnosis and treatment toward a more holistic approach, which considers how the body responds and the specific mechanisms by which a disease produces symptoms.

This view could spark new ideas on how to treat individual patients as it pinpoints how a disease is operating rather than where and when it is happening.

Greco and Möbus hope that this work inspires others to share more data and rethink how this data is used. “Imagine how powerful our analysis and predictions would be if we could put our hands on the enormous amount of data that exists in pharma companies,” said Greco.

In the meantime, they believe there is more to squeeze from existing datasets if researchers embrace different ways of using them.

Reference: Dario Greco, et al., A Multi-Dimensional Approach to Map Disease Relationships Challenges Classical Disease Views, Advanced Science (2024). DOI: 10.1002/advs.20240175

Feature image credit: Google Deepmind on Unsplash
