Graph analysis and knowledge graphs facilitate scientific research for COVID-19

Like all epidemics, the COVID-19 pandemic is escalating into a crisis of both an individual and a collective nature. While crises in some ways bring people together, they also expose and emphasize systemic deficiencies and vulnerabilities. Case in point: scientific research.

Science is one of the pillars on which modern society is based. The scientific method underpins many results, including technology and data-driven decision making. However, this does not mean that science is without its own problems.

Producing results in the fight against SARS-CoV-2 (coronavirus) is one of the most pressing issues today, and it brings together the entire scientific community. Tackling issues in the way scientific research is done can help produce results under pressure.

ZDNet connected with two prominent researchers to discuss how the latest technologies in analytics and AI, namely graph analysis and knowledge graphs, can facilitate scientific research on the COVID-19 pandemic.

Scientific data is not FAIR, and that hinders COVID-19 research

Dr. Alexander Jarasch is head of data management and knowledge management at Germany's National Center for Diabetes Research (DZD). Jarasch notes that data is typically spread across different locations. Furthermore, in larger organizations, and for historical reasons, data tends not to be FAIR: Findable, Accessible, Interoperable and Reusable.

“Especially in life science, we have highly connected data, very heterogeneous data, and the entities are connected in a very complicated way. And GDPR rules make working with data a little more complicated,” said Dr. Jarasch.

Jarasch pointed out that the coronavirus causes an infectious disease, which makes it especially complex. Each virus has its own strategy for entering a cell, reproducing, and infecting other cells. Research must continue, as not enough experiments are available yet, and many aspects of this disease are still unknown because there is not enough data. Because of the way the virus replicates and mutates, developing a vaccine can be really complicated:

“There is likely no single drug that will save us from everything. There are many different drugs, and different patient groups respond to one treatment or another. I would not recommend blindly running any algorithm on any data. Still, the number of data points and the dependencies between them are too high for humans to handle.

Therefore, you need computer-assisted analysis, AI, or other machine learning algorithms to analyze the data. Graphs enable a new dimension of data analysis by helping us connect very heterogeneous data from different disciplines. We need to identify compounds in our graph to get new hypotheses and new evidence for one problem or another.”


The COVID GRAPH project is a voluntary initiative by graph enthusiasts and businesses aimed at building a knowledge graph with relevant information about COVID-19 and the SARS-CoV-2 virus. (Image: Sebastian Mueller / Works)

Dr. Jarasch is involved in the COVID GRAPH project, a voluntary initiative by graph enthusiasts and companies that aims to build a knowledge graph with relevant information on COVID-19 and the SARS-CoV-2 virus. As he pointed out, it includes about 44,000 publications, mostly from preprint servers:

“This is a good example, because no one can ever read all these articles, understand them, analyze them, and bring them together in a way that makes sense. Then we have coronavirus-relevant patents, case studies, genes, functions, molecular data, and every day there are multiple data sources that need to be integrated.”

COVID GRAPH brings together a diverse team of scientists, developers, computer scientists and more than seven companies. It is mainly intended for researchers in healthcare or the life sciences, but it may also be of interest to others. It is publicly available and free of charge, and it could soon also help researchers studying other diseases potentially linked to the coronavirus.

The goal is to provide sources of information linked through the basic units of the biomedical domain: genes, proteins and their functions. Bringing together disparate data can reveal previously unnoticed connections, and this is where knowledge graphs provide benefits.
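As a minimal sketch of how linking through shared basic units can surface connections, the Python below joins records from separate sources on the gene identifiers they mention, and reports cross-source pairs together with the genes that connect them. The function name, source names, record IDs and gene identifiers are hypothetical placeholders; this is a plain dictionary join for illustration, not COVID GRAPH's actual pipeline.

```python
from collections import defaultdict


def link_via_shared_genes(sources):
    """Find record pairs from *different* sources that mention a common gene.

    sources: {source_name: {record_id: set_of_gene_ids}}  (hypothetical layout)
    Returns: {(record_a, record_b): sorted list of shared gene ids}
    """
    # Invert the data: gene id -> every (source, record) that mentions it.
    by_gene = defaultdict(list)
    for source, records in sources.items():
        for record_id, genes in records.items():
            for gene in genes:
                by_gene[gene].append((source, record_id))

    links = defaultdict(set)
    for gene, mentions in by_gene.items():
        for i, (src_a, rec_a) in enumerate(mentions):
            for src_b, rec_b in mentions[i + 1:]:
                if src_a != src_b:  # keep only connections that cross sources
                    pair = tuple(sorted((rec_a, rec_b)))
                    links[pair].add(gene)
    return {pair: sorted(genes) for pair, genes in links.items()}
```

For example, a publication and a clinical trial that never cite each other would still be linked if both mention the same gene. In a graph database the same idea becomes a two-hop pattern through the gene node.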

Making data FAIR with knowledge graphs

Making data FAIR is key to facilitating scientific research in general and coronavirus research in particular. This is also a key target for the Open Research Knowledge Graph (ORKG) project. ORKG aims to describe research articles in a structured way, which makes them easier to find and compare.

Dr. Sören Auer is the director of TIB, the Leibniz Information Centre for Science and Technology and University Library. TIB acts as the German National Library of Science and Technology. Dr. Auer is a computer science professor with many contributions to knowledge graph research, and he leads ORKG.

Dr. Auer identified two key issues in scientific research. The first is the integration and semantic representation of heterogeneous data on patients, diseases, medications, clinical trials, etc. The second is representing the research contributions in papers in a more comparable and reproducible way.


The Open Research Knowledge Graph (ORKG) project works on technologies for open graphs of research knowledge.

Knowledge graphs help capture the meaning of data, information and knowledge. Knowledge graphs are a technology that is enjoying its hype moment now, but it has been around for about 20 years and is here to stay. It enables the interlinking and integration of heterogeneous data from different sources, in different formats, modalities, levels of structure and governance schemes.

As a result, the effort required to prepare and integrate data to answer specific research questions is dramatically reduced, and AI techniques can be applied more easily. ORKG focuses on representing the scientific contributions of articles semantically. This makes it easier to compare differences and similarities between approaches by juxtaposing them in table views or domain-specific visualizations.

Dr. Auer pointed to one example: representing and comparing the basic reproduction number R0 of SARS-CoV-2 as reported in several publications. In epidemiology, the basic reproduction number R0 of an infection can be thought of as the expected number of cases directly generated by one case in a population where all individuals are susceptible to infection. R0 thus indicates how readily an infection spreads: when it is above 1, each case generates more than one new case on average and the outbreak grows.
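As a worked illustration of that definition, the sketch below computes the expected number of new cases in each generation of infection. It assumes a fully susceptible population in which every case infects R0 others on average, so generation n contains R0^n times the initial cases; real epidemics deviate from this as immunity builds up and interventions take effect. The function name is hypothetical.

```python
def expected_cases_by_generation(r0, generations, initial_cases=1.0):
    """Expected new cases per generation in a naive branching model.

    Assumes a fully susceptible population and a constant average of
    r0 secondary cases per case, so generation n has initial_cases * r0**n.
    """
    cases = [initial_cases]
    for _ in range(generations):
        cases.append(cases[-1] * r0)  # each case generates r0 new cases on average
    return cases
```

With r0 = 2, one case leads to 1, 2, 4 and 8 expected new cases over three generations; with r0 = 0.5 the numbers shrink each generation instead, which is why R0 = 1 is the threshold between growth and decline.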

R0 is a key parameter in epidemiological models and publications, and comparing it across publications can help researchers become aware of the assumptions underlying different models. The visualizations ORKG offers give a quick overview of different studies without having to read and compare them manually, an approach that scales far better.
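The sketch below illustrates the general idea of juxtaposing structured contributions in a table view, so that reported values and the assumptions behind them sit side by side. Contribution, comparison_table and render are hypothetical names, and the property names are whatever a curator chooses; ORKG's actual data model and API are not shown here.

```python
from dataclasses import dataclass, field


@dataclass
class Contribution:
    """Hypothetical, simplified structured description of one paper's contribution."""
    paper: str
    properties: dict = field(default_factory=dict)  # property name -> reported value


def comparison_table(contributions, properties):
    """Juxtapose the chosen properties of several contributions, one row per paper."""
    header = ["paper", *properties]
    rows = [
        [c.paper, *(str(c.properties.get(p, "n/a")) for p in properties)]
        for c in contributions
    ]
    return [header, *rows]


def render(table):
    """Format the table as aligned plain text, one line per row."""
    widths = [max(len(row[i]) for row in table) for i in range(len(table[0]))]
    return "\n".join(
        " | ".join(cell.ljust(width) for cell, width in zip(row, widths)).rstrip()
        for row in table
    )
```

Missing properties appear as n/a, which is informative in itself: it shows at a glance which papers do not report a given value or assumption.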

COVID GRAPH also has two aspects. One is the database itself, which stores the connected data. The other is a GUI through which users can query and explore the data. A query result is just the starting point for interactively browsing and discovering new things related to it.
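Browsing outward from a query result can be modeled as expanding its neighborhood hop by hop. The breadth-first search below is a minimal sketch that assumes the graph is held in memory as an adjacency dictionary listing neighbors in both directions; a graph database would typically express this as a variable-length path query instead. expand_neighborhood is a hypothetical name.

```python
from collections import deque


def expand_neighborhood(graph, start_nodes, max_hops):
    """Return {node: hop distance} for all nodes within max_hops of the start nodes.

    graph: {node: iterable of neighboring nodes}, assumed to list both directions.
    """
    distance = {node: 0 for node in start_nodes}
    queue = deque(start_nodes)
    while queue:
        node = queue.popleft()
        if distance[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in graph.get(node, ()):
            if neighbor not in distance:  # first visit is the shortest distance in BFS
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    return distance
```

Raising max_hops one step at a time mirrors how a user clicks outward from a result, while keeping each view small enough to inspect.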

Joining forces

Knowledge graphs can be stored in any back end, from files to relational databases or document stores. But since they are graphs, it makes sense to store them in a graph database. This greatly facilitates storage and retrieval, as graph databases offer specialized structures, APIs and query languages tailored to graphs.

Graph databases are available in two main flavors, depending on which graph model they support: property graph and RDF. In general, RDF graph databases emphasize semantics and interoperability, while property graph databases emphasize usability and performance.

Work on bridging the two approaches is ongoing in the graph database community, and we have been actively involved in it. So when it comes to scientific research, especially in a time of crisis, we expected to see the two camps join forces and build on this momentum. We were not disappointed.

Auer and Jarasch not only eagerly agreed to provide an overview of their efforts, but also gave a joint presentation at an online meetup to elaborate further. They share a common goal (facilitating scientific research for COVID-19) and a common approach (using graph analysis and knowledge graphs). The focus is on describing and structuring publications semantically.

As Dr. Jarasch noted, a property graph is a bit different from a knowledge graph, in the sense that you store properties on nodes and edges that you can query. In a knowledge graph, you can integrate more knowledge as you create new relationships between nodes that have specific evidence attached to them.


In response to the COVID-19 pandemic, the Allen Institute for AI partnered with leading research groups to prepare and distribute the COVID-19 Open Research Dataset (CORD-19). This is just one of the datasets included in COVID GRAPH.

As Dr. Jarasch put it:

“COVID GRAPH is, I would say, a bit of both. It's more a knowledge graph than a property graph, but as we integrate basic entities like genes, proteins, transcripts and clinical trials, I would also say that it is partly a property graph. So the answer is both, depending on what you ask.

We have publications and patents, and some text extracts from various sources. They need to be structured in a way that connects the elements that belong together. At the same time, you divide larger chunks of text into parts that make sense, and then, step by step, semantically analyze and annotate the texts and connect them to the various entities.”
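The step-by-step process described in the quote (splitting text into meaningful parts, annotating them, and connecting them to entities) can be sketched as follows. This assumes sentence-based chunking and a simple dictionary lookup of entity names; production pipelines would typically rely on trained named entity recognition models instead. chunk_text, annotate and the MENTIONS relationship name are hypothetical.

```python
import re


def chunk_text(text, max_chars=200):
    """Split text into sentence-aligned chunks of roughly max_chars characters.

    A single sentence longer than max_chars becomes its own (oversized) chunk.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if current and len(current) + 1 + len(sentence) > max_chars:
            chunks.append(current)  # current chunk is full, start a new one
            current = sentence
        else:
            current = f"{current} {sentence}".strip()
    if current:
        chunks.append(current)
    return chunks


def annotate(chunks, vocabulary):
    """Link each chunk to the entities it mentions (case-insensitive, whole words).

    vocabulary: {entity name as it appears in text: entity id in the graph}
    Returns (chunk node, relationship, entity id) edges.
    """
    edges = []
    for i, chunk in enumerate(chunks):
        for name, entity_id in vocabulary.items():
            if re.search(rf"\b{re.escape(name)}\b", chunk, re.IGNORECASE):
                edges.append((f"chunk-{i}", "MENTIONS", entity_id))
    return edges
```

The resulting edges connect each text fragment to gene or protein nodes, so a publication becomes reachable from the entities it discusses rather than only from its title.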

Dr. Auer noted that property graph technology can be a basis for building knowledge graphs:

“We use a property graph as a base, but equip it with unique URI identifiers and vocabularies, as well as RDF export and SPARQL query facilities. To facilitate distributed knowledge integration, we need to build on W3C Semantic Web standards such as URIs, RDF(S), OWL, SPARQL, etc.”
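A minimal sketch of equipping property-graph data with URIs and exporting it as RDF might look like the following. It mints URIs under a placeholder namespace (https://example.org/), assumes node IDs, labels and property keys are already URI-safe, and writes N-Triples, the line-based RDF serialization defined by the W3C. All function names are hypothetical, and this is not ORKG's exporter.

```python
BASE = "https://example.org/"  # hypothetical namespace, not ORKG's real one
RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"
XSD = "http://www.w3.org/2001/XMLSchema#"


def uri(kind, local_id):
    """Mint a URI for a node, class or property (local_id assumed URI-safe)."""
    return f"<{BASE}{kind}/{local_id}>"


def literal(value):
    """Serialize a str, bool, int or finite float as an N-Triples literal."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return f'"{str(value).lower()}"^^<{XSD}boolean>'
    if isinstance(value, int):
        return f'"{value}"^^<{XSD}integer>'
    if isinstance(value, float):
        return f'"{value!r}"^^<{XSD}double>'
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def to_ntriples(nodes, edges):
    """Export property-graph nodes and edges as N-Triples lines.

    nodes: {node_id: {"label": str, "props": {key: value}}}
    edges: [(source_id, relationship_type, target_id)]; edge properties are not exported.
    """
    lines = []
    for node_id, node in nodes.items():
        subject = uri("resource", node_id)
        # The node label becomes an rdf:type statement.
        lines.append(f"{subject} {RDF_TYPE} {uri('class', node['label'])} .")
        for key, value in node["props"].items():
            lines.append(f"{subject} {uri('property', key)} {literal(value)} .")
    for source, rel, target in edges:
        lines.append(f"{uri('resource', source)} {uri('property', rel)} {uri('resource', target)} .")
    return "\n".join(lines)
```

Edge properties are deliberately left out: a plain RDF triple has no slot for them, so they would need reification or RDF-star. This is one of the points where the property graph and RDF models differ.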

ORKG is looking for partners to help develop domain-specific showcases, especially for virology and epidemiology. The plan is to create domain-specific knowledge observatories that represent the state of the art in a particular field and allow researchers to get a quick overview. ORKG is open source, open data and open knowledge, and Dr. Auer noted that the team welcomes collaboration.

COVID GRAPH is currently integrating more data sources, such as clinical trials, and connecting entities from potentially related diseases such as diabetes, cancer or lung disease. Other action items include running pattern-finding algorithms to discover new patterns or relationships, and further work on the GUI and user experience. There is a public chat forum where you can get involved or contact the team.
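One simple pattern-finding heuristic for suggesting new relationships is common-neighbor scoring: two nodes that are not yet linked but share many neighbors are candidates for a connection worth investigating. The sketch below assumes an undirected graph stored as a dictionary of neighbor sets, with every node present as a key. The article does not say which algorithms COVID GRAPH runs, so treat this as an illustration of the general idea; common_neighbor_candidates is a hypothetical name.

```python
from itertools import combinations


def common_neighbor_candidates(graph, min_shared=2):
    """Rank unconnected node pairs by how many neighbors they share.

    graph: {node: set of neighbors}, undirected (both directions present).
    Returns [(node_a, node_b, shared_count)], highest count first.
    """
    candidates = []
    for a, b in combinations(sorted(graph), 2):
        if b in graph[a]:
            continue  # already connected, nothing new to suggest
        shared = len(graph[a] & graph[b])
        if shared >= min_shared:
            candidates.append((a, b, shared))
    # Highest score first; ties broken alphabetically for stable output.
    candidates.sort(key=lambda c: (-c[2], c[0], c[1]))
    return candidates
```

A suggested pair is a hypothesis, not evidence: it tells researchers where to look, and any relationship added to the graph would still need supporting evidence attached to it.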
