Your Guide to Knowledge Graphs

All you need to know about Knowledge Graphs

Diego Lopez Yse
7 min read · Aug 14, 2023
Photo by Alina Grubnyak on Unsplash

Imagine you could weave information into a living tapestry of interconnected insights: a web where data comes alive with context and understanding. As our world becomes increasingly connected, the traditional methods of organizing and navigating data fall short of capturing the intricate web of relationships that govern knowledge. With this article, I invite you to enter the world of knowledge graphs — an approach that transcends the boundaries of conventional databases, paving the way for a holistic representation of information.

Knowledge graphs (KGs) organise data from multiple sources, capture information about entities of interest in a given domain or task (like people, places or events), and forge connections between them. In Data Science and Artificial Intelligence (AI), KGs are commonly used to:

  • Facilitate access to and integration of data sources;
  • Add context and depth to other, more data-driven AI techniques such as machine learning; and
  • Serve as bridges between humans and systems, for example by generating human-readable explanations or, on a bigger scale, by enabling intelligent systems for scientists and engineers.

Companies like AstraZeneca use KGs to harness vast networks of scientific data, giving their scientists the information they need about genes, proteins, diseases and drugs, and their relationships: how they interact, work together, or work against each other. Amazon is another example: it uses KGs to represent the hierarchical relationships among products, the relationships between creators and content, and information for its question-answering service, with the goal of improving search, recommending products, and inferring missing information.

KGs can be created from scratch, e.g., by domain experts, learned from unstructured or semi-structured data sources, or assembled from existing KGs, typically aided by various semi-automatic or automated data validation and integration mechanisms. They can be a self-contained unit living in a single graph data store, or they can involve several coordinated graph stores forming a federation of graphs.

The anatomy of knowledge graphs

KGs are also known as semantic networks, representing a network of real-world entities — i.e. objects, events, situations, or concepts — and illustrating the relationship between them. This information is usually stored in a graph database and visualized as a graph structure, prompting the term knowledge “graph.”

We’ve detailed in another article how graphs are made up of two main components: nodes and edges. Any object, place, or person can be a node. An edge defines the relationship between the nodes.
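To make this concrete, here is a minimal sketch of a graph held as a plain list of edges. The entities, relationship names, and the `neighbors` helper are my own illustrative choices, not part of any particular graph library.

```python
from collections import defaultdict

# Toy data for illustration: each edge is (source node, relationship, target node).
edges = [
    ("Marie Curie", "born_in", "Warsaw"),
    ("Marie Curie", "won", "Nobel Prize in Physics"),
    ("Warsaw", "capital_of", "Poland"),
]

# Nodes are every distinct thing that appears at either end of an edge.
nodes = {n for source, _, target in edges for n in (source, target)}

# Adjacency list: for each node, its outgoing (relationship, target) pairs.
outgoing = defaultdict(list)
for source, relation, target in edges:
    outgoing[source].append((relation, target))


def neighbors(node):
    """Return the nodes directly reachable from `node` via an outgoing edge."""
    return [target for _, target in outgoing.get(node, [])]
```

With this data, `neighbors("Marie Curie")` returns `["Warsaw", "Nobel Prize in Physics"]`, while `neighbors("Poland")` returns an empty list because no edge starts there.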

Graphs are a simple but powerful way of describing how things connect. But what makes KGs different from simple graphs?

What transforms a graph into a KG is the application of an organizing principle that helps human and software users to understand it. Sometimes this is loftily called semantics, but we just think of it as making the data smarter.

Hetionet is an example of a KG that combines information from 29 public databases. The network contains 47,031 nodes of 11 types and 2,250,197 edges of 24 types. Source: Hetionet

KGs are a specific type of graph with an emphasis on contextual understanding. They are interlinked sets of facts that describe real-world entities, events, or things and their interrelations in a human- and machine-understandable format. KGs use an organizing principle so that a user (or a computer system) can reason about the underlying data. The organizing principle gives us an additional layer of organizing data (metadata) that adds connected context to support reasoning and knowledge discovery. The organizing principle makes the data itself smarter, rather than locking away the tools to understand data inside application code.

Differences between a graph, a KG, and applying inference to a KG. Source: Ontotext

Today, KGs are used extensively in anything from search engines and chatbots to product recommenders and autonomous systems. Some of the most popular ones are:

  • DBpedia and Wikidata are two different KGs for data on Wikipedia.org. DBpedia is built from the infoboxes of Wikipedia, while Wikidata focuses on secondary and tertiary objects. Both typically publish in an RDF format.
  • Google Knowledge Graph is represented through Google Search Engine Results Pages (SERPs), serving information based on what people search. This KG contains over 500 million objects, sourcing data from Freebase, Wikipedia, the CIA World Factbook, and more.

KGs are agnostic on the physical storage of the underlying data and support different types of architectural approaches, from the more virtualized ones where the KG is a smart index over externally stored data to the fully materialized ones where the external data is fully replicated in a graph platform — and any hybrid approach in between the two.

It’s all about semantics

A KG organises and integrates data according to an ontology, which acts as the schema of the KG and makes it possible to apply a reasoner to derive new knowledge.

An ontology is a formal description of knowledge as a set of concepts within a domain and the relationships that hold between them. It ensures a common understanding of information and makes domain assumptions explicit, allowing organizations to make better sense of their data. There are other methods that use formal specifications for knowledge representation, such as vocabularies, taxonomies, thesauri, topic maps and logical models. However, unlike taxonomies or relational database schemas, ontologies express relationships and enable users to link multiple concepts to other concepts in a variety of ways.

A KG is created when you apply an ontology (our data model) to a set of individual data points (our book, author, and publisher data). In other words: ontology + data = KG. Source: Enterprise Knowledge
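The "ontology + data = KG" idea can be sketched in a few lines. This is only an illustration under my own assumptions: the dictionary layout, the placeholder entity names, and the `build_kg` function are hypothetical, and real systems would normally use an ontology language and a graph store instead.

```python
# Hypothetical ontology: the classes in the domain and, for each relationship,
# the (subject class, object class) pair it is allowed to connect.
ONTOLOGY = {
    "classes": {"Book", "Author", "Publisher"},
    "relations": {
        "written_by": ("Book", "Author"),
        "published_by": ("Book", "Publisher"),
    },
}

# Placeholder data points: every entity is tagged with one class.
entities = {
    "Book A": "Book",
    "Author A": "Author",
    "Publisher A": "Publisher",
}
facts = [
    ("Book A", "written_by", "Author A"),
    ("Book A", "published_by", "Publisher A"),
    ("Author A", "published_by", "Book A"),  # breaks the ontology on purpose
]


def build_kg(ontology, entities, facts):
    """Split facts into those that conform to the ontology and those that do not."""
    accepted, rejected = [], []
    for subject, relation, obj in facts:
        expected = ontology["relations"].get(relation)  # None if relation is unknown
        actual = (entities.get(subject), entities.get(obj))
        (accepted if expected == actual else rejected).append((subject, relation, obj))
    return accepted, rejected


accepted, rejected = build_kg(ONTOLOGY, entities, facts)
```

Here `accepted` holds the two well-typed facts and `rejected` holds the third one, because an Author cannot be the subject of `published_by` in this toy ontology.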

Ontologies provide users with the necessary structure to link one piece of information to other pieces of information on the Web of Linked Data. Because they are used to specify common modeling representations of data from distributed and heterogeneous systems and databases, ontologies enable database interoperability, cross-database search and smooth knowledge management.

Ontologies make knowledge actionable. They enable human or software agents to carry out sophisticated tasks.

Ontologies are classification schemes that describe the categories in a domain and the relationships between them, and are not restricted to just the hierarchical (broader-narrower) structures. They allow for the definition of more complex types of relationships between categories, such as part_of, compatible_with, or depends_on. They also allow for the definition of hierarchies of relationships and for further characterization of relationships (transitive, symmetric, etc.). Following the instructions in an ontology, we can explore the categories in a domain not just vertically (hierarchically) but also horizontally, where we can address cross-cutting concerns.
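As a rough sketch of what characterizing relationships buys us, the snippet below applies symmetric and transitive rules to a handful of facts until nothing new can be derived. The facts, the two rule sets, and the `infer` function are assumptions made for illustration; a production reasoner works from a formally declared ontology and supports many more rule types.

```python
# Hypothetical relationship characteristics, as an ontology might declare them.
TRANSITIVE = {"part_of"}
SYMMETRIC = {"compatible_with"}


def infer(facts):
    """Apply symmetric and transitive rules repeatedly until no new fact appears."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        new = set()
        for s, r, o in known:
            if r in SYMMETRIC:
                new.add((o, r, s))  # A r B implies B r A
            if r in TRANSITIVE:
                for s2, r2, o2 in known:
                    if r2 == r and s2 == o:
                        new.add((s, r, o2))  # A r B and B r C imply A r C
        if not new <= known:
            known |= new
            changed = True
    return known


facts = [
    ("piston", "part_of", "engine"),
    ("engine", "part_of", "car"),
    ("charger_x", "compatible_with", "phone_y"),
]
inferred = infer(facts) - set(facts)
```

`inferred` contains exactly two derived facts: `("piston", "part_of", "car")` and `("phone_y", "compatible_with", "charger_x")`. Neither was stated explicitly; both follow from the declared characteristics of the relationships.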

There are several standard ontologies that serve a variety of domains, such as SNOMED for clinical documentation and reporting, the Financial Industry Business Ontology (FIBO) for finance and business, and Schema.org and Dublin Core for general-purpose web resource annotation, to name just a few of the most popular ones. The Web Ontology Language (OWL) is a widely adopted language for writing ontologies. It is a standard of the World Wide Web Consortium (W3C), an international community that champions open standards for the longevity of the web.

Speaking a common language in your KG simplifies communication, and in some cases, such as regulatory reporting, using a specific standard may even be mandatory.

When knowledge graphs meet Machine Learning

While the native representation of a KG is high-dimensional, bringing high computation and space costs, there are methods to project the information into a lower-dimensional latent space that best preserves the graph structure to perform tasks more efficiently. Graph embedding methods convert graph data into a low-dimensional space where the structural information and properties are maximally preserved.

Graph embeddings are a special type of algorithm that can encode the topology of a KG (its nodes and relationships) into a structure suitable for consumption by ML processes. We use these when we know important data exists in the graph, but it’s unclear which patterns to look for and we’d like the ML pipeline to do the heavy lifting of discovering patterns. KG embeddings can be used in conjunction with queries and algorithms to enrich ML input data to provide additional features.

Graph embeddings create numerical representations of your specific graph data. Source: Neo4j
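To give a feel for what such numerical representations look like, here is a small sketch in the style of TransE, one well-known KG embedding model that I am using purely as an example. It treats a relationship as a translation, so that head + relation should land close to tail. The 2-D vectors below are hand-picked for illustration and not learned; a real model would learn them from the graph, usually in far more dimensions.

```python
import math

# Hand-picked 2-D vectors for illustration only; a real model would learn them.
entity_vec = {
    "Paris": (1.0, 1.0),
    "France": (2.0, 3.0),
    "Berlin": (4.0, 1.0),
    "Germany": (5.0, 3.0),
}
relation_vec = {"capital_of": (1.0, 2.0)}


def transe_distance(head, relation, tail):
    """Euclidean distance between (head + relation) and tail; smaller is more plausible."""
    h, r, t = entity_vec[head], relation_vec[relation], entity_vec[tail]
    return math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))
```

With these vectors, `transe_distance("Paris", "capital_of", "France")` is `0.0`, while `transe_distance("Paris", "capital_of", "Germany")` is `3.0`, so the true fact scores as more plausible than the false one. The same vectors can also be passed to an ML model as input features.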

Graph embeddings are very useful because rather than running multiple algorithms to describe specific aspects of our graph topology, we can use graph structure itself as a predictor. Graph embeddings expand our predictive capabilities, but they typically take longer to run and have more parameters to tune than other graph algorithms. If we know what elements are predictive, we use queries and algorithms for feature engineering in ML. If we don’t know what is predictive, we use graph embeddings. Both are good ways to improve decision-making with graphs.
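For the first case, where we already suspect what is predictive, feature engineering can be as simple as a couple of graph queries. The sketch below assumes a hypothetical purchase graph and two features I picked for illustration (number of purchases and average popularity of the items bought); the names are not from any specific tool.

```python
from collections import Counter

# Hypothetical purchase graph: (customer, "bought", product).
edges = [
    ("alice", "bought", "laptop"),
    ("alice", "bought", "mouse"),
    ("bob", "bought", "mouse"),
    ("carol", "bought", "laptop"),
    ("carol", "bought", "mouse"),
    ("carol", "bought", "monitor"),
]

# Degree counts act as simple graph "queries".
out_degree = Counter(source for source, _, _ in edges)  # purchases per customer
in_degree = Counter(target for _, _, target in edges)   # buyers per product


def customer_features(customer):
    """Build an ML-ready feature row for one customer from the graph structure."""
    bought = [target for source, _, target in edges if source == customer]
    popularity = [in_degree[item] for item in bought]
    return {
        "n_purchases": out_degree[customer],
        "avg_item_popularity": sum(popularity) / len(popularity) if popularity else 0.0,
    }
```

For example, `customer_features("alice")` returns `{"n_purchases": 2, "avg_item_popularity": 2.5}`, a row that can go straight into a tabular ML model.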

Graph ML is often used for KG completion to predict missing data and relationships.
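As a simple point of reference for KG completion, the sketch below ranks missing links with the Jaccard similarity of two nodes' neighborhoods. This is a classic non-learned baseline, not a Graph ML model, and the collaboration graph and function names are made up for the example. Learned approaches aim to beat this kind of heuristic.

```python
from itertools import combinations

# Hypothetical undirected "collaborates_with" edges.
edges = {
    ("ana", "ben"), ("ana", "cy"), ("ben", "cy"),
    ("ben", "dee"), ("cy", "dee"), ("dee", "eli"),
}

# Neighborhood of every node, in both directions.
neighbors = {}
for a, b in edges:
    neighbors.setdefault(a, set()).add(b)
    neighbors.setdefault(b, set()).add(a)


def jaccard(a, b):
    """Shared neighbors divided by all neighbors of either node."""
    union = neighbors[a] | neighbors[b]
    return len(neighbors[a] & neighbors[b]) / len(union) if union else 0.0


def predict_links():
    """Rank node pairs that are not yet connected, most likely link first."""
    candidates = [
        (a, b)
        for a, b in combinations(sorted(neighbors), 2)
        if b not in neighbors[a]
    ]
    return sorted(candidates, key=lambda pair: jaccard(*pair), reverse=True)
```

On this toy graph the top suggestion from `predict_links()` is `("ana", "dee")`: they share two of their three combined neighbors, which makes a missing edge between them the most plausible candidate.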

Final thoughts

AI applications can benefit from KGs because they provide a structured, interconnected representation of data. This allows AI algorithms to better understand and make use of the information they are working with. By providing an interconnected network of relationships between different entities, KGs enable AI applications to better comprehend the context and relationships between data points. They can also make AI systems more explainable to their users.

Imagine trying to find information about a specific person. A KG can provide a quick overview of that person’s background, relationships, and relevant facts without having to search through countless pages of unorganized information. KGs are a key component of digital assistants and search engines, and they contribute to a wide range of AI applications, including link prediction, entity relationship prediction, recommendation systems, and question answering systems.

KGs can also provide complementary, real-world factual information to augment limited labeled data to train ML algorithms.

Interested in these topics? Follow me on LinkedIn or Twitter
