An Introduction to Graph Data

Diego Lopez Yse
Feb 9, 2023
Photo by Shannon Potter on Unsplash

Our world is a complex web of connections. From molecules to genes, from people to organizations, our reality can be seen as a collection of connected components. Everything is connected, which means everything can be represented as components related to other components.

In the Machine Learning world, this representation can be built using graphs. A graph accumulates and expresses knowledge of the real world, using nodes to represent entities of interest and edges to represent relations between these entities.

The anatomy of Graphs

Nodes are the elements that create the network. They could represent houses, locations, airports, ports, bus stops, buildings, users, and anything you could represent as being connected to similar elements in a network.

Edges are connections between the nodes. They could represent streets, flights, bus routes, a connection between two users in a social network, or anything that could represent a connection between the nodes in the context you are working with.

Node A and Node B here are two different entities. These nodes are connected by an edge that represents the relationship between the two nodes. Source: Analytics Vidhya

Nodes and edges are diverse, and they may represent physical networks such as electrical circuits, roadways, or organic molecules. They can also represent less tangible interactions like ecosystems, sociological relationships, databases, or a control flow in a computer program.
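To make nodes and edges concrete, here is a minimal sketch in Python that stores a small graph as an adjacency map. The airports and flights are made up for illustration, and the helper name build_adjacency is an assumption of this sketch, not part of any library.

```python
from collections import defaultdict

# Hypothetical airports (nodes) and flights (edges), made up for illustration.
flights = [("JFK", "LHR"), ("LHR", "CDG"), ("JFK", "CDG"), ("CDG", "FCO")]


def build_adjacency(edges):
    """Build an undirected adjacency map: node -> set of neighboring nodes."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    return dict(adjacency)


graph = build_adjacency(flights)

# Airports directly connected to JFK.
print(sorted(graph["JFK"]))  # ['CDG', 'LHR']
```

An adjacency map is only one of several possible representations. It is used here because it makes "which nodes are connected to this one?" a single lookup.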

Graphs are a simple but powerful way of describing how things connect.

A triple is the most basic knowledge graph model you can build: two nodes and one edge that explains their connection. Often, the triple is shown as either “subject-predicate-subject” or “subject-predicate-object.” That is, an entity (the subject) can be associated with another entity or with a simple value (the object) through some property (the predicate). For example, the triple “Columbia University is located in NYC” connects the subject “Columbia University” and the object “NYC” using the “located in” predicate.
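As a rough sketch, a triple maps naturally onto a three-field tuple. The Triple class and the objects_of helper below are illustrative names chosen for this example, not a standard API. The second triple is added only to show an object that is a simple value rather than another entity.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str  # "object" in triple terminology; renamed to avoid the builtin


triples = [
    # The example from the text: the object is another entity.
    Triple("Columbia University", "located in", "NYC"),
    # Illustrative addition: the object is a simple value.
    Triple("Columbia University", "founded in", "1754"),
]


def objects_of(store, subject, predicate):
    """Return every object linked to `subject` through `predicate`."""
    return [t.obj for t in store if t.subject == subject and t.predicate == predicate]


print(objects_of(triples, "Columbia University", "located in"))  # ['NYC']
```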

What's more, nodes and relationships can have labels, and both can also have attributes (or properties). Another way to view nodes, relationships, and attributes is through grammar. A node is a noun. A relationship is a verb. Attributes for nouns are like adjectives, and attributes for relationships are like adverbs.

An example of a triple viewed through grammar. Source: Content Rules

This way, nodes that represent entities can:

  • Contain zero or more properties, key-value pairs representing entity data such as price or date of birth.
  • Have zero or more labels, which declare the node’s purpose in the graph, such as representing customers or products.

Edges that represent how entities interrelate:

  • Have a type, such as bought or liked.
  • Have a direction going from one node to another (or back to the same node).
  • Can contain zero or more properties, which are key-value pairs representing some characteristic of the link, such as a timestamp or distance.
  • Never dangle: there is always a start and end node (which can be the same node).
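The node and edge characteristics listed above can be sketched with two small Python dataclasses. This is one possible model under simple assumptions, not how any particular graph database implements it. The class names, field names, and sample data are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    labels: set = field(default_factory=set)        # zero or more labels
    properties: dict = field(default_factory=dict)  # zero or more key-value pairs


@dataclass
class Edge:
    type: str    # e.g. "bought" or "liked"
    start: Node  # direction runs from start ...
    end: Node    # ... to end (which may be the same node)
    properties: dict = field(default_factory=dict)

    def __post_init__(self):
        # "Never dangle": refuse to create an edge without both endpoints.
        if not isinstance(self.start, Node) or not isinstance(self.end, Node):
            raise ValueError("an edge needs a start node and an end node")


# Hypothetical sample data.
customer = Node({"Customer"}, {"name": "Ana", "date_of_birth": "1990-04-12"})
product = Node({"Product"}, {"price": 19.99})
bought = Edge("bought", customer, product, {"timestamp": "2023-02-09T10:00:00"})

print(bought.type, bought.end.properties["price"])  # bought 19.99
```

Checking the endpoints at construction time is a design choice for this sketch: an edge with a missing node can never exist, so code that walks the graph does not need to guard against it.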

Nodes, relationships, properties, and rules can be used to assemble sophisticated, high-fidelity graph data models.

Improving information discovery

Finding connections between data points is a natural and powerful way of making information discoveries. Graphs and graph theory are amazing tools in their own right for modeling and analyzing data.

Graph data models can uniquely represent complex, indirect relationships in a way that is both human-readable and machine-friendly.
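One way to see an indirect relationship is to search for a path between two nodes that share no direct edge. The sketch below uses breadth-first search, a standard technique swapped in here for illustration, over a made-up "knows" network. The people and the function name shortest_path are assumptions of this example.

```python
from collections import deque

# Hypothetical "knows" relationships, stored as node -> list of neighbors.
knows = {
    "Ana": ["Ben"],
    "Ben": ["Ana", "Cleo"],
    "Cleo": ["Ben", "Dev"],
    "Dev": ["Cleo"],
}


def shortest_path(graph, start, goal):
    """Breadth-first search: return the nodes on a shortest path, or None."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbor in graph.get(node, []):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(path + [neighbor])
    return None


# Ana and Dev are not directly connected, but a chain of edges links them.
print(shortest_path(knows, "Ana", "Dev"))  # ['Ana', 'Ben', 'Cleo', 'Dev']
```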

Unlike relational or document databases, a graph database stores nodes and relationships rather than tables or documents. Data is stored without being restricted to a predefined model, which allows a very flexible way of thinking about it and using it.

