Make Interactive Knowledge Graphs with Python
Create an interactive knowledge graph using NetworkX and Plotly, and explore different layouts
We already detailed how to build a knowledge graph (KG) and perform basic analysis. Now, let’s make it interactive using NetworkX and Plotly.
Create a knowledge graph
First, we define the data that represents the relationships in our KG. We have three lists, `head`, `relation`, and `tail`, which represent the starting entity, the relationship between entities, and the ending entity, respectively. We create a pandas DataFrame from these lists and use NetworkX to build a graph representation of the relationships.
```python
import pandas as pd
import networkx as nx
import plotly.graph_objects as go

# Define the heads, relations, and tails
head = ['drugA', 'drugB', 'drugC', 'drugD', 'drugA', 'drugC', 'drugD', 'drugE', 'gene1', 'gene2', 'gene3', 'gene4', 'gene50', 'gene2', 'gene3', 'gene4']
relation = ['treats', 'treats', 'treats', 'treats', 'inhibits', 'inhibits', 'inhibits', 'inhibits', 'associated', 'associated', 'associated', 'associated', 'associated', 'interacts', 'interacts', 'interacts']
tail = ['fever', 'hepatitis', 'bleeding', 'pain', 'gene1', 'gene2', 'gene4', 'gene20', 'obesity', 'heart_attack', 'hepatitis', 'bleeding', 'cancer', 'gene1', 'gene20', 'gene50']

# Create a dataframe
df = pd.DataFrame({'head': head, 'relation': relation, 'tail': tail})

# Create a graph: each row becomes an edge labeled with its relation
G = nx.Graph()
for _, row in df.iterrows():
    G.add_edge(row['head'], row['tail'], label=row['relation'])
```
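As an alternative to the `iterrows()` loop, NetworkX can build the same graph in a single call with `nx.from_pandas_edgelist`. The sketch below renames the `relation` column to `label` so the edge attribute matches the name used in the rest of the article, then runs a few sanity checks. Keep in mind that `nx.Graph` is undirected: it stores each head/tail pair but not the direction of the relation, and it holds only one edge (and therefore one label) per pair of entities.

```python
import pandas as pd
import networkx as nx

# The same triples as above, written as (head, relation, tail) tuples
triples = [
    ('drugA', 'treats', 'fever'), ('drugB', 'treats', 'hepatitis'),
    ('drugC', 'treats', 'bleeding'), ('drugD', 'treats', 'pain'),
    ('drugA', 'inhibits', 'gene1'), ('drugC', 'inhibits', 'gene2'),
    ('drugD', 'inhibits', 'gene4'), ('drugE', 'inhibits', 'gene20'),
    ('gene1', 'associated', 'obesity'), ('gene2', 'associated', 'heart_attack'),
    ('gene3', 'associated', 'hepatitis'), ('gene4', 'associated', 'bleeding'),
    ('gene50', 'associated', 'cancer'), ('gene2', 'interacts', 'gene1'),
    ('gene3', 'interacts', 'gene20'), ('gene4', 'interacts', 'gene50'),
]
df = pd.DataFrame(triples, columns=['head', 'relation', 'tail'])

# One call builds the graph; the renamed 'label' column becomes an edge attribute
G = nx.from_pandas_edgelist(
    df.rename(columns={'relation': 'label'}),
    source='head', target='tail', edge_attr='label',
)

# Quick sanity checks on the resulting graph
print(G.number_of_nodes(), G.number_of_edges())  # 18 16
print(nx.number_connected_components(G))         # 2
print(G['drugA']['fever']['label'])              # treats
```

The component count is worth knowing before choosing a layout: drugB, hepatitis, gene3, gene20, and drugE form a group that is not connected to the rest of the KG.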
Node positioning
Next, we compute node positions, a crucial aspect of graph visualization: they determine where nodes (entities) are placed on the canvas and therefore how organized and readable the graph looks. In this example, we use the Fruchterman-Reingold algorithm, a force-directed layout, to position the nodes.
```python
# Get positions for nodes (seed fixes the random starting positions so the layout is reproducible)
pos = nx.fruchterman_reingold_layout(G, k=0.5, seed=42)
```
The `fruchterman_reingold_layout` function takes the graph and a parameter `k`, the optimal distance between nodes (by default 1/√n, where n is the number of nodes). A larger `k` pushes nodes farther apart, while a smaller value packs them more tightly, so adjusting this parameter changes the overall layout density.
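To see the effect of `k` numerically, the sketch below runs the layout with a small and a large `k` and compares the average distance between nodes. It uses NetworkX's built-in karate club graph purely as a stand-in that ships with the library. It passes `scale=None` because, by default, NetworkX rescales the final positions to fit within [-1, 1], which hides the change in absolute spacing; `seed` fixes the random starting positions so both runs start from the same point.

```python
import itertools
import networkx as nx
import numpy as np

def mean_pairwise_distance(pos):
    """Average Euclidean distance over all pairs of node positions."""
    points = list(pos.values())
    return float(np.mean([np.linalg.norm(p - q) for p, q in itertools.combinations(points, 2)]))

G = nx.karate_club_graph()  # stand-in graph bundled with NetworkX

spreads = {}
for k in (0.1, 2.0):
    # scale=None keeps the raw positions instead of rescaling them into [-1, 1]
    pos = nx.fruchterman_reingold_layout(G, k=k, seed=42, scale=None)
    spreads[k] = mean_pairwise_distance(pos)

print(spreads)  # the k=2.0 run should show the larger average spacing
```

With the default `scale=1`, both results would be squeezed into the same box, and `k` would then mainly change the relative spacing of nodes rather than the overall size of the drawing.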
Creating Traces and Defining Layout
In this step, we create traces for edges, nodes, and edge labels, and define the layout for the Plotly figure.
- Create Edge Traces: we create one trace per edge with Plotly’s `go.Scatter` in `lines` mode. Each trace holds the coordinates of the edge’s start and end points, so together these traces draw the connections between nodes.
- Create Node Trace: we create a single scatter trace for the nodes. Each node is drawn as a marker with a text label at the position computed by the Fruchterman-Reingold layout, and hovering over it shows its name.
- Create Edge Label Trace: another, text-only trace places each edge’s relation label at the midpoint of the edge, so the type of each relationship is visible on the graph.
- Define Layout: we define the layout of the Plotly figure, including the title with its font size and position, margins, legend visibility, hover behavior, and axis visibility. These settings control the overall appearance and interactivity of the visualization.
Finally, we create a Plotly figure from these traces and call its `show()` method to display the interactive graph.
```python
# Create edge traces: one line segment per edge
edge_traces = []
for edge in G.edges():
    x0, y0 = pos[edge[0]]
    x1, y1 = pos[edge[1]]
    edge_trace = go.Scatter(
        x=[x0, x1, None],
        y=[y0, y1, None],
        mode='lines',
        line=dict(width=0.5, color='gray'),
        hoverinfo='none'
    )
    edge_traces.append(edge_trace)

# Create node trace: markers with the entity name above each one
node_trace = go.Scatter(
    x=[pos[node][0] for node in G.nodes()],
    y=[pos[node][1] for node in G.nodes()],
    mode='markers+text',
    marker=dict(size=10, color='lightblue'),
    text=[node for node in G.nodes()],
    textposition='top center',
    hoverinfo='text',
    textfont=dict(size=7)
)

# Create edge label trace: the relation name at the midpoint of each edge
edge_label_trace = go.Scatter(
    x=[(pos[edge[0]][0] + pos[edge[1]][0]) / 2 for edge in G.edges()],
    y=[(pos[edge[0]][1] + pos[edge[1]][1]) / 2 for edge in G.edges()],
    mode='text',
    text=[G[edge[0]][edge[1]]['label'] for edge in G.edges()],
    textposition='middle center',
    hoverinfo='none',
    textfont=dict(size=7)
)

# Create layout
layout = go.Layout(
    # Title text, font size, and centered position; the older `titlefont` shorthand is deprecated
    title=dict(text='Knowledge Graph', font=dict(size=16), x=0.5),
    showlegend=False,
    hovermode='closest',
    margin=dict(b=20, l=5, r=5, t=40),
    xaxis_visible=False,
    yaxis_visible=False
)

# Create Plotly figure
fig = go.Figure(data=edge_traces + [node_trace, edge_label_trace], layout=layout)

# Show the interactive plot
fig.show()
```
And voilà!
Node positioning layouts
The Fruchterman-Reingold layout we used above is force-directed: it models nodes as charged particles that repel each other and edges as springs that pull connected nodes closer. The algorithm moves nodes until these repulsive and attractive forces roughly balance (or an iteration limit is reached), which tends to place directly related entities in our KG near each other. It’s well-suited for small to medium-sized graphs but not for very large ones, because the basic algorithm computes the repulsion between every pair of nodes, which costs O(n²) time per iteration for n nodes. It’s also sensitive to parameter choices, such as `k` and the number of iterations, and to its random starting positions, so it may need some tuning for good results.
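To make the force model concrete, here is a minimal NumPy sketch of one simplified Fruchterman-Reingold iteration: every pair of nodes repels with magnitude k²/d, every edge attracts with magnitude d²/k, and each node then moves along its net force by at most a "temperature" step. It illustrates the idea only and is not NetworkX's actual implementation, which also lowers the temperature over its iterations. The pairwise `delta` array shows where the O(n²) cost per iteration comes from.

```python
import numpy as np

def fr_step(pos, adj, k, step):
    """One simplified Fruchterman-Reingold iteration (illustrative sketch)."""
    delta = pos[:, None, :] - pos[None, :, :]        # delta[i, j] = pos[i] - pos[j]: n x n pairs
    dist = np.linalg.norm(delta, axis=-1)
    np.fill_diagonal(dist, 1.0)                      # placeholder so i == j never divides by zero
    dist = np.maximum(dist, 1e-9)
    # Signed force magnitude: repulsion k^2/d for every pair, minus attraction d^2/k along edges
    magnitude = k**2 / dist - adj * dist**2 / k
    np.fill_diagonal(magnitude, 0.0)                 # no self-interaction
    # Sum the force vectors: magnitude times the unit vector pointing from j to i
    disp = np.einsum('ij,ijk->ik', magnitude / dist, delta)
    length = np.linalg.norm(disp, axis=-1, keepdims=True)
    # Move each node along its net force, but never farther than `step`
    return pos + disp / np.maximum(length, 1e-9) * np.minimum(length, step)

pos = np.array([[0.0, 0.0], [1.0, 0.0]])
no_edge = np.zeros((2, 2))
edge = np.array([[0.0, 1.0], [1.0, 0.0]])

apart = fr_step(pos, no_edge, k=1.0, step=0.1)      # unconnected nodes push each other away
together = fr_step(pos * 3, edge, k=1.0, step=0.1)  # connected nodes that are far apart pull together
print(np.linalg.norm(apart[0] - apart[1]))        # about 1.2 (started at 1.0)
print(np.linalg.norm(together[0] - together[1]))  # about 2.8 (started at 3.0)
```

Repeating this step while shrinking `step` lets the layout settle, and the equilibrium distance of an isolated edge is where k²/d equals d²/k, that is d = k, which is why `k` acts as the optimal distance between nodes.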
But what about other layout models?
Kamada-Kawai Layout
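This layout also uses a spring model, but with a different goal: preserving graph distances. It connects every pair of nodes with a spring whose ideal length is proportional to their shortest-path distance in the graph, then looks for the positions that minimize the total energy of this spring system. As a result, the distance between two nodes on the canvas tends to reflect how many hops separate them in the graph. It’s well-suited for graphs where these distances and the overall geometry matter, as it tries to preserve the underlying graph structure. On the other hand, it’s computationally expensive, because it needs the shortest-path distance between every pair of nodes and optimizes over all of those pairs, and it can struggle with graphs containing long paths.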
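The Kamada-Kawai objective can be written down directly. In the standard formulation, the spring between nodes i and j has ideal length L·d_ij and stiffness K/d_ij², where d_ij is their shortest-path distance, and the energy is the sum of ½·stiffness·(actual distance − ideal length)² over all pairs. The sketch below only evaluates this energy for a given layout; searching for the minimum is what `nx.kamada_kawai_layout` does (and it relies on SciPy for that). The constants `L` and `K` and the choice to skip pairs in different components are assumptions of this sketch.

```python
import itertools
import networkx as nx
import numpy as np

def kamada_kawai_energy(G, pos, L=1.0, K=1.0):
    """Spring energy of a layout under the Kamada-Kawai model (illustrative sketch)."""
    hops = dict(nx.shortest_path_length(G))  # all-pairs shortest-path lengths (unweighted)
    energy = 0.0
    for u, v in itertools.combinations(G.nodes(), 2):
        d = hops[u].get(v)
        if d is None:          # no path (different components): this sketch adds no spring
            continue
        ideal = L * d          # desired distance on the canvas
        stiffness = K / d**2   # pairs that are many hops apart get weaker springs
        actual = np.linalg.norm(np.asarray(pos[u]) - np.asarray(pos[v]))
        energy += 0.5 * stiffness * (actual - ideal) ** 2
    return float(energy)

G = nx.path_graph(3)                          # 0 - 1 - 2
straight = {0: (0, 0), 1: (1, 0), 2: (2, 0)}  # canvas distances equal hop counts
bent = {0: (0, 0), 1: (1, 0), 2: (1, 1)}      # nodes 0 and 2 end up too close together
print(kamada_kawai_energy(G, straight))  # 0.0
print(kamada_kawai_energy(G, bent))      # a small positive energy
```

The straight drawing matches every hop count exactly, so its energy is zero; bending the path stretches the spring between nodes 0 and 2 away from its ideal length and the energy rises.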
Circular Layout
In this layout, nodes are placed at evenly spaced positions around a circle, in the order in which they appear in the graph. It’s useful for displaying cyclic or radial relationships, and it’s also suitable when node positions matter less than the overall structure. Nevertheless, the distance between two nodes on the circle says nothing about how closely they are related, and for non-cyclic graphs many edges cross the interior of the circle, which can make the structure hard to read.
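Because `nx.circular_layout` places nodes around the circle in the graph's node order, grouping nodes by entity type before computing the layout can make a circular KG easier to read. The sketch below assumes a hypothetical `entity_type` helper that infers the type from the node name prefix (drugs, genes, and everything else treated as a condition); in a real KG you would read the type from your data instead.

```python
import networkx as nx
import numpy as np

def entity_type(name):
    """Hypothetical helper: infer an entity type (as a sort key) from the node name."""
    if name.startswith('drug'):
        return 0  # drugs first
    if name.startswith('gene'):
        return 1  # then genes
    return 2      # then conditions (fever, obesity, ...)

# A few triples from the KG
G = nx.Graph()
G.add_edge('drugA', 'fever', label='treats')
G.add_edge('gene1', 'obesity', label='associated')
G.add_edge('drugA', 'gene1', label='inhibits')
G.add_edge('drugC', 'gene2', label='inhibits')

# Rebuild the graph with nodes inserted in type order, then lay them out on a circle
order = sorted(G.nodes(), key=lambda n: (entity_type(n), n))
H = nx.Graph()
H.add_nodes_from(order)
H.add_edges_from(G.edges(data=True))
pos = nx.circular_layout(H)

print(order)  # ['drugA', 'drugC', 'gene1', 'gene2', 'fever', 'obesity']
radii = [float(np.hypot(*pos[n])) for n in order]
print(radii)  # every node sits at (approximately) radius 1, the default scale
```

Only the node insertion order changes; the edges and their labels are copied over unchanged.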
Random Layout
This layout assigns random positions to nodes within a fixed area (the unit square, in NetworkX’s `random_layout`). Since the positions carry no information, it has few uses on its own, but it’s quick, simple, and serves as a baseline or as the starting layout for more sophisticated algorithms. It can also be acceptable when the exact node positions matter less than simply showing which nodes are connected. On the other hand, it doesn’t convey anything about the graph’s structure and isn’t suitable for emphasizing specific graph features.
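A short sketch of the "initial layout" use case: `nx.random_layout` draws positions uniformly in the unit square (with a fixed `seed` here for reproducibility), and passing them as `pos` to the force-directed layout makes it start from those coordinates instead of drawing its own random ones. The graph is a small subset of our KG, chosen only to keep the example short.

```python
import networkx as nx

G = nx.Graph()
G.add_edge('drugA', 'fever', label='treats')
G.add_edge('drugA', 'gene1', label='inhibits')
G.add_edge('gene2', 'gene1', label='interacts')
G.add_edge('gene2', 'heart_attack', label='associated')

# Random layout: fast, but the positions carry no structural meaning
init = nx.random_layout(G, seed=7)

# Use the random positions as the starting point for Fruchterman-Reingold
pos = nx.fruchterman_reingold_layout(G, pos=init, k=0.5)

for node in G.nodes():
    print(node, init[node].round(2), '->', pos[node].round(2))
```

The same idea works with any saved layout: starting from the previous positions keeps the picture stable when you add a few nodes and recompute.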
The choice of method depends on the characteristics of the graph, the visual representation goals, and the desired insights. Different methods offer different trade-offs between computational efficiency, accuracy, and the ability to capture specific structural features of the graph.
Experiment with different layouts to find the one that best highlights your graph’s relationships and conveys your intended message.