NetworkX is a Python package for the creation, manipulation, and study of the structure, dynamics, and functions of complex networks. It is usually imported with the nx abbreviation.
We need to install the networkx package.
Note: if you have installed geopandas, you most likely also installed networkx already, as one of its dependency.
If you have Anaconda installed, open the Anaconda Prompt and type in:
conda install -c conda-forge networkx
If you have standalone Python3 and Jupyter Notebook install on Linux, open a command prompt / terminal and type in:
pip3 install networkx
The netwrokx package is a module which you can simply import. It is usually aliased with the nx
abbreviation:
import networkx as nx
NetworkX supports 4 type of graphs:
Graph
DiGraph
MultiGraph
MultiDiGraph
Creation of a new, empty graph is straightforward:
import networkx as nx
graph = nx.Graph() # undirected, simple graph
To represent the graphs, two data structures as very common practices are well-known. One has a purely arithmetic representation (adjacency matrix), and the other has a mixed arithmetic and chain representation (edge list or neighborhood list).
In graph theory and computer science, an adjacency matrix is a square matrix. Its elements indicate whether pairs of vertices are adjacent in the graph or not.
The edge list is a data structure used to represent a graph as a list of its edges for each vertices. The internal data structures of NetworkX is based on the adjacency list representation and implemented using Python dictionary data structures.
We can add nodes and edges to a graph:
graph.add_node(1)
graph.add_node(2)
graph.add_node(3)
graph.add_node(4)
graph.add_node(5)
graph.add_node(6)
graph.add_node(7)
graph.add_node(8)
graph.add_edge(1, 2)
graph.add_edge(1, 3)
graph.add_edge(1, 4)
graph.add_edge(2, 3)
graph.add_edge(2, 5)
graph.add_edge(2, 6)
graph.add_edge(3, 6)
graph.add_edge(4, 5)
graph.add_edge(4, 7)
Adding an edge to a non-existing node will also create that particular node:
graph.add_edge(1, 9)
NetworkX has tight integration with matplotlib, therefore visualization of a graph can be done easily.
import matplotlib.pyplot as plt
# Special Jupyter Notebook command, so the plots by matplotlib will be displayed inside the Jupyter Notebook
%matplotlib inline
nx.draw_networkx(graph)
plt.show()
Let's use the following basic dataset of airroutes flight data:
The dataset is given in the flights.csv
file in the data
folder. The used delimiter is the semicolon (;
) character.
Parse the CSV file into a pandas DataFrame:
import pandas as pd
flight_table = pd.read_csv('../data/flights.csv', delimiter = ';')
display(flight_table)
NetworkX has a from and to conversion for pandas DataFrames. Assuming all airroutes are bi-directional, build an undirected graph:
flight_graph = nx.from_pandas_edgelist(flight_table, 'From city', 'To city')
plt.figure(figsize=[15,8])
nx.draw_networkx(flight_graph, node_color = 'lightgreen')
plt.show()
You can define the type of the graph with the optional create_using
parameter. Its default value is Graph
.
nx.from_pandas_edgelist(flight_table, 'From city', 'To city', create_using = nx.DiGraph)
As an alternative solution a CSV file can be processed line-by-line with the built-in csv Python package:
import csv
flight_graph = nx.Graph()
csv_file = open('../data/flights.csv')
csv_reader = csv.reader(csv_file, delimiter=';')
next(csv_reader, None) # skip header line
for row in csv_reader:
print('Reading flight {0} <=> {1}, distance: {2}km'.format(row[0], row[1], row[2]))
flight_graph.add_edge(row[0], row[1])
csv_file.close()
plt.figure(figsize=[15,8])
nx.draw_networkx(flight_graph, node_color = 'lightgreen')
plt.show()
Closing an opened file is easy to forget and a common programmer mistake. Use the with
statement, which will automatically close the file (if it was successfully opened):
flight_graph = nx.Graph()
with open('../data/flights.csv') as csv_file:
csv_reader = csv.reader(csv_file, delimiter=';')
next(csv_reader, None) # skip header line
for row in csv_reader:
#print('Reading flight {0} <=> {1}, distance: {2}km'.format(row[0], row[1], row[2]))
flight_graph.add_edge(row[0], row[1])
plt.figure(figsize=[15,8])
nx.draw_networkx(flight_graph, node_color = 'lightgreen')
plt.show()
print('Number of nodes: {0}'.format(flight_graph.order()))
print('Number of edges:{0}'.format(flight_graph.size()))
print('Degrees of the nodes: {0}'.format(flight_graph.degree()))
For directed graphs, there is also in_degree
and out_degree
defined.
for node in flight_graph.nodes:
print(node)
Note: iterating through the graph itself (flight_graph
) is the same.
for from_node, to_node in flight_graph.edges:
print("{0} <=> {1}".format(from_node, to_node))
for neighbor in flight_graph.neighbors('Budapest'):
print(neighbor)
Pay attention that it is written as neighbors
(American English) and NOT neighbours
(British English).
if flight_graph.has_node('Budapest'):
print('The Budapest node exists.')
if flight_graph.has_edge('Budapest', 'Paris'):
print('The Budapest <=> Paris edge exists.')
Attributes (metadata) can be assigned to the nodes and edges of a graph.
When creating the graph from a pandas DataFrame, the 4th parameter of the from_pandas_edgelist
function defines which Series (columns) of the DataFrame shall be added to the edges as attributes. If True
, all the remaining columns will be added. If None
, no edge attributes are added to the graph. Its default value is None
.
flight_graph = nx.from_pandas_edgelist(flight_table, 'From city', 'To city', ['Distance'])
plt.figure(figsize=[15,8])
nx.draw_networkx(flight_graph, node_color = 'lightgreen')
plt.show()
Optional: when building a graph "manually", the node and edge attributes can be passed to the add_node
an add_edge
methods.
flight_graph = nx.Graph()
with open('../data/flights.csv') as csv_file:
csv_reader = csv.reader(csv_file, delimiter=';')
next(csv_reader, None) # skip header line
for row in csv_reader:
print('Reading flight {0} <=> {1}, distance: {2}km'.format(row[0], row[1], row[2]))
flight_graph.add_edge(row[0], row[1], dist = row[2])
The metadata, called the weight of an edge can be queried then:
print('Metadata for the Budapest <=> Paris edge: {0}'.format(flight_graph['Budapest']['Paris']))
print('Metadata for all edges from Budapest: {0}'.format(flight_graph['Budapest']))
Breadth-first search (BFS) is an algorithm for traversing or searching a graph. It starts at some arbitrary node of a graph, and explores all the neighbour nodes at the present depth prior to moving on to the nodes at the next depth level.
The breadth-first search traversal can be implemented with a queue data structure (see Chapter 7).
As a showcase, let's request a starting city from the user and a number of maximum flights. Calculate which cities can be reached!
Handle the case of a not existing starting city.
from collections import deque
start_city = input('Start city: ')
flight_count = int(input('Max number of flights: '))
# Check existence of start city
if flight_graph.has_node(start_city):
ready_list = []
process_queue = deque([(start_city, 0)])
# Process until queue is empty
while len(process_queue) > 0:
# Move first item of process queue to ready list
process_item = process_queue.popleft()
process_city, process_dist = process_item
ready_list.append(process_item)
# NOTE: if process_dist > flight_count, we can halt the algorithm here,
# all reachable cities are in the ready list
#if process_dist > flight_count:
# break
# "Expand" the processed node: add its neighbors to the process queue
for neighbor_city in flight_graph.neighbors(process_city):
# Only add neighbors which are not already in the ready list or the process_queue
found = (neighbor_city in [city for city, dist in process_queue] or
neighbor_city in [city for city, dist in ready_list])
if not found:
process_queue.append((neighbor_city, process_dist + 1))
# Display results
for city, dist in ready_list:
if dist <= flight_count:
print(city)
else:
print('{0} city is unknown' % start_city)
NetworkX contains several traversal algorithms out of the box, so we don't need to reimplement them.
start_city = input('Start city: ')
flight_count = int(input('Max number of flights: '))
# Check existence of start city
if flight_graph.has_node(start_city):
reachable_cities = [ start_city ]
# Do breadth first search
successors = nx.bfs_successors(flight_graph, start_city, flight_count - 1)
for item in successors:
print('{0} -> {1}'.format(item[0], item[1]))
reachable_cities += item[1]
print('Reachable cities: {0}'.format(reachable_cities))
else:
print('{0} city is unknown'.format(start_city))