The sample dataset for this lecture is given in the airports.csv
and airroutes.csv
files in the data
folder. (The column separator is the ;
character.)
airports.csv
file contains information about (larger) airports all over the world:airroutes.csv
consists of the direct flight relations between the airports, identifying them with their IATA code. The distance of the airports / length of the flight route is also given (in miles). The flights are directed, if there is a flight between both directions of two airports, then there will be two records in the file, with opposite direction.First read the airports data into a pandas DataFrame.
import pandas as pd
airports = pd.read_csv('../data/airports.csv', delimiter = ';')
display(airports)
Note: the length of the longest runway and the elevation is given in foots.
Lets set the column iata
as the index column, so each row of data will be accessible later by indexing the airports with their IATA code.
airports.set_index('iata', inplace=True)
display(airports)
Reminder: the set_index
function can be configured to modify the index in place or return a new Dataframe with the inplace
parameter (defaults to False
).
It can also be configured to drop or keep the index column with the drop
parameter (defaults to True
).
The information of the Budapest Airport can now be accessed both by numerical and associative (string) indexing:
print('The Budapest airport by the numerical index:')
print(airports.iloc[111])
print()
print('The Budapest airport by the associative index:')
print(airports.loc['BUD'])
The number of runways the Budapest Airport can be fetched (or modified) now 4 possible ways:
print(airports.iloc[111]['runways'])
print(airports.loc['BUD']['runways'])
print(airports['runways'][111])
print(airports['runways']['BUD'])
airroutes = pd.read_csv('../data/airroutes.csv', delimiter = ';')
display(airroutes)
Note: the distance is given in miles.
NetworkX has an integrated conversion for pandas DataFrames which can be used.
Lets create a directed graph (networkx.DiGraph
) from the flights. The edges shall be weighted with the distance of the routes.
import networkx as nx
flight_graph = nx.from_pandas_edgelist(airroutes, 'from', 'to', ['distance'], create_using = nx.DiGraph)
print('Metadata for the BUD -> JFK edge: {0}'.format(flight_graph['BUD']['JFK']))
Reminder: The 4th parameter defines which Series (columns) of the DataFrame shall be added to the edges as attributes. If True
, all of the remaining columns will be added. If None
, no edge attributes are added to the graph. Its default value is None
.
NetworkX supports various shortest path algorithms:
Beside the algorithm-specific functions, NetworkX also provides a uniform interface to calculate the shortest paths from a starting point to a target (or to all):
nx.shortest_path(graph, source, target, weight, method)
The default algorithm is Dijsktra.
Calculate the path between 2 user given airports with the minimal number of transfers.
from_airport = input("From airport: ")
to_airport = input("To airport: ")
if flight_graph.has_node(from_airport) and flight_graph.has_node(to_airport):
route = nx.shortest_path(flight_graph, from_airport, to_airport)
print("Route: {0}".format(route))
length = 0
for i in range(1, len(route)):
length += flight_graph[route[i-1]][route[i]]['distance']
print("Length: {0} mi".format(length))
else:
print("Source or destination airport was not found.")
Calculate the shortest path by distance between 2 user given airports.
from_airport = input("From airport: ")
to_airport = input("To airport: ")
if flight_graph.has_node(from_airport) and flight_graph.has_node(to_airport):
route = nx.dijkstra_path(flight_graph, from_airport, to_airport, 'distance')
length = nx.dijkstra_path_length(flight_graph, from_airport, to_airport, 'distance')
print("Route: {0} ({1} mi)".format(route, length))
else:
print("Source or destination airport was not found.")
Calculate the shortest between 2 user given airports by distance, but with the following additional conditions:
def custom_distance(from_node, to_node, edge_attr):
if airports.loc[to_node]['longest'] < 8000:
return None
if airports.loc[to_node]['runways'] == 1:
return edge_attr['distance'] * 1.5
return edge_attr['distance']
from_airport = input("From airport: ")
to_airport = input("To airport: ")
if flight_graph.has_node(from_airport) and flight_graph.has_node(to_airport):
route = nx.dijkstra_path(flight_graph, from_airport, to_airport, custom_distance)
length = nx.dijkstra_path_length(flight_graph, from_airport, to_airport, custom_distance)
print("Route: {0} ({1} mi)".format(route, length))
else:
print("Source airport was not found.")
Calculate which airports can be reached from a starting, user given airport within a reach of also user given distance (in miles). Also compute the shorthest path by distance to each of them.
from_airport = input("From airport: ")
max_distance = int(input("Max distance: "))
if flight_graph.has_node(from_airport):
lengths, routes = nx.single_source_dijkstra(flight_graph, from_airport, None, max_distance, 'distance')
for to_airport in routes.keys():
print("{0} -> {1}: {2} ({3} mi)".format(from_airport, to_airport, routes[to_airport], lengths[to_airport]))
else:
print("Source airport was not found.")
Calculate which cities can be reached from a starting, user given city within a reach of also user given distance (in miles).
from_city = input("From city: ")
max_distance = int(input("Max distance: "))
from_airports = airports[airports['city'] == from_city].index
result = []
for from_airport in from_airports:
routes = nx.single_source_dijkstra_path(flight_graph, from_airport, max_distance, 'distance')
to_airports = routes.keys()
to_cities = [airports.loc[ap]['city'] for ap in to_airports]
result += to_cities
result_unique = set(result) # remove duplicates
print(sorted(result_unique)) # sort the printed result