Unit 4: Graph 2 (Shortest Path, Minimum Spanning Tree)

📌

Algorithms Part II - Princeton University (Coursera)

DFS & BFS Recap


# goal: systemaltically search through a graph
# idea: mimic maze exploration

DFS (to visit a vertex v):
	Mark vertex v as visited
	Recursively visit all unmarked vertices adjacent to v


# compute shortest paths (fewest number of edges) from s to all other N vertices
# time complexity: O(E+V)

BFS (from source vertex s):
	Put s onto a FIFO queue and mark s as visited
	Repeat until the queue is empty:
		Remove the least recently added vertex v
		Add each of v's unvisited neighbors to the queue
		Mark them as visisted

Connected Components

A connected component is a maximal set of connected vertices


Connected Components:
	Initialize all vertices v as unmarked
	For each unmarked vertices:
		Run DFS (to identify all vertices discovered as part of the same components):
			Mark vertex v as visited
			Mark vertex v as belonged to the current component (Components_Num)
			Recursively visit all unmarked vertices adjacent to v
		Components_Num++ # number of components

Digraph

DFS enable direct solution of simple digraph problems

Reachability
Path finding
Topological sort
Directed cycle detection

Reachability

Program control-flow analysis: there are many usages in program compilation and runtime

Mark-sweep algorithm

Mark: mark all reachable objects
Sweep: if object is unmarked, it is garbage (so add to free list)

BFS is used to find the shortest paths.

It is useful in web crawling

Minimum Spanning Tree

File

MST size (# of edges) is

📌

MST is fundamental problem with diverse applications.

・Dithering. ・Cluster analysis. ・Max bottleneck paths. ・Real-time face verification. ・LDPC codes for error correction. ・Image registration with Renyi entropy. ・Find road networks in satellite and aerial imagery. ・Reducing data storage in sequencing amino acids in a protein. ・Model locality of particle interactions in turbulent fluid flows. ・Autoconfig protocol for Ethernet bridging to avoid cycles in a network. ・Approximation algorithms for NP-hard problems (e.g., TSP, Steiner tree). ・Network design (communication, electrical, hydraulic, computer, road).

There are many algorithms to compute the MST

Greedy MST Algorithm

Greedy algorithms seek to find the best local solution (in the short term)

Cut edges and choose the edge with the minimum weight

Kruskal’s Algorithm

It is a special case of greedy algorithm


Kruskal Algo (weighted graph G):
	Sort edges in G ascending order of weight
	For each edge in the ascending order:
		Add next edge to tree T unless doing so would create a cycle
	Return T

How to find if adding an edge e would create a cycle? The efficient solution is to use union-find data structure

Maintain a set for each connected component in
If and are in the same set, then adding would create a cycle
To add to , merge sets containing and

Java implementation

Queue for holding the MST edges
Priority Queue (Minimum Queue) for sorting the edges
Union-Find for maintaining the connected components (prevent cycles)

Running time is in the worst case

Prim’s Algorithm

It is a special case of greedy algorithm


Prim (weighted edge graph G):
	Start with vertex 0 and greedily grow tree T
	Add to T the min weight edge with exactly one endpoint in T
	Repeat until V-1 edges
	Return T

How to find if adding an edge e has the min weight with exactly one endpoint in T? There are two solutions/implementations for that:

Lazy implementation
Eager implementation

Lazy Implementation


Lazy Prim (weighted edge graph G):
	Start with vertex 0 and greedily grow tree T
	Create a Priority Queue (PQ) sorted by the edges weights

	Add all edges adjacent to the vertex v to the queue

	From the PQ, add to T the min weight edge with exactly one endpoint in T
	If e has more than one endpoint in T 	then delete it

	Repeat until V-1 edges

	Return T

Java Implementation

Queue for holding the MST edges
Priority Queue (Minimum Queue) for sorting the edges
Boolean Matrix for marking visited vertices

Running Time: it computes the MST in time proportional to

Space Cost: it needs extra space proportional to

Eager Implementation

In this implementation we need to create an indexed PQ (special implementation of PQ)

📌

Prim’s algo can be used to find the Euclidean MST in runtime

📌

MST is used in clustering (K-Clustering & Decision Tree)

Shortest Paths

Shortest Paths algorithms work in Directed Weighted Graphs

Variants:

Source-sink: from one vertex to another
Single source: from vertex to every other
All pairs: between all pairs

Restrictions on edge weights:

Non-negative weights
Arbitrary weights
Euclidean weights

Single-source shortest paths API

Goal: find the shortest path from s to every other vertex


public class SP {
	SP(EdgeWeightedDigraph G, int s) {...}
	double distanceTo(int v) {...}
	Iterable<DirectedEdge> pathTo(int v) {...}
}

Basic SSSP algorithms:

Bellman-Ford:

Slow and steady
Works on any graph, any weight
Detects negative weight cycles

Dijkstra

Faster
No negative edge weights

Directed Acyclic Graph (DAG)

Negative edge weights is okay

Edge Relaxation

For start vertex s:

If you find a path from s → u of total weight x, the shortest path from s → u has no more than x

If there is a path from s → u of total weight x, and an edge form u → v of weight y, there is a path from s → v of total weight x+y

This is called Relaxing the Edge (meaning: relax, there are no edges shorter than that)


Relax(e):
	u = e.from()
	v = e.to()
	If (u.d + weight(e) < v.d):
		v.d = u.d + weight(e)
		v.parent = u

Generic Single-Source Shortest Path Algorithm

The following is a pseudocode represents a generalization of SSSP algorithm


Generic SSSP Algorithm(G, s):
	SSSP Initialize(G, s)
	For each edge e in G (possibly repeating edges) in same order:
		Relax(e) // find shorter valid path to each vertex
              

Relax(e):
	u = e.from()
	v = e.to()

	// don't change the edge endpoints' d and parent values if 
	// the new path is longer or the same distance
	If (u.d + weight(e) < v.d):
		v.d = u.d + weight(e)
		v.parent = u

SSSP Initialize(G, s):
	For all v in G:
		v.d = infinity    // d stores an upperbound distance estimate to reach the vertex v
		v.parent = None   // parent stores the parent node used on the path of length d
	s.d = 0

Find Path(s, v):
	If v == s:
		return new List().add(v)
	if v.parent == None:
		return None
	return Find Path(s, v.parent).add(v)

The generic algorithm builds a data structure called short path tree

Bellman-Ford Algorithm

An SSSP algorithm that:

Is slow and steady

It has a time complexity of O(EV) (worst than Dijkstra)
This makes it not ideal for most SSSP problems

Works on any graph, any edge weight
Detects reachable negative weight cycles (which Dijkstra’s algorithm cannot). In the following figure S→G→H is a reachable negative cycle and the others are not.

This algorithm runs through 2 iterations (for each vertex in the graph):

First iteration: apply relaxation function on each edge

Second iteration: detect negative cycles and mark the nodes that are part of these cycles

Let’s implement the Bellman-Ford algorithm based on the generic SSSP algorithm


	
SSSP Initialize(G, s):
	For all v in G:
		v.d = infinity    // d stores an upperbound distance estimate to reach the vertex v
		v.parent = None   // parent stores the parent node used on the path of length d
	s.d = 0


Relax(e, NegativeCycle):
	u = e.from()
	v = e.to()
	
		
	// don't change the edge endpoints' d and parent values if 
	// the new path is longer or the same distance
	If (u.d + weight(e) < v.d):
		
		// if relaxation is used to detect nodes in negative cycles
		If NegativeCycle == True:
			v.d = Negative Infinity
		Else:	
			v.d = u.d + weight(e)
			v.parent = u



Bellman-Ford Algorithm(Digraph G, s):
	SSSP Initialize(G, s)

	For each vertex v in G:
		For each edge e in G:
			Relax(e, False) // find shorter valid path to each vertex
	
	// run algorithm a second time to detect which nodes
	// are part of a negative cycle. A negative cycle has occured if we 
	// can find a better path beyond the optimal solution
	For each vertex v in G:
		For each edge e in G:
			Relax(e, True)
		
	// build an array containing the shortest distances to every node
	distances = []
	For each vertex v in G:
		distances.add(v.d)

	return distances