Algorithms Part II - Princeton University (Coursera)
DFS & BFS Recap
# goal: systemaltically search through a graph # idea: mimic maze exploration DFS (to visit a vertex v): Mark vertex v as visited Recursively visit all unmarked vertices adjacent to v
# compute shortest paths (fewest number of edges) from s to all other N vertices # time complexity: O(E+V) BFS (from source vertex s): Put s onto a FIFO queue and mark s as visited Repeat until the queue is empty: Remove the least recently added vertex v Add each of v's unvisited neighbors to the queue Mark them as visisted
Connected Components
- A connected component is a maximal set of connected vertices
Connected Components: Initialize all vertices v as unmarked For each unmarked vertices: Run DFS (to identify all vertices discovered as part of the same components): Mark vertex v as visited Mark vertex v as belonged to the current component (Components_Num) Recursively visit all unmarked vertices adjacent to v Components_Num++ # number of components
Digraph
- DFS enable direct solution of simple digraph problems
- Reachability
- Path finding
- Topological sort
- Directed cycle detection
Reachability
- Program control-flow analysis: there are many usages in program compilation and runtime
- Mark-sweep algorithm
- Mark: mark all reachable objects
- Sweep: if object is unmarked, it is garbage (so add to free list)
- BFS is used to find the shortest paths.
- It is useful in web crawling
Minimum Spanning Tree
- MST size (# of edges) is
MST is fundamental problem with diverse applications.
・Dithering.
・Cluster analysis.
・Max bottleneck paths.
・Real-time face verification.
・LDPC codes for error correction.
・Image registration with Renyi entropy.
・Find road networks in satellite and aerial imagery.
・Reducing data storage in sequencing amino acids in a protein.
・Model locality of particle interactions in turbulent fluid flows.
・Autoconfig protocol for Ethernet bridging to avoid cycles in a network.
・Approximation algorithms for NP-hard problems (e.g., TSP, Steiner tree).
・Network design (communication, electrical, hydraulic, computer, road).
- There are many algorithms to compute the MST
Greedy MST Algorithm
- Greedy algorithms seek to find the best local solution (in the short term)
- Cut edges and choose the edge with the minimum weight
Kruskal’s Algorithm
- It is a special case of greedy algorithm
Kruskal Algo (weighted graph G): Sort edges in G ascending order of weight For each edge in the ascending order: Add next edge to tree T unless doing so would create a cycle Return T
- How to find if adding an edge
e
would create a cycle? The efficient solution is to use union-find data structure - Maintain a set for each connected component in
- If and are in the same set, then adding would create a cycle
- To add to , merge sets containing and
- Java implementation
- Queue for holding the MST edges
- Priority Queue (Minimum Queue) for sorting the edges
- Union-Find for maintaining the connected components (prevent cycles)
- Running time is in the worst case
Prim’s Algorithm
- It is a special case of greedy algorithm
Prim (weighted edge graph G): Start with vertex 0 and greedily grow tree T Add to T the min weight edge with exactly one endpoint in T Repeat until V-1 edges Return T
- How to find if adding an edge
e
has the min weight with exactly one endpoint in T? There are two solutions/implementations for that: - Lazy implementation
- Eager implementation
Lazy Implementation
Lazy Prim (weighted edge graph G): Start with vertex 0 and greedily grow tree T Create a Priority Queue (PQ) sorted by the edges weights Add all edges adjacent to the vertex v to the queue From the PQ, add to T the min weight edge with exactly one endpoint in T If e has more than one endpoint in T then delete it Repeat until V-1 edges Return T
- Java Implementation
- Queue for holding the MST edges
- Priority Queue (Minimum Queue) for sorting the edges
- Boolean Matrix for marking visited vertices
- Running Time: it computes the MST in time proportional to
- Space Cost: it needs extra space proportional to
Eager Implementation
- In this implementation we need to create an indexed PQ (special implementation of PQ)
Prim’s algo can be used to find the Euclidean MST in runtime
MST is used in clustering (K-Clustering & Decision Tree)
Shortest Paths
- Shortest Paths algorithms work in Directed Weighted Graphs
- Variants:
- Source-sink: from one vertex to another
- Single source: from vertex to every other
- All pairs: between all pairs
- Restrictions on edge weights:
- Non-negative weights
- Arbitrary weights
- Euclidean weights
Single-source shortest paths API
- Goal: find the shortest path from
s
to every other vertex
public class SP { SP(EdgeWeightedDigraph G, int s) {...} double distanceTo(int v) {...} Iterable<DirectedEdge> pathTo(int v) {...} }
- Basic SSSP algorithms:
- Bellman-Ford:
- Slow and steady
- Works on any graph, any weight
- Detects negative weight cycles
- Dijkstra
- Faster
- No negative edge weights
- Directed Acyclic Graph (DAG)
- Negative edge weights is okay
Edge Relaxation
For start vertex s:
- If you find a path from s → u of total weight x, the shortest path from s → u has no more than x
- If there is a path from s → u of total weight x, and an edge form u → v of weight y, there is a path from s → v of total weight x+y
This is called Relaxing the Edge (meaning: relax, there are no edges shorter than that)
Relax(e): u = e.from() v = e.to() If (u.d + weight(e) < v.d): v.d = u.d + weight(e) v.parent = u
Generic Single-Source Shortest Path Algorithm
- The following is a pseudocode represents a generalization of SSSP algorithm
Generic SSSP Algorithm(G, s): SSSP Initialize(G, s) For each edge e in G (possibly repeating edges) in same order: Relax(e) // find shorter valid path to each vertex Relax(e): u = e.from() v = e.to() // don't change the edge endpoints' d and parent values if // the new path is longer or the same distance If (u.d + weight(e) < v.d): v.d = u.d + weight(e) v.parent = u SSSP Initialize(G, s): For all v in G: v.d = infinity // d stores an upperbound distance estimate to reach the vertex v v.parent = None // parent stores the parent node used on the path of length d s.d = 0 Find Path(s, v): If v == s: return new List().add(v) if v.parent == None: return None return Find Path(s, v.parent).add(v)
- The generic algorithm builds a data structure called short path tree
Bellman-Ford Algorithm
- An SSSP algorithm that:
- Is slow and steady
- It has a time complexity of O(EV) (worst than Dijkstra)
- This makes it not ideal for most SSSP problems
- Works on any graph, any edge weight
- Detects reachable negative weight cycles (which Dijkstra’s algorithm cannot). In the following figure S→G→H is a reachable negative cycle and the others are not.
- This algorithm runs through 2 iterations (for each vertex in the graph):
- First iteration: apply relaxation function on each edge
- Second iteration: detect negative cycles and mark the nodes that are part of these cycles
- Let’s implement the Bellman-Ford algorithm based on the generic SSSP algorithm
SSSP Initialize(G, s): For all v in G: v.d = infinity // d stores an upperbound distance estimate to reach the vertex v v.parent = None // parent stores the parent node used on the path of length d s.d = 0 Relax(e, NegativeCycle): u = e.from() v = e.to() // don't change the edge endpoints' d and parent values if // the new path is longer or the same distance If (u.d + weight(e) < v.d): // if relaxation is used to detect nodes in negative cycles If NegativeCycle == True: v.d = Negative Infinity Else: v.d = u.d + weight(e) v.parent = u Bellman-Ford Algorithm(Digraph G, s): SSSP Initialize(G, s) For each vertex v in G: For each edge e in G: Relax(e, False) // find shorter valid path to each vertex // run algorithm a second time to detect which nodes // are part of a negative cycle. A negative cycle has occured if we // can find a better path beyond the optimal solution For each vertex v in G: For each edge e in G: Relax(e, True) // build an array containing the shortest distances to every node distances = [] For each vertex v in G: distances.add(v.d) return distances
- Runtime analysis: