Unit 3: Graphs 1 (BFS, DFS, Greedy)

 
Disc
Prog
LJ
 
Graphs provide the ultimate in data structure flexibility. Therefore, Graphs can model both real-world systems and abstract problems

Terminology and Representations

  • A graph consists of vertices and edges
  • Edges:
    • is a connection between a pair of vertices in
    • |E| is the number of edges
    • (directed or undirected)
    • if the graph is simple
    • Special case: Some graph applications require that a given pair of vertices can have multiple or parallel edges connecting them, or that a vertex can have an edge to itself
 
  • Vertices:
    • Two vertices are adjacent or neighbors if they are joined by an edge
    • An edge connecting Vertices and is written . Such an edge is said to be incident on Vertices and
  • Paths: A sequence of vertices forms a path of length if there exist edges from to for .
    • Simple path: A path is simple if all vertices on the path are distinct
    • The length of a path is the number of edges it contains
  • Cycle: is a path of length three or more that connects some vertex to itself
    • Simple cycle: A cycle is simple if the path is simple, except for the first and last vertices being the same
  • Subgraph is formed from graph by selecting a subset s of ’s vertices and a subset s of ’s edges such that for every edge in s, both of ’s vertices are in s.
 
 
  • Types of graphs:
    • Sparse: A graph with relatively few edges is called
    • Dense: A graph with many edges
    • Complete: A graph containing all possible edges
      • notion image
    • Directed graph (Digraph) - figure b: A graph with edges directed from one vertex to another
    • Undirected graph - figure a: A graph whose edges are not directed
      • Connected undirected graph: An undirected graph is connected if there is at least one path from any vertex to any other
    • Labeled graph (or Weighted graph) - figure c: A graph with labels or weights associated with its vertices
    • Acyclic graph: A graph w/o cycles
    • Directed acyclic graph (DAG): A directed graph w/o cycles
 
  • Free Tree: a connected, undirected graph with no simple cycles
    • or, it is a connected and has edges
 

Graph Representation

There are two commonly used methods for representing graphs:
  1. Adjacency Matrix
  1. Adjacency List
 

Adjacency Matrix

  • It is illustrated by figure (b) below
  • It is a array
  • Column in row is marked if there is an edge from to and is not marked otherwise
    • In labeled or weighted graphs, we put the weight or the distance between the two vertices instead of 1
  • Space cost:
    • The space requirements for the adjacency matrix are
    • The adjacency matrix requires space for each potential edge, whether it exists or not
    • We need only 1 bit to represent an edge
    • As the graph becomes denser, the adjacency matrix becomes relatively more space efficient
  • Asymptotic cost:
    • The adjacency matrix requires no overhead for pointers, which can be a substantial cost
    • The adjacency matrix often requires a higher asymptotic cost for an algorithm than would result if the adjacency list were used.
      • The reason is that it is common for a graph algorithm to visit each neighbor of each vertex.
      • The adjacency matrix must look at each of its potential edges, yielding a total cost of time
      • This is a considerable disadvantage when the graph is sparse, but not when the graph is closer to full
    • Time complexity:
      • Check if there is edge
        List all neighbors of node
        Traversing the whole graph
       

Adjacency List

  • It is illustrated by figure (c) below
  • It is an array of linked lists and it is a items long array
  • The position in the array stores a pointer to a linked list of edges for vertex
  • This linked list represents the edges by the vertices that are adjacent to vertex
  • The adjacency list is therefore a generalization of the “list of children” representation for trees
  • Space cost:
    • The space requirements depend on both the number of edges and the number of vertices in the graph
    • The cost is
    • We need bit to represent an edge
    • The adjacency list stores information only for those edges that actually appear in the graph
    • They are likely more space efficient in sparse graphs
 

Handshaking Lemma (undirected graph)

  • The degree of each vertex in HS Lemma is
    • Because we record in and out degrees
 
 

Undirected Graph vs. Directed Graph Representation

Both the adjacency matrix and the adjacency list can be used to store directed or undirected graphs.
  • Directed Graph:
    • The edge is recorded only once
notion image
  • Undirected Graph
    • The edge is recorded in both directions (twice).
    • From to and from to
    • This create symmetry in the matrix as shown in figure (b)
notion image
 

Implementation

  • ADT Specifications:
    • Vertices are defined by an integer index value.
    • Vertex can store any additional information (e.g. name)
    • ADT is not implemented using a template, because it is the Graph class users’ responsibility to maintain information related to the vertices themselves.
    • The Graph class need have no knowledge of the type or content of the information associated with a vertex, only the index number for that vertex
  • Methods:
    • n()returns the number of vertices
    • e()returns the number of edges
    • weight(int v0, int v1)returns the weight of a given edge identified by its two incident vertices
    • setEdge(unsigned int w)sets the weight of an edge
      • w should be larger than zero
    • delEdge()removes the edge from the graph
    • setMark(int v, int val)sets the mark value for a vertex (Mark array is used for traversal)
    • getMark(int v, int val)gets the mark value for a vertex

Code (from Shafer’s textbook)

Graph ADT (C++)
// Graph abstract class. This ADT assumes that the number // of vertices is fixed when the graph is created. class Graph { private: void operator =(const Graph&) {} // Protect assignment Graph(const Graph&) {} // Protect copy constructor public: Graph() {} // Default constructor virtual ˜Graph() {} // Base destructor // Initialize a graph of n vertices virtual void Init(int n) =0; // Return: the number of vertices and edges virtual int n() =0; virtual int e() =0; // Return v’s first neighbor virtual int first(int v) =0; // Return v’s next neighbor virtual int next(int v, int w) =0; // Set the weight for an edge // i, j: The vertices // wgt: Edge weight virtual void setEdge(int v1, int v2, int wght) =0; // Delete an edge // i, j: The vertices virtual void delEdge(int v1, int v2) =0; // Determine if an edge is in the graph // i, j: The vertices // Return: true if edge i,j has non-zero weight virtual bool isEdge(int i, int j) =0; // Return an edge’s weight // i, j: The vertices // Return: The weight of edge i,j, or zero virtual int weight(int v1, int v2) =0; // Get and Set the mark value for a vertex // v: The vertex // val: The value to set virtual int getMark(int v) =0; virtual void setMark(int v, int val) =0; };
Graph Implementation using Adjacency Matrix (C++)
// Implementation for the adjacency matrix representation class Graphm : public Graph { private: int numVertex, numEdge; // Store number of vertices, edges int **matrix; // Pointer to adjacency matrix int *mark; // Pointer to mark array public: Graphm(int numVert) // Constructor { Init(numVert); } ˜Graphm() { // Destructor delete [] mark; // Return dynamically allocated memory for (int i=0; i<numVertex; i++) delete [] matrix[i]; delete [] matrix; } void Init(int n) { // Initialize the graph int i; numVertex = n; numEdge = 0; mark = new int[n]; // Initialize mark array for (i=0; i<numVertex; i++) mark[i] = UNVISITED; matrix = (int**) new int*[numVertex]; // Make matrix for (i=0; i<numVertex; i++) matrix[i] = new int[numVertex]; for (i=0; i< numVertex; i++) // Initialize to 0 weights for (int j=0; j<numVertex; j++) matrix[i][j] = 0; } int n() { return numVertex; } // Number of vertices int e() { return numEdge; } // Number of edges // Return first neighbor of "v" int first(int v) { for (int i=0; i<numVertex; i++) if (matrix[v][i] != 0) return i; return numVertex; // Return n if none } // Return v’s next neighbor after w int next(int v, int w) { for(int i=w+1; i<numVertex; i++) if (matrix[v][i] != 0) return i; return numVertex; // Return n if none } // Set edge (v1, v2) to "wt" void setEdge(int v1, int v2, int wt) { Assert(wt>0, "Illegal weight value"); if (matrix[v1][v2] == 0) numEdge++; matrix[v1][v2] = wt; } void delEdge(int v1, int v2) { // Delete edge (v1, v2) if (matrix[v1][v2] != 0) numEdge--; matrix[v1][v2] = 0; } bool isEdge(int i, int j) // Is (i, j) an edge? { return matrix[i][j] != 0; } int weight(int v1, int v2) { return matrix[v1][v2]; } int getMark(int v) { return mark[v]; } void setMark(int v, int val) { mark[v] = val; } };
Graph Implementation using Adjacency List (C++)
// Edge class for Adjacency List graph representation class Edge { int vert, wt; public: Edge() { vert = -1; wt = -1; } Edge(int v, int w) { vert = v; wt = w; } int vertex() { return vert; } int weight() { return wt; } };
class Graphl : public Graph { private: List<Edge>** vertex; // List headers int numVertex, numEdge; // Number of vertices, edges int *mark; // Pointer to mark array public: Graphl(int numVert) { Init(numVert); } ˜Graphl() { // Destructor delete [] mark; // Return dynamically allocated memory for (int i=0; i<numVertex; i++) delete [] vertex[i]; delete [] vertex; } void Init(int n) { int i; numVertex = n; numEdge = 0; mark = new int[n]; // Initialize mark array for (i=0; i<numVertex; i++) mark[i] = UNVISITED; // Create and initialize adjacency lists vertex = (List<Edge>**) new List<Edge>*[numVertex]; for (i=0; i<numVertex; i++) vertex[i] = new LList<Edge>(); } int n() { return numVertex; } // Number of vertices int e() { return numEdge; } // Number of edges int first(int v) { // Return first neighbor of "v" if (vertex[v]->length() == 0) return numVertex; // No neighbor vertex[v]->moveToStart(); Edge it = vertex[v]->getValue(); return it.vertex(); } // Get v’s next neighbor after w int next(int v, int w) { Edge it; if (isEdge(v, w)) { if ((vertex[v]->currPos()+1) < vertex[v]->length()) { vertex[v]->next(); it = vertex[v]->getValue(); return it.vertex(); } } return n(); // No neighbor } // Set edge (i, j) to "weight" void setEdge(int i, int j, int weight) { Assert(weight>0, "May not set weight to 0"); Edge currEdge(j, weight); if (isEdge(i, j)) { // Edge already exists in graph vertex[i]->remove(); vertex[i]->insert(currEdge); } else { // Keep neighbors sorted by vertex index numEdge++; for (vertex[i]->moveToStart(); vertex[i]->currPos() < vertex[i]->length(); vertex[i]->next()) { Edge temp = vertex[i]->getValue(); if (temp.vertex() > j) break; } vertex[i]->insert(currEdge); } } void delEdge(int i, int j) { // Delete edge (i, j) if (isEdge(i,j)) { vertex[i]->remove(); numEdge--; } } bool isEdge(int i, int j) { // Is (i,j) an edge? Edge it; for (vertex[i]->moveToStart(); vertex[i]->currPos() < vertex[i]->length(); vertex[i]->next()) { // Check whole list Edge temp = vertex[i]->getValue(); if (temp.vertex() == j) return true; } return false; } int weight(int i, int j) { // Return weight of (i, j) Edge curr; if (isEdge(i, j)) { curr = vertex[i]->getValue(); return curr.weight(); } else return 0; } int getMark(int v) { return mark[v]; } void setMark(int v, int val) { mark[v] = val; } };
 
 

Graph Traversal

  • The most popular algorithms:
    • Depth First Search (DFS)
      • A recursive technique (using Stack)
      • It focuses on going deeper
    • Breadth First Search (BFS)
      • An iterative technique (using Queue)
      • It focuses on going parallel
  • These algorithms work with both, directed and undirected graphs
  • Usually, graph traversal algorithms have linear time complexity in terms of their nodes and edges

Depth First Search (DFS)

  • It is a recursive algorithm
    • We can implement it using Stack as an iterative algorithm instead
  • Goal: Systematically search through a graph
  • Idea: Mimic maze exploration
  • It is not all that useful by itself, but it shines when augmented to perform such tasks:
    • Count connected components
    • Determine connectivity
    • Find bridges/articulations points
  • It starts from a given node DFS(node) and continue recursively. This is a pseudo code:
    • DFS(node): if (node.neighbor == null AND node.mark == VISTED): return edge = node.pick_edge() edge.neighbor.mark = VISITED DFS(edge.neighbor)
  • C++ Code for Reachability Algorithm (by Ibrahim Moustafa)
    • Finding the reachable nodes starting from a node
    • void dfs(GRAPH &graph, int node, vector<bool> &visited) { visited[node] = true; for (int neighbour : graph[node]) { if (!visited[neighbour]) { // Avoid cycling cout << "\tWe can reach " << neighbour << "\n"; dfs(graph, neighbour, visited); // Recursion } } } void reachability(GRAPH &graph) { int nodes = graph.size(); for (int i = 0; i < nodes; ++i) { vector<bool> visited(nodes); // RESET cout << "Reachable set of node " << i << "\n"; dfs(graph, i, visited); } }
  • Time Complexity
    • In Adjacency Matrix Implementation where is the number of vertices/nodes
      • Because we have to iterate in for all possible edges
    • In Adjacency List Implementation where is the number of edges and is the number of vertices/nodes.
      • This is more efficient than in matrix implementation, because the vertex is visited once at most
  • Space Complexity
    • We create a boolean array to store the visit marks for each node:
    • The recursion stack will have at most calls, so the total is
  • The code and complexity is the same for directed and undirected graphs
 

Breadth First Search (BFS)

  • It is particularly useful for one thing: finding the shortest path on unweighted graphs
  • Time complexity:
  • Worst case
    • For a random graph, the time complexity is O(V+E)Breadth-first search
      As stated in the link, according to the topology of your graph, O(E) may vary from O(V) (if your graph is acyclic) to O(V^2) (if all vertices are connected with each other).
      Therefore the time complexity varies fromO(V + V) = O(V) to O(V + V^2) = O(V^2) according to the topology of your graph.
      Besides, since |V| <= 2 |E|, then O(3E) = O(E) is also correct, but the bound is looser.
       
  • It is implemented with the aid of a queue to record and trace all unvisited neighbors, and to track which node to visit next.
    • Current vertex is put in the head/front of the queue (ready to dequeue)
    • We enqueue unvisited neighbors of the current vertex
      • If one vertex is already existed in the queue, skip that (keep it)
    • After that, we dequeue the current vertex
  • This process runs until we run out of nodes in the queue.
  • Pseudocode
# global variables n = number of nodes in the graph g = adjacency list representing unweighted graph # s = start node, e = end node # 0 =< e # s < n function bfs(s, e): # do a BFS starting at node s prev = solve(s) # return reconstruct path from s -> e return reconstructPath(s, e, prev)
function solve(s): q = queue with support for enqueue and dequeue q.enqueue(s) # if the node is visited, then: # it has either been visited (dequeued) # or is being visited and on the queue (enqueued) visited = [false, ..., false] # size n visited[s] = true prev = [null, ..., null] # size n while !q.isEmpty(): node = q.dequeue() neighbours = g.get(node) for(next : neighbours): if !visited[next]: q.enqueue(next) visited[next] = true prev[next] = node return prev
function reconstructPath(s, e, prev): # reconstruct path going backwards from e # prev was initialized with null values to indicate where the for loop should stop path = [] for(at = e; at != null; at = prev[at]): path.add(at) path.reverse() # if s and e are connected return the path if path[0] == s: return path return []
 

Greedy

  • It is a strategy used to solve optimization problems
  • Optimization Problem: is the problem that requires either minimum result or maximum result
  • It aims to find the best solution at every stage based on the constraint defined in the problem
  • The constraint may be a condition
    • In this case, there will be multiple Feasible Solutions that satisfy the given condition
    • Example: 0-1 Knapsack
  • The constraint may be a statement asking for solving the problem at the minimum/maximum cost
    • In this case, there will be only one Optimal Solution that gives the best result
    • Example: Fractional Knapsack
 
 

Minimum Spanning Tree (MST)

  • Tree: Connected Acyclic Graph
  • Spanning Tree: A subset of edges of that form a tree that hit all vertices of
  • Minimum Spanning Tree:
    • Input: Given graph and edge weights ,
    • Output: A spanning tree that connect all the vertices of minimum weight (for simplicity, assuming that all weights are distinct)
    • This algorithm is important in distributed systems
    • Finding the MST in the Naïve method (testing all possible spanning trees) has an exponential time complexity
 
 

Greedy Properties

  • Optimal Substructure: having overlapping subproblems
    • The parts of the optimal solution are also optimal
    • When we have this, we can use dynamic programming
  • Greedy Choice: A local optimal choice is globally optimal
    • The MST problem can be solved by a greedy algorithm because the the locally optimal solution is also the globally optimal solution
 

Prim’s Algorithm

     

    Resources

    • Clifford Shafer’s textbook - Chapter 11.1 → 11.3