Graphs provide the ultimate in data structure flexibility. Therefore, Graphs can model both real-world systems and abstract problems
Terminology and Representations
- A graph consists of vertices and edges
- Edges:
- is a connection between a pair of vertices in
- |E| is the number of edges
- (directed or undirected)
- if the graph is simple
- Special case: Some graph applications require that a given pair of vertices can have multiple or parallel edges connecting them, or that a vertex can have an edge to itself
- Vertices:
- Two vertices are adjacent or neighbors if they are joined by an edge
- An edge connecting Vertices and is written . Such an edge is said to be incident on Vertices and
- Paths: A sequence of vertices forms a path of length if there exist edges from to for .
- Simple path: A path is simple if all vertices on the path are distinct
- The length of a path is the number of edges it contains
- Cycle: is a path of length three or more that connects some vertex to itself
- Simple cycle: A cycle is simple if the path is simple, except for the first and last vertices being the same
- Subgraph is formed from graph by selecting a subset s of ’s vertices and a subset s of ’s edges such that for every edge in s, both of ’s vertices are in s.
- Types of graphs:
- Sparse: A graph with relatively few edges is called
- Dense: A graph with many edges
- Complete: A graph containing all possible edges
- Directed graph (Digraph) - figure b: A graph with edges directed from one vertex to another
- Undirected graph - figure a: A graph whose edges are not directed
- Connected undirected graph: An undirected graph is connected if there is at least one path from any vertex to any other
- Labeled graph (or Weighted graph) - figure c: A graph with labels or weights associated with its vertices
- Acyclic graph: A graph w/o cycles
- Directed acyclic graph (DAG): A directed graph w/o cycles
- Free Tree: a connected, undirected graph with no simple cycles
- or, it is a connected and has edges
Graph Representation
There are two commonly used methods for representing graphs:
- Adjacency Matrix
- Adjacency List
Adjacency Matrix
- It is illustrated by figure (b) below
- It is a array
- Column in row is marked if there is an edge from to and is not marked otherwise
- In labeled or weighted graphs, we put the weight or the distance between the two vertices instead of 1
- Space cost:
- The space requirements for the adjacency matrix are
- The adjacency matrix requires space for each potential edge, whether it exists or not
- We need only 1 bit to represent an edge
- As the graph becomes denser, the adjacency matrix becomes relatively more space efficient
- Asymptotic cost:
- The adjacency matrix requires no overhead for pointers, which can be a substantial cost
- The adjacency matrix often requires a higher asymptotic cost for an algorithm than would result if the adjacency list were used.
- The reason is that it is common for a graph algorithm to visit each neighbor of each vertex.
- The adjacency matrix must look at each of its potential edges, yielding a total cost of time
- This is a considerable disadvantage when the graph is sparse, but not when the graph is closer to full
- Time complexity:
Check if there is edge | |
List all neighbors of node | |
Traversing the whole graph |
Adjacency List
- It is illustrated by figure (c) below
- It is an array of linked lists and it is a items long array
- The position in the array stores a pointer to a linked list of edges for vertex
- This linked list represents the edges by the vertices that are adjacent to vertex
- The adjacency list is therefore a generalization of the “list of children” representation for trees
- Space cost:
- The space requirements depend on both the number of edges and the number of vertices in the graph
- The cost is
- We need bit to represent an edge
- The adjacency list stores information only for those edges that actually appear in the graph
- They are likely more space efficient in sparse graphs
Handshaking Lemma (undirected graph)
- The degree of each vertex in HS Lemma is
- Because we record in and out degrees
Undirected Graph vs. Directed Graph Representation
Both the adjacency matrix and the adjacency list can be used to store directed or undirected graphs.
- Directed Graph:
- The edge is recorded only once
- Undirected Graph
- The edge is recorded in both directions (twice).
- From to and from to
- This create symmetry in the matrix as shown in figure (b)
Implementation
- ADT Specifications:
- Vertices are defined by an integer index value.
- Vertex can store any additional information (e.g. name)
- ADT is not implemented using a template, because it is the Graph class users’ responsibility to maintain information related to the vertices themselves.
- The Graph class need have no knowledge of the type or content of the information associated with a vertex, only the index number for that vertex
- Methods:
n()
returns the number of verticese()
returns the number of edgesweight(int v0, int v1)
returns the weight of a given edge identified by its two incident verticessetEdge(unsigned int w)
sets the weight of an edgew
should be larger than zerodelEdge()
removes the edge from the graphsetMark(int v, int val)
sets the mark value for a vertex (Mark array is used for traversal)getMark(int v, int val)
gets the mark value for a vertex
Code (from Shafer’s textbook)
Graph ADT (C++)
// Graph abstract class. This ADT assumes that the number // of vertices is fixed when the graph is created. class Graph { private: void operator =(const Graph&) {} // Protect assignment Graph(const Graph&) {} // Protect copy constructor public: Graph() {} // Default constructor virtual ˜Graph() {} // Base destructor // Initialize a graph of n vertices virtual void Init(int n) =0; // Return: the number of vertices and edges virtual int n() =0; virtual int e() =0; // Return v’s first neighbor virtual int first(int v) =0; // Return v’s next neighbor virtual int next(int v, int w) =0; // Set the weight for an edge // i, j: The vertices // wgt: Edge weight virtual void setEdge(int v1, int v2, int wght) =0; // Delete an edge // i, j: The vertices virtual void delEdge(int v1, int v2) =0; // Determine if an edge is in the graph // i, j: The vertices // Return: true if edge i,j has non-zero weight virtual bool isEdge(int i, int j) =0; // Return an edge’s weight // i, j: The vertices // Return: The weight of edge i,j, or zero virtual int weight(int v1, int v2) =0; // Get and Set the mark value for a vertex // v: The vertex // val: The value to set virtual int getMark(int v) =0; virtual void setMark(int v, int val) =0; };
Graph Implementation using Adjacency Matrix (C++)
// Implementation for the adjacency matrix representation class Graphm : public Graph { private: int numVertex, numEdge; // Store number of vertices, edges int **matrix; // Pointer to adjacency matrix int *mark; // Pointer to mark array public: Graphm(int numVert) // Constructor { Init(numVert); } ˜Graphm() { // Destructor delete [] mark; // Return dynamically allocated memory for (int i=0; i<numVertex; i++) delete [] matrix[i]; delete [] matrix; } void Init(int n) { // Initialize the graph int i; numVertex = n; numEdge = 0; mark = new int[n]; // Initialize mark array for (i=0; i<numVertex; i++) mark[i] = UNVISITED; matrix = (int**) new int*[numVertex]; // Make matrix for (i=0; i<numVertex; i++) matrix[i] = new int[numVertex]; for (i=0; i< numVertex; i++) // Initialize to 0 weights for (int j=0; j<numVertex; j++) matrix[i][j] = 0; } int n() { return numVertex; } // Number of vertices int e() { return numEdge; } // Number of edges // Return first neighbor of "v" int first(int v) { for (int i=0; i<numVertex; i++) if (matrix[v][i] != 0) return i; return numVertex; // Return n if none } // Return v’s next neighbor after w int next(int v, int w) { for(int i=w+1; i<numVertex; i++) if (matrix[v][i] != 0) return i; return numVertex; // Return n if none } // Set edge (v1, v2) to "wt" void setEdge(int v1, int v2, int wt) { Assert(wt>0, "Illegal weight value"); if (matrix[v1][v2] == 0) numEdge++; matrix[v1][v2] = wt; } void delEdge(int v1, int v2) { // Delete edge (v1, v2) if (matrix[v1][v2] != 0) numEdge--; matrix[v1][v2] = 0; } bool isEdge(int i, int j) // Is (i, j) an edge? { return matrix[i][j] != 0; } int weight(int v1, int v2) { return matrix[v1][v2]; } int getMark(int v) { return mark[v]; } void setMark(int v, int val) { mark[v] = val; } };
Graph Implementation using Adjacency List (C++)
// Edge class for Adjacency List graph representation class Edge { int vert, wt; public: Edge() { vert = -1; wt = -1; } Edge(int v, int w) { vert = v; wt = w; } int vertex() { return vert; } int weight() { return wt; } };
class Graphl : public Graph { private: List<Edge>** vertex; // List headers int numVertex, numEdge; // Number of vertices, edges int *mark; // Pointer to mark array public: Graphl(int numVert) { Init(numVert); } ˜Graphl() { // Destructor delete [] mark; // Return dynamically allocated memory for (int i=0; i<numVertex; i++) delete [] vertex[i]; delete [] vertex; } void Init(int n) { int i; numVertex = n; numEdge = 0; mark = new int[n]; // Initialize mark array for (i=0; i<numVertex; i++) mark[i] = UNVISITED; // Create and initialize adjacency lists vertex = (List<Edge>**) new List<Edge>*[numVertex]; for (i=0; i<numVertex; i++) vertex[i] = new LList<Edge>(); } int n() { return numVertex; } // Number of vertices int e() { return numEdge; } // Number of edges int first(int v) { // Return first neighbor of "v" if (vertex[v]->length() == 0) return numVertex; // No neighbor vertex[v]->moveToStart(); Edge it = vertex[v]->getValue(); return it.vertex(); } // Get v’s next neighbor after w int next(int v, int w) { Edge it; if (isEdge(v, w)) { if ((vertex[v]->currPos()+1) < vertex[v]->length()) { vertex[v]->next(); it = vertex[v]->getValue(); return it.vertex(); } } return n(); // No neighbor } // Set edge (i, j) to "weight" void setEdge(int i, int j, int weight) { Assert(weight>0, "May not set weight to 0"); Edge currEdge(j, weight); if (isEdge(i, j)) { // Edge already exists in graph vertex[i]->remove(); vertex[i]->insert(currEdge); } else { // Keep neighbors sorted by vertex index numEdge++; for (vertex[i]->moveToStart(); vertex[i]->currPos() < vertex[i]->length(); vertex[i]->next()) { Edge temp = vertex[i]->getValue(); if (temp.vertex() > j) break; } vertex[i]->insert(currEdge); } } void delEdge(int i, int j) { // Delete edge (i, j) if (isEdge(i,j)) { vertex[i]->remove(); numEdge--; } } bool isEdge(int i, int j) { // Is (i,j) an edge? Edge it; for (vertex[i]->moveToStart(); vertex[i]->currPos() < vertex[i]->length(); vertex[i]->next()) { // Check whole list Edge temp = vertex[i]->getValue(); if (temp.vertex() == j) return true; } return false; } int weight(int i, int j) { // Return weight of (i, j) Edge curr; if (isEdge(i, j)) { curr = vertex[i]->getValue(); return curr.weight(); } else return 0; } int getMark(int v) { return mark[v]; } void setMark(int v, int val) { mark[v] = val; } };
Graph Traversal
- The most popular algorithms:
- Depth First Search (DFS)
- A recursive technique (using Stack)
- It focuses on going deeper
- Breadth First Search (BFS)
- An iterative technique (using Queue)
- It focuses on going parallel
- These algorithms work with both, directed and undirected graphs
- Usually, graph traversal algorithms have linear time complexity in terms of their nodes and edges
Depth First Search (DFS)
- It is a recursive algorithm
- We can implement it using Stack as an iterative algorithm instead
- Goal: Systematically search through a graph
- Idea: Mimic maze exploration
- It is not all that useful by itself, but it shines when augmented to perform such tasks:
- Count connected components
- Determine connectivity
- Find bridges/articulations points
- It starts from a given node
DFS(node)
and continue recursively. This is a pseudo code:
DFS(node): if (node.neighbor == null AND node.mark == VISTED): return edge = node.pick_edge() edge.neighbor.mark = VISITED DFS(edge.neighbor)
- C++ Code for Reachability Algorithm (by Ibrahim Moustafa)
- Finding the reachable nodes starting from a node
void dfs(GRAPH &graph, int node, vector<bool> &visited) { visited[node] = true; for (int neighbour : graph[node]) { if (!visited[neighbour]) { // Avoid cycling cout << "\tWe can reach " << neighbour << "\n"; dfs(graph, neighbour, visited); // Recursion } } } void reachability(GRAPH &graph) { int nodes = graph.size(); for (int i = 0; i < nodes; ++i) { vector<bool> visited(nodes); // RESET cout << "Reachable set of node " << i << "\n"; dfs(graph, i, visited); } }
- Time Complexity
- In Adjacency Matrix Implementation where is the number of vertices/nodes
- Because we have to iterate in for all possible edges
- In Adjacency List Implementation where is the number of edges and is the number of vertices/nodes.
- This is more efficient than in matrix implementation, because the vertex is visited once at most
- Space Complexity
- We create a boolean array to store the visit marks for each node:
- The recursion stack will have at most calls, so the total is
- The code and complexity is the same for directed and undirected graphs
Breadth First Search (BFS)
- It is particularly useful for one thing: finding the shortest path on unweighted graphs
- Time complexity:
- Worst case
For a random graph, the time complexity isO(V+E)
: Breadth-first searchAs stated in the link, according to the topology of your graph,O(E)
may vary fromO(V)
(if your graph is acyclic) toO(V^2)
(if all vertices are connected with each other).Therefore the time complexity varies fromO(V + V) = O(V)
toO(V + V^2) = O(V^2)
according to the topology of your graph.Besides, since|V| <= 2 |E|
, thenO(3E) = O(E)
is also correct, but the bound is looser.
- It is implemented with the aid of a queue to record and trace all unvisited neighbors, and to track which node to visit next.
- Current vertex is put in the head/front of the queue (ready to dequeue)
- We enqueue unvisited neighbors of the current vertex
- If one vertex is already existed in the queue, skip that (keep it)
- After that, we dequeue the current vertex
- This process runs until we run out of nodes in the queue.
- Pseudocode
# global variables n = number of nodes in the graph g = adjacency list representing unweighted graph # s = start node, e = end node # 0 =< e # s < n function bfs(s, e): # do a BFS starting at node s prev = solve(s) # return reconstruct path from s -> e return reconstructPath(s, e, prev)
function solve(s): q = queue with support for enqueue and dequeue q.enqueue(s) # if the node is visited, then: # it has either been visited (dequeued) # or is being visited and on the queue (enqueued) visited = [false, ..., false] # size n visited[s] = true prev = [null, ..., null] # size n while !q.isEmpty(): node = q.dequeue() neighbours = g.get(node) for(next : neighbours): if !visited[next]: q.enqueue(next) visited[next] = true prev[next] = node return prev
function reconstructPath(s, e, prev): # reconstruct path going backwards from e # prev was initialized with null values to indicate where the for loop should stop path = [] for(at = e; at != null; at = prev[at]): path.add(at) path.reverse() # if s and e are connected return the path if path[0] == s: return path return []
Greedy
- It is a strategy used to solve optimization problems
- Optimization Problem: is the problem that requires either minimum result or maximum result
- It aims to find the best solution at every stage based on the constraint defined in the problem
- The constraint may be a condition
- In this case, there will be multiple Feasible Solutions that satisfy the given condition
- Example: 0-1 Knapsack
- The constraint may be a statement asking for solving the problem at the minimum/maximum cost
- In this case, there will be only one Optimal Solution that gives the best result
- Example: Fractional Knapsack
Minimum Spanning Tree (MST)
- Tree: Connected Acyclic Graph
- Spanning Tree: A subset of edges of that form a tree that hit all vertices of
- Minimum Spanning Tree:
- Input: Given graph and edge weights ,
- Output: A spanning tree that connect all the vertices of minimum weight (for simplicity, assuming that all weights are distinct)
- This algorithm is important in distributed systems
- Finding the MST in the Naïve method (testing all possible spanning trees) has an exponential time complexity
Greedy Properties
- Optimal Substructure: having overlapping subproblems
- The parts of the optimal solution are also optimal
- When we have this, we can use dynamic programming
- Greedy Choice: A local optimal choice is globally optimal
- The MST problem can be solved by a greedy algorithm because the the locally optimal solution is also the globally optimal solution
Prim’s Algorithm
Resources
- Clifford Shafer’s textbook - Chapter 11.1 → 11.3