CENG 205 - Data Structures and Algorithms - Week 11a - Graphs PDF
Document Details
Uploaded by Deleted User
Dr. Yücel Tekin
Tags
Summary
This document is a lecture presentation on data structures and algorithms, specifically focusing on graphs. It covers fundamental concepts such as the classification of data structures, properties of graphs, graph traversal algorithms, spanning trees, and shortest path algorithms.
Full Transcript
CENG 205 – Data Structures and Algorithms Week 11-a – Graphs Dr. Yücel Tekin 2.2 CLASSIFICATION OF DATA STRUCTURES Data structure Primitive (basic) Data Non-Primitive Data Structure Integer structure...
CENG 205 – Data Structures and Algorithms Week 11-a – Graphs Dr. Yücel Tekin 2.2 CLASSIFICATION OF DATA STRUCTURES Data structure Primitive (basic) Data Non-Primitive Data Structure Integer structure Real Character Linear Non-Linear Boolean Arrays Trees Linked lists Graphs fundamental data types supported by a programming language Stacks not stored in a sequential order Queues the elements of a data structure are stored in a linear or sequential order Chapter 13 : Graphs In this chapter, we will discuss another non-linear data structure called graphs. the representation of graphs in the memory the operations that can be performed on them Last but not the least, the real-world applications of graphs. 13.1 INTRODUCTION A graph is an abstract data structure that is used to implement the mathematical concept of graphs. It is basically a collection of vertices (also called nodes) and edges that connect these vertices. A graph is often viewed as a generalization of the tree structure, instead of having a purely parent-to-child relationship between tree nodes, any kind of complex relationship can exist. Tree Graph Why are Graphs Useful? Graphs are widely used to model any situation where entities or things are related to each other in pairs. For example, the following information can be represented by graphs: Family trees in which the member nodes have an edge from parent to each of their children. Transportation networks in which nodes are airports, intersections, ports, etc. The edges can be airline flights, one-way roads, shipping routes, etc. Definition A graph G is defined as an ordered set (V, E) V(G) represents the set of vertices E(G) represents the edges that connect these vertices Example V(G) = {A, B, C, D and E} E(G) = {(A, B), (B, C), (A, D), (B, D), (D, E), (C, E)}. A graph can be directed or undirected. In an undirected graph, Edges do not have any direction associated with them. The nodes can be traversed from A to B as well as from B to A In a directed graph, Edges form an ordered pair. There is a path from A to B but not from B to A The edge (A, B) is said to initiate from node A (also known as initial node) and terminate at node B (terminal node). 13.2 Graph Terminology Adjacent nodes or neighbours For every edge, e = (u, v) that connects nodes u and v, the nodes u and v are the end-points and are said to be the adjacent nodes or neighbours. Nodes A and B are neighbours in our example Degree of a node Degree of a node u, deg(u), is the total number of edges containing the node u. If deg(u) = 0, it means that u does not belong to any edge such a node is known as an isolated node. Node F is isolated in the example, deg(F) = 0 deg (A) = 2 A B C D E F 13.2 Graph Terminology Regular graph It is a graph where each vertex has the same number of neighbours. Every node has the same degree A regular graph with vertices of degree k is called a k–regular graph 13.2 Graph Terminology Path A B C A path P written as P = {v0, v1, v2,..., vn), of length n from a node u to v is defined as a sequence of (n+1) nodes. Here, u = v0, v = vn and vi–1 is adjacent to vi for i = 1, 2, 3,..., n. In the example a path between A and E can be described as D E F Closed path PA(A,E) = {A,B,C,E} path P is known as a closed path if the edge has the same end-points. That is, if v0 = vn. In the example a closed path between can be described as P(A,A) = {A,B,C,E,D,A} Simple path A path P is known as a simple path if all the nodes in the path are distinct with an exception that v0 may be equal to vn. If v0 = vn, then the path is called a closed simple path. P(A,A) = {A,B,C,E,D,A} is an example of simple path P(C,C) = {C,B,A,D,B,C} is not a simple path Cycle A path in which the first and the last vertices are same. A simple cycle has no repeated edges or vertices (except the first and last) 13.2 Graph Terminology Connected graph A graph is said to be connected if for any two vertices (u, v) in set V there is a path from u to v That is to say that there are no isolated nodes in a connected graph A connected graph that does not have any cycle is called a tree Therefore, a tree is treated as a special graph connected graph - Tree 13.2 Graph Terminology Complete graph A graph G is said to be complete if all its nodes are fully connected. That is, there is a path from one node to every other node in the graph. A complete graph has n(n–1)/2 edges, where n is the number of nodes in G. Example n= 5 e= 5*4/2 = 10 13.2 Graph Terminology Clique In an undirected graph G = (V, E), clique is a subset of the vertex set C V, every two vertices in C - u, v ∈ C ∧ u ≠ v - there is an edge that connects two vertices- uv ∈ E(G) Example G: C: {A,B,C,F} S: {C,D,F,E} B C D B C C D F A F F E A E C is Clique Every two nodes in C S is not a Clique Because not two nodes are connected with edges in S are connected with C is also a complete edges C is also not a complete graph graph 13.2 Graph Terminology Labelled graph or weighted graph A graph is said to be labelled if every edge in the graph is assigned some data In a weighted graph, the edges of the graph are assigned some weight or length. The weight of an edge denoted by w(e) is a positive value which indicates the cost of traversing the edge. Below figure shows a weighted graph. weighted graph 13.2 Graph Terminology Multiple edges Distinct edges which connect the same end-points are called multiple edges. That is, e = (u, v) and e' = (u, v) are known as multiple edges of G. Loop An edge that has identical end-points is called a loop. That is, e = (u, u). Multi-graph A graph with multiple edges and/or loops is called a multi-graph. Figure shows a multi-graph. Size of a graph The size of a graph is the total number of edges in it. Size of the mutli-graph in the figure is 9 Multi-graph 13.3 Directed Graphs A directed graph G, also known as a digraph, is a graph in which every edge has a direction assigned to it. An edge of a directed graph is given as an ordered pair (u, v) of nodes in G. For an edge (u, v): The edge begins at u and terminates at v. u is known as the origin or initial point of edge. Correspondingly, v is known as the destination or terminal point of egde. u is the predecessor of v. Correspondingly, v is the successor of u. Nodes u and v are adjacent to each other. directed graph 13.3.1 Terminology of a Directed Graph Out-degree of a node The out-degree of a node u, written as is the number of edges that originate at u. In-degree of a node The in-degree of a node u, written as indeg(u), is the number of edges that terminate at u. Degree of a node The degree of a node, written as deg(u), is equal to the sum of in-degree and out-degree of that node. Therefore, deg(u) = indeg(u) + outdeg(u). Example outdeg(B) = 1 indegre(B) = 2 Deg(B) = indeg(B) + outdeg(B) = 2 + 1=3 13.3.1 Terminology of a Directed Graph Out-degree of a node The out-degree of a node u, written as is the number of edges that originate at u. In-degree of a node The in-degree of a node u, written as indeg(u), is the number of edges that terminate at u. Degree of a node The degree of a node, written as deg(u), is equal to the sum of in-degree and out-degree of that node. Therefore, deg(u) = indeg(u) + outdeg(u). Example outdeg(B) = 1 indegre(B) = 2 Deg(B) = indeg(B) + outdeg(B) = 2 + 1=3 13.3.1 Terminology of a Directed Graph Isolated vertex J K A vertex with degree zero. Such a vertex is not an end-point of any edge. H Pendant vertex I (also known as leaf vertex) A vertex with degree one. Cut vertex Cut vertex G Sink A vertex which when deleted would disconnect the remaining graph. B C Source D A node u is known as a source if it has a positive out- degree but a zero in-degree. F Leaf A E vertex Sink A node u is known as a sink if it has a positive in-degree Isolate but a zero out-degree. Sourc e d vertex 13.3.1 Terminology of a Directed Graph Reachability A node v is said to be reachable from node u, if and B C only if there exists a (directed) path from node u to E node v. For example, E is reachable from node A but A is not D reachable from any node. A Strongly connected directed graph A digraph is said to be strongly connected if and only if there exists a path between every pair of nodes in G. That is, if there is a path from node u to v, then there must be a path from node v to u. F G I H 13.3.1 Terminology of a Directed Graph Unilaterally connected graph A digraph is said to be unilaterally connected if there B C exists a path between any pair of nodes u, v in G such that there is a path from u to v or a path from v to u, but not both. A D Weakly connected digraph A directed graph is said to be weakly connected if it is connected by ignoring the direction of edges. it is possible to reach any node from any other node by traversing edges in any direction (may not be in the direction they point). The nodes in a weakly connected directed graph must have either out-degree or in-degree of at least 1. 13.3.1 Terminology of a Directed Graph F G Parallel/Multiple edges Distinct edges which connect the same end-points are called multiple edges. I That is, e = (u, v) and e' = (u, v) are known as H multiple edges of G. Edges connecting nodes H and I are multiple edges Simple directed graph A B A directed graph G is said to be a simple directed graph if and only if it has no parallel edges. However, a simple directed graph may contain cycles with an exception that it cannot have more than one C loop at a given node. D 13.5 REPRESENTATION OF GRAPHS There are three common ways of storing graphs in the computer’s memory. They are: Sequential representation by using an adjacency matrix. Linked representation by using an adjacency list that stores the neighbours of a node using a linked list. Adjacency multi-list which is an extension of linked representation. 13.5.1 Adjacency Matrix Representation An adjacency matrix is used to represent which nodes are adjacent to one another. By definition, two nodes are said to be adjacent if there is an edge connecting them. For any graph G having n nodes, the adjacency matrix will have the dimension of n x n. In an adjacency matrix, the rows and columns are labelled by graph vertices. An entry aij in the adjacency matrix will contain 1, if vertices vi and vj are adjacent to each other. However, if the nodes are not adjacent, aij will be set to zero. Since an adjacency matrix contains only 0s and 1s, it is called a bit matrix or a Boolean matrix. 13.5.1 Adjacency Matrix Representation From the above examples, we can draw the following conclusions: For a simple graph (that has no loops), the adjacency matrix has 0s on the diagonal. The adjacency matrix of an undirected graph is symmetric. The memory use of an adjacency matrix is O(n2), where n is the number of nodes in the graph. Number of 1s (or non-zero entries) in an adjacency matrix is equal to the number of edges in the graph. The adjacency matrix for a weighted graph contains the weights of the edges connecting the nodes. 13.5.1 Adjacency Matrix Representation Now let us discuss the powers of an adjacency matrix. From adjacency matrix A1, we can conclude that an entry 1 in the ith row and jth column means that there exists a path of length 1 from vi to vj. Now consider, A2, A3, and A4. Any entry aij = 1 if aik = akj = 1. That is, if there is an edge (vi, vk) and (vk, vj), then there is a path from vi to vj of length 2. 13.5.1 Adjacency Matrix Representation Similarly, every entry in the ith row and jth column of A3 gives the number of paths of length 3 from node vi to vj. In general terms, we can conclude that every entry in the ith row and jth column of An (where n is the number of nodes in the graph) gives the number of paths of length n from node vi to vj. 13.5.1 Adjacency Matrix Representation Now, based on the above calculations, we define matrix B as: Br = A1 + A2 + A3 +... + Ar An entry in the ith row and jth column of matrix Br gives the number of paths of length r or less than r fr vertex vi to vj. The main goal to define matrix B is to obtain the path matrix P. The path matrix P can be calculated from B by setting an entry Pij = 1, if Bij is non-zero and Pij = 0, if otherwise. 13.5.2 Adjacency List Representation This structure consists of a list of all nodes in G. Every node is in turn linked to its own list that contains the names of all other nodes that are adjace to it. Adjacency lists can also be modified to store weighted graphs. The key advantages of using an adjacency list are: It is easy to follow and clearly shows the adjacent nodes of a particular node. It is often used for storing graphs that have a small-to-moderate number of edges. (sparse graph) An adjacency matrix is a good choice for dense graphs. Adding new nodes in G is easy and straightforward when G is represented using an adjacency list. Adding new nodes in an adjacency matrix is a difficult task, as the size of the matrix needs to be changed and existing nodes may have to be reordered. sparse graph dense graph 13.6 GRAPH TRAVERSAL ALGORITHMS We will discuss how to traverse graphs. By traversing a graph, we mean the method of examining the nodes and edges of the graph. There are two standard methods of graph traversal: 1. Breadth-first search 2. Depth-first search As an auxiliary data structure to store nodes for further processing; Breadth-first search uses a queue Depth-first search uses a stack But both these algorithms make use of a variable STATUS. During the execution of the algorithm, every node in the graph will have the variable STATUS set to 1 or 2, depending on its current state. 13.6.1 Breadth-First Search Algorithm Breadth-first search (BFS) is a graph search algorithm Begins at the root node and explores all the neighbouring nodes. Then for each of those nearest nodes, the algorithm explores their unexplored neighbour nodes, and so on, until it finds the goal. 13.6.1 Breadth-First Search Algorithm We start examining the node A and then all the neighbours of A are examined. In the next step, we examine the neighbours of neighbours of A, so on and so forth. This is accomplished by using a queue that will hold the nodes that are waiting for further processing and a variable STATUS to represent the current state of the node. Example 13.1 Consider the graph G and the adjacency list are given in Figure Assume that G represents the daily flights between different cities We want to fly from city A to I with minimum stops. That is, find the minimum path P from A to I given that every edge has a length of 1. Solution The minimum path P can be found by applying the breadth-first search algorithm that begins at city A and ends when I is encountered. During the execution of the algorithm, we use two arrays: QUEUE and ORIG. QUEUE is used to hold the nodes that have to be processed ORIG is used to keep track of the origin of each edge Initially, FRONT = REAR = –1. Solution 1 - Add A to QUEUE and add NULL to ORIG. 2 - Dequeue a node by setting FRONT = FRONT + 1 (remove the FRONT element of QUEUE) and enqueue the neighbours of A. Also, add A as the ORIG of its neighbours. Solution 3 - Dequeue a node by setting FRONT = FRONT + 1 and enqueue the neighbours of B. Also, add B as the ORIG of its neighbours. 4 - Dequeue a node by setting FRONT = FRONT + 1 and enqueue the neighbours of C. Also, add C as the ORIG of its neighbours. Note that C has two neighbours B and G. Since B has already been added to the queue and it is not in the Ready state, we will not add B and only add G. Solution 5 - Dequeue a node by setting FRONT = FRONT + 1 and enqueue the neighbours of D. Also, add D as the ORIG of its neighbours. Note that D has two neighbours C and G. Since both of them have already been added to the queue and they are not in the Ready state, we will not add them again. 6 - Dequeue a node by setting FRONT = FRONT + 1 and enqueue the neighbours of E. Also, add E as the ORIG of its neighbours. Note that E has two neighbours C and F. Since C has already been added to the queue and it is not in the Ready state, we will not add C and add only F. Solution 6 - Dequeue a node by setting FRONT = FRONT + 1 and enqueue the neighbours of G. Also, add G as the ORIG of its neighbours. Note that G has three neighbours F, H, and I. Since F has already been added to the queue, we will only add H and I. As I is our final destination, we stop the execution of this algorithm as soon as it is encountered and added to the QUEUE. Now, backtrack from I using ORIG to find the minimum path P. Thus, we have P as A -> C - > G -> I. Applications of Breadth-First Search Algorithm Breadth-first search can be used to solve many problems such as: Finding all connected components in a graph G. Finding all nodes within an individual connected component. Finding the shortest path between two nodes, u and v, of an unweighted graph. Finding the shortest path between two nodes, u and v, of a weighted graph. 13.6.2 Depth-first Search Algorithm The depth-first search algorithm progresses by expanding the starting node of G and then going deeper and deeper until the goal node is found, or until a node that has no children is encountered. This algorithm is similar to the in-order traversal of a binary tree. Its implementation is similar to that of the breadth- first search algorithm but here we use a stack instead of a queue. Again, we use a variable STATUS to represent the current state of the node. Example 13.2 Consider the graph G given and the adjacency list of G is also given as in the figure Suppose we want to print all the nodes that can be reached from the node H (including H itself). One alternative is to use a depth-first search of G starting at node H. Example 13.2 A - Push H onto the stack. B - Pop and print the top element of the STACK, that is, H. Push all the neighbours of H onto the stack that are in the ready state. The STACK now becomes C - Pop and print the top element of the STACK, that is, I. Push all the neighbours of I onto the stack that are in the ready state. The STACK now becomes Example 13.2 D - Pop and print the top element of the STACK, that is, F. Push all the neighbours of F onto the stack that are in the ready state. (Note F has two neighbours, C and H. But only C will be added, as H is not in the ready state.) The STACK now becomes E - Pop and print the top element of the STACK, that is, C. Push all the neighbours of C onto the stack that are in the ready state. The STACK now becomes Example 13.2 F - Pop and print the top element of the STACK, that is, G. Push all the neighbours of G onto the stack that are in the ready state. Since there are no neighbours of G that are in the ready state,no push operation is performed. The STACK now becomes G - Pop and print the top element of the STACK, that is, B. Push all the neighbours of B onto the stack that are in the ready state. Since there are no neighbours of B that are in the ready state, no push operation is performed. The STACK now becomes Example 13.2 H - Pop and print the top element of the STACK, that is, E. Push all the neighbours of E onto the stack that are in the ready state. Since there are no neighbours of E that are in the ready state, no push operation is performed. The STACK now becomes empty. Since the STACK is now empty, the depth-first search of G starting at node H is complete and the nodes which were printed are: H, I, F, C, G, B, E These are the nodes which are reachable from the node H. Applications of Depth-First Search Algorithm Depth-first search is useful for: Finding a path between two specified nodes, u and v, of an unweighted graph. Finding a path between two specified nodes, u and v, of a weighted graph. Finding whether a graph is connected or not. Computing the spanning tree of a connected graph. P rogramming E xample 13.8 SHORTEST PATH ALGORITHMS 1. Minimum spanning tree (adjacency list) 2. Dijkstra’s algorithm 3. Warshall’s algorithm While the first two use an adjacency list to find the shortest path, Warshall’s algorithm uses an adjacency matrix to do the same. 13.8.1 Minimum Spanning Trees A spanning tree of a connected, undirected graph G is a sub-graph of G which is a tree that connects all the vertices together. A graph G can have many different spanning trees. 13.8.1 Minimum Spanning Trees We can assign weights to each edge (which is cost of the edge) We can calculate the sum of the weights of the edges in that spanning tree A minimum spanning tree (MST) is defined as a spanning tree with weight less than or equal to the weight of every other spanning tree. MST Applications of Minimum Spanning Trees 1. MSTs are widely used for designing networks. A minimum spanning tree is used to determine the least costly paths with no cycles in this network, thereby providing a connection that has the minimum cost involved. 2. MSTs are used to find airline routes. While the vertices in the graph denote cities, edges represent the routes between these cities. MSTs are used to optimize airline routes by finding the least costly path with no cycles. 3. MSTs are also used to find the cheapest way to connect terminals, such as cities, electronic components or computers via roads, airlines, railways, wires or telephone lines. 4. MSTs are applied in routing algorithms for finding the most efficient path. 13.8.2 Prim’s Algorithm Prim’s algorithm is aa algorithm that is used to form a minimum spanning tree for a connected weighted undirected graph. For this, the algorithm maintains three sets of vertices which can be given as below: Tree vertices Vertices that are a part of the minimum spanning tree T. Fringe vertices Vertices that are currently not a part of T, but are adjacent to some tree vertex. Unseen vertices Vertices that are neither tree vertices nor fringe vertices fall under thiscategory. Example 13.7 Construct a minimum spanning tree of the graph given in Fig. Example: Construct a minimum spanning tree of the graph given in Fig. By using Prim’s Algorithm Solution Example Construct a minimum spanning tree of the graph given in Fig. By using Prim’s Algorithm Solution 13.8.3 Kruskal’s Algorithm Kruskal’s algorithm is used to find the minimum spanning tree for a connected weighted graph. The algorithm aims to find a subset of the edges that forms a tree that includes every vertex. The total weight of all the edges in the tree is minimized. However, if the graph is not connected, then it finds a minimum spanning forest. Note that a forest is a collection of trees. Similarly, a minimum spanning forest is a collection of minimum spanning trees. Example 13.9 Apply Kruskal’s algorithm on the graph given in Figure A 1 B C D E F Example 13.9 Apply Kruskal’s algorithm on the graph given in Figure A 1 B C D E F 2 Example 13.9 Apply Kruskal’s algorithm on the graph given in Figure A 1 B C D 3 E F 2 Example 13.9 Apply Kruskal’s algorithm on the graph given in Figure A 1 B C D 4 3 E F 2 Example 13.9 Apply Kruskal’s algorithm on the graph given in Figure A 1 B C D 4 3 E F 2 Example 13.9 Apply Kruskal’s algorithm on the graph given in Figure A 1 B C D 4 3 E F 2 Example 13.9 Apply Kruskal’s algorithm on the graph given in Figure A 7 1 B C D 4 3 E F 2 Example 13.9 Apply Kruskal’s algorithm on the graph given in Figure A 7 1 B C D 4 3 E F 2 Example: Construct a minimum spanning tree of the graph given in Fig. By using Kruskal’s Algorithm Solution xample Construct a minimum spanning tree of the graph given in Fig. By using Kruskal’s Algorithm Solution 13.8.4 Dijkstra’s Algorithm Dijkstra’s algorithm, given by a Dutch scientist Edsger Dijkstra in 1959, is used to find the shortest path tree. This algorithm is widely used in network routing protocols, most notably IS-IS and OSPF (Open Shortest Path First). Given a graph G and a source node A, the algorithm is used to find the shortest path (one having the lowest cost) between A (source node) and every other node. Moreover, Dijkstra’s algorithm is also used for finding the costs of the shortest paths from a source node to a Edsger Wybe Dijkstra destination node. 1930 - 2002 Algorithm Dijkstra’s algorithm is used to find the length of an optimal path between two nodes in a graph. The term optimal can mean anything, shortest, cheapest, or fastest. If we start the algorithm with an initial node, then the distance of a node Y can be given as the distance from the initial node to that node. Difference between Dijkstra’s Algorithm and Minimum Spanning Tree Minimum spanning tree algorithm is used to traverse a graph in the most efficient manner Dijkstra’s algorithm calculates the distance from a given vertex to every other vertex in the graph. Dijkstra’s algorithm is very similar to Prim’s algorithm. Both the algorithms begin at a specific node and extend outward within the graph, until all other nodes in the graph have been reached. The point where these algorithms differ is that while Prim’s algorithm stores a minimum cost edge, Dijkstra’s algorithm stores the total cost from a source node to the current node. Moreover, Dijkstra’s algorithm is used to store the summation of minimum cost edges, while Prim’s algorithm stores at most one minimum cost edge. Review Question Consider the graph given below. Find the minimum spanning tree of this graph using (a) Prim’s algorithm, (b)(b) Kruskal’s algorithm, and (c) Dijkstra’s algorithm.