M269 CHAPTER 21.pdf

CHAPTER 21 GRAPHS 2 This chapter further illustrates the versatility of graphs and how general abstract algorithms on graphs can be used to solve concrete practical problems. Section 17.1 on modelling with graphs mentioned that graphs can represent transport networks. These are usually connected: one can reach every node from any other node. However, that may not be the case after disruptions like road flooding and flight cancellations. In the extreme case, the network breaks down into separate, disconnected subnetworks and travel from one subnetwork to another becomes impossible. Sections 1 and 2 of this chapter are about the problem of finding the nodes that are in their own separate subgraph and therefore cut off from other nodes. Section 17.1 also noted that directed graphs can represent dependency networks, like which spreadsheet cells depend on which other ones. Section 3 of this chapter introduces an algorithm that determines the order in which cells must be evaluated so that no cell is evaluated before any cell it depends on. Section 17.1 included an example of a state transition graph for the Noughts and Crosses game. The nodes represent the states of the board; the edges represent the possible player moves. Section 4 of this chapter shows that problems that look complicated may be solved with a standard graph algorithm if we model those problems with state transition graphs. This chapter supports these learning outcomes: Develop and apply algorithms and data structures to solve computational problems – you will learn that some problems can be solved with general-purpose graph algorithms by transforming the input into a graph. Analyse the complexity of algorithms to support software design choices – you will be asked to determine the complexity of alternative algorithms. You should first read again the summaries of Chapters 17 and 18 to refresh your memory about graph-related concepts and algorithms needed for this chapter. Before starting to work on this chapter, check the M269 news and errata, and check the TMAs for what is assessed. 621 Algorithms, data structures and computability 21.1 Undirected graph components Sometimes we want to know if there’s a path between two given nodes, but the actual path, if it exists, is of little relevance. For example, if bad weather brings power cables down, we want to know for each residential or industrial area if it’s connected to a generator. We don’t need the path from the generator to the area: we only want to know whether it exists, to determine the cut-off areas. The more general problem is to compute, for a given graph, its various separate subgraphs of mutually reachable nodes. The problem is slightly different for undirected and directed graphs, so let’s first look at the former. Consider the following undirected graph. Figure 21.1.1 It has three subgraphs of connected nodes. One subgraph consists of nodes A, B, C and their edges, another of nodes D and E and their edge, and the third has node F by itself. Each of these subgraphs is called a connected component of the whole graph. A component of an undirected graph is a largest possible connected subgraph. For example, edge A–B and its nodes aren’t a component: while they form a connected subgraph, it isn’t as large as possible because we can add node C and edge A–C to get a larger connected subgraph. There are no further edges for nodes A, B and C, so they (and their edges) form a component: there’s no larger connected subgraph they’re part of. In other words, a connected component of an undirected graph is a largest set of nodes that are mutually reachable together with all their edges. For example, if the above graph had edge B–C, the component would be subgraph A–B–C–A: omitting any of the edges between those nodes would lead to a subgraph that isn’t the largest possible. 622 Chapter 21. Graphs 2 Algorithms, data structures and computability 21.1.1 Problem definition and instances The general problem we want to solve is: given an undirected graph and two nodes A and B, is there a path between them? Are they mutually reachable? In the power grid example, we want to know if electricity can flow from a generator A to a residential or industrial estate B and back. Since components are subgraphs of mutually reachable nodes, to answer the question we compute the components and then check if A and B are in the same component. For example, in the graph above, A and E are in different components, so they’re not mutually reachable. We need an ADT that associates each node with the component that it’s in: that’s the map ADT. The easiest way to label the components is to number them from 1 onwards. Here’s the precise problem definition. Function: connected components Inputs: graph, an undirected graph Preconditions: true Output: component, a map of objects to integers Postconditions: component maps the nodes of graph to the integers from 1 to the number of connected components in graph component(a) = component(b) if and only if nodes a and b are mutually reachable We need some problem instances for testing. Here’s the above graph. : %run -i../m269_digraph %run -i../m269_ungraph undirected = UndirectedGraph() for node in 'ABCDEF': undirected.add_node(node) undirected.add_edge('A', 'B') undirected.add_edge('A', 'C') undirected.add_edge('D', 'E') The edges cases are a graph with the fewest possible components and a graph with the most components. Describe two such graphs and how many components they have. A null graph (every node is isolated) has the most components: one per node. A connected graph has the fewest components: only one. We already have functions to generate null and connected graphs. We’ll use them later to create problem instances for testing. 21.1. Undirected graph components 623 Algorithms, data structures and computability 21.1.2 Algorithm and complexity The key idea to compute the components is that traversing an undirected graph from node A visits all nodes reachable from A, and therefore visits all nodes in the same component as A. To compute all components, we repeatedly traverse the graph from each node that hasn’t been assigned to a component yet. Each traversal adds the nodes it visits to a new component. Here’s an outline of the algorithm: Create an empty map. Initialise a component counter with 1. Go through each node in the graph. If the node is in the map, we already know its component, so do nothing. Otherwise, do any graph traversal from that node. Add all nodes returned by the traversal to the map, associated to the current component counter. Then increment the counter. After going through all nodes, return the map. The complexity can be analysed as follows. Remember that n and e refer to the number of nodes and edges in a graph. Checking if a node is in the map takes constant time if the map is implemented with a hash table. So checking all n nodes takes Θ(n). The counter increments take Θ(1) in the best case and Θ(n) in the worst case, because that’s the least and most number of components, as seen earlier. Each traversal only visits part of the graph, but together the traversals visit every node and edge once. They’re equivalent to a single traversal of the whole graph, which has complexity Θ(n + e) (Section 17.7.2). Considering only the fastest-growing term, we can say that the complexity of the algorithm is Θ(n + e). This is both the best- and worst-case complexity, because the whole graph is traversed to find all components. 21.1.3 Code and tests In translating the algorithm outline to Python we must remember that our traversal functions return the tree of all paths from the start node. We must add each tree node to the map. Any traversal will work. The code below uses depth-first search (DFS). You can replace it with breadth-first search (BFS) and confirm you get the same components. : # this code is also in m269_ungraph.py def connected_components(graph: UndirectedGraph) -> dict: """Return the connected components of graph. Postconditions: the output maps each node to its component, numbered from 1 onwards. """ component = dict() counter = 1 (continues on next page) 624 Chapter 21. Graphs 2 Algorithms, data structures and computability (continued from previous page) for node in graph.nodes(): if node not in component: tree = dfs(graph, node) for reached in tree.nodes(): component[reached] = counter counter = counter + 1 return component Let’s test the code with the example graph. : connected_components(undirected) : {'F': 1, 'E': 2, 'D': 2, 'B': 3, 'A': 3, 'C': 3} As expected, it finds the three components: nodes A, B and C, nodes D and E, and node F by itself. Let’s test with the edge cases. Remember that our generated graphs have nodes 0, 1, 2,... , n – 1. : %run -i../m269_graphs # most components: all nodes are isolated connected_components(null_graph(5)) : {0: 1, 1: 2, 2: 3, 3: 4, 4: 5} : # fewest components: a connected graph; could be cycle or complete␣ ˓→graph connected_components(path_graph(5)) : {0: 1, 1: 1, 2: 1, 3: 1, 4: 1} As expected, every node of the null graph is in a separate component and all nodes of a connected graph are in the same component. Exercise 21.1.1 This exercise asks you to apply your knowledge of how connected components are computed to create a bespoke algorithm for a particular problem. Consider again the power grid example. Implement the next function, which returns the set of nodes not connected to any power source node. : def disconnected(graph: UndirectedGraph, sources: set) -> set: """Return all nodes not connected to any of the sources. Preconditions: sources is a non-empty subset of the graph's nodes """ (continues on next page) 21.1. Undirected graph components 625 Algorithms, data structures and computability (continued from previous page) pass disconnected(undirected, {'A'}) # you should obtain {'D', 'E', 'F'} Hint Answer Exercise 21.1.2 The following is an exercise in modelling a situation. There’s no right or wrong answer. A government agency wants to know which train stations are critical: that is, which stations would cause the most disruption if, due to accident or incident, they had to be closed and no trains could start, terminate or pass through those stations. Given a connected undirected graph representing the train network, how would you define the critical nodes, based on the notion of components? Hint Answer 21.2 Directed graph components Contrary to undirected graphs, digraphs have two kinds of components. Consider the following example. Figure 21.2.1 If we ignore the edge directions then we get this undirected graph, with three connected components: Figure 21.2.2 626 Chapter 21. Graphs 2 Algorithms, data structures and computability The weakly connected components of a digraph G are the connected components of the undirected version of G. So, the above digraph has three weakly connected components: nodes A, B, C and their three directed edges; nodes D and E and their edge; node F. Nodes of weakly connected components are mutually reachable if we ignore the edge directions. By contrast, a strongly connected component is a largest subgraph where all nodes are mutually reachable when considering the edge directions. Nodes A, B and C are mutually reachable because their edges form a cycle. Each other node forms a strongly connected component by itself, because the other nodes can only reach themselves, via paths of length zero. To sum up, the above digraph has three weakly and four strongly connected components. Just to check your understanding, how many weakly and strongly connected components does digraph A−→B←−C have? It has one weakly connected component (the whole graph) and three strongly connected components (each node by itself). 21.2.1 Problem and instances Before we turn to the problem of computing components in digraphs, here’s the digraph in Figure 21.2.1, for testing. : %run -i../m269_digraph digraph = DiGraph() for node in 'ABCDEF': digraph.add_node(node) for edge in ('AB', 'BC', 'CA', 'DE'): digraph.add_edge(edge, edge) I will compute the strongly connected components and leave the weakly connected components to you. 21.2. Directed graph components 627 Algorithms, data structures and computability Exercise 21.2.1 Complete the following function. : def weakly_connected_components(graph: DiGraph) -> dict: """Return the weakly connected components of graph. Postconditions: the output maps each node to its component, numbered from 1 onwards. """ pass weakly_connected_components(digraph) You should obtain three components: A, B and C; D and E; F. Hint Answer 21.2.2 Algorithm and complexity The algorithm for the strongly connected components is similar to the algorithm for connected components in undirected graphs: for each node A that isn’t yet in the map, we find the nodes that are in the same component as A and add them to the map. Node B is in the same component as A if there’s a path from A to B and a path from B to A. If we compute the set of all nodes that have paths from A and the set of all nodes that have paths to A, then the intersection of both sets is the set of nodes in the component of A, because those nodes have a path from and to A. Traversal algorithms follow the directions of the edges and hence compute the paths from a given node A. One way to compute all paths to A is to reverse the direction of all edges to obtain the reverse graph, and then compute the paths from A in the reverse graph. As mentioned in Section 17.1, if a digraph represents the ‘follows’ relationship on Twitter, then its reverse graph represents the ‘is followed by’ relation. Here is the reverse graph of our example. Figure 21.2.3 628 Chapter 21. Graphs 2 Algorithms, data structures and computability Info: The reverse graph is also called the transpose graph because it can be obtained by transposing the adjacency matrix. If we traverse the original graph from A, we obtain the tree A−→B−→C and thereby the nodes that can be reached from A. If we traverse the reverse graph from A, we obtain the tree A−→C−→B, and thereby the nodes that can reach A in the original graph. Nodes A, B and C are in both trees, so they all can reach A and be reached from A, which means they form a strongly connected component. As a further example, if we traverse the original graph from D, we obtain tree D−→E. If we traverse the reverse graph from D, we obtain tree D (a single node). Both trees have only D in common, so it forms a strongly connected component by itself. To sum up: for each node V that isn’t yet in the map, we find which nodes are reachable from V (using the input graph) and from which nodes we can reach V (using the reverse graph). The nodes in both sets are by definition in the same strongly connected component as V. Here’s the algorithm, using again a depth-first traversal, but any kind of traversal will do. 1. let reversed be the reverse of graph 2. let component be an empty map 3. let current be 1 4. for each node in graph: 1. if node not in component: 1. let forward be the nodes of DFS(node, graph) 2. let backward be the nodes of DFS(node, reversed) 3. for each common in forward intersected with backward: 1. let component(common) be current 21.2. Directed graph components 629 Algorithms, data structures and computability 4. let current be current + 1 Step 1 always has complexity Θ(n + e). In the worst case: steps 4.1.1 to 4.1.4 are executed n times steps 4.1.1 and 4.1.2 visit together the whole graph in Θ(n + e) step 4.1.3 takes Θ(n) to compute the intersection of two node sets. The total complexity is Θ(n + e) + n × (Θ(n + e) + Θ(n)) = n × Θ(n + e) = Θ(n 2 + ne) by considering only the fastest growing part of the expression. In the best case, the loop executes only once, adding all nodes to the map. (This means the best-case scenario is when the graph has only one strongly connected component.) Replacing the worst-case n iterations with a best-case single iteration in the previous formula, we get the best-case complexity: Θ(n + e) + 1 × (Θ(n + e) + Θ(n)) = Θ(n + e). 21.2.3 Code and tests Here’s the code for reversing a digraph and computing its strongly connected components. : # this code is also in m269_digraph.py def reverse(graph: DiGraph) -> DiGraph: """Return the same graph but with edge directions reversed.""" reversed = DiGraph() for node in graph.nodes(): reversed.add_node(node) for edge in graph.edges(): reversed.add_edge(edge, edge) return reversed def strongly_connected_components(graph: DiGraph) -> dict: """Return the strongly connected components of graph. Postconditions: the output maps each node to its component, numbered from 1 onwards. """ reversed = reverse(graph) component = dict() counter = 1 for node in graph.nodes(): if node not in component: forward = dfs(graph, node).nodes() (continues on next page) 630 Chapter 21. Graphs 2 Algorithms, data structures and computability (continued from previous page) backward = dfs(reversed, node).nodes() for common in forward.intersection(backward): component[common] = counter counter = counter + 1 return component Let’s test the code with the example digraph. : strongly_connected_components(digraph) : {'D': 1, 'E': 2, 'F': 3, 'C': 4, 'A': 4, 'B': 4} As expected, nodes A, B, C are in one component and each other node is in its own component. 21.3 Topological sort I mentioned in Section 17.1 that digraphs can represent scheduling constraints: an edge A −→ B states that task or event A must occur before task or event B. The example was the evaluation of formulas in spreadsheet cells. Here again is the cell dependency digraph. If cell A2 changes, the spreadsheet must first re-evaluate B2 and only then C2 because C2 depends on A2 and B2. Figure 21.3.1 21.3.1 Problem Given a digraph, we want to know a possible schedule that indicates in which order to do the tasks or carry out the events represented by the nodes. Such a schedule is called a topological sort of the digraph: it’s a sequence of the graph’s nodes so that for every edge A−→B, node A appears before node B in the sequence. If you can lay out a digraph so that every edge points from left to right, then a topological sort is obtained by reading the nodes from left to right. The spreadsheet graph above is laid out from left to right and so has topological sort (A2, B2, C2). That’s the only possible topological sort. For example, sequence (A2, C2, B2) isn’t a topological sort because C2 comes before B2, contrary to the order imposed by edge B2−→C2. 21.3. Topological sort 631 Algorithms, data structures and computability Some digraphs have multiple topological sorts. For example, Figure 21.3.2 has topological sorts (A, B, C, D), (A, C, B, D), (C, A, B, D), etc. Only the node permutations where B appears before A or D appears before C, like (B, A, C, D) and (D, A, B, C), aren’t topological sorts. A cyclic digraph, like the following one, has no topological sort. Figure 21.3.3 Any ordering of the nodes will go against one of the edges. For example, (A, B, C) isn’t a topological sort because C must appear before A due to edge C−→A. No matter which node we choose to start the sequence, some other node must appear before it, so no topological sort is possible. There’s also a visual explanation. A graph with a cycle can’t be laid out with all edges pointing left to right, because there’s always a right-to-left edge to close the cycle. In summary, every acyclic digraph has at least one topological sort. We want to compute one of them, it doesn’t matter which. Function: topological sort Inputs: graph, a digraph Preconditions: graph is acyclic 632 Chapter 21. Graphs 2 Algorithms, data structures and computability Output: schedule, a sequence of objects Postconditions: schedule is a permutation of the graph’s nodes for every edge A−→B in graph, A appears before B in schedule Let’s construct the spreadsheet graph for testing. : %run -i../m269_digraph spreadsheet = DiGraph() for node in ('A2', 'B2', 'C2'): spreadsheet.add_node(node) spreadsheet.add_edge('A2', 'B2') spreadsheet.add_edge('A2', 'C2') spreadsheet.add_edge('B2', 'C2') 21.3.2 Algorithm and code The key idea to obtain a topological sort is that the first node we visit (the first task we schedule) must not have incoming edges, otherwise it would have to come after some other node. Let’s call the first visited node V. V has no incoming edges but it may have an outgoing edge V−→B. Since V was visited first, it will come before all other nodes. The ordering imposed by edge V−→B is therefore satisfied. If we remove all outgoing edges from V, to ‘discharge’ those order constraints, nodes that only depend on V will have no incoming edge anymore and can be visited next. The algorithm thus proceeds by visiting and removing nodes with in-degree zero: those nodes depend on no other node and can be scheduled next. Here’s the outline of what’s known as Kahn’s algorithm: Create an empty sequence. While there’s a node with in-degree zero, remove it from the graph and append it to the sequence. When the while-loop ends, return the sequence. It’s a greedy algorithm: at each step it chooses one of the ‘best’ remaining nodes – those without incoming edges. Let’s see the algorithm in action on the example graph. The next figure shows from left to right how each iteration removes one node and appends it to the sequence. We start with a digraph and the empty sequence. We finish with the empty graph and a topological sort. The numbers next to the nodes are their in-degrees. Figure 21.3.4 21.3. Topological sort 633 Algorithms, data structures and computability This version of the algorithm is not very good as it destroys the input graph. We can instead simulate the removal of nodes and how the in-degrees change. We store the initial in-degree of each node and simulate the removal of an edge A−→B by decrementing the in-degree value of B. Here’s the new version of the algorithm applied to the example graph. This version only changes the in-degree values associated to the nodes: it doesn’t modify the graph. For example, the removal of A2 and its edges is simulated by decrementing the in-degrees of B2 and C2. Figure 21.3.5 Kahn’s algorithm can be reformulated as follows: Create an empty sequence. Compute and store the in-degree of each node. Put all zero-degree nodes in a collection of nodes to visit. While the collection isn’t empty, remove one node from it, append the node to the sequence and decrement the in-degree of the node’s out-neighbours. If an out-neighbour’s degree becomes zero, add that out-neighbour to the nodes to visit. When the while-loop ends, return the sequence. The collection of nodes to visit can be a set, queue or stack. The latter is the most efficient in terms of memory and run-time. : # this code is also in m269_digraph.py def topological_sort(graph: DiGraph) -> list: """Return a topological sort of graph. Preconditions: graph is acyclic Postconditions: - the output is a permutation of the graph's nodes - for every edge A -> B, node A appears before B in the output """ schedule = [] # compute the initial in-degrees indegree = dict() for node in graph.nodes(): indegree[node] = 0 (continues on next page) 634 Chapter 21. Graphs 2 Algorithms, data structures and computability (continued from previous page) for edge in graph.edges(): indegree[edge] = indegree[edge] + 1 # compute the nodes that can be visited first to_visit = [] # use a list as a stack for node in graph.nodes(): if indegree[node] == 0: to_visit.append(node) while len(to_visit) > 0: visited = to_visit.pop() schedule.append(visited) # simulate the removal of the visited node for neighbour in graph.neighbours(visited): indegree[neighbour] = indegree[neighbour] - 1 if indegree[neighbour] == 0: to_visit.append(neighbour) return schedule : topological_sort(spreadsheet) : ['A2', 'B2', 'C2'] 21.3.3 Complexity The algorithm always goes through the whole graph and adds all nodes to the output sequence, so there’s no best- or worst-case scenario. The complexity can be broken down as follows: Go through the graph to construct the in-degree map: Θ(n + e). Do a linear search for the nodes with in-degree zero: Θ(n). Add each node to the to_visit set, remove it from the set and append it to the sequence: Θ(n). For each node, go through its out-neighbours: Θ(e), because visiting all neighbours of all nodes goes through all edges. The complexity is given by the fastest-growing term: Θ(n + e). 21.3. Topological sort 635 Algorithms, data structures and computability 21.3.4 Exercises The following exercises show a different application of Kahn’s algorithm and ask you to consider the efficiency of alternative algorithms. Exercise 21.3.1 Alice remembers that the DiGraph class has a method to compute the in-degree. She simplifies the topological_sort code as follows. (The rest of the function remains the same.) indegree = dict() to_visit = set() for node in graph.nodes(): indegree[node] = graph.in_degree(node) if indegree[node] == 0: to_visit.add(node) Is this more efficient than the original, unmodified code? Hint Answer Exercise 21.3.2 What happens if we ignore the preconditions of topological_sort and provide as input a cyclic digraph? Does the function stop with an error? Does it enter an infinite loop? If neither of those cases happen, what is the output? Hint Answer Exercise 21.3.3 1. Based on the previous exercise, write and test a function that checks if a digraph is cyclic. : %run -i../m269_util def is_cyclic(graph: DiGraph) -> bool: """Return True if and only if the graph has a cycle.""" pass digraph = DiGraph() # from Section 21.2.1 for node in 'ABCDEF': digraph.add_node(node) for edge in ('AB', 'BC', 'CA', 'DE'): # cycle A -> B -> C -> A digraph.add_edge(edge, edge) (continues on next page) 636 Chapter 21. Graphs 2 Algorithms, data structures and computability (continued from previous page) is_cyclic_tests = [ # case, graph, is cyclic? ('has cycle', digraph, True), ('no cycle', spreadsheet, False) ] test(is_cyclic, is_cyclic_tests) 2. What’s the worst-case scenario for your algorithm to decide whether a digraph is cyclic? What’s the worst-case complexity? Hint Answer Exercise 21.3.4 Looking at the tests above, Bob realises a digraph is cyclic if and only if it has a strongly connected component with two or more nodes. He adapts the algorithm for computing the strongly connected components as follows: When computing the intersection of the forward and backward sets of nodes, check if the intersection’s size is larger than 1. If so, immediately return true: the digraph is cyclic. Otherwise continue the algorithm as normal. Return false after the loop goes through all nodes: the digraph is acyclic because each component has one node only. 1. Explain why a cyclic digraph has a strongly connected component with more than one node. 2. What’s the worst-case complexity of Bob’s algorithm? Is it worth using his algorithm instead of Kahn’s to check for cycles? Hint Answer 21.4 State graphs As I mentioned in Section 17.1, directed graphs can represent states and transitions between states. The example I gave was the states of the board during a game of Noughts and Crosses (also known as Tic-tac-toe). The transitions between states are the player moves. This section shows an example of a problem that can be easily solved if we model it with a state transition graph. 21.4. State graphs 637 Algorithms, data structures and computability 21.4.1 Problem The problem to solve is similar to the one about a knight moving on a chessboard, but this time using a rook – a chess piece that moves only horizontally or vertically. We’re given a rectangular board of squares, a start square and an end square. We want to find the fewest moves for the rook to go from the start to the end square, with the proviso that moves have to successively be 1, 2, 3, 1, 2, 3, 1,... squares long until the end square is reached. This means that the first move goes to an adjacent square, the second move jumps over one square, the third move jumps over two, the fourth move goes again to an adjacent square, etc. If there’s a path (a sequence of moves) from the start to the end that stays within the board, then output the length of the path (the number of moves), otherwise output –1. The next figure shows a 4×4 board, with a rook symbol on the start square and an E marking the end square. For this input, the output should be four. A path of length four goes first one square right, then two squares right, then three squares down and finally one square left. Figure 21.4.1 A path that first goes two squares to the right and then three down is shorter, but it’s not a valid path because it doesn’t start with a move of one square. To check your understanding of the movement, find another shortest valid path from the start to the end, also in four moves. Move one square down, then two squares down, then three to the right and finally one to the left. As usual, I first construct some test cases. The inputs and output are exactly as for the knight moves problem: three input pairs of integers indicating the size of the board and the 638 Chapter 21. Graphs 2 Algorithms, data structures and computability coordinates of the start and end squares, and one output integer indicating the shortest path length. : rook_moves_tests = [ # case, size, start, end, moves ('1x1 board', (1, 1), (0, 0), (0, 0), 0), ('1 row, 2 cols', (1, 2), (0, 0), (0, 1), 1), ('start = end', (3, 3), (1, 1), (1, 1), 0), ('figure example', (4, 4), (0, 0), (3, 2), 4), ('2 away', (4, 4), (0, 0), (0, 2), -1) ] Info: This is a simplification of problem Eternal Truths from the 2004 Portuguese University Programming Contest. 21.4.2 Graph The problem asks for the fewest moves. As for the knight moving on a chessboard, this seems to be a shortest path problem on an undirected graph, with one node per square. However, the distance of the move changes in each step, so the neighbours of each square are constantly changing. It seems we need three graphs instead of one. Here they are for the example board above. In the left-hand graph, the edges connect the adjacent squares. In the middle graph, the edges connect nodes that are two squares away. In the right-hand graph, the edges connect nodes that are three squares away. Figure 21.4.2 The algorithm would be a modified breadth-first search that is constantly switching between graphs, because the first, fourth, seventh,... moves are done on the left-hand graph, the second, fifth, eighth,... moves are done on the middle graph, the third, sixth, ninth,... moves are done on the right-hand graph. This sounds too complicated and error-prone to me. We need an approach that uses a single, unchanging graph, because all our graph algorithms work on such graphs, not on graphs where the edges are changing as the algorithm progresses. 21.4. State graphs 639 Algorithms, data structures and computability The solution is to define a graph that represents the possible states of the rook. As breadth-first search explores the paths from the start node, it needs to know which square the rook is on and what move it can do next. The state of the rook is its current position and the distance of the next move. For each square S we need three nodes (S, 1), (S, 2) and (S, 3) that represent the three possible states for when the rook is in that square: it can next move by one, two or three squares. The graph has one edge (A, 1) −→ (B, 2) for each square B that is adjacent to square A. The edges state that the rook can move from any square to any adjacent square if the next move is by one square. Once it does the move, the rook is in a state where it next moves by two squares. Likewise there are edges (A, 2) −→ (B, 3) for each position B that is two squares away from A (A, 3) −→ (B, 1) for each position B that is three squares away from A. Here’s the state transition graph for a 1×4 board: a single row of four squares. Figure 21.4.3 The layout of the edges shows when the rook can move left or right. The first move, from (A, 1) to (B, 2), goes to the square left or right of A, when possible. The second move, from (A, 2) to (B, 3), goes left or right by two squares when possible. Finally, the third move, from (A, 3) to (B, 1), is only possible from the left-most to the right-most square and vice versa. Having constructed this graph, we apply BFS to find the shortest path from (start, 1) to (end, move) where move can be any value. Once we reach the end square, we don’t really care what the next move should be. 640 Chapter 21. Graphs 2 Algorithms, data structures and computability 21.4.3 Code First, I construct the state transition graph. : %run -i../m269_digraph def state_transitions(size: tuple) -> DiGraph: """Return the state transition graph for a board of the given size. Preconditions: size is a pair of positive integers, the number of␣ ˓→rows and columns """ rows = size columns = size states = DiGraph() # add nodes (S, 1), (S, 2), (S, 3) for every square S for row in range(rows): for column in range(columns): for move in (1, 2, 3): states.add_node( ((row, column), move) ) # add edges for state in states.nodes(): position = state distance = state row = position column = position next_distance = distance % 3 + 1 # 1 -> 2 -> 3 -> 1 ->... # generate the 4 possible moves: up, left, down, right for move in (-distance, distance): # do vertical move if it stays within board if 0 0: to_visit = unprocessed.popleft() current = to_visit length = to_visit if current == end: # change 3 return length elif current not in visited: visited.add(current) for neighbour in graph.out_neighbours(current): unprocessed.append( (neighbour, length+1) ) # change 4 return -1 The changes were as follows, besides the trivial modifications to the header and docstring. 1. Replace the code that creates the graph for the knight’s moves with a call to state_transitions. 2. The initial node is (start, 1) instead of start because nodes now represent states, not squares. 3. For the same reason, extract the square from the current node before comparing it to the end square. 4. Further simplify the BFS algorithm, which I could have already done for the knight moves problem. Instead of adding to the queue edge (A, B) with the length of the path to B, I just add B and the length, because the shortest path is not asked for, only its length. Finally, let’s run the code on the test table created at the start. : %run -i../m269_util test(rook_moves, rook_moves_tests) Tests finished. The moral of this and similar problems is: Note: Instead of inventing a new graph algorithm, model the problem with a graph that allows you to apply or adapt a standard graph algorithm. 21.4. State graphs 643 Algorithms, data structures and computability Nodes can represent anything, including places, tasks, events, states. Edges may be weighted and directed. This gives graphs a great modelling power. Once we represent the input as a graph, we can often solve the problem with a standard graph algorithm or some small adaptation of it, as done above, because many graph problems fall into one of a small number of categories: find a shortest path, a minimum spanning tree, a topological sort or the graph’s components. 21.4.4 Complexity The complexity of this kind of approach (transform the input into a graph and apply a graph algorithm) is Θ(n + e) to construct the graph plus whatever the complexity of the graph algorithm is. For this problem, the algorithm used is BFS so the overall complexity is Θ(n + e) + Θ(n + e) = Θ(n + e). However, the complexity must be stated in terms of the input, not of the constructed graph. We must determine the number of nodes and edges in terms of the input variables, and restate the complexity in those terms. Exercise 21.4.1 The main input variable is the size of the board: the number of rows r and the number of columns c. 1. How many nodes does the state transition graph have, in terms of r and c? In other words, give an expression for n, using r and c. 2. How many edges does the state transition graph have at most? State e as an expression in terms of r and c. 3. Using the previous expressions, give the complexity Θ(n + e) in terms of r and c. Hint Answer 21.5 Practice This section is for you to practice solving problems by converting the input to a graph and then applying a graph algorithm. I present some guiding questions to help you figure out what kind of graph and what algorithm are needed to solve such problems. I first recap the solution for the rook’s moves problem of the previous section, but now following the guiding questions. Then I present a new problem for you to solve, following the same questions. 644 Chapter 21. Graphs 2 Algorithms, data structures and computability 21.5.1 Rook’s moves The problem asked for the fewest vertical and horizontal rook moves from a start square to an end square, but the distance of each move varies. The output must be –1 if the end square can’t be reached. To obtain a solution, I had to find answers to the following questions. What graph problem is this problem most similar to? The graph problems seen so far are: finding a shortest path, a shortest tour, a minimum spanning tree, all components, a topological sort and determining if a digraph is cyclic. Finding the least number of moves suggests a shortest path problem. What algorithm and graph are needed to solve that graph problem? The shortest path problem can be solved with a breadth-first search if the graph isn’t weighted and with Dijkstra’s algorithm if it is, provided no weight is negative. Both algorithms work on directed and undirected graphs, so for this problem thinking about what algorithm is needed has unfortunately not helped restricting the kind of graph. What graph can be constructed from the input? In particular: – What do the nodes represent? Are they places, states, tasks, events,... ? – What do the edges represent? When is there an edge between nodes A and B? – Are the edges directed? What do the directions represent? – Are the edges weighted? Do weights represent distance, time, cost,... ? The problem is about something (a rook) moving in a space (a board). This suggests nodes represent places (squares of the board) and edges connect nodes that can be reached from each other. However, each move depends on past moves, e.g. the rook can move to an adjacent square if it’s the first move or the previous move was three squares. To know which move the rook can make at any point, we must know its state: the square it’s on and how far it can move next. We therefore need a directed state transition graph. The answers to the above questions are as follows. The graph is directed but not weighted. The nodes represent the rook’s possible states and the edges the state transitions. The nodes are pairs (s, d) for each square s and distance d = 1, 2, 3. There’s an edge (A, dA) −→ (B, dB) for each pair of squares A and B that are dA squares away from each other. If dA = 3, then dB = 1, otherwise dB = dA + 1. Which standard graph algorithm can be used to produce the output? Which modifications, if any, are needed? The graph algorithms seen so far are: BFS, DFS, Dijkstra’s, Prim’s and Kahn’s algorithms. The graph isn’t weighted, so we can use BFS to find the shortest path from the start square to the end square. The rook begins in state (start, 1) but it can finish in one of three states: (end, 1), (end, 2), (end, 3). We must modify the BFS algorithm to stop at any of those nodes. We must also modify it so that instead of returning the tree traversed from the start node it returns the length of the path to the target node first reached (or –1 if there’s no path). 21.5. Practice 645 Algorithms, data structures and computability Now that you’ve seen how this set of questions led me to the solution of the rook’s moves problem, here’s a problem for you to solve, following the same questions. 21.5.2 Islands This problem is about counting islands on a satellite image. The image was divided into small squares with the same area, say 10×10 metres. Each square is classified as water if most of its pixels are blue, otherwise as land. The result can be stored as a grid of Booleans but we’ll use a more visual representation: letter L for land and space for water. The problem’s input is a list of strings of the same length, like this: : image = [ 'LLL L ', 'L L L', 'L LL L', ' LLL L' ] Two land squares belong to the same island if they’re vertically or horizontally adjacent. The example has three islands: 1. a big island in the west, with 11 land squares surrounding a lake 2. a 3-square north–south island in the east 3. a single-square island in the north-east. Given such a list of strings, we want to know how many islands there are. Info: This is LeetCode problem 200 with a different input format. Exercise 21.5.1 What graph problem is this problem most similar to? Hint Answer Exercise 21.5.2 What algorithm and graph are needed to solve that graph problem? Hint Answer 646 Chapter 21. Graphs 2 Algorithms, data structures and computability Exercise 21.5.3 What graph can be constructed from the input? In particular: What do the nodes represent? Are they places, states, tasks, events,... ? What do the edges represent? When is there an edge between nodes A and B? Are the edges directed? What do the directions represent? Are the edges weighted? Do weights represent distance, time, cost,... ? Hint Answer Exercise 21.5.4 Which standard graph algorithm can be used to produce the output? Which modifications, if any, are needed? Hint Answer Exercise 21.5.5 (optional) Implement the outlined solution, i.e. write code that takes a list of equal-length strings of Ls and spaces, and returns the number of islands. The code first constructs the graph you defined in the third exercise and then applies the algorithm you defined in the previous exercise. 21.6 Summary In this chapter you’ve seen further practical problems on graphs: Compute the components of a graph to find the disconnected parts of a network. Compute a schedule that is compatible with the precedence given by the edges. Check if a digraph has a cycle. You’ve also seen that finding a suitable graph representation can make problems easier to solve than designing bespoke algorithms. The chapter introduced the following concepts and algorithms. A connected component of an undirected graph is a largest possible subgraph of mutually reachable nodes. These components can be found with repeated traversals of any kind. Each traversal from a node A finds all other nodes in the same component as A. The complexity is linear in the size of the graph: Θ(n + e). A weakly connected component of a digraph is a largest possible subgraph of mutually reachable nodes, if we ignore the direction of edges. These components can be found by computing the connected components of the undirected version of the digraph. The complexity is also Θ(n + e). 21.6. Summary 647 Algorithms, data structures and computability A strongly connected component of a digraph is a largest possible subgraph of mutually reachable nodes. These components can be found with repeated traversals of the original graph and its reverse. The worst-case complexity is Θ(n×(n+e)). The reverse graph of a digraph has the same nodes and edges but with the directions reversed. The reverse graph represents the inverse relation of the original graph, e.g. ‘A follows B’ becomes ‘B is followed by A’. The reverse graph is computed in Θ(n + e) time. A topological sort of a directed acyclic graph (DAG) is a permutation of its nodes so that for every edge A−→B, node A comes before node B in the permutation. A DAG has one or more topological sorts. Cyclic digraphs have no topological sort. Kahn’s algorithm computes a topological sort in a greedy fashion. In each iteration it (virtually) removes one node with in-degree zero from the digraph and appends it to an initially empty sequence. The complexity is Θ(n + e). If the digraph has a cycle, Kahn’s algorithm returns a sequence without all the graph’s nodes, thus making it easy to detect cycles in digraphs. 648 Chapter 21. Graphs 2

M269 CHAPTER 21.pdf

Document Details

Tags

Related

Full Transcript

Upgrade to continue