Design and Analysis of Algorithms
Prof. Madhavan Mukund
Computer Science and Engineering, IIT Madras

INDEX

Week 1
1. Course Outline
2. Example: Air Travel
3. Example: Xerox shop
4. Example: Document similarity
5. Introduction and motivation
6. Input size, worst case, average case
7. Quantifying efficiency: O( ), Omega( ), Theta( )
8. Examples: Analysis of iterative and recursive algorithms

Week 2
9. Arrays and lists
10. Searching in an array
11. Selection Sort
12. Insertion sort
13. Merge sort
14. Merge sort - analysis
15. Quicksort
16. Quicksort - analysis
17. Sorting - Concluding remarks

Week 3
18. Introduction to graphs
19. Representing graphs
20. Breadth first search (BFS)
21. Depth first search (DFS)
22. Applications of BFS and DFS
23. Directed acyclic graphs: topological sort
24. Directed acyclic graphs: longest paths

Week 4
25. Single source shortest paths: Dijkstra's algorithm
26. Dijkstra's algorithm: analysis
27. Negative edge weights: Bellman-Ford algorithm
28. All pairs shortest paths
29. Minimum Cost Spanning Trees
30. Prim's Algorithm
31. Kruskal's algorithm

Week 5
32. Union-Find using arrays
33. Union-Find using pointers
34. Priority queues
35. Heaps
36. Heaps: Updating values, sorting
37. Counting inversions
38. Closest pair of points

Week 6
39. Binary Search Trees
40. Balanced search trees
41. Interval scheduling
42. Scheduling with deadlines: minimizing lateness
43. Huffman codes

Week 7
44. Introduction to dynamic programming
45. Memoization
46. Grid Paths
47. Common subwords and subsequences
48. Edit distance
49. Matrix multiplication

Week 8
50. Linear Programming
51. LP modeling: Production Planning
52. LP modeling: Bandwidth allocation
53. Network Flows
54. Reductions
55. Checking Algorithms
56. P and NP

Design and Analysis of Algorithms
Prof. Madhavan Mukund
Chennai Mathematical Institute
Week - 01 Module - 01 Lecture - 01

(Refer Slide Time: 00:06)

So, welcome to the NPTEL MOOC on the design and analysis of algorithms. Here are some of the things we will be looking at in this course. When we study algorithms, the first thing we need to convince ourselves of is that the algorithm is correct, that it is doing the job we expect. So, we will look at strategies for proving the correctness of algorithms. The other important aspect of an algorithm is, of course, its efficiency: how much time does the algorithm take on its inputs? Here we have to factor in the size of the input, and we need a notation, or a way, of comparing two different algorithms which operate on the same type of inputs and produce the same type of outputs. This will be achieved through the concept of asymptotic complexity, which measures the running time of an algorithm as a function of the input, as the inputs grow larger and larger. We will develop some notation, typically the big O notation, in order to smooth out some ((Refer Time: 01:00)) of algorithms and group them into large chunks which are equivalent.

An important part of problem solving in any domain, and in algorithms in particular, is the art of modelling the problem at a suitable level of detail. For most algorithms that we will see, we need to find a suitable mathematical model; one of these models will be graphs. We also need a way of representing the concepts of these models in our algorithms, and for this we need appropriate data structures.
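As a minimal illustrative sketch of what such a data structure might look like (in Python, with made-up node names; the lecture does not fix a language or a representation here), a small directed graph can be stored as an adjacency list, a mapping from each node to the nodes it has an edge to:

```python
# Illustrative sketch only: an adjacency list for a small directed graph.
# Each key is a node; its value is the list of nodes it has an edge to.

graph = {
    "a": ["b", "c"],   # node "a" has edges to "b" and "c"
    "b": ["c"],
    "c": ["a"],
    "d": [],           # "d" has no outgoing edges
}

# Basic questions about the model become simple lookups, for example
# the immediate neighbours of a node:
print(graph["a"])   # ['b', 'c']
```

With a representation like this in hand, questions about the model (which nodes are adjacent, which nodes can be reached, and so on) become operations on the data structure rather than on a picture.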
And of course typically in order to solve a problem we need to break it down into manageable sub problems. So, we will look at strategies to decompose problems into smaller problems and see how to put them together to solve the problem it has. Over the course of time, many generic techniques have been developed to solve the large number of problems. And we will see examples of how these techniques can be applied to very standard problems that we come across repeatedly. Among the techniques are divide and conquer. Where, we break up the problem into individual components which do not overlap with each other and then combine these solutions in order to get the solution for the overall problems. In some cases, we can identify a strategy which looks at the local state of the problem and chooses an optimal path and arrives at the final solution without having to look at all possibilities. These kind of greedy algorithms are there. It is important to know how to prove such an algorithms correct. But if a greedy algorithms does exists, it is typically much more efficient than other types of algorithms. When greedy does not work, we need a systematic way of exploring all the possibilities and choosing the best one. In this process, sometimes we have to discard overlapping problems and make sure that we do not waste fully recomputed things. So, this is covered by the concept of dynamic programming. 5 (Refer Slide Time: 03:06) Now along with the theory, in this course we will have some programming assignments. So, we do expect that all of you have some background in programming. This could be in C or C plus plus or java. We are flexible about the language we use, but you should be able to standard programs and implement some of the basic algorithms that we are using in our course. In order to do this, we will of course cover some new data structures in this course. But we do expect that in the language that you use, you are familiar with basic concepts like arrays and lists and also things like stacks and queues, which build up on these. Of course, when we come to stacks and queues we will try to give some detail about how they are used. But we do expect that you have seen them before. So, this not the first time you are seeing these concepts. 6 (Refer Slide Time: 03:54) Here is a kind of approximate list of the topics we expect to cover in the course. So, after looking at a few examples we will start with asymptotic complexity, which is the way of measuring the efficiency of algorithms and writing down this measure in a way that we can compare easily across algorithms. We will then move to the most basic problem that we have for arrays, which is to search an array for an element. And in the process we will realize that it is important to be able to sort the array efficiently in order to arrange the elements in a way where we can search in an effective manner. So, we will look at searching and sorting. So, we will look at binary search and we will look at different notions of sorting. Some of which are more elementary and intuitive like insertion sort and selection sort. But I am not the best in terms the efficiency. And then we will look at more efficient algorithms like merge sort, quick sort; which are not obvious to begin with. Moving on from searching and sorting, we come to graphs and graph algorithms. So, we will introduce graphs. We will see how we can use them to model certain types of problems. We need to know how to represent a graph. How do you? 
Graph is essentially a picture, so how do you translate this picture into something that an algorithm can manipulate? So, that is representation. We will look at standard problems in graphs; reachability and connectedness. We will look at a special class of graphs called directed acyclic graphs, which are very useful for modelling certain kinds of problems. And then 7 we will look at other canonical problems on graphs including shortest paths and spanning trees. (Refer Slide Time: 05:31) As we mentioned before, there are some basic algorithmic design techniques; which are applied across a class of problems. So, we will look in particular divide and conquer, greedy algorithms and dynamic programming. Among the data structures that we will encounter in this course are priority queues, which are often implemented to heaps. You will look at binary search trees, which are efficient ways of maintaining information in a sorted order dynamically as information comes and goes. And we will look at a very useful data structures to maintain a partition of the set into a disjoint collection of subsets; the so called union-find algorithm. And finally, depending on how much time is left we will look at some miscellaneous topics. It is important to realize that not every algorithm admits an efficient solution. 8 (Refer Slide Time: 06:34) So, we will look at problem are intractability and some provably hard problems and some other problems of which this state is unknown. This is an eight week course. The tentative schedule is as follows: in the first week, we will do some motivating examples and we will look at some notations and work out some problems involving asymptotic complexity. In the second week, we will move on to searching and sorting. In the third week, we will introduce graphs and look at basic graph algorithms. We will continue with more graph algorithms. And then in the process, introduce the disjoint set data structures. We will then formally look at divide and conquer, which you have already seen along the way. But we will look at it again. And then we will introduce heaps. In the sixth week, we will do search trees and some examples of greedy algorithms and their proofs. In the seventh week, we will be devoted to dynamic programming. And then the last week we will cover miscellaneous topics. Now, you should remember that this is only an approximate schedule and there will be some variations. But roughly this is the sequence and the speed at which you planned to cover the material in the course. You can also see alongside the weeks. The calendar weeks that these correspond to. So, to the first week of the course is the current week which is January fifth to ninth. And this course will go on to the last week of February. 9 (Refer Slide Time: 07:53) Now as part of the evaluation for the course, there will be continuous evaluations. Every week, there will be quizzes. You will also be programming assignments; roughly six programming assignments across the eight weeks. After the course ends, there will be a certification exam. In order to get a certificate that you successfully completed this course, you need to score 60 percent in the quizzes and in the certification exam. You need to submit at least five of six; out of the six assignments. And at least, four of them you must do something ((Refer Time: 08:27)) review. (Refer Slide Time: 08:30) 10 We will be following two main text books. 
Although, I will not use the text books directly in the course, but if you want to look at some materials you will find them in these two books. The first book is called “Algorithm design” by Jon Kleinberg and Eva Tardos. And the second book is just called “Algorithms” by Sanjay Dasgupta, Christos Papadimitriou and Umesh Vazirani. The book by Kleinberg and tardos is more detail and it has a number of really nice examples to illustrate many of the concepts and quite detail proofs. The book by Dasgupta Papadimitriou and Vazirani is a slimmer book. It is more easy and accessible to read on your own. But on the other hand it does leave a lot of details as exercises. So, in order to really understand the material, you really need to spend more time working out the exercises and not just go by the materials that present in the chapters. 11 Design and Analysis of Algorithms Prof. Madhavan Mukund Chennai Mathematical Institute Week - 01 Module - 02 Lecture - 02 So, before we start the formal part of the course, I would like to discuss a few examples to motivate the kind of problems we will be looking at. (Refer Slide Time: 00:10) So, we start with a problem of air travel. So, we have an airline; Barbet airlines, which serves several cities in the country. And of course, although it serves several cities, it does not really connect all these cities directly. Only some of the cities are connected by direct flights. And for other pair of cities you have to take a hopping flight. You have to go via an intermediate city. So, our first goal maybe to compute every pair of cities, which are actually connected by this network served by this airline. So, how do we find out all pairs of cities A, B? Such that A and B are connected by a sequence of flights. 12 (Refer Slide Time: 00:50) So, first we need to look at the network. So, this is a typical way that we might find the network. For example, if we open the in-flight magazine of an airline, you find a route map. And it is written like this. You have a physical map of the country and you have the cities which are served; marked out. There are about ten cities; Delhi, Varanasi, Ahmedabad, down to Trivandrum in the south. And you have some arrows indicating the flights. Now, some of these flights are in one direction. You can go from Delhi to Varanasi, but you cannot come back directly to Delhi. You must go to Ahmedabad and then come back. So, this is quite common. If you look at airline's schedules in airlines, you will find that there were these kind of triangular routes. Where you go around a triangle and you cannot go back directly without hopping in between. Some pairs of important cities; like in this case Mumbai and Delhi might be connected by flights in both directions or even Mumbai and Calcutta. And so now we have these ten cities. We want to know is it really possible go from say Varanasi to Trivandrum or is it not possible, is it possible to go from Hyderabad to Delhi or is not possible. So, our first step is to model this problem in such a way that we retain the essential details, which are relevant to solving the problem and get to do for the unnecessary details. 13 (Refer Slide Time: 02:03) So, in this case what we really would like to know is the structure of this network. Right. So, the map itself is not relevant. We just need to know how many cities are there, which are on the network and how are they connected by the flights. So, the picture below which has these gray circles and arrows represents this network. 
And these cities are the gray circles and the flights are the arrows and the arrow heads indicates the direction. So, if there is an arrow head in one direction, it is a one directional flight; if there is an arrow head in both ends, its means it is a bidirectional flight. The actual names of the cities I am not so relevant. So, we can call them 1, 2, 3, 4, 5 or a, b, c, d, e or whatever and solve the problem. So, this kind of a picture is called graph. We will study the graphs more formally when we come to this module in our course. But a graph is just a picture of this kind. So it has some nodes, this dots and edges. 14 (Refer Slide Time: 03:00) So, now one nice thing about moving to this abstract level is that the actual picture can be distorted without changing its meaning. So, we can move. For instance, if we look at this city here, right, we can move it to the right and it does not make any difference in terms of solving the problem. Or we could simplify the picture by moving, for instance, this edge on side, so that we get no crossing edges. And this is again the same picture. Though it looks quite different from this picture that we have started with, this is again the same network. Now, in some situations it is useful to realize that the graph that we have looks like this; that there are no crossing edges. Technically, such a graph is called a planar graph. It can be drawn on a flat piece of paper without any edges crossing. For planar graphs, we might have better algorithms and for arbitrary graphs. Now what do you want to do in such a graph? 15 (Refer Slide Time: 04:00) So, in this case we want to compute what we call a path. That is the sequence of edges going from one city to another city; where of course the direction must be correct. So, you cannot go backwards, across and edge which is flying from A to B. You cannot take this flight B to A, unless there is another flight. So, our first question is how do we take this picture and put it into form that we can manipulate using a program or an algorithm. So, we need a suitable data structure in order to represent this graph. Now given the way we represent the graph, we need to manipulate it to answer the question that we handle. In this case, connectivity. How do we go from A to B or can we go from A to B or which all cities B can I reach from A? So, how do we design such an algorithm, given the way we have represented these cities in the graph? Does it depends on the representation or there multiple representations, some of which gives us more or less efficient algorithm? These are all the questions that we need to answer before we can decide on whether we have got the best solution at act. Now, in terms of efficiency we have to look at what are the things that determine the complexity of a problem. It is fairly obvious in this particular case that if we have more cities, the problem is more complicated. So the number of cities, which we can call N, is certainly one parameter which determines how complicated the algorithm is going to be, 16 or not how complicated the algorithm is going to be, or rather how long it is going to take to run. The other question which determines how complex the network is this how many direct flights there are... Obviously there are fewer flights, there are fewer places, which can be connected and we have to explore fewer possibilities. So from this, it follows that computing the paths depends on both N and F. So, we will not have an algorithm which will always takes a twenty steps. 
It will have to depend some number of steps depending on N and F. Now what is this dependency, how does it grow? If N doubles, does our algorithm take two times more times? Does it takes four times more times? If N term grows to factor of ten, does it takes ten times more or hundred times more time? The other question related to this is given this dependency on N and F what realistic size of networks can we handle? If the airline grows to twenty flights, we will still have be able to give our answer in the reasonable time. Remember that this kind of an answer is typically required when somebody is making an online booking or something. And you will not allowed to reply in a few seconds, right, it is not enough to come back after an hour and say “yes, there is a flight from Trivandrum to Calcutta”. So, what is the limit of our efficiency? Can we scale this algorithm to cover airlines, multiple airlines? So, we have a website which actually sells. Across all airlines I can take you from place A to place B. That depends on how larger value of N and F we can handle. 17 (Refer Slide Time: 06:55) And then of course the problem that we have looked at is a very simple problem; can I get from A to B. But very often it is not good enough to get from A to B. You want to get from A to B within some reasonable time frame. For instance, it is not usually acceptable to break journey overnight on aircraft. At the same time, you also do not want to spend more than the certain amount of time meeting in between flights. So, there are only some connections. Although there may be; erratically there may be the connections, only some of them may actually feasible. So, now our problems becomes little more constraint. So, we do not just want to look at the connected paths from A to B. But connected paths A to B, which meet some additional constraints in terms of timing and other things. So, can we solve this problem with the same approach that we solve a simpler problem or we need to take a radically different approach or do we need more information in order to decide or solve the problem. 18 (Refer Slide Time: 07:53) Suppose, as you would you expect each sector on this thing has a cost. As a passenger, the cost would be the price of ticket. So, if you are trying to compute the best way to go from A to B, your motivation might be to choose the cheapest route in terms of the ticket cost. Of course cost is not only money, cost could be time as well. You might also want the quickest route from A to B, the one which involves the least waiting. So, it depends on what your priority is. Or you urgently required from where, in which case you do not mind paying more. Where, I am going for a vacation with my family; in which case, you have a relax time schedule, but you want to make sure you get a value from money. from the airlines point of view there may be other questions. Periodically aircraft have to be brought down for a day for maintenance. Now, you do not want to have so many aircrafts that you keep all the routes flying and wastefully keep planes unused. At the same time if you keep two few planes, then when you bring an aircraft down for maintenance you have to sacrifice some routes. Now, which routes should we sacrifice? So that, you ensure that the connectivity of the network remains the same. If you could go earlier from Trivandrum to Calcutta, during a maintenance shutdown you should still be able to go from Trivandrum to Calcutta; maybe by different. 
So, this is the problem to be addressed by the airlines stop; where as the cheapest route 19 might be a problem to be addressed by the customers. So, there are very many different points of questions you can ask about this basic air network that we have described using a graph. And we will see answers to some of these problems in this course. 20 Design and Analysis of Algorithms Prof. Madhavan Mukund Chennai Mathematical Institute Week - 01 Module - 03 Lecture - 03 So, our first example was to do with airlines, scheduling, network and using the graph. Now, let us look at its different part. (Refer Slide Time: 00:08) So, suppose we have a photo copy shop, campus Xerox inside the university campus. The deadline for projects is approaching and bunch of student want their projects copied urgently. So, they all go and submit their reports to the photo copying shop, and say we want so many copies made within such a time. So, now the question for the shop is how best to schedule this job. Now, there are many different ways in which this problem can be phrased. So, let us look at one of them. 21 (Refer Slide Time: 00:42) So, suppose the students are submitted their jobs and this shop campus Xerox is competing against some rivals. So, it is offering a special deal. So, it says that it will give each customer a promise delivery time and if does not meet the schedule like a pizza shop, it will give a discount. So, I will promise your report within 6 hours and if you do not get within 6 hours, you pay less. Now, in this time frame there are of course some bigger jobs and some smaller jobs. So, some photo copying jobs will be finished faster, some will take longer, but at the same time they all have to run on the same machines that the Xerox shop has. So, now you can reorder things. So, you could take something which came later and put it earlier on the machine and hope to finish it within its deadline. Therefore, not to have to give discount and take something which is going to take longer and postpone it saying, anyway you are not going to meet the dead line and give up the discount on that job, right. So, the job, the problem the Xerox shop has is how to do this schedule, right. 22 (Refer Slide Time: 01:50) So, there is always at the background what is called group force approach. You can say now I have to allocate these photo copying jobs to the machines in some order. So, let me try every possible order and choose the one which gives me the best return. The problem with this is that it will take a very large amount of time to do this because number of possibilities is exponential. Even if you have just 30 requests pending, it could take several hours to find an optimized schedule, and that several hours might have gone ahead and done some work, so that you got the jobs done and perhaps not optimally at least got some money for it. So, here is where we come to the idea of decomposition, right. So, we can solve this problem by reducing it to a simpler problem. 23 (Refer Slide Time: 02:30) So, supposing we fix one job to run first. If we fix this job to run first, we are left to the remaining jobs of the remaining jobs are smaller in numbers. So, if there was a way to optimally solve for n minus 1 job, then we can pick each of the first jobs, each of the first jobs to be the first job and for each of them determine how much time it takes efficiently if we can do the remaining n minus 1 and choose the best one. 
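A minimal sketch of this pick-one-first-and-recurse idea might look as follows. The details here are simplifying assumptions for illustration only, not the shop's actual pricing from the lecture: one machine, each job described by a processing time and a promised deadline, and a cost of one discount per job that finishes late.

```python
# Rough illustrative sketch (assumptions: one machine, each job is a
# (processing_time, deadline) pair, and the cost of a schedule is the
# number of jobs that finish after their promised deadline).

def best_cost(jobs, current_time=0):
    """Try every job as the one to run first, recursively schedule the rest,
    and return the smallest achievable cost. Exponential in the number of
    jobs, but it matches the 'pick one, solve the remaining n-1' idea."""
    if not jobs:
        return 0
    best = float("inf")
    for i, (duration, deadline) in enumerate(jobs):
        finish = current_time + duration
        late = 1 if finish > deadline else 0        # one discount if late
        rest = jobs[:i] + jobs[i + 1:]              # remaining n-1 jobs
        best = min(best, late + best_cost(rest, finish))
    return best

# Example with three pending jobs as (time, deadline) pairs.
print(best_cost([(3, 4), (2, 2), (1, 6)]))   # prints 1: at best one job is late
```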
(Refer Slide Time: 03:13) So, this would give us the kind of recursive solution, pick one and solve the rest and then add the time for this. Another option is to just come up with a strategy. Looking at all the 24 jobs which are yet to be done, we find some criteria by which we choose one to do next. Now, we could have different criteria for which we could choose the one to do next which has the least number of pages that you take the shortest time to process, or we could take the one to do next which is closest to it dead line that is one for which we are most likely to miss finishing it in time and having to give a discount. So, for each of these, we could have a strategy which would tell us which job to do next without looking at all the possibilities of doing the other jobs, but then we have to justify to the strategy that we have chosen is actually optimal. Is it better to choose shortest or (( )) earliest deadline or there is yet another criterion, and how do this criteria justify the choice. Will I always get the best possible return on machine by choosing this strategy. (Refer Slide Time: 04:17) Now, as we saw with the airline network problem, the basic problem has many different variations which are possible. For instance, if we assume that the shop has many photo copiers, it is reasonable to assume that some are new and some are old. So, some that are new one may work faster than the ones are that are old. Therefore, the time that will take to finish a job depends on which machine we put the job on. So, if you have this additional competition, thus the strategies that we chose for all types of machines that are uniform, still old, then we have to look at the different strategy. The other factor is of course that the cost of doing something varies across machines. So, if we use a machine; we use some resources, we use some ink, use paper, we use 25 electricity and this cost may vary from one machine to another. So, now the question becomes related to the first question, the previous question which is that now if I split my job across machines, it might not only take more or less time, it also may cost the shop more or less. So, the actual revenue of the shop realizes maybe more or less depending on which machine it chooses. The other thing that we might want to keep in mind is that a machine cannot run indefinitely without having to be stopped for some time for maybe some maintenance, for loading paper, for something. So, we cannot realistically assume that every machine is continuously available. (Refer Slide Time: 05:29) Now, under all these situations, it is still a valid greedy strategy or we have to do something else. So, you see the general idea. The general idea is there is a basic problem with some constraints which you want to solve, but that problem can be amplified or made more realistic by adding several new features. By the time you add a new feature, you have to see whether the solution that you have for the simpler problems still works or the new feature demands radically to new approach and if so how you should get that. 26 Design and Analysis of Algorithms Prof. Madhavan Mukund Chennai Mathematical Institute Week - 01 Module - 01 Lecture - 04 (Refer Slide Time: 00:09) So far at final example before we delegate in this course, let us look at a problem involving documents. So, we have two documents and our goal is to find out how similar they are, right. So, these two documents really variations of the same field. 
Now, there may be many different scenarios where this problem is interesting. So, one question may be for plagiarism detection. So, it could be that somebody has forced to an article in a newspaper or on a website and you believe that this author has not really written the article themselves. They have copied these articles from somewhere else or if you are a teacher in a course, you might be worried that the student, two students have submitted the same assignments or one student has copied an assignment from some source, some detail. So, while looking at how similar, if you can measure or similar two documents are, you can try to quantify this notion that is somebody has copied from somebody else. Now, it may not always have a negative connotation like this. It might also be to look at some kind of things when some people are writing code typically writing programs for some application, over the period of time documents evolve with in this sense the 27 programs evolves, right. So, people add features. Now, you might want to look at two different pieces of code and try to figure out what are the changes that had happened. How similar they are, how different they are, what the actual changes that had happened. Another place where there is positive notion towards documents similarity is to look for web search. If you ask a question to a search engine and it reports results, typically it tries to group together result which is similar because they are not really different answers. Now, if there are 10 different copies or similar copies of a document saying more or less the same thing and these show up as your first 10 search results, then another document will be highly relevant and quite different from these will now be lost because it will be of the first page of searches. So, it is useful to be able to group together the results of a search query by similarity, so that the user is actually presented by an effective choice between different answers to the search query and not just the large number of variations of the same answers. (Refer Slide Time: 02:13) So, if this is our motivation, we need a way of comparing documents what is the good measure of similarity of documents. Now, there are many different notions that people have come up with. Obviously, it has to do something with the order towards and the choice of letters and so on, but one way of quantifying the distance looking to document is to use what is called the edit distance, namely how many changes to you have to make to transform one document in to another document, and the edit we mean supposing you 28 actually loaded the document in a text editor or word processor, what would be the kind of things that you could do when we could limit because you got of course block out and delete the entire document and then cut and paste another document and say edit in two steps, but this could be kind of cheating. So, we have to limit what operations you do, so that we have a uniform way of counting this. So, we could say that edit involves how many characters which changing. So, each step of editing will either add or remove a letter and perhaps you can allow you to replace one letter by another letter and call that one change. So, now we want to count these as are basic steps adding or removing the letter or replacing one letter by another, and find out how many steps it takes to edit one document make it to make it to another document. (Refer Slide Time: 03:42) So, the minimum number of edit operations will then return the distance. 
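To make this cost model concrete, here is a small sketch (illustrative only, not from the lecture) that applies a hand-written sequence of edits to a string and counts them, with each single-character insert, delete, or replace counted as one step; it does not attempt to find the minimum.

```python
# Illustrative only: apply a hand-written edit script and count the steps.
# Each single-character insert, delete, or replace costs exactly one step.
# This does NOT compute the minimum; it only makes the cost model concrete.

def apply_edits(text, script):
    chars = list(text)
    for op in script:
        if op[0] == "replace":      # ("replace", position, new_character)
            _, pos, ch = op
            chars[pos] = ch
        elif op[0] == "insert":     # ("insert", position, character)
            _, pos, ch = op
            chars.insert(pos, ch)
        elif op[0] == "delete":     # ("delete", position)
            _, pos = op
            del chars[pos]
    return "".join(chars), len(script)

# Turning "cat" into "cart" can be done with one insert, so this particular
# script costs 1; the edit distance is the cost of the cheapest such script.
print(apply_edits("cat", [("insert", 2, "r")]))   # ('cart', 1)
```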
Now, the question that we have as an algorithmic problem is: how do we compute this minimum distance? How do you decide what the best way is to edit one document into another? Of course, there is always the trivial solution, like a block cut and a block paste: you can just delete all the letters and then type in the new document. So, there is a brute force way of doing it, but this is not likely to be the best possibility. You could also try out all possible delete and insert sequences and see which among them gives the best solution, but all of these are very inefficient solutions.

(Refer Slide Time: 04:26)

So, again, we can approach this question by decomposing the problem. Suppose our first goal is just to make the first characters of the two documents the same. If they are already the same, we leave them and go on. If they are not the same, then we have two options: we can either transform the first character so that the two become equal, or we can insert a character in one of the two documents. Suppose the first document starts with an x and the second document with a z. Either we do one operation to make x into z, or z into x, or we insert a z before the x, or an x before the z, but these choices do not necessarily lead to the same answer. Then, once we have made the first characters the same, we can recursively try to fix the rest of the documents.

(Refer Slide Time: 05:27)

Now, one of the difficulties we face when we do recursion in this manner is that the same sub-problem comes up for solution many times. A typical example of this is the recursive solution for finding the nth Fibonacci number. The Fibonacci numbers form a very classical sequence. The first two Fibonacci numbers are 1 and 1; after this, you get the next Fibonacci number by adding the previous two. So, after 1 and 1, the next one is 2, which is 1 plus 1; the next one is 3, which is 1 plus 2; 5 is 2 plus 3; and so on. In general, the recursive relationship is that f(n) is the sum of the previous two values, f(n-1) and f(n-2), and as a base case the first two numbers f(1) and f(2), for which n-1 and n-2 are not both defined, have the value 1. Now, the problem is that when you apply the recursion directly, say to compute the seventh Fibonacci number, it says that for this it needs f(6) plus f(5); but when we recursively expand f(6) and f(5), we find things like f(4) coming up twice. There is one f(4) that comes from expanding f(5) and another that comes from expanding f(6), because each of them applies the same rule recursively. So f(4) is computed twice and, in fact, inside the expansion of f(5) yet another f(4) appears. This f(4) is being computed a number of times, f(3) a number of times, and so on. So, this is really an inefficient way to calculate it, whereas if I just write the sequence out directly, I get 1, 1, 2, 3, 5, 8, 13, 21, and I can read off the seventh Fibonacci number without any of this repeated work. So, there is, intuitively, a very fast way to do this: the recurrence is respected, but if I do it recursively, I end up solving a lot of sub-problems again and again. So, how do we get around this? This is what we call dynamic programming.

(Refer Slide Time: 07:17)

So, dynamic programming says: do not compute the same sub-problems twice.
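As a small illustrative sketch (in Python, not taken from the lecture), the plain recursion and a memoized version of the Fibonacci computation can be written side by side; the memoized version stores each value, such as f(4), the first time it is computed and only looks it up afterwards.

```python
# Illustrative sketch of the Fibonacci example: the plain recursion recomputes
# the same subproblems many times; the memoized version stores each value the
# first time it is computed and looks it up afterwards.

def fib_naive(n):
    if n <= 2:
        return 1
    return fib_naive(n - 1) + fib_naive(n - 2)   # f(n-1) and f(n-2) overlap heavily

def fib_memo(n, table=None):
    if table is None:
        table = {}
    if n <= 2:
        return 1
    if n not in table:                            # compute f(n) only once
        table[n] = fib_memo(n - 1, table) + fib_memo(n - 2, table)
    return table[n]

print(fib_naive(7), fib_memo(7))   # both print 13
```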
Whenever we solve the problems, if have found f of 4, just look it up, store it somewhere, look it up and make sure that you do not do f 4 again. So, this is one of the techniques that we have seen in beginning as that we are going to do in this course, and it is important that when break-up problems into sub-problems, it is not always the case that sub problems can be solved efficiently unless we look slightly more deeply to this structure of sub-problem and make sure we solve them in an effective sequence. 32 (Refer Slide Time: 07:55) Now, as usual this problem of, the difference or similarity between two documents can be at many different levels. So, we are focused on the words, the actual text, but if we do not really look at the sequence of words, we just want to set of words, then we might the for apart in terms edit distance because we need to rearrange the words, but the content if you just measure in terms of what types of words are there, this might give us an accurate understanding of the meaning of the documents. So, if you actually search for a document in a typical search engine, you will often find that the words that you ask for may not occur together, may not occur in the sequence that you mention. It will find just document which have that collection of words. So, this is very useful for web search and the other thing that you might want to do is, measure similarity of words and terms of meanings. So, if you search for a document which contains the word car and there is another document which contains the words automobile, it might to be a good idea for the search engine to go to documents containing automobile, because automobile and car is essentially the same thing. So, like the other example you seen before, there can be variations on the problem and the solution you have for the regional problem, may or may not be valid for these variations. So, there is always a whole space of new and interesting problem to solve ((Refer Time: 09:25)). 33 Design and Analysis of Algorithms Prof. Madhavan Mukund Chennai Mathematical Institute Week - 01 Module - 05 Lecture – 05 So, this course is called design and analysis of algorithms. So, design is something that is easier to understand. We have a problem, we want to find an algorithm, so we are trying to design an algorithm. But what exactly does analysis mean? (Refer Slide Time: 00:14) So, analysis means trying to estimate how efficient an algorithm is. Now, there are two fundamental parameters that are associated with this. One is time, how long does the algorithm take to execute on a given piece of hardware. Remember an algorithm will be executed as a program, so we will have to write it in some programming language and then run it on some particular machine. And when it runs, we have all sorts of intermediate variables that we need to keep track of in order to compute the answer. So, how much space does it take? How much memory does it require? 34 (Refer Slide Time: 00:47) Now, we will argue that, in this course at least, we will focus on time rather than space. One reason for this is, time is a rather more limiting parameter in terms of the hardware. It is not easy to take a computer and change its speed. So, if we are running algorithm on a particular platform, then we are more or less stuck with the performance that the platform can give us in term of speed. Memory, on the other hand, is something, which is relatively more flexible. We can add memory card and increase the memory. 
And so in a sense, space is a more flexible requirement and so we can think about it that way. But essentially, for this course we will be focusing on time more than space. 35 (Refer Slide Time: 01:32) So, if you are looking at time, we have to ask how we measure the running time. So, of course, we could run a program on a given computer and report the answer in seconds or in milliseconds, but the problem with this is, that the running time of an algorithm measured in terms of particular piece of hardware will, of course, not be a robust measure. We run it on different computer or we use a different programming language, we might find, that the same algorithm takes different amount of time. More importantly, if we are trying to compare two different algorithms, then if we run one algorithm on one computer, run the other on other computer, we might get a misleading comparison. So, instead of looking at the concrete running time in terms of some units of time, it is better to do it in terms of the some abstracts units of how many steps the algorithm takes. So, this means, that we have to decide what a notion of a step is. A step is some kind of a basic operation, a simple one step operation, that an algorithm performs and the notion of the step, of course, depends on what kind of language we are using. If we look at a very low-level, at an assembly language kind of thing, then the steps involves moving data from the main memory to register, moving it back, doing some arithmetic operation within the CPU and so on. But typically, we look at algorithms and we design them and we implement them at a higher level. We use programming languages such as C, C plus plus or Java where we have variables, we assign values to 36 variables, we compute expression and so on. So, for the most part we will look at basic operations as what we would consider single statement or steps in a high-level language. So, this could be an operation such as assigning a value of X equal to Y plus 1 or doing a comparison, if A less than B, then do. So, checking whether A is less than B is one step for us. Now, we could look at slightly more elaborate notions of steps. For example, we might assume, that we have a primitive operation to actually exchange the values in two variables though we know, that to implement is we actually have to go by a temporary variable. What we will see is, that we will come up with a notion, which is robust so that the actual notion of the basic operation is not so relevant because we will be able to scale it up and down according to what notion we choose to focus. (Refer Slide Time: 03:55) The other important to think to realize, of course is, that the algorithm will take a different amount of time depending on the size of the problem that it is presented with. It is quite natural and obvious, that if we are trying to sort an array, it will take longer to sort a large array than it will take to sort a short array. So, we would like to represent the efficiency of an algorithm as a function of its input size. So, if the input is of some size n, then it will take time t of n where t will be function depending on the input where even this is not immediately an obvious definition because not all inputs of size n are going to take the same amount of time. Some input will take 37 less time, some inputs will take more time. So, what do we take? 
So, it will turn out, that the notion, that we will typically look at is to look at all the inputs of size n and look at the worst possible one that takes a longest time, right. So, this is called a worst case estimate, it is a pessimistic estimate, but we will justify as we go along. But this is what we mean. Now, when we are looking at the time efficiency of an algorithm what we mean is, what is the worst possible time it will take on inputs of size n and express this as a function of n? So, before we formalize this, let us just look at a couple of examples and get a feel for what efficiency means in terms of practical running time on the kinds of computers that we have available to us. (Refer Slide Time: 05:13) So, let us start with sorting, which is a very basic step in many algorithms. So, we have an array with n elements and we would like to arrange these elements, say in ascending or descending order, for further processing. Now, a naïve sorting algorithm would compare every pair of elements more or less and try to determine how to reorder that. This would take time proportional to n square because we are comparing all pairs of elements. On the other hand, we will see soon in this course, that there are much cleverer sorting algorithms, which work at time proportional to n log n. So, we can go from a naive algorithm which takes time n square to a better algorithm, which takes time n log n. 38 So, what we have really done is, we have taken a factor of n and replaced it by factor of log n. So, this seems a relatively minor change. So, can we see what the effect of this change is? So, one way to look at this is to actually look at concrete running times. We said, that we will not look at running time as measuring the efficiency of the algorithm, but of course, the efficiency of the algorithm does have an impact on how long the program will take to execute, and therefore on how usable the program is from our practical perspective. So, if you take a typical CPU that we find on a desktop or a laptop, which we have these days, it can process up to about 10 to the 8 operations. These are the basic operations in the high-level languages like an assignment or checking whether one value is less than another value, right. We can say, that it takes about, it can do about 10 to the 8 operations. Now, this is an approximate number, but it is a very, we need to get some handle on numbers so that we can do some quantitative comparisons of these algorithms. So, it is a useful number to have for approximate calculations. Now, one thing to remember is that this number is not changing. Obviously, we use to have a situation where CPUs were speeding up every one and a half year, but now we have kind of reached the physical limits of current technology. So, CPUs are not speeding up. We are using different types of technologies to get around this, parallelizing, multicore and so on. But essentially, the sequential speed of a CPU is stuck at about 10 to the 8 operations per second. 39 (Refer Slide Time: 07:46) So, now when we take a small input, suppose we are just trying to rearrange our contact list on our mobile phone. We might have a few hundred contacts, maybe a thousand contacts, maybe a few thousand contacts. If you try to sort a contact list, say by name, it really would not make much difference to us whether we use n square or n log n algorithm. Both of them will work in a fraction of second and before we know it, we will have the answer. 
But if we go to more non-trivial sizes, then the difference becomes rather stark. So, consider the problem of compiling a sorted list of all the mobile subscribers across the country. So, it turns out that India has about one billion, that is, 10 to the 9 mobile subscribers. This includes data cards, phones, various things, but these are all people who are all registered with some telecom operator and own a sim card. So, this is a number of sim cards, which are in active use in India today. So, suppose we want to compile a list of all owners of sim cards in India in some sorted fashion. Since we have 10 to the 9 subscribers, if we run an n square algorithm, this will take 10 to the 18 operations because 10 to the 9 square is 10 to the 18. Now, 10 to the 18 operations per seconds, since we can do only 10 to the 8 operations in 1 second, will take 10 to the 10 seconds. So, how much is 10 to the 10 seconds? Well, it is about 2.8 million hours; it is about 115 thousand days, that is, about 300 years. So, you can imagine, that if we really want to do this using an n squared algorithm, it would really not be practical 40 because it would take more than our lifetime, more than several generations in fact, to compute this on the current hardware. On the other hand, if we were to move to the smarter n log n algorithm, which we claim we will find, then it turns out, that sorting this large number of own users takes only 3 times 10 to the 10 because the log to the base 2 of 10 to the 9 is 30. It is useful to remember, that 2 to the 10 is 1000. So, log to the base 2 of 1000 is 10. Now, since logs add 1000 times, 1000 is 10 to the 6, right, so the log of 10 to the 6 is 20, log of 10 to the 9 is 30. So, 30 into 10 to the 9 n log n is 3 times 10 to the 10. So, this means, that it will take about 300 seconds. So, it will take about 5 minutes. Now, 5 minutes is not a short time, you can go and have a tea and come back, but still it will get done in a reasonably short amount of time, so that we can then work with this useful information and go on as opposed to 300 years, which is totally impractical. (Refer Slide Time: 10:35) So, let us look at another example. So, supposing we are playing a video game, right. So, this might be one of the action type games where there are objects moving around the screen and we have to know, identify certain object, shoot them down, capture them and whatever. So, let us assume that as part of the game in order to compute the score, it has to periodically find out the closest pair of objects on the screen. 41 So, now, how do you find the closest pair of objects among group of objects? Well, of course, you can take every pair of them, find the distance between each pair and then take the smallest one, right. So, this will been an n squared algorithm, right. So, you compute the distance between any two objects and then after doing this for every pair you take the smallest value. Now, it turns out, that there is a clever algorithm, again, which takes time n log n. So, what is this distinction between n square and n log n in this context? (Refer Slide Time: 11:35) Now, on a modern kind of gaming computer with a large display it is not unreasonable to assume, that we could have a resolution of say, 2500 by 1500 pixel. If we have one of these reasonable size 20 inch monitors, we could easily get this kind of resolutions. So, we will have about 3.75 millions points on the screen. Now, we have some objects placed at some of these points. 
So, let us assume that we have five lakh objects, five hundred thousand objects placed on the screen. So, if you were to now compute the pair of objects among these five hundred thousand, which are closest to each other and we use the naïve n squared algorithm, then you would expect to take 25 10 into the 10 steps because that is 5 into 10 to the 5 whole square. 25 into 10 to the 10 is 2500 second, which is around 40 minutes. Now, if you are playing a game, an action game in which reflexes determine your score, obviously, each update cannot take 40 minutes. That would not be an effective game. On 42 the other hand, you can easily check, that if you do n log n calculation for 5 into 10 to the 5, you take something like 10 to the 6 or 10 to the 7 seconds. So, this will be a fraction of second, 1-10th or 1-100th of the second, which is well below your human response time. So, it will be, essentially, instantaneously. So, we move from a game, which is hopelessly slow to one in which it can really test your reflexes as a human. (Refer Slide Time: 13:00) So, we have seen in these two examples, that there is a real huge practical gulf between even something as close to each other as n log n and n squared. The size of the problems, that we can tackle for n squared are much smaller than the size probably we can tackle for n log n. So, when we look at these functions of n, typically we will ignore constants. Now, this will partly be justified later on by the fact, that we are not fixing the basic operation, but essentially by ignoring constant we are looking at the overall function of the efficiency as the function of n, in the, as it increases, right. So, it is what we call asymptotic complexity, as n gets large how does the function behave. So, for instance, supposing we have some function, which grows like n squared with large coefficient and something, which goes like n cube with a coefficient of 1, initially it will look like n squared, say 5000 n squared is much more than n cube, but very rapidly, say at 5000, right. At 5000 we will have, both will be 5000 cube and beyond 5000 the function n cube will grow faster than function 5000 n squared, right. 43 So, there will be a point beyond which n cube will overtake n square and after that it will rapidly pull out. So, this is what we would typically like to measure. We would like to look at the functions as functions of n without looking at the individual constants and we will look at this little more in detail. (Refer Slide Time: 14:30) So, since we are only interested in orders of magnitude, we can broadly classify functions in terms of what kind of functions they look like, right. So, we could have function, which are proportional to log n; functions, which are proportional to n, n squared, n cube, n to the k; which is, so any n to the fixed k is a polynomial. So, we can call these, call these functions, you will look at, as polynomial, right. So, these are polynomial. And then we could have something, which is 2 to the n if 2 to the n essentially, come, comprises of looking at all possible subset. So, these are typically the brute force algorithms where we look at every possibility and then determine the answer, right. So, we are logarithmic polynomial and exponential. So, what do these look like in terms of the numbers that you ((Refer Time: 15:23)). 
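One way to get a rough feel for these numbers, before looking at the chart on the slide, is a small back-of-the-envelope sketch. It assumes the figure of roughly 10 to the 8 basic operations per second used earlier, ignores constants entirely, and simply prints crude time estimates for a few growth rates and input sizes.

```python
# Illustrative sketch: estimated running times for different growth rates,
# assuming roughly 10**8 basic operations per second and ignoring constants.
# 2**n is left out: already for n around 100 the estimate is astronomical.

import math

OPS_PER_SECOND = 10**8

def seconds(operations):
    return operations / OPS_PER_SECOND

for n in [10**3, 10**6, 10**9]:
    log_n = math.log2(n)
    print(f"n = {n:>10}:  "
          f"log n ~ {seconds(log_n):.0e} s,  "
          f"n ~ {seconds(n):.0e} s,  "
          f"n log n ~ {seconds(n * log_n):.0e} s,  "
          f"n^2 ~ {seconds(n * n):.0e} s")
```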
44 (Refer Slide Time: 15:25) So, if you look at this chart, the left column varies the input size from 10 to the 1 to the 10 and then each column after that is the different type of complexity behaviour as function, which ignores constants and only looks at the magnitude. So, we have log n and then we have something, which is polynomial n, n square, n cube. In between we have n log n we saw before and then we have these two exponential function, which are 2 to the n and n factorial, right. So, now, in this we are trying to determine how efficiency determines practical useable. So, now, if you look at this, we said, that we can do 10 to the 8 operation in 1 second. So, maybe we are willing to work at 10 seconds. So, 10 to the 9 operation is about the limit of what we would consider. The efficient 10 to the 10 operation would mean 100 second, which would mean 2 minutes and may be 2 minutes is too long. So, now clearly, if we are looking in logarithmic scale, there is no problem. Up to 10 to the 10, right, we can do everything in about 33 step, which will take very, very ((Refer Time: 16:37)). Now, in a linear scale given, that we are expecting 10 to the 9 as our limit, this becomes our limit, right. So, we can draw a line here and say, that below this red line things are impossibly slow. n log n is only slightly worse than n. So, we can do inputs of size 10 to the 8 because n log n to 10 to the 8 is something proportional to 10 to the 9. 45 Now, there is this huge gulf that we saw. If we move from n log n to n squared, then the feasibility limit is somewhere between 10 to the 4 and 10 to the 5 and together, 10 to the 5, we already reach time and running time of 10 to the 10, which is beyond what we want and now if we move to n cube then the feasibility limit drops further. So, we cannot go beyond an input size of 1000 right. So, we can see that there is a drastic drop and when we come to exponentials really beyond some trivial things of the size 10 or 20, we cannot really do anything, anything, which is even a 100 will be impossibly slow, right. So, we have this kind of, this sharp dividing line of feasibility right. And so we can see that it is important to get efficient algorithm because if we have inefficient algorithm, even if we think that they work correctly, we will not be able to solve problem of any reasonable size right. When we are looking at computational algorithms running on a PC, you would really expect our input sizes to be 100s, 1000s and much more right. So, we are shorting things. We really do expect large volume to data if you want to you look at voters list, population data or data coming out of say biology experiment about genetic material, now these things are typically very large amount of data, so it is really important to get the less possible algorithm to do these things otherwise the problem at hand will not be effectively solved. 46 Design and Analysis of Algorithms Prof. Madhavan Mukund Chennai Mathematical Institute Week - 01 Module - 06 Lecture - 06 So, we argued that we would like to measure efficiency of an algorithm in terms of basic operations, and we would like to compute the running time of an algorithm in terms of a function of its input size n, and we also saw that if you go from say n square to n log n then the size of inputs you can effectively handle becomes dramatically larger. Now, today we will try to formulate some of these notions a little more clearly. (Refer Slide Time: 00:27) So, the first thing is the input size. 
So, remember that the running time of an algorithm necessarily depend on the size of the input. So, we want to write the running time as some function t of n. And the main thing to remember is that not all inputs of size n will give the same running time. So, there is going to be a notion of worst case estimate which you will need to explain and justify. 47 (Refer Slide Time: 00:54) Before we do this let us look at the notion of input size itself - how do we determine the input size for a given problem? So, the input size more or less represents the amount of space it takes to write down the distribution of the problem or becomes a natural parameter of the problem. So, for instance, when we are sorting arrays what really matters is how many objects there are to solve, so we have to move them around and rearrange them. So, the size of an array is quite a natural notion of input size for a sorting problem. On the other hand, if you had trying to do something like rearranging elements or take say we have some items which we need to load into a container and we are looking for a optimum subset load in terms of weight or volume then the number of objects would be a natural input parameter. We saw in one of the early lectures an example of air travel where we constructed a graph of an airline route map where the nodes where the cities and the edges where the flights. And we argued that both the number of cities and the number of flights will have an impact on any analysis we need to do. So, this is a general property of all graphs; if we have a graph then both the number of nodes or vertices and the number of edges will determine the input size. 48 (Refer Slide Time: 02:14) Now, there is an important class of problems where we have to be a little bit careful about how we talk about input size, and these are problems involving numbers. Suppose we were to write an algorithm for primality checking whether the given number is prime. Now, how should we think of the size of the input? For instance, suppose we ask it to solve the question for say 5003 and then for 50003 then 50003 is roughly 10 times 5003. So, would we expect the time to group proportional to 10. So, should the magnitude of n actually p ignores the input size. Now, it is quite obvious when we do arithmetic by hand, the kind of arithmetic we do in school, that we do not go by magnitude, we go by the number of digits. When we do arithmetic such as addition with carry we right down the numbers in columns then we atom them column by column. So, the number of digits determines how many columns we have to add. The same is true with subtraction or addition or multiplication or long division and so on. So, clearly, it is a number of digits which matters. And the number of digits is actually the same as the log. If you have numbers written in base 10 then if we have a 6 digit number its log is going to be 5.something. And we go to log 6, we will have a 7 digit number and so on. So, the number of digits is directly proportional to the log, so we can think of the log of the number as the input size. So, this is a special case. So, for arithmetic function with log in numbers, it is not the number itself which is the input size, but the size of the number as expressed in how many digits with excess to write 49 down the number. (Refer Slide Time: 04:00) Now, the other thing, we mentioned this that we are going to ignore constants. 
(Refer Slide Time: 04:00) Now, the other thing we mentioned is that we are going to ignore constants. We are going to look at these functions in terms of orders of magnitude - how the function grows as n, n square, n cube, and so on. One justification for this is that we are being deliberately a bit vague about the notion of a basic operation.

(Refer Slide Time: 04:20) So, let us look at an example. Suppose we originally consider our basic operations to be assignments to variables and comparisons between variables. Now we decide that we will also include swapping two values - exchanging the contents of x and y - as a basic operation. One of the first things we learn in programming is that in order to do this we need to go via a temporary variable: to exchange x and y in most programming languages you first save x in a temporary variable, then copy y into x, and then copy the saved value from the temporary variable into y. This takes 3 assignments. So if we count swap as a single basic operation, as compared to an accounting where we only count assignments, we are collapsing 3 assignments into 1 basic operation; there is a factor of 3 difference between the two ways of counting. In order to get away from worrying about these factors of 3 and 2 and so on, one important simplification is to just ignore constants when we do this calculation. That is another motivation for looking only at orders of magnitude.

(Refer Slide Time: 05:30) So, let us come back to this notion of worst case. As we said, we are really looking at all inputs of size n and asking which of these inputs drive the algorithm to take the maximum amount of time. Let us look at a simple algorithm which searches for a value k in an unsorted array A. In an unsorted array we have no idea where the value k can be, so the only thing we can do is scan from the beginning. We start by initializing an index i to position 0, and then we have a loop which says: so long as we have not found the array element - so long as A[i] is not k - we increment i, that is, we move to the next element. When we exit this loop there are two possibilities: either we have found the element, in which case i is less than n, or we have not found the element, in which case i has become n. So we check whether i is less than n; if i is less than n then we have found it, and if i equals n then k is not present. So this is the simple algorithm. Now, in this algorithm the bottleneck is the loop, which can take up to n iterations. When will it take n iterations? That is the worst case.

(Refer Slide Time: 06:47) The worst case in this particular algorithm is when the loop must go to the end: either the last element is k, or, more generally, there is no copy of k in the array at all. If there is no k in the array, we have to scan all the elements to determine that k does not exist. So this becomes our worst case input. It is important to be able to look at an algorithm and try to reconstruct what input will drive it to take the maximum amount of time. In this simple example, we can see that the case which forces us to execute the entire loop can be generated by choosing a value of k which is not in the array A, and in this case the worst case running time is proportional to the size of the array, n.
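A minimal sketch of the search just described, assuming a 0-indexed Python list; returning the position of k (or -1 when it is absent) is a small variation on the lecture's found/not-found check, and the bounds test on i is added so the loop terminates when k is missing:

```python
def find(A, k):
    """Scan an unsorted list A for value k; the worst case scans all n elements."""
    i = 0
    while i < len(A) and A[i] != k:   # keep moving right until k is found or we run out
        i += 1
    return i if i < len(A) else -1    # i < n means k was found at position i

# Worst case input: k does not occur, so the whole array is scanned.
print(find([3, 1, 4, 1, 5], 9))       # prints -1
```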
The crucial thing to remember is that in order to determine the worst case input, we have to understand the algorithm and look at it. We cannot just blindly declare a worst case without knowing the problem at hand. For different algorithms we have to come up with different worst cases, depending on what the algorithm is supposed to do and how the algorithm is constructed. So the worst case input is a function of the algorithm itself.

(Refer Slide Time: 07:55) Now, we could look at a different measure. Suppose we do not look at the worst case but at the average case: look at all possible inputs and try to see how much time the algorithm takes, averaged over all those inputs. Mathematically, in order to compute this kind of average we need a good way of describing all the possible inputs to a problem. Although this sounds like a very attractive notion, in many problems it is very difficult. Suppose we are doing the airline route problem: how do we describe the space of all possible route maps, and what is a typical route map? What are we going to average over - are all these inputs equally likely? We would need to look at probabilities, and it is very often difficult to estimate the probabilities of different types of inputs. So, though it would make more sense, from the point of view of the behaviour of the algorithm in practical situations, to look at the average case - that is, how it behaves over a space of inputs - in practice it is very hard to do this, because we cannot really quantify this space of possible inputs and assign them meaningful probabilities.

(Refer Slide Time: 09:09) To summarize, we look at the worst case, even though it could be unrealistic, because the average case is hard if not impossible to compute. There are limited situations where it is possible to do an average case analysis, but these are rare. The good thing about a worst case analysis is that if we can establish a good upper bound - that is, even in the worst case the algorithm performs efficiently - then we have a useful piece of information about the algorithm: it always works well. On the other hand, if we find that the algorithm has a bad worst case upper bound, we may have to look a little further: how rare is this worst case, does it often arise in practice, what types of inputs are worst case, are they inputs that we would typically see, are there simplifying assumptions we can make which rule out these worst cases, and so on. So, though worst case analysis is not a perfect way of doing things, it is mathematically tractable - it is something we can actually hope to compute, so we come up with a quantitative estimate - and it does give us useful information, even though in some situations it may not be the most realistic measure of what we are likely to come across in practice.

Design and Analysis of Algorithms Prof. Madhavan Mukund Chennai Mathematical Institute Week - 01 Module - 07 Lecture - 07

(Refer Slide Time: 00:02) So, we have said that we will measure the time efficiency of algorithms only up to an order of magnitude. We will express the running time as a function t of n of the input size n, but we will ignore constants; we will only say that t of n is proportional to n square or n log n or 2 to the n.
So, now, the next step is to have an effective way of comparing running times across algorithms: if I know the order of magnitude of one algorithm and the order of magnitude of another algorithm, how do I compare them?

(Refer Slide Time: 00:34) The notation we need, or the concept we need, is that of an upper bound, which is given by the notation big O. We say that a function g of n is an upper bound for another function t of n if beyond some point g of n dominates t of n. Now, remember that g of n is going to be a function which is an order of magnitude - we have thrown away all the constant factors - so we allow ourselves a constant. We say that it is not g of n alone which dominates t of n, but g of n times some constant: there is a fixed constant c and some limit n 0; there is an initial portion, before n 0, where we do not care, but beyond this limit t of n always lies below c times g of n. In this case c times g of n is an upper bound for t of n, and we say that t of n is big O of g of n.

(Refer Slide Time: 01:34) So, let us look at an example. Suppose we have the function t of n equal to 100 n plus 5; we claim that it is big O of n square. Remember that n is supposed to be the input size, and the input size of a problem is always going to be at least 1 - there is no problem to solve if the input size is 0, and it certainly cannot be negative - so we always have in mind that n is at least 1. Now, if we choose n to be bigger than 5, then n is bigger than the value 5, so we can say 100 n plus 5 is smaller than or equal to 100 n plus n, and we can collapse this to 101 n. So 100 n plus 5 is smaller than 101 n provided n is bigger than 5. Now, since n is at least 1, n square is bigger than n, so 101 times n is going to be smaller than 101 times n square. So, by choosing n 0 to be 5 and c to be 101, we have established that n square is an upper bound for 100 n plus 5; that is, 100 n plus 5 is big O of n square.

We can do this using a slightly different calculation. We can say that 100 n plus 5 is smaller than 100 n plus 5 n for n at least 1, because n is at least 1, so 5 times n is at least 5. If we collapse this we get 105 n, and by the same logic 105 n is smaller than 105 n square whenever n is at least 1. So this is a new way of establishing the same fact, where we have chosen n 0 equal to 1 and c equal to 105. So n 0 and c are not unique: depending on how we do the calculation we might find a different n 0 and a different c. But it does not matter how we choose them, so long as we can establish that beyond a certain n 0 there is a uniform constant c such that c times g of n dominates t of n. Notice that the same calculation can also give us a tighter upper bound - n square is a loose upper bound, since you would expect 100 n to be much smaller than n square. We can also say that 100 n plus 5 is big O of n. Why is that? Because if we just stop the first calculation at the point where we showed 100 n plus 5 is at most 101 n, the same values n 0 equal to 5 and c equal to 101 tell us that 100 n plus 5 is big O of n. Likewise, in the second calculation, if we ignore the last step, we have that 100 n plus 5 is smaller than 105 n, so for n 0 equal to 1 and c equal to 105 we have established the same thing.
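For reference, the definition and the first calculation above can be written out in symbols (this is only a restatement of what was just said):

```latex
% Definition of big O
t(n) \in O(g(n)) \iff \exists\, c > 0,\ \exists\, n_0 \ge 1 \ \text{such that}\quad
\forall n \ge n_0:\ t(n) \le c \cdot g(n).

% Example: 100n + 5 = O(n^2)
\text{For } n \ge 5:\quad 100n + 5 \le 100n + n = 101n \le 101\,n^2,
\qquad\text{so } n_0 = 5,\ c = 101 \text{ works.}
```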
(Refer Slide Time: 04:15) Let us look at another example. Suppose we look at 100 n square plus 20 n plus 5. Again, assuming that n is at least 1, multiplying a term by n does not make it smaller. So 20 n is dominated by 20 n square, and 5 is dominated by 5 times n times n, that is, 5 n square. So 100 n square plus 20 n square plus 5 n square is bigger than the original function 100 n square plus 20 n plus 5. Combining these, I get 125 n square, and all I have assumed is that n is at least 1. So for n 0 equal to 1 and c equal to 125 we have that n square dominates 100 n square plus 20 n plus 5. You can easily see that, in general, if I have a n square plus b n plus c, this is dominated by (a plus b plus c) times n square for all n greater than or equal to 1. So, generally speaking, we can take a function like this and ignore the lower order terms, because they are dominated by the highest term, and just focus on the term with the highest exponent. In this case n square is the biggest term, therefore the whole expression is big O of n square. This is a very typical shortcut that we can take: take an expression, ignore the coefficients, pick the largest exponent and choose that to be the big O.

(Refer Slide Time: 05:37) Now, we can also show that things are not big O of each other. For instance, it is intuitively clear that n cube is bigger than n square; how do we formally show that n cube is not big O of n square? Well, suppose it were. Then there exists some c and some n 0 such that for all n greater than or equal to n 0, n cube is smaller than or equal to c times n square. Now suppose we choose n equal to c (we may assume c is at least n 0, taking a larger c if necessary). Then on the left hand side we have c cube, and on the right hand side we have c times c square, which is also c cube, and certainly c cube is less than or equal to c cube. But if I go to n equal to c plus 1, I have (c plus 1) cube on the left and c times (c plus 1) square on the right, and now there is a problem, because (c plus 1) cube is bigger than c times (c plus 1) square. Therefore, no matter what c we choose, once we go past n equal to c the inequality we want gets flipped around. So there is no c that makes n cube smaller than c times n square beyond a certain point, and therefore n cube is not big O of n square. Our intuitive idea that n cube grows faster than n square can thus be formally proved from the definition.

(Refer Slide Time: 00:07) Now, here is a useful fact about big O. If I have a function f 1 which is big O of g 1 and another function f 2 which is big O of g 2, then f 1 plus f 2 is actually big O of the max of g 1 and g 2. You might think it is g 1 plus g 2 - that is the obvious thing that comes to mind, that f 1 plus f 2 is dominated by g 1 plus g 2 - but actually the max suffices.

(Refer Slide Time: 07:31) How do we prove this? Well, it is not very difficult. By definition, if f 1 is big O of g 1, there exists some n 1 such that beyond n 1, f 1 is dominated by c 1 times g 1. Similarly, if f 2 is big O of g 2, there is an n 2 such that beyond n 2, f 2 is dominated by c 2 times g 2. So now what we can do is choose n 3 to be the maximum of n 1 and n 2, and choose c 3 to be the maximum of c 1 and c 2.
So, now, let us see what happens beyond n 3. Beyond n 3 both inequalities are in effect, so f 1 plus f 2 is less than c 1 times g 1 plus c 2 times g 2 - such an n is beyond both n 1 and n 2, so f 1 is less than c 1 g 1 and f 2 is less than c 2 g 2, and I can add the two. This gives the first obvious bound, in terms of g 1 plus g 2. But now we can be a little clever. We have c 3: c 1 is smaller than c 3 because c 3 is the maximum, and c 2 is smaller than c 3. So I can combine these and say the sum is less than c 3 g 1 plus c 3 g 2, which is c 3 times (g 1 plus g 2). But if I take the maximum of g 1 and g 2, then 2 times the maximum is at least g 1 plus g 2, so the sum is less than c 3 times 2 times the max of g 1 and g 2, that is, at most 2 c 3 times the max of g 1 and g 2. So if I take n 3, the maximum of n 1 and n 2, as my n 0, and 2 c 3 as my c, then I have established that for every n bigger than n 0 there is a constant, namely 2 times the max of c 1 and c 2, such that f 1 plus f 2 is dominated by c times the max of g 1 and g 2. Why is this mathematical fact useful to us?

(Refer Slide Time: 09:51) Very often when we are analyzing an algorithm, it will have different phases: it does something in one part, then it continues to something else, and so on. So we could have two phases, phase A which takes time big O of g A and phase B which takes time big O of g B. What is a good upper bound for the overall running time of the algorithm? The instinctive answer would be g A plus g B, but what this result tells us is that it is not g A plus g B that matters for the upper bound, but the maximum of g A and g B. In other words, when we are analyzing an algorithm, it is enough to look at the bottlenecks. If it goes through many steps, look at the steps which take the maximum amount of time, focus on those, and that will determine the overall running time of the algorithm. So when we look at an algorithm which has a loop, we typically look at the loop - how long does the loop run - and we ignore, say, the initialization that takes place before the loop or some print statement that takes place after the loop, because those do not contribute as much to the complexity as the loop itself. So, when we have multiple phases, it is the most inefficient phase which dominates the overall behaviour, and this is formalized by the result we just saw.

(Refer Slide Time: 11:01) Now, there is a symmetric notion to an upper bound, namely a lower bound. Just as we said that t of n always lies below c times g of n, we might say that t of n always lies above c times g of n, and this is described using the notation omega. This is a symmetric definition which says that t of n is omega of g of n if, for every n beyond n 0, t of n lies above c times g of n for some fixed constant c. Here again there is an initial portion that we are not interested in, because at this point nothing can be said, but beyond this n 0, t of n always lies above c times g of n.
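In symbols, the two facts just discussed are (again only a restatement of the lecture's argument):

```latex
% Sum rule: the bottleneck phase dominates
f_1 \in O(g_1),\ f_2 \in O(g_2)
\ \Longrightarrow\
f_1 + f_2 \in O\big(\max(g_1, g_2)\big),
\quad\text{with } n_0 = \max(n_1, n_2),\ c = 2\max(c_1, c_2).

% Lower bound (Omega), symmetric to big O
t(n) \in \Omega(g(n)) \iff \exists\, c > 0,\ \exists\, n_0 \ \text{such that}\quad
\forall n \ge n_0:\ t(n) \ge c \cdot g(n).
```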
(Refer Slide Time: 11:50) So, we saw earlier that n cube is not big O of n square, but intuitively n cube should lie above n square, and this is certainly the case, because n cube is greater than or equal to n square for every n greater than or equal to 1. At n equal to 1 both are 1, at n equal to 2 one is 8 and the other is 4, and so on. So with n 0 equal to 1 and c equal to 1 we can establish that n cube is omega of n square.

Now, when we establish an upper bound, we are usually talking about a specific algorithm we have: we are saying this algorithm has an upper bound of so much, and therefore we can definitely solve the problem within that much time. When we talk about lower bounds, it is not that useful to talk about a specific algorithm - it is not so useful to say that this particular algorithm takes at least so much time. What we would like to say is that the problem takes at least so much time: no matter how you write the algorithm, it is going to take at least that much time. So, typically, to make a useful lower bound statement, we say that a problem takes a certain amount of time no matter how you try to solve it: the problem has a lower bound, rather than the algorithm. Now, as you might imagine, this is fairly hard to establish, because what you have to show is that no matter how clever you are, no matter how you design an algorithm, you cannot do better than a (Refer time 13:13). This is much harder than saying: I have a specific way of doing it and I am analyzing that. So establishing lower bounds is often very tricky. One of the areas where a lower bound has been established is sorting: it can be shown that if you rely on comparing values to sort them, then you must do at least n log n comparisons, no matter how you actually do the sorting. No matter how clever your sorting algorithm is, it cannot be faster than n log n in terms of comparing elements. But this is hard to prove, remember, because you really have to show it independently of the algorithm.

(Refer Slide Time: 13:51) Now, we could have a nice situation where we have matching upper and lower bounds. We say that t of n is theta of g of n if it is both big O of g of n and omega of g of n. In other words, with suitable constants, t of n is dominated by g of n and it also lies above g of n - for two different constants, of course. What this really means is that t of n and g of n are of the same order of magnitude; they are essentially the same function, and therefore you have reached a kind of optimum.

(Refer Slide Time: 14:26) As an example, we can show that n into (n minus 1) by 2 is theta of n square. In order to prove something like this, we have to show that there is an upper bound - we can find a constant (Refer time: 14:37) dominates it - and a lower bound - there is another constant (Refer time: 14:41) lies below it. For the upper bound we just expand n into (n minus 1) by 2: we get n square by 2 minus n by 2. Since we want an upper bound, if I drop the minus n by 2 term, the expression becomes less than n square by 2. Therefore I have an upper bound saying that, with the constant one half, this is dominated by n square for all n bigger than 0.
On the other hand, if I want a lower bound, I do the same thing: I expand n into (n minus 1) by 2 and get the same expression, n square by 2 minus n by 2. Now I want to bound it from below, so I make the expression even smaller: I subtract not n by 2 but (n by 2) times (n by 2). Since I am subtracting more, n square by 2 minus n by 2 is bigger than n square by 2 minus (n by 2) times (n by 2), and the latter is n square by 2 minus n square by 4, which simplifies to n square by 4. In other words, I have shown that n into (n minus 1) by 2 is greater than or equal to n square by 4. But in order to justify this - to justify that (n by 2) times (n by 2) is at least n by 2 - n must be at least 2, because if n is smaller than 2 then n by 2 is a fraction and multiplying by it actually reduces the term. So here I have the condition n greater than or equal to 2, and I have established the lower bound result that for n greater than or equal to 2, n into (n minus 1) by 2 is above one fourth of n square. Therefore, if we choose our n 0 to be 2, then for all values of n greater than or equal to 2, n into (n minus 1) by 2 is less than half of n square and greater than one fourth of n square. So I have found matching upper and lower bounds, which shows that n into (n minus 1) by 2 is theta of n square.

(Refer Slide Time: 16:44) So, to summarize: when we use big O, we have described an upper bound. If we say f of n is big O of g of n, it means that g of n dominates f of n, so f of n is no bigger than g of n, and this is useful to describe the limit of the worst case running time - we can say the worst case running time is upper bounded by g of n. On the other hand, if we use omega, we are saying that f of n is at least g of n; g of n is a lower bound (Refer time: 17:14). As we described, this is more useful for problems as a whole - sorting as a general problem, rather than an individual algorithm - because it tells us that no matter how you do something, you will have to spend at least that much time; but this is hard to establish. And if you have a situation where a lower bound has been established for a problem and you find an algorithm which achieves the same bound as an upper bound, then you have found, in some sense, the best possible algorithm: you cannot do any better than g of n because there is a lower bound of g of n, and you have achieved g of n because you have shown your algorithm is big O of g of n. So theta is a way of demonstrating that you have found an algorithm which is asymptotically as efficient as possible.

Design and Analysis of Algorithms Prof. Madhavan Mukund Chennai Mathematical Institute Week - 01 Module - 08 Lecture - 08

In the last lecture of this unit, what we are going to do is actually look at some examples of algorithms and see how to compute their upper bounds. (Refer Slide Time: 00:12) We will look at two basic classes of algorithms in this lecture: some iterative examples and some recursive examples. An iterative example will basically be something where there is a loop, and a recursive program, of course, will be something where you have to solve a smaller problem before you can solve the larger problem - you recursively apply the same algorithm to smaller inputs.

(Refer Slide Time: 00:34) Our first example is a very standard problem which you must have done in your basic programming course: suppose we want to find the maximum element in an array A.
What we do is initially assume that the maximum value is the first value, and then we scan the rest of the array; wherever we see a value which is bigger than the current maximum value maxval, we replace maxval by it. At the end of this scan we return the value of maxval that we have found, which should be the largest value seen in the entire array. Now, remember that if an algorithm has multiple phases it is enough to look at the bottleneck phase - here we actually have three phases: an initialization, a loop, and a return - since if we have two parts f 1 and f 2 then the order of magnitude of f 1 plus f 2 is the maximum of the orders of magnitude of f 1 and f 2. In this case it is clear that the loop is what takes the most time, so it is enough to analyze the complexity of this loop. Now, this loop takes exactly n minus 1 iterations; any input is a worst case, because we must go from beginning to end in order to find the maximum value - we cannot assume anything about where the maximum value lies. In every iteration of the loop we do at least one step, the comparison, which is one basic operation, and possibly an assignment, which happens if we find a new value A i which is bigger than maxval. Since we are ignoring constants, we can treat this as some constant number c of operations per iteration. So we have c times (n minus 1) basic operations, and if we ignore the c and the minus 1, overall this algorithm is linear: it takes order n time.

(Refer Slide Time: 02:36) Let us now move on to an example with two nested loops. Suppose we are trying to find whether or not an array has all distinct values, that is, no two values in the array A are the same. What we will do is take this array A, compare every A i with every A j, and if we ever find an A i equal to A j we return false; if we find no such pair, we return true. Now, to avoid duplicate work, if I am at position i then I only look at elements to its right, so j will range from i plus 1 to n minus 1 - this way we do not compare A i with A j and then A j with A i again. So what we have written is: for i equal to 0 to n minus 1, and for each such i, for j equal to i plus 1 to n minus 1, check whether A i equals A j. Now, count the number of times this comparison executes: when i is 0, j varies from 1 to n minus 1, so there are n minus 1 steps; when i is 1 there are n minus 2 steps, and so on. As I go down, when i is n minus 2 there is 1 step, and when i is n minus 1 the outer loop still runs but the inner loop does not execute at all, because j would have to go from n to n minus 1. So overall we are doing (n minus 1) plus (n minus 2) plus ... plus 1 plus 0 steps. This is a familiar summation - the sum of i from 1 to n minus 1 - and you should know that this is n into (n minus 1) by 2, a very familiar sum which we have already seen; this is O of n square.
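Minimal sketches of the two iterative examples just analyzed, assuming 0-indexed Python lists (the function names are mine, not from the lecture):

```python
def max_element(A):
    """O(n): one comparison (and possibly one assignment) per iteration, n - 1 iterations."""
    maxval = A[0]
    for i in range(1, len(A)):
        if A[i] > maxval:
            maxval = A[i]
    return maxval

def all_distinct(A):
    """O(n^2): each pair is compared once, (n-1) + (n-2) + ... + 1 + 0 checks in the worst case."""
    n = len(A)
    for i in range(n):
        for j in range(i + 1, n):   # only look to the right of position i
            if A[i] == A[j]:
                return False
    return True
```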
Ignoring constants, n square by 2 minus n by 2 is O of n square - actually we showed earlier that this expression is theta of n square, but for the moment we are only looking at upper bounds, so let us say this algorithm is O of n square. It is not a trivial O of n square, in the sense that it is not two nested loops of equal size - it is not i from 0 to n minus 1 and j from 0 to n minus 1, but i from 0 to n minus 1 and j from i plus 1 to n minus 1. Still, this summation 1 plus 2 plus 3 up to n is O of n square, and this is something we will see often, so it is useful to remember.

(Refer Slide Time: 05:10) Here is another example with nested loops, and this one has 3 nested loops. What we are trying to do is multiply two square matrices A and B: we have two n by n matrices A and B and we are trying to compute the product C. In this product C, if I want the (i, j)'th entry, I look at row i of the first matrix and column j of the second matrix, multiply them entry by entry - the first entry of the row with the first entry of the column, the second with the second, and so on up to the last - and then add up all these products. That is what this program does: for each row, i equal to 0 to n minus 1, and for each column, j equal to 0 to n minus 1 - so it goes through all possible entries C i j. For each such entry I start by setting C i j to 0, and then I run through k equal to 0 to n minus 1, look at A i k, the kth element of the row, and B k j, the kth element of the column, multiply them and add the product to C i j. So the outer loop has size n, the middle loop has size n, and the innermost loop has size n, and this is order n cube. This is a natural example of an n cube algorithm.

(Refer Slide Time: 06:33) Our final iterative example finds the number of bits in the binary representation of n; this is just the same as repeatedly dividing n by 2 until we reach 1. Let us assume that n is a positive number, so n is at least 1. We assume to begin with that the number of bits is at least 1, so count starts at 1, and then, so long as we have a number which is bigger than 1, we add one more to the count and replace n by n by 2, where the division is integer division. For instance, suppose we start with a number like 9: we start with count equal to 1; since 9 is bigger than 1, count becomes 2 and n becomes 4; 4 is still bigger than 1, so count becomes 3 and n becomes 2; 2 is still bigger than 1, so count becomes 4 and n becomes 1. Now the loop exits and I return count. It says that we require 4 bits to represent the number 9, which is correct, because 9 in binary is 1001. So what is the complexity of this loop? How many times does it execute? It will execute as many times as it takes for n to come down to 1: n, n by 2, n by 4, and so on. So, how many times should I divide n by 2 to reach 1? Going backwards, this is the same as asking how many times I should multiply 1 by 2 to reach n.
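Sketches of these two loops in the same style (the names are illustrative; the log-time analysis of the second loop is completed just below):

```python
def matrix_multiply(A, B):
    """O(n^3): three nested loops of size n for n x n matrices."""
    n = len(A)
    C = [[0] * n for _ in range(n)]
    for i in range(n):                  # each row of A
        for j in range(n):              # each column of B
            for k in range(n):          # dot product of row i with column j
                C[i][j] += A[i][k] * B[k][j]
    return C

def number_of_bits(n):
    """Bits in the binary representation of n >= 1; about log2(n) iterations."""
    count = 1
    while n > 1:
        count += 1
        n = n // 2                      # integer division, as in the lecture
    return count

print(number_of_bits(9))                # prints 4, since 9 is 1001 in binary
```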
So, dividing n by 2 repeatedly to reach 1 is the same as multiplying 1 by 2 repeatedly to reach n, and this is nothing but the definition of the log: what power of 2 reaches n. So, although this iterative loop does not decrement by 1 but instead halves n each time, we can still calculate explicitly that it requires about log to the base 2 of n steps.

(Refer Slide Time: 09:00) So, we have seen iterative examples of linear time, quadratic time and cubic time - n, n square, n cube - and also an iterative example taking log n time. Now let us look at one recursive example to see how we would do the analysis when we have a recursive solution. We will not look at a formal algorithm, but rather at a puzzle: the well known Towers of Hanoi. In the Towers of Hanoi puzzle, as we see in this picture, we have 3 wooden pegs, which we will call for the moment A, B and C, and n disks stacked on peg A; our goal is to move these n disks from A to B. The thing we are not allowed to do is to put a larger disk on a smaller disk. So if we take the smallest disk and move it to B, then we must take the second disk and move it to C, because we cannot put the second disk on top of the first disk. The goal is to do this in an effective way. The actual goal is to move everything from A to B, and C is an intermediate peg, because, as we saw, if we move the first disk from A to B we are stuck - we cannot move anything else ((Refer Time: 10:09)). So you must use C as a kind of transit, a temporary auxiliary peg, in order to do this job. If you have not seen this problem before, you might want to think about it in your spare time, but it is a very classical puzzle and it has a very standard recursive solution.

(Refer Slide Time: 10:27) The standard recursive solution is the following. You first assume that you know how to solve the problem for n minus 1 disks. At this moment you want to move n disks from A to B. So what you do is first move n minus 1 disks from A to C, using B as the transit peg, so that on A only the bottom disk is left, B is empty, and the other n minus 1 disks are now on C. Then you move the bottom disk from A to B, so A no longer has anything on it. So, now, I have the biggest disk on B, so I can put anything on it and I ha