Algorithm Complexity - Part I
Université Ferhat Abbas de Sétif
Dr. Ahlem Drif
Summary
This document details algorithm complexity, including time and space complexity analysis. It uses big O notation to represent growth rates as functions of input size.
Computer Sciences Department, Faculty of Sciences, Sétif 1 University (UFAS). The Algorithm Complexity - Part I. Presented by: Dr. Ahlem Drif. Section: 2nd year Computer Engineering.

Chapter I: Complexity analysis basics

I.1 What is complexity?
Complexity in algorithms refers to the amount of resources (such as time or memory) required to solve a problem or perform a task. The most common measure of complexity is time complexity, which refers to the amount of time an algorithm takes to produce a result as a function of the size of the input.

I.1.1 Time Complexity:
Definition: Time complexity is a measure of the amount of time an algorithm takes to complete as a function of the size of the input.
Notation: It is commonly expressed using big-O notation (e.g., O(n), O(n²)), representing an upper bound on the growth rate with respect to the input size.
Example: An algorithm with O(n) time complexity means that the time taken by the algorithm grows linearly with the size of the input.

I.1.2 Space Complexity:
Definition: Space complexity is a measure of the amount of memory an algorithm uses as a function of the size of the input.
Notation: Like time complexity, it is expressed using big-O notation (e.g., O(n), O(n²)), representing an upper bound on the growth rate with respect to the input size.
Example: An algorithm with O(n) space complexity means that the amount of memory required by the algorithm grows linearly with the size of the input.

The goal in analyzing algorithmic complexity is typically to choose or design algorithms that are efficient in terms of time and space. Different algorithms may have different trade-offs; some may be more time-efficient but consume more memory, while others may optimize for space but take longer to run.
I.2 Big-O, big-Omega, big-Theta, little-o, little-omega notations

Notations:

1- Big-O notation:
Mathematical definition: f(n) = O(g(n)) if ∃ c, n₀ such that ∀ n ≥ n₀, f(n) ≤ cg(n)
Translation: f(n) is O(g(n)) if there exist two constants c and n₀ such that for every n greater than or equal to n₀, f(n) is smaller than or equal to cg(n).

2- Big-Omega notation:
Mathematical definition: f(n) = Ω(g(n)) if ∃ c, n₀ such that ∀ n ≥ n₀, f(n) ≥ cg(n)
Translation: f(n) is Ω(g(n)) if there exist two constants c and n₀ such that for every n greater than or equal to n₀, f(n) is greater than or equal to cg(n).

3- Big-Theta notation:
Mathematical definition: f(n) = Θ(g(n)) if ∃ c₁, c₂, n₀ such that ∀ n ≥ n₀, c₁g(n) ≤ f(n) ≤ c₂g(n)
Translation: f(n) is Θ(g(n)) if there exist three constants c₁, c₂, and n₀ such that for every n greater than or equal to n₀, f(n) is greater than or equal to c₁g(n) and smaller than or equal to c₂g(n).

4- Little-o notation:
Mathematical definition: f(n) = o(g(n)) if ∀ c > 0 ∃ n₀ such that ∀ n ≥ n₀, f(n) < cg(n)
Translation: f(n) is o(g(n)) if, for every c greater than 0, there exists a constant n₀ such that for every n greater than or equal to n₀, f(n) is smaller than cg(n).

5- Little-omega notation:
Mathematical definition: f(n) = ω(g(n)) if ∀ c > 0 ∃ n₀ such that ∀ n ≥ n₀, f(n) > cg(n)
Translation: f(n) is ω(g(n)) if, for every c greater than 0, there exists a constant n₀ such that for every n greater than or equal to n₀, f(n) is greater than cg(n).

Examples of common growth rates:
O(1) - Constant Time: The algorithm's performance is constant, regardless of the input size.
O(log n) - Logarithmic Time: The algorithm's performance grows logarithmically with the input size.
O(n) - Linear Time: The algorithm's performance grows linearly with the input size.
O(n log n) - Linearithmic Time: Common in efficient sorting algorithms like merge sort and quicksort.
O(n²) - Quadratic Time: The performance is proportional to the square of the input size.

Choosing an algorithm with an appropriate time and space complexity is crucial for solving problems efficiently, especially when dealing with large datasets or resource-constrained environments.
I.2 Best, average, and worst case

When analyzing algorithmic complexity, three important scenarios are often considered: best case, average case, and worst case. These cases help provide a more comprehensive understanding of an algorithm's behavior under different conditions:

1. Best Case:
Definition: The best-case time complexity represents the minimum amount of time an algorithm takes to complete for a given input.
Scenario: This scenario occurs when the algorithm performs exceptionally well, typically when the input is already in an optimal or sorted state.
Notation: Represented using big-O notation as O(g(n)), where g(n) is the minimum time required for an input of size n.

2. Average Case:
Definition: The average-case time complexity represents the average amount of time an algorithm takes to complete over all possible inputs.
Scenario: This scenario considers the expected performance over a range of inputs, taking into account the likelihood of each input.
Notation: Represented using big-O notation as O(g(n)), where g(n) is the average time required for an input of size n.

3. Worst Case:
Definition: The worst-case time complexity represents the maximum amount of time an algorithm takes to complete for a given input.
Scenario: This scenario considers the most unfavorable conditions for the algorithm, typically when the input is in the least favorable state.
Notation: Represented using big-O notation as O(g(n)), where g(n) is the maximum time required for an input of size n.

Chapter II: Complexities hierarchy

I.3 Complexities hierarchy

In computational complexity theory, a complexity class is a set of computational problems "of related resource-based complexity". The two most commonly analyzed resources are time and memory. In general, a complexity class is defined in terms of a type of computational problem, a model of computation, and a bounded resource like time or memory.
In particular, most complexity classes consist of decision problems that are solvable with a Turing machine, and are differentiated by their time or space (memory) requirements. For instance, the class P is the set of decision problems solvable by a deterministic Turing machine in polynomial time. There are, however, many complexity classes defined in terms of other types of problems (e.g. counting problems and function problems) and using other models of computation.

I.4 Complexity classes

I.4.1 P Class:
The P in the P class stands for Polynomial Time. It is the collection of decision problems (problems with a "yes" or "no" answer) that can be solved by a deterministic machine in polynomial time.
Features:
The solution to P problems is easy to find.
P is often described as the class of computational problems that are solvable and tractable. Tractable means that the problems can be solved in theory as well as in practice. Problems that can be solved in theory but not in practice are known as intractable.
This class contains many problems:
1. Calculating the greatest common divisor.
2. Finding a maximum matching.
3. Merge sort.

I.4.2 NP Class:
The NP in NP class stands for Non-deterministic Polynomial Time. It is the collection of decision problems that can be solved by a non-deterministic machine in polynomial time.
Features:
The solutions of NP-class problems are hard to find, since they are produced by a non-deterministic machine, but they are easy to verify: problems in NP can be verified by a Turing machine in polynomial time.
Example: Let us consider an example to better understand the NP class. Suppose there is a company with a total of 1000 employees, each with a unique employee ID. Assume that there are 200 rooms available for them. A selection of 200 employees must be paired together, but the CEO of the company has a list of employees who can't work in the same room due to personal reasons.
This is an example of an NP problem, since it is easy to check whether a given choice of 200 employees proposed by a coworker is satisfactory or not, i.e. that no pair taken from the coworker's list appears on the list given by the CEO. But generating such a list from scratch seems to be so hard as to be completely impractical. This indicates that if someone provides us with a candidate solution, we can determine whether it is correct or incorrect in polynomial time. Thus, for an NP-class problem, a proposed answer can be verified in polynomial time.

This class contains many problems that one would like to be able to solve effectively:
1. Boolean Satisfiability Problem (SAT).
2. Hamiltonian Path Problem.
3. Graph coloring.

What is a Hamiltonian Cycle?
A Hamiltonian Cycle (or Circuit) in a graph G is a cycle that visits every vertex of G exactly once and returns to the starting vertex. If a graph contains a Hamiltonian cycle, it is called a Hamiltonian graph; otherwise it is non-Hamiltonian.

Finding a Hamiltonian Cycle in a graph is a well-known NP-complete problem, which means that there is no known efficient algorithm to solve it for all types of graphs. However, it can be solved for small or specific types of graphs. The Hamiltonian Cycle problem has practical applications in various fields, such as logistics, network design, and computer science.

A Hamiltonian Path in a graph G is a path that visits every vertex of G exactly once; a Hamiltonian Path does not have to return to the starting vertex. It is an open path. Similar to the Hamiltonian Cycle problem, finding a Hamiltonian Path in a general graph is also NP-complete and can be challenging. However, it is often an easier problem than finding a Hamiltonian Cycle. Hamiltonian Paths have applications in various fields, such as finding optimal routes in transportation networks, circuit design, and graph theory research.
Problem Statement: Given an undirected graph, the task is to determine whether the graph contains a Hamiltonian cycle or not. If it does, print the path.

Example:
Input: graph[][] = {{0, 1, 0, 1, 0}, {1, 0, 1, 1, 1}, {0, 1, 0, 0, 1}, {1, 1, 0, 0, 1}, {0, 1, 1, 1, 0}}
Output: {0, 1, 2, 4, 3, 0}

Input: graph[][] = {{0, 1, 0, 1, 0}, {1, 0, 1, 1, 1}, {0, 1, 0, 0, 1}, {1, 1, 0, 0, 0}, {0, 1, 1, 0, 0}}
Output: Solution does not exist

Naive Algorithm: This problem can be solved using the following idea: generate all possible configurations of vertices and print a configuration that satisfies the given constraints. There will be n! (n factorial) configurations, so the overall time complexity of this approach is O(n!).

Hamiltonian Cycle using Backtracking Algorithm:
Create an empty path array and add vertex 0 to it. Add other vertices, starting from vertex 1. Before adding a vertex, check whether it is adjacent to the previously added vertex and not already added. If we find such a vertex, we add it as part of the solution. If we do not find such a vertex, we return false.

Illustration:
Input: graph[][] = {{0, 1, 0, 1, 0}, {1, 0, 1, 1, 1}, {0, 1, 0, 0, 1}, {1, 1, 0, 0, 1}, {0, 1, 1, 1, 0}}
Start with node 0 and apply DFS to look for a Hamiltonian path. When the base case is reached (i.e. the total number of nodes traversed equals V, the total number of vertices), check whether the current node is a neighbour of the starting node.
As node 2 and node 0 are not neighbours of each other, return from it. Since no cycle is found along the path {0, 3, 1, 4, 2}, return from node 2, then from node 4.
Now explore another option for node 1 (i.e. node 2). When the base condition is hit, check again for a Hamiltonian cycle. As node 4 is not a neighbour of node 0, again no cycle is found, so return. Return from node 4, node 2, node 1.
Now, explore other options for node 3. The Hamiltonian cycle is found: in the path {0, 3, 4, 2, 1, 0} we get a cycle, as node 1 is a neighbour of node 0. So print this cyclic path: this is our Hamiltonian cycle.

I.4.3 Co-NP Class
Co-NP stands for the complement of the NP class. It means that if the answer to a problem in Co-NP is "no", then there is a proof that can be checked in polynomial time.
Features:
If a problem X is in NP, then its complement X' is in Co-NP.
For a problem to be in NP or Co-NP, there is no need to verify all answers in polynomial time; it is enough to verify one particular answer ("yes" for NP, "no" for Co-NP) in polynomial time.
Some example problems for Co-NP are:
1. Primality testing.
2. Integer factorization.

I.4.4 NP-hard Class
An NP-hard problem is at least as hard as the hardest problem in NP; it is the class of problems such that every problem in NP reduces to them.
Features:
Not all NP-hard problems are in NP, and it can take a long time to check them: if a solution for an NP-hard problem is given, it may take a long time to check whether it is right or not.
A problem A is NP-hard if, for every problem L in NP, there exists a polynomial-time reduction from L to A.
Some examples of NP-hard problems are:
1. The Halting problem.
2. Quantified Boolean formulas.
3. The "no Hamiltonian cycle" problem.

I.4.5 NP-complete Class
A problem is NP-complete if it is both in NP and NP-hard. NP-complete problems are the hardest problems in NP.
Features:
NP-complete problems are special, as any problem in the NP class can be transformed or reduced into an NP-complete problem in polynomial time.
If one could solve an NP-complete problem in polynomial time, then one could also solve any NP problem in polynomial time.
Some example problems include:
1. Hamiltonian Cycle.
2. Satisfiability.
3. Vertex cover.
Complexity Class | Characteristic feature
P                | Easily solvable in polynomial time.
NP               | "Yes" answers can be checked in polynomial time.
Co-NP            | "No" answers can be checked in polynomial time.
NP-hard          | Not all NP-hard problems are in NP, and it takes a long time to check them.
NP-complete      | A problem that is both in NP and NP-hard is NP-complete.

I.5 How is complexity calculated?

I.5.1 How to analyze the time and space complexity of an algorithm
See Assignments 1-2-3 (TD).

I.6 Complexity analysis examples

I.6.1 Linear search algorithm
A linear search algorithm, also known as a sequential search, is a simple method for finding a particular value in a list. The basic idea is to iterate through each element in the list, starting from the beginning, and compare it with the target value until a match is found or the end of the list is reached.

Here's a step-by-step description of the linear search algorithm:
1. Start from the first element in the list.
2. Compare the current element with the target value.
3. If the current element is equal to the target value, the search is successful, and the index or position of the element is returned.
4. If the current element is not equal to the target value, move to the next element in the list.
5. Repeat steps 2-4 until the target value is found or the end of the list is reached.
6. If the target value is not found after checking all elements, return a special value (such as -1) to indicate that the element is not in the list.

The time complexity of the linear search algorithm is O(n), where n is the number of elements in the list. This notation indicates that the time taken by the algorithm grows linearly with the size of the input. In a linear search, each element in the list is checked at most once until either the target element is found or the end of the list is reached.
Therefore, the number of operations performed by the algorithm is directly proportional to the size of the input. In the worst-case scenario, where the target element is at the end of the list or is not present in the list at all, the linear search algorithm needs to iterate through all n elements. The average-case time complexity is also O(n) because, in general, the algorithm must inspect a constant fraction of the elements before finding the target or determining that it is not present; the best case, by contrast, is O(1), occurring when the target is the very first element.

I.6.2 Binary search algorithm
Binary search is a more efficient algorithm for finding a specific target value in a sorted array or list. It follows a divide-and-conquer strategy to narrow down the search range and quickly locate the target. Here's how the binary search algorithm works:
1. Start with the entire sorted array: define a search space that includes the entire array.
2. Compare with the middle element: find the middle element of the current search space.
3. Three possible cases:
   - If the middle element is equal to the target, the search is successful, and the index is returned.
   - If the middle element is greater than the target, repeat the search in the left half of the current search space.
   - If the middle element is less than the target, repeat the search in the right half of the current search space.
4. Repeat steps 2-3 until the target is found or the search space is empty (indicating that the target is not in the array).

The time complexity of the binary search algorithm is O(log n), where n is the number of elements in the sorted array or list.

I.6.3 KMP algorithm
The Knuth-Morris-Pratt (KMP) algorithm is a string-searching algorithm that efficiently finds occurrences of a pattern within a text.
The KMP algorithm avoids unnecessary character comparisons by precomputing a prefix function that indicates the length of the longest proper prefix (which is also a suffix) for each position in the pattern. This information is then used during the search process to determine where the next comparison should start.

1. Compute the prefix function: create an auxiliary array (often called the "prefix function" or "failure function") to store this information about the pattern.
2. Initialize pointers: initialize two pointers, one for the pattern and one for the text.
3. Search process: while comparing characters in the pattern and the text:
   - If the characters match, move both pointers to the next positions.
   - If the characters do not match, use the prefix function to determine how far to shift the pattern pointer.
4. Repeat until a match or the end of the text: repeat the process until a match is found or the end of the text is reached.

The key to the efficiency of the KMP algorithm lies in the construction and use of the prefix function, which allows the algorithm to skip unnecessary comparisons when a mismatch occurs. The time complexity of the Knuth-Morris-Pratt (KMP) algorithm is O(n + m), where n is the length of the text and m is the length of the pattern being searched for. In the worst case, the algorithm performs a linear number of character comparisons, making it highly efficient for string searching. The linear time complexity of O(n + m) is an improvement over naive string-searching algorithms, which have a worst-case time complexity of O(n * m). The KMP algorithm's efficiency comes from its ability to skip unnecessary character comparisons by exploiting the information stored in the prefix function.
#include <iostream>
#include <string>
using namespace std;

// Build the prefix (failure) table: lps[i] = length of the longest
// proper prefix of p[0..i] that is also a suffix of p[0..i].
void computePrefix(const string& p, int m, int lps[]) {
    int l = 0;       // length of the previous longest prefix suffix
    lps[0] = 0;      // lps[0] is always 0
    // calculating lps[i] for i = 1 to m-1
    int i = 1;
    while (i < m) {
        if (p[i] == p[l]) {
            l++;
            lps[i] = l;
            i++;
        } else {
            if (l != 0) {
                l = lps[l - 1];   // fall back; do not advance i
            } else {
                lps[i] = 0;
                i++;
            }
        }
    }
}

// Print the index of every occurrence of pattern p in text S.
void kmpPatternSearch(const string& p, const string& S) {
    int m = p.length();
    int n = S.length();
    // creating the lps array (variable-length array, as in the original)
    int lps[m];
    // finding the prefix table
    computePrefix(p, m, lps);
    int i = 0;       // index into the text S
    int j = 0;       // index into the pattern p
    while (i < n) {
        if (p[j] == S[i]) {
            j++;
            i++;
        }
        if (j == m) {            // full match ending at i - 1
            cout << "Pattern found at index " << i - j << endl;
            j = lps[j - 1];      // continue searching for further matches
        } else if (i < n && p[j] != S[i]) {
            if (j != 0)
                j = lps[j - 1];  // shift the pattern using the prefix table
            else
                i++;
        }
    }
}

int main() {
    kmpPatternSearch("aba", "ababcaba");   // occurrences at indices 0 and 5
    return 0;
}