EVSU Module 1 - Part 1 IT 213 Data Structure and Algorithm.pdf
Eastern Samar State University
Data Structure & Algorithm Course Module

SALVADOR L. FLORES JR., MSIT
Assistant Professor II, College of Information Technology

VISION
A provider of relevant and quality education to a society where citizens are competent, skilled, dignified and community-oriented.

MISSION
An academic institution providing technological, professional, research and extension programs to form principled men and women with competencies and skills responsive to local and global development needs.

QUALITY POLICY
Northwest Samar State University commits to provide quality outcomes-based education, research, extension and production through continual improvement of all its programs, thereby producing world-class professionals.

CORE VALUES
Resilience. Integrity. Service. Excellence.

INSTITUTIONAL GRADUATE OUTCOMES
Creative and critical thinkers
Life-long learners
Effective communicators
Morally and socially upright individuals

ITE 2: Computer Programming 1 Page 2 of 30

Rationale
This module contains comprehensive exercises in Java programming that guide students in creating programs, understanding problems, and testing and debugging through specific activities and assignments. Use this module as your basic procedural manual for both online and offline activities. It equips learners with the knowledge and information needed to complete the course requirements, so students are advised to scan, review and study it religiously. Learners are guided by the learning plan reflected in this module.
The contents are the following:
o Let’s hit these (objectives)
o Let’s get started (enabling activity)
o Let’s find out (analysis)
o Let’s read (readings for enrichment)
o Let’s remember (summary of the lesson)
o Let’s do this (formative assessment)
o Let’s check (answer key)
o Let’s rate (rubric evaluation)
o Let’s reflect (module improvement)
o Let’s chat (communication)

Course Code: IT 213
Course Title: Data Structure and Algorithm

Course Description: This course covers standard data representations and algorithms for solving computing problems efficiently (with respect to space requirements and time complexity). It covers the following topics: arrays, linked lists, stacks, queues, trees, and graphs. Sorting and searching algorithms are discussed thoroughly. In this course, Java programs are used to implement the various concepts, but you can easily code them in another language such as C++, C#, or Python. At the end of the course, students are expected to be able to design and implement a program based on a given specification using any of the different data structures and algorithms, and to submit all activities (i.e., laboratory/hands-on exercises, classroom activities, and performance activities).

Course Outcomes: At the end of the course, the students must be able to:
Create, declare, implement, test and debug programs using arrays, strings, structures, pointers, heaps, stacks, memory, polymorphism and inheritance.
Assess and recommend revisions to another programmer’s code.

Course Content: As explained above, IT 213 introduces students to the concepts, theories, principles and practice of data structures, and discusses how they are used to build interventions that address computing problems.

The table below shows the outline of the topics to be discussed in the lecture per week vis-à-vis the course outcomes.
It is designed based on the course syllabus approved by the Dean of the College of Information Technology.

Course Requirements: As evidence of attaining the course outcomes, the student has to complete and submit the following:
Design a program that may be implemented as a solution to a given problem. Students must design and implement an appropriate solution by applying their knowledge of programming fundamentals, user-defined methods (including recursion), and parameter-passing mechanisms. (Individual project)
Submission of all activities (i.e., laboratory/hands-on exercises, classroom activities, and performance activities) in both hardcopy and softcopy. (Individual project)

Grading Criteria:
Requirement/Assessment Task − Percentage
Major Course Output: Program Exercises/Activities − 30%
Major Exams: Midterm & Final − 40%
Class Standing: Attendance − 10%; Quizzes − 20%
TOTAL − 100%

Course Materials:
Rubrics
Course policies

References:
Books:
Goldwasser, Goodrich & Tamassia. Data Structures & Algorithms in Java, 6th Edition. Wiley, 2013.
Bailey, Duane A. Java Structures: Data Structures in Java for the Principled Programmer, 7th Edition. 2007.
Anderson & Franceschi. Java Illuminated (An Active Learning Approach), Vol. 1.
Bronson, Gary J. Java Programming Principles and Practices for Scientists and Engineers. Cengage Learning, 2013.

Internet Resources:
https://www.geeksforgeeks.org/overview-of-data-structures-set-1-linear-data-structures/
https://www.tutorialspoint.com/data_structures_algorithms/algorithms_basics.htm
https://www.studytonight.com/data-structures/introduction-to-data-structures

Module 1
Module Title: Data Structure and Algorithm
Module Description: This module covers standard data representations and algorithms for solving computing problems efficiently (with respect to space requirements and time complexity).
The students are expected to have basic knowledge of computer programming using the Java language. This module helps them learn the new topics presented below.

Purpose of the Module: This module lets the students learn the following:
Overview of data structures and algorithms
What is a data structure
Primitives vs aggregates
Basic types of data structures
Characteristics of a data structure
Characteristics of an algorithm
Representing an algorithm using flowcharts and pseudocode
Measuring algorithm efficiency
Algorithm complexity
Asymptotic notations: Big Oh notation (O), Omega notation (Ω), Theta notation (θ)

Module Guide: Learners are advised to read the lessons and the suggested readings to gain more knowledge about the topic. The module teaches students how to properly use the new topics in Java programming. Students must also read, analyze, and carefully understand the sample programs in order to create their own. It is also good to read, practice and participate in the lessons, activities and exercises in order to understand the more complicated problems. The activities must be solved and submitted in order to pass the basic requirements.

Module Outcomes:
Understand standard data representations and algorithms for solving computing problems efficiently (with respect to space requirements and time complexity).
Assess and recommend revisions to another programmer’s code with regard to documentation and program-style standards that contribute to the readability and maintainability of software.

Module Requirements: At the end of this module, the students will come up with the following:
Create computer programs in the Java language using the new topics.
Declare and create arrays, strings, structures, pointers, heaps, stacks, memory, polymorphism and inheritance.

Learning Plan
Lesson No: 1
Lesson Title: Overview of Data Structures and Algorithms
Let’s Get Started: What is a data structure?
“[A] mathematical model for data types where a data type is defined by its behavior (semantics) from the point of view of a user of the data, specifically in terms of possible values, possible operations on data of this type, and the behavior of these operations.”

A data structure is a way of collecting and organizing data so that we can perform operations on the data effectively. Data structures are about arranging data elements in terms of some relationship, for better organization and storage. For example, a list structure supports operations such as:
Creating a new, empty list
Appending a value to the end of the list
Inserting a value within the list
Deleting a value from the list
Iterating over the list
Destroying the list

Suppose we want to keep track of cricket players, each with a name and an age. We can organize this data as a record, such as a Player record, which holds both the player's name and age. We can then collect and store Player records in a file or database as a data structure. For example: "Dhoni" 30, "Gambhir" 31, "Sehwag" 33.

If you are aware of object-oriented programming concepts, a class does the same thing: it collects different types of data under one single entity. The difference is that data structures also provide techniques for accessing and manipulating the data efficiently.

In simple language, data structures are structures programmed to store ordered data, so that various operations can be performed on them easily. A data structure represents how data is to be organized in memory. It should be designed and implemented in such a way that it reduces complexity and increases efficiency. A data structure is a particular way of organizing data in a computer so that it can be used effectively; the idea is to reduce the space and time complexity of different tasks. Put simply, a data structure is a systematic way to organize data in order to use it efficiently.

The following terms are the foundation terms of a data structure:
Interface − Each data structure has an interface.
The interface represents the set of operations that a data structure supports. An interface provides only the list of supported operations, the types of parameters they accept, and the return types of these operations.
Implementation − The implementation provides the internal representation of a data structure. It also provides the definitions of the algorithms used in the data structure's operations.

Types of Data Structures
There are two types of data structures:
Primitive data structures
Non-primitive data structures

Primitive Data Structures: The primitive data structures are the primitive data types. The int, char, float, double, and pointer are primitive data structures that can hold a single value.

Non-Primitive Data Structures: The non-primitive data structures are divided into two types:
Linear data structures
Non-linear data structures

Linear Data Structure: The arrangement of data in a sequential manner is known as a linear data structure. The data structures used for this purpose are arrays, linked lists, stacks, and queues. In these data structures, each element is connected to only one other element in a linear form.

Non-linear Data Structure: When one element is connected to 'n' other elements, the structure is known as a non-linear data structure. The best examples are trees and graphs. In this case, the elements are not arranged sequentially.

Data structures can also be classified as:
Static data structures − The size is allocated at compile time, so the maximum size is fixed.
Dynamic data structures − The size is allocated at run time, so the maximum size is flexible.

Major Operations
The major or common operations that can be performed on data structures are:
SEARCHING: We can search for any element in a data structure.
SORTING: We can sort the elements of a data structure in either ascending or descending order.
INSERTION: We can insert a new element into a data structure.
UPDATION: We can update an element, i.e., replace it with another element.
DELETION: We can perform a delete operation to remove an element from the data structure.

Data structures can also be classified on the basis of the following characteristics:
Characteristic − Description
Linear − The data items are arranged in a linear sequence. Example: array.
Non-Linear − The data items are not in sequence. Example: tree, graph.
Homogeneous − All the elements are of the same type. Example: array.
Non-Homogeneous − The elements may or may not be of the same type. Example: structures.
Static − The size, structure, and associated memory locations are fixed at compile time. Example: array.
Dynamic − The structure expands or shrinks depending on the program's needs during execution, and the associated memory locations can change. Example: a linked list created using pointers.

Characteristics of a Data Structure:
Correctness − A data structure implementation should implement its interface correctly.
Time Complexity − The running time or execution time of the data structure's operations must be as small as possible.
Space Complexity − The memory usage of the data structure's operations should be as little as possible.

Need for Data Structures: As applications are getting complex and data-rich, there are three common problems that applications face nowadays:
Data Search − Consider an inventory of 1 million (10^6) items in a store. If the application has to search for an item, it has to search through all 1 million items every time, slowing down the search. As data grows, search becomes slower.
Processor Speed − Processor speed, although very high, becomes a limitation if the data grows to billions of records.
Multiple Requests − With thousands of users searching the data simultaneously on a web server, even a fast server can fail while searching the data.

To solve the above-mentioned problems, data structures come to the rescue. Data can be organized in a data structure in such a way that not all items need to be examined, and the required data can be found almost instantly.

Execution Time Cases: There are three cases that are usually used to compare the execution times of various data structure operations in a relative manner:
Worst Case − The scenario where a particular data structure operation takes the maximum time it can take. If an operation's worst-case time is ƒ(n), then the operation will never take more than ƒ(n) time, where ƒ(n) is a function of the input size n.
Average Case − The scenario depicting the average execution time of an operation. If an operation takes ƒ(n) time on average, then m operations will take mƒ(n) time.
Best Case − The scenario depicting the least possible execution time of an operation. If an operation's best-case time is ƒ(n), then the actual operation will take at least ƒ(n) time; ƒ(n) is a lower bound.

Advantages of Data Structures:
EFFICIENCY: If the choice of data structure for implementing a particular ADT is proper, the program becomes very efficient in terms of time and space.
REUSABILITY: Data structures provide reusability: multiple client programs can use the same data structure.
ABSTRACTION: A data structure specified by an ADT also provides a level of abstraction. The client cannot see the internal workings of the data structure, so it does not have to worry about the implementation; the client sees only the interface.
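To make the execution-time cases above concrete, here is a small sketch in Java (the course's implementation language; the class and method names are our own, chosen for illustration) that counts the comparisons a linear search performs, so the best and worst cases can be observed directly:

```java
public class SearchCases {
    // Number of comparisons made by the most recent search.
    static int comparisons;

    // Scan the array front to back until the key is found (or the array ends).
    public static int linearSearch(int[] a, int key) {
        comparisons = 0;
        for (int i = 0; i < a.length; i++) {
            comparisons++;
            if (a[i] == key) {
                return i;
            }
        }
        return -1;   // key not present
    }

    public static void main(String[] args) {
        int[] a = {5, 8, 1, 9, 4};
        linearSearch(a, 5);   // best case: key is the first element
        System.out.println("best case comparisons: " + comparisons);    // 1
        linearSearch(a, 4);   // worst case: key is the last element
        System.out.println("worst case comparisons: " + comparisons);   // 5 (= n)
    }
}
```

Here the best case costs a single comparison and the worst case costs n comparisons; averaged over all possible positions of the key, a successful search costs about (n + 1)/2 comparisons, which is the average case.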
Data Structure Classification
Linear Data Structures: A data structure is called linear if all of its elements are arranged in linear order. In linear data structures, the elements are stored in a non-hierarchical way, where each element has a successor and a predecessor, except the first and last elements. The types of linear data structures are given below.

Arrays: An array is a collection of data items of the same type, and each data item is called an element of the array. The data type of the elements may be any valid data type such as char, int, float or double. The elements of an array share the same variable name, but each carries a different index number known as a subscript. An array can be one-dimensional, two-dimensional or multidimensional. The individual elements of an array named age are: age[0], age[1], age[2], ..., age[n-1].

Linked List: A linked list is a linear data structure used to maintain a list in memory. It can be seen as a collection of nodes stored at non-contiguous memory locations. Each node of the list contains a pointer to its adjacent node.

Stack: A stack is a linear list in which insertions and deletions are allowed only at one end, called the top. A stack is an abstract data type (ADT) that can be implemented in most programming languages. It is named a stack because it behaves like a real-world stack, for example a pile of plates or a deck of cards.

Queue: A queue is a linear list in which elements can be inserted only at one end, called the rear, and deleted only at the other end, called the front. It is an abstract data structure, similar to a stack. Because a queue is open at both ends, it follows First-In-First-Out (FIFO) order for storing and removing data items.

Non-Linear Data Structures: These data structures do not form a sequence; each item or element may be connected with two or more other items in a non-linear arrangement. The data elements are not arranged in a sequential structure.
The types of non-linear data structures are given below.

Trees: Trees are multilevel data structures with a hierarchical relationship among their elements, known as nodes. The bottommost nodes in the hierarchy are called leaf nodes, while the topmost node is called the root node. Each node contains pointers to its adjacent nodes. The tree data structure is based on the parent-child relationship among the nodes. Each node in the tree can have more than one child, except the leaf nodes, whereas each node can have at most one parent, except the root node. Trees can be classified into many categories, which will be discussed later in this module.

Graphs: Graphs can be defined as the pictorial representation of a set of elements (represented by vertices) connected by links known as edges. A graph differs from a tree in that a graph can have a cycle, while a tree cannot.

Operations on Data Structures
1) Traversing: Every data structure contains a set of data elements. Traversing the data structure means visiting each element in order to perform some specific operation, such as searching or sorting. Example: if we need to calculate the average of the marks obtained by a student in 6 different subjects, we traverse the complete array of marks to calculate the total sum, then divide that sum by the number of subjects, i.e., 6, in order to find the average.
2) Insertion: Insertion can be defined as the process of adding an element to the data structure at any location. If a data structure of capacity n already holds n elements, no further element can be inserted; this condition is called overflow.
3) Deletion: The process of removing an element from the data structure is called deletion. We can delete an element from the data structure at any location. If we try to delete an element from an empty data structure, underflow occurs.
4) Searching: The process of finding the location of an element within the data structure is called searching. There are two standard algorithms for searching: linear search and binary search. We will discuss each of them later in this module.
5) Sorting: The process of arranging the elements of a data structure in a specific order is known as sorting. Many algorithms can be used to perform sorting, for example insertion sort, selection sort, and bubble sort.
6) Merging: When two lists, List A of size M and List B of size N, containing elements of the same type, are joined to produce a third list, List C of size (M + N), the process is called merging.

What is an Algorithm?
An algorithm is a finite set of instructions or logic, written in order, to accomplish a certain predefined task. An algorithm is not the complete code or program; it is just the core logic (solution) of a problem, which can be expressed either as an informal high-level description, as pseudocode, or using a flowchart.

Every algorithm must satisfy the following properties:
Input − There should be 0 or more inputs supplied externally to the algorithm.
Output − There should be at least 1 output obtained.
Definiteness − Every step of the algorithm should be clear and well defined.
Finiteness − The algorithm should have a finite number of steps.
Correctness − Every step of the algorithm must generate a correct output.

Why do we need algorithms? We need algorithms for the following reasons:
An algorithm is a step-by-step procedure that defines a set of instructions to be executed in a certain order to get the desired output.
Algorithms are generally created independent of underlying languages, i.e., an algorithm can be implemented in more than one programming language.
Scalability: Algorithms help us understand scalability. When we have a big real-world problem, we need to break it down into small steps to analyze it easily.
Performance: The real world is not always easily broken down into smaller steps. If a problem can be easily broken into smaller steps, the problem is feasible.

Let's understand algorithms through a real-world example. Suppose we want to make lemon juice; the following are the steps required:
Step 1: First, cut the lemon in half.
Step 2: Squeeze the lemon as much as you can and collect its juice in a container.
Step 3: Add two tablespoons of sugar.
Step 4: Stir the container until the sugar dissolves.
Step 5: When the sugar has dissolved, add some water and ice.
Step 6: Chill the juice in the fridge for a few minutes.
Step 7: Now it is ready to drink.

This real-world example can be compared directly to the definition of an algorithm: we cannot perform step 3 before step 2; we need to follow this specific order to make lemon juice. An algorithm likewise says that each instruction should be followed in a specific order to perform a specific task.

Now we will look at an example of an algorithm in programming. We will write an algorithm to add two numbers entered by the user. The following are the steps required:
Step 1: Start.
Step 2: Declare three variables a, b, and sum.
Step 3: Enter the values of a and b.
Step 4: Add the values of a and b and store the result in the sum variable, i.e., sum = a + b.
Step 5: Print sum.
Step 6: Stop.

Characteristics of an Algorithm: Not all procedures can be called an algorithm. An algorithm should have the following characteristics:
Unambiguous − The algorithm should be clear and unambiguous. Each of its steps (or phases), and their inputs/outputs, should be clear and must lead to only one meaning.
Input − An algorithm should have 0 or more well-defined inputs.
Output − An algorithm should have 1 or more well-defined outputs, and they should match the desired output.
Finiteness − An algorithm must terminate after a finite number of steps.
Feasibility − An algorithm should be feasible with the available resources.
Independent − An algorithm should have step-by-step directions that are independent of any programming code.

How do you represent an algorithm?
Writing code before fully understanding its underlying algorithm can lead to bugs, so what is a better alternative? Two options are flowcharts and pseudocode.

Using FLOWCHARTS to represent algorithms: A flowchart is a visual representation of an algorithm's control flow. This representation illustrates statements that need to be executed, decisions that need to be made, logic flow (for iteration and other purposes), and terminals that indicate start and end points. Figure 1 shows the various symbols flowcharts use to visualize algorithms: statements, decisions, logic flow, and terminals.

Consider an algorithm that initializes a counter to 0, reads characters until a newline (\n) character is seen, increments the counter for each digit character that has been read, and prints the counter's value after the newline character has been read. The flowchart in Figure 2 illustrates this algorithm's control flow.

A flowchart's simplicity and its ability to present an algorithm's control flow visually (so that it is easy to follow) are its major advantages. Flowcharts also have several disadvantages, however:
It is easy to introduce errors or inaccuracies into highly detailed flowcharts because of the tedium associated with drawing them.
It takes time to position, label, and connect a flowchart's symbols, even with tools that speed up this process. This delay might slow your understanding of an algorithm.
Flowcharts belong to the structured programming era and are not as useful in an object-oriented context.
In contrast, the Unified Modeling Language (UML) is more appropriate for creating object-oriented visual representations.

Using PSEUDOCODE to represent algorithms: An alternative to flowcharts is pseudocode, a textual representation of an algorithm that approximates the final source code. Pseudocode is useful for quickly writing down an algorithm's representation. Because syntax is not a concern, there are no hard-and-fast rules for writing pseudocode; you should simply strive for consistency. Being consistent will make it much easier to translate the pseudocode into actual source code. For example, consider the following pseudocode representation of the previous counter-oriented flowchart:

    DECLARE CHARACTER ch
    DECLARE INTEGER count = 0
    DO
       READ ch
       IF ch GE '0' AND ch LE '9' THEN
          count = count + 1
       END IF
    UNTIL ch EQ '\n'
    PRINT count
    END

The pseudocode first presents a couple of DECLARE statements that introduce the variables ch and count, initialized to default values. It then presents a DO loop that executes UNTIL ch contains \n (the newline character), at which point the loop ends and a PRINT statement outputs count's value. For each loop iteration, READ causes a character to be read from the keyboard (or perhaps a file; in this case it doesn't matter what constitutes the underlying input source) and assigned to ch. If this character is a digit (one of 0 through 9), count is incremented by 1.

How to Write an Algorithm?
There are no well-defined standards for writing algorithms; rather, it is problem- and resource-dependent. Algorithms are never written to support a particular programming language. As we know, all programming languages share basic code constructs such as loops (do, for, while) and flow control (if-else). These common constructs can be used to write an algorithm. We usually write algorithms in a step-by-step manner, but that is not always the case. Algorithm writing is a process that is carried out after the problem domain is well defined; that is, we should know the problem domain for which we are designing a solution.
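As a sketch of how such pseudocode maps to real code, the counter algorithm above might be written in Java. For simplicity this version reads from a String instead of the keyboard, and the class and method names are our own illustrative choices:

```java
public class DigitCounter {
    // Count digit characters up to (not including) the first newline,
    // mirroring the counter flowchart/pseudocode described above.
    public static int countDigits(String input) {
        int count = 0;                       // DECLARE count = 0
        for (int i = 0; i < input.length(); i++) {
            char ch = input.charAt(i);       // READ ch
            if (ch == '\n') {                // UNTIL ch is the newline
                break;
            }
            if (ch >= '0' && ch <= '9') {    // is ch one of 0 through 9?
                count++;                     // count = count + 1
            }
        }
        return count;                        // caller can PRINT this value
    }

    public static void main(String[] args) {
        System.out.println(countDigits("a1b22c\nthese 3 digits are ignored"));  // prints 3
    }
}
```

Notice how each pseudocode statement has a direct counterpart in the source code; that is exactly why consistent pseudocode is easy to translate.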
Example: Let's try to learn algorithm writing by using an example.
Problem − Design an algorithm to add two numbers and display the result.

    Step 1 − START
    Step 2 − declare three integers a, b and c
    Step 3 − define the values of a and b
    Step 4 − add the values of a and b
    Step 5 − store the output of Step 4 in c
    Step 6 − print c
    Step 7 − STOP

Algorithms tell programmers how to code the program. Alternatively, the algorithm can be written as:

    START ADD
       get values of a and b
       c ← a + b
       display c
    STOP

In the design and analysis of algorithms, the second method is usually used to describe an algorithm. It makes it easy for the analyst to analyze the algorithm while ignoring all unwanted definitions; he can observe which operations are being used and how the process flows. Writing step numbers is optional.

We design an algorithm to get a solution to a given problem. A problem can be solved in more than one way, so many solution algorithms can be derived for a given problem. The next step is to analyze the proposed solution algorithms and implement the most suitable one.

Importance of Algorithms:
Theoretical importance: When any real-world problem is given to us, we break the problem into small modules. To break down the problem, we must know all of its theoretical aspects.
Practical importance: Theory cannot be completed without practical implementation, so the importance of algorithms is both theoretical and practical.

Choosing the Right Algorithm: The data structures and algorithms you use critically affect two factors in your applications:
1. Memory usage (for data structures).
2. CPU time (for algorithms that interact with those data structures).
It follows that you should be especially mindful of the algorithms and data structures you use for applications that will process lots of data. These include applications used for big data and the Internet of Things.
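To see why algorithm choice matters for CPU time, compare how many probes linear search and binary search make when looking for the same key in the same sorted array. This is an illustrative Java sketch (the names are our own); both search algorithms are discussed in detail later in the module:

```java
public class AlgorithmChoice {
    // Probe the array front to back; return the number of elements examined.
    public static int linearProbes(int[] a, int key) {
        int probes = 0;
        for (int i = 0; i < a.length; i++) {
            probes++;
            if (a[i] == key) {
                break;
            }
        }
        return probes;
    }

    // Repeatedly halve the search range; return the number of elements examined.
    // Requires the array to be sorted in ascending order.
    public static int binaryProbes(int[] a, int key) {
        int probes = 0, lo = 0, hi = a.length - 1;
        while (lo <= hi) {
            probes++;
            int mid = (lo + hi) / 2;
            if (a[mid] == key) {
                break;
            } else if (a[mid] < key) {
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        return probes;
    }

    public static void main(String[] args) {
        int[] sorted = new int[1000];
        for (int i = 0; i < sorted.length; i++) {
            sorted[i] = 2 * i;               // sorted data: 0, 2, 4, ..., 1998
        }
        System.out.println(linearProbes(sorted, 1998));  // 1000 probes
        System.out.println(binaryProbes(sorted, 1998));  // 10 probes
    }
}
```

On 1,000 sorted elements the worst case costs linear search 1,000 probes but binary search only about 10, because halving the range needs roughly log2(n) steps. The trade-off is that binary search demands sorted data, which is itself a cost to maintain.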
Balancing MEMORY and CPU: When choosing a data structure or algorithm, you will sometimes discover an inverse relationship between memory usage and CPU time: the less memory a data structure uses, the more CPU time the associated algorithms need to process its data items; and the more memory it uses, the less CPU time the associated algorithms need, leading to faster results. As much as possible, you should strive to balance memory use with CPU time. You can simplify this task by analyzing algorithms to determine their efficiency: how well does one algorithm perform against another of a similar nature? Answering this question will help you make good choices when several algorithms are available.

Approaches of Algorithm: The following are the approaches used after considering both the theoretical and practical importance of designing an algorithm:

Brute force algorithm: Applies the general logic structure to design an algorithm. It is also known as an exhaustive search algorithm, because it searches all possibilities to provide the required solution. Such algorithms are of two types:
o Optimizing: Find all the solutions of a problem and then take the best one; if the value of the best solution is known in advance, the search terminates as soon as that value is found.
o Sacrificing: As soon as an acceptable solution is found, the search stops.

Divide and conquer: Breaks the problem down into smaller subproblems, solves each subproblem (often recursively), and then combines the subproblem solutions, so that valid output is produced for valid input. The output of one stage is passed on to the next.

Greedy algorithm: An algorithm paradigm that makes the locally optimal choice on each iteration in the hope of reaching the best overall solution.
It is easy to implement and has a fast execution time, but there are very few cases in which it actually provides the optimal solution.

Dynamic programming: Makes the algorithm more efficient by storing intermediate results. It follows five steps to find the optimal solution for the problem:
1. Break the problem down into subproblems.
2. Find the optimal solutions to these subproblems.
3. Store the results of the subproblems; this is known as memoization.
4. Reuse the stored results so the same subproblems are not recomputed.
5. Finally, compute the result of the overall problem.

Branch and bound algorithm: The branch and bound approach can be applied only to integer programming problems. It divides the set of feasible solutions into smaller subsets, which are then evaluated further to find the best solution.

Randomized algorithm: Algorithms that have a defined set of inputs and required outputs, and follow clearly described steps, are known as deterministic algorithms. What happens when randomness is introduced? In a randomized algorithm, some random bits are introduced by the algorithm and added to the input to produce the output, which is random in nature. Randomized algorithms are often simpler and more efficient than their deterministic counterparts.

Backtracking: An algorithmic technique that solves a problem recursively and discards a partial solution as soon as it fails to satisfy the constraints of the problem.

ALGORITHM ANALYSIS: The efficiency of an algorithm can be analyzed at two different stages, before implementation and after implementation:
A Priori Analysis − A theoretical analysis of an algorithm.
Efficiency of the algorithm is measured by assuming that all other factors (for example, processor speed) are constant and have no effect on the implementation.

A Posteriori Analysis − This is an empirical analysis of an algorithm. The selected algorithm is implemented in a programming language and executed on a target machine. In this analysis, actual statistics such as running time and space required are collected.

Measuring algorithm efficiency: Some algorithms perform better than others. For example, the Binary Search algorithm is almost always more efficient than the Linear Search algorithm, something you'll see for yourself in Part 2. You want to choose the most efficient algorithm for your application's needs, but that choice might not be as obvious as you would think. For instance, what does it mean if the Selection Sort algorithm (introduced in Part 2) takes 0.4 seconds to sort 10,000 integers on a given machine? That benchmark is valid only for that particular machine, that particular implementation of the algorithm, and that size of input data. As computer scientists, we use time complexity and space complexity to measure an algorithm's efficiency, distilling these into complexity functions that abstract away implementation and runtime-environment details. Complexity functions reveal how an algorithm's time and space requirements vary with the amount of input data.

Algorithm Complexity: Suppose X is an algorithm and n is the size of the input data. The time and space used by algorithm X are the two main factors that decide the efficiency of X.

Time Factor − Time is measured by counting the number of key operations, such as comparisons in a sorting algorithm. This measures an algorithm's time complexity, meaning how long the algorithm takes to complete.

Space Factor − Space is measured by counting the maximum memory space required by the algorithm.
This measures an algorithm's space complexity, meaning the amount of memory overhead the algorithm requires to perform its task.

An algorithm is said to be efficient and fast if it takes little time to execute and consumes little memory space. The performance of an algorithm is measured on the basis of the following properties:

Space Complexity
Time Complexity

The complexity of an algorithm f(n) gives the running time and/or the storage space required by the algorithm in terms of n, the size of the input data.

Space Complexity: It's the amount of memory space required by the algorithm during the course of its execution. Space complexity must be taken seriously on multi-user systems and in situations where limited memory is available. An algorithm generally requires space for the following components:

Instruction Space: The space required to store the executable version of the program. This space is fixed for a given program, though it varies with the number of lines of code.
Data Space: The space required to store the values of all constants and variables (including temporary variables).
Environment Space: The space required to store the environment information needed to resume a suspended function.

The space complexity of an algorithm represents the amount of memory space the algorithm requires over its life cycle. The space required by an algorithm is equal to the sum of the following two components:

A fixed part: space required to store certain data and variables that are independent of the size of the problem. For example, simple variables and constants, program size, etc.
A variable part: space required by variables whose size depends on the size of the problem. For example, dynamic memory allocation, recursion stack space, etc.
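To make the fixed and variable parts concrete, here is a small sketch of our own (the class and method names are not from the module): summing 1..n iteratively uses only a fixed handful of variables, while the recursive version consumes stack space that grows with n.

```java
// Illustrative sketch (our own naming): the same computation, two space profiles.
public class SpaceDemo {
    // Iterative sum of 1..n: a few simple variables only -- the fixed part
    // dominates, so the extra space is O(1).
    static long sumIterative(int n) {
        long total = 0;
        for (int i = 1; i <= n; i++) total += i;
        return total;
    }

    // Recursive sum of 1..n: each call adds a stack frame, so the recursion
    // stack is a variable part that grows with n -- O(n) extra space.
    static long sumRecursive(int n) {
        if (n == 0) return 0;
        return n + sumRecursive(n - 1);
    }

    public static void main(String[] args) {
        System.out.println(sumIterative(1000));  // 500500
        System.out.println(sumRecursive(1000));  // 500500, but ~1000 stack frames
    }
}
```

Both methods return the same value; only their memory behavior differs, which is exactly the distinction the two components above describe.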
Space complexity S(P) of any algorithm P is S(P) = C + SP(I), where C is the fixed part and SP(I) is the variable part of the algorithm, which depends on an instance characteristic I. Following is a simple example that illustrates the concept: here we have three variables A, B, and C and one constant, hence S(P) = 1 + 3. The actual space also depends on the data types of the given variables and constants, and it will be multiplied accordingly.

Time Complexity: The time complexity of an algorithm represents the amount of time the algorithm requires to run to completion. Time requirements can be defined as a numerical function T(n), where T(n) can be measured as the number of steps, provided each step consumes constant time. For example, the addition of two n-bit integers takes n steps. Consequently, the total computational time is T(n) = c ∗ n, where c is the time taken for the addition of two bits. Here, we observe that T(n) grows linearly as the input size increases. It's generally good practice to keep the required time to a minimum, so that our algorithm completes its execution in the minimum time possible.

ASYMPTOTIC ANALYSIS: As we know, a data structure is a way of organizing data efficiently, and that efficiency is measured in terms of either time or space. So, the ideal data structure is one whose operations take the least possible time and occupy the least possible memory. Our focus will be on finding time complexity rather than space complexity; by finding the time complexity, we can decide which data structure is best for an algorithm. The main question is: on what basis should we compare the time complexity of data structures? Time complexity can be compared based on the operations performed on them. Let's consider a simple example.
Suppose we have an array of 100 elements, and we want to insert a new element at the beginning of the array. This is a tedious task, because we must first shift every element one position to the right before adding the new element at the start of the array. Now suppose we use a linked list to add the element at the beginning. A linked-list node contains two parts, i.e., the data and the address of the next node. We simply store the address of the current first node in the new node, and the head pointer then points to the newly added node. Therefore, we conclude that adding data at the beginning of a linked list is faster than with an array. In this way, we can compare data structures and select the best possible one for the operations to be performed.

How do we find the time complexity or running time for performing the operations? Measuring the actual running time is not practical at all. The running time of an operation depends on the size of the input. Let's understand this statement through a simple example. Suppose we have an array of five elements, and we want to add a new element at the beginning. To achieve this, we need to shift each element toward the right, and suppose each shift takes one unit of time. There are five elements, so five units of time are needed. If there are 1000 elements in the array, it takes 1000 units of time to shift. We conclude that time complexity depends on the input size. Therefore, if the input size is n, then f(n) is a function of n that denotes the time complexity.

How to calculate f(n)? Calculating the value of f(n) for smaller programs is easy, but for bigger programs it's not that easy. We can compare data structures by comparing their f(n) values.
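The array-versus-linked-list head insertion discussed above can be sketched as follows. The class and method names are our own, and a shift counter makes the "one unit of time per shift" cost visible: f(n) = n for the array, while the linked list needs no shifting at all.

```java
// Sketch (our own naming): inserting at the FRONT of an array vs. a linked list.
public class HeadInsert {
    static int shifts;  // counts element copies: the "units of time" above

    // Array version: every existing element must be copied one slot right -- f(n) = n.
    static int[] insertFront(int[] a, int value) {
        shifts = 0;
        int[] b = new int[a.length + 1];
        b[0] = value;
        for (int i = 0; i < a.length; i++) {
            b[i + 1] = a[i];
            shifts++;
        }
        return b;
    }

    static class Node {
        int data; Node next;
        Node(int data, Node next) { this.data = data; this.next = next; }
    }

    // Linked-list version: point the new node at the old head -- f(n) = 1.
    static Node insertFront(Node head, int value) {
        return new Node(value, head);
    }

    public static void main(String[] args) {
        insertFront(new int[1000], 42);
        System.out.println(shifts);  // 1000 shifts for a 1000-element array

        Node head = insertFront(new Node(2, null), 1);
        System.out.println(head.data);  // 1, inserted with no shifting at all
    }
}
```

Running this for arrays of different sizes shows the shift count growing in step with n, which is exactly what f(n) captures.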
We will find the growth rate of f(n), because one data structure might be better than another for a smaller input size but not for larger sizes. Now, how do we find f(n)? Let's look at a simple example.

f(n) = 5n² + 6n + 12

where n is the number of instructions executed, which depends on the size of the input. When n = 1, f(1) = 5 + 6 + 12 = 23, so:

% of running time due to 5n² = (5 / 23) * 100 = 21.74%
% of running time due to 6n = (6 / 23) * 100 = 26.09%
% of running time due to 12 = (12 / 23) * 100 = 52.17%

From the above calculation, it is observed that most of the time is taken by the constant 12. But since we have to find the growth rate of f(n), we cannot conclude that the maximum amount of time is always taken by 12. Let's assume different values of n to find the growth rate of f(n):

n       5n²      6n       12
1       21.74%   26.09%   52.17%
10      87.41%   10.49%   2.09%
100     98.79%   1.19%    0.02%
1000    99.88%   0.12%    0.0002%

As we can observe in the above table, as the value of n increases, the share of running time due to 5n² increases while the shares due to 6n and 12 decrease. For larger values of n, the squared term consumes almost 99% of the time. Since the n² term contributes most of the time, we can eliminate the other two terms. Therefore,

f(n) ≈ 5n²

Here, we get an approximate time complexity whose result is very close to the actual result. This approximate measure of time complexity is known as asymptotic complexity. We are not calculating the exact running time; we eliminate the unimportant terms and consider only the term that takes most of the time.

In mathematical analysis, the asymptotic analysis of an algorithm is a method of defining the mathematical foundation of its run-time performance. Using asymptotic analysis, we can easily conclude the average-case, best-case, and worst-case scenarios of an algorithm.
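The percentage table above can be reproduced with a short program. This is our own sketch (the class and method names are not from the module); it computes what share of f(n) = 5n² + 6n + 12 the squared term contributes.

```java
// Sketch (our own naming): share of f(n) = 5n^2 + 6n + 12 due to the 5n^2 term.
public class GrowthShare {
    static double squareShare(long n) {
        double f = 5.0 * n * n + 6.0 * n + 12.0;  // the whole function f(n)
        return 5.0 * n * n / f * 100.0;           // percentage from 5n^2 alone
    }

    public static void main(String[] args) {
        for (long n : new long[]{1, 10, 100, 1000}) {
            System.out.printf("n=%d  5n^2 share = %.2f%%%n", n, squareShare(n));
        }
        // The share climbs from ~21.74% at n=1 toward ~99.88% at n=1000,
        // which is why f(n) is approximated by its n^2 term.
    }
}
```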
It is used to mathematically calculate the running time of any operation inside an algorithm. For example, suppose the running time of one operation is f(n) and that of another is f(n²). The running time of the first operation will increase linearly as n increases, while the running time of the second will grow with the square of n. When n is significantly small, however, the running times of the two operations will be nearly the same. Usually, the time required by an algorithm falls under three types:

Worst case: the input for which the algorithm takes the longest time.
Average case: the average time taken for program execution.
Best case: the input for which the algorithm takes the least time.

ASYMPTOTIC NOTATIONS: The asymptotic notations commonly used for calculating the running-time complexity of an algorithm are given below:
1. Big Oh Notation (O)
2. Omega Notation (Ω)
3. Theta Notation (θ)

BIG Oh NOTATION (O): Big O notation is an asymptotic notation that measures the performance of an algorithm by simply providing the order of growth of the function. This notation provides an upper bound on a function, ensuring that the function never grows faster than the upper bound; that is, it gives the least upper bound on the function. It is the formal way to express the upper boundary of an algorithm's running time, and it measures the worst case of time complexity: the longest amount of time the algorithm may take to complete its operation. Formally: if f(n) and g(n) are two functions defined on the positive integers, then f(n) = O(g(n)) (read "f(n) is big oh of g(n)" or "f(n) is on the order of g(n)") if there exist constants c and n0 such that:

f(n) ≤ c.g(n) for all n ≥ n0

This implies that f(n) does not grow faster than g(n); in other words, g(n) is an upper bound on the function f(n).
In this case, we are calculating the growth rate of the function, which ultimately gives the worst-case time complexity, i.e., how badly an algorithm can perform. Let's understand this through examples.

Example 1: f(n) = 2n + 3, g(n) = n. Is f(n) = O(g(n))?

To check whether f(n) = O(g(n)), the condition f(n) ≤ c.g(n) must be satisfied. Replacing f(n) by 2n + 3 and g(n) by n:

2n + 3 ≤ c.n

Suppose c = 5. Then 2n + 3 ≤ 5n, which holds for every value of n starting from 1 (at n = 1, 5 ≤ 5). Therefore, f(n) = O(n) with c = 5 and n0 = 1.

Omega Notation (Ω): Omega notation describes the best-case scenario; it is the formal way to express the lower bound of an algorithm's running time. Formally, f(n) = Ω(g(n)) if there exist constants c and n0 such that f(n) ≥ c.g(n) for all n ≥ n0, with c > 0.

Let's consider a simple example. If f(n) = 2n + 3 and g(n) = n, is f(n) = Ω(g(n))? It must satisfy the condition f(n) ≥ c.g(n). Replacing f(n) by 2n + 3 and g(n) by n:

2n + 3 ≥ c*n

Suppose c = 1. Then 2n + 3 ≥ n, which is true for every value of n starting from 1. Therefore, it is proved that f(n) = Ω(g(n)); that is, g(n) is a lower bound of the function 2n + 3. Graphically, g(n) lies below f(n) when the value of c is 1.

This notation therefore characterizes the fastest running time. But we are usually less interested in the fastest running time than in the worst-case scenario: we want to know, for large inputs, the worst time an algorithm can take so that we can make decisions accordingly.

Theta Notation (θ): Theta notation mainly describes the average-case scenario. It represents the realistic time complexity of an algorithm: in real-world problems an algorithm does not always perform at its worst or at its best but fluctuates between the two, and this gives us the average case. Big theta is mainly used when the worst-case and best-case values are the same. It is the formal way to express both the upper bound and the lower bound of an algorithm's running time. Mathematically: let f(n) and g(n) be functions of n, where n is the number of steps required to execute the program. Then f(n) = θ(g(n)) if and only if there exist positive constants c1, c2, and n0 such that:

c1.g(n) ≤ f(n) ≤ c2.g(n) for all n ≥ n0
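The bounds above can be sanity-checked numerically. This is our own sketch (class and method names are hypothetical): for f(n) = 2n + 3 and g(n) = n, c = 5 witnesses the Big O upper bound, c = 1 witnesses the Omega lower bound, and together they witness Theta with c1 = 1 and c2 = 5.

```java
// Numeric sanity check (our own sketch) of c1*g(n) <= f(n) <= c2*g(n)
// for f(n) = 2n + 3 and g(n) = n.
public class BoundsCheck {
    static long f(long n) { return 2 * n + 3; }

    // Verify c1*n <= f(n) <= c2*n for all n from 1 to maxN.
    static boolean boundsHold(long c1, long c2, long maxN) {
        for (long n = 1; n <= maxN; n++) {
            if (f(n) < c1 * n || f(n) > c2 * n) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(boundsHold(1, 5, 1_000_000));  // true: n <= 2n+3 <= 5n
        System.out.println(boundsHold(3, 5, 10));         // false: 3n > 2n+3 once n > 3
    }
}
```

A finite check like this is no proof, but it is a quick way to catch a wrongly chosen constant before attempting the algebra.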