Ch 01 - Algorithm Analysis (1).pdf

Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...

Full Transcript

School of Computer Science COMP 8547 “Advanced Computing Concepts” Fall 2024 Chapter 1: Algorithm Analysis DR. OLENA SYROTKINA Copyright Notice This content is protected and may not be shared, uploaded, or distributed. Any unauthorized distr...

School of Computer Science COMP 8547 “Advanced Computing Concepts” Fall 2024 Chapter 1: Algorithm Analysis DR. OLENA SYROTKINA Copyright Notice This content is protected and may not be shared, uploaded, or distributed. Any unauthorized distribution, reproduction or sharing of this material, including uploading to CourseHero, Chegg, StuDocu or other websites, is strictly prohibited and may be subject to legal action. Students are granted access to this material solely for their personal use and may not use it for any other purpose without the express written consent of the instructor. Thank you for respecting our intellectual property rights. 3 2 Get to know course team Course outline Introduction to computing Algorithms Experimental analysis Pseudocode Random access machine Agenda Functions in algorithm analysis Asymptotic analysis Asymptotic notation Big-Oh, Big-Omega, Big-Theta, Little-Oh, Little-Omega Examples Search Prefix averages Maximum contiguous subsequence sum Java and Eclipse 3 3 Course Team Instructor: Dr. Olena Syrotkina Email: [email protected] Office hours: Monday, 2:45 p.m. – 4:15 p.m. (300 Ouellette Ave., Office 4004); Wednesday, 2:30 p.m. – 4:00 p.m. (300 Ouellette Ave., Office 4004); Thursday, 2:30 a.m. – 3:30 p.m. (300 Ouellette Ave., Office 4004) Office location: 300 Ouellette Ave., Office 4004 Preferred communication: Teams Dr. Olena Syrotkina was born, raised and educated in the Ukrainian city of Dnipro. Her teaching career started in Dnipro University of Technology (formerly National Mining University, Dnipro, Ukraine) in 2011 where she taught various Computer Science courses for ten years. In 2018 she received her Ph.D. in Mathematical Simulation and Methods of Computation from the Ukrainian State University of Science and Technologies (formerly National Metallurgical Academy of Ukraine). In 2022 she joined the School of Computer Science at the University of Windsor where she is currently teaching COMP 8547 Advanced Advanced Database Topics courses (Summer 2022, Fall 2022), COMP 2067 Programming for Beginners (Summer 2023), COMP 2120 Object-Oriented Programming using Java (Summer 2023, Summer 2024), and COMP 2547 Applied Algorithms 3 Computing Concepts (Winter 2022, Fall 2022, Winter 2023, Fall 2023, Winter 2024, Summer 2024, Fall 2024), COMP 8157 and Data Structures. 4 Course Team Graduate Assistants: Ali Nakhaeisharif Darpan Khanna Sudharshan Gurpartap Singh Abhishek Mahajan Sundararajan Ahluwalia Email: Email: Email: [email protected] [email protected] Email: Email: [email protected] [email protected] [email protected] Office hours: Office hours: Office hours: Wednesday from 10 AM to 11 AM Wednesday from 2:30 PM to 3:30 PM Office hours: TBA Office hours: Wednesday from 3:30 PM to 4:30 PM 3 Monday from 11:00 AM to 12:00 PM 5 Course Outline This course covers advanced topics in principles and applications of algorithm design and analysis, programming techniques, advanced data structures, languages, compilers and translators, regular expressions, grammars, computing and intractability. Cases studies and applications in current programming languages are explored in class and labs. This course is restricted to students in the Master of Applied Computing program. Assessments ✓ Labs & Assignments ✓ Participation in seminars/workshops ✓ Random lecture quizzes ✓ Mid-term Exam ✓ Final project 3 6 COURSE EVALUATION You need to score 4% out of 4.5% possible (each lab = 0.5%, 9 labs in total). Labs 4% Labs should be completed during the class. Students are required to show a GA/TA their completed lab after the class and GA/TA marks it. Attendance in class is mandatory, and the plagiarism score should be below 50%. Please always read the submission requirements before turning in your lab. Grace period: 5 hours. Submitting the lab within this time frame is considered a regular submission with no penalties applied. No make-ups will be considered for missed labs. You will have 4 assignments in total. Assignments 21% Please always read the submission requirements before turning in your assignment. Any submission after the deadline will receive a penalty of 10% for the first 24 hrs, and so on, for up to three days (10% per day = 30% total). After three days, the mark will be zero. Grace period: 5 hours. Submitting the assignment within this time frame is considered a regular submission with no penalties applied. To receive your participation marks you are required to attend 10 Seminars/Workshops (CS Participation in 5% Workshops, Colloquiums, and Thesis defence or proposal) during the Fall 2024 term. You will seminars/workshops be required to register for the event and sign in after the event. The attendance will be tracked by Melissa and will be provided to the instructors at the end of the term to calculate your participation marks. Oct 25, 2024 (Tentative), in-person, the details will be announced on Brightspace Mid-Term Exam I 20% Nov 29, 2024 (Tentative), in-person, the details will be announced on Brightspace Mid-Term Exam II 20% Group work, the details will be announced on Brightspace Final project 25% These are unannounced quizzes regarding lectures to test the topics discussed in the lectures. Random lecture quizzes (RLQ) 5% RLQ will be conducted randomly at any time (before or after the class) and will contain multiple-choice questions or short answer questions. There are no make-ups for missed RLQ. 7 COURSE SCHEDULE Weeks Topics Dates Description & Deadlines 1 Algorithm Analysis Sec 1: Sept 9, 2024; Sec 2: Sept 11, Lab 1 (Introductory lab, not graded); 2024; Sec 3: Sept 12, 2024. P1: Forming a group for the final project (Deadline: Sec 1: Sept 16; Sec 2: Sept 18; Sec 3: Sept 19.) 2 Linear Data Structures Sec 1: Sept 16, 2024; Sec 2: Sept 18, Lab 2; 2024; Sec 3: Sept 19, 2024. P2: Choosing the variant of your final project (Sec 1: Sept 23; Sec 2: Sept 25; Sec 3: Sept 26) 3 Search Trees Sec 1: Sept 23, 2024; Sec 2: Sept 25, Lab 3; 2024; Sec 3: Sept 26, 2024. 4 Sorting Sec 1: Sept 30, 2024; Sec 2: Oct 2, Lab 4; Assignment 1; 2024; Sec 3: Oct 3, 2024. 5 Advanced Design and Sec 1: Oct 7, 2024; Sec 2: Oct 9, 2024; Lab 5; Analysis Sec 3: Oct 10, 2024. P3: Milestone Report I READING WEEK (October 12-20, 2024) 8 COURSE SCHEDULE Weeks Topics Dates Description & Deadlines 6 Graph Algorithms Sec 1: Oct 21, 2024; Sec 2: Oct 23, 2024; Lab 6; Assignment 2; Sec 3: Oct 24, 2024. 7 Text Processing Sec 1: Oct 28, 2024; Sec 2: Oct 29, 2024; Lab 7; Sec 3: Oct 31, 2024. 8 Languages Sec 1: Nov 4, 2024; Sec 2: Nov 6, 2024; Lab 8; Assignment 3; Sec 3: Nov 7, 2024. 9 Selected Topics I Sec 1: Nov 11, 2024; Sec 2: Nov 13, 2024; Lab 9; Sec 3: Nov 14, 2024. 10 Selected Topics II Sec 1: Nov 18, 2024; Sec 2: Nov 20, 2024; Lab 10; Assignment 4; Sec 3: Nov 21, 2024. 11 Final Project Defence (in- Nov 25-28, 2024 P5: Final project report, presentation, person at 300 Ouellette project source code (bonus + 1 pts for those Ave.) groups who will defend their projects during Week 11) 12 Final Project Defence (in- Dec 2-4, 2024 P5: Final project report, presentation, person at 300 Ouellette project source code (no bonus) Ave.) 9 COURSE POLICY Academic Honesty: ✓ Refer to UWindsor Policy Page. ✓ If you're not sure, ask the professor. ✓ Plagiarism will not be tolerated. All discussion and announcements will be on Brightspace course page. Missed Assessment Make-up: No make-ups will be considered for ✓ missing a mid-exam; ✓ missing labs; ✓ missing a random lecture quiz; ✓ missing a final project presentation; Late Assignment: 3 ✓ Any submission after the deadline will receive a penalty of 10% for the first 24 hrs, and so on, for up to three days. After three days, the mark will be zero. 10 ASSESSMENTS Labs: to understand the topics discussed in the lectures and to learn more about advanced algorithms and data structures. 1. Labs should be completed during the class. Students are required to show a GA/TA their completed lab after the class, and the GA/TA will mark it as 'done,' 'not done,' or 'partially done.’ 2. If a student misses the class, they will lose points for the lab, even if they submit it on Brightspace after the lab has concluded. 3. Each lab must be submitted through Brightspace in both *.java and *.txt formats (+ screenshots) and is subject to a plagiarism check. 4. 0.5 points will be awarded for each lab if two conditions are met: A) The student attended the lab and showed the result to the GA; B) The plagiarism check originality score does not exceed 50%. 5. No points will be awarded for labs submitted via email, Teams, or other platforms, for sending zip archives, or for failing to submit your code in *.txt files or screenshots. 11 Student Responsibilities Be polite in all dealings with the professor, the GA/TAs, and the other students. Come to the class on time and ready to participate in the learning process. If you miss any announcement, it is your responsibility to catch up on instruction you have missed. Make sure that you do not plagiarize in any assessments. Make sure to submit all assessments on time. 3 12 ASSESSMENTS Participation in seminars/workshops: To receive your participation marks you are required to attend a total of 10 Seminars/Workshops (CS Workshops, Colloquiums, and Thesis defence or proposal) during the Fall 2024 term. You will be required to register for the event, sign in and complete the QR code after the event. The attendance will be tracked by the Admin staff and will be provided to the instructors at the end of the term to calculate your participation marks. These events are regularly announced, and you can find the announcements at the following link, as well: https://www.uwindsor.ca/science/computerscience/event-calendar/month The participation is saved by registering to the attendance list (or scanning a QR code) available at each event. 13 ASSESSMENTS Random lecture quizzes (RLQ): These are quizzes regarding lectures to test the topics discussed in the lectures. RLQ will be conducted randomly at any time (before or after the class) and will contain multiple-choice questions. There are no make-ups for missed RLQ. 14 ASSESSMENTS Final project Milestone report I - 0 points* *There are no points for your milestone report. However, for not submitting their milestone reports, students will be faced a penalty of 10% deducted from their final project total mark. Presentation and Q&A (during your class in the 11th and 12th week) 15 TEXTBOOKS Introduction to Algorithms, fourth Algorithm Design and Data Structures and Algorithms in Java, edition by Thomas H. Cormen, Applications by M. Goodrich 6th Edition, by M. Goodrich and R. Charles E. Leiserson, Ronald L. and R. Tamassia, Wiley, 2015. Tamassia, Wiley, 2014 Rivest and Clifford Stein, The MIT Press, 2022. TEXTBOOKS (OPTIONAL) Data Structures and Algorithms An Open Guide to Data Made Easy: Data Structures and Structures and Algorithmic Puzzles, 5th Algorithms by Paul W. Edition, by Narasimha Bible and Lucas Moser, Karumanchi, CareerMonk PALNI Press, 2023. Publications, 2017. 17 Introduction to Computing Computing refers to the use of computers and other electronic systems to process, store, retrieve, and manage data. It encompasses a wide range of activities and technologies related to calculations, problem-solving, and information processing. At its core, computing involves the execution of instructions (usually in the form of algorithms) by a computer to perform specific tasks, such as calculations, data analysis, communication, and automation. Algorithms are the foundation of any computational process, and advanced computing concepts rely heavily on the efficiency and effectiveness of these algorithms to perform at scale or in novel ways. What is an Algorithm? Let us consider the problem of preparing an omelette. To prepare an omelette, we follow the steps given below: 1) Get the frying pan. 2) Get the oil. a. Do we have oil? i. If yes, put it in the pan. ii. If no, do we want to buy oil? 1. If yes, then go out and buy. 2. If no, we can terminate. 3) Turn on the stove, etc... What we are doing is, for a given problem (preparing an omelette), we are providing a step-by-step procedure for solving it. The formal definition of an algorithm can be stated as: An algorithm is the step-by-step unambiguous instructions to solve a given problem. 19 Algorithm Definition: An algorithm is a sequence of steps used to solve a problem in a finite amount of time Input Output Properties ▪ Correctness: must provide the correct output for every input ▪ Performance: Measured in terms of the resources used (time and space) ▪ End: must finish in a finite amount of time 20 Why the Analysis of Algorithms? To go from city “A” to city “B”, there can be many ways of accomplishing this: by flight, by bus, by train and also by bicycle. Depending on the availability and convenience, we choose the one that suits us. Similarly, in computer science, multiple algorithms are available for solving the same problem (for example, a sorting problem has many algorithms, like insertion sort, selection sort, quick sort and many more). Algorithm analysis helps us to determine which algorithm is most efficient in terms of time and space consumed. 21 Goal of the Analysis of Algorithms The goal of the analysis of algorithms is to compare algorithms (or solutions) mainly in terms of running time but also in terms of other factors (e.g., memory, developer effort, etc.) 22 What is Running Time Analysis? It is the process of determining how processing time increases as the size of the problem (input size) increases. Input size is the number of elements in the input, and depending on the problem type, the input may be of different types. The following are the common types of inputs. Size of an array A 21 22 23 25 24 23 22 Polynomial degree 0 1 2 3 4 5 6 Number of elements in a matrix Number of bits in the binary representation of the input Vertices and edges in a graph. 23 How to Compare Algorithms? To compare algorithms, let us define a few objective measures: Execution times? Not a good measure as execution times are specific to a particular computer. Number of statements executed? Not a good measure, since the number of statements varies with the programming language as well as the style of the individual programmer. Ideal solution? Let us assume that we express the running time of a given algorithm as a function of the input size n (i.e., f(n)) and compare these different functions corresponding to running times. This kind of comparison is independent of machine time, programming style, etc. 24 Experimental vs Theoretical analysis 9000 Experimental analysis: Write a program that implements the 8000 algorithm 7000 Run the program with inputs of varying 6000 size and composition Time (ms) 5000 Keep track of the CPU time used by the program on each input size 4000 Plot the results on a two-dimensional plot 3000 Limitations: 2000 Depends on hardware and programming 1000 language Need to implement the algorithm and 0 0 20 40 60 80 100 debug the programs Input Size 25 Theoretical analysis – main framework 2. Write algorithm in Advantages: pseudocode max  A Uses a high-level description of for i  1 to n − 1 do the algorithm instead of an if A[i]  max then max  A[i] implementation 1. Problem return max 3. Count primitive Characterizes running time as a Given array A operations Find max of A function of the input size, n. T(n) = 8n – 3 Takes into account all possible inputs Allows us to evaluate the speed of an algorithm independently 5. Compare 4. Find of the hardware/software with other asymptotic algorithms notation for environment T(n): Divide and conquer? O, ,,, ,  Recursive? Linear time? T(n) is (n) 26 What is Rate of Growth? The rate at which the running time increases as a function of input is called rate of growth. Let us assume that you go to a shop to buy a car and a bicycle. If your friend sees you there and asks what you are buying, then in general you say buying a car. This is because the cost of the car is high compared to the cost of the bicycle (approximating the cost of the bicycle to the cost of the car). Total Cost = cost_of_car + cost_of_bicycle Total Cost ≈ cost of car (approximation) For the above-mentioned example, we can represent the cost of the car and the cost of the bicycle in terms of function, and for a given function ignore the low order terms that are relatively insignificant (for large value of input size, n). As an example, in the case below, n4, 2n2, 100n and 500 are the individual costs of some function and approximate to n4 since n4 is the highest rate of growth. n⁴ + 2n² + 100n + 500 ≈ n⁴ 27 Commonly Used Rates of Growth The following functions often 1.00E+30 appear in algorithm analysis: 1.00E+28 n3 Constant  1 (or c) 1.00E+26 ▪ 2n Logarithmic  log n 1.00E+24 ▪ 1.00E+22 ▪ Linear  n 1.00E+20 n2 ▪ N-Log-N  n log n T(n) = time 1.00E+18 ▪ Quadratic  n2 1.00E+16 ▪ Cubic  n3 1.00E+14 ▪ Exponential  2n 1.00E+12 1.00E+10 n 1.00E+08 In a log-log chart, the slope of 1.00E+06 the line corresponds to the 1.00E+04 growth rate of the function 1.00E+02 log n (except exponential) c 1.00E+00 1.00E+00 1.00E+02 1.00E+04 1.00E+06 1.00E+08 1.00E+10 n = size of input 28 Pseudocode Control flow Method call Example: find max ▪ if … then … [else …] var.method (arg [, arg…]) element of an array ▪ while … do … Return value ▪ repeat … until … Algorithm arrayMax(A, n) return expression Input array A of n integers ▪ for … do … ▪ Indentation replaces braces Expressions Output maximum element of A Assignment currentMax  A Method declaration for i  1 to n − 1 do (like = in Java) Algorithm method (arg [, arg…]) if A[i]  currentMax then Input … = Equality testing currentMax  A[i] (like == in Java) Output … return currentMax n2 Superscripts and other return … mathematical formatting allowed Pseudocode provides a high-level description of an algorithm and avoids to show details that are unnecessary for the analysis. 29 Primitive operations Basic computations Examples: Counting primitive operations: performed by an algorithm ▪ Evaluating an expression e.g. 𝑎 − 5 + 𝑐 𝑏 Algorithm arrayMax(A, n) Identifiable in pseudocode currentMax  A # operations 2 Largely independent of the ▪ Assigning a value to a for i  1 to n − 1 do 2n programming language variable if A[i]  currentMax then 2(n − 1) e.g. 𝑎 ← 23 currentMax  A[i] 2(n − 1) Exact definition not { increment counter i } 2(n − 1) ▪ Indexing into an array return currentMax 1 _ important (we will see why e.g. 𝐴 𝑖 later) Total: T(n) = 8n − 3 ▪ Calling a method Take a constant amount of e.g. v.method() time in the RAM model (one unit of time or ▪ Returning from a method constant time) e.g. return a 30 Case analysis 140 Experimental analysis Three cases 120 Worst case: ▪ among all possible inputs, the one which Running time 100 takes the largest amount of time. Best case: 80 60 ▪ The input for which the algorithm runs the fastest 40 Average case: 20 ▪ The average is over all possible inputs ▪ Can be considered as the expected value of 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 T(n), which is a random variable Input number 31 Example: Best vs Worst Case Scenario 32 Asymptotic notation Name Notation Informal Bound Notes /use name Big-Oh Ο(𝑛) order of Upper bound – The most commonly used tight notation for assessing the complexity of an algorithm Big-Theta Θ(𝑛) Upper and The most accurate asymptotic lower bound – notation tight Big- Ω(𝑛) Lower bound Mostly used for determining Omega – tight lower bounds on problems rather than algorithms (e.g., sorting) Little-Oh 𝜊(𝑛) Upper bound – Used when it is difficult to obtain loose a tight upper bound Little- 𝜔(𝑛) Lower bound Used when it is difficult to obtain Omega – loose a tight lower bound 33 Asymptotic notation Big - Oh Notation (O) ▪ Big - Oh notation is used to define the upper boundary of an algorithm in terms of Time Complexity. ▪ Worst Case. Big - Omega Notation (Ω) ▪ Big - Omega notation is used to define the lower boundary of an algorithm in terms of Time Complexity. ▪ Best Case Big - Theta Notation (Θ) ▪ Big - Theta notation is used to define the average boundary of an algorithm in terms of Time Complexity. ▪ Average Case 34 Big-Oh notation Given functions f(n) and g(n), 10,000 3n we say that f(n) is O(g(n)) if there are positive constants c 1,000 2n+10 and n0 such that n f(n)  cg(n) for n  n0 100 Example: 2n + 10 is O(n) ▪ 2n + 10  cn 10 ▪ (c − 2) n  10 ▪ n  10(c − 2) 1 ▪ Pick c = 3 and n0 = 10 1 10 100 1,000 ▪ It follows that n 2n + 10 ≤ 3n for n  10 35 Asymptotic notation: Big-Oh The idea is to establish a relative order among functions for large n  c , n0 > 0 such that f(n)  c g(n) when n  n0 f(n) grows no faster than g(n) for “large” n 36 Big-Oh rules - properties The statement “f(n) is O(g(n))” means that the growth rate of f(n) is no more than the growth rate of g(n) O(g(n)) is a set or class of functions: it contains all the functions that have the same growth rate If f(n) is a polynomial of degree d, then f(n) is O(nd) ▪ If d = 0, then f(n) is O(1) ▪ Example: 𝑛2 + 3𝑛 − 1 is O(n2) We always use the simplest expression of the class/set ▪ E.g., we state 2n + 3 is O(n) instead of O(4n) or O(3n+1) We always use the smallest possible class/set of functions ▪ E.g., we state 2n is O(n) instead of O(n2) or O(n3) Linearity of asymptotic notation ▪ O(f(n)) + O(g(n)) = O(f(n) + g(n)) = O(max{f(n),g(n)}), where “max” is wrt “growth rate” ▪ Example: O(n) + O(n2) = O(n + n2) = O(n2) 37 Asymptotic notation: Big-Ω (Big-Omega) f(n) = O(g(n)) If there are positive constants c and n0 Such that f(n) > c x g(n) for all n >= n0 g(n) is an asymptotic Lower bound for f(n) 38 Asymptotic notation: Big-θ (Big-Theta) f(n) = O(g(n)) If there are positive constants c1,c2 and n0 Such that c1g(n) < f(n) < c2 x g(n) for all n >= n0 g(n) is an asymptotic Tight boundary for f(n) 39 Big-Omega and Big-Theta notations big-Omega ◼ f(n) is (g(n)) if there is a constant c > 0 and an integer constant n0  1 such that f(n)  c g(n) for n  n0 Example: 3n3 – 2n + 1 is (n3) big-Theta ◼ f(n) is (g(n)) if there are constants c’ > 0 and c’’ > 0 and an integer constant n0  1 such that c’ g(n)  f(n)  c’’ g(n) for n  n0 ◼ Example: 5n log n – 2n is (n log n) ◼ Important axiom: ◼ f(n) is O(g(n)) and (g(n))  f(n) is (g(n) ◼ Example: 5n2 is O(n2) and (n2)  5n2 is (n2) 40 Asymptotic notation – graphical comparison Big-Oh Normal scale: ◼ f(n) is O(g(n)) if f(n) is 5000 Log-log scale: f(n) = n2 asymptotically less than 4500 c" g(n) = 2n2 c` g(n) = 0.5n2 1.00E+21 or equal to g(n) 4000 n^2 3n^2 1.00E+19 Big-Omega 1.00E+17 0.5n^2 3500 ◼ f(n) is (g(n)) if f(n) is 1.00E+15 asymptotically greater 3000 1.00E+13 than or equal to g(n) 2500 1.00E+11 Big-Theta 2000 1.00E+09 ◼ f(n) is (g(n)) if f(n) is 1.00E+07 1500 asymptotically equal to 1.00E+05 g(n) 1000 1.00E+03 1.00E+01 500 1.00E-011.00E+00 1.00E+02 1.00E+04 1.00E+06 1.00E+08 1.00E+10 0 0 5 10 15 20 25 30 35 40 45 50 41 Little-Oh and Little-Omega notations Little-Oh ◼ f(n) is o(g(n)) if for any constant c > 0, there is a constant n0 > 0 such that f(n) < c g(n) for n  n0 Example: 3n2 – 2n + 1 is o(n3), while 3n2 – 2n + 1 is not o(n2) Little-Omega ◼ f(n) is (g(n)) if for any constant c > 0, there is a constant n0 > 0 such that f(n) > c g(n) for n  n0 ◼ Example: 3n2 – 2n + 1 is (n), while 3n2 – 2n + 1 is not (n2) Important axiom: f(n) is o(g(n))  g(n) is (f(n)) ◼ Comparison with O and  ◼ For O and , the inequality holds if there exists a constant c > 0 ◼ For o and  , the inequality holds for all constants c > 0 42 Guidelines for Asymptotic Analysis There are some general rules to help us determine the running time of an algorithm. 1) Loops: The running time of a loop is, at most, the running time of the statements inside the loop (including tests) multiplied by the number of iterations. // executes n times for (i=1; i

Use Quizgecko on...
Browser
Browser