Sorting Algorithms PDF
Document Details
![FasterJoy7850](https://quizgecko.com/images/avatars/avatar-11.webp)
Uploaded by FasterJoy7850
Al-Azhar University
Dr. Manal Mostafa
Tags
Summary
This document provides lecture notes on sorting algorithms. It covers definitions, assumptions, in-place sorting, techniques, and strategies for sorting, along with an overview of run times. The document also discusses lower bounds on run times, the concept of inversions, and the insertion sort algorithm.
Full Transcript
Sorting algorithms 1 Data Structures and Algorithms CSE354 Sorting algorithms Dr. Manal Mostafa Lecturer at Comp & Sys Eng Dept., Al azhar University Sorting algorithms...
Sorting algorithms 1 Data Structures and Algorithms CSE354 Sorting algorithms Dr. Manal Mostafa Lecturer at Comp & Sys Eng Dept., Al azhar University Sorting algorithms 2 Sorting Definitions Assumptions In-place sorting Sorting techniques and strategies Overview of run times Agenda Lower bound on run times Define inversions and use this as a measure of unsortedness insertion sort algorithm an example run-time analysis worst case average case best case Sorting algorithms 3 8.1 Definition Sorting is the process of: Taking a list of objects which could be stored in a linear order (a0, a1,..., an – 1) e.g., numbers, and returning an reordering (a'0, a'1,..., a'n – 1) such that a'0 ≤ a'1 ≤ · · · ≤ a'n – 1 The conversion of an Abstract List into an Abstract Sorted List Sorting algorithms 4 8.1 Definition Seldom will we sort isolated values Usually we will sort a number of records containing a number of fields based on a key: 199915 Stevenso Monic 3 Glendridge 32 n a Ave. 199902 Redpath Ruth 53 Belton Blvd. 53 199858 Kilji Islam 37 Masterson 32 Ave. 200035 Groskurth Ken 12 Marsdale Ave. 41 Numerically by ID Number 199819 Carol Ann 81 Lexicographically Oakridge Ave. by surname, then given name 32 199819 Carol Ann 81 Oakridge Ave. 199819 32 200032 Redpath David 5 GlendaleCarol Ave. Ann 81 Oakridge Ave. 87 32 199858 Khilji Islam 37 Masterson 200035 Groskurth Ken 12 Marsdale Ave. 32 Ave. 41 199902 Redpath Ruth 53 Belton Blvd. 199858 Kilji Islam 37 Masterson 53 32 Ave. 199915 Stevenso Monic 3 Glendridge 200032 Redpath David 5 Glendale Ave. 32 n a Ave. 87 Sorting algorithms 5 8.1 Definition In these topics, we will assume that: Arrays are to be used for both input and output, We will focus on sorting objects and leave the more general case of sorting records based on one or more fields as an implementation detail Sorting algorithms 6 8.1.1 In-place Sorting Sorting algorithms may be performed in-place, that is, with the allocation of at most Q(1) additional memory (e.g., fixed number of local variables) Some definitions of in place as using o(n) memory Other sorting algorithms require the allocation of second In-Place array Sorting: of equal size An algorithm is Requires Q(n) additional memory considered in-place if it sorts the input data without using We will prefer in-place sorting algorithms additional memory proportional to the input size. Sorting algorithms 7 8.1.2 Classifications The operations of a sorting algorithm are based on the actions performed: Insertion Exchanging Selection Merging Distribution Sorting algorithms 8 8.1.3 Run-time The run time of the sorting algorithms we will look at fall into one of three categories: Q(n) Q(n ln(n)) O(n2) We will examine average- and worst-case scenarios for each algorithm The run-time may change significantly based on the scenario Sorting algorithms 9 8.1.3 Run-time We will review the more traditional O(n2) sorting algorithms: Insertion sort, Bubble sort Some of the faster Q(n ln(n)) sorting algorithms: Heap sort, Quicksort, and Merge sort And linear-time sorting algorithms Bucket sort and Radix sort We must make assumptions about the data Sorting algorithms 10 8.1.4 Lower-bound Run-time Any sorting algorithm must examine each entry in the array at least once Consequently, all sorting algorithms must be W(n) [W(n) is the lower bound of an algorithm’s time complexity] We will not be able to achieve Q(n) behavior without additional assumptions Sorting algorithms 11 8.1.4 Lower-bound Run-time The general run time is W(n ln(n)) The proof depends on: The number of permutations of n objects is n!, A tree with 2h leaf nodes has height at least h, Each permutation is a leaf node in a comparison tree, and The property that lg(n!) = Q(n ln(n)) Reference: Donald E. Knuth, The Art of Computer Programming, Volume 3: Sorting and Searching, 2nd Ed., Addison Wesley, 1998, §5.3.1, p.180. Sorting algorithms 12 8.1.5 Optimal Sorting Algorithms There is no optimal sorting algorithm which can be used in all places Under various circumstances, different sorting algorithms will deliver optimal run-time and memory-allocation requirements Sorting algorithms 13 8.1.6 Sub-optimal Sorting Algorithms Before we look at other algorithms, we will consider the Bogosort algorithm: 1. Randomly order the objects, and 2. Check if they’re sorted, if not, go back to Step 1. Run time analysis: best case: Q(n) average: Q(n·n!) n! permutations worst: unbounded... Sorting algorithms 14 8.1.6 Sub-optimal Sorting Algorithms There is also the Bozosort algorithm: 1. Check if the entries are sorted, 2. If they are not, randomly swap two entries and go to Step 1. Run time analysis: More difficult than bogosort... See references and wikipedia Hopefully we can do better... Sorting algorithms 15 8.1.7 Inversions Consider the following three lists: 1 16 12 26 25 35 33 58 45 42 56 67 83 75 74 86 81 88 99 95 1 17 21 42 24 27 32 35 45 47 57 23 66 69 70 76 87 85 95 99 22 20 81 38 95 84 99 12 79 44 26 87 96 10 48 80 1 31 16 92 To what degree are these three lists unsorted? Sorting algorithms 16 8.1.7 Inversions The first list requires only a few exchanges to make it sorted 1 16 12 26 25 35 33 58 45 42 56 67 83 75 74 86 81 88 99 95 1 12 16 25 26 33 35 42 45 56 58 67 74 75 81 83 86 88 95 99 Sorting algorithms 17 8.1.7 Inversions The second list has two entries significantly out of order 1 17 21 42 24 27 32 35 45 47 57 23 66 69 70 76 87 85 95 99 1 17 21 23 24 27 32 35 42 45 47 57 66 69 70 76 85 87 95 99 however, most entries (13) are in place Sorting algorithms 18 8.1.7 Inversions The third list would, by any reasonable definition, be significantly unsorted 22 20 81 38 95 84 99 12 79 44 26 87 96 10 48 80 1 31 16 92 1 10 12 16 20 22 26 31 38 44 48 79 80 81 84 87 92 95 96 99 Sorting algorithms 19 8.1.7 Inversions Given any list of n numbers, there are n n(n 1) Binomial coefficients- How many ways can 2 2 we choose 2 elements from a set pairs of numbers of 6? For example, the list (1, 3, 5, 4, 2, 6) contains the following 15 pairs: (1, 3) (1, 5) (1, 4) (1, 2) (1, 6) (3, 5) (3, 4) (3, 2) (3, 6) (5, 4) (5, 2) (5, 6) (4, 2) (4, 6) (2, 6) Sorting algorithms 20 8.1.7 Inversions You may note that 11 of these pairs of numbers are in order: (1, 3) (1, 5) (1, 4) (1, 2) (1, 6) (3, 5) (3, 4) (3, 2) (3, 6) (5, 4) (5, 2) (5, 6) (4, 2) (4, 6) (2, 6) Sorting algorithms 21 8.1.7 Inversions The remaining four pairs are reversed, or inverted (1, 3) (1, 5) (1, 4) (1, 2) (1, 6) (3, 5) (3, 4) (3, 2) (3, 6) (5, 4) (5, 2) (5, 6) (4, 2) (4, 6) (2, 6) Sorting algorithms 22 8.1.7 Inversions Given a permutation of n elements a0, a1,..., an – 1 an inversion is defined as a pair of entries which are reversed That is, (aj, ak) forms an inversion if j < k but aj > ak Ref: Bruno Preiss, Data Structures and Algorithms Sorting algorithms 23 8.1.7 Inversions Therefore, the permutation 1, 3, 5, 4, 2, 6 contains four inversions: (3, 2) (5, 4) (5, 2) (4, 2) Sorting algorithms 24 8.1.7 Inversions Exchanging (or swapping) two adjacent entries either: removes an inversion, e.g., 1 3 5 4 2 6 1 3 5 2 4 6 removes the inversion (4, 2) or introduces a new inversion, e.g., (5, 3) with 1 3 5 4 2 6 1 5 3 4 2 6 Sorting algorithms 25 8.1.7.1 Number of Inversions n n n 1 There are 2 2 pairs of numbers in any set of n objects Consequently, each pair contributes to the set of ordered pairs, or the set of inversions For a random ordering, we would expect approximately half of all pairs, or n 1 nn 1 O(n 2 ) 2 2 4 , inversions Sorting algorithms 26 8.1.7.1 Number of Inversions For example, the following unsorted list of 56 entries 61 548 3 923 195 973 289 237 57 299 594 928 515 55 507 351 262 797 788 442 97 798 227 127 474 825 7 182 929 852 504 485 45 98 538 476 175 374 523 800 19 901 349 947 613 265 844 811 636 859 81 270 697 563 976 539 has 655 inversions and 885 ordered pairs 1 56 56 56 1 770 2 2 4 The formula predicts inversions Sorting algorithms 27 8.1.7.1 Number of Inversions Let us consider the number of inversions in our first three lists: 1 16 12 26 25 35 33 58 45 42 56 67 83 75 74 86 81 88 99 95 1 17 21 42 24 27 32 35 45 47 57 23 66 69 70 76 87 85 95 99 22 20 81 38 95 84 99 12 79 44 26 87 96 10 48 80 1 31 16 92 Each list has 20 entries, and therefore: 20 20 20 1 190 There are 2 2 pairs On average, 190/2 = 95 pairs would form inversions Sorting algorithms 28 8.1.7.1 Number of Inversions The first list 1 16 12 26 25 35 33 58 45 42 56 67 83 75 74 86 81 88 99 95 has 13 inversions: (16, 12) (26, 25) (35, 33) (58, 45) (58, 42) (58, 56) (45, 42) (83, 75) (83, 74) (83, 81) (75, 74) (86, 81) (99, 95) This is well below 95, the expected number of inversions Therefore, this is likely not to be a random list Sorting algorithms 29 8.1.7.1 Number of Inversions The second list 1 17 21 42 24 27 32 35 45 47 57 23 66 69 70 76 87 85 95 99 also has 13 inversions: (42, 24) (42, 27) (42, 32) (42, 35) (42, 23) (24, 23) (27, 23) (32, 23) (35, 23) (45, 23) (47, 23) (57, 23) (87, 85) This, too, is not a random list Sorting algorithms 30 8.1.7.1 Number of Inversions The third list 22 20 81 38 95 84 99 12 79 44 26 87 96 10 48 80 1 31 16 92 has 100 inversions: (22, 20) (22, 12) (22, 10) (22, 1) (22, 16) (20, 12) (20, 10) (20, 1) (20, 16) (81, 38) (81, 12) (81, 79) (81, 44) (81, 26) (81, 10) (81, 48) (81, 80) (81, 1) (81, 16) (81, 31) (38, 12) (38, 26) (38, 10) (38, 1) (38, 16) (38, 31) (95, 84) (95, 12) (95, 79) (95, 44) (95, 26) (95, 87) (95, 10) (95, 48) (95, 80) (95, 1) (95, 16) (95, 31) (95, 92) (84, 12) (84, 79) (84, 44) (84, 26) (84, 10) (84, 48) (84, 80) (84, 1) (84, 16) (84, 31) (99, 12) (99, 79) (99, 44) (99, 26) (99, 87) (99, 96) (99, 10) (99, 48) (99, 80) (99, 1) (99, 16) (99, 31) (99, 92) (12, 10) (12, 1) (79, 44) (79, 26) (79, 10) (79, 48) (79, 1) (79, 16) (79, 31) (44, 26) (44, 10) (44, 1) (44, 16) (44, 31) (26, 10) (26, 1) (26, 16) (87, 10) (87, 48) (87, 80) (87, 1) (87, 16) (87, 31) (96, 10) (96, 48) (96, 80) (96, 1) (96, 16) (96, 31) (96, 92) (10, 1) (48, 1) (48, 16) (48, 31) (80, 1) (80, 16) (80, 31) (31, 16) Sorting algorithms 31 8.2 Introduction to Insertion sort Consider the following observations: A list with one element is sorted In general, if we have a sorted list of k items, we can insert a new item to create a sorted list of size k + 1 8.2 Background For example, consider this sorted array containing of eight sorted entries 5 7 12 19 21 26 33 40 14 9 18 21 2 Suppose we want to insert 14 into this array leaving the resulting array sorted Sorting algorithms 33 8.2 Background Starting at the back, if the number is greater than 14, copy it to the right Once an entry less than 14 is found, insert 14 into the resulting vacancy Sorting algorithms 34 8.2.1 The Algorithm For any unsorted list: Treat the first element as a sorted list of size 1 Then, given a sorted list of size k – 1 Insert the kth item in the unsorted list into it into the sorted list The sorted list is now of size k + 1 Sorting algorithms 35 8.2.1 The Algorithm Recall the five sorting techniques: Insertion Exchange Selection Merging Distribution Clearly insertion falls into the first category Sorting algorithms 36 8.2.2 The Algorithm Code for this would be: for ( int j = k; j > 0; --j ) { if ( array[j - 1] > array[j] ) { std::swap( array[j - 1], array[j] ); } else { // As soon as we don't need to swap, the (k + 1)st // is in the correct location break; } } Sorting algorithms 37 8.2.2 Implementation and Analysis This would be embedded in a function call such as template void insertion_sort( Type *const array, int const n ) { for ( int k = 1; k < n; ++k ) { for ( int j = k; j > 0; --j ) { if ( array[j - 1] > array[j] ) { std::swap( array[j - 1], array[j] ); } else { // As soon as we don't need to swap, the (k + 1)st // is in the correct location break; } } } } Sorting algorithms 38 8.2.3 Implementation and Analysis Let’s do a run-time analysis of this code template void insertion_sort( Type *const array, int const n ) { for ( int k = 1; k < n; ++k ) { for ( int j = k; j > 0; --j ) { if ( array[j - 1] > array[j] ) { std::swap( array[j - 1], array[j] ); } else { // As soon as we don't need to swap, the (k + 1)st // is in the correct location break; } } } } Sorting algorithms 39 8.2.3 Implementation and Analysis The Q(1)-initialization of the outer for-loop is executed once template void insertion_sort( Type *const array, int const n ) { for ( int k = 1; k < n; ++k ) { for ( int j = k; j > 0; --j ) { if ( array[j - 1] > array[j] ) { std::swap( array[j - 1], array[j] ); } else { // As soon as we don't need to swap, the (k + 1)st // is in the correct location break; } } } } Sorting algorithms 40 8.2.3 Implementation and Analysis This Q(1)- condition will be tested n times at which point it fails template void insertion_sort( Type *const array, int const n ) { for ( int k = 1; k < n; ++k ) { for ( int j = k; j > 0; --j ) { if ( array[j - 1] > array[j] ) { std::swap( array[j - 1], array[j] ); } else { // As soon as we don't need to swap, the (k + 1)st // is in the correct location break; } } } } Sorting algorithms 41 8.2.3 Implementation and Analysis Thus, the inner for-loop will be executed a total of n – 1 times template void insertion_sort( Type *const array, int const n ) { for ( int k = 1; k < n; ++k ) { for ( int j = k; j > 0; --j ) { if ( array[j - 1] > array[j] ) { std::swap( array[j - 1], array[j] ); } else { // As soon as we don't need to swap, the (k + 1)st // is in the correct location break; } } } } Sorting algorithms 42 8.2.3 Implementation and Analysis In the worst case, the inner for-loop is executed a total of k times template void insertion_sort( Type *const array, int const n ) { for ( int k = 1; k < n; ++k ) { for ( int j = k; j > 0; --j ) { if ( array[j - 1] > array[j] ) { std::swap( array[j - 1], array[j] ); } else { // As soon as we don't need to swap, the (k + 1)st // is in the correct location break; } } } } Sorting algorithms 43 8.2.3 Implementation and Analysis The body of the inner for-loop runs in Q(1) in either case template void insertion_sort( Type *const array, int const n ) { for ( int k = 1; k < n; ++k ) { for ( int j = k; j > 0; --j ) { if ( array[j - 1] > array[j] ) { std::swap( array[j - 1], array[j] ); } else { // As soon as we don't need to swap, the (k + 1)st // is in the correct location break; } } } Thus, the worst-case run time is } n 1 n n 1 k O n 2 k 1 2 Sorting algorithms 44 8.2.3 Implementation and Analysis Problem: we may break out of the inner loop… template void insertion_sort( Type *const array, int const n ) { for ( int k = 1; k < n; ++k ) { for ( int j = k; j > 0; --j ) { if ( array[j - 1] > array[j] ) { std::swap( array[j - 1], array[j] ); } else { // As soon as we don't need to swap, the (k + 1)st // is in the correct location break; } } } } Sorting algorithms 45 8.2.3 Implementation and Analysis Recall: each time we perform a swap, we remove an inversion template void insertion_sort( Type *const array, int const n ) { for ( int k = 1; k < n; ++k ) { for ( int j = k; j > 0; --j ) { if ( array[j - 1] > array[j] ) { std::swap( array[j - 1], array[j] ); } else { // As soon as we don't need to swap, the (k + 1)st // is in the correct location break; } } } } Sorting algorithms 46 8.2.3 Implementation and Analysis As soon as a pair array[j - 1] 0; --j ) { if ( array[j - 1] > array[j] ) { std::swap( array[j - 1], array[j] ); } else { // As soon as we don't need to swap, the (k + 1)st // is in the correct location break; } } } } Sorting algorithms 47 8.2.3 Implementation and Analysis Thus, the body is run only as often as there are inversions template void insertion_sort( Type *const array, int const n ) { for ( int k = 1; k < n; ++k ) { for ( int j = k; j > 0; --j ) { if ( array[j - 1] > array[j] ) { std::swap( array[j - 1], array[j] ); } else { // As soon as we don't need to swap, the (k + 1)st // is in the correct location break; } } } } If the number of inversions is d, the run time is Q(n + d) Sorting algorithms 48 8.2.3 Consequences of Our Analysis A random list will have d = O(n2) inversions The average random list has d = Q(n2) inversions Insertion sort, however, will run in Q(n) time whenever d = O(n) Other benefits: The algorithm is easy to implement Even in the worst case, the algorithm is fast for small problems Considering these run times, Approximate it appears to be approximately Size 10 instructions per inversion Time (ns) 8 175 16 750 32 2700 64 8000 Sorting algorithms 49 8.2.3 Consequences of Our Analysis Unfortunately, it is not very useful in general: Sorting a random list of size 223 ≈ 8 000 000 would require approximately one day Doubling the size of the list quadruples the required run time An optimized quick sort requires less than 4 s on a list of the above size Sorting algorithms 50 8.2.3 Consequences of Our Analysis The following table summarizes the run-times of insertion sort Case Run Time Comments Worst Q(n2) Reverse sorted Average O(d + n) Slow if d = w(n) Best Q(n) Very few inversions: d = O(n) Sorting algorithms 51 8.2.4 The Algorithm Now, swapping is expensive, so we could just temporarily assign the new entry this reduces assignments by a factor of 3 speeds up the algorithm by a factor of two Sorting algorithms 52 8.2.4 Implementation and Analysis A reasonably good* implementation template void insertion( Type *const array, int const n ) { for ( int k = 1; k < n; ++k ) { Type tmp = array[k]; for ( int j = k; k > 0; --j ) { if ( array[j - 1] > tmp ) { array[j] = array[j - 1]; } else { array[j] = tmp; goto finished; } } array = tmp; // only executed if tmp < array finished: ; // empty statement } } * we could do better with pointers