L10 - Greedy Algorithms.pdf
CP312 Algorithm Design and Analysis I
LECTURE 10: GREEDY ALGORITHMS

Optimization Problems
Unlike the sorting problems we have seen so far, optimization problems are those where we must find the best solution among several possible solutions. How "best" is defined is determined by the problem statement.

Greedy Algorithms
A technique for algorithm design, typically used to solve optimization problems.
◦ Such problems have many possible solutions, and we wish to find an optimal one.
◦ A problem needs to have certain properties in order to be solvable using the "greedy" approach.

Problems for which we will demonstrate the greedy strategy:
◦ Coin changing
◦ Fractional knapsack
◦ Task scheduling
◦ Huffman encoding

Coin Changing Problem
Suppose that you have a shop that deals with the following currency notes: $1, $5, $10, $20. You want to design an algorithm that produces a given amount of money V using the minimum number of notes. For example, to make $100, the best (minimum) solution is five $20 notes.

Idea: always take the largest note that does not exceed V.

CoinChange(V):
1. Initialize an empty set S
2. While V > 0:
     Find the largest bill b that is at most V
     Add b to S
     V = V − b
3. Return S

It is easy to check that the algorithm always returns a set of bills whose sum is V. At each step, the algorithm makes a greedy choice (including the largest bill that fits), the choice that looks best for reaching an optimal solution (the fewest bills). This is an example of a greedy algorithm.

Does a greedy algorithm always work? No! Suppose the set of bills is $1, $4, $5, $10, and we want to produce $8:
◦ Greedy algorithm (always choose the largest): 4 bills ($5, $1, $1, $1)
◦ Optimal solution: 2 bills ($4, $4)

Greedy Algorithm
To show that a greedy algorithm works, we need to show that the problem has the following two properties:
1.
Optimal substructure: the optimal solution to the problem contains within it optimal solutions to subproblems.
2. Greedy-choice property: a globally optimal solution can be arrived at by making a locally optimal (greedy) choice at each step.

Fractional Knapsack Problem
Problem: Given a set of items, find a combination of items (possibly taking fractions of them) to add to a bag of some capacity that yields the most value.
Input: A capacity W and a set of n value/weight pairs I = {(1, v1, w1), (2, v2, w2), …, (n, vn, wn)}
Output: A set of fractions P = (p1, …, pn), with 0 ≤ pi ≤ 1, that:
  Maximizes Σ_{i=1}^{n} pi·vi
  Subject to Σ_{i=1}^{n} pi·wi ≤ W

Example (capacity: 5 kg):

Item  Weight (kg)  Value ($)  Density (V/W)
1     2            10         5
2     1            6          6
3     3            12         4
4     6            12         2
5     11           33         3

Greedy approach: at every step, choose the item with the highest density.
◦ Step 1: take all of item 2 (1 kg, $6). Total value: $6; weight so far: 1 kg.
◦ Step 2: take all of item 1 (2 kg, $10). Total value: $16; weight so far: 3 kg.
◦ Step 3: take 2/3 of item 3 (2 kg, 2/3 × $12 = $8). Total value: $24; weight so far: 5 kg.

Greedy-FracKnapsack(I, W):
  For each i ∈ [1, n]: di = vi/wi
  Let S = ⟨(si, v_si, w_si)⟩, i ∈ [1, n], be the list of items sorted by di in descending order
  Initialize w_left = W, V = 0, i = 0, (p1, …, pn) = 0
  While w_left > 0:
    (sj, v_sj, w_sj) ← S[i]
    If w_sj ≤ w_left then p_sj = 1
    Else p_sj = w_left / w_sj
    V = V + p_sj · v_sj
    w_left = w_left − p_sj · w_sj
    i = i + 1
  Output P = (p1, …, pn)    // p_sj is the fraction of item sj selected
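The Greedy-FracKnapsack pseudocode can be sketched in Python as follows. This is a minimal sketch, not part of the lecture: the function name and the (value, weight) tuple layout are assumptions.

```python
def fractional_knapsack(items, capacity):
    """Greedy fractional knapsack. items is a list of (value, weight) pairs.
    Returns (total_value, fractions), where fractions[i] in [0, 1] is the
    fraction of item i taken."""
    # Greedy choice: consider items in descending order of density v/w.
    order = sorted(range(len(items)),
                   key=lambda i: items[i][0] / items[i][1], reverse=True)
    fractions = [0.0] * len(items)
    total_value, w_left = 0.0, capacity
    for i in order:
        if w_left <= 0:
            break
        v, w = items[i]
        # Take the whole item if it fits, otherwise the part that fills the bag.
        p = 1.0 if w <= w_left else w_left / w
        fractions[i] = p
        total_value += p * v
        w_left -= p * w
    return total_value, fractions

# The slide example: weights (2, 1, 3, 6, 11), values (10, 6, 12, 12, 33), capacity 5 kg.
items = [(10, 2), (6, 1), (12, 3), (12, 6), (33, 11)]
value, fracs = fractional_knapsack(items, 5)
print(round(value))  # 24: all of items 1 and 2, plus 2/3 of item 3
```

Sorting indices rather than the items themselves keeps the output fractions aligned with the original item order, matching the pseudocode's P = (p1, …, pn).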
Running time of the greedy approach:
1. Sort the n items by density.
2. Add items to the bag in order of decreasing density (possibly using the last item partially) until the bag is exactly full.
The sort dominates, so T(n) = Θ(n lg n).

Why did the greedy approach work here?
1. Optimal substructure: the final optimal solution contains optimal solutions to subproblems.
2. Greedy-choice property: choosing the best candidate at every step leads to an optimal solution.

Theorem: Algorithm Greedy-FracKnapsack produces an optimal solution to the fractional knapsack problem.
Proof of correctness (outline):
1. Consider a globally optimal solution Q.
2. Show that Q can be modified so that:
   a. the greedy choice is made at the first step, and
   b. this choice reduces the problem to a smaller fractional knapsack subproblem.
3. Apply induction to show that a greedy choice can be taken at every step, by proving that the problem exhibits optimal substructure.

Task Scheduling Problem

Task No.  Start Time  End Time
1         1           4
2         5           7
3         2           8
4         3           11
5         8           15
6         13          18

Goal: schedule the maximum number of activities without any two overlapping.

(Timeline diagram: tasks T1–T6 drawn as intervals over times 1–18.)

Two possible solutions: Solution 1 = {T4, T6}; Solution 2 = {T3, T5}.

Problem: Given a set of activities with start and end times, schedule the maximum number of non-overlapping activities.
Input: A set of n activities with start/finish times T = {(1, s1, f1), (2, s2, f2), …, (n, sn, fn)}
Output: A subset P ⊆ T that:
  Maximizes |P|
  Subject to: for all (i, si, fi) ∈ P and all (j, sj, fj) ∈ P with j ≠ i: si ≥ fj or sj ≥ fi

Will a greedy approach work here? We need to ask ourselves the following questions:
1. Does the problem exhibit optimal substructure?
2. Is there a globally optimal solution that contains greedy choices? And if there is, what is the greedy-choice property?

1. Does the problem exhibit optimal substructure?
Theorem: Let k be a task in an optimal solution Q ⊆ T. Then Q − {(k, sk, fk)} is an optimal solution to the subproblem T′ = {(i, si, fi) ∈ T : si ≥ fk or fi ≤ sk}.

Proof (by contradiction):
◦ Let Q be an optimal solution to the original problem T.
◦ Let Q′ = Q − {(k, sk, fk)} be the corresponding solution to the subproblem T′.
◦ Suppose, for the sake of contradiction, that Q′ is not optimal, and let Q″ be an optimal solution to T′ instead (i.e., it contains more tasks than Q′).
◦ Then |Q″| > |Q′| = |Q − {(k, sk, fk)}| = |Q| − 1.
◦ But this means that Q″ ∪ {(k, sk, fk)} is a solution to the original problem T that is better than Q.
◦ So Q is NOT optimal (contradiction). Hence Q′ is optimal as well.

2. Is there a globally optimal solution that contains greedy choices? And if there is, what is the greedy-choice property? Several natural greedy choices do not work:
◦ Greedy choice 1, earliest start time: fails, e.g., when the earliest-starting task is long and conflicts with two later tasks that are compatible with each other.
◦ Greedy choice 2, shortest duration: fails, e.g., when a short task overlaps two longer tasks that do not overlap each other.
◦ Greedy choice 3, fewest conflicts: fails as well (the slides give a larger counterexample with tasks T2–T11).

Correct greedy choice: at every step, choose the activity that finishes first.

Is this optimal on our example? The greedy approach selects {T1, T2, T5}, which has the maximum possible size (3 tasks).

Greedy-TaskSchedule(T):
  Let T′ = ⟨(ti, s_ti, f_ti)⟩, i ∈ [1, n], be the list of tasks sorted by finish time in ascending order
  Initialize P = {(t1, s_t1, f_t1)} and j = 1    // j is the index of the most recent activity added to P
  For i = 2 to n:
    If s_ti ≥ f_tj then
      P = P ∪ {(ti, s_ti, f_ti)}
      j = i
  Output P

Running time of the greedy approach:
1. Sort the n activities in ascending order of finish time.
2. Scan the activities in sorted order, adding each one to the schedule as long as it does not overlap with any previously chosen activity.
T(n) = Θ(n lg n)

Theorem: Algorithm Greedy-TaskSchedule produces an optimal solution to the task scheduling problem.
Proof of correctness (outline):
1. Consider a globally optimal solution Q.
2. Show that Q can be modified so that:
   a. the greedy choice is made at the first step, and
   b. this choice reduces the problem to a smaller task scheduling subproblem.
3. Apply induction to show that a greedy choice can be taken at every step, using the fact that the problem exhibits optimal substructure (already proven!).

Proof:
◦ Let Q = (q1, q2, …) be an optimal solution ordered by finish time (i.e., f_q1 ≤ f_q2 ≤ ⋯).
◦ Let P = (p_j1, p_j2, …) be the greedy solution in selection order (i.e., by finish time).
◦ We show that there exists an optimal solution whose first choice is the greedy choice:
◦ Case 1: If q1 = p_j1, we are done; the optimal solution already includes the first greedy choice.
◦ Case 2: If q1 ≠ p_j1, we construct a different optimal solution in which the first choice is p_j1:
  ◦ Remove task q1 from Q and add p_j1 to create a new solution Q′.
  ◦ Note that p_j1 does not overlap with q2, q3, … because f_pj1 ≤ f_q1, and we already know that q1 did not overlap with them.
  ◦ Since the number of tasks has not decreased, Q′ is an optimal solution, and its first choice is the greedy one.

The Data Compression Problem
We would like to represent data using less space than the original representation.
That is, we want to encode data using fewer bits than the original encoding.

Problem: Given an input file with a sequence of characters, output an encoding of the input that is lossless, decodable, and of smaller size.
Input: A sequence of characters C = (c1, …, cn)
Output: An encoder E and a decoder D such that D(E(C)) = C and the length of the codeword E(C) is minimized.

(Diagram: a long string C passes through the encoder E to produce a short codeword E(C), e.g. 0110010010100010, which the decoder D maps back to C.)

Example: Let the input string be BCAADDDCCACACAC.
Without compression: using ASCII, each character is 1 byte, so storing this string needs 8 × 15 = 120 bits. How can we reduce the number of bits needed to represent it?

Naive method (fixed-length code): assign a unique equal-length code to each distinct letter in the input string.

Character  Encoding
A          00
B          01
C          10
D          11

The input string is then encoded as:
01 10 00 00 11 11 11 10 10 00 10 00 10 00 10
This reduces the length of the representation to 2 × 15 = 30 bits. But is this the shortest encoding possible?

This method is wasteful: some characters appear much more often than others, but we give the frequent characters the same code length as the rare ones.
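The bit counts above can be checked with a short sketch. The code table is the one from the slides; the helper function name is made up for illustration.

```python
# Fixed-length (2-bit) code from the slides: every character costs the same.
FIXED = {"A": "00", "B": "01", "C": "10", "D": "11"}

def encode(s, code):
    """Concatenate the codeword of each character of s."""
    return "".join(code[c] for c in s)

s = "BCAADDDCCACACAC"
print(len(s) * 8)             # 120: uncompressed ASCII, 8 bits per character
print(len(encode(s, FIXED)))  # 30: fixed-length 2-bit code
```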
Method 2 (unary variable-length prefix code), for the same input string BCAADDDCCACACAC:

Character  Frequency  Encoding
C          6          0
A          5          10
D          3          110
B          1          1110

◦ Characters that appear more frequently are given shorter encodings.
◦ Prefix code: only use codes in which no codeword is a prefix of another.
The input string is then encoded as:
1110 0 10 10 110 110 110 0 0 10 0 10 0 10 0
This reduces the length of the representation to (6 × 1) + (5 × 2) + (3 × 3) + (1 × 4) = 29 bits. Can we do better?

Huffman Encoding
Creates a variable-length encoding for each distinct character in the input. Encodings are chosen so that characters that appear more frequently get shorter prefix codes than characters that occur less frequently. The greedy choice here is to assign the shorter encoding to the character with the higher frequency.

(Diagram: the Huffman encoder turns the input C into the codeword E(C) plus a Huffman tree; the Huffman decoder uses the tree to turn E(C) back into C.)

Huffman encoding algorithm steps:
1. Find the frequency of each character in the input file.
2. Sort the characters in increasing order of frequency.
3. Build a Huffman tree from the frequency data.
4. Traverse the Huffman tree and build the encoding of each character found in the input file.
5. For each character in the input file, write the bits of its Huffman encoding to the output file.
6. Output the codeword and the Huffman tree.

Example: BCAADDDCCACACAC
Step 1: Find the frequency of each character in the input file:

Character  Frequency
C          6
A          5
D          3
B          1

Step 2: Sort the characters in increasing order of frequency:

Character  Frequency
B          1
D          3
A          5
C          6

Step 3:
Build a Huffman tree from the frequency data by repeatedly merging the two nodes with the smallest frequencies:
◦ Merge B (1) and D (3) into a node B+D with frequency 4. Remaining: B+D (4), A (5), C (6).
◦ Merge B+D (4) and A (5) into a node B+D+A with frequency 9. Remaining: C (6), B+D+A (9).
◦ Merge C (6) and B+D+A (9) into the root with frequency 15. In the final tree, C hangs off the root's 0-branch; the 1-branch leads to a node whose 0-child is the B/D subtree and whose 1-child is A.

Step 4: Traverse the Huffman tree and build the encodings (0 for a left branch, 1 for a right branch):

Character  Frequency  Encoding
C          6          0
A          5          11
D          3          101
B          1          100

Step 5: For each character in the input file, write the bits of its Huffman encoding to the output file:
B C A A D D D C C A C A C A C
↦ 100 0 11 11 101 101 101 0 0 11 0 11 0 11 0

Step 6: Output the codeword and the Huffman tree. The codeword
100 0 11 11 101 101 101 0 0 11 0 11 0 11 0
is 28 bits.

Huffman Decoding
Decoding algorithm steps:
1. Read the representation of the Huffman tree from the input file and rebuild the Huffman tree.
2. Read the bits from the input file.
3. Traverse the Huffman tree using the bits from the input file, and output a character whenever a leaf is reached.

Example: with the tree above, the bit string 1101011000 decodes as 11 | 0 | 101 | 100 | 0 = ACDBC.

Huffman Encoding: Running Time
1. Find the frequency of each character in the input file: Θ(n)
2. Sort the characters in increasing order of frequency: Θ(n lg n)
3. Build a Huffman tree from the frequency data: Θ(n lg n)
4. Traverse the Huffman tree and build the encodings for each character found in the input file: Θ(n lg n)
5. For each character in the input file, write the bits of the Huffman encoding to the output file: Θ(n lg n)
6. Output the codeword and the Huffman tree: Θ(n)
Overall: T(n) = Θ(n lg n)

Summary
When are greedy algorithms useful?
◦ For problems that exhibit the greedy-choice property and optimal substructure.

Procedure for greedy algorithm design:
◦ Identify optimal substructure.
◦ Define the greedy-choice property.
◦ Write your algorithm based on the greedy choice.

How do we know if a greedy algorithm does not work?
◦ Show a counterexample.
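As a closing worked example, the Huffman construction from this lecture can be sketched with Python's heapq. This is a sketch, not the lecture's reference code: the function name is made up, and depending on tie-breaking the individual codes may differ from the slides in which child is labelled 0 or 1, though the total encoded length comes out the same.

```python
import heapq
from collections import Counter

def huffman_codes(text):
    """Build a Huffman code for text. Greedy choice: repeatedly merge the
    two subtrees with the smallest total frequency."""
    freq = Counter(text)
    # Heap entries are (frequency, unique tie-breaker, tree); a tree is either
    # a leaf character (str) or a [left, right] pair.
    heap = [(f, i, c) for i, (c, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        f1, _, t1 = heapq.heappop(heap)
        f2, _, t2 = heapq.heappop(heap)
        counter += 1
        heapq.heappush(heap, (f1 + f2, counter, [t1, t2]))
    codes = {}
    def walk(tree, prefix):
        if isinstance(tree, str):
            codes[tree] = prefix or "0"  # edge case: single distinct character
        else:
            walk(tree[0], prefix + "0")
            walk(tree[1], prefix + "1")
    walk(heap[0][2], "")
    return codes

codes = huffman_codes("BCAADDDCCACACAC")
encoded = "".join(codes[c] for c in "BCAADDDCCACACAC")
print(len(encoded))  # 28 bits, matching the count in the lecture example
```

The unique integer tie-breaker in each heap entry keeps tuple comparison from ever reaching the (non-comparable) tree field, a common pitfall when heapifying mixed tuples.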