IS4242 Intelligent Systems & Techniques L6 Experimentation PDF

Document Details

National University of Singapore

Aditya Karanam

Tags

intelligent systems experimentation multi-armed bandits business analysis

Summary

These lecture notes from NUS cover intelligent systems and techniques. The document includes topics like experimentation, innovative decisions, and multi-armed bandit algorithms. It details how organizations can effectively experiment in uncertain business environments and different applications.

Full Transcript

IS4242 INTELLIGENT SYSTEMS & TECHNIQUES
L6 – Experimentation
Aditya Karanam
© Copyright National University of Singapore. All Rights Reserved.

Announcements
▸ Group formation, due date: October 7
  ‣ No randomization; you are expected to network and find group mates
  ‣ Search for Teammates in Piazza:
▸ Quiz-1: Face-to-face and closed book
  ‣ Venue: LT-15, no seating arrangement
  ‣ Sample mid-term will be released by 25 Sept (11:59 pm) with all instructions

In this Class …
▸ Experimentation
  ‣ What is the value of experimentation?
  ‣ What makes a good experiment?
  ‣ Concerns about corporate experiments
▸ Techniques:
  ‣ Multi-armed bandits
  ‣ Epsilon-greedy method

Innovative Decisions
▸ Apple has a highly successful retail store concept
  ‣ Redefined the in-store experience with cashier-free checkouts and the Genius Bar
  ‣ Retail sales per square foot are higher than those of any other retail store (worldwide), and the stores draw more visitors than Disney’s theme parks
▸ Ron Johnson is the person behind these innovative concepts
  ‣ In 2011, Ron Johnson left Apple to become the CEO of J.C. Penney (JCP)

Innovative Decisions at JCP
▸ J.C. Penney (JCP) is a large US department store selling clothes, bed & bath, jewelry, home décor, etc.
▸ JCP hoped that Johnson would repeat Apple’s retail success

Innovative Decisions at JCP
▸ Johnson implemented a bold new plan:
  ‣ Eliminated coupons and clearance racks, and used technology to eliminate cashiers, cash registers, and checkout counters
▸ Within seventeen months, sales had plunged, losses had soared, and Johnson had lost his job.
How did JCP get it so wrong?

Innovative Decisions at JCP
▸ Is it due to the lack of data on customers’ tastes and preferences?
  ‣ Transaction data provides clues only about past behavior, not about how customers might react to disruptive strategies
▸ What about Johnson’s experience?
  ‣ Ideas that are truly innovative typically go against experience
  ‣ The most experienced business leaders are often wrong, whether it’s improving customer experiences, trying out new business models, or developing new products and services

Famous Predictions of Consumer Behavior
▸ "The iPhone is the most expensive phone in the world, and it doesn’t appeal to business customers because it doesn’t have a keyboard, which makes it not a very good e-mail machine." - Microsoft CEO Steve Ballmer (2007)
▸ "People have told us over and over and over again; they don’t want to rent their music... they don’t want subscriptions." - Apple CEO Steve Jobs (2003)
https://www.cultofmac.com/501138/apple-history-steve-ballmer-iphone-freakout/

Is It All Despair?
▸ No! Managers can discover whether a change in product, service, or business model will succeed by acting like a scientist, i.e., by experimentation!
▸ Pharmaceutical companies introduce a drug by first conducting a round of experiments based on established scientific protocols
  ‣ The US Food and Drug Administration requires extensive clinical trials for approval
▸ Had JCP run rigorous experiments on its CEO’s proposed innovations, it would have known that its customers would probably reject them
  ‣ Given the long odds against any innovation, such a result is not at all surprising

Experimentation is the Key
▸ Organizations need an experimentation mindset when it comes to decision-making in uncertain business environments
  ‣ Ex: Technological uncertainty, customer uncertainty, etc.
▸ Plan to fail early and inexpensively in the search for the market for a disruptive technology or any new idea
  ‣ Create a market through an iterative process of trial, learning, and trial again
  ‣ Most experiments do not produce positive results

Experimentation is the Key
▸ Microsoft has found that only one-third of its experiments prove effective, one-third have neutral results, and one-third have negative results
▸ In 2010, Google found that out of 13,311 proposed algorithm changes it evaluated, only 516 were determined to be useful
  ‣ Google’s experts missed their mark 96% of the time
▸ The capability to test what does and does not work at a huge scale gives the company an advantage over its competitors
  ‣ Learning from the failures sets the company apart from its competitors

Stock Performance of Leading Experimenters

What Makes a Good Business Experiment?
▸ The experiment must have a testable hypothesis
▸ Commitment to abide by the results
▸ The experiment should be able to isolate the change in the variable of interest
▸ Identifying proper treatment and control groups
▸ Understanding the cause and effect

Does the Experiment Have a Testable Hypothesis?
▸ Managers must first figure out exactly what they want to learn and measure
  ‣ Identify the independent and dependent variables
  ‣ Form a null hypothesis: the variable of interest does not have an impact on the measure
▸ Only then can one decide if testing is the best approach to achieve the answer and, if so, what the scope of the experiment should be

Commitment to Abide by the Results
▸ Negative results would be a huge disappointment for those who may have advocated for the initiative
▸ Management should be willing to walk away from a project if it’s not supported by the data.

Is the Experiment Doable?
▸ Can you isolate a change in the variable of interest from other factors affecting the measure or the dependent variable?
  ‣ Environments are constantly changing, and the potential causes of business outcomes are often uncertain or unknown
  ‣ Linkages between the factors affecting the variables are frequently complex and poorly understood
▸ Ex: A hypothetical retail chain owns 10,000 stores; 8,000 are named QwikMart and the other 2,000, FastMart.
  ‣ QwikMart has $1 million in annual sales and FastMart $1.1 million.
  ‣ A senior executive asks a simple question: Would changing the name of QwikMart to FastMart increase its revenue?

Proper Treatment and Control Groups
▸ In 2018, Uber wanted to understand the value of ‘Uber Express Pool’, a carpooling service, before rolling it out
▸ The initial idea was to conduct an experiment by randomizing users in Boston
  ‣ The experiences of users in the control group would be affected by the treatment group, and vice versa
▸ Drawing control and treatment groups from the same city is not a good option
  ‣ Uber rolled out the experiment across 6 major US cities
▸ This is especially problematic in online platform businesses

Understanding the Cause and Effect
▸ Because of the presence of big data, some executives may believe that all they need to do is establish correlation, and causality can be inferred
▸ Sometimes two variables are correlated because they have a common cause, or because the correlation is simply a coincidence
  ‣ Example: Ice cream consumption appears to increase the chance of death by drowning
  ‣ Common cause: high temperature

Concerns of Corporate Experiments
▸ Facebook’s Mood Manipulation Experiment
  ‣ In 2012, for one week, ~700K users were randomly shown either highly positive content or sadder-than-average content in their feeds
  ‣ After the week, these users were more likely to post correspondingly positive or negative content on the platform
  ‣ Emotional contagion was observed from the feed to the users
▸ This experiment is legal, but is it ethical?
  ‣ It may have harmed the individuals who were shown the negative feed
  ‣ IRB approval was not received for this study
https://www.theatlantic.com/technology/archive/2014/06/everything-we-know-about-facebooks-secret-mood-manipulation-experiment/373648/

Experimentation Techniques: Multi-Armed Bandits

Applications: Advertisements
▸ Which version of Ad to show?

Applications: UI/UX Design
▸ Which layout to use?

Applications: Political Campaign
▸ A family image and a Learn More button resulted in a 41% increase in the sign-up rate and about $60 million in donations compared to the old version
[Figure: New vs. Old versions of the page]
https://www.optimizely.com/insights/blog/how-obama-raised-60-million-by-running-a-simple-experiment/

Applications: Movie Recommendation
▸ Which movie to recommend?
▸ We use this application in the tutorial

Simple Technique: A/B Tests
▸ Going forward, we assume that the testing environment is set up properly for a controlled experiment (following the principles discussed earlier).
▸ A/B Tests are based on statistical hypothesis testing
▸ Allocate your budget equally between the movies/ads
  ‣ Considering two choices, we display each 50% of the time
  ‣ Measure likes/CTR for each and choose the movie/ad with a higher percentage of likes/CTR

Should we still split equally?
▸ A lot of users might see suboptimal ads
  ‣ In the previous example, if one of the ads is truly better, we have wasted 50% of our budget!
Can we allocate the resources better?

Can we have Adaptivity?
▸ Non-adaptive learning methods (e.g., hypothesis testing, supervised learning) do not optimize during learning
▸ Make selection adaptive, instead of having a static scheme
  ‣ Observe the results at the end of each time step and make decisions based on that
▸ Solve two problems simultaneously:
  ‣ Learning: Learn which ad is better
  ‣ Optimization: Minimize the budget and maximize CTR

Multi-Armed Bandits
▸ Las Vegas Slot Machines

Each time an arm is pulled, the bandit gets a reward!
[Figure: rewards from three pulls of each of three arms]
  Arm 1: $1, $0, $1
  Arm 2: $1, $4, $0
  Arm 3: $2, $0, $2

What are Arms and Rewards?
▸ Arms are the choices provided to the user based on application
▸ Rewards must be modeled based on application requirements

  Problem            Arms      Reward
  Advertisement      Ads       CTR
  Web Optimization   Designs   Web traffic
  Clinical Trials    Drugs     Efficacy
  Recommendation     Movies    Customer satisfaction

Bandits?
▸ The bandit wants to maximize the total reward
  ‣ Pull arms sequentially, and learn the arm yielding the highest reward
  ‣ Learn the distribution of reward provided by each arm
[Figure: K arms/decisions]

Multi-Armed Bandits Problem
▸ Each arm has an unknown reward distribution which has to be learnt
  ‣ Assume a distribution and learn its parameters (e.g., mean)
  ‣ Ex: Distribution across clicks for a given Ad
▸ Learning occurs by sequential sampling from each distribution
  ‣ Sampling is the process of obtaining the reward by pulling the arm
  ‣ Ex: CTR (reward) obtained from presenting (pulling arm) an Ad
▸ The samples (cumulatively) form the reward, which needs to be maximized

Multi-Armed Bandits Problem
▸ Each arm has a mean reward that is unknown and has to be learnt
[Figure: K arms/decisions with unknown mean rewards $\mu_1, \mu_2, \ldots, \mu_K$]
https://www.andrew.cmu.edu/course/18-847F/lectures/18687Nov182019.pdf

MAB Parameter Learning Without Optimization
[Figure: rewards per arm, Arm 1: $1, $0, $1; Arm 2: $1, $4, $0; Arm 3: $2, $0, $2]
▸ If optimization is not involved, we can learn mean reward by pulling each arm n times
  ‣ Pull each arm 3 times and estimate the mean reward
  ‣ 2/3, 5/3, 4/3 for arms 1, 2, and 3 respectively
https://www.andrew.cmu.edu/course/18-847F/lectures/18687Nov182019.pdf

MAB Parameter Learning With Optimization
▸ We proceed in rounds and maintain for each arm an estimate of the mean reward
  ‣ On pulling an arm, we obtain a reward, and we update the mean estimate
▸ We also need to maximize the reward in this process
  ‣ In each round, we must decide which arm to pull next
  ‣ This choice presents the dilemma:
    – Explore: pull an arm to learn the distributions better
    – Exploit: pull the arm which has the highest mean reward known so far, to optimize the overall rewards

MAB: Exploration vs. Exploitation
▸ After 3 rounds, Arm 1 had the lowest mean, so the bandit stopped pulling it
▸ After 5 rounds, Arm 2 had a better mean than Arm 3, so the bandit stopped pulling Arm 3
https://www.andrew.cmu.edu/course/18-847F/lectures/18687Nov182019.pdf

MAB: Exploration vs. Exploitation
▸ Remember, we only observe random samples from each arm
▸ Maybe Arm 1 has a higher mean, but we wouldn’t know if we don’t explore enough
▸ But if we explore and sample more from low-mean arms, we’ll reduce our overall reward
https://www.andrew.cmu.edu/course/18-847F/lectures/18687Nov182019.pdf

MAB Problem: Formulation
▸ Given n arms, $a_1, a_2, a_3, \ldots, a_n$, the $i$-th arm has the reward distribution $p(r \mid \theta_i)$
  ‣ Learn parameters $\theta_i$
  ‣ The reward from the $i$-th arm observed in the $t$-th round is $r_t^i$
▸ A policy is a function that maps a history to a choice of arm:
  ‣ $\pi: a^* r_1^*, a^* r_2^*, a^* r_3^*, \ldots, a^* r_t^* \to a_{t+1}$
  ‣ * indicates an arm (not specified, to reduce notation)
▸ Find a policy that maximizes the rewards over $T$ rounds: $\sum_{t=1}^{T} r_t^*$
  ‣ Each round incurs an expense and so, $T$ depends on our budget

MAB Algorithms to Learn Policy
▸ ε-greedy
▸ Thompson sampling
▸ Upper Confidence Bound
▸ …
Note: A high ε means less exploitation and more exploration: rewards tend to start low and rise as the estimates improve, which can keep regret low in the long run, although every exploratory pull of a weak arm has a cost. A low ε means more exploitation and less exploration: rewards tend to start high, but regret can grow if a better arm is never discovered.

MAB Algorithm: ε-Greedy Algorithm
▸ Input: ε
▸ Pull each arm a fixed number of times (e.g., once each)
  ‣ Get initial estimates of mean reward for each arm: $\hat{\mu}_i$
▸ In each round:
  ‣ Exploitation: With probability 1-ε, choose the arm with the (current) highest estimated reward
  ‣ Exploration: With probability ε, select an arm uniformly at random
  ‣ For the selected arm, observe reward and update mean reward
Note: In the ε-greedy algorithm, a lower value of ε means that the algorithm will exploit (choose the best-known option) more frequently and explore (try other options) less often. While exploitation helps in focusing on options with higher rewards, it can also lead to higher regret if the algorithm fails to explore enough to find better alternatives. It also makes regret hard to estimate from the observed data, since an algorithm that over-exploits early on may never sample the potentially better options.
Note: So a lower value of ε means the algorithm exploits more often than it explores. This is similar to a greedy algorithm, which can settle on a locally rather than globally optimal result, and it may lead to higher regret later on if better alternatives are never selected.
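The ε-greedy steps listed above can be sketched in a few lines of Python. This is a minimal illustration rather than the tutorial's implementation: the function name `epsilon_greedy`, the `pull` callback, and the click probabilities in the usage note are all hypothetical, and ties between equal estimates are broken by taking the lowest arm index.

```python
import random


def epsilon_greedy(pull, n_arms, epsilon, n_rounds, rng):
    """Run epsilon-greedy for n_rounds after one initial pull per arm.

    pull(arm) is a caller-supplied (hypothetical) function that returns
    a numeric reward for the chosen arm index.
    """
    counts = [0] * n_arms    # number of pulls per arm
    means = [0.0] * n_arms   # running estimate of each arm's mean reward
    total = 0.0              # cumulative reward collected

    def update(arm, reward):
        # Incremental mean: new_mean = old_mean + (reward - old_mean) / n
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]

    # Initial estimates: pull each arm once
    for arm in range(n_arms):
        reward = pull(arm)
        update(arm, reward)
        total += reward

    for _ in range(n_rounds):
        if rng.random() < epsilon:
            # Exploration: pick an arm uniformly at random
            arm = rng.randrange(n_arms)
        else:
            # Exploitation: pick the arm with the highest current estimate
            arm = max(range(n_arms), key=lambda a: means[a])
        reward = pull(arm)
        update(arm, reward)
        total += reward

    return means, counts, total
```

A possible usage, assuming three ads whose click probabilities (0.2, 0.5, 0.7) are made up for illustration: create `rng = random.Random(0)`, define `pull = lambda a: 1.0 if rng.random() < [0.2, 0.5, 0.7][a] else 0.0`, and call `epsilon_greedy(pull, 3, 0.1, 2000, rng)`. Setting `epsilon=0` gives a purely greedy policy that never revisits an arm once another arm looks better after the initial pulls.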

Evaluation of Testing Algorithms
▸ Regret (after T rounds)
  ‣ How much better could we have done if we had known the distributions from the beginning, i.e., if we knew the optimal choice without learning
  ‣ Lower regret ≈ Better policy
▸ Expected difference between reward from optimal strategy and the sum/mean of collected rewards
  ‣ Simply maximize the reward
  ‣ Higher reward ≈ Lower regret ≈ Better policy
Note: Reward is the immediate output from each action, and regret quantifies the loss of potential rewards compared to an optimal policy.
Regret = m * r − Sum, where
  m = number of arm pulls
  r = optimal average reward per pull
  Sum = sum of rewards collected so far

A/B Testing vs MAB
▸ A/B Testing: Pure (full) exploration followed by pure (full) exploitation
▸ MAB: Interleaved exploration and exploitation

Results: Movie Recommendation
[Figure: results for A/B Testing and MAB]
https://github.com/husnejahan/Multi-armed-bandits-for-dynamic-movie-recommendations

References
▸ Stefan H. Thomke (2020), Experimentation Works: The Surprising Power of Business Experiments, HBR Press.
▸ Kramer et al. (2014), Experimental evidence of massive-scale emotional contagion through social networks
  ‣ https://www.pnas.org/doi/10.1073/pnas.1320040111

Thank You
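
As a worked check of the regret formula from the "Evaluation of Testing Algorithms" slide (Regret = m * r − Sum), the sketch below applies it to the nine rewards from the three-arm slot machine example. One assumption is made loudly: the true optimal mean is unknown in the slides, so the best estimated mean (5/3, from Arm 2) stands in for r. The function name `regret` is hypothetical.

```python
from fractions import Fraction


def regret(optimal_mean_reward, rewards):
    """Regret = m * r - Sum, where m is the number of pulls,
    r the optimal average reward per pull, and Sum the rewards collected."""
    m = len(rewards)
    return m * optimal_mean_reward - sum(rewards)


# Rewards from the slide example: three pulls of each of three arms
arm_rewards = {1: [1, 0, 1], 2: [1, 4, 0], 3: [2, 0, 2]}
collected = [r for rewards in arm_rewards.values() for r in rewards]

# ASSUMPTION: use the best estimated mean (5/3) as the optimal reward per pull
best_mean = max(Fraction(sum(rs), len(rs)) for rs in arm_rewards.values())

print(regret(best_mean, collected))  # 9 * 5/3 - 11 = 4
```

Under that assumption, pulling every arm equally costs 4 units of reward relative to pulling Arm 2 all nine times, which is the kind of loss an adaptive policy tries to reduce.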
