The Discipline of Business Experimentation

Document Details


Published: 2014

Authors: Stefan Thomke and Jim Manzi

Tags: business experimentation, innovation, business strategy, management

Summary

This article from Harvard Business Review discusses the discipline of business experimentation, offering insights into increasing the chances of success with innovation test-drives. It presents examples of both successful and failed experiments, highlighting the importance of rigorous testing in decision making and emphasizing the need for a clear purpose to maximize the value of experimentation.

Full Transcript

HBR.ORG DECEMBER 2014 REPRINT R1412D
SPOTLIGHT ON INNOVATION ON THE FLY

The Discipline of Business Experimentation

Increase your chances of success with innovation test-drives.

by Stefan Thomke and Jim Manzi

This document is authorized for use only in Prof. M P Ram Mohan & Prof. Viswanath Pingali's Senior Management Programme (SMP-BL13) 2024 at Indian Institute of Management - Ahmedabad from Apr 2024 to Oct 2024.

FOR ARTICLE REPRINTS CALL 800-988-0886 OR 617-783-7500, OR VISIT HBR.ORG

ARTWORK: Berndnaut Smilde, Nimbus Green Room, 2013. Digital C-type print, 75 x 102 cm/125 x 170 cm. Courtesy of the artist and Ronchini Gallery.

Stefan Thomke is the William Barclay Harding Professor of Business Administration at Harvard Business School. Jim Manzi is the founder and chairman of Applied Predictive Technologies, which provides software for designing and analyzing business experiments.

Soon after Ron Johnson left Apple to become the CEO of J.C. Penney, in 2011, his team implemented a bold plan that eliminated coupons and clearance racks, filled stores with branded boutiques, and used technology to eliminate cashiers, cash registers, and checkout counters. Yet just 17 months after Johnson joined Penney, sales had plunged, losses had soared, and Johnson had lost his job. The retailer then did an about-face.

How could Penney have gone so wrong? Didn't it have tons of transaction data revealing customers' tastes and preferences?
Presumably it did, but the problem is that big data can provide clues only about the past behavior of customers—not about how they will react to bold changes. When it comes to innovation, then, most managers must operate in a world where they lack sufficient data to inform their decisions. Consequently, they often rely on their experience or intuition. But ideas that are truly innovative—that is, those that can reshape industries—typically go against the grain of executive experience and conventional wisdom.

Managers can, however, discover whether a new product or business program will succeed by subjecting it to a rigorous test. Think of it this way: A pharmaceutical company would never introduce a drug without first conducting a round of experiments based on established scientific protocols. (In fact, the U.S. Food and Drug Administration requires extensive clinical trials.) Yet that's essentially what many companies do when they roll out new business models and other novel concepts. Had J.C. Penney done thorough experiments on its CEO's proposed changes, the company might have discovered that customers would probably reject them.

Why don't more companies conduct rigorous tests of their risky overhauls and expensive proposals? Because most organizations are reluctant to fund proper business experiments and have considerable difficulty executing them. Although the process of experimentation seems straightforward, it is surprisingly hard in practice, owing to myriad organizational and technical challenges. That is the overarching conclusion of our 40-plus years of collective experience conducting and studying business experiments at dozens of companies, including Bank of America, BMW, Hilton, Kraft, Petco, Staples, Subway, and Walmart.

Running a standard A/B test over a direct channel such as the internet—comparing, for instance, the response rate to version A of a web page with the response rate to version B—is a relatively uncomplicated exercise using math developed a century ago. But the vast majority (more than 90%) of consumer business is conducted through more-complex distribution systems, such as store networks, sales territories, bank branches, fast-food franchises, and so on. Business experimentation in such environments suffers from a variety of analytical complexities, the most important of which is that sample sizes are typically too small to yield valid results. Whereas a large online retailer can simply select 50,000 consumers in a random fashion and determine their reactions to an experimental offering, even the largest brick-and-mortar retailers can't randomly assign 50,000 stores to test a new promotion. For them, a realistic test group usually numbers in the dozens, not the thousands. Indeed, we have found that most tests of new consumer programs are too informal. They are not based on proven scientific and statistical methods, and so executives end up misinterpreting statistical noise as causation—and making bad decisions.

In an ideal experiment the tester separates an independent variable (the presumed cause) from a dependent variable (the observed effect) while holding all other potential causes constant, and then manipulates the former to study changes in the latter. The manipulation, followed by careful observation and analysis, yields insight into the relationships between cause and effect, which ideally can be applied to and tested in other settings.

To obtain that kind of knowledge—and ensure that business experimentation is worth the expense and effort—companies need to ask themselves several crucial questions: Does the experiment have a clear purpose? Have stakeholders made a commitment to abide by the results? Is the experiment doable? How can we ensure reliable results? Have we gotten the most value out of the experiment? (See the sidebar "Checklist for Running a Business Experiment.") Although those questions seem obvious, many companies begin conducting tests without fully addressing them.

COPYRIGHT © 2014 HARVARD BUSINESS SCHOOL PUBLISHING CORPORATION. ALL RIGHTS RESERVED.

Idea in Brief

THE PROBLEM
In the absence of sufficient data to inform decisions about proposed innovations, managers often rely on their experience, intuition, or conventional wisdom—none of which is necessarily relevant.

THE SOLUTION
A rigorous scientific test, in which companies separate an independent variable (the presumed cause) from a dependent variable (the observed effect) while holding all other potential causes constant, and then manipulate the former to study changes in the latter.

THE GUIDANCE
To make the most of their experiments, companies must ask: Does the experiment have a clear purpose? Have stakeholders made a commitment to abide by the results? Is the experiment doable? How can we ensure reliable results? Have we gotten the most value out of the experiment?

EXPERIMENT KOHL'S: The retailer set out to test the hypothesis that opening stores an hour later would not lead to a significant drop in sales.

Does the Experiment Have a Clear Purpose?

Companies should conduct experiments if they are the only practical way to answer specific questions about proposed management actions.

Consider Kohl's, the large retailer, which in 2013 was looking for ways to decrease its operating costs. One suggestion was to open stores an hour later on
Monday through Saturday. Company executives were split on the matter. Some argued that reducing the stores' hours would result in a significant drop in sales; others claimed that the impact on sales would be minimal. The only way to settle the debate with any certainty was to conduct a rigorous experiment. A test involving 100 of the company's stores showed that the delayed opening would not result in any meaningful sales decline.

In determining whether an experiment is needed, managers must first figure out exactly what they want to learn. Only then can they decide if testing is the best approach and, if it is, the scope of the experiment. In the case of Kohl's, the hypothesis to be tested was straightforward: Opening stores an hour later to reduce operating costs will not lead to a significant drop in sales. All too often, though, companies lack the discipline to hone their hypotheses, leading to tests that are inefficient, unnecessarily costly, or, worse, ineffective in answering the question at hand. A weak hypothesis (such as "We can extend our brand upmarket") doesn't present a specific independent variable to test on a specific dependent variable, so it is difficult either to support or to reject. A good hypothesis helps delineate those variables.

In many situations executives need to go beyond the direct effects of an initiative and investigate its ancillary effects. For example, when Family Dollar wanted to determine whether to invest in refrigeration units so that it could sell eggs, milk, and other perishables, it discovered that a side effect—the increase in the sales of traditional dry goods to the additional customers drawn to the stores by the refrigerated items—would actually have a bigger impact on profits. Ancillary effects can also be negative. A few years ago, Wawa, the convenience store chain in the mid-Atlantic United States, wanted to introduce a flatbread breakfast item that had done well in spot tests. But the initiative was killed before the launch, when a rigorous experiment—complete with test and control groups followed by regression analyses—showed that the new product would likely cannibalize other more profitable items.

Have Stakeholders Made a Commitment to Abide by the Results?

Before conducting any test, stakeholders must agree how they'll proceed once the results are in. They should promise to weigh all the findings instead of cherry-picking data that supports a particular point of view. Perhaps most important, they must be willing to walk away from a project if it's not supported by the data.

When Kohl's was considering adding a new product category, furniture, many executives were tremendously enthusiastic, anticipating significant additional revenue. A test at 70 stores over six months, however, showed a net decrease in revenue. Products that now had less floor space (to make room for the furniture) experienced a drop in sales, and Kohl's was actually losing customers overall. Those negative results were a huge disappointment for those who had advocated for the initiative, but the program was nevertheless scrapped. The Kohl's example highlights the fact that experiments are often needed to perform objective assessments of initiatives backed by people with organizational clout.

Of course, there might be good reasons for rolling out an initiative even when the anticipated benefits are not supported by the data—for example, a program that experiments have shown will not substantially boost sales might still be necessary to build customer loyalty. But if the proposed initiative is a done deal, why go through the time and expense of conducting a test?
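A store-level test like Kohl's delayed-opening experiment is commonly read as a difference-in-differences: compare each group's average sales before and after the change, then net out the market-wide trend that also shows up in the untouched control stores. The sketch below is a minimal illustration; all sales figures are invented, not Kohl's data:

```python
def diff_in_diff(test_before, test_after, ctrl_before, ctrl_after):
    """Estimate the impact of a change, net of the background trend
    that also affects the untouched control stores."""
    def mean(xs):
        return sum(xs) / len(xs)
    test_change = mean(test_after) - mean(test_before)  # change in test group
    ctrl_change = mean(ctrl_after) - mean(ctrl_before)  # background trend
    return test_change - ctrl_change

# Hypothetical weekly sales ($k) for a few stores in each group.
impact = diff_in_diff(
    test_before=[200, 210, 190], test_after=[202, 211, 195],
    ctrl_before=[205, 195, 200], ctrl_after=[209, 198, 205],
)
# An estimate near zero is what "no meaningful sales decline" looks like.
```

With dozens rather than thousands of stores, such a point estimate would normally be accompanied by a confidence interval or by regression analysis of the kind Wawa's experimenters used.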
Checklist for Running a Business Experiment

Purpose
Does the experiment focus on a specific management action under consideration?
What do people hope to learn from the experiment?
How does the experiment fit into the organization's overall learning agenda and strategic priorities?

Buy-In
What specific changes would be made on the basis of the results?
How will the organization ensure that the results aren't ignored?

Feasibility
Does the experiment have a testable prediction?
What is the required sample size? Note: The sample size will depend on the expected effect (for example, a 5% increase in sales).
Can the organization feasibly conduct the experiment at the test locations for the required duration?

Reliability
What measures will be used to account for systemic bias, whether it's conscious or unconscious?
Do the characteristics of the control group match those of the test group?
Can the experiment be conducted in either "blind" or "double-blind" fashion?
Have any remaining biases been eliminated through statistical analyses or other techniques?
Would others conducting the same test obtain similar results?

Value
Has the organization considered a targeted rollout—that is, one that takes into account a proposed initiative's effect on different customers, markets, and segments—to concentrate investments in areas where the potential payback is highest?
Has the organization implemented only the components of an initiative with the highest return on investment?
Does the organization have a better understanding of what variables are causing what effects?

A process should be instituted to ensure that test results aren't ignored, even when they contradict the assumptions or intuition of top executives. At Publix Super Markets, a chain in the southeastern United States, virtually all large retail projects, especially those requiring considerable capital expenditures, must undergo formal experiments to receive a green light. Proposals go through a filtering process in which the first step is for finance to perform an analysis to determine if an experiment is worth conducting.

For projects that make the cut, analytics professionals develop test designs and submit them to a committee that includes the vice president of finance. The experiments approved by the committee are then conducted and overseen by an internal test group. Finance will approve significant expenditures only for proposed initiatives that have adhered to this process and whose experiment results are positive. "Projects get reviewed and approved much more quickly—and with less scrutiny—when they have our test results to back them," says Frank Maggio, the senior manager of business analysis at Publix.

When constructing and implementing such a filtering process, it is important to remember that experiments should be part of a learning agenda that supports a firm's organizational priorities. At Petco each test request must address how that particular experiment would contribute to the company's overall strategy to become more innovative. In the past the company performed about 100 tests a year, but that number has been trimmed to 75. Many test requests are denied because the company has done a similar test in the past; others are rejected because the changes under consideration are not radical enough to justify the expense of testing (for example, a price increase of a single item from $2.79 to $2.89). "We want to test things that will grow the business," says John Rhoades, the company's former director of retail analytics. "We want to try new concepts or new ideas."

Is the Experiment Doable?

Experiments must have testable predictions. But the "causal density" of the business environment—that is, the complexity of the variables and their interactions—can make it extremely difficult to determine cause-and-effect relationships. Learning from a business experiment is not necessarily as easy as isolating an independent variable, manipulating it, and observing changes in the dependent variable. Environments are constantly changing, the potential causes of business outcomes are often uncertain or unknown, and so linkages between them are frequently complex and poorly understood.

Consider a hypothetical retail chain that has 10,000 convenience stores, 8,000 of which are named QwikMart and 2,000 FastMart. The QwikMart stores have been averaging $1 million in annual sales and the FastMart stores $1.1 million. A senior executive asks a seemingly simple question: Would changing the name of the QwikMart stores to FastMart lead to an increase in revenue of $800 million? Obviously, numerous factors affect store sales, including the physical size of the store, the number of people who live within a certain radius and their average incomes, the number of hours the store is open per week, the experience of the store manager, the number of nearby competitors, and so on. But the executive is interested in just one variable: the stores' name (QwikMart versus FastMart).

The obvious solution is to conduct an experiment by changing the name of a handful of QwikMart stores (say, 10) to see what happens. But even determining the effect of the name change on those stores turns out to be tricky, because many other variables
For example, way: The smaller the expected effect, the greater the the weather was very bad at four of the locations, number of observations that are required to detect it a manager was replaced in one, a large residential from the surrounding noise with the desired statisti- building opened near another, and a competitor cal confidence. started an aggressive advertising promotion near yet Selecting the right sample size does more than en- another. Unless the company can isolate the effect sure that the results will be statistically valid; it can of the name change from those and other variables, also enable a company to decrease testing costs and the executive won’t know for sure whether the name increase innovation. Readily available software pro- change has helped (or hurt) business. grams can help companies choose the optimal sam- To deal with environments of high causal den- ple size. (Full disclosure: Jim Manzi’s firm, Applied sity, companies need to consider whether it’s feasi- Predictive Technologies, sells one, Test & Learn.) ble to use a sample large enough to average out the effects of all variables except those being studied. How Can We Ensure Reliable Results? Unfortunately, that type of experiment is not always In the previous section we described the basics for doable. The cost of a test involving an adequate conducting an experiment. However, the truth is that sample size might be prohibitive, or the change companies typically have to make trade-offs between in operations could be too disruptive. In such in- reliability, cost, time, and other practical consider- stances, as we discuss later, executives can some- ations. Three methods can help reduce the trade-offs, times employ sophisticated analytical techniques, thus increasing the reliability of the results. some involving big data, to increase the statistical Randomized field trials. The concept of ran- validity of their results. 
domization in medical research is simple: Take a That said, it should be noted that managers of- large group of individuals with the same charac- ten mistakenly assume that a larger sample will teristics and affliction, and randomly divide them automatically lead to better data. Indeed, an experi- into two subgroups. Administer the treatment to ment can involve a lot of observations, but if they just one subgroup and closely monitor everyone’s are highly clustered, or correlated to one another, health. If the treated (or test) group does statistically then the true sample size might actually be quite better than the untreated (or control) group, then small. When a company uses a distributor instead the therapy is deemed to be effective. Similarly, ran- of selling directly to customers, for example, that domized field trials can help companies determine distribution point could easily lead to correlations whether specific changes will lead to improved among customer data. performance. The required sample size depends in large part The financial services company Capital One has on the magnitude of the expected effect. If a com- long used rigorous experiments to test even the most pany expects the cause (for example, a change in seemingly trivial changes. Through randomized field store name) to have a large effect (a substantial in- trials, for instance, the company might test the color crease in sales), the sample can be smaller. If the of the envelopes used for product offers by sending expected effect is small, the sample must be larger. out two batches (one in the test color and the other This might seem counterintuitive, but think of it this in white) to determine any differences in response. December 2014 Harvard Business Review 7 This document is authorized for use only in Prof. M P Ram Mohan & Prof. Viswanath Pingali's Senior Management Programme (SMP-BL13) 2024 at Indian Institute of Management - Ahmedabad from Apr 2024 to Oct 2024. 
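The sample-size arithmetic behind that intuition can be sketched with the standard formula for comparing two response rates, which fits an envelope test like Capital One's. The baseline rate, the lifts, and the conventional 5%-significance/80%-power design below are illustrative assumptions, not figures from the article or a substitute for the dedicated software it mentions:

```python
from math import ceil

# Conventional design constants: two-sided alpha = 0.05 -> z = 1.96,
# power = 0.80 -> z = 0.84 (standard normal quantiles).
Z_ALPHA, Z_BETA = 1.96, 0.84

def n_per_group(p_base: float, p_test: float) -> int:
    """Mailings needed in each batch to detect a lift in response rate
    from p_base to p_test with the stated confidence and power."""
    variance = p_base * (1 - p_base) + p_test * (1 - p_test)
    effect = p_test - p_base
    return ceil((Z_ALPHA + Z_BETA) ** 2 * variance / effect ** 2)

# Hypothetical envelope test with a 2% baseline response rate.
big_lift = n_per_group(0.02, 0.04)     # doubling the rate: cheap to detect
small_lift = n_per_group(0.02, 0.022)  # a 10% relative lift: far more mail
```

Because the required sample grows with the square of the inverse effect size, halving the detectable effect roughly quadruples the sample, which is why subtle changes are cheap to test online and expensive to test across stores.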
Randomization plays an important role: It helps prevent systemic bias, introduced consciously or unconsciously, from affecting an experiment, and it evenly spreads any remaining (and possibly unknown) potential causes of the outcome between the test and control groups. But randomized field tests are not without challenges. For the results to be valid, the field trials must be conducted in a statistically rigorous fashion.

Instead of identifying a population of test subjects with the same characteristics and then randomly dividing it into two groups, managers sometimes make the mistake of selecting a test group (say, a group of stores in a chain) and then assuming that everything else (the remainder of the stores) should be the control group. Or they select the test and control groups in ways that inadvertently introduce biases into the experiment. Petco used to select its 30 best stores to try out a new initiative (as a test group) and compare them with its 30 worst stores (as the control group). Initiatives tested in this way would often look very promising but fail when they were rolled out. Now Petco considers a wide range of parameters—store size, customer demographics, the presence of nearby competitors, and so on—to match the characteristics of the control and test groups. (Publix does the same.) The results from those experiments have been much more reliable.

Blind tests. To minimize biases and increase reliability further, Petco and Publix have conducted "blind" tests, which help prevent the Hawthorne effect: the tendency of study participants to modify their behavior, consciously or subconsciously, when they are aware that they are part of an experiment. At Petco none of the test stores' staffers know when experiments are under way, and Publix conducts blind tests whenever it can. For simple tests involving price changes, Publix can use blind procedures because stores are continually rolling out new prices, so the tests are indistinguishable from normal operating practices.

But blind procedures are not always practical. For tests of new equipment or work practices, Publix typically informs the stores that have been selected for the test group. (Note: A higher experimental standard is the use of "double-blind" tests, in which neither the experimenters nor the test subjects are aware of which participants are in the test group and which are in the control. Double-blind tests are widely used in medical research but are not commonplace in business experimentation.)

EXPERIMENT WAWA: A new flatbread did well in spot tests, but the chain killed it after rigorous experiments revealed it cannibalized other products.

Big data. In online and other direct-channel environments, the math required to conduct a rigorous randomized experiment is well known. But as we discussed earlier, the vast majority of consumer transactions occur in other channels, such as retail stores. In tests in such environments, sample sizes are often smaller than 100, violating typical assumptions of many standard statistical methods. To minimize the effects of this limitation, companies can utilize specialized algorithms in combination with multiple sets of big data (see the sidebar "How Big Data Can Help").

Consider a large retailer contemplating a store redesign that was going to cost a half-billion dollars to roll out to 1,300 locations. To test the idea, the retailer redesigned 20 stores and tracked the results. The finance team analyzed the data and concluded that the upgrade would increase sales by a meager 0.5%, resulting in a negative return on investment. The marketing team conducted a separate analysis and forecast that the redesign would lead to a healthy 5% sales increase.

As it turned out, the finance team had compared the test sites with other stores in the chain that were of similar size, demographic income, and other variables but were not necessarily in the same geographic market. It had also used data six months before and after the redesign. In contrast, the marketing team had compared stores within the same geographic region and had considered data 12 months before and after the redesign. To determine which results to trust, the company employed big data, including transaction-level data (store items, the times of day when the sale occurred, prices), store attributes, and data on the environments around the stores (competition, demographics, weather). In this way, the company selected stores for the control group that were a closer match with those in which the redesign was tested, which made the small sample size statistically valid. It then used objective, statistical methods to review both analyses. The results: The marketing team's findings were the more accurate of the two.

Even when a company can't follow a rigorous testing protocol, analysts can help identify and correct for certain biases, randomization failures, and other experimental imperfections. A common situation is when an organization's testing function is presented with nonrandomized natural experiments—the vice president of operations, for example, might want to know if the company's new employee training program, which was introduced in about 10% of the company's markets, is more effective than the old one. As it turns out, in such situations the same algorithms and big data sets that can be used to address the problem of small or correlated samples can also be deployed to tease out valuable insights and minimize uncertainty in the results. The analysis can then help experimenters design a true randomized field trial to confirm and refine the results, especially when they are somewhat counterintuitive or are needed to inform a decision with large economic stakes.

For any experiment, the gold standard is repeatability; that is, others conducting the same test should obtain similar results. Repeating an expensive test is usually impractical, but companies can verify results in other ways. Petco sometimes deploys a staged rollout for large initiatives to confirm the results before proceeding with a companywide implementation. And Publix has a process for tracking the results of a rollout and comparing them with the predicted benefit.

How Big Data Can Help

To filter out statistical noise and identify cause-and-effect relationships, business experiments should ideally employ samples numbering in the thousands. But this can be prohibitively expensive or impossible. A new approach to merchandise assortment may have to be tested in just 25 stores, a sales-training program with 32 salespeople, and a proposed remodeling in 10 hotel properties. In such situations, big data and other sophisticated computing techniques, such as "machine learning," can help. Here's how:

Getting started. If a retailer wants to test a new store layout, it should collect detailed data (such as competitors' proximity, employees' tenures, and customer demography) about each unit of analysis (each store and its trade area, each salesperson and her accounts, and so on). This will become part of a big data set. Determining how many and which stores, customers, or employees should be part of the test and how long the test should run depends on the volatility in the data and the precision required for impact estimates.

Building a control group. In experiments involving small samples, correctly matching test subjects (such as individual stores or customers) to control subjects is essential and depends on the experimenter's ability to fully identify dozens or even hundreds of variables that characterize the test subjects. Big data feeds (complete transaction logs by customer, detailed weather data, social media streams, and so on) can assist in this. Once the characteristics are determined, a control group can be built that contains all elements of the test group except for what is being tested. This allows the retailer to determine whether the test results were influenced only by that one element—the new layout—or by other factors (demographic variances, better economic conditions, warmer weather).

Targeting the best opportunities. The same data feeds can be used to identify situations in which the tested program is effective. For example, the new store layout may work better in highly competitive urban areas but may be only moderately successful in other markets. By pinpointing these patterns, the experimenter can implement the program in situations where it works and avoid investments where the program may not generate the best ROI.

Tailoring the program. Additional large data feeds can be used to characterize program components that are more or less effective. For example, a retailer testing the effects of a new store layout can use data captured from in-store video streams to determine whether the new layout is encouraging customers to move through more of the store or is generating more traffic near high-margin products. Or the experimenter may find that moving items to the front of the store and putting in new shelves have a positive impact, but moving the sales registers disrupts checkouts and hurts profits.
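The "building a control group" step can be sketched as a nearest-neighbor match over standardized store characteristics. The store attributes and figures below are hypothetical, not the variables any of these retailers actually use:

```python
from math import sqrt

def standardize(rows):
    """Z-score each column so attributes on different scales
    (square feet vs. competitor counts) are comparable."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    sds = [sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
           for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(row, means, sds)] for row in rows]

def match_controls(test_ids, features):
    """Pair each test store with the most similar store not in the test."""
    zmap = dict(zip(features, standardize(list(features.values()))))
    used = set(test_ids)
    controls = {}
    for t in test_ids:
        best = min(
            (s for s in features if s not in used),
            key=lambda s: sqrt(sum((a - b) ** 2
                                   for a, b in zip(zmap[t], zmap[s]))),
        )
        controls[t] = best
        used.add(best)  # a control store is matched at most once
    return controls

# Hypothetical attributes: (size in sq ft, median income $k, nearby competitors)
stores = {
    "S1": (9000, 55, 2), "S2": (9200, 54, 2), "S3": (15000, 80, 0),
    "S4": (14800, 78, 1), "S5": (9100, 60, 3), "S6": (15500, 82, 0),
}
controls = match_controls(["S1", "S3"], stores)  # {'S1': 'S2', 'S3': 'S6'}
```

Comparing each matched pair's before-and-after results then plays the role that a randomized control comparison would play in a larger sample.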
Many companies go through the expense of con- Or the experimenter may find that moving items to the front of the store and ducting experiments but then fail to make the most putting in new shelves have a positive impact, but moving the sales registers of them. To avoid that mistake, executives should disrupts checkouts and hurts profits. take into account a proposed initiative’s effect on various customers, markets, and segments and con- centrate investments in areas where the potential paybacks are highest. The correct question is usually not, What works? but, What works where? December 2014 Harvard Business Review 9 This document is authorized for use only in Prof. M P Ram Mohan & Prof. Viswanath Pingali's Senior Management Programme (SMP-BL13) 2024 at Indian Institute of Management - Ahmedabad from Apr 2024 to Oct 2024. SPOTLIGHT INNOVATION ON THE FLY Petco frequently rolls out a program only in stores that are most similar to the test stores that had the best results. By doing so, Petco not only saves BEST PRACTICE PETCO on implementation costs but also avoids involving stores where the new program might not deliver benefits or might even have negative consequences.  he specialty retailer T Thanks to such targeted rollouts, Petco has consis- tently been able to double the predicted benefits of ensures reliable results from new initiatives. Another useful tactic is “value engineering.” Most its experiments by matching programs have some components that create ben- efits in excess of costs and others that do not. The trick, then, is to implement just the components the characteristics of the with an attractive return on investment (ROI). As a simple example, let’s say that a retailer’s tests of a control and test groups. 20%-off promotion show a 5% lift in sales. 
What portion of that increase was due to the offer itself and what resulted from the accompanying advertising and training of store staff, both of which directed customers to those particular sales products? In such cases, companies can conduct experiments to investigate various combinations of components (for instance, the promotional offer with advertising but without additional staff training). An analysis of the results can disentangle the effects, allowing executives to drop the components (say, the additional staff training) that have a low or negative ROI.

Moreover, a careful analysis of data generated by experiments can enable companies to better understand their operations and test their assumptions of which variables cause which effects. With big data, the emphasis is on correlation—discovering, for instance, that sales of certain products tend to coincide with sales of others. But business experimentation can allow companies to look beyond correlation and investigate causality—uncovering, for instance, the factors causing the increase (or decrease) of purchases. Such fundamental knowledge of causality can be crucial. Without it, executives have only a fragmentary understanding of their businesses, and the decisions they make can easily backfire.

When Cracker Barrel Old Country Store, the Southern-themed restaurant chain, conducted an experiment to determine whether it should switch from incandescent to LED lights at its restaurants, executives were astonished to learn that customer traffic actually decreased in the locations that installed LED lights. The lighting initiative could have stopped there, but the company dug deeper to understand the underlying causes. As it turned out, the new lighting made the front porches of the restaurants look dimmer, and many customers mistakenly thought that the restaurants were closed. This was puzzling—the LEDs should have made the porches brighter. Upon further investigation, executives learned that the store managers hadn't previously been following the company's lighting standards; they had been making their own adjustments, often adding extra lighting on the front porches. And so the luminosity dropped when the stores adhered to the new LED policy. The point here is that correlation alone would have left the company with the wrong impression—that LEDs are bad for business. It took experimentation to uncover the actual causal relationship.

Indeed, without fully understanding causality, companies leave themselves open to making big mistakes. Remember the experiment Kohl's did to investigate the effects of delaying the opening of its stores? During that testing, the company suffered an initial drop in sales. At that point, executives could have pulled the plug on the initiative. But an analysis showed that the number of customer transactions had remained the same; the issue was a drop in units per transaction. Eventually, the units per transaction recovered and total sales returned to previous levels. Kohl's couldn't fully explain the initial decrease, but executives resisted the temptation to blame the reduced operating hours. They didn't rush to equate correlation with causation.

What's important here is that many companies are discovering that conducting an experiment is just the beginning. Value comes from analyzing and then exploiting the data. In the past, Publix spent
The company’s current goal is to reverse ecutives at Petco were skeptical of the results, but that ratio. because the experiment had been conducted so rigorously, they eventually were willing to give the Challenging Conventional Wisdom new pricing a try. A targeted rollout confirmed the By paying attention to sample sizes, control groups, results, leading to a sales jump of more than 24% randomization, and other factors, companies can after six months. ensure the validity of their test results. The more The lesson is not merely that business experi- valid and repeatable the results, the better they will mentation can lead to better ways of doing things. It hold up in the face of internal resistance, which can also give companies the confidence to overturn can be especially strong when the results challenge wrongheaded conventional wisdom and the faulty long-standing industry practices and conventional business intuition that even seasoned executives wisdom. can display. And smarter decision making ultimately When Petco executives investigated new pric- leads to improved performance. ing for a product sold by weight, the results were Could J.C. Penney have averted disaster by rig- unequivocal. By far, the best price was for a quar- orously testing the components of its overhaul? At ter pound of the product, and that price was for an this point, it’s impossible to know. But one thing’s amount that ended in $.25. That result went sharply for certain: Before attempting to implement such a against the grain of conventional wisdom, which bold program, the company needed to make sure typically calls for prices ending in 9, such as $4.99 or that knowledge—not just intuition—was guiding the $2.49. “This broke a rule in retailing that you can’t decision.  HBR Reprint R1412D HARLEY SCHWADRON December 2014 Harvard Business Review 11 This document is authorized for use only in Prof. M P Ram Mohan & Prof. 
