Predictably Inaccurate: Big Data Perils PDF
2017
John Lucker, Susan K. Hogan, and Trevor Bischoff
Summary
This Deloitte Review article examines the prevalence and perils of inaccurate big data. It explores how bad big data affects business decisions and customer relationships, and discusses the potential adverse consequences of relying on it.
Predictably inaccurate: The prevalence and perils of bad big data Issue 21 | July 2017 Special issue: SPECIAL ISSUE: Navigating Navigating the the future future of work work Can Can we point point business, business,workers, workers, and and social social institutions in the institutions in thesame samedirection? direction? 8 www.deloittereview.com Predictably inaccurate 9 Predictably inaccurate The prevalence and perils of bad big data By John Lucker, Susan K. Hogan, and Trevor Bischoff Illustration by Jon Krause “We’re not that much smarter than we used to be, even though we have much more information—and that means the real skill now is learning how to pick out the useful information from all this noise.” —Nate Silver1 IS OUR LOVE AFFAIR WITH BIG DATA LEADING US ASTRAY? S OCIETY and businesses have fallen in love with big data. We can’t get enough: The more we collect, the more we want. Some companies hoard data, unsure of its value or unclear if or when it will be useful to them but, all the while, reticent to delete or not capture it for fear of missing out on potential future value. Stoking this appetite is the sheer growth in the volume, veloc- ity, and variety of the data. Most of all, many business leaders see high potential in a fourth V: value. Given our ability to access and (potentially) understand every move our current and potential customers make, coupled with www.deloittereview.com 10 Predictably inaccurate access to their demographic, biographic, and PERSONAL DATA THAT’S BOTH psychographic data, it seems logical that we INCOMPLETE AND INACCURATE should be able to form a more intimate, mean- “It’s pretty scary how wrong data collected ingful relationship with them. Every data point about you can be—especially if people make should move the business at least one step important decisions based on this incorrect information. This becomes more frighten- closer to the customer. 
ing as more and more decisions become information-based.” Yet despite all the digital breadcrumbs, it turns —Survey respondent T out that marketers might know less about in- O better gauge the degree and types of dividual consumers than they think. The num- big data inaccuracies and consumer bers don’t lie—or do they? What if much of this willingness to help correct any inaccu- data is less accurate than we expect it to be? racies, we conducted a survey to test how ac- Perils ranging from minor embarrassments to curate commercial data-broker data is likely complete customer alienation may await busi- to be—data upon which many firms rely for nesses that increasingly depend on big data to marketing, research and development, product guide business decisions and pursue micro- management, and numerous other activities. segmentation and micro-targeting marketing (See the sidebar “Survey methodology” for de- strategies. Specifically, overconfidence in the tails.) Some of the key findings:3 accuracy of both original and purchased data can lead to a false sense of security that can More than two-thirds of survey respondents compromise these efforts to such an extent stated that the third-party data about them that it undermines the overall strategy. was only 0 to 50 percent correct as a whole. One-third of respondents perceived the in- This article explores the potential adverse formation to be 0 to 25 percent correct. consequences of our current love affair with big data. Evidence from our prior2 and cur- Whether individuals were born in the Unit- rent primary research, supported by secondary ed States tended to determine whether they research, highlights the potential prevalence were able to locate their data within the and types of inaccurate data from US-based data broker’s portal. Of those not born in data brokers, as well as the factors that might the United States, 33 percent could not lo- be causing these errors. 
The good news is that cate their data; conversely, of those born in strategies and guardrails exist to help busi- the United States, only 5 percent had miss- nesses improve the accuracy of their data sets ing information. Further, no respondents as well as decrease the risks associated with born outside the United States and resid- overreliance on big data in general. www.deloittereview.com Predictably inaccurate 11 ing in the country for less than three years cent correct, while 75 percent said the ve- could locate their data. hicle data was 0 to 50 percent correct. In contrast to auto data, home data was con- The type of data on individuals that was sidered more accurate, with only 41 percent most available was demographic informa- of respondents judging their data to be 0 to tion; the least available was home data. 50 percent accurate. However, even if demographic information was available, it was not all that accurate Only 42 percent of participants said that and was often incomplete, with 59 percent their listed online purchase activity was cor- of respondents judging their demographic rect. Similarly, less than one-fourth of par- data to be only 0 to 50 percent correct. Even ticipants felt that the information on their seemingly easily available data types (such online and offline spending and the data on as date of birth, marital status, and number their purchase categories were more than of adults in the household) had wide vari- 50 percent correct. ances in accuracy. While half of the respondents were aware Nearly 44 percent of respondents said the that this type of information about them information about their vehicles was 0 per- existed among data providers, the remain- SURVEY METHODOLOGY Our survey asked 107 Deloitte US professionals to privately and anonymously review data made available by a leading consumer data broker, a broker with a publicly available, web-based portal that presents users with a variety of personal and household data. 
Respondents, all between 22 and 67 years of age, completed the rapid-response, 87-question survey between January 12–March 31, 2017. Respondents viewed their third-party data profiles along a number of specific variables (such as gender, marital status, and political affiliation), grouped into six categories (economic, vehicle, demographic, interest, purchase, and home). To calculate the “percent correct” for each individual variable, we took the number of participants who indicated that the third-party data point for that variable was correct, and divided it by the total number of participants for whom third-party data were available for that variable. To determine respondents’ views of the accuracy of the data for each category, we asked them to indicate whether they felt the category data was 0 percent, 25 percent, 50 percent, 75 percent, or 100 percent accurate. www.deloittereview.com 12 Predictably inaccurate Figure 1. Reported accuracy of third-party consumer data from our respondents Data category inaccuracy Variables surveyed Percentage of participants that judged that their 0% 25% 50% 75% 100% data in each category was only 0 to 50% correct Gender 84% 75% 59% Owner or renter of home Economic Vehicle Demographic Home construction year 54% 49% 41% Type of home Interest Purchase Home Number of bedrooms in home Home market value Overall: 71% Veteran Purchase date of home Awareness of consumer Birthday data collection Length of residence 50% Count of rooms in home Presence of children 30% Marital status 20% Associated political party Date moved into home Primary vehicle make Unaware Aware, but Aware and Number of adults in household surprised not surprised by extent by extent Smoker of data of data points points Owner of life insurance policy Primary vehicle model The data for nearly half of the variables examined was only 50 Education level achieved percent or less likely to be Primary vehicle year accurate—equivalent to the accuracy obtained by tossing a 
coin. Purchase activity online Secondary vehicle make Secondary vehicle model For each variable surveyed, percent correct was defined as Secondary vehicle year the proportion of participants Household income range indicating the respective variable was correct to the total number Intent for vehicle purchase of participants for whom the respective variable was available. Number of children Percent unavailable was defined Children’s gender by age as the proportion of participants indicating the respective variable 0% 25% 50% 75% 100% was unavailable to view to the total number of participants that Percent correct Percent unavailable provided a response. Deloitte University Press | dupress.deloitte.com www.deloittereview.com Predictably inaccurate 13 ing half were surprised or completely un- “There was lots of information that didn’t ex- aware of the scale and breadth of the data ist about me. And of the data that did exist, much seemed inconsistent with other data.” being gathered. —Survey respondent Figure 1 outlines other inaccuracies or omis- Interestingly, even after being offered the op- sions related to date of birth, education level, portunity to edit their data via the data bro- number of children, political affiliation, and ker’s online portal, few respondents chose to household income. Clearly, all of these types of do so. While approximately two-thirds of re- data are potentially important to marketers as spondents reported that at least half of their they target different consumer segments. information was inaccurate, only 37 percent opted to edit their data. Can we count on individuals to correct their own data? The most common best reason for the deci- “While I wasn’t surprised by the extent of the sion to edit (given by 31 percent of respon- data collected, it was interesting to see it. I dents who chose to edit) was to improve the was actually surprised at how little data information’s accuracy. 
The second most com- there was about me (I am an avid online mon response was a decision to edit only what shopper), and how incomplete the ‘cyber me’ picture is. I’m not complaining about it, seemed relevant (provided by 17 percent of re- though.” spondents opting to edit). Another 11 percent —Survey respondent of respondents who opted to edit cited privacy Survey respondents were provided with the and nervousness about their data being “out opportunity to elaborate on why they thought there.” Other respondents noted the desire to their data might be wrong or incomplete. Most reduce or avoid targeted messaging and politi- commonly, the available information was cal mailings, as well as the hope of improving outdated—especially vehicle data. Many oth- their credit rating (even though, presumably ers saw the data as characterizing their par- unknown to them, this type of marketing data ents or other household members (spouses or has no direct connection to how credit scores children) rather than themselves. The most- are derived). The most commonly edited cat- mentioned feeling among respondents was egories were demographic data and political surprise—not at the amount of correct data party data. available, but rather that the information was Why did so many respondents elect not to edit so limited, of poor quality, and inconsistent. their data? Most often, people cited privacy In essence, for many respondents, the data concerns. Other reasons included no perceived seemed, as aptly put by one respondent, “stale.” www.deloittereview.com 14 Predictably inaccurate value in editing and ambiguity regarding how THE PERILS OF RELYING ON BAD third parties might use the data. Table 1 gives DATA O an overview of the most common reasons for UR survey findings suggest that the the decision to edit or not. 
data that brokers sell not only has se- rious accuracy problems, but may be “I’m skeptical and cautious about what could less current or complete than data buyers ex- be done with this data. Even assuming the best of intentions and integrity by people pect or need. Given that a major US marketing who might consume this data, I cannot data broker hosts the publicly available portal imagine a scenario that would also be in my used for our survey, these findings can be con- or my family’s best interest. I would actually prefer less personal information about me sidered a credible representation of the entire to exist publicly. So, obscure, inaccurate, or US marketing data available from numerous unreliable data is what I consider to be the data brokers. The impacts of inaccurate or in- next best thing.” complete data are many, ranging from missed —Survey respondent opportunities to just plain misses. Table 1. Common reasons driving decisions to edit or not to edit data Why did you edit your data? Why didn’t you edit your data? To make data more accurate/better Privacy Corrected only where I perceived that it No perceived value/not worth the time was a a e w ee and energy Privacy/nervous that this data is even Not interested/don’t care what data out there they have on me ed ea d a e ed ads e s Cautious/unclear how the information will be used Lack of time to edit d e w fi e s Against targeted marketing Source: Deloitte analysis. Deloitte University Press | dupress.deloitte.com www.deloittereview.com Predictably inaccurate 15 Missed opportunity 1: Underestimating Missed opportunity 2: Decreased customer worth and not capitalizing on customer loyalty and revenue the power of habit “[The data] stated that I own a property that “I wish I spent only that much. 
My purchas- is actually owned by my parents, and at the ing data seems significantly understated same time, it failed to list the property that I from what I know I spend in the categories currently do own.” indicated.” —Survey respondent —Survey respondent Another area of significant inaccuracy was Understanding the spending behavior and home residence and vehicle ownership, which power of current and potential customers is was quite surprising given the readily available very important to firms. Many marketers ex- public records for each. As stated previously, trapolate this information based on three key home data was more accurate than auto data, categories: current income, modeled net worth, but still considerably inaccurate overall. Re- and prior purchasing behavior. Consumers are spondents suggested that the data in these two creatures of habit—our past spending behavior categories was often outdated—potentially by is one of the best indicators for marketers to five to ten years. determine not only how much we will spend in the future, but what types of items we are One of the highest-expenditure periods in an likely to purchase. This can guide predictions individual’s life is when she makes a household on how much revenue a company can expect move. Not only are these moves expensive— to see in the coming year, as well as any cross- households incur significant ancillary spending selling or up-selling efforts.4 Given this infor- as well, even with local moves. When moving mation’s importance to marketers, and the from one geography to another with a differ- incredible number of digital breadcrumbs that ent climate, the consumer often starts from consumers leave behind, we were surprised scratch in numerous product categories (new to find such a high level of inaccuracy. More wardrobe, home furnishings, outdoor equip- often than not, respondents indicated that the ment, and so on). 
A marketer wouldn’t want to household income data provided by the broker miss this transitional moment, in which con- was incorrect, with purchasing data often un- sumers spend more money than they typically derestimated, suggesting that marketers rely- would as well as form new behaviors—includ- ing on this information to guide their targeting ing purchasing routines and loyalties. With- efforts may be leaving potential revenue on the out a timely and relatively accurate picture of table. a consumer’s residence changes, the marketer could miss out on influencing momentary pur- www.deloittereview.com 16 Predictably inaccurate Figure 2. What do people think about their own big data profiles? A sampling of comments from our respondents Most surprising was that I assumed that making I like that the info is wrong. online purchases allowed for easier tracking. However, It might save me from certain types my purchase history was probably the least accurate. of mailings, scams, or other things. If my data is representative, I think the system has me this seems pretty useless. confused with someone else entirely since it thinks my birth year is 1947 (actually 1992) and it thinks I'm married (I’m single). All the information that was correct was most likely due to chance. Weird that I was listed as blue-collar as I have been a professional my The data was outdated, entire 30-plus-year career. as if it were a snapshot of a point in time 10 years ago. All it says is that I am interested in domestic travel. That’s it? It said I have a renewable car insurance policy; I don't own a car. It said I was single (I am married), I have no children (I have six), and I vote Democrat (I often vote Republican). Didn't get much correct other than info Fortunately they are WAAAYYYY that I gave them. under on our household income. Woefully It said I was interested in about 100+ things. incomplete. I do not own a home and rent an This did not seem right to me. 
apartment; the data says that I have been a homeowner for over 14 years. I changed information regarding political affiliation in an attempt to avoid politically focused communications. $451 I removed some incorrect information, spent total? but then got tired of editing, so just left it. I wish! Deloitte University Press | dupress.deloitte.com chases, subsequent add-on purchases, and, po- average 13 percent hit on revenue. Additionally, tentially, building long-run customer loyalty. 70 percent of financial institutions blame poor data quality for ongoing problems with their Corroborating our findings, a third-party data loyalty efforts.5 quality study found that 92 percent of financial institutions rely on faulty information to bet- Miss 1: Moving the customer ter understand their members, a rate likely at- relationship along too fast tributable to human errors and flaws in the way “I’m annoyed that nothing is private anymore. multiple data sources were combined. Fully 80 I rarely use advertisements for purchasing percent of credit unions believe the inaccura- decisions anyway, and I wish I could stop receiving them altogether.” cies have affected their bottom line, causing an —Survey respondent www.deloittereview.com Predictably inaccurate 17 It should go without saying that micro-target- ic group profile, may be viewed as invasive and ed messaging is full of pitfalls—regardless of a little too close for comfort. This latter situa- the accuracy of the data on which it is based. 
tion can lead to a 5 percent decrease in intent Take, for example, the father who learned to purchase.10 about his daughter’s pregnancy through retail- Miss 2: Delivering the wrong or er offerings that came in the mail after the re- inappropriate micro-targeted message tailer detected purchasing behavior correlated with pregnancy.6 While evidence suggests that “Some of the misses were really bad, like my political party and my interest in tobacco!” consumers are becoming more receptive to —Survey respondent personalized marketing, marketers still need Probably worse than getting too close is get- to be thoughtful and tread lightly in this area.7 ting it wrong. When a marketer tries to make This word of warning is consistent with re- a personal connection through messaging us- cent research identifying similarities between ing wrong or inappropriate information, the interpersonal relationship development and effects can range from humorous—such as a business and customer relationships,8 as well twentysomething receiving AARP member- as existing theories regarding healthy relation- ship invitations11—to sad. The latter was the ship development. Particularly, self-disclosure case with a recently mailed discount offer that, of personal information is meant to follow a while sent to a live person, included an (accu- reciprocal and progressive course, with initial rate) reference to not only a recently deceased mutual sharing of surface-level personal infor- family member but the way this person died— mation over time evolving to a more intimate embedded into the recipient’s mailing address. level of exchange.9 Too much, too soon from The firm that had given the offer, which didn’t either party can come across as invasive and believe it could have sent out this mailing un- creepy—and disrupt the relationship that has til receiving the physical proof, claimed this developed so far. 
This means that demonstrat- blunder was the result of a rented mailing list ing a ballpark knowledge of your customer from a third-party provider.12 While reported early on may be more beneficial than dem- cases such as this last example are rare, bas- onstrating an intimate or precise knowledge. ing a personalized message around wrong or Recent research has corroborated this idea, inappropriate information, and subsequently suggesting that semi-tailored or customized delivering the wrong micro-targeted message advertising can lead to a 5 percent increase in to customers, can not only diminish the effect intent to purchase. However, advertising that of marketing efforts, but do more damage than gets too specific, by seeming to zero in on one good. This adverse reaction is often referred individual as opposed to a general demograph- www.deloittereview.com 18 Predictably inaccurate to as a boomerang effect: causing a customer based flu-tracking model forecast an increase to move from a neutral, nonexistent, or posi- in influenza-related doctor visits that was tive attitude toward the company to a negative more than double what the Centers for Disease one.13 Control and Prevention (CDC) predicted.20 While the CDC based its predictions on various Miss 3: Assessing risk inaccurately laboratory surveillance reports collected from Both private and public health care institu- across the United States, the culprit behind tions often create and rely on big data models the social media tracking tool’s wildly different to understand their patients’ future needs and result was what some researchers have called potential life spans. Such risk models, however, “big data hubris”: the mistake of assuming that go beyond managing an insurer’s bottom line big data can substitute for, rather than supple- by helping identify high-risk clients. 
Inaccu- 14 ment, traditional methods of data collection rate data can prompt inaccurate assessments and analysis.21 such as determining financial risks, 15 life ex- HOW DID THE DATA GET SO BAD? U pectancies,16 and medical care needs, which can lead to inappropriate insurance payments at NFORTUNATELY, our primary re- best. At worst, if public health groups that use 17 search findings are not unique but, these risk models to guide strategic decisions rather, a glimpse into the general around global public health initiatives miss the state of affairs: Big data is often inaccurate,22 mark, it can contribute to deaths. These deaths and companies relying on inaccurate big data could be due to misidentification of vulnerable can suffer significant consequences. Since we or at-risk populations, which could be avoided reviewed only the fields available to us, it’s if the right treatments were made available to important to note that inaccuracies almost them. 18 certainly extend beyond the fields and attri- butes highlighted in this article, especially the Miss 4: Predicting inaccurate outcomes less common or more esoteric fields, such as While most us have learned to cut weather whether an individual is a veteran. forecasters some slack, we are fixated on the So how does this information wind up so far many “scientific” and “statistically significant” off the mark? There are many possible causes, crystal balls: models used to predict the out- such as human error, collection or modeling comes of our elections,19 football games, and errors, and even malicious behavior. To make horse races. Yet models meant to determine matters worse, a data set is often victim to precautions to be taken have often been off the more than one type of error. Some examples of mark. 
For example, in 2013, a search engine- how errors can arise: www.deloittereview.com Predictably inaccurate 19 Big data is often inaccurate, and companies relying on inaccurate big data can suffer significant consequences. Outdated or incomplete information may – Collecting data in suboptimal settings persist due to the cost and/or effort of ob- that can also lead to demand effects taining up-to-date information (for example, exit polls, public surveys, or any mechanism or environment in An organization that uses multiple data which respondents do not feel their re- sources may incorrectly interweave data sponses will be truly anonymous) sets and/or be unaware of causal relation- ships between data points and lack proper – Relying on self-reported data versus ob- data governance mechanisms to identify served (actual) behaviors24 these inconsistencies Data analysis errors may lead to inaccura- An organization may fall prey to data cies due to: collection errors: – Incorrect inferences about consumers’ – Using biased sample populations (sub- interests (for example, inferring that ject to sampling biases based on con- the purchase of a hang-gliding maga- venience, self-selection, and/or opt-out zine suggests a risky lifestyle when the options, for instance)23 purchaser’s true motive is an interest in photography)25 – Asking leading or evaluative questions that increase the likelihood of demand – Incorrect models (for instance, incorrect effects (for example, respondents assumptions, proxies, or presuming a providing what they believe to be the causal relationship where none exists) “desired” or socially acceptable answer versus their true opinion, feeling, belief, Malicious parties may corrupt data (for ex- or behavior) ample, cybercrime activity that alters data and documents)26 www.deloittereview.com 20 Predictably inaccurate Understanding the causes of these errors is a Ask and expect more from big data bro- first step to avoiding and rectifying them. The kers. 
Perhaps our expectations for big data are next section explores the next steps companies too high—but it’s possible that we are asking can take along the path to utilizing big data in too little of data brokers, especially given the the right way. study results we describe here. The role of data brokers has evolved over time. Traditionally, A BIG DATA PLAYBOOK: firms looked to data brokers to provide mail- PRESCRIPTIONS FOR SUCCESS T ing lists and labels for prospective customers HERE is growing recognition that much and, perhaps, to manage mailing lists and track big data is built on inaccurate infor- current customers’ purchasing behavior. How- mation, driving incorrect, suboptimal, ever, the information that brokers provide now or disadvantageous actions. Some initial plays a much more integral role in our strate- efforts are under way to put in place regula- gies, digital interactions, and analytic models. tions around big data governance and man- Consequently, we should be asking for more agement.27 Regulatory agencies, such as the accountability, transparency, and continuous Federal Trade Commission and the National dialogue with these organizations. (See the Association of Insurance Commissioners, are sidebar, “What to ask your data brokers.”) beginning to consider more oversight on data brokers as well as how models utilizing their Know the data sources. While you certainly data are used. However, savvy firms already want to understand where your own data come engaged in big data should not wait for agen- from, knowing the source and lineage is par- cies to act, especially given the uncertainty ticularly important for information you source around how effective or restrictive any even- through data brokers. However, our research tual regulations will be. Based on our market suggests data brokers fall on a spectrum when experience and observations, here are some it comes to revealing their sources. 
guidelines, advice, and remedies to consider to help you avoid shooting yourself in the foot when utilizing big data.

Not all brokers organically generate the data they sell; rather, many license information to each other, as different brokers cater to various data use cases and business niches.

Increase the likelihood that more of your big data will be accurate

Put steps in place to verify that the brokers from which you source have adequate control over their data’s accuracy, including control over and transparency regarding their data sources. Understand the surveillance procedures they have in place with these sources to track changes, measure accuracy, and ensure consistency. Develop and maintain processes to be notified of inaccuracies in the data, and understand how often information is validated or updated. Consider the significance of a five-year age difference: 20-year-olds are buying different products than those aged 25, just as those who are 25 are at a different stage in life than 30-year-olds.

“If they were more clever, they could cross-reference the home data with household income data to find major discrepancies.”
- Survey respondent

WHAT TO ASK YOUR DATA BROKERS

Demand transparency regarding:
• Data source(s): the lineage of the data fields and values, timing of maintenance, update processes
• Data collection, validation, and correction methods
• Any relationships and interdependencies, for instance, interrelatedness between data sources and model inputs
• Model inputs and assumptions

Ensure ongoing communications with data sources in order to be kept abreast of any:
• Inaccuracies found in existing data sets
• Changes to models and/or assumptions and the rationale for such changes, as well as transparency to model logic and metadata
• Changes to categories and the rationale for such changes

Verify the appropriateness of the manner in which you are using their data:
• Explain to the broker how you are using data, and verify that their information is appropriate and sufficiently accurate for your context
• Consider specifying accuracy and performance standards in your data broker contracts.

Explore the data yourself. Before you use any big data (especially externally sourced) to guide your decisions and marketing strategies, do an exploratory data analysis yourself. If possible, test a sample for inaccuracies or inconsistencies against data fields you already have or can validate. On your own, consider digging into the data and doing validity checks, exploratory analysis, and data mining against individual and industry information. Does what you are seeing make sense? For example, one of the authors of this very article was labeled as having an old-fashioned dial-up Internet connection rather than the actual broadband connection.

Alternatively, hire an expert to look at this data. Also, realize that internally gathered information often relies on a combination of sources (which could be external or outdated) and is also prone to human error, so the same verification tests should be performed here as well. A proper data governance framework can go a long way in helping to ensure your information is accurate, timely, and valuable.

Consider big data to be one more tool in the toolkit, not a replacement toolkit

Keep expectations for big data in check. It is often the case that big data might be directionally correct but still inaccurate at an individual level. The good news for firms and marketers is that big data analytics built on such “semi-accurate” information can provide predictive power overall. However, it is a mistake to expect individual micro-predictions to carry the same level of accuracy.28

Use and draw conclusions from big data judiciously. Big data is a great tool for marketers, but it should be thought of as a tool in the decision-making and marketing toolkit, not a replacement for the already existing toolkit. Consequently, don’t rely too heavily on a limited number of data points, especially if accuracy is a potential peril. If you decide to do any micro-messaging, consider limiting its geographies and scope to avoid some of the perils we discussed earlier. Additionally, soliciting customer feedback on the data not only improves the prospect of more accurate data but also increases transparency within the relationship. However, as our findings suggest, you can’t count on your customers to fill in the gaps adequately and accurately.

Complement big data with other decision-making tools. While big data is and will remain a powerful tool for firms and marketers when used appropriately, we’ve already explored the dangers of overreliance on it, which could also result in marketers losing faith in their own experience and intuition to help guide decisions.29 Therefore, executives should complement the decisions derived from big data with their own insights based on experience and other research methods and sources (such as small-sample qualitative research). Regardless of the data quality, a good rule of thumb is to not over-rely on the data and outsource too many decisions.30

Continually connect with customers

Be nimble and responsive. Continually assess data sources and appropriateness of methodologies, models, and assumptions; frequently revisit and assess questions and category fit with changing target demographics and categories. Also, measure how successful target marketing efforts have been since incorporating insights from big data. Beyond quantitative or objective measures, create feedback opportunities within your micro-targeting. After collecting feedback, spend time reviewing, incorporating, and adjusting your strategies based on this feedback. When appropriate, respond directly to those providing feedback: recent research suggests this may not only increase the likelihood of additional feedback, but also make the customer feel more valued and encourage an ongoing dialogue.31

Reward customers for correcting their data. While our study suggests that consumers are unlikely to correct information provided by a big data source, it’s worth exploring their willingness to take corrective action for their own data if the request comes from a firm with which they have a relationship, and for which they see more direct value from such an action. Additionally, in an effort to thank customers for not only their patronage but for updating personal information, firms can offer incentives for their corrective efforts. The benefits could be many: accurate customer data; an active, direct line of communication; and, ultimately, a deeper connection with customers.

Regardless of our current infatuation with big data, we must remember that data should never take center stage at the expense of the customer. Firms that understand big data’s limitations (and advantages) can add it to their marketing and analytical arsenal, aiming to foster and preserve customer relationships and the trust that they work so hard to develop and maintain.

John Lucker is an advisory principal with Deloitte & Touche LLP, Global Advanced Analytics & Modeling market leader, and a US leader of Deloitte Analytics.

Susan K. Hogan is a market insights manager with Deloitte Services LP.

Trevor Bischoff is a senior consultant within the financial services sector at Deloitte & Touche LLP.

The authors would like to thank Ashley Daily, Adam Hirsch, and Michael Greene, who served as inspiration and fueled our enthusiasm for this current research. We also thank Negina Rood, Junko Kaji, Aditi Rao, and Kevin Weier for their contributions.
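The earlier advice to test a sample of broker data against data fields you already have or can validate can be made concrete with a short script. What follows is a minimal sketch, not a method described by the authors: the record layout, the field names (customer_id, age_band, internet), and the 0.6 accuracy floor are all hypothetical, with the floor standing in for whatever accuracy standard a broker contract might specify. Records with no trusted counterpart, or with a missing value on either side, are left out of the base, in the same spirit as the article's practice of calculating percentages only where third-party data was actually available.

```python
from typing import Any, Dict, Iterable, List, Optional


def field_match_rates(
    broker_rows: Iterable[Dict[str, Any]],
    trusted_rows: Iterable[Dict[str, Any]],
    key: str,
    fields: List[str],
) -> Dict[str, Optional[float]]:
    """Return, per field, the share of comparable records where the
    broker value equals the trusted value (None if nothing comparable)."""
    trusted_by_key = {row[key]: row for row in trusted_rows}
    matches = {f: 0 for f in fields}
    compared = {f: 0 for f in fields}

    for row in broker_rows:
        trusted = trusted_by_key.get(row.get(key))
        if trusted is None:
            continue  # no trusted record to check this broker record against
        for f in fields:
            broker_value, trusted_value = row.get(f), trusted.get(f)
            if broker_value is None or trusted_value is None:
                continue  # value unavailable on one side: exclude from the base
            compared[f] += 1
            if broker_value == trusted_value:
                matches[f] += 1

    return {
        f: (matches[f] / compared[f] if compared[f] else None) for f in fields
    }


def fields_below_threshold(
    rates: Dict[str, Optional[float]], minimum: float
) -> List[str]:
    """List fields whose measured match rate falls below the agreed floor."""
    return sorted(f for f, r in rates.items() if r is not None and r < minimum)


# Hypothetical sample: broker data versus records the firm has validated itself.
broker_rows = [
    {"customer_id": 1, "age_band": "25-34", "internet": "dial-up"},
    {"customer_id": 2, "age_band": "35-44", "internet": "broadband"},
    {"customer_id": 3, "age_band": "18-24", "internet": None},
    {"customer_id": 4, "age_band": "45-54", "internet": "broadband"},
]
trusted_rows = [
    {"customer_id": 1, "age_band": "25-34", "internet": "broadband"},
    {"customer_id": 2, "age_band": "25-34", "internet": "broadband"},
    {"customer_id": 3, "age_band": "18-24", "internet": "broadband"},
]

rates = field_match_rates(
    broker_rows, trusted_rows, "customer_id", ["age_band", "internet"]
)
flagged = fields_below_threshold(rates, minimum=0.6)  # 0.6 is an arbitrary floor
print(rates, flagged)
```

A per-field rate is deliberately kept separate from any overall score: a data set can look acceptable in aggregate while one field, such as the Internet connection type in the authors' own example, is wrong often enough to make individual-level targeting unsafe.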
Endnotes

1. NPR, “‘Signal’ and ‘noise’: prediction as art and science,” October 10, 2012, https://n.pr/UPXRS4.
2. John Lucker, Ashley Daily, Adam Hirsch, and Michael Greene, “Predictably inaccurate: Big data brokers,” LinkedIn Pulse, November 18, 2014, www.linkedin.com/pulse/20141118145642-24928192-predictably-inaccurate-big-data-brokers.
3. All percentages relating to respondents were calculated on a base of the number of respondents for whom third-party data was actually available in the categories of interest; the calculations excluded respondents for whom the third-party data was unavailable.
4. Lucker et al., “Predictably inaccurate.”
5. Thomas Schutz, “Want better analysis? Consider the data,” Credit Union Journal, May 16, 2014, www.cujournal.com/opinion/want-better-analysis-consider-the-data.
6. Morgan Hochheiser, “The truth behind data collection and analysis,” John Marshall Journal of Information Technology and Privacy Law 32, no. 1 (2015): pp. 32–55, http://repository.jmls.edu/jitpl/vol32/iss1/3.
7. Victoria Petrock, “Are consumers warming to personalized marketing services?,” eMarketer brief, July 26, 2016, www.emarketer.com/Brief/Consumers-Warming-Personalized-Marketing-Services/5500941.
8. Susan K. Hogan, Rod Sides, and Stacy Kemp, “Today’s relationship dance: What can digital dating teach us about long-term customer loyalty?,” Deloitte Review 20, January 23, 2017, https://dupress.deloitte.com/dup-us-en/deloitte-review/issue-20/behavioral-insights-building-long-term-customer-loyalty.html.
9. Irwin Altman and Dalmas A. Taylor, Social Penetration: The Development of Interpersonal Relationships (New York: Holt, Rinehart and Winston, 1973).
10. Ithaca College, “Online creep: Targeted ads may have opposite effect of marketers’ intent,” Science Daily, April 8, 2015, www.sciencedaily.com/releases/2015/04/150408171201.htm.
11. Joshua Lederman, Twitter post, September 29, 2016, 1:48 p.m., https://twitter.com/joshledermanap/status/781596504351907840.
12. Natasha Singer, “Oops! Health insurer exposes member data,” New York Times, November 10, 2014, https://nyti.ms/2qycdOU.
13. Sharon S. Brehm and Jack Williams Brehm, Psychological Reactance: A Theory of Freedom and Control (New York: Academic Press, 1981).
14. Leslie Scism, “Life insurers draw on data, not blood,” Wall Street Journal, January 12, 2017, www.wsj.com/articles/the-latest-gamble-in-life-insurance-sell-it-online-1484217026.
15. Ankur Aggarwal et al., “Model risk – daring to open up the black box,” British Actuarial Journal 21(2), December 22, 2015, http://journals.cambridge.org/abstract_S1357321715000276.
16. Scism, “Life insurers draw on data, not blood.”
17. Rachel S. Karas, “Stakeholders urge CMS to factor Rx drugs in risk assessment pay, question other CMS ideas,” InsideHealthPolicy’s Daily Brief, April 28, 2016.
18. Jane Freemantle et al., “Indigenous mortality (revealed): The invisible illuminated,” American Journal of Public Health 105(4), April 2015, www.ncbi.nlm.nih.gov/pmc/articles/PMC4358192/.
19. Jim Rutenberg, “A ‘Dewey defeats Truman’ lesson for the digital age,” New York Times, November 9, 2016, https://nyti.ms/2jL43lb.
20. Vasileios Lampos, Andrew C. Miller, Steve Crossan, and Christian Stefansen, “Advances in nowcasting influenza-like illness rates using search query logs,” Scientific Reports 5 (2015), https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4522652/.
21. Larry Greenemeier, “Why big data isn’t necessarily better data,” Scientific American, March 13, 2014.
22. Daniel A. McFarland and H. Richard McFarland, “Big data and the danger of being precisely inaccurate,” Big Data & Society, July–December 2015, pp. 1–4, http://journals.sagepub.com/doi/full/10.1177/2053951715602495.
23. StopDataMining.me, “Opt out list,” www.stopdatamining.me/opt-out-list/, accessed May 2, 2017.
24. McFarland and McFarland, “Big data and the danger of being precisely inaccurate.”
25. Scism, “Life insurers draw on data, not blood.”
26. Mark Ward, “How fake data could lead to failed crops and other woes,” BBC, March 21, 2017, www.bbc.com/news/business-38254362.
27. Carten Cordell, “Transparency advocates pitch the next big thing in big data,” Federal Times, March 23, 2017, www.federaltimes.com/articles/with-data-act-reports-looming-transparency-advocates-pitch-the-next-big-thing.
28. Lucker et al., “Predictably inaccurate.”
29. Ulrich Hoffrage, “Overconfidence,” in Rüdiger F. Pohl, editor, Cognitive Illusions: A Handbook on Fallacies and Biases in Thinking, Judgement and Memory (New York: Psychology Press, 2004); Tore Håkonsson and Tim Carroll, “Is there a dark side of big data – point, counterpoint,” Journal of Organization Design 5(5), July 12, 2016, http://link.springer.com/article/10.1186/s41469-016-0007-5.
30. Nigel Harvey, “Confidence in judgment,” Trends in Cognitive Sciences 1(2), May 1997, pp. 78–82, www.cell.com/trends/cognitive-sciences/fulltext/S1364-6613(97)01014-0.
31. Susan K. Hogan and Timothy Murphy, Loving the one you’re with, Deloitte University Press, June 17, 2016, https://dupress.deloitte.com/dup-us-en/focus/behavioral-economics/how-behavioral-factors-influence-customer-rewards-incentives.

Deloitte refers to one or more of Deloitte Touche Tohmatsu Limited, a UK private company limited by guarantee, and its network of member firms, each of which is a legally separate and independent entity. Please see http://www.deloitte.com/about for a detailed description of the legal structure of Deloitte Touche Tohmatsu Limited and its member firms. Please see http://www.deloitte.com/us/about for a detailed description of the legal structure of the US member firms of Deloitte Touche Tohmatsu Limited and their respective subsidiaries.
Certain services may not be available to attest clients under the rules and regulations of public accounting. Copyright © 2017 Deloitte Development LLC. All rights reserved.