Data Quality Improvement: Start at the Source PDF
Document Details
2020
Thomas C. Redman
Summary
This article argues that data quality improves when teams find and eliminate the root causes of errors, so that data is created correctly the first time, rather than cleaning up errors after they occur. It illustrates the approach with a health clinic that discovered no one was responsible for obtaining patient phone numbers and fixed the problem by having front-desk staff verify them at check-in.
Full Transcript
To Improve Data Quality, Start at the Source
by Thomas C. Redman
February 10, 2020
Summary. You can’t do anything important in your company without high-quality data. But most organizations focus their data-quality efforts on cleaning up errors, rather than finding and fixing the root cause of the errors in the first place. To become a more data-driven...
You can’t do anything important in your company without high-quality data, and most people suspect, deep down, that their data is not up to snuff. They do their best to clean up their data, install software to find errors automatically, and seek confirmation from external sources, efforts I call “the hidden data factory.” It is time-consuming, expensive work, and most of the time, it doesn’t go well.
Even worse, cleanup never goes away! Imagine that you had cleaned all your existing data perfectly, but not addressed the problem of poor quality at the source. As you acquire new data, you will also acquire new errors that impact your work. You and your team will once again waste time dealing with errors.
Cleanup as the primary means of data quality is long past its sell-by date. Rather than fixing data quality by finding and correcting errors, managers and teams must adopt a new mentality, one that focuses on creating data correctly the first time to ensure quality throughout the process.
This new approach, and the changes needed to make it happen, must be step one for any leader who is serious about cultivating a data-driven mindset across the company, implementing data science, monetizing its data, or even simply striving to become more efficient. It requires seeing yourself and the role you play in data in a new way, all the while identifying and ruthlessly attacking the root causes of errors, making them disappear once and for all. Eliminating most root causes is surprisingly easy.
For instance, at one health clinic, staff often had difficulties contacting patients post-visit when they needed to schedule more tests, change medications, and so forth. No one knew how frequently this occurred or exactly how much time was wasted, but it could impact patients’ health and it was frustrating for the staff. So employees in the clinic looked at the data associated with the last 100 patient visits and found that the phone number was wrong for 46 of them. They reviewed their procedures and found that no one was responsible for obtaining that data. They made a simple change: When patients checked in, the front-desk person asked them to verify their phone numbers. It was the first thing they requested upon arrival: “It is nice to see you again, Ms. Jones. Can I confirm your cell phone number?” The clinic re-measured a couple of weeks later, and errors in cell phone numbers were virtually eliminated.
The process the health clinic used appears universal: sort out the data you need; measure the quality of needed data; identify areas where quality could be improved and identify root cause(s); and eliminate those causes. It is remarkably flexible, easy to teach, and simple to use.
Digging deeper into the example, you’ll see that the health clinic also features two important roles in data quality: the data customer and the data creator. The customer is the person using the data. The creator, on the other hand, is the person who creates, or first inputs, the needed data. (Note that machines, devices, and algorithms also use and create data, so the customer or creator may also be the person responsible for such machines, devices, and algorithms.)
It is essential that people recognize themselves as customers, clarify their needs, and communicate those needs to creators. People must also recognize themselves as creators, and make improvements to their processes, so they provide data in accordance with their customers’ needs.
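The measurement step in the clinic example (sample the most recent records, then count how many are wrong) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the article does not say how the clinic decided a number was wrong, so the `Visit` record, its field names, and the comparison against a patient-confirmed number are all hypothetical, and the sample data is made up to reproduce the 46-in-100 count.

```python
from dataclasses import dataclass


@dataclass
class Visit:
    """One patient visit. All field names are hypothetical."""
    patient_id: str
    phone_on_file: str
    phone_verified: str  # assumed: the number the patient confirmed


def normalize(phone: str) -> str:
    """Keep digits only so formatting differences do not count as errors."""
    return "".join(ch for ch in phone if ch.isdigit())


def error_rate(visits: list[Visit]) -> float:
    """Fraction of visits whose number on file differs from the verified one."""
    if not visits:
        raise ValueError("need at least one visit to measure")
    wrong = sum(
        normalize(v.phone_on_file) != normalize(v.phone_verified)
        for v in visits
    )
    return wrong / len(visits)


# Made-up sample: 46 wrong numbers out of 100, matching the count in the article.
sample = [
    Visit(f"p{i}", "555-0100" if i < 46 else "555-0199", "555-0199")
    for i in range(100)
]
print(f"{error_rate(sample):.0%} of sampled phone numbers are wrong")
```

Running the same measurement again after a process change, as the clinic did a couple of weeks later, shows whether the root cause was actually eliminated.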
In the health clinic, post-visit staff did not recognize themselves as customers and desk personnel did not recognize themselves as creators. Once they did, completing the improvement project was straightforward.
I find that quality improves quickly when teams and companies adopt this approach, take on these roles, and follow the steps. People in companies large and small, in industries as diverse as financial services, oil and gas, retail, and telecom, have used them to make order-of-magnitude improvements in billing, customer, people, production, and other types of data and, as a direct result, improved their team’s performance. In some cases, the savings come to hundreds of millions per year.
So why aren’t these practices the norm? It turns out that a variety of organizational and cultural issues get in the way. Like those in the clinic, many people are only vaguely aware they have a problem; they think that data is the province of IT or are afraid to make the needed connections across organizational silos. Indeed, people have gotten into bad habits when it comes to data quality, and bad habits are hard to break.
To see how these bad habits take root and grow, consider Laura, a saleswoman who receives contact data from the marketing department. She is well aware the data isn’t very good: she spends a couple of hours a day making corrections. Laura’s performance is based on the number of sales calls she makes successfully, and her quota is high! On any given day, it is easier to deal with the errors than to take the time to reach out to marketing, even though a small investment in time will free her up down the road.
It is easy to see Laura’s actions as justified. After all, she needs to meet her quota, even in the face of bad data. But in taking it upon herself to fix the data and not communicate her needs to the marketing department, she is assuming responsibility for the quality of data created by others. And every day she further embeds a bad habit into her routine.
What’s more, if anyone else were to use the same marketing data, they wouldn’t have access to Laura’s corrections, and the cycle of errors and corrections would continue elsewhere.
There is no shortage of Lauras in every job and department, and at every level. Without giving it much thought, too many people take the wrong approach and bake bad data quality practices into their work!
While these issues are both subtle and powerful, any manager can take them on and adopt the mindset that “data quality means creating data correctly the first time” within their sphere of influence. Start by asking yourself if you’ve grown too tolerant of bad data and taken on the extra work it engenders. Then step into the customer role next time you experience any sort of problem. Don’t just complain, “this isn’t what I want!” Rather, think deeply about what you really need and open a dialog with data creators. Work together to make one improvement, then another, and another. After a short time, this will become second nature.
At the company level, senior leaders must insist that everyone take on these roles. Toward that end, I recommend that a small but mighty team of data quality professionals form up and administer an overall program, train people on how to do the work, help customers and creators connect, and assist when difficulties arise.
Shifting paradigms is difficult. Fortunately, creating data correctly the first time pays great dividends. It saves time and money and, as in the case of the health clinic, possibly lives! It builds confidence in the data and leads to better decisions. All of us are data customers and data creators. Taking on these roles helps people build the right mindset around data quality and stop data problems before they begin.
Source: https://hbr.org/2020/02/to-improve-data-quality-start-at-the-source