Getting in front on data quality presents a terrific opportunity to improve business performance. Better data means fewer mistakes, lower costs, better decisions, and better products. Further, I predict that many companies that don’t give data quality its due will struggle to survive in the business environment of the future.
Bad data is the norm. Every day, businesses send packages to customers, managers decide which candidate to hire, and executives make long-term plans based on data provided by others. When that data is incomplete, poorly defined, or wrong, there are immediate consequences: angry customers, wasted time, and added difficulties in the execution of strategy. You know the sound bites — “decisions are no better than the data on which they’re based” and “garbage in, garbage out.” But do you know the price tag to your organization?
Based on recent research by Experian plc, as well as by consultants James Price of Experience Matters and Martin Spratt of Clear Strategic IT Partners Pty. Ltd., we estimate the cost of bad data to be 15% to 25% of revenue for most companies (more on this research later). These costs come as people accommodate bad data by correcting errors, seeking confirmation in other sources, and dealing with the inevitable mistakes that follow.
Fewer errors mean lower costs, and the key to fewer errors lies in finding and eliminating their root causes. Fortunately, this is not too difficult in most cases. All told, we estimate that two-thirds of these costs can be identified and eliminated — permanently.
In the past, I could understand a company’s lack of attention to data quality because the business case seemed complex, disjointed, and incomplete. But recent work fills important gaps.
The case builds on four interrelated components: the current state of data quality, the immediate consequences of bad data, the associated costs, and the benefits of getting in front on data quality. Let’s consider each in turn.
Four Reasons to Pay Attention to Data Quality Now
The Current Level of Data Quality Is Extremely Low
A new study that I recently completed with Tadhg Nagle and Dave Sammon (both of Cork University Business School) looked at data quality levels in actual practice and shows just how terrible the situation is.
We had 75 executives identify the last 100 units of work their departments had done — essentially 100 data records — and then review that work’s quality. Only 3% of the collections fell within the “acceptable” range of error. Nearly 50% of newly created data records had critical errors.
Said differently, the vast majority of data is simply unacceptable, and much of it is atrocious. Unless you have hard evidence to the contrary, you must assume that your data is in similar shape.
Bad Data Has Immediate Consequences
Virtually everyone, at every level, agrees that high-quality data is critical to their work. Many people go to great lengths to check data, seeking confirmation from secondary sources and making corrections. These efforts constitute what I call “hidden data factories” and reflect a reactive approach to data quality. Accommodating bad data this way wastes time, is expensive, and doesn’t work well. Even worse, the underlying problems that created the bad data never go away.
One consequence is that knowledge workers waste up to 50% of their time dealing with mundane data quality issues. For data scientists, this number may go as high as 80%.
A second consequence is mistakes, errors in operations, bad decisions, bad analytics, and bad algorithms. Indeed, “big garbage in, big garbage out” is the new “garbage in, garbage out.”
Finally, bad data erodes trust. In fact, only 16% of managers fully trust the data they use to make important decisions.
Frankly, given the quality levels noted above, it is a wonder that anyone trusts any data.
When Totaled, the Business Costs Are Enormous
Obviously, the errors, wasted time, and lack of trust that are bred by bad data come at high costs.
Companies throw away 20% of their revenue dealing with data quality issues. This figure synthesizes estimates provided by Experian (worldwide, bad data cost companies 23% of revenue), Price of Experience Matters ($20,000/employee cost to bad data), and Spratt of Clear Strategic IT Partners (16% to 32% wasted effort dealing with data). The total cost to the U.S. economy: an estimated $3.1 trillion per year, according to IBM.
The costs to businesses of angry customers and bad decisions resulting from bad data are immeasurable — but enormous.
Finally, it is much more difficult to become data-driven when a company can’t depend on its data. In the data space, everything begins and ends with quality. You can’t expect to make much of a business selling or licensing bad data. You should not trust analytics if you don’t trust the data. And you can’t expect people to use data they don’t trust when making decisions.
Two-Thirds of These Costs Can Be Eliminated by Getting in Front on Data Quality
“Getting in front on data quality” stands in contrast to the reactive approach most companies take today. It involves attacking data quality proactively by searching out and eliminating the root causes of errors. To be clear, this is about management, not technology — data quality is a business problem, not an IT problem.
Companies that have invested in fixing the sources of poor data — including AT&T, Royal Dutch Shell, Chevron, and Morningstar — have found great success. They lead us to conclude that the root causes of 80% or more of errors can be eliminated; that up to two-thirds of the measurable costs can be permanently eliminated; and that trust improves as the data does.
Which Companies Should Be Addressing Data Quality?
While attacking data quality is important for all, it carries a special urgency for four kinds of companies and government agencies:
- Those that must keep an eye on costs. Examples include retailers, especially those competing with Amazon.com Inc.; oil and gas companies, which have seen prices cut in half in the past four years; government agencies, tasked with doing more with less; and companies in health care, which simply must do a better job containing costs. Paring costs by purging the waste and hidden data factories created by bad data makes far more sense than indiscriminant layoffs — and strengthens a company in the process.
- Those seeking to put their data to work. Companies include those that sell or license data, those seeking to monetize data, those deploying analytics more broadly, those experimenting with artificial intelligence, and those that want to digitize operations. Organizations can, of course, pursue such objectives using data loaded with errors, and many companies do. But the chances of success increase as the data improves.
- Those unsure where primary responsibility for data should reside. Most businesspeople readily admit that data quality is a problem, but claim it is the province of IT. IT people also readily admit that data quality is an issue, but they claim it is the province of the business — and a sort of uneasy stasis results. It is time to put an end to this folly. Senior management must assign primary responsibility for data to the business.
- Those who are simply sick and tired of making decisions using data they don’t trust. Better data means better decisions with less stress. Better data also frees up time to focus on the really important and complex decisions.
In my experience, many executives find reasons to discount or even dismiss the bad news about bad data. Common refrains include, “The numbers seems too big, they can’t be right,” and “I’ve been in this business 20 years, and trust me, our data is as good as it can be,” and “It’s my job to make the best possible call even in the face of bad data.”
But I encourage each executive to think deeply about the implications of these statistics for his or her own company, department, or agency, and then develop a business case for tackling the problem. Senior executives must explore the implications of data quality given their own unique markets, capabilities, and challenges.
The first step is to connect the organization or department’s most important business objectives to data. Which decisions and activities and goals depend on what kinds of data?
The second step is to establish a data quality baseline. I find that many executives make this step overly complex. A simple process is to select one of the activities identified in the first step — such as setting up a customer account or delivering a product — and then do a quick quality review of the last 100 times the organization did that activity. I call this the Friday Afternoon Measurement because it can be done with a small team in an hour or two.
The third step is to estimate the consequences and their costs for bad data. Again, keep the focus narrow — managers who need to keep an eye on costs should concentrate on hidden data factories; those focusing on AI can concentrate on wasted time and the increased risk of failure; and so forth.
Finally, for the fourth step, estimate the benefits — cost savings, lower risk, better decisions — that your organization will reap if you can eliminate 80% of the most common errors. These form your targets going forward.
Chances are that after your organization sees the improvements generated by only the first few projects, it will find far more opportunity in data quality than it had thought possible. And if you move quickly, while bad data is still the norm, you may also find an unexpected opportunity to put some distance between yourself and your competitors.
About the author
Thomas C. Redman is President of Data Quality Solutions. He helps companies and people, including start-ups, multinationals, executives, and leaders at all levels chart their courses to data-driven futures. He places special emphasis on quality, analytics, and organizational capabilities.
© Massachusetts Institute of Technology
This article was originally placed on MITSloan Management Review. You can read the original article here