Data quality has never been high on the priority list for many organizations. But why not? Data quality has become increasingly important and is expected to become an essential part of the strategy of organizations. According to IDC’s latest report “Data Age 2025”, the global datasphere in 2025 is expected to be ten times the 2016 volume. That is a lot of data.
Unfortunately, this explosion of data also means more bad data: as the total volume grows, the amount of bad data grows along with it. But what is bad data (or poor data quality), how do you recognize it, and what are its consequences for an organization?
Bad data can originate in all kinds of places and departments, both inside and outside your organization. In short, bad data is “data that cannot help your organization achieve its goals”. The reasons goals cannot be achieved because of data vary and depend on an organization’s goals and information needs. For example, if your organization’s goal is to increase the number of customers for its magazines, it needs contact details and postal addresses, and that data needs to be of good quality. That being said, there are some common types of bad data that can easily be recognized and are found in many departments and organizations:
- Inaccurate data: data that is wrong, incomplete, or contains a typo or misspelling
- Duplicate data: data and information that appears multiple times in the same organization’s database
- Outdated data: data that has not been updated for several years; it may contain outdated information and is often unused and inactive
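These three types of bad data can often be detected with simple checks. The sketch below illustrates one way to do that; the record fields, the email pattern, and the three-year cutoff are illustrative assumptions, not rules from the article:

```python
import re
from datetime import date

# Hypothetical customer records; field names are illustrative.
records = [
    {"email": "jane.doe@example.com", "city": "Amsterdam", "updated": date(2024, 3, 1)},
    {"email": "jane.doe@example.com", "city": "Amsterdam", "updated": date(2024, 3, 1)},  # duplicate
    {"email": "not-an-email", "city": "", "updated": date(2016, 5, 9)},  # inaccurate and outdated
]

EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

def is_inaccurate(rec):
    # Wrong or incomplete values: a malformed email or an empty required field.
    return not EMAIL_RE.match(rec["email"]) or not rec["city"]

def find_duplicates(recs):
    # Exact-match duplicates keyed on email; real tools also use fuzzy matching.
    seen, dupes = set(), []
    for i, rec in enumerate(recs):
        key = rec["email"].lower()
        if key in seen:
            dupes.append(i)
        seen.add(key)
    return dupes

def is_outdated(rec, max_age_years=3):
    # Records untouched for several years are candidates for review.
    return (date.today() - rec["updated"]).days > max_age_years * 365

print(find_duplicates(records))               # [1]
print([is_inaccurate(r) for r in records])    # [False, False, True]
```

Real cleansing tools go much further (fuzzy matching, address verification against reference data), but the same three questions drive them: is the value well-formed, is it unique, and is it fresh?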
Customer satisfaction, regulations and trends
It is important to consider not just data and information as they are used today, but also the developments that are predicted. Developments such as Virtual Reality and Artificial Intelligence are built on data: their results and predictions are based on the information that was entered, meaning that “if you put bad stuff in, bad stuff comes out”. To get the most out of these new technologies, the data they run on must be of good quality.
As for customer satisfaction: it drops sharply when data quality is poor. Think of a denied or delayed transaction, the wrong product being delivered, or a product sent to the wrong address. Customers these days are used to fast and correct delivery, and satisfaction decreases rapidly when deliveries are delayed or wrong information is given.
Finally, bad data has a large impact on compliance with rules and regulations. Governmental and industry regulations such as GDPR, HIPAA and PCI DSS require specific types of management, storage and protection for data. If you’re struggling with bad data, it can be very difficult, nearly impossible, to meet these requirements.
To make the importance of data and data quality easier to understand, the 1-10-100 rule can be used. This rule was first proposed by George Labovitz and Yu Sang Chang in the early ’90s. The 1-10-100 rule has three phases that explain the costs of maintaining data quality. The first phase is called the ‘prevention’ phase ($1). Prevention is the simplest and least expensive way to ensure accurate and valid data collection, and keeping bad data out of your database is in itself an easy step. There are many tools that help verify email addresses, postal addresses and phone numbers, or prevent the creation of incomplete or duplicate records. With these tools installed, it is safe to say that new data (or information) entering your database is not bad data.
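Prevention boils down to a gatekeeper in front of the database: a record is only inserted if it is complete, well-formed and not a duplicate. A minimal sketch, assuming an in-memory store and simple regex checks (real validation tools verify against reference data instead):

```python
import re

EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")
PHONE_RE = re.compile(r"^\+?\d[\d\s-]{6,14}\d$")

database = []  # stands in for the real data store

def try_insert(record):
    """Gatekeeper: only complete, well-formed, non-duplicate records get in."""
    required = ("name", "email", "phone")
    if any(not record.get(f) for f in required):
        return False  # incomplete record
    if not EMAIL_RE.match(record["email"]) or not PHONE_RE.match(record["phone"]):
        return False  # malformed email or phone number
    if any(r["email"].lower() == record["email"].lower() for r in database):
        return False  # would create a duplicate
    database.append(record)
    return True

print(try_insert({"name": "Jane", "email": "jane@example.com", "phone": "+31201234567"}))  # True
print(try_insert({"name": "Jane", "email": "JANE@example.com", "phone": "+31201234567"}))  # False (duplicate)
print(try_insert({"name": "Bob", "email": "not-an-email", "phone": "+31201234567"}))       # False (malformed)
```

The point of the $1 phase is that every record rejected here never has to be hunted down and corrected later.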
The second phase is the ‘correction’ phase, in which the $1 has risen to $10. This $10 represents the increased cost that an incorrect address will have on the business. Without a fixed and well-executed data plan (read more about creating a data plan here), and without preventing bad data from entering the database, bad data will increase costs up to tenfold. This covers not only the information itself, but also the time and effort needed to track down bad data and validate it.
In the last phase, ‘failure’, the cost has increased to $100: another tenfold increase compared to phase two. This $100 represents the amount companies will pay for doing nothing about poor data. Remember that bad data that has already entered the database and is never validated or cleaned leads to more bad information; think of reports and business decisions based on that data. Customers also feel the effects of poor data when inaccurate or incomplete information leads to failed deliveries or a bad experience with customer support.
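The rule is easy to apply as back-of-the-envelope arithmetic: pick a volume of bad records and multiply by each phase’s nominal unit cost. The 5,000-record volume below is a made-up example; the unit costs are the rule’s nominal figures:

```python
# Nominal unit costs from the 1-10-100 rule; the record volume is illustrative.
COST_PER_RECORD = {"prevention": 1, "correction": 10, "failure": 100}

def quality_cost(bad_records, phase):
    return bad_records * COST_PER_RECORD[phase]

for phase in COST_PER_RECORD:
    print(f"{phase}: ${quality_cost(5000, phase):,}")
# prevention: $5,000
# correction: $50,000
# failure: $500,000
```

The same 5,000 bad records cost a hundred times more when ignored than when kept out at the door.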
Data Quality and Data Happiness
The examples above clearly show why data quality is much more important than organizations tend to think. The impact of bad data is far higher than expected and can lead to serious consequences. Luckily, bad data can be improved, and the positive results are felt immediately. Good data quality improves the quality and accuracy of reporting, improves efficiency, and reduces the sales and marketing costs that are now spent on inaccurate data and wrong information. It also improves customer satisfaction and helps you adapt to new rules and regulations much quicker, because your data is accurate, complete and unique.
At Plauti, we believe good data quality can help your organization achieve its goals. We also believe that everyone in an organization should be responsible for good data quality, something we call Data Happiness. This does not mean that everyone has access to edit or remove records or can perform entire merges (unless you want them to); it simply means that everyone knows how important data is, what rules are in place and how to work with data. In short, everyone who works with data feels responsible for it.
To explain how Data Happiness can be achieved, we have written an ebook that covers all the steps you need to take and how to implement Data Happiness in your organization. The six steps are the following:
Step 1: Data Purpose
Determine what you are going to use your data for. Who needs the data, which departments use it, and what for? In addition, determine what exactly the problem with the data is right now. Formulate one sentence that describes the issue with the data, then formulate a sentence that tells you what you need the data to do in order to solve the issue (e.g. “I need my data to be able to send emails to the right people”).
Step 2: Data analysis
Analyze your data and find out if the data currently in your database can help you to achieve your goals.
An important part of the data analysis is the data flow. Where does your data come from, what route does it take and where does it end? The data flow provides insight into where the data issue is located.
Step 3: Data plan/data rules
Create a data plan with a set of rules that need to be followed in order to get an efficient database that can help you achieve your goals. Once the rules are set, inform the organization of the changes so everyone can easily follow the rules and understand them.
Step 4: Prevention and validation
Implement the rules in your organization and, if necessary, inform the organization again. The better all departments work together and follow the rules in the data plan, the easier it is to keep the database clean and useful.
Step 5: Clean
With the rules in place, start cleaning the current data. Remove duplicates, validate the existing information and check if data is still accurate and up-to-date.
Step 6: Monitor
Once all the steps are completed, make sure you test if the new rules have the desired effect and if they have improved the data quality.
Don’t forget to repeat these steps multiple times a year; just once or twice is often not sufficient.
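The cleaning and monitoring steps (5 and 6) can be sketched as a small pass over the database. The record fields and the completeness metric below are illustrative assumptions, not prescriptions from the ebook:

```python
def clean(records):
    """Step 5: drop exact duplicates and records missing required fields."""
    seen, cleaned = set(), []
    for rec in records:
        key = (rec.get("email", "").lower(), rec.get("name", "").lower())
        if key in seen or not all(key):
            continue  # skip duplicates and incomplete records
        seen.add(key)
        cleaned.append(rec)
    return cleaned

def quality_score(records):
    """Step 6: a simple metric to monitor over time — share of complete records."""
    if not records:
        return 0.0
    complete = sum(1 for r in records if r.get("email") and r.get("name"))
    return complete / len(records)

raw = [
    {"name": "Ann", "email": "ann@example.com"},
    {"name": "Ann", "email": "ann@example.com"},  # duplicate
    {"name": "", "email": "x@example.com"},        # incomplete
]
cleaned = clean(raw)
print(len(cleaned), quality_score(cleaned))  # 1 1.0
```

Tracking a score like this after every cleaning run is one concrete way to check whether the rules from your data plan are having the desired effect.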
Duplicate Check and Record Validation
To improve data quality, it is important to find and merge duplicate records and to prevent new duplicates from entering. It is also wise to format records and to validate the information that you have. Doing all of that by yourself takes a very long time and can be nearly impossible, but there is a simpler way.
Duplicate Check is an application for Salesforce that finds, prevents and merges duplicate records. With Duplicate Check installed, it is a lot easier to prevent duplicates from coming in. There are standard rules that can be set to prevent duplicates from being created, even when records come in from automated systems or third parties. The application can also find duplicate records that already exist. You can create your own search scenario, and based on that scenario duplicate records are presented to you. You can choose to merge the duplicates automatically or manually. Our pricing page shows all features and solutions that help improve the quality of your data.
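To give a feel for how scenario-based duplicate matching works in general (this is a generic sketch, not Duplicate Check’s actual implementation), records can be compared field by field and flagged when their average similarity exceeds a threshold. The fields and threshold below are assumptions:

```python
from difflib import SequenceMatcher

# A hypothetical "search scenario": which fields to compare, and the
# similarity threshold above which two records count as duplicates.
scenario = {"fields": ("name", "city"), "threshold": 0.85}

def similarity(a, b):
    """Character-level similarity between two strings, between 0.0 and 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def is_duplicate(rec_a, rec_b, scenario):
    # Average the per-field similarity scores and compare to the threshold.
    scores = [similarity(rec_a[f], rec_b[f]) for f in scenario["fields"]]
    return sum(scores) / len(scores) >= scenario["threshold"]

a = {"name": "Jon Smith", "city": "New York"}
b = {"name": "John Smith", "city": "New York"}
print(is_duplicate(a, b, scenario))  # True
```

Fuzzy matching like this is what catches the “Jon Smith” / “John Smith” pairs that an exact-match rule would miss.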
Record Validation for Salesforce allows you to validate and format postal addresses, phone numbers and email addresses. Again, it is possible to create your own settings for validating new and existing records. Because the application can validate all existing records in batch, it is fast and easy to see which information needs to be validated, and validation can be done manually or automatically. Use the overview on the pricing page to learn more about the possibilities and available features.
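As a simplified illustration of what formatting a record involves (Record Validation’s own logic is more sophisticated and verifies against reference data), a phone number can be normalized into a single canonical form. The Dutch country code default below is an illustrative assumption:

```python
import re

def format_phone(raw, country_code="31"):
    """Normalize a phone number into an E.164-style form: +<country><digits>.
    The default country code (31, Netherlands) is an illustrative assumption."""
    digits = re.sub(r"\D", "", raw)   # strip spaces, dashes, parentheses
    if digits.startswith("00"):       # international prefix written as 00
        digits = digits[2:]
    elif digits.startswith("0"):      # national format: replace trunk prefix
        digits = country_code + digits[1:]
    return "+" + digits

print(format_phone("020-123 4567"))      # +31201234567
print(format_phone("0031 20 123 4567"))  # +31201234567
```

Storing every number in one canonical format is also what makes duplicate detection reliable: “020-123 4567” and “+31 20 123 4567” only match once both are normalized.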