We’ve talked a lot about data quality in the past – including the cost of bad data. But despite a basic understanding of data quality, many people still don’t quite grasp what exactly is meant by “quality”.
For example, is there a way to measure that quality, and if so, how do you do it? In this article, we’ll be looking to answer those questions and much more. But first…
Dispelling Data Quality Myths
The foundation for ensuring data quality starts when basic requirements are created
One of the biggest myths about data quality is that data has to be completely error-free. With websites and campaigns collecting so much data, getting zero errors is next to impossible. Instead, the data only needs to conform to the standards that have been set for it. To determine what “quality” means, we first need to know three things:
- Who creates the requirements
- How the requirements are created
- How much latitude we have in meeting those requirements
Many businesses have a single “data steward” who understands and sets these requirements and also decides how much error is tolerable. If there is no data steward, IT often fills the role, making sure that whoever is in charge of the data understands any shortcomings that may affect it.
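Tolerance levels can be made concrete with a small check. This minimal Python sketch assumes a steward expresses tolerance as the maximum share of records allowed to fail a given rule. The `Requirement` class and `within_tolerance` function are hypothetical names for illustration and are not part of any particular tool.

```python
from dataclasses import dataclass


@dataclass
class Requirement:
    """A hypothetical data requirement set by a data steward."""
    field: str
    max_error_rate: float  # e.g. 0.02 means up to 2% of records may fail


def within_tolerance(req: Requirement, total: int, failures: int) -> bool:
    """Return True if the observed failure rate is within the steward's tolerance."""
    if total == 0:
        return True  # nothing was checked, so nothing is out of tolerance
    return failures / total <= req.max_error_rate


email_req = Requirement(field="email", max_error_rate=0.02)
print(within_tolerance(email_req, total=1000, failures=15))  # True  (1.5% failed)
print(within_tolerance(email_req, total=1000, failures=40))  # False (4% failed)
```

In practice, the rule that decides what counts as a "failure" is also the steward's call. This sketch only covers the final comparison.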
You Can Have It Good, Fast or Cheap – Pick Two
Every step, from collecting the data to shaping it to fit the company’s needs, opens it up to potential errors. Data that’s 100% complete and 100% accurate is not only prohibitively expensive but also time-consuming to produce, and it barely nudges the ROI needle.
With so much data coming in, decisions have to be made, and made quickly. That’s why data quality is a delicate balancing act: you’re constantly weighing accuracy against completeness. If that sounds like a tall order, you’ll be glad to know that there is a method to the madness, and the first step is data profiling.
What is Data Profiling?
Data profiling involves examining all the information in your database to determine whether it is accurate and/or complete, and deciding what to do with entries that are not. It’s fairly straightforward, for instance, to import a database of the products your company manufactures and verify that every detail is exact. It’s a different story when you’re importing details about competitors’ products or other related details.
Profiling also checks how the data has been interpreted, not just what it contains. If you launched on 7/1/16, does the system record that as 1916 or 2016? You may also uncover duplicates and other issues as you comb through the information you’ve obtained. Profiling the data in this way gives us a starting point, a springboard for making sure the information we’re using is of the best possible quality.
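A first profiling pass can be sketched in a few lines of Python. This minimal example makes three assumptions:

- Records are plain dictionaries.
- "Missing" means an empty value.
- A duplicate is any row whose profiled fields repeat an earlier row.

The sample `records` and the `profile` function are hypothetical. The last two lines use the standard library’s `datetime.strptime` to show the century question from 7/1/16 in action.

```python
from collections import Counter
from datetime import datetime

# Hypothetical sample rows; "launch" holds dates as imported text.
records = [
    {"id": 1, "name": "Acme Widget", "launch": "7/1/16"},
    {"id": 2, "name": "Acme Gadget", "launch": ""},
    {"id": 3, "name": "Acme Widget", "launch": "7/1/16"},
]


def profile(rows, fields):
    """Count empty values per field and rows that repeat an earlier row's fields."""
    missing = {f: sum(1 for r in rows if not r.get(f)) for f in fields}
    signatures = Counter(tuple(r.get(f) for f in fields) for r in rows)
    duplicates = sum(n - 1 for n in signatures.values() if n > 1)
    return {"missing": missing, "duplicates": duplicates}


print(profile(records, ["name", "launch"]))
# {'missing': {'name': 0, 'launch': 1}, 'duplicates': 1}

# Two-digit years are ambiguous; Python's %y applies a fixed century pivot.
print(datetime.strptime("7/1/16", "%m/%d/%y").year)  # 2016
print(datetime.strptime("7/1/69", "%m/%d/%y").year)  # 1969
```

Python’s `%y` maps 00 to 68 onto 2000 to 2068 and 69 to 99 onto 1969 to 1999. Other systems may use different pivots, which is why profiling should check how dates were actually stored. Reading `7/1/16` as July 1 also assumes a month/day order.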
Determining Data Quality
Now that we have a starting point for determining whether our information is complete and accurate, the next question is: what do we do when we find errors or other issues? Typically, you can do one of four things:
- Accept the Error – If the entry falls within an acceptable standard (e.g., Main Street instead of Main St), you can accept it and move on to the next entry.
- Reject the Error – Sometimes, particularly with data imports, the information is so badly damaged or incorrect that it is better to delete the entry altogether than to try to correct it.
- Correct the Error – Misspellings of customer names are a common error that can easily be corrected. If there are variations on a name, you can set one as the “Master” and keep the data consolidated and correct across all the databases.
- Create a Default Value – If you don’t know the value, it can be better to fill in a placeholder (such as “unknown” or “n/a”) than to leave the field blank.
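The four options can be expressed as simple rules applied to each record. This sketch is illustrative only, and several parts of it are assumptions that a data steward would define in practice:

- the field names;
- the hypothetical `MASTER_NAMES` lookup of known misspellings;
- the choice of which problems count as fatal.

```python
from typing import Optional

# Hypothetical lookup of known variants -> the "Master" spelling.
MASTER_NAMES = {"Jhon Smith": "John Smith", "J. Smith": "John Smith"}
DEFAULT = "unknown"  # placeholder for values we don't know


def clean_record(rec: dict) -> Optional[dict]:
    """Apply reject / correct / accept / default rules; return None if rejected."""
    # Reject: with no id or no name, the entry is too damaged to repair.
    if rec.get("id") is None or not rec.get("name"):
        return None
    cleaned = dict(rec)
    # Correct: replace a known variant with the master spelling.
    cleaned["name"] = MASTER_NAMES.get(rec["name"], rec["name"])
    # Accept: "Main St" vs "Main Street" is within the standard, so it is left as-is.
    # Default: fill blank optional fields with a placeholder.
    for field in ("street", "phone"):
        if not cleaned.get(field):
            cleaned[field] = DEFAULT
    return cleaned


rows = [
    {"id": 1, "name": "Jhon Smith", "street": "12 Main St", "phone": ""},
    {"id": 2, "name": "", "street": "5 Oak Ave", "phone": "555-0100"},
]
cleaned = [c for c in (clean_record(r) for r in rows) if c is not None]
print(cleaned)
# [{'id': 1, 'name': 'John Smith', 'street': '12 Main St', 'phone': 'unknown'}]
```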
Integrating the Data
When the same data lives in several different databases, errors and duplicates are likely to creep in. The first step toward successful integration is finding where the data lives and then combining it in a consistent way. This is where it can be well worth investing in proven data quality and accuracy tools to help coordinate and sync information across databases.
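As a rough illustration of combining data consistently, this sketch merges customer records from two hypothetical sources (`crm` and `shop`). It matches records on a normalized email address. The first source to supply a value wins, and later sources only fill in blanks. This is a basic exact-key merge under those assumptions; dedicated data quality tools typically add fuzzy matching and more nuanced rules for which value survives.

```python
def normalize_email(email: str) -> str:
    """Hypothetical matching key: trim whitespace and lowercase."""
    return email.strip().lower()


def merge_sources(*sources):
    """Combine customer lists into one record per email.

    Earlier sources win; later sources only fill fields that are still empty.
    """
    merged = {}
    for source in sources:
        for rec in source:
            key = normalize_email(rec["email"])
            target = merged.setdefault(key, {"email": key})
            for field, value in rec.items():
                if field != "email" and value and not target.get(field):
                    target[field] = value
    return list(merged.values())


crm = [{"email": "Ann@Example.com", "name": "Ann Lee", "phone": ""}]
shop = [
    {"email": "ann@example.com ", "name": "Ann Lee", "phone": "555-0101"},
    {"email": "bo@example.com", "name": "Bo Diaz", "phone": ""},
]
print(merge_sources(crm, shop))
# [{'email': 'ann@example.com', 'name': 'Ann Lee', 'phone': '555-0101'},
#  {'email': 'bo@example.com', 'name': 'Bo Diaz'}]
```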
Your Data Quality Checklist
Finally, because you’re dealing with so much data across so many different areas, it’s helpful to have a checklist for confirming that you’re working with the highest-quality data possible. DAMA UK has created an excellent guide on “data dimensions” that gives a fuller picture of how data quality is assessed.
Their data quality dimensions include:
Completeness – The percentage of data that actually has a value filled in. Complete critical data (such as customer names, phone numbers and email addresses) first, since gaps in non-critical data have much less impact.
Uniqueness – When measured against other data sets, each thing is recorded only once, with no duplicate entries.
Timeliness – How current is the data relative to when it’s needed? This matters for previous sales, product launches or any information that has to stay accurate over a period of time.
Validity – Does the data conform to the respective standards set for it?
Accuracy – How well does the data reflect the real-world person or thing that is identified by it?
Consistency – How well does the data match an expected pattern? Birth dates are a common consistency issue: in the U.S., the standard format is MM/DD/YYYY, whereas in Europe and many other regions, DD/MM/YYYY is standard.
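Several of these dimensions can be scored as simple ratios. This minimal Python sketch rests on three assumptions:

- the hypothetical `customers` sample;
- a deliberately loose email pattern for validity;
- MM/DD/YYYY as the agreed date format for consistency.

It covers completeness, uniqueness, validity and consistency only. Timeliness and accuracy need outside reference points, such as when the data is needed and what is true in the real world, which a script cannot infer on its own.

```python
import re
from datetime import datetime

# Hypothetical sample data for illustration.
customers = [
    {"name": "Ann Lee", "email": "ann@example.com", "birth_date": "03/04/1990"},
    {"name": "Bo Diaz", "email": "bo.example.com", "birth_date": "1985-12-01"},
    {"name": "Ann Lee", "email": "ann@example.com", "birth_date": "03/04/1990"},
    {"name": "Cy Park", "email": "cy@example.com", "birth_date": "12/31/1979"},
    {"name": "Di Wu", "email": "", "birth_date": "07/01/1988"},
]

# Deliberately loose, illustrative pattern: something@something.tld
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def ratio(hits, total):
    """Fraction of hits; an empty set counts as fully compliant."""
    return hits / total if total else 1.0


def completeness(rows, field):
    """Share of rows where the field is filled in."""
    return ratio(sum(1 for r in rows if r.get(field)), len(rows))


def uniqueness(rows, fields):
    """Distinct records (by the given fields) divided by total records."""
    keys = {tuple(r.get(f) for f in fields) for r in rows}
    return ratio(len(keys), len(rows))


def validity(rows, field, pattern):
    """Share of filled-in values that match the expected format."""
    values = [r[field] for r in rows if r.get(field)]
    return ratio(sum(1 for v in values if pattern.match(v)), len(values))


def parses_as(value, fmt):
    """True if the string parses under the given strptime format."""
    try:
        datetime.strptime(value, fmt)
        return True
    except ValueError:
        return False


def consistency(rows, field, fmt):
    """Share of rows whose value follows the one agreed date format."""
    return ratio(sum(1 for r in rows if parses_as(r.get(field) or "", fmt)), len(rows))


print(completeness(customers, "email"))                 # 0.8
print(uniqueness(customers, ["name", "email"]))         # 0.8
print(validity(customers, "email", EMAIL_RE))           # 0.75
print(consistency(customers, "birth_date", "%m/%d/%Y")) # 0.8
```

A format check like this cannot catch a swapped day and month. For example, `03/04/1990` parses as MM/DD/YYYY whether it was meant as March 4 or April 3, so confirming accuracy still requires checking against the source.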
The Big Picture on Data Quality
As you can see, there’s no “one size fits all” approach to keeping every type of data accurate and complete for every business. And with big data’s appetite for information growing every day, it’s more important than ever to tackle data quality issues head-on. Although it can seem overwhelming, it’s worth enlisting data hygiene tools and letting computers do what they do best: crunch numbers.
The most important step you can take is simply getting started. Your data will keep growing as more prospects come on board and new markets are discovered, so there’s never going to be a “best time” to tackle data quality issues. Taking the time now to map out what data quality means to your company or organization can create a ripple effect of improved customer service, a better customer experience, a higher conversion rate and longer customer retention. Those are the kinds of returns on investment that any business will wholeheartedly embrace!
About the Author: Sherice Jacob helps business owners improve website design and increase conversion rates through compelling copywriting, user-friendly design and smart analytics analysis. Learn more at iElectrify.com and download your free web copy tune-up and conversion checklist today!