What’s ahead for your data quality group, DQ tools, and the role of machine learning in DQ?
Originally posted on TDWI
By Farnaz Erfan
Recent Gartner research indicates that the average financial impact of poor data quality (DQ) on organizations is $9.7 million per year. This is likely to worsen as information environments become increasingly complex. As enterprise data management systems continue to mature in response to IT-driven data quality requirements, DQ roles are beginning to spread into the lines of business (LOBs). This makes sense, given that the data is generated and consumed primarily by business personnel who are most familiar with the context of data and the ideal state of its quality.
Additionally, the complexity of new data sets and the frequency of data-driven decisions are creating a constant flow of requests for IT assistance and markedly longer backlogs. As a result, business decision makers are demanding their own stake in data quality projects and searching for tools and techniques that will let them improve the quality of data firsthand while aligning it with their business goals and timelines.
After examining new research and speaking directly with customers, we believe the following predictions will hold true for the data quality market in 2018:
Prediction #1: Data quality groups will form in business teams
A large majority of global enterprises will assign resources to data quality projects in 2018. Although data quality practices occurring in central IT teams will remain focused on cross-domain or deep-domain analysis and quality improvements, business units will tie their own data quality metrics into use cases and business scenarios. For example, chief marketing officers will be interested in improving the quality of their marketing databases and controlling the costs of direct mail campaigns by removing duplicate names and addresses.
Prediction #2: Data profiling will be the catalyst for data quality exercises within the LOB
With data quality becoming increasingly necessary across a variety of business units, executives and LOB managers will search for areas in the practice where they can add the most value. Given their proficiency in their respective business domains, analysts and subject matter experts are inherently best-suited to decide what information the data should represent, what values are false positives, and what patterns are leading indicators of certain business outcomes. As a result, data profiling — the practice of understanding the content, relationships, patterns, anomalies, and redundancies in data — will be the first step in a data quality journey for the LOB.
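As a rough illustration of what a first profiling pass can look like, the sketch below summarizes a single column by completeness, distinctness, redundancy, and value patterns. The function names (`shape`, `profile_column`) and the sample postal codes are hypothetical and not taken from any particular data quality tool.

```python
import re
from collections import Counter

def shape(value):
    """Reduce a value to a pattern: letters become A, digits become 9."""
    return re.sub(r"[0-9]", "9", re.sub(r"[A-Za-z]", "A", value))

def profile_column(values):
    """Summarize one column: completeness, distinctness, and value patterns."""
    present = [v for v in values if v not in (None, "")]
    patterns = Counter(shape(v) for v in present)
    return {
        "rows": len(values),
        "missing": len(values) - len(present),
        "distinct": len(set(present)),
        "duplicates": len(present) - len(set(present)),
        "patterns": patterns.most_common(),  # rare patterns hint at anomalies
    }

# Hypothetical sample data: one blank, one null, one repeat, one typo ("S" for "5").
postal_codes = ["94105", "94105", "10001", "", "9410S", None]
report = profile_column(postal_codes)
print(report)
```

A subject matter expert reading this report would likely spot the lone `9999A` pattern as a mistyped postal code, which is the kind of judgment the business side is well placed to make.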
Prediction #3: A new breed of modern data quality tools will emerge
To accelerate time to value, many business leaders will look to deploy data quality solutions quickly and bring their teams up to speed rapidly, driving the need for cloud deployments. Given that business teams typically lack deep technical expertise, a new breed of modern data quality tools, designed with business users’ skills and know-how in mind, will gain popularity. Their user experiences will include interactivity with data (where a business user can see and profile all data values in Excel-like interfaces) and the use of visualization and point-and-click actions to complete tasks quickly.
Prediction #4: New data sets and rapid prototypes will drive demand for separating signals from the noise
IoT, data lakes, data streams, and external data sets can be the origins of new technology. These rich, often complex data sets carry a high potential for new products or services. For example, by using sensors inside vehicles to collect data such as tire pressure, temperature, speed, and location, Michelin is helping consumers improve their driving techniques and fuel consumption. These types of services require rapid profiling of sensor and device data to discover patterns, model behavior data, and pilot and test ideas to devise new use cases and products.
Prediction #5: Machine learning algorithms will augment data quality tools
Machine learning algorithms will continue to enrich data quality tools. Gartner treats this use as part of augmented analytics and defines it as “augmented data preparation, which uses machine learning to augment data profiling and data quality, harmonization, modeling, manipulation, enrichment, metadata development, and cataloging.” In 2018, we will continue to see machine learning techniques used to identify relationships between data sets and suggest joins, perform appends and unions, automate data profiling to detect and fix anomalies, and normalize data by using clustering techniques.
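To make the idea of normalizing data through clustering more concrete, here is a minimal sketch that groups spelling variants of the same value and rewrites each one to the most common spelling in its group. It uses key-collision ("fingerprint") clustering, a simple deterministic heuristic standing in for the learned models a commercial tool might use. The function names and sample company names are hypothetical.

```python
import re
from collections import Counter, defaultdict

def fingerprint(value):
    """Build a clustering key: lowercase, strip punctuation, sort unique tokens."""
    tokens = re.sub(r"[^\w\s]", "", value.lower()).split()
    return " ".join(sorted(set(tokens)))

def cluster_values(values):
    """Group values whose fingerprints collide."""
    clusters = defaultdict(list)
    for v in values:
        clusters[fingerprint(v)].append(v)
    return list(clusters.values())

def normalize(values):
    """Map every value to the most frequent spelling in its cluster."""
    mapping = {}
    for members in cluster_values(values):
        canonical = Counter(members).most_common(1)[0][0]
        for m in members:
            mapping[m] = canonical
    return [mapping[v] for v in values]

# Hypothetical sample data: four variants of one company, plus one other.
companies = ["Acme Corp.", "acme corp", "Acme Corp.", "Corp Acme", "Globex Inc"]
cleaned = normalize(companies)
print(cleaned)
```

In practice a tool would more likely suggest such merges for a business user to confirm than apply them silently, since two values that collide are not always the same entity.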
Data Quality Directions
Market dynamics have started to change: data quality exercises are taking a more active role in the line of business. As new data lakes are created to manage more complex data from business products and services (such as ad tech, mobile consumer behavior, and IoT device usage), organizations are more inclined to discover data quality problems in their early stages and inside the line of business. For many of these use cases, it is essential that business subject matter experts, who know their data more intimately, play an active role in providing a baseline for improving its quality in a way that impacts business outcomes.
Thanks to new technologies (such as machine learning and cloud) and better user-experience principles, a set of new, modern, business-user-centric data quality solutions has emerged, enabling analysts and business domain experts to work with data hands-on in collaboration with IT.