Rapid Data Profiling Delivering Self-Service Data Trust
By Piet Loubser
According to a recent study, “The State of Data Quality in the Enterprise, 2018,” companies face two major obstacles in their quest to trust their data: the significant variety of data sources and a complex mix of data types. As a result, data preparation is difficult and the analytical initiatives that depend on it are slowed down. Register for our webcast, “Rapid Data Profiling to Streamline Customer Data Onboarding,” where Mike White and Krupa Natarajan from Paxata will walk you through the elements of Rapid Data Profiling and share some customer examples.
The Case For Self Service Data Preparation
Industry observers and business leaders commonly proclaim that data is the fuel of the digital economy. The challenge, of course, is that everyone has data, so what will make your business stand out in the data-fueled race? For most, faster and smarter use of that data could provide a major leg up. But while the world’s repositories of data are expected to reach 45,000 exabytes by 2020, and BI tools have become more pervasive than ever before, the path between that data and the BI tool still runs through a proverbial straw.
IT cannot be the only place where data is prepared for analytics. Enter the domain of self-service data preparation. Data analysts and power users are empowered to find, understand, and shape data from a variety of sources, and then publish the resulting dataset into their BI tool of choice. The resulting productivity and accuracy substantially improve analytical outcomes. Of course, the quality of those outcomes depends on the quality of the data itself.
Researchers traditionally use Excel or Perl scripts to bring these different data sources together, flatten XML structures, parse delimited files, and compare the profile of a given patient to a cohort of other patients, which is a manual and time-consuming process. Additionally, most researchers lack the data science skills required by traditional data prep and ETL solutions.
Interactive Rapid Data Profiling
In our diagram above we outline the various stages in the data prep lifecycle. The analyst's first question is: what data is available? Maybe it is customer data and purchase transactions. Once you have located the available data, you need to understand what is in the data and what it looks like. Rapid data profiling allows you to visually and quickly understand the shape of the data. It helps you spot possible data quality issues, such as duplicates and misspellings of states, countries, products, and companies, and then guides you to fix those problems interactively.
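To make the idea of profiling concrete, here is a minimal sketch in plain Python. It is not Paxata's implementation or API; the function names (`profile_column`, `find_duplicates`, `suggest_fixes`), the sample rows, and the 0.8 similarity cutoff are all assumptions chosen for illustration. It summarizes a column, finds exact duplicate rows, and suggests corrections for misspelled values by comparing them against a reference list.

```python
from collections import Counter
from difflib import get_close_matches


def profile_column(values):
    """Summarize one column: row count, blanks, distinct values, most common value."""
    non_blank = [v.strip() for v in values if v and v.strip()]
    counts = Counter(non_blank)
    return {
        "rows": len(values),
        "blank": len(values) - len(non_blank),
        "distinct": len(counts),
        "top": counts.most_common(1)[0] if counts else None,
    }


def find_duplicates(rows):
    """Return each row that appears more than once (exact match on all fields)."""
    counts = Counter(tuple(sorted(r.items())) for r in rows)
    return [dict(key) for key, n in counts.items() if n > 1]


def suggest_fixes(values, reference, cutoff=0.8):
    """Map values missing from the reference list to their closest reference spelling."""
    fixes = {}
    for v in set(values):
        if v and v not in reference:
            match = get_close_matches(v, reference, n=1, cutoff=cutoff)
            if match:
                fixes[v] = match[0]
    return fixes


# Hypothetical sample data: one duplicate row, one misspelled state, one blank.
rows = [
    {"customer": "Acme", "state": "California"},
    {"customer": "Acme", "state": "California"},
    {"customer": "Globex", "state": "Califronia"},
    {"customer": "Initech", "state": ""},
]
states = [r["state"] for r in rows]

summary = profile_column(states)
duplicates = find_duplicates(rows)
fixes = suggest_fixes(states, ["California", "Texas", "New York"])
print(summary)
print(duplicates)
print(fixes)
```

A real profiling tool would run checks like these across every column at once and present the results visually; the sketch only shows the kind of questions a profile answers.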
Paxata Rapid Data Profiling In Action
During the webcast on Wednesday, April 25, at 11:00 am PDT | 2:00 pm EDT, we will introduce rapid data profiling, share examples of customer success, and demonstrate a modern, point-and-click approach for streamlining your data profiling and remediation efforts:
See scenarios and customer use cases where rapid data profiling changed the game
Generate a profile report with a single click and obtain summary insights, including remediation guidance
Leverage built-in views and algorithmic intelligence to detect data quality issues and remediate them across the entire dataset in real time
Generate data quality business rules using Excel-like syntax to discover and fix exceptions
Publish and automate a clean and conformed output for a data source or analytics solution
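The rule-based exception discovery mentioned in the list above can be pictured with a short Python sketch. This is not Paxata's Excel-like rule syntax, which is not shown here; the rule names, the `find_exceptions` helper, and the sample rows are hypothetical. Each rule is a named check, and any row that fails a check is reported as an exception.

```python
import re

# Hypothetical data quality rules: each maps a readable name to a check on one row.
RULES = {
    "email looks valid": lambda r: re.fullmatch(
        r"[^@\s]+@[^@\s]+\.[^@\s]+", r["email"]
    ) is not None,
    "amount is non-negative": lambda r: r["amount"] >= 0,
    "state is two letters": lambda r: len(r["state"]) == 2 and r["state"].isalpha(),
}


def find_exceptions(rows, rules):
    """Return a (row index, rule name) pair for every rule a row fails."""
    return [
        (i, name)
        for i, row in enumerate(rows)
        for name, check in rules.items()
        if not check(row)
    ]


# Hypothetical sample data with three deliberate violations.
rows = [
    {"email": "ann@example.com", "amount": 120.0, "state": "CA"},
    {"email": "bob-at-example.com", "amount": 35.5, "state": "TX"},
    {"email": "cy@example.com", "amount": -10.0, "state": "Calif"},
]

exceptions = find_exceptions(rows, RULES)
for index, rule in exceptions:
    print(f"row {index} fails rule: {rule}")
```

Keeping rules as named, independent checks makes it easy to report which rule a record violated, which is the information an analyst needs before fixing it.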
As we live through the data arms race, it is clear the world is never going to be homogeneous. Data is in every location: on-premises and in multiple clouds. And the variety will keep on growing. The key to transformative insights is the agility with which organizations move from ideas, to pilots, to operationalizing their insights. Read more about Paxata’s journey towards our Adaptive Information Platform.