The Rise of Enterprise Data Prep
By Farnaz Erfan
Self-Service Data Prep – heard of it? Paxata pioneered it in 2013. Since then, it has grown into a category that many vendors, newcomers and old-timers alike, have latched onto: a clear recognition of the market that we created five years ago.
What is evident today is the rise of “enterprise data prep” as an emerging market segment. The Self-Service Data Prep market is coming of age and is splitting into two categories: personal data prep tools and enterprise data prep solutions.
While personal data prep tools are designed for individual analysts looking to clean and shape their data for analytics – often using a desktop computer – an enterprise data prep solution has a much broader scope and far greater applicability.
Enterprise data prep solutions are designed to appeal to individuals across a wide range of skillsets – from non-technical business personnel to savvy technology experts. They cover many use cases including, but not limited to, analytics. They go beyond ad hoc projects and become a critical part of an organization’s information fabric and daily business operations.
But how do you recognize an enterprise data prep solution? Use the following five key criteria to distinguish an enterprise data prep solution from a personal data prep tool:
- Adaptive to many use cases and destinations: A personal data prep tool typically has one destination: a business intelligence (BI) or analytics application that it either publishes data to or is embedded in. An enterprise data prep solution, on the other hand, serves many different use cases and destinations – including analytics and reporting, migrating business applications, creating marketplaces, matching and consolidating product or customer data, exploring data lakes, and much more.
- Intelligent algorithms to correlate mixed-structure datasets: Personal data prep tools, especially new entrants, are sufficient for relational data and flat files. Enterprise data prep solutions, however, can perform intelligent joins between multi- and mixed-structure datasets – such as JSON and XML files where the data is deeply nested and schemas vary from one source to the next – tasks that personal data prep tools are ill-suited to tackle.
- Automation: Personal data prep tools lack scheduling and enterprise publishing capabilities. This is not a problem when the use case is a one-off scenario, but enterprise use cases are rarely one-off. Without a way to automatically keep the information fresh and share it broadly, your data prep is not live throughout the organization and produces nothing but static, stale data extracts.
- Governance: A personal data prep tool is installed on somebody’s machine; while this makes it a productivity tool, it comes at the cost of creating yet another information silo. The only way to ensure consistency, accuracy, governance and collaboration around the same set of data is to utilize a governed, cloud-based, enterprise data prep solution.
- Lives in a poly-cloud world: A personal data prep tool installs in one location, but data lives in a multi-cloud world – in Microsoft Azure, AWS, cloud applications, on-premises data warehouses, Hadoop clusters, or SFTP sites. An enterprise data prep solution is fully capable of living and operating in this hybrid, poly-cloud world, while a personal data prep tool can do no more than sit in a corner.
When considering whether a personal data prep tool or an enterprise data prep solution is best suited for your organization, bear in mind your use cases, your information consumers, the complexity of your data and its distribution, and the requirements for your first project and subsequent ones.
To help you with your decision-making, download the Data Preparation Buyer’s Guide, created by the Eckerson Group, one of the top data management and business intelligence research and consulting firms in the market.