Adaptive Data Preparation™

Over the last ten years, technology has accelerated our ability to gather and share information in ways we never expected. Instead of bringing data to our desktops, the Web has enabled us to go to the data, wherever it is. Mobile devices make it possible for everyone to access information, wherever they are. And Cloud-based application delivery means we can work however we want to. These new capabilities have spawned a generation of individuals of all ages who want to exploit the data around us to work smarter, move faster and live better.

In response to that movement, innovations have taken place in most parts of the analytic supply chain. Big Data technologies like Hadoop and MongoDB have completely disrupted the data management world, making it possible to collect and store more raw data than we ever imagined, at a fraction of the cost we once lived with, and far faster. At the same time, a relentless focus on the end user has driven companies like Tableau and QlikView to deliver new ways for everyone in the business to digest and explore information however they want to. With all of this at our fingertips, why do so many organizations still rely on intuition or gut-driven decisions?

Why "Adaptive" Data Preparation™? 

The notion of "adaptive" addresses two aspects of data preparation.

The first is that, with Paxata, business analysts can now adjust more rapidly to the iterative business requests that come in on a daily basis. Typically, as data is being understood and analyzed, users need more data to complete their questioning, so they go back to their analysts to repeat the data prep cycle. At that point, analysts spend weeks or months in spreadsheets and home-grown data marts trying to combine their clean data with additional data from raw or outside sources, hoping that this next answer set will be what the business needs. Until now, that back-and-forth was the most painful part of every analytic exercise.

The second describes the most powerful aspect of the Paxata solution: the machine learning that leverages proven technologies from consumer search and social media, namely intelligent indexing, textual pattern recognition, and statistical graph analysis. By applying proprietary, patent-pending algorithms to the linguistic content of both structured and unstructured data, the Paxata solution automatically builds a comprehensive and flexible data model in the form of a graph, reflecting similarities and associations amongst data items. The system uses associations between the data to detect and resolve both syntactic and semantic data quality issues, rapidly improving the quality of large data sets. As more data sources are added, the expanded associations amongst the data are leveraged to further improve the quality of the data.
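
To make the graph idea concrete, here is a minimal sketch of how a similarity graph can resolve a syntactic data quality issue, in this case spelling variants of the same name. It is not Paxata's proprietary algorithm, which is not described here. It assumes a simple stand-in: Python's standard difflib string matcher plays the role of textual pattern recognition, values are linked when their similarity passes a threshold, and each connected group is replaced by its most frequent member. The function names, the 0.85 threshold, and the sample company names are all hypothetical.

```python
from collections import Counter
from difflib import SequenceMatcher
from itertools import combinations


def normalize(value):
    """Lowercase and collapse whitespace so trivial differences are ignored."""
    return " ".join(value.lower().split())


def similarity(a, b):
    """Return a 0..1 similarity score between two normalized strings."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def cluster_values(values, threshold=0.85):
    """Map each distinct value to a canonical representative.

    Distinct values are graph nodes. An edge joins two nodes when their
    similarity meets the (assumed) threshold. Each connected component is
    treated as one real-world item and represented by its most frequent member.
    """
    counts = Counter(values)
    distinct = list(counts)
    parent = {v: v for v in distinct}

    def find(v):
        # Union-find lookup with path compression.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    # Add an edge (merge two components) for every sufficiently similar pair.
    for a, b in combinations(distinct, 2):
        if similarity(a, b) >= threshold:
            parent[find(a)] = find(b)

    # Collect the members of each connected component.
    clusters = {}
    for v in distinct:
        clusters.setdefault(find(v), []).append(v)

    # Choose the most frequent spelling in each component as canonical.
    canonical = {}
    for members in clusters.values():
        best = max(members, key=lambda m: counts[m])
        for m in members:
            canonical[m] = best
    return canonical


# Hypothetical sample column containing inconsistent spellings.
raw = ["Acme Corp", "Acme Corp", "acme corp", "Acme Corp.",
       "Globex Inc", "Globex Inc", "Globex  Inc"]
mapping = cluster_values(raw)
cleaned = [mapping[v] for v in raw]
print(cleaned)
```

Because the groups are connected components, adding more values can only add edges, so new data can link variants that were previously unrelated. That mirrors, in a very simplified way, the claim that additional sources expand the associations available for cleanup. The pairwise comparison is quadratic in the number of distinct values, so a sketch like this only suits small columns.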