PAXATA ACHIEVES HIGHEST PERFORMANCE TO DATE OF DATA PREPARATION
AT SCALE BASED ON APACHE SPARK
From Excel Hell to Massive Data Migration Projects to a Value-Add for Analytic Services –
Paxata Handles All Data Prep Needs
San Jose, CA – Mar. 4, 2015 – Paxata, provider of the first purpose-built Adaptive Data Preparation™ solution, today announced the results of its Data Preparation Performance benchmark study. The evaluation tested the performance of Paxata’s recent Spring ’15 release, which is based on Apache Spark. The results demonstrate the power of Adaptive Data Preparation at scale and, together with continued customer and partner success, highlight the company’s continued focus on performance, efficiency, elasticity, connectivity and scalability.
In the benchmark, Paxata demonstrated an aggregate median response time of less than five seconds for a full spectrum of data preparation operations on datasets with up to 20 million rows and 198 columns, with many operations showing sub-second response times. This represents an overall performance improvement of more than 80 percent across all operations compared to previous releases of Adaptive Data Preparation. The benchmark was conducted on the Paxata multi-tenant cloud offering in Amazon Web Services, using 27 nodes with eight cores and 60 GB each.
According to the Gartner report* titled Data Preparation Is Not an Afterthought, “The iterative and explorative nature of data preparation results in a time-consuming process that demands considerable effort from data scientists and business analysts. In addition, preparation of data originating from new and diverse data sources can be challenging.”
“We had a short period of time to complete a massive data migration project which required us to extract, organize and clean 30 million records being moved from a legacy environment into an SAP system,” said Matt Heinz, Head of Business Intelligence at Del Monte Foods, Inc. “The work had to be done by a non-technical team who understood the data best, and we wanted flexibility to explore and define our needs as the project evolved. The Paxata cloud solution took no time to deploy, and gave us an easy-to-use tool on a platform that scales as our data volumes and usage increase or decrease.”
The Paxata data preparation solution goes well beyond “wrangling”, “taming” or “munging” data, as it was designed to simplify how the business gathers and uses data, regardless of size or source. The platform was built from the ground up to help business analysts, data scientists, developers, data curators, and IT teams automate, collaborate on and dynamically govern the data integration, data quality and enrichment process in a self-service fashion, now with unprecedented scale. It allows them to quickly and confidently build the AnswerSets needed for analytics without coding, scripting, data modeling or sampling.
“Paxata makes fast work of the precursor steps of our analytics workflow so we can help our clients faster,” said Mitchell D. Silber, Executive Managing Director at K2 Intelligence, an investigative and integrity consulting firm. “The self-service, intuitive nature of the Paxata solution makes it easy for data scientists and non-technical analysts to quickly understand, shape and combine data sets from email logs, transaction records, social network activity, or other sources, and the Paxata platform lets us scale up or down based on the volume of data we are working with.”
While the initial success of the Paxata solution centered on self-service data preparation of Excel, CSV, XML and JSON files, the Spring ’15 release now makes it simple to extract and interactively prepare data from Hadoop data lakes, without requiring customers to write MapReduce jobs that prepare data sets through sampling and batch-mode execution. This gives Paxata partners and customers an end-to-end method that minimizes the time and effort it takes to surface and combine Hadoop and non-Hadoop data.
Paxata delivers the first purpose-built Adaptive Data Preparation solution for business analysts, data scientists, developers, data curators, and IT teams. It enables the integration, cleansing, and enrichment of raw data into rich, analytic-ready data to power ad hoc, operational, predictive, and packaged analytics. Paxata partners with industry-leading big data and business intelligence solution providers such as Cloudera, and seamlessly connects to BI tools, including Salesforce.com, Tableau, Qlik and Microsoft Excel, to greatly accelerate the time to actionable business insights.
For more information on pricing and availability, please visit http://www.paxata.com. Follow @Paxata, connect on linkedin.com/company/paxata, follow us at http://www.facebook.com/paxata and watch us on http://www.Youtube.com/PaxataTV.
*Data Preparation Is Not an Afterthought, Gartner, Inc., Nov. 7, 2014