Data preparation: Self-service meets big data scale
Paxata is the first self-service data preparation solution built from the ground up to help business analysts, data architects, data scientists, and IT teams eliminate the pain of combining, cleaning, and shaping their data before ad hoc analysis, predictive analytics, and operational reporting.
Paxata combines an easy-to-use, self-service data preparation application with an enterprise data preparation platform powerful enough for teams who want to dramatically increase their analytic productivity on ever-growing data volumes, reduce the risk of data sprawl, and let the entire enterprise get greater value from their work. Cloud customers experience the power of the platform as unparalleled data prep flexibility and responsiveness, while on-premises customers can integrate Paxata into their Hadoop environment with minimal effort and immediate time to value. Whichever deployment model they choose, all customers get the benefits of the only modern data preparation solution on the market.
Add data: Paxata brings in data regardless of its source (HDFS, relational databases, Excel, flat files, XML, JSON, Avro), automatically parses it, and identifies both the types of values it contains (products, customers, dates, timestamps, geography) and what the data means.
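As a rough illustration of what automatic type detection involves, the Python sketch below infers a column's structural type by trying candidate parsers from most to least specific. This is a generic technique, not Paxata's implementation: the function name infer_column_type, the set of candidate types, and the use of ISO 8601 date formats are all assumptions, and recognizing semantic types such as products or customers would require far more than this.

```python
from datetime import date, datetime

# Assumed candidate types, tried from most to least specific.
# Each parser raises ValueError when a value does not fit.
PARSERS = [
    ("integer", int),
    ("decimal", float),
    ("date", date.fromisoformat),           # e.g. 2024-01-15
    ("timestamp", datetime.fromisoformat),  # e.g. 2024-01-15T10:30:00
]


def infer_column_type(values):
    """Return the most specific type that every non-blank value parses as."""
    non_blank = [v.strip() for v in values if v and v.strip()]
    if not non_blank:
        return "empty"
    for name, parse in PARSERS:
        try:
            for v in non_blank:
                parse(v)
        except ValueError:
            continue  # at least one value does not fit; try the next type
        return name
    return "text"  # nothing more specific fits


print(infer_column_type(["1", "2", " 3 "]))   # integer
print(infer_column_type(["2024-01-15", ""]))  # date
print(infer_column_type(["Acme", "42"]))      # text
```

Trying the strictest parser first means a column of whole numbers is reported as "integer" rather than the looser "decimal", and a single non-conforming value pushes the whole column down to the next candidate.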
Explore: Paxata helps you find quality issues proactively through fully ad hoc, interactive exploration with full-text search, interactive text and numeric filters and histograms, and visual data quality heat maps that highlight patterns, errors, duplicates, and sparse or missing data.
Clean and Change: Once Paxata has highlighted the transformations needed to improve the data, analysts can remediate errors, add data, and change entire columns or single fields without any coding or scripting.
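A basic version of the per-column statistics behind this kind of exploration can be computed with Python's standard library. The sketch below is hypothetical (profile_column and its output field names are illustrative, not a Paxata API) and counts missing values, distinct values, repeated values, and the most frequent entries in a single column.

```python
from collections import Counter


def profile_column(values, top_n=3):
    """Summarize one column: row count, blanks, distinct and repeated values."""
    present = [v for v in values if v is not None and str(v).strip() != ""]
    counts = Counter(present)
    return {
        "rows": len(values),
        "missing": len(values) - len(present),
        "distinct": len(counts),
        # Non-blank entries whose exact value already appeared in the column.
        "duplicates": len(present) - len(counts),
        "top_values": counts.most_common(top_n),
    }


states = ["CA", "NY", "", "CA", None, "ca"]
print(profile_column(states, top_n=1))
# {'rows': 6, 'missing': 2, 'distinct': 3, 'duplicates': 1, 'top_values': [('CA', 2)]}
```

Because matching is exact, "CA" and "ca" are counted as different values here; case variants like these are the sort of inconsistency an analyst would then clean up.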
Shape: In a single click, data can be pivoted or de-pivoted, columns can be split, and aggregations can be created to quickly make the datasets more suitable for the required analytic exercise.
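Pivoting and de-pivoting are standard reshaping operations. The following sketch shows both in plain Python over lists of dictionaries; the function names pivot and unpivot and the example column names are hypothetical, and this simple pivot keeps the last value for a repeated index/column pair rather than aggregating.

```python
from collections import defaultdict


def pivot(rows, index, column, value):
    """Long to wide: one row per `index` value, one column per `column` value.

    If an (index, column) pair repeats, the last value wins (no aggregation).
    """
    wide = defaultdict(dict)
    for r in rows:
        wide[r[index]][r[column]] = r[value]
    return [{index: key, **cols} for key, cols in wide.items()]


def unpivot(rows, index, var_name, value_name):
    """Wide to long (de-pivot): one row per (index, other column) pair."""
    return [
        {index: r[index], var_name: col, value_name: val}
        for r in rows
        for col, val in r.items()
        if col != index
    ]


sales = [
    {"region": "East", "quarter": "Q1", "sales": 10},
    {"region": "East", "quarter": "Q2", "sales": 12},
    {"region": "West", "quarter": "Q1", "sales": 7},
]
by_region = pivot(sales, "region", "quarter", "sales")
print(by_region)
# [{'region': 'East', 'Q1': 10, 'Q2': 12}, {'region': 'West', 'Q1': 7}]
print(unpivot(by_region, "region", "quarter", "sales") == sales)  # True
```

Replacing the assignment in pivot with an accumulation (for example, summing into the cell) would turn the same structure into an aggregating pivot.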
Enrich: Because Paxata understands the semantics of the data, the system can recommend additional attributes that increase the contextual value of the final AnswerSet, helping data teams get more value from their analytic results.
Combine: Paxata automatically detects common attributes across multiple raw source data sets, assembles them into a single AnswerSet™, and merges overlapping references to the same entity into deduplicated, trusted entities, all without scripting, SQL, or complex Excel functionality such as VLOOKUPs, pivot tables, and macros.
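Detecting common attributes and merging duplicate entities can be approximated in many ways. The sketch below uses two simple, swapped-in techniques rather than anything from IntelliFusion: it ranks column pairs from two data sets by the Jaccard overlap of their normalized distinct values to suggest join keys, and it deduplicates rows whose key fields match after case and whitespace normalization. All function names, the example data, and the 0.5 threshold are assumptions; real entity resolution usually adds fuzzy matching on top of this.

```python
def _normalize(value):
    """Lowercase and collapse whitespace so trivial variants compare equal."""
    return " ".join(str(value).lower().split())


def candidate_join_keys(left, right, min_overlap=0.5):
    """Rank (left column, right column) pairs by Jaccard overlap of values."""
    def distinct(rows, col):
        return {_normalize(r[col]) for r in rows if r.get(col) not in (None, "")}

    pairs = []
    for lcol in left[0]:  # assumes non-empty lists of rows with uniform keys
        lvals = distinct(left, lcol)
        for rcol in right[0]:
            rvals = distinct(right, rcol)
            if lvals and rvals:
                score = len(lvals & rvals) / len(lvals | rvals)
                if score >= min_overlap:
                    pairs.append((lcol, rcol, round(score, 2)))
    return sorted(pairs, key=lambda p: p[2], reverse=True)


def deduplicate(rows, key_fields):
    """Keep the first row for each normalized combination of key fields."""
    first_seen = {}
    for r in rows:
        key = tuple(_normalize(r[f]) for f in key_fields)
        first_seen.setdefault(key, r)
    return list(first_seen.values())


orders = [
    {"order_id": "1", "cust": "C01", "amount": "20"},
    {"order_id": "2", "cust": "C02", "amount": "35"},
    {"order_id": "3", "cust": "C01", "amount": "5"},
]
customers = [
    {"customer_id": "c01", "name": "Acme Corp"},
    {"customer_id": "C02", "name": "Globex"},
    {"customer_id": "C03", "name": "acme  corp"},
]
print(candidate_join_keys(orders, customers))  # [('cust', 'customer_id', 0.67)]
print(len(deduplicate(customers, ["name"])))   # 2
```

Normalizing before comparing is what lets "C01" match "c01" as a join key and "Acme Corp" match "acme  corp" as the same entity.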
Publish: Paxata makes AnswerSets™ available directly through ODBC LiveQuery to Qlik, Tableau, Excel, and any other ODBC-compliant analytics tool. Paxata also supports publishing AnswerSets to Hadoop clusters.
At the heart of the Paxata platform is IntelliFusion™, a proprietary semantic fusion and machine learning engine that intelligently identifies and transforms raw data on the fly using a set of sophisticated algorithms.
Share: Paxata allows business and technical data teams to work on data projects simultaneously without a predetermined workflow. An event-driven, WebSocket-powered HTML5 visual user experience allows simultaneous editing of, and access to, projects from any device, including desktop web browsers, tablets, and smartphones. Paxata’s unique data library gives data curators and IT organizations a central repository for sharing data sets with the business, as well as a one-stop shop for all completed and in-process data prep projects. The library includes full versioning and tracking of all uploaded data and published AnswerSets.
Govern: Paxata’s step editor tracks every action taken within a project, supporting full replay (see what the data looked like at every step), reusability (apply previous data preparation steps to new data sets), and reordering (run the same data preparation steps in a different sequence). The administration console provides security and user management for individuals and teams.
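The replay, reuse, and reordering ideas can be modeled by treating a project as an ordered list of functions over the data. In this hypothetical sketch (replay and the two example steps are illustrative names, not Paxata's step editor), each step returns a new list, so a snapshot after every step is kept for inspection, the same steps can run on a new data set, and reordering is simply a different list order.

```python
def replay(data, steps):
    """Apply steps in order and return the data as it looked after each one.

    snapshots[0] is the input; snapshots[i] is the result after step i.
    """
    snapshots = [data]
    for step in steps:
        data = step(data)
        snapshots.append(data)
    return snapshots


# Hypothetical steps. Each returns a new list instead of mutating its input,
# so earlier snapshots stay intact.
def drop_blank_names(rows):
    return [r for r in rows if r["name"].strip()]


def title_case_names(rows):
    return [{**r, "name": r["name"].strip().title()} for r in rows]


steps = [drop_blank_names, title_case_names]
rows = [{"name": " ada lovelace"}, {"name": "  "}, {"name": "alan turing"}]

history = replay(rows, steps)  # full replay
print(history[1])  # after step 1: blank name removed, not yet title-cased
print(replay([{"name": "grace hopper"}], steps)[-1])           # reuse on new data
print(replay(rows, list(reversed(steps)))[-1] == history[-1])  # reorder: True
```

Keeping steps free of side effects is the design choice that makes all three features cheap: replay is just indexing into the snapshot list, and reuse or reordering never has to undo earlier work.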