When dealing with very wide datasets, analysts often spend a lot of time searching for a value without knowing which column contains it. In Paxata, there is a pretty simple way to find this value, provided the dataset is not too big! Use the following steps in Paxata to search for a value across columns: Step 1: Start with the... Read More
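The idea behind the steps above can be sketched in plain Python (this is an illustrative sketch of the concept, not Paxata's implementation; the sample rows are made up):

```python
# Hypothetical sketch: find which columns of a wide dataset contain a
# given value, by scanning every cell of every row.
rows = [
    {"id": "1", "region": "EMEA", "status": "open"},
    {"id": "2", "region": "APAC", "status": "EMEA"},
]

def columns_containing(rows, value):
    """Return the sorted list of column names in which `value` appears."""
    hits = set()
    for row in rows:
        for col, cell in row.items():
            if cell == value:
                hits.add(col)
    return sorted(hits)

print(columns_containing(rows, "EMEA"))  # → ['region', 'status']
```

As the blurb notes, a full scan like this is only practical when the dataset is not too big.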
Converting XML files into rows and columns is handled automatically during import with Paxata. Paxata does a particularly nice job with XML and JSON data, converting the nested structure to rows and columns with virtually no additional effort from the end user. The transpose is handled entirely during import.
Try out these steps to see... Read More
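To make the idea concrete outside Paxata, here is a minimal standard-library sketch of XML-to-table conversion: each repeated element becomes a row and each child element becomes a column. The sample XML and tag names are invented for illustration.

```python
# Illustrative sketch of XML-to-rows conversion (not Paxata's importer).
import xml.etree.ElementTree as ET

XML = """
<orders>
  <order><id>1</id><amount>9.50</amount></order>
  <order><id>2</id><amount>4.25</amount></order>
</orders>
"""

def xml_to_rows(xml_text, record_tag):
    """Flatten each <record_tag> element into a dict of column -> value."""
    root = ET.fromstring(xml_text)
    return [{child.tag: child.text for child in rec}
            for rec in root.iter(record_tag)]

rows = xml_to_rows(XML, "order")
# rows == [{'id': '1', 'amount': '9.50'}, {'id': '2', 'amount': '4.25'}]
```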
By: Piet Loubser
Through 2018, 90% of deployed data lakes will be rendered useless as they're overwhelmed with information assets captured for uncertain use cases, according to Gartner.1 This is despite growth from pure-play Hadoop vendors like Hortonworks and Cloudera. Join our webcast to learn key steps to accelerate value, based on what we have learned from numerous customers that leverage... Read More
VALUE: Profiling and filtering dates in Paxata provides a simple and fast way to understand the date/time characteristics of your data (such as trends, outliers, and unexpected values) that may need resolving before pushing the data into production. It's a great way to validate that you have been provided an accurate and complete set of data from upstream or third-party... Read More
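A minimal sketch of the same profiling idea using only the Python standard library (Paxata does this interactively; the sample values and date format here are assumptions): parse a date column, report its range, and collect any values that fail to parse.

```python
# Date profiling sketch: find the date range and any unexpected values.
from datetime import datetime

values = ["2017-01-03", "2017-02-14", "not-a-date", "2019-12-31"]

parsed, bad = [], []
for v in values:
    try:
        parsed.append(datetime.strptime(v, "%Y-%m-%d"))
    except ValueError:
        bad.append(v)  # values that do not match the expected format

print(min(parsed).date(), max(parsed).date())  # observed date range
print(bad)                                     # unexpected values to resolve
```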
Often, data files have variable record types, identified by a control character in the first one or two positions of each record. This Tip of the Day walks you through the basic steps of importing and parsing such data.
Assume your data file looks like this:
The first character of each row identifies what type of row it is. Some examples:
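The parsing idea can be sketched in plain Python (a hedged illustration, not Paxata's import step; the record types "H", "D", "T", the pipe delimiter, and the field layouts are all made up for this example):

```python
# Dispatch each line on its leading control character to a
# record-type-specific field layout.
LAYOUTS = {
    "H": ["record_type", "file_date"],   # header record
    "D": ["record_type", "sku", "qty"],  # detail record
    "T": ["record_type", "row_count"],   # trailer record
}

def parse_line(line):
    """Split a delimited line and name its fields by record type."""
    fields = line.strip().split("|")
    layout = LAYOUTS[fields[0]]  # first field identifies the row type
    return dict(zip(layout, fields))

data = ["H|2020-01-01", "D|ABC123|4", "D|XYZ789|2", "T|3"]
records = [parse_line(line) for line in data]
# records[1] == {'record_type': 'D', 'sku': 'ABC123', 'qty': '4'}
```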
VALUE: For transactional and other low-level data, Paxata provides a method for flagging or marking individual rows that meet a given criterion or business rule. Working on complete (non-sampled) data gives users the ability to review, validate, isolate, and output results. This approach allows you to operate within a spreadsheet paradigm while maintaining focus on the full scope of the data.
In... Read More
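The flagging pattern can be illustrated in plain Python (a sketch of the concept, not Paxata's feature; the transactions and the threshold rule are hypothetical): rather than filtering rows out, add a flag column so the full dataset stays visible for review.

```python
# Flag rows meeting a business rule instead of removing them.
transactions = [
    {"id": 1, "amount": 120.0},
    {"id": 2, "amount": 15000.0},
    {"id": 3, "amount": 75.5},
]

THRESHOLD = 10_000  # hypothetical rule: flag unusually large transactions

for row in transactions:
    row["flag_large"] = row["amount"] > THRESHOLD

flagged = [r["id"] for r in transactions if r["flag_large"]]
# flagged == [2]; the other rows remain in view, unflagged
```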
Often we need to remove rows from our Project, and that is very straightforward. There is, however, a simple best practice for removing rows that you should be aware of.
The steps for removing rows are as follows: Using a Filtergram, isolate the rows you want to remove. From the left-hand toolbar, click the Scissors icon (the step editor will now have... Read More
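The filter-then-remove pattern behind these steps can be sketched in plain Python (an illustration of the concept, not the Paxata UI; the rows and condition are made up):

```python
# Isolate the rows matching a condition, then keep only the complement.
rows = [
    {"id": 1, "status": "ok"},
    {"id": 2, "status": "error"},
    {"id": 3, "status": "ok"},
]

to_remove = [r for r in rows if r["status"] == "error"]  # the "Filtergram"
kept = [r for r in rows if r["status"] != "error"]       # after removal
# [r["id"] for r in kept] == [1, 3]
```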
Importing data from a flat file into Paxata is very straightforward. Often, the default import settings work perfectly; sometimes a user may need to adjust a couple of settings.
Today’s Tip of the Day focuses on handling flat files with leading zeros. Paxata will not remove leading zeros during import, yet it is important to know how leading zeros are handled... Read More
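The leading-zero pitfall is easy to demonstrate outside Paxata (a standard-library sketch; the CSV content is invented): reading a column as text preserves leading zeros, while converting it to a number silently drops them.

```python
# Leading zeros survive as text but are lost on numeric conversion.
import csv, io

DATA = "zip\n00501\n07302\n"

as_text = [row["zip"] for row in csv.DictReader(io.StringIO(DATA))]
as_number = [int(z) for z in as_text]  # numeric conversion drops the zeros

print(as_text)    # ['00501', '07302']  -- zeros preserved
print(as_number)  # [501, 7302]         -- zeros lost
```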
By: Kumar Jayaram
Data in a multi-cloud hybrid world
Traditional data analytics environments were built on the Enterprise Data Warehouse (EDW), usually running on a relational database like Teradata, Microsoft, or Oracle. While those still exist, there are many more sources that analysts need to access in order to perform the kinds of analytics required. Big data/Hadoop: increasingly, the modern... Read More
Using “Auto Number” provides a unique index number for every row in your dataset. You can then bind the column to the sort order of other, existing columns in your dataset. Auto Number is useful when you need to: 1) track your dataset’s original order, 2) assign row IDs to your dataset, or 3) identify sections or groups of rows in... Read More
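The first use case, tracking original order, can be sketched in plain Python (an illustration of the idea, not Paxata's Auto Number tool; the sample rows are made up): assign each row a stable index before any sorting, then use it to recover the original order.

```python
# Assign a unique, order-preserving row ID to every row.
rows = [{"name": "b"}, {"name": "a"}, {"name": "c"}]

for i, row in enumerate(rows, start=1):
    row["row_id"] = i  # unique index bound to the original order

resorted = sorted(rows, key=lambda r: r["name"])       # some later reorder
original_order = sorted(resorted, key=lambda r: r["row_id"])
# original_order recovers the dataset's initial row order: b, a, c
```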