June 16, 2017

By Mike White


Working with wide data can be a challenge when you’re trying to find that one column in the midst of hundreds or even thousands. It’s even more frustrating when the headers consist of cryptic codes inherited from legacy systems, character-limited headers, or abbreviations in a foreign language. Fortunately, Paxata provides a clever way to find and navigate to columns, so you can get on with profiling your data, deriving new attributes, and discovering insights.


To illustrate how easy it is to work with challenging wide data, we will use public data from the United States Census Bureau covering a 5-year period of New York housing statistics and characteristics. This particular file is not terribly wide by column count (it has 218 columns), but it is tedious to scroll through and includes some long column names.

  1. Start by locating the “column widgets” in the bottom-left corner of a project. The first one is the interactive column search (magnifying glass).
  2. Click it, or press Ctrl+F (Cmd+F on Mac), to reveal the column search bar at the bottom.

    Note: You can toggle “use browser find” so that the keyboard shortcut opens your browser’s own find instead (this applies only to your Paxata browser tab).
  3. Type a prefix, suffix, or any portion of the name of the column you’re interested in finding. The column search uses “contains” matching, so any column whose name includes your text is a match. In this scenario, I’m looking for statistics related to the working population, so I type “work”. Note how all other columns that do not fit this criterion are conveniently collapsed.

    Note: “Hide non-matching columns” is enabled by default for all projects except those with an extreme number of columns.
  4. Press Ctrl+G (Cmd+G) to cycle through each column in the grid that matches your search. Each matching column is highlighted in yellow and snaps to the left-most position in the grid.

    Note: Pattern & range highlighting was disabled in the original screenshot to make the highlighted column more obvious.
  5. Now that you have found what you’re looking for, you are one click away from profiling and filtering your data.

    Wow – it takes some poor New Yorkers 2.5 hours to get to work!
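To make the “contains” search, the hidden non-matching columns, and the Ctrl+G cycling a bit more concrete, here is a small Python sketch of the same behavior over a plain list of column names. Treat it as an illustration only: it is not Paxata’s code or API, matching is assumed to be case-insensitive, and `find_columns`, `ColumnSearch`, and the sample column names are all hypothetical.

```python
from typing import List, Optional


def find_columns(columns: List[str], query: str) -> List[int]:
    """Return indices of columns whose name contains `query`.

    Assumption: matching is case-insensitive (hypothetical, for illustration).
    """
    needle = query.casefold()
    return [i for i, name in enumerate(columns) if needle in name.casefold()]


class ColumnSearch:
    """Hypothetical model of a column search bar: not Paxata's implementation."""

    def __init__(self, columns: List[str], query: str) -> None:
        self.columns = columns
        self.matches = find_columns(columns, query)
        self.pos = -1  # no match selected yet

    def visible_columns(self) -> List[str]:
        """Mimic "Hide non-matching columns": keep only the matching names."""
        return [self.columns[i] for i in self.matches]

    def next_match(self) -> Optional[str]:
        """Mimic Ctrl+G: advance to the next match, wrapping around at the end."""
        if not self.matches:
            return None
        self.pos = (self.pos + 1) % len(self.matches)
        return self.columns[self.matches[self.pos]]


if __name__ == "__main__":
    # Made-up column names, loosely in the spirit of the census file.
    cols = [
        "Total population",
        "Workers 16 years and over",
        "Mean travel time to work (minutes)",
        "Median household income",
        "Worked at home",
    ]
    search = ColumnSearch(cols, "work")
    print(search.visible_columns())
    for _ in range(4):  # the fourth press wraps back to the first match
        print(search.next_match())
```

Because the query is a substring rather than a prefix, “work” also catches “Mean travel time to work (minutes)”, which is exactly why typing any fragment of a cryptic header tends to be enough.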

Stay tuned for more tips involving the useful, albeit cryptic, census data (such as how to replace all those code values in the data with actual text values)!

