Importing Files with Variable Record Layouts

May 22, 2018

Importing Files with Variable Record Layouts

Often data files have variable record types, identified by a control character in the first one or two positions of the record. This Tip of the Day walks you through the basic steps of importing and parsing the data.

 

Assume your data file looks like this:

The first character of each row identifies what type of row it is.  Some examples:

1  = header record

5  = batch header record

9  = file control record

Different row types have different layouts.

How do you extract all the data in a reliable, repeatable way?  Read on!

Step 1:  Import the file as a fixed width file with a single column as wide as the entire width of the file.
 
In the illustration above, the following parameters have been set

  1. Use same width for all columns = True
  2. Column Width = 94  (or however wide your file is)
  3. Number of columns = 1
  4. Number of header rows = 0  (this will always be the case with files of this type)
  5. File Format = Fixed Width (upper right quadrant)
Step 2:  Create a project.  Use the dataset you imported in Step 1 as the anchor dataset

 

 

Step 3:  Split the one column, by Length with the first column being 1 character wide and the second column being the remaining characters (93 in the case of the example).

 

Step 4:  For each record type, set a filter on that record type and split the unparsed_data column based on that record layout specification.
!

This illustration shows parsing a header type record (1) of an ACH file.

Once you have parsed each record type, you would likely have additional work to finalize the data preparation, but that is a subject for another day. Hint:  “Fill Down”!

 

n.b.: There are many standards which use this type of format.  In finance, there are countless applications of this type of format, in genomic research, the VCF (Variant Call Format) is used for storing gene sequence variations, in statistics XPORT / XPT make up the SAS Transport File Format Family.  There are countless applications and industries which use this type of layout. Paxata is the Swiss Army knife that will open them all!

🙂

Show Buttons
Hide Buttons