Multiple Cluster and Edit vs Single Cluster and Edit (Part 2 of 7)

August 27, 2018


In general, we recommend that Paxata users perform the Cluster and Edit operation iteratively, using a combination of the available algorithms (Fingerprint, N-gram, and Metaphone). Running several Cluster and Edit passes, each with a different algorithm, increases the accuracy of the results of your data cleansing project, because each algorithm catches a different kind of variation in the values.
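To give a feel for how one of these algorithms groups values, here is a minimal Python sketch of a character n-gram key, in the style commonly used by open-source clustering tools. It is an illustration only, not Paxata's implementation; the function names `ngram_key` and `cluster` and the sample values are hypothetical.

```python
import re
from collections import defaultdict

def ngram_key(value, n=2):
    """Character n-gram key: ignores case, punctuation, and spacing."""
    chars = re.sub(r"[^a-z0-9]", "", value.lower())
    grams = {chars[i:i + n] for i in range(len(chars) - n + 1)}
    # Sorting the unique n-grams makes the key independent of their order.
    return "".join(sorted(grams))

def cluster(values, key_fn):
    """Group values that share a key; keep groups with 2+ distinct members."""
    groups = defaultdict(list)
    for v in values:
        groups[key_fn(v)].append(v)
    return [g for g in groups.values() if len(set(g)) > 1]

# Hypothetical sample values, not taken from the article's dataset.
names = ["Acme Corp", "AcmeCorp", "acme corp.", "Apex Ltd"]
print(cluster(names, ngram_key))
```

Because the key discards whitespace, "Acme Corp" and "AcmeCorp" land in the same cluster, which a purely token-based key would miss.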

 

Example:

The goal is to execute the transformation shown in the image above.

Step 1: Perform the Cluster and Edit operation (Fingerprint method) on the raw dataset.

Your data will look similar to the image below:

Step 2: Perform a second Cluster and Edit operation (Metaphone method) on the result of Step 1. Your output should now match the desired output.

A single Cluster and Edit operation might be sufficient in some cases, but more often than not users apply Cluster and Edit iteratively.
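The two-pass workflow in the example can be sketched in Python as follows. This is a rough illustration under stated assumptions, not Paxata's implementation: `fingerprint_key` follows the common token-fingerprint approach, `rough_phonetic_key` is a deliberately crude stand-in for Metaphone (it is not the real algorithm), and `cluster_and_edit` automatically picks the most frequent value in each cluster, whereas in Paxata you choose the replacement value yourself. All names and sample values are hypothetical.

```python
import re
from collections import defaultdict

def fingerprint_key(value):
    """Token fingerprint: lowercase, strip punctuation, sort unique tokens."""
    cleaned = re.sub(r"[^\w\s]", "", value.strip().lower())
    return " ".join(sorted(set(cleaned.split())))

def rough_phonetic_key(value):
    """Crude phonetic key (a stand-in for Metaphone, NOT the real algorithm)."""
    s = re.sub(r"[^a-z]", "", value.lower()).replace("ph", "f")
    s = s[:1] + re.sub(r"[aeiou]", "", s[1:])   # drop vowels after first letter
    return re.sub(r"(.)\1+", r"\1", s)          # collapse repeated letters

def cluster_and_edit(values, key_fn):
    """Replace every value with the most frequent member of its cluster."""
    groups = defaultdict(list)
    for v in values:
        groups[key_fn(v)].append(v)
    # Ties are broken in favour of the value that appears first.
    canonical = {k: max(g, key=g.count) for k, g in groups.items()}
    return [canonical[key_fn(v)] for v in values]

# Hypothetical raw column, not the dataset from the article's screenshots.
raw = ["Stephen Clark", "Clark, Stephen", "stephen clark",
       "Stefen Clark", "Maria Lopez"]

pass1 = cluster_and_edit(raw, fingerprint_key)       # fixes case and word order
pass2 = cluster_and_edit(pass1, rough_phonetic_key)  # fixes spelling variants
print(pass1)
print(pass2)
```

In this sketch neither pass is enough on its own: the fingerprint pass leaves "Stefen Clark" untouched, and a phonetic pass alone would leave "Clark, Stephen" untouched because of the reversed word order. Running one after the other merges all four variants.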
