Multiple Cluster and Edit vs Single Cluster and Edit (Part 2 of 7)
In general, we recommend that Paxata users perform the Cluster and Edit operation iteratively, using a combination of the available algorithms (fingerprint, n-gram, and Metaphone). Running multiple Cluster and Edit passes increases the accuracy of the results of your data cleansing project.
The goal is to execute the transformation shown in the image above.
Step 1: Perform the Cluster and Edit operation (fingerprint method) on the raw dataset.
Your data will look similar to the image below:
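For readers unfamiliar with the fingerprint method, the sketch below shows the commonly published key-collision recipe: normalize each value to a key and group values that share a key. It is an illustration only; Paxata's internal implementation is not described in this article and may differ. The function names and the sample company names are hypothetical.

```python
import re
import unicodedata
from collections import defaultdict


def fingerprint(value: str) -> str:
    """Build a normalized key (assumed recipe, not Paxata's actual code)."""
    # Strip accents by decomposing characters and dropping non-ASCII marks.
    text = unicodedata.normalize("NFKD", value)
    text = text.encode("ascii", "ignore").decode("ascii")
    # Trim, lowercase, and remove punctuation.
    text = re.sub(r"[^\w\s]", "", text.strip().lower())
    # Split on whitespace, de-duplicate, sort, and rejoin the tokens.
    return " ".join(sorted(set(text.split())))


def cluster(values, key_func):
    """Group values by key; keep only groups with more than one spelling."""
    groups = defaultdict(list)
    for value in values:
        groups[key_func(value)].append(value)
    return [group for group in groups.values() if len(set(group)) > 1]


# Hypothetical sample data, not taken from the article's dataset.
raw = ["Acme Corp.", "acme corp", "Corp Acme", "Globex"]
clusters = cluster(raw, fingerprint)
print(clusters)  # [['Acme Corp.', 'acme corp', 'Corp Acme']]
```

Each cluster is a candidate set of spellings that you would then edit to a single value.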
Step 2: Perform a second Cluster and Edit operation (Metaphone method). Your output should now match the desired output.
A single Cluster and Edit operation might be sufficient in some cases, but more often than not users apply Cluster and Edit iteratively.
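The effect of iterating can be sketched in a few lines. Note the substitution: the walkthrough above uses Metaphone for the second pass, but a full Metaphone implementation is too long to reproduce here, so this sketch uses an n-gram fingerprint as the second algorithm instead. All names (`cluster_and_edit`, `ngram_fingerprint`) and the sample values are hypothetical, the "edit" step simply picks the most frequent spelling in each cluster, and none of this is Paxata's API.

```python
import re
from collections import defaultdict


def fingerprint(value: str) -> str:
    # Token-based key: lowercase, drop punctuation, sort unique tokens.
    text = re.sub(r"[^\w\s]", "", value.strip().lower())
    return " ".join(sorted(set(text.split())))


def ngram_fingerprint(value: str, n: int = 2) -> str:
    # Character-based key: drop everything except letters and digits,
    # then sort the unique n-grams. Whitespace differences disappear.
    text = re.sub(r"[^a-z0-9]", "", value.lower())
    if len(text) < n:
        return text
    grams = {text[i:i + n] for i in range(len(text) - n + 1)}
    return "".join(sorted(grams))


def cluster_and_edit(values, key_func):
    # Cluster by key, then replace every value with the most frequent
    # spelling in its cluster (ties go to the first one seen).
    groups = defaultdict(list)
    for value in values:
        groups[key_func(value)].append(value)
    canonical = {key: max(group, key=group.count)
                 for key, group in groups.items()}
    return [canonical[key_func(value)] for value in values]


# Hypothetical sample data.
raw = ["Jon Smith", "jon smith", "Smith, Jon", "JonSmith", "Jane Doe"]

pass_one = cluster_and_edit(raw, fingerprint)
print(pass_one)
# ['Jon Smith', 'Jon Smith', 'Jon Smith', 'JonSmith', 'Jane Doe']

pass_two = cluster_and_edit(pass_one, ngram_fingerprint)
print(pass_two)
# ['Jon Smith', 'Jon Smith', 'Jon Smith', 'Jon Smith', 'Jane Doe']
```

The first pass cannot merge "JonSmith" because it produces a different token key; the second pass catches it because its key ignores whitespace. This is the reason a second algorithm can pick up variants that the first one misses.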