Cluster+Edit Basics – (Part 1 of 7)

August 21, 2018

The Cluster+Edit feature in Paxata can be used with one of the following three methods:

  1. Metaphone = sounds like (in English)
  2. Fingerprint = looks like (ignoring punctuation, case and word order)
  3. N-gram = patterns of “n” characters

The following three examples briefly show the effect of each method:

Metaphone
Groups words together based on their English language pronunciation. For example, “Noo York Citi” and “New York City” would belong to the same cluster using Metaphone.
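
To make the "sounds like" idea concrete, here is a minimal Python sketch. It is not the real Metaphone algorithm and not Paxata's implementation: `toy_phonetic_key` is a hypothetical helper that applies only a handful of Metaphone-style rules (drop non-leading vowels, treat "c" before e/i/y as "s", keep "w" and "y" only before a vowel, collapse doubled letters). That is enough to show why the two spellings above land in the same cluster.

```python
import re
from collections import defaultdict

VOWELS = set("aeiou")

def toy_phonetic_key(value: str) -> str:
    """Rough phonetic key built from a few Metaphone-style rules.

    Assumption: this is a teaching toy, far simpler than real Metaphone.
    """
    keys = []
    for word in re.findall(r"[a-z]+", value.lower()):
        out = []
        for i, ch in enumerate(word):
            nxt = word[i + 1] if i + 1 < len(word) else ""
            if i > 0 and ch == word[i - 1]:
                continue  # collapse doubled letters ("oo" -> "o")
            if ch in VOWELS:
                if i == 0:
                    out.append(ch)  # vowels survive only at the start of a word
            elif ch == "c":
                # "c" sounds like "s" before e, i, y; otherwise like "k"
                out.append("s" if nxt in ("e", "i", "y") else "k")
            elif ch in ("w", "y"):
                if nxt in VOWELS:
                    out.append(ch)  # "w"/"y" are silent unless a vowel follows
            else:
                out.append(ch)
        keys.append("".join(out).upper())
    return " ".join(keys)

def cluster(values):
    """Group values that share the same phonetic key."""
    groups = defaultdict(list)
    for v in values:
        groups[toy_phonetic_key(v)].append(v)
    return dict(groups)

print(toy_phonetic_key("Noo York Citi"))  # N YRK ST
print(toy_phonetic_key("New York City"))  # N YRK ST
print(cluster(["Noo York Citi", "New York City", "Boston"]))
```

Both spellings reduce to the same key, so they fall into one cluster, while "Boston" gets a key of its own.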

Fingerprint
Groups values together where the only differences are punctuation, capitalization and word order. For example, “Chicago”, “CHICAGO.” and “Chic-ago” would all belong to the same cluster using Fingerprint.
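
The "looks like" comparison can be sketched in a few lines of Python. This is an illustration under assumptions, not Paxata's actual implementation: the hypothetical `fingerprint` function lowercases the value, strips ASCII punctuation, and sorts the unique words, so values that differ only in those respects produce the same key.

```python
import string

# Translation table that deletes every ASCII punctuation character.
_STRIP_PUNCT = str.maketrans("", "", string.punctuation)

def fingerprint(value: str) -> str:
    """Normalize case, punctuation and word order into a comparison key.

    Assumption: a generic fingerprint-style key, not the vendor's exact rules.
    """
    cleaned = value.strip().lower().translate(_STRIP_PUNCT)
    tokens = sorted(set(cleaned.split()))  # word order no longer matters
    return " ".join(tokens)

for city in ["Chicago", "CHICAGO.", "Chic-ago"]:
    print(repr(city), "->", fingerprint(city))  # each prints 'chicago'

print(fingerprint("York, New") == fingerprint("new york"))  # True
```

Values with an identical key are placed in the same cluster.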

N-gram
Groups values by their patterns of “n” consecutive characters, where “n” is a user-provided value. See the example below for how ngram1, ngram2 and ngram3 represent the value found in the column CityName.
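
As a rough illustration of what the n-gram representations look like, the Python sketch below computes the character n-grams of a city name for n = 1, 2 and 3. The helpers `ngrams` and `ngram_key` are hypothetical, and the normalization (lowercasing, dropping everything except letters and digits, then sorting the unique n-grams) is an assumption borrowed from common n-gram fingerprinting; Paxata's exact rules may differ.

```python
import re

def ngrams(value: str, n: int) -> list:
    """Return every run of n consecutive characters in the cleaned value."""
    cleaned = re.sub(r"[^a-z0-9]", "", value.lower())
    return [cleaned[i:i + n] for i in range(len(cleaned) - n + 1)]

def ngram_key(value: str, n: int) -> str:
    """Join the sorted, de-duplicated n-grams into one comparison key."""
    return "".join(sorted(set(ngrams(value, n))))

city_name = "Chicago"
print(ngrams(city_name, 2))     # ['ch', 'hi', 'ic', 'ca', 'ag', 'go']
print(ngram_key(city_name, 1))  # acghio
print(ngram_key(city_name, 2))  # agcachgohiic
print(ngram_key(city_name, 3))  # agocagchihicica
```

A smaller “n” produces a looser key that matches more values; a larger “n” keeps more of the original character order and is therefore stricter.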
