Cluster+Edit Basics – (Part 1 of 7)
The Cluster+Edit feature in Paxata can be used with one of the following three methods:
- Metaphone = sounds like (in English)
- Fingerprint = looks like (ignoring punctuation and case)
- N-gram = patterns of “n” characters
Briefly, consider these three examples to help learn the effect of each method:
|Metaphone|Groups words together based on their English-language pronunciation. For example, “Noo York Citi” and “New York City” would belong to the same cluster using Metaphone.
|Fingerprint|Groups values together where the only differences are punctuation, capitalization and word order. For example, “Chicago”, “CHICAGO.” and “Chic-ago” would belong to the same cluster using Fingerprint.
|N-gram|Groups values by their patterns of “n” consecutive characters, where “n” is a user-provided value. See the example below for how ngram1, ngram2 and ngram3 represent the value found in the column CityName.
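To make the Fingerprint and N-gram ideas more concrete, here is a minimal Python sketch. It is not Paxata's implementation. The function names `fingerprint` and `ngrams` are hypothetical, and the exact normalization rules (lowercasing, deleting ASCII punctuation, sorting unique words, sliding a window one character at a time) are assumptions chosen to match the descriptions above. Metaphone is left out because it needs a full set of English phonetic rules.

```python
import string


def fingerprint(value: str) -> str:
    """Hypothetical fingerprint key (an assumption, not Paxata's actual rules)."""
    # Ignore case and leading/trailing whitespace.
    text = value.strip().lower()
    # Delete ASCII punctuation outright, so "Chic-ago" becomes "chicago".
    text = text.translate(str.maketrans("", "", string.punctuation))
    # Sort the unique words so that word order does not matter.
    return " ".join(sorted(set(text.split())))


def ngrams(value: str, n: int) -> list[str]:
    """Return every run of n consecutive characters in value (assumed definition)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # Slide a window of width n across the string, one character at a time.
    return [value[i:i + n] for i in range(len(value) - n + 1)]


if __name__ == "__main__":
    # Values that share a fingerprint key would land in the same cluster.
    for city in ["Chicago", "CHICAGO.", "Chic-ago"]:
        print(city, "->", fingerprint(city))
    # Character patterns for n = 1, 2 and 3.
    for n in (1, 2, 3):
        print(n, ngrams("Paris", n))
```

Under these assumptions, values with the same fingerprint key fall into one cluster, while a larger “n” produces fewer but more specific character patterns.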