Tag Archives: levenshtein distance

Identifying Duplicate Records with Fuzzy Matching

I was prompted to write this post in response to a recent discussion thread in the LinkedIn Hadoop Users Group regarding fuzzy string matching for duplicate record identification with Hadoop. As part of my open source Hadoop-based recommendation engine project … Continue reading

Posted in Big Data, Hadoop and Map Reduce, Text Analytic | 37 Comments