Tag Archives: levenshtein distance

Identifying Duplicate Records with Fuzzy Matching

Posted on September 9, 2013 by Pranab

I was prompted to write this post in response to a recent discussion thread in the LinkedIn Hadoop Users Group regarding fuzzy string matching for duplicate record identification with Hadoop. As part of my open source Hadoop-based recommendation engine project … Continue reading →

Posted in Big Data, Hadoop and Map Reduce, Text Analytic | Tagged duplicate detection, fuzzy matching, levenshtein distance | 37 Comments
