Category Archives: Text Analytic

Identifying Duplicate Records with Fuzzy Matching

I was prompted to write this post in response to a recent discussion thread in the LinkedIn Hadoop Users Group regarding fuzzy string matching for duplicate record identification with Hadoop. As part of my open source Hadoop-based recommendation engine project … Continue reading

Posted in Big Data, Hadoop and Map Reduce, Text Analytic | 33 Comments

Similarity Based Recommendation – Tossed up with Text Analytic

In my last post I mentioned that the similarity-based recommendation engine in sifarish only considered categorical and integer attributes. I have since added support for text attributes to sifarish. I am using Lucene for text processing and a variation of Jaccard … Continue reading

Posted in Data Mining, Hadoop and Map Reduce, Recommendation Engine, Text Analytic | 8 Comments
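
For readers unfamiliar with the Jaccard measure mentioned in the excerpt above, here is a minimal sketch of the plain, textbook Jaccard similarity between two text attributes. It is not the variation used in sifarish, which the excerpt does not describe. The function names `tokenize` and `jaccard_similarity` are hypothetical, and the regex tokenizer is only a stand-in for a real Lucene analyzer.

```python
import re


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric tokens.

    Assumption: a crude stand-in for a Lucene analyzer (no stemming,
    no stop-word removal).
    """
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard_similarity(text_a, text_b):
    """Plain Jaccard similarity: |A intersect B| / |A union B| over token sets."""
    tokens_a = tokenize(text_a)
    tokens_b = tokenize(text_b)
    if not tokens_a and not tokens_b:
        # Two empty texts are treated as identical by convention here.
        return 1.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


if __name__ == "__main__":
    # Two of the four distinct tokens are shared.
    print(jaccard_similarity("red cotton shirt", "blue cotton shirt"))
```

A score of 1.0 means the two token sets are identical and 0.0 means they share no tokens; a recommendation engine could combine such a score with distances on categorical and integer attributes.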