Category Archives: Text Analytic

Improving Elastic Search Query Result with Query Expansion using Topic Modeling

Query expansion is the process of reformulating a query to improve its results, in particular its recall. Topic modeling is a Natural Language Processing (NLP) technique to discover hidden topics or concepts … Continue reading

Posted in elastic search, NLP, Python, Solr, Text Analytic, Text Mining, Topic Modeling | 1 Comment

Identifying Duplicate Records with Fuzzy Matching

I was prompted to write this post in response to a recent discussion thread in the LinkedIn Hadoop Users Group regarding fuzzy string matching for duplicate record identification with Hadoop. As part of my open source Hadoop-based recommendation engine project … Continue reading

Posted in Big Data, Hadoop and Map Reduce, Text Analytic | 36 Comments

Similarity Based Recommendation – Tossed up with Text Analytic

In my last post I mentioned that the similarity-based recommendation engine in sifarish only considered categorical and integer attributes. I have added support for text attributes to sifarish. I am using Lucene for text processing and a variation of Jaccard … Continue reading

Posted in Data Mining, Hadoop and Map Reduce, Recommendation Engine, Text Analytic | 8 Comments