Category Archives: Text Analytic

Six Unsupervised Extractive Text Summarization Techniques Side by Side

In text summarization, we create a summary of the original content that is coherent and captures its salient points. Text summarization has many important uses. Something we face almost every day is the text …

Posted in Data Science, NLP, Python, Text Analytic, Text Mining | 1 Comment
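
The excerpt above does not say which six techniques the post compares, so what follows is only a minimal sketch of one common unsupervised extractive approach: scoring sentences by the average corpus frequency of their words and keeping the top ones in their original order. The regex sentence splitter, the small `STOPWORDS` list and the helper names are assumptions made for illustration, not necessarily one of the post's techniques.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; a real system would use a fuller list)
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "it", "that",
             "for", "on", "with", "as", "are", "was"}

def split_sentences(text):
    # Naive sentence split: break after ., ! or ? followed by whitespace
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def content_words(sentence):
    # Lowercase, keep alphabetic tokens, drop stopwords
    return [w for w in re.findall(r"[a-z']+", sentence.lower()) if w not in STOPWORDS]

def summarize(text, num_sentences=2):
    """Pick the top-scoring sentences by average word frequency, kept in original order."""
    sentences = split_sentences(text)
    freq = Counter(w for s in sentences for w in content_words(s))

    def score(sentence):
        words = content_words(sentence)
        # Averaging keeps long sentences from winning just by length
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    return " ".join(sentences[i] for i in sorted(ranked[:num_sentences]))

text = "Cats are great pets. Cats like fish. Dogs bark loudly at night. Cats and fish and cats."
print(summarize(text, 2))  # Cats like fish. Cats and fish and cats.
```

Because the summary is built only from sentences copied out of the input, it is extractive; an abstractive summarizer would instead generate new sentences.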

Improving Elastic Search Query Result with Query Expansion using Topic Modeling

Query expansion is the process of reformulating a query to improve its results or, more specifically, to improve its recall. Topic modeling is a Natural Language Processing (NLP) technique to discover hidden topics or concepts …

Posted in elastic search, NLP, Python, Solr, Text Analytic, Text Mining, Topic Modeling | 1 Comment
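
As a rough illustration of the idea in the excerpt above, the sketch below expands a query with the top terms of the topic it matches best. It assumes the topics have already been learned (for example with LDA) and are supplied as a hypothetical `topics` dictionary mapping a topic id to (term, weight) pairs sorted by weight. The matching rule is a simple assumption, not necessarily the post's method, and the code does not call Elasticsearch or Solr; the expanded string would simply be sent as the search text.

```python
def expand_query(query, topics, extra_terms=3):
    """Append the top terms of the best-matching topic to the query.

    `topics` (hypothetical format): {topic_id: [(term, weight), ...]} sorted by weight.
    """
    query_terms = query.lower().split()

    def match(topic_terms):
        # Sum of topic weights for the query's own terms
        weights = dict(topic_terms)
        return sum(weights.get(t, 0.0) for t in query_terms)

    best = max(topics, key=lambda tid: match(topics[tid]))
    if match(topics[best]) == 0.0:
        return query  # no topic relates to the query, so leave it unchanged
    new_terms = [t for t, _ in topics[best] if t not in query_terms][:extra_terms]
    return " ".join([query] + new_terms)

topics = {
    0: [("engine", 0.3), ("car", 0.25), ("fuel", 0.2), ("wheel", 0.1)],
    1: [("python", 0.4), ("code", 0.3), ("library", 0.2)],
}
print(expand_query("car engine", topics, extra_terms=2))  # car engine fuel wheel
```

Adding related terms from the same topic lets documents that use different vocabulary for the same concept match, which is where the recall gain comes from.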

Identifying Duplicate Records with Fuzzy Matching

I was prompted to write this post in response to a recent discussion thread in the LinkedIn Hadoop Users Group regarding fuzzy string matching for duplicate record identification with Hadoop. As part of my open source Hadoop-based recommendation engine project …

Posted in Big Data, Hadoop and Map Reduce, Text Analytic | 37 Comments
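
The excerpt does not describe the matching algorithm, so the following is only a minimal single-machine sketch of fuzzy duplicate detection using normalized Levenshtein (edit) distance, not the post's Hadoop implementation. The 0.8 threshold and the case and whitespace normalization are assumptions. The pairwise scan is O(n^2) in the number of records, which is one reason a distributed approach such as Hadoop becomes attractive for large record sets.

```python
from itertools import combinations

def levenshtein(a, b):
    """Edit distance: minimum number of single-character insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution (free if chars match)
        prev = curr
    return prev[-1]

def similarity(a, b):
    """Normalize edit distance to [0, 1], where 1.0 means identical strings."""
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))

def find_duplicates(records, threshold=0.8):
    """Return index pairs of records whose normalized similarity meets the threshold."""
    # Lowercase and collapse runs of whitespace before comparing
    norm = [" ".join(r.lower().split()) for r in records]
    return [(i, j) for i, j in combinations(range(len(norm)), 2)
            if similarity(norm[i], norm[j]) >= threshold]

records = ["John Smith, 12 Main St", "Jon  Smith, 12 Main St.", "Mary Jones, 4 Oak Ave"]
print(find_duplicates(records))  # [(0, 1)]
```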

Similarity Based Recommendation – Tossed up with Text Analytic

In my last post I mentioned that the similarity-based recommendation engine in sifarish only considered categorical and integer attributes. I have added support for text attributes to sifarish. I am using Lucene for text processing and a variation of Jaccard …

Posted in Data Mining, Hadoop and Map Reduce, Recommendation Engine, Text Analytic | 8 Comments
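
The excerpt does not spell out which variation of Jaccard sifarish uses, so the sketch below shows only the standard Jaccard similarity between the token sets of two text attributes, computed in plain Python rather than with Lucene as in the post. The regex tokenizer and the tiny stopword list are assumptions standing in for Lucene's text analysis.

```python
import re

def tokens(text, stopwords=frozenset({"a", "an", "the", "and", "of", "with"})):
    """Lowercase, split on non-alphanumerics and drop stopwords (a rough stand-in for a Lucene analyzer)."""
    return {t for t in re.split(r"[^a-z0-9]+", text.lower()) if t and t not in stopwords}

def jaccard(a, b):
    """Standard Jaccard similarity: size of the token-set intersection over size of the union."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 1.0  # treat two empty texts as identical
    return len(ta & tb) / len(ta | tb)

print(jaccard("Red cotton shirt with buttons", "Blue cotton shirt"))  # 0.4
```

Here the shared tokens are `cotton` and `shirt` out of five distinct tokens in total, giving 2 / 5 = 0.4.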