Category Archives: Java

Semantic Matching with Hadoop

Recently, I had a request to support semantic matching in sifarish, my open source matching and recommendation engine. By semantic matching, I mean any algorithm that does not rely on explicit keyword-based matching. In this post, I will provide … Continue reading

Posted in AI, Big Data, Hadoop and Map Reduce, Java, Semantic

Similarity Based Recommendation – Hadoop Way

In my earlier post, I discussed some of the basic concepts of Similarity Based Recommendation. As discussed, the distance between entities in a multidimensional attribute space is used as a measure of similarity. In this post I will take a … Continue reading

Posted in Data Mining, Hadoop and Map Reduce, Java, Recommendation Engine | 9 Comments

Multi Cluster Hadoop Job Monitoring

I spend a lot of time tracking and monitoring Hadoop jobs running across multiple clusters in my current project. Typically I navigate around multiple job tracker web admin consoles. Although the job tracker web console gives some basic system level statuses … Continue reading

Posted in Hadoop and Map Reduce, Java | 6 Comments

Visitor Conversion with Bayesian Discriminant and Hadoop

You have lots of visitors on your eCommerce web site and obviously you would like most of them to convert. By conversion, I mean buying your product or service. It could also mean the visitor taking an action, which … Continue reading

Posted in Data Mining, Hadoop and Map Reduce, Java, Predictive Analytic | 8 Comments

Hadoop Orchestration

Most data processing tasks with Hadoop require multiple Hadoop jobs with dependencies between them. A dependency arises when one job needs to use the output of another job. The dependency between Hadoop jobs can be expressed as … Continue reading

Posted in Hadoop and Map Reduce, Java, Workflow | 8 Comments

Map Reduce Secondary Sort Does It All

I recently came across a question on Stack Overflow about calculating web chat room statistics using Hadoop Map Reduce. The answer to the question was begging for a solution based on map reduce secondary sort. I will provide details, … Continue reading

Posted in Hadoop and Map Reduce, Java | 30 Comments

Easy Cassandra Data Access

This post is about a simple, no-nonsense data access API for Cassandra. I did not start with a grandiose plan for yet another high-level Cassandra API. I was implementing a Cassandra-based BPM that I blogged about earlier. … Continue reading

Posted in Cassandra, Java, NOSQL | 4 Comments

Ruling with Drools Rule Engine

In a project several years ago, I built a rule engine from scratch. In a recent project, which needed a rule engine, I decided to take a different route. I decided to give the Drools rule engine from JBoss a try. It … Continue reading

Posted in Java, Rule Engine | 11 Comments

Recommendation Engine Powered by Hadoop (Part 2)

In Part 1 of this post, the focus was on finding the correlation between items, based on the rating data available for individual items. The MR job output was the correlation coefficient matrix, with correlation coefficient values between 0 and 1 … Continue reading

Posted in Collaborative Filtering, Data Mining, Hadoop and Map Reduce, Java | 10 Comments