Making Recommendations in Real Time

Making recommendations based on a user’s current behavior within a small time window is a powerful feature that was recently added to sifarish. In this post I will go over the details of this feature. The real time feature has been added for social collaborative filtering based recommendations.

In our solution, although Storm is used to process the real time stream of user engagement events to find recommended items, Hadoop does a lot of the heavy lifting by computing the item correlation matrix from historical user engagement event data. Redis has been used … Continue reading
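To make the division of labor concrete, here is a minimal single-process sketch of the real time side, assuming the batch job has already produced an item correlation matrix (held in a plain dict here, not Redis) and that each engagement event already carries an implicit rating. The class and method names are hypothetical, and this is not sifarish’s actual Storm topology: it just keeps each user’s events in a sliding time window and scores items correlated with what the user engaged with recently.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, List, Tuple

# Item-to-item correlation matrix as a batch job might emit it:
# correlation[a][b] is a similarity score (assumed precomputed offline).
CorrelationMatrix = Dict[str, Dict[str, float]]


class WindowedRecommender:
    """Hypothetical sketch: keep each user's recent engagement events in a time
    window and score items correlated with what the user just engaged with."""

    def __init__(self, correlation: CorrelationMatrix, window_secs: float):
        self.correlation = correlation
        self.window_secs = window_secs
        # user_id -> deque of (timestamp, item_id, implicit_rating), oldest first
        self.events: Dict[str, Deque[Tuple[float, str, float]]] = defaultdict(deque)

    def _evict(self, window: Deque[Tuple[float, str, float]], now: float) -> None:
        # Drop events older than the window (assumes timestamps arrive in order).
        while window and window[0][0] < now - self.window_secs:
            window.popleft()

    def on_event(self, user_id: str, item_id: str, rating: float, ts: float) -> None:
        window = self.events[user_id]
        window.append((ts, item_id, rating))
        self._evict(window, ts)

    def recommend(self, user_id: str, now: float, top_n: int = 3) -> List[Tuple[str, float]]:
        window = self.events.get(user_id)
        if not window:
            return []
        self._evict(window, now)
        engaged = {item for _, item, _ in window}
        scores: Dict[str, float] = defaultdict(float)
        for _, item, rating in window:
            for other, corr in self.correlation.get(item, {}).items():
                if other not in engaged:
                    # Weight each correlated item by the user's implicit rating.
                    scores[other] += corr * rating
        # Highest score first; ties broken by item id for deterministic output.
        return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]
```

The expensive part, the correlation matrix, is computed offline, so the per-event work in the stream is only a lookup and a weighted sum over the handful of items in the window.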

Posted in Big Data, Collaborative Filtering, Data Mining, Hadoop and Map Reduce, Real Time Processing, Recommendation Engine, Redis, Storm

Using Mutual Information to Find Critical Factors in Hospital Readmission

Nobody likes hospital readmission soon after discharge, whether it’s the patient or the insurance company. Predictive analytic techniques have been used to predict the likelihood of hospital readmission, using various medical, personal and demographic input (feature) attributes. However, some problems, including the one we are discussing, have a very large input feature set.

Before we plow ahead with building a predictive model, it’s important to pause and ask ourselves which features really matter, especially for a problem like this one with a very large set of input features.

Machine learning algorithms generally work better if the dimensionality, i.e., the number of feature attributes, is lowered. One of the techniques for lowering the dimensionality is to select a subset of the original feature set, known as feature subset selection. Continue reading
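As a rough illustration of the scoring step behind this kind of feature subset selection, the sketch below estimates the mutual information between each discrete feature and the class label (e.g. readmitted or not) from counts, then ranks features by it. It is a single-machine sketch under the assumption that features are already discretized; the function names are hypothetical, and the post’s own implementation runs on Hadoop.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def mutual_information(feature: Sequence, label: Sequence) -> float:
    """I(X;Y) in bits between two discrete variables, estimated from counts."""
    n = len(feature)
    joint = Counter(zip(feature, label))
    px = Counter(feature)
    py = Counter(label)
    mi = 0.0
    for (x, y), nxy in joint.items():
        # p(x,y) * log2(p(x,y) / (p(x) p(y))), where the ratio reduces to
        # nxy * n / (nx * ny) when probabilities are estimated from counts.
        mi += (nxy / n) * math.log2(nxy * n / (px[x] * py[y]))
    return mi


def rank_features(columns: Dict[str, Sequence], label: Sequence) -> List[Tuple[str, float]]:
    """Rank features by mutual information with the label, highest first."""
    scores = [(name, mutual_information(col, label)) for name, col in columns.items()]
    return sorted(scores, key=lambda kv: -kv[1])
```

A feature that perfectly predicts a balanced binary label scores 1 bit, while a feature independent of the label scores 0, so keeping the top-ranked features is one simple way to cut the dimensionality.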

Posted in Big Data, Correlation, Data Mining, Hadoop and Map Reduce, Healthcare Analytic, Predictive Analytic

From Explicit User Engagement to Implicit Product Rating

The basic input for sifarish or any other collaborative filtering based recommendation engine is user rating of items. However, explicit rating by users is not always available. Even when it’s available, it’s known that mostly users with extreme views tend to rate items explicitly. So rating data, even when available, may be biased and not very reliable.

However, user click stream data is always available. The type of engagement a user has with an item (e.g., browsing the product description, placing the item in the shopping cart, etc.) reflects the level of interest the user has in the item. Based on this intuition, it’s possible to map engagement events to an implicit rating.

Applying this kind of heuristic is a viable option when … Continue reading
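A minimal sketch of the mapping idea, assuming a hypothetical table from event type to a 1 to 5 rating (the event names and values are illustrative only, not sifarish’s configuration) and the simple policy of keeping the strongest engagement seen for each user and item pair:

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical mapping from engagement event type to an implicit rating (1-5).
EVENT_RATING = {
    "view_description": 2,
    "add_to_wishlist": 3,
    "add_to_cart": 4,
    "purchase": 5,
}


def implicit_ratings(events: Iterable[Tuple[str, str, str]]) -> Dict[Tuple[str, str], int]:
    """Turn (user, item, event_type) click stream events into one implicit
    rating per (user, item), keeping the strongest engagement observed."""
    ratings: Dict[Tuple[str, str], int] = defaultdict(int)
    for user, item, event in events:
        rating = EVENT_RATING.get(event)
        if rating is None:
            continue  # skip event types that have no rating mapping
        key = (user, item)
        ratings[key] = max(ratings[key], rating)
    return dict(ratings)
```

Taking the maximum is only one possible policy; averaging, or weighting by event count or recency, are equally plausible choices depending on the data.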

Posted in Big Data, eCommerce, Hadoop and Map Reduce, Recommendation Engine, Web Analytic

My blog 2013 review

The WordPress.com stats helper monkeys prepared a 2013 annual report for this blog.

Here’s an excerpt:

The concert hall at the Sydney Opera House holds 2,700 people. This blog was viewed about 57,000 times in 2013. If it were a concert at Sydney Opera House, it would take about 21 sold-out performances for that many people to see it.


Posted in Uncategorized

Boost Lead Generation with Online Reinforcement Learning

When I go to a web site to download a white paper or product data sheet, I often hit the back button if presented with a form asking for lots of personal data. Any user who bounces out is a potential lost lead. Perhaps a redesigned page, asking for fewer personal details, would have circumvented the problem.

In this post we will work on a solution using reinforcement learning algorithms. We will have multiple candidate page designs and have a reinforcement learning algorithm find the optimum page, the one with the highest click through rate. The algorithm will run in real time, deployed on Storm … Continue reading
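The excerpt does not say which reinforcement learning algorithm is used, so the sketch below uses epsilon-greedy, one standard multi-armed bandit strategy, purely as an illustration. Each arm is a candidate page design and the reward is 1 when the visitor clicks through (e.g. submits the form). The class name is hypothetical, and the counters live in memory rather than in any external store.

```python
import random
from typing import List


class EpsilonGreedyPageSelector:
    """Epsilon-greedy bandit over candidate page designs (hypothetical sketch)."""

    def __init__(self, n_pages: int, epsilon: float = 0.1, seed: int = 0):
        self.epsilon = epsilon
        self.shows: List[int] = [0] * n_pages
        self.clicks: List[int] = [0] * n_pages
        self.rng = random.Random(seed)

    def ctr(self, page: int) -> float:
        # Observed click through rate; 0 for a page never shown.
        return self.clicks[page] / self.shows[page] if self.shows[page] else 0.0

    def select_page(self) -> int:
        # Show every design at least once before relying on estimates.
        for page, count in enumerate(self.shows):
            if count == 0:
                return page
        # Explore a random design with probability epsilon; otherwise exploit
        # the design with the best observed click through rate so far.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.shows))
        return max(range(len(self.shows)), key=self.ctr)

    def record(self, page: int, clicked: bool) -> None:
        self.shows[page] += 1
        self.clicks[page] += int(clicked)
```

For each visitor, call `select_page()` to pick the design to render, then `record()` once you know whether the visitor clicked through. Over time most traffic shifts to the best performing design, while the epsilon fraction keeps checking the others.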

Posted in Big Data, Real Time Processing, Redis, Reinforcement Learning, Storm

Reading Nested Objects Modeled with Composite Key from Cassandra

My earlier post was about storing nested objects modeled with composite keys in Cassandra. Of course, we also need to be able to read the data back as objects, and that’s the topic of this post, which covers the rest of the object story. This is part of my open source project agiato.

Mapping Object to Composite Columns

As described in the earlier post, here are some of the salient features of the mapping between an object and a Cassandra column family. The mapping logic does not use column family metadata; instead, it relies on introspection of the object passed in. Continue reading
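agiato is a Java project, so the following is only a Python analog of the introspection idea, not its API. It assumes nested objects are dataclasses and represents each composite column name as a tuple of field names along the path (for example `("address", "city")`). Writing flattens the object by walking its fields; reading rebuilds it by walking the declared field types, with no column family metadata involved. No Cassandra client calls are shown.

```python
from dataclasses import fields, is_dataclass
from typing import Any, Dict, Tuple, Type

# Composite column name -> value, e.g. ("address", "city") -> "Austin"
ColumnMap = Dict[Tuple[str, ...], Any]


def to_columns(obj: Any, prefix: Tuple[str, ...] = ()) -> ColumnMap:
    """Flatten a nested dataclass instance into composite columns by
    introspecting its fields."""
    columns: ColumnMap = {}
    for f in fields(obj):
        value = getattr(obj, f.name)
        path = prefix + (f.name,)
        if is_dataclass(value):
            columns.update(to_columns(value, path))  # recurse into nested object
        else:
            columns[path] = value
    return columns


def from_columns(cls: Type, columns: ColumnMap, prefix: Tuple[str, ...] = ()) -> Any:
    """Rebuild an instance of cls from composite columns, using the declared
    field types to decide where nested objects start."""
    kwargs = {}
    for f in fields(cls):
        path = prefix + (f.name,)
        if is_dataclass(f.type):
            kwargs[f.name] = from_columns(f.type, columns, path)
        else:
            kwargs[f.name] = columns[path]
    return cls(**kwargs)
```

The round trip relies on field types being real classes at runtime; with postponed annotations (`from __future__ import annotations`) they become strings and would need resolving first.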

Posted in Big Data, Cassandra, Data Model, NOSQL

Retarget Campaign for Abandoned Shopping Carts with Decision Tree

Research has shown that customers who have abandoned shopping carts often come back when targeted by a retargeting email campaign, and in many cases end up buying more than what was originally in the cart.

Such email campaigns have many attributes. In this post, we will consider some of those attributes and find the values that make retargeting campaigns most effective. A Hadoop based decision tree algorithm will be used to mine existing retargeting campaign data. Continue reading
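At the core of decision tree induction is choosing which attribute to split on. The single-machine sketch below uses information gain (entropy reduction), one common criterion; the excerpt does not say which criterion the Hadoop implementation uses. Rows are plain dicts, and any attribute names you pass in (such as a hypothetical `discount` or `send_day`) are illustrative.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Sequence


def entropy(labels: Sequence) -> float:
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(rows: List[Dict[str, str]], attribute: str, target: str) -> float:
    """Reduction in entropy of `target` after splitting rows on `attribute`."""
    parent = entropy([r[target] for r in rows])
    groups: Dict[str, list] = defaultdict(list)
    for r in rows:
        groups[r[attribute]].append(r[target])
    # Weighted average entropy of the child partitions.
    children = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return parent - children


def best_split(rows: List[Dict[str, str]], attributes: Sequence[str], target: str) -> str:
    """Attribute with the highest information gain for the target."""
    return max(attributes, key=lambda a: information_gain(rows, a, target))
```

A full tree applies `best_split` recursively to each partition until the partitions are pure or too small; the distributed version computes the same per-attribute, per-value label counts in parallel.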

Posted in Big Data, Data Mining, eCommerce, Hadoop and Map Reduce, Marketing Analytic