The output of a recommendation engine, whether based on collaborative filtering or some other technique, reflects consumers’ interest in products or services. However, a business may have goals that are at odds with the items the engine recommends. For example, a business may be more interested in selling items with large inventory or items that are being promoted with discounted pricing. How do we reconcile these conflicting interests?
My open source recommendation engine sifarish is no exception to this problem, so I decided to do something about it. We need to find a score for recommended items that reflects a compromise between consumer interest and business interest. Here is my solution. Continue reading
In this post, I will focus on a time-honored machine learning technique called Fisher Discriminant Analysis and use it for customer segmentation of an online music store’s customers. The store offers music of different genres for download. When there is a new release in a certain genre, the store wants to do targeted email marketing for customers in the age group that is most likely to be interested in that genre.
The store has divided the genres it offers into two groups. One group tends to be preferred by younger customers, the other by older customers. Based on past purchase and download history, the store wants to build a predictive model for the age that separates the younger from the older customers. This is where Fisher Discriminant Analysis comes into the picture. Continue reading
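To make the idea of a separating age concrete, here is a minimal sketch for a single feature (age) and two classes. It is not the implementation from the post: the function name, the sample ages, and the equal-variance Gaussian assumption behind the threshold formula are all my own illustrative choices. With one feature, the Fisher criterion reduces to the squared distance between the class means divided by the sum of the class variances, and the boundary sits at the midpoint of the means, shifted by the class priors.

```python
import math
from statistics import mean, pvariance

def fisher_age_threshold(young_ages, old_ages):
    """Return (threshold, separation) for one feature and two classes.

    Hypothetical helper: assumes both classes share a pooled variance,
    which is the usual linear discriminant simplification.
    """
    m1, m2 = mean(young_ages), mean(old_ages)
    if m1 == m2:
        raise ValueError("class means coincide; no separating age exists")
    n1, n2 = len(young_ages), len(old_ages)
    v1, v2 = pvariance(young_ages), pvariance(old_ages)

    # Pooled within-class variance, weighted by class size.
    pooled = (n1 * v1 + n2 * v2) / (n1 + n2)

    # Class priors estimated from the sample sizes.
    p1, p2 = n1 / (n1 + n2), n2 / (n1 + n2)

    # Midpoint of the means, shifted toward the less frequent class.
    threshold = (m1 + m2) / 2 + pooled * math.log(p1 / p2) / (m2 - m1)

    # Fisher criterion in one dimension: between-class over within-class scatter.
    within = v1 + v2
    separation = (m2 - m1) ** 2 / within if within > 0 else math.inf
    return threshold, separation

# Made-up ages of customers who bought from each genre group.
young = [18, 20, 22, 24]
old = [40, 42, 44, 46]
threshold, separation = fisher_age_threshold(young, old)
```

A larger separation value means the two age groups overlap less, so the threshold is more trustworthy as a segmentation rule.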
What does email marketing have to do with a Markov model? Let’s explore and find out. Any consumer of products and services has a natural rhythm to his or her purchase history. Regular customers tend to visit a business according to temporal patterns inherent in their buying history.
Wouldn’t you expect to get better results if your email marketing, or any other marketing for that matter, is aligned with the temporal pattern inherent in a customer’s purchase behavior? In practical terms, it means that your marketing email will go out at a time predicted to be optimal by a predictive model. Continue reading
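As a rough illustration of how a Markov model can capture such a temporal pattern, the sketch below estimates first-order transition probabilities from purchase histories. The state encoding is my own assumption, not taken from the post: each purchase is labeled by the gap since the previous one, "S" for short and "L" for long. The function names and sample data are hypothetical.

```python
from collections import Counter, defaultdict

def fit_transitions(sequences):
    """Estimate first-order transition probabilities from state sequences."""
    counts = defaultdict(Counter)
    for seq in sequences:
        # Count each consecutive (current, next) state pair.
        for cur, nxt in zip(seq, seq[1:]):
            counts[cur][nxt] += 1
    return {
        state: {nxt: c / sum(row.values()) for nxt, c in row.items()}
        for state, row in counts.items()
    }

def most_likely_next(transitions, state):
    """Return the most probable next state, or None for an unseen state."""
    row = transitions.get(state)
    if not row:
        return None
    return max(row, key=row.get)

# Made-up histories: gaps between purchases for two customers.
histories = [
    ["S", "S", "L", "S", "S", "L"],
    ["S", "L", "S"],
]
transitions = fit_transitions(histories)
next_after_short = most_likely_next(transitions, "S")
```

Given the last observed state for a customer, the predicted next state suggests how soon the next purchase is likely, which in turn suggests when to send the email.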
In this post, I will be venturing into the medical domain and show how big data analytics can play a crucial role in the complex and daunting world of health care. There is a kind of cancer that affects the male population above a certain age. There are also other important contributing factors, such as race and family history.
In this post, I will provide a Hadoop-based machine learning solution to predict that threshold age. My focus will be on only one attribute of the patient data, i.e., the age. The goal is to mine the data to extract a simple rule like if age > x then patient should take test y. The doctor may use this rule to order a specific test for a patient.
As we work our way through the post, we will find out that the Continue reading
In my last post, we did some exploratory analytics for customer churn. We identified the parameters that have the most influence on whether a customer account gets closed. We performed correlation analysis using the Cramer index.
In this post, we will take the next step, i.e., build a Bayesian prediction model for predicting customer churn. The Hadoop-based Bayesian classifier implementation is part of my open source project avenir on GitHub. We will use the same mobile service provider customer data as an example, Continue reading
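For readers who want to see this kind of correlation measure in code, here is a small sketch. I am assuming that the Cramer index corresponds to the standard Cramér's V statistic, which is the chi-square statistic of a contingency table normalized to the range 0 to 1. The function name and the toy data are hypothetical and are not taken from avenir.

```python
import math
from collections import Counter

def cramers_v(xs, ys):
    """Cramér's V between two categorical variables of equal length."""
    n = len(xs)
    if n == 0 or n != len(ys):
        raise ValueError("inputs must be non-empty and of equal length")
    joint = Counter(zip(xs, ys))   # observed cell counts
    rows, cols = Counter(xs), Counter(ys)

    # Chi-square statistic: sum over all cells of (observed - expected)^2 / expected.
    chi2 = 0.0
    for r, r_count in rows.items():
        for c, c_count in cols.items():
            expected = r_count * c_count / n
            observed = joint.get((r, c), 0)
            chi2 += (observed - expected) ** 2 / expected

    k = min(len(rows), len(cols)) - 1
    if k == 0:
        return 0.0  # a constant variable carries no association
    return math.sqrt(chi2 / (n * k))

# Made-up data: a plan type that fully determines churn, and one that does not.
plan = ["a", "a", "b", "b"]
churned_dependent = ["y", "y", "n", "n"]
churned_independent = ["y", "n", "y", "n"]
v_dependent = cramers_v(plan, churned_dependent)
v_independent = cramers_v(plan, churned_independent)
```

A value near 1 flags a feature that is strongly associated with the response, and a value near 0 flags one that can probably be dropped.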
Classification problems involve predicting a response variable based on a set of feature variables for some entity. But there is another problem whose solution is a prerequisite for solving a classification problem. We may want to know which of the feature variables are most strongly correlated with the response variable. Once we have identified those, we may want to use only that subset of the feature variables to build the prediction model.
To put this in context, we will use the customer churn prediction problem, specifically for mobile telecom service provider customers. Continue reading
In one of my earlier posts, I discussed using Pearson correlation for making social recommendations. In this post, we will delve deeper into it, including the Hadoop map reduce implementation. There are many correlation techniques, including cosine distance, slope one etc. These are already implemented in sifarish. The latest addition to the arsenal of correlation techniques in sifarish is Pearson correlation.
One advantage that Pearson correlation has over other techniques is that it can handle bias in a user’s ratings. For example, Continue reading
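The bias point can be illustrated with a short sketch. It is not the sifarish implementation: the function names and ratings are made up. Two users rank the same items in the same way, but one consistently rates a point higher. Because Pearson correlation subtracts each user's mean rating, the constant offset disappears, whereas plain cosine similarity is affected by it.

```python
import math

def pearson(a, b):
    """Pearson correlation between two equal-length rating lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    # Centering on each user's own mean removes a constant rating bias.
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a user with constant ratings has no defined correlation
    return cov / math.sqrt(var_a * var_b)

def cosine(a, b):
    """Cosine similarity on the raw, uncentered ratings."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Made-up ratings: the second user rates every item one point higher.
strict_user = [1, 2, 3, 4]
generous_user = [2, 3, 4, 5]
p = pearson(strict_user, generous_user)
c = cosine(strict_user, generous_user)
```

Here the Pearson correlation treats the two users as perfectly aligned, while the cosine similarity comes out slightly lower because of the offset.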