Nearest Flunking Neighbors

Adoption of eLearning or Learning Management Systems (LMS) has increased significantly in the academic and business worlds. In some cases, depending on the content and the eLearning system being used, high dropout rates have been reported as a serious problem. Here is an article from the Journal of Online Learning and Teaching on this topic. In this post I will use the K Nearest Neighbor (KNN) classification technique, implemented on Hadoop, to predict partway through a course whether a student is likely to drop out eventually. KNN is a popular and reliable classification technique.
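As a rough illustration of the KNN idea, here is a minimal single-machine sketch in Python. It is not the Hadoop implementation discussed in the post; the two features (normalized engagement and normalized score so far), the labels and the function name are assumptions made up for the example.

```python
import math
from collections import Counter

def knn_predict(training, query, k=3):
    """Predict a label for query by majority vote of its k nearest neighbors.

    training: list of (feature_vector, label) pairs.
    query: feature vector with the same length as the training vectors.
    """
    # Sort training rows by Euclidean distance to the query point.
    by_distance = sorted(training, key=lambda row: math.dist(row[0], query))
    # Count the labels of the k closest rows and return the most common one.
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

# Hypothetical students: (engagement, score so far), both scaled to [0, 1].
students = [
    ((0.9, 0.8), "stay"), ((0.8, 0.9), "stay"), ((0.7, 0.7), "stay"),
    ((0.1, 0.2), "drop"), ((0.2, 0.1), "drop"), ((0.3, 0.2), "drop"),
]
prediction = knn_predict(students, (0.15, 0.15), k=3)
```

Features should be on comparable scales before computing distances, otherwise one attribute can dominate the neighbor search.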

The input features consist of various signals based on the engagement level of the student with the eLearning system and the student’s performance so far. With the identification of students who are likely to drop out, the instructors can be more vigilant …

Posted in Big Data, Data Mining, Hadoop and Map Reduce, Predictive Analytic

Novelty in Personalization

We have all had the unfortunate experience of being pigeonholed by personalization and recommendation engines, when recommendations are based on our past behavior and there is very little opportunity to explore. But our past actions are not always good predictors of our future behavior. At any given moment, our behavior is highly influenced by our mood, fleeting thoughts and contextual surroundings. There are several ways to improve the solution, e.g., by introducing novelty and diversity into the recommendation list. Even adding some items randomly to the recommendation list has been found to be effective.
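The random-item idea can be sketched in a few lines of Python. This is only an illustration of the simplest variant, not the approach implemented in sifarish; the function name, parameters and item IDs are hypothetical.

```python
import random

def add_random_items(recommended, catalog, num_random, seed=None):
    """Replace the tail of a ranked list with random catalog items.

    recommended: ranked list of item IDs, best first.
    catalog: all item IDs that may be shown.
    num_random: how many tail slots to give to random items.
    """
    rng = random.Random(seed)
    already_listed = set(recommended)
    # Only items not already in the list are eligible as random picks.
    candidates = [item for item in catalog if item not in already_listed]
    num_random = min(num_random, len(candidates), len(recommended))
    # Keep the top of the ranking and fill the freed slots at random.
    keep = recommended[:len(recommended) - num_random]
    return keep + rng.sample(candidates, num_random)

mixed = add_random_items(["a", "b", "c", "d"],
                         ["a", "b", "c", "d", "e", "f", "g"],
                         num_random=2, seed=7)
```

The top-ranked items stay in place, so the list remains mostly relevant while still giving the user a chance to discover something outside their history.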

I have recently added novelty to my open source recommendation and personalization engine sifarish. In this post I will go over the solution as implemented in sifarish.

Posted in Big Data, Data Mining, Hadoop and Map Reduce, Personalization, Recommendation Engine

Popularity Shaken

We will be addressing two important issues faced by recommendation systems. First, how do you solve the cold start problem, i.e., provide recommendations for new users with very limited behavior data available? Second, even if we have a recommendation list for new users, how do we avoid presenting the same recommendation list repeatedly?

I will go over the details of how both of these problems have been solved in sifarish, my OSS recommendation engine. We calculate certain statistical parameters from historical user engagement signals and compute the popularity of an item by combining those statistical parameters.

Why Dithering

The second issue has to do with what is known as the “above the fold” issue. When presented with a long list, users will typically scan only the top few items of the list.

Posted in Big Data, Hadoop and Map Reduce, Recommendation Engine, Storm | 2 Comments

Making Recommendations in Real Time

Making recommendations based on a user’s current behavior in a small time window is a powerful feature that has been added to sifarish recently. In this post I will go over the details of this feature. The real time feature has been added for social collaborative filtering based recommendations.

In our solution, although Storm is used for processing the real time user engagement event stream to find recommended items, Hadoop does a lot of the heavy lifting by computing the item correlation matrix from historical user engagement event data. Redis has been used …

Posted in Big Data, Collaborative Filtering, Data Mining, Hadoop and Map Reduce, Real Time Processing, Recommendation Engine, Redis, Storm | 2 Comments

Using Mutual Information to Find Critical Factors in Hospital Readmission

Nobody likes hospital readmission soon after discharge, neither the patient nor the insurance company. Predictive analytic techniques have been used to predict the likelihood of hospital readmission from various medical, personal and demographic feature attributes. However, some problems, including the one we are discussing, have a very large input feature set.

Before we plow ahead with building a predictive model, it’s important to pause and ask ourselves which features are really important, especially for a problem like this with a very large set of input features.

Machine learning algorithms generally work better if the dimensionality, i.e., the number of feature attributes, is lowered. One of the techniques for lowering the dimensionality is to select a subset of the original feature set, known as feature subset selection.
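One common way to rank candidate features for subset selection is the mutual information between each feature and the class attribute. The snippet below is a minimal single-machine sketch for discrete attributes, not the Hadoop implementation from the post; the function name and the toy data are assumptions for illustration.

```python
import math
from collections import Counter

def mutual_information(feature_values, class_values):
    """Mutual information, in bits, between two discrete sequences."""
    n = len(feature_values)
    joint_counts = Counter(zip(feature_values, class_values))
    feature_counts = Counter(feature_values)
    class_counts = Counter(class_values)
    mi = 0.0
    for (x, y), count in joint_counts.items():
        p_xy = count / n
        p_x = feature_counts[x] / n
        p_y = class_counts[y] / n
        # Each observed pair contributes p(x,y) * log2(p(x,y) / (p(x) p(y))).
        mi += p_xy * math.log2(p_xy / (p_x * p_y))
    return mi

# Toy data: 1 = readmitted, 0 = not readmitted (made-up values).
readmitted = [0, 0, 1, 1]
informative = [0, 0, 1, 1]    # moves in step with the class
uninformative = [0, 1, 0, 1]  # independent of the class
scores = {
    "informative": mutual_information(informative, readmitted),
    "uninformative": mutual_information(uninformative, readmitted),
}
```

Features with higher scores carry more information about the class and are better candidates to keep. Continuous attributes would need to be bucketed before this calculation applies.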

Posted in Big Data, Correlation, Data Mining, Hadoop and Map Reduce, Healthcare Analytic, Predictive Analytic | 1 Comment

From Explicit User Engagement to Implicit Product Rating

The basic input for sifarish or any other collaborative filtering based recommendation engine is user rating of items. However, explicit rating by users is not always available. Even when it’s available, it’s been known that generally only users with extreme views tend to explicitly rate items. So the rating data, even when available, may be biased and not very reliable.

However, user click stream data is always available. The type of engagement a user has with an item (e.g., browsing the product description, placing an item in the shopping cart, etc.) reflects the level of interest the user has in the item. Based on this intuition, it’s possible to map engagement events to an implicit rating.
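A minimal sketch of such a mapping in Python follows. The event names, the rating values and the rule of keeping the strongest signal per user and item are all assumptions chosen for illustration; they are not the mapping used in sifarish.

```python
# Hypothetical mapping from engagement event type to implicit rating (1-5).
EVENT_RATING = {
    "viewed_description": 2,
    "added_to_cart": 4,
    "purchased": 5,
}

def implicit_ratings(events):
    """Turn (user, item, event_type) tuples into one rating per user and item.

    The rating for a pair is the highest rating among its events, on the
    assumption that the strongest engagement best reflects interest.
    """
    ratings = {}
    for user, item, event_type in events:
        score = EVENT_RATING.get(event_type)
        if score is None:
            continue  # ignore event types that have no mapping
        key = (user, item)
        ratings[key] = max(ratings.get(key, 0), score)
    return ratings

click_stream = [
    ("u1", "i1", "viewed_description"),
    ("u1", "i1", "added_to_cart"),
    ("u2", "i1", "viewed_description"),
    ("u2", "i2", "searched"),
]
ratings = implicit_ratings(click_stream)
```

Other aggregation rules, such as counting repeated events or weighting recent ones more heavily, fit the same structure.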

Application of this kind of heuristic is a viable option when …

Posted in Big Data, eCommerce, Hadoop and Map Reduce, Recommendation Engine, Web Analytic | 4 Comments

My blog 2013 review

The WordPress.com stats helper monkeys prepared a 2013 annual report for this blog.

Here’s an excerpt:

The concert hall at the Sydney Opera House holds 2,700 people. This blog was viewed about 57,000 times in 2013. If it were a concert at Sydney Opera House, it would take about 21 sold-out performances for that many people to see it.


Posted in Uncategorized