From Explicit User Engagement to Implicit Product Rating


The basic input for sifarish, or any other collaborative filtering based recommendation engine, is user rating of items. However, explicit rating by users is not always available. Even when it is available, it is well known that generally only users with extreme views tend to rate items explicitly. So the rating data, even when available, may be biased and not very reliable.

However, user click stream data is always available. The type of engagement a user has with an item (e.g., browsing a product description, placing an item in the shopping cart) reflects the level of interest the user has in the item. Based on this intuition, it's possible to map engagement events to an implicit rating.

Applying this kind of heuristic is a viable option when there is a paucity of explicit rating data or when such data is deemed not very reliable. sifarish provides a preprocessing map reduce job that estimates implicit rating. In this post, we will go over the details of this map reduce job with an example.

Mapping User Engagement Events

In our example, we consider seven positive event types, numbered in order of decreasing user interest level, and four negative event types, as below.

Event Type    Description
1             Purchased item
2             Joined checkout
3             Placed item in shopping cart
4             Placed item in wish list
5             Browsed item from search result
6             Browsed item from recommendation list
7             Browsed item
-1            Returned item
-2            Left checkout
-3            Removed item from shopping cart
-4            Removed item from wish list

The user rating is a function of the event type and the number of occurrences of that event type. If multiple event types are associated with an item, the rating for each event type is calculated and the highest rating among them is selected.

For a given event type, rating increases asymptotically with increasing number of occurrences up to a threshold rating value.
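The two rules above can be sketched in a few lines of Python. This is only an illustration, not the sifarish implementation: the function names are made up for this example, the score values are borrowed from the sample configuration in this post, and we assume the n-th score in a list applies to n occurrences.

```python
# Illustrative score table: event type -> ratings for 1, 2, 3, ... occurrences.
# Values are borrowed from the sample configuration in this post.
SCORES = {
    1: [100],
    5: [25, 32, 38, 43, 47],
    7: [5, 12, 17, 21, 24],
}

def rating_for(event_type, count):
    """Rating for one event type; assumes count >= 1."""
    scores = SCORES[event_type]
    index = min(count, len(scores)) - 1  # clamp the count to the table length
    return scores[index]

def implicit_rating(event_counts):
    """Highest rating over all event types seen for one user and item."""
    return max(rating_for(etype, n) for etype, n in event_counts.items())

print(implicit_rating({7: 3, 5: 1}))  # browsed 3 times, once from search result
print(implicit_rating({7: 10}))       # count beyond the end of the score list
```

Once the count runs past the end of a score list, the rating stays at the last listed value, which is one simple way to get the threshold behavior described above.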

Estimating Implicit Rating

The map reduce implementation for implicit rating is here. Some sample input data is shown below; it can easily be generated by pre-processing raw click stream data.

0I3GQ6SETOIR,1595e19b-01c1-48a6-835c-e7d55902417e,929BBU0001,6,1397403852
UGU2IS4VW6SC,6238c407-377e-4b02-b0ea-be90dbd5b199,YVY412FGW4,6,1397403868
HW0WP38NWV2V,b73c6b09-390c-4494-b893-b6e265332ade,SQAG41CKO1,7,1397403886
TKQFZM0WCM84,5dcd1252-071e-4299-8d57-f6b0a61fd795,93R93SYKQ5,4,1397403903

The fields are 1. user ID 2. session ID 3. item ID 4. event type 5. time stamp. The time stamp is included in the input so that time stamped rating data can be generated. One of the features of sifarish is time sensitive recommendation, which requires time stamped rating data.
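For readers who want to experiment with this format outside Hadoop, here is a minimal parsing sketch. The `EngagementEvent` type and `parse_event` function are hypothetical names for this example, and treating the time stamp as an integer number of epoch seconds is an assumption based on how the sample values look.

```python
from typing import NamedTuple

class EngagementEvent(NamedTuple):
    user_id: str
    session_id: str
    item_id: str
    event_type: int
    timestamp: int  # assumed to be epoch seconds, judging by the sample data

def parse_event(line):
    """Split one comma-separated input record into typed fields."""
    user_id, session_id, item_id, event_type, timestamp = line.strip().split(",")
    return EngagementEvent(user_id, session_id, item_id, int(event_type), int(timestamp))

event = parse_event(
    "0I3GQ6SETOIR,1595e19b-01c1-48a6-835c-e7d55902417e,929BBU0001,6,1397403852"
)
print(event.item_id, event.event_type)
```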

The mapper output key is the user ID and item ID pair. It is secondary sorted by event type. On the reducer side, only the event data corresponding to the most engaging event type is processed; the rest is ignored.
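The grouping and secondary sort can be imitated in plain Python to see what the reducer receives. This is a rough single-process stand-in, not the actual Hadoop job: the `most_engaging` function and the sample tuples are invented for illustration, and only positive event types are considered here.

```python
from itertools import groupby

# (user ID, item ID, event type) tuples, as a mapper might emit them.
events = [
    ("u1", "itemA", 7),
    ("u1", "itemA", 3),
    ("u1", "itemA", 7),
    ("u2", "itemB", 5),
]

def most_engaging(events):
    """Yield (user, item, event type, count) for the most engaging event type."""
    # Sorting by (user, item, event type) stands in for the shuffle plus
    # secondary sort; a lower positive event type means higher engagement.
    ordered = sorted(events)
    for (user, item), group in groupby(ordered, key=lambda e: (e[0], e[1])):
        types = [e[2] for e in group]
        top = types[0]  # first value after the secondary sort
        yield user, item, top, types.count(top)

print(list(most_engaging(events)))
```

Because the values arrive already sorted by event type, the reducer can stop reading as soon as the event type changes.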

The metadata for mapping event type to rating is provided through JSON, as below. The event types are as described earlier.

{
	"eventScores" : 
	[
		{
			 "eventType" : 1,
			 "description" : "purchased",
			 "scores" : [100]
		},
		{
			 "eventType" : 2,
			 "description" : "joined checkout",
			 "scores" : [85]
		},
		{
			 "eventType" : 3,
			 "description" : "placed in shopping cart",
			 "scores" : [60]
		},
		{
			 "eventType" : 4,
			 "description" : "placed in wishlist",
			 "scores" : [40]
		},
		{
			 "eventType" : 5,
			 "description" : "browsed from search result",
			 "scores" : [25,32,38,43,47]
		},
		{
			 "eventType" : 6,
			 "description" : "browsed for recommendation list",
			 "scores" : [15,21,26,30,33]
		},
		{
			 "eventType" : 7,
			 "description" : "browsed",
			 "scores" : [5,12,17,21,24]
		},
		{
			 "eventType" : -1,
			 "description" : "returned"
		},
		{
			 "eventType" : -2,
			 "description" : "left checkout"
		},
		{
			 "eventType" : -3,
			 "description" : "removed from shopping cart"
		},
		{
			 "eventType" : -4,
			 "description" : "removed from wish list"
		}
	]
}

The scores field provides the mapping between event occurrence count and rating. As the count increases, the rating reaches a limiting value.
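As a rough illustration of how such metadata could be consumed, the sketch below loads a trimmed copy of the JSON and looks up ratings for increasing counts. The `load_scores` and `lookup` helpers are hypothetical, and the rule that the n-th score applies to n occurrences is our assumption; the sifarish job may index the list differently.

```python
import json

# Trimmed copy of the metadata shown above (three of the eleven entries).
CONFIG = """
{
  "eventScores": [
    {"eventType": 4, "description": "placed in wishlist", "scores": [40]},
    {"eventType": 7, "description": "browsed", "scores": [5, 12, 17, 21, 24]},
    {"eventType": -4, "description": "removed from wish list"}
  ]
}
"""

def load_scores(text):
    """Build an event type -> scores table; negative events have no scores."""
    entries = json.loads(text)["eventScores"]
    return {e["eventType"]: e.get("scores", []) for e in entries}

def lookup(scores, event_type, count):
    """Rating for `count` occurrences, capped at the last listed score."""
    table = scores[event_type]
    return table[min(count, len(table)) - 1]

scores = load_scores(CONFIG)
for count in (1, 3, 8):
    print(count, lookup(scores, 7, count))
```

Negative event types carry no scores field, so the loader maps them to an empty list instead of failing.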

Here is some sample output. The fields are 1. user ID 2. item ID 3. rating 4. time stamp 5. most engaging event type 6. event count. The last two fields are optional output, controlled through a configuration parameter.

000R1I1QK4R62,512YL4KC6W,5,1397434130,7,1
000R1I1QK4R62,7A0JLOLVQ2,25,1397803873,5,1
000R1I1QK4R62,7DFGDFU026,100,1397864143,1,1
000R1I1QK4R62,FOD39Y2FTT,15,1397436814,6,1
000R1I1QK4R62,GZF5UQ75N9,25,1397647743,5,1
000R1I1QK4R62,J4LWGR23OI,40,1397645120,4,1
000R1I1QK4R62,QBALZ21R1E,40,1397858317,4,1
000R1I1QK4R62,SC4604N2XQ,5,1397445978,7,1
000W425HZ6JL4,4POOZEJ4HN,60,1397854330,3,1

Negative Events

Some event types have negative values, indicating negative actions on the part of the user, e.g., removing an item from the shopping cart. While processing the event sequence for a user and an item in the reducer, all the negative events are identified.

For each such negative event, a corresponding positive event is removed from the event sequence, before calculating rating.
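A minimal sketch of this cancellation step follows. It assumes that event type -k undoes event type k (e.g. -3 undoes 3), which matches the event table above, and it ignores the order of events in time. The function name is made up for this example.

```python
from collections import Counter

def cancel_negative_events(event_types):
    """Drop one positive event for each matching negative event.

    Assumes event type -k undoes event type k (e.g. -3 undoes 3).
    """
    counts = Counter(t for t in event_types if t > 0)
    for t in event_types:
        if t < 0 and counts[-t] > 0:
            counts[-t] -= 1
    # Keep only event types that still have occurrences left.
    return {t: n for t, n in counts.items() if n > 0}

# Item added to the cart twice, removed once, and browsed three times.
print(cancel_negative_events([7, 3, 7, 3, -3, 7]))
```

The remaining counts per event type can then be turned into a rating in the usual way.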

Wrapping Up

We have gone through a simple heuristic based process to convert click stream data to implicit rating. Beyond recommendation, the implicit rating can potentially be used for other purposes. One example is targeted personalized marketing.

To run the example, please refer to the Implicit Rating Predictor section of this tutorial document.

About Pranab

I am Pranab Ghosh, a software professional in the San Francisco Bay area. I manipulate bits and bytes for the good of living beings and the planet. I have worked with a myriad of technologies and platforms in various business domains for early stage startups, large corporations and anything in between. I am an active blogger and open source contributor. I am passionate about technology and green and sustainable living. My technical interest areas are Big Data, Distributed Processing, NOSQL databases, Data Mining and Programming languages. I am fascinated by problems that don't have a neat closed form solution.
This entry was posted in Big Data, eCommerce, Hadoop and Map Reduce, Recommendation Engine, Web Analytic.
