From Explicit User Engagement to Implicit Product Rating

Posted on February 10, 2014 by Pranab

The basic input for sifarish or any other collaborative filtering based recommendation engine is user rating of items. However, explicit rating by users is not always available. Even when it is available, it is known that generally only users with extreme views tend to rate items explicitly. So the rating data, even when available, may be biased and not very reliable.

However, user click stream data is always available. The type of engagement a user has with an item (e.g., browsing the product description or placing the item in a shopping cart) reflects the level of interest the user has in the item. Based on this intuition, it’s possible to map engagement events to an implicit rating.

Applying this kind of heuristic is a viable option when there is a paucity of explicit rating data or when such data is deemed unreliable. sifarish provides a preprocessing map reduce job to estimate implicit rating. In this post, we will go over the details of this map reduce job with an example.

Mapping User Engagement Events

In our example, we consider 7 positive event types, listed in order of decreasing user interest level, and 4 negative event types, as below.

Event Type	Description
1	Purchased item
2	Joined checkout
3	Placed item in shopping cart
4	Placed item in wish list
5	Browsed item from search result
6	Browsed item from recommendation list
7	Browsed item
-1	Returned item
-2	Left checkout
-3	Removed item from shopping cart
-4	Removed item from wish list

The user rating is a function of the event type and the number of occurrences of that event type. If there are multiple event types associated with an item, the rating associated with each event type is calculated and the highest rating among them is selected.

For a given event type, rating increases asymptotically with increasing number of occurrences up to a threshold rating value.
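
One way to write this down, as a sketch rather than the exact formula used in sifarish: let n_e(u,i) be the number of times user u triggered positive event type e on item i, and let s_e[1], ..., s_e[K_e] be a non-decreasing list of scores for that event type. The symbols and the rule that counts beyond K_e reuse the last score are assumptions made for illustration.

```latex
r(u,i) \;=\; \max_{e \,\in\, E^{+}(u,i)} \; s_e\bigl[\min\bigl(n_e(u,i),\, K_e\bigr)\bigr]
```

Here E^{+}(u,i) is the set of positive event types observed for the pair (u,i). The inner minimum caps the rating at the threshold value s_e[K_e], and the outer maximum picks the highest rating across event types.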

Estimating Implicit Rating

The map reduce implementation for implicit rating is here. Some sample input data is as follows, which can easily be generated by pre-processing raw click stream data.

0I3GQ6SETOIR,1595e19b-01c1-48a6-835c-e7d55902417e,929BBU0001,6,1397403852
UGU2IS4VW6SC,6238c407-377e-4b02-b0ea-be90dbd5b199,YVY412FGW4,6,1397403868
HW0WP38NWV2V,b73c6b09-390c-4494-b893-b6e265332ade,SQAG41CKO1,7,1397403886
TKQFZM0WCM84,5dcd1252-071e-4299-8d57-f6b0a61fd795,93R93SYKQ5,4,1397403903

The fields are 1. user ID 2. session ID 3. item ID 4. event type 5. time stamp. The time stamp is included in the input so that time stamped rating data can be generated. One of the features of sifarish is time sensitive recommendation, which requires time stamped rating data.
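
As an illustration of the record layout, here is a minimal Python sketch that parses one input line into named fields. The names EngagementEvent and parse_event are hypothetical and are not part of sifarish.

```python
from typing import NamedTuple


class EngagementEvent(NamedTuple):
    """One click stream record, in the field order described above."""
    user_id: str
    session_id: str
    item_id: str
    event_type: int
    timestamp: int


def parse_event(line: str) -> EngagementEvent:
    """Split a comma separated record and convert the numeric fields."""
    user_id, session_id, item_id, event_type, timestamp = line.strip().split(",")
    return EngagementEvent(user_id, session_id, item_id, int(event_type), int(timestamp))


event = parse_event(
    "0I3GQ6SETOIR,1595e19b-01c1-48a6-835c-e7d55902417e,929BBU0001,6,1397403852"
)
print(event.user_id, event.item_id, event.event_type)
```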

The mapper output key is the user ID and item ID pair. It is secondary sorted by event type. On the reducer side, only the event data corresponding to the most engaging event is processed and the rest is ignored.
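
The grouping logic can be sketched in memory with plain Python. This is not the sifarish Hadoop code. It assumes that a lower positive event type number means a more engaging event, as in the table above, and it sets negative events aside. The function name most_engaging_events is hypothetical.

```python
from collections import defaultdict


def most_engaging_events(events):
    """events: iterable of (user_id, item_id, event_type) tuples.

    Returns {(user_id, item_id): (most_engaging_event_type, occurrence_count)}.
    """
    grouped = defaultdict(list)
    for user_id, item_id, event_type in events:
        if event_type > 0:  # assumption: negative events are handled elsewhere
            grouped[(user_id, item_id)].append(event_type)

    result = {}
    for key, event_types in grouped.items():
        event_types.sort()          # stands in for the secondary sort by event type
        top_type = event_types[0]   # lowest number = most engaging (assumed)
        result[key] = (top_type, event_types.count(top_type))
    return result


sample = [
    ("u1", "i1", 7),
    ("u1", "i1", 5),
    ("u1", "i1", 5),
    ("u1", "i2", 3),
]
result = most_engaging_events(sample)
print(result)
```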

The event type to rating mapping meta data is provided through a JSON as below. The event types are as described earlier.

{
	"eventScores" : 
	[
		{
			 "eventType" : 1,
			 "description" : "purchased",
			 "scores" : [100]
		},
		{
			 "eventType" : 2,
			 "description" : "joined checkout",
			 "scores" : [85]
		},
		{
			 "eventType" : 3,
			 "description" : "placed in shopping cart",
			 "scores" : [60]
		},
		{
			 "eventType" : 4,
			 "description" : "placed in wishlist",
			 "scores" : [40]
		},
		{
			 "eventType" : 5,
			 "description" : "browsed from search result",
			 "scores" : [25,32,38,43,47]
		},
		{
			 "eventType" : 6,
			 "description" : "browsed for recommendation list",
			 "scores" : [15,21,26,30,33]
		},
		{
			 "eventType" : 7,
			 "description" : "browsed",
			 "scores" : [5,12,17,21,24]
		},
		{
			 "eventType" : -1,
			 "description" : "returned"
		},
		{
			 "eventType" : -2,
			 "description" : "left checkout"
		},
		{
			 "eventType" : -3,
			 "description" : "removed from shopping cart"
		},
		{
			 "eventType" : -4,
			 "description" : "removed from wish list"
		}
	]
}

The scores field provides the mapping between event occurrence count and rating. As the count increases, the rating reaches a limiting value.
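
To make the count to rating lookup concrete, here is a small Python sketch that reads a trimmed copy of the JSON above. It assumes that the n-th entry of scores is the rating for n occurrences and that counts past the end of the list reuse the last entry. The function names are hypothetical and do not come from sifarish.

```python
import json

# Trimmed copy of the configuration shown above.
CONFIG = json.loads("""
{"eventScores": [
  {"eventType": 1, "description": "purchased", "scores": [100]},
  {"eventType": 5, "description": "browsed from search result", "scores": [25, 32, 38, 43, 47]},
  {"eventType": 7, "description": "browsed", "scores": [5, 12, 17, 21, 24]},
  {"eventType": -1, "description": "returned"}
]}
""")

# Only positive event types carry a scores list.
SCORES = {e["eventType"]: e["scores"] for e in CONFIG["eventScores"] if "scores" in e}


def rating_for(event_type, count):
    """Rating for one event type; the count is capped at the length of the list."""
    scores = SCORES[event_type]
    return scores[min(count, len(scores)) - 1]


def implicit_rating(counts):
    """counts: {event_type: occurrences}. Highest rating across event types."""
    return max(rating_for(ev, n) for ev, n in counts.items() if n > 0)


print(rating_for(5, 3), rating_for(5, 9), implicit_rating({7: 4, 5: 1}))
```

With this reading, a single plain browse (event type 7) gives a rating of 5, and a single browse from a search result (event type 5) gives 25.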

Here is some sample output. The fields are 1. user ID 2. item ID 3. rating 4. time stamp 5. most engaging event type 6. event count. The last two fields are optional output, controlled through a configuration parameter.

000R1I1QK4R62,512YL4KC6W,5,1397434130,7,1
000R1I1QK4R62,7A0JLOLVQ2,25,1397803873,5,1
000R1I1QK4R62,7DFGDFU026,100,1397864143,1,1
000R1I1QK4R62,FOD39Y2FTT,15,1397436814,6,1
000R1I1QK4R62,GZF5UQ75N9,25,1397647743,5,1
000R1I1QK4R62,J4LWGR23OI,40,1397645120,4,1
000R1I1QK4R62,QBALZ21R1E,40,1397858317,4,1
000R1I1QK4R62,SC4604N2XQ,5,1397445978,7,1
000W425HZ6JL4,4POOZEJ4HN,60,1397854330,3,1

Negative Events

Some events have negative values, indicating negative actions on the part of the user, e.g., removing an item from the shopping cart. While processing the event sequence in the reducer for a user and an item, all the negative events are identified.

For each such negative event, a corresponding positive event is removed from the event sequence, before calculating rating.
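
A rough Python sketch of this cancellation step follows. It assumes that negative event type -k cancels one occurrence of positive event type k, which matches the pairing in the event table (for example -3 against 3), and it ignores the order of events. The function name net_positive_counts is hypothetical.

```python
from collections import Counter


def net_positive_counts(event_types):
    """event_types: list of event type codes for one user and one item.

    Returns {positive_event_type: remaining_count} after each negative event
    has removed one matching positive event (assumed pairing: -k cancels k).
    """
    counts = Counter(ev for ev in event_types if ev > 0)
    for ev in event_types:
        if ev < 0 and counts[-ev] > 0:
            counts[-ev] -= 1
    # Drop event types that were fully cancelled.
    return {ev: n for ev, n in counts.items() if n > 0}


print(net_positive_counts([3, 3, -3, 7]))
```

In this sketch, an item placed in the cart twice and removed once still counts as one cart event, while an item added to the wish list and then removed falls back to whatever other events remain.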

Wrapping Up

We have gone through a simple heuristic based process to convert click stream data to implicit rating. Beyond recommendation, the implicit rating can potentially be used for other purposes. One example is targeted personalized marketing.

To run the example, please refer to the Implicit Rating Predictor section of this tutorial document.

For commercial support for this solution or other solutions in my github repositories, please talk to ThirdEye Data Science Services. Support is available for Hadoop or Spark deployment on cloud, including installation, configuration and testing.

About Pranab

I am Pranab Ghosh, a software professional in the San Francisco Bay area. I manipulate bits and bytes for the good of living beings and the planet. I have worked with myriad of technologies and platforms in various business domains for early stage startups, large corporations and anything in between. I am an active blogger and open source project owner. I am passionate about technology and green and sustainable living. My technical interest areas are Big Data, Distributed Processing, NOSQL databases, Machine Learning and Programming languages. I am fascinated by problems that don't have neat closed form solution.


This entry was posted in Big Data, Data Science, eCommerce, Hadoop and Map Reduce, Recommendation Engine, Web Analytic and tagged heuristic, product rating, user enagement.

23 Responses to From Explicit User Engagement to Implicit Product Rating

Archie Sheran says:

June 12, 2015 at 7:13 am

Hi Pranab, my name is Archie and I work for a German company. I am tasked with building a recommendation engine that takes as input an item’s position in search and its click-through rate and delivers a score for the item that I can later use for sorting in search. I think I can map an item’s position to user engagement events and then use sifarish to get a rating. Do you think this is the best way to go about my task?

- Pranab says:
  
  June 12, 2015 at 9:02 am
  
  Archie, you can use sifarish. You could model your events as follows. Let’s say your search results are broken into pages, each page containing 10 items. As an example, the events in increasing order of affinity could be 1) item in 3rd page or later 2) item in 2nd page 3) item in 1st page 4) item clicked irrespective of location in search result. For any item, events later in my list will supersede earlier events. That’s the logic of the map reduce.
  
  For each such event you could define scores by number of occurrences, as shown in the sample JSON. Then you run the implicit rating generator map reduce.
  
Shob says:

January 9, 2016 at 4:40 pm

I am looking for some source code/documentation for RedisSpout.withTupleFields but not finding any. Could you point me pls to the API ?

- Pranab says:
  
  November 13, 2016 at 9:48 am
  
  Shob
  Check my project sifarish in github
  
vij says:

November 11, 2016 at 11:57 pm

Hi..
I see that you have manually come up with weights.
i.e { “eventType” : 1,”description” : “purchased”,”scores” : [100]},
{“eventType” : 2,”description” : “joined checkout”,”scores” : [85]},

Now { “eventType” : 1,”description” : “purchased”}, can have a score of 95 or 90.

How can we select optimal weights for each event type?

Pranab says:

November 12, 2016 at 11:00 am

Vij,
The weights are up to you, based on heuristics. Weights increase with the affinity of the event to conversion. For example, “joined checkout” will have a higher weight than “browsed product”.

- vij says:
  
  November 13, 2016 at 3:14 am
  
  Hi Pranab
  I totally agree, “joined checkout” will have a higher weight than “browsed product”.
  But higher by how much? What I am thinking is whether we can derive these weights using cross validation or grid search. Pointers to any such resources would be useful.
  
  - Pranab says:
    
    November 13, 2016 at 9:42 am
    
    Vij
    
    As I said these are input configuration parameters. You set it to whatever you like, subject to the guidelines I provided.
    
    These are not machine learning parameters for a prediction problem. Cross validation and grid search don’t make any sense here.
Natsu says:

May 7, 2018 at 3:39 am

Hi, thanks for writing a useful blog.
I tried to run ./brec.sh to generate events following this guide, “./brec.sh genHistEvent ”, but when I ran “./brec.sh genHistEvent 1000 100 10”, I got “./brec.sh: line 58: $5: ambiguous redirect”. I would greatly appreciate your help.

Pranab says:

May 8, 2018 at 9:06 am

@Natsu You need to provide the name of the file where you want to save the output as the last argument. So there will be an additional command line argument. The tutorial is incorrect. I will correct it and check it in.

- Natsu says:
  
  May 8, 2018 at 9:39 am
  
  Hi, I also tried this, as I read through the script and guessed the $5, but another error came out.
  
- Natsu says:
  
  May 8, 2018 at 9:55 am
  
  Thanks for your attention to my humble question, and sorry for not showing gratitude before asking a further question.
  
natsu says:

May 8, 2018 at 9:44 am

[root@quickstart resource]# ./brec.sh genHistEvent 1000 3000 10 10
generating historical event data
./engage.rb:139: undefined method `uuid' for SecureRandom:Module (NoMethodError)
from ./engage.rb:133:in `upto'
from ./engage.rb:133
[root@quickstart resource]# vim brec.sh

[1]+ Stopped vim brec.sh
[root@quickstart resource]# ./brec.sh genHistEvent 1000 3000 10 somefile
generating historical event data
./engage.rb:139: undefined method `uuid' for SecureRandom:Module (NoMethodError)
from ./engage.rb:133:in `upto'
from ./engage.rb:133
[root@quickstart resource]# fg
vim brec.sh

[1]+ Stopped vim brec.sh
[root@quickstart resource]# ./brec.sh genHistEvent 1000 3000 10 9
generating historical event data
./engage.rb:139: undefined method `uuid' for SecureRandom:Module (NoMethodError)
from ./engage.rb:133:in `upto'
from ./engage.rb:133

- Pranab says:
  
  May 8, 2018 at 11:05 am
  
  May have something to do with your ruby version. Responded in github. Please use github for this issue.
  
  - Adsonamt says:
    
    June 6, 2018 at 6:06 pm
    
    What ?