From Explicit User Engagement to Implicit Product Rating


The basic input for sifarish, or any other collaborative filtering based recommendation engine, is user ratings of items. However, explicit ratings by users are not always available. Even when they are available, it is well known that mostly users with extreme views tend to rate items explicitly. So the rating data, even when available, may be biased and not very reliable.

However, user click stream data is almost always available. The type of engagement a user has with an item (e.g., browsing a product description or placing an item in a shopping cart) reflects the user's level of interest in the item. Based on this intuition, it’s possible to map engagement events to an implicit rating.

Applying this kind of heuristic is a viable option when there is a paucity of explicit rating data, or when such data is deemed not very reliable. sifarish provides a preprocessing map reduce job to estimate implicit ratings. In this post, we will go over the details of this map reduce job with an example.

Mapping User Engagement Events

In our example, we consider seven positive event types, listed in order of decreasing user interest, along with four negative event types, as below.

Event Type    Description
1             Purchased item
2             Joined checkout
3             Placed item in shopping cart
4             Placed item in wish list
5             Browsed item from search result
6             Browsed item from recommendation list
7             Browsed item
-1            Returned item
-2            Left checkout
-3            Removed item from shopping cart
-4            Removed item from wish list

The user rating is a function of the event type and the number of occurrences of that event type. If there are multiple event types associated with an item, the rating associated with each event type is calculated and the highest rating among them is selected.

For a given event type, rating increases asymptotically with increasing number of occurrences up to a threshold rating value.
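The rule above can be sketched in a few lines of Python. This is only an illustration of the idea, not the sifarish code: the function names are made up for this post, the score lists are copied from the sample configuration shown later, and holding the rating at the last listed score is our reading of "threshold rating value".

```python
# Illustrative score table: event type -> ratings indexed by occurrence count.
# Values are copied from the sample configuration later in this post.
SCORES = {
    1: [100],
    3: [60],
    5: [25, 32, 38, 43, 47],
    7: [5, 12, 17, 21, 24],
}


def rating_for_event(event_type, count):
    """Rating for one event type; the last score acts as the ceiling."""
    scores = SCORES[event_type]
    return scores[min(count, len(scores)) - 1]


def implicit_rating(event_counts):
    """Highest rating over all event types seen for one user and item."""
    return max(
        (rating_for_event(t, c) for t, c in event_counts.items() if c > 0),
        default=0,  # assumption: no qualifying events means a rating of 0
    )


# Browsed 3 times and browsed from search results twice
print(implicit_rating({7: 3, 5: 2}))
# Browsed 9 times: the rating stays at the last listed score
print(implicit_rating({7: 9}))
```

With these numbers, two views from a search result outrank three plain views, and nine plain views earn no more than five.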

Estimating Implicit Rating

The map reduce implementation for implicit rating is here. Sample input data is shown below; it can easily be generated by pre-processing raw click stream data.

0I3GQ6SETOIR,1595e19b-01c1-48a6-835c-e7d55902417e,929BBU0001,6,1397403852
UGU2IS4VW6SC,6238c407-377e-4b02-b0ea-be90dbd5b199,YVY412FGW4,6,1397403868
HW0WP38NWV2V,b73c6b09-390c-4494-b893-b6e265332ade,SQAG41CKO1,7,1397403886
TKQFZM0WCM84,5dcd1252-071e-4299-8d57-f6b0a61fd795,93R93SYKQ5,4,1397403903

The fields are 1. user ID 2. session ID 3. item ID 4. event type 5. time stamp. The time stamp is included in the input so that time stamped rating data can be generated. One of the features of sifarish is time sensitive recommendation, which requires time stamped rating data.
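As a quick illustration of this record layout, here is a small Python sketch that parses one input line into named fields. The `EngagementEvent` type and `parse_event` function are hypothetical names used only for this example; they are not part of sifarish.

```python
from collections import namedtuple

# Field names follow the order described above; the names themselves are ours.
EngagementEvent = namedtuple(
    "EngagementEvent",
    ["user_id", "session_id", "item_id", "event_type", "timestamp"],
)


def parse_event(line):
    """Split one comma separated record and convert the numeric fields."""
    user_id, session_id, item_id, event_type, timestamp = line.strip().split(",")
    return EngagementEvent(
        user_id, session_id, item_id, int(event_type), int(timestamp)
    )


event = parse_event(
    "TKQFZM0WCM84,5dcd1252-071e-4299-8d57-f6b0a61fd795,93R93SYKQ5,4,1397403903"
)
print(event.item_id, event.event_type, event.timestamp)
```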

The mapper output key is user ID and item ID. It is secondary sorted by event type. On the reducer side, only the event data corresponding to the most engaging event is processed and the rest is ignored.
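To make the data flow concrete, here is a plain Python simulation of the map, shuffle and reduce steps. It is a sketch under several assumptions and not the actual Hadoop job: an in-memory sort stands in for the secondary sort, only positive event types are handled, and reporting the latest time stamp of the winning event type is our own choice.

```python
from collections import defaultdict


def map_event(record):
    """Emit ((user, item), (event_type, timestamp)) for one parsed record."""
    user_id, _session_id, item_id, event_type, timestamp = record
    return (user_id, item_id), (event_type, timestamp)


def reduce_events(values):
    """Keep only the most engaging event type and count its occurrences."""
    # Stand-in for the secondary sort: a lower positive type is more engaging.
    ordered = sorted(values)
    top_type = ordered[0][0]
    top_times = [ts for etype, ts in ordered if etype == top_type]
    # Assumption: report the latest time stamp of the winning event type.
    return top_type, len(top_times), max(top_times)


# Made-up records: (user, session, item, event type, time stamp)
records = [
    ("u1", "s1", "itemA", 7, 100),
    ("u1", "s1", "itemA", 5, 110),
    ("u1", "s2", "itemA", 5, 120),
    ("u1", "s2", "itemB", 7, 130),
]

grouped = defaultdict(list)  # plays the role of the shuffle phase
for rec in records:
    key, value = map_event(rec)
    grouped[key].append(value)

results = {key: reduce_events(vals) for key, vals in grouped.items()}
print(results)
```

For itemA the two type 5 events win over the single type 7 event, so the reducer would rate it from event type 5 with a count of 2.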

The event type to rating mapping metadata is provided through a JSON file as below. The event types are as described earlier.

{
	"eventScores" : 
	[
		{
			 "eventType" : 1,
			 "description" : "purchased",
			 "scores" : [100]
		},
		{
			 "eventType" : 2,
			 "description" : "joined checkout",
			 "scores" : [85]
		},
		{
			 "eventType" : 3,
			 "description" : "placed in shopping cart",
			 "scores" : [60]
		},
		{
			 "eventType" : 4,
			 "description" : "placed in wishlist",
			 "scores" : [40]
		},
		{
			 "eventType" : 5,
			 "description" : "browsed from search result",
			 "scores" : [25,32,38,43,47]
		},
		{
			 "eventType" : 6,
			 "description" : "browsed from recommendation list",
			 "scores" : [15,21,26,30,33]
		},
		{
			 "eventType" : 7,
			 "description" : "browsed",
			 "scores" : [5,12,17,21,24]
		},
		{
			 "eventType" : -1,
			 "description" : "returned"
		},
		{
			 "eventType" : -2,
			 "description" : "left checkout"
		},
		{
			 "eventType" : -3,
			 "description" : "removed from shopping cart"
		},
		{
			 "eventType" : -4,
			 "description" : "removed from wish list"
		}
	]
}

The scores field provides the mapping between event occurrence count and rating. As the count increases, the rating reaches a limiting value.
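Here is a minimal Python sketch of how such a configuration could be read and used. The helper names are hypothetical and the embedded JSON is a shortened copy of the sample above. Treating the last score as the limiting value, and returning `None` for event types without a `scores` field, are assumptions made for this example.

```python
import json

# Shortened copy of the sample configuration from this post.
CONFIG_TEXT = """
{
  "eventScores": [
    {"eventType": 4, "description": "placed in wishlist", "scores": [40]},
    {"eventType": 7, "description": "browsed", "scores": [5, 12, 17, 21, 24]},
    {"eventType": -4, "description": "removed from wish list"}
  ]
}
"""


def load_score_table(text):
    """Map each event type to its score list; negative events have none."""
    entries = json.loads(text)["eventScores"]
    return {e["eventType"]: e.get("scores", []) for e in entries}


def lookup_rating(table, event_type, count):
    """Score for the given occurrence count, held at the last listed value."""
    scores = table[event_type]
    if not scores or count < 1:
        return None
    return scores[min(count, len(scores)) - 1]


table = load_score_table(CONFIG_TEXT)
for count in (1, 3, 5, 8):
    print(count, lookup_rating(table, 7, count))
```

The rating for event type 7 climbs from 5 to 24 over the first five occurrences and stays at 24 afterwards.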

Here is some sample output. The fields are 1. user ID 2. item ID 3. rating 4. time stamp 5. most engaging event type 6. event count. The last two fields are optional output, controlled through a configuration parameter.

000R1I1QK4R62,512YL4KC6W,5,1397434130,7,1
000R1I1QK4R62,7A0JLOLVQ2,25,1397803873,5,1
000R1I1QK4R62,7DFGDFU026,100,1397864143,1,1
000R1I1QK4R62,FOD39Y2FTT,15,1397436814,6,1
000R1I1QK4R62,GZF5UQ75N9,25,1397647743,5,1
000R1I1QK4R62,J4LWGR23OI,40,1397645120,4,1
000R1I1QK4R62,QBALZ21R1E,40,1397858317,4,1
000R1I1QK4R62,SC4604N2XQ,5,1397445978,7,1
000W425HZ6JL4,4POOZEJ4HN,60,1397854330,3,1

Negative Events

Some events have negative values, indicating negative actions on the part of the user, e.g., removing an item from the shopping cart. While processing the event sequence in the reducer for a user and an item, all the negative events are identified.

For each such negative event, a corresponding positive event is removed from the event sequence, before calculating rating.
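The cancellation step can be sketched as follows. We assume here that a negative event of type -k undoes one occurrence of the positive event of type k (for example, -3 undoes 3), which matches the event table above. The sketch also ignores the order of events within the sequence, which the real reducer may not do.

```python
from collections import Counter


def cancel_negative_events(event_types):
    """Return positive event counts after removing cancelled occurrences."""
    counts = Counter(t for t in event_types if t > 0)
    for t in event_types:
        if t < 0 and counts[-t] > 0:
            counts[-t] -= 1  # assumed pairing: event -k undoes one event k
    return {t: c for t, c in counts.items() if c > 0}


# Item browsed twice, added to cart, removed from cart, then wish listed
print(cancel_negative_events([7, 3, -3, 7, 4]))
# Added to cart twice, removed once: one cart event survives
print(cancel_negative_events([3, 3, -3]))
```

In the first case the cart event disappears entirely, so the rating would come from the wish list event rather than the shopping cart event.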

Wrapping Up

We have gone through a simple heuristic based process to convert click stream data to implicit ratings. Beyond recommendation, the implicit rating can potentially be used for other purposes. One example is targeted personalized marketing.

To run the example, please refer to the Implicit Rating Predictor section of this tutorial document.

For commercial support for this solution or other solutions in my github repositories, please talk to ThirdEye Data Science Services. Support is available for Hadoop or Spark deployment on cloud, including installation, configuration and testing.

About Pranab

I am Pranab Ghosh, a software professional in the San Francisco Bay area. I manipulate bits and bytes for the good of living beings and the planet. I have worked with a myriad of technologies and platforms in various business domains for early stage startups, large corporations and anything in between. I am an active blogger and open source project owner. I am passionate about technology and green and sustainable living. My technical interest areas are Big Data, Distributed Processing, NOSQL databases, Machine Learning and Programming languages. I am fascinated by problems that don't have neat closed form solutions.

23 Responses to From Explicit User Engagement to Implicit Product Rating

  8. Hi Pranab, my name is Archie and I am working at a German company. I am tasked with building a recommendation engine that takes as input an item’s position in search and its click-through rate, and delivers a score for the item that I can later use for sorting in search. I think I can map an item's position to user engagement events and then use sifarish to get a rating. Do you think this is the best way to go about my task?

    • Pranab says:

      Archie, you can use sifarish. You could model your events as follows. Let’s say your search results are broken into pages, each page containing 10 items. Then, as an example, the events in increasing order of affinity could be 1) item in 3rd page or later 2) item in 2nd page 3) item in 1st page 4) item clicked, irrespective of location in search result. For any item, events later in my list will supersede earlier events. That’s the logic of the map reduce.

      For each such event you could define scores by number of occurrences, as shown in the sample JSON. Then you run the implicit rating generator map reduce.

  9. Shob says:

    I am looking for some source code/documentation for RedisSpout.withTupleFields but not finding any. Could you point me pls to the API ?

  10. vij says:

    Hi..
    I see that you have manually come up with weights.
    i.e. {"eventType" : 1, "description" : "purchased", "scores" : [100]},
    {"eventType" : 2, "description" : "joined checkout", "scores" : [85]},

    Now {"eventType" : 1, "description" : "purchased"} can have a score of 95 or 90.

    How can we select optimal weights for each event type?

  11. Pranab says:

    Vij,
    The weights are up to you, based on heuristics. Weights increase with the affinity of the event to conversion. For example, “joined checkout” will have a higher weight than “browsed product”.

    • vij says:

      Hi Pranab
      I totally agree, “joined checkout” will have a higher weight than “browsed product”,
      but higher by how much? What I am wondering is whether we can derive these weights using cross validation or grid search. Pointers to any such resources would be useful.

      • Pranab says:

        Vij

        As I said these are input configuration parameters. You set it to whatever you like, subject to the guidelines I provided.

        These are not machine learning parameters for a prediction problem. Cross validation and grid search don’t make any sense here.

  13. Natsu says:

    Hi, thanks for writing a useful blog.
    I tried to run ./brec.sh to generate events following the guide “./brec.sh genHistEvent ”, but when I run “./brec.sh genHistEvent 1000 100 10”, I get “./brec.sh: line 58: $5: ambiguous redirect”. I would greatly appreciate your help.

  14. Pranab says:

    @Natsu You need to provide the name of the file where you want to save the output as the last argument. So there will be an additional command line argument. The tutorial is incorrect. I will correct it and check in

    • Natsu says:

      Hi, I also tried this after reading through the script and guessing what $5 should be, but another error came up.

    • Natsu says:

      Thanks for your attention to my humble question, and sorry for not showing gratitude before asking a further question.

  15. natsu says:

    [root@quickstart resource]# ./brec.sh genHistEvent 1000 3000 10 10
    generating historical event data
    ./engage.rb:139: undefined method `uuid’ for SecureRandom:Module (NoMethodError)
    from ./engage.rb:133:in `upto’
    from ./engage.rb:133
    [root@quickstart resource]# vim brec.sh

    [1]+ Stopped vim brec.sh
    [root@quickstart resource]# ./brec.sh genHistEvent 1000 3000 10 somefile
    generating historical event data
    ./engage.rb:139: undefined method `uuid’ for SecureRandom:Module (NoMethodError)
    from ./engage.rb:133:in `upto’
    from ./engage.rb:133
    [root@quickstart resource]# fg
    vim brec.sh

    [1]+ Stopped vim brec.sh
    [root@quickstart resource]# ./brec.sh genHistEvent 1000 3000 10 9
    generating historical event data
    ./engage.rb:139: undefined method `uuid’ for SecureRandom:Module (NoMethodError)
    from ./engage.rb:133:in `upto’
    from ./engage.rb:133
