Novelty in Personalization

We all have the unfortunate experience of being pigeon holed by Personalization and Recommendation engines. When recommendation are based on our past behavior and there is very little opportunity to explore. But our past actions are not always good predictors for our future behavior. At any given moment, our behavior is highly influence by our mood, fleeting thoughts and contextual surrounding. There are are several way to improve the solution e.g., by introducing novelty and diversity in the recommendation list. Even adding some items randomly to the recommendation list has been found to be effective.

I have recently added novelty in my open source Recommendation and Personalization engine sifarish. In this post I will go over the solution as implemented in sifarish. We will start with some some core concepts followed by the Hadoop Map reduce workflow for the implementation. I will cover diversity in a future post when it’s implemented.

Novelty

Novelty has an inverse relationship with the exposure an user has to an item. It’s an indicator of how unfamiliar a particular item is to an user. Novelty can be defined with respect to a specific user’s exposure to an item or with respect to the exposure the whole user population has to an item. We will be using the user specific novelty definition. Although the solution discussed here can equally be applied novelty based on the whole user population.

Our solution consists of the following steps. Details on creating a recommendation list based on item to items collaborative filtering can be found in my earlier post.

For any user and item, calculate a novelty score
Take an weighted average of the predicted rating score and the novelty score to calculate a net score
Create a recommendation list based on the descending order of the net score

Novelty Score

I have used two algorithms for computing the novelty score. With both of them, novelty score increases at an increasing rate, as the an user’s exposure to an item goes to zero. The first one has it’s grounding in information theory and is based on Self Information. Novelty is self information as defined below.

n(u,i) = -log(p(u,i)) where
n(u,i) = novelty (self information)
p(u,i) = probability of exposure to an item i for an user u

In my earlier post, I discussed how to calculate implicit rating for an item based on an user’s level of engagement with that item. For a given user, from the histogram of implicit rating for items, we can easily calculate p(i)

The second algorithm is heuristic based and defined with a second degree polynomial as below

n(u,i) = k2 * r(u,i)² + k1 * r(u.i) + k0 where
n(u,i) = novelty
r(u,i) = implicit or explicit rating of item i for user u

The three coefficients are found from the following conditions. The parameter n is user defined with a value less than 1.0. It dictates how far below the linear inverse line the polynomial lies, or in other words the degree of non linearity.

When r(u,i) = r_max, n(u,i) = 0 where r_max = maximum implicit rating
When r(u,i) = 0, n(u,i) = n_max where n_max = maximum novelty score
When r = r_max /2, n(u,i) = n * r_max /2 where n = user defined parameter ( < 1.0)

Map Reduce Workflow

There are multiple Map Reduce jobs as listed below that are used to compute the final net rating score.

Input	Map Reduce	Output
User item implicit rating	ItemEngagementDistr which converts implicit rating to rating distribution	Estimated item rating distribution as (userID,itemID, rating, rating distibution)
Output of previous job	IndividualNovelty which converts either rating or rating distribution to novelty	Novelty as (userID,itemID, novelty)
Output of previous job and predicted rating	Joiner which joins predicted rating and novelty for each user, item pair	Predicted rating and novelty as (userID,itemID, predicted rating, novelty)
Output of previous job	WeightedAverage which computes weighted average of predicted rating and novelty	Net score as (userID,itemID, score)

The first MR job ItemEngagementDistr simply calculates a probability distribution from a a histogram.

The second MR job IndividualNovelty operates on one of the two supported algorithms to calculate novelty as defined through the parameter novelty.gen.strategy. For self information based algorithm, the rating distribution is used. For polynomial based algorithm, the actual implicit rating value is used. This MR class is extended from another MR class called Transformer which allows transformation or suppression of fields in a record. Transformation logic can be provided as a plugin.

The third MR job Joiner simply joins predicted rating and novelty keyed on (userID, itemID) with some caveat. When the novelty value is not available for an (userID, itemID) pair, the maximum novelty value is inserted. It covers for the case when an user has not engaged with an item at all, there is no implicit rating and hence no novelty score. This MR generally does inner join. However if the default value for a missing record is provided, it switches to the outer join mode.

The fourth MR job WeightedAverage takes weighted average of predicted rating and novelty. The relative weights are defined through the parameter field.weights. One interesting aspect of this job is that as an option when the value of novelty is 0 the output out can be suppressed for the record.

This features enables us to handle the scenario where an item will not be recommended, if it has been already consumed by the user. In an eCommerce setting, consumption corresponds to purchase and this feature is more relevant. We all had the annoying experience of being recommended something or similar to something we have purchased recently. Importance of this feature really depends on the domain. With media, an user may consume an item multiple times (e.g., listening to the same song multiple times) and this suppression may not be necessary.

Here is some sample of the final output, consisting of userID, itemID and score.

100L4TLNW22W7,SVAUUQ5YT3,79
100L4TLNW22W7,VWBN02G7K0,73
100L4TLNW22W7,02BODOHLO1,73
100L4TLNW22W7,X9MI27Z3QF,69
100L4TLNW22W7,TWPK3MX682,69

How Much Novelty

How much novelty should be injected into the mix. Honestly we don’t really know. As consumers interacting with web site, we are either in an exploratory mode, exploring things we have not been exposed to before or we are in a focused exploitative mode when armed with knowledge gained from exploration, we are going to choose something and consume it.

While in an exploratory mode, it’s beneficial to introduce more novelty while doing customization and recommendation for an user. On the other hand while in an exploitative goal driven mode, novelty should be suppressed.

Unfortunately, there is no easy way to detect either of these modes. If we could glean this kind of latent information from click stream, the amount of novelty could be tuned to match user’s current mode of behavior.

Global Novelty

So far, we have used novelty of an item per user basis. We could have used aggregate as well. Global novelty for an item is defined over the whole user community

We have to replace r(u,i) which is the rating for an item i by an user u, with r(i), which is average rating of an item aggregated over all users. This gets used for second degree polynomial based novelty estimation.

For self information based novelty estimation we use p(i), which is the probability that users have interacted with the item i. It’s based on the number of users that have interacted with an item.

Sifarish supports global novelty. The net score based on predicted rating and global novelty could be computed by executing the following MR jobs

Run Projection on rating data grouped by itemID
Run GlobalNovelty on the output.
Run Joiner to join rating and novelty data
Run WeightedAverage for net score

Dissimilarity Based Novelty

So far we have based our novelty definition on whether an user has been exposed to an item. One issue with defining novelty based on discovery is that for those items not yet discovered the novelty score is at it’s maximum.

However, an item that we have not been exposed to before may be very similar to an item that we have been exposed to before and hence it’s novelty score should be low. The novelty score should depend on upon how dissimilar an item is to the items that we have been exposed to. This is where dissimilarity based novelty shines. It’s defined as below

n(u,i) = Σ_j((1 – sim(i,j)) * r(u.j) where
n(u,i) = novelty of item i for user u
sim(i,j)=similarity between item i and j
r(u,j) = rating of item j by user u

The similarity sim(i,j) could be defined based on item correlation in social interaction context or physical attribute context.

Summing Up

With all the idiosyncrasies we have and the contextual environment we are in, it’s impossible for an algorithm to predict completely what’s in our mind. The algorithms discussed in this post are essentially steps towards improving the solution. This a good article on the topic of novelty and diversity.

To run the example please follow the steps under the section novelty in the tutorial document. if you have not run the tutorial before and don’t have predicted rating, then you have to start from the beginning.

For commercial support for this solution or other solutions in my github repositories, please talk to ThirdEye Data Science Services. Support is available for Hadoop or Spark deployment on cloud including installation, configuration and testing,