The output of a recommendation engine, whether based on collaborative filtering or some other technique, reflects consumers' interest in products or services. However, a business may have goals that are at odds with the items the recommendation engine produces. For example, a business may be more interested in selling items with a large inventory, or items being promoted with discounted pricing. How do we reconcile these conflicting interests?
My open source recommendation engine sifarish is no exception to this problem, so I decided to do something about it. We need to find a score for each recommended item that reflects a compromise between consumer interest and business interest. Here is my solution: for every item, scores are assigned for one or more business goals. These scores are combined with the recommendation score using a weighted average algorithm.
Recommendation Engine Processing
As outlined in my earlier blog, three MR jobs run in tandem to generate a predicted rating for each user and item pair. The three MR jobs do the following:
- Create the rating matrix from available rating data and generate correlations between items
- For all users, generate predicted ratings using the available rating data and the correlation coefficients from the first job
- For a given item and user, aggregate across all predicted ratings to generate the final predicted rating for that user and item
Business Goal Processor Map Reduce
The scores for the different business goals, along with the output of the last MR job from the list above, are fed into the MR class BusinessGoalInjector to generate the final rating score. Here is some sample business score data:
W2EIVQXWU5,568,21
M7LIBV4M4P,897,681
FN95R3KA36,298,185
1Z12NT7OVE,692,622
YCEM2VGI7K,632,933
1XJME7Y3V3,332,670
Y49XI0HKKZ,803,691
The fields of the business goal records in our example input are:
- Item ID
- Score for inventory level
- Score for promotion
The magnitude of a score reflects the importance of the particular business goal. For example, an inventory score of 1000 implies the highest inventory level. The actual inventory level is transformed to a scale of 0 to 1000.
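The transformation to the 0 to 1000 scale can be sketched as a simple linear scaling. The function name, the clamping, and the 2000-unit cap below are my own illustrative choices; the post only specifies the target range.

```python
def to_goal_score(value, max_value):
    """Linearly scale a raw business metric, such as an inventory count,
    to the 0..1000 range used for business goal scores."""
    value = max(0, min(value, max_value))  # clamp out-of-range values
    return int(round(1000.0 * value / max_value))

# e.g. 450 units in stock, with 2000 units treated as the maximum
print(to_goal_score(450, 2000))  # -> 225
```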
The mapper output key is the itemID. There are two kinds of values associated with each key: one business score record and multiple (userID, rating score) records. We use secondary sorting so that, on the reducer side, the business score record appears first, followed by the multiple (userID, rating score) pairs for the itemID that serves as the key.
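The effect of the secondary sort can be simulated outside Hadoop by tagging each value record and sorting on (itemID, tag), so the single business score record leads its group. This is a simplified stand-in for the MR grouping, not the actual sifarish code; the user IDs and ratings are made up.

```python
from itertools import groupby

# value records tagged with a sort order: 0 = business score, 1 = rating
values = [
    ("W2EIVQXWU5", 1, ("userA", 820)),
    ("W2EIVQXWU5", 0, (568, 21)),
    ("W2EIVQXWU5", 1, ("userB", 640)),
]

# sort by (itemID, tag) so the business score record comes first per item
values.sort(key=lambda rec: (rec[0], rec[1]))
for item_id, group in groupby(values, key=lambda rec: rec[0]):
    group = list(group)
    business_score = group[0][2]            # first record: the goal scores
    ratings = [rec[2] for rec in group[1:]]  # the (userID, rating) pairs
    print(item_id, business_score, ratings)
```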
Different weights can be assigned to the different business goals through a config parameter.
The remaining weight, 60 in this case, is assigned to the basic recommendation rating score; the total adds up to 100. The reducer computes the weighted score and outputs (userID, itemID, score).
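The reducer's weighted average can be sketched as follows. The split of the remaining 40 between inventory (25) and promotion (15) is an illustrative assumption; the post only fixes the recommendation weight at 60.

```python
def weighted_score(rating, goal_scores, goal_weights, rating_weight=60):
    """Weighted average of the predicted rating and the business goal
    scores. Weights are percentages and must sum to 100."""
    assert rating_weight + sum(goal_weights) == 100
    total = rating_weight * rating
    for score, weight in zip(goal_scores, goal_weights):
        total += weight * score
    return int(round(total / 100.0))

# rating 1000 for item W2EIVQXWU5, goal scores 568 and 21,
# illustrative weights of 25 (inventory) and 15 (promotion)
print(weighted_score(1000, [568, 21], [25, 15]))  # -> 745
```

A high-inventory item thus gets its score pulled up, while a well-rated item with low goal scores gets pulled down, which matches the before/after sample output shown later.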
Some business goals may force a recommended item to be excluded, based on the score for that goal. For example, when an item's inventory score falls below some predefined threshold, the item does not make it to the final recommendation list. The threshold is defined through a configuration parameter for each of the two business goals in our example. A value of -1 means no such filtering is applied.
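The exclusion rule amounts to a simple pre-check before the weighted score is computed. The threshold value of 100 below is illustrative; -1 disables the check, as in the configuration.

```python
def passes_thresholds(goal_scores, thresholds):
    """Return False if any business goal score falls below its minimum
    threshold; a threshold of -1 disables the check for that goal."""
    for score, threshold in zip(goal_scores, thresholds):
        if threshold >= 0 and score < threshold:
            return False
    return True

# inventory threshold of 100, no threshold on promotion
print(passes_thresholds([568, 21], [100, -1]))  # -> True
print(passes_thresholds([56, 21], [100, -1]))   # -> False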
Finally, I run the output through an MR job called TextSorter, which sorts the output by userID, so that all the predicted scores for different items for a given user appear together. Here is some sample output. The first set is the basic recommendation engine output; the second set is the output after business goal processing:
T32JFQSFM45T,VU9D1W970J,1000
T32JFQSFM45T,A3AS3B4KQT,250
T32JFQSFM45T,2AC311PMPT,1000

T32JFQSFM45T,VU9D1W970J,786
T32JFQSFM45T,A3AS3B4KQT,386
T32JFQSFM45T,2AC311PMPT,731
The fields are userID, itemID and score. As you can see, in some cases the score has dropped, and in others it has gone up.
In contextual recommendation, additional constraints are placed on the recommendation results based on various other criteria. Here is an example of contextual recommendation that can be handled with the same technique.
When recommending web content, freshness could be incorporated by assigning a score inversely proportional to the time elapsed since the content was published. Content could also be scored by advertising revenue potential. These scores affect the final recommendation score and how items are ranked when presented to the user.
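A freshness score on the same 0 to 1000 scale might be sketched like this. The linear decay and the one-week horizon are my own illustrative choices; the post only says the score should fall as content ages.

```python
def freshness_score(age_hours, max_age_hours=168):
    """Map content age to a 0..1000 score, newer content scoring
    higher; anything older than max_age_hours scores 0."""
    if age_hours >= max_age_hours:
        return 0
    return int(round(1000.0 * (1.0 - age_hours / float(max_age_hours))))

print(freshness_score(0))    # just published -> 1000
print(freshness_score(84))   # half a week old -> 500
```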
The processing presented here is simple and straightforward. However, the feature is important for any recommendation engine used to solve real-world problems. I have updated the collaborative filtering based recommendation tutorial document to include these two post-processing MR jobs.