What does email marketing have to do with Markov model? Let’s explore and find out. Any consumer of product and services has a natural rhythm to his or her purchase history. Regular customers tend to visit a business according to some temporal patterns inherent in their buying history of products and services.

Wouldn’t you expect to get better results if your email marketing or any other marketing for that matter is aligned with the temporal pattern inherent in a customer’s purchase behavior. In practical terms, it means that your marketing email will go out at a time, predicted to be optimal by a predictive model. Effectively, your marketing campaign is being customized and timed appropriately based a customer’s past purchase history.

Well it sounds good, but what predictive modeling technique that can do all these. One answer is the Markov model. In this post, I will show how you to to build a Markov Model based on past customer transactions using Hadoop. This is part of my open source Hadoop driven data mining project *avenir*.

Once we have the model, we will be able to predict, given the recent transactions of a customer, when the customer is likely to make the next purchase and how much he or she is likely to spend.

This prediction can be leveraged in many ways e.g., customizing content when an user lands on your site, customized email marketing. If we also track products being purchased along with the transaction, then we can use the model for recommending products for time delayed cross selling.

## Markov Model

Markov model consists of a set of states of a system, the transitions between those states, and the associated probability of those state transitions.

You can also think of it as a probabilistic state machine. With this model, we can predict the most likely next state, given the current state. One common use of Markov model in the security domain is intrusion detection.

Ideally, in any sequence based predictive model, the current state should be considered to be dependent on all past states. However, in Markov model we assume that the current state depends only on the previous state. It’s as if the system has only short term memory and no long term memory. In spite of this assumption, Markov model has been found to be effective.

For our example, the state consists of a tuple with the following elements. By defining the state this way, it embodies the last two transactions.

*Time since last transaction**Whether the amount spent is less that, same as or greater than the last transaction*

Once we have the probabilistic state transition matrix based on past transactions of all customers, we will be able to predict, in a probabilistic sense, the time for the next purchase and whether the amount that will spent will be less than , more or less same or greater that the last transaction for a given customer. Since our use case is about email marketing, the marketing email will be sent out around the predicted time.

## Sequence Generating Map Reduce

Let’s start with our input. Our input of customer transaction consists of the following fields. The Transaction ID field, although present in the input is not used for any processing.

*Customer ID**Transaction ID**Date**Amount*

Here is some sample input.

U3N10FWOIE,1365990122,2012-01-01,90 Y3AA3LPV93,1365990123,2012-01-02,127 83OJI5G1MQ,1365990124,2012-01-02,84 Z31AA4A50V,1365990125,2012-01-02,129 JWFHFR4149,1365990126,2012-01-02,162 B4YAHPKAN1,1365990127,2012-01-02,146 G9AYL4Y8YB,1365990128,2012-01-02,171 FOE2HD1SV2,1365990129,2012-01-02,129

The job of this map reduce is to take the transaction data and converts it into a sequence of time ordered transactions for a given customer.

The sequence data generated is consumed by the another map reduce that computes the Markov state transition matrix. The map reduce logic is in the class Projection, which implements simple projection, grouping and order by logic as in SQL.

For our data, we are grouping by *customer ID* and ordering by *date. *The ordering is achieved by map reduce secondary sort. The output of this map reduce consists of *customer ID* followed by date sorted list of transactions as shown below. The second field in the input which is *transaction ID* is ignored.

00VVD1E210,2012-06-18,87 00W6TWFW4S,2012-03-24,22,2012-05-22,80,2012-06-15,33 00W86Y0GFT,2012-02-15,141,2012-03-10,30,2012-03-25,49,2012-05-17,107 00W92K8A1W,2012-04-19,25 00W9W3Y3XH,2012-03-25,123 00XL1QERUO,2012-01-07,81,2012-05-10,154 00XPR1XW1P,2012-04-26,103 00Y1B0Y4CO,2012-03-10,81 00YR97DWWO,2012-07-15,118 00Z5SOHKED,2012-01-28,43,2012-02-25,27 00ZLLMHKND,2012-02-21,185,2012-04-02,63,2012-04-03,30

The next thing we do is little bit of processing of this data to transform a consecutive pair of *date* and *amount* into symbol which stands for a state. I have used a script for this purpose. A state is represented by a two letter symbol. Letters are shown below.

Time elapsed since last transaction | Amount spent compared to previous transaction |

S : small | L : significantly less than |

M : medium | E : more or less same |

L : large | G : significantly greater than |

We have 9 states in our system. For example, the state SL implies that the time elapsed between the last transaction and the one before was short and the customer spent less compared to the transaction before. Here is some sample output.

0B1T50K451,SG 0B1US1Z2OX,SL,SG,LL,SG 0B1WPSU3Q0,MG,LL 0B2L0EOA2I,LL,SG 0B3OQ1WW24,SG,LL,LG 0B4G8PNS7N,LL,SG

## Markov State Transition Map Reduce

The goal of this map reduce is to compute that state transition probability matrix using the state sequence from the previous step as an input.

The implementation of this map reduce is in the class MarkovStateTransitionModel. Essentially we have to count the number of instances of state transitions. Since there are 9 states, there are 81 possible state transitions.

The mapper goes through the state sequence for each customer and emits a count of 1 for each *from state, to state* pair. This pair is essentially the mapper key. On the reducer side, we simply update the count for each *from state*, *to state* pair. In the *cleanup* method after the reducer has processed all the the data, we normalize and scale the state transition probability matrix and generate the output.

Only one reducer gets used, since we want all mapper output to be aggregated in one reducer. If multiple reducers are used, then each reducer will generate a partial state transition matrix, all of which will need to be consolidated into the final matrix. For simplicity I am using one reducer.

Here is the output for 9 x 9 state transition probability matrix.

4,1,363,246,3,7,366,3,3 250,0,373,0,0,0,377,0,0 377,1,3,251,0,0,366,0,0 0,0,407,0,0,299,293,0,0 142,0,282,0,0,288,285,0,0 9,0,396,285,0,0,307,0,0 0,0,541,0,0,295,0,0,162 38,0,500,0,0,192,38,38,192 19,0,519,0,0,321,139,0,0

## Predicting Time of Future Transaction

To use the model, we take the state transition matrix and apply it to some recent customer transactions and predict next likely date of transaction for the customers. Here is the output. This processing is done by a script. The output consists of *customer ID* and a *date*.

Y47G4MC20U, 2013-04-28 KNJUW2EYQH, 2013-02-05 T0WALRZ3Q6, 2013-02-08 E4L70D2UAD, 2013-02-07 IS10DE9BQF, 2013-01-27 MJFYX65370, 2013-02-04 R21E4FOYGA, 2013-02-11

## Hidden Markov Model

There is another variation of Markov Model called *Hidden Markov Model (HMM)*. So far, in our example the states are functions of customer transaction i.e., something observable. Sometimes there are states that are not visible, but only some observations are visible as manifestation of internal states. In such cases, HMM is the right model to use.

## Customer Loyalty

We could take our current problem to the next level using HMM. We could define different levels of customer loyalty and they could represent the hidden states in an HMM. Customer transactions are manifestations of these hidden states. Using HMM based modelling we could predict current customer loyalty level as well as loyalty level trajectory based on customer transaction history.

One use of this model is to detect customers with decreasing loyalty. Once detected, some proactive actions could be taken to retain those customers. This will be a topic of a later post.

## Summing Up

Markov model is a powerful tool for building predictive model based on sequence data, including time driven sequence data. There are many use cases in various domains.

If you want to try the example in this post, here is a step by step tutorial document to help. In addition to MR, you have to run several ruby scripts, as described in the tutorial.

Hi Pranab, i liked your post.It has helped me alot in understanding Markov chains method.

Can you please help me how to score customer data and logic of predicting date with transition matrix since i am not a ruby programmer.

SJ

Ruby is not the issue. I happened to use bunch of ruby scripts to convert transaction data to states. This logic is very much application specific and that’s the reason I did not write any java map reduce for this.

Depending on your data format, use whatever language you are comfortable with to convert your data into the format, where in each line you have some entity ID followed by a sequence of states, This data goes as input to the MarkovStateTransition model java MR

Hi Pranab,

In your mapper MarkovStateTransitionModel.StateTransitionMapper class,

you have not used customerID (since you start your loop index from 1)

as part of your mapper output:

protected void map(LongWritable key, Text value, Context context)

throws IOException, InterruptedException {

items = value.toString().split(fieldDelimRegex);

if (items.length >= (skipFieldCount + 2)) {

for (int i = skipFieldCount + 1; i < items.length; ++i) {

outKey.initialize();

outKey.add(items[i-1], items[i]);

context.write(outKey, outVal);

}

}

}

My question is how you process this a set of customers?

For example, if your input value is ",SL,SG,LL” then you generate

(I added line numbers for clarifications):

1: ((customerID, SL), 1)

2: ((SL,SG), 1)

3: ((SG,LL), 1)

How do you capture customerID for 2: and 3:?

Thank you.

Mahmoud

Mahmoud

Machine leaning based solutions consist of two distinct phases : 1) training using historical to build a model and 2) using the model to make prediction for new data.

The class MarkovStateTransitionModel does the training. Customer ID is irrelevant for building the model. All we care about is the sequence of transactions.

Customer ID enters the picture when we make prediction based on the model from step 1 and as described in the section “Predicting Time of Future Transaction”. You could also follow the link for the tutorial for more details.

So, I am assuming that the input to MarkovStateTransitionModel will be

state1,state2, state3, …

Thanks for clarification of input.

best,

Mahmoud

It works either way. The point is that customer ID is not used in building the model. If it’s there in the input it is ignored by appropriately setting the config param skip.field.count

input value is” “CustomeID,SL,SG,LL”

Thank you!

Pingback: Markov Model and Trading History | Tales from a Trading Desk