Smarter Email Marketing with Markov Model


What does email marketing have to do with Markov model? Let’s explore and find out. Any consumer of product and services has a natural rhythm to his or her  purchase history. Regular customers tend to visit a business  according to some temporal patterns inherent in their buying history of  products and services.

Wouldn’t you expect to get better results if your email marketing or any other marketing for that matter is aligned with the temporal pattern inherent  in a customer’s  purchase behavior.  In practical terms, it means that your marketing email will go out at a time, predicted to be optimal by a predictive model. Effectively, your marketing campaign is being customized and timed appropriately based a customer’s past purchase history.

Well it sounds good, but what predictive modeling technique that can do all these. One answer is the Markov model. In this post, I will show how you to to build a Markov Model based on past customer transactions using Hadoop. This is part of my open source Hadoop driven data mining project avenir.

Once we have the model, we will be able to predict, given the recent transactions of a customer, when the customer is likely to make the next purchase and how much he or she is likely to spend.

This prediction can be  leveraged in many ways e.g., customizing content when an user lands on your site, customized email marketing. If we also track products being purchased along with the transaction, then we can use the  model for recommending products for time delayed cross selling.

Markov Model

Markov model consists of a set of states of a system, the  transitions  between those states, and the associated probability of those state transitions.

You can also think of it as a probabilistic state machine. With this model, we can predict the most likely next state, given the current state. One common use of Markov model in the security domain is intrusion detection.

Ideally, in any sequence based predictive model, the current  state should be considered to be dependent on all past states. However, in Markov model we assume that the current state depends only on the previous state. It’s as if the system has only short term  memory and no long term memory. In spite of this assumption, Markov model has been found to be effective.

For our example, the state consists of a tuple with the following elements. By defining the state  this way, it embodies the last two transactions.

  1. Time since last transaction
  2. Whether the amount spent is less that,  same as or greater than the last transaction

Once we have the probabilistic state transition matrix based on past transactions of all customers, we will be able to predict, in a probabilistic sense,  the time for the next purchase and whether the amount that will spent will be less than , more or less same or greater that the last transaction for a given customer.  Since our use case is about email marketing, the marketing email  will be sent out around the predicted time.

Sequence Generating  Map Reduce

Let’s start with our input. Our input of customer transaction consists of the following fields. The Transaction ID field, although present in the input is not used for any processing.

  1. Customer ID
  2. Transaction ID
  3. Date
  4. Amount

Here is some sample input.

U3N10FWOIE,1365990122,2012-01-01,90
Y3AA3LPV93,1365990123,2012-01-02,127
83OJI5G1MQ,1365990124,2012-01-02,84
Z31AA4A50V,1365990125,2012-01-02,129
JWFHFR4149,1365990126,2012-01-02,162
B4YAHPKAN1,1365990127,2012-01-02,146
G9AYL4Y8YB,1365990128,2012-01-02,171
FOE2HD1SV2,1365990129,2012-01-02,129

The job of this map reduce is to take the transaction data and converts it into a sequence  of time ordered transactions for a given customer.

The sequence data generated is consumed by the another map reduce that computes the Markov state transition matrix. The map reduce logic is in  the class Projection, which implements simple projection, grouping and order by logic as in SQL.

For our data,  we are grouping by customer ID and ordering by date. The ordering is achieved by map reduce secondary sort. The output  of this map reduce consists  of customer ID followed by date sorted list of transactions as shown below. The second field in the input which is transaction ID is ignored.

00VVD1E210,2012-06-18,87
00W6TWFW4S,2012-03-24,22,2012-05-22,80,2012-06-15,33
00W86Y0GFT,2012-02-15,141,2012-03-10,30,2012-03-25,49,2012-05-17,107
00W92K8A1W,2012-04-19,25
00W9W3Y3XH,2012-03-25,123
00XL1QERUO,2012-01-07,81,2012-05-10,154
00XPR1XW1P,2012-04-26,103
00Y1B0Y4CO,2012-03-10,81
00YR97DWWO,2012-07-15,118
00Z5SOHKED,2012-01-28,43,2012-02-25,27
00ZLLMHKND,2012-02-21,185,2012-04-02,63,2012-04-03,30

The next thing we do is little bit of processing of this data to transform a consecutive pair of date and  amount into symbol which stands for a state.  I have used a script for this purpose. A state is represented by a two letter symbol. Letters are shown below.

Time elapsed since last transaction Amount spent compared to previous transaction
S : small L : significantly less than
M : medium E : more or less same
L : large G : significantly greater than

We have 9 states in our system. For example, the state SL implies that the time elapsed between the last transaction and the one before was short and the customer spent less compared to the transaction before. Here is some sample output.

0B1T50K451,SG
0B1US1Z2OX,SL,SG,LL,SG
0B1WPSU3Q0,MG,LL
0B2L0EOA2I,LL,SG
0B3OQ1WW24,SG,LL,LG
0B4G8PNS7N,LL,SG

Markov State Transition Map Reduce

The goal of this map reduce is to compute that state transition probability matrix using the state sequence from the previous step as an input.

The implementation of this map reduce is in the class MarkovStateTransitionModel. Essentially we have to count the number of instances of state transitions. Since there are 9 states, there are 81 possible state transitions.

The mapper goes through the state sequence for each customer and emits a count of 1 for each from state, to state pair. This  pair is essentially the mapper key. On the reducer side, we simply update the count for each from state, to state pair. In the cleanup method after the reducer has processed all the the data, we normalize and scale the state transition probability matrix and generate the output.

Only one reducer gets used, since we want all mapper output to be aggregated in one reducer. If multiple reducers are used, then each reducer will generate a partial state transition matrix, all of which will need to be consolidated into the final matrix. For simplicity I am using one reducer.

Here is the output for 9 x 9 state transition probability matrix.

4,1,363,246,3,7,366,3,3
250,0,373,0,0,0,377,0,0
377,1,3,251,0,0,366,0,0
0,0,407,0,0,299,293,0,0
142,0,282,0,0,288,285,0,0
9,0,396,285,0,0,307,0,0
0,0,541,0,0,295,0,0,162
38,0,500,0,0,192,38,38,192
19,0,519,0,0,321,139,0,0

Predicting Time of  Future Transaction

To use the model, we take the state transition matrix and apply it to  some recent customer transactions and predict next likely date of transaction for the customers. Here is the output. This processing is done by a script. The output consists of customer ID and a date.

Y47G4MC20U, 2013-04-28
KNJUW2EYQH, 2013-02-05
T0WALRZ3Q6, 2013-02-08
E4L70D2UAD, 2013-02-07
IS10DE9BQF, 2013-01-27
MJFYX65370, 2013-02-04
R21E4FOYGA, 2013-02-11

Hidden Markov Model

There is another variation of Markov Model called Hidden Markov Model (HMM). So far, in our example the states are functions of customer transaction i.e., something observable. Sometimes there are states that are not visible, but only some observations are visible as manifestation of internal states. In such cases, HMM is the right model to use.

Customer Loyalty

We could take our current problem to the next level using HMM. We could define different levels of customer loyalty and they could represent the  hidden states in an HMM. Customer transactions are manifestations of these hidden states. Using HMM based modelling we could predict current customer loyalty level as well as loyalty level trajectory based on customer transaction history.

One use of this model is to detect customers with decreasing loyalty. Once detected, some proactive actions could be taken to retain those customers. This will be a topic of a later post.

Summing Up

Markov model is a powerful tool for building predictive model based on sequence data, including time driven sequence data. There are many use cases in various domains.

If you want to try the example in this post, here is a step by step tutorial document to help. In addition to MR, you have to run several ruby scripts, as described in the tutorial.

For commercial support for this solution or other solutions in my github repositories, please talk to ThirdEye Data Science Services. Support is available for Hadoop or Spark deployment on cloud including installation, configuration and testing,

About Pranab

I am Pranab Ghosh, a software professional in the San Francisco Bay area. I manipulate bits and bytes for the good of living beings and the planet. I have worked with myriad of technologies and platforms in various business domains for early stage startups, large corporations and anything in between. I am an active blogger and open source project owner. I am passionate about technology and green and sustainable living. My technical interest areas are Big Data, Distributed Processing, NOSQL databases, Machine Learning and Programming languages. I am fascinated by problems that don't have neat closed form solution.
This entry was posted in Big Data, Data Mining, Data Science, Hadoop and Map Reduce, Marketing Analytic, Predictive Analytic and tagged , . Bookmark the permalink.

9 Responses to Smarter Email Marketing with Markov Model

  1. SJ says:

    Hi Pranab, i liked your post.It has helped me alot in understanding Markov chains method.
    Can you please help me how to score customer data and logic of predicting date with transition matrix since i am not a ruby programmer.

    • Pranab says:

      SJ
      Ruby is not the issue. I happened to use bunch of ruby scripts to convert transaction data to states. This logic is very much application specific and that’s the reason I did not write any java map reduce for this.
      Depending on your data format, use whatever language you are comfortable with to convert your data into the format, where in each line you have some entity ID followed by a sequence of states, This data goes as input to the MarkovStateTransition model java MR

  2. Mahmoud Parsian says:

    Hi Pranab,

    In your mapper MarkovStateTransitionModel.StateTransitionMapper class,
    you have not used customerID (since you start your loop index from 1)
    as part of your mapper output:

    protected void map(LongWritable key, Text value, Context context)
    throws IOException, InterruptedException {
    items = value.toString().split(fieldDelimRegex);
    if (items.length >= (skipFieldCount + 2)) {
    for (int i = skipFieldCount + 1; i < items.length; ++i) {
    outKey.initialize();
    outKey.add(items[i-1], items[i]);
    context.write(outKey, outVal);
    }
    }
    }

    My question is how you process this a set of customers?

    For example, if your input value is ",SL,SG,LL” then you generate
    (I added line numbers for clarifications):

    1: ((customerID, SL), 1)
    2: ((SL,SG), 1)
    3: ((SG,LL), 1)

    How do you capture customerID for 2: and 3:?

    Thank you.
    Mahmoud

    • Pranab says:

      Mahmoud
      Machine leaning based solutions consist of two distinct phases : 1) training using historical to build a model and 2) using the model to make prediction for new data.

      The class MarkovStateTransitionModel does the training. Customer ID is irrelevant for building the model. All we care about is the sequence of transactions.

      Customer ID enters the picture when we make prediction based on the model from step 1 and as described in the section “Predicting Time of Future Transaction”. You could also follow the link for the tutorial for more details.

      • Mahmoud Parsian says:

        So, I am assuming that the input to MarkovStateTransitionModel will be
        state1,state2, state3, …

        Thanks for clarification of input.
        best,
        Mahmoud

      • Pranab says:

        It works either way. The point is that customer ID is not used in building the model. If it’s there in the input it is ignored by appropriately setting the config param skip.field.count

  3. Mahmoud Parsian says:

    input value is” “CustomeID,SL,SG,LL”

  4. Mahmoud Parsian says:

    Thank you!

  5. Pingback: Markov Model and Trading History | Tales from a Trading Desk

Leave a comment