Customer loyalty is the strength of the relationship between a customer and a business, manifested in the customer purchasing more and at higher frequency. Various signals or events relate to a customer's engagement with a business; some examples are transactions, customer service calls and social media comments. These events are indicative of a customer's loyalty to the business. Loyalty itself is an internal state that cannot be directly observed or measured, but it can be inferred probabilistically.
The customer events over a time window reflect a corresponding sequence of internal loyalty states. The theme of this post is predicting that sequence of hidden loyalty states using a Hidden Markov Model (HMM), a model for probabilistic sequence analysis. The Hadoop based implementation is part of my open source project avenir on github.
Results of such analysis can provide valuable and actionable insights in many ways. They allow identification of customers on a downward loyalty trajectory, so that proactive action can be taken to resurrect a fading relationship with the business. This is particularly important for high value customers.
It can also be used to measure the effectiveness of a marketing campaign. Following the campaign, loyalty trajectories can be analyzed over a span of time to see whether there is a significant upward trend in general.
Hidden Markov Model
A Hidden Markov Model consists of the following components, most of which are sets of probabilities.
- Set of hidden states (m)
- Set of events or observations (n)
- Hidden state transition matrix (m x m)
- State to event probability matrix (m x n)
- Initial state probability vector (m)
Although used primarily in natural language processing, this technique is applicable to any problem that deals with sequential data.
For our problem, events are related to customer transactions. Each event is a function of two attributes: the time elapsed since the last transaction, which is small(S), medium(M) or large(L), and the amount spent compared to the amount spent in the last transaction, which is less(L), about the same(S) or more(M).
So there are 9 possible combinations, resulting in 9 events. For example, the event SM denotes a transaction where the time elapsed since the last transaction is small and the customer spent more than last time.
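Concretely, a raw transaction can be mapped to one of these 9 event codes with a couple of threshold checks. Here is a minimal Python sketch; the elapsed-day thresholds and spend ratios below are made-up assumptions for illustration, not part of the model.

```python
# Sketch: mapping a transaction to one of the 9 event codes.
# The day thresholds and spend ratios are hypothetical; a real
# deployment would calibrate them from the transaction data.

def event_code(days_since_last, amount, last_amount):
    # time elapsed since last transaction: small(S), medium(M), large(L)
    if days_since_last <= 15:
        time_code = "S"
    elif days_since_last <= 45:
        time_code = "M"
    else:
        time_code = "L"
    # amount relative to last transaction: less(L), about same(S), more(M)
    ratio = amount / last_amount
    if ratio < 0.9:
        spend_code = "L"
    elif ratio <= 1.1:
        spend_code = "S"
    else:
        spend_code = "M"
    return time_code + spend_code

print(event_code(10, 120.0, 100.0))  # quick repeat purchase, spent more -> "SM"
```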
There are 3 states corresponding to 3 levels of loyalty: low(L), normal(N), high(H). Having 3 hidden states is one of the assumptions in my model. Here is the complete HMM model.
L,N,H
SL,SS,SM,ML,MS,MM,LL,LS,LM
.30,.45,.25
.35,.40,.25
.25,.35,.40
.08,.05,.01,.15,.12,.07,.21,.17,.14
.10,.09,.08,.17,.15,.12,.11,.10,.08
.13,.18,.21,.08,.12,.14,.03,.04,.07
.38,.36,.26
Here is a dissection of the model defined above. The model is stored in an HDFS file. The mappers load and process the model data on start up.
- The first line defines the 3 hidden states
- The second line defines the 9 events or observations
- The next 3 lines define the 3 x 3 state transition probability matrix
- The next 3 lines define the 3 x 9 state event probability matrix
- The last line defines the initial state probability vector
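As a sketch of what the mappers do on start up, here is how the 9-line model file could be parsed into plain Python structures. The actual loading code in avenir is Java; the function below is only illustrative.

```python
# Sketch: parsing the 9-line HMM model file into Python structures.
# In avenir the mappers load this from HDFS; here it is plain Python.

def load_model(lines):
    states = lines[0].split(",")                            # hidden states
    events = lines[1].split(",")                            # observable events
    m = len(states)
    to_floats = lambda s: [float(x) for x in s.split(",")]
    trans = [to_floats(l) for l in lines[2:2 + m]]          # m x m transition matrix
    emit = [to_floats(l) for l in lines[2 + m:2 + 2 * m]]   # m x n state-event matrix
    init = to_floats(lines[2 + 2 * m])                      # initial state probabilities
    return states, events, trans, emit, init

model_text = """L,N,H
SL,SS,SM,ML,MS,MM,LL,LS,LM
.30,.45,.25
.35,.40,.25
.25,.35,.40
.08,.05,.01,.15,.12,.07,.21,.17,.14
.10,.09,.08,.17,.15,.12,.11,.10,.08
.13,.18,.21,.08,.12,.14,.03,.04,.07
.38,.36,.26"""

states, events, trans, emit, init = load_model(model_text.splitlines())
print(states, len(events), init)  # ['L', 'N', 'H'] 9 [0.38, 0.36, 0.26]
```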
You might be wondering how I came up with this model. Well, I just made it up, trying to make it as realistic as possible. Although a much harder problem, the model can also be learned from training data using the Baum Welch algorithm. That will be the topic of a future post, once I have implemented it. For now, let's assume we have the model.
Customer Loyalty State Sequence
Our goal is to predict the loyalty state sequence for each customer, given that customer's sequence of events. This is achieved with a dynamic programming algorithm called the Viterbi algorithm.
Given a sequence of events and an HMM model, the Viterbi algorithm finds the sequence of internal states that has the highest probability of generating the event sequence. It builds the highest probability path recursively, keeping track at each step of the predecessor state that produced the highest probability.
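To make the recursion concrete, here is a minimal Python sketch of Viterbi decoding against the loyalty model defined earlier; avenir's actual implementation is a Java MR class. Log probabilities are used to avoid numerical underflow on long event sequences.

```python
import math

# Sketch: Viterbi decoding on the loyalty HMM defined earlier.
states = ["L", "N", "H"]
events = ["SL","SS","SM","ML","MS","MM","LL","LS","LM"]
trans = [[.30,.45,.25],[.35,.40,.25],[.25,.35,.40]]
emit = [[.08,.05,.01,.15,.12,.07,.21,.17,.14],
        [.10,.09,.08,.17,.15,.12,.11,.10,.08],
        [.13,.18,.21,.08,.12,.14,.03,.04,.07]]
init = [.38,.36,.26]

def viterbi(obs):
    obs = [events.index(o) for o in obs]
    m = len(states)
    # initialization: probability of starting in state i and emitting obs[0]
    prob = [math.log(init[i]) + math.log(emit[i][obs[0]]) for i in range(m)]
    back = []
    for o in obs[1:]:
        ptrs, new_prob = [], []
        for j in range(m):
            # best predecessor state for state j at this step
            best_i = max(range(m), key=lambda i: prob[i] + math.log(trans[i][j]))
            ptrs.append(best_i)
            new_prob.append(prob[best_i] + math.log(trans[best_i][j]) + math.log(emit[j][o]))
        back.append(ptrs)
        prob = new_prob
    # backtrack from the most probable final state
    path = [max(range(m), key=lambda i: prob[i])]
    for ptrs in reversed(back):
        path.append(ptrs[path[-1]])
    return [states[i] for i in reversed(path)]

print(viterbi(["SM", "SM", "LL"]))
```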
Here is some sample transaction event data for 3 customers. The first column is the customer ID and the rest of the line is the sequence of events. Such data can easily be derived from raw transaction data by grouping by customer ID and ordering by time stamp. The MR class Projection in my project chombo can do the job.
UQV11DCJ0J,SL,SM,SS,SS,SM,LS,SL,LM,MM,MS
2NNTP1TGYW,LM,LL,LS,MM,LM,LL,LS,SM,SM,SS,SL,LL,SM,SS,SS,SS,SM,SS,SL,SM,SS,SS,SL,SL,ML
4T9QFUXK3R,MS,LS,LL,LL,LL,LS,LL,LL,LL,SL,SL,SS,SS,SS,LS,SL,LL,LL,LL,SS,LS,LS,MM,ML,MS,ML,MM,MS,ML,LM,SS,LL,LS,LS,LL
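For illustration, the grouping and ordering that chombo's Projection class performs on Hadoop can be sketched in a few lines of Python; the (customer ID, timestamp, event) record layout below is a made-up example.

```python
from collections import defaultdict

# Hypothetical raw event records: (customer ID, time stamp, event code).
raw = [
    ("UQV11DCJ0J", 3, "SM"),
    ("2NNTP1TGYW", 1, "LM"),
    ("UQV11DCJ0J", 1, "SL"),
    ("2NNTP1TGYW", 2, "LL"),
    ("UQV11DCJ0J", 2, "SM"),
]

# group by customer ID
by_cust = defaultdict(list)
for cust_id, ts, event in raw:
    by_cust[cust_id].append((ts, event))

# order each customer's events by time stamp
sequences = {c: [e for _, e in sorted(recs)] for c, recs in by_cust.items()}

# emit in the same CSV layout as the sample input above
for cust_id in sorted(sequences):
    print(cust_id + "," + ",".join(sequences[cust_id]))
```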
The Viterbi algorithm is implemented in the MR class ViterbiStatePredictor. It's a mostly CPU bound process. All processing is done by the mappers; there is no reducer in the implementation. Here is some sample output for the same 3 customers.
UQV11DCJ0J,H,H,H,H,H,L,N,L,N,N
2NNTP1TGYW,L,L,L,N,L,L,L,H,H,H,H,L,H,H,H,H,H,H,H,H,H,H,H,H,N
4T9QFUXK3R,N,L,L,L,L,N,L,L,L,H,H,H,H,H,L,N,L,L,L,N,L,L,N,N,N,N,N,N,N,L,N,L,L,N,L
We can make some interesting observations from the output for the 3 customers. The first customer starts with high loyalty and then takes a downward trajectory. The second customer starts with low loyalty, takes an upward trajectory to a high level and stays there. The last one starts between low and normal, swings to a high level and then comes down to a mostly normal level.
In my example only transaction related events have been used. As I mentioned, other significant events could be used as well, such as customer service calls or emails rated for negative tone, or sentiment expressed in social media comments.
If you are interested in trying out this example, here is a tutorial document with step by step instructions.
The chapter on Hidden Markov Model in the book Introduction to Machine Learning is good reference material.