Mobile Phone Usage Data Analytics for Effective Marketing Campaigns

Insights gained from analyzing mobile phone usage data can be extremely valuable for marketing campaigns and customer engagement efforts. For example, the hour of the day when a user engages most with his or her mobile device could be used to choose the time to send a marketing message or email. The most frequent cell tower locations could be used for promotional efforts by nearby businesses.

In this post, we will go over a Spark-based implementation of histograms and other simple statistics for mobile phone usage data. The solution is available in my open source project chombo.

Marketing Campaign

Marketing campaigns involve various parameters, which are tuned to optimize campaign performance. There are various Machine Learning algorithms for optimizing marketing campaigns, generally known as Multi-Armed Bandit (MAB) algorithms. My earlier post on Reinforcement Learning also provides details on these learning algorithms.

These learning algorithms continuously learn the parameter values that result in the most effective campaign. Marketing campaign attributes fall into the following 3 categories.

  1. Audience e.g., previous purchase 
  2. Offer e.g., price range
  3. Tactical e.g., timing

Hour of usage can be considered a tactical attribute in a marketing campaign. Other tactical attributes could be the creative and the communication channel. For a brick and mortar business, location could be another tactical attribute.

We have conveniently assumed that such usage data is available from the service provider. If it is not, MAB algorithms can still learn the optimum values for the hour of day.

Our hypothesis is that a user is most likely to engage when a marketing message is sent during the hours when he or she is most active with the device. If that turns out to be true, the MAB algorithm will converge on those same hours; otherwise, it will settle on different ones.
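The post does not say which MAB algorithm to use. As a hedged illustration only, here is a minimal epsilon-greedy sketch in which each arm is a candidate send hour and the reward is 1 when the user engages with the message and 0 otherwise. The class name, the candidate hours and the simulated engagement rates are all hypothetical.

```python
import random


class EpsilonGreedyHourSelector:
    """Hypothetical epsilon-greedy bandit whose arms are candidate send hours."""

    def __init__(self, hours, epsilon=0.1, seed=None):
        self.hours = list(hours)
        self.epsilon = epsilon
        self.counts = {h: 0 for h in self.hours}    # times each hour was tried
        self.values = {h: 0.0 for h in self.hours}  # running mean reward per hour
        self.rng = random.Random(seed)

    def select_hour(self):
        # Explore a random hour with probability epsilon, else exploit the best so far.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.hours)
        return max(self.hours, key=lambda h: self.values[h])

    def update(self, hour, reward):
        # Incremental mean update; reward is 1 if the user engaged, 0 otherwise.
        self.counts[hour] += 1
        self.values[hour] += (reward - self.values[hour]) / self.counts[hour]


if __name__ == "__main__":
    # Hypothetical true engagement rates per send hour, unknown to the bandit.
    true_rates = {9: 0.02, 13: 0.05, 15: 0.30, 17: 0.20, 19: 0.08}
    env = random.Random(42)
    selector = EpsilonGreedyHourSelector(true_rates, epsilon=0.1, seed=7)
    for _ in range(5000):
        hour = selector.select_hour()
        reward = 1 if env.random() < true_rates[hour] else 0
        selector.update(hour, reward)
    best = max(selector.hours, key=lambda h: selector.values[h])
    print("best hour:", best, "estimated rate:", round(selector.values[best], 3))
```

If the hypothesis holds, the hours this kind of learner favors should line up with the high-probability hours in the usage histogram computed later in the post.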

Mobile Usage Data

The input data consists of only 3 fields, listed below. In this post we will analyze the hour of the day field, treated as numerical data. It's assumed that another process has already converted the timestamp to hour of the day.

  1. Phone number
  2. Cell tower location
  3. Hour of the day
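The timestamp-to-hour conversion is done by a separate process that the post does not describe. A minimal sketch, assuming Unix timestamps in seconds and a single fixed UTC offset (the default of -8 hours and the function names are hypothetical), could look like this:

```python
from datetime import datetime, timedelta, timezone


def hour_of_day(epoch_seconds, utc_offset_hours=-8):
    """Map a Unix timestamp (seconds) to the local hour of day, 0-23.

    The fixed UTC offset is a simplifying assumption; real data would need
    each user's time zone, including daylight saving time.
    """
    tz = timezone(timedelta(hours=utc_offset_hours))
    return datetime.fromtimestamp(epoch_seconds, tz=tz).hour


def to_usage_line(phone, location, epoch_seconds, utc_offset_hours=-8):
    # Emit "phone,lat:lon,hour", the same layout as the sample input in this post.
    return f"{phone},{location},{hour_of_day(epoch_seconds, utc_offset_hours)}"
```

Using local rather than UTC hours matters here, since the whole point is to reach users at the times of day they are actually active.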

Each record in the data represents some active user interaction with the device, whether it's checking email, checking messages, browsing social media posts or interacting with any other application. Here is some sample input:

(620)937 2153,36.845:-121.851,15
(941)597 8362,37.733:-122.446,11
(510)490 0339,37.733:-122.446,12
(470)457 8795,36.845:-121.851,9
(703)971 8771,37.733:-122.446,15
(614)134 0184,38.931:-123.958,17
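To make the record layout concrete, here is a small parsing sketch. It splits each line on commas, splits the location on the colon into latitude and longitude, and groups hours by phone number, which serves as the ID for the per-user statistics. The `UsageRecord` type and the function names are hypothetical, not part of chombo.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class UsageRecord:
    phone: str
    latitude: float
    longitude: float
    hour: int


def parse_usage_line(line):
    """Parse a 'phone,lat:lon,hour' record such as the samples above."""
    phone, location, hour = line.strip().split(",")
    lat, lon = location.split(":")
    return UsageRecord(phone, float(lat), float(lon), int(hour))


def hours_by_phone(records):
    """Group hour-of-day values by phone number, the ID for per-user stats."""
    grouped = defaultdict(list)
    for record in records:
        grouped[record.phone].append(record.hour)
    return dict(grouped)


sample = [
    "(620)937 2153,36.845:-121.851,15",
    "(941)597 8362,37.733:-122.446,11",
    "(510)490 0339,37.733:-122.446,12",
]
records = [parse_usage_line(line) for line in sample]
print(hours_by_phone(records))
```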

Spark Analysis for Hour of Usage

The Spark job implementation is in the Scala object NumericalAttrDistrStats. The analysis of any numerical attribute consists of a histogram and, optionally, several other statistical quantities. Here is the complete list.

  1. Histogram
  2. Mean
  3. Median
  4. Std deviation
  5. Mode
  6. Quarter percentile (25th percentile)
  7. Half percentile (50th percentile)
  8. Three quarter percentile (75th percentile)
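Here is a hedged Python sketch of how these quantities could be computed for one user's hours. It is not chombo's implementation. The bin width of 2, with bins labeled by their midpoint, is an assumption loosely suggested by the odd bin values 5, 7, ..., 21 in the sample output. Linear interpolation of percentiles within a bin is likewise an assumption. The mean and standard deviation come from the raw values.

```python
import math
from collections import Counter


def hour_stats(hours, bin_width=2):
    """Histogram and summary statistics for one user's hour-of-day values.

    Assumptions (not taken from chombo): bins are [k*w, (k+1)*w) and are
    labeled by their midpoint; percentiles are linearly interpolated inside
    the bin that contains the requested rank.
    """
    n = len(hours)
    counts = Counter(int(h // bin_width) for h in hours)  # bin index -> count
    bins = sorted(counts)
    # Normalized histogram keyed by bin midpoint, e.g. 15.0 for hours 14 and 15.
    histogram = {b * bin_width + bin_width / 2: counts[b] / n for b in bins}

    def percentile(q):
        # q in (0, 1]; walk the bins until the cumulative count reaches n * q.
        target, cum = n * q, 0
        for b in bins:
            if cum + counts[b] >= target:
                return b * bin_width + bin_width * (target - cum) / counts[b]
            cum += counts[b]

    mean = sum(hours) / n
    std_dev = math.sqrt(sum((h - mean) ** 2 for h in hours) / n)  # population std dev
    return {
        "histogram": histogram,
        "mean": mean,
        "median": percentile(0.5),
        "std_dev": std_dev,
        "mode": max(histogram, key=histogram.get),  # midpoint of the fullest bin
        "quarter_percentile": percentile(0.25),
        "half_percentile": percentile(0.5),
        "three_quarter_percentile": percentile(0.75),
    }


if __name__ == "__main__":
    # Hypothetical hours for a single user.
    for name, value in hour_stats([13, 14, 15, 15, 16, 17, 17, 19]).items():
        print(name, value)
```

Because the median and half percentile are computed the same way, they are identical by construction. Interpolating within bins is also why percentiles of integer hours can come out as non-integers.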

Here is some sample output. Each record starts with the phone number, which is the ID. It is followed by the index of the column containing the numerical data being analyzed, then the number of histogram bins, then the histogram data as (bin value, probability) pairs, and finally the statistical quantities listed above.

((615)906 1966,2,9,15.000,0.352,13.000,0.076,7.000,0.019,21.000,0.048,11.000,0.038,17.000,0.305,9.000,0.029,19.000,0.124,5.000,0.010,15.152,15.838,2.855,15.000,14.432,15.838,17.438)
((206)276 1810,2,8,15.000,0.047,13.000,0.037,7.000,0.047,21.000,0.028,11.000,0.393,17.000,0.028,9.000,0.393,19.000,0.028,10.449,10.286,2.999,9.000,9.000,10.286,11.571)
((385)530 5055,2,9,15.000,0.333,13.000,0.118,7.000,0.010,21.000,0.020,11.000,0.010,17.000,0.392,9.000,0.039,19.000,0.069,5.000,0.010,15.059,15.882,2.512,17.000,14.353,15.882,17.150)
((336)214 3490,2,9,15.000,0.049,13.000,0.029,7.000,0.029,21.000,0.010,11.000,0.422,17.000,0.029,9.000,0.392,19.000,0.029,5.000,0.010,10.294,10.326,2.652,11.000,9.050,10.326,11.488)
((540)663 4827,2,8,15.000,0.330,13.000,0.125,21.000,0.023,11.000,0.011,17.000,0.375,9.000,0.011,19.000,0.114,5.000,0.011,15.341,16.061,2.325,17.000,14.552,16.061,17.394)
((703)167 4089,2,8,15.000,0.031,13.000,0.031,7.000,0.041,21.000,0.020,11.000,0.408,17.000,0.041,9.000,0.418,19.000,0.010,10.235,10.200,2.559,9.000,8.976,10.200,11.400)
((614)903 1205,2,9,15.000,0.459,13.000,0.064,7.000,0.018,21.000,0.028,11.000,0.009,17.000,0.294,9.000,0.009,19.000,0.092,5.000,0.028,14.972,15.600,2.885,15.000,14.520,15.600,17.063)
((760)926 3666,2,9,15.000,0.039,13.000,0.039,7.000,0.029,21.000,0.010,11.000,0.333,17.000,0.020,9.000,0.480,19.000,0.039,5.000,0.010,10.137,9.918,2.683,9.000,8.857,9.918,11.353)
((206)249 3920,2,9,15.000,0.031,13.000,0.020,7.000,0.031,21.000,0.031,11.000,0.378,17.000,0.020,9.000,0.408,19.000,0.041,5.000,0.041,10.245,10.108,3.207,9.000,8.850,10.108,11.405)
((615)931 5188,2,9,15.000,0.038,13.000,0.038,7.000,0.048,21.000,0.010,11.000,0.448,17.000,0.029,9.000,0.305,19.000,0.048,5.000,0.038,10.324,10.468,2.971,11.000,9.063,10.468,11.574)
((408)784 8679,2,9,15.000,0.017,13.000,0.043,7.000,0.069,21.000,0.026,11.000,0.405,17.000,0.017,9.000,0.388,19.000,0.017,5.000,0.017,9.948,10.128,2.803,11.000,8.844,10.128,11.362)
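The following sketch parses one such record into its parts. It relies only on the field order described above. The function and field names are hypothetical, and it assumes the outer parentheses are tuple formatting wrapped around each record.

```python
STAT_NAMES = ["mean", "median", "std_dev", "mode",
              "quarter_percentile", "half_percentile", "three_quarter_percentile"]


def parse_stats_record(line):
    """Split an output record into phone, column index, histogram and statistics."""
    body = line.strip()
    if body.startswith("(") and body.endswith(")"):
        body = body[1:-1]  # drop the parentheses around the whole record
    fields = body.split(",")
    phone, col_index, num_bins = fields[0], int(fields[1]), int(fields[2])
    # Next come num_bins (bin value, probability) pairs.
    pair_fields = fields[3:3 + 2 * num_bins]
    histogram = {float(pair_fields[i]): float(pair_fields[i + 1])
                 for i in range(0, len(pair_fields), 2)}
    # The remaining fields are the statistics, in the order listed in the post.
    stat_fields = fields[3 + 2 * num_bins:]
    if len(stat_fields) != len(STAT_NAMES):
        raise ValueError(f"expected {len(STAT_NAMES)} statistics, got {len(stat_fields)}")
    stats = dict(zip(STAT_NAMES, map(float, stat_fields)))
    return phone, col_index, histogram, stats
```

Reading the bin count from the record matters because the number of bins varies by user. The sample output contains records with both 8 and 9 bins.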

Dissecting Hour of Day Result

Let’s take a look at the analysis result for the user with the phone number (614)903 1205 as an example. Here is the normalized histogram i.e. engagement probability distribution of hour of day for this user.

Hour of day    Probability of interaction
5              0.028
7              0.018
9              0.009
11             0.009
13             0.064
15             0.459
17             0.294
19             0.092
21             0.028

As we can see, this user is most active with his or her device during hours 15 and 17, which together account for about 75% (0.459 + 0.294 = 0.753) of the interactions. This time window might be a good time to send marketing messages or emails to the user.
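One simple way to turn such a histogram into a send window is to take hours in decreasing order of probability until their combined probability reaches a chosen coverage. The sketch below does this. The `coverage` threshold of 0.7 and the function name are hypothetical choices, not part of chombo.

```python
def top_hours(histogram, coverage=0.7):
    """Return the fewest hours whose combined interaction probability reaches coverage."""
    chosen, total = [], 0.0
    # Greedily take the most probable hours first.
    for hour, prob in sorted(histogram.items(), key=lambda kv: kv[1], reverse=True):
        chosen.append(hour)
        total += prob
        if total >= coverage:
            break
    return sorted(chosen), total


# Histogram for (614)903 1205, copied from the table above.
user_hist = {5: 0.028, 7: 0.018, 9: 0.009, 11: 0.009, 13: 0.064,
             15: 0.459, 17: 0.294, 19: 0.092, 21: 0.028}
hours, covered = top_hours(user_hist)
print(hours, round(covered, 3))  # [15, 17] 0.753
```

A lower threshold would narrow the window to hour 15 alone. A higher one would pull in hour 19 as well.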

Here are the other statistical quantities listed above for this particular user. There is nothing surprising here; as expected, the half percentile is the same as the median.

Mean 14.972
Median 15.600
Std dev 2.885
Mode 15.000
Quarter percentile 14.520
Half percentile 15.600
Three quarter percentile 17.063

Wrapping Up

The histogram for location data can also be computed with the Spark implementation in the Scala object CategoricalAttrDistrStats. Location, which is a pair of latitude and longitude, is treated as a categorical attribute. Location analysis results can be used for location-based marketing, among other things.
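The idea of a categorical histogram can be sketched in a few lines of Python. Each distinct "lat:lon" string is a category, and its frequency per phone number is normalized into a probability. This is an illustrative stand-in, not the CategoricalAttrDistrStats implementation. The function name is hypothetical.

```python
from collections import Counter, defaultdict


def location_histograms(lines):
    """Per-phone normalized frequency of cell tower locations, treated as categories."""
    by_phone = defaultdict(Counter)
    for line in lines:
        phone, location, _hour = line.strip().split(",")
        by_phone[phone][location] += 1  # the 'lat:lon' string is the category value
    result = {}
    for phone, counts in by_phone.items():
        total = sum(counts.values())
        result[phone] = {loc: cnt / total for loc, cnt in counts.items()}
    return result
```

The most frequent tower for a user is then simply `max(hist, key=hist.get)` over that user's histogram. That location would be the natural candidate for promotions from nearby businesses.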

Hadoop Java MapReduce implementations of these Spark jobs are also available in chombo. A tutorial for executing the use case in this article is also available.


About Pranab

I am Pranab Ghosh, a software professional in the San Francisco Bay area. I manipulate bits and bytes for the good of living beings and the planet. I have worked with a myriad of technologies and platforms in various business domains for early stage startups, large corporations and anything in between. I am an active blogger and open source project owner. I am passionate about technology and green and sustainable living. My technical interest areas are Big Data, Distributed Processing, NOSQL databases, Machine Learning and Programming languages. I am fascinated by problems that don't have neat closed-form solutions.
This entry was posted in Big Data, Data Profiling, Marketing Analytic, Spark, Statistics.
