Insights gained from analyzing mobile phone usage data can be extremely valuable for marketing campaigns and customer engagement efforts. For example, the hour of the day when a user engages most with his or her mobile device could be used to choose the time to send a marketing message or email. The most frequent cell tower locations could be used for promotional efforts by nearby businesses.
In this post, we will go over a Spark based implementation of histograms and other simple statistics for mobile phone usage data. The solution is available in my open source project chombo.
Marketing campaigns involve various parameters, which are tuned to optimize campaign performance. There are various Machine Learning algorithms for optimizing marketing campaigns, generally known as Multi-Armed Bandit (MAB) algorithms. My earlier post on Reinforcement Learning also provides details on these learning algorithms.
These learning algorithms continuously learn the optimum values of the parameters that will result in the most effective campaign. Marketing campaign attributes fall under the following 3 categories.
- Audience e.g., previous purchase
- Offer e.g., price range
- Tactical e.g., timing
Hour of usage can be considered a tactical attribute in a marketing campaign. Other tactical attributes could be the creative and the communication channel. For a brick and mortar business, location could be another tactical attribute.
We have conveniently assumed that such usage data is available from the service provider. If that is not the case, MAB algorithms can still learn the optimum values for the hour of the day.
Our hypothesis is that a user is most likely to engage when a marketing message is sent during the hours when the user is most active with his or her device. If that turns out to be true, the MAB algorithm will find the same hours; otherwise it will not.
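To make the bandit idea concrete, here is a minimal epsilon-greedy sketch in Python, where each candidate send hour is an arm and the reward is whether the user engaged with the message. The class name and parameters are assumptions for this illustration; this is not the chombo implementation.

```python
import random

# Illustrative epsilon-greedy bandit: each candidate send hour is an arm,
# reward is 1 if the user engaged with the message, 0 otherwise.
class EpsilonGreedyHourSelector:
    def __init__(self, hours, epsilon=0.1):
        self.hours = list(hours)
        self.epsilon = epsilon
        self.counts = {h: 0 for h in self.hours}
        self.rewards = {h: 0.0 for h in self.hours}

    def select_hour(self):
        # explore with probability epsilon, otherwise exploit the best hour so far
        if random.random() < self.epsilon:
            return random.choice(self.hours)
        return max(self.hours,
                   key=lambda h: self.rewards[h] / self.counts[h] if self.counts[h] else 0.0)

    def update(self, hour, engaged):
        # record the outcome of one message sent at the given hour
        self.counts[hour] += 1
        self.rewards[hour] += 1.0 if engaged else 0.0
```

Over many send/observe cycles, the average reward per hour converges toward each hour's true engagement rate, so the exploit branch increasingly picks the user's best hour.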
Mobile Usage Data
The input data consists of only 3 fields, listed below. In this post we will analyze the hour of the day, treated as numerical data. It’s assumed that another process has converted the timestamp data to hour of the day.
- Phone number
- Cell tower location
- Hour of the day
Each record in the data represents some active user interaction with the device, whether it’s checking email, checking messages, browsing social media posts or interacting with any other application. Here are some sample input records.
(620)937 2153,36.845:-121.851,15
(941)597 8362,37.733:-122.446,11
(510)490 0339,37.733:-122.446,12
(470)457 8795,36.845:-121.851,9
(703)971 8771,37.733:-122.446,15
(614)134 0184,38.931:-123.958,17
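A record in this format can be unpacked with a few lines of code. The Python sketch below splits out the three fields; the layout is taken from the sample records above, and the function name is my own.

```python
# Parse one input record of the form "phone,lat:long,hour"
# (a sketch; the real chombo jobs read fields from configurable columns)
def parse_usage_record(line):
    phone, location, hour = line.strip().split(",")
    lat, lon = (float(x) for x in location.split(":"))
    return phone, (lat, lon), int(hour)

record = parse_usage_record("(620)937 2153,36.845:-121.851,15")
```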
Spark Analysis for Hour of Usage
The Spark job implementation is in the Scala object NumericalAttrDistrStats. The analysis for any numerical attribute consists of a histogram and, optionally, several other statistical quantities. Here is the complete list.
- Mean
- Median
- Std deviation
- Mode
- Quarter percentile
- Half percentile
- Three quarter percentile
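For a single user’s list of hour samples, these quantities can be sketched in plain Python as below. This is only an illustration of the statistics involved, not the NumericalAttrDistrStats implementation, which computes them per key in a distributed Spark job; the percentiles here use a simple sorted-index approximation.

```python
from collections import Counter
import math

# Sketch of per-user statistics from a plain list of hour-of-day samples
# (illustrative only; function name and percentile method are my own choices)
def hour_stats(hours):
    n = len(hours)
    counts = Counter(hours)
    histogram = {h: c / n for h, c in counts.items()}   # normalized histogram
    mean = sum(hours) / n
    std_dev = math.sqrt(sum((h - mean) ** 2 for h in hours) / n)
    mode = max(counts, key=counts.get)
    s = sorted(hours)
    # crude quarter / half / three-quarter percentiles by sorted index
    percentiles = (s[n // 4], s[n // 2], s[3 * n // 4])
    return histogram, mean, std_dev, mode, percentiles
```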
Here is some sample output. Each record starts with the phone number, which is the ID, followed by the column index of the column containing the numerical data to be analyzed, the number of bins in the histogram, the actual histogram data as bin value and probability pairs, and ends with the statistical quantities listed above.
((615)906 1966,2,9,15.000,0.352,13.000,0.076,7.000,0.019,21.000,0.048,11.000,0.038,17.000,0.305,9.000,0.029,19.000,0.124,5.000,0.010,15.152,15.838,2.855,15.000,14.432,15.838,17.438)
((206)276 1810,2,8,15.000,0.047,13.000,0.037,7.000,0.047,21.000,0.028,11.000,0.393,17.000,0.028,9.000,0.393,19.000,0.028,10.449,10.286,2.999,9.000,9.000,10.286,11.571)
((385)530 5055,2,9,15.000,0.333,13.000,0.118,7.000,0.010,21.000,0.020,11.000,0.010,17.000,0.392,9.000,0.039,19.000,0.069,5.000,0.010,15.059,15.882,2.512,17.000,14.353,15.882,17.150)
((336)214 3490,2,9,15.000,0.049,13.000,0.029,7.000,0.029,21.000,0.010,11.000,0.422,17.000,0.029,9.000,0.392,19.000,0.029,5.000,0.010,10.294,10.326,2.652,11.000,9.050,10.326,11.488)
((540)663 4827,2,8,15.000,0.330,13.000,0.125,21.000,0.023,11.000,0.011,17.000,0.375,9.000,0.011,19.000,0.114,5.000,0.011,15.341,16.061,2.325,17.000,14.552,16.061,17.394)
((703)167 4089,2,8,15.000,0.031,13.000,0.031,7.000,0.041,21.000,0.020,11.000,0.408,17.000,0.041,9.000,0.418,19.000,0.010,10.235,10.200,2.559,9.000,8.976,10.200,11.400)
((614)903 1205,2,9,15.000,0.459,13.000,0.064,7.000,0.018,21.000,0.028,11.000,0.009,17.000,0.294,9.000,0.009,19.000,0.092,5.000,0.028,14.972,15.600,2.885,15.000,14.520,15.600,17.063)
((760)926 3666,2,9,15.000,0.039,13.000,0.039,7.000,0.029,21.000,0.010,11.000,0.333,17.000,0.020,9.000,0.480,19.000,0.039,5.000,0.010,10.137,9.918,2.683,9.000,8.857,9.918,11.353)
((206)249 3920,2,9,15.000,0.031,13.000,0.020,7.000,0.031,21.000,0.031,11.000,0.378,17.000,0.020,9.000,0.408,19.000,0.041,5.000,0.041,10.245,10.108,3.207,9.000,8.850,10.108,11.405)
((615)931 5188,2,9,15.000,0.038,13.000,0.038,7.000,0.048,21.000,0.010,11.000,0.448,17.000,0.029,9.000,0.305,19.000,0.048,5.000,0.038,10.324,10.468,2.971,11.000,9.063,10.468,11.574)
((408)784 8679,2,9,15.000,0.017,13.000,0.043,7.000,0.069,21.000,0.026,11.000,0.405,17.000,0.017,9.000,0.388,19.000,0.017,5.000,0.017,9.948,10.128,2.803,11.000,8.844,10.128,11.362)
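Downstream consumers can unpack these output records programmatically. The Python sketch below assumes the layout described above, i.e. ID, column index, bin count, bin value and probability pairs, then the trailing statistics; the function name is hypothetical.

```python
# Parse one analysis output record of the form
# "(phone,columnIndex,numBins,bin,prob,bin,prob,...,stat1,...,statN)"
# (a sketch based on the sample output format, not a chombo API)
def parse_output_record(rec):
    fields = rec[1:-1].split(",")       # strip the outer parentheses
    phone = fields[0]
    col_index, num_bins = int(fields[1]), int(fields[2])
    pairs = fields[3:3 + 2 * num_bins]
    histogram = {float(pairs[i]): float(pairs[i + 1])
                 for i in range(0, len(pairs), 2)}
    stats = [float(x) for x in fields[3 + 2 * num_bins:]]
    return phone, col_index, histogram, stats
```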
Dissecting Hour of Day Result
Let’s take a look at the analysis result for the user with the phone number (614)903 1205 as an example. Here is the normalized histogram, i.e. the engagement probability distribution over hour of day, for this user.
| Hour of the day | Probability of interaction |
| --- | --- |
| 5 | 0.028 |
| 7 | 0.018 |
| 9 | 0.009 |
| 11 | 0.009 |
| 13 | 0.064 |
| 15 | 0.459 |
| 17 | 0.294 |
| 19 | 0.092 |
| 21 | 0.028 |
As we can see, this user is most active with his or her device between the hours of 15 and 17. This time window might be a good time to send messages or emails to the user for a marketing campaign.
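Given a normalized histogram like this, choosing a send window reduces to picking the highest-probability hours. A small illustrative Python helper (the function name and top-k cutoff are my own choices), using the distribution of this user:

```python
# Pick the top engagement hours from a user's normalized hour histogram,
# e.g. to choose a send window for a campaign message (illustrative helper)
def best_send_hours(histogram, top_k=2):
    return sorted(histogram, key=histogram.get, reverse=True)[:top_k]

# engagement distribution for user (614)903 1205 from the sample output
dist = {5: 0.028, 7: 0.018, 9: 0.009, 11: 0.009, 13: 0.064,
        15: 0.459, 17: 0.294, 19: 0.092, 21: 0.028}
```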
Here are the other statistical quantities listed above for this particular user. There is nothing surprising here. The half percentile is the same as the median.
| Statistic | Value |
| --- | --- |
| Mean | 14.972 |
| Median | 15.600 |
| Std deviation | 2.885 |
| Mode | 15.000 |
| Quarter percentile | 14.520 |
| Half percentile | 15.600 |
| Three quarter percentile | 17.063 |
The histogram for location data can also be computed, with the Spark implementation in the Scala object CategoricalAttrDistrStats. Location, which is a pair of latitude and longitude, is treated as a categorical attribute. Location analysis results can be used for location based marketing, among other things.
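The categorical counting such a job performs per user can be sketched in plain Python as below. This is only an illustration of the idea, not the CategoricalAttrDistrStats code, which runs per key on Spark; the function name is my own.

```python
from collections import Counter

# Location treated as a categorical attribute: build a normalized histogram
# of cell tower locations for one user (illustrative sketch)
def location_histogram(locations):
    counts = Counter(locations)
    total = sum(counts.values())
    return {loc: c / total for loc, c in counts.items()}
```

The most frequent location for a user then simply falls out as the histogram key with the highest probability, which is the input needed for promotions by nearby businesses.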
For commercial support for this solution or other solutions in my GitHub repositories, please contact ThirdEye Data Science Services. Support is available for Hadoop or Spark deployment on the cloud, including installation, configuration and testing.