Category Archives: Statistics

Synthetic Training Data Generation for Machine Learning Classification Problems using Ancestral Sampling

Lack of access to a good training data set is a serious impediment to building supervised Machine Learning models. Such data is scarce and, when available, the quality of the data set may be questionable. Even if a good quality data set is available, … Continue reading

Posted in Python, Statistics, Supervised Learning

Normal Distribution Fitness Test with Chi Square on Spark

Many Machine Learning models are based on certain assumptions made about the data. For example, in Z-score based anomaly detection, it is assumed that the data has a normal distribution. Your Machine Learning model will be as good as how those … Continue reading

Posted in Anomaly Detection, Big Data, Data Science, Spark, Statistics

Time Series Seasonal Cycle Detection with Auto Correlation on Spark

There are many benefits of auto correlation analysis on time series data, as we will discuss in detail later. It allows us to gain important insights into the nature of the time series data. Cycle detection is one … Continue reading

Posted in Big Data, Correlation, Spark, Statistics, Time Series Analytic | 3 Comments

Mobile Phone Usage Data Analytics for Effective Marketing Campaign

Insights gained from analyzing mobile phone usage data can be extremely valuable in marketing campaigns and customer engagement efforts. For example, the hour of the day when a user engages most with his or her mobile device could be used to … Continue reading

Posted in Big Data, Data Profiling, Marketing Analytic, Spark, Statistics

Customer Lifetime Value, Present and Future

Customer lifetime value for a business is the monetary value associated with the relationship with a customer, although there have been attempts to include non-monetary value associated with a customer. It’s an important metric to have for any marketing initiative … Continue reading

Posted in Big Data, Data Science, Hadoop and Map Reduce, Marketing Analytic, Statistics

Operational Analytics with Seasonal Data

Time sequence data, which is all around us, may contain seasonal components. Data is seasonal when it contains a seasonal component, e.g., month of the year, day of the week, or hour of the weekday. It is defined … Continue reading

Posted in Big Data, Statistics, Time Series Analytic | 2 Comments

Customer Conversion Prediction with Markov Chain Classifier

For online users, conversion generally refers to the user action that results in some tangible gain for a business, e.g., a user opening an account or a user making his or her first purchase. Next to drawing a large number … Continue reading

Posted in Big Data, Data Science, Hadoop and Map Reduce, Machine Learning, Marketing Analytic, Predictive Analytic, Statistics | 21 Comments