Category Archives: Statistics

Model Drift Detection with Kolmogorov Smirnov Statistic on Spark

In the retail business, you may be using various business solutions based on product demand data, e.g., inventory management or tracking how a newly introduced product performs over time. The buying behavior model may change with time, rendering those … Continue reading

Posted in Data Science, Machine Learning, Spark, Statistics

Synthetic Training Data Generation for Machine Learning Classification Problems using Ancestral Sampling

Lack of access to a good training data set is a serious impediment to building supervised Machine Learning models. Such data is scarce and, when available, the quality of the data set may be questionable. Even if a good quality data set is available, … Continue reading

Posted in Python, Statistics, Supervised Learning

Normal Distribution Fitness Test with Chi Square on Spark

Many Machine Learning models are based on certain assumptions made about the data. For example, in Z-score based anomaly detection, it is assumed that the data has a normal distribution. Your Machine Learning model will be as good as how those … Continue reading

Posted in Anomaly Detection, Big Data, Data Science, Spark, Statistics

Time Series Seasonal Cycle Detection with Auto Correlation on Spark

There are many benefits of auto correlation analysis on time series data, as we will discuss in detail later. It allows us to gain important insights into the nature of the time series data. Cycle detection is one … Continue reading

Posted in Big Data, Correlation, Spark, Statistics, Time Series Analytic

Mobile Phone Usage Data Analytics for Effective Marketing Campaign

Insights gained from analyzing mobile phone usage data can be extremely valuable in marketing campaigns and customer engagement efforts. For example, the hour of the day when a user engages most with his or her mobile device could be used to … Continue reading

Posted in Big Data, Data Profiling, Marketing Analytic, Spark, Statistics

Customer Lifetime Value, Present and Future

Customer lifetime value for a business is the monetary value associated with its relationship with a customer, although there have been attempts to include the non-monetary value associated with a customer. It’s an important metric to have for any marketing initiative … Continue reading

Posted in Big Data, Data Science, Hadoop and Map Reduce, Marketing Analytic, Statistics

Operational Analytics with Seasonal Data

Time sequence data, which is all around us, may contain seasonal components. Data is seasonal when it contains a seasonal component, e.g., month of the year, day of the week, or hour of the weekday. It is defined … Continue reading

Posted in Big Data, Statistics, Time Series Analytic