Category Archives: Data Science

Measuring Campaign Effectiveness for an Online Service on Spark

Measuring campaign effectiveness is critical for any company to justify its marketing spend. Consider a company providing a free online service on signup. It’s critical for the company to convert these free users so that they subscribe to a paid … Continue reading

Posted in Big Data, Data Science, Marketing Analytic, Spark

Processing Missing Values with Hadoop

Missing values are just part of life in the data processing world. In most cases you cannot simply ignore the missing values, as doing so may adversely affect whatever analytic processing you are going to do. Broadly speaking, handling missing … Continue reading

Posted in Big Data, Data Profiling, Data Science, ETL, Hadoop and Map Reduce

Project Assignment Optimization with Simulated Annealing on Spark

Optimizing the assignment of people to projects is a very complex problem, and classical optimization techniques are not very useful. The topic of this post is a project assignment optimization problem where people should be assigned to projects in a way that will … Continue reading

Posted in Data Science, Optimization, Spark

Mining Seasonal Products from Sales Data

The other day someone asked me how to include products with seasonal demand in recommendations based on collaborative filtering or some other technique. The solution to the problem involves two steps. The first step is to identify products with seasonal … Continue reading

Posted in Big Data, Data Mining, Data Science, eCommerce, Map Reduce, Recommendation Engine

Gaining Insight by Mining Simple Rules from Customer Service Call Data

Although the goal of most predictive analytics problems is to make predictions, sometimes we are more interested in the model learnt by the learning algorithm. If the learnt model could be expressed as a set of rules, then those rules … Continue reading

Posted in Big Data, Data Science, Hadoop and Map Reduce, Machine Learning, Rule Mining

Supplier Fulfillment Forecasting with Continuous Time Markov Chain using Spark

In a supply chain, quantities ordered from a downstream supplier or manufacturer are not always completely fulfilled, for various reasons. If the extent of under-fulfillment could be predicted over a time horizon, then the shortfall items … Continue reading

Posted in Big Data, Data Science, Machine Learning, Scala, Spark

Big Data System Design with Bayesian Optimization

Designing a complex Big Data system with a myriad of parameters and design choices is a daunting task. It’s almost a black art. Typically we stay with the default parameter settings, unless they fail to meet our requirements, which forces us to venture out … Continue reading

Posted in Big Data, Cluster Computation, Data Science, Optimization