Category Archives: Data Science

Project Assignment Optimization with Simulated Annealing on Spark

Optimizing the assignment of people to projects is a very complex problem, and classical optimization techniques are of limited use. The topic of this post is a project assignment optimization problem where people should be assigned to projects in a way that will …

Posted in Data Science, Optimization, Spark

Mining Seasonal Products from Sales Data

The other day someone asked me how to include products with seasonal demand in recommendations based on collaborative filtering or some other technique. The solution to the problem involves two steps. The first step is to identify products with seasonal …

Posted in Big Data, Data Mining, Data Science, eCommerce, Map Reduce, Recommendation Engine

Gaining Insight by Mining Simple Rules from Customer Service Call Data

Although the goal of most predictive analytics problems is to make predictions, sometimes we are more interested in the model learnt by the learning algorithm. If the learnt model could be expressed as a set of rules, then those rules …

Posted in Big Data, Data Science, Hadoop and Map Reduce, Machine Learning, Rule Mining

Supplier Fulfillment Forecasting with Continuous Time Markov Chain using Spark

In a supply chain, quantities ordered from a downstream supplier or manufacturer are not always completely fulfilled, because of various factors. If the extent of under-fulfillment could be predicted over a time horizon, then the shortfall items …

Posted in Big Data, Data Science, Machine Learning, Scala, Spark

Big Data System Design with Bayesian Optimization

Designing a complex Big Data system with a myriad of parameters and design choices is a daunting task. It’s almost a black art. Typically we stay with the default parameter settings unless they fail to meet our requirements, which forces us to venture out …

Posted in Big Data, Cluster Computation, Data Science, Optimization

Customer Segmentation Based on Online Behavior using ScikitLearn

Customer segmentation, or clustering, is useful in various ways. It can be used for targeted marketing. Sometimes, when building a predictive model, it’s more effective to cluster the data and build a separate predictive model for each cluster. In this post, …

Posted in Data Mining, Data Science, Machine Learning

Inventory Forecasting with Markov Chain Monte Carlo

Sometimes you want to calculate statistics about a variable that has a complex, possibly nonlinear relationship with another variable whose probability distribution is available, although that distribution may be non-standard or non-parametric. That’s the situation we face when trying to predict …

Posted in Data Science, Machine Learning, Optimization, Python, Simulation