Category Archives: Machine Learning

Data Normalization with Spark

Data normalization is a required data preparation step for many Machine Learning algorithms. These algorithms are sensitive to the relative values of the feature attributes. Data normalization is the process of bringing all the attribute values within some desired range. Unless … Continue reading

Posted in Big Data, Data Science, ETL, Machine Learning, Spark

Predicting Call Hangup in Customer Service Calls with Decision Tree and Random Forest

When customers hang up after a long wait on a call, it’s money wasted for the company. Moreover, it leaves the customer with a poor experience. It would be nice if we could predict in real time while the customer … Continue reading

Posted in Big Data, Customer Service, Hadoop and Map Reduce, Machine Learning, Predictive Analytic

Machine Learning at Scale with Parallel Processing

Machine Learning can leverage modern parallel data processing platforms like Hadoop and Spark in several ways. In this post we will discuss how to have Machine Learning at scale with Hadoop or Spark. We will consider three different ways parallel … Continue reading

Posted in Hadoop and Map Reduce, Machine Learning, Spark

Debunking the Myth of Top Ten Machine Learning Algorithms

Broad-brush statements of this kind about Machine Learning algorithms are made often, and there is a lot of online content alluding to this simplistic view of Machine Learning. It’s tempting to gravitate towards simplistic views and use a recipe-like approach while … Continue reading

Posted in Machine Learning

Gaining Insight by Mining Simple Rules from Customer Service Call Data

Although the goal of most predictive analytics problems is to make predictions, sometimes we are more interested in the model learnt by the learning algorithm. If the learnt model could be expressed as a set of rules, then those rules … Continue reading

Posted in Big Data, Data Science, Hadoop and Map Reduce, Machine Learning, Rule Mining

Supplier Fulfillment Forecasting with Continuous Time Markov Chain using Spark

In a supply chain, quantities ordered from a downstream supplier or manufacturer are not always completely fulfilled, because of various factors. If the extent of under-fulfillment could be predicted over a time horizon, then the shortfall items … Continue reading

Posted in Big Data, Data Science, Machine Learning, Scala, Spark

Customer Segmentation Based on Online Behavior using ScikitLearn

Customer segmentation, or clustering, is useful in various ways. It could be used for targeted marketing. Sometimes, when building a predictive model, it’s more effective to cluster the data and build a separate predictive model for each cluster. In this post, … Continue reading

Posted in Data Mining, Data Science, Machine Learning