Author Archives: Pranab

About Pranab

I am Pranab Ghosh, a software professional in the San Francisco Bay area. I manipulate bits and bytes for the good of living beings and the planet. I have worked with myriad of technologies and platforms in various business domains for early stage startups, large corporations and anything in between. I am an active blogger and open source project owner. I am passionate about technology and green and sustainable living. My technical interest areas are Big Data, Distributed Processing, NOSQL databases, Machine Learning and Programming languages. I am fascinated by problems that don't have neat closed form solution.

Measuring Campaign Effectiveness for an Online Service on Spark

Measuring campaign effectiveness is critical for any company to justify the marketing money being spent. Consider a company providing a free online service on signup. It’s critical for the company to convert them so that they subscribe to a paid … Continue reading

Posted in Big Data, Data Science, Marketing Analytic, Spark | Tagged , , | Leave a comment

Processing Missing Values with Hadoop

Missing values are just part of life in the data processing world. In most cases you can not simply ignore the missing values as it may adversely affect whatever analytic processing you are going to do. Broadly speaking, handling missing … Continue reading

Posted in Big Data, Data Profiling, Data Science, ETL, Hadoop and Map Reduce | Tagged , , | Leave a comment

Project Assignment Optimization with Simulated Annealing on Spark

Optimizing assignment of people to projects is a very complex problem and classical optimization techniques are not very useful. The topic this post is a project assignment optimization problem where people should be assigned to projects in a way that will … Continue reading

Posted in Data Science, Optimization, Spark | Tagged , , | 1 Comment

Mining Seasonal Products from Sales Data

The other day someone asked me how to include products with seasonal demand in recommendations based on collaborative filtering or some other technique. The solution to the problem involves two steps. The first step is to identify products with seasonal … Continue reading

Posted in Big Data, Data Mining, Data Science, eCommerce, Map Reduce, Recommendation Engine | Tagged , , , | Leave a comment

Predicting Call Hangup in Customer Service Calls with Decision Tree and Random Forest

When customers hangup after a long wait in a call, it’s money wasted for the company. Moreover, it leaves the customer with a poor experience. It would have been nice, if we could predict in real time while the customer … Continue reading

Posted in Big Data, Customer Service, Hadoop and Map Reduce, Machine Learning, Predictive Analytic | Tagged , , | 1 Comment

Machine Learning at Scale with Parallel Processing

Machine Learning can leverage modern parallel data processing platforms like Hadoop and Spark in several ways. In this post we will discuss how to have Machine Learning at scale with Hadoop or Spark. We will consider three different ways parallel … Continue reading

Posted in Hadoop and Map Reduce, Machine Learning, Spark | Tagged , , | 2 Comments

Mobile Phone Usage Data Analytics for Effective Marketing Campaign

Insights gained from analyzing mobile phone usage data can be extremely valuable in marketing campaign and customer engagement efforts. For example, hour of the day when an user engages most with his or her mobile  device could be used to … Continue reading

Posted in Big Data, Data Profiling, Marketing Analytic, Spark, Statistics | Tagged , | Leave a comment