Category Archives: Spark

Data Normalization with Spark

Data normalization is a required data preparation step for many Machine Learning algorithms. These algorithms are sensitive to the relative values of the feature attributes. Data normalization is the process of bringing all the attribute values within some desired range. Unless … Continue reading

Posted in Big Data, Data Science, ETL, Machine Learning, Spark

Removing Duplicates from Order Data Using Spark

If you work with data, there is a high probability that you have run into duplicate data in your data set. Removing duplicates in Big Data is a computationally intensive process, and parallel cluster processing with Hadoop or Spark becomes … Continue reading

Posted in Big Data, Data Science, ETL, Spark

Measuring Campaign Effectiveness for an Online Service on Spark

Measuring campaign effectiveness is critical for any company to justify the marketing money being spent. Consider a company providing a free online service on signup. It’s critical for the company to convert these free users so that they subscribe to a paid … Continue reading

Posted in Big Data, Data Science, Marketing Analytic, Spark

Project Assignment Optimization with Simulated Annealing on Spark

Optimizing the assignment of people to projects is a very complex problem, and classical optimization techniques are not very useful. The topic of this post is a project assignment optimization problem where people should be assigned to projects in a way that will … Continue reading

Posted in Data Science, Optimization, Spark

Machine Learning at Scale with Parallel Processing

Machine Learning can leverage modern parallel data processing platforms like Hadoop and Spark in several ways. In this post we will discuss how to do Machine Learning at scale with Hadoop or Spark. We will consider three different ways parallel … Continue reading

Posted in Hadoop and Map Reduce, Machine Learning, Spark

Mobile Phone Usage Data Analytics for Effective Marketing Campaign

Insights gained from analyzing mobile phone usage data can be extremely valuable in marketing campaigns and customer engagement efforts. For example, the hour of the day when a user engages most with his or her mobile device could be used to … Continue reading

Posted in Big Data, Data Profiling, Marketing Analytic, Spark, Statistics

JSON to Relational Mapping with Spark

If there is one data format that’s ubiquitous, it’s JSON. Whether you are calling an API or exporting data from some system, the format is most likely to be JSON these days. However, many databases cannot handle JSON and you … Continue reading

Posted in Big Data, ETL, Spark