Category Archives: Spark

Mobile Phone Usage Data Analytics for Effective Marketing Campaign

Insights gained from analyzing mobile phone usage data can be extremely valuable in marketing campaigns and customer engagement efforts. For example, the hour of the day when a user engages most with his or her mobile device could be used to … Continue reading

Posted in Big Data, Data Profiling, Spark, Statistics

JSON to Relational Mapping with Spark

If there is one data format that’s ubiquitous, it’s JSON. Whether you are calling an API or exporting data from some system, the format is most likely to be JSON these days. However, many databases cannot handle JSON and you … Continue reading

Posted in Big Data, ETL, Spark

Supplier Fulfillment Forecasting with Continuous Time Markov Chain using Spark

In a supply chain, quantities ordered from a downstream supplier or manufacturer are not always completely fulfilled, because of various factors. If the extent of under-fulfillment could be predicted over a time horizon, then the shortfall items … Continue reading

Posted in Big Data, Data Science, Machine Learning, Scala, Spark

Simple Sanity Checks for Data Correctness with Spark

Sometimes when running a complex data processing pipeline with Hadoop or Spark, you may encounter a data set where most of the data is grossly invalid. It might save a lot of pain and headache if we could do some simple sanity checks before feeding … Continue reading

Posted in ETL, Hadoop and Map Reduce, Spark

Alarm Flooding Control with Event Clustering Using Spark Streaming

You show up at work in the morning and open your email to find 100 alarm emails in your inbox for the same error from an application running on some server within a short time window of 1 minute. You … Continue reading

Posted in Anomaly Detection, Big Data, Real Time Processing, Spark, stream processing | 1 Comment

Real Time Detection of Outliers in Sensor Data using Spark Streaming

As far as analytics of sensor-generated data is concerned, in the Internet of Things (IoT) and a connected-everything world, it’s mostly about real-time analytics of time series data. In this post, I will be addressing a use … Continue reading

Posted in Big Data, Data Science, Internet of Things, Outlier Detection, Real Time Processing, Spark, Time Series Analytic | 2 Comments

Location and Time Based Service

When I implemented the feature-similarity-based matching engine in my open source personalization and recommendation engine, sifarish, it was to address the cold start problem. It allowed me to do content- or feature-based recommendation for users with limited engagement. … Continue reading

Posted in Big Data, Hadoop and Map Reduce, Mobile, Real Time Processing, Recommendation Engine, Search, Spark, Storm