Category Archives: Real Time Processing

Alarm Flooding Control with Event Clustering Using Spark Streaming

You show up at work in the morning, open your email, and find 100 alarm emails in your inbox, all for the same error from an application running on some server and all sent within a short window of 1 minute. You …

Posted in Anomaly Detection, Big Data, Real Time Processing, Spark, stream processing
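The core idea of flood control can be shown with a minimal sketch. It rests on an assumption: here "clustering" is simply grouping alarms that share the same (host, app, error code) key inside a fixed 60-second tumbling window, so that each group produces a single notification. The post itself may cluster events differently, and this is plain Python, not Spark Streaming code. `Alarm` and `cluster_alarms` are hypothetical names.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Alarm:
    timestamp: float  # seconds since the epoch
    host: str
    app: str
    error_code: str


def cluster_alarms(alarms, window_sec=60.0):
    """Group alarms that share (host, app, error_code) and fall into the same
    tumbling time window; return one summary per group instead of one
    notification per alarm. (Hypothetical helper, not the post's code.)"""
    clusters = defaultdict(list)
    for alarm in alarms:
        window_start = (alarm.timestamp // window_sec) * window_sec
        key = (alarm.host, alarm.app, alarm.error_code, window_start)
        clusters[key].append(alarm)

    summaries = []
    for (host, app, code, start), members in sorted(clusters.items()):
        summaries.append({
            "host": host,
            "app": app,
            "error_code": code,
            "window_start": start,
            "count": len(members),
            "first_seen": min(m.timestamp for m in members),
        })
    return summaries


# 100 identical alarms within one minute collapse into a single summary.
burst = [Alarm(1200.0 + i * 0.5, "srv-01", "billing", "E500") for i in range(100)]
summaries = cluster_alarms(burst)
```

A tumbling window is the simplest choice, but it can split a burst that straddles a window boundary into two summaries; a gap-based (session-style) window avoids that at the cost of keeping more state.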

Exactly Once Stream Processing Semantics? Not Exactly

Stream processing systems are characterized by at-least-once, at-most-once, and exactly-once processing semantics. These are important characteristics that should be carefully considered from the point of view of the consistency and durability of a stream processing application. However …

Posted in Big Data, Real Time Processing, Spark Streaming, Storm, stream processing
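The gap between at-least-once delivery and an exactly-once effect can be illustrated with a small simulation. This is a generic sketch, not taken from the post: a hypothetical source replays messages after a pretend failure, and a hypothetical idempotent sink deduplicates by message ID so the replayed messages do not change the result. `IdempotentSumSink` and `deliver_at_least_once` are illustrative names only.

```python
class IdempotentSumSink:
    """Hypothetical sink that applies each message's effect only once by
    remembering the IDs of messages it has already processed."""

    def __init__(self):
        self.seen_ids = set()
        self.total = 0

    def write(self, msg_id, value):
        if msg_id in self.seen_ids:
            return False  # duplicate caused by a replay: ignore it
        # In a real system these two updates must be committed atomically in
        # the same durable store, or a crash between them breaks the guarantee.
        self.seen_ids.add(msg_id)
        self.total += value
        return True


def deliver_at_least_once(messages, replay_from):
    """Simulate an at-least-once source: after a (pretend) failure it replays
    everything from offset `replay_from`, so some messages arrive twice."""
    yield from messages
    yield from messages[replay_from:]


messages = [(1, 10), (2, 20), (3, 30)]

# A sink that is not idempotent double counts the replayed messages.
naive_total = sum(value for _, value in deliver_at_least_once(messages, replay_from=1))

sink = IdempotentSumSink()
for msg_id, value in deliver_at_least_once(messages, replay_from=1):
    sink.write(msg_id, value)
```

Here `naive_total` is 110 because messages 2 and 3 are counted twice, while `sink.total` is 60. The guarantee lives in the sink's deduplication, not in the delivery layer.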

Real Time Detection of Outliers in Sensor Data using Spark Streaming

As far as analytics of sensor-generated data is concerned, in the Internet of Things (IoT) and in a connected-everything world, it’s mostly about real time analytics of time series data. In this post, I will be addressing a use …

Posted in Big Data, Data Science, Internet of Things, Outlier Detection, Real Time Processing, Spark, Time Series Analytic
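One common way to flag outliers in a stream of sensor readings is a z-score over a sliding window of recent values. The excerpt does not say which technique the post uses, so treat this as an assumed, generic example rather than the post's method. `SlidingZScoreDetector`, the window size, and the threshold of 3 standard deviations are all hypothetical choices.

```python
import math
from collections import deque


class SlidingZScoreDetector:
    """Flags a reading as an outlier when it lies more than `threshold`
    sample standard deviations from the mean of the recent window.
    (Assumed technique; hypothetical class.)"""

    def __init__(self, window_size=50, threshold=3.0):
        self.window = deque(maxlen=window_size)  # oldest readings drop off
        self.threshold = threshold

    def is_outlier(self, value):
        flagged = False
        if len(self.window) >= 2:
            n = len(self.window)
            mean = sum(self.window) / n
            var = sum((x - mean) ** 2 for x in self.window) / (n - 1)
            std = math.sqrt(var)
            if std > 0 and abs(value - mean) / std > self.threshold:
                flagged = True
        # Keep outliers out of the window so they do not inflate the baseline.
        if not flagged:
            self.window.append(value)
        return flagged


detector = SlidingZScoreDetector(window_size=50, threshold=3.0)
readings = [20.0, 20.5, 19.8, 20.2, 20.1, 19.9, 35.0, 20.3]
flags = [detector.is_outlier(r) for r in readings]
```

Only the spike to 35.0 is flagged. Excluding flagged values from the window is a design choice: it keeps a single spike from masking the next one, but it also means a genuine level shift will keep being flagged until the window is reset.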

Counting Unique Mobile App Users with HyperLogLog

Continuing along the theme of real time analytics with approximate algorithms, the focus this time is approximate cardinality estimation. To put the ideas in context, the use case we will be working with is counting the number of unique users …

Posted in Approximate Query, Big Data, Data Science, Mobile, Real Time Processing, Storm
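To make HyperLogLog concrete, here is a minimal, standalone sketch of the textbook algorithm in plain Python. It is not the post's Storm implementation. Each item is hashed to 64 bits: the top `p` bits choose a register, and the register keeps the largest "position of the leftmost 1-bit" seen in the remaining bits. Registers from sketches built on different partitions can be merged with an element-wise max. The class name and the choice of SHA-1 as the hash are assumptions.

```python
import hashlib
import math


class HyperLogLog:
    """Minimal HyperLogLog cardinality estimator (a sketch, not production-tuned)."""

    def __init__(self, p=12):
        if not 7 <= p <= 16:
            raise ValueError("p must be between 7 and 16")
        self.p = p
        self.m = 1 << p                              # number of registers
        self.registers = [0] * self.m
        self.alpha = 0.7213 / (1 + 1.079 / self.m)   # bias constant, valid for m >= 128

    def _hash64(self, item):
        digest = hashlib.sha1(str(item).encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def add(self, item):
        h = self._hash64(item)
        idx = h >> (64 - self.p)                       # top p bits pick a register
        rest = h & ((1 << (64 - self.p)) - 1)          # remaining 64 - p bits
        rank = (64 - self.p) - rest.bit_length() + 1   # position of the leftmost 1-bit
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def merge(self, other):
        """Union with another sketch of the same precision (e.g. another partition)."""
        if other.p != self.p:
            raise ValueError("precision mismatch")
        self.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]

    def count(self):
        z = sum(2.0 ** -r for r in self.registers)
        estimate = self.alpha * self.m * self.m / z
        zeros = self.registers.count(0)
        if estimate <= 2.5 * self.m and zeros > 0:
            # Small-range correction: fall back to linear counting.
            estimate = self.m * math.log(self.m / zeros)
        return estimate


hll = HyperLogLog(p=12)
for i in range(10_000):
    hll.add(f"user-{i}")
first = hll.count()
for i in range(10_000):
    hll.add(f"user-{i}")  # repeat visits by the same users do not change the count
```

With `p=12` there are 4096 small registers, and the relative standard error of the estimate is roughly 1.04/√4096, or about 1.6%, regardless of how many distinct users are counted.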

Tracking Web Site Bounce Rate in Real Time

The bounce rate for a page on a web site is the proportion of sessions that start on that page and contain no other page. This post will show how to calculate bounce rate in real time with Storm using web log data. We …

Posted in Big Data, Optimization, Real Time Processing, Reinforcement Learning, Storm, Web Analytic
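The bounce rate calculation itself can be kept incremental with two counters per page. The sketch below uses the common definition: single-page sessions that landed on a page, divided by all sessions that landed on it. It assumes an upstream step has already turned raw web log lines into completed sessions, which the post does with Storm and this plain Python sketch does not attempt. `BounceRateTracker` is a hypothetical name.

```python
from collections import defaultdict


class BounceRateTracker:
    """Incrementally tracks per-page bounce rate from completed sessions.
    Assumption: each call receives the ordered page list of one finished session."""

    def __init__(self):
        self.entries = defaultdict(int)  # sessions that landed on the page
        self.bounces = defaultdict(int)  # of those, sessions with no other page

    def add_session(self, pages):
        if not pages:
            return
        landing = pages[0]
        self.entries[landing] += 1
        if len(pages) == 1:
            self.bounces[landing] += 1

    def bounce_rate(self, page):
        entered = self.entries.get(page, 0)
        return self.bounces.get(page, 0) / entered if entered else 0.0


tracker = BounceRateTracker()
for session in [["/home"], ["/home", "/pricing"], ["/home"], ["/pricing"], ["/blog", "/home"]]:
    tracker.add_session(session)
```

Here `/home` was the landing page of 3 sessions, 2 of which bounced, giving 2/3; `/pricing` landed once and bounced, giving 1.0; `/blog` landed once without bouncing, giving 0.0. Because only two counters per page are updated, the rate can be read at any moment as sessions close.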

Realtime Trending Analysis with Approximate Algorithms

When we hear about trending, Twitter trending immediately comes to mind. However, there are many other scenarios where such analysis is applicable. Some example use cases are: 1. the top 5 videos watched in the last 2 hours, 2. the top 10 news …

Posted in Approximate Query, Big Data, Data Science, Internet of Things, Real Time Processing, Storm
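A top-k trending list can be approximated with a Count-Min Sketch, which holds estimated counts in fixed memory, together with a small candidate set of the current leaders. This is one standard combination and an assumption on my part; the excerpt does not say which approximate algorithms the post uses. The sketch also ignores expiry, so it covers a single window rather than a true sliding "last 2 hours". `CountMinSketch`, `TopK`, and the width and depth values are illustrative.

```python
import hashlib
import heapq


class CountMinSketch:
    """Fixed-size frequency estimator; estimates never undercount."""

    def __init__(self, width=1000, depth=5):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _buckets(self, item):
        # One independent-looking hash per row, derived by salting with the row index.
        for row in range(self.depth):
            digest = hashlib.md5(f"{row}:{item}".encode("utf-8")).digest()
            yield row, int.from_bytes(digest[:8], "big") % self.width

    def add(self, item, count=1):
        for row, col in self._buckets(item):
            self.table[row][col] += count

    def estimate(self, item):
        return min(self.table[row][col] for row, col in self._buckets(item))


class TopK:
    """Keeps at most k candidates, evicting the one with the smallest estimate."""

    def __init__(self, k, width=1000, depth=5):
        self.k = k
        self.sketch = CountMinSketch(width, depth)
        self.candidates = {}  # item -> estimated count when last seen

    def add(self, item):
        self.sketch.add(item)
        self.candidates[item] = self.sketch.estimate(item)
        if len(self.candidates) > self.k:
            weakest = min(self.candidates, key=self.candidates.get)
            del self.candidates[weakest]

    def top(self):
        return heapq.nlargest(self.k, self.candidates.items(), key=lambda kv: kv[1])


trending = TopK(k=3)
events = ["v1"] * 50 + ["v2"] * 30 + ["v3"] * 20 + [f"x{i}" for i in range(100)]
for video_id in events:
    trending.add(video_id)
```

Memory is bounded by the sketch table plus k candidates, however many distinct videos appear. The price is that counts can be overestimated when items collide in every row, so rankings are approximate.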

Location and Time Based Service

When I implemented the feature-similarity-based matching engine in sifarish, my open source Personalization and Recommendation Engine, it was to address the cold start problem. It allowed me to do content- or feature-based recommendation for users with limited engagement. …

Posted in Big Data, Hadoop and Map Reduce, Mobile, Real Time Processing, Recommendation Engine, Search, Spark, Storm

Making Recommendations in Real Time

Making recommendations based on a user’s current behavior in a small time window is a powerful feature that was recently added to sifarish. In this post I will go over the details of this feature. The real time feature …

Posted in Big Data, Collaborative Filtering, Data Mining, Data Science, Hadoop and Map Reduce, Real Time Processing, Recommendation Engine, Redis, Storm

Boost Lead Generation with Online Reinforcement Learning

When I go to a web site to download a white paper or product data sheet, I often hit the back button if I am presented with a form asking for lots of personal data. Any user who bounces out is a …

Posted in Big Data, Data Science, Real Time Processing, Redis, Reinforcement Learning, Storm
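One way to frame online reinforcement learning for lead capture is as a multi-armed bandit: each candidate form layout is an arm, and the reward is 1 if the visitor submits the form and 0 if they bounce. The excerpt does not say which algorithm the post uses, so this epsilon-greedy sketch is an assumed illustration. The form variants and their conversion rates are hypothetical, and `EpsilonGreedy` is not the post's code.

```python
import random


class EpsilonGreedy:
    """Epsilon-greedy bandit: explore a random arm with probability epsilon,
    otherwise exploit the arm with the best running mean reward."""

    def __init__(self, arms, epsilon=0.1, seed=None):
        self.arms = list(arms)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.pulls = {a: 0 for a in self.arms}
        self.value = {a: 0.0 for a in self.arms}  # running mean reward per arm

    def select(self):
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.arms)              # explore
        return max(self.arms, key=lambda a: self.value[a])  # exploit

    def update(self, arm, reward):
        self.pulls[arm] += 1
        n = self.pulls[arm]
        self.value[arm] += (reward - self.value[arm]) / n  # incremental mean


# Hypothetical simulation: the short form converts more visitors than the long one.
true_rates = {"long_form": 0.10, "short_form": 0.30}
bandit = EpsilonGreedy(list(true_rates), epsilon=0.1, seed=42)
visitor_rng = random.Random(7)
for _ in range(5000):
    arm = bandit.select()
    converted = visitor_rng.random() < true_rates[arm]
    bandit.update(arm, 1.0 if converted else 0.0)
```

After a few thousand simulated visitors, most traffic goes to the short form, while the epsilon fraction of exploration keeps testing the alternative in case visitor behavior changes.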

Big Data Caught in Storm

Hadoop is great for batch processing. However, depending on the incoming data throughput and the cluster characteristics, there is a minimum latency threshold for processing data. My blog post is based on a simple performance model for Hadoop that allows …

Posted in Big Data, Predictive Analytic, Real Time Processing