Category Archives: Storm

Exactly Once Stream Processing Semantics? Not Exactly

Stream processing systems are characterized by at least once, at most once, and exactly once processing semantics. These are important characteristics that should be carefully considered from the point of view of consistency and durability of a stream processing application. However … Continue reading

Posted in Big Data, Real Time Processing, Spark Streaming, Storm, stream processing | 1 Comment

Counting Unique Mobile App Users with HyperLogLog

Continuing along the theme of real time analytics with approximate algorithms, the focus this time is approximate cardinality estimation. To put the ideas in context, the use case we will be working with is counting the number of unique users … Continue reading

Posted in Approximate Query, Big Data, Data Science, Mobile, Real Time Processing, Storm | 1 Comment

Tracking Web Site Bounce Rate in Real Time

The bounce rate for a page in a web site is the proportion of sessions that contain only that page. This post will show how to calculate bounce rate in real time with Storm using web log data. We … Continue reading

Posted in Big Data, Optimization, Real Time Processing, Reinforcement Learning, Storm, Web Analytic | 2 Comments

Realtime Trending Analysis with Approximate Algorithms

When we hear about trending, Twitter trending immediately comes to mind. However, there are many other scenarios where such analysis is applicable. Some example use cases are: 1. Top 5 videos watched in the last 2 hours 2. Top 10 news … Continue reading

Posted in Approximate Query, Big Data, Data Science, Internet of Things, Real Time Processing, Storm | 4 Comments

Location and Time Based Service

When I implemented the feature similarity based matching engine in sifarish, my open source personalization and recommendation engine, it was to address the cold start problem. It allowed me to do content or feature based recommendation for users with limited engagement. … Continue reading

Posted in Big Data, Hadoop and Map Reduce, Mobile, Real Time Processing, Recommendation Engine, Search, Spark, Storm | Leave a comment

Popularity Shaken

We will be addressing two important issues faced by recommendation systems. First, how do you solve the cold start problem, i.e., provide recommendations for new users with very limited behavior data available? Second, even if we have a recommendation list for … Continue reading

Posted in Big Data, Hadoop and Map Reduce, Recommendation Engine, Storm | 3 Comments

Making Recommendations in Real Time

Making recommendations based on a user’s current behavior in a small time window is a powerful feature that has been added to sifarish recently. In this post I will go over the details of this feature. The real time feature … Continue reading

Posted in Big Data, Collaborative Filtering, Data Mining, Data Science, Hadoop and Map Reduce, Real Time Processing, Recommendation Engine, Redis, Storm | 2 Comments