Category Archives: Spark

eCommerce Order Processing System Monitoring with Isolation Forest Based Anomaly Detection on Spark

Posted on June 28, 2021 by Pranab

Timely delivery of orders is critical to customer satisfaction for any retail eCommerce business. It’s even more critical for time-bound, guaranteed-delivery orders. Retail eCommerce businesses generally use order processing workflow systems, which are state machines where state transition … Continue reading →

Posted in Anomaly Detection, Data Science, eCommerce, Scala, Spark | Tagged business process mining, isolation forest, multi variate anomaly detection, order processing | Leave a comment

Time Series Change Point Detection with Two Sample Statistic on Spark with Application for Retail Sales Data

Posted on September 27, 2020 by Pranab

The goal of change point detection is to detect the times when statistically significant and sustained changes happen in a time series. It has a wide range of applications in various domains including retail, medical, IoT, finance, business and meteorology. In … Continue reading →

Posted in Anomaly Detection, Big Data, Data Science, Scala, Spark, Time Series Analytic | Tagged change point, time series, two sample statistic | Leave a comment

Detecting Quarantine Violation from Mobile Phone Location Anomaly on Spark

Posted on April 20, 2020 by Pranab

With the world under siege from the coronavirus, you might find this topic timely. There are two main aspects of any epidemic outbreak: spread and containment. There are various strategies for containing epidemic spread. One of them is to … Continue reading →

Posted in Anomaly Detection, Big Data, Data Science, Scala, Spark | Tagged epidemic control, mobile location, quarantine | Leave a comment

Model Drift Detection with Kolmogorov Smirnov Statistic on Spark

Posted on February 24, 2020 by Pranab

In retail business, you may be using various business solutions based on product demand data, e.g. inventory management or how a newly introduced product may be performing over time. The buying behavior model may change with time, rendering those … Continue reading →

Posted in Data Science, Machine Learning, Spark, Statistics | Tagged model drift | Leave a comment

Contextual Data Completeness Metric Computation on Spark

Posted on December 18, 2019 by Pranab

Data quality is critical for the healthy operation of any data driven enterprise. There are various kinds of data quality metrics. In this post, the focus will be on the completeness of data. Data quality from a completeness point of … Continue reading →

Posted in Big Data, Data Science, ETL, Spark | Tagged data completeness, data quality | Leave a comment

Time Series Trend and Seasonality Component Decomposition with STL on Spark

Posted on September 24, 2019 by Pranab

You may be interested in decomposing a time series into level, trend, seasonality and remainder components to gain more insight into your time series. You may also be interested in decomposition to separate out the remainder component for anomaly detection. … Continue reading →

Posted in Anomaly Detection, Big Data, Data Science, ETL, Spark, Time Series Analytic | Tagged seasonal cycle, STL, time series decomposition, trend | Leave a comment

Encoding High Cardinality Categorical Variables with Feature Hashing on Spark

Posted on August 7, 2019 by Pranab

Categorical variables are ubiquitous in data. They pose a serious problem in many Data Science analysis processes. For example, many supervised Machine Learning algorithms work only with numerical data. With high cardinality categorical variables, popular encoding solutions like One Hot … Continue reading →

Posted in Big Data, Data Science, ETL, Scala, Spark | Tagged categorical feature, feature hashing, high cardinality | 2 Comments

Time Series Sequence Anomaly Detection with Markov Chain on Spark

Posted on July 25, 2019 by Pranab

There are many techniques for time series anomaly detection. In this post, the focus is on sequence based anomaly detection of time series data with Markov Chain. The technique will be elucidated with a use case involving data from a … Continue reading →

Posted in Anomaly Detection, Big Data, Data Science, Machine Learning, Outlier Detection, Scala, Spark | Tagged anomaly score threshold, health monitoring data, markov chain, sequence anomaly, time series anomaly | 1 Comment

Elastic Search or Solr Search Result Quality Evaluation with NCDG Metric on Spark

Posted on April 24, 2019 by Pranab

You have built an enterprise search engine with Elastic Search or Solr. You have tweaked all the knobs in the search engine to get the best possible quality for the search results. But how do you know how well your … Continue reading →

Posted in Big Data, Data Science, elastic search, Log Analysis, Scala, Search Analytic, Solr, Spark | Tagged enterprise search, NCDG, relevance feedback, search performance | Leave a comment

Plugin Framework Based Data Transformation on Spark

Posted on March 21, 2019 by Pranab

Data transformation is one of the key components in most ETL processes. It is well known that in most data projects, more than 50% of the time is spent in data preprocessing. In my earlier blog, a Hadoop based … Continue reading →

Posted in Big Data, Data Science, ETL, Scala, Spark | Tagged data transformation, plugin framework | 2 Comments
