Deep Reinforcement Learning with RLlib and TensorFlow for Price Optimization

Deep Learning has made serious inroads into Reinforcement Learning. Deep Reinforcement Learning (DRL) has been used successfully for playing Atari games. Beyond games, Reinforcement Learning (RL) is applicable to any decision making problem under uncertain conditions, e.g. autonomous vehicles and business decision making problems. Classic Reinforcement Learning solutions become intractable when faced with high dimensional state and action spaces. Deep Learning shines with problems that have large input and output dimensions, so it was natural to apply Deep Learning to Reinforcement Learning for higher dimensional problems.

In this post, we will go through a DRL based solution for price optimization, which is a business decision making problem. The solution is based on the excellent DRL library called RLlib which uses TensorFlow and PyTorch ...

Posted in Data Science, Deep Learning, PyTorch, Reinforcement Learning, TensorFlow

Monte Carlo Simulation Library in Python with Project Cost Estimation as an Example

I was working on a solution for change point detection in time series, which led me to a certain two-sample statistic for which critical values didn’t exist. The only option was to simulate the statistic values and estimate critical values from the resulting distribution. Looking for a solution, I found only some ad hoc Python implementations of Monte Carlo simulation, which were not very reusable. This prompted me to digress from my original project and work on a reusable Python implementation of Monte Carlo simulation instead. The implementation is available in my open source GitHub project avenir.
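The idea of estimating a critical value by simulation can be sketched in a few lines. This is only a minimal illustration, not the avenir implementation: the statistic (`mean_diff_statistic`), the standard normal null distribution, and all function and parameter names are assumptions made for the example.

```python
import random

def mean_diff_statistic(sample_a, sample_b):
    """Hypothetical two-sample statistic: absolute difference of sample means."""
    return abs(sum(sample_a) / len(sample_a) - sum(sample_b) / len(sample_b))

def simulate_critical_value(statistic, n_a, n_b, alpha=0.05, n_iter=5000, seed=42):
    """Estimate the (1 - alpha) quantile of a statistic under the null hypothesis.

    Assumes both samples come from the same standard normal distribution
    when the null hypothesis holds.
    """
    rng = random.Random(seed)
    values = []
    for _ in range(n_iter):
        a = [rng.gauss(0.0, 1.0) for _ in range(n_a)]
        b = [rng.gauss(0.0, 1.0) for _ in range(n_b)]
        values.append(statistic(a, b))
    values.sort()
    # Index of the empirical (1 - alpha) quantile, clamped to the last element
    idx = min(int((1.0 - alpha) * n_iter), n_iter - 1)
    return values[idx]

crit = simulate_critical_value(mean_diff_statistic, n_a=30, n_b=30)
print("estimated critical value at alpha=0.05:", round(crit, 3))
```

An observed statistic larger than the estimated critical value would then be treated as significant at the chosen level.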

Monte Carlo simulation has applications for a wide range of problems ...

Posted in Data Science, Python, Statistics

Detecting Quarantine Violation from Mobile Phone Location Anomaly on Spark

With the world under siege from the coronavirus, you might find this topic timely. There are two main aspects of any epidemic outbreak: epidemic spread and containment. There are various strategies for containing epidemic spread. One of them is to put people who have tested positive under quarantine. Quarantined people are not allowed to have contact with anybody.

How do you know whether quarantine is being violated? In this post, we will go through techniques for detecting quarantine violation based on anomalies in mobile phone location data. The Spark implementation is available in my open source project beymani on GitHub.

The implementation is generic and applicable to many other problems. It detects outliers depending on whether data is outside a defined range. It can also detect outliers based on whether data falls into a range. Some other possible applications are IoT sensor data and geofencing.
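The two range based modes described above can be illustrated with a small Python sketch. It is not the beymani Spark code; the record layout, the field names, the `mode` parameter and the coordinate values are all hypothetical.

```python
def in_range(value, low, high):
    """True when value lies within the closed interval [low, high]."""
    return low <= value <= high

def is_outlier(record, ranges, mode="outside"):
    """Range based outlier check.

    record: dict mapping field name to numeric value
    ranges: dict mapping field name to a (low, high) tuple
    mode "outside": outlier when any field is outside its range
                    (e.g. a quarantined phone leaving the home area)
    mode "inside":  outlier when every field is inside its range
                    (e.g. a device entering a restricted zone)
    """
    all_inside = all(in_range(record[f], lo, hi) for f, (lo, hi) in ranges.items())
    if mode == "outside":
        return not all_inside
    if mode == "inside":
        return all_inside
    raise ValueError("mode must be 'outside' or 'inside'")

# Hypothetical home bounding box for a quarantined person
home = {"lat": (37.770, 37.772), "lon": (-122.420, -122.418)}
at_home = {"lat": 37.771, "lon": -122.419}
away = {"lat": 37.780, "lon": -122.419}

print(is_outlier(at_home, home))  # stays within the box
print(is_outlier(away, home))     # latitude is outside the box
```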

Posted in Anomaly Detection, Big Data, Data Science, Scala, Spark

Building SciKitLearn Random Forest Model and Tuning Parameters without writing Python Code

Random Forest is a supervised learning algorithm which can be used for classification and regression. In this article we go through the process of training a Random Forest model, including automatic parameter tuning, without writing any Python code. We will use patient medical data to predict heart disease as an example use case.

The implementation is available in the open source project avenir on GitHub. Extensive use of configuration parameters enables the end user to use the solution without writing Python code.

Posted in Data Science, Machine Learning, Python, ScikitLearn

Model Drift Detection with Kolmogorov Smirnov Statistic on Spark

In retail business, you may be using various business solutions based on product demand data, e.g. inventory management or tracking how a newly introduced product performs over time. The buying behavior model may change with time, rendering those solutions ineffective. It may be necessary to check the system periodically, i.e. detect any drift in the behavior model. If the drift is significant, appropriate changes should be made in the business solutions that depend on the demand distribution.

In this post we will find out how to use a statistic called the Kolmogorov Smirnov statistic (KS statistic) to measure product demand model drift. The Spark implementation of the KS statistic is available in my open source projects chombo and avenir. The implementation relies heavily on configuration and is application agnostic.
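For intuition, the two-sample KS statistic is the largest gap between the empirical distribution functions of two samples. The sketch below is plain Python rather than the Spark implementation; the function names are made up for illustration, and the drift threshold uses the standard large sample approximation for the two-sample test, which is an assumption about how a cutoff might be chosen.

```python
import math
from bisect import bisect_right

def ks_statistic(sample_a, sample_b):
    """Largest absolute gap between the two empirical CDFs."""
    a = sorted(sample_a)
    b = sorted(sample_b)
    max_gap = 0.0
    # The gap between two step functions can only change at data points
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        max_gap = max(max_gap, abs(cdf_a - cdf_b))
    return max_gap

def drift_detected(reference, current, alpha=0.05):
    """Compare the KS statistic with an approximate large sample critical value."""
    n, m = len(reference), len(current)
    c_alpha = math.sqrt(-0.5 * math.log(alpha / 2.0))
    critical = c_alpha * math.sqrt((n + m) / (n * m))
    return ks_statistic(reference, current) > critical

# Hypothetical daily demand for a product in two periods
last_quarter = list(range(100))
this_quarter = [d + 50 for d in range(100)]
print(ks_statistic(last_quarter, this_quarter))
print(drift_detected(last_quarter, this_quarter))
```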

Posted in Data Science, Machine Learning, Spark, Statistics

Evaluation of Time Series Predictability with Kaboudan Metric using Prophet

You might be getting ready to build a time series forecasting model using a state of the art LSTM network. Before you proceed, you may want to pause and ask yourself whether your time series is inherently predictable at all, i.e. whether it’s even possible to build a forecast model.

In this post, we will learn how to calculate a metric called the Kaboudan metric, which indicates how predictable a time series is. The Python implementation is available in my open source project avenir. Any forecasting model can be used to calculate this metric. I have used Prophet from Facebook.
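As a rough sketch of the idea, the metric is commonly described as one minus the ratio of the forecast error on the original series to the forecast error on a shuffled copy of it, so a predictable series scores near 1 and noise scores near 0. The code below assumes that formulation and swaps Prophet for a naive persistence forecaster to stay dependency free; the function names and the `block_size` parameter are hypothetical.

```python
import math
import random

def persistence_sse(series):
    """Sum of squared errors when each value is forecast by the previous value."""
    return sum((series[t] - series[t - 1]) ** 2 for t in range(1, len(series)))

def kaboudan_metric(series, forecast_sse, block_size=1, seed=0):
    """1 - SSE(original) / SSE(shuffled), assuming the formulation above."""
    rng = random.Random(seed)
    # Shuffle whole blocks; block_size=1 shuffles individual points
    blocks = [series[i:i + block_size] for i in range(0, len(series), block_size)]
    rng.shuffle(blocks)
    shuffled = [v for block in blocks for v in block]
    sse_shuffled = forecast_sse(shuffled)
    if sse_shuffled == 0.0:
        return 0.0  # constant series: shuffling changes nothing
    return 1.0 - forecast_sse(series) / sse_shuffled

# A smooth seasonal series versus pure noise (both synthetic)
sine = [math.sin(2.0 * math.pi * t / 50.0) for t in range(500)]
noise_rng = random.Random(1)
noise = [noise_rng.gauss(0.0, 1.0) for _ in range(500)]

print("sine :", round(kaboudan_metric(sine, persistence_sse), 3))
print("noise:", round(kaboudan_metric(noise, persistence_sse), 3))
```

Any forecaster that returns an error for a series could be passed in place of `persistence_sse`; the shuffling step stays the same.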

Posted in Python, Time Series Analytic

Contextual Data Completeness Metric Computation on Spark

Data quality is critical for the healthy operation of any data driven enterprise. There are various kinds of data quality metrics. In this post, the focus will be on the completeness of data. Data quality from a completeness point of view will be expressed as metrics at different levels of granularity. Additionally, data completeness will be assessed within the context of a consuming process, and hence it is contextual.

The Spark implementation is available in my open source project chombo. The implementation is agnostic to the specific data set. It is configuration and metadata driven.
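One way to picture contextual completeness at two levels of granularity is sketched below. This is an illustration under my own assumptions, not the chombo implementation: the per field weights standing in for the consuming process, the treatment of `None` and empty strings as missing, and all names are hypothetical.

```python
MISSING = (None, "")

def row_completeness(record, field_weights):
    """Weighted fraction of populated fields in one record.

    field_weights encodes the context: fields that matter more to the
    consuming process get a larger weight.
    """
    total = sum(field_weights.values())
    present = sum(w for f, w in field_weights.items() if record.get(f) not in MISSING)
    return present / total

def field_completeness(records, field):
    """Fraction of records in which the given field is populated."""
    filled = sum(1 for r in records if r.get(field) not in MISSING)
    return filled / len(records)

# Hypothetical customer records and a marketing process that values email most
records = [
    {"id": 1, "email": "a@example.com", "phone": None},
    {"id": 2, "email": "", "phone": "555-0100"},
]
marketing_weights = {"email": 3, "phone": 1}

for r in records:
    print(r["id"], row_completeness(r, marketing_weights))
print("email:", field_completeness(records, "email"))
```

The same records would score differently for a process with different weights, which is what makes the metric contextual.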

Posted in Big Data, Data Science, ETL, Spark

Machine Learning Model Interpretation and Prescriptive Analytic with Lime

Machine learning model interpretability is the degree to which a human can comprehend the reasons behind a prediction made by a model. Interpretability may be required for various reasons, such as meeting compliance requirements or gaining insight in high stakes situations like medical diagnosis. In this post we will show how to use the lime Python library to interpret a Random Forest based loan approval predictive model.

The implementation is available in my open source project avenir. To make it easier, a wrapper class around lime has been created, so that lime can be used without any Python coding by defining all relevant parameters in a properties configuration file.

Posted in Data Science, Machine Learning, Python

Automated Machine Learning with Hyperopt and Scikitlearn without Writing Python Code

The most challenging part of building a supervised machine learning model is the optimization over algorithm selection, feature selection and algorithm specific hyperparameter values that yields the best performing model. Undertaking such a task manually is not feasible unless the model is very simple.

The purpose of Automated Machine Learning (AutoML) tools is to democratize Machine Learning by automating this optimization process. In this post we will use one such AutoML tool called Hyperopt along with Scikitlearn, and show how to choose the optimum Scikitlearn classification algorithm, feature subset and associated hyperparameters for the algorithm. The solution is available in my open source project avenir on GitHub.

Posted in Data Science, Machine Learning, Python, ScikitLearn, Supervised Learning

Time Series Trend and Seasonality Component Decomposition with STL on Spark

You may be interested in decomposing a time series into level, trend, seasonality and remainder components to gain more insight into your time series. You may also be interested in decomposition to separate out the remainder component for anomaly detection. We will allude to various real life use cases later in this post. We will go through a time series decomposition solution using the Seasonal and Trend decomposition using Loess (STL) algorithm as implemented on Spark.

The Spark implementation is available in my open source project ruscello. The implementation is agnostic to any problem domain or data set because, as in my other Spark implementations, it is metadata and configuration driven. We will use eCommerce product sales data to showcase the STL implementation.

Posted in Anomaly Detection, Big Data, Data Science, ETL, Spark, Time Series Analytic