Author Archives: Pranab

About Pranab

I am Pranab Ghosh, a software professional in the San Francisco Bay Area. I manipulate bits and bytes for the good of living beings and the planet. I have worked with a myriad of technologies and platforms in various business domains, for early-stage startups, large corporations and anything in between. I am an active blogger and open source project owner. I am passionate about technology and green and sustainable living. My technical interest areas are Big Data, Distributed Processing, NoSQL databases, Machine Learning and Programming Languages. I am fascinated by problems that don't have neat closed-form solutions.

Model Drift Detection with Kolmogorov Smirnov Statistic on Spark

In retail business, you may be using various business solutions based on product demand data, e.g., inventory management or tracking how a newly introduced product performs over time. The buying behavior model may change with time, rendering those …

Posted in Data Science, Machine Learning, Spark, Statistics
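The two-sample Kolmogorov-Smirnov statistic named in the title is the largest absolute gap between the empirical CDFs of two samples, for example demand observed in a reference window and in the current window. Below is a minimal plain-Python sketch under that definition, not the Spark implementation from the post. The function names and the default coefficient 1.358 (the usual asymptotic critical value for a 5% significance level) are assumptions chosen for illustration.

```python
import math


def ks_statistic(reference, current):
    """Two-sample KS statistic: max |F_ref(x) - F_cur(x)| over all x."""
    ref, cur = sorted(reference), sorted(current)
    n, m = len(ref), len(cur)
    i = j = 0
    d = 0.0
    # Walk both sorted samples in merged order; after advancing past every
    # value <= x in each sample, i/n and j/m are the two ECDFs at x.
    while i < n and j < m:
        x = min(ref[i], cur[j])
        while i < n and ref[i] <= x:
            i += 1
        while j < m and cur[j] <= x:
            j += 1
        d = max(d, abs(i / n - j / m))
    # Once one sample is exhausted the gap can only shrink, so d is final.
    return d


def drift_detected(reference, current, c_alpha=1.358):
    """Flag drift when the KS statistic exceeds the asymptotic critical value
    c_alpha * sqrt((n + m) / (n * m)). c_alpha=1.358 is an assumed default
    corresponding to roughly alpha = 0.05."""
    n, m = len(reference), len(current)
    threshold = c_alpha * math.sqrt((n + m) / (n * m))
    return ks_statistic(reference, current) > threshold
```

The asymptotic threshold is only a rough guide for small windows; a library routine such as SciPy's two-sample KS test also reports a p-value if you need one.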

Evaluation of Time Series Predictability with Kaboudan Metric using Prophet

You might be getting ready to build a time series forecasting model using a state-of-the-art LSTM network. Before you proceed, you may want to pause and ask yourself whether your time series is inherently predictable at all, i.e., whether …

Posted in Python, Time Series Analytic
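Kaboudan's metric, as it is commonly stated, compares the error of a forecasting model fitted to the original series with the error of the same model fitted to a randomly shuffled copy: eta = 1 - SSE_original / SSE_shuffled. Shuffling destroys the temporal structure, so eta near 1 suggests the series is predictable and eta near 0 suggests it is not. The post uses Prophet as the model; the sketch below swaps in a simple least-squares AR(1) model so it runs with the standard library only, and the function names are hypothetical.

```python
import random


def ar1_sse(series):
    """Fit y[t] = a + b * y[t-1] by ordinary least squares and return the
    in-sample sum of squared one-step-ahead errors."""
    x, y = series[:-1], series[1:]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx if sxx > 0 else 0.0  # constant input: fall back to a mean model
    a = mean_y - b * mean_x
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))


def kaboudan_eta(series, seed=0):
    """Kaboudan-style predictability score: 1 - SSE(original) / SSE(shuffled).
    AR(1) stands in for the forecasting model (the post uses Prophet)."""
    values = list(series)
    shuffled = list(values)
    random.Random(seed).shuffle(shuffled)  # destroys temporal order, keeps values
    sse_shuffled = ar1_sse(shuffled)
    if sse_shuffled == 0:
        raise ValueError("shuffled series is fitted perfectly; metric undefined")
    return 1.0 - ar1_sse(values) / sse_shuffled
```

A linear trend such as 0.0, 1.0, ..., 49.0 scores close to 1, while i.i.d. noise scores near 0. Averaging eta over several shuffles (different seeds) reduces sensitivity to any one permutation.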

Contextual Data Completeness Metric Computation on Spark

Data quality is critical for the healthy operation of any data-driven enterprise. There are various kinds of data quality metrics. In this post, the focus will be on the completeness of data. Data quality from a completeness point of …

Posted in Big Data, Data Science, ETL, Spark
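In its simplest form, completeness is the fraction of values that are actually present. The sketch below computes it per field and for the whole data set in plain Python. It does not attempt the contextual variant described in the post, and treating None and the empty string as missing is an assumption you would adapt to your own data.

```python
def field_completeness(records, fields, missing=(None, "")):
    """Fraction of records in which each field holds a non-missing value."""
    n = len(records)
    return {
        f: sum(1 for r in records if r.get(f) not in missing) / n
        for f in fields
    }


def overall_completeness(records, fields, missing=(None, "")):
    """Fraction of all (record, field) cells that hold a non-missing value."""
    present = sum(1 for r in records for f in fields if r.get(f) not in missing)
    return present / (len(records) * len(fields))


records = [
    {"id": 1, "name": "a", "zip": None},
    {"id": 2, "name": "", "zip": "94107"},
]
print(field_completeness(records, ["id", "name", "zip"]))
# {'id': 1.0, 'name': 0.5, 'zip': 0.5}
print(overall_completeness(records, ["id", "name", "zip"]))
# 0.6666666666666666
```

A field absent from a record counts as missing because `dict.get` returns None for it.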

Machine Learning Model Interpretation and Prescriptive Analytic with Lime

Machine learning model interpretability is the degree to which a human can comprehend the reasons behind a prediction made by a model. Interpretability may be required for various reasons, e.g., meeting compliance requirements or gaining insight in a high-stakes situation …

Posted in Data Science, Machine Learning, Python

Automated Machine Learning with Hyperopt and Scikitlearn without Writing Python Code

The most challenging part of building a supervised machine learning model is the optimization of algorithm selection, feature selection and algorithm-specific hyperparameter values that together yield the best-performing model. Undertaking such a task manually is not feasible, unless the …

Posted in Data Science, Machine Learning, Python, ScikitLearn, Supervised Learning

Time Series Trend and Seasonality Component Decomposition with STL on Spark

You may be interested in decomposing a time series into level, trend, seasonality and remainder components to gain more insight into your time series. You may also be interested in decomposition to separate out the remainder component for anomaly detection. …

Posted in Anomaly Detection, Big Data, Data Science, ETL, Spark, Time Series Analytic

Missing Value Imputation with Restricted Boltzmann Machine Neural Network

Missing values are a common problem in many real-world data sets. There are various techniques for imputing missing values. We will use a kind of neural network called an RBM for imputing missing values. Restricted Boltzmann Machines (RBMs) are stochastic …

Posted in Data Science, Deep Learning, ETL, Machine Learning, Python