Realtime Concept Drift Detection for Machine Learning Classification Models

Concept drift occurs when the relationship between the input and output data changes, i.e., P(Y|X) changes while P(X) remains the same. It happens when the behavior of the underlying process has changed since the model was trained. This is not uncommon in real-world situations, especially for ML models that involve human behavior, explicitly or implicitly. Please refer to my earlier post for more technical information on concept drift. I had a batch-based implementation of several model drift detection algorithms, as described in that post. I have now implemented a window-based real-time API for concept drift detection and made it part of my Python package matumizi. The code is in my GitHub repo whakapai.
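
To give a feel for window-based detection, here is a minimal sketch. It is not the matumizi API: the class name, its parameters and the test stream are all made up for illustration. It assumes that concept drift shows up as a rise in the model's error rate, and it compares the error rate in a sliding recent window against a frozen reference window with a two-proportion z-test.

```python
import math
from collections import deque


class WindowDriftDetector:
    """Hypothetical helper (not part of matumizi): flags drift when the error
    rate in the recent window is significantly higher than in the reference window."""

    def __init__(self, window_size=100, z_threshold=3.0):
        self.window_size = window_size
        self.z_threshold = z_threshold
        self.reference = []                      # first window, frozen once full
        self.recent = deque(maxlen=window_size)  # sliding window of latest outcomes

    def update(self, error):
        """error is 1 if the prediction was wrong, else 0. Returns True on drift."""
        if len(self.reference) < self.window_size:
            self.reference.append(error)
            return False
        self.recent.append(error)
        if len(self.recent) < self.window_size:
            return False
        n = self.window_size
        p_ref = sum(self.reference) / n
        p_cur = sum(self.recent) / n
        pooled = (p_ref + p_cur) / 2             # both windows have the same size
        se = math.sqrt(2 * pooled * (1 - pooled) / n)
        if se == 0:
            return False
        return (p_cur - p_ref) / se > self.z_threshold


# Synthetic stream: 10% errors for 300 predictions, then 50% errors.
stream = [1 if i % 10 == 0 else 0 for i in range(300)]
stream += [1 if i % 2 == 0 else 0 for i in range(300)]

detector = WindowDriftDetector(window_size=100, z_threshold=3.0)
first_alarm = next((i for i, e in enumerate(stream) if detector.update(e)), None)
print("first drift alarm at index", first_alarm)
```

The window size and threshold trade detection delay against false alarms; a larger window gives a steadier estimate but reacts more slowly.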

Posted in AI, Anomaly Detection, Machine Learning, mlops, Python

Delivery Vehicle Route Planning with Ant Colony Optimization

Many real-world optimization problems are not simple, linear problems that are amenable to a neat closed-form solution. For such problems there are various heuristic optimization techniques. These heuristic algorithms find good enough, sub-optimal solutions while limiting computing cost. Many of the heuristic optimization techniques are nature inspired. The book Clever Algorithms: Nature-Inspired Programming Recipes by Jason Brownlee is a wonderful resource on the topic. The link has a download link for the book. There is one class of nature-inspired optimization algorithms called Swarm Intelligence. Ant Colony Optimization is a technique that falls under this category.

In this post we will solve the problem of delivery route planning with Ant Colony Optimization (ACO). The implementation is available in my Python package arotau. This package contains implementations of several heuristic optimization algorithms. Here is the GitHub repository.
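
The idea behind Ant Colony Optimization can be sketched on a toy routing problem. This is a minimal illustration, not the arotau implementation: the function names, the parameters (alpha, beta, evaporation) and the sample stops are assumptions made for this example. Each ant builds a round trip from the depot, choosing the next stop with probability proportional to pheromone and inverse distance; shorter tours then deposit more pheromone.

```python
import math
import random


def tour_length(tour, dist):
    """Length of the closed tour, including the return leg to the start."""
    return sum(dist[tour[i]][tour[(i + 1) % len(tour)]] for i in range(len(tour)))


def ant_colony(points, n_ants=10, n_iter=50, alpha=1.0, beta=2.0,
               evaporation=0.5, seed=0):
    """Hypothetical ACO sketch. Assumes all points are distinct (no zero distances)."""
    rng = random.Random(seed)
    n = len(points)
    dist = [[math.dist(a, b) for b in points] for a in points]
    pheromone = [[1.0] * n for _ in range(n)]
    best_tour, best_len = None, math.inf
    for _ in range(n_iter):
        tours = []
        for _ in range(n_ants):
            tour = [0]                          # every route starts at the depot (stop 0)
            unvisited = set(range(1, n))
            while unvisited:
                cur = tour[-1]
                choices = sorted(unvisited)
                weights = [pheromone[cur][j] ** alpha * (1.0 / dist[cur][j]) ** beta
                           for j in choices]
                nxt = rng.choices(choices, weights=weights)[0]
                tour.append(nxt)
                unvisited.remove(nxt)
            length = tour_length(tour, dist)
            tours.append((tour, length))
            if length < best_len:
                best_tour, best_len = tour, length
        # Evaporate old pheromone, then let each ant deposit along its tour.
        for i in range(n):
            for j in range(n):
                pheromone[i][j] *= 1.0 - evaporation
        for tour, length in tours:
            for i in range(n):
                a, b = tour[i], tour[(i + 1) % n]
                pheromone[a][b] += 1.0 / length
                pheromone[b][a] += 1.0 / length
    return best_tour, best_len


stops = [(0, 0), (0, 1), (1, 1), (1, 0)]        # depot plus three delivery stops
route, length = ant_colony(stops)
print(route, round(length, 3))
```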

Posted in Optimization, Python

Synthetic Time Series Data Generation

Recently I started working on a Python package that covers everything time series, with a specific focus on EDA, forecasting, classification and anomaly detection. It will leverage other Python libraries wherever appropriate. My first realization was that I needed a Python module to generate synthetic time series data. This post is all about synthetic data generation for time series. Our generation example will be a time series with trend, seasonal cycle and random noise. It’s part of a new Python package, zaman, which is a work in progress right now. The source code is in the GitHub repo whakapai. The repo contains source for several data science Python packages.
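
As a rough illustration of the kind of series described above, here is a minimal sketch that adds a linear trend, a sinusoidal seasonal cycle and Gaussian noise. It does not use zaman; the function name and all parameter names and defaults are assumptions chosen for this example.

```python
import math
import random


def generate_series(n, base=100.0, slope=0.5, amplitude=10.0, period=24,
                    noise_sd=2.0, seed=None):
    """Hypothetical generator: value(t) = trend(t) + seasonal(t) + noise(t)."""
    rng = random.Random(seed)
    series = []
    for t in range(n):
        trend = base + slope * t                                  # linear trend
        seasonal = amplitude * math.sin(2 * math.pi * t / period)  # repeating cycle
        noise = rng.gauss(0.0, noise_sd)                          # random noise
        series.append(trend + seasonal + noise)
    return series


# Three days of hourly data with a daily cycle.
data = generate_series(72, seed=42)
print(len(data))
```

Setting noise_sd to 0 gives the noise-free signal, which is handy for checking a decomposition or forecasting routine against a known ground truth.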

Posted in Data Science, Python, time series

Simulating A/B Test with Counterfactual and Machine Learning Regression Model

Performing an A/B test is costly; it takes time and resources. A/B testing is a way of trying multiple versions of something to find out which works best based on some metric. It is also called a Randomized Controlled Trial (RCT). There are many applications where A/B testing is prevalent, e.g., web site design, marketing campaigns and drug trials.

In this post, we will go through a simulation-based alternative to A/B testing using counterfactuals and a machine learning regression model. The use case is a targeted marketing campaign. We will simulate different marketing campaign targets to discover which one generates maximum revenue. The ML regression model is a neural network, implemented with a no-code PyTorch framework, which is available as a Python package.

Posted in AI, causality, Data Science, Deep Learning, Machine Learning, Python

Synthetic Regression Data Generation in Python

In one of my projects, I needed to generate synthetic data for a regression model. After looking around I could not find anything satisfactory, including Scikit-Learn. I wanted to have more control over the data generation process, so I decided to implement my own data generator. It provides a lot of control over the generation process, all configured in a configuration file. I hope you will find it useful. It’s available in the Python package matumizi. For more details please refer to the GitHub repo whakapai.

Posted in AI, Machine Learning, Python, Uncategorized

AI Past, Present and Future

AI has gone through many cycles of ups and downs. There has been stunning progress as well as disappointing slumps. Fueled by Machine Learning, and specifically Deep Learning, there has been a lot of progress in the last 15 years. But we are still nowhere near Artificial General Intelligence. It remains elusive. Here is a survey. The focus is on breadth, not depth. For deeper study of any particular topic, the citations provided can be followed.

Posted in AI, Machine Learning

Information Gain based Feature Selection in Python for Machine Learning Models

There are various feature selection techniques for classification and regression ML models. One category of techniques is founded on information theory. The technique discussed here is based on information gain and belongs to that category. Information gain (IG) is the reduction in entropy, or surprise, after the data is split at a certain value of a feature variable. It’s the technique used for training Decision Tree models. We will use the same technique, but for feature selection.

The implementation is available in my Python package matumizi. The package provides, among other features, a data exploration class with more than 100 data exploration functions. Among them are some information theory based feature selection algorithms. The code is available in the GitHub repository whakapai.
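
The definition of information gain given above can be written down directly. The sketch below is independent of matumizi; the function names and the tiny data set are assumptions for illustration. It scores a numeric feature by the best information gain over candidate split points, taken here as midpoints between consecutive distinct values.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(values, labels, split):
    """Entropy reduction from splitting the data at values <= split."""
    left = [y for x, y in zip(values, labels) if x <= split]
    right = [y for x, y in zip(values, labels) if x > split]
    n = len(labels)
    weighted = sum(len(part) / n * entropy(part) for part in (left, right) if part)
    return entropy(labels) - weighted


def best_gain(values, labels):
    """Highest information gain over midpoints between distinct feature values."""
    points = sorted(set(values))
    splits = [(a + b) / 2 for a, b in zip(points, points[1:])]
    return max((information_gain(values, labels, s) for s in splits), default=0.0)


labels = [0, 0, 1, 1]
features = {
    "informative": [1, 2, 3, 4],    # separates the classes perfectly
    "uninformative": [1, 2, 1, 2],  # carries no class information
}
ranking = sorted(features, key=lambda f: best_gain(features[f], labels), reverse=True)
print(ranking)
```

Features with a higher score are better candidates to keep; in practice a threshold or a top-k cut is applied to the ranking.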

Posted in Machine Learning, Python

Machinery Fault Detection with Vibration Data Anomaly Detection using Variational Auto Encoder

An Auto Encoder is an unsupervised Deep Learning model. Data is encoded to a lower-dimensional latent space using an encoder network. The decoder decodes the latent space data, trying to reconstruct the original data. With a Variational Auto Encoder (VAE), the encoder encodes a probability distribution in the latent space. The decoder samples from the latent distribution and decodes the sample to regenerate the original input. In this post we will find out how a VAE can be used to identify faulty machinery by finding anomalies in the vibration data. The implementation is based on PyTorch and available in my Python package torvik. Code is available in the GitHub repo whakapai.

Posted in Anomaly Detection, Deep Learning, Internet of Things, Machine Learning, Python, PyTorch

Patient Appointment Management Optimization with Simulated Annealing

Optimization involves finding a set of problem parameter values that minimizes some cost value. There is a class of optimization algorithms called Heuristic Optimization for complex, non-linear real-world optimization problems. Many of them are inspired by natural processes. Simulated Annealing is one such algorithm. We will use this algorithm to find optimum patient appointments in a doctor’s office. The implementation is part of my Python package called arotau. It contains implementations of various heuristic optimization algorithms. I will be adding more in the future. Here is the GitHub repo.
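
For readers new to Simulated Annealing, here is a minimal sketch on a deliberately simplified appointment problem. It is not the arotau implementation and not the cost model used in the post: the assumption here is that each patient has one preferred slot, each slot holds one patient, and the cost is the total distance between assigned and preferred slots. All names and parameter values are made up for the example.

```python
import math
import random


def cost(assignment, preferred):
    """Total gap between each patient's assigned slot and preferred slot."""
    return sum(abs(slot - pref) for slot, pref in zip(assignment, preferred))


def anneal(preferred, iterations=5000, temp=10.0, cooling=0.999, seed=0):
    """Hypothetical annealer: swaps two patients' slots at each step."""
    rng = random.Random(seed)
    current = list(range(len(preferred)))      # patient i starts in slot i
    current_cost = cost(current, preferred)
    best, best_cost = current[:], current_cost
    for _ in range(iterations):
        i, j = rng.sample(range(len(current)), 2)
        candidate = current[:]
        candidate[i], candidate[j] = candidate[j], candidate[i]
        delta = cost(candidate, preferred) - current_cost
        # Always accept improvements; accept worse moves with a probability
        # that shrinks as the temperature drops.
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current, current_cost = candidate, current_cost + delta
            if current_cost < best_cost:
                best, best_cost = current[:], current_cost
        temp *= cooling                         # geometric cooling schedule
    return best, best_cost


preferred = [3, 0, 2, 1, 4]                     # preferred slot for each patient
schedule, schedule_cost = anneal(preferred)
print(schedule, schedule_cost)
```

Accepting some worse moves early on is what lets the search escape local minima; as the temperature falls the algorithm behaves more and more like plain hill climbing.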

Posted in AI, Optimization, Python

Stock Portfolio Balancing with Monte Carlo Simulation

Portfolio balancing is a complex optimization problem. The problem can be stated as the assignment of weights to different stocks in the portfolio so that a metric called the Sharpe Ratio is maximized. In this post we will see how Monte Carlo simulation provides a simple solution. The approach is brute force, because with simulation we will be doing a random search through the solution space. The Monte Carlo simulation module is part of my Python package matumizi. Here is the GitHub repo. The main focus of matumizi is an Exploratory Data Analysis (EDA) module with over 100 functions.
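
The random search over portfolio weights can be sketched as follows. This does not use the matumizi simulation module; the function names, the expected returns and the covariance matrix are invented for illustration. The Sharpe Ratio is taken as expected portfolio return minus the risk-free rate, divided by the portfolio standard deviation, and the weights are assumed non-negative and summing to 1.

```python
import math
import random


def sharpe(weights, mean_returns, cov, risk_free=0.0):
    """Sharpe ratio of a portfolio given mean returns and a covariance matrix."""
    ret = sum(w * m for w, m in zip(weights, mean_returns))
    n = len(weights)
    var = sum(weights[i] * cov[i][j] * weights[j] for i in range(n) for j in range(n))
    return (ret - risk_free) / math.sqrt(var)


def random_search(mean_returns, cov, trials=10000, risk_free=0.0, seed=0):
    """Hypothetical Monte Carlo search: sample random weights, keep the best."""
    rng = random.Random(seed)
    best_w, best_s = None, -math.inf
    for _ in range(trials):
        raw = [rng.random() for _ in mean_returns]
        total = sum(raw)
        w = [r / total for r in raw]            # normalize so weights sum to 1
        s = sharpe(w, mean_returns, cov, risk_free)
        if s > best_s:
            best_w, best_s = w, s
    return best_w, best_s


# Made-up annualized figures for three stocks.
mean_returns = [0.10, 0.15, 0.07]
cov = [
    [0.040, 0.006, 0.002],
    [0.006, 0.090, 0.004],
    [0.002, 0.004, 0.010],
]
weights, ratio = random_search(mean_returns, cov, risk_free=0.02)
print([round(w, 3) for w in weights], round(ratio, 3))
```

Random search gets expensive as the number of stocks grows, but it makes no assumptions about the objective, so extra constraints can be added by simply rejecting samples.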

Posted in Data Science, Python, Simulation, Statistics