While all the focus during training of a machine learning model is on maximizing accuracy, not enough attention is paid to model robustness. You may have a perfectly trained model with high accuracy, but how confident are you about that accuracy? The accuracy may not be stable. It may vary across different regions of the feature space. Or the model may be very sensitive to moderately out-of-distribution data following production deployment.
The focus of this post is an overview of various robustness metrics, followed by results for one particular metric. The implementation is available in my open source GitHub repository avenir.
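One robustness check alluded to above is whether accuracy is stable across regions of the feature space. A minimal sketch of that idea, binning one feature into quantile regions and computing per-region accuracy (the function name and binning scheme are illustrative, not the avenir implementation):

```python
import numpy as np

def regional_accuracy(feature, y_true, y_pred, n_bins=4):
    """Accuracy computed separately in quantile-based regions of one feature;
    a large spread across regions indicates an unstable model."""
    edges = np.quantile(feature, np.linspace(0.0, 1.0, n_bins + 1))
    bins = np.digitize(feature, edges[1:-1])   # region index 0 .. n_bins-1
    accs = {}
    for b in range(n_bins):
        mask = bins == b
        if mask.any():
            accs[b] = float((y_true[mask] == y_pred[mask]).mean())
    return accs
```

A model whose errors concentrate in one region will show a low accuracy for that bin even when the overall accuracy looks healthy.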
Any machine learning model used for making decisions about humans may potentially be biased, because the data used to train the model may be tainted with human bias. A model trained with biased data will exhibit the same bias when used for making predictions. Loan approval, recruitment, and crime prediction are some examples of decisions where such bias can appear. Such biased behavior of models may be in violation of anti-discrimination laws in many countries. Proper steps are necessary to detect such bias, remove it, and comply with regulatory requirements.
The focus of this post is detecting and measuring human bias according to various metrics. The Python implementation is available in my open source GitHub repository avenir.
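As a taste of what such bias metrics look like, here is a minimal sketch of one common metric, the disparate impact ratio, which compares selection rates between groups (a toy illustration with hypothetical group labels, not the avenir implementation):

```python
def selection_rates(groups, outcomes):
    """Fraction of positive outcomes (e.g. loan approvals) per group."""
    rates = {}
    for g in set(groups):
        members = [o for gg, o in zip(groups, outcomes) if gg == g]
        rates[g] = sum(members) / len(members)
    return rates

def disparate_impact(groups, outcomes, protected, reference):
    """Ratio of the protected group's selection rate to the reference
    group's; values below 0.8 fail the common four-fifths rule."""
    rates = selection_rates(groups, outcomes)
    return rates[protected] / rates[reference]
```

For example, if the reference group is approved at a 50% rate and the protected group at 25%, the ratio is 0.5, well below the four-fifths threshold.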
Most companies put a lot of effort into ensuring superb customer service. They want to resolve customer issues as quickly as possible, leaving customers with a positive experience. It's been said that one negative experience with customer service can obliterate loyalty to a company built over many years. Machine learning can play a significant role in improving the quality of customer service.
In this post we go through a solution for detecting anomalous customer service cases using an autoencoder. An anomalous customer service case will in many cases represent poor customer service. The solution is based on a PyTorch implementation of an autoencoder. I have implemented a Python wrapper class around the PyTorch autoencoder; this, along with a configuration file, makes it easier to use. The solution is available in my open source GitHub project avenir.
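The core idea of autoencoder-based anomaly detection is that normal cases reconstruct well while anomalous cases produce a large reconstruction error. A minimal numpy sketch of that scoring idea, using PCA as a linear stand-in for an autoencoder (the actual solution uses a trained PyTorch network, but the reconstruction-error principle is the same):

```python
import numpy as np

def reconstruction_scores(X, k=1):
    """Score each row by reconstruction error after projecting onto the
    top-k principal components (a linear stand-in for an autoencoder)."""
    mu = X.mean(axis=0)
    Xc = X - mu
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    W = Vt[:k]                 # k x d projection ("encoder/decoder" weights)
    Z = Xc @ W.T               # encode to k dimensions
    Xhat = Z @ W + mu          # decode back to d dimensions
    return np.linalg.norm(X - Xhat, axis=1)
```

Cases whose score sits far above the bulk of the scores are flagged as anomalous, typically via a threshold on the score distribution.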
Concept drift is a serious problem for machine learning models deployed in production. Concept drift occurs when there is a significant change in the underlying data generation process, causing a significant shift in the posterior distribution p(y|x). It is manifested as a significant increase in error rates for deployed models in production. To mitigate the risk, it is critical to monitor the performance of deployed models and detect any concept drift. If drift is not detected and a model retrained with recent data is not deployed, concept drift may render your model ineffective in production. One recent example of the detrimental effect of concept drift, as reported in the media, is the worsening performance of many deployed machine learning models as a result of significant changes in customer behavior due to the coronavirus pandemic.
In this post, we will go through some techniques for supervised concept drift detection. We will also go through a Python implementation of the algorithms, along with results using an algorithm called the Early Drift Detection Method (EDDM). The Python implementation is available in my open source GitHub repo for anomaly detection, called beymani.
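EDDM tracks the distance, in number of samples, between consecutive classification errors: as drift sets in, errors come closer together and the distance statistic shrinks relative to its historical maximum. A compact sketch of the algorithm (a simplified illustration with the commonly used thresholds, not the beymani implementation):

```python
import statistics

class EDDM:
    """Early Drift Detection Method: monitors the distance (in samples)
    between consecutive errors; shrinking distances signal drift."""
    NORMAL, WARNING, DRIFT = "normal", "warning", "drift"

    def __init__(self, alpha=0.95, beta=0.90, min_errors=30):
        self.alpha = alpha          # warning threshold on the distance statistic
        self.beta = beta            # drift threshold
        self.min_errors = min_errors
        self.sample_idx = 0
        self.last_error_idx = None
        self.distances = []
        self.max_metric = 0.0

    def add(self, is_error):
        """Feed one prediction outcome; return 'normal', 'warning' or 'drift'."""
        self.sample_idx += 1
        state = self.NORMAL
        if is_error:
            if self.last_error_idx is not None:
                self.distances.append(self.sample_idx - self.last_error_idx)
            self.last_error_idx = self.sample_idx
            if len(self.distances) >= 2:
                metric = (statistics.mean(self.distances)
                          + 2 * statistics.pstdev(self.distances))
                self.max_metric = max(self.max_metric, metric)
                if len(self.distances) >= self.min_errors:
                    ratio = metric / self.max_metric
                    if ratio < self.beta:
                        state = self.DRIFT
                    elif ratio < self.alpha:
                        state = self.WARNING
        return state
```

Feeding it a stream where errors go from one in ten samples to every sample drives the ratio below the warning and then the drift threshold.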
There are many complex real world optimization problems for which it's not possible to obtain the exact best solution efficiently with a reasonable amount of computing resources. Often the solution search space for such problems is combinatorially explosive. For such problems, heuristic optimization is the only pragmatic option. A heuristic optimization algorithm, with significantly reduced computational cost, is used when a sub-optimal solution is acceptable.
In this post we will go through a solution for meeting schedule optimization with a Genetic Algorithm (GA) in Python. For this seemingly innocuous problem, the search space may have trillions of solutions to explore. I have implemented a set of heuristic optimization algorithms, including GA, available in my open source GitHub repository avenir. The implementations are reusable and agnostic to any specific problem.
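To make the GA formulation concrete: a candidate schedule can be encoded as a list of time slots, one per meeting, with fitness measured by the number of attendee conflicts. A toy sketch with one-point crossover and random mutation (the operators, parameters, and problem encoding here are illustrative, not the reusable avenir implementation):

```python
import random

def conflicts(slots, attendees):
    """Count pairs of meetings that share an attendee and a time slot."""
    c = 0
    for i in range(len(slots)):
        for j in range(i + 1, len(slots)):
            if slots[i] == slots[j] and attendees[i] & attendees[j]:
                c += 1
    return c

def ga_schedule(attendees, n_slots, pop_size=30, gens=200, mut_rate=0.2, seed=42):
    """Evolve meeting-to-slot assignments toward zero attendee conflicts."""
    rng = random.Random(seed)
    n = len(attendees)
    pop = [[rng.randrange(n_slots) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(gens):
        pop.sort(key=lambda s: conflicts(s, attendees))
        if conflicts(pop[0], attendees) == 0:
            break                              # conflict-free schedule found
        survivors = pop[: pop_size // 2]       # selection: keep the fitter half
        children = []
        while len(children) < pop_size - len(survivors):
            a, b = rng.sample(survivors, 2)
            cut = rng.randrange(1, n)
            child = a[:cut] + b[cut:]          # one-point crossover
            if rng.random() < mut_rate:        # mutation: move one meeting
                child[rng.randrange(n)] = rng.randrange(n_slots)
            children.append(child)
        pop = survivors + children
    pop.sort(key=lambda s: conflicts(s, attendees))
    return pop[0]
```

Even this tiny version shows why the encoding matters: with m meetings and s slots the search space is s^m, which explodes quickly as the real problem grows.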
Machine learning has been very successful at using observational data to build models for prediction, but it does not go far enough for causal inference. We humans use cause and effect to learn about the world. In causal inference, statistical tools are used to analyze cause and effect. In causal analysis, our goal is to set a variable to a specific value and find the outcome in another variable, which aids in decision making. This is traditionally done through Randomized Controlled Trials, or A/B testing. However, in many real life cases A/B testing is not feasible or is too expensive. In this post we will discuss a solution for causal inference with deep learning models. We will use a manufacturing supply chain as an example, where our goal will be to gain insight on how to reduce back orders to optimize profit.
The causal inference analysis in this post is based on causal graphical models and do-calculus. The implementation, based on PyTorch, is available in my open source GitHub project avenir.
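At the heart of do-calculus is the adjustment idea: the interventional quantity P(y | do(X=x)) is obtained by averaging the conditional P(y | x, z) over the distribution of a confounder z, rather than conditioning on x alone. A toy sketch with hand-picked discrete probabilities (the numbers and variable names are illustrative, not the supply chain model from the post):

```python
def backdoor_adjust(p_y_given_xz, p_z, x):
    """P(y=1 | do(X=x)) = sum over z of P(y=1 | X=x, Z=z) * P(Z=z)."""
    return sum(p_y_given_xz[(x, z)] * pz for z, pz in p_z.items())

# hypothetical confounder Z (e.g. supplier reliability) and treatment X
p_z = {0: 0.4, 1: 0.6}
p_y_given_xz = {(1, 0): 0.8, (1, 1): 0.5, (0, 0): 0.6, (0, 1): 0.3}

# causal effect of setting X=1 versus X=0 on the outcome
effect = (backdoor_adjust(p_y_given_xz, p_z, 1)
          - backdoor_adjust(p_y_given_xz, p_z, 0))
```

This is the quantity an A/B test would estimate directly; adjustment lets us estimate it from observational data when the confounders are measured.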
The goal of change point detection is to detect the times when statistically significant and sustained changes happen in a time series. It has a wide range of applications in various domains including retail, medical, IoT, finance, business, and meteorology. In this post we will go through a solution, as implemented on Spark, based on a non-parametric two-sample statistic to identify change points. Retail eCommerce sales data will be used as an example to showcase the solution. Abrupt changes in sales can occur for various reasons, e.g., cannibalization by a competing product, or a sudden increase in sales due to a wrongly posted sale price.
The implementation is part of my open source project beymani on GitHub.
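The two-sample idea can be sketched without Spark: at each candidate index, compare a window of points before with a window after using a rank-based (Mann-Whitney style) statistic, and take the index where the statistic peaks. This toy version uses a simple normalized rank count as the non-parametric statistic (an illustration of the principle, not the beymani implementation):

```python
def change_point(series, w=20):
    """Locate the index where before/after windows differ most, using a
    Mann-Whitney style two-sample rank statistic (0 = same, 1 = full shift)."""
    best_t, best_score = None, -1.0
    for t in range(w, len(series) - w):
        before = series[t - w:t]
        after = series[t:t + w]
        u = sum(1 for a in before for b in after if a < b)
        score = abs(u / (w * w) - 0.5) * 2
        if score > best_score:
            best_t, best_score = t, score
    return best_t, best_score
```

Because the statistic is rank-based it makes no assumption about the sales distribution, which is what makes it suitable for messy real-world series.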
Some time ago I worked on an enterprise search project, where we were tasked with improving the performance of an enterprise Solr search deployment. We recommended various improvements based on classic NLP techniques. One of the items on the agenda was deep learning language model based semantic search. Unfortunately we never got to it because of time and budgetary constraints.
Recently I got a chance to experiment with the pre-trained BERT Transformer model for semantic search. I experimented with various similarity algorithms for query and document vector embeddings. I will share my findings in this post, along with suggestions on how to integrate semantic search with Solr or Elasticsearch to boost performance. The Python script is available on GitHub.
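The core of embedding-based semantic search is simple: encode the query and every document into dense vectors (with BERT, word2vec, GloVe, etc.) and rank documents by vector similarity. A numpy sketch using cosine similarity, with toy vectors standing in for real BERT embeddings:

```python
import numpy as np

def rank_by_cosine(query_vec, doc_vecs):
    """Rank documents by cosine similarity to the query vector."""
    q = query_vec / np.linalg.norm(query_vec)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    sims = d @ q                     # cosine similarity per document
    return np.argsort(-sims), sims   # indices from most to least similar
```

In a Solr or Elasticsearch integration, scores like these are typically used to re-rank the top candidates returned by the keyword-based query.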
It’s a costly mistake to jump straight into building machine learning models before getting good insight into your data. I have made that mistake and paid the price. Since then I have made a resolution to learn as much as possible about the data before taking the next step. While exploring data, I always found myself using multiple Python libraries and doing a plethora of imports of various Python modules.
That experience motivated me to consolidate all the common Python data exploration functions in one Python class, to make them easier to use. As an added feature, I have also provided a workspace-like interface, with which you can register multiple data sets with a user-provided name for each data set. You can refer to the data sets by name and perform various operations. The Python implementation is available on GitHub.
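The workspace idea can be sketched as a small class that maps user-supplied names to data sets and runs exploration operations by name (a toy illustration with one summary operation; the real class wraps many more functions):

```python
class DataExplorer:
    """Workspace that registers named data sets and runs exploration
    operations on them by name."""

    def __init__(self):
        self.datasets = {}

    def register(self, name, data):
        """Register a data set under a user-provided name."""
        self.datasets[name] = list(data)

    def summary(self, name):
        """Basic descriptive statistics for a registered data set."""
        d = self.datasets[name]
        n = len(d)
        mean = sum(d) / n
        var = sum((v - mean) ** 2 for v in d) / n
        return {"count": n, "mean": mean, "std": var ** 0.5,
                "min": min(d), "max": max(d)}
```

Once several data sets are registered, the same exploration call can be repeated on each just by switching the name, with no re-imports or reloading.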