Stock Portfolio Balancing with Monte Carlo Simulation

Portfolio balancing is a complex optimization problem. It can be stated as assigning weights to the stocks in a portfolio so that a metric called the Sharpe ratio is maximized. In this post we will see how Monte Carlo simulation provides a simple solution. The approach is brute force, because the simulation performs a random search through the solution space. The Monte Carlo simulation module is part of my Python package matumizi. Here is the GitHub repo. The main focus of matumizi is an Exploratory Data Analysis (EDA) module with over 100 functions.
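The random search idea can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the matumizi API: the function names, the return statistics for the three stocks, and the long-only constraint (weights drawn from a Dirichlet distribution so they are non-negative and sum to 1) are all made up for the example.

```python
import numpy as np

def sharpe_ratio(weights, mean_returns, cov, risk_free=0.0):
    """Sharpe ratio: excess portfolio return divided by portfolio volatility."""
    port_ret = weights @ mean_returns
    port_vol = np.sqrt(weights @ cov @ weights)
    return (port_ret - risk_free) / port_vol

def random_search(mean_returns, cov, n_iter=5000, seed=0):
    """Brute-force Monte Carlo search over random long-only weight vectors."""
    rng = np.random.default_rng(seed)
    n = len(mean_returns)
    best_w, best_s = None, -np.inf
    for _ in range(n_iter):
        w = rng.dirichlet(np.ones(n))  # non-negative weights that sum to 1
        s = sharpe_ratio(w, mean_returns, cov)
        if s > best_s:
            best_w, best_s = w, s
    return best_w, best_s

# Hypothetical annualized statistics for three stocks (illustrative only)
mean_returns = np.array([0.08, 0.12, 0.10])
cov = np.array([[0.040, 0.006, 0.004],
                [0.006, 0.090, 0.010],
                [0.004, 0.010, 0.0625]])

best_w, best_s = random_search(mean_returns, cov)
print("best weights:", np.round(best_w, 3), "Sharpe ratio:", round(best_s, 3))
```

More iterations cover the weight space more densely, at a cost that grows quickly with the number of stocks.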

Posted in Data Science, Python, Simulation, Statistics

Pricing Policy Evaluation and Comparison with Temporal Difference Learning

There are two main kinds of problems in Reinforcement Learning (RL): evaluation of a policy, aka the prediction problem, and finding the best policy, aka the control problem. A policy dictates the action to be taken given the current state. In this post, we will evaluate 2 pricing policies for air fare using Temporal Difference (TD) learning algorithms. Then we will use a policy improvement algorithm to find the better of the 2 policies. We will use my Python package for RL called qinisa. It has some classic Reinforcement Learning and Multi Arm Bandit algorithms. Here is the GitHub repo.
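To make the prediction problem concrete, here is a minimal sketch of TD(0), the simplest TD algorithm, evaluating two fixed pricing policies. It does not use qinisa. The environment is a hypothetical toy: three demand levels as states, two price points as actions, and made-up purchase probabilities.

```python
import numpy as np

PRICES = [100.0, 150.0]        # hypothetical fares: action 0 = low, 1 = high
BASE_BUY_PROB = [0.3, 0.5, 0.8]  # hypothetical purchase probability per demand state

def step(state, action, rng):
    """Toy environment: reward is the fare if the customer buys, else 0."""
    buy_prob = BASE_BUY_PROB[state] - 0.2 * action  # higher price, fewer buyers
    reward = PRICES[action] if rng.random() < buy_prob else 0.0
    next_state = int(rng.integers(3))               # demand level changes at random
    return next_state, reward

def td0_evaluate(policy, n_states=3, episodes=200, horizon=30,
                 alpha=0.1, gamma=0.9, seed=0):
    """Estimate the state value function V of a fixed policy with TD(0)."""
    rng = np.random.default_rng(seed)
    V = np.zeros(n_states)
    for _ in range(episodes):
        s = int(rng.integers(n_states))
        for _ in range(horizon):
            a = policy[s]
            s_next, r = step(s, a, rng)
            # TD(0) update: move V[s] toward the bootstrapped target
            V[s] += alpha * (r + gamma * V[s_next] - V[s])
            s = s_next
    return V

always_low = [0, 0, 0]    # always charge the low fare
demand_based = [0, 0, 1]  # charge the high fare only when demand is high

V_low = td0_evaluate(always_low)
V_demand = td0_evaluate(demand_based)
print("always low  :", np.round(V_low, 1))
print("demand based:", np.round(V_demand, 1))
```

Comparing the two estimated value functions state by state is the basis for deciding which policy is better.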

Posted in AI, Machine Learning, Python, Reinforcement Learning

Tabular Data Column Semantic Type Identification with Contrastive Deep Learning

When data is aggregated from various sources in a dynamic environment where the data format might change without notice, identifying the semantic type of columns is a challenging problem. In this post, semantic type identification of data columns will be framed as a classification problem with manually engineered column features. Also, instead of using the usual SoftMax label probabilities, we will use contrastive learning. The solution is available in my OSS GitHub repo whakapai. It’s also available as a Python package called torvik.
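For readers new to contrastive learning, the sketch below shows one common formulation, the pairwise margin loss. It is an assumption that this form resembles what the post uses. The embeddings and labels are hypothetical, and the code is plain NumPy rather than the torvik API.

```python
import numpy as np

def contrastive_pair_loss(emb_a, emb_b, same_type, margin=1.0):
    """Pairwise contrastive loss.

    Pairs with the same semantic type (same_type = 1) are pulled together;
    pairs with different types (same_type = 0) are pushed at least
    `margin` apart.
    """
    d = np.linalg.norm(emb_a - emb_b, axis=1)
    pos = same_type * d ** 2
    neg = (1 - same_type) * np.maximum(0.0, margin - d) ** 2
    return float(np.mean(pos + neg))

# Hypothetical 2-d embeddings for two pairs of columns
emb_a = np.array([[0.0, 0.0], [0.0, 0.0]])
emb_b = np.array([[0.0, 0.0], [3.0, 4.0]])

good = contrastive_pair_loss(emb_a, emb_b, np.array([1, 0]))  # labels agree with geometry
bad = contrastive_pair_loss(emb_a, emb_b, np.array([0, 1]))   # labels contradict geometry
print(good, bad)
```

At prediction time, a column can then be assigned the type of its nearest labeled neighbors in the embedding space instead of reading off SoftMax probabilities.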

Posted in Data Science, Deep Learning, Python, PyTorch

Feature Selection with Information Theory Based Techniques in Python

Feature selection is the process of selecting the most relevant subset of features from a given set of features for a supervised machine learning problem. There are many techniques for feature selection. In this post we will use 4 information theory based feature selection algorithms. This post is not about feature engineering, which is the construction of new features from a given set of features. The implementation is available in the daexp module of my Python package matumizi. The GitHub repo is whakapai.
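The simplest information theoretic criterion is to rank each feature by its mutual information with the class label. The sketch below illustrates that baseline only. It is not necessarily one of the 4 algorithms in the post and it does not use the daexp API. The data set is synthetic and the function name is made up.

```python
import numpy as np

def mutual_information(x, y):
    """Mutual information (in nats) between two discrete 1-d arrays."""
    x_vals, x_idx = np.unique(x, return_inverse=True)
    y_vals, y_idx = np.unique(y, return_inverse=True)
    joint = np.zeros((len(x_vals), len(y_vals)))
    np.add.at(joint, (x_idx, y_idx), 1)   # contingency table of counts
    joint /= joint.sum()                  # joint distribution p(x, y)
    px = joint.sum(axis=1, keepdims=True)
    py = joint.sum(axis=0, keepdims=True)
    nz = joint > 0                        # skip empty cells to avoid log(0)
    return float(np.sum(joint[nz] * np.log(joint[nz] / (px @ py)[nz])))

# Synthetic data: feature 0 determines the label, feature 1 is pure noise
rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=500)
X = np.column_stack([y, rng.integers(0, 3, size=500)])

scores = [mutual_information(X[:, j], y) for j in range(X.shape[1])]
ranking = np.argsort(scores)[::-1]        # most informative feature first
print("scores:", np.round(scores, 3), "ranking:", ranking)
```

Ranking by plain mutual information ignores redundancy between features, which is what more elaborate information theoretic criteria try to account for.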

Posted in Data Science, Machine Learning, Python

Discovering Subject Matter Experts from Email Communication Data using Graph Convolution Network

A Deep Learning model architecture aligns with a specific structure of the data, e.g. RNN or LSTM for linear data like text and CNN for grid data like images. The data structures in these cases are specialized kinds of graphs. Linear data like text is a linear graph and grid data like an image is a grid graph. A Graph Neural Network (GNN) is very powerful because it can process data with an arbitrary graph structure. Data with a generic graph structure abounds in real life, e.g. social networks and paper citation graphs. In this post, we will find out how a GNN can be used to discover subject matter experts from email communication data.

We will use a type of GNN called Graph Convolution Network (GCN) for the solution. A no code GCN implementation based on PyTorch is available in my GitHub repo whakapai. It’s also available as part of a Python package in TestPyPI.
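As a rough illustration of what a graph convolution does, here is a single GCN layer in NumPy using the widely used symmetric normalization of the adjacency matrix with self loops. It is a sketch, not the no code PyTorch implementation from the repo. The 4 person email graph, the node features, and the random weights are all hypothetical.

```python
import numpy as np

def gcn_layer(adj, features, weight):
    """One GCN layer: ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    a_hat = adj + np.eye(adj.shape[0])   # add self loops
    deg = a_hat.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    a_norm = a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(0.0, a_norm @ features @ weight)

# Hypothetical email graph: an edge means two people exchanged email
adj = np.array([[0, 1, 1, 0],
                [1, 0, 1, 0],
                [1, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)

# Hypothetical node features, e.g. topic word counts in each person's email
features = np.array([[1.0, 0.0, 2.0],
                     [0.0, 1.0, 1.0],
                     [3.0, 1.0, 0.0],
                     [0.0, 0.0, 1.0]])

rng = np.random.default_rng(0)
weight = rng.normal(size=(3, 2))         # untrained weights, for shape only

hidden = gcn_layer(adj, features, weight)
print(hidden.shape)
```

Each output row mixes a person's own features with those of the people they communicate with, which is what lets a node level classifier use the communication structure.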

Posted in Deep Learning, Machine Learning, Python, PyTorch

Gig Economy Workforce Scheduling with Reinforcement Learning

Gig economy workers typically work on contract, potentially temporary, and are called to work on an as-needed basis. Some examples are delivery services, app based taxi services, content creation and low level administrative work. A company may have a pool of gig workers. On a given day, based on the demand forecast, it might need a certain number of workers. How do you decide which workers to call from the pool so that the choice is most beneficial to the company? It’s a complex decision making problem. We are going to find out in this post how a type of Reinforcement Learning (RL) called Multi Arm Bandit (MAB) can effectively solve this decision making problem.

The Python implementation is available in my OSS GitHub repository avenir. The use case for the solution is a fictitious food delivery service.
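To show how a bandit balances trying out workers against calling the ones known to perform well, here is a minimal sketch using UCB1, one classic MAB algorithm. It is an assumption that the post uses anything similar, and this is not the avenir implementation. Each worker is an arm, the reward is 1 for a successful delivery, and the success probabilities are made up.

```python
import numpy as np

def ucb1_schedule(success_prob, rounds=2000, seed=0):
    """Pick one worker per round with UCB1; return call counts and mean rewards."""
    rng = np.random.default_rng(seed)
    n = len(success_prob)
    counts = np.zeros(n, dtype=int)
    totals = np.zeros(n)
    for t in range(1, rounds + 1):
        if t <= n:
            arm = t - 1                                # call every worker once first
        else:
            means = totals / counts
            bonus = np.sqrt(2.0 * np.log(t) / counts)  # exploration bonus
            arm = int(np.argmax(means + bonus))
        # Simulated outcome: 1.0 if the delivery succeeded, else 0.0
        reward = float(rng.random() < success_prob[arm])
        counts[arm] += 1
        totals[arm] += reward
    return counts, totals / counts

# Hypothetical per-worker probability of a successful, on-time delivery
success_prob = [0.5, 0.9, 0.3]
counts, est_means = ucb1_schedule(success_prob)
print("calls per worker:", counts, "estimated success:", np.round(est_means, 2))
```

A real scheduler would call several workers per day and use a richer reward, but the trade-off between exploration and exploitation is the same.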

Posted in Machine Learning, multi arm bandit, Python, Reinforcement Learning

Out of Distribution Data Detection in Deployed Machine Learning Models

If a deployed machine learning model encounters out of distribution data, it should either reject it or delegate it to a human reviewer for further investigation and decision making. A sample is out of distribution (OOD) when it is generated by a distribution different from the distribution of the training data. For high stakes applications such as finance or medicine, it’s critical to detect OOD data. Out of distribution data is related to data drift, except that data drift signifies a permanent shift in the data distribution. Out of distribution data is closer to the outlier or anomalous data problem. Outlier detection aims to detect samples that are markedly different from most of the data.

There are many techniques for OOD detection. In this post we will go through an OOD detection technique based on the nearest neighbor algorithm applied to the latent data of a deep learning model. The implementation is available in my OSS GitHub repo avenir.
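The nearest neighbor idea can be sketched as follows: score a sample by the distance to its k-th nearest training embedding, and flag it when the score exceeds a threshold calibrated on the training data itself. This is an illustration under assumptions, not the avenir code. The latent vectors are random stand-ins for a model's hidden layer output, and k and the 95th percentile threshold are arbitrary choices.

```python
import numpy as np

def knn_distance(queries, reference, k, exclude_self=False):
    """Distance from each query to its k-th nearest reference point."""
    d = np.linalg.norm(queries[:, None, :] - reference[None, :, :], axis=2)
    d.sort(axis=1)
    # When queries are the reference set, column 0 is the point itself
    idx = k if exclude_self else k - 1
    return d[:, idx]

rng = np.random.default_rng(0)

# Stand-in for latent embeddings of the training data (hypothetical)
train_latent = rng.normal(size=(500, 8))

# Calibrate the threshold on the training embeddings themselves
k = 5
train_scores = knn_distance(train_latent, train_latent, k, exclude_self=True)
threshold = np.percentile(train_scores, 95)

# Fresh in-distribution samples and one sample far from the training data
in_dist = rng.normal(size=(200, 8))
far_away = np.full((1, 8), 10.0)

in_flags = knn_distance(in_dist, train_latent, k) > threshold
ood_flag = knn_distance(far_away, train_latent, k) > threshold
print("in-distribution flag rate:", in_flags.mean(), "far sample flagged:", ood_flag[0])
```

The percentile sets the expected false alarm rate on in-distribution data, so it should be chosen based on how costly a needless human review is.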

Posted in AI, Machine Learning, mlops, Outlier Detection, Python, PyTorch

Remedial Action Recommendation with Machine Learning and Genetic Algorithm

Prescriptive analytics sits at the top of a three tier analytics pyramid. The bottom layers are descriptive and predictive analytics. Prescriptive analytics entails action recommendations, based on the results of descriptive and predictive analytics, which if executed will have a positive business impact. As an illustrative example, after a machine learning model has predicted that a customer is very likely to churn in the near future, the business might be interested in getting some remedial action recommendations which, if implemented, will prevent the churn.

In this post we will go through a solution for remedial action based on predictive Machine Learning (ML) and Genetic Algorithm (GA), using loan approval as an example. Following the rejection of a loan application by the ML model, the bank may be interested in a set of remedial action recommendations for the applicant, so that the negative outcome can be turned around to a positive one. The implementation is available in my OSS GitHub repo avenir.

Posted in AI, Data Science, Deep Learning, Machine Learning, Optimization, Python, PyTorch

Conformal Prediction for a Neural Regression Model

When a deployed machine learning model makes a prediction, should we accept the prediction at face value or question its reliability? For certain critical applications like medicine and aviation, where some decision making follows the prediction, it may be too risky to accept the prediction unless there is high confidence associated with it.

How do you associate a confidence value with the model prediction? It’s tempting to use the predicted probability for a classification as a measure of confidence. However, the predicted class probability is not a reliable measure of confidence. This is where conformal prediction enters the picture. It enables us to associate a confidence level with the model prediction. For a decision making system, e.g. deciding whether to treat a patient based on the model prediction for a chest X-ray, it’s critical to have a level of confidence with the model prediction.
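For a regression model, one standard way to do this is split conformal prediction: compute absolute residuals on a held-out calibration set, take a finite-sample corrected quantile, and use it as the half width of the prediction interval. The sketch below assumes that variant, which may differ from the one in the post. The `predict` function is a hypothetical stand-in for a trained neural regression model and the data is synthetic.

```python
import numpy as np

def conformal_quantile(residuals, alpha):
    """Finite-sample corrected (1 - alpha) quantile of calibration residuals."""
    n = len(residuals)
    level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)
    return float(np.quantile(residuals, level, method="higher"))

def predict(x):
    """Hypothetical stand-in for a trained regression model."""
    return 2.0 * x

rng = np.random.default_rng(0)

# Synthetic calibration set, held out from training
x_cal = rng.uniform(0, 10, size=1000)
y_cal = 2.0 * x_cal + rng.normal(size=1000)

alpha = 0.1  # target 90% coverage
q = conformal_quantile(np.abs(y_cal - predict(x_cal)), alpha)

# Prediction intervals [pred - q, pred + q] on fresh data
x_test = rng.uniform(0, 10, size=2000)
y_test = 2.0 * x_test + rng.normal(size=2000)
pred = predict(x_test)
covered = (y_test >= pred - q) & (y_test <= pred + q)
print("interval half width:", round(q, 3), "empirical coverage:", covered.mean())
```

The coverage guarantee is marginal, i.e. it holds on average over the data, and it assumes the calibration and test data are exchangeable.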

Posted in AI, Machine Learning, Python

Machine Learning Model Performance Robustness Based on Local Neighborhood Performance

After training a machine learning model we generally test the model with a validation data set. We calculate accuracy or some other performance metric. This metric is global, i.e. based on the whole validation data set. How do you know how robust your model is? One measure of robustness is the standard deviation or confidence interval of the performance metric calculated for various local neighborhoods of the test data. For a robust model, the standard deviation or confidence interval of the local performance metric should be low.

In this post, we will use a neural network model for loan approval and investigate the robustness of the model. The implementation can be found in my OSS GitHub repo avenir.
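A minimal sketch of the local neighborhood measure, assuming neighborhoods are formed from the k nearest validation samples around randomly chosen centers. This is not the avenir code. The two "models" are simulated correctness flags with the same global accuracy, one uniform across the feature space and one that is weak in one region.

```python
import numpy as np

def local_accuracy_stats(X, correct, n_neighborhoods=50, k=30, seed=0):
    """Mean and standard deviation of accuracy over local neighborhoods."""
    rng = np.random.default_rng(seed)
    centers = rng.choice(len(X), size=n_neighborhoods, replace=False)
    local_acc = []
    for c in centers:
        dist = np.linalg.norm(X - X[c], axis=1)
        nbrs = np.argsort(dist)[:k]          # k nearest samples, center included
        local_acc.append(correct[nbrs].mean())
    local_acc = np.array(local_acc)
    return float(local_acc.mean()), float(local_acc.std())

rng = np.random.default_rng(1)
X = rng.uniform(size=(1000, 2))              # synthetic validation features

# Model A: about 80% accurate everywhere
correct_a = rng.random(1000) < 0.8
# Model B: always right on the left half, about 60% accurate on the right half
correct_b = rng.random(1000) < np.where(X[:, 0] < 0.5, 1.0, 0.6)

mean_a, std_a = local_accuracy_stats(X, correct_a)
mean_b, std_b = local_accuracy_stats(X, correct_b)
print("model A:", round(mean_a, 3), round(std_a, 3))
print("model B:", round(mean_b, 3), round(std_b, 3))
```

Both simulated models have a similar global accuracy, but the larger spread of local accuracy for model B exposes the region where it is unreliable.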

Posted in AI, Machine Learning, Performance, Python, PyTorch