Class Separation based Machine Learning Model Performance Metric


The output of a binary classifier is typically the predicted probability of some class. This probability value is converted to a binary label based on some probability threshold. For a well trained model, the predicted probability values should be clustered around 0 and 1. The class separation metric will have a large value in such cases, which is a desirable property of the model.
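The thresholding step can be sketched as follows; the probability values and the 0.5 threshold are illustrative assumptions, not output from any particular model.

```python
import numpy as np

# predicted probabilities from a hypothetical binary classifier
probs = np.array([0.05, 0.92, 0.11, 0.88, 0.97, 0.03, 0.45])

# convert probabilities to binary labels at a chosen threshold
threshold = 0.5
labels = (probs >= threshold).astype(int)
print(labels.tolist())  # [0, 1, 0, 1, 1, 0, 0]
```

Note how the values cluster near 0 and 1; a score like 0.45 sits in the ambiguous middle region that a well separated model should avoid.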

Two Sample Statistic

There are statistical tests that indicate whether two samples are drawn from the same distribution. If the two samples are from the same distribution, the statistic value is close to zero; as the distributions diverge, the statistic value grows. These tests are generally based on differences in the cumulative distributions.

One such statistic is the Kolmogorov-Smirnov (KS) statistic. It's based on the maximum difference between the cumulative distributions of two samples. The two samples in our case are the predicted probability values for the positive and negative classes. An ideal machine learning classification model will have a large KS statistic. The desirable range for the KS statistic is 0.45 to 0.70; in this range the model is considered to have good separation. Above 0.70 the model is considered to be overfitted, i.e. large error due to variance. Below 0.45 the model is considered to be underfitted, i.e. large error due to bias.
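A minimal sketch of the KS calculation, using synthetic score samples in place of real model output (the means and spreads below are illustrative assumptions):

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(42)
# synthetic predicted probabilities: positive class scores clustered
# near 1, negative class scores clustered near 0
pos_probs = np.clip(rng.normal(0.8, 0.15, 500), 0.0, 1.0)
neg_probs = np.clip(rng.normal(0.2, 0.15, 500), 0.0, 1.0)

# two-sample KS test: statistic is the max gap between the two
# empirical cumulative distributions
ks_stat, p_value = ks_2samp(pos_probs, neg_probs)
print(f"KS statistic: {ks_stat:.3f}")
```

With two well separated score distributions like these, the statistic comes out close to 1; overlapping distributions push it toward 0.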

Classification Model for Loan Approval

A neural network model with one hidden layer was trained for loan approval. The data was artificially generated using ancestral sampling, and the desired amount of noise was added by switching the class label for some random records. The fields consist of the following.

  • Loan ID (not feature)
  • Marital status
  • No of children
  • Education level
  • Whether self employed
  • Income
  • Years of experience
  • No of years in current job
  • Debt amount
  • Loan amount
  • Loan term
  • Credit score
  • Bank account balance
  • Retirement account balance
  • No of prior mortgage loans
  • Approved or not (target)
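The generation scheme described above can be sketched roughly as follows. The dependency structure, coefficients, and 5% noise level are illustrative assumptions covering only a few of the listed fields, not the tutorial's actual scheme.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 1000

# simplified ancestral sampling: sample parent variables first, then
# children conditioned on them
education = rng.integers(0, 3, n)                 # 0=HS, 1=college, 2=grad
income = 40000 + 20000 * education + rng.normal(0, 10000, n)
credit_score = np.clip(550 + income / 1000 + rng.normal(0, 40, n), 300, 850)

# approval depends on credit score through a logistic link
p_approve = 1.0 / (1.0 + np.exp(-(credit_score - 650) / 30))
approved = (rng.random(n) < p_approve).astype(int)

# inject noise: flip the class label for a random 5% of records
flip = rng.random(n) < 0.05
approved[flip] = 1 - approved[flip]
```

The label flipping controls how noisy, and therefore how separable, the resulting classification problem is.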

The KS statistic was calculated using the positive and negative class probability distributions. The statistic value for this model was found to be 0.362.

It falls below the desired range, indicating only moderate separation, even though the model accuracy was high. The model could be tuned to improve the KS statistic, but no tuning was done here.

Please follow the tutorial to generate the data, train the model and calculate the KS statistic as a class separation metric.

Alternative Separation Metric

There are alternative ways to measure separation. Two approaches, one based on cluster separation and the other based on distribution divergence, are discussed here.

If the class conditional data distribution is unimodal, then the data belonging to each predicted class can be treated as a cluster. With that assumption we can apply the well known cluster quality metrics used in unsupervised clustering problems.

One such measure is the Dunn index. It's based on the ratio of the minimum inter-cluster distance to the maximum intra-cluster distance. The distance between two clusters could be the maximum distance between point pairs, the average distance between point pairs, etc. If the classes are well separated, the Dunn index will be higher.
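A minimal sketch of the Dunn index for the two-cluster case, using minimum point-pair distance between clusters and maximum point-pair distance within a cluster; the 2D cluster data is an illustrative assumption.

```python
import numpy as np

def dunn_index(x_pos, x_neg):
    """Dunn index for two clusters: minimum inter-cluster point-pair
    distance divided by maximum intra-cluster point-pair distance."""
    def pairwise(a, b):
        # all Euclidean distances between rows of a and rows of b
        return np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)

    inter = pairwise(x_pos, x_neg).min()
    intra_pos = pairwise(x_pos, x_pos).max()
    intra_neg = pairwise(x_neg, x_neg).max()
    return inter / max(intra_pos, intra_neg)

# two well separated 2D clusters standing in for the feature vectors
# of predicted positive and negative records
rng = np.random.default_rng(0)
x_pos = rng.normal([5.0, 5.0], 0.5, (50, 2))
x_neg = rng.normal([0.0, 0.0], 0.5, (50, 2))
print(f"Dunn index: {dunn_index(x_pos, x_neg):.3f}")
```

Moving the two cluster centers closer together drives the index toward zero, mirroring the loss of class separation.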

We could also measure the divergence between the multidimensional feature data distributions for the predicted positive and negative classes. Depending on the dimensionality, a large number of test samples will be required. Divergence could be calculated with absolute difference, KL divergence or a symmetric version of it.
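As a sketch, here is a symmetric divergence (Jensen-Shannon) computed between histograms of a single feature, split by predicted class; the feature distributions and bin choices are illustrative assumptions, and a real application would extend this to the multidimensional case.

```python
import numpy as np
from scipy.spatial.distance import jensenshannon

rng = np.random.default_rng(1)
# one feature of the test records, split by predicted class
feat_pos = rng.normal(1.0, 1.0, 2000)
feat_neg = rng.normal(-1.0, 1.0, 2000)

# histogram both samples on a common set of bins
bins = np.linspace(-5.0, 5.0, 41)
p, _ = np.histogram(feat_pos, bins=bins, density=True)
q, _ = np.histogram(feat_neg, bins=bins, density=True)

# Jensen-Shannon distance: a symmetric, bounded variant of KL divergence
js = jensenshannon(p, q)
print(f"JS distance: {js:.3f}")
```

Identical distributions give a value of 0, while disjoint distributions approach the upper bound, so larger values indicate better separation.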

Model Drift

Model drift in the presence of non-stationary data is a serious problem for models deployed in production. Drift manifests itself as deteriorating performance. As the model drifts, the class separation property will also deteriorate.
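Tracking the KS statistic across production batches is one way to surface this deterioration. The simulation below is an illustrative assumption: the two class score distributions are made to gradually move toward each other, standing in for drift.

```python
import numpy as np
from scipy.stats import ks_2samp

def separation(pos_probs, neg_probs):
    """KS statistic between positive and negative class score samples."""
    return ks_2samp(pos_probs, neg_probs).statistic

rng = np.random.default_rng(3)
ks_values = []
# simulate successive production batches whose class score
# distributions converge as the model drifts
for batch, shift in enumerate([0.0, 0.1, 0.2, 0.3]):
    pos = np.clip(rng.normal(0.8 - shift, 0.15, 500), 0.0, 1.0)
    neg = np.clip(rng.normal(0.2 + shift, 0.15, 500), 0.0, 1.0)
    ks_values.append(separation(pos, neg))
    print(f"batch {batch}: KS = {ks_values[-1]:.3f}")
```

A steadily falling KS value across batches is a signal to investigate the model and possibly retrain.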

Wrapping Up

We should look beyond the common performance metrics like accuracy, precision and recall to evaluate model performance.

The class separation metric is one such example. Between two models with comparable accuracy, the one with the superior class separation property should be preferred. The class separation metric should be considered a secondary metric, not a replacement for the primary model performance metric.

About Pranab

I am Pranab Ghosh, a software professional in the San Francisco Bay area. I manipulate bits and bytes for the good of living beings and the planet. I have worked with a myriad of technologies and platforms in various business domains for early stage startups, large corporations and everything in between. I am an active blogger and open source project owner. I am passionate about technology and green and sustainable living. My technical interest areas are Big Data, Distributed Processing, NOSQL databases, Machine Learning and Programming languages. I am fascinated by problems that don't have a neat closed form solution.