When training a machine learning model, the focus is usually on maximizing accuracy, and not enough attention is paid to robustness. You may have a well-trained model with high accuracy, but how confident are you in that accuracy? It may not be stable: it may vary across different regions of the feature space, or the model may be very sensitive to moderately out-of-distribution data after production deployment.
This post gives an overview of various robustness metrics and then shows some results for one particular metric. The implementation is available in my open source GitHub repository avenir.