Statistical methods to identify error patterns in confusion matrices
The study of the misclassifications produced by a classifier involves examining the off-diagonal elements of the associated confusion matrix. These cells reveal which classes are confused with which others, highlighting systematic bias or poor feature separation that needs to be identified. In this paper, techniques to analyze misclassifications are proposed, specifically to detect overprediction or underprediction of given classes and to identify whether a classifier is biased toward specific labels. A Bayesian approach based on the Dirichlet distribution is also proposed to estimate the probabilities of misclassification between classes. In certain cases, the results of these methods can also be visualized. Applications, including one to a set of omics data, are carried out using the software R.
Keywords: bias of classification; confusion matrix; Dirichlet distribution; misclassification; posterior distribution; overprediction; underprediction
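As a rough illustration of the two ideas named in the abstract, the sketch below treats each row of a confusion matrix as multinomial counts with a Dirichlet prior, so that the posterior for that row is again Dirichlet, and compares column totals with row totals as a simple over/underprediction indicator. It is not the paper's implementation: the paper reports using R, whereas this sketch uses Python with NumPy. The toy matrix, the symmetric prior of 1, the rows-as-true-classes convention, and the function names `row_posteriors`, `credible_intervals`, and `prediction_ratio` are all assumptions made for illustration.

```python
import numpy as np


def row_posteriors(cm, prior=1.0):
    """Dirichlet posterior parameters and posterior means for each true class.

    Assumes rows are true classes and columns are predicted classes, with a
    symmetric Dirichlet(prior, ..., prior) prior on every row (an assumption).
    """
    cm = np.asarray(cm, dtype=float)
    alpha = cm + prior  # conjugate update: prior pseudo-counts + observed counts
    mean = alpha / alpha.sum(axis=1, keepdims=True)
    return alpha, mean


def credible_intervals(alpha, n_draws=5000, level=0.95, seed=0):
    """Monte Carlo equal-tailed credible intervals for every cell probability."""
    rng = np.random.default_rng(seed)
    # draws has shape (n_draws, n_true_classes, n_predicted_classes)
    draws = np.stack([rng.dirichlet(a, size=n_draws) for a in alpha], axis=1)
    tail = (1.0 - level) / 2.0
    lo = np.quantile(draws, tail, axis=0)
    hi = np.quantile(draws, 1.0 - tail, axis=0)
    return lo, hi


def prediction_ratio(cm):
    """Predicted count / true count per class.

    Values above 1 suggest overprediction of that class, below 1 underprediction.
    """
    cm = np.asarray(cm, dtype=float)
    return cm.sum(axis=0) / cm.sum(axis=1)


# Hypothetical 3-class confusion matrix (rows: true, columns: predicted).
cm = np.array([[50, 8, 2],
               [5, 40, 15],
               [1, 4, 55]])

alpha, mean = row_posteriors(cm)
lo, hi = credible_intervals(alpha)
ratio = prediction_ratio(cm)

print("Posterior mean misclassification probabilities:\n", np.round(mean, 3))
print("Prediction ratio per class:", np.round(ratio, 3))
```

In this toy matrix the third class receives more predictions than it has true cases, which is the kind of overprediction pattern the abstract refers to. The off-diagonal posterior means, together with their intervals, give a hedged estimate of how often one class is mistaken for another.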