Diagnostic tools for outlier detection based on robust clustering
It is well known that the presence of a few outlying measurements can severely affect many commonly used statistical techniques. Robust methods are therefore recommended to increase resistance to such anomalous data. However, detecting outliers can also be a goal in itself, given their potential interest in many applications. It is also reasonable to assume that the data may be heterogeneous, involving multiple populations that are not easily separable because of the way the data were collected. Robust clustering techniques provide a natural way to handle both heterogeneity and outliers. In particular, we focus on methods based on trimming. Starting conservatively from a high trimming level, one can use appropriate diagnostic tools to identify those measurements that should be considered outliers. It is also important to address increasing dimensionality, account for dependence structures, and detect not only outlying observations but also individual outlying data entries.
Keywords: robustness, clustering, trimming