Controlling 2D Artificial Data Mixtures Overlap
Abstract
Clustering methods identify groups of similar objects, each of which is treated as a homogeneous set. Unfortunately, evaluating the performance of clustering methods analytically is difficult because of their ad hoc nature. In this paper, we propose a new test-case generator of artificial data for two-dimensional Gaussian mixtures. The generator has two advantages: first, it can produce simulated mixtures with any number of components; second, it formally quantifies the overlap rate between components, which lets us control how much complexity is added to the data. We also analyze the behavior of clustering algorithms and validity indices as the overlap rate between clusters varies.
Keywords
Clustering algorithms, unsupervised learning, Gaussian mixture, Gaussian components overlap