Controlling 2D Artificial Data Mixtures Overlap

Mohammed Ouali, Walid Mahdi, Radhwane Gharbaoui, Seyyid Ahmed Medjahed

Abstract


Clustering methods are used to identify groups of similar objects, each regarded as a homogeneous set. Unfortunately, analytically evaluating the performance of clustering methods is difficult because of their ad hoc nature. In this paper, we propose a new test-case generator of artificial data for two-dimensional Gaussian mixtures. The proposed generator has two useful advantages: first, it can produce simulated mixtures with any number of components; second, it formally quantifies the overlap rate, which lets us add controlled complexity to the data. We also analyze how clustering algorithms and validity indices behave as the overlap rate between clusters changes.
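To make the idea of a generator with quantified overlap concrete, here is a minimal sketch in standard-library Python. It is not the authors' method. It assumes a hypothetical overlap definition: the two-class Bayes misclassification rate between two equal-weight components, estimated by Monte Carlo. The paper's formal overlap rate may be defined differently. All function names (`generate_mixture`, `pairwise_overlap`, `overlap_matrix`, `separation_for_overlap`) are illustrative.

```python
import math
import random
from statistics import NormalDist

def gaussian_pdf_2d(x, y, mean, cov):
    """Density of a 2D Gaussian with mean (mx, my) and covariance ((a, b), (b, c))."""
    (mx, my), ((a, b), (_, c)) = mean, cov
    det = a * c - b * b
    dx, dy = x - mx, y - my
    # Quadratic form d^T Sigma^{-1} d via the closed-form 2x2 inverse.
    q = (c * dx * dx - 2 * b * dx * dy + a * dy * dy) / det
    return math.exp(-0.5 * q) / (2 * math.pi * math.sqrt(det))

def sample_gaussian_2d(mean, cov, rng):
    """Draw one point as mean + L z, where L is the Cholesky factor of cov."""
    (mx, my), ((a, b), (_, c)) = mean, cov
    l11 = math.sqrt(a)
    l21 = b / l11
    l22 = math.sqrt(c - l21 * l21)
    z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
    return mx + l11 * z1, my + l21 * z1 + l22 * z2

def generate_mixture(components, n_per_component, seed=0):
    """Return (points, labels) for any number of (mean, cov) components."""
    rng = random.Random(seed)
    points, labels = [], []
    for k, (mean, cov) in enumerate(components):
        for _ in range(n_per_component):
            points.append(sample_gaussian_2d(mean, cov, rng))
            labels.append(k)
    return points, labels

def pairwise_overlap(comp_i, comp_j, n=20000, seed=1):
    """Hypothetical overlap measure: fraction of points drawn from either of two
    equal-weight components whose density is higher under the *other* component
    (a Monte Carlo estimate of the two-class Bayes error)."""
    rng = random.Random(seed)
    wrong = 0
    for own, other in ((comp_i, comp_j), (comp_j, comp_i)):
        for _ in range(n):
            x, y = sample_gaussian_2d(*own, rng)
            if gaussian_pdf_2d(x, y, *other) > gaussian_pdf_2d(x, y, *own):
                wrong += 1
    return wrong / (2 * n)

def overlap_matrix(components, n=20000, seed=1):
    """Pairwise overlap estimates {(i, j): overlap} for every pair i < j."""
    k = len(components)
    return {(i, j): pairwise_overlap(components[i], components[j], n, seed + i * k + j)
            for i in range(k) for j in range(i + 1, k)}

def separation_for_overlap(target):
    """For two equal-weight components sharing one covariance, the Bayes error is
    Phi(-delta / 2), where delta is the Mahalanobis distance between the means.
    Invert that to get the separation that yields a target overlap in (0, 0.5]."""
    return -2.0 * NormalDist().inv_cdf(target)
```

One way to use this is to pick a target overlap, place two identity-covariance components `separation_for_overlap(target)` apart, and confirm the result with `pairwise_overlap`. For components with a shared covariance, the Monte Carlo estimate should agree with Φ(−Δ/2) up to sampling error. Unequal covariances or weights break that closed form, so the sketch falls back on simulation.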

Keywords


Clustering algorithms, unsupervised learning, Gaussian mixture, Gaussian component overlap
