Classification of Complex Diseases using an Improved Binary Cuckoo Search and Conditional Mutual Information Maximization
Abstract
Advances in computational techniques have led to exponential growth in genomic data, and analyzing such large volumes of data requires efficient machine learning techniques. Genomic data typically suffer from the “curse of dimensionality”: a large number of features (n) and a small number of samples (p), which makes classification difficult. In the present study, a new intelligent hybrid method based on conditional mutual information maximization (CMIM) and a novel improved binary cuckoo search (IBCS) is used to classify various complex diseases. CMIM is used to reduce dimensionality, and IBCS is used to select the most informative features. The standard binary cuckoo search (BCS) algorithm is commonly used for feature selection, but it suffers from low optimization accuracy and weak local search. IBCS overcomes these shortcomings of BCS and improves classification accuracy by choosing the most informative feature subset. The proposed technique was applied to five different SNP datasets that are publicly available on NCBI GEO. The proposed model attains high classification accuracy and outperforms other feature selection techniques. IBCS was also compared with other metaheuristic algorithms, such as Binary GA, Binary PSO and Binary ACO, and the results show that it achieves better classification accuracy.
Keywords
Metaheuristic, CMIM, IBCS, feature selection, classification, SNP