ENSEMBLE CLASSIFIERS SUPPORT VECTOR MACHINE FOR DNA MICROARRAY MULTICLASS IMBALANCE
Last modified: 2018-07-07
Abstract
Microarray technology enables large-scale, parallel measurement of the expression of thousands, or even tens of thousands, of genes. It has become one of the most successful molecular biology technologies of the modern era and has been widely applied to predict gene function, discover new subtypes of specific tumors, and classify cancers. However, microarray data are known for high dimensionality, small sample sizes, high noise, and imbalanced class distributions. Imbalanced data pose a classification problem because the classifier tends to predict the majority class rather than the minority class. As a result, minority classes are under-predicted, which distorts the performance evaluation criteria. Therefore, this research applies the Random Undersampling method, which aims to minimize the negative impact of information loss while maximizing the positive impact of data cleaning in the undersampling process. The data used in this research are simulation data and real data. The real data, obtained from http://www.gems-system.org/, are DNA microarray data sets, namely Leukemia 1, Brain Tumor 1, and Lung Cancer. This study uses threefold cross-validation. The One Against All (OAA) SVM method is used for multiclass classification, and the class imbalance is handled with a sampling-based approach, namely the Random Undersampling (RUS) algorithm. Classification performance is evaluated using Accuracy, F-Score, and G-mean values.
Keywords
Imbalanced Data, Multiclass Classification, Random Undersampling (RUS), SVM One Against All (OAA)
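
Illustrative sketch
The abstract names Random Undersampling (RUS) and the G-mean criterion without giving details, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes that RUS reduces every class to the size of the smallest class and that the multiclass G-mean is the geometric mean of the per-class recalls; both are common conventions but are assumptions here. The function names `random_undersample` and `g_mean` and the toy labels are hypothetical.

```python
import math
import random
from collections import Counter


def random_undersample(X, y, seed=0):
    """Randomly drop samples so that every class keeps as many
    samples as the smallest class (assumed balancing target)."""
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(y):
        by_class.setdefault(label, []).append(idx)
    n_min = min(len(idxs) for idxs in by_class.values())
    keep = []
    for idxs in by_class.values():
        keep.extend(rng.sample(idxs, n_min))  # sample without replacement
    keep.sort()  # preserve the original sample order
    return [X[i] for i in keep], [y[i] for i in keep]


def g_mean(y_true, y_pred):
    """Geometric mean of per-class recall (assumed multiclass G-mean)."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    recalls = [hits[c] / totals[c] for c in totals]
    return math.prod(recalls) ** (1.0 / len(recalls))


# Toy imbalanced data: 6 samples of class A, 3 of B, 2 of C (hypothetical).
y = ["A"] * 6 + ["B"] * 3 + ["C"] * 2
X = [[i] for i in range(len(y))]
X_bal, y_bal = random_undersample(X, y, seed=0)
balanced_counts = Counter(y_bal)

# A classifier that always predicts the majority class looks acceptable on
# accuracy but scores zero on G-mean, because two classes have zero recall.
majority_pred = ["A"] * len(y)
majority_accuracy = sum(t == p for t, p in zip(y, majority_pred)) / len(y)
majority_g_mean = g_mean(y, majority_pred)
```

On the toy labels, the always-majority predictor reaches an accuracy of 6/11 while its G-mean is 0, which is the kind of distortion that motivates reporting G-mean alongside Accuracy and F-Score on imbalanced data.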