Last modified: 2018-07-07
Abstract
Indonesia is one of the world's leading rice producers and largest rice consumers, with average consumption of more than 200 kilograms per person per year. Indonesia therefore needs to be aware of threats to the sustainability of its rice production systems. Bacterial leaf blight (BLB), caused by Xanthomonas oryzae pv. oryzae, is one of the diseases that disrupt rice growth, and the bacterium causes severe damage in many rice-growing regions of the world. In Indonesia, BLB is now reported to damage not only wetland rice but also upland rice. Developing BLB-resistant varieties is one of the most effective and simplest control measures for farmers to apply. Laboratory (in vitro) experiments are the usual way to find disease resistance genes, but they are costly, time-consuming, and known to be prone to large errors. A new method is therefore required to overcome these shortcomings, and a computational approach is one solution. Computational prediction of disease resistance genes consists of three steps: feature extraction, feature selection, and machine learning. Feature extraction mines useful information from protein sequences and represents it as a normalized feature vector; in this paper, we use global encoding for this step. The dataset produced by feature extraction for predicting disease resistance contains hundreds of features. Such high dimensionality can cause overfitting in classification and incurs a high computational cost. Feature selection is therefore needed to reduce the number of features used in classification, producing a new dataset that contains only the best and most relevant features. We then build a model that represents the data in order to analyze disease resistance genes in rice. With the development of machine learning research, there are multiple ways to analyze disease resistance gene data.
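The feature extraction and feature selection steps described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: in place of global encoding it uses plain amino acid composition (a different, simpler encoding that also yields a normalized vector), and it ranks features with a Fisher score filter, which is one common selection criterion; the abstract does not say which selection method was used. The names `composition_features`, `fisher_scores`, and `select_top_k` are hypothetical.

```python
from typing import List, Sequence, Tuple

# Assumption: plain amino acid composition stands in for global encoding here.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition_features(sequence: str) -> List[float]:
    """Map a protein sequence to a normalized 20-dimensional vector.

    Each entry is the relative frequency of one standard amino acid, so the
    entries sum to 1 (non-standard letters are ignored).
    """
    residues = [aa for aa in sequence.upper() if aa in AMINO_ACIDS]
    if not residues:
        return [0.0] * len(AMINO_ACIDS)
    return [residues.count(aa) / len(residues) for aa in AMINO_ACIDS]


def fisher_scores(X: Sequence[Sequence[float]], y: Sequence[int]) -> List[float]:
    """Score each feature by (mean1 - mean0)^2 / (var1 + var0).

    Labels must be 0 or 1 and both classes must be present. A higher score
    means the feature separates the two classes better.
    """
    scores = []
    for j in range(len(X[0])):
        values = {0: [row[j] for row, c in zip(X, y) if c == 0],
                  1: [row[j] for row, c in zip(X, y) if c == 1]}
        mean = {c: sum(v) / len(v) for c, v in values.items()}
        var = {c: sum((x - mean[c]) ** 2 for x in v) / len(v)
               for c, v in values.items()}
        gap = (mean[1] - mean[0]) ** 2
        denom = var[0] + var[1]
        if denom > 0:
            scores.append(gap / denom)
        else:
            # Constant within each class: perfectly separating if the means differ.
            scores.append(float("inf") if gap > 0 else 0.0)
    return scores


def select_top_k(X: Sequence[Sequence[float]], y: Sequence[int],
                 k: int) -> Tuple[List[int], List[List[float]]]:
    """Keep the k highest-scoring features; return their indices and the reduced data."""
    scores = fisher_scores(X, y)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    keep = sorted(ranked[:k])
    return keep, [[row[j] for j in keep] for row in X]
```

For a hypothetical labeled set `sequences` with `labels` of 1 (resistant) and 0 (non-resistant), `select_top_k([composition_features(s) for s in sequences], labels, k=10)` would keep ten features, matching the feature count reported in the abstract. A filter method like this scores each feature independently, which keeps selection cheap but ignores interactions between features.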
Among these, the Support Vector Machine (SVM) is a popular and effective method with high accuracy. SVM has demonstrated strong classification performance in protein prediction tasks, including functional classification of proteins, protein fold recognition, and subcellular localization prediction. Using only ten features and 90% of the data for training, the proposed method achieved an accuracy of 90.91%, indicating that the model represents the protein information well. These results are encouraging and show that the proposed method is useful: with a short running time and low-dimensional data, it predicts disease resistance genes in rice with high accuracy.
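A minimal sketch of the classification step follows, assuming a binary task (resistant = +1, non-resistant = -1) and a 90%/10% train/test split as in the reported result. It trains a linear soft-margin SVM with Pegasos-style stochastic subgradient descent in plain Python. The abstract does not specify the kernel, solver, or hyperparameters, so the linear kernel, `lam`, `epochs`, and the helper names `train_test_split`, `train_linear_svm`, `predict`, and `accuracy` are illustrative assumptions; in practice a library such as LIBSVM or scikit-learn would normally be used.

```python
import random


def train_test_split(X, y, train_fraction=0.9, seed=0):
    """Shuffle indices, then put train_fraction of the samples in the training set."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    cut = int(round(train_fraction * len(idx)))
    train, test = idx[:cut], idx[cut:]
    return ([X[i] for i in train], [y[i] for i in train],
            [X[i] for i in test], [y[i] for i in test])


def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Linear soft-margin SVM trained with Pegasos-style subgradient steps.

    Labels must be -1 or +1. A constant 1.0 is appended to every sample so the
    last weight acts as the bias (it is regularized too, a simplification).
    """
    rng = random.Random(seed)
    data = [list(row) + [1.0] for row in X]
    w = [0.0] * len(data[0])
    t = 0
    for _ in range(epochs):
        order = list(range(len(data)))
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # decreasing step size
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, data[i]))
            w = [(1.0 - eta * lam) * wj for wj in w]  # step on the L2 penalty
            if margin < 1.0:  # hinge loss is active: move toward the sample
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, data[i])]
    return w


def predict(w, row):
    """Return +1 or -1 from the sign of the decision function."""
    score = sum(wj * xj for wj, xj in zip(w, list(row) + [1.0]))
    return 1 if score >= 0 else -1


def accuracy(w, X, y):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(predict(w, row) == label for row, label in zip(X, y)) / len(y)
```

Given a reduced feature matrix and labels mapped to -1/+1, calling `train_test_split(X, labels, 0.9)`, then `train_linear_svm` on the training part and `accuracy` on the held-out 10%, yields a test accuracy of the kind the abstract reports; the actual value depends on the data. Folding the bias into the weight vector keeps each update short, at the cost of also regularizing the bias.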