Comparative study of standalone classifier and ensemble classifier

Ensemble learning is a machine learning approach that can address poor classification performance. Standalone classifiers often perform poorly, which is why combining them through ensemble methods can raise their performance scores. In this study, three ensemble methods (bagging, AdaBoost, and voting) are compared with three standalone classifiers: support vector machine, Naïve Bayes, and decision tree. On a dataset of 1670 Twitter mentions about tourist attractions, the ensemble methods showed no improvement in accuracy or precision, since the best ensemble result was the same as that of the standalone decision tree. The bagging method showed a marked improvement in recall, f-measure, and area under curve (AUC). Overall, the standalone decision tree and the decision tree with AdaBoost have the highest accuracy and precision scores, while the support vector machine with bagging has the highest recall, f-measure, and AUC.


INTRODUCTION
Ensemble learning is a part of machine learning that can produce better performance in predicting or classifying data patterns [1]. Technically, this method combines several machine learning algorithms to avoid the risk of underperformance by reducing bias and variance [2]. Each algorithm has its own characteristics, with different advantages and disadvantages. These characteristics lead to statistical, computational, and representational differences between algorithms, which motivate the ensemble learning approach [3].
Ensemble learning offers different methods to improve accuracy and stability. The methods differ in their approach: 1) bagging [4], which generates several versions of a predictor and aggregates them into a single prediction; 2) boosting [5], which runs a weak learning algorithm on different distributions over the training data and then merges the resulting classifiers into a single composite classifier; and 3) voting [6], the simplest of the three, which combines all algorithms using rules such as minimum probability, maximum probability, majority voting, product of probabilities, and average of probabilities. This study uses these three methods to reduce the bias in performance between the decision tree, Naïve Bayes, and support vector machine (SVM) algorithms.
Koutanaei et al. [7] studied ensemble learning methods and feature selection algorithms for credit scoring. Their research analyzed artificial neural network (ANN), classification and regression tree (CART), Naïve Bayes, and SVM classifiers. Apart from bagging and AdaBoost, it included two more ensemble methods, stacking and random forest. ANN-AdaBoost was identified as the best classifier for credit scoring, while Naïve Bayes and SVM-AdaBoost were the worst classifiers.
Ankit and Saleena [8] studied a classification system for Twitter sentiment analysis. They proposed a weighted ensemble classifier that combines base learners into a single classifier, and compared it with the individual classifiers and a majority voting ensemble classifier. The algorithms used were Naïve Bayes, random forest, SVM, and logistic regression. The proposed classifier outperformed the other methods.
Onan et al. [6] studied latent Dirichlet allocation (LDA) based topic modelling in text sentiment classification. Their empirical analysis used LDA with Naïve Bayes, SVM, logistic regression, radial basis function network, and K-nearest neighbor algorithms. The ensemble learning methods were bagging, AdaBoost, random subspace, voting, and stacking. The stacking ensemble achieved the highest performance score.
Anshary and Trilaksono [9] studied target market classification using ensemble methods, applying bagging and boosting to CART. A total of 3000 data items from a specific account with 200,000 followers were split 4:1 into training and testing data. Bagging increased precision by 1.9% and gave the highest performance score among the three configurations compared.
Fouad et al. [10] studied sentiment analysis using feature selection and a classifier ensemble. A voting ensemble was used to find the majority decision among SVM, Naïve Bayes, and logistic regression. The voting ensemble outperformed the other classifiers on two datasets, while SVM and logistic regression each outperformed the others on one dataset.
This study aims to increase classifier performance and to reduce bias in sentiment classification. Three classifiers (decision tree, Naïve Bayes, and support vector machine) are compared under four conditions: 1) the performance scores of the three standalone classifiers using cross validation; 2) a bagging version of each classifier; 3) an AdaBoost version of each classifier; and 4) an ensemble of the three classifiers using the voting method. Using 1670 data items collected from Twitter, these four conditions are compared to determine which gives the best overall performance score.
This paper is divided into four sections. Section 1 states the background, purpose of the study, and related research. Section 2 states the research methodology used in this study. Section 3 states the results and discussion. The last section states the conclusion and future work.

RESEARCH METHOD
The research consisted of three phases: 1) data pre-processing, which includes retrieving data from Twitter and pre-processing it; 2) training and testing, which applied three classifiers (decision tree, support vector machine, and Naïve Bayes) in individual classification and in ensemble classification (bagging, AdaBoost, and voting); and 3) performance measurement, which uses five measures: accuracy, precision, recall, f-measure, and ROC curve. The diagram is shown in Figure 1.
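The measurements listed above can be illustrated with a short Python sketch. It assumes a binary task with a designated positive class, and it computes the area under the ROC curve with the pairwise ranking definition, which is one of several equivalent ways to obtain it. The function names, labels, and scores are hypothetical and are not taken from the study's data.

```python
def classification_metrics(y_true, y_pred, positive="positive"):
    """Accuracy, precision, recall and F-measure for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f_measure": f_measure}


def auc_from_scores(y_true, scores, positive="positive"):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical labels and classifier scores, for illustration only.
y_true = ["positive", "positive", "positive", "negative", "negative", "negative"]
y_pred = ["positive", "positive", "negative", "positive", "negative", "negative"]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2]   # estimated probability of "positive"

metrics = classification_metrics(y_true, y_pred)
auc = auc_from_scores(y_true, scores)
```

Precision and recall are computed with respect to the positive class only, so swapping the positive label generally changes both values.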

Data collecting and pre-processing
Of the 3000 data items collected from Twitter, only 1670 passed manual classification, since only those passed the feasibility labelling. The 1670 items are tweets matching queries about Ancol's tourist attractions, such as 'ancol', 'dufan', 'seaworld', and 'ocean dream samudra'. After the data were collected in comma separated values (CSV) format, the pre-processing phase was conducted. The sentiment values are positive and negative.
The pre-processing phase consists of 1) transforming all letters to lowercase (case folding); 2) separating text into tokens and removing all punctuation and whitespace (tokenizing); 3) removing unnecessary words using a stopword dictionary (stopword removal); and 4) transforming all tokens into their base words (stemming). The stopword dictionary is taken from Tala [11] and consists of Indonesian stopwords; it has been widely used as a reference in other pre-processing work. For stemming, the Sastrawi dictionary [12] was used in the form of regular expressions (regex).
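The four pre-processing steps can be sketched in Python as below. This is only an illustration: the stopword set is a tiny hypothetical sample standing in for Tala's list, and the suffix pattern is a toy stand-in for the Sastrawi stemming rules, so neither should be read as the resources actually used in the study.

```python
import re

# Tiny illustrative stopword set; the study uses Tala's full Indonesian list.
STOPWORDS = {"yang", "di", "ke", "dan", "ini", "itu"}

# Toy suffix rules standing in for the Sastrawi stemmer (an assumption, far from complete).
SUFFIX_PATTERN = re.compile(r"(nya|lah|kah|kan|an|i)$")


def preprocess(tweet):
    text = tweet.lower()                                 # 1) case folding
    tokens = re.findall(r"[a-z]+", text)                 # 2) tokenizing (drops punctuation, whitespace, digits)
    tokens = [t for t in tokens if t not in STOPWORDS]   # 3) stopword removal
    stems = []
    for token in tokens:                                 # 4) stemming
        stem = SUFFIX_PATTERN.sub("", token)
        # Keep the original token if stripping would leave fewer than 3 letters.
        stems.append(stem if len(stem) >= 3 else token)
    return stems


tokens = preprocess("Liburan di Ancol, pantainya bagus!")
```

A real Indonesian stemmer also handles prefixes and checks candidate stems against a root-word dictionary, which this sketch does not attempt.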

Testing and training
This research applies cross validation to divide the dataset into training and testing data. Cross validation is a technique that divides the samples into k subsets of the same size. In each iteration, one subset is used as the testing dataset and the remaining k-1 subsets are used for training. With k ranging from 2 to 10, the number of training subsets therefore ranges from 1 to 9.
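The splitting scheme can be sketched as follows. The helper name `k_fold_splits` is hypothetical, and the sketch splits the indices in order without shuffling or stratifying by sentiment, which a practical setup might add.

```python
def k_fold_splits(n_samples, k):
    """Yield (train_indices, test_indices) for each of the k folds."""
    indices = list(range(n_samples))
    # Fold sizes differ by at most one when n_samples is not divisible by k.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]                   # one subset for testing
        train = indices[:start] + indices[start + size:]     # remaining k-1 subsets for training
        yield train, test
        start += size


# 1670 samples as in this study, with k = 9 as one of the fold counts considered.
splits = list(k_fold_splits(n_samples=1670, k=9))
```

Every sample appears in exactly one test fold, so each of the 1670 items is used for testing once and for training k-1 times.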

Ensemble method
Ensemble methods are used to reduce bias and increase the performance score of a classifier; in other words, to solve the problems that a single classifier faces. Ensemble methods combine the outputs of base classifiers to boost the performance score [13]. In the training phase, the three standalone classifiers (decision tree, support vector machine, and Naïve Bayes) serve as the base learning algorithms of the ensemble. In the bagging method, the training data are sampled randomly [14] and the resulting classifiers are combined by majority voting for the final classification in the testing phase. AdaBoost creates base classifiers sequentially by re-weighting through iterations [7], and the weights are then adapted according to each base classifier's misclassifications. Lastly, the voting method uses the majority decision of the three base classifiers. As one of the crucial problems of machine learning lies in minimizing its error function, an ensemble method can set 'an average' that reduces the risk of choosing the wrong classifier [9]. Thus, ensemble learning produces base classifiers and then combines them to get a better result.
ISSN: 1693-6930

Decision tree
A decision tree is a classifier in which each internal node represents a condition on a feature of the model, each branch is an outcome of that condition, and each leaf reflects the class predicted by the algorithm [15]. The decision tree is a classification algorithm that determines how decisions are made based on samples or on certain criteria and classes. In other words, it eliminates unnecessary calculations in classification and is flexible because it can select different features at different internal nodes. A decision node takes an action to select one of the edges stemming from it. At a chance node, the edge is selected randomly, while the terminal nodes represent the end of the actions.

SVM
Support vector machine (SVM) is a classifier defined by a separating hyperplane, which separates labeled training data into output categories [16]. The SVM algorithm is stated as follows [17]:
− For binary classification, y (labels) and x (features) denote the class labels and the input features, with parameters w (the normal to the line) and b (the bias), as stated in formula 1.
− The SVM is represented by a separating hyperplane f(x) that geometrically bisects the data space into two distinct regions, thereby classifying the input data space into two categories.
− The function f(x) denotes the hyperplane that classifies the data set, and the two regions created by the hyperplane correspond to the two categories of data under the two class labels.
− The class labels assigned to the data vectors for supervised classification are denoted by yi, which is +1 for one category of data vectors and -1 for the other category.
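Assuming the standard linear, hard-margin form of the SVM, which is consistent with the symbols w, b, f(x), and yi used above, the hyperplane, the decision rule, and the separation constraint can be written as follows. This is a sketch of the usual textbook formulation and not necessarily the exact expression given in [17].

```latex
\begin{align}
f(x) &= w^{\top} x + b \\
\hat{y} &= \operatorname{sign}\bigl(f(x)\bigr) \in \{+1, -1\} \\
y_i \left( w^{\top} x_i + b \right) &\ge 1 \quad \text{for every training vector } x_i
\end{align}
```

Points with f(x) = 0 lie on the hyperplane itself, and the sign of f(x) tells on which of the two regions a data vector falls.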

Naïve Bayes
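Naïve Bayes is a probabilistic model that captures uncertainty by determining the probabilities of outcomes [18]. Naïve Bayes generates probabilities using Bayes' theorem:
$$P(h \mid x) = \frac{P(x \mid h)\,P(h)}{P(x)}$$
where h is the class hypothesis, x is the observed data, P(h) is the prior probability of h, and P(h | x) is its posterior probability given x.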
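As an illustration of how such probabilities can be turned into a text classifier, the sketch below assumes a multinomial Naïve Bayes model over word counts with Laplace (add-one) smoothing. The source does not state which variant was used, so this choice, the function names, and the example tweets are all assumptions.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(documents, labels):
    """Estimate log priors and Laplace-smoothed log likelihoods from token lists."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    for tokens, label in zip(documents, labels):
        word_counts[label].update(tokens)
    vocabulary = {w for counts in word_counts.values() for w in counts}

    log_prior = {c: math.log(n / len(labels)) for c, n in class_counts.items()}
    log_likelihood = {}
    for c in class_counts:
        total = sum(word_counts[c].values()) + len(vocabulary)   # add-one smoothing
        log_likelihood[c] = {w: math.log((word_counts[c][w] + 1) / total)
                             for w in vocabulary}
    return log_prior, log_likelihood


def predict_naive_bayes(model, tokens):
    """Return the class with the highest log posterior (up to a shared constant)."""
    log_prior, log_likelihood = model
    scores = {}
    for c in log_prior:
        # Words outside the training vocabulary are ignored.
        scores[c] = log_prior[c] + sum(log_likelihood[c][w]
                                       for w in tokens if w in log_likelihood[c])
    return max(scores, key=scores.get)


# Hypothetical, already pre-processed tweets.
docs = [["pantai", "bagus"], ["wahana", "seru"], ["antre", "lama"], ["tiket", "mahal"]]
labels = ["positive", "positive", "negative", "negative"]
model = train_naive_bayes(docs, labels)
prediction = predict_naive_bayes(model, ["pantai", "seru"])
```

Working in log space avoids numerical underflow when many small word probabilities are multiplied together.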

Bagging
Bagging is a parallel method: it generates base learners in parallel, with the aim of reducing variance and obtaining good generalization ability [19]. Bagging generates several versions of a predictor, each trained on a random sample of the dataset [14], and aggregates them into a single prediction [20], using majority voting for the final classification in the testing phase. Bagging is well suited to classification problems [21]. The pseudo-code [22] of bagging is shown in Figure 2.
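A minimal Python sketch of the idea follows, assuming bootstrap sampling with replacement and a deliberately simple one-feature threshold learner as the base classifier. The base learner, the function names, and the data are hypothetical; the sketch is not a transcription of the pseudo-code in Figure 2.

```python
import random
from collections import Counter


def fit_stump(samples):
    """Hypothetical weak learner for two classes: threshold halfway between the class means."""
    by_label = {}
    for x, label in samples:
        by_label.setdefault(label, []).append(x)
    if len(by_label) == 1:                        # the bootstrap drew a single class
        only = next(iter(by_label))
        return lambda x: only
    (low, low_mean), (high, high_mean) = sorted(
        ((label, sum(xs) / len(xs)) for label, xs in by_label.items()),
        key=lambda item: item[1])
    threshold = (low_mean + high_mean) / 2
    return lambda x: high if x > threshold else low


def bagging_fit(samples, n_estimators=25, seed=0):
    """Train one base learner per bootstrap sample."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_estimators):
        # Bootstrap: draw len(samples) items with replacement.
        bootstrap = [rng.choice(samples) for _ in samples]
        models.append(fit_stump(bootstrap))
    return models


def bagging_predict(models, x):
    """Aggregate the base learners by majority voting."""
    votes = Counter(model(x) for model in models)
    return votes.most_common(1)[0][0]


# Hypothetical 1-D data: one made-up feature value per tweet.
data = [(0.1, "negative"), (0.2, "negative"), (0.3, "negative"), (0.35, "negative"),
        (0.7, "positive"), (0.8, "positive"), (0.9, "positive"), (0.95, "positive")]
models = bagging_fit(data)
```

Because each base learner sees a different resample, their individual errors tend to differ, and the majority vote smooths them out, which is the variance reduction referred to above.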

Boosting
AdaBoost, the most popular boosting algorithm, was introduced by Freund and Schapire [5] as a sequential learning method. It creates base classifiers by training on weighted instances through iterations. Similar to bagging [23], the main goal of boosting is to classify by averaging numeric estimates of the base classifier models' outputs. In boosting, each new model is influenced by the earlier ones and is encouraged to become an expert on the instances they handled poorly. The pseudo-code of boosting [22] is shown in Figure 3.
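The re-weighting loop can be sketched in Python as below, assuming binary labels coded as +1 and -1 and one-feature threshold stumps as base classifiers. The helper names and the toy data are hypothetical, and the sketch follows the common textbook form of AdaBoost rather than the pseudo-code in Figure 3.

```python
import math


def best_stump(xs, ys, weights):
    """Pick the threshold/polarity stump with the lowest weighted error."""
    best = None
    for threshold in sorted(set(xs)):
        for polarity in (1, -1):
            preds = [polarity if x >= threshold else -polarity for x in xs]
            error = sum(w for w, p, y in zip(weights, preds, ys) if p != y)
            if best is None or error < best[0]:
                best = (error, threshold, polarity)
    return best


def adaboost_fit(xs, ys, n_rounds=3):
    n = len(xs)
    weights = [1.0 / n] * n                             # start with uniform weights
    ensemble = []
    for _ in range(n_rounds):
        error, threshold, polarity = best_stump(xs, ys, weights)
        error = min(max(error, 1e-10), 1 - 1e-10)       # guard against log(0)
        alpha = 0.5 * math.log((1 - error) / error)     # vote weight of this stump
        ensemble.append((alpha, threshold, polarity))
        # Raise the weight of misclassified samples, lower the rest, then renormalise.
        preds = [polarity if x >= threshold else -polarity for x in xs]
        weights = [w * math.exp(-alpha * y * p) for w, y, p in zip(weights, ys, preds)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def adaboost_predict(ensemble, x):
    """Weighted vote of all stumps."""
    score = sum(alpha * (polarity if x >= threshold else -polarity)
                for alpha, threshold, polarity in ensemble)
    return 1 if score >= 0 else -1


# Toy data that no single threshold stump can classify perfectly.
xs = list(range(10))
ys = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]
ensemble = adaboost_fit(xs, ys, n_rounds=3)
predictions = [adaboost_predict(ensemble, x) for x in xs]
```

On this toy data the best single stump misclassifies three of the ten points, while the weighted vote of three stumps classifies all ten training points correctly.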

Voting
The voting method forms the overall ensemble prediction from the predictions of the base classifiers [3]. Voting combines all base classifiers using rules that include minimum probability, maximum probability, majority voting, product of probabilities, and average of probabilities [6]. Pseudo-code for voting [24] is shown in Figure 4.
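The combination rules named above can be sketched in Python as follows. The function names and the classifier outputs are hypothetical, and the sketch assumes that every base classifier reports a probability for every class; it is not a transcription of the pseudo-code in Figure 4.

```python
from collections import Counter
from math import prod


def majority_vote(predicted_labels):
    """Return the label predicted by most base classifiers."""
    return Counter(predicted_labels).most_common(1)[0][0]


def combine_probabilities(probabilities, rule="average"):
    """Combine per-classifier class probabilities with one of the listed rules."""
    rules = {
        "average": lambda values: sum(values) / len(values),
        "product": prod,
        "minimum": min,
        "maximum": max,
    }
    combine = rules[rule]
    classes = probabilities[0].keys()
    scores = {c: combine([p[c] for p in probabilities]) for c in classes}
    return max(scores, key=scores.get)


# Hypothetical outputs of the three base classifiers for one tweet
# (decision tree, Naïve Bayes, SVM, in that order).
labels = ["positive", "negative", "positive"]
probabilities = [
    {"positive": 0.60, "negative": 0.40},
    {"positive": 0.10, "negative": 0.90},
    {"positive": 0.55, "negative": 0.45},
]

by_majority = majority_vote(labels)
by_average = combine_probabilities(probabilities, "average")
```

In this made-up case majority voting returns "positive" while all four probability rules return "negative", because one classifier is very confident and the other two are nearly undecided. The choice of rule can therefore change the ensemble's decision.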

RESULTS AND ANALYSIS
Each classifier reaches its best accuracy at a different number of folds. Within the range of 2 to 10, this study used k = 2, 6, and 9, since these three values give each classifier's highest accuracy score, as detailed in Table 1. Based on Table 1, the highest accuracy score belongs to the decision tree, both as a standalone classifier and with the AdaBoost method, at 88.14% with 9 folds. The measurements for precision, recall, f-measure, and area under curve (AUC) are shown in Figures 5-8.
The accuracy summary in Table 1 shows that the decision tree as a standalone classifier matches the decision tree with the AdaBoost method. Thus, for accuracy there is no improvement of the ensemble classifiers over the standalone classifiers. In Figure 5, the highest precision score also belongs to both the standalone decision tree and the decision tree with AdaBoost, so the same conclusion applies to precision. In Figure 6, the highest recall score belongs to the support vector machine with the bagging method: bagging raised recall substantially, from 87.02% (standalone support vector machine) to 97.07%. For f-measure, Figure 7 shows that the highest score also belongs to the support vector machine with bagging, with a substantial improvement from 92.61% to 98.30%. Finally, Figure 8 shows that the support vector machine with bagging also has the highest AUC, 0.894, which is classified as good classification.
TELKOMNIKA Telecommun Comput El Control, Comparative study of standalone classifier and ensemble classifier (Tri Okta Priasni), 1753

CONCLUSION
Ensemble methods had little effect on accuracy and precision, but a considerable effect on recall, f-measure, and AUC, where the biggest improvements in this study were observed. Overall, the support vector machine with the bagging method outperforms the other classifiers in terms of recall, f-measure, and AUC, while the decision tree (both standalone and with AdaBoost) outperforms the other classifiers in terms of accuracy and precision. The voting method does not stand out in comparison with the other ensemble classifiers. For future work, this study can be improved by adding more data and other ensemble methods. Also, since this study only included 'positive' and 'negative' sentiment, a 'neutral' sentiment class could be added to see whether it affects the performance measurements.