Supervised Entity Tagger for Indonesian Labor Strike Tweets using Oversampling Technique and Low Resource Features
Ayu Purwarianti, Lisa Madlberger, Mochammad Ibrahim
Abstract
We propose an entity tagger for Indonesian tweets sent during labor strike events, built with supervised learning methods. The aim of the tagger is to extract the date, the location and the person or organization involved in the strike. We use SMOTE (Synthetic Minority Oversampling Technique) as the oversampling technique and conducted several experiments on Twitter data to evaluate different settings with varying machine learning algorithms and training data sizes. To test the low-resource features, we also ran the system without the word list feature and without word normalization. Our results indicate that, with low-resource features, treating the training data differently for each type of machine learning algorithm can lead to good accuracy. We tried the Naïve Bayes, C4.5, Random Forest and SMO (Sequential Minimal Optimization) algorithms, using Weka as the machine learning tool. For Naïve Bayes, whose predictions depend on the class probability distribution of the data, the best accuracy was achieved by removing duplicate data. For C4.5 and Random Forest, SMOTE gave higher accuracy than both the original data and the data with duplicates removed. For SMO, there was no significant difference among the various training data sizes.
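As a rough illustration of the oversampling step, the sketch below shows the core idea of SMOTE on numeric feature vectors: each synthetic sample is placed on the line segment between a minority-class sample and one of its nearest minority-class neighbours. It is not the Weka configuration used in the experiments; the function name `smote`, its parameters `n_new`, `k` and `seed`, and the toy data are all assumptions made for this example.

```python
import random


def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic samples interpolated between minority samples.

    Hypothetical helper for illustration only; assumes numeric features.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than samples
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by squared Euclidean distance.
        others = [m for j, m in enumerate(minority) if j != i]
        others.sort(key=lambda m: sum((a - b) ** 2 for a, b in zip(base, m)))
        neighbour = rng.choice(others[:k])
        # Pick a random point on the segment between base and neighbour.
        gap = rng.random()
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


# Toy minority-class feature vectors (made-up values).
minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]]
new_points = smote(minority, n_new=6, k=2)
```

Because every synthetic point is an interpolation of two existing minority samples, the new points stay within the region already covered by the minority class, unlike simple duplication, which only repeats existing points.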
Keywords: Indonesia NER, SMOTE, supervised learning, word level feature, word window feature
DOI: http://doi.org/10.12928/telkomnika.v14i4.3876
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
TELKOMNIKA Telecommunication, Computing, Electronics and Control. ISSN: 1693-6930, e-ISSN: 2302-9293. Universitas Ahmad Dahlan, 4th Campus, Jl. Ringroad Selatan, Kragilan, Tamanan, Banguntapan, Bantul, Yogyakarta, Indonesia 55191. Phone: +62 (274) 563515, 511830, 379418, 371120. Fax: +62 274 564604.