Experimental of vectorizer and classifier for scrapped social media data

Setiawan Assegaff, Errissya Rasywir, Yovi Pratama

Abstract


In this study, we used several classifiers and vectorizers to examine their effect on processing social media data. The classifiers used were random forest, logistic regression, Bernoulli Naive Bayes (NB), and support vector classifier (SVC). Random forests, which aggregate many decision trees, are used to reduce complexity and minimize errors. Logistic regression is a statistical model that, in its basic form, uses a logistic function to model a binary dependent variable. Bernoulli NB operates on binary features, and SVC, which separates classes with a maximum-margin boundary, has so far given results that rival other supervised learning methods. Our tests use scraped social media data. Based on the tests carried out on classifier and vectorizer variations, the best classifier is logistic regression, which outperforms the decision-tree-based random forest, the probability-based Bernoulli NB, and SVC. Meanwhile, among the count vectorizer, term frequency-inverse document frequency (TFIDF), and hashing vectorizers, the best accuracy is achieved with TFIDF. This suggests that the TFIDF vectorizer represents word feature dimensions better than the other two.
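The three vectorizers compared above differ in how they turn a post into a feature vector. The following is a minimal pure-Python sketch, not the authors' implementation: the whitespace tokenizer, the function names `count_vectorize`, `tfidf_vectorize`, and `hashing_vectorize`, the `n_features=16` bucket size, and the sample posts are all hypothetical. The smoothed IDF formula matches the default of scikit-learn's `TfidfVectorizer`, and scikit-learn also provides `CountVectorizer` and `HashingVectorizer`, although the abstract does not say which library the authors used. The hashing function here is a simplified version of the hashing trick, not scikit-learn's exact scheme.

```python
import math
import zlib


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple, hypothetical tokenizer)."""
    return text.lower().split()


def count_vectorize(docs):
    """Return (vocabulary, matrix) of raw term counts, one row per document."""
    vocab = sorted({tok for doc in docs for tok in tokenize(doc)})
    index = {term: i for i, term in enumerate(vocab)}
    rows = []
    for doc in docs:
        row = [0] * len(vocab)
        for tok in tokenize(doc):
            row[index[tok]] += 1
        rows.append(row)
    return vocab, rows


def tfidf_vectorize(docs):
    """Weight counts by smoothed inverse document frequency, then L2-normalize each row."""
    vocab, counts = count_vectorize(docs)
    n_docs = len(docs)
    # document frequency: number of documents that contain each term
    df = [sum(1 for row in counts if row[j] > 0) for j in range(len(vocab))]
    # smoothed idf: ln((1 + n) / (1 + df)) + 1, so rarer terms get larger weights
    idf = [math.log((1 + n_docs) / (1 + d)) + 1 for d in df]
    rows = []
    for row in counts:
        weighted = [c * w for c, w in zip(row, idf)]
        norm = math.sqrt(sum(v * v for v in weighted)) or 1.0
        rows.append([v / norm for v in weighted])
    return vocab, rows


def hashing_vectorize(docs, n_features=16):
    """Map each token to one of n_features buckets via a hash; no vocabulary is stored."""
    rows = []
    for doc in docs:
        row = [0] * n_features
        for tok in tokenize(doc):
            # crc32 is deterministic across runs, unlike Python's built-in hash() for str
            row[zlib.crc32(tok.encode("utf-8")) % n_features] += 1
        rows.append(row)
    return rows


if __name__ == "__main__":
    posts = ["great service great price", "bad service", "great day"]
    vocab, counts = count_vectorize(posts)
    print(vocab)   # ['bad', 'day', 'great', 'price', 'service']
    print(counts)  # [[0, 0, 2, 1, 1], [1, 0, 0, 0, 1], [0, 1, 1, 0, 0]]
    print(tfidf_vectorize(posts)[1][0])
    print(hashing_vectorize(posts)[0])
```

In this toy example, TFIDF gives terms that appear in many posts ("great", "service") a smaller IDF weight than rarer terms such as "price", whereas count vectors treat every occurrence equally. Hashing avoids storing a vocabulary and fixes the vector length in advance, at the cost of possible collisions between different words that land in the same bucket.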

Keywords


classifier; experiment; social media; text processing; vectorizer;

DOI: http://doi.org/10.12928/telkomnika.v21i4.24180

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

TELKOMNIKA Telecommunication, Computing, Electronics and Control
ISSN: 1693-6930, e-ISSN: 2302-9293
Universitas Ahmad Dahlan, 4th Campus
Jl. Ringroad Selatan, Kragilan, Tamanan, Banguntapan, Bantul, Yogyakarta, Indonesia 55191
Phone: +62 (274) 563515, 511830, 379418, 371120
Fax: +62 274 564604
