DOCUMENT SEARCH ENGINE WITH AUTOMATIC CLUSTERING

Entin Martiana, Nur Rosyid, Usmaida Agusetia

Abstract


Keyword-based web mining search with automatic clustering is a document search method that groups documents according to their keywords. The documents are clustered with the centroid linkage hierarchical method (CLHM), applied to the number of keywords in each document. Clustering commonly requires the number of clusters to be specified in advance; in some cases, however, the user cannot determine how many clusters should be built. In this paper, the valley tracing method is therefore applied as a constraint: it identifies the movement of the variance at each cluster formation step and analyzes its pattern, so that clusters are formed automatically. The document data are obtained from a text mining process on the documents. Based on 424 documents, this research shows that clustering with the CLHM algorithm can generally be used to classify documents, with the number of clusters determined automatically.
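The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea under stated assumptions. Documents are assumed to be represented as keyword-count vectors, clusters are merged by centroid linkage, and the ratio of within-cluster to between-cluster variation is recorded at each step. The selection rule in `choose_k` (take the number of clusters just before the sharpest rise in that ratio) is a simplified stand-in, not the paper's valley tracing criterion. All function names and the sample data are hypothetical.

```python
from itertools import combinations


def centroid(cluster):
    """Mean vector of a list of equal-length tuples."""
    return tuple(sum(axis) / len(cluster) for axis in zip(*cluster))


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def sum_sq(clusters):
    """Total squared distance of every point to its own cluster centroid."""
    return sum(sq_dist(p, centroid(c)) for c in clusters for p in c)


def clhm(points):
    """Agglomerative clustering with centroid linkage.

    Returns {k: partition with k clusters} for k = len(points) down to 1.
    """
    clusters = [[tuple(p)] for p in points]
    history = {len(clusters): clusters}
    while len(clusters) > 1:
        # Merge the two clusters whose centroids are closest.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: sq_dist(centroid(clusters[ij[0]]),
                                   centroid(clusters[ij[1]])),
        )
        merged = clusters[i] + clusters[j]
        clusters = [c for n, c in enumerate(clusters) if n not in (i, j)] + [merged]
        history[len(clusters)] = clusters
    return history


def choose_k(history):
    """Simplified stand-in for valley tracing (an assumption, not the paper's rule).

    Tracks ratio(k) = within / between variation and returns the k that
    precedes the largest rise in the ratio. The ratio is undefined for
    k = 1, so only k >= 3 can be returned; at least three distinct
    points are assumed.
    """
    total = sum_sq(history[1])
    ratio = {}
    for k, clusters in history.items():
        if k >= 2:
            within = sum_sq(clusters)
            ratio[k] = within / (total - within)
    candidates = [k for k in ratio if k - 1 in ratio]
    return max(candidates, key=lambda k: ratio[k - 1] - ratio[k])


# Hypothetical keyword-count vectors (two keywords per document).
docs = [(0, 0), (0, 1), (1, 0),
        (10, 10), (10, 11), (11, 10),
        (20, 0), (20, 1), (21, 0)]
history = clhm(docs)
k = choose_k(history)
print(k, [len(c) for c in history[k]])
```

On this toy data the three well-separated groups are recovered without the number of clusters being supplied. The pairwise centroid search is recomputed at every merge for readability; a collection of several hundred documents would likely call for cached distances.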


DOI: http://doi.org/10.12928/telkomnika.v8i1.603

Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

TELKOMNIKA Telecommunication, Computing, Electronics and Control
ISSN: 1693-6930, e-ISSN: 2302-9293