Improving Multi-Document Summary Method Based on Sentence Distribution

Aminul Wahib, Agus Zainal Arifin, Diana Purwitasari

Abstract


Automatic multi-document summarization has been developed by researchers. The method used to select sentences from the source documents determines the quality of the resulting summary. One of the most popular ways of weighting sentences is to calculate the frequency of occurrence of the words that form each sentence. However, selecting sentences in this way can yield sentences that do not optimally represent the content of the source documents, because the sentence weight is measured only by the number of word occurrences. This study proposes a new sentence-weighting strategy based on sentence distribution, which selects the most important sentences by taking into account how the words that form each sentence are distributed. This sentence distribution method enables the extraction of important sentences in multi-document summarization and serves as a strategy to improve the quality of the summary. Three concepts are used in this study: (1) clustering sentences with similarity-based histogram clustering, (2) ordering clusters by cluster importance, and (3) selecting important sentences by sentence distribution. Experimental results show that the proposed method performs better than the SIDeKiCK and LIGI methods. On ROUGE-1, the proposed method improves by 3% over the SIDeKiCK method and by 5.1% over the LIGI method. On ROUGE-2, it improves by 13.7% over the SIDeKiCK method and by 14.4% over the LIGI method.
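The results above are reported as ROUGE-1 and ROUGE-2 scores. As a rough illustration of what those metrics measure, the following is a minimal sketch of ROUGE-N recall against a single reference summary, assuming simple lowercased whitespace tokenization. The function names (`ngrams`, `rouge_n_recall`) and the example sentences are hypothetical and are not taken from the paper; the official ROUGE package also supports stemming, stopword removal, and multiple references, none of which is modeled here.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams (as tuples) that occur in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(candidate, reference, n=1):
    """ROUGE-N recall of a candidate summary against one reference summary.

    Assumption: tokens are produced by lowercasing and splitting on whitespace.
    """
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    total = sum(ref.values())
    if total == 0:
        return 0.0
    # Clipped overlap: a reference n-gram is matched at most as many
    # times as it appears in the candidate.
    overlap = sum(min(count, cand[gram]) for gram, count in ref.items())
    return overlap / total


# Hypothetical example sentences, for illustration only.
reference = "the cat sat on the mat"
candidate = "the cat lay on the mat"
r1 = rouge_n_recall(candidate, reference, n=1)  # 5 of 6 reference unigrams matched
r2 = rouge_n_recall(candidate, reference, n=2)  # 3 of 5 reference bigrams matched
print(f"ROUGE-1 recall: {r1:.3f}, ROUGE-2 recall: {r2:.3f}")
```

ROUGE-2 is typically lower than ROUGE-1 for the same pair of summaries, because a single substituted word breaks two bigrams but only one unigram.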


Keywords


Multi-document summaries; Extracting important sentences; Sentence distribution



DOI: http://doi.org/10.12928/telkomnika.v14i1.2330

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

TELKOMNIKA Telecommunication, Computing, Electronics and Control
ISSN: 1693-6930, e-ISSN: 2302-9293
Universitas Ahmad Dahlan, 4th Campus
Jl. Ringroad Selatan, Kragilan, Tamanan, Banguntapan, Bantul, Yogyakarta, Indonesia 55191
Phone: +62 (274) 563515, 511830, 379418, 371120
Fax: +62 274 564604
