Multi-document extractive text summarization: A comparative assessment on features

MUTLU, BEGÜM; SEZER, EBRU; AKCAYOL, MUHAMMET

doi:10.1016/j.knosys.2019.07.019

MUTLU B., SEZER E., AKCAYOL M. A.

KNOWLEDGE-BASED SYSTEMS, vol. 183, 2019 (SCI-Expanded, Scopus)

Publication Type: Article / Full Article
Volume: 183
Publication Date: 2019
DOI: 10.1016/j.knosys.2019.07.019
Journal Name: KNOWLEDGE-BASED SYSTEMS
Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus
Affiliated with Hacettepe University: Yes

Abstract

Text summarization is the process of generating a brief version of a text that preserves its salient information. For information retrieval, it is a good dimension reduction solution, and it also reduces the required reading time. This study focused on extracting informative summaries from multiple documents using hand-crafted features commonly used in the literature. The first investigation focused on the generation of a feature vector. The features were the number of sentences, term frequency, similarity with the title, term frequency-inverse sentence frequency, sentence position, sentence length, sentence-sentence similarity, bushy-path results, phrases of the sentence, proper nouns, n-gram co-occurrence, and length of the document. Secondly, several combinations of these features were examined, and a shallow multi-layer perceptron and two differently modeled fuzzy inference systems were used to extract salient sentences from texts in the Document Understanding Conference (DUC) dataset. The summarization performance of these models was evaluated using original classification performance metrics and Recall-Oriented Understudy for Gisting Evaluation (ROUGE-n). This study recommended the use of fuzzy systems based on a feature vector and a fuzzy rule set for extractive text summarization. The extraction methods were evaluated against a changing compression ratio. Experimental results showed that the implemented neural model tended to incorrectly select sentences that human annotators did not consider salient. However, for distinguishing between summary-worthy and summary-unworthy sentences, the fuzzy inference systems performed better than the neural network used, as well as better than the existing fuzzy inference-based text summarization approaches in the literature. (C) 2019 Elsevier B.V. All rights reserved.
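
As a rough illustration of the kind of hand-crafted sentence features and evaluation metric named in the abstract, the following Python sketch computes a term frequency-inverse sentence frequency score, a title similarity, a sentence position score, and a ROUGE-n recall value. It is not the authors' implementation: the tokenizer, the averaging of TF-ISF over sentence length, the use of Jaccard overlap for title similarity, the linear position score, and all function names are assumptions made for this example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Assumed tokenizer: lowercase alphanumeric runs only.
    return re.findall(r"[a-z0-9]+", text.lower())


def tf_isf_scores(sentences):
    """Mean TF-ISF per sentence: tf(t, s) * log(N / sf(t)), averaged over tokens.

    N is the number of sentences and sf(t) is the number of sentences
    containing term t. The averaging step is an assumption of this sketch.
    """
    tokenized = [tokenize(s) for s in sentences]
    n = len(tokenized)
    sentence_freq = Counter(t for toks in tokenized for t in set(toks))
    scores = []
    for toks in tokenized:
        if not toks:
            scores.append(0.0)
            continue
        tf = Counter(toks)
        total = sum(tf[t] * math.log(n / sentence_freq[t]) for t in tf)
        scores.append(total / len(toks))
    return scores


def title_similarity(sentence, title):
    # Assumed measure: Jaccard overlap between sentence and title vocabularies.
    s, t = set(tokenize(sentence)), set(tokenize(title))
    union = s | t
    return len(s & t) / len(union) if union else 0.0


def position_score(index, n_sentences):
    # Assumed scheme: the first sentence scores 1.0, decreasing linearly.
    return 1.0 - index / n_sentences


def ngram_counts(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(candidate, reference, n=1):
    """Clipped n-gram overlap divided by the number of reference n-grams."""
    cand = ngram_counts(tokenize(candidate), n)
    ref = ngram_counts(tokenize(reference), n)
    total = sum(ref.values())
    if total == 0:
        return 0.0
    overlap = sum(min(count, cand[gram]) for gram, count in ref.items())
    return overlap / total
```

In a pipeline like the one the abstract describes, such per-sentence scores would be stacked into a feature vector and passed to a classifier or fuzzy inference system; that stage is omitted here because the abstract does not specify its rules or parameters.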