Multi-document extractive text summarization: A comparative assessment on features


MUTLU B., SEZER E., AKCAYOL M. A.

KNOWLEDGE-BASED SYSTEMS, vol.183, 2019 (SCI-Expanded)

  • Publication Type: Article
  • Volume: 183
  • Publication Date: 2019
  • DOI: 10.1016/j.knosys.2019.07.019
  • Journal Name: KNOWLEDGE-BASED SYSTEMS
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus
  • Hacettepe University Affiliated: Yes

Abstract

Text summarization is the process of generating a brief version of a text that preserves its salient information. In information retrieval, it serves as an effective dimensionality reduction technique, and it also reduces the time required for reading. This study focused on extracting informative summaries from multiple documents using hand-crafted features commonly used in the literature. First, a feature vector was generated from the following features: the number of sentences, term frequency, similarity with the title, term frequency-inverse sentence frequency, sentence position, sentence length, sentence-sentence similarity, bushy-path results, phrases of the sentence, proper nouns, n-gram co-occurrence, and document length. Second, several combinations of these features were examined, and a shallow multi-layer perceptron and two differently modeled fuzzy inference systems were used to extract salient sentences from texts in the Document Understanding Conference (DUC) dataset. The summarization performance of these models was evaluated using classification performance metrics and Recall-Oriented Understudy for Gisting Evaluation (ROUGE-N) scores. The extraction methods were also evaluated across varying compression ratios. The experimental results showed that the implemented neural model tended to select sentences that human annotators did not consider salient. In distinguishing summary-worthy from summary-unworthy sentences, however, the fuzzy inference systems outperformed both the neural network used here and existing fuzzy inference-based text summarization approaches in the literature. This study therefore recommends fuzzy systems based on a feature vector and a fuzzy rule set for extractive text summarization. (C) 2019 Elsevier B.V. All rights reserved.
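The abstract names the hand-crafted features but not their exact formulas. The following Python sketch computes a subset of them (sentence position, sentence length, term frequency, term frequency-inverse sentence frequency, title similarity, sentence-sentence similarity, and a bushy-path link count) under assumed definitions. The tokenizer, the normalizations, the helper names `tokenize`, `cosine` and `sentence_features`, and the `link_threshold` of 0.1 are all hypothetical and may differ from what the study used.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercased alphanumeric tokens; a deliberately simple tokenizer (assumption).
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    # Cosine similarity between two bag-of-words Counters.
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def sentence_features(sentences, title, link_threshold=0.1):
    """Return one feature dict per sentence (hypothetical feature definitions)."""
    bags = [Counter(tokenize(s)) for s in sentences]
    title_bag = Counter(tokenize(title))
    n = len(sentences)
    # Document-level term frequencies.
    doc_tf = Counter()
    for bag in bags:
        doc_tf.update(bag)
    # Sentence frequency: number of sentences containing each term.
    sf = Counter()
    for bag in bags:
        sf.update(set(bag))
    max_len = max((sum(b.values()) for b in bags), default=1) or 1
    features = []
    for i, bag in enumerate(bags):
        length = sum(bag.values())
        sims = [cosine(bag, other) for j, other in enumerate(bags) if j != i]
        features.append({
            # Earlier sentences score higher; the first sentence gets 1.0.
            "position": 1.0 - i / n,
            # Token count relative to the longest sentence.
            "length": length / max_len,
            # Average document-level frequency of the sentence's tokens.
            "tf": sum(doc_tf[t] * bag[t] for t in bag) / length if length else 0.0,
            # Average ISF-weighted term frequency per token.
            "tf_isf": sum(bag[t] * math.log(n / sf[t]) for t in bag) / length if length else 0.0,
            "title_sim": cosine(bag, title_bag),
            # Mean similarity to all other sentences.
            "sent_sim": sum(sims) / len(sims) if sims else 0.0,
            # Bushy path: number of other sentences linked above the threshold.
            "bushy_path": sum(1 for s in sims if s > link_threshold),
        })
    return features
```

The remaining features in the list (number of sentences, phrases, proper nouns, n-gram co-occurrence, document length) are omitted here. The phrase and proper-noun features would typically need a part-of-speech tagger or chunker.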
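The abstract does not give the inputs, membership functions or rule sets of the two fuzzy inference systems. As a rough illustration of how a fuzzy system can turn features into a sentence salience score, here is a minimal Mamdani-style sketch. It uses min for AND and implication, max for OR and aggregation, and centroid defuzzification. The two inputs, `title_sim` and `tf_isf`, are assumed to be rescaled to [0, 1]. The triangular sets, the three rules, and the helper `select_summary`, which keeps a fraction of sentences given by a compression ratio, are illustrative assumptions and not the study's models.

```python
def tri(x, a, b, c):
    """Triangular membership: 0 at feet a and c, 1 at the peak b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


# Hypothetical linguistic terms on a normalized [0, 1] universe.
def low(x):
    return tri(x, -0.5, 0.0, 0.5)


def medium(x):
    return tri(x, 0.0, 0.5, 1.0)


def high(x):
    return tri(x, 0.5, 1.0, 1.5)


def fuzzy_score(title_sim, tf_isf, steps=100):
    """Mamdani-style salience score from two normalized features (hypothetical rules)."""
    # Rule 1: IF title_sim is high AND tf_isf is high THEN salience is high.
    r_high = min(high(title_sim), high(tf_isf))
    # Rule 2: IF either input is medium, OR the inputs disagree (high/low),
    # THEN salience is medium.
    r_med = max(
        medium(title_sim),
        medium(tf_isf),
        min(high(title_sim), low(tf_isf)),
        min(low(title_sim), high(tf_isf)),
    )
    # Rule 3: IF title_sim is low AND tf_isf is low THEN salience is low.
    r_low = min(low(title_sim), low(tf_isf))
    num = den = 0.0
    for i in range(steps + 1):
        y = i / steps
        # Clip each output set by its rule strength, then aggregate with max.
        mu = max(min(r_low, low(y)), min(r_med, medium(y)), min(r_high, high(y)))
        num += y * mu
        den += mu
    # Centroid defuzzification over the discretized output universe.
    return num / den if den else 0.0


def select_summary(scores, ratio):
    """Keep the top round(ratio * n) sentences (at least one), in document order."""
    k = max(1, round(ratio * len(scores)))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])
```

Rule 2 includes the "disagreeing inputs" cases so that every point of [0, 1] x [0, 1] fires at least one rule. Varying `ratio` in `select_summary` is one simple way to evaluate extraction across changing compression ratios.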
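The abstract mentions two kinds of evaluation. The sketch below computes ROUGE-N recall as defined by Lin: the number of reference n-grams matched by the candidate (each clipped by its count in the candidate), divided by the total number of reference n-grams. It also computes sentence-level precision, recall and F1 against human-selected sentences, as one possible reading of the classification metrics. The function names are hypothetical. Tokenization is lowercased whitespace splitting, with no stemming or stopword removal, so scores will not exactly match the official ROUGE toolkit.

```python
from collections import Counter


def ngrams(tokens, n):
    # Multiset of contiguous n-grams in a token list.
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate, references, n=2):
    """ROUGE-N recall of a candidate summary against one or more references."""
    cand = ngrams(candidate.lower().split(), n)
    match = total = 0
    for ref in references:
        ref_grams = ngrams(ref.lower().split(), n)
        # Clipped matches: an n-gram counts at most as often as it appears in the candidate.
        match += sum(min(count, cand[g]) for g, count in ref_grams.items())
        total += sum(ref_grams.values())
    return match / total if total else 0.0


def sentence_prf(selected, gold):
    """Precision, recall and F1 of selected sentence indices vs. annotated ones."""
    sel, ref = set(selected), set(gold)
    tp = len(sel & ref)
    p = tp / len(sel) if sel else 0.0
    r = tp / len(ref) if ref else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f
```

For example, `rouge_n("the cat sat on the mat", ["the cat sat on a mat"], n=2)` matches three of the five reference bigrams.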