Enhanced sentence representation for extractive text summarization: Investigating the syntactic and semantic features and their contribution to sentence scoring


MUTLU BİLGE B., SEZER E.

Expert Systems with Applications, vol.227, 2023 (SCI-Expanded)

  • Publication Type: Article
  • Volume: 227
  • Publication Date: 2023
  • DOI Number: 10.1016/j.eswa.2023.120302
  • Journal Name: Expert Systems with Applications
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Academic Search Premier, PASCAL, Aerospace Database, Applied Science & Technology Source, Communication Abstracts, Computer & Applied Sciences, INSPEC, Metadex, Public Affairs Index, Civil Engineering Abstracts
  • Keywords: Enhanced sentence representation, Extractive text summarization, Sentence scoring, Summarization corpora, Syntactic and semantic features
  • Hacettepe University Affiliated: Yes

Abstract

The primary challenge in extractive text summarization is sentence scoring, and the quality of scoring depends critically on how sentences are represented. This study investigates that hypothesis through a detailed analysis of the impact of syntactic and semantic sentence representation techniques. It first evaluates the empirical impact of individual syntactic and semantic features on summarization accuracy. For the syntactic side, a comprehensive list of 40 syntactic features was developed; semantic representation was obtained using sentence embeddings. An improved feature set that jointly uses syntactic and semantic features was then proposed. To assess its impact on the resulting summaries, the proposed sentence representation was tested on three distinct summarization corpora consisting of lengthy scientific documents from diverse domains. The quality of the resulting summaries was assessed using both summary evaluation metrics and classification performance metrics. The experiments indicated that summaries generated with the proposed feature set outperformed not only those obtained using individual features but also summaries produced by state-of-the-art methods.
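
The idea of a joint representation can be illustrated with a minimal Python sketch, under loudly stated assumptions: the three surface features below (position, length, numeric-token ratio) are illustrative stand-ins and not the 40 syntactic features of the study, and `toy_embedding` is a hypothetical hashed bag-of-words placeholder for a real sentence-embedding model. All function names are invented for this example. The sketch only shows how a syntactic vector and a semantic vector might be concatenated into one sentence representation before scoring.

```python
import math
import re
from typing import List


def syntactic_features(sentence: str, index: int, total: int) -> List[float]:
    """Three illustrative surface features (assumed; not the study's 40)."""
    tokens = re.findall(r"\w+", sentence.lower())
    n = len(tokens)
    # Relative position: 1.0 for the first sentence, 0.0 for the last.
    position = 1.0 - index / max(total - 1, 1)
    # Sentence length in word tokens.
    length = float(n)
    # Share of tokens that are purely numeric.
    numeric_ratio = sum(t.isdigit() for t in tokens) / n if n else 0.0
    return [position, length, numeric_ratio]


def toy_embedding(sentence: str, dim: int = 8) -> List[float]:
    """Placeholder for a sentence embedding: hashed bag of words, L2-normalized."""
    vec = [0.0] * dim
    for tok in re.findall(r"\w+", sentence.lower()):
        # Deterministic bucket: sum of character codes modulo the dimension.
        vec[sum(ord(c) for c in tok) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def represent(sentences: List[str], dim: int = 8) -> List[List[float]]:
    """Concatenate syntactic and semantic vectors for every sentence."""
    total = len(sentences)
    return [
        syntactic_features(s, i, total) + toy_embedding(s, dim)
        for i, s in enumerate(sentences)
    ]


if __name__ == "__main__":
    doc = [
        "Extractive summarization selects sentences.",
        "We test on 3 corpora.",
    ]
    for row in represent(doc):
        print(len(row), row[:3])
```

In a real pipeline the resulting vectors would be passed to a scoring model (for example a classifier trained to label sentences as summary-worthy), and the syntactic columns would usually be scaled so that raw counts such as length do not dominate the normalized embedding dimensions.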