Enhanced sentence representation for extractive text summarization: Investigating the syntactic and semantic features and their contribution to sentence scoring

MUTLU BİLGE, BEGÜM; SEZER, EBRU

doi:10.1016/j.eswa.2023.120302

Expert Systems with Applications, vol. 227, 2023 (SCI-Expanded, Scopus)

Publication Type: Article / Full Article
Volume: 227
Publication Date: 2023
DOI: 10.1016/j.eswa.2023.120302
Journal: Expert Systems with Applications
Journal Indexed In: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Academic Search Premier, PASCAL, Aerospace Database, Applied Science & Technology Source, Communication Abstracts, Computer & Applied Sciences, INSPEC, Metadex, Public Affairs Index, Civil Engineering Abstracts
Keywords: Enhanced sentence representation, Extractive text summarization, Sentence scoring, Summarization corpora, Syntactic and semantic features
Hacettepe University Affiliation: Yes

Abstract

The primary challenge in extractive text summarization lies in scoring sentences, and the critical factor in that scoring is how each sentence is represented. This study investigates this hypothesis and analyzes in detail the impact of syntactic and semantic sentence representation techniques. First, the empirical impact of individual syntactic and semantic features on summarization accuracy was evaluated. For the syntactic side, a comprehensive list of 40 syntactic features was developed; for the semantic side, sentences were represented with sentence embeddings. Next, an improved feature set that jointly uses syntactic and semantic features was proposed. To assess its impact on the resulting summaries, the proposed sentence representation was tested on three distinct summarization corpora of lengthy scientific documents from diverse domains. Summary quality was measured with both summary evaluation metrics and classification performance metrics. The experimental results indicated that summaries generated with the proposed feature set outperformed not only those obtained with individual features but also those produced by state-of-the-art methods.
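The central idea of the abstract, scoring each sentence from a representation that concatenates syntactic features with a sentence embedding, can be illustrated with a minimal sketch. This is not the authors' implementation, and every name in it is hypothetical: the four features in `syntactic_features` stand in for the paper's 40 (which the abstract does not list), `toy_embedding` is a hashed bag-of-words vector used in place of a real sentence-embedding model, and the logistic-regression classifier in `train_logistic` is an assumption, since the abstract mentions classification metrics without naming a model.

```python
import math
import re
import zlib


def syntactic_features(sentence, position, num_sentences):
    """Hypothetical syntactic/surface features (the paper defines 40, not listed here)."""
    tokens = re.findall(r"\w+", sentence)
    n = len(tokens) or 1  # avoid division by zero for empty sentences
    return [
        1.0 - position / max(num_sentences - 1, 1),  # earlier sentences get higher values
        min(len(tokens) / 30.0, 1.0),  # sentence length, capped at 30 tokens
        sum(t.isdigit() for t in tokens) / n,  # ratio of numeric tokens
        sum(t[0].isupper() for t in tokens) / n,  # ratio of capitalized tokens
    ]


def toy_embedding(sentence, dim=16):
    """Stand-in for a sentence embedding: hashed, L2-normalized bag of words."""
    vec = [0.0] * dim
    for token in re.findall(r"\w+", sentence.lower()):
        vec[zlib.crc32(token.encode("utf-8")) % dim] += 1.0  # deterministic hashing
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def represent(sentences, dim=16):
    """Joint representation: syntactic features concatenated with the embedding."""
    n = len(sentences)
    return [syntactic_features(s, i, n) + toy_embedding(s, dim)
            for i, s in enumerate(sentences)]


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(X, y, lr=0.5, epochs=200):
    """Plain stochastic gradient descent on the logistic loss (assumed classifier)."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
            b -= lr * err
    return w, b


def score(X, w, b):
    """Estimated probability that each sentence belongs in the summary."""
    return [sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) for x in X]


def summarize(sentences, w, b, k=2, dim=16):
    """Pick the k highest-scoring sentences and return them in document order."""
    scores = score(represent(sentences, dim), w, b)
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]


if __name__ == "__main__":
    doc = [
        "Sentence scoring drives extractive summarization.",
        "The weather was pleasant that day.",
        "We define 40 syntactic features and use sentence embeddings.",
        "Thanks for reading.",
    ]
    labels = [1, 0, 1, 0]  # toy labels, for illustration only
    w, b = train_logistic(represent(doc), labels)
    print(summarize(doc, w, b, k=2))
```

Concatenating both feature families into one vector lets a single classifier weigh syntactic and semantic evidence against each other. Replacing the stand-ins with a real embedding model and a fuller syntactic feature list would only change `toy_embedding` and `syntactic_features`; the scoring and selection steps stay the same.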