Expert Systems with Applications, vol. 227, 2023 (SCI-Expanded)
The primary challenge in extractive text summarization is sentence scoring, and the critical factor in scoring is how sentences are represented. This study investigates that hypothesis through a detailed analysis of the impact of semantic and syntactic sentence representation techniques. It first evaluates the empirical impact of individual syntactic and semantic features on summarization accuracy. For the syntactic side, a comprehensive list of 40 syntactic features was developed; semantic representation was obtained using sentence embeddings. An improved feature set that jointly uses syntactic and semantic features was then proposed. To assess its effect on the resulting summaries, the proposed sentence representation was tested on three distinct summarization corpora consisting of lengthy scientific documents from diverse domains. Summary quality was assessed with both summary evaluation metrics and classification performance metrics. The experiments indicated that summaries generated with the proposed feature set outperformed not only those obtained using individual features but also summaries produced by state-of-the-art methods.
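The idea of a joint representation can be illustrated with a minimal sketch, which is not the study's implementation. It assumes that each sentence is described by a few hand-crafted syntactic features concatenated with a sentence embedding. The four features shown are hypothetical examples, not the 40 features from the study, and `toy_embedding` is a hashed bag-of-words stand-in for a real sentence embedding model; all function names are illustrative.

```python
import hashlib
import math
import re


def syntactic_features(sentence, index, total):
    """Return a few illustrative surface features (hypothetical, not the study's 40)."""
    tokens = re.findall(r"\w+", sentence)
    n = len(tokens)
    return [
        index / max(total - 1, 1),                         # relative position in the document
        float(n),                                          # sentence length in tokens
        sum(t[0].isupper() for t in tokens) / max(n, 1),   # share of capitalized tokens
        sum(t.isdigit() for t in tokens) / max(n, 1),      # share of numeric tokens
    ]


def toy_embedding(sentence, dim=8):
    """Stand-in for a sentence embedding: hashed bag of words, L2-normalized."""
    vec = [0.0] * dim
    for tok in re.findall(r"\w+", sentence.lower()):
        bucket = int(hashlib.md5(tok.encode("utf-8")).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def represent(sentences, dim=8):
    """Concatenate syntactic features and the embedding for every sentence."""
    total = len(sentences)
    return [
        syntactic_features(s, i, total) + toy_embedding(s, dim)
        for i, s in enumerate(sentences)
    ]
```

In this sketch the joint vectors would then be passed to a sentence scorer or classifier that decides which sentences enter the summary; a real setup would swap `toy_embedding` for a trained sentence encoder.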