16th Annual Conference on Intelligent Text Processing and Computational Linguistics (CICLing), Cairo, Egypt, 14 - 20 April 2015, vol.9042, pp.516-524
Semantic similarity between two sentences is concerned with measuring how much two sentences share the same or related meaning. Two methods in the literature for measuring sentence similarity are cosine similarity and overall similarity. In this work we investigate if it is possible to improve the performance of these methods by integrating different word level semantic relatedness methods. Four different word relatedness methods are compared using four different data sets compiled from different domains, providing a testbed formed of various range of writing expressions to challenge the selected methods. Results show that the use of corpus-based word semantic similarity function has significantly outperformed that of WordNet-based word semantic similarity function in sentence similarity methods. Moreover, we propose a new sentence similarity measure method by modifying an existing method which incorporates word order and lexical similarity called as overall similarity. Furthermore, the results show that the proposed method has significantly improved the performance of the overall method. All the selected methods are tested and compared with other state-of-the-art methods.