Learning term weights by overfitting pairwise ranking loss


Sahin Ö., ÇİÇEKLİ İ., Ercan G.

Turkish Journal of Electrical Engineering and Computer Sciences, vol. 30, no. 5, pp. 1914-1930, 2022 (SCI-Expanded)

  • Publication Type: Article / Full Article
  • Volume: 30 Issue: 5
  • Publication Date: 2022
  • DOI: 10.55730/1300-0632.3913
  • Journal Name: Turkish Journal of Electrical Engineering and Computer Sciences
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Academic Search Premier, Applied Science & Technology Source, Compendex, Computer & Applied Sciences, INSPEC, TR DİZİN (ULAKBİM)
  • Page Numbers: pp. 1914-1930
  • Keywords: Information retrieval, passage ranking, term weighting, pairwise ranking optimization
  • Hacettepe University Affiliated: Yes

Abstract

© 2022 Turkiye Klinikleri. All rights reserved.

A search engine strikes a balance between effectiveness and efficiency to retrieve the best documents in a scalable way. Recent deep learning-based rankers have proven effective, advancing the state of the art in relevance metrics. However, unlike index-based retrieval methods, neural rankers such as bidirectional encoder representations from transformers (BERT) do not scale to large datasets. In this article, we propose a query term weighting method that can be used with a standard inverted index without modifying it. Query term weights are learned from pairs of relevant and irrelevant documents for each query using a pairwise ranking loss. The learned weights prove more effective than term recall, a probabilistic relevance feedback measure previously used for this task. We further show that these weights can be predicted with a BERT regression model and improve the performance of both a BM25-based index and an index already optimized with a term weighting function.
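The abstract describes two components: per-query term weights fit by gradient descent on a pairwise ranking loss, and a BERT regression model trained to predict those weights for unseen queries. The sketches below are illustrative only, not the authors' implementation: `bm25_term_score` is a hypothetical helper, and the uniform initialization, Adam optimizer, and margin value are assumptions.

```python
# Illustrative sketch (not the paper's code): fit per-query term weights by
# minimizing a pairwise margin ranking loss so that each relevant document
# outscores its paired irrelevant document.
import torch

def learn_term_weights(query_terms, pairs, bm25_term_score,
                       epochs=200, lr=0.1, margin=1.0):
    """pairs: (relevant_doc, irrelevant_doc) tuples for one query.
    bm25_term_score(term, doc): hypothetical helper returning the BM25
    contribution of a single query term to a document's score."""
    weights = torch.ones(len(query_terms), requires_grad=True)  # uniform start
    opt = torch.optim.Adam([weights], lr=lr)
    loss_fn = torch.nn.MarginRankingLoss(margin=margin)
    target = torch.ones(1)  # +1 means the first score should be higher

    for _ in range(epochs):  # deliberately overfit to this query's pairs
        for rel, irr in pairs:
            # Per-term BM25 contributions for each document, shape (n_terms,)
            s_rel = torch.tensor([bm25_term_score(t, rel) for t in query_terms])
            s_irr = torch.tensor([bm25_term_score(t, irr) for t in query_terms])
            # Weighted document scores: sum of weighted per-term contributions
            score_rel = (weights * s_rel).sum().unsqueeze(0)
            score_irr = (weights * s_irr).sum().unsqueeze(0)
            loss = loss_fn(score_rel, score_irr, target)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return weights.detach()
```

A BERT regression model could then be trained (e.g. with a mean-squared-error loss) to predict these learned weights from the query text alone; the checkpoint name and token-level head below are likewise assumptions, not the paper's architecture.

```python
# Illustrative sketch of a BERT token-level regression head that emits one
# weight per query token.
import torch
from transformers import AutoModel, AutoTokenizer

class TermWeightRegressor(torch.nn.Module):
    def __init__(self, name="bert-base-uncased"):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(name)
        self.head = torch.nn.Linear(self.encoder.config.hidden_size, 1)

    def forward(self, **enc):
        hidden = self.encoder(**enc).last_hidden_state  # (batch, tokens, dim)
        return self.head(hidden).squeeze(-1)            # one weight per token

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = TermWeightRegressor()
batch = tok(["learning term weights"], return_tensors="pt")
predicted_weights = model(**batch)
```

At query time, the predicted weights can be applied to the query terms sent to a standard BM25 inverted index, which is why the index itself needs no modification.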