Information retrieval effectiveness of Turkish search engines


Bitirim Y., Tonta Y. A., Sever H.

ADVANCES IN INFORMATION SYSTEMS, vol.2457, pp.93-103, 2002 (Journal Indexed in SCI)

  • Volume: 2457
  • Publication Date: 2002
  • Journal Name: ADVANCES IN INFORMATION SYSTEMS
  • Pages: pp.93-103

Abstract

This study investigates the information retrieval performance of Turkish search engines with respect to precision, normalized recall, coverage and novelty ratios. We defined seventeen query topics for Arabul, Arama, Netbul and Superonline. These queries were carefully selected to assess the capability of a search engine for handling broad or narrow topics, excluding particular information, identifying and indexing Turkish characters, retrieving hub/authoritative pages, stemming Turkish words, and correctly interpreting Boolean operators. We classified each document in a retrieval output as "relevant" or "nonrelevant" to calculate precision and normalized recall ratios at various cut-off points for each pair of query topic and search engine. We found the coverage and novelty ratios for each search engine. We also tested how search engines handle meta-tags and dead links. Arama appears to be the best Turkish search engine in terms of average precision and normalized recall ratios, and the coverage of Turkish sites. Turkish characters (and stemming as well) still cause bottlenecks for Turkish search engines. Superonline and Netbul make use of the indexing information in meta-tag fields to improve retrieval results.
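As an illustration of the two ranking measures named in the abstract, the following minimal Python sketch computes precision at a cut-off point and a rank-based normalized recall from a list of binary relevance judgments. The abstract does not state which normalized recall formula the authors used; the sketch assumes the classic rank-based definition, R_norm = 1 - (sum of relevant ranks - sum of ideal ranks) / (n(N - n)), with n relevant documents among N ranked ones. The function names and the sample judgments are hypothetical.

```python
def precision_at(judgments, cutoff):
    """Fraction of the first `cutoff` results judged relevant.

    Assumption: the denominator is the cut-off itself, even when fewer
    than `cutoff` documents were retrieved.
    """
    if cutoff <= 0:
        return 0.0
    return sum(judgments[:cutoff]) / cutoff


def normalized_recall(judgments):
    """Rank-based normalized recall (assumed formula, see lead-in).

    Returns 1.0 when all relevant documents are ranked first and 0.0
    when they are all ranked last. Returns None when the measure is
    undefined (no relevant documents, or every document is relevant).
    """
    n_docs = len(judgments)
    ranks = [i for i, rel in enumerate(judgments, start=1) if rel]
    n_rel = len(ranks)
    if n_rel == 0 or n_rel == n_docs:
        return None
    ideal = sum(range(1, n_rel + 1))  # rank sum of a perfect ordering
    return 1 - (sum(ranks) - ideal) / (n_rel * (n_docs - n_rel))


# Hypothetical judgments for one query on one engine (True = relevant).
sample = [True, False, True, True, False]
print(precision_at(sample, 3))
print(normalized_recall(sample))
```

Averaging these values over all query topics for each engine would give per-engine figures comparable in spirit to the averages the abstract refers to.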