Computer Science Terminology Extraction from Parallel Corpora


Sahin O., Kurtoglu A., ERCAN G.

26th IEEE Signal Processing and Communications Applications Conference (SIU), İzmir, Türkiye, 2-5 May 2018

  • Publication Type: Conference Paper / Full-Text Paper
  • Volume Number:
  • DOI Number: 10.1109/siu.2018.8404174
  • City of Publication: İzmir
  • Country of Publication: Türkiye
  • Affiliated with Hacettepe University: Yes

Abstract

To achieve higher impact, academic publications are written in English as well as in the native language. However, because most academic journals and conferences use English as the publication language, new terms are usually introduced in English. As a consequence, introducing new terms into other languages has become harder. This problem is more evident in dynamic fields like Informatics, where new terms are introduced frequently. As a solution to this problem, this work constructs a system for producing an English-Turkish terminology. A terminology dictionary is formed automatically from the English and Turkish abstracts of theses produced at Turkish universities. Two existing methods from the literature are used and compared. One of the methods is modified for Turkish and extended to create a more comprehensive dictionary. The resulting system is able to update itself automatically with new terms drawn from new publications. Experiments evaluating the accuracy of the system are conducted using a manually built Informatics terminology. The proposed algorithm is able to improve both the precision and the recall of the system.