Improving Word Representation by Tuning Word2Vec Parameters with Deep Learning Model

TEZGİDER M., Yildiz B., AYDIN G.

International Conference on Artificial Intelligence and Data Processing (IDAP), Malatya, Türkiye, 28 - 30 Eylül 2018, (Tam Metin Bildiri)

Yayın Türü: Bildiri / Tam Metin Bildiri
Cilt numarası:
Basıldığı Şehir: Malatya
Basıldığı Ülke: Türkiye
Hacettepe Üniversitesi Adresli: Evet

Özet

Deep learning has become one of the most popular machine learning methods. The success in the text processing, analysis and classification has been significantly enhanced by using deep learning. This success is contributed by the quality of the word representations. TFIDF, FastText, Glove and Word2Vec are used for the word representation. In this work, we aimed to improve word representations by tuning Word2Vec parameters. The success of the word representations was measured by using a deep learning classification model. The minimum word count, vector size and window size parameters of Word2Vec were used for the measurement. 2,8 million Turkish texts consisting of 243 million words to create word embedding (word representations) and around 263 thousand documents consisting of 15 different classes for classification were used. We observed that correctly selected parameters increased the word representation quality and thus the accuracy of classification.