Published In
International Journal of Advances in Electronics and Computer Science-IJAECS
Volume Issue
Volume 6, Issue 4 (Apr 2019)
Paper Title
Efficiently Distributed Representation of Words and Phrases using Negative Sampling for Regional Languages
Author Name
Venkatakrishnan K, Tejas Kaushik, Vijeth Nandan, Rahul Agrawal, S Natarajan
PES University, Bangalore, India; Principal Machine Learning Manager, Microsoft, Bangalore, India; Professor, PES University, Bangalore, India
This paper introduces the concept of multilingual word semantic similarity, which measures the semantic similarity of word pairs within each of five languages: English, Hindi, Kannada, German, and Italian. The model was trained efficiently on high-quality datasets; the total dataset size across all five languages is about 90 GB. The paper proposes a computationally efficient technique for measuring the semantic similarity of word pairs by building a neural network model. It also introduces negative sampling to improve the accuracy of the model, and proposes a phrase-detection technique to improve the model's accuracy further. The results show that combining statistical knowledge from a text corpus (word embeddings) gives very high accuracy.
Keywords - Word Embeddings, Negative Sampling, Phrase Detection.
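The abstract names negative sampling but does not state the objective. As a minimal sketch, assuming the standard skip-gram negative-sampling loss from the word2vec literature (not necessarily the exact formulation used in this paper), the loss for one word pair can be written as follows. The function names, the toy vectors, and the choice of five negatives are illustrative assumptions.

```python
import math
import random

def sigmoid(x):
    """Logistic function; adequate for the small toy values used here."""
    return 1.0 / (1.0 + math.exp(-x))

def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def negative_sampling_loss(center, context, negatives):
    """Loss for one (center, context) pair plus k sampled negative words.

    The true context word is pushed towards a high score, and each
    sampled negative word towards a low score. This is the assumed
    word2vec-style objective, not a formula taken from the paper.
    """
    loss = -math.log(sigmoid(dot(center, context)))
    for neg in negatives:
        loss -= math.log(sigmoid(-dot(center, neg)))
    return loss

# Toy 3-dimensional embeddings (hypothetical values).
random.seed(0)
center = [0.5, -0.2, 0.1]
context = [0.4, -0.1, 0.3]
negatives = [[random.uniform(-0.5, 0.5) for _ in range(3)] for _ in range(5)]

loss = negative_sampling_loss(center, context, negatives)
```

Only the context word and the k sampled negatives contribute to each update, rather than every word in the vocabulary, which is where the computational saving comes from.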