Publish In |
International Journal of Advances in Electronics and Computer Science-IJAECS |
Issue |
Volume-6, Issue-4 (Apr, 2019) |
Paper Title |
Efficiently Distributed Representation of Words and Phrases using Negative Sampling for Regional Languages |
Author Name |
Venkatakrishnan K, Tejas Kaushik, Vijeth Nandan, Rahul Agrawal, S Natarajan |
Affiliation |
PES University, Bangalore, India; Principal Machine Learning Manager, Microsoft, Bangalore, India; Professor, PES University, Bangalore, India |
Pages |
70-72 |
Abstract |
This paper introduces the concept of multilingual word semantic similarity, which measures the semantic similarity of word pairs within five languages: English, Hindi, Kannada, German, and Italian. The model was trained efficiently on high-quality datasets; the total dataset size across all five languages is about 90 GB. The paper proposes a computationally efficient technique for measuring the semantic similarity of word pairs by building a neural network model. It also introduces negative sampling to improve the accuracy of the model, and proposes a phrase-detection technique to improve the model's accuracy further. The results show that combining statistical knowledge from the text corpus (word embeddings) gives very high accuracy.
Keywords - Word Embeddings, Negative Sampling, Phrase Detection. |
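The abstract does not spell out the training objective or the similarity measure, so the following is only a minimal sketch under stated assumptions: it assumes the standard skip-gram negative-sampling loss for a single (center, context) pair and assumes cosine similarity as the word-pair similarity score. The function names and the toy vectors are hypothetical and are not taken from the paper.

```python
import math


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def dot(u, v):
    # Inner product of two equal-length vectors.
    return sum(a * b for a, b in zip(u, v))


def negative_sampling_loss(center, context, negatives):
    """Assumed skip-gram negative-sampling loss for one training pair.

    center:    embedding of the center word
    context:   embedding of the observed context word (positive example)
    negatives: embeddings of k randomly sampled "noise" words
    """
    # Reward a high score for the observed pair...
    loss = -math.log(sigmoid(dot(center, context)))
    # ...and a low score for each sampled negative pair.
    for neg in negatives:
        loss -= math.log(sigmoid(-dot(center, neg)))
    return loss


def cosine_similarity(u, v):
    # Assumed similarity measure between two word embeddings.
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))


# Toy 3-dimensional embeddings (hypothetical values, for illustration only).
king = [0.9, 0.1, 0.3]
queen = [0.8, 0.2, 0.4]
noise = [[-0.5, 0.7, -0.2], [0.1, -0.9, 0.0]]

print(negative_sampling_loss(king, queen, noise))
print(cosine_similarity(king, queen))
```

The point of the sketch is the cost argument: each update touches only the positive pair and the k sampled negatives rather than the whole vocabulary, which is why negative sampling is usually considered computationally cheap for large corpora.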