DOIONLINE

DOIONLINE NO - IJAECS-IRAJ-DOIONLINE-15752

Published In
International Journal of Advances in Electronics and Computer Science-IJAECS
Issue
Volume-6,Issue-4  ( Apr, 2019 )
Paper Title
Efficiently Distributed Representation of Words and Phrases using Negative Sampling for Regional Languages
Author Name
Venkatakrishnan K, Tejas Kaushik, Vijeth Nandan, Rahul Agrawal, S Natarajan
Affiliation
PES University, Bangalore, India; Principal Machine Learning Manager, Microsoft, Bangalore, India; Professor, PES University, Bangalore, India
Pages
70-72
Abstract
This paper introduces the concept of multilingual word semantic similarity, which helps in measuring the semantic similarity of word pairs within each of five languages: English, Hindi, Kannada, German, and Italian. The model was trained efficiently on high-quality datasets; the total dataset size across all five languages is about 90 GB. The paper proposes a computationally efficient technique for measuring the semantic similarity of word pairs by building a neural network model. It also introduces the idea of negative sampling to improve the accuracy of the model, and proposes a phrase-detection technique for the same purpose. The results show that combining statistical knowledge from the text corpus (word embeddings) gives very high accuracy.
Keywords - Word Embeddings, Negative Sampling, Phrase Detection.
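The abstract names three techniques without giving formulas: word-pair similarity from embeddings, negative sampling, and phrase detection. The following is a minimal sketch of how these are commonly formulated in the word2vec literature, not the authors' implementation. The function names, the cosine measure, the count-based phrase score, and the discount parameter `delta` are all assumptions made for illustration.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def cosine_similarity(u, v):
    """Similarity of two word vectors (assumed measure; returns 0.0 for a zero vector)."""
    norm_u = math.sqrt(dot(u, u))
    norm_v = math.sqrt(dot(v, v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot(u, v) / (norm_u * norm_v)


def log_sigmoid(x):
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def negative_sampling_loss(center, context, negatives):
    """Skip-gram negative-sampling loss for one (center, context) pair.

    Pulls the true context vector toward the center vector and pushes
    each sampled negative vector away from it.
    """
    loss = -log_sigmoid(dot(center, context))
    for neg in negatives:
        loss -= log_sigmoid(-dot(center, neg))
    return loss


def phrase_score(count_ab, count_a, count_b, delta):
    """Count-based bigram score (assumed formula); higher suggests a phrase.

    delta is a hypothetical discount that suppresses very rare word pairs.
    """
    return (count_ab - delta) / (count_a * count_b)
```

In a formulation like this, a bigram whose score exceeds a chosen threshold would be merged into a single token before training, and word-pair similarity would be read off the trained vectors with `cosine_similarity`.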