Published In	International Journal of Advances in Electronics and Computer Science (IJAECS)
Issue	Volume 6, Issue 4 (Apr 2019)
Paper Title	Efficiently Distributed Representation of Words and Phrases using Negative Sampling for Regional Languages
Author Name	Venkatakrishnan K, Tejas Kaushik, Vijeth Nandan, Rahul Agrawal, S Natarajan
Affiliation	PES University, Bangalore, India; Principal Machine Learning Manager, Microsoft, Bangalore, India; Professor, PES University, Bangalore, India
Pages	70-72
Abstract	This paper introduces the concept of multilingual word semantic similarity, which measures the semantic similarity of word pairs within five languages: English, Hindi, Kannada, German, and Italian. The model was trained efficiently on high-quality datasets; the combined dataset for all five languages is about 90 GB. The paper proposes a computationally efficient technique for measuring the semantic similarity of word pairs by building a neural network model. It also introduces negative sampling to improve the accuracy of the model, and proposes a phrase detection technique for the same purpose. The results show that combining statistical knowledge from the text corpus (word embeddings) gives very high accuracy.
Keywords	Word Embeddings, Negative Sampling, Phrase Detection
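The abstract names negative sampling and word-pair similarity without giving formulas. As a rough illustration only, the sketch below uses the standard skip-gram negative-sampling objective and cosine similarity; the listing does not confirm that the paper uses exactly these forms, and all function names and vectors here are hypothetical.

```python
import math

def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def negative_sampling_loss(center, context, negatives):
    """Loss for one (center, context) pair plus k sampled negative words.

    Assumed form (standard skip-gram with negative sampling):
      loss = -log sigmoid(context . center)
             - sum over negatives of log sigmoid(-negative . center)
    """
    loss = -math.log(sigmoid(dot(context, center)))
    for neg in negatives:
        loss -= math.log(sigmoid(-dot(neg, center)))
    return loss

def cosine_similarity(u, v):
    """Similarity of two word vectors, in the range [-1, 1]."""
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))

if __name__ == "__main__":
    # Toy 2-dimensional vectors, made up for illustration.
    center = [1.0, 0.0]
    context = [1.0, 0.0]
    negatives = [[-1.0, 0.0], [0.0, 1.0]]
    print(negative_sampling_loss(center, context, negatives))
    print(cosine_similarity(center, context))
```

The loss is small when the true context vector aligns with the center vector and the sampled negatives point away from it, so each update touches only a handful of vectors rather than the full vocabulary; that is the usual source of the efficiency gain.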