DOIONLINE NO - IJAECS-IRAJ-DOIONLINE-4853

Published In	International Journal of Advances in Electronics and Computer Science (IJAECS)
Issue	Volume-3, Issue-6 (Jun, 2016)
Paper Title	Parallel Implementation Of K-Means Algorithm Using Hadoop
Author Name	Jerril Mathson Mathew, Jyothis Joseph
Affiliation	M.Tech Scholar, Department of CSE, College of Engineering Kidangoor; Asst. Prof., Department of CSE, College of Engineering Kidangoor
Pages	150-153
Abstract	Clustering, the grouping of similar data, is regarded as one of the most important tasks in data mining, and clustering large datasets remains a point of concern. In recent years, data clustering has been studied extensively, and many methods and theories have been developed. Hadoop is a software framework for the distributed processing of vast amounts of data across groups of computers using the Map-Reduce programming model. The Map-Reduce computing model has two phases: a map phase and a reduce phase. The map phase calculates the distance between each point and each cluster center and assigns each point to its nearest cluster. All points that belong to the same cluster are sent to a single reduce task. The reduce phase calculates the new cluster centers for the next Map-Reduce job. Map-Reduce parallelizes problems that involve large datasets across computing clusters, which makes it well suited to clustering large datasets. This paper studies the parallel implementation of the K-Means clustering algorithm using the Map-Reduce computing model of Hadoop on different datasets.
Keywords	Data Mining, Data Clustering, Parallel Computing, Map-Reduce, K-Means algorithm, Hadoop, HDFS, Machine Learning.
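The map and reduce phases described in the abstract can be illustrated with a minimal single-process sketch. This is not the authors' Hadoop implementation: it only simulates the data flow in plain Python, and the function names (map_phase, shuffle, reduce_phase, kmeans), the Euclidean distance measure, the sample points, and the starting centers are all assumptions made for illustration.

```python
from collections import defaultdict
from math import dist  # Euclidean distance, Python 3.8+


def map_phase(points, centers):
    """Map: emit (index of nearest center, point) for every input point."""
    for p in points:
        nearest = min(range(len(centers)), key=lambda i: dist(p, centers[i]))
        yield nearest, p


def shuffle(pairs):
    """Group mapped points by cluster index (done by the framework in Hadoop)."""
    groups = defaultdict(list)
    for key, p in pairs:
        groups[key].append(p)
    return groups


def reduce_phase(groups, centers):
    """Reduce: each new center is the mean of the points assigned to it."""
    new_centers = list(centers)  # clusters with no points keep their old center
    for key, pts in groups.items():
        new_centers[key] = tuple(sum(c) / len(pts) for c in zip(*pts))
    return new_centers


def kmeans(points, centers, max_iter=20):
    """Run one simulated Map-Reduce job per iteration until centers stop moving."""
    for _ in range(max_iter):
        new_centers = reduce_phase(shuffle(map_phase(points, centers)), centers)
        if new_centers == centers:
            break
        centers = new_centers
    return centers


# Hypothetical toy data: two well-separated groups of 2-D points.
points = [(1.0, 1.0), (1.0, 2.0), (9.0, 9.0), (10.0, 9.0)]
final_centers = kmeans(points, [(0.0, 0.0), (10.0, 10.0)])
```

In a real Hadoop job the map calls would run in parallel over HDFS blocks and the grouping step would be carried out by the framework; the sketch keeps only the per-iteration logic of assigning points and recomputing centers.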