DOIONLINE

DOIONLINE NO - IJAECS-IRAJ-DOIONLINE-4853

Publish In
International Journal of Advances in Electronics and Computer Science-IJAECS
Issue
Volume 3, Issue 6 (Jun 2016)
Paper Title
Parallel Implementation Of K-Means Algorithm Using Hadoop
Author Name
Jerril Mathson Mathew, Jyothis Joseph
Affiliation
M.Tech Scholar, Department of CSE, College of Engineering Kidangoor; Asst. Prof., Department of CSE, College of Engineering Kidangoor
Pages
150-153
Abstract
Clustering is one of the most important tasks in data mining; it groups similar data items together. Clustering large datasets remains a challenge. In recent years, data clustering has been studied extensively, and many methods and theories have been proposed. Hadoop is a software framework for distributed processing of large amounts of data across clusters of computers using the Map-Reduce programming model. The Map-Reduce computing model has two phases: a map phase and a reduce phase. For K-Means, the map phase calculates the distance between each point and each cluster center and assigns each point to its nearest cluster. All points that belong to the same cluster are sent to the same reducer. The reduce phase calculates the new cluster centers for the next Map-Reduce job. Map-Reduce thus parallelizes problems involving large datasets across computing clusters, which makes it well suited to clustering large datasets. This paper studies the parallel implementation of the K-Means clustering algorithm using the Map-Reduce computing model of Hadoop on different datasets.

Keywords: Data Mining, Data Clustering, Parallel Computing, Map-Reduce, K-Means algorithm, Hadoop, HDFS, Machine Learning.
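
The map and reduce phases described in the abstract can be illustrated with a minimal, single-process Python sketch. It is not the authors' Hadoop implementation: the function names (kmeans_map, kmeans_reduce, kmeans_iteration) are hypothetical, the shuffle step is simulated with an in-memory dictionary, Euclidean distance is assumed, and a cluster that receives no points is assumed to keep its previous center.

```python
from collections import defaultdict
from math import dist  # Euclidean distance, Python 3.8+


def kmeans_map(point, centers):
    """Map phase: emit (index of the nearest center, point)."""
    nearest = min(range(len(centers)), key=lambda i: dist(point, centers[i]))
    return nearest, point


def kmeans_reduce(points):
    """Reduce phase: the new center is the mean of the cluster's points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def kmeans_iteration(points, centers):
    """One simulated Map-Reduce job: map, group by key, reduce."""
    groups = defaultdict(list)
    for p in points:
        key, value = kmeans_map(p, centers)
        groups[key].append(value)  # stands in for the shuffle step
    # Assumption: a cluster with no assigned points keeps its old center.
    return [
        kmeans_reduce(groups[i]) if i in groups else centers[i]
        for i in range(len(centers))
    ]


points = [(0.0, 0.0), (0.0, 2.0), (10.0, 10.0), (10.0, 12.0)]
centers = [(1.0, 1.0), (9.0, 9.0)]
new_centers = kmeans_iteration(points, centers)
```

In a Hadoop setting, a driver program would presumably run one such job per iteration, feeding the new centers into the next job until they stop changing; that stopping rule is an assumption here, since the abstract does not state it.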