DOIONLINE NO - IJAECS-IRAJ-DOIONLINE-16371

Publish In	International Journal of Advances in Electronics and Computer Science-IJAECS
Issue	Volume 6, Issue 10 (Oct 2019)
Paper Title	Imbalanced Datasets in Defect Prediction
Author Name	Ebubeogu Amarachukwu Felix, Akanwa Chinedu Smith
Affiliation	Faculty of Computer Science and Information Technology, University of Malaya, Kuala Lumpur, Malaysia 50603; Faculty of Business and Management Technology, University Kuala Lumpur, Kuala Lumpur, Malaysia 50250
Pages	18-25
Abstract	Data pre-processing is important in software defect prediction. Despite achievements in defect prediction, data reliability remains an issue because of class imbalance; the success of a prediction study relies on the quality of the data utilized. In this paper, we present a data pre-processing technique that can accurately identify defective and defect-free modules in a dataset and render the dataset suitable for defect classification. We applied a top-down technique that considers datasets as a unit to identify both the defective and defect-free classes on 10 projects from the PROMISE repository. The support vector machine classifier achieved an average classification accuracy and specificity of 89.78% and 98.90%, respectively; the neural network classifier achieved an area under the receiver operating characteristic curve, Brier score, MCC, precision, and g-mean of 83.53%, 15.12%, 34.37%, 63.04%, and 41.87%, respectively; the naïve Bayes classifier achieved a recall and a J-coefficient of 78.53% and 31.89%, respectively; and the K-nearest-neighbors classifier achieved an average information score of 36.14%. This manuscript highlights the need to properly pre-process datasets before they are applied in machine learning studies, to avoid misleading results.
Keywords	Machine Learning, Data Pre-processing, Classification Algorithms, Defect Prediction, Imbalanced Data.
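Several of the measures named in the abstract can be derived from a binary confusion matrix, which also shows why accuracy alone can mislead on imbalanced data. The sketch below is illustrative only and is not taken from the paper: the function name `confusion_metrics` and the example counts are hypothetical, and it assumes the common definitions g-mean = sqrt(recall × specificity) and J-coefficient = recall + specificity − 1, which the paper may define differently.

```python
import math


def confusion_metrics(tp, fp, tn, fn):
    """Derive common defect-prediction metrics from confusion-matrix counts.

    The defective class is treated as the positive class.
    Definitions of g-mean and J-coefficient are assumptions (see lead-in).
    """
    total = tp + fp + tn + fn
    # Guard each ratio against an empty denominator.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    accuracy = (tp + tn) / total if total else 0.0

    # Matthews correlation coefficient (MCC).
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    return {
        "accuracy": accuracy,
        "recall": recall,
        "specificity": specificity,
        "precision": precision,
        "mcc": mcc,
        "g_mean": math.sqrt(recall * specificity),
        "j_coefficient": recall + specificity - 1.0,
    }


# Hypothetical imbalanced example: 15 defective modules out of 100.
metrics = confusion_metrics(tp=10, fp=5, tn=80, fn=5)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

In this made-up example the accuracy is dominated by the defect-free majority class, while recall, g-mean, and the J-coefficient reflect how well the minority (defective) class is detected.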