Published In
International Journal of Advances in Electronics and Computer Science-IJAECS
Volume Issue
Volume 6, Issue 10 (October 2019)
Paper Title
Imbalanced Datasets in Defect Prediction
Author Name
Ebubeogu Amarachukwu Felix, Akanwa Chinedu Smith
Faculty of Computer Science and Information Technology, University of Malaya, Kuala Lumpur, Malaysia 50603; Faculty of Business and Management Technology, University Kuala Lumpur, Kuala Lumpur, Malaysia 50250
Data pre-processing is important in software defect prediction. Despite achievements in defect prediction, data reliability remains an issue because of class imbalance; the success of a prediction study depends on the quality of the data used. In this paper, we present a data pre-processing technique that can accurately identify defective and defect-free modules in a dataset and render the dataset suitable for defect classification. We applied a top-down technique that considers datasets as a unit to identify both the defective and defect-free classes on 10 projects from the PROMISE repository. The support vector machine classifier achieved an average classification accuracy and specificity of 89.78% and 98.90%, respectively; the neural network classifier achieved an area under the receiver operating characteristic curve, Brier score, MCC, precision, and g-mean of 83.53%, 15.12%, 34.37%, 63.04%, and 41.87%, respectively; the naïve Bayes classifier achieved a recall and J-coefficient of 78.53% and 31.89%, respectively; and the K-nearest-neighbors classifier achieved an average information score of 36.14%. This manuscript highlights the need to properly pre-process datasets before they are applied in machine learning studies, to avoid misleading results.
Keywords - Machine Learning, Data Pre-processing, Classification Algorithms, Defect Prediction, Imbalanced Data.
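Several of the measures named in the abstract (specificity, recall, precision, g-mean, J-coefficient, MCC, and Brier score) can be computed from a confusion matrix or from predicted probabilities. The following is a minimal sketch in Python using the commonly cited definitions; it assumes the paper follows these definitions, which the abstract does not confirm, and it returns fractions rather than the percentages reported above. The function names and the example counts are hypothetical and are not taken from the paper or from the PROMISE data.

```python
import math


def imbalance_metrics(tp, fp, tn, fn):
    """Compute common classification metrics from confusion-matrix counts.

    The defective class is treated as the positive class (an assumption).
    """
    total = tp + fp + tn + fn
    recall = tp / (tp + fn) if (tp + fn) else 0.0        # true positive rate
    specificity = tn / (tn + fp) if (tn + fp) else 0.0   # true negative rate
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    accuracy = (tp + tn) / total
    g_mean = math.sqrt(recall * specificity)             # geometric mean of both rates
    j_coefficient = recall + specificity - 1.0           # Youden's J statistic
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0  # Matthews correlation
    return {
        "accuracy": accuracy,
        "recall": recall,
        "specificity": specificity,
        "precision": precision,
        "g_mean": g_mean,
        "j_coefficient": j_coefficient,
        "mcc": mcc,
    }


def brier_score(probabilities, labels):
    """Mean squared difference between predicted probability and the 0/1 label."""
    return sum((p - y) ** 2 for p, y in zip(probabilities, labels)) / len(labels)


if __name__ == "__main__":
    # Hypothetical project: 200 modules, 20 of them defective.
    print(imbalance_metrics(tp=10, fp=5, tn=175, fn=10))
    # A classifier that labels every module defect-free on the same data.
    print(imbalance_metrics(tp=0, fp=0, tn=180, fn=20))
    print(brier_score([0.9, 0.2, 0.4], [1, 0, 1]))
```

In the second hypothetical call, the classifier never predicts a defect, yet its accuracy is still high because most modules are defect-free, while g-mean and MCC fall to zero. This illustrates why accuracy alone can be misleading on imbalanced data and why several complementary measures are typically reported.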