Summary
Software defect prediction (SDP) is an important technique for analyzing software quality and reducing development costs. Data collected throughout the software lifecycle can be used to predict software defects. Many SDP models have been proposed, but their performance is not always satisfactory. In many existing prediction models based on machine learning, the distance metric between samples has a significant impact on prediction performance. In addition, defect data sets are usually class imbalanced. To address these issues, this paper proposes a novel distance metric based on cost-sensitive learning (CSL) that reduces the impact of class imbalance; the metric is then applied to the large margin distribution machine (LDM) in place of the traditional kernel function. The improvement and optimization of the LDM under CSL are also studied, and the improved LDM is used as the SDP model, referred to as CS-ILDM. The proposed CS-ILDM is evaluated on five publicly available data sets from the NASA Metrics Data Program repository, and its performance is compared with that of existing SDP models. The experimental results confirm that CS-ILDM not only achieves good prediction performance but also reduces misprediction cost and mitigates the impact of class imbalance.
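To make the cost-sensitive idea concrete, the sketch below illustrates the general principle the summary describes: penalizing errors on the rare defective class more heavily than errors on the majority class. It is a minimal illustration only, not the paper's CS-ILDM or its learned distance metric; the synthetic data generator, the RBF-kernel SVC stand-in for the LDM, and the 1:9 cost ratio are all assumptions chosen for demonstration.

```python
# Minimal sketch of cost-sensitive learning on an imbalanced data set.
# NOTE: this is not the paper's CS-ILDM; classifier and costs are assumptions.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import classification_report

# Synthetic imbalanced data standing in for a software-metrics table:
# ~10% "defective" modules (label 1), ~90% "non-defective" (label 0).
X, y = make_classification(n_samples=1000, n_features=20,
                           weights=[0.9, 0.1], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# Cost-sensitive step: missing a defective module (false negative) is
# penalized more than a false alarm, via asymmetric class costs.
# The 1:9 ratio is an illustrative assumption, not a value from the paper.
costs = {0: 1.0, 1: 9.0}

baseline = SVC(kernel="rbf").fit(X_tr, y_tr)
cost_sensitive = SVC(kernel="rbf", class_weight=costs).fit(X_tr, y_tr)

# The cost-sensitive model typically trades some precision on the majority
# class for much higher recall on the rare defective class.
print(classification_report(y_te, baseline.predict(X_te)))
print(classification_report(y_te, cost_sensitive.predict(X_te)))
```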