Abstract
Cross-project defect prediction (CPDP) is a promising approach for allocating testing effort efficiently and ensuring software reliability early in the software lifecycle. A CPDP method typically trains a software defect classifier on labeled data sets; the trained classifier can then predict defects in new projects that have no labeled data. Most previous CPDP techniques rely on manually designed, handcrafted features, which ignore the programs' semantic information. Other existing defect prediction approaches learn semantic features from source code and build classifiers on them directly, but they do not account for the distribution divergence between source and target projects. To address these limitations, we propose a new method called Adversarial Discriminative Convolutional Neural Network (ADCNN), which extracts transferable semantic features from source code for CPDP tasks. Specifically, we first parse source files into token vectors and then map them to integer vectors via word embedding. Second, we combine adversarial learning with discriminative feature learning to train the ADCNN model. The key idea of ADCNN is to learn a discriminative mapping of the target project into the source feature space by deceiving a domain discriminator, which tries to distinguish target project files from source project files. Finally, we use the extracted transferable semantic features to build a classifier for CPDP tasks. We evaluate our method on ten benchmark projects in terms of F-measure, AUC, and PofB20 (an effort-aware evaluation metric). The experimental results show that ADCNN outperforms other related CPDP methods.
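
The first step, turning token vectors into integer vectors, can be illustrated with a minimal sketch. It assumes that tokens are AST node names, that index 0 is reserved for padding and index 1 for tokens unseen in the source project, and that all vectors are padded or truncated to a fixed length. The names `build_vocab` and `encode` are hypothetical and are not taken from the ADCNN implementation.

```python
from collections import Counter

# Assumed convention (not from the paper): 0 pads short files,
# 1 stands for tokens that never occur in the source project.
PAD, UNK = 0, 1


def build_vocab(token_seqs, min_count=1):
    """Assign an integer index (starting at 2) to every sufficiently frequent token."""
    counts = Counter(tok for seq in token_seqs for tok in seq)
    vocab = {}
    # Sort by descending frequency, then alphabetically, so indices are deterministic.
    for tok, count in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        if count >= min_count:
            vocab[tok] = len(vocab) + 2
    return vocab


def encode(seq, vocab, length):
    """Map one token vector to a fixed-length integer vector."""
    ids = [vocab.get(tok, UNK) for tok in seq][:length]
    return ids + [PAD] * (length - len(ids))


# Illustrative token vectors for two source-project files and one target-project file.
source_files = [
    ["MethodDeclaration", "IfStatement", "MethodInvocation"],
    ["MethodDeclaration", "ForStatement"],
]
target_file = ["MethodDeclaration", "WhileStatement"]

vocab = build_vocab(source_files)
source_ids = [encode(seq, vocab, length=4) for seq in source_files]
target_ids = encode(target_file, vocab, length=4)
```

In this sketch the vocabulary is built from the source project only, so a token that appears only in the target project falls back to the unknown index. Integer vectors of this kind would then be passed to an embedding layer ahead of the convolutional feature extractor.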
