Abstract

Classification with incomplete data is challenging in practice because missing values hinder classification. The two-stage strategy of first filling in missing values and then classifying the completed data degrades performance because imputation and classification are optimized separately. Although efforts have been made to classify incomplete data directly without imputation, they sacrifice the benefits that imputation brings to classification. To address this, this paper proposes a Category-Aware Optimal Transport Neural Network (CAOT-NN) for incomplete data classification. Specifically, we design a category-aware optimal transport method conditioned on the intra-category data distribution, which explicitly exploits the category information of each sample and fills in discriminative values for the downstream classification module. Moreover, we reconstruct the observed values during imputation while performing classification learning, thereby implicitly using the classifier's category information to estimate missing values. Extensive experiments on 16 real-world and 30 synthetic datasets from the UCI repository demonstrate the superiority of CAOT-NN over existing state-of-the-art methods. In addition, visualization analysis reveals that CAOT-NN yields a tighter intra-category distribution of the imputed data, suggesting that the missing mechanism may be used as a data augmentation strategy to boost accuracy.