A Self Training Mechanism With Scanty and Incompletely Annotated Samples for Learning-Based Cloud Detection in Whole Sky Images

Abstract

Cloud detection is one of the important tasks in automatic ground-based cloud observation systems that work with ground-based cloud images. Most supervised methods need a large number of annotated samples for model training, and pixel-level annotation is costly. In this letter, a self-training mechanism is proposed to significantly reduce the number of annotated samples required. Given a set of original images, only a few need to be annotated (even incompletely), and a local-region classifier is initialized with the annotated samples. The model is then retrained iteratively using unlabeled samples whose high-confidence pseudo labels are given by a fusion decision. The finely trained model classifies local regions as "cloud" or "sky". Experiments show that the proposed mechanism is effective for several classifiers. The proposed method outperforms unsupervised methods and achieves results comparable to fully supervised learning methods while using far fewer annotated samples.

Plain Language Summary

Nowadays, most ground-based sky observation tasks are carried out by automatic systems with sky-imaging devices instead of human observers. Cloud detection is one of these tasks and is usually treated as an image segmentation problem. Some unsupervised image segmentation methods are fast, but their results are not good enough. Some studies therefore classify local image regions as "sky" or "cloud" with two-class classification models based on supervised learning, which achieves better results. However, training these learning-based models requires a large number of samples finely annotated by experts, and in practice finely annotating cloud images is costly and inefficient. We propose a self-training mechanism that reduces the number of annotated samples required by several learning-based models. First, we trained the two-class classification models with only a few annotated samples for initialization. Then the models were trained iteratively on many unlabeled raw images using our proposed mechanism. Finally, the finely trained models can be applied directly to cloud detection and outperform unsupervised segmentation methods. Moreover, compared with supervised methods, the proposed method obtains comparable results with far fewer annotated samples (even fewer than 10%).
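The iterative retraining described above can be illustrated with a minimal sketch. It is not the authors' implementation. The nearest-centroid classifier, the distance-based confidence score, the 0.8 threshold, and the two-dimensional region features are all assumptions chosen to keep the example short. In particular, the single confidence threshold is only a stand-in for the fusion decision, whose details the abstract does not give.

```python
def fit_centroids(samples, labels):
    """Fit a nearest-centroid classifier: one mean feature vector per class."""
    sums, counts = {}, {}
    for x, y in zip(samples, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}


def predict_with_confidence(centroids, x):
    """Return (label, confidence); confidence grows as x nears one centroid."""
    dists = {
        y: sum((a - b) ** 2 for a, b in zip(x, c)) ** 0.5
        for y, c in centroids.items()
    }
    label = min(dists, key=dists.get)
    total = sum(dists.values())
    confidence = 1.0 - dists[label] / total if total > 0 else 1.0
    return label, confidence


def self_train(labeled_x, labeled_y, unlabeled_x, threshold=0.8, max_rounds=10):
    """Retrain iteratively, adding only confidently pseudo-labeled samples."""
    train_x, train_y = list(labeled_x), list(labeled_y)
    pool = list(unlabeled_x)
    centroids = fit_centroids(train_x, train_y)  # initialization step
    for _ in range(max_rounds):
        remaining, added = [], 0
        for x in pool:
            label, confidence = predict_with_confidence(centroids, x)
            if confidence >= threshold:
                # Assumed stand-in for the fusion decision: accept the pseudo label.
                train_x.append(x)
                train_y.append(label)
                added += 1
            else:
                remaining.append(x)
        pool = remaining
        if added == 0:
            break  # no new confident samples, so retraining would change nothing
        centroids = fit_centroids(train_x, train_y)
    return centroids, pool


# Hypothetical two-dimensional features for local regions (illustrative values only).
labeled_x = [(0.9, 0.1), (0.3, 0.7)]
labeled_y = ["cloud", "sky"]
unlabeled_x = [(0.85, 0.15), (0.8, 0.2), (0.35, 0.65), (0.4, 0.6), (0.6, 0.4)]

centroids, pool = self_train(labeled_x, labeled_y, unlabeled_x)
print("class centroids:", centroids)
print("regions left unlabeled:", pool)
```

Regions that never reach the confidence threshold stay in the unlabeled pool and are not used for training, which keeps ambiguous samples from reinforcing early mistakes.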

Keywords

CLASSIFICATION SYSTEM