Abstract
Recent studies indicate that classifiers are vulnerable in adversarial environments. The label-flipping attack aims to mislead the training process. Some countermeasures have been proposed, but they are usually designed for a particular classifier only, or may cause information loss. This study investigates a generic model that fully utilizes the contaminated samples in learning. We assume that, in addition to a contaminated dataset, a small untainted dataset is obtained from the application. The adversarial learning problem is formulated as transfer learning, in which the influence of contaminated samples is reduced by extracting from the contaminated set only the information that is consistent with the untainted samples. Our study considers a popular method, TrAdaBoost, and shows that its performance is closely tied to the initial weights assigned to the samples. An initialization method is devised specifically for the adversarial setting to avoid assigning large weights to contaminated samples. The experimental results confirm that TrAdaBoost successfully extracts only the benign knowledge from the contaminated set. Moreover, our proposed initialization method significantly enhances the robustness of the model. This study presents a promising direction for defending against poisoning attacks using transfer learning.
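For readers unfamiliar with the mechanism, the following is a minimal sketch of the standard TrAdaBoost weight-update loop (Dai et al., 2007) with an optional hook for initializing the source-sample weights. The function name, the `w_src_init` parameter, and the decision-stump base learner are illustrative assumptions, not the paper's implementation.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def tradaboost(X_src, y_src, X_tgt, y_tgt, n_rounds=20, w_src_init=None):
    """TrAdaBoost sketch: X_src/y_src is the (possibly contaminated) source
    set, X_tgt/y_tgt the small untainted target set; labels are 0/1."""
    n, m = len(X_src), len(X_tgt)
    X = np.vstack([X_src, X_tgt])
    y = np.concatenate([y_src, y_tgt]).astype(int)
    # Weight initialization: uniform by default. A poisoning-aware scheme
    # (the paper's contribution) would pass a w_src_init that down-weights
    # samples suspected to have flipped labels.
    w_src = np.ones(n) / n if w_src_init is None else np.asarray(w_src_init)
    w = np.concatenate([w_src, np.ones(m) / m])
    beta_src = 1.0 / (1.0 + np.sqrt(2.0 * np.log(n) / n_rounds))
    learners, betas = [], []
    for _ in range(n_rounds):
        p = w / w.sum()
        h = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=p)
        miss = (h.predict(X) != y).astype(float)      # 0/1 loss per sample
        eps = np.dot(p[n:], miss[n:]) / p[n:].sum()   # error on untainted set
        eps = float(np.clip(eps, 1e-10, 0.499))       # keep beta_t in (0, 1)
        beta_t = eps / (1.0 - eps)
        # Misclassified source samples are *down*-weighted so their influence
        # fades; misclassified target samples are up-weighted as in AdaBoost.
        w[:n] *= beta_src ** miss[:n]
        w[n:] *= beta_t ** (-miss[n:])
        learners.append(h)
        betas.append(beta_t)
    return learners, betas
```

In the original algorithm, final predictions come from a weighted vote over the later half of the learners with weights log(1/beta_t); the point relevant to this paper is that the initial source weights control how much trust the contaminated samples start with, which is why initialization matters under poisoning.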