摘要
Context: In host-based anomaly detection, feature extraction on the system call traces is important to build an effective anomaly detection model. Different kinds of feature extraction methods are recently proposed and most of them aim at preserving the positional information of the system calls within a trace. These extracted features are generally named from system calls, therefore, cannot be used directly in the case of cross platform applications. In addition, some of these feature extraction methods are very costly to implement. @@@ Objective: This paper presents a new feature extraction method. It aims at extracting features that are irrelevant to the names of system calls. The samples represented by the extracted features can be directly used in the case of cross platform applications. In addition, this method is lightweight in that the feature values are not expensive to compute. @@@ Method: The proposed method firstly transforms the system calls in a trace into frequency sequences of n-grams and then explores a fixed number of statistical features on the frequency sequences. The extracted features are irrelevant to the names/indexes of system calls on a platform. The calculation of feature values works on the frequency sequences rather than on system call sequences. These feature vectors built on the training set with only normal data are then used to train a one class classification model for anomaly detection. @@@ Results: We compared our method with four previously proposed feature extraction methods on system call traces. When used on the same platform, even though our method does not always obtain the highest AUC, overall, it performs better than all the compared methods. When testing on cross platform, it performs the best among all compared methods. @@@ Conclusion: The features extracted by our method are platform-independent and are suitable for anomaly detection across platforms.
- 
                                单位广东药学院
