TraceGra: A trace-based anomaly detection for microservice using graph deep learning

Authors: Chen, Jian; Liu, Fagui*; Jiang, Jun*; Zhong, Guoxiang; Xu, Dishi; Tan, Zhuanglun; Shi, Shangsong
Source: COMPUTER COMMUNICATIONS, 2023, 204: 109-117.
DOI:10.1016/j.comcom.2023.03.028

Abstract

Traces are widely used to detect anomalies in distributed microservice systems because they can precisely reconstruct user request paths. However, most existing trace-based anomaly detection approaches treat a trace as a sequence of microservice invocations annotated with response times, which ignores both the graph structure of the trace and the abnormal resource consumption that arises in the complex distributed deployment environment of microservices. In this paper, we propose TraceGra, an unsupervised encoder-decoder anomaly detection approach. TraceGra first builds a unified graph representation that combines traces with container performance metrics. It then uses a graph neural network (GNN) and a long short-term memory network (LSTM) to extract topological and temporal features, respectively. Finally, it adds the two loss terms, scaled by two hyperparameters, to obtain the anomaly score. Evaluation on an open-source dataset and a local dataset collected from an ARM server cluster shows that TraceGra achieves high precision (0.97) and recall (0.93), outperforming some state-of-the-art approaches with an average F1-score increase of 0.1.
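
The scoring step in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the exact form of the two loss terms, so this sketch assumes both are mean squared reconstruction errors, one over topology features and one over temporal features, and that the two hyperparameters act as weights. The names `mse`, `anomaly_score`, `is_anomalous`, `alpha`, `beta`, and `threshold` are hypothetical and do not come from the paper.

```python
from typing import Sequence


def mse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error between two equal-length, non-empty vectors."""
    if len(pred) != len(target) or len(pred) == 0:
        raise ValueError("vectors must be non-empty and of equal length")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def anomaly_score(structure_loss: float, temporal_loss: float,
                  alpha: float = 0.5, beta: float = 0.5) -> float:
    """Weighted sum of the two loss terms (assumed form, not the paper's)."""
    return alpha * structure_loss + beta * temporal_loss


def is_anomalous(score: float, threshold: float) -> bool:
    """Flag a trace when its score exceeds a threshold chosen on normal data."""
    return score > threshold


# Assumed inputs: decoder outputs versus the original features of one trace.
structure_loss = mse([1.0, 2.0], [1.0, 4.0])   # topology reconstruction error
temporal_loss = mse([0.0, 0.0], [2.0, 2.0])    # temporal reconstruction error
score = anomaly_score(structure_loss, temporal_loss, alpha=0.5, beta=0.25)
```

Because the model is unsupervised, a threshold such as the one passed to `is_anomalous` would typically be chosen from scores observed on normal traces; how TraceGra selects it is not stated in the abstract.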