TCSA: Efficient Localization of Busy-Wait Synchronization Bugs for Latency-Critical Applications

Li, Ning; Guo, Jianmei*; Huang, Bo; Li, Yuyang; Zhang, Yilei; Li, Chengdong; Huang, Wenxin
Science Citation Index Expanded

Abstract

Busy-wait synchronization is often used in latency-critical applications to keep latency low. Unfortunately, performance bugs in busy-wait synchronization caused by thread contention can lead to request failures or even system crashes. Localizing these bugs is nontrivial: we must pinpoint the exact moment a bug occurs within a relatively long measurement period and, at the same time, identify candidate busy-wait threads among numerous concurrent threads. Existing methods typically rely on hotspot-driven analysis of lock-related functions, but they still require extensive manual work to localize busy-wait threads. This paper proposes timing call stack analysis (TCSA), an efficient approach to localizing busy-wait synchronization bugs. The key idea is to time-serialize the function call stacks of an application and identify consecutive identical call stacks, which expose busy-wait threads. TCSA can handle any application regardless of its programming language and identifies various busy-wait patterns, including spinlocks, chaining spinlocks, futexes, and safepoint checks in the Java Virtual Machine. Compared with state-of-the-art methods, TCSA reduces the number of records that must be examined (e.g., threads and functions) by one to three orders of magnitude. TCSA has been deployed at a large cloud service provider, where it has demonstrated its effectiveness, efficiency, and practicality on four real latency-critical applications.
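The key idea of time-serializing call stacks and looking for consecutive identical ones can be illustrated with a minimal Python sketch. This is not the authors' implementation. The `Sample` and `Run` types, the `find_busy_wait_runs` function, and the `min_run` threshold are hypothetical. The sketch also assumes the input consists of on-CPU stack samples (for example, from a sampling profiler), because a thread blocked off-CPU would also show an unchanging stack. It groups samples by thread, orders each thread's stacks by timestamp, and reports runs of at least `min_run` identical consecutive stacks, longest first.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Sample:
    timestamp: float        # sampling time, e.g. seconds since start
    thread_id: int
    stack: Tuple[str, ...]  # call stack, outermost frame first


@dataclass
class Run:
    thread_id: int
    stack: Tuple[str, ...]
    start: float            # timestamp of the first sample in the run
    end: float              # timestamp of the last sample in the run
    length: int             # number of consecutive identical samples


def find_busy_wait_runs(samples: List[Sample], min_run: int = 5) -> List[Run]:
    """Hypothetical sketch: report runs of identical consecutive stacks per thread.

    Assumption: samples are on-CPU stacks, so a long run of one stack suggests
    a thread spinning rather than sleeping. The real TCSA filtering is not shown.
    """
    per_thread: Dict[int, List[Sample]] = defaultdict(list)
    for s in samples:
        per_thread[s.thread_id].append(s)

    runs: List[Run] = []
    for tid, seq in per_thread.items():
        seq.sort(key=lambda s: s.timestamp)  # time-serialize this thread's stacks
        start = 0
        for i in range(1, len(seq) + 1):
            # Close the current run when the stack changes or the sequence ends.
            if i == len(seq) or seq[i].stack != seq[start].stack:
                length = i - start
                if length >= min_run:
                    runs.append(Run(tid, seq[start].stack, seq[start].timestamp,
                                    seq[i - 1].timestamp, length))
                start = i

    # Longest runs first: these are the most likely busy-wait candidates.
    runs.sort(key=lambda r: r.length, reverse=True)
    return runs


if __name__ == "__main__":
    spin = ("main", "worker", "spin_lock")
    demo = [Sample(float(t), 1, spin) for t in range(6)]
    demo += [Sample(float(t), 2, ("main", "handle", f"step{t}")) for t in range(6)]
    for run in find_busy_wait_runs(demo, min_run=5):
        print(run.thread_id, run.stack[-1], run.length)  # prints: 1 spin_lock 6
```

In this toy input, thread 1 stays in the same `spin_lock` frame for six samples and is flagged, while thread 2 changes its stack at every sample and is not. Ranking by run length reflects the abstract's goal of shrinking the set of threads and functions to inspect. Any additional filtering the paper applies is not reproduced here.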

Keywords

Synchronization; Computer bugs; Java; Virtual machining; Time factors; Monitoring; Location awareness; Busy-wait synchronization; latency-critical applications; performance bug localization; timing call stack analysis