Reservoir-based sampling over large graph streams to estimate triangle counts and node degrees

Authors:Lingling Zhang; Hong Jiang; Fang Wang; Dan Feng; Yanwen Xie
Source:Future Generation Computer Systems-The International Journal of Grid Computing Theory Methods and Applications, 2020, 108: 244-255.
DOI:10.1016/j.future.2020.02.077

Summary


Reservoir sampling is widely employed to characterize large graph streams by producing edge samples. However, existing reservoir-based sampling methods mainly focus on counting triangles but perform poorly in analyzing topological characteristics reflected by node degrees. This paper proposes a new method, called triangle-induced reservoir sampling, or T-Sample, to count triangles and estimate node degrees simultaneously and efficiently. While every edge in a graph stream is processed only once by T-Sample, a dual sampling mechanism performing both uniform sampling and non-uniform sampling is carefully designed. Specifically, T-Sample’s uniform sampling is used to count triangles by a newly proposed method with smaller estimation variances than existing reservoir-based sampling methods; whereas, its non-uniform sampling ensures that edge samples are connected. Experimental results driven by real datasets show that T-Sample can count triangles with smaller estimation errors and variances than the state-of-the-art reservoir-based sampling methods while obtaining much more accurate information about node degrees at smaller time and memory costs.

  • Institution
    The University of Texas

Full-Text