Summary
Random-walk based sampling methods have been widely employed to characterize social networks. However, existing random-walk based sampling methods cause inaccuracies in estimating the degree structure and high sampling costs in estimating clique structures. In this paper, we propose a dual random-walk based sampling method, called DRaWS by designing a dual residence of the random walker, to estimate both the distributions of degree and clique size with low costs. The key idea behind DRaWS is that it leverages the many-to-one formation between many nodes and one clique in a large graph to shorten the sampling paths and thus reduce the sampling costs greatly while reflecting the different sampling probabilities of the two types of node structures. Meanwhile, DRaWS employs the one-to-many representativeness between one node and many nodes in a clique to improve the quality of samples. Furthermore, two re-weighted estimators for DRaWS’s process are proposed to estimate the two different node structures. Experimental evaluation driven by real graph datasets shows that DRaWS drastically cuts down the sampling costs of the state-of-the-art methods while increasing the accuracy when estimating both the degree and clique structural properties.
-
InstitutionThe University of Texas