Efficient m-closest entity matching over heterogeneous information networks

Authors:Long, Wancheng; Li, Xiaowen; Wang, Liping*; Zhang, Fan; Lin, Zhe; Lin, Xuemin
Source:Knowledge-Based Systems, 2023, 263: 110299.
DOI:10.1016/j.knosys.2023.110299

Summary

This work investigates a novel m-closest entity (mCE) matching problem over geographic het-erogeneous information networks (Geo-HINs). That is, given a Geo-HIN G and m query graphs {q1, q2, ... , qm}, mCE matching aims to find a group of geographic entities (geo-entities) whose patterns match the query graphs {q1, q2, ... , qm} correspondingly, for which the maximum distance between any geo-entity pair (i.e., the diameter) in the group is minimized. As a fundamental problem, the mCE matching can be applied for many scenarios, e.g., travel itinerary recommendation and city planning. The existing works have not simultaneously considered the characteristics of patterns match-ing and spatial search so that they cannot solve our problem, which is computationally expensive. To solve this problem efficiently, we propose a unified framework named Fuzzy - Exact framework to process entity matching and spatial search comprehensively, in which pruning abilities at non -spatial and spatial layers are cooperatively explored. Two mutually adaptive auxiliary data structures named Arc - Tree and Arc - Forest are devised to maintain the intermediate search results which are exploited to enhance the search process between non-spatial and spatial layers. Experimental results demonstrate that our algorithm can outperform the baseline methods by 2 orders of magnitude on runtime.

  • Institution
    Fudan University; guangzhou university; Shanghai Jiao Tong University

Full-Text