Abstract

Clustering is a classical research field with broad applications in data mining, such as emotion detection, event extraction and topic discovery. It aims to discover intrinsic patterns in a collection of data and group them into clusters. Significant progress has been made by Density-based Spatial Clustering of Applications with Noise (DBSCAN) and its variants. However, current density-based algorithms suffer from a major limitation, the linear connection problem: they perform poorly at separating target clusters that are "connected" by a few data points. Moreover, parameter setting and time cost make these algorithms hard to adapt to massive data analysis. To address these problems, we propose a novel adaptive density-based spatial clustering algorithm called Ada-DBSCAN, which consists of a data block splitter and a data block merger, coordinated by local clustering and global clustering. We conduct extensive experiments on both artificial and real-world datasets to evaluate the effectiveness of Ada-DBSCAN. Experimental results show that our algorithm clearly outperforms several strong baselines in both clustering accuracy and human evaluation. In addition, Ada-DBSCAN is significantly more efficient than DBSCAN.
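The linear connection problem can be illustrated with a small, self-contained sketch. The code below is a plain textbook-style DBSCAN, not Ada-DBSCAN, and the toy data, the parameter values (`eps=1.1`, `min_pts=3`) and all names (`dbscan`, `count_clusters`, `blob_a`, `blob_b`, `bridge`) are assumptions chosen only for illustration. Two well-separated grids of points are clustered once on their own and once with a thin one-point-wide chain placed between them.

```python
from collections import deque
from math import dist


def dbscan(points, eps, min_pts):
    """Plain DBSCAN sketch; returns one label per point (-1 means noise)."""
    n = len(points)
    # Each neighbourhood includes the point itself.
    neighbours = [
        [j for j in range(n) if dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbours[i]) < min_pts:
            labels[i] = -1  # provisional noise; may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = deque(neighbours[i])
        while queue:
            j = queue.popleft()
            if labels[j] == -1:
                labels[j] = cluster  # noise reached from a core point: border
            if labels[j] is not None:
                continue  # already assigned; do not expand again
            labels[j] = cluster
            if len(neighbours[j]) >= min_pts:
                queue.extend(neighbours[j])  # core point: keep expanding
    return labels


def count_clusters(labels):
    """Number of distinct clusters, ignoring noise."""
    return len(set(labels) - {-1})


# Two 5x5 unit grids, six units apart (hypothetical toy data).
blob_a = [(x, y) for x in range(5) for y in range(5)]
blob_b = [(x + 10, y) for x in range(5) for y in range(5)]
# A thin chain of five points linking the two grids.
bridge = [(x, 2) for x in range(5, 10)]

separate = dbscan(blob_a + blob_b, eps=1.1, min_pts=3)
bridged = dbscan(blob_a + blob_b + bridge, eps=1.1, min_pts=3)
print(count_clusters(separate), count_clusters(bridged))
```

With these settings the two grids alone form two clusters, while adding the five bridge points makes every point on the chain a core point, so the expansion step walks across it and the result collapses into a single cluster.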