Association Journal of CSIAM
Supervised by Ministry of Education of PRC
Sponsored by Xi'an Jiaotong University
ISSN 1005-3085  CN 61-1269/O1

Chinese Journal of Engineering Mathematics ›› 2017, Vol. 34 ›› Issue (4): 354-366. doi: 10.3969/j.issn.1005-3085.2017.04.003


Rough Dissimilarity Measurement and Its Application in Hierarchical Clustering

LI Chun-zhong,   ZHENG Yu-bang,   WANG Ting   

  1. Institute of Statistics and Applied Mathematics, Anhui University of Finance and Economics, Bengbu 233030
  • Received: 2015-12-21 Accepted: 2016-07-27 Online: 2017-08-15 Published: 2017-10-15
  • Supported by:
    The National Natural Science Foundation of China (61305070); the National Program on Key Basic Research Project (973 Program, 2013CB329404); the Natural Science Foundation of Anhui Province (KJ2015A076).

Abstract: Local structural features are important in data analysis. To obtain a simple and effective method for detecting the local structures of a data set, this paper proposes a direction-consistency measurement, called rough dissimilarity, that combines re-sampling with a classical neighborhood selection method. The measurement is an optimized neighborhood selection method that considers not only the classical sorting by Euclidean distance but also the local structures of the data set. Because it has a low computational load and detects directional structure well, the new dissimilarity measurement can be used in unsupervised learning to construct a hierarchical subgraph clustering method, RDClust. The new clustering method has three advantages: 1) it has a low computational load and is an approximately linear method; 2) it makes no assumption about the shape or distribution of the clusters and detects the local structures of a data set automatically; 3) it has only one parameter, and the clustering results are relatively robust to its value. In tests on synthetic and real data sets, the new clustering method performs well.

Key words: clustering, nearest neighborhood, $k$-th nearest neighbor connection, hierarchical connection graph
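
For orientation, the classical ingredient the abstract builds on (neighborhood selection by sorting Euclidean distances, then connecting each point to its $k$ nearest neighbors) can be sketched as follows. This is not RDClust and does not compute the rough dissimilarity, whose definition is not given in the abstract. It is only a minimal baseline under the assumption that clusters are read off as connected components of an undirected $k$-nearest-neighbor graph. The function names `knn_indices` and `knn_components` and the toy data are hypothetical.

```python
import numpy as np


def knn_indices(points: np.ndarray, k: int) -> np.ndarray:
    """For each point, return the indices of its k nearest neighbors
    under Euclidean distance (the point itself is excluded)."""
    diffs = points[:, None, :] - points[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    np.fill_diagonal(dists, np.inf)  # a point is not its own neighbor
    return np.argsort(dists, axis=1)[:, :k]


def knn_components(points: np.ndarray, k: int) -> np.ndarray:
    """Label the connected components of the undirected k-NN graph.
    Assumption: this plain k-NN connection stands in for the paper's
    neighborhood selection, which additionally uses local structure."""
    n = len(points)
    parent = list(range(n))

    def find(i: int) -> int:
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, neighbors in enumerate(knn_indices(points, k)):
        for j in neighbors:
            ri, rj = find(i), find(int(j))
            if ri != rj:
                parent[rj] = ri

    # Relabel component roots as 0, 1, 2, ... in order of first appearance.
    relabel: dict[int, int] = {}
    return np.array([relabel.setdefault(find(i), len(relabel)) for i in range(n)])


# Toy data (hypothetical): two well-separated unit squares.
square = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
toy = np.vstack([square, square + 10.0])

for k in (2, 4):
    labels = knn_components(toy, k)
    print(k, len(set(labels.tolist())))  # k=2 gives 2 components, k=4 gives 1
```

Increasing $k$ can only add edges, so components merge and never split. That nesting is one simple way to obtain a hierarchy from neighbor connections, though the paper's hierarchical connection graph may be built differently.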
