Rough Dissimilarity Measurement and Its Application in Hierarchical Clustering

Chinese Journal of Engineering Mathematics, 2017, Vol. 34, Issue (4): 354-366. doi: 10.3969/j.issn.1005-3085.2017.04.003

LI Chun-zhong, ZHENG Yu-bang, WANG Ting

Institute of Statistics and Applied Mathematics, Anhui University of Finance and Economics, Bengbu 233030

Received: 2015-12-21 Accepted: 2016-07-27 Online: 2017-08-15 Published: 2017-10-15
Supported by:
The National Natural Science Foundation of China (61305070); the National Key Basic Research Program of China, 973 Program (2013CB329404); the Natural Science Foundation of Anhui Province (KJ2015A076).

Abstract

Abstract: Local structural features play an important role in data analysis. To obtain a simple and effective method for detecting the local structure of a data set, this paper combines re-sampling error analysis with a classical neighborhood selection method and proposes a direction-consistency measure for detecting local structure, the rough dissimilarity measurement. The measure is an optimized neighborhood selection method that considers not only the classical ranking by Euclidean distance but also the local directional structure of the data set. Because it has low computational and storage complexity and detects structure well, it can be used in unsupervised learning to build a hierarchical subgraph clustering algorithm, RDClust. Compared with classical clustering algorithms, RDClust has three advantages: 1) its computational complexity is low, and it is an approximately linear algorithm; 2) it makes no assumption about the shape or distribution of the clusters and reflects the local structure of the data set automatically; 3) it has only one parameter, the number of nearest neighbors, and the clustering results are relatively robust to that parameter. Experiments on synthetic and real data sets show the good performance of the new measure when applied in the new algorithm.

Key words: clustering, nearest neighborhood, $k$-th nearest neighbor connection, hierarchical connection graph
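The abstract contrasts the proposed measure with classical neighbor selection, which ranks candidates by Euclidean distance alone. The sketch below illustrates only that classical baseline: it links each point to its k nearest neighbors and reads clusters off the connected components of the resulting graph. It is not RDClust and does not implement the rough dissimilarity, which the abstract does not define. The function names `knn_indices` and `knn_graph_components` are hypothetical, and the brute-force distance matrix is quadratic in the number of points, unlike the approximately linear cost the abstract reports.

```python
import numpy as np


def knn_indices(points, k):
    """Indices of each point's k nearest neighbors by Euclidean distance."""
    diffs = points[:, None, :] - points[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    np.fill_diagonal(dists, np.inf)  # a point is not its own neighbor
    return np.argsort(dists, axis=1)[:, :k]


def knn_graph_components(points, k):
    """Label the connected components of the undirected k-NN graph."""
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Merge the components of every point and each of its neighbors.
    for i, neighbors in enumerate(knn_indices(points, k)):
        for j in neighbors:
            ri, rj = find(i), find(int(j))
            if ri != rj:
                parent[rj] = ri

    # Renumber component roots as 0, 1, 2, ... in order of first appearance.
    roots = [find(i) for i in range(n)]
    relabel = {r: c for c, r in enumerate(dict.fromkeys(roots))}
    return np.array([relabel[r] for r in roots])


# Illustrative data (an assumption, not from the paper): two separated groups.
points = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
                   [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])
labels = knn_graph_components(points, k=2)
```

With k=2, every neighbor link in this toy data stays inside its own group, so the two groups come out as separate components. A distance-only ranking like this ignores the local directional structure that the paper's measure is designed to take into account.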

CLC Number:

TP301

LI Chun-zhong, ZHENG Yu-bang, WANG Ting. Rough Dissimilarity Measurement and Its Application in Hierarchical Clustering[J]. Chinese Journal of Engineering Mathematics, 2017, 34(4): 354-366.

