A Robust Evolutionary Clustering Algorithm Based on Local Structure Self-expression

Chinese Journal of Engineering Mathematics, 2024, Vol. 41, Issue (6): 1006-1020. doi: 10.3969/j.issn.1005-3085.2024.06.002 cstr: 32411.14.1005-3085.2024.06.002

LI Chunzhong¹, JU Wenliang¹, JING Kaili², GUI Yang³

1. School of Statistics and Applied Mathematics, Anhui University of Finance and Economics, Bengbu 233000
2. School of Mathematics and Statistics, Xi'an Jiaotong University, Xi'an 710049
3. School of Mathematics and Physics, University of Science and Technology Beijing, Beijing 100083

Received: 2024-01-25 Accepted: 2024-04-14 Online: 2024-12-15 Published: 2024-12-15
Supported by:
The Natural Science Foundation of Colleges and Universities in Anhui Province (KJ2021A0481; KJ2021A0473).

Abstract:

Clustering is an unsupervised learning method that measures similarity and dissimilarity among data points by analyzing sample features. It groups data automatically by exploiting high intra-cluster similarity and large inter-cluster differences, and it is widely used in computer vision, text mining, bioinformatics, and other fields. Clustering algorithms still have room for improvement in robustness, universality, and selection of the number of clusters, and their effectiveness depends heavily on the density and manifold structure of the dataset. This paper proposes a robust evolutionary clustering algorithm based on local structure self-expression. The algorithm uses radial basis functions together with prior information to capture local density difference features of the data and constructs a new similarity measure. This process integrates a mechanism for extracting local structural features of the data and a mechanism for identifying stable classes, which makes the clustering more robust and universal. Dynamic evolutionary clustering has natural advantages in these two respects: it continuously optimizes the clustering result during the dynamic clustering process, which substantially improves clustering performance. The new algorithm fuses local and global features through self-expression of the dataset's structural information while monitoring class stability during the dynamic evolution, and thereby obtains better clustering results. Experimental results on both synthetic and real datasets show that the new algorithm achieves superior clustering performance.

Key words: clustering, similarity measurement, relative local density, nearest neighbor, self-expression
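
The abstract does not give the formula for the new similarity measure, so the following is only a minimal sketch of the general idea it names: a radial basis function kernel combined with a nearest-neighbor estimate of relative local density. The function name `rbf_knn_similarity`, the parameters `k` and `sigma`, and the particular way the density ratio damps the kernel value are all assumptions made for illustration; they are not the authors' definitions.

```python
import numpy as np

def rbf_knn_similarity(X, k=5, sigma=1.0):
    """Illustrative density-aware similarity (an assumption, not the paper's measure)."""
    X = np.asarray(X, dtype=float)
    # Pairwise squared Euclidean distances.
    diff = X[:, None, :] - X[None, :, :]
    d2 = np.sum(diff ** 2, axis=-1)
    # Gaussian radial basis function kernel.
    K = np.exp(-d2 / (2.0 * sigma ** 2))
    # Indices of the k nearest neighbors of each point, excluding the point itself.
    d2_no_self = d2.copy()
    np.fill_diagonal(d2_no_self, np.inf)
    nn = np.argsort(d2_no_self, axis=1)[:, :k]
    # Local density: mean kernel value to the k nearest neighbors
    # (small constant avoids division by zero for isolated points).
    density = np.take_along_axis(K, nn, axis=1).mean(axis=1) + 1e-12
    # Relative local density: own density divided by the mean density of the neighbors.
    relative = density / density[nn].mean(axis=1)
    # Hypothetical combination: damp the kernel value when relative densities differ.
    ratio = np.minimum.outer(relative, relative) / np.maximum.outer(relative, relative)
    return K * ratio, relative

# Two well-separated synthetic blobs.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.1, (20, 2)), rng.normal(5.0, 0.1, (20, 2))])
S, rel = rbf_knn_similarity(X, k=5, sigma=0.5)
print(S[:20, :20].mean() > S[:20, 20:].mean())
```

In this sketch the similarity stays symmetric and bounded by the kernel value, so it could be passed to any clustering routine that accepts a precomputed affinity matrix. The evolutionary clustering step and the stable-class monitoring described in the abstract are not modeled here.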

CLC number:

TP301

LI Chunzhong, JU Wenliang, JING Kaili, GUI Yang. A Robust Evolutionary Clustering Algorithm Based on Local Structure Self-expression[J]. Chinese Journal of Engineering Mathematics, 2024, 41(6): 1006-1020.

