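Official journal of the China Society for Industrial and Applied Mathematics
Supervised by: Ministry of Education of the People's Republic of China
Sponsored by: Xi'an Jiaotong University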
ISSN 1005-3085  CN 61-1269/O1

Chinese Journal of Engineering Mathematics ›› 2024, Vol. 41 ›› Issue (6): 1006-1020. doi: 10.3969/j.issn.1005-3085.2024.06.002


A Robust Evolutionary Clustering Algorithm Based on Local Structure Self-expression

LI Chunzhong (李春忠)1,  JU Wenliang (鞠文亮)1,  JING Kaili (靖凯立)2,  GUI Yang (桂扬)3

  1. School of Statistics and Applied Mathematics, Anhui University of Finance and Economics, Bengbu 233000
  2. School of Mathematics and Statistics, Xi'an Jiaotong University, Xi'an 710049
  3. School of Mathematics and Physics, University of Science and Technology Beijing, Beijing 100083
  • Received: 2024-01-25 Accepted: 2024-04-14 Online: 2024-12-15 Published: 2024-12-15
  • Supported by:
    The Natural Science Foundation of Colleges and Universities in Anhui Province (KJ2021A0481; KJ2021A0473).

Abstract:

Clustering is an unsupervised learning method that measures similarity and dissimilarity between data points by analyzing sample features. It groups data automatically by exploiting high intra-cluster similarity and large inter-cluster difference, and it is widely used in computer vision, text mining, bioinformatics, and other fields. Clustering algorithms still leave room for improvement in robustness, generality, and selection of the number of clusters, and their effectiveness depends heavily on the density and manifold structure of the dataset. This paper proposes a robust evolutionary clustering algorithm based on local structure self-expression. The algorithm uses radial basis functions together with prior information to capture local density differences in the data, and from these it constructs a new similarity measure. Mechanisms for extracting local structural features of the data and for identifying stable clusters are integrated into this process, making the clustering more robust and more general. Dynamic evolutionary clustering has natural advantages in these two respects: it continuously refines the clustering result during the dynamic clustering process, which greatly improves clustering quality. The new algorithm fuses local and global features through self-expression of the dataset's structural information while monitoring cluster stability during the dynamic evolution, and thereby obtains better final clustering results. Experimental results on both synthetic and real datasets show that the new algorithm achieves superior clustering performance.
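The abstract does not give the formula for the new similarity measure, so the following is only an illustrative sketch of the general idea of a radial-basis-function similarity that adapts to local density. It swaps in a well-known stand-in, the per-point "local scaling" bandwidth from self-tuning spectral clustering, rather than the authors' measure. The function name `density_aware_similarity` and the parameter `k` are assumptions made for this example.

```python
import numpy as np


def density_aware_similarity(X, k=5):
    """Hypothetical density-scaled RBF similarity (NOT the paper's formula).

    Each point gets its own bandwidth: the distance to its k-th nearest
    neighbour. Points in dense regions therefore get narrow kernels and
    points in sparse regions get wide ones.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]

    # Pairwise Euclidean distances via broadcasting.
    diff = X[:, None, :] - X[None, :, :]
    D = np.sqrt((diff ** 2).sum(axis=-1))

    # Distance to the k-th nearest neighbour; after sorting, column 0 is
    # the point itself (distance 0), so column k is the k-th neighbour.
    kth = np.sort(D, axis=1)[:, min(k, n - 1)]
    sigma = np.maximum(kth, 1e-12)  # guard against zero bandwidths

    # RBF kernel whose scale is the product of the two local bandwidths.
    S = np.exp(-(D ** 2) / (sigma[:, None] * sigma[None, :]))
    np.fill_diagonal(S, 0.0)  # no self-similarity
    return S
```

With a tight cluster and a loose cluster, a single global bandwidth tends to suit only one of them; the per-point bandwidth is one simple way to let the similarity reflect differences in local density, which is the kind of effect the abstract attributes to its measure.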

Key words: clustering, similarity measurement, relative local density, nearest neighbor, self-expression

CLC Number: