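Official journal of the China Society for Industrial and Applied Mathematics
Supervised by: Ministry of Education of the People's Republic of China
Sponsored by: Xi'an Jiaotong University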
ISSN 1005-3085  CN 61-1269/O1

Chinese Journal of Engineering Mathematics ›› 2019, Vol. 36 ›› Issue (1): 1-17. doi: 10.3969/j.issn.1005-3085.2019.01.001

Variable Selection Ensemble Methods

ZHANG Chun-xia,   LI Jun-li

  1. School of Mathematics and Statistics, Xi'an Jiaotong University, Xi'an 710049
  • Received: 2017-03-06 Accepted: 2018-09-14 Online: 2019-02-15 Published: 2019-04-15
  • Supported by:
    The National Natural Science Foundation of China (11671317; 61572393).

Abstract: With massive high-dimensional data emerging in many research and application fields, it is crucial to exploit the sparsity of such data in order to extract valuable information. As an effective tool for building interpretable models and for improving the accuracy of statistical inference and prediction, variable selection plays an increasingly important role in the statistical modelling of high-dimensional data. Because ensemble learning can significantly improve selection accuracy, alleviate the instability of traditional selection methods, and reduce the chance of falsely including noise variables, variable selection ensemble (VSE) methods have attracted considerable interest in recent years. To provide a systematic reference for researchers in related fields, this paper presents a detailed survey of existing VSEs and divides them into two classes according to the strategy used to construct the ensemble. The main characteristics of the methods in each class are analyzed, and simulated experiments are carried out to investigate the variable selection and prediction performance of some representative VSE techniques. Finally, several directions for future research on VSEs are discussed.

Key words: high-dimensional data analysis, variable selection, linear regression model, ensemble learning, stability
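
As a rough illustration of the ensemble idea summarized in the abstract, the following minimal sketch runs a simple base selector on many random half-samples of a simulated linear regression data set and keeps the variables that are selected most often. It is not one of the methods surveyed in the paper. The base selector (top-k absolute marginal correlation), the half-sampling scheme, the 0.6 frequency threshold, and the names `base_select`, `selection_frequencies` and `make_data` are all assumptions made for illustration.

```python
import math
import random


def pearson(x, y):
    """Sample Pearson correlation between two equal-length lists."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def base_select(X, y, k):
    """Hypothetical base selector: the k columns with largest |corr| with y."""
    p = len(X[0])
    scores = [abs(pearson([row[j] for row in X], y)) for j in range(p)]
    ranked = sorted(range(p), key=lambda j: scores[j], reverse=True)
    return set(ranked[:k])


def selection_frequencies(X, y, k, n_members=100, seed=0):
    """Apply the base selector to random half-samples; return per-variable selection rates."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    counts = [0] * p
    for _ in range(n_members):
        idx = rng.sample(range(n), n // 2)  # half-sample without replacement
        Xs = [X[i] for i in idx]
        ys = [y[i] for i in idx]
        for j in base_select(Xs, ys, k):
            counts[j] += 1
    return [c / n_members for c in counts]


def make_data(n=200, p=10, seed=1):
    """Simulated linear model in which only variables 0 and 1 are relevant."""
    rng = random.Random(seed)
    X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
    y = [3.0 * row[0] - 2.0 * row[1] + rng.gauss(0, 1) for row in X]
    return X, y


X, y = make_data()
freq = selection_frequencies(X, y, k=3)
# Assumed decision rule: keep variables chosen in at least 60% of ensemble members.
selected = [j for j, f in enumerate(freq) if f >= 0.6]
print("selection frequencies:", freq)
print("selected variables:", selected)
```

Each ensemble member is forced to pick three variables, so every member includes at least one noise variable. Averaging over members is intended to spread those false selections across the noise variables, while the two relevant variables should be chosen almost every time; this is the stabilizing effect the abstract attributes to ensembles.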
