Variable Selection Ensemble Methods

doi:10.3969/j.issn.1005-3085.2019.01.001

Chinese Journal of Engineering Mathematics, 2019, Vol. 36, Issue (1): 1-17.


ZHANG Chun-xia, LI Jun-li

School of Mathematics and Statistics, Xi'an Jiaotong University, Xi'an 710049

Received: 2017-03-06 Accepted: 2018-09-14 Online: 2019-02-15 Published: 2019-04-15
Supported by: The National Natural Science Foundation of China (11671317; 61572393).

Abstract: With massive high-dimensional data emerging in many research and application fields, it is crucial to exploit the sparsity of such data to extract valuable information. As an effective tool for building interpretable models and improving the accuracy of statistical inference and prediction, variable selection plays an increasingly important role in the analysis of high-dimensional data. Because ensemble learning can significantly improve selection accuracy, alleviate the instability of the variable selection process, and reduce the chance that noise variables are falsely selected, variable selection ensemble (VSE) methods have been widely studied in recent years. To provide a systematic reference for researchers in related fields, this paper presents a detailed survey of existing VSE methods and divides them into two classes according to the strategy used to build the ensemble. The main characteristics of the methods in each class are analyzed, and simulation experiments are carried out to investigate the variable selection and prediction performance of some representative VSE techniques. Finally, several directions for future research on VSE methods are discussed.

Key words: high-dimensional data analysis, variable selection, linear regression model, ensemble learning, stability

CLC number: TP181; O212.4

ZHANG Chun-xia, LI Jun-li. Variable Selection Ensemble Methods[J]. Chinese Journal of Engineering Mathematics, 2019, 36(1): 1-17.

