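Official journal of the China Society for Industrial and Applied Mathematics
Supervised by: Ministry of Education of the People's Republic of China
Sponsored by: Xi'an Jiaotong University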
ISSN 1005-3085  CN 61-1269/O1

Chinese Journal of Engineering Mathematics, 2015, Vol. 32, Issue 5: 677-689. doi: 10.3969/j.issn.1005-3085.2015.05.006

Boosting Variable Selection Algorithm for Linear Regression Models

LI Yu¹, ZHANG Chun-xia², WANG Guan-wei³

  1. School of Economics and Management, Xinyang Normal University, Xinyang 464000
  2. School of Mathematics and Statistics, Xi'an Jiaotong University, Xi'an 710049
  3. School of Mechatronic Engineering, Xi'an Technological University, Xi'an 710021
  • Received: 2014-02-26  Accepted: 2014-09-01  Online: 2015-10-15  Published: 2015-12-15
  • Supported by:
    The National Natural Science Foundation of China (11201367; 91230101); the National Basic Research Program 973 (2013CB329406); the Social Sciences Planning Project of Henan Province (2014BJJ069); the Key Project of Henan Education Committee (14B910001).

Abstract:

This paper proposes a novel Boosting method, based on a genetic algorithm, for variable selection in linear regression models. All training examples are first assigned equal weights, and a traditional genetic algorithm serves as the base learning algorithm of Boosting. The training set, together with its weight distribution, is then passed to the genetic algorithm to perform variable selection. Next, the weight distribution is updated according to the quality of the previous variable selection result. These steps are repeated several times, and the results are fused via a weighted combination rule. The performance of the proposed method is evaluated on simulated and real-world data. The experimental results show that it significantly improves the variable selection performance of the traditional genetic algorithm and accurately identifies the covariates related to the response variable. The proposed Boosting method thus offers an effective technique for variable selection in linear regression models.
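
The loop described in the abstract can be illustrated with a minimal Python sketch. The abstract does not state the fitness function, the weight update rule, or the combination rule, so everything specific here is an assumption made for illustration: a BIC-style fitness on a weighted least-squares fit, an AdaBoost.R2-style weight update, a weighted vote thresholded at 0.5, and the hypothetical function names `weighted_bic`, `ga_select`, and `boosted_ga`. It is not the authors' implementation.

```python
import numpy as np

def weighted_bic(X, y, w, mask):
    """BIC-style score of a weighted least-squares fit (assumed fitness; lower is better)."""
    n = len(y)
    Xs = np.column_stack([np.ones(n), X[:, mask]])        # intercept + selected columns
    sw = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(Xs * sw[:, None], y * sw, rcond=None)
    resid = y - Xs @ coef
    wrss = float(np.sum(w * resid ** 2))                  # weighted residual sum of squares
    return n * np.log(wrss + 1e-12) + np.log(n) * Xs.shape[1], resid

def ga_select(X, y, w, rng, pop_size=20, generations=15, p_mut=0.1):
    """Tiny genetic algorithm over boolean inclusion masks (assumed operators)."""
    p = X.shape[1]
    pop = rng.random((pop_size, p)) < 0.5
    for _ in range(generations):
        scores = np.array([weighted_bic(X, y, w, m)[0] for m in pop])
        parents = pop[np.argsort(scores)[: pop_size // 2]]    # truncation selection
        mothers = parents[rng.integers(len(parents), size=pop_size)]
        fathers = parents[rng.integers(len(parents), size=pop_size)]
        cross = rng.random((pop_size, p)) < 0.5               # uniform crossover
        children = np.where(cross, mothers, fathers)
        children ^= rng.random((pop_size, p)) < p_mut         # bit-flip mutation
        children[0] = parents[0]                              # elitism: keep the best mask
        pop = children
    scores = np.array([weighted_bic(X, y, w, m)[0] for m in pop])
    return pop[np.argmin(scores)]

def boosted_ga(X, y, rounds=5, seed=0):
    """Boosting wrapper: reweight examples, rerun the GA, fuse masks by weighted vote."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    w = np.full(n, 1.0 / n)                                   # equal initial weights
    votes, alphas = [], []
    for _ in range(rounds):
        mask = ga_select(X, y, w, rng)
        _, resid = weighted_bic(X, y, w, mask)
        loss = np.abs(resid) / (np.abs(resid).max() + 1e-12)  # per-example loss in [0, 1]
        avg = float(np.clip(np.sum(w * loss), 1e-6, 0.499))   # clipped so that 0 < beta < 1
        beta = avg / (1.0 - avg)
        votes.append(mask)
        alphas.append(np.log(1.0 / beta))                     # better rounds get larger votes
        w = w * beta ** (1.0 - loss)                          # well-fitted examples lose weight
        w /= w.sum()
    alphas = np.array(alphas)
    importance = alphas @ np.array(votes, dtype=float) / alphas.sum()
    return importance > 0.5, importance                       # assumed 0.5 vote threshold

# Simulated data: only columns 0 and 3 influence the response.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 8))
y = 3.0 * X[:, 0] - 2.0 * X[:, 3] + rng.normal(size=100)
selected, importance = boosted_ga(X, y)
print(np.flatnonzero(selected), importance.round(2))
```

In this sketch the `importance` vector is the weighted share of rounds in which each covariate was selected, which gives one plausible reading of the "weighted combination rule" mentioned above.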

Key words: Boosting algorithm, variable selection, ensemble learning, genetic algorithm, diversity
