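Official Journal of the China Society for Industrial and Applied Mathematics
Supervised by: Ministry of Education of the People's Republic of China
Sponsored by: Xi'an Jiaotong University
ISSN 1005-3085  CN 61-1269/O1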

Chinese Journal of Engineering Mathematics, 2019, Vol. 36, Issue 4: 461-477. doi: 10.3969/j.issn.1005-3085.2019.04.009


RS-BART: a Novel Technique to Boost the Prediction Ability of Bayesian Additive Regression Trees

WANG Guan-wei¹, ZHANG Chun-xia², YIN Qing-yan³

  1. School of Mechatronic Engineering, Xi'an Technological University, Xi'an 710021
  2. School of Mathematics and Statistics, Xi'an Jiaotong University, Xi'an 710049
  3. School of Science, Xi'an University of Architecture and Technology, Xi'an 710055
  • Received: 2017-08-01  Accepted: 2019-04-15  Online: 2019-08-15  Published: 2019-10-15
  • Supported by:
    The National Natural Science Foundation of China (11671317; 11601412); the Key Science and Technology Program of Shaanxi Province (2016GY-067); the Key Laboratory Program of Science and Technology Co-ordination and Innovation Project of Shaanxi Province (2014SZS20-K04).

Abstract: In supervised learning tasks, the central goal of any algorithm is to make accurate predictions on future data. As a Bayesian version of the gradient boosting algorithm, Bayesian additive regression trees (BART) have great potential to achieve high prediction accuracy. As far as we know, however, BART has received far less attention than random forests and boosting. To broaden its use, a comprehensive overview of BART is presented first. Because BART may suffer from over-fitting in high-dimensional situations, a novel technique called RS-BART is then developed to enhance its prediction performance. RS-BART first ranks all predictor variables by their relative importance and, guided by these importance measures, trains several low- or medium-dimensional BART models on the important variables. The predictions of these BART models are then combined by averaging or voting to produce the final result. Experiments on simulated and real data demonstrate that RS-BART performs better than, or competitively with, several state-of-the-art techniques, including random forests, boosting and BART. RS-BART can therefore be regarded as an effective tool for real prediction tasks, especially high-dimensional but sparse ones.

Key words: ensemble learning, Bayesian additive regression tree, prediction accuracy, random forest, Gibbs sampling
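The abstract above describes RS-BART only at a high level: rank the predictors by importance, fit several low- or medium-dimensional models on important variables, and average (or vote on) their predictions. The Python sketch below illustrates that rank-then-ensemble structure for regression under loudly stated assumptions; it is not the authors' algorithm. The importance measure (absolute Pearson correlation), the importance-proportional feature sampling, the function and parameter names (`rs_ensemble_fit`, `n_models`, `subset_size`, `top_k`) are all hypothetical. Most importantly, a single regression stump stands in for each fitted BART model, since a real BART fit needs an MCMC (Bayesian backfitting/Gibbs) sampler that does not fit in a short example.

```python
import random


def abs_corr(a, b):
    """Absolute Pearson correlation; returns 0.0 if either input is constant."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (z - mb) for x, z in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((z - mb) ** 2 for z in b)
    if va == 0 or vb == 0:
        return 0.0
    return abs(cov) / (va * vb) ** 0.5


def importance_scores(X, y):
    """Hypothetical importance measure: |corr(column j, y)| for each column j."""
    return [abs_corr([row[j] for row in X], y) for j in range(len(X[0]))]


def sample_subset(weights, d, rng):
    """Draw up to d distinct indices, each draw proportional to its weight."""
    remaining = list(range(len(weights)))
    chosen = []
    for _ in range(min(d, len(remaining))):
        w = [weights[i] + 1e-12 for i in remaining]  # keep every weight positive
        pick = rng.choices(remaining, weights=w, k=1)[0]
        remaining.remove(pick)
        chosen.append(pick)
    return chosen


def fit_stump(X, y, features):
    """Stand-in base learner (NOT BART): best single-split regression stump."""
    best = None  # (sse, feature, threshold, left_mean, right_mean)
    for j in features:
        values = sorted(set(row[j] for row in X))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [yi for row, yi in zip(X, y) if row[j] <= t]
            right = [yi for row, yi in zip(X, y) if row[j] > t]
            if not left or not right:
                continue
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = sum((v - lm) ** 2 for v in left) + sum((v - rm) ** 2 for v in right)
            if best is None or sse < best[0]:
                best = (sse, j, t, lm, rm)
    if best is None:  # no usable split: predict the overall mean
        mean_y = sum(y) / len(y)
        return lambda row: mean_y
    _, feat, thr, left_mean, right_mean = best
    return lambda row: left_mean if row[feat] <= thr else right_mean


def rs_ensemble_fit(X, y, n_models=20, subset_size=2, top_k=None, fit=fit_stump, seed=0):
    """Rank features, then fit n_models base learners on importance-weighted subsets."""
    rng = random.Random(seed)
    scores = importance_scores(X, y)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    pool = ranked[:top_k] if top_k else ranked  # optionally keep only the top_k features
    pool_weights = [scores[j] for j in pool]
    models = []
    for _ in range(n_models):
        feats = [pool[i] for i in sample_subset(pool_weights, subset_size, rng)]
        models.append(fit(X, y, feats))
    return models


def rs_ensemble_predict(models, row):
    """Regression: average the base learners' predictions."""
    return sum(m(row) for m in models) / len(models)
```

For classification, the averaging in `rs_ensemble_predict` would be replaced by a majority vote over the models' predicted labels. Because `fit` is a parameter, the same wrapper can drive any base learner that returns a prediction function, which is where a real BART fit would be plugged in.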
