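Journal of the China Society for Industrial and Applied Mathematics
Supervised by: Ministry of Education of the People's Republic of China
Sponsored by: Xi'an Jiaotong University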
ISSN 1005-3085  CN 61-1269/O1

Chinese Journal of Engineering Mathematics ›› 2022, Vol. 39 ›› Issue (3): 401-412. doi: 10.3969/j.issn.1005-3085.2022.03.005


基于筛选排序算法的多均值变点估计

李 扬1, 吴密霞1, 胡 尧2, 杨 超3   

  1. 北京工业大学理学部统计与数据科学系,北京 100124
    2. 贵州大学数学与统计学院,贵阳 550025
    3. 贵阳市第二中学,贵阳 550001
  • 出版日期:2022-06-15 发布日期:2022-08-15
  • 通讯作者: 吴密霞 E-mail: wumixia@bjut.edu.cn
  • 基金资助:
    国家自然科学基金 (11661018; 11771032).

Multiple Mean Change-points Estimation Based on Screening and Ranking Algorithm

LI Yang1,   WU Mixia1,   HU Yao2,   YANG Chao3   

  1. Department of Statistics and Data Science, Faculty of Science, Beijing University of Technology, Beijing 100124
    2. School of Mathematics and Statistics, Guizhou University, Guiyang 550025 
    3. No.2 High School of Guiyang, Guiyang 550001
  • Online: 2022-06-15 Published: 2022-08-15
  • Contact: WU Mixia, E-mail: wumixia@bjut.edu.cn
  • Supported by:
    The National Natural Science Foundation of China (11661018; 11771032).

摘要:

多均值变点估计问题是目前统计界的一个热点问题。文献中已有多种算法处理该问题,其中筛选排序算法 (Screening and Ranking algorithm, SaRa) 由于具有快速检测和高精度的特点而被广泛关注。值得注意的是,该算法在筛选步骤的阈值选取倾向于保守,其主要原因是 SaRa 算法中方差参数采用了分段方法进行估计。本文的主要目的是改进多均值变点估计的 SaRa 算法。首先,运用局部多项式结合交叉验证方法给出了误差标准差的一个全局估计,并将其应用于初筛变点步骤中。然后,通过对候选变点的局部诊断函数值进行排序,进而结合 MBIC 准则得到了最终的变点估计。数值模拟结果显示了本文所提出的改进的 SaRa 算法在变点的数目及位置的估计准确率均高于现有的方法。最后,将该方法应用于深圳市车流量实际数据,通过分析该区域的工作日及非工作日变点分布特点,可为交管部门和出行人群提供参考建议。

关键词: 多均值变点, 局部多项式估计, 交叉验证, 筛选排序, MBIC 准则

Abstract:

Multiple change-point estimation is an active topic in current statistics, and many algorithms for it exist in the literature. Among them, the Screening and Ranking algorithm (SaRa) has attracted wide attention for its fast detection and high precision. However, its threshold in the screening step tends to be conservative, mainly because SaRa estimates the variance separately within each segment. The main purpose of this paper is to improve SaRa. First, a global estimate of the variance is computed by local polynomial approximation, with the bandwidth selected by cross-validation, and the initial change-points are obtained by screening with the resulting improved threshold. Then the candidate points are ranked by their local diagnostic function values, and the number of final change-points is determined by maximizing the MBIC. Numerical results show that the proposed algorithm estimates both the number and the locations of change-points more accurately than existing methods. Finally, the method is applied to real traffic flow data from Shenzhen. The distribution of change-points on working days and non-working days in this area is analyzed, which can provide guidance for traffic control departments and travelers.

Key words: multiple mean change-points, local polynomial estimation, cross-validation, screening and ranking, MBIC criterion
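
To make the screening and ranking steps described in the abstract concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes the usual SaRa local diagnostic (the mean of the `h` observations to the right of a point minus the mean of the `h` observations to its left) and keeps the local maximizers of its absolute value that exceed a threshold. The paper's global variance estimate (local polynomial fit with cross-validated bandwidth) is replaced here by a simpler difference-based MAD estimator, and the MBIC selection step is omitted. The function names, the window `h`, and the multiplier `c` are illustrative assumptions.

```python
import numpy as np


def local_diagnostic(y, h):
    """D[i] = mean(y[i:i+h]) - mean(y[i-h:i]) for h <= i <= n-h, zero elsewhere."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    csum = np.concatenate(([0.0], np.cumsum(y)))  # csum[k] = sum of y[:k]
    d = np.zeros(n)
    i = np.arange(h, n - h + 1)
    right = (csum[i + h] - csum[i]) / h
    left = (csum[i] - csum[i - h]) / h
    d[i] = right - left
    return d


def mad_sigma(y):
    """Stand-in global noise estimate (assumption): MAD of first differences.

    First differences of i.i.d. noise have standard deviation sqrt(2) * sigma,
    and MAD / 0.6745 estimates a Gaussian standard deviation.
    """
    diffs = np.diff(np.asarray(y, dtype=float))
    mad = np.median(np.abs(diffs - np.median(diffs)))
    return mad / (0.6745 * np.sqrt(2.0))


def screen(y, h, c=3.0):
    """Return candidate change-points, ranked by decreasing |D|.

    Under no change, D has standard deviation sigma * sqrt(2 / h), so the
    threshold is c times that quantity (c is a hypothetical tuning constant).
    """
    y = np.asarray(y, dtype=float)
    d = np.abs(local_diagnostic(y, h))
    lam = c * mad_sigma(y) * np.sqrt(2.0 / h)
    candidates = []
    for i in range(h, len(y) - h + 1):
        lo, hi = max(0, i - h), min(len(y), i + h + 1)
        # Keep i only if it exceeds the threshold and is an h-local maximizer.
        if d[i] > lam and d[i] == d[lo:hi].max():
            candidates.append(i)
    return sorted(candidates, key=lambda i: -d[i])
```

Calling `screen(y, h=10)` on a series with a single mean shift should return a short list whose first entry lies near the shift. In the paper's procedure this ranked list would then be truncated using the MBIC, which this sketch does not attempt.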
