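Official journal of the China Society for Industrial and Applied Mathematics
Supervised by: Ministry of Education of the People's Republic of China
Sponsored by: Xi'an Jiaotong University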
ISSN 1005-3085  CN 61-1269/O1

Chinese Journal of Engineering Mathematics ›› 2023, Vol. 40 ›› Issue (1): 55-68. doi: 10.3969/j.issn.1005-3085.2023.01.004

Statistical Diagnosis of Location Regression Model Based on Pena Distance under Skew Laplace Normal Data

ZHENG Guifen1,   WANG Danlu2,   WU Liucang2   

  1. College of Artificial Intelligence, Wenshan University, Wenshan, Yunnan 663000;
    2. Faculty of Science, Kunming University of Science and Technology, Kunming, Yunnan 650093
  • Online:2023-02-15 Published:2023-04-11
  • Supported by:
    The National Natural Science Foundation of China (11861041; 11261025).

Abstract:

Data that are sharply peaked, heavy-tailed, and skewed arise in medicine, sociology, biology, and other fields, and fitting such data with the skew Laplace normal distribution gives more accurate results. In statistics, outliers and strongly influential points can greatly affect the results of statistical diagnosis, so detecting them is particularly important. Common measures such as the likelihood distance and the Cook distance study how deleting one point (or a group of points) affects the regression analysis and the predicted values, whereas the Pena distance studies how deleting each point in the sample affects the regression value and predicted value at one specific point. On this basis, influence analysis for the location regression model with skew Laplace normal data is studied under the Pena distance, and the EM algorithm is used to carry out statistical diagnostics for the location regression model under the skew Laplace normal distribution. An expression for the Pena distance and a method for identifying high-leverage outliers under the location regression model with skew Laplace normal data are obtained. The Pena distance is then compared with the Cook distance and the likelihood distance, showing that in some cases the Pena distance detects outliers better than the other two. Monte Carlo simulation and a real-data example show that the proposed model and method are reasonable.
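
The contrast between the Cook distance and the Pena distance can be illustrated with a minimal sketch. It assumes the classical ordinary least squares setting with normal errors, not the skew Laplace normal model or the EM-based expressions derived in the paper. The function name `cook_and_pena` is hypothetical. With hat matrix entries h_ij, residuals e_j, and residual variance s², the change in fitted value i after deleting point j is h_ij e_j / (1 - h_jj). The Cook distance sums the squared changes over i for a fixed deleted point j, while the Pena statistic sums them over j for a fixed fitted value i.

```python
import numpy as np

def cook_and_pena(X, y):
    """Classical OLS Cook distances and Pena statistics (illustrative only).

    Assumes a full-rank design matrix X (n x p) and normal-error least
    squares; this is NOT the skew Laplace normal version from the paper.
    """
    n, p = X.shape
    H = X @ np.linalg.solve(X.T @ X, X.T)   # hat matrix
    h = np.diag(H)                          # leverages h_ii
    e = y - H @ y                           # ordinary residuals
    s2 = e @ e / (n - p)                    # residual variance estimate

    # T[i, j] = change in fitted value i when observation j is deleted
    T = H * (e / (1 - h))[None, :]

    # Cook distance of point j: effect of deleting j on all fitted values
    cook = (T**2).sum(axis=0) / (p * s2)
    # Pena statistic of point i: effect of every deletion on fitted value i
    pena = (T**2).sum(axis=1) / (p * s2 * h)
    return cook, pena

# Small synthetic example with one high-leverage outlier (made-up data)
rng = np.random.default_rng(0)
x = rng.normal(size=30)
x[0] = 6.0                                  # high-leverage design point
X_demo = np.column_stack([np.ones(30), x])
y_demo = 1.0 + 2.0 * x + rng.normal(size=30)
y_demo[0] -= 8.0                            # shift its response to make it an outlier
cook_demo, pena_demo = cook_and_pena(X_demo, y_demo)
print("largest Cook distance at index", int(np.argmax(cook_demo)))
print("largest Pena statistic at index", int(np.argmax(pena_demo)))
```

Both statistics are built from the same matrix of leave-one-out changes; they differ only in whether the squared changes are summed down a column (Cook) or across a row (Pena), which is why the two can flag different points.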

Key words: Pena distance, skew Laplace normal distribution, location regression model, EM algorithm

CLC Number: