doi:10.3969/j.issn.1005-3085.2021.03.001

Chinese Journal of Engineering Mathematics

Distributed Variable Selection---MCP Regularization

WANG Ge-hua, WANG Pu-yu, ZHANG Hai

School of Mathematics, Northwest University, Xi'an 710069

Received: 2019-02-27; Accepted: 2019-06-13; Online: 2021-06-15; Published: 2021-08-15
Supported by: the National Natural Science Foundation of China (11571011).

Abstract

With the arrival of the digital age, large volumes of high-dimensional data are being collected across many disciplines and fields. Turning such massive data into a form that can be stored, analyzed, and used to inform the solution of practical problems has become a major challenge. Distributed storage has emerged in response: the data are partitioned across different machines in a prescribed way, with no record stored twice, which solves the storage problem. The next problem is how to design machine learning algorithms suited to distributed storage. With the rapid development of information technology, regularization methods have become an effective tool for processing and analyzing massive high-dimensional data, but they are designed for data held on a single machine. Because non-convex regularization performs well in variable selection and feature extraction, we combine distributed storage with non-convex regularization, focusing on distributed non-convex regularization methods for storing and analyzing massive high-dimensional data. Specifically, this paper studies variable selection when the data are stored in distributed form: the data are split across multiple computers that can communicate with each other, and we propose a distributed MCP (minimax concave penalty) method. Built on ADMM (the alternating direction method of multipliers), the distributed MCP algorithm exchanges information between adjacent computers, performs variable selection on the full data, and is guaranteed to converge. Its variable selection result is the same as that of the non-distributed method. Finally, experimental results show that the proposed method is suitable for processing data held in distributed storage.
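As a rough illustration of how ADMM lets separately stored data blocks cooperate on one MCP-penalized fit, here is a minimal Python/NumPy sketch. It assumes a least-squares loss scaled by 1/(2n), the standard MCP penalty with concavity parameter γ, and the textbook global-consensus form of ADMM, in which a coordinator averages one message per machine. That communication pattern is swapped in for simplicity; it is not the neighbor-to-neighbor exchange the authors describe, and the names `mcp_prox` and `distributed_mcp` and all parameter values are hypothetical. At a fixed point the local copies agree and satisfy the stationarity condition of the pooled problem, which is why the selected variables can match a single-machine fit.

```python
import numpy as np


def mcp_prox(v, lam, gamma, tau):
    """Elementwise proximal map of tau * MCP(.; lam, gamma) ("firm thresholding").

    Solves argmin_z 0.5*(z - v)**2 + tau*P(z), where
    P(t) = lam*|t| - t**2/(2*gamma) if |t| <= gamma*lam, else gamma*lam**2/2.
    Requires gamma > tau so that this scalar subproblem is strongly convex.
    """
    assert gamma > tau, "MCP prox needs gamma > tau"
    a = np.abs(v)
    shrunk = np.sign(v) * (a - tau * lam) / (1.0 - tau / gamma)
    return np.where(a <= tau * lam, 0.0, np.where(a <= gamma * lam, shrunk, v))


def distributed_mcp(X_parts, y_parts, lam, gamma=3.0, rho=1.0, n_iter=1000):
    """Hypothetical sketch: global-consensus ADMM (not the authors' neighbor scheme) for
        min_beta  sum_k ||y_k - X_k beta||^2 / (2n)  +  sum_j MCP(beta_j; lam, gamma),
    where machine k only ever touches its own block (X_k, y_k).
    """
    K = len(X_parts)
    n = sum(Xk.shape[0] for Xk in X_parts)  # total sample size across machines
    p = X_parts[0].shape[1]
    # Each machine prepares its local system (X_k^T X_k / n + rho I) once.
    local_inv = [np.linalg.inv(Xk.T @ Xk / n + rho * np.eye(p)) for Xk in X_parts]
    local_rhs = [Xk.T @ yk / n for Xk, yk in zip(X_parts, y_parts)]
    x = np.zeros((K, p))  # local copies of beta, one row per machine
    u = np.zeros((K, p))  # scaled dual variables, one row per machine
    z = np.zeros(p)       # consensus estimate shared by all machines
    for _ in range(n_iter):
        # Local step: a ridge-like solve that each machine runs independently.
        for k in range(K):
            x[k] = local_inv[k] @ (local_rhs[k] + rho * (z - u[k]))
        # Consensus step: average the K messages, then apply the MCP prox
        # with step tau = 1/(K*rho).
        z = mcp_prox((x + u).mean(axis=0), lam, gamma, tau=1.0 / (K * rho))
        # Dual step: each machine updates its own multiplier.
        u += x - z
    return z


# Usage on synthetic data (all values illustrative).
rng = np.random.default_rng(0)
n, p, K = 400, 10, 4
beta_true = np.zeros(p)
beta_true[[0, 1, 5]] = [3.0, -2.0, 1.5]
X = rng.standard_normal((n, p))
y = X @ beta_true + 0.5 * rng.standard_normal(n)
# Split the rows across K machines with no overlap.
X_parts, y_parts = np.array_split(X, K), np.array_split(y, K)
beta_hat = distributed_mcp(X_parts, y_parts, lam=0.3)
print(np.flatnonzero(beta_hat))  # indices of the selected variables
```

The `gamma > tau` check reflects a design constraint of this sketch: the consensus step is a well-posed subproblem only when γ > 1/(Kρ), so a larger ρ or more machines make the condition easier to meet.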

Key words: distributed, sparse, MCP, ADMM

CLC Number: O213; O236.2

WANG Ge-hua, WANG Pu-yu, ZHANG Hai. Distributed Variable Selection---MCP Regularization [J]. Chinese Journal of Engineering Mathematics, doi: 10.3969/j.issn.1005-3085.2021.03.001.

