Association Journal of CSIAM
Supervised by Ministry of Education of PRC
Sponsored by Xi'an Jiaotong University
ISSN 1005-3085  CN 61-1269/O1

Chinese Journal of Engineering Mathematics


Distributed Variable Selection: MCP Regularization

WANG Ge-hua, WANG Pu-yu, ZHANG Hai

  1. School of Mathematics, Northwest University, Xi'an 710069
  • Received: 2019-02-27  Accepted: 2019-06-13  Online: 2021-06-15  Published: 2021-08-15
  • Supported by: the National Natural Science Foundation of China (11571011).

Abstract: In the digital age, large volumes of high-dimensional data are being collected across many disciplines and fields. Faced with such huge amounts of data, a major challenge is to transform them into a form that can not only be stored and analyzed but can also inform the solution of practical problems. Distributed storage has emerged to address the storage problem: the data are partitioned across different machines according to a prescribed scheme, with no data stored more than once. This raises a further problem, namely how to design machine learning algorithms suited to distributed data storage. Alongside the rapid development of information technology, regularization methods have become an effective tool for processing and analyzing massive high-dimensional data, but they are designed only for data processed on a single machine. Given the advantages of non-convex regularization for variable selection and feature extraction, we combine distributed storage with non-convex regularization, focusing on non-convex regularization methods built on distributed computing for the storage and analysis of massive high-dimensional data. This paper studies the variable selection problem when the data are stored in distributed form. The data are stored separately on multiple computers that can communicate with each other, and we propose a distributed MCP (minimax concave penalty) method. Based on the ADMM (alternating direction method of multipliers) algorithm, the distributed MCP algorithm exchanges information between adjacent computers, performs variable selection on the full data set, and guarantees convergence. Its variable selection result is the same as that of the non-distributed method. Finally, experimental results show that the proposed method is well suited to processing data held in distributed storage.

Key words: distributed, sparse, MCP, ADMM
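The abstract describes the method only at a high level. The following Python (NumPy) sketch illustrates the two ingredients it names: the MCP penalty, P(t; λ, γ) = λ|t| − t²/(2γ) for |t| ≤ γλ and γλ²/2 otherwise, and an ADMM scheme in which each machine works only on its own rows of data. It is a hedged illustration, not the authors' algorithm. It uses a generic global-consensus ADMM in which the local results are averaged, rather than the exchange between adjacent computers described in the abstract. The 1/(2n) loss scaling, the parameter values (γ = 3, ρ = 2, λ = 0.1) and the helper names `mcp_prox` and `distributed_mcp` are hypothetical choices made for this example. ADMM is not guaranteed to converge for non-convex penalties in general, so the example keeps the least-squares curvature larger than the MCP concavity 1/γ, which gives the full objective a unique minimizer.

```python
import numpy as np


def mcp_prox(v, lam, gamma, s):
    """Firm thresholding: argmin_z 0.5*(z - v)**2 + s*MCP(z; lam, gamma), elementwise.

    Assumes gamma > s, so that each scalar subproblem is strictly convex.
    """
    a = np.abs(v)
    shrunk = np.sign(v) * (a - s * lam) / (1.0 - s / gamma)
    z = np.where(a <= s * lam, 0.0, shrunk)  # small inputs are set exactly to zero
    return np.where(a > gamma * lam, v, z)   # large inputs are left unshrunk (no bias)


def distributed_mcp(blocks, lam, gamma=3.0, rho=2.0, iters=1000):
    """Hypothetical helper, NOT the paper's algorithm: global-consensus ADMM for
        min_z  sum_k ||A_k z - b_k||^2 / (2n) + sum_j MCP(z_j; lam, gamma),
    where machine k sees only its block (A_k, b_k) and the total sample size n
    is assumed to be known to every machine.
    """
    n = sum(A.shape[0] for A, _ in blocks)
    p = blocks[0][0].shape[1]
    N = len(blocks)
    s = 1.0 / (N * rho)  # step size of the MCP proximal step in the z-update
    if gamma <= s:
        raise ValueError("need gamma > 1/(N*rho)")
    # Each machine precomputes its local system once; afterwards only
    # length-p vectors are communicated, never raw data rows.
    local = [(np.linalg.inv(A.T @ A / n + rho * np.eye(p)), A.T @ b / n)
             for A, b in blocks]
    x = np.zeros((N, p))
    u = np.zeros((N, p))
    z = np.zeros(p)
    for _ in range(iters):
        # Local step (parallel across machines): ridge-type solve centred at z - u_k.
        for k, (M, Atb) in enumerate(local):
            x[k] = M @ (Atb + rho * (z - u[k]))
        # Consensus step: average the local messages, then apply MCP thresholding.
        z = mcp_prox(np.mean(x + u, axis=0), lam, gamma, s)
        # Dual step: each machine accumulates its disagreement with the consensus.
        u += x - z
    return z


# Synthetic example: 3 nonzero coefficients out of 10, rows split over 4 machines.
rng = np.random.default_rng(0)
n, p = 400, 10
A = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[[0, 3, 7]] = [3.0, -2.0, 1.5]
b = A @ beta + 0.1 * rng.standard_normal(n)

blocks = [(A[k::4], b[k::4]) for k in range(4)]  # disjoint rows, no duplication
z_dist = distributed_mcp(blocks, lam=0.1)
z_full = distributed_mcp([(A, b)], lam=0.1)      # same solver, all data on one machine
print("selected:", np.flatnonzero(z_dist))
print("max |distributed - single-machine|:", np.max(np.abs(z_dist - z_full)))
```

At a fixed point of this iteration, the consensus vector satisfies the same stationarity condition as the single-machine MCP problem. The distributed run and the single-machine run (`z_full`) should therefore agree up to the iteration tolerance, in line with the abstract's statement that distributed and non-distributed variable selection give the same result.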
