Association Journal of CSIAM
Supervised by Ministry of Education of PRC
Sponsored by Xi'an Jiaotong University
ISSN 1005-3085  CN 61-1269/O1

Chinese Journal of Engineering Mathematics, 2021, Vol. 38, Issue (6): 750-762. doi: 10.3969/j.issn.1005-3085.2021.06.001


Semi-automatic Text Category Labelling Method Based on Machine Learning

GONG Yansheng1,   CAI Keping2,   WANG Zhiqiang3,   LI Xinxin4,   JING Wenfeng4   

    1. China Railway First Survey and Design Institute Group Co., Ltd, Xi'an 710043
    2. Xi'an Technological University, Xi'an 710021
    3. State Grid Zhejiang Electric Power Corporation Information & Telecommunication Branch, Hangzhou 310007
    4. School of Mathematics and Statistics, Xi'an Jiaotong University, Xi'an 710049
  • Online: 2021-12-15 Published: 2022-02-15
  • Supported by:
    China Railway Construction Corporation 2018 Major Science and Technology Special Project (18-A02); the Science and Technology Planning Project of Xi'an City (20180916CX5JC6).

Abstract:

In text classification, manual labelling is slow and requires professionals familiar with the research field. To improve the efficiency of text data labelling, a semi-automatic method for labelling paper categories is proposed. First, vector representations of paper abstracts are derived by combining word2vec and TF-IDF. Then the K-means algorithm is used to cluster the texts, and K binary classification models are constructed using $L_1$-regularized logistic regression ($L_1$-LR). For each binary classification model, the words whose coefficients have large absolute values are selected as subject words. Finally, the label of each category is determined from its subject words. The proposed semi-automatic method greatly improves the efficiency of text labelling.
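
Two steps of the pipeline described in the abstract can be illustrated with a minimal sketch. The abstract does not say how word2vec and TF-IDF are combined, so the TF-IDF-weighted average of word vectors used here is an assumption, as is the smoothed IDF formula. The function names (`tfidf_weights`, `doc_vector`, `top_subject_words`), the toy two-dimensional word vectors, and the example coefficients are all hypothetical stand-ins for a trained word2vec model and a fitted $L_1$-LR classifier. The clustering and model-fitting steps are not shown.

```python
import math
from collections import Counter


def tfidf_weights(docs):
    """Return one {word: tf-idf weight} dict per tokenised document.

    Assumption: smoothed IDF, log((1 + N) / (1 + df)) + 1, so every
    word that occurs in a document gets a strictly positive weight.
    """
    n_docs = len(docs)
    doc_freq = Counter(word for doc in docs for word in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            word: (count / len(doc))
            * (math.log((1 + n_docs) / (1 + doc_freq[word])) + 1.0)
            for word, count in counts.items()
        })
    return weights


def doc_vector(doc_weights, word_vectors, dim):
    """TF-IDF-weighted average of word vectors for one document.

    Assumption: this weighted average is one common way to combine
    word embeddings with TF-IDF; words without a vector are skipped.
    """
    total = [0.0] * dim
    weight_sum = 0.0
    for word, weight in doc_weights.items():
        vec = word_vectors.get(word)
        if vec is None:
            continue
        for i, x in enumerate(vec):
            total[i] += weight * x
        weight_sum += weight
    if weight_sum == 0.0:
        return total  # no known words: return the zero vector
    return [x / weight_sum for x in total]


def top_subject_words(coefficients, n):
    """Pick up to n words with the largest absolute coefficients.

    Words whose coefficient is exactly zero (as an L1 penalty tends
    to produce for uninformative words) are never selected.
    """
    ranked = sorted(coefficients.items(),
                    key=lambda item: abs(item[1]), reverse=True)
    return [word for word, coef in ranked[:n] if coef != 0.0]


# Toy data, purely illustrative (not taken from the paper).
docs = [["bridge", "tunnel", "bridge"], ["neural", "network", "tunnel"]]
toy_vectors = {"bridge": [1.0, 0.0], "tunnel": [1.0, 0.0],
               "neural": [0.0, 1.0], "network": [0.0, 1.0]}
vectors = [doc_vector(w, toy_vectors, 2) for w in tfidf_weights(docs)]

# Hypothetical coefficients of one binary classifier.
subject_words = top_subject_words(
    {"bridge": 2.1, "tunnel": -1.7, "the": 0.0, "span": 0.4}, 2)
```

In a full implementation, the document vectors would be passed to K-means, and the coefficient dictionaries would come from the K fitted binary classifiers, one dictionary per classifier.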

Key words: semi-automatic category labelling, machine learning, text clustering, $L_1$-LR binary classification model

CLC Number: