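Official journal of the China Society for Industrial and Applied Mathematics
Supervised by: Ministry of Education of the People's Republic of China
Sponsored by: Xi'an Jiaotong University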
ISSN 1005-3085  CN 61-1269/O1

Chinese Journal of Engineering Mathematics ›› 2021, Vol. 38 ›› Issue (5): 627-636. doi: 10.3969/j.issn.1005-3085.2021.05.003


Microblog Emotion Classification Based on Target Domain Adaptive SVM Classifier

HAO Xiaohong, XUE Baoju

  1. Shanxi Conservancy Technical Institute, Taiyuan 030032
  • Published: 2021-10-15 Online: 2021-12-15

Abstract:

Cross-domain emotion classification remains difficult on microblog data, and many feature-based and instance-based methods suffer from low classification accuracy and slow computation. To address this, we propose a model-based adaptive support vector machine (SVM) method for microblog emotion classification, abbreviated MASVM, which allows an existing model to be applied directly to data from a new target domain. First, building on the "many-to-one" SVM adaptation model, the decision function of a classifier trained on the source-domain data set is modified to create a classifier adapted to the target domain. Then, the framework is extended to adapt multiple classifiers: the weight of each base classifier is learned from its performance on a small set of labeled target-domain samples. Finally, the base classifiers trained on a general corpus are exploited as fully as possible, and domain adaptation is used to minimize the emotion classification error. Experimental results show that the proposed method outperforms the baseline method and other similar methods in both accuracy and computational efficiency. The proportion of labeled target-domain data also affects classification performance. Because the proposed method outperforms the out-of-domain base model and the domain adaptation methods, performs close to the in-domain upper-bound model, and is faster to compute, it can be used as a fast method for microblog emotion classification.

Key words: microblog data, adaptive, emotion classification, target domain, classifier, decision function
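
The adaptation steps summarized in the abstract can be illustrated with a minimal sketch. The abstract does not state the formulas, so everything here is an assumption made for illustration: a binary task with labels +1/-1, the generic adaptive-SVM form f(x) = Σ t_k f_k(x) + w·x + b, base-classifier weights t_k proportional to accuracy on the labeled target set, and a linear correction term trained by subgradient descent on the hinge loss. The names `base_weights`, `fit_adapted`, `src_a`, and `src_b` are hypothetical and are not taken from the paper.

```python
from typing import Callable, List, Sequence

Vector = Sequence[float]
Scorer = Callable[[Vector], float]


def accuracy(score: Scorer, X: List[Vector], y: List[int]) -> float:
    """Fraction of examples whose predicted sign matches the +1/-1 label."""
    hits = sum(1 for xi, yi in zip(X, y) if (1 if score(xi) >= 0 else -1) == yi)
    return hits / len(y)


def base_weights(bases: List[Scorer], X: List[Vector], y: List[int]) -> List[float]:
    """Assumed weighting rule: weight each source-domain base classifier by its
    accuracy on the small labeled target set, normalized to sum to 1."""
    accs = [accuracy(f, X, y) for f in bases]
    total = sum(accs)
    if total == 0:
        return [1.0 / len(bases)] * len(bases)
    return [a / total for a in accs]


def fit_adapted(bases: List[Scorer], X: List[Vector], y: List[int],
                lam: float = 0.01, lr: float = 0.1, epochs: int = 200) -> Scorer:
    """Return f(x) = sum_k t_k * f_k(x) + w.x + b, where only w and b are
    trained (hinge loss plus L2 penalty on w) on the labeled target samples."""
    t = base_weights(bases, X, y)

    def prior(x: Vector) -> float:
        # Weighted vote of the fixed source-domain classifiers.
        return sum(tk * f(x) for tk, f in zip(t, bases))

    dim, n = len(X[0]), len(X)
    w, b = [0.0] * dim, 0.0
    priors = [prior(xi) for xi in X]  # base scores never change during training
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]
        grad_b = 0.0
        for xi, yi, pi in zip(X, y, priors):
            margin = yi * (pi + sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # hinge loss is active for this sample
                for j in range(dim):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b

    def adapted(x: Vector) -> float:
        return prior(x) + sum(wj * xj for wj, xj in zip(w, x)) + b

    return adapted


# Toy usage: the hypothetical source classifiers look only at feature 0,
# while the target-domain label actually depends on feature 1.
def src_a(x: Vector) -> float:
    return x[0]


def src_b(x: Vector) -> float:
    return -x[0]


X_target = [(1.0, 2.0), (-1.0, 2.0), (1.0, -2.0),
            (-1.0, -2.0), (0.5, 1.5), (-0.5, -1.5)]
y_target = [1, 1, -1, -1, 1, -1]

adapted = fit_adapted([src_a, src_b], X_target, y_target)
print("base accuracy:", accuracy(src_a, X_target, y_target))
print("adapted accuracy:", accuracy(adapted, X_target, y_target))
```

Because the base classifiers stay fixed and only a small correction term is fitted, training touches just the labeled target samples. This is one plausible reading of the speed advantage claimed in the abstract, not a confirmed description of the MASVM implementation.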

CLC Number: