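Journal of the China Society for Industrial and Applied Mathematics
Supervised by: Ministry of Education of the People's Republic of China
Sponsored by: Xi'an Jiaotong University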
ISSN 1005-3085  CN 61-1269/O1

Chinese Journal of Engineering Mathematics, 2021, Vol. 38, Issue 5: 601-609. doi: 10.3969/j.issn.1005-3085.2021.05.001


Gene Data Mining and Bioinformatics Analysis of Tuberculosis Based on Three Bayesian Methods

ZHANG Xu, WU Peiwang, QIAO Feng

  1. School of Mathematics and Statistics, Southwest University, Chongqing 400715
  • Online: 2021-10-15 Published: 2021-12-15
  • Contact: WU Peiwang, E-mail: 664249360@qq.com
  • Supported by:
    The National Natural Science Foundation of China (11701471); the Basic Science and Frontier Technology Research Project of Chongqing (cstc2017jcyjAX0476).

Abstract:

Identifying host genes associated with susceptibility to tuberculosis plays a key role in the treatment, prevention and control of the disease, yet so far only a few genes have been confirmed to be related to it. Using the gene chip dataset GSE54992, which profiles peripheral blood mononuclear cells from tuberculosis patients, this paper first applies two methods built on the Bayesian framework, the information prior Bayesian test and the linear model with empirical Bayes method, to select genes that are differentially expressed between normal samples and samples from patients with active tuberculosis. The two methods identify 319 differentially expressed genes in common. These genes are then used to build a model on the independent validation set GSE83456, and validation with a naive Bayes classifier yields a high classification accuracy. Finally, the molecular mechanism of tuberculosis pathogenesis is analyzed from a biological point of view through GO functional enrichment analysis and KEGG pathway analysis. This study highlights the important role that the combined use of three Bayesian methods plays in gene data analysis, proposes a new comprehensive strategy for discovering tuberculosis-specific biomarkers, and provides important clues for the prevention, diagnosis and treatment of tuberculosis.

Key words: tuberculosis, information prior Bayesian test, linear model and empirical Bayes, naive Bayes classifier
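
The abstract describes two steps that are easy to illustrate in code: keeping only the genes flagged by both screening methods, and then checking how well those genes separate the classes with a naive Bayes classifier. The following is a minimal sketch of that idea, not the authors' implementation. The Gaussian likelihood, the gene names, the toy expression values, and every function and class name are assumptions made for illustration, and none of the data comes from GSE54992 or GSE83456.

```python
import math
from collections import defaultdict


def intersect_gene_lists(genes_a, genes_b):
    """Return genes flagged by both methods, preserving the order of genes_a."""
    in_b = set(genes_b)
    return [g for g in genes_a if g in in_b]


class GaussianNaiveBayes:
    """Minimal Gaussian naive Bayes: features are treated as independent given the class."""

    def fit(self, X, y):
        by_class = defaultdict(list)
        for row, label in zip(X, y):
            by_class[label].append(row)
        n_total = len(X)
        self.params = {}
        for label, rows in by_class.items():
            n = len(rows)
            columns = list(zip(*rows))
            means = [sum(col) / n for col in columns]
            # A small variance floor avoids division by zero for constant features.
            variances = [
                sum((v - m) ** 2 for v in col) / n + 1e-9
                for col, m in zip(columns, means)
            ]
            self.params[label] = (math.log(n / n_total), means, variances)
        return self

    def predict_one(self, row):
        best_label, best_score = None, -math.inf
        for label, (log_prior, means, variances) in self.params.items():
            # Log posterior up to a constant: log prior + sum of per-feature log densities.
            score = log_prior
            for x, m, v in zip(row, means, variances):
                score += -0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

    def predict(self, X):
        return [self.predict_one(row) for row in X]


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical gene lists standing in for the output of the two screening methods.
method_a_hits = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]
method_b_hits = ["GENE_B", "GENE_D", "GENE_E"]
shared_genes = intersect_gene_lists(method_a_hits, method_b_hits)

# Made-up expression values: rows are samples, columns follow shared_genes.
X_train = [[1.0, 5.0], [1.2, 5.5], [0.8, 4.8], [3.0, 2.0], [3.2, 1.8], [2.9, 2.3]]
y_train = ["control", "control", "control", "TB", "TB", "TB"]
X_test = [[1.1, 5.1], [3.1, 2.1]]
y_test = ["control", "TB"]

model = GaussianNaiveBayes().fit(X_train, y_train)
predictions = model.predict(X_test)
test_accuracy = accuracy(y_test, predictions)
```

In a real analysis the training and test matrices would come from separate datasets, as the abstract describes, so that the reported accuracy reflects performance on samples the gene selection never saw.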

CLC Number: