Paper title
Detecting ulcerative colitis from colon samples using efficient feature selection and machine learning
Authors
Abstract
Ulcerative colitis (UC) is one of the most common forms of inflammatory bowel disease (IBD) and is characterized by inflammation of the mucosal layer of the colon. Diagnosis of UC is based on clinical symptoms and is then confirmed by endoscopic, histologic, and laboratory findings. Feature selection and machine learning have previously been used to create models that facilitate the diagnosis of certain diseases. In this work, we used a recently developed feature selection algorithm (DRPT) combined with a support vector machine (SVM) classifier to generate a model that discriminates between healthy subjects and subjects with UC based on the expression values of 32 genes in colon samples. We validated our model with an independent gene expression dataset of colonic samples from subjects in active and inactive periods of UC. Our model correctly detected all active cases and had an average precision of 0.62 on the inactive cases. Compared with results reported in previous studies and with a model generated by BioDiscML, a recently published software tool for biomarker discovery using machine learning, our final model for detecting UC shows better performance in terms of average precision.
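The abstract reports performance as average precision without defining it. As a minimal sketch, assuming the term refers to the standard ranking metric (the sum over score thresholds of the recall increase times the precision at that threshold), it could be computed as below. The function name `average_precision` and the toy labels and scores are hypothetical and are not taken from the paper's data.

```python
def average_precision(labels, scores):
    """Average precision: sum over thresholds of (recall step) * precision.

    labels: 0/1 ground truth (1 = UC, 0 = healthy; an illustrative convention).
    scores: classifier scores, where higher means more likely positive.
    """
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("need at least one positive label")

    # Rank samples from highest to lowest score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)

    ap = 0.0
    true_pos = 0
    prev_recall = 0.0
    i = 0
    while i < len(ranked):
        # Samples that share a score form a single threshold step.
        j = i
        while j < len(ranked) and ranked[j][0] == ranked[i][0]:
            true_pos += ranked[j][1]
            j += 1
        precision = true_pos / j          # j samples lie at or above this threshold
        recall = true_pos / total_pos
        ap += (recall - prev_recall) * precision
        prev_recall = recall
        i = j
    return ap


# Toy example (made-up values, not results from the study).
toy_labels = [1, 0, 1, 1, 0]
toy_scores = [0.9, 0.8, 0.7, 0.4, 0.2]
toy_ap = average_precision(toy_labels, toy_scores)
print(round(toy_ap, 4))
```

A model that ranks every positive sample above every negative one scores 1.0 under this definition, and misranked samples lower the value.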