Paper Title
Gestalt: a Stacking Ensemble for SQuAD2.0
Authors
Abstract
We propose a deep-learning system for the SQuAD2.0 task that finds a correct answer to a question in a context paragraph, or indicates that no such answer exists. Our goal is to learn an ensemble of heterogeneous SQuAD2.0 models that, when blended properly, outperforms the best individual model in the ensemble. We created a stacking ensemble that combines the top-N predictions from two models, one based on ALBERT and one based on RoBERTa, and frames the combination as a multiclass classification task: pick the best answer out of their pooled predictions. We explored various ensemble configurations, input representations, and model architectures. For evaluation, we examined test-set EM and F1 scores. Our best-performing ensemble incorporated a CNN-based meta-model and scored 87.117 EM and 90.306 F1, a relative improvement of 0.55% in EM and 0.61% in F1 over the best single model in the ensemble, an ALBERT-based model that scored 86.644 EM and 89.760 F1.
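The stacking setup described above can be illustrated with a minimal Python sketch. It assumes each base model exposes its top-N answer candidates as (text, score) pairs, pools the two lists into 2N class slots, derives a training label as the slot that matches the gold answer, and lets a meta-model pick one slot. All names here (`Candidate`, `pool_candidates`, `make_label`, `pick_answer`, `relative_improvement`) are hypothetical, the example candidates are invented, and the CNN meta-model is replaced by a placeholder list of per-slot scores. The last lines recompute the reported relative improvements from the EM and F1 scores.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    """One answer candidate proposed by a base model (hypothetical structure)."""
    text: str     # predicted answer span; "" stands for "no answer" in SQuAD2.0
    score: float  # the base model's confidence for this span
    source: str   # which base model proposed it, e.g. "albert" or "roberta"


def pool_candidates(albert_top: List[Candidate],
                    roberta_top: List[Candidate],
                    n: int) -> List[Candidate]:
    """Concatenate each base model's top-n candidates into 2n class slots."""
    return albert_top[:n] + roberta_top[:n]


def make_label(pooled: List[Candidate], gold: str) -> Optional[int]:
    """Training target: index of the first slot whose text equals the gold answer.

    Uses exact string match for simplicity (SQuAD's official metric normalizes
    text first). Returns None when no slot matches; how such examples are
    handled (dropped, or mapped to a fallback class) is an assumption left open.
    """
    for i, cand in enumerate(pooled):
        if cand.text == gold:
            return i
    return None


def pick_answer(pooled: List[Candidate], class_scores: List[float]) -> str:
    """Return the text of the slot with the highest meta-model score.

    `class_scores` is a placeholder for the meta-model's output (one score per
    slot); the CNN meta-model itself is not reproduced here.
    """
    best = max(range(len(pooled)), key=lambda i: class_scores[i])
    return pooled[best].text


def relative_improvement(new: float, base: float) -> float:
    """Relative gain of `new` over `base`, in percent."""
    return 100.0 * (new - base) / base


if __name__ == "__main__":
    albert = [Candidate("in 1879", 0.71, "albert"), Candidate("", 0.20, "albert")]
    roberta = [Candidate("1879", 0.64, "roberta"), Candidate("in 1879", 0.30, "roberta")]
    pooled = pool_candidates(albert, roberta, n=2)
    print(make_label(pooled, gold="1879"))              # 2
    print(pick_answer(pooled, [0.1, 0.05, 0.8, 0.05]))  # 1879
    # Relative improvements implied by the reported test-set scores.
    print(round(relative_improvement(87.117, 86.644), 2))  # 0.55 (EM)
    print(round(relative_improvement(90.306, 89.760), 2))  # 0.61 (F1)
```

Fixing the number of slots at 2N turns answer selection into an ordinary fixed-size multiclass problem, which is one plausible reason a meta-model such as a CNN can consume the pooled predictions as a fixed-shape input.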