Gestalt: a Stacking Ensemble for SQuAD2.0

Paper title

Gestalt: a Stacking Ensemble for SQuAD2.0

Author

El-Geish, Mohamed

Abstract

We propose a deep-learning system -- for the SQuAD2.0 task -- that finds, or indicates the lack of, a correct answer to a question in a context paragraph. Our goal is to learn an ensemble of heterogeneous SQuAD2.0 models that, when blended properly, outperforms the best model in the ensemble per se. We created a stacking ensemble that combines top-N predictions from two models, based on ALBERT and RoBERTa, into a multiclass classification task to pick the best answer out of their predictions. We explored various ensemble configurations, input representations, and model architectures. For evaluation, we examined test-set EM and F1 scores; our best-performing ensemble incorporated a CNN-based meta-model and scored 87.117 and 90.306, respectively -- a relative improvement of 0.55% for EM and 0.61% for F1 scores, compared to the baseline performance of the best model in the ensemble, an ALBERT-based model, at 86.644 for EM and 89.760 for F1.
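The relative improvements quoted in the abstract can be checked from the four reported scores. The following is a minimal sketch of that arithmetic; the helper name `relative_improvement` is an illustrative assumption and does not come from the paper.

```python
def relative_improvement(new: float, baseline: float) -> float:
    """Return the relative improvement of `new` over `baseline`, in percent."""
    return (new - baseline) / baseline * 100.0


# Scores as reported in the abstract (ensemble vs. the ALBERT-based baseline).
em_gain = relative_improvement(87.117, 86.644)
f1_gain = relative_improvement(90.306, 89.760)

print(f"EM: {em_gain:.2f}%  F1: {f1_gain:.2f}%")
```

Rounded to two decimals, both values agree with the 0.55% (EM) and 0.61% (F1) figures stated above.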

Paper title