Paper Title

Integrating Rankings into Quantized Scores in Peer Review

Paper Authors

Yusha Liu, Yichong Xu, Nihar B. Shah, Aarti Singh

Paper Abstract

In peer review, reviewers are usually asked to provide scores for the papers. The scores are then used by Area Chairs or Program Chairs in various ways in the decision-making process. The scores are usually elicited in a quantized form to accommodate the limited cognitive ability of humans to describe their opinions in numerical values. It has been found that the quantized scores suffer from a large number of ties, thereby leading to a significant loss of information. To mitigate this issue, conferences have started to ask reviewers to additionally provide a ranking of the papers they have reviewed. There are, however, two key challenges. First, there is no standard procedure for using this ranking information, and Area Chairs may use it in different ways (including simply ignoring it), thereby leading to arbitrariness in the peer-review process. Second, there are no suitable interfaces for judicious use of this data, nor methods to incorporate it in existing workflows, thereby leading to inefficiencies. We take a principled approach to integrate the ranking information into the scores. The output of our method is an updated score pertaining to each review that also incorporates the rankings. Our approach addresses the two aforementioned challenges by: (i) ensuring that rankings are incorporated into the updated scores in the same manner for all papers, thereby mitigating arbitrariness, and (ii) allowing seamless use of existing interfaces and workflows designed for scores. We empirically evaluate our method on synthetic datasets as well as on peer reviews from the ICLR 2017 conference, and find that it reduces the error by approximately 30% as compared to the best-performing baseline on the ICLR 2017 data.
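The abstract describes the approach only at a high level. As a toy illustration of the underlying tie-breaking idea, and not the paper's actual algorithm, the sketch below spreads a reviewer's tied quantized scores by small offsets so that their relative order agrees with that reviewer's ranking. The function name `tie_break`, the `eps` spacing, and the example data are all hypothetical.

```python
# Toy sketch only: break ties among one reviewer's quantized scores
# using that reviewer's total ranking of the papers they reviewed.
# This is an illustrative stand-in, not the method from the paper.

def tie_break(scores, ranking, eps=0.1):
    """Return updated scores in which papers that share a quantized
    score are spread by small offsets matching the reviewer's ranking.

    scores  -- dict mapping paper id -> quantized score (e.g., 1..10)
    ranking -- list of paper ids, best first, from the same reviewer
    eps     -- hypothetical spacing between tied papers
    """
    rank_pos = {pid: i for i, pid in enumerate(ranking)}  # 0 = best

    # Group papers by their quantized score.
    by_score = {}
    for pid, s in scores.items():
        by_score.setdefault(s, []).append(pid)

    updated = {}
    for s, group in by_score.items():
        # Order tied papers by the reviewer's ranking (best first).
        group.sort(key=lambda pid: rank_pos[pid])
        for j, pid in enumerate(group):
            # The best-ranked tied paper keeps the original score;
            # lower-ranked tied papers are nudged down by multiples of eps.
            updated[pid] = s - j * eps
    return updated

if __name__ == "__main__":
    scores = {"A": 7, "B": 7, "C": 5, "D": 7}
    ranking = ["B", "A", "D", "C"]  # reviewer's total order, best first
    print(tie_break(scores, ranking))
    # {'B': 7.0, 'A': 6.9, 'D': 6.8, 'C': 5.0}
```

The sketch preserves the property the abstract emphasizes: every reviewer's rankings are folded into the scores by the same rule, and the output is still a per-review score that existing score-based interfaces and workflows can consume unchanged.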
