Paper Title

Estimating the Brittleness of AI: Safety Integrity Levels and the Need for Testing Out-Of-Distribution Performance

Paper Authors

Lohn, Andrew J.

Paper Abstract

Test, Evaluation, Verification, and Validation (TEVV) for Artificial Intelligence (AI) is a challenge that threatens to limit the economic and societal rewards that AI researchers have devoted themselves to producing. A central task of TEVV for AI is estimating brittleness, where brittleness implies that the system functions well within some bounds and poorly outside of those bounds. This paper argues that neither of those criteria is certain for Deep Neural Networks. First, highly touted AI successes (e.g., image classification and speech recognition) are orders of magnitude more failure-prone than is typically certified in critical systems, even within design bounds (perfectly in-distribution sampling). Second, performance falls off only gradually as inputs become further Out-Of-Distribution (OOD). Enhanced emphasis is needed on designing systems that are resilient despite failure-prone AI components, as well as on evaluating and improving OOD performance, in order for AI to clear the challenging hurdles of TEVV and certification.
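
The abstract makes two quantitative claims: that the error rates of well-regarded deep models sit orders of magnitude above the failure rates certified for critical systems (as expressed by Safety Integrity Levels), and that accuracy degrades only gradually as inputs drift out of distribution. The sketch below illustrates both ideas on a toy classifier; it is not from the paper. The nearest-centroid model, the synthetic data, the Gaussian-noise OOD shift, and the helper functions are illustrative assumptions, and the SIL bands follow the commonly cited IEC 61508 low-demand probability-of-failure ranges.

```python
# Minimal sketch (assumptions, not the paper's method): compare a classifier's
# measured error rate against IEC 61508-style Safety Integrity Level bands, and
# probe how accuracy degrades as inputs drift out of distribution via added noise.
import numpy as np

# IEC 61508 low-demand SIL bands (average probability of failure on demand).
SIL_BANDS = {
    "SIL 1": (1e-2, 1e-1),
    "SIL 2": (1e-3, 1e-2),
    "SIL 3": (1e-4, 1e-3),
    "SIL 4": (1e-5, 1e-4),
}

def sil_for_error_rate(error_rate: float) -> str:
    """Return the SIL band whose failure-probability range contains the error rate."""
    for level in ("SIL 4", "SIL 3", "SIL 2", "SIL 1"):
        low, high = SIL_BANDS[level]
        if low <= error_rate < high:
            return level
    return "below SIL 1" if error_rate >= 1e-1 else "beyond SIL 4"

def evaluate_error_rate(predict, x, y) -> float:
    """Fraction of misclassified samples for a given predict(x) -> labels."""
    return float(np.mean(predict(x) != y))

if __name__ == "__main__":
    rng = np.random.default_rng(0)

    # Placeholder "model": nearest-centroid classifier on synthetic 2-class data.
    centroids = np.array([[0.0, 0.0], [3.0, 3.0]])
    predict = lambda x: np.argmin(
        np.linalg.norm(x[:, None, :] - centroids[None, :, :], axis=-1), axis=1
    )

    x_test = np.concatenate([rng.normal(c, 1.0, size=(5000, 2)) for c in centroids])
    y_test = np.repeat([0, 1], 5000)

    # In-distribution error rate vs. SIL bands (the abstract's first claim).
    err = evaluate_error_rate(predict, x_test, y_test)
    print(f"in-distribution error {err:.3f} -> {sil_for_error_rate(err)}")

    # Gradual degradation as inputs move OOD (the abstract's second claim):
    # add increasing Gaussian perturbations and track the error rate.
    for sigma in (0.5, 1.0, 2.0, 4.0):
        x_ood = x_test + rng.normal(0.0, sigma, size=x_test.shape)
        err_ood = evaluate_error_rate(predict, x_ood, y_test)
        print(f"noise sigma={sigma:.1f}: error {err_ood:.3f}")
```

On this deliberately easy synthetic setup the in-distribution error rate still lands near 10^-2, i.e., only the loosest band, which mirrors the certification gap the abstract highlights, and the error rises smoothly rather than abruptly as the noise level grows.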
