Paper Title

Estimating the Brittleness of AI: Safety Integrity Levels and the Need for Testing Out-Of-Distribution Performance

Paper Authors

Lohn, Andrew J.

Paper Abstract

Test, Evaluation, Verification, and Validation (TEVV) for Artificial Intelligence (AI) is a challenge that threatens to limit the economic and societal rewards that AI researchers have devoted themselves to producing. A central task of TEVV for AI is estimating brittleness, where brittleness implies that the system functions well within some bounds and poorly outside of those bounds. This paper argues that neither of those criteria is certain for Deep Neural Networks. First, highly touted AI successes (e.g., image classification and speech recognition) are orders of magnitude more failure-prone than is typically certified in critical systems, even within design bounds (perfectly in-distribution sampling). Second, performance falls off only gradually as inputs become further Out-Of-Distribution (OOD). Enhanced emphasis is needed on designing systems that are resilient despite failure-prone AI components, as well as on evaluating and improving OOD performance, in order for AI to clear the challenging hurdles of TEVV and certification.
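
The abstract makes two quantitative claims: that the error rates of well-regarded deep models sit orders of magnitude above the failure rates certified for critical systems (as expressed by Safety Integrity Levels), and that accuracy degrades only gradually as inputs drift out of distribution. The sketch below illustrates both ideas on a toy classifier; it is not from the paper. The nearest-centroid model, the synthetic data, the Gaussian-noise OOD shift, and the helper functions are illustrative assumptions, and the SIL bands follow the commonly cited IEC 61508 low-demand probability-of-failure ranges.

```python
# Minimal sketch (assumptions, not the paper's method): compare a classifier's
# measured error rate against IEC 61508-style Safety Integrity Level bands, and
# probe how accuracy degrades as inputs drift out of distribution via added noise.
import numpy as np

# IEC 61508 low-demand SIL bands (average probability of failure on demand).
SIL_BANDS = {
    "SIL 1": (1e-2, 1e-1),
    "SIL 2": (1e-3, 1e-2),
    "SIL 3": (1e-4, 1e-3),
    "SIL 4": (1e-5, 1e-4),
}

def sil_for_error_rate(error_rate: float) -> str:
    """Return the SIL band whose failure-probability range contains the error rate."""
    for level in ("SIL 4", "SIL 3", "SIL 2", "SIL 1"):
        low, high = SIL_BANDS[level]
        if low <= error_rate < high:
            return level
    return "below SIL 1" if error_rate >= 1e-1 else "beyond SIL 4"

def evaluate_error_rate(predict, x, y) -> float:
    """Fraction of misclassified samples for a given predict(x) -> labels."""
    return float(np.mean(predict(x) != y))

if __name__ == "__main__":
    rng = np.random.default_rng(0)

    # Placeholder "model": nearest-centroid classifier on synthetic 2-class data.
    centroids = np.array([[0.0, 0.0], [3.0, 3.0]])
    predict = lambda x: np.argmin(
        np.linalg.norm(x[:, None, :] - centroids[None, :, :], axis=-1), axis=1
    )

    x_test = np.concatenate([rng.normal(c, 1.0, size=(5000, 2)) for c in centroids])
    y_test = np.repeat([0, 1], 5000)

    # In-distribution error rate vs. SIL bands (the abstract's first claim).
    err = evaluate_error_rate(predict, x_test, y_test)
    print(f"in-distribution error {err:.3f} -> {sil_for_error_rate(err)}")

    # Gradual degradation as inputs move OOD (the abstract's second claim):
    # add increasing Gaussian perturbations and track the error rate.
    for sigma in (0.5, 1.0, 2.0, 4.0):
        x_ood = x_test + rng.normal(0.0, sigma, size=x_test.shape)
        err_ood = evaluate_error_rate(predict, x_ood, y_test)
        print(f"noise sigma={sigma:.1f}: error {err_ood:.3f}")
```

On this deliberately easy synthetic setup the in-distribution error rate still lands near 10^-2, i.e., only the loosest band, which mirrors the certification gap the abstract highlights, and the error rises smoothly rather than abruptly as the noise level grows.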
