Search-Based Testing of Reinforcement Learning

Paper Title

Search-Based Testing of Reinforcement Learning

Authors

Martin Tappler, Filip Cano Córdoba, Bernhard K. Aichernig, Bettina Könighofer

Abstract

Evaluation of deep reinforcement learning (RL) is inherently challenging. Especially the opaqueness of learned policies and the stochastic nature of both agents and environments make testing the behavior of deep RL agents difficult. We present a search-based testing framework that enables a wide range of novel analysis capabilities for evaluating the safety and performance of deep RL agents. For safety testing, our framework utilizes a search algorithm that searches for a reference trace that solves the RL task. The backtracking states of the search, called boundary states, pose safety-critical situations. We create safety test-suites that evaluate how well the RL agent escapes safety-critical situations near these boundary states. For robust performance testing, we create a diverse set of traces via fuzz testing. These fuzz traces are used to bring the agent into a wide variety of potentially unknown states from which the average performance of the agent is compared to the average performance of the fuzz traces. We apply our search-based testing approach on RL for Nintendo's Super Mario Bros.
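The safety-testing idea in the abstract (search for a reference trace, and treat the states where the search backtracks as boundary states) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: the environment is a toy one-dimensional track rather than Super Mario Bros., the names `ACTIONS`, `search_reference_trace`, `goal`, and `pits` are hypothetical, and a plain depth-first search stands in for whatever search algorithm the authors actually use.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical action set for the toy track: each action advances the position.
ACTIONS: Dict[str, int] = {"step": 1, "jump": 2}


def search_reference_trace(
    goal: int, pits: Set[int]
) -> Tuple[Optional[List[str]], List[int]]:
    """Depth-first search for an action trace from position 0 to `goal`.

    Returns the trace (or None if the task is unsolvable) together with the
    positions at which an action led into a pit and the search had to back up.
    Those positions play the role of boundary states in this sketch.
    """
    boundary: List[int] = []
    visited: Set[int] = set()

    def dfs(pos: int) -> Optional[List[str]]:
        if pos >= goal:
            return []  # task solved, nothing more to do
        visited.add(pos)
        for name, delta in ACTIONS.items():
            nxt = pos + delta
            if nxt in pits:
                # Unsafe successor: discard this action and record the state.
                if pos not in boundary:
                    boundary.append(pos)
                continue
            if nxt in visited:
                continue  # already explored, no need to repeat the work
            rest = dfs(nxt)
            if rest is not None:
                return [name] + rest
        return None  # dead end, the caller tries its next action

    return dfs(0), boundary


if __name__ == "__main__":
    trace, boundary = search_reference_trace(goal=6, pits={1, 4})
    print("reference trace:", trace)
    print("boundary states:", boundary)
```

In a test suite built along these lines, an agent under test would be started at or near each recorded boundary state and checked for whether it avoids the pit. The fuzz-based performance testing mentioned in the abstract is not covered by this sketch.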
