Paper Title
Self-Play Reinforcement Learning for Fast Image Retargeting
Authors
Abstract
In this study, we address image retargeting, the task of adjusting input images to arbitrary sizes. MULTIOP, one of the best-performing methods, combines multiple retargeting operators and generates a retargeted image at each stage in order to find the sequence of operators that minimizes the distance between the original and retargeted images. The limitation of this method is its tremendous processing time, which severely limits its practical use. The purpose of this study is therefore to find the optimal combination of operators within a reasonable processing time; we propose a method that predicts the optimal operator for each step using a reinforcement learning agent. The technical contributions of this study are as follows. First, we propose a reward based on self-play, which is insensitive to the large variance in the content-dependent distance measured in MULTIOP. Second, we propose dynamically changing the loss weight for each action during training, which prevents the algorithm from falling into a local optimum and from choosing only the most frequently used operator. Our experiments showed that we achieved multi-operator image retargeting with a processing time three orders of magnitude shorter than, and the same quality as, the original multi-operator-based method, which was the best-performing algorithm in retargeting tasks.
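The two contributions named in the abstract can be illustrated with a minimal Python sketch. The abstract gives no formulas, so everything here is an assumption made for illustration: the function names `self_play_reward` and `action_loss_weights`, the win/lose/draw form of the reward, and the inverse-frequency weighting rule are hypothetical and are not taken from the paper.

```python
from collections import Counter


def self_play_reward(agent_distance: float, opponent_distance: float) -> int:
    """Hypothetical self-play reward: compare two rollouts on the SAME image.

    Only the sign of the comparison is used, so the reward does not depend
    on the absolute scale of the content-dependent distance.
    """
    if agent_distance < opponent_distance:
        return 1   # agent produced the closer retargeted image
    if agent_distance > opponent_distance:
        return -1  # opponent produced the closer retargeted image
    return 0       # draw


def action_loss_weights(action_history, num_actions, eps=1.0):
    """Hypothetical dynamic loss weights: inverse action frequency.

    Frequently chosen operators get a smaller weight, rarely chosen ones a
    larger weight. Weights are normalized so that their mean is 1.
    """
    counts = Counter(action_history)
    raw = [1.0 / (counts.get(a, 0) + eps) for a in range(num_actions)]
    mean = sum(raw) / num_actions
    return [w / mean for w in raw]


if __name__ == "__main__":
    # Same outcome regardless of the distance scale of the image.
    print(self_play_reward(0.2, 0.5), self_play_reward(200.0, 500.0))
    # Operator 0 was chosen three times, operator 1 once, operator 2 never.
    print(action_loss_weights([0, 0, 0, 1], num_actions=3))
```

Under these assumptions, the reward for an image with small distances and one with large distances is identical whenever the ordering of the two rollouts is the same, which is one way a reward can be made insensitive to content-dependent variance. The weighting rule would be recomputed as training proceeds, so an operator that starts to dominate the action history is progressively down-weighted.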