Paper Title
N-Gram in Swin Transformers for Efficient Lightweight Image Super-Resolution
Paper Authors
Paper Abstract
While some studies have proven that Swin Transformer (Swin) with window self-attention (WSA) is suitable for single image super-resolution (SR), the plain WSA ignores broad regions when reconstructing high-resolution images due to a limited receptive field. In addition, many deep learning SR methods suffer from intensive computations. To address these problems, we introduce the N-Gram context to low-level vision with Transformers for the first time. We define N-Gram as neighboring local windows in Swin, which differs from text analysis that views N-Gram as consecutive characters or words. N-Grams interact with each other by sliding-WSA, expanding the regions seen to restore degraded pixels. Using the N-Gram context, we propose NGswin, an efficient SR network with an SCDP bottleneck taking multi-scale outputs of the hierarchical encoder. Experimental results show that NGswin achieves competitive performance while maintaining an efficient structure compared with previous leading methods. Moreover, we also improve other Swin-based SR methods with the N-Gram context, thereby building an enhanced model: SwinIR-NG. Our improved SwinIR-NG outperforms the current best lightweight SR approaches and establishes state-of-the-art results. Code is available at https://github.com/rami0205/NGramSwin.
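To make the window-partitioning idea behind the abstract concrete, the following is a minimal numpy sketch. It is an illustration only, not the paper's actual operator: the `window_partition` and `ngram_context` functions are hypothetical names, and the "context" here is a toy 2-Gram average of each window with its right and bottom neighbors, standing in for the sliding-WSA interaction between neighboring windows described above.

```python
import numpy as np

def window_partition(x, ws):
    """Split an (H, W) feature map into non-overlapping ws x ws windows.
    Returns an array of shape (H//ws, W//ws, ws, ws)."""
    H, W = x.shape
    return x.reshape(H // ws, ws, W // ws, ws).swapaxes(1, 2)

def ngram_context(wins):
    """Toy 2-Gram context (illustrative, NOT the paper's sliding-WSA):
    each window's descriptor is averaged with its right and bottom
    neighbors, emulating how neighboring local windows share
    information before window self-attention."""
    desc = wins.mean(axis=(2, 3))                 # one scalar per window
    pad = np.pad(desc, ((0, 1), (0, 1)), mode='edge')
    # combine each window with its right and bottom neighbors
    return (pad[:-1, :-1] + pad[:-1, 1:] + pad[1:, :-1]) / 3.0

# toy 8x8 "feature map" split into a 2x2 grid of 4x4 windows
x = np.arange(64, dtype=float).reshape(8, 8)
wins = window_partition(x, 4)
ctx = ngram_context(wins)
print(wins.shape, ctx.shape)  # (2, 2, 4, 4) (2, 2)
```

Each entry of `ctx` now depends on pixels outside its own window, which is the intuition the abstract appeals to: plain WSA sees only one window, while the N-Gram context expands the region used to restore degraded pixels.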