Paper Title

Spatial-Temporal Transformer for Video Snapshot Compressive Imaging

Authors

Lishun Wang, Miao Cao, Yong Zhong, Xin Yuan

Abstract

Video snapshot compressive imaging (SCI) captures multiple sequential video frames with a single measurement using the idea of computational imaging. The underlying principle is to modulate the high-speed frames with different masks; these modulated frames are then summed into a single measurement captured by a low-speed 2D sensor (dubbed the optical encoder). Afterwards, algorithms are employed to reconstruct the desired high-speed frames (dubbed the software decoder) when needed. In this paper, we consider the reconstruction algorithm in video SCI, i.e., recovering a series of video frames from a compressed measurement. Specifically, we propose a Spatial-Temporal transFormer (STFormer) to exploit the correlation in both the spatial and temporal domains. The STFormer network is composed of a token generation block and a video reconstruction block, and these two blocks are connected by a series of STFormer blocks. Each STFormer block consists of a spatial self-attention branch and a temporal self-attention branch, and the outputs of these two branches are integrated by a fusion network. Extensive results on both simulated and real data demonstrate the state-of-the-art performance of STFormer. The code and models are publicly available at https://github.com/ucaswangls/STFormer.git.
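
The optical encoding described in the abstract amounts to masking each high-speed frame and summing the modulated frames into one snapshot. Below is a minimal NumPy sketch of that forward model for illustration only; the function name `sci_measurement` and the (T, H, W) shapes are assumptions, not the authors' code.

```python
import numpy as np

def sci_measurement(frames: np.ndarray, masks: np.ndarray) -> np.ndarray:
    """Toy SCI forward model: each high-speed frame is modulated by its own
    mask, and the modulated frames are summed into a single 2D measurement.
    Both inputs are assumed to have shape (T, H, W)."""
    assert frames.shape == masks.shape
    return np.sum(masks * frames, axis=0)  # single measurement of shape (H, W)

# Usage: compress 8 random frames into one snapshot with random binary masks.
rng = np.random.default_rng(0)
frames = rng.random((8, 256, 256))
masks = rng.integers(0, 2, size=(8, 256, 256)).astype(np.float64)
measurement = sci_measurement(frames, masks)
```

Likewise, the abstract's description of an STFormer block (a spatial self-attention branch and a temporal self-attention branch merged by a fusion network) can be sketched in PyTorch. This is a hedged reading of the architecture, not the released implementation; layer choices such as `nn.MultiheadAttention` and the MLP-style fusion are assumptions.

```python
import torch
import torch.nn as nn

class STFormerBlockSketch(nn.Module):
    """Hypothetical sketch of one STFormer block: spatial and temporal
    self-attention branches whose outputs are integrated by a fusion network."""

    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.spatial_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.temporal_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.fusion = nn.Sequential(
            nn.Linear(2 * dim, dim), nn.GELU(), nn.Linear(dim, dim)
        )

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (B, T, N, C) = batch, frames, spatial positions, channels.
        b, t, n, c = tokens.shape
        # Spatial branch: attend over the N spatial tokens within each frame.
        xs = tokens.reshape(b * t, n, c)
        xs, _ = self.spatial_attn(xs, xs, xs)
        xs = xs.reshape(b, t, n, c)
        # Temporal branch: attend over the T frames at each spatial position.
        xt = tokens.permute(0, 2, 1, 3).reshape(b * n, t, c)
        xt, _ = self.temporal_attn(xt, xt, xt)
        xt = xt.reshape(b, n, t, c).permute(0, 2, 1, 3)
        # Fusion network merges the two branches; add a residual connection.
        return tokens + self.fusion(torch.cat([xs, xt], dim=-1))

# Usage: 2 videos, 8 frames, 16 spatial tokens, 64 channels per token.
block = STFormerBlockSketch(dim=64)
out = block(torch.randn(2, 8, 16, 64))  # same shape as the input tokens
```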
