Learning Visual Representation of Underwater Acoustic Imagery Using Transformer-Based Style Transfer Method

Paper Title

Learning Visual Representation of Underwater Acoustic Imagery Using Transformer-Based Style Transfer Method

Authors

Zhou, Xiaoteng; Yu, Changli; Yuan, Shihao; Yuan, Xin; Yu, Hangchi; Luo, Citong

Abstract

Underwater automatic target recognition (UATR) has long been a challenging research topic in ocean engineering. Although deep learning has created opportunities for target recognition on land and in the air, deep-learning-based underwater target recognition has lagged behind because of limited sensor performance and the small size of trainable datasets. This letter proposes a framework for learning the visual representation of underwater acoustic imagery, with a transformer-based style transfer model as its main body. The framework can replace the low-level texture features of optical images with the visual features of underwater acoustic imagery while preserving the original high-level semantic content of the optical images. It can therefore make full use of rich optical image datasets to generate a pseudo-acoustic image dataset, which serves as the initial sample set for training an underwater acoustic target recognition model. The experiments use dual-frequency identification sonar (DIDSON) as the underwater acoustic data source and take fish, the most common marine creature, as the research subject. Experimental results show that the proposed method can generate high-quality, high-fidelity pseudo-acoustic samples, achieve the purpose of acoustic data enhancement, and provide support for research on acoustic-optical image domain transfer.
