Self-Supervised Representation Learning With MUlti-Segmental Informational Coding (MUSIC)

Paper title

Self-Supervised Representation Learning With MUlti-Segmental Informational Coding (MUSIC)

Authors

Chuang Niu, Ge Wang

Abstract

Self-supervised representation learning maps high-dimensional data into a meaningful embedding space, where samples with similar semantic content are close to each other. Most recent representation learning methods maximize the cosine similarity, or minimize the distance, between the embedding features of different views of the same sample, usually on the $\ell_2$-normalized unit hypersphere. To prevent the trivial solution in which all samples share the same embedding feature, various techniques have been developed, such as contrastive learning, stop gradient, and variance and covariance regularization. In this study, we propose MUlti-Segmental Informational Coding (MUSIC) for self-supervised representation learning. MUSIC divides the embedding feature into multiple segments, each of which discriminatively partitions samples into different semantic clusters, with different segments focusing on different partition principles. Information-theoretic measures are used directly to optimize MUSIC and theoretically guarantee that trivial solutions are avoided. MUSIC does not depend on commonly used techniques such as memory banks, large batches, asymmetric networks, gradient stopping, or momentum weight updating, which keeps the training framework flexible. Our experiments demonstrate that MUSIC achieves better results than the most closely related methods, Barlow Twins and VICReg, on ImageNet classification with linear probing, and requires neither deep projectors nor large feature dimensions. Code will be made available.
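The abstract does not state the MUSIC objective, so the following is only an illustrative sketch of the segmenting idea, not the authors' implementation. It assumes that each segment is turned into a probability distribution with a softmax, and it uses the mutual information between sample identity and segment assignment as one example of an information-theoretic quantity that drops to zero when every sample has the same embedding. The function names, the softmax choice, and the specific measure are all assumptions made for illustration.

```python
import numpy as np


def segment_probabilities(embeddings, num_segments):
    """Split (batch, dim) embeddings into equal segments and softmax each segment.

    Assumption: softmax per segment; the source abstract does not specify this.
    """
    batch, dim = embeddings.shape
    if dim % num_segments != 0:
        raise ValueError("dim must be divisible by num_segments")
    segs = embeddings.reshape(batch, num_segments, dim // num_segments)
    segs = segs - segs.max(axis=-1, keepdims=True)  # shift for numerical stability
    exp = np.exp(segs)
    return exp / exp.sum(axis=-1, keepdims=True)


def entropy(p, axis=-1, eps=1e-12):
    """Shannon entropy (in nats) along the given axis."""
    return -(p * np.log(p + eps)).sum(axis=axis)


def segment_mutual_information(probs):
    """Per-segment I = H(batch-averaged p) - batch average of H(p).

    Illustrative measure only: it equals zero when all samples share
    the same distribution, i.e. in the collapsed (trivial) case.
    """
    marginal = probs.mean(axis=0)  # shape (segments, units)
    return entropy(marginal) - entropy(probs).mean(axis=0)  # shape (segments,)
```

For example, with a hypothetical batch of 8 embeddings of dimension 12 split into 3 segments, `segment_probabilities` returns an array of shape `(8, 3, 4)`, and `segment_mutual_information` returns one non-negative value per segment. Feeding in 8 copies of the same embedding gives values of zero, which is the collapse that the abstract says the method is designed to rule out.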
