Paper Title
Self-Supervised Bernoulli Autoencoders for Semi-Supervised Hashing
Paper Authors
Paper Abstract
Semantic hashing is an emerging technique for large-scale similarity search based on representing high-dimensional data using similarity-preserving binary codes used for efficient indexing and search. It has recently been shown that variational autoencoders, with Bernoulli latent representations parametrized by neural nets, can be successfully trained to learn such codes in supervised and unsupervised scenarios, improving on more traditional methods thanks to their ability to handle the binary constraints architecturally. However, the scenario where labels are scarce has not been studied yet. This paper investigates the robustness of hashing methods based on variational autoencoders to the lack of supervision, focusing on two semi-supervised approaches currently in use. The first augments the variational autoencoder's training objective to jointly model the distribution over the data and the class labels. The second approach exploits the annotations to define an additional pairwise loss that enforces consistency between the similarity in the code (Hamming) space and the similarity in the label space. Our experiments show that both methods can significantly increase the hash codes' quality. The pairwise approach can exhibit an advantage when the number of labelled points is large. However, we found that this method degrades quickly and loses its advantage when labelled samples decrease. To circumvent this problem, we propose a novel supervision method in which the model uses its label distribution predictions to implement the pairwise objective. Compared to the best baseline, this procedure yields similar performance in fully supervised settings but improves the results significantly when labelled data is scarce. Our code is made publicly available at https://github.com/amacaluso/SSB-VAE.
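The pairwise objective described above, and its self-supervised variant based on predicted label distributions, can be sketched as follows. This is a minimal NumPy illustration of the general idea (squared-error consistency between normalized Hamming similarity and label similarity), not the authors' exact formulation; the function names and the choice of loss are assumptions for illustration.

```python
import numpy as np

def pairwise_consistency_loss(codes, label_sim):
    """Penalize disagreement between code similarity and label similarity.

    codes: (n, b) array of binary hash codes in {0, 1}.
    label_sim: (n, n) similarity matrix, e.g. 1.0 when two points share
    a class label and 0.0 otherwise.
    """
    b = codes.shape[1]
    # Pairwise Hamming distance between all codes, shape (n, n).
    hamming = np.abs(codes[:, None, :] - codes[None, :, :]).sum(axis=2)
    # Map distance to a similarity in [0, 1]: identical codes give 1.0.
    code_sim = 1.0 - hamming / b
    # Squared-error consistency between the two similarity matrices.
    return np.mean((code_sim - label_sim) ** 2)

def label_sim_from_predictions(probs):
    """Soft similarity matrix built from predicted class distributions.

    probs: (n, c) rows of predicted class probabilities. The dot product
    of two rows estimates the probability that the two points share a
    label, which can replace ground-truth similarity when labels are scarce.
    """
    return probs @ probs.T
```

A small usage example: three points where the first two share a label and matching codes give zero loss.

```python
codes = np.array([[0, 0, 1, 1], [0, 0, 1, 1], [1, 1, 0, 0]])
sim = np.array([[1., 1., 0.], [1., 1., 0.], [0., 0., 1.]])
pairwise_consistency_loss(codes, sim)  # 0.0: code similarity matches label similarity
```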