Paper Title
Data Transformation Insights in Self-supervision with Clustering Tasks
Authors
Abstract
Self-supervision is key to extending the use of deep learning to label-scarce domains. Data transformations play an important role in most self-supervised approaches. However, the impact of these transformations has not been studied until now, and different transformations may affect the system differently. We provide novel insights into the use of data transformations in self-supervised tasks, particularly those involving clustering. We show theoretically and empirically that certain sets of transformations help the convergence of self-supervised clustering. We also identify cases in which transformations are not helpful or are even harmful. With valid transformations, we show a faster convergence rate for convex objectives as well as for a certain family of non-convex objectives, along with a proof of convergence to the original set of optima. We conduct experiments on both synthetic and real-world data. Empirically, our results agree with the theoretical insights provided.