Paper Title

Facilitate the Parametric Dimension Reduction by Gradient Clipping

Paper Authors

Lai, Chien-Hsun; Wang, Yu-Shuen

Paper Abstract

We extend a well-known dimension reduction method, t-distributed stochastic neighbor embedding (t-SNE), from non-parametric to parametric by training neural networks. The main advantage of a parametric technique is its ability to generalize to new data, which is particularly beneficial for streaming data exploration. However, training a neural network to optimize the t-SNE objective function frequently fails. Previous methods overcome this problem by pre-training and then fine-tuning the network. We found that the training failure comes from the exploding gradient problem, which occurs when data points that are distant in high-dimensional space are projected to nearby embedding positions. Accordingly, we apply gradient clipping to solve the problem. Since the network is trained by directly optimizing the t-SNE objective function, our method achieves an embedding quality comparable to that of non-parametric t-SNE while enjoying the ability to generalize. Thanks to mini-batch network training, our parametric dimension reduction method is highly efficient. We further extend other non-parametric state-of-the-art approaches, such as LargeVis and UMAP, to parametric versions. Experimental results demonstrate the feasibility of our method. Considering its practicability, we will soon release the code for public use.
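
To make the scheme concrete, below is a minimal sketch of the training loop the abstract describes: a neural network is trained on mini-batches to minimize the t-SNE objective KL(P || Q) directly, with gradient clipping applied at every update. This is not the authors' released code; it assumes PyTorch, the encoder architecture and hyperparameters (sigma, max_norm, learning rate) are illustrative placeholders, and a fixed-bandwidth Gaussian affinity stands in for the perplexity-calibrated affinities t-SNE uses in practice.

```python
# Minimal sketch (assumptions: PyTorch; illustrative architecture and
# hyperparameters; fixed-bandwidth affinities instead of a perplexity search).
import torch
import torch.nn as nn

def pairwise_sq_dists(Z):
    # Squared Euclidean distances between all rows of Z.
    return (Z.unsqueeze(0) - Z.unsqueeze(1)).pow(2).sum(-1)

def batch_affinities(X, sigma=1.0):
    # Simplified symmetric Gaussian affinities P for one mini-batch.
    # (t-SNE normally calibrates per-point bandwidths via perplexity.)
    P = torch.exp(-pairwise_sq_dists(X) / (2.0 * sigma ** 2))
    P = P * (1.0 - torch.eye(X.shape[0]))   # zero out self-affinities
    return P / P.sum()

def tsne_loss(P, Y, eps=1e-12):
    # KL(P || Q), where Q uses the Student-t kernel on the embedding Y.
    q = 1.0 / (1.0 + pairwise_sq_dists(Y))
    q = q * (1.0 - torch.eye(Y.shape[0]))
    Q = q / q.sum()
    return (P * torch.log((P + eps) / (Q + eps))).sum()

torch.manual_seed(0)
encoder = nn.Sequential(                  # hypothetical encoder: R^50 -> R^2
    nn.Linear(50, 128), nn.ReLU(),
    nn.Linear(128, 2),
)
optimizer = torch.optim.Adam(encoder.parameters(), lr=1e-3)
X = torch.randn(1024, 50)                 # stand-in for real high-dimensional data

for step in range(200):                   # mini-batch training
    batch = X[torch.randint(0, X.shape[0], (256,))]
    loss = tsne_loss(batch_affinities(batch), encoder(batch))
    optimizer.zero_grad()
    loss.backward()
    # The key step from the paper: clip gradients so that pairs that are
    # distant in high-dimensional space but mapped close together in the
    # embedding cannot produce exploding updates. Global-norm clipping is
    # used here; the paper's exact clipping rule may differ.
    torch.nn.utils.clip_grad_norm_(encoder.parameters(), max_norm=1.0)
    optimizer.step()
```

Because the network itself is the map from data space to the embedding, new points can be embedded after training with a single forward pass (encoder(x_new)), which is what makes the parametric variant suitable for streaming data.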
