Paper Title

Convergence of Graph Laplacian with kNN Self-tuned Kernels

Paper Authors

Xiuyuan Cheng, Hau-Tieng Wu

Paper Abstract

The kernelized Gram matrix $W$ constructed from data points $\{x_i\}_{i=1}^N$ as $W_{ij}= k_0\left( \frac{ \| x_i - x_j \|^2}{\sigma^2} \right)$ is widely used in graph-based geometric data analysis and unsupervised learning. An important question is how to choose the kernel bandwidth $\sigma$, and a common practice called the self-tuned kernel adaptively sets a $\sigma_i$ at each point $x_i$ by the $k$-nearest neighbor (kNN) distance. When the $x_i$'s are sampled from a $d$-dimensional manifold embedded in a possibly high-dimensional space, unlike with fixed-bandwidth kernels, theoretical results on graph Laplacian convergence with self-tuned kernels have been incomplete. This paper proves the convergence of the graph Laplacian operator $L_N$ to the manifold (weighted-)Laplacian for a new family of kNN self-tuned kernels $W^{(\alpha)}_{ij} = k_0\left( \frac{ \| x_i - x_j \|^2}{ \epsilon \hat{\rho}(x_i) \hat{\rho}(x_j)} \right) / \left( \hat{\rho}(x_i)^{\alpha} \hat{\rho}(x_j)^{\alpha} \right)$, where $\hat{\rho}$ is the bandwidth function estimated by kNN, and the limiting operator is also parametrized by $\alpha$. When $\alpha = 1$, the limiting operator is the weighted manifold Laplacian $\Delta_p$. Specifically, we prove the point-wise convergence of $L_N f$ and the convergence of the graph Dirichlet form with rates. Our analysis is based on first establishing a $C^0$ consistency for $\hat{\rho}$, which bounds the relative estimation error $|\hat{\rho} - \bar{\rho}|/\bar{\rho}$ uniformly with high probability, where $\bar{\rho} = p^{-1/d}$ and $p$ is the data density function. Our theoretical results reveal the advantage of the self-tuned kernel over the fixed-bandwidth kernel via a smaller variance error in low-density regions. In the algorithm, no prior knowledge of $d$ or the data density is needed. The theoretical results are supported by numerical experiments on simulated data and hand-written digit image data.
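
The kernel construction described in the abstract is concrete enough to sketch in code. The following is a minimal NumPy illustration, not the authors' implementation: the function name `self_tuned_laplacian`, the Gaussian choice of $k_0$, the specific constants, and the random-walk normalization of $L_N$ are assumptions made only to give a runnable example of the kNN self-tuned kernel $W^{(\alpha)}$.

```python
# Minimal sketch of the kNN self-tuned kernel and graph Laplacian from the abstract.
# Assumptions (not from the paper): Gaussian k_0, random-walk normalization of L_N,
# and the helper name self_tuned_laplacian.
import numpy as np

def self_tuned_laplacian(X, k=16, eps=0.05, alpha=1.0):
    """Build W^(alpha) and a graph Laplacian L_N from data X of shape (N, D).

    hat_rho(x_i) is the distance from x_i to its k-th nearest neighbor,
    and k_0 is taken to be a Gaussian kernel (any smooth decaying k_0 fits
    the setup in the abstract).
    """
    N = X.shape[0]
    # Pairwise squared Euclidean distances.
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    # hat_rho: kNN bandwidth function (k-th smallest distance, excluding self at index 0).
    knn_dist = np.sqrt(np.sort(sq_dists, axis=1)[:, k])
    rho_i, rho_j = knn_dist[:, None], knn_dist[None, :]
    # Self-tuned kernel: k_0(|x_i - x_j|^2 / (eps * rho_i * rho_j)) / (rho_i * rho_j)^alpha.
    W = np.exp(-sq_dists / (4.0 * eps * rho_i * rho_j)) / (rho_i * rho_j) ** alpha
    # One common graph Laplacian normalization: L_N = (D^{-1} W - I) / eps.
    D = W.sum(axis=1)
    L = (W / D[:, None] - np.eye(N)) / eps
    return W, L

# Usage on toy manifold data: points near a circle embedded in R^3.
rng = np.random.default_rng(0)
t = rng.uniform(0.0, 2.0 * np.pi, size=500)
X = np.stack([np.cos(t), np.sin(t), 0.1 * rng.standard_normal(500)], axis=1)
W, L = self_tuned_laplacian(X, k=16, eps=0.05, alpha=1.0)
f = X[:, 0]            # a smooth test function evaluated on the data
print((L @ f)[:5])     # L @ f approximates a (weighted) manifold Laplacian applied to f
```

In such a sketch, $k$, $\epsilon$, and $\alpha$ are tuning parameters; per the abstract, the choice $\alpha = 1$ corresponds to the limiting weighted manifold Laplacian $\Delta_p$, and no prior knowledge of $d$ or the data density enters the construction.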
