Title
A Multi-resolution Theory for Approximating Infinite-$p$-Zero-$n$: Transitional Inference, Individualized Predictions, and a World Without Bias-Variance Trade-off
Authors
Abstract
Transitional inference is an empiricism concept, rooted and practiced in clinical medicine since ancient Greece. Knowledge and experiences gained from treating one entity are applied to treat a related but distinctively different one. This notion of "transition to the similar" renders individualized treatments an operational meaning, yet its theoretical foundation defies the familiar inductive inference framework. The uniqueness of entities is the result of potentially an infinite number of attributes (hence $p=\infty$), which entails zero direct training sample size (i.e., $n=0$) because genuine guinea pigs do not exist. However, the literature on wavelets and on sieve methods suggests a principled approximation theory for transitional inference via a multi-resolution (MR) perspective, where we use the resolution level to index the degree of approximation to ultimate individuality. MR inference seeks a primary resolution indexing an indirect training sample, which provides enough matched attributes to increase the relevance of the results to the target individuals and yet still accumulate sufficient indirect sample sizes for robust estimation. Theoretically, MR inference relies on an infinite-term ANOVA-type decomposition, providing an alternative way to model sparsity via the decay rate of the resolution bias as a function of the primary resolution level. Unexpectedly, this decomposition reveals a world without variance when the outcome is a deterministic function of potentially infinitely many predictors. In this deterministic world, the optimal resolution prefers over-fitting in the traditional sense when the resolution bias decays sufficiently rapidly. Furthermore, there can be many "descents" in the prediction error curve, when the contributions of predictors are inhomogeneous and the ordering of their importance does not align with the order of their inclusion in prediction.
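To make the abstract's central trade-off concrete, here is a minimal, hypothetical sketch (not from the paper itself): a toy "deterministic world" in which the outcome is an exact function of binary attributes with geometrically decaying contributions. The MR prediction at resolution `r` averages outcomes over the training subpopulation matched with the target on the first `r` attributes, so the prediction error at each resolution is pure resolution bias, with no stochastic variance term. All names (`beta`, `outcome`, the decay rate `0.5 ** k`) are illustrative assumptions, not the paper's construction.

```python
import numpy as np

rng = np.random.default_rng(0)

P = 12                                # truncation of the "infinite" attribute list
beta = 0.5 ** np.arange(1, P + 1)     # geometrically decaying contributions (assumed)

def outcome(x):
    # deterministic world: the outcome is an exact function of the attributes
    return x @ beta

n = 100_000
X = rng.integers(0, 2, size=(n, P))   # binary attributes of the indirect training pool
y = outcome(X)

target = rng.integers(0, 2, size=P)   # the target individual's attributes

# MR prediction at resolution r: average outcomes over the subpopulation
# matched with the target on the first r attributes (r = 0 gives the grand mean).
errors = []
for r in range(P + 1):
    match = (X[:, :r] == target[:r]).all(axis=1)
    pred = y[match].mean() if match.any() else y.mean()
    errors.append((outcome(target) - pred) ** 2)
```

Because the contributions decay, the unmatched attributes contribute less and less as `r` grows, so the squared error (resolution bias) shrinks; at the same time the matched subsample size shrinks roughly by half per additional attribute. Choosing the primary resolution is exactly the balance between these two effects that the abstract describes.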