Paper Title


Mutual Information Learned Regressor: an Information-theoretic Viewpoint of Training Regression Systems

Paper Authors

Jirong Yi, Qiaosheng Zhang, Zhen Chen, Qiao Liu, Wei Shao, Yusen He, Yaohua Wang

Abstract


As one of the central tasks in machine learning, regression finds many applications in different fields. A common existing practice for solving regression problems is the mean square error (MSE) minimization approach or its regularized variants, which require prior knowledge about the models. Recently, Yi et al. proposed a mutual information based supervised learning framework in which they introduced a label entropy regularization that does not require any prior knowledge. When applied to classification tasks and solved via a stochastic gradient descent (SGD) optimization algorithm, their approach achieved significant improvement over the commonly used cross entropy loss and its variants. However, they did not provide a theoretical convergence analysis of the SGD algorithm for the proposed formulation. Moreover, applying the framework to regression tasks is nontrivial due to the potentially infinite support set of the label. In this paper, we investigate regression under the mutual information based supervised learning framework. We first argue that the MSE minimization approach is equivalent to a conditional entropy learning problem, and then propose a mutual information learning formulation for solving regression problems by using a reparameterization technique. For the proposed formulation, we provide a convergence analysis of the SGD algorithm used to solve it in practice. Finally, we consider a multi-output regression data model for which we derive a generalization performance lower bound in terms of the mutual information associated with the underlying data distribution. The result shows that high dimensionality can be a blessing instead of a curse, which is controlled by a threshold. We hope our work will serve as a good starting point for further research on mutual information based regression.
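The abstract's claim that MSE minimization is equivalent to a conditional entropy learning problem can be illustrated with a small numerical sketch (not the paper's actual formulation): under a Gaussian noise model y = f(x) + N(0, σ²), the per-sample negative log-likelihood — an empirical surrogate for the conditional entropy H(y|x) — differs from the scaled MSE only by an additive constant, so minimizing one minimizes the other.

```python
import numpy as np

def mse_loss(y_pred, y_true):
    # Standard mean-square-error objective for regression.
    return np.mean((y_pred - y_true) ** 2)

def gaussian_nll_loss(y_pred, y_true, sigma=1.0):
    # Average negative log-likelihood under y ~ N(y_pred, sigma^2).
    # This acts as an empirical estimate of the conditional entropy
    # H(y|x) when the model is well-specified, and it equals the
    # scaled MSE plus a constant that does not depend on y_pred.
    sq_err = (y_pred - y_true) ** 2
    return np.mean(sq_err) / (2 * sigma ** 2) + 0.5 * np.log(2 * np.pi * sigma ** 2)

rng = np.random.default_rng(0)
y_true = rng.normal(size=100)
y_pred = y_true + 0.1 * rng.normal(size=100)

mse = mse_loss(y_pred, y_true)
nll = gaussian_nll_loss(y_pred, y_true, sigma=1.0)
# With sigma = 1, nll = mse / 2 + 0.5 * log(2 * pi) up to floating point,
# so the two objectives share the same minimizer.
```

The function names here are illustrative placeholders, not identifiers from the paper; the equivalence demonstrated is the standard Gaussian-likelihood identity the abstract alludes to.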
