Paper Title

Wasserstein Logistic Regression with Mixed Features

Paper Authors

Aras Selvi, Mohammad Reza Belbasi, Martin B. Haugh, Wolfram Wiesemann

Abstract

Recent work has leveraged the popular distributionally robust optimization paradigm to combat overfitting in classical logistic regression. While the resulting classification scheme displays a promising performance in numerical experiments, it is inherently limited to numerical features. In this paper, we show that distributionally robust logistic regression with mixed (i.e., numerical and categorical) features, despite amounting to an optimization problem of exponential size, admits a polynomial-time solution scheme. We subsequently develop a practically efficient column-and-constraint approach that solves the problem as a sequence of polynomial-time solvable exponential conic programs. Our model retains many of the desirable theoretical features of previous works, but -- in contrast to the literature -- it does not admit an equivalent representation as a regularized logistic regression, that is, it represents a genuinely novel variant of logistic regression. We show that our method outperforms both the unregularized and the regularized logistic regression on categorical as well as mixed-feature benchmark instances.
