Optimal Bidding Strategy without Exploration in Real-time Bidding

Paper title

Optimal Bidding Strategy without Exploration in Real-time Bidding

Authors

Aritra Ghosh, Saayan Mitra, Somdeb Sarkhel, Viswanathan Swaminathan

Abstract

Maximizing utility under a budget constraint is the primary goal for advertisers in real-time bidding (RTB) systems. The policy that maximizes utility is referred to as the optimal bidding strategy. Earlier works on optimal bidding strategy apply model-based batch reinforcement learning methods, which cannot generalize to unknown budget and time constraints. Further, the advertiser observes a censored market price, which makes direct evaluation on batch test datasets infeasible. Previous works ignore the losing auctions to alleviate the difficulty with censored states, which significantly modifies the test distribution. We address the challenge of lacking a clear evaluation procedure as well as the error propagated through batch reinforcement learning methods in RTB systems. We exploit two conditional independence structures in the sequential bidding process that allow us to propose a novel practical framework using the maximum entropy principle to imitate the behavior of the true distribution observed in real-time traffic. Moreover, the framework allows us to train a model that generalizes to unseen budget conditions rather than being limited to those observed in history. We compare our methods with several baselines on two real-world RTB datasets and demonstrate significantly improved performance under various budget settings.
