Paper Title
Imitate then Transcend: Multi-Agent Optimal Execution with Dual-Window Denoise PPO
Paper Authors
Paper Abstract
A novel framework for solving the optimal execution and placement problems using reinforcement learning (RL) with imitation was proposed. The RL agents trained with the proposed framework consistently outperformed the industry-benchmark time-weighted average price (TWAP) strategy on execution cost and generalized well across out-of-sample trading dates and tickers. The strong performance stems from three aspects. First, our RL network architecture, called Dual-window Denoise PPO, enabled efficient learning in a noisy market environment. Second, a reward scheme with imitation learning was designed, and a comprehensive set of market features was studied. Third, our flexible action formulation allowed the RL agent to tackle optimal execution and placement jointly, yielding better performance than solving the two problems separately. The RL agent's performance was evaluated in our multi-agent, realistic historical limit order book simulator, in which price impact was accurately assessed. In addition, ablation studies were performed, confirming the superiority of our framework.
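The abstract benchmarks against the TWAP strategy, which simply splits a parent order evenly across equal time buckets. A minimal sketch of that baseline schedule (the function name and interface are illustrative, not from the paper):

```python
# Hypothetical sketch of the TWAP benchmark schedule (not the paper's code):
# a parent order of `total_shares` is split evenly across `n_slices` time
# buckets; earlier buckets absorb any remainder so sizes differ by at most 1.
def twap_schedule(total_shares: int, n_slices: int) -> list[int]:
    base, rem = divmod(total_shares, n_slices)
    return [base + (1 if i < rem else 0) for i in range(n_slices)]

print(twap_schedule(1000, 7))  # → [143, 143, 143, 143, 143, 143, 142]
```

An RL execution agent, by contrast, can deviate from this fixed schedule (and, per the paper's flexible action formulation, also choose order placement) while the imitation term in the reward keeps early training anchored to such a benchmark.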