Paper Title
Stock Trading Optimization through Model-based Reinforcement Learning with Resistance Support Relative Strength
Paper Authors
Paper Abstract
Reinforcement learning (RL) is attracting increasing attention from researchers in quantitative finance, as its agent-environment interaction framework aligns with the decision-making process in many business problems. Most current financial applications of RL are based on model-free methods, which still face stability and adaptivity challenges. As many cutting-edge model-based reinforcement learning (MBRL) algorithms mature in applications such as video games and robotics, we design a new approach that leverages resistance and support (RS) levels as a regularization term on actions in MBRL, improving the algorithm's efficiency and stability. The experimental results show that RS levels, used as a market-timing technique, enhance the performance of pure MBRL models across various measurements, yielding higher profit with less risk. Moreover, our proposed method even resists large drops (smaller maximum drawdown) during the COVID-19 pandemic, when financial markets underwent an unpredictable crisis. Why controlling for resistance and support levels can boost MBRL is also investigated through numerical experiments, such as the loss of the actor-critic network and the prediction error of the transition dynamics model. The results show that RS indicators indeed help MBRL algorithms converge faster in the early stages and attain smaller critic loss as training episodes increase.
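
The abstract does not spell out the exact form of the RS regularization, but a small sketch can make the idea concrete. The Python snippet below estimates resistance and support as rolling highs and lows of recent closing prices and damps a raw trading action that fights those levels. The function names (rs_levels, regularized_action), the window length, and the penalty weight are illustrative assumptions, not the paper's implementation.

import numpy as np

def rs_levels(close: np.ndarray, window: int = 20) -> tuple:
    """Estimate resistance/support as the rolling high/low of recent closes.

    NOTE: rolling extrema are one common heuristic for RS levels; the
    paper may derive them differently.
    """
    recent = close[-window:]
    return float(recent.max()), float(recent.min())  # (resistance, support)

def regularized_action(action: float, price: float, close: np.ndarray,
                       weight: float = 0.5) -> float:
    """Shrink a raw action in [-1, 1] (buy > 0, sell < 0) that fights RS levels.

    Buying near resistance or selling near support is penalized, on the
    intuition that prices tend to revert at these levels. The multiplicative
    penalty form and the weight are assumptions for illustration.
    """
    resistance, support = rs_levels(close)
    band = max(resistance - support, 1e-8)
    # Relative strength of the price within the support-resistance band, in [0, 1].
    strength = float(np.clip((price - support) / band, 0.0, 1.0))
    if action > 0:    # buying: penalize more as price nears resistance
        penalty = weight * strength
    else:             # selling: penalize more as price nears support
        penalty = weight * (1.0 - strength)
    return float(action * (1.0 - penalty))

# Example: a raw buy signal of 0.8 is damped because the price sits near
# the rolling high (resistance), so the regularized action is about 0.41.
prices = np.array([98.0, 99.5, 101.2, 100.8, 102.0, 101.5, 103.0, 102.4])
print(regularized_action(0.8, price=102.9, close=prices))

In an MBRL loop, such a penalty could equally be folded into the actor's loss rather than applied to the emitted action; the sketch only shows the market-timing intuition the abstract describes.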