Paper Title
Astock: A New Dataset and Automated Stock Trading based on Stock-specific News Analyzing Model
Paper Authors
Abstract
Natural Language Processing (NLP) demonstrates great potential to support financial decision-making by analyzing text from social media or news outlets. In this work, we build a platform to study NLP-aided stock auto-trading algorithms systematically. In contrast to previous work, our platform is characterized by three features: (1) We provide financial news for each specific stock. (2) We provide various stock factors for each stock. (3) We evaluate performance using more financially relevant metrics. Such a design allows us to develop and evaluate NLP-aided stock auto-trading algorithms in a more realistic setting. In addition to designing an evaluation platform and collecting a dataset, we also make a technical contribution by proposing a system that automatically learns a good feature representation from various input information. The key to our algorithm is a method called Semantic Role Labeling Pooling (SRLP), which leverages Semantic Role Labeling (SRL) to create a compact representation of each news paragraph. Based on SRLP, we further incorporate other stock factors to make the final prediction. In addition, we propose a self-supervised learning strategy based on SRLP to enhance the out-of-distribution generalization performance of our system. Through our experimental study, we show that the proposed method outperforms all the baselines in annualized rate of return and, in real trading, improves on the maximum drawdown of the CSI300 index and XIN9 index. Our Astock dataset and code are available at https://github.com/JinanZou/Astock.
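The two financial metrics named in the abstract, annualized rate of return and maximum drawdown, can be illustrated with a minimal sketch. This uses the standard textbook definitions and is not taken from the Astock code; the function names, the 252-trading-day convention, and the sample values are assumptions made for illustration only.

```python
from typing import Sequence

# Assumption: 252 trading days per year, a common convention.
# The paper's own convention may differ.
TRADING_DAYS_PER_YEAR = 252


def annualized_return(values: Sequence[float],
                      periods_per_year: int = TRADING_DAYS_PER_YEAR) -> float:
    """Geometric annualized return from a series of portfolio values.

    `values` holds the portfolio value at the end of each period,
    so len(values) - 1 periods have elapsed.
    """
    if len(values) < 2 or values[0] <= 0:
        raise ValueError("need at least two values and a positive start value")
    n_periods = len(values) - 1
    total_growth = values[-1] / values[0]
    return total_growth ** (periods_per_year / n_periods) - 1.0


def max_drawdown(values: Sequence[float]) -> float:
    """Largest peak-to-trough decline, as a fraction of the running peak."""
    if not values or values[0] <= 0:
        raise ValueError("need a non-empty series with a positive start value")
    peak = values[0]
    worst = 0.0
    for v in values:
        peak = max(peak, v)                    # highest value seen so far
        worst = max(worst, (peak - v) / peak)  # decline from that peak
    return worst


if __name__ == "__main__":
    # Hypothetical daily portfolio values, for illustration only.
    sample = [100.0, 120.0, 90.0, 110.0]
    print(annualized_return(sample))
    print(max_drawdown(sample))
```

A higher annualized return is better and a smaller maximum drawdown is better, so a strategy is usually judged on both together.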