Paper title
Duration modeling with semi-Markov Conditional Random Fields for keyphrase extraction
Paper authors
Abstract
Existing methods for keyphrase extraction require either preprocessing to generate candidate phrases or post-processing to convert keywords into keyphrases. In this paper, we propose a novel approach, duration modeling with semi-Markov Conditional Random Fields (DM-SMCRFs), for keyphrase extraction. First, based on the properties of semi-Markov chains, DM-SMCRFs can encode segment-level features and sequentially classify the phrases in a sentence as keyphrases or non-keyphrases. Second, by assuming independence between state transitions and state durations, DM-SMCRFs model the distribution of the duration (length) of keyphrases to further exploit state duration information, which can help identify the length of each keyphrase. Based on the convexity of the parametric duration feature derived from the duration distribution, a constrained Viterbi algorithm is derived to improve decoding performance in DM-SMCRFs. We thoroughly evaluate DM-SMCRFs on datasets from various domains. The experimental results demonstrate the effectiveness of the proposed model.
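
To make the decoding idea concrete, the following is a minimal sketch of generic semi-Markov (segment-level) Viterbi decoding with a separate duration term. It is not the constrained Viterbi algorithm of the paper, whose details the abstract does not give. The function names (`semi_markov_viterbi`, `seg_score`, `trans_score`, `dur_logprob`), the labels "K" and "O", the per-label maximum duration, and all toy scores are assumptions made for illustration only. The duration score is added independently of the transition score, mirroring the independence assumption stated above.

```python
import math


def semi_markov_viterbi(n, labels, seg_score, trans_score, dur_logprob, max_dur):
    """Return the highest-scoring labeled segmentation of tokens 0..n-1.

    best[j][y] is the best score of any segmentation of the first j tokens
    whose last segment carries label y. Segments are half-open spans (i, j).
    """
    neg_inf = -math.inf
    best = [{y: neg_inf for y in labels} for _ in range(n + 1)]
    back = [{y: None for y in labels} for _ in range(n + 1)]

    for j in range(1, n + 1):
        for y in labels:
            # Try every allowed duration d for a segment ending at position j.
            for d in range(1, min(max_dur[y], j) + 1):
                i = j - d
                local = seg_score(i, j, y) + dur_logprob(y, d)
                if i == 0:
                    # First segment of the sentence: no transition term.
                    if local > best[j][y]:
                        best[j][y] = local
                        back[j][y] = (0, None)
                    continue
                for prev in labels:
                    cand = best[i][prev] + trans_score(prev, y) + local
                    if cand > best[j][y]:
                        best[j][y] = cand
                        back[j][y] = (i, prev)

    # Follow back-pointers from the best final label.
    segments = []
    j = n
    y = max(labels, key=lambda lab: best[n][lab])
    while j > 0:
        i, prev = back[j][y]
        segments.append((i, j, y))
        j, y = i, prev
    segments.reverse()
    return segments


# Toy example (all scores below are made up for illustration).
tokens = ["we", "propose", "conditional", "random", "fields", "today"]
key_tokens = {"conditional", "random", "fields"}
# Hypothetical duration distributions: keyphrases ("K") span 1 to 3 tokens,
# non-keyphrase segments ("O") always span exactly one token.
duration_probs = {"K": {1: 0.3, 2: 0.4, 3: 0.3}, "O": {1: 1.0}}


def seg_score(i, j, label):
    # Segment-level feature score: sum of per-token scores inside the span.
    if label == "K":
        return sum(1.0 if t in key_tokens else -1.0 for t in tokens[i:j])
    return sum(-1.0 if t in key_tokens else 0.5 for t in tokens[i:j])


def trans_score(prev_label, label):
    return 0.0  # uniform transitions in this toy setup


def dur_logprob(label, d):
    return math.log(duration_probs[label][d])


segments = semi_markov_viterbi(
    len(tokens), ["K", "O"], seg_score, trans_score, dur_logprob,
    max_dur={"K": 3, "O": 1},
)
print(segments)  # [(0, 1, 'O'), (1, 2, 'O'), (2, 5, 'K'), (5, 6, 'O')]
```

In this toy setup the duration term is what makes the single three-token keyphrase beat three adjacent one-token keyphrases: the per-token scores are identical, but one segment pays one duration penalty instead of three. Bounding the search by a per-label maximum duration is a common simplification and only stands in for the paper's convexity-based constraint.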