Paper Title
TXtract: Taxonomy-Aware Knowledge Extraction for Thousands of Product Categories
Paper Authors
Abstract
Extracting structured knowledge from product profiles is crucial for various applications in e-Commerce. State-of-the-art approaches for knowledge extraction were each designed for a single category of product, and thus do not apply to real-life e-Commerce scenarios, which often contain thousands of diverse categories. This paper proposes TXtract, a taxonomy-aware knowledge extraction model that applies to thousands of product categories organized in a hierarchical taxonomy. Through category conditional self-attention and multi-task learning, our approach is both scalable, as it trains a single model for thousands of categories, and effective, as it extracts category-specific attribute values. Experiments on products from a taxonomy with 4,000 categories show that TXtract outperforms state-of-the-art approaches by up to 10% in F1 and 15% in coverage across all categories.