TXtract: Taxonomy-Aware Knowledge Extraction for Thousands of Product Categories

Paper title

TXtract: Taxonomy-Aware Knowledge Extraction for Thousands of Product Categories

Authors

Giannis Karamanolakis, Jun Ma, Xin Luna Dong

Abstract

Extracting structured knowledge from product profiles is crucial for various applications in e-Commerce. State-of-the-art approaches for knowledge extraction were each designed for a single category of product, and thus do not apply to real-life e-Commerce scenarios, which often contain thousands of diverse categories. This paper proposes TXtract, a taxonomy-aware knowledge extraction model that applies to thousands of product categories organized in a hierarchical taxonomy. Through category conditional self-attention and multi-task learning, our approach is both scalable, as it trains a single model for thousands of categories, and effective, as it extracts category-specific attribute values. Experiments on products from a taxonomy with 4,000 categories show that TXtract outperforms state-of-the-art approaches by up to 10% in F1 and 15% in coverage across all categories.
