Title


Deep multi-modal networks for book genre classification based on its cover

Authors

Chandra Kundu, Lukun Zheng

Abstract


Book covers are usually the very first impression on their readers, and they often convey important information about the content of the book. Book genre classification based on the cover would be highly beneficial to many modern retrieval systems, considering that the complete digitization of books is an extremely expensive task. At the same time, it is also an extremely challenging task for the following reasons: First, there exists a wide variety of book genres, many of which are not concretely defined. Second, book covers, as graphic designs, vary in many different ways, such as colors, styles, textual information, etc., even for books of the same genre. Third, book cover designs may vary due to many external factors such as country, culture, target reader population, etc. With the growing competitiveness of the book industry, book cover designers and typographers push cover designs to their limits in the hope of attracting sales. Cover-based book classification systems have become a particularly exciting research topic in recent years. In this paper, we propose a multi-modal deep learning framework to solve this problem. The contribution of this paper is four-fold. First, our method adds an extra modality by extracting text automatically from the book covers. Second, state-of-the-art image-based and text-based models are evaluated thoroughly for the task of book cover classification. Third, we develop an efficient and scalable multi-modal framework based only on the images and text shown on the covers. Fourth, a thorough analysis of the experimental results is given, and future work to improve the performance is suggested. The results show that the multi-modal framework significantly outperforms the current state-of-the-art image-based models. However, more effort and resources are needed for this classification task to reach a satisfactory level.
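To make the kind of multi-modal framework described above more concrete, below is a minimal, illustrative PyTorch sketch of a late-fusion classifier: a convolutional branch over the cover image, a simple embedding branch over the text extracted from the cover (e.g., by OCR), and a classification head over the concatenated features. The specific choices (ResNet-50, averaged word embeddings, layer sizes, the CoverGenreClassifier name, and the dummy genre count) are assumptions for illustration only, not the authors' actual architecture.

# Illustrative sketch only -- NOT the authors' actual model.
# Assumes a ResNet-50 image branch, an averaged-embedding text branch
# over OCR token ids, and late fusion by feature concatenation.
import torch
import torch.nn as nn
from torchvision import models


class CoverGenreClassifier(nn.Module):
    """Two-branch model: cover image + text extracted from the cover."""

    def __init__(self, vocab_size: int, num_genres: int, text_dim: int = 128):
        super().__init__()
        # Image branch: ResNet-50 with its classification head removed.
        # (Pretrained ImageNet weights would normally be loaded here.)
        resnet = models.resnet50(weights=None)
        self.image_branch = nn.Sequential(*list(resnet.children())[:-1])
        image_dim = resnet.fc.in_features  # 2048 for ResNet-50

        # Text branch: embed OCR token ids and average them -- a simple
        # stand-in for a stronger text model.
        self.embedding = nn.Embedding(vocab_size, text_dim, padding_idx=0)
        self.text_proj = nn.Linear(text_dim, text_dim)

        # Fusion head: concatenate both modalities and predict the genre.
        self.classifier = nn.Sequential(
            nn.Linear(image_dim + text_dim, 256),
            nn.ReLU(),
            nn.Dropout(0.3),
            nn.Linear(256, num_genres),
        )

    def forward(self, cover_image: torch.Tensor, text_ids: torch.Tensor) -> torch.Tensor:
        img_feat = self.image_branch(cover_image).flatten(1)          # (B, 2048)
        txt_feat = self.text_proj(self.embedding(text_ids).mean(dim=1))  # (B, text_dim)
        fused = torch.cat([img_feat, txt_feat], dim=1)
        return self.classifier(fused)                                  # genre logits


# Example forward pass with dummy data (hypothetical vocabulary/genre sizes).
model = CoverGenreClassifier(vocab_size=10_000, num_genres=30)
logits = model(torch.randn(4, 3, 224, 224), torch.randint(1, 10_000, (4, 32)))

Late fusion by concatenation is only one design choice; the same two branches could instead be combined with attention or trained separately and ensembled.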
