Paper title
Self-Training of Handwritten Word Recognition for Synthetic-to-Real Adaptation
Paper authors
Paper abstract
The performance of Handwritten Text Recognition (HTR) models is largely determined by the availability of labeled and representative training samples. However, in many application scenarios labeled samples are scarce or costly to obtain. In this work, we propose a self-training approach that trains an HTR model solely on synthetic samples and unlabeled data. The proposed training scheme uses an initial model trained on synthetic data to make predictions for the unlabeled target dataset. Starting from this initial model, which performs rather poorly, we show that considerable adaptation is possible by training against the predicted pseudo-labels. Moreover, the investigated self-training strategy does not require any manually annotated training samples. We evaluate the proposed method on four widely used benchmark datasets and show its effectiveness in closing the gap to a model trained in a fully supervised manner.
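The self-training scheme described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the HTR network is replaced here by a toy nearest-centroid classifier on one-dimensional features, and the names `fit`, `predict`, `self_train`, the number of rounds, and the choice to retrain on pseudo-labels alone are all assumptions made for illustration.

```python
from collections import defaultdict


def fit(samples):
    """Train the stand-in model: one centroid (mean feature) per label."""
    groups = defaultdict(list)
    for x, label in samples:
        groups[label].append(x)
    return {label: sum(xs) / len(xs) for label, xs in groups.items()}


def predict(model, x):
    """Return the label whose centroid is closest to x."""
    return min(model, key=lambda label: abs(model[label] - x))


def self_train(synthetic, unlabeled, rounds=3):
    """Hypothetical self-training loop (assumed structure, not the paper's code)."""
    # Step 1: initial model trained on labeled synthetic samples only.
    model = fit(synthetic)
    for _ in range(rounds):
        # Step 2: predict pseudo-labels for the unlabeled target data.
        pseudo = [(x, predict(model, x)) for x in unlabeled]
        # Step 3: retrain against the predicted pseudo-labels.
        model = fit(pseudo)
    return model


# Synthetic data is centered at 0 ("a") and 10 ("b");
# the unlabeled target data is shifted by about +3.
synthetic = [(-1.0, "a"), (1.0, "a"), (9.0, "b"), (11.0, "b")]
unlabeled = [2.0, 3.0, 4.0, 12.0, 13.0, 14.0]

initial = fit(synthetic)
adapted = self_train(synthetic, unlabeled)
print(initial)
print(adapted)
```

In this toy setting the centroids move from the synthetic distribution toward the target distribution without any manually annotated target sample. A real HTR setup would substitute a sequence recognition model and its training routine for `fit` and `predict`.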