Paper title

HKR For Handwritten Kazakh & Russian Database

Authors

Nurseitov, Daniyar; Bostanbekov, Kairat; Kurmankhojayev, Daniyar; Alimova, Anel; Abdallah, Abdelrahman

Abstract

In this paper, we present a new Russian and Kazakh database for offline handwriting recognition, in which about 95% of the words/sentences are Russian and about 5% are Kazakh. A few pre-processing and segmentation procedures have been developed together with the database. Both languages are written in Cyrillic script and share the same 33 characters; the Kazakh alphabet contains 9 additional specific characters. The dataset is a collection of forms. The sources of all the forms were generated with LaTeX and subsequently filled out by people in their own handwriting. The database consists of more than 1400 filled forms, containing approximately 63000 sentences and more than 715699 symbols produced by approximately 200 different writers. It can serve researchers working on handwriting recognition tasks with deep learning and machine learning methods.
