Paper Title

Word Segmentation from Unconstrained Handwritten Bangla Document Images using Distance Transform

Authors

Pawan Kumar Singh, Shubham Sinha, Sagnik Pal Chowdhury, Ram Sarkar, Mita Nasipuri

Abstract

Segmentation of handwritten document images into text lines and words is one of the most significant and challenging tasks in the development of a complete Optical Character Recognition (OCR) system. This paper addresses the automatic segmentation of text words directly from unconstrained handwritten Bangla document images. The popular distance transform (DT) algorithm is applied to locate the outer boundary of each word image. The technique does not produce over-segmented words. A simple post-processing procedure is applied to isolate under-segmented word images, if any. The proposed technique is tested on 50 random images taken from the CMATERdb1.1.1 database. It achieves a segmentation accuracy of 91.88%, which confirms the robustness of the proposed methodology.
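The abstract does not say how the distance transform is turned into word boundaries, so the following is only a minimal sketch of one common way to do it, not the authors' implementation. It assumes a binarized page (True = ink), computes the Euclidean distance from each background pixel to the nearest ink pixel with SciPy, and treats strokes whose surrounding bands touch as one word. The function name `segment_words` and the `max_gap` parameter are hypothetical, and the post-processing step for under-segmented words is not covered.

```python
import numpy as np
from scipy import ndimage


def segment_words(binary_page, max_gap=3.0):
    """Return tight bounding boxes of word candidates.

    binary_page: 2-D array, truthy where there is ink.
    max_gap:     assumed threshold (in pixels); strokes separated by
                 roughly 2 * max_gap or less end up in the same word.
    """
    ink = np.asarray(binary_page, dtype=bool)

    # Distance of every pixel to the nearest ink pixel (0 on ink itself).
    dist = ndimage.distance_transform_edt(~ink)

    # Pixels within max_gap of ink form bands around the strokes;
    # bands of nearby strokes merge into a single connected region.
    bands = dist <= max_gap
    labels, _ = ndimage.label(bands)

    # Keep labels only on ink so each box is tight around the strokes.
    ink_labels = np.where(ink, labels, 0)

    # One (row_slice, col_slice) tuple per word candidate.
    return ndimage.find_objects(ink_labels)
```

Choosing `max_gap` is the critical design decision in a sketch like this: too small and characters within a word separate, too large and neighbouring words merge. A practical value would have to be tuned on the target handwriting.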
