DEVNAGARI DOCUMENT SEGMENTATION USING HISTOGRAM APPROACH
V. Dongre and V. Mankar. International Journal of Computer Science, Engineering and Information Technology (IJCSEIT), 1(3): 46-53 (August 2011)
DOI: 10.5121/ijcseit.2011.1305
Abstract
Document segmentation is one of the critical phases in machine recognition of any language. Correct
segmentation of individual symbols determines the accuracy of the character recognition technique.
Segmentation decomposes an image of a sequence of characters into sub-images of individual symbols by
segmenting lines and words. Devnagari is the most widely used script in India. It is used for writing the
Hindi, Marathi, Sanskrit and Nepali languages. Moreover, Hindi is the third most popular language in the
world. Devnagari documents consist of vowels, consonants and various modifiers; hence, proper segmentation
of Devnagari words is challenging. This paper proposes a simple histogram-based approach to segmenting
Devnagari documents and discusses various challenges in segmenting Devnagari script.
%0 Journal Article
%1 noauthororeditor
%A Dongre, Vikas J
%A Mankar, Vijay H
%D 2011
%J International Journal of Computer Science, Engineering and Information Technology (IJCSEIT)
%K Character Devnagari Line Machine Recognition Word learning paragraph segmentation
%N 3
%P 46-53
%R 10.5121/ijcseit.2011.1305
%T DEVNAGARI DOCUMENT SEGMENTATION USING HISTOGRAM APPROACH
%U http://airccse.org/journal/ijcseit/papers/0811ijcseit05.pdf
%V 1
%X Document segmentation is one of the critical phases in machine recognition of any language. Correct
segmentation of individual symbols determines the accuracy of the character recognition technique.
Segmentation decomposes an image of a sequence of characters into sub-images of individual symbols by
segmenting lines and words. Devnagari is the most widely used script in India. It is used for writing the
Hindi, Marathi, Sanskrit and Nepali languages. Moreover, Hindi is the third most popular language in the
world. Devnagari documents consist of vowels, consonants and various modifiers; hence, proper segmentation
of Devnagari words is challenging. This paper proposes a simple histogram-based approach to segmenting
Devnagari documents and discusses various challenges in segmenting Devnagari script.
@article{noauthororeditor,
abstract = {Document segmentation is one of the critical phases in machine recognition of any language. Correct
segmentation of individual symbols determines the accuracy of the character recognition technique.
Segmentation decomposes an image of a sequence of characters into sub-images of individual symbols by
segmenting lines and words. Devnagari is the most widely used script in India. It is used for writing the
Hindi, Marathi, Sanskrit and Nepali languages. Moreover, Hindi is the third most popular language in the
world. Devnagari documents consist of vowels, consonants and various modifiers; hence, proper segmentation
of Devnagari words is challenging. This paper proposes a simple histogram-based approach to segmenting
Devnagari documents and discusses various challenges in segmenting Devnagari script.},
added-at = {2018-11-23T08:26:43.000+0100},
author = {Dongre, Vikas J and Mankar, Vijay H},
biburl = {https://www.bibsonomy.org/bibtex/2108d9c71ebf4a83e13f48d2fde77cc76/ijcseit},
doi = {10.5121/ijcseit.2011.1305},
interhash = {cd93706baa6f7c6a4f48a1732c9785b5},
intrahash = {108d9c71ebf4a83e13f48d2fde77cc76},
issn = {2231-3117 [Online] ; 2231-3605 [Print]},
journal = {International Journal of Computer Science, Engineering and Information Technology (IJCSEIT)},
keywords = {Character Devnagari Line Machine Recognition Word learning paragraph segmentation},
language = {English},
month = {August},
number = 3,
pages = {46--53},
timestamp = {2018-11-23T08:26:43.000+0100},
title = {DEVNAGARI DOCUMENT SEGMENTATION USING HISTOGRAM APPROACH},
url = {http://airccse.org/journal/ijcseit/papers/0811ijcseit05.pdf},
volume = 1,
year = 2011
}