Text Detection
ANANTH SHETTY K R
USN: 4JC13LIE02
Under the Guidance of,
Mr. Shreekanth. T
Assistant Professor
ABSTRACT
A Text is very much interesting prospect which provides clues depending on the context for the
Detection of Text under complex background has become a challenging task due to variations in font
The proposed method can handle documents having widely varying text sizes unlike other existing
Existing methods for scene text detection tend to be slow as the image has to be processed in multiple
scales.
Algorithms such as MSER detect a large number of non-characters, and rule-based
methods generally require hand-tuned parameters, which is time-consuming and error-prone.
The clustering based method shows good performance but it is complicated by incorporating a
This motivated us to propose a robust and accurate scene text detection method. The proposed
INTRODUCTION
Digital cameras are compact, easy to use, portable and offer a high-speed non-contact mechanism for image
acquisition.
Their ability to capture non-paper document images such as scene text has several potential applications, including
licence plate recognition, road sign recognition, digital note taking, document archiving and wearable
computing.
Camera images suffer from uneven lighting, low resolution, blur, and perspective distortion.
Overcoming these challenges will help us effortlessly acquire and manage information in documents.
In document processing systems, a binarization process precedes the analysis and recognition procedures.
The use of two-level information greatly reduces the computational load and the complexity of the analysis
algorithms. It is critical to achieve robust binarization since any error introduced in this stage will affect the
subsequent processing steps.
This project focuses on a novel method to binarize camera-captured color document images,
whereby the foreground text is output as black and the background as white, irrespective of the
original polarity of foreground-background shades.
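As a rough illustration of polarity-independent output, the sketch below takes an already binarized image and inverts it when needed, so that text ends up black (0) and background white (1). It rests on a loudly stated assumption that is not taken from this project: the background is assumed to cover more pixels than the text. The function name normalize_polarity is hypothetical.

```python
from typing import List

Binary = List[List[int]]  # 1 = bright pixel, 0 = dark pixel


def normalize_polarity(binary: Binary) -> Binary:
    """Return an image with text as 0 (black) and background as 1 (white).

    ASSUMPTION (not from the source): the background occupies the majority
    of pixels, so the minority class is treated as text.
    """
    total = sum(len(row) for row in binary)
    bright = sum(sum(row) for row in binary)
    if bright * 2 < total:
        # Dark pixels dominate, so the background is dark: invert the image.
        return [[1 - v for v in row] for row in binary]
    # Bright pixels dominate: the polarity is already text-black-on-white.
    return [row[:] for row in binary]


light_on_dark = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
dark_on_light = [[1, 1, 1], [1, 0, 1], [1, 1, 1]]
print(normalize_polarity(light_on_dark))
print(normalize_polarity(dark_on_light))
```

Both inputs map to the same output polarity. A majority vote is only a heuristic and can fail on images where text covers most of the frame.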
LITERATURE REVIEW
One binarization method is global thresholding, which uses a single threshold
to classify image pixels into foreground or background classes. Global thresholding techniques
are generally based on histogram analysis [4, 6]. They work well for images with well-separated
foreground and background intensities.
On the other hand, local methods use a dynamic threshold across the image according to the
local information. These approaches are generally window-based and the local threshold for a
pixel is computed from the gray values of the pixels within a window centered at that particular
pixel.
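The difference between the two families can be sketched in a few lines of plain Python. The global variant below uses Otsu's histogram criterion as one well-known example of histogram-based threshold selection; the local variant uses the mean of a window centered at each pixel. Both are illustrative choices rather than the specific methods of the cited works, and the function names, the radius and offset parameters, and the convention that 1 marks pixels brighter than the threshold are assumptions.

```python
from typing import List

Image = List[List[int]]  # grayscale values in 0..255


def otsu_threshold(img: Image) -> int:
    """Global threshold that maximizes the between-class variance."""
    hist = [0] * 256
    for row in img:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels with value <= t
    sum0 = 0  # sum of the gray values of those pixels
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue
        m0, m1 = sum0 / w0, (sum_all - sum0) / w1
        var_between = w0 * w1 * (m0 - m1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def global_binarize(img: Image, t: int) -> Image:
    """Apply one threshold to every pixel."""
    return [[1 if v > t else 0 for v in row] for row in img]


def local_mean_binarize(img: Image, radius: int = 1, offset: float = 0.0) -> Image:
    """Threshold each pixel against the mean of its (2*radius+1)^2 window."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = [img[j][i]
                      for j in range(max(0, y - radius), min(h, y + radius + 1))
                      for i in range(max(0, x - radius), min(w, x + radius + 1))]
            t = sum(window) / len(window) - offset
            out[y][x] = 1 if img[y][x] > t else 0
    return out


two_tone = [[10, 10, 200, 200], [10, 10, 200, 200]]
print(global_binarize(two_tone, otsu_threshold(two_tone)))
```

The local variant recomputes the threshold for every pixel, which is what lets it follow uneven lighting, at the cost of more computation than a single global threshold.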
MOTIVATION
The methods mentioned above have some flaws,
which this project aims to overcome by employing edge-enhanced Maximally Stable Extremal Regions (MSER) as basic letter
candidates.
A disadvantage of the MSER is that it detects a lot of false positives
aspect ratio.
OBJECTIVES
To validate the performance of our proposed system, we use the metrics
defined in [15] and run our algorithm on the ICDAR competition dataset.
The text detecting performance is evaluated by calculating the precision and
PROBLEM STATEMENT
The text detection stage seeks to detect the presence of text in a given image. Connected-component (CC) based
methods are used to identify all regions in the image.
A geometrical analysis is needed to merge the text components using the spatial
arrangement of the components so as to filter out non-text components and mark the
boundaries of the text regions.
Geometric as well as stroke-width information is then applied to perform filtering
and pairing of CCs.
Finally, letters are clustered into lines and additional checks are performed to eliminate
false positives.
Hence, this may be considered a robust approach for text detection using edge-enhanced
MSER letters as basic letter candidates.
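The connected-component and geometric filtering steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's implementation: the input is a binary mask in which 1 marks candidate pixels, components are 4-connected, and the function names and the min_area and max_aspect thresholds are hypothetical values chosen for the example. Stroke-width filtering and line clustering are not covered.

```python
from collections import deque
from typing import Dict, List

Mask = List[List[int]]  # 1 = candidate (foreground) pixel, 0 = background


def connected_components(mask: Mask) -> List[Dict]:
    """Label 4-connected regions; return bounding box and area for each."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    comps = []
    for y in range(h):
        for x in range(w):
            if mask[y][x] != 1 or seen[y][x]:
                continue
            # Breadth-first flood fill starting from an unvisited pixel.
            queue = deque([(x, y)])
            seen[y][x] = True
            x0 = x1 = x
            y0 = y1 = y
            area = 0
            while queue:
                cx, cy = queue.popleft()
                area += 1
                x0, x1 = min(x0, cx), max(x1, cx)
                y0, y1 = min(y0, cy), max(y1, cy)
                for nx, ny in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                    if 0 <= nx < w and 0 <= ny < h and mask[ny][nx] == 1 and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            comps.append({"box": (x0, y0, x1, y1), "area": area})
    return comps


def filter_by_geometry(comps: List[Dict], min_area: int = 3,
                       max_aspect: float = 4.0) -> List[Dict]:
    """Keep components whose size and aspect ratio are plausible for a letter.

    The thresholds are illustrative assumptions, not tuned values.
    """
    kept = []
    for c in comps:
        x0, y0, x1, y1 = c["box"]
        width, height = x1 - x0 + 1, y1 - y0 + 1
        aspect = max(width, height) / min(width, height)
        if c["area"] >= min_area and aspect <= max_aspect:
            kept.append(c)
    return kept


mask = [
    [1, 1, 0, 0, 0, 0, 0, 0],
    [1, 1, 0, 0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1, 1, 1],
]
print(filter_by_geometry(connected_components(mask)))
```

In this toy mask the isolated pixel is rejected for its small area and the long horizontal bar for its extreme aspect ratio, leaving only the compact 2x2 blob as a letter candidate.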
METHODOLOGY
SOFTWARE REQUIREMENTS
The framework is implemented in MATLAB on a 64-bit system with a 1.8 GHz
multi-core processor. Different types of images are considered for the
experiment.
The implementation also considers images with a single text instance, multiple text instances,
text in different font sizes, text on complex and simple
backgrounds, text in different languages, and images taken with a camera or
mobile phone.
REFERENCES
[1] S. S. Tsai, D. Chen, V. Chandrasekhar, G. Takacs, N. M. Cheung, R. Vedantham, R.
Outdoors augmented reality on mobile phone using loxel-based visual feature organization, in Proc. ACM Multimedia
Information Retrieval, 2008, pp. 427-434.
[4] D. G. Lowe, Distinctive image features from scale-invariant keypoints, International Journal of Computer Vision, vol. 60,
[9] J. Liang, D. Doermann, and H. P. Li, Camera-based analysis of text and documents: a