Text Detection

The document summarizes a proposed text detection algorithm. It begins with an abstract describing the challenges of text detection under complex backgrounds and variations in font, style, language and orientation. It then discusses existing methods and their limitations in being slow, detecting non-text regions, and requiring hand-tuned parameters. The proposed method aims to address these issues. It uses edge-enhanced Maximally Stable Extremal Regions (MSER) as letter candidates and improves accuracy by filtering regions based on aspect ratio. The objectives are to evaluate performance on standard datasets and the problem is robust text detection from camera images.

Uploaded by

Krishna Shetty

PRESENTATION BY,

ANANTH SHETTY K R
USN: 4JC13LIE02
Under the Guidance of,
Mr. Shreekanth. T
Assistant Professor

ABSTRACT
Text in an image is a valuable cue: it provides context-dependent clues about the objects that appear in the image.

Detecting text against a complex background is challenging because of variations in font, style, language and orientation.

Unlike existing local binarization methods, the proposed method can handle documents with widely varying text sizes.

Existing methods for scene text detection tend to be slow because the image has to be processed at multiple scales.

Algorithms such as MSER detect a large number of non-character regions, and rule-based methods generally require hand-tuned parameters, which is time-consuming and error-prone.

Clustering-based methods show good performance, but they are complicated by a second processing stage after minimum spanning tree clustering.

This motivated us to propose a robust and accurate scene text detection method that aims to avoid the above complications.

INTRODUCTION
Digital cameras are compact, easy to use and portable, and they offer a high-speed, non-contact mechanism for image acquisition.

Their ability to capture non-paper document images such as scene text has several potential applications, including licence plate recognition, road sign recognition, digital note taking, document archiving and wearable computing.

Camera images suffer from uneven lighting, low resolution, blur, and perspective distortion.

Overcoming these challenges will help us effortlessly acquire and manage information in documents.

In document processing systems, a binarization process precedes the analysis and recognition procedures.

The use of two-level information greatly reduces the computational load and the complexity of the analysis algorithms. Robust binarization is critical, since any error introduced at this stage will affect the subsequent processing steps.

Text is the most important information in a document.

This project focuses on a novel method to binarize camera-captured color document images, in which the foreground text is output as black and the background as white, irrespective of the original polarity of the foreground and background shades.
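The polarity requirement can be illustrated with a minimal Python sketch. The slides do not say how the original polarity is determined, so the rule used here is an assumption: text is taken to cover fewer pixels than the background, and the image is inverted when black pixels are in the majority. The function name `normalize_polarity` is hypothetical.

```python
def normalize_polarity(binary):
    """Return a copy of a binarized image with black text on white.

    `binary` is a 2-D list holding only 0 (black) and 255 (white).
    Assumption (not from the slides): text covers fewer pixels than
    the background, so a mostly black image is treated as light text
    on a dark background and is inverted.
    """
    total = sum(len(row) for row in binary)
    black = sum(1 for row in binary for v in row if v == 0)
    if black > total / 2:
        # Background is dark: invert so that text becomes black.
        return [[255 - v for v in row] for row in binary]
    return [row[:] for row in binary]


# Light text (one white pixel) on a dark background gets inverted.
light_on_dark = [[0, 0, 0],
                 [0, 255, 0],
                 [0, 0, 0]]
print(normalize_polarity(light_on_dark))
```

A majority vote is only one possible heuristic; it fails for images in which text fills most of the frame.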

A block diagram of the proposed text detection algorithm is shown in Figure 1.

Figure 1: Block diagram of the proposed text detection algorithm.

LITERATURE REVIEW
One binarization method is global thresholding, which uses a single threshold to classify image pixels into foreground or background classes. Global thresholding techniques are generally based on histogram analysis [4, 6]. They work well for images with well-separated foreground and background intensities.
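As an illustration of histogram-based global thresholding, the following Python sketch picks a single threshold by maximizing the between-class variance of the gray-level histogram (Otsu's criterion). The slides do not name a specific global method, so Otsu's criterion is an assumed example, and the function names are hypothetical.

```python
def otsu_threshold(pixels, levels=256):
    """Return the gray level t that maximizes between-class variance.

    Pixels with value <= t form one class and pixels > t the other.
    `pixels` is a flat list of integers in the range [0, levels).
    """
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels in the low class
    sum0 = 0  # sum of gray values in the low class
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0:
            continue
        if w1 == 0:
            break
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        var_between = w0 * w1 * (mean0 - mean1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize_global(pixels, t):
    """Label each pixel 1 if it is brighter than t, else 0."""
    return [1 if p > t else 0 for p in pixels]


pixels = [10, 12, 11, 200, 205, 198]
t = otsu_threshold(pixels)
print(t, binarize_global(pixels, t))
```

Because one threshold is applied to the whole image, this works only when the two intensity populations are well separated everywhere, which uneven lighting can violate.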

Local methods, on the other hand, use a threshold that varies across the image according to local information. These approaches are generally window-based: the local threshold for a pixel is computed from the gray values of the pixels within a window centered at that pixel.
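A window-based local threshold can be sketched in Python as follows. This is a minimal example that uses the window mean as the threshold; the slides do not specify which local statistic is used, and the `radius` and `offset` parameters are hypothetical.

```python
def local_mean_threshold(image, radius=1, offset=0):
    """Binarize a 2-D list of gray values with a per-pixel threshold.

    The threshold for each pixel is the mean gray value inside the
    (2*radius+1) x (2*radius+1) window centered on it (clipped at the
    image border), minus `offset`. Returns 1 where the pixel is
    brighter than its local threshold, else 0.
    """
    rows, cols = len(image), len(image[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            r0, r1 = max(0, r - radius), min(rows, r + radius + 1)
            c0, c1 = max(0, c - radius), min(cols, c + radius + 1)
            window = [image[i][j]
                      for i in range(r0, r1)
                      for j in range(c0, c1)]
            threshold = sum(window) / len(window) - offset
            out[r][c] = 1 if image[r][c] > threshold else 0
    return out


# A single dark pixel on a brighter background is labeled 0.
image = [[100, 100, 100],
         [100, 20, 100],
         [100, 100, 100]]
print(local_mean_threshold(image))
```

The window size matters: a window much smaller than the stroke width sees only text or only background, which is one reason such methods struggle with widely varying text sizes.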


MOTIVATION
The methods mentioned above have some flaws, which this project aims to overcome by employing edge-enhanced Maximally Stable Extremal Regions (MSER) as basic letter candidates.

A disadvantage of MSER is that it detects many false positives, that is, regions that do not contain characters.

To address this problem, the proposed algorithm improves the accuracy of finding character regions.

The main idea is to eliminate regions with a very small or very large aspect ratio.
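The aspect-ratio idea can be sketched in a few lines of Python. The slides give no numeric limits, so `min_ratio` and `max_ratio` are hypothetical values, and the bounding-box format `(x, y, width, height)` is an assumption.

```python
def filter_by_aspect_ratio(boxes, min_ratio=0.1, max_ratio=10.0):
    """Keep candidate regions whose width/height ratio is plausible.

    `boxes` is a list of (x, y, width, height) bounding boxes of
    candidate regions. Boxes with a ratio outside
    [min_ratio, max_ratio], or with a non-positive size, are dropped.
    """
    kept = []
    for (x, y, w, h) in boxes:
        if w <= 0 or h <= 0:
            continue
        ratio = w / h
        if min_ratio <= ratio <= max_ratio:
            kept.append((x, y, w, h))
    return kept


candidates = [
    (0, 0, 10, 20),   # ratio 0.5: letter-like, kept
    (5, 5, 200, 4),   # ratio 50: long thin line, dropped
    (9, 9, 2, 60),    # ratio 0.033: tall sliver, dropped
]
print(filter_by_aspect_ratio(candidates))
```

In practice the limits would need tuning on the target data, since narrow letters such as "l" or "I" have small ratios.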


OBJECTIVES
To validate the performance of the proposed system, we use the metrics defined in [15] and run our algorithm on the ICDAR competition dataset.

Text detection performance is evaluated by calculating the precision and recall rates and comparing them with those of other methods.

The performance of the text detection algorithm is also evaluated by checking whether the bounding boxes around the title text are detected correctly.

We use a stringent criterion and declare a title correctly detected only when all letters within the title are detected.
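The evaluation can be illustrated with a simplified Python sketch. It uses plain count-based precision and recall, which may differ from the exact metric definitions in [15], and it models the stringent title criterion with hypothetical letter identifiers.

```python
def precision_recall(num_correct, num_detected, num_ground_truth):
    """Count-based precision and recall.

    precision = correct detections / all detections
    recall    = correct detections / all ground-truth items
    A zero denominator yields 0.0.
    """
    precision = num_correct / num_detected if num_detected else 0.0
    recall = num_correct / num_ground_truth if num_ground_truth else 0.0
    return precision, recall


def title_detected(title_letters, detected_letters):
    """Stringent criterion: every letter of the title must be detected."""
    return all(letter in detected_letters for letter in title_letters)


# 10 boxes reported, 8 of them correct, 16 boxes in the ground truth.
print(precision_recall(8, 10, 16))

# Letter identifiers are hypothetical labels for ground-truth letters.
print(title_detected(["T1", "T2", "T3"], {"T1", "T2"}))
```

Under the stringent criterion a title with one missed letter counts as a miss, so the reported rate is a lower bound on what a per-letter measure would give.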


PROBLEM STATEMENT
The text detection stage seeks to detect the presence of text in a given image. Connected component (CC) based methods are used to identify all regions in the image.
A geometric analysis is needed to merge the text components using their spatial arrangement, so as to filter out non-text components and mark the boundaries of the text regions.
Geometric and stroke width information is then applied to filter and pair the CCs.
Finally, letters are clustered into lines and additional checks are performed to eliminate false positives.
This may therefore be considered a robust approach to text detection that uses edge-enhanced MSER regions as basic letter candidates.
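The last step, clustering letters into lines, can be sketched in Python under simple assumptions. The slides do not describe the grouping rule, so this example uses vertical overlap between neighboring bounding boxes and discards lines with too few letters as one possible additional check. The parameters `min_overlap` and `min_letters` are hypothetical, and stroke width and horizontal spacing are not modeled.

```python
def vertical_overlap(a, b):
    """Height in pixels shared by two (x, y, width, height) boxes."""
    top = max(a[1], b[1])
    bottom = min(a[1] + a[3], b[1] + b[3])
    return max(0, bottom - top)


def group_into_lines(boxes, min_overlap=0.5, min_letters=2):
    """Greedily cluster letter boxes into text lines.

    Boxes are visited from left to right. A box joins the first line
    whose most recent box overlaps it vertically by at least
    `min_overlap` times the smaller of the two heights; otherwise it
    starts a new line. Lines with fewer than `min_letters` boxes are
    discarded as likely false positives.
    """
    lines = []
    for box in sorted(boxes, key=lambda b: b[0]):
        for line in lines:
            last = line[-1]
            needed = min_overlap * min(last[3], box[3])
            if vertical_overlap(last, box) >= needed:
                line.append(box)
                break
        else:
            lines.append([box])
    return [line for line in lines if len(line) >= min_letters]


letters = [(0, 0, 10, 20), (12, 2, 10, 20), (24, 1, 10, 20),
           (5, 100, 10, 20)]  # the last box is an isolated region
print(group_into_lines(letters))
```

The isolated box at y = 100 shares no vertical extent with the other three, forms a single-letter line, and is removed by the `min_letters` check.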


METHODOLOGY


SOFTWARE REQUIREMENTS
The framework is implemented in MATLAB on a 64-bit system with a 1.8 GHz multi-core processor, and different types of images are considered in the experiments.

The implementation also considers images with a single text instance or multiple text instances, text with different font sizes, text on simple and complex backgrounds, text in different languages, and images taken with a camera or mobile phone.


REFERENCES
[1] S. S. Tsai, D. Chen, V. Chandrasekhar, G. Takacs, N. M. Cheung, R. Vedantham, R. Grzeszczuk, and B. Girod, "Mobile product recognition," in Proc. ACM Multimedia, 2010.
[2] D. Chen, S. S. Tsai, C. H. Hsu, K. Kim, J. P. Singh, and B. Girod, "Building book inventories using smartphones," in Proc. ACM Multimedia, 2010.
[3] G. Takacs, Y. Xiong, R. Grzeszczuk, V. Chandrasekhar, W. Chen, L. Pulli, N. Gelfand, T. Bismpigiannis, and B. Girod, "Outdoors augmented reality on mobile phone using loxel-based visual feature organization," in Proc. ACM Multimedia Information Retrieval, 2008, pp. 427–434.
[4] D. G. Lowe, "Distinctive image features from scale-invariant keypoints," International Journal of Computer Vision, vol. 60, pp. 91–110, 2004.
[5] H. Bay, A. Ess, T. Tuytelaars, and L. Van Gool, "Speeded-up robust features (SURF)," Computer Vision and Image Understanding, vol. 110, no. 3, pp. 346–359, 2008.
[6] V. Chandrasekhar, G. Takacs, D. Chen, S. Tsai, R. Grzeszczuk, and B. Girod, "CHoG: Compressed histogram of gradients - a low bit-rate feature descriptor," in CVPR, 2009, pp. 2504–2511.
[7] D. Nister and H. Stewenius, "Scalable recognition with a vocabulary tree," in CVPR, 2006, pp. 2161–2168.
[8] D. M. Chen, S. S. Tsai, V. Chandrasekhar, G. Takacs, R. Vedantham, R. Grzeszczuk, and B. Girod, "Inverted index compression for scalable image matching," in Proc. IEEE Data Compression Conference (DCC), Snowbird, Utah, March 2010.
[9] J. Liang, D. Doermann, and H. P. Li, "Camera-based analysis of text and documents: a survey," IJDAR, vol. 7, no. 2-3, pp. 84–104, 2005.
[10] K. Jung, K. I. Kim, and A. K. Jain, "Text information extraction in images and video: a survey," Pattern Recognition, vol. 37, no. 5, pp. 977–997, 2004.
[11] Y. Zhong, H. Zhang, and A. K. Jain, "Automatic caption localization in compressed video," IEEE Trans. Pattern Anal. Mach. Intell., vol. 22, no. 4, pp. 385–392, 2000.
[12] Q. Ye, Q. Huang, W. Gao, and D. Zhao, "Fast and robust text detection in images and video frames," Image Vision Comput., vol. 23, pp. 565–576, 2005.
[13] X. Chen and A. L. Yuille, "Detecting and reading text in natural scenes," in CVPR, 2004, vol. 2, pp. II-366–II-373.
[14] X. Chen and A. L. Yuille, "A time-efficient cascade for real-time object detection: With applications for the visually impaired," in CVPR - Workshops, 2005, p. 28.
[15] S. M. Lucas, "ICDAR 2005 text locating competition results," in ICDAR, 2005, vol. 1, pp. 80–84.
[16] B. Epshtein, E. Ofek, and Y. Wexler, "Detecting text in natural scenes with stroke width transform," in CVPR, 2010, pp. 2963–2970.
[17] P. Shivakumara, T. Q. Phan, and C. L. Tan, "A Laplacian approach to multi-oriented text detection in video," IEEE Trans. Pattern Anal. Mach. Intell., vol. 33, no. 2, pp. 412–419, Feb. 2011.
