Academic year: 2017

Tamil Text Extraction

S. T. Deepa

Computer Science Dept., Shri Shankarlal Sundarbai Shasun Jain College for Women, Chennai, India

deepatheodore@gmail.com

Dr. S. P. Victor

Computer Science Dept., St. Xavier’s College, Palayamkottai, India

victorsp@rediffmail.com

Abstract— Extraction of text and captions from images and videos is important and in great demand for video retrieval, annotation, indexing and content analysis. In this paper, we propose a text extraction algorithm using the Dual Tree Complex Wavelet Transform. Images containing Tamil text are considered for extraction. It is demonstrated that the proposed method achieves the expected text extraction accuracy on all the examples. The extracted text can be binarized and sent to text-to-audio software for use by visually challenged persons.

Keywords: Tamil text, scene text extraction, morphological dilation, binarization

I. INTRODUCTION

Digital cameras have become very popular and are often embedded in various handheld devices such as mobile phones, PDAs and watches. Manufacturers of these devices are nowadays looking to embed various useful technologies into them. Prospective technologies include recognition of text in scene images and text-to-speech conversion. Extraction and recognition of text in images of natural scenes are useful for blind persons and for foreigners facing a language barrier.

A novel and effective scheme based on connected component analysis for extracting Devanagari and Bangla text from camera-captured scene images is proposed in [1]. A unique feature common to these two scripts is the presence of the headline, and the proposed scheme uses mathematical morphology operations for its extraction. The authors also studied the problem of binarizing such scene images and observed that there are situations in which repeated binarization by a well-known global thresholding approach is effective.

An efficient algorithm which can automatically detect, localize and extract Kannada text from images with complex backgrounds is presented in [2]. The approach applies a color reduction technique and a standard-deviation-based method for edge detection, and localizes text using new connected component properties. The system comprises textual information extraction, optical character recognition and speech synthesis.

The increasing availability of high-performance, low-priced, portable digital imaging devices has created a tremendous opportunity for supplementing traditional scanning for document image acquisition. A survey of application domains, technical challenges and solutions for the analysis of documents captured by digital cameras is given in [3]. Document analysis from a single camera-captured image as well as from multiple frames is discussed.

A near real-time text tracking system capable of detecting and tracking text on outdoor shop signs or indoor notices is presented in [4]. The method has two main stages: candidate text region detection and text region tracking. Trackers are created dynamically when a new text entity is detected; they follow the text from frame to frame, and they are removed when the text can no longer be detected.

A new algorithm for detecting text in images and video frames is presented in [5]. Its coarse-to-fine detection framework makes the computation faster than previous text detection methods in the uncompressed domain. The high detection speed benefits real applications such as web content analysis and real-time video content analysis.

There is substantial interest in retrieving images from a large database using textual information contained in the images. A method for extracting text from a color image is described in [6]. The approach is based on certain heuristics, and the algorithm can therefore not be expected to find text in all possible situations; it is, however, relatively robust to variations in font, color and size of text.

A text detection method based on sparse representation is presented in [7]. Edges are extracted using the Canny edge detector and the edge points are then grouped into connected components. Connected components corresponding to long lines are removed. Labeling is performed at both the pixel level and the connected component level: a connected component is labeled as text if more than 25 percent of its points have been labeled as text at the pixel labeling stage. A low threshold is chosen to avoid missing true text components.

S.T.Deepa et al. / International Journal of Engineering Science and Technology (IJEST)


A method to extract multilingual text in images by discriminating characters from non-characters, based on Gaussian mixture modeling of neighboring characters, is presented in [8]. The image is binarized and morphological closing operations are performed on the binary image so that the characters in it can be treated as connected components. The neighborhood of connected components is computed from the Voronoi partition of the image, and each component is labeled as character or non-character according to its neighbors.

A robust system is presented in [9] to automatically detect and extract text in images from different sources, including video, newspapers, advertisements, photographs and cheques. Text is first detected using multiscale texture segmentation and spatial cohesion constraints, then cleaned up and extracted using a histogram-based binarization algorithm.

A multiple CAMShift algorithm operating on a text probability image (TPI) produced by a multilayer perceptron (MLP) is presented in [10]. The text detection process is performed after generating a mosaic image using a fast and robust image registration method. The approach is distinguished from others by two features: the texture classifier for generating TPIs can be constructed automatically using an MLP, and by adopting multiple CAMShift the texture properties of the entire image need not be analyzed.

A hybrid wavelet/neural network method is presented in [11] for automatically detecting and tracking text in digital video. The tracking module uses SSD-based image matching to find an initial position, followed by contour-based stabilization to refine the matched position. The system can detect graphical text and scene text with different font sizes and can track text that undergoes complex motions.

A robust text localization approach which can automatically detect horizontally aligned text with different sizes, fonts, colors and languages is presented in [12]. First, a wavelet transform is applied to the image, and the distribution of high-frequency wavelet coefficients is used to statistically characterize text and non-text areas. Then the K-means algorithm is used to classify text areas in the image.

The system in [13] reads text encountered in natural scenes with the aim of assisting visually impaired persons. It first searches the image for areas with small characters, then zooms into a found area to retake higher-resolution images necessary for character recognition. The character extraction is based on connected components.

A novel method for text detection in color scene images is presented in [14]. Text detection is done by unsupervised clustering of multichannel features, and this combined clustering can significantly reduce incorrect classification. Color images are decomposed into RGB channel images and a 2-D wavelet transform is applied to each. A feature vector is calculated, unsupervised pixel block classification with the k-means algorithm is performed in the combined feature vector space, and the three channels are integrated using logical OR.

Although the Discrete Wavelet Transform (DWT) in its maximally decimated form (Mallat's dyadic filter tree) has established an impressive reputation as a tool for image compression, its use for other signal analysis and reconstruction tasks such as image restoration and enhancement has been hampered by two main disadvantages:

 Lack of shift invariance.

 Poor directional selectivity for diagonal features.

We introduce the Dual-Tree Complex Wavelet Transform (DT CWT) with the following properties:

 Approximate shift invariance.

 Good selectivity and directionality in 2-dimensions with Gabor-like filters.

 Perfect reconstruction (PR) using short linear phase filters.

 Limited redundancy, independent of the number of scales.

 Efficient order-N computation: only 2^m times the cost of the simple DWT for m dimensions.

We propose the DT CWT as a useful front-end for many multi-dimensional signal analysis and reconstruction tasks and demonstrate this with simple examples of edge enhancement and denoising. Our work with complex wavelets for motion estimation showed that complex wavelets could provide approximate shift invariance and good directionality.
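The shift-variance problem of the decimated DWT can be illustrated with a minimal numpy sketch. The single-level Haar filters here are illustrative stand-ins, not the filters used in the paper: shifting a step signal by a single sample radically changes the detail-band energy.

```python
import numpy as np

def haar_dwt_level(x):
    """One level of the decimated Haar DWT: filter, then downsample by 2."""
    lo = (x[0::2] + x[1::2]) / np.sqrt(2)   # approximation coefficients
    hi = (x[0::2] - x[1::2]) / np.sqrt(2)   # detail coefficients
    return lo, hi

# A short step signal and the same signal shifted by one sample.
x = np.zeros(16)
x[8:] = 1.0
x_shift = np.roll(x, 1)

_, d = haar_dwt_level(x)
_, d_shift = haar_dwt_level(x_shift)

# Detail-band energy changes completely under a 1-sample shift,
# showing that the decimated DWT is not shift invariant.
print(np.sum(d**2), np.sum(d_shift**2))
```

In the unshifted signal the step falls exactly on a downsampling boundary and produces zero detail energy; one sample later it produces maximal detail energy, which is the behavior the DT CWT is designed to avoid.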

EXTENSION TO TWO DIMENSIONS

Extension to 2-D is achieved by separable filtering along columns and then rows. However, if column and row filters both suppress negative frequencies then only the first quadrant of the 2-D signal spectrum is retained. Two adjacent quadrants of the spectrum are required to represent fully a real 2-D signal so we also filter with complex conjugates of the row filters. This gives 4:1 redundancy in the transformed 2-D signal. A normal 2-D DWT produces three bandpass sub images at each level, corresponding to low-high, high-high and high-low filtering. Our 2-D CWT produces three sub images in each of spectral quadrants 1 and 2 giving six bandpass sub images of complex coefficients at each level which are strongly oriented at angles of ±15º, ±45º, ±75º.
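The quadrant argument above can be checked numerically. In this sketch an idealized brick-wall analytic filter stands in for the actual wavelet filters: separable filtering with positive-frequency filters on both rows and columns retains only quadrant 1 of the 2-D spectrum, while conjugating the row filter retains the adjacent quadrant 2.

```python
import numpy as np

n = 32
# Idealized analytic 1-D filter: DFT support only on positive frequencies.
H = np.zeros(n)
H[1:n // 2] = 1.0
h = np.fft.ifft(H)          # complex impulse response

# The separable 2-D frequency response is the outer product of 1-D responses.
H_col = np.fft.fft(h)                 # positive frequencies only
H_row = np.fft.fft(h)                 # positive frequencies only
H_row_conj = np.fft.fft(np.conj(h))   # conjugate filter: negative frequencies

quad1 = np.abs(np.outer(H_col, H_row))       # support in quadrant 1 (u > 0, v > 0)
quad2 = np.abs(np.outer(H_col, H_row_conj))  # support in quadrant 2 (u > 0, v < 0)
```

Together the two adjacent quadrants fully represent a real 2-D signal, which is where the 4:1 redundancy of the transformed 2-D signal comes from.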


Figure 1. Flow chart of the proposed approach

II. METHODOLOGY

The proposed methodology performs text extraction using the dual tree complex wavelet transform. Applying the dual tree complex wavelet transform at level 5 to the input text image yields the higher and lower subbands, and edges are detected in all the subbands. Edges are detected using the Canny edge detection method because Tamil characters contain many curvatures. Farras filters are used for orthogonal 2-D channel perfect reconstruction.
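As a rough sketch of the edge detection step, the following uses a plain Sobel gradient magnitude as a simplified stand-in for Canny, which additionally performs non-maximum suppression and hysteresis thresholding. The toy input is a small synthetic subband with a vertical step edge.

```python
import numpy as np

def sobel_edges(img, thresh=0.5):
    """Gradient-magnitude edge map (a simplified stand-in for full Canny)."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    ky = kx.T
    pad = np.pad(img, 1, mode="edge")
    gx = np.zeros(img.shape, dtype=float)
    gy = np.zeros(img.shape, dtype=float)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            win = pad[i:i + 3, j:j + 3]
            gx[i, j] = np.sum(win * kx)   # horizontal gradient
            gy[i, j] = np.sum(win * ky)   # vertical gradient
    mag = np.hypot(gx, gy)
    if mag.max() == 0:
        return np.zeros(img.shape, dtype=bool)
    return mag > thresh * mag.max()       # keep strong gradients only

# A toy "subband" with a vertical step edge down the middle.
img = np.zeros((8, 8))
img[:, 4:] = 1.0
edges = sobel_edges(img)
```

The edge map fires on the two columns straddling the step and nowhere else, which is the behavior the subsequent dilation step relies on.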

A. Text Region Localization: Morphological Dilation

Here the detected edges of all subbands are dilated. The basic effect of the dilation operator on a binary image is to gradually enlarge the boundaries of regions of foreground pixels (i.e. white pixels, typically), adding pixels to the boundaries of the objects in an image. Thus areas of foreground pixels grow in size while holes within those regions become smaller. The dilation operator takes two pieces of data as input. The first is the image to be dilated. The second is a (usually small) set of coordinate points known as a structuring element (also known as a kernel); it is this structuring element that determines the precise effect of the dilation on the input image. Various structuring elements were tried in order to find a suitable one, and the dilated images are produced using a disk-shaped structuring element of 6 pixels radius. Dilation is performed here to enlarge and group the identified text regions.
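The dilation step above can be sketched in plain numpy. This is a hypothetical minimal implementation for illustration; any standard image processing library provides an equivalent routine.

```python
import numpy as np

def disk(radius):
    """Disk-shaped structuring element of the given pixel radius."""
    y, x = np.ogrid[-radius:radius + 1, -radius:radius + 1]
    return (x * x + y * y) <= radius * radius

def dilate(mask, se):
    """Binary dilation: a pixel becomes foreground if the structuring
    element, centered there, overlaps any foreground pixel."""
    r = se.shape[0] // 2
    padded = np.pad(mask, r, mode="constant")
    out = np.zeros(mask.shape, dtype=bool)
    for i in range(mask.shape[0]):
        for j in range(mask.shape[1]):
            out[i, j] = np.any(padded[i:i + se.shape[0], j:j + se.shape[1]] & se)
    return out

# A single edge pixel grows into a full disk of radius 6, which is how
# dilation merges nearby text strokes into one region.
mask = np.zeros((15, 15), dtype=bool)
mask[7, 7] = True
grown = dilate(mask, disk(6))
```

A lone foreground pixel expands into exactly the disk footprint, so neighboring strokes whose gaps are smaller than the disk diameter fuse into a single connected region.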

B. Text Region Extraction:

All subband edges after dilation are combined by addition followed by a logical AND operation, which forms the text region as in eqn. (1) below. The output of this logical operation is 1 only when all the inputs are 1. The result is then mapped onto the original image to obtain the text regions.

O = ⋀_i ( W_i ⊕ B_sp ) ∧ IP    (1)

where W_i is the edge map of subband i, ⊕ denotes morphological dilation by the structuring element B_sp, ⋀_i denotes the logical AND taken over all subbands, and IP is the input image.
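This combination step can be sketched in numpy. The subband masks below are random placeholders standing in for the dilated DT CWT subband edge maps:

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder dilated edge masks for six oriented subbands (binary images).
subbands = [rng.random((10, 10)) > 0.3 for _ in range(6)]

# Logical AND across all subbands: a pixel survives only if every
# dilated subband edge map marks it (output is 1 only when all inputs are 1).
text_mask = np.logical_and.reduce(subbands)

# Map the mask back onto the original image to keep only the text pixels.
image = rng.random((10, 10))
text_only = np.where(text_mask, image, 0.0)
```

Requiring agreement across all oriented subbands is what suppresses isolated non-text edges: background clutter rarely produces strong responses in every orientation at once.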

Text extraction has many applications. One application is sports video mining: text can be extracted for content analysis and database retrieval, so that we can view the frames of interest, say watching a cricket match from a desired over or wicket. In future work we will perform video mining using text extraction.

[Figure 1 pipeline: Input Document Image → Dual tree CWT level 5 → Edge detection using Canny method → Morphological dilation → Logical AND operation → Text Extracted Image]


III. EXPERIMENTAL RESULTS

The performance of the proposed system is evaluated in this section. We first extract the lower and higher coefficients from the input text image, then detect edges for all the coefficients. Morphological dilation is applied to the edge-detected image for text localization. The text is then extracted by applying the logical AND operation and superimposing the result on the original image, which finally gives the text-extracted image. The images are shown below.

Figure 2: Original image. Figure 3: Image after text localization. Figure 4: Image with text only.

IV. CONCLUSION

This paper presents a novel approach to text extraction. The method uses the Dual Tree complex wavelet transform, morphological dilation and a logical operation. It is robust against various conditions such as shadows, degradations, non-uniform illumination, highlights, specular reflections, different font styles and sizes, and low-contrast images. The experimental results showed that the proposed method reasonably extracts text regions while eliminating most non-text regions. The output can be further binarized and used by visually challenged persons through text-to-audio conversion software.

REFERENCES

[1] U. Bhattacharya, S. K. Parui and S. Mondal, “Devanagari and Bangla Text Extraction from Natural Scene Images”, Proc. 10th International Conference on Document Analysis and Recognition (ICDAR), 2009.

[2] Keshava Prasanna, P. Ramakhanth Kumar, M. Thungamani and Manohar Koli, “Kannada text extraction from images and videos for vision impaired persons”, International Journal of Advances in Engineering and Technology, 2011.

[3] J. Liang, D. Doermann and H. Li, “Camera-based analysis of text and documents: a survey”, International Journal on Document Analysis and Recognition (IJDAR), vol. 7, pp. 84-104, 2005.

[4] C. Merino and M. Mirmehdi, “A framework towards real-time detection and tracking of text”, Proc. 2nd International Workshop on Camera-Based Document Analysis and Recognition, pp. 10-17, 2007.

[5] Q. Ye, Q. Huang, W. Gao and D. Zhao, “Fast and robust text detection in images and video frames”, Image and Vision Computing, vol. 23, pp. 565-576, 2005.

[6] Y. Zhong, K. Karu and A. K. Jain, “Locating text in complex color images”, Proc. 3rd International Conference on Document Analysis and Recognition, vol. 1, pp. 146-149, 1995.

[7] W. Pan, T. D. Bui and C. Y. Suen, “Text detection from scene images using sparse representation”, Proc. 19th International Conference on Pattern Recognition (ICPR), 2008.

[8] X. Liu, H. Fu and Y. Jia, “Gaussian mixture modeling and learning of neighbouring characters for multilingual text extraction in images”, Pattern Recognition, vol. 41, pp. 484-493, 2008.

[9] V. Wu, R. Manmatha and E. M. Riseman, “TextFinder: an automatic system to detect and recognize text in images”, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 21, pp. 1224-1228, 1999.

[10] K. Jung, K. I. Kim, T. Kurata, M. Kourogi and J. H. Han, “Text scanner with text detection technology on image sequences”, Proc. 16th International Conference on Pattern Recognition (ICPR), vol. 3, pp. 473-476, 2002.

[11] H. Li, D. Doermann and O. Kia, “Automatic text detection and tracking in digital video”, IEEE Transactions on Image Processing, vol. 9, no. 1, pp. 147-167, 2000.

[12] J. Gllavata, R. Ewerth and B. Freisleben, “Text detection in images based on unsupervised classification of high-frequency wavelet coefficients”, Proc. 17th International Conference on Pattern Recognition (ICPR), vol. 1, pp. 425-428, 2004.

[13] N. Ezaki, M. Bulacu and L. Schomaker, “Text detection from natural scene images: towards a system for visually impaired persons”, Proc. 17th International Conference on Pattern Recognition (ICPR), vol. 2, pp. 683-686, 2004.

[14] T. Saoi, H. Goto and H. Kobayashi, “Text detection in color scene images based on unsupervised clustering of multichannel wavelet features”, Proc. 8th International Conference on Document Analysis and Recognition (ICDAR), pp. 690-694, 2005.
