An approach to empirical Optical Character Recognition paradigm using Multi-Layer Perceptron Neural Network


18th International Conference on Computer and Information Technology (ICCIT), 21-23 December, 2015

Md. Abdullah-al-mamun

Tanjina Alam

Dept. of Computer Science and Engineering, Rajshahi University of Engineering and Technology, Dhaka, Bangladesh, [email protected]

Dept. of Computer Science and Engineering, Daffodil International University, Dhaka, Bangladesh, [email protected]

Abstract— In this paper we present an Optical Character Recognition (OCR) architecture that converts visual characters into a machine-readable format. The architecture comprises several stages: taking the character image as input, preprocessing the image, extracting features from the image, and finally making a decision with an artificial computational model inspired by the biological neural network. The decision-making system based on the Artificial Neural Network involves two steps: first, the network is adapted through the Multi-Layer Perceptron learning algorithm; second, a recognition or classification step makes the character image comprehensible to the machine, i.e., determines which character it is. Our proposed architecture achieved 91.53% accuracy in recognizing isolated character images and 80.65% accuracy for sentential-case character images.

Keywords—Optical Character Recognition; OCR; Multi-Layer Perceptron; MLP; Artificial Neural Network; ANN; Pattern Recognition

I. INTRODUCTION

Humans are capable of detecting an object through the eye, the so-called "optical mechanism". The human brain "sees" the object as an input and is able to understand these signals using previously learned knowledge acquired since childhood. To give a machine a comparable decision-making ability, we can use an artificial neural network. An Artificial Neural Network (ANN) is a powerful data-processing and modeling paradigm, inspired by the architecture of biological neurons, that is capable of capturing and processing complex input-output relationships. The motivation for the development of neural network technology stemmed from the desire to develop an artificial system that could perform "intelligent" tasks similar to those performed by the human brain [1].

Reviewing these capabilities, we take on the challenge of developing a visual character recognition system based on an artificial neural network that can act like the human brain. OCR is a common method of digitizing printed text so that it can be electronically edited, searched, stored more compactly, displayed on-line, and used in machine processes such as machine translation, text-to-speech, key data entry and text mining [11]. Optical character recognition is thus a fundamental, challenging problem in pattern recognition, and one with numerous useful applications ranging from electronic archival of scanned text to human-machine interfaces [2].

Optical character recognizers have been developed for a wide range of object-oriented applications in the real world: data entry for business documents from image-based text, e.g., cheques, passports, bank invoices, statements and receipts; automatic number plate recognition, house addresses and the like; reading business card information; converting scanned or printed documents and books into textual format; and digitizing handwritten characters captured through real-time interaction with the computer (such as an optical pen device used to write directly into the computer).

II. OPTICAL CHARACTER RECOGNITION

OCR, the contraction of Optical Character Recognition, is a technology that converts mechanically or electronically imaged text into machine-encoded text through an optical mechanism. It is a research branch of Artificial Intelligence, Pattern Recognition, Machine Learning and Computer Vision. The ultimate objective of any OCR system is to simulate the human reading capability, so that the computer can read, understand, edit and perform similar activities on text [3].

III. TECHNICAL OVERVIEW

In the block diagram of the OCR system, each stage is coupled to the next in such a way that an efficiency problem occurring at one stage affects the following stage and may cause wrong recognition. The first stage of the OCR system is taking an input image from any optical capture device; it can be digitized by a scanner for further processing. Generally, the input image is an RGB color image, so preprocessing is needed to reduce the color space: first RGB to grayscale conversion, and then grayscale to binary conversion. The resultant binary image contains only two colors, black (0) and white (1). Next, the image is segmented to detect the character lines and the boundaries of the characters.


After the boundary of a character is detected, the character is mapped to a matrix that represents the bit value of each image pixel within an m×n grid. The final stage is learning and classification. In the training phase, the network is adapted from several training sets of data with the correct class for each training set [10]. The network becomes more adaptive as the number of training examples increases. To classify an optical character, the input feature vector obtained from feature extraction (i.e., image-to-matrix mapping) is fed to the network, and the network produces an output feature vector corresponding to the binary bit representation of the character. Our proposed Optical Character Recognition model has the following architecture:

Figure-1: Optical Character Recognizer Model.

A. Acquiring Image
The preliminary stage of an optical character recognition system is acquiring the image. In a digital image processing system, the image is acquired from any possible source, which can be a hardware device such as a camera. After being captured, it is scanned as a digital image by the scanner. In OCR, optical scanners are used, which generally consist of a transport mechanism plus a sensing device that converts light intensity into gray levels [3]. Here we captured an image, obtained by scanning a light-blue-background document photograph at 32 dpi, which yielded a 1241x82 pixel image in the "Arial" font.

Figure-2: Take an Input Image.

B. Input Preprocessing
In imaging science, image processing is the processing of images using mathematical operations, by means of any form of signal processing for which the input is an image, such as a photograph or video frame; the output of image processing may be either an image or a set of characteristics or parameters related to the image [5]. Several processes can be associated with input preprocessing depending on the task, such as correction of distortion, noise reduction, normalization, filtering, and so on.

a. RGB to Grayscale Conversion
The RGB color space contains red, green and blue components that are added together in a variety of ways to reproduce an array of colors. An input image can be represented in a 3-D RGB color space, so to reduce the feature space of the image we reduce its color space by converting it to gray levels. A grayscale or greyscale digital image is an image in which the value of each pixel is a single sample, that is, it carries only intensity information. Images of this sort, also known as black-and-white, are composed exclusively of shades of gray, varying from black at the weakest intensity to white at the strongest [7].

The grayscale image is formed from the RGB color space by combining 29.9% of red, 58.7% of green and 11.4% of blue, which can be expressed as gray = 0.299*red + 0.587*green + 0.114*blue. The resultant grayscale image covers the intensity range from black to white.

Figure-3: After converting the image RGB to Gray Scale.

b. Grayscale to Binary Image Conversion
In digital image processing, a binary image has only two possible color values for each pixel: black and white. Its color depth is 1-bit monochrome.

Figure-4: After converting the Gray Scale to Binary image.

Consider the grayscale image defined as $g(x,y)$; each pixel of the grayscale image is compared with a threshold value $T$ to convert it to binary. If the pixel value of the grayscale image is greater than the threshold value $T$, it is taken as 0; otherwise it is 1. This can be expressed as:

$b(x,y) = \begin{cases} 0, & g(x,y) > T \\ 1, & \text{otherwise} \end{cases}$
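As an illustration of this preprocessing chain, the following minimal Python/NumPy sketch performs both conversions, assuming the image is loaded with Pillow; the file name and the threshold T = 128 are illustrative choices, since the text does not state the threshold it used.

```python
import numpy as np
from PIL import Image

def to_grayscale(rgb):
    # Weighted sum from the text: gray = 0.299*R + 0.587*G + 0.114*B
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    return 0.299 * r + 0.587 * g + 0.114 * b

def to_binary(gray, T=128):
    # g(x,y) > T -> 0 (black background), otherwise 1 (white character pixel)
    return np.where(gray > T, 0, 1).astype(np.uint8)

rgb = np.asarray(Image.open("input.png").convert("RGB"), dtype=np.float64)
binary = to_binary(to_grayscale(rgb))
```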

The histogram statistics of the three stages are as follows:

From Figure-2 (Input Image): Min=42; Max=205; Mean=179.9; Median=205
From Figure-3 (RGB to Gray Scale Image): Min=32; Max=199; Mean=171.9; Median=197
From Figure-4 (Gray to Binary Image): Min=0; Max=255; Mean=39.3; Median=0

Figure-5: Histogram Analysis for different stages.

C. Image Segmentation
In computer vision, image segmentation is the process of partitioning a digital image into multiple segments. The goal of segmentation is to simplify and/or change the representation of an image into something that is more meaningful and easier to analyze [8]. Image segmentation is used for object recognition within an image, boundary estimation, image editing and image database look-up. To determine the characters within the image, the image must be segmented by detecting the character lines and the individual symbols.

a. Determining Character Line
Enumeration of the character lines in a character image is essential in delimiting the bounds within which the detection can proceed [9]. This step is important for detecting the characters in the subsequent stages.

Figure-6: Boundary detection of character line.

To detect the character line boundary of the image as introduced earlier, note that the character pixels are white and the background is black. Consider the bottom-left corner pixel of the image to be (0,0), and set Line_Bottom = Line_Top = Line_Start = Line_End = 0.

i. Scan along the length of the image with respect to the X-axis to determine the Line_Start value. If the current pixel line contains white pixels, set Line_Start = Current_pixel_value; otherwise increment the current pixel line (i.e., Current_pixel_value) along the X-axis toward the Y-axis and continue until a pixel line containing white pixels is found.

ii. To determine Line_End, scan along the length of the image with respect to the X-axis from the opposite side. If the current pixel line contains white pixels, set Line_End = Current_pixel_value and stop; otherwise increment the current pixel line (i.e., Current_pixel_value) in the opposite direction along the Y-axis and continue until the end of the vertical extent of the image.

iii. To determine Line_Bottom, scan the width of the image with respect to the Y-axis. If the current pixel line contains white pixels, set Line_Bottom = Current_pixel_value; otherwise increment the current pixel line (i.e., Current_pixel_value) with respect to the Y-axis, moving toward the X-axis, and continue until a white pixel line is found.

iv. To determine Line_Top, scan the width of the image with respect to the Y-axis from the opposite side. If the current pixel line contains white pixels, set Line_Top = Current_pixel_value; otherwise increment the current pixel value with respect to the Y-axis, moving in the opposite direction of the X-axis, and continue until the end of the horizontal extent of the image.

So the pixel coordinates of the four corners are: left bottom (Line_Start, Line_Bottom), left top (Line_Start, Line_Top), right bottom (Line_End, Line_Bottom) and right top (Line_End, Line_Top).
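The four scans above amount to finding the first and last rows and columns of the line region that contain character pixels. The following hedged NumPy sketch computes the same four boundary values, using array (row, column) indices rather than the bottom-left origin of the text:

```python
import numpy as np

def line_boundaries(binary):
    """binary: 2-D array where character pixels are 1 (white), background 0."""
    rows = np.flatnonzero(binary.any(axis=1))   # rows containing white pixels
    cols = np.flatnonzero(binary.any(axis=0))   # columns containing white pixels
    if rows.size == 0:
        return None                             # blank image: no line found
    line_top, line_bottom = rows[0], rows[-1]   # first/last occupied row
    line_start, line_end = cols[0], cols[-1]    # first/last occupied column
    # Corner coordinates as in the text: (line_start, line_bottom),
    # (line_start, line_top), (line_end, line_bottom), (line_end, line_top)
    return line_start, line_end, line_top, line_bottom
```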

b. Detecting Individual Character
Detection of individual symbols involves scanning the character lines for orthogonally separable images composed of black pixels [9].

Figure-7: Boundary detection of a character.

i. Start scanning from the top of the image with respect to the Y-axis, calculating from the top left of the first character line and setting the pixel value on the first X-axis. If any black pixel is found in the pixel line, set the pixel value; otherwise continue with the next line.

ii. Start at the top of the character and set the first pixel (0, character_top) according to the X-axis.

iii. Scan down to the bottom of the image along the X-axis. If a black pixel is found in the pixel line with respect to the X-axis as the left side of the character, set the pixel value; otherwise continue to the next pixel. If no black pixel is found, increment the pixel value of the pixel line with respect to the X-axis and reset the Y-axis pixel value to scan the next vertical line.

iv. Next, start scanning at the left of the symbol at the top of the current line, at pixel (character_left, line_top).

v. Scan across the width of the image with respect to the X-axis. If a black pixel is found, increment the X-axis pixel value and reset the Y-axis pixel value to scan the next vertical line.

vi. Start scanning at the bottom of the current line and the left side of the character, at pixel (character_left, line_bottom).

vii. Scan toward the right of the character along the Y-axis. If no black pixels are found, decrement the Y-axis pixel value and reset the X-axis pixel value to scan the next vertical line.
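A hedged sketch of this symbol scan, assuming the convention of the preceding subsection (character pixels = 1 in a binary NumPy array holding one text line): every maximal run of columns containing character pixels is treated as one orthogonally separable symbol.

```python
import numpy as np

def split_symbols(line_band):
    """line_band: 2-D binary array (character pixels = 1) for one text line."""
    occupied = line_band.any(axis=0)        # which columns hold character pixels
    symbols, start = [], None
    for x, filled in enumerate(occupied):
        if filled and start is None:
            start = x                        # left edge of a new symbol
        elif not filled and start is not None:
            symbols.append(line_band[:, start:x])   # right edge reached
            start = None
    if start is not None:                    # symbol touching the right border
        symbols.append(line_band[:, start:])
    return symbols
```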

D. Feature Extraction
In the field of pattern recognition, it is important to perform feature extraction after the preprocessing stage in order to classify the pattern. The main goal of feature extraction is to extract a set of features that captures the relevant information of the original input data in a lower-dimensional space [10].

a. Image to Matrix Mapping
To implement the feature extraction process, we use image-to-matrix mapping. Through the matrix mapping process, the character image is converted to a corresponding two-dimensional binary matrix. Here we map the character image to a 20x30 binary matrix containing 600 elements. If all the pixels of the symbol are mapped into the matrix, one would definitely be able to acquire all the distinguishing pixel features of the symbol and minimize overlap with other symbols [9]. Processing time increases with the number of matrix elements, so a reasonable trade-off is needed to minimize the processing time in a way that does not significantly affect recognition of the pattern.

Figure-8: Image to Matrix Mapping and its binary [0, 1] representation.
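A minimal sketch of this mapping, assuming Pillow is used for the rescale; the re-binarization cut-off of 127 is an illustrative choice:

```python
import numpy as np
from PIL import Image

GRID_W, GRID_H = 20, 30     # 20x30 grid -> 600 elements

def to_feature_vector(symbol):
    """symbol: 2-D binary array of one segmented character (pixels 0/1)."""
    img = Image.fromarray((symbol * 255).astype(np.uint8))
    grid = np.asarray(img.resize((GRID_W, GRID_H))) > 127   # re-binarize
    return grid.astype(np.float64).ravel()                  # length-600 vector
```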

E. Training
Learning is the process of teaching a system so that it can make decisions and perform its task efficiently in an unknown environment. Implementing a learning process requires prior experience and knowledge and an internal representation of that knowledge base, so the system can act according to it. Here we use the Multi-Layer Perceptron learning algorithm to implement the learning and classification task for visual characters. Applying the learning algorithm within the multilayer network architecture, the synaptic weights and thresholds are updated in such a way that the classification/recognition task can be performed efficiently [10].

A Multi-Layer Perceptron neural network has an input layer, a hidden layer and an output layer. The input layer is fed the input data set coming from feature extraction, and the output layer produces the set of output vectors. To implement the learning and classification process we implemented a 3-layer network.

According to the image-to-matrix mapping stage, an image character is represented by a 20x30 matrix, so the matrix contains 600 grid values per character and the input layer has 600 neurons. For these 600 input neurons, we created a Multi-Layer Perceptron architecture with 602 neurons in the hidden layer. In this proposed Optical Character Recognition model we recognize only capital English letters [A-Z], small English letters [a-z] and English digits [0-9], 62 characters in total. Representing the 62 characters requires 6 binary bits; as a result, 6 neurons are needed for the output layer.

Here $\eta$ denotes the learning factor, $\varepsilon$ the error output tolerance, and $\varphi$ the activation function, taken as the sigmoid $\varphi(x) = \frac{1}{1 + e^{-x}}$, so the first derivative of the activation function is $\varphi'(x) = \varphi(x)\,(1 - \varphi(x))$.

Before performing the sweeps, the network weights are initialized to random values and the bias of each neuron is assigned +1. Now consider the architecture shown below:

Figure-9: 600-602-6 3 Layer Neural network architecture.

a. Forward pass Algorithm
Input of a hidden-layer (jth level) neuron, where $b$ is the bias of each neuron:

$net_j = \sum_i w_{ji}\,x_i + b_j$

Output of a hidden-layer (jth level) neuron:

$o_j = \varphi(net_j)$

Input of an output-layer (kth level) neuron:

$net_k = \sum_j w_{kj}\,o_j + b_k$

Output of an output-layer (kth level) neuron:

$o_k = \varphi(net_k)$

Let the desired output be $O_{dk}$; then the output error is $E_{ok} = O_{dk} - O_{ak}$, where $O_{ak}$ is the actual output. Consider the error tolerance $\varepsilon = 0.0001$. If $E_{ok}$ is greater than $\varepsilon$, the backward-pass algorithm is used to reduce this error.

b. Backward pass Algorithm
Local gradient for the output-layer (kth level) neurons:

$\delta_k = \varphi'(net_k)\,E_{ok} = O_{ak}\,(1 - O_{ak})\,(O_{dk} - O_{ak})$

Local gradient for the hidden-layer (jth level) neurons:

$\delta_j = \varphi'(net_j)\sum_k \delta_k\,w_{kj} = o_j\,(1 - o_j)\sum_k \delta_k\,w_{kj}$


c. Update Weight
Now consider the learning factor $\eta = 0.25$. The weights of the network for the output layer are updated using the learning rule

$w_{kj} \leftarrow w_{kj} + \eta\,\delta_k\,o_j$

and the biases of the output layer are updated using

$b_k \leftarrow b_k + \eta\,\delta_k.$

The weights of the network for the hidden layer are updated using the learning rule

$w_{ji} \leftarrow w_{ji} + \eta\,\delta_j\,x_i$

and the biases of the hidden layer are updated using

$b_j \leftarrow b_j + \eta\,\delta_j.$
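To make the three phases concrete, here is a compact sketch of one training sweep of the 600-602-6 network using the constants from the text (learning factor 0.25, error tolerance 0.0001, biases initialized to +1); the sigmoid activation and the uniform range of the random weight initialization are assumptions, as the paper does not print them explicitly.

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.uniform(-0.5, 0.5, (602, 600))   # input -> hidden weights (random init)
b1 = np.ones(602)                         # hidden biases, initialized to +1
W2 = rng.uniform(-0.5, 0.5, (6, 602))     # hidden -> output weights
b2 = np.ones(6)                           # output biases, initialized to +1
ETA, EPS = 0.25, 1e-4                     # learning factor and error tolerance

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_step(x, d):
    """x: 600-element feature vector; d: 6-bit desired output vector."""
    global W1, b1, W2, b2
    o_j = sigmoid(W1 @ x + b1)             # forward pass, hidden layer
    o_k = sigmoid(W2 @ o_j + b2)           # forward pass, output layer
    e_k = d - o_k                          # output error E_ok = O_dk - O_ak
    if np.all(np.abs(e_k) <= EPS):
        return o_k                         # within tolerance: no update needed
    delta_k = o_k * (1 - o_k) * e_k        # local gradient, output layer
    delta_j = o_j * (1 - o_j) * (W2.T @ delta_k)   # local gradient, hidden layer
    W2 += ETA * np.outer(delta_k, o_j)     # update output-layer weights
    b2 += ETA * delta_k                    # update output-layer biases
    W1 += ETA * np.outer(delta_j, x)       # update hidden-layer weights
    b1 += ETA * delta_j                    # update hidden-layer biases
    return o_k
```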

F. Recognition
After the training operation, the network's synaptic weights have been updated over a number of iterations. To recognize an object, its feature data is fed to the network's input layer, which produces an output vector. The difference between the target output and the produced output vector is then calculated as in the training stage, $E_{ok} = O_{dk} - O_{ak}$.

By calculating the output error, the system can decide whether or not the object is recognized. For example, for the input character image A, the feature vector is as follows:

Input Feature Vector for Character A = [00000000111100000 00000000001111100000000000000011111000000000000000 11111100000000000001111111000000000000011111110000 00000000011101111000000000000111011110000000000011 11001110000000000011110011100000000000111000111100 00000000111000111100000000011110000111000000000111 10000111100000000111000001111000000001110000011110 00000011110000001110000000111100000011110000001111 11111111110000011111111111111100000111111111111111 10000111100000000111100001110000000001111000111100 00000000111000111100000000001111001111000000000011 11011110000000000011110111100000000000011111111000 000000000111111100000000000001111]

This feature vector is fed into the input layer, and after the computation within the network, a six-bit binary output is produced by the output layer. The bit representations of the characters are as follows:

No.   Character   Binary Representation
1.    A           000001
2.    B           000010
3.    C           000011
4.    D           000100
...   ...         ...
62.   9           111110

Table-1: Output Character Binary Representation.
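A small sketch of the decoding step implied by Table-1, assuming the 62 characters are ordered A-Z, a-z, 0-9 and numbered 1 through 62: the six network outputs are thresholded to bits, and the resulting code is looked up in the table.

```python
import string

ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits  # 62 chars

def decode(output_vector):
    bits = [1 if o >= 0.5 else 0 for o in output_vector]   # threshold the outputs
    index = int("".join(map(str, bits)), 2)                # 6-bit code, 1-based
    if 1 <= index <= len(ALPHABET):
        return ALPHABET[index - 1]                         # 000001 -> 'A', ...
    return None                                            # code outside the table

print(decode([0, 0, 0, 0, 0, 1]))   # -> 'A'
```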


IV. EXPERIMENTAL RESULT

To determine the accuracy of this model, we experimented with two types of characters: isolated character images, i.e., independent single-character images, and sentential-case character images, i.e., characters collected from a sentence-line image. The two sets of recognition results are discussed below.

A. Isolated Character Recognition
Our proposed system was tested on the recognition of 62 English character images (English capital alphabets A to Z, English small alphabets a to z, and English numerical digits 0 to 9). We performed the experiment with four fonts, Arial, Calibri (Body), Segoe UI and Times New Roman, where Ca = 62 is the total number of characters, Success (%) = (Cr/Ca)x100% and Error (%) = (Cw/Ca)x100%. The recognition results are as follows:

No.  Font Type         Correct Recognition (Cr)  Wrong Recognition (Cw)  Success (%)  Error (%)
1.   Arial             60                        2                       96.77        3.23
2.   Calibri (Body)    58                        4                       93.55        6.45
3.   Segoe UI          56                        6                       90.32        9.68
4.   Times New Roman   53                        9                       85.48        14.52

Table-2: Isolated Character recognition experiment result.

According to Table-2, the overall comparison of the recognition of the four font types is as follows:

Chart-1: Isolated Character Recognition experiment result comparison.

So the average success rate for Isolated Character Recognition is (96.77 + 93.55 + 90.32 + 85.48)/4 = 91.53%.

B. Sentential Case Character Recognition
For the sentential case characters we used the sentence "A Quick Brown Fox Jumps over the Lazy Dog." This experimental sentence was written in the same four fonts: Arial, Calibri (Body), Segoe UI and Times New Roman. Using this sentence in the four font styles, the experimental results for the sentential-case characters are as follows:

Font Type         Correct Recognition (Cr)  Wrong Recognition (Cw)  Success (%)  Error (%)
Arial             55                        7                       88.71        11.29
Calibri (Body)    52                        10                      83.87        16.13
Segoe UI          48                        14                      77.42        22.58
Times New Roman   45                        17                      72.58        27.42

Table-3: Sentential Case Character recognition experiment result.

According to Table-3, the comparison of the sentential-case character experiment across the four font types is as follows:

Chart-2: Sentential Case Character Recognition experiment result comparison.

So the average success rate for Sentential Case Character Recognition is (88.71 + 83.87 + 77.42 + 72.58)/4 = 80.65%.

Here the success rate for sentence character recognition falls below the isolated character recognition rate because, at image segmentation time, some character edges are cut away from the true shape of the character. These segmentation cuts affect the feature extraction (i.e., image-to-matrix mapping) and are the main cause of wrong recognition.

V. CONCLUSION

Humans are capable of detecting an object through the eye capture called the "optical mechanism", as the human brain "sees". The goal of this paper was to recognize image characters by machine (i.e., computer) with reading capabilities like those of a human reader. Due to some real-world difficulties, it is very hard to achieve 100% accuracy. Our proposed system achieved 91.53% accuracy for isolated characters and 80.65% accuracy for sentential-case characters; the sentential-case accuracy is about 10% lower than the isolated character recognition accuracy. The lack of proper character segmentation from the sentence (sentential-case image input) affects the image-to-matrix mapping and thereby causes wrong character recognition. Artificial neural networks have been massively used for document image analysis and recognition; most connectionist approaches rely on the use of simple MLPs, and the relationships between different uses of ANNs in different tasks have been only partially considered [11]. Moreover, a Multi-Layer Perceptron neural network can provide 100% accuracy only for linearly separable problems, so it is very difficult to achieve 100% accuracy in a real-time Optical Character Recognition system. The accuracy rate can, however, be increased by designing an optimal neural network architecture with an appropriate number of neurons. In future work, we will try to improve the accuracy of the presented Optical Character Recognition system through better image preprocessing and feature extraction methods, and propose an optimal artificial neural network architecture by comparing different numbers of neurons across different network architectures.

References
[1] Dong Xiao Ni, "Application of Neural Networks to Character Recognition", Proceedings of Students/Faculty Research Day, CSIS, Pace University, May 4th, 2007.
[2] N. Mezghani, A. Mitiche, M. Cheriet, "A new representation of character shape and its use in on-line character recognition by a self organizing map", International Conference on Image Processing, 2004.
[3] Sameeksha Barve, "Optical Character Recognition Using Artificial Neural Network", International Journal of Advanced Research in Computer Engineering & Technology, Volume 1, Issue 4, June 2012, ISSN: 2278-1323, pp. 131-133.
[4] Rafael C. Gonzalez, Richard E. Woods, "Digital Image Processing", Prentice Hall, 2008, pp. 1-3, ISBN 978-0-13-168728-8.
[5] Wikipedia, "RGB color model", 22 July 2015.
[6] Stephen Johnson, "Stephen Johnson on Digital Photography", O'Reilly, 2006, ISBN 0-596-52370-X.
[7] Linda G. Shapiro, George C. Stockman, "Computer Vision", Prentice-Hall, New Jersey, 2001, pp. 279-325, ISBN 0-13-030796-3.
[8] Daniel Admassu, "Unicode Optical Character Recognition", Codeproject.com, 23 Aug 2006.
[9] Md. Abdullah-al-mamun, "Emblematical image based pattern recognition paradigm using Multi-Layer Perceptron Neural Network", unpublished.
[10] Wikipedia, "Optical character recognition", 24 July 2015.
[11] Simone Marinai, Marco Gori, Giovanni Soda, "Artificial Neural Networks for Document Analysis and Recognition".
