Handwritten OCR For Word in Indic Language Using Deep Networks
Abstract—A large number of Indian documents are handwritten, and India is a diverse nation with many languages. These handwritten documents contain important historical and cultural information which needs to be preserved by converting it to a digital format. The major problem is that everyone has a unique handwriting and writing style. To address this problem, we have trained Handwritten Optical Character Recognition (HOCR) models for eight Indian languages, i.e. Bangla, Gujarati, Gurumukhi, Hindi, Kannada, Odia, Telugu, and Urdu. The datasets IIIT-HW-Dev and IIIT-HW-Telugu refer to a Devanagari dataset and a Telugu dataset respectively, comprising 95K and 120K handwritten words. The IIIT-INDIC-HW-WORDS dataset consists of 872K handwritten words written in 8 Indic scripts by 135 writers. Tamil and Malayalam are excluded due to issues in the IIIT-INDIC-HW-WORDS dataset. The paper describes how the CNN-Transformer architecture leverages visual and textual features to perform OCR in different languages. The model takes word images as input; the CNN generates visual features and feeds them to the transformer decoder for text generation. A ResNet18 encoder and a transformer decoder have been used for all eight languages to evaluate the performance of this architecture. The architecture performed best on Kannada with a character error rate of just 1.5%.
Index Terms—Offline Handwritten Character Recognition, OHCR, Convolution Neural Networks, CNN, Transformers, encoder-decoder
I. INTRODUCTION

HOCR is used to convert handwritten text into a digital, editable, and searchable format which can then be used for analysis and easy information retrieval over large collections. In this paper, HOCR models are trained for eight Indian regional scripts: Bangla, Gujarati, Gurumukhi, Hindi, Kannada, Odia, Telugu, and Urdu. India is diverse and has many languages, and HOCR can be used to preserve the knowledge, history, and heritage contained in these handwritten manuscripts.

There is a great deal of inconsistency in handwriting styles, and various state-of-the-art models fail to give satisfactory performance across different types of handwriting samples [1]. Handwritten data has a large amount of variation within the same sequence. This ranges from easier-to-handle issues such as image noise, blur, and scale, to more complex issues such as random skew in a direction that depends on the writing style, shifts in character structure due to human error, character skew, etc. These issues make handwritten OCR a massive challenge.

We trained an image-to-sequence architecture inherited from [2]. This model has two main parts, an encoder and a decoder. The encoder is used to extract visual features from the input image, and this image embedding is added to a 2D positional encoding. The embedding is then given to the decoder, which is the same as the Transformer decoder from [3], with non-causal attention over the encoder output and causal attention over the decoder input [2]. Teacher forcing is used to make the model converge faster and to reduce training time. The model uses character-level token embeddings.

Computer Vision (CV) can be used to analyze, interpret, and understand meaningful information from two-dimensional images; with its help, machines can approach the capability of the human eye. Compared to humans, machine learning and deep learning methods require a massive image dataset to learn the orientations and patterns of scenes. Similar challenges are observed in the detection and recognition of UIDAI and PAN numbers from images [4].

The rest of the paper is organized as follows. Section II describes OCR-related work done in the past, including our own. Section III explains the data preparation process for the eight Indian languages. Section IV explains the overall methodology of the experiments in detail. Section V describes the training configuration used for the experimental setup. Section VI presents the experimental results. Section VII concludes the paper and outlines future work.

II. RELATED WORK

OCR research has been going on for more than three decades, and many conventional algorithms such as SVMs and HMMs have been used; these rely on character-level segmentation, which is very difficult to annotate [5]. The visual features of an image can be extracted using a Convolution Neural Network (CNN), and problems such as object classification and object detection can be solved using these visual features. CNNs alone, however, cannot generate text output, since text generation is a sequential problem common to most Natural Language Processing (NLP) tasks. The sequence-to-sequence learning task is solved using an encoder-decoder architecture with a CNN backbone and RNN decoders. A combination of CNN-BiLSTM-CTC [6] performs exceptionally well on the training data but poorly on unseen and noisy data. A more complex architecture, MDLSTM-CTC [7] with dropout [8], was the next evolution and slightly improved accuracy over the BiLSTM.
The compute-intensive CTC loss was later replaced by cross-entropy loss and greedy-search decoding for text generation [9]. Transformers are currently the state-of-the-art models for language tasks [3], and the CNN-Transformer architecture resolves this problem [2], [10]. Our HOCR is a word-level OCR based on an encoder-decoder architecture consisting of a CNN with a transformer as the decoder. This type of architecture is more accurate on unseen data than a purely CNN-RNN-based model and can handle noisy data, because using a transformer as the decoder gives the same character-level context as an RNN but with long-range context, which helps to increase accuracy significantly [4], [11], [12], [13].

In this paper, we trained CNN- and transformer-based models on different regional languages from the IIIT dataset [14]. The main purpose of this paper is to see how this architecture performs for different Indian languages.

III. DATA PREPARATION

IIIT-INDIC-HW-WORDS dataset: this is a large dataset of annotated words for 10 scripts, i.e. Hindi, Telugu, Bengali, Gurumukhi, Gujarati, Odia, Urdu, Kannada, Tamil, and Malayalam [14]. The images are augmented with skew, shift, scaling, and Gaussian blur. Eight languages are selected for training, validation, and testing as shown in Table I.

IV. METHODOLOGY

We use an image-to-sequence architecture [2] consisting of a CNN encoder and a transformer decoder. This type of architecture belongs to the sequence-to-sequence and tensor-to-tensor family [15]. The labels are obtained from the ground truth as a set of characters; in addition, three tokens are added for padding, end of sequence, and start of sequence.

The encoder is a ResNet [16] used for 2D feature extraction. Instead of pooling and classifying, we only take the feature map generated by the last block. We use a 2D positional encoding which is added to the feature map before it is flattened into a sequence. The 2D positional encoding [3] is a fixed sinusoidal encoding: half of the d_model channels encode the Y coordinate, while the other half encode the X coordinate, as shown in equation (1). After the 2D positional encoding is added, the resulting output is sent to the decoder layer stack.

Positional encoding:

PE(y, 2i)               = sin(y / 10000^(2i/d_model))
PE(y, 2i+1)             = cos(y / 10000^(2i/d_model))
PE(x, d_model/2 + 2i)   = sin(x / 10000^(2i/d_model))
PE(x, d_model/2 + 2i+1) = cos(x / 10000^(2i/d_model)),   i ∈ [0, d_model/4)    (1)
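Equation (1) maps directly onto a few tensor operations. The following is a minimal PyTorch sketch (PyTorch being the framework used in this work); the function name and layout are our assumptions, not the authors' code.

```python
import math
import torch

def positional_encoding_2d(d_model: int, height: int, width: int) -> torch.Tensor:
    """Fixed 2D sinusoidal encoding (equation 1): the first d_model/2 channels
    encode the Y coordinate, the remaining channels encode the X coordinate."""
    assert d_model % 4 == 0, "d_model must be divisible by 4"
    pe = torch.zeros(d_model, height, width)
    half = d_model // 2
    # frequencies 1 / 10000^(2i / d_model) for i in [0, d_model/4)
    div = torch.exp(torch.arange(0, half, 2).float() * (-math.log(10000.0) / d_model))
    y = torch.arange(height).float().unsqueeze(1)          # (H, 1)
    x = torch.arange(width).float().unsqueeze(1)           # (W, 1)
    pe[0:half:2]    = torch.sin(y * div).t().unsqueeze(2).expand(-1, -1, width)   # PE(y, 2i)
    pe[1:half:2]    = torch.cos(y * div).t().unsqueeze(2).expand(-1, -1, width)   # PE(y, 2i+1)
    pe[half::2]     = torch.sin(x * div).t().unsqueeze(1).expand(-1, height, -1)  # PE(x, d/2 + 2i)
    pe[half + 1::2] = torch.cos(x * div).t().unsqueeze(1).expand(-1, height, -1)  # PE(x, d/2 + 2i+1)
    return pe  # (d_model, H, W); broadcast-added to the encoder feature map
```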
The decoder is a transformer decoder stack with non-causal attention over the encoder output and causal self-attention [17]. The input sequence is sent with a 1D positional encoding. Training is done using teacher forcing, i.e. the decoder can only use the previous part of the sequence to predict the next token. This is now common practice; it helps the transformer learn faster by providing the shifted sequence as input while the loss is calculated against the ground truth. The decoding method is greedy, which can suffer from repetition and junk generation, but this has not been observed on single-word sequences. The decoder generates a probability distribution over tokens at each step, conditioned on the probabilities generated so far, as shown in equation (2).

Probability function:

p_t : {1, ..., V} → [0, 1],   Y_t ~ p_t    (2)

L_seq   = -(1/τ) Σ_t ln p_t(y_t^GT),          τ ≡ sequence length
L_batch = -(1/n) Σ_batch Σ_t ln p_t(y_t^GT),  n ≡ number of tokens in the batch    (3)

This loss function is modified for mini-batches, as in equation (3). The final layer is a 1x1 convolution which produces the logits; these can be normalised with a softmax to give the prediction.
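A sketch of the training step implied by equations (2) and (3): the target sequence is right-shifted for teacher forcing, a causal mask hides future tokens, and the loss averages -ln p_t(y_t^GT) over all non-padding tokens in the batch. The token ids and the model's calling convention are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

PAD, SOS, EOS = 0, 1, 2   # assumed ids for the padding, start and end tokens

def training_step(model, images, targets):
    """images: (B, 3, H, W); targets: (B, T) ground-truth ids ending in EOS, padded with PAD."""
    B, T = targets.shape
    sos = torch.full((B, 1), SOS, dtype=torch.long, device=targets.device)
    decoder_in = torch.cat([sos, targets[:, :-1]], dim=1)        # right-shifted input (teacher forcing)
    causal_mask = torch.triu(torch.ones(T, T, dtype=torch.bool,
                                        device=targets.device), diagonal=1)  # True = blocked position
    logits = model(images, decoder_in, tgt_mask=causal_mask)     # (B, T, V)
    # Cross-entropy is the mean of -ln p_t(y_t^GT) over non-padding tokens (equation 3).
    return F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                           targets.reshape(-1), ignore_index=PAD)
```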
V. TRAINING CONFIGURATION

The base configuration was as follows:
Encoder: ResNet18
Transformer stack:
  Number of layers = 6
  d_model = 260
  h (number of heads) = 4
  d_ff = 1024
  Dropout = 0.1
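The base configuration maps onto standard PyTorch modules. The sketch below assumes one reasonable wiring (a truncated ResNet18, a 1x1 projection from its 512 output channels to d_model, and the built-in transformer decoder); it is not the authors' exact implementation.

```python
import torch.nn as nn
import torchvision

class WordOCR(nn.Module):
    def __init__(self, vocab_size, d_model=260, nhead=4, num_layers=6, d_ff=1024, dropout=0.1):
        super().__init__()
        resnet = torchvision.models.resnet18(weights=None)
        self.backbone = nn.Sequential(*list(resnet.children())[:-2])   # drop avgpool and fc
        self.proj = nn.Conv2d(512, d_model, kernel_size=1)             # 512 -> d_model channels
        self.embed = nn.Embedding(vocab_size, d_model)                 # character-level tokens
        layer = nn.TransformerDecoderLayer(d_model, nhead, dim_feedforward=d_ff, dropout=dropout)
        self.decoder = nn.TransformerDecoder(layer, num_layers=num_layers)
        self.head = nn.Linear(d_model, vocab_size)                     # plays the role of the 1x1 conv producing logits

    def forward(self, images, tokens, tgt_mask=None):
        feat = self.proj(self.backbone(images))        # (B, d_model, H', W')
        # the 2D positional encoding of equation (1) would be added to `feat` here
        memory = feat.flatten(2).permute(2, 0, 1)      # (H'*W', B, d_model) encoder sequence
        tgt = self.embed(tokens).permute(1, 0, 2)      # (T, B, d_model); 1D positional encoding omitted
        out = self.decoder(tgt, memory, tgt_mask=tgt_mask)
        return self.head(out).permute(1, 0, 2)         # (B, T, vocab_size) logits
```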
Figure 1: Overall encoder-decoder architecture
The model is implemented in PyTorch. The batch size could go up to 256, but depending on the augmentations we chose a lower value, especially when random rotation is involved. The optimizer is Adam with a fixed learning rate (α) of 2e-4, β1 = 0.9, and β2 = 0.999.
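These optimizer settings translate directly into PyTorch; `model` here stands for the network sketched above, and the vocabulary size is illustrative.

```python
import torch

model = WordOCR(vocab_size=128)   # sketch class from Section V; vocabulary size is an assumption
optimizer = torch.optim.Adam(model.parameters(), lr=2e-4, betas=(0.9, 0.999))
```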
System specification
The training is done on PARAM SHAVAK with a 2.6 GHz Intel Xeon Gold 6145, 96 GB of RAM, and an NVIDIA Quadro® GV100 GPU.
Duration
On average, training takes around 2-3 days including validation. The training time depends heavily on the available system resources and the model size, as is standard. The only other observation is that global padding increases processing time, since it pads all images to the size of the largest image in the dataset; hence, a batch-wise padding method is used.
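Batch-wise padding is typically implemented in the DataLoader collate function, so each batch is padded only to its own largest image and longest label rather than to a global maximum. A minimal sketch, with the function name, padding values, and dataset interface assumed for illustration:

```python
import torch
import torch.nn.functional as F

def collate_batchwise(batch):
    """batch: list of (image (C, H, W) in [0, 1], target_ids (T,)) pairs."""
    max_h = max(img.shape[1] for img, _ in batch)
    max_w = max(img.shape[2] for img, _ in batch)
    max_t = max(len(tgt) for _, tgt in batch)
    images, targets = [], []
    for img, tgt in batch:
        # pad right/bottom only up to the largest image in *this* batch
        images.append(F.pad(img, (0, max_w - img.shape[2], 0, max_h - img.shape[1]), value=1.0))
        targets.append(F.pad(tgt, (0, max_t - len(tgt)), value=0))   # 0 = assumed PAD id
    return torch.stack(images), torch.stack(targets)

# loader = torch.utils.data.DataLoader(dataset, batch_size=64, collate_fn=collate_batchwise)
```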
A. OCR Engine
1) Encoder architecture: As shown in Figure 1, a CNN is used in the encoder to extract features from the input images. The encoder extracts a feature map from the given image which, when combined with the 2D positional encoding, contains both the features and the information about their positions. This feature map, when flattened, forms our encoding. In this model, the encoder is a ResNet18 architecture without the final global pooling and fully connected classification layers.
2) Decoder architecture: The decoder is a six-block transformer decoder stack. It applies non-causal attention over the entire encoder output, i.e. it is independent of the previous step in the encoder output, and causal attention over the sequence input of the decoder. The input is right-shifted and then added to the 1D positional encoding of the input sequence; here the input sequence is formed by performing token embedding on the ground truth. This is followed by masking for teacher forcing. The decoder layers apply attention to the encoder output and self-attention to the previous layer's output; the result is then passed through a position-wise feed-forward network. The output of the decoder stack goes through a linear (1x1 convolution) layer to produce the final logits, which may then be normalised with a softmax layer.

VI. RESULTS

The input images for all 8 languages with the respective model predictions are shown in Figure 2. The results are compiled after testing the trained models on the test set of the data. The metrics are the character error rate (CER) and the word error rate (WER), as shown in equation (4). The model performs well on many of the languages and gives reasonably good output in most cases.

CER = (levenshtein_distance / total_characters) × 100
WER = (wrong_words / total_words) × 100    (4)
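Equation (4) can be computed as follows. The Levenshtein distance is written out explicitly so the sketch has no external dependencies; the function names are ours.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of character insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def cer_wer(predictions, references):
    """predictions, references: parallel lists of word strings; returns (CER %, WER %) as in eq. (4)."""
    edits = sum(levenshtein(p, r) for p, r in zip(predictions, references))
    chars = sum(len(r) for r in references)
    wrong = sum(p != r for p, r in zip(predictions, references))
    return 100.0 * edits / chars, 100.0 * wrong / len(references)
```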
Language | Val loss | CER | WER | Model
Bangla | 0.3419 | 2.484 | 6.08 | dmodel
Bangla | 0.3944 | 2.8 | 6.94 | ResNet18
Bangla | 6.394 | 134.29 | 99.43 | ResNet50
Bangla | 0.2701 | 2.27 | 6.08 | ResNet34
Bangla | 0.346453 | 0.0249 | 0.072688 | EfficientNet
Bangla | 0.49669 | 0.04244 | 0.10001 | MobileNet

Figure 5: WER for all 8 languages. WER is lowest for Kannada and Telugu and highest for Odia and Urdu (lower is better).
Figure 6: CER for all 8 languages. It shows similar trends as the WER; the CER value is lower than the WER for all languages.
Figure 7: Validation loss, CER and WER for all languages.
VII. CONCLUSION AND FUTURE WORK

The CNN-Transformer architecture is applied to Indian regional languages for the first time for word-level handwritten OCR. The advantage of visual features from the CNN and language features from the transformer is exploited in this unified framework. The proposed architecture achieves the lowest character error rate on Kannada, 1.5% on the validation set, followed by Bangla with 2.8% after 999 epochs. Additional experiments on Bangla, replacing ResNet18 with ResNet34, gave a performance boost over the default architecture: the character error rate went down to 2.2%. This could help improve the CER of the other languages and is left for future research. In the future, we will apply the normalization rules we used for Hindi to all other languages, which can significantly increase model accuracy. All handwritten OCR models are trained separately, and each model took around two to three days to train. Since the model architecture and its parameters do not change across languages, we can try transfer learning, reusing the low-level features learned by the CNN to reduce training time and cost. Transformers require a large amount of data to learn; to address that, we can try generating synthetic data using GANs to expose the model to different handwriting styles, which can help generalize the model. Further experiments can be done to build a single OCR model for all the languages, which would be beneficial in addressing these requirements. We can also try replacing our architecture with a Visual Transformer.
ACKNOWLEDGEMENT
The authors would like to thank the Technology
Development for Indian Languages (TDIL) Programme of the
Ministry of Electronics and Information Technology (MeitY),
Government of India for funding the consortium project for
Marathi OCR.
REFERENCES
[1] A. Obaid, H. El-Bakry, M. Eldosuky, and A. Shehab, "Handwritten text recognition system based on neural network," International Journal of Advanced Research in Computer Science and Technology, vol. 4, pp. 72–77, 01 2016.
[2] S. S. Singh and S. Karayev, “Full page handwriting recognition via
image to sequence extraction,” CoRR, vol. abs/2103.06450, 2021.
[Online]. Available: https://arxiv.org/abs/2103.06450
[3] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones,
A. N. Gomez, L. Kaiser, and I. Polosukhin, “Attention is all
you need,” CoRR, vol. abs/1706.03762, 2017. [Online]. Available:
http://arxiv.org/abs/1706.03762
[4] M. K. Gupta, R. Shah, J. Rathod, and A. Kumar, “Smartidocr: Automatic
detection and recognition of identity card number using deep networks,”
in 2021 Sixth International Conference on Image Information Processing
(ICIIP), vol. 6, 2021, pp. 267–272.
[5] V. Bansal and R. M. K. Sinha, “A complete ocr for printed hindi text
in devanagari script,” in Proceedings of Sixth International Conference
on Document Analysis and Recognition, Sep. 2001, pp. 800–804.
[6] C. Biswas, P. S. Mukherjee, K. Ghosh, U. Bhattacharya, and S. K.
Parui, “A hybrid deep architecture for robust recognition of text lines
of degraded printed documents,” in 2018 24th International Conference
on Pattern Recognition (ICPR), 2018, pp. 3174–3179.
[7] A. Graves and J. Schmidhuber, “Offline handwriting recognition
with multidimensional recurrent neural networks,” in Advances in
Neural Information Processing Systems, D. Koller, D. Schuurmans,
Y. Bengio, and L. Bottou, Eds., vol. 21. Curran Associates, Inc., 2008. [Online]. Available: https://proceedings.neurips.cc/paper/2008/file/66368270ffd51418ec58bd793f2d9b1b-Paper.pdf
[8] V. Pham, C. Kermorvant, and J. Louradour, "Dropout improves recurrent neural networks for handwriting recognition," CoRR, vol. abs/1312.4569, 2013. [Online]. Available: http://arxiv.org/abs/1312.4569
[9] N. Ly, C. Nguyen, and M. Nakagawa, “An attention-based end-to-
end model for multiple text lines recognition in japanese historical
documents,” 09 2019, pp. 629–634.
[10] L. Kang, P. Riba, M. Rusiñol, A. Fornés, and M. Villegas, “Pay
attention to what you read: Non-recurrent handwritten text-line
recognition,” CoRR, vol. abs/2005.13044, 2020. [Online]. Available:
https://arxiv.org/abs/2005.13044
[11] B. Su and S. Lu, “Accurate scene text recognition based on recurrent
neural network,” in Computer Vision – ACCV 2014, D. Cremers,
I. Reid, H. Saito, and M.-H. Yang, Eds. Cham: Springer International
Publishing, 2015, pp. 35–48.
[12] B. Shi, X. Bai, and C. Yao, “An end-to-end trainable neural network
for image-based sequence recognition and its application to scene text
recognition,” vol. 39, no. 11, Nov 2017, pp. 2298–2304.
[13] K. Mehrotra, M. K. Gupta, and K. Khajuria, “Collaborative deep neural
network for printed text recognition of indian languages,” in 2019 Fifth
International Conference on Image Information Processing (ICIIP),
2019, pp. 252–256.
[14] S. Gongidi and C. V. Jawahar, “Iiit-indic-hw-words: A dataset
for indic handwritten text recognition,” in Document Analysis and
Recognition, ICDAR 2021, 16th International Conference, Lausanne,
Switzerland, September 5 to 10, 2021, Proceedings, Part IV. Berlin,
Heidelberg: Springer Verlag, 2021, pp. 444–459. [Online]. Available:
https://doi.org/10.1007/978-3-030-86337-1_30
[15] K. Xu, J. Ba, R. Kiros, K. Cho, A. Courville, R. Salakhutdinov,
R. Zemel, and Y. Bengio, “Show, attend and tell: Neural image
caption generation with visual attention,” 2015. [Online]. Available:
https://arxiv.org/abs/1502.03044
[16] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image
recognition,” 2015. [Online]. Available: https://arxiv.org/abs/1512.03385
[17] N. Moritz, T. Hori, and J. L. Roux, “Dual causal/non-causal self-
attention for streaming end-to-end speech recognition,” 2021. [Online].
Available: https://arxiv.org/abs/2107.01269