MP Final Report
On
OPTICAL CHARACTER RECOGNITION
IV SEMESTER
ARTIFICIAL INTELLIGENCE
Submitted by
Nishant Tiwari (23010061)
Aditya Menon (23010062)
Mayank Charde (23010042)
CERTIFICATE
GUIDE
ACKNOWLEDGEMENT
ABSTRACT
CHAPTER 1
INTRODUCTION
In today’s rapidly advancing technological landscape, Optical Character Recognition (OCR) has emerged as a critical
and transformative field that bridges the physical and digital worlds. OCR technology enables computers to identify
and digitize text from scanned documents, photographs, or any image-based source, thereby automating the
tedious process of manual data entry. By leveraging computer vision, pattern recognition, and deep learning
algorithms, OCR systems have become increasingly sophisticated, accurate, and capable of handling a wide range of
input types and conditions.
The fundamental goal of OCR is to transform information stored in physical formats — such as handwritten notes,
printed books, invoices, and forms — into editable, searchable, and analyzable digital data. This not only improves
operational efficiency but also ensures long-term storage, easy retrieval, and enhanced accessibility of information.
With the rise of digitization efforts across industries, OCR has become an indispensable tool for businesses,
educational institutions, healthcare providers, and governments alike.
Modern OCR systems increasingly rely on deep learning methods to overcome traditional limitations. Techniques
such as Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), and Long Short-Term Memory (LSTM) networks have drastically
improved OCR performance, especially in cases involving complex backgrounds, variable handwriting, distorted
fonts, or multilingual documents.
This project aims to develop a robust OCR system that can handle various challenges associated with real-world text
recognition. Our system will utilize deep learning algorithms to accurately extract characters from images, even
under noisy or sub-optimal conditions. The focus will be on achieving high accuracy, fast processing times, and
flexibility across different input sources.
Addressing these challenges requires careful dataset preparation, model training, and testing under diverse
scenarios, all of which are integral parts of this project.
By the end of the project, the aim is to deliver an OCR solution that contributes meaningfully to the automation
landscape, reduces human error, saves time, and enhances data accessibility across various sectors.
CHAPTER 2
LITERATURE REVIEW
Optical Character Recognition (OCR) has been a significant field of research and development for several decades,
witnessing a remarkable evolution from simple pattern-matching techniques to advanced deep learning-based
solutions. This chapter provides an in-depth review of the foundational concepts, existing methodologies, and recent
advancements in OCR systems. It also highlights the strengths, limitations, and research gaps that motivate the
present work.
Early systems, however, struggled to recognize handwritten text and often failed in the presence of noise or non-uniform character shapes. This led to the need for more adaptive and intelligent approaches.
Feature extraction techniques like Histogram of Oriented Gradients (HOG) and Zernike Moments were
introduced to better represent characters mathematically. Although these methods improved recognition rates, they
still required manual feature engineering and were not scalable for large, complex datasets.
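For illustration, the following minimal sketch shows classical HOG feature extraction, assuming the scikit-image library is installed; the file name char.png and the parameter values are illustrative choices, not part of the original work:

    # Minimal HOG feature-extraction sketch (classical, pre-deep-learning OCR).
    # Assumes scikit-image is installed; "char.png" is a hypothetical input.
    from skimage.feature import hog
    from skimage.io import imread

    image = imread("char.png", as_gray=True)       # load a character image as grayscale
    features = hog(image,
                   orientations=9,                 # gradient-orientation bins
                   pixels_per_cell=(8, 8),         # local cell size
                   cells_per_block=(2, 2))         # block normalization
    print(features.shape)                          # fixed-length vector for a classifier

The resulting fixed-length vector is what such pipelines fed to a conventional classifier (e.g., an SVM), which is precisely the manual feature engineering step that deep learning later removed.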
• Tesseract OCR Engine (by Google): Initially based on traditional methods, it evolved to integrate LSTM
networks for sequential text recognition.
• CRNN (Convolutional Recurrent Neural Network): Combines CNNs for feature extraction and RNNs for
sequential character decoding, making it highly effective for recognizing sequences of text in natural scenes.
• Attention-based Models: Inspired by machine translation, attention mechanisms allow the OCR systems to
focus on important regions of an image while predicting text, improving accuracy for irregular text layouts.
Studies like Shi et al.'s "An End-to-End Trainable Neural Network for Image-based Sequence Recognition and
Its Application to Scene Text Recognition" highlighted the effectiveness of end-to-end learning frameworks in
OCR applications.
• Transformer-based Models: Use of Vision Transformers (ViTs) and Sequence-to-Sequence architectures for
text recognition tasks, allowing global context awareness.
• Multi-task Learning: Simultaneously training models on recognition, detection, and segmentation tasks to
boost performance.
• Synthetic Data Generation: To overcome the shortage of annotated datasets, synthetic text images are
generated to train more robust OCR systems.
• Lightweight Models for Edge Devices: Research is focused on compressing large OCR models for mobile
and embedded applications without significantly sacrificing accuracy (e.g., MobileNetV3, EfficientNet).
• Document Digitization: Converting printed books, historical archives, and handwritten manuscripts into
searchable digital formats.
• Banking Sector: Automated cheque clearing and form processing.
• Healthcare: Digitizing patient records and prescriptions.
• Transportation: Automatic Number Plate Recognition (ANPR) systems for traffic control and surveillance.
• Assistive Technologies: Enabling visually impaired individuals to access printed material through text-to-
speech conversions.
2.6 Challenges in Current OCR Systems
Despite significant progress, OCR systems still face several challenges:
• Handwritten Text Recognition: High variability in handwriting styles, cursive scripts, and character
connectivity makes recognition difficult.
• Multi-language Support: Recognition across languages with different scripts (e.g., Latin, Devanagari,
Arabic, Chinese) remains a complex task.
• Low-Quality Images: OCR accuracy drops significantly with blurred, low-resolution, or noisy inputs.
• Complex Layouts: Documents with tables, multiple columns, and mixed text-image content require advanced
layout analysis algorithms.
Recent literature points toward several open research directions:
• Highly generalized models capable of handling extreme distortions and low-quality images.
• Real-time OCR systems optimized for mobile and edge computing.
• Robust handwritten text recognition systems across multiple languages.
The present project is motivated by the need to address these challenges by developing a deep learning-based OCR
system that is accurate, efficient, and capable of handling various real-world input conditions.
CHAPTER 3
PROJECT PLANNING AND SCHEDULING
• Define the scope, goals, and expected outcomes of the OCR project.
• Identify key stakeholders, project guides, team members, and their roles.
• Prepare a detailed project plan outlining major tasks, deliverables, and timelines.
• Conduct a risk analysis to identify potential obstacles and mitigation strategies.
3.1.5 Testing
3.1.6 Documentation
3.3 GANTT CHART
(Figure: Gantt chart of timeline activities)
The Gantt chart represents the project phases from requirement analysis to maintenance. It provides a visual timeline for each phase, indicating the start and end dates.
CHAPTER 4
REQUIREMENT ANALYSIS
A successful project implementation requires a clear understanding of the system’s hardware, software,
and functional requirements. Proper requirement analysis ensures that the final system fulfills user needs,
performs efficiently, and remains scalable for future developments.
This chapter outlines the various functional, non-functional, hardware, and software requirements for
the Optical Character Recognition (OCR) system.
4.1 Functional Requirements
• Image Acquisition
The system must allow users to upload or capture images containing text (scanned documents,
photos, handwritten notes, etc.).
• Preprocessing
The system must perform image enhancement tasks such as noise removal, binarization, skew
correction, and resizing (a preprocessing sketch follows this list).
• Character Segmentation
The system should accurately segment text into individual characters or words for further
recognition.
• Character Recognition
The system must identify and classify characters using deep learning algorithms.
• Text Output
The recognized text must be displayed in a digital editable format, allowing users to copy, edit, or
save the output.
• Error Handling
The system must handle cases of poor image quality or unrecognizable text gracefully, providing
appropriate feedback.
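The preprocessing sketch referenced above, a minimal illustration assuming OpenCV and NumPy are available; the function name, file path, and parameter values are illustrative, not part of the original design:

    # Preprocessing sketch: noise removal, binarization, skew correction, resizing.
    import cv2
    import numpy as np

    def preprocess(path):
        img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)             # load as grayscale
        img = cv2.fastNlMeansDenoising(img, h=10)                # noise removal
        _, img = cv2.threshold(img, 0, 255,
                               cv2.THRESH_BINARY + cv2.THRESH_OTSU)  # Otsu binarization
        # Skew correction: estimate the tilt of the text block from the
        # minimum-area rectangle around the dark (text) pixels. OpenCV's
        # angle convention differs across versions, so the angle is mapped
        # into [-45, 45] before rotating.
        coords = np.column_stack(np.where(img == 0)).astype(np.float32)
        angle = cv2.minAreaRect(coords)[-1]
        if angle > 45:
            angle -= 90
        elif angle < -45:
            angle += 90
        h, w = img.shape
        M = cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)
        img = cv2.warpAffine(img, M, (w, h), flags=cv2.INTER_CUBIC,
                             borderMode=cv2.BORDER_REPLICATE)
        return cv2.resize(img, None, fx=2, fy=2,
                          interpolation=cv2.INTER_CUBIC)         # upscale for OCR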
4.2 Non-Functional Requirements
4.2.1 Security
• The system must securely handle and store uploaded images without unauthorized access.
• Sensitive user data must be encrypted during storage and transmission.
4.2.2 Usability
4.2.3 Reliability
4.2.4 Scalability
• The system architecture must support scaling to handle increased loads, including larger image
files and higher traffic.
4.2.5 Performance
• The OCR process (from image upload to text output) should occur within a reasonable time frame
(ideally within a few seconds).
• The system should achieve high character recognition accuracy (>90% for clean images); a simple way to measure this is sketched below.
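One simple way to estimate character-level accuracy against a ground-truth transcription, using only the Python standard library, is shown below; this is an illustrative sketch, not part of the delivered system:

    # Character-level accuracy sketch using difflib from the standard library.
    import difflib

    def char_accuracy(recognized, ground_truth):
        # Ratio of matched characters to the ground-truth length,
        # computed from the longest matching blocks.
        matcher = difflib.SequenceMatcher(None, recognized, ground_truth)
        matched = sum(block.size for block in matcher.get_matching_blocks())
        return matched / max(len(ground_truth), 1)

    print(char_accuracy("He1lo wor1d", "Hello world"))   # ~0.82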
4.2.6 Compatibility
• The system should be compatible across various platforms (Windows, Linux) and devices
(desktop, mobile).
4.2.7 Documentation
4.4 Software Requirements
4.4.1 Development Phase Software
CHAPTER 5
SYSTEM DESIGN AND IMPLEMENTATIONS
5.1 ARCHITECTURE
5.1.1 Use Case Diagram:
A use case diagram is a visual representation of how users interact with a system. It's like a blueprint that
focuses on functionality from the user's perspective.
The Use Case Diagram for the Optical Character Recognition (OCR) system illustrates the interaction
between the user and the system’s core functionalities. The user initiates the process by uploading an
image or a scanned document containing printed or handwritten text. Once the image is uploaded, the
system performs preprocessing tasks such as noise removal, resizing, and grayscale conversion to
enhance the quality of text recognition. After preprocessing, the OCR engine (pytesseract) extracts the
text from the image. The recognized text is then cleaned, formatted, and presented back to the user for
viewing or downloading. This workflow ensures that users can quickly and accurately digitize textual
information from images, improving accessibility, searchability, and editability of important documents.
5.1.2 Data Flow Diagram:
The Data Flow Diagram (DFD) of the Optical Character Recognition (OCR) system illustrates how data moves through
the different modules during the OCR process. Initially, the user uploads an image or scanned document through the
system's interface. The uploaded image is first handled by the Image Upload Module, which then sends it to the
Preprocessing Module. In preprocessing, the image undergoes operations like noise reduction, resizing, and conversion
to grayscale using the OpenCV and Pillow libraries. The cleaned image is then passed to the OCR Engine (pytesseract),
which detects and extracts text from the image. The extracted raw text is sent to the Post-Processing Module, where it
is cleaned and formatted using Numpy for better readability and structure. Finally, the recognized text is displayed to
the user or made available for download. This data flow ensures a smooth transition of information from image input
to meaningful text output, providing users with an efficient text extraction solution.
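The module chain above can be summarized in code. The sketch below mirrors the DFD stages; the function names and file name are illustrative, and preprocessing is reduced to grayscale conversion plus light denoising for brevity:

    # DFD stages in code: upload -> preprocess -> OCR -> post-process.
    # Assumes Pillow, OpenCV, NumPy, pytesseract, and the Tesseract-OCR engine.
    import cv2
    import numpy as np
    import pytesseract
    from PIL import Image

    def preprocess(path):
        img = np.array(Image.open(path).convert("RGB"))   # Image Upload Module output
        gray = cv2.cvtColor(img, cv2.COLOR_RGB2GRAY)      # Preprocessing Module
        return cv2.medianBlur(gray, 3)                    # light noise reduction

    def recognize(image):
        return pytesseract.image_to_string(image)         # OCR Engine (pytesseract)

    def postprocess(text):
        lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
        return "\n".join(lines)                           # Post-Processing Module

    print(postprocess(recognize(preprocess("document.png"))))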
5.1.4 Sequence Diagram:
A sequence diagram is a type of UML (Unified Modeling Language) diagram
that depicts the interactions between objects in a system arranged in time sequence. It focuses on how
objects collaborate to achieve a specific functionality.
The Sequence Diagram for the Optical Character Recognition (OCR) system represents the step-by-step interaction
between different components over time. The process begins when the user uploads an image through the web
application. The web application then sends the uploaded image to the Preprocessing Module, where various image
enhancement techniques are applied to improve OCR accuracy. After preprocessing, the image is forwarded to the
OCR Engine (using pytesseract), which extracts the text from the image. The raw extracted text is then passed to the
Post-processing Module, where it is cleaned and formatted for better readability. Finally, the processed text is sent
back to the web application, where it is displayed to the user or offered as a downloadable file. This sequence ensures
a smooth and logical flow of data from input to output, enabling efficient and accurate text extraction from images.
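From the user's side, this sequence reduces to a single HTTP request. The snippet below is a hypothetical client call; the /extract endpoint, port, and response shape are assumptions (a matching server sketch appears in the implementation section below):

    # Hypothetical client for the sequence above, using the requests library.
    import requests

    with open("sample.png", "rb") as f:
        # The web application forwards the file through preprocessing,
        # the OCR engine, and post-processing, then returns the text.
        resp = requests.post("http://localhost:5000/extract", files={"image": f})
    print(resp.json()["text"])    # display, edit, or save the recognized text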
5.2 CLASS RESPONSIBILITIES
Class Name      Responsibilities
User            Handles user interaction (uploads image, receives text output).
ImageProcessor  Preprocesses the image using OpenCV and Pillow (resize, denoise, grayscale).
OCREngine       Performs text extraction from the image using Pytesseract.
PostProcessor   Cleans, formats, and structures the extracted text using Numpy.
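A skeleton of these classes might look as follows; the method names and signatures are illustrative, since the report does not prescribe them:

    # Illustrative skeletons of the classes listed in the table above.
    import cv2
    import numpy as np
    import pytesseract
    from PIL import Image

    class ImageProcessor:
        def preprocess(self, path):
            img = np.array(Image.open(path).convert("L"))   # Pillow: load + grayscale
            return cv2.medianBlur(img, 3)                   # OpenCV: denoise

    class OCREngine:
        def extract(self, image):
            return pytesseract.image_to_string(image)       # Pytesseract: recognize

    class PostProcessor:
        def clean(self, text):
            return " ".join(text.split())                   # normalize whitespace

    # Usage corresponding to the User row of the table ("scan.png" is illustrative):
    cleaned = PostProcessor().clean(
        OCREngine().extract(ImageProcessor().preprocess("scan.png")))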
5.3 IMPLEMENTATION
The Optical Character Recognition (OCR) system relies on several important libraries to function
efficiently. Flask is used as the primary backend framework to create the web server and API endpoints
through which users can upload images and receive extracted text. To allow seamless communication
between different domains, Flask-CORS is integrated, enabling cross-origin requests from the frontend to
the backend. The core OCR functionality is powered by Pytesseract, a Python wrapper for Google's
Tesseract-OCR engine, which is responsible for detecting and extracting text from images. Image handling
tasks, such as loading, resizing, and cropping, are managed using Pillow, an image processing library. For
more advanced image preprocessing, like grayscale conversion, noise reduction, and thresholding, the
system employs OpenCV-Python-Headless, a lightweight, server-friendly version of OpenCV.
Additionally, Numpy is used extensively for efficient handling and manipulation of image arrays and
matrix operations during preprocessing and post-processing stages. Together, these libraries create a
powerful, modular, and scalable OCR solution.
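A minimal backend wiring these libraries together might look like the sketch below; the route name, port, and response shape are assumptions chosen to match the hypothetical client call shown earlier, not the project's exact code:

    # Minimal Flask backend sketch: upload an image, return extracted text.
    # Assumes flask, flask-cors, pillow, pytesseract, and Tesseract-OCR installed.
    import io
    import pytesseract
    from flask import Flask, request, jsonify
    from flask_cors import CORS
    from PIL import Image

    app = Flask(__name__)
    CORS(app)                                  # allow cross-origin frontend calls

    @app.route("/extract", methods=["POST"])
    def extract():
        file = request.files.get("image")      # uploaded image file
        if file is None:
            return jsonify(error="no image uploaded"), 400
        image = Image.open(io.BytesIO(file.read())).convert("L")
        text = pytesseract.image_to_string(image)
        return jsonify(text=text)

    if __name__ == "__main__":
        app.run(port=5000)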
Figure 5.3.2 UI (FRONTEND)
The UI of the OCR system is designed to be user-friendly and intuitive. It features a simple interface where
users can upload an image through a frontend application. The system then processes the image using
OpenCV for preprocessing and Pytesseract for text extraction. The extracted text is displayed in the terminal
or a designated output area, allowing users to easily view and edit the results. The interface includes options
like "Choose an image" and "Extract Text" to guide users through the process seamlessly. The design
prioritizes functionality and ease of use, ensuring that even non-technical users can efficiently convert
images into editable text.
Figure 5.3.3 Extracting the text
The text extraction process involves several key steps: First, the input image is preprocessed using OpenCV
to enhance clarity, such as converting it to grayscale or adjusting contrast. Next, the Pytesseract OCR engine
analyzes the image to detect and recognize characters. The extracted text is then processed to correct errors
or format inconsistencies. Finally, the result is displayed in a readable format, allowing users to copy, edit, or
save the digitized text. This automated method ensures efficient and accurate conversion of both printed and
handwritten text from images into editable digital content.
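Beyond plain image_to_string, pytesseract can also report a per-word confidence, which lets the error-correction step flag or drop unreliable words. The sketch below illustrates this; the file name and the 60% threshold are illustrative choices:

    # Sketch: filter out low-confidence words using pytesseract's data output.
    import pytesseract
    from PIL import Image

    data = pytesseract.image_to_data(Image.open("scan.png"),
                                     output_type=pytesseract.Output.DICT)
    words = [w for w, conf in zip(data["text"], data["conf"])
             if w.strip() and float(conf) > 60]   # keep words above 60% confidence
    print(" ".join(words))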
CHAPTER 6
TESTING
TEST SCENARIOS
CHAPTER 7
CONCLUSIONS AND FUTURE SCOPE
7.1 CONCLUSION
The developed OCR system successfully demonstrates the ability to extract text from images and
scanned documents with reasonable accuracy. By leveraging tools like OpenCV for image
preprocessing and Pytesseract for text recognition, the system automates the conversion of
printed and handwritten text into editable digital formats. Testing confirmed its effectiveness on
clear, high-quality documents, though challenges remain with blurry images, complex
handwriting, and non-standard fonts. The system’s modular design, built using Flask for the
backend and lightweight libraries, ensures scalability and ease of integration into larger
applications. Overall, this project provides a functional foundation for digitizing textual content,
reducing manual effort, and improving accessibility.
7.2 FUTURE SCOPE
• Improved Handwriting Recognition – Enhance accuracy for cursive and varied handwriting using deep learning (CNNs/Transformers).
• Multi-Language Support – Extend to regional languages and complex scripts (e.g., Arabic, Devanagari).
• Real-Time Mobile OCR – Optimize for live camera scanning and offline mobile use.
• AI-Powered Post-Processing – Integrate NLP for error correction and context-aware text refinement.
• Cloud & Scalability – Enable bulk processing and cloud storage with search functionality.
• Security Features – Add auto-redaction for sensitive data and encryption.
• User-Friendly Upgrades – Include batch processing, export options (PDF/Word), and voice commands.
REFERENCES
1. Mori, S., Suen, C. Y., and Yamamoto, K., "Historical review of OCR research and development," Proceedings of the IEEE, vol. 80, no. 7, pp. 1029-1058, Jul. 1992.
2. Breuel, T. M., Ul-Hasan, A., Al-Azawi, M. A., and Shafait, F., "High-performance OCR for printed English and Fraktur using LSTM networks," in Proc. 12th International Conference on Document Analysis and Recognition (ICDAR), 2013.
3. Shi, B., Bai, X., and Yao, C., "An end-to-end trainable neural network for image-based sequence recognition and its application to scene text recognition," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, no. 11, pp. 2298-2312, Nov. 2017.
4. Breuel, T. M., "The OCRopus open source OCR system," in Document Recognition and Retrieval XV, Proc. SPIE, vol. 6815, 2008, p. 68150F.
ANNEXURE II
1. Stakeholder Details:
Project Title: "SmartOCR: An Advanced Optical Character Recognition System for Printed Text Extraction."