Medical Prescription Recognition System
Krishna Sharma ([email protected])
Indian Institute of Information Technology, Guwahati
March 15, 2024
Outline
1 Motivation and Problem Statement
2 Literature Review/Related Work
3 Methodology
4 Results and Discussions
5 Conclusion and Future Work
Motivation and Problem Statement
Motivation:
Patient Safety: Medication errors due to misinterpreted prescriptions can
cause serious harm or even be fatal. An automated system can significantly
reduce these errors, ensuring that patients receive the correct medication and
dosage.
Efficiency: Pharmacists often spend a considerable amount of time
deciphering doctors’ handwriting. An automated recognition system can save
time, allowing pharmacists to serve more patients and focus on other critical
tasks.
Aim: To develop an AI-powered Medical Prescription Recognition System that
accurately interprets handwritten and printed prescriptions, thereby reducing
medication errors, improving patient safety, and streamlining the pharmacy
workflow for efficient healthcare delivery.
Literature Review/Related Work
Methodology
Methodology (1/3)
Detection:
For the detection of words, we use a neural network trained on the IAM Forms dataset.
The model predicts an axis-aligned bounding box (AABB) for each word in the scanned image.
The AABBs are predicted from the following output maps of the model:
3 segmentation maps with one-hot encoding: Word (inner part), Word (surrounding), Background
4 geometry maps encoding the distances between the current pixel and the AABB edges: Top, Bottom, Left, Right
A separate surrounding class is used so that the pixels immediately around a word are not mapped to the same class as the true background.
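As a minimal illustration of how these output maps can be turned into candidate boxes (not the exact implementation), the sketch below decodes one candidate AABB per confident word-inner pixel; the array shapes, the 0.5 threshold, and the function name decode_aabbs are assumptions made for the example.

import numpy as np

def decode_aabbs(seg_maps, geo_maps, threshold=0.5):
    """seg_maps: (3, H, W) scores for [word inner, word surrounding, background].
    geo_maps: (4, H, W) distances from each pixel to the [top, bottom, left, right] AABB edges.
    Returns candidate boxes (x1, y1, x2, y2), one per confident word-inner pixel."""
    inner = seg_maps[0]
    ys, xs = np.where(inner > threshold)            # pixels believed to lie inside a word
    top, bottom, left, right = geo_maps[:, ys, xs]  # per-pixel edge distances
    boxes = np.stack([xs - left, ys - top, xs + right, ys + bottom], axis=1)
    return boxes

Every selected pixel votes for a full box, so nearby pixels of the same word produce many overlapping candidates; these are merged later by clustering.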
Methodology (2/3)
The backbone of the neural network is based on the ResNet18 model and is used for feature extraction.
Subsequent layers merge and upscale the feature maps produced by the backbone.
The total loss is the sum of:
Segmentation loss: segmentation is treated as a pixelwise classification problem, so cross-entropy loss is used.
Geometry loss: a sum-of-squared-errors loss on the geometry would put more weight on larger bounding boxes, for which larger errors can be tolerated, so an intersection-over-union (IoU) loss is used instead.
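A hedged sketch of this combined loss, assuming PyTorch and the channel order top, bottom, left, right for the geometry maps; the tensor names and the word_mask argument (selecting pixels that lie inside words) are illustrative assumptions.

import torch
import torch.nn.functional as F

def detection_loss(seg_logits, seg_target, geo_pred, geo_target, word_mask, eps=1e-6):
    # Segmentation: pixelwise classification over {inner, surrounding, background}.
    seg_loss = F.cross_entropy(seg_logits, seg_target)

    # Geometry: IoU between the box implied by the predicted edge distances
    # and the box implied by the ground-truth distances, evaluated per pixel.
    t_p, b_p, l_p, r_p = geo_pred.unbind(dim=1)
    t_g, b_g, l_g, r_g = geo_target.unbind(dim=1)
    area_p = (t_p + b_p) * (l_p + r_p)
    area_g = (t_g + b_g) * (l_g + r_g)
    inter = (torch.min(t_p, t_g) + torch.min(b_p, b_g)) * (torch.min(l_p, l_g) + torch.min(r_p, r_g))
    iou = inter / (area_p + area_g - inter + eps)
    geo_loss = (-torch.log(iou + eps))[word_mask].mean()   # only where a word actually is

    return seg_loss + geo_loss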
Usually, many AABBs are predicted for the same word, so DBSCAN clustering is used to merge these candidates into the final AABB for each word.
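The clustering step could look roughly like the sketch below, which groups the raw candidate boxes with scikit-learn's DBSCAN and averages each cluster; the eps and min_samples values are illustrative, not the tuned parameters.

import numpy as np
from sklearn.cluster import DBSCAN

def cluster_aabbs(boxes, eps=10.0, min_samples=3):
    """boxes: (N, 4) array of candidate (x1, y1, x2, y2) boxes. Returns one merged box per cluster."""
    if len(boxes) == 0:
        return np.empty((0, 4))
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(boxes)
    merged = [boxes[labels == k].mean(axis=0) for k in set(labels) if k != -1]  # -1 marks noise
    return np.array(merged) if merged else np.empty((0, 4))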
Methodology (3/3)
Recognition:
For the recognition of words, we use a neural network trained on the IAM Handwriting Database.
The model transcribes the text contained in each detected word image.
It consists of 5 convolutional neural network (CNN) layers, 2 recurrent neural network (RNN) layers, and a final Connectionist Temporal Classification (CTC) layer.
CNN: the input image is fed into the CNN layers, which are trained to extract relevant features from the image. These layers use 3x3 kernels and the ReLU activation function; a pooling layer then summarizes image regions and outputs a downsized version of the input.
RNN: the RNN propagates relevant information through the feature sequence. The Long Short-Term Memory (LSTM) implementation of RNNs is used, as it can propagate information over longer distances and offers more robust training characteristics than a vanilla RNN.
CTC: while training the network, the CTC layer is given the RNN output matrix and the ground-truth text and computes the loss value. During inference, the CTC layer is given only the matrix and decodes it into the final text.
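A hedged PyTorch sketch of such a CNN + LSTM + CTC word recognizer; the channel counts, the 32x128 grayscale input size, the pooling schedule, and the character-set size are illustrative assumptions rather than the exact configuration used.

import torch
import torch.nn as nn

class CRNN(nn.Module):
    def __init__(self, num_chars):
        super().__init__()
        chans = [1, 32, 64, 128, 128, 256]
        layers = []
        for i in range(5):  # 5 conv blocks with 3x3 kernels and ReLU
            layers += [nn.Conv2d(chans[i], chans[i + 1], 3, padding=1), nn.ReLU()]
            # pool the height away while keeping the width (the sequence axis) long
            layers += [nn.MaxPool2d((2, 2) if i < 2 else (2, 1))]
        self.cnn = nn.Sequential(*layers)
        self.rnn = nn.LSTM(256, 256, num_layers=2, bidirectional=True, batch_first=True)
        self.fc = nn.Linear(512, num_chars + 1)  # +1 for the CTC blank symbol

    def forward(self, x):                         # x: (B, 1, 32, 128) grayscale word images
        f = self.cnn(x)                           # (B, 256, 1, 32)
        f = f.squeeze(2).permute(0, 2, 1)         # (B, 32, 256): a 32-step feature sequence
        out, _ = self.rnn(f)                      # (B, 32, 512)
        return self.fc(out).log_softmax(-1)       # per-timestep character probabilities

During training, the output (permuted to time-major shape) is passed to nn.CTCLoss together with the ground-truth character indices; during inference, the per-timestep probabilities are decoded (for example by best-path decoding) into the final text.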
Results and Discussions
Conclusion and Future Work
Reference