devices
A hands-on practice of TensorFlow model conversion to TensorFlow
Lite model and its deployment on Smartphone to compare model’s
performance.
Mitra Rashidi
I/we allow publishing in full text (free available online, open access):
☒ Yes, I/we agree to the terms of publication.
Sundsvall 2022-06-23
................................................................................................................................
Location and date
Mitra Rashidi
................................................................................................................................
Name (all authors names)
1996
................................................................................................................................
Year of birth (all authors year of birth)
Abstract
This thesis describes the development of a procedure for integrating machine learning
(ML) models on smartphones and compares the result with traditional computer models
such as TensorFlow. Machine learning attracts a great deal of attention at present
because of the strong demand for it in intelligent applications such as computer vision,
natural language processing, recommendation systems, and time series problems. The
limited resources of a smartphone make it challenging to recognize a wide variety of
activities with high precision. A user-friendly procedure is proposed for the design,
development, evaluation, and deployment of machine learning models on embedded
devices with limited resources. TensorFlow (TF) and TensorFlow Lite (TF Lite) were
selected for the task. The thesis presents the procedure for turning a base machine
learning model designed for a desktop computer or laptop into a compressed and
optimized version of the same model for integration on a device with limited resources.
The models were compared with respect to the storage size of the model file and the
time taken to predict a value. The results showed that the TensorFlow Lite model was as
accurate as the base model, that its size was 60% smaller than that of the base model,
and that its response time was 70% shorter. The TensorFlow Lite model was therefore
found to be highly favorable for machine learning integration in embedded devices, and
such integration is promising with the procedure proposed in the thesis. Finally, the
model was deployed on an Android smartphone, and its practicality and feasibility were
demonstrated. The framework adopts a unique and reliable approach that provides
flexibility while meeting the challenge of integrating machine learning on an Android
device.
Sammanfattning (Summary)
The report describes the development of a procedure for machine learning integration on
smart devices and compares it with traditional computer models such as TensorFlow.
Machine learning is a field that currently attracts a great deal of attention because of
the notable demand for it in intelligent applications such as computer vision, natural
language processing, recommendation systems, and time series problems. The limited
resources of smart devices make it difficult to recognize a variety of different
activities with high precision. A user-friendly procedure is proposed for the design,
development, evaluation, and deployment of machine learning models for
resource-constrained embedded devices. TensorFlow and TensorFlow Lite were selected for
the thesis work, which provides the workflow from a machine learning model designed for
a desktop computer or laptop to a compressed and optimized version of the same model
integrated on a device with limited resources. The models were compared, and results
were obtained. The TensorFlow Lite model was found to be highly favorable for machine
learning integration in embedded devices. The storage size of the developed model file
and the time taken to predict a value were compared. The results showed that the
TensorFlow Lite model was as accurate as the base model, that its size was 60% smaller
than that of the base model, and that its response time was 70% shorter. This shows that
machine learning can be integrated on such devices with the process proposed in the
thesis. Finally, the model was deployed on an Android smartphone, and its practical
functionality and feasibility were demonstrated. The framework takes a unique and
reliable approach that provides flexibility while meeting the challenge of integration
on Android devices.
Acknowledgements
I am extremely grateful to my supervisor, Karl Pettersson, for his invaluable advice,
continuous support, and patience during my thesis. His immense knowledge and
creative vision encouraged me throughout my research, implementation, and
conclusions.
I would like to express my gratitude to my family; without their tremendous
understanding and encouragement in the past few weeks, it would have been impossible
for me to complete my thesis.
Table of Contents
Abstract
Sammanfattning (Summary)
Acknowledgements
Terminology
1 Introduction
1.1 Background and problem motivation
1.2 Overall aim
1.3 Problem statement/Research questions
1.4 Knowledge goals
1.5 Scope
1.6 Outline
2 Theory
2.1 Machine Learning
2.2 Transfer Learning
2.2.1 TensorFlow
2.2.2 Tensor
2.2.3 Tensor Calculation
2.2.4 TensorFlow Lite
2.3 Model optimization with TensorFlow
2.3.1 Quantization
2.3.2 Post-training quantization
2.3.3 Pruning
2.4 Embedded devices
2.4.1 Micro controller
2.4.2 Smartphones
2.4.3 Raspberry Pi
2.5 Environment
2.5.1 Google Colab
2.6 Evaluation dependent Parameters
2.6.1 Accuracy of model
2.6.2 Time of prediction
2.6.3 Need of resources
2.7 Related work
2.7.1 Design and optimization of a TF Lite on a smartphone
2.7.2 Deploying Image Deblurring Across Mobile Devices
3 Methodology
3.1 Scientific method description
3.2 Project method description
3.2.1 Problem definition
3.2.2 Literature study
3.2.3 Implementation
3.2.4 Measurement setup
3.2.5 Evaluation
3.3 Evaluation method
4 Implementation
4.1 Machine Learning model building using Keras
4.2 Machine Learning model training and evaluation
4.3 TensorFlow Lite Conversion and optimization
4.3.1 TensorFlow Lite Conversion
4.3.2 TensorFlow Lite Converter optimization
4.4 TensorFlow Lite Evaluation on Colab
4.5 TensorFlow Lite Model integration on Android smartphone
5 Results
5.1 Resulting system
5.2 Measurement results
6 Discussion
6.1 Analysis and discussion of results
6.2 Project method discussion
6.2.1 Definition of the issue
6.2.2 Study of literature
6.2.3 Implementation
6.3 Scientific discussion
6.4 Ethical and societal discussion
7 Conclusions
7.1 Answers of problem statement
7.2 Future Work
7.2.1 Text detection and recognition
7.2.2 Object detection and tracking
Terminology
AAAL American Association for Applied Linguistics
AI Artificial Intelligence
ANN Artificial Neural Network
API Application Programming Interface
ARM Acorn RISC Machine
CPU Central Processing Unit
PC Personal Computer
DL Deep Learning
DSP Digital Signal Processing
FPU Floating-point unit
GPU Graphics Processing Unit
IEEE Institute of Electrical and Electronics Engineers
IDE Integrated Development Environment
IoT Internet of Things
MCU Microcontroller Unit
ML Machine Learning
NLP Natural Language Processing
NN Neural Network
SIMD Single Instruction Multiple Data
TF TensorFlow
TF Lite TensorFlow Lite
TPU Tensor Processing Unit
1 Introduction
New technologies have historically had a huge influence on human development:
television, computers, the Internet, and, most recently, smartphones have entered our
daily lives, and living without them has become unimaginable. Running machine
learning on small devices, and letting them take on complex tasks, raises the AI
capabilities of those devices. This highlights the basic question of how such systems
are used and how much they are capable of [1].
The Internet of Things has greatly expanded the number of technical gadgets, and their
prominence in everyday life has opened up applications in a wide range of scenarios,
while also adding to the number of devices a user has to operate. Autonomous IoT
devices address this by incorporating AI into the devices themselves. Bringing logic,
learning ability, context inference, and anticipation to IoT can eventually reduce the
strain on the operator. Implementing AI on the gadgets also addresses the basic privacy
problem that has hampered the mainstream deployment of cloud computing in every home.
This largely answers the question of how AI systems should be used: they should promote
well-being by enabling smart gadgets that can fade into the background [2].
Machine learning (ML) with TensorFlow Lite (TF Lite) is a burgeoning field at the
intersection of embedded systems and artificial intelligence (AI). As such, a new range
of embedded applications is emerging for neural networks. Because these models are
extremely small (a few hundred KB) and run on microcontrollers or on embedded
subsystems based on digital signal processors, they can operate continuously with
minimal impact on device battery life. Amazon, Apple, Google, and others use tiny neural
networks on billions of devices to run always-on inference for keyword detection,
visual object detection, human-activity recognition, and anomaly detection [1]. The
work on embedded devices in this thesis is intended to examine how effective such
models are on them.
ML is still maturing despite its exponential growth. Researchers continue to experiment
with new operations and network architectures to obtain better results from their
models. As results improve, the demand for these models among designers increases.
A review of the literature [2] points to the following challenges:
• Few platforms and sites allow ML integration on embedded devices [4].
• Training tools are scarce in the ML market.
• Model deployment tools have limited access, and their lack of productivity makes
ML a challenging process for beginners.
• There is no proper framework for the compression and quantization of ML
algorithms. Further, there is no platform for model invocation and
execution.
1.4 Knowledge goals
The knowledge goals of this thesis are as follows:
1.5 Scope
In this thesis I focus on the services provided by TF for making ML models usable
on smartphones. I explore the procedure for compressing and optimizing a machine
learning model using TF and monitor its effect on the model's performance by
comparing accuracy, resource needs, memory consumption, and response time.
1.6 Outline
In this thesis, I introduce the tools provided by TensorFlow for integrating machine
learning or deep learning algorithms in embedded devices. I explore the options
offered by Google's leading open-source library for compressing a model and evaluate
the performance of the compressed models.
The remainder of this thesis is organized as follows. Chapter 2 presents the prerequisite
theory, including machine learning, TensorFlow, TensorFlow Lite, tensors, embedded
devices, transfer learning, and some related research on integrating machine
learning models on embedded devices. Chapter 3 presents the methods followed to make
the model work smoothly on an Android smartphone and discusses how I planned the
different modules of my thesis. Chapter 4 presents hands-on practice of ML model
building, evaluation, compression, optimization, and integration on an Android
smartphone using the TensorFlow library. Chapter 5 presents the findings of the
measurements gathered during the implementation phase in a table and bar charts, as a
basis for the discussion and conclusions in the following chapters. Chapter 6 discusses
the results, based on an analysis of the implementation done in chapter 4. The
discussion also addresses other aspects, e.g., the project method, the scientific
discussion, and the ethical and societal aspects of ML integration on a smartphone.
Chapter 7 presents the conclusions of my thesis: whether I achieved the required goal,
the answers to the questions raised in the knowledge goals, and some recommendations
for future work related to my thesis.
2 Theory
This chapter describes the topics that are necessary to understand the remainder of
the thesis.
“Machine Learning is the field of study that gives computers the ability to learn without
being explicitly programmed.” —Arthur Samuel, 1959
“A computer program is said to learn from experience E with respect to some task T
and some performance measure P, if its performance on T, as measured by P, improves
with experience E.” —Tom Mitchell, 1997
It is used to solve a wide range of problems not only in computer vision [5], but also in
other fields, such as data classification, knowledge extraction, or even speech
recognition [6].
2.2.1 TensorFlow
TensorFlow is the second-generation framework of Google Brain. Version 1.0.0 was
released on February 11, 2017. Unlike the reference implementation, TensorFlow can run
on multiple multi-core CPUs. TensorFlow is available for 64-bit Linux, macOS, Windows,
and mobile platforms such as iOS and Android [7].
Its flexible architecture allows computations to be deployed easily across a wide
variety of platforms, from PCs to clusters of hundreds of computers to smartphones and
other devices.
Google announced the Tensor Processing Unit (TPU) in May 2016, an application-specific
chip designed exclusively for deep learning and tailored for TensorFlow [8].
A TPU is a programmable AI accelerator that is intended for using or running models
rather than training them. Google said that it had been using TPUs in its data centers
for over a year and had found that they deliver an order of magnitude better
performance per watt for deep learning. Google unveiled the second-generation TPUs in
May 2017, as well as their availability in Google Compute Engine. Second-generation
TPUs deliver up to 180 teraflops of performance and up to 11.5 petaflops when arranged
into clusters of 64 TPUs [9].
TF allows developers to create a dataflow structure that describes how data moves and
which mathematical operations should be performed on the data as it moves from
one point to another in the defined structure. In this structure the data is
represented as tensors and the mathematical operations as nodes. Figure 2 shows how
both the data and its flow are represented [9].
Figure 2: TensorFlow overview
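The dataflow idea can be illustrated without TensorFlow itself. The following is a
minimal sketch in plain Python and not TensorFlow's actual implementation; the names
`Node` and `evaluate` are hypothetical and introduced only for this illustration.
Operations are the nodes, and the values passed between them stand in for tensors.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Node:
    """One operation in a toy dataflow graph (hypothetical, for illustration)."""
    name: str
    op: Callable[..., float]
    inputs: List[str] = field(default_factory=list)


def evaluate(graph: Dict[str, Node], target: str, feeds: Dict[str, float]) -> float:
    """Compute the value of `target` by pulling data through the graph."""
    if target in feeds:              # a fed input: data enters the graph here
        return feeds[target]
    node = graph[target]
    args = [evaluate(graph, name, feeds) for name in node.inputs]
    return node.op(*args)


# y = (a * b) + c, expressed as two operation nodes
graph = {
    "mul": Node("mul", lambda x, y: x * y, ["a", "b"]),
    "add": Node("add", lambda x, y: x + y, ["mul", "c"]),
}

result = evaluate(graph, "add", {"a": 2.0, "b": 3.0, "c": 1.0})
print(result)
```

The graph is defined once and can then be evaluated with different inputs, which is the
property that makes such a structure easy to move between devices.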
2.2.2 Tensor
TensorFlow lets you build dataflow graphs that specify how data moves through a
graph, taking its input in the form of multidimensional arrays called tensors. You
construct a flowchart of operations to be performed on this data, which goes in at
one end and comes out at the other as output [10].
In ML and DL, data is represented numerically in the form of tensors. A tensor is a
container that can hold scalars, vectors, and matrices; in other words, it can
be understood as a multidimensional array. TF works only with tensors: data is first
converted into tensors and then processed further. Even images are converted into a
table of pixel values, which is then treated as a tensor and processed further.
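As a small illustration of tensors as multidimensional arrays, the sketch below uses
nested Python lists in place of real TensorFlow tensors. The helper `shape` is a
hypothetical function written for this example; the 28x28 image size is that of the
MNIST images used later in the thesis, and the all-zero pixel values are placeholders.

```python
def shape(tensor):
    """Return the shape of a nested-list tensor as a tuple (assumes regular nesting)."""
    dims = []
    while isinstance(tensor, list):
        dims.append(len(tensor))
        tensor = tensor[0]
    return tuple(dims)


scalar = 5.0                                  # rank 0
vector = [1.0, 2.0, 3.0]                      # rank 1
matrix = [[1.0, 2.0], [3.0, 4.0]]             # rank 2

# A 28x28 grayscale image as a table of pixel values, here filled with zeros
# and scaled from the 0..255 range to 0..1.
image = [[0 / 255.0 for _ in range(28)] for _ in range(28)]
# A batch holding that single image, with a trailing channel axis.
batch = [[[[pixel] for pixel in row] for row in image]]

print(shape(scalar), shape(vector), shape(matrix), shape(image), shape(batch))
```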
access. As a result, new environments are not limited to places with connectivity.
TF Lite models are stored in a portable file format based on FlatBuffers, a
serialization library that keeps data in a flat binary buffer and allows direct access
to it without unpacking. TF Lite model files are identified by the ".tflite" extension.
This format enables optimized computation and reduces space requirements. As a result,
a TF Lite model can outperform the corresponding TensorFlow model on such devices [14].
2.3.1 Quantization
Quantization reduces the number of bits needed to represent a variable while largely
maintaining the accuracy of the network. It is used both to avoid overfitting and to
reduce the size of the network. A model can be quantized when it is converted to TF Lite
format using the TF Lite Converter [14].
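To make the idea concrete, the following is a minimal sketch of uniform (affine)
quantization of floating-point values to 8-bit integers in plain Python. It is an
illustration under simplifying assumptions, not the algorithm used internally by the
TF Lite Converter; the function names and the example weights are hypothetical.

```python
def quantize(values, num_bits=8):
    """Map floats to unsigned integers with a single scale and zero point."""
    qmin, qmax = 0, 2 ** num_bits - 1
    lo, hi = min(min(values), 0.0), max(max(values), 0.0)  # range must contain 0
    scale = (hi - lo) / (qmax - qmin) or 1.0                # avoid division by zero
    zero_point = round(qmin - lo / scale)
    q = [max(qmin, min(qmax, round(v / scale) + zero_point)) for v in values]
    return q, scale, zero_point


def dequantize(q, scale, zero_point):
    """Recover approximate floats from the quantized integers."""
    return [scale * (x - zero_point) for x in q]


weights = [-1.0, -0.5, 0.0, 0.25, 1.0]
q, scale, zero_point = quantize(weights)
restored = dequantize(q, scale, zero_point)
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(q, scale, zero_point, max_error)
```

Each value now needs 8 bits instead of 32, and the reconstruction error stays within one
quantization step, which is why accuracy is usually affected only slightly.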
According to research in neuroscience, each synapse in the brain can take on 26
distinct synaptic strengths, which corresponds to a storage capacity of about 4.7
bits. Current NNs, however, usually rely on single- or double-precision floating-point
values, because these are the native formats of modern computers. Because these number
formats were never designed with NNs in mind, the question is whether the precision
can be reduced or a number format optimized for NNs can be built. Reduced
precision might have several advantages [15].
Functions that use fewer bits and fixed-point numbers can be implemented more
cost-effectively and parallelized more readily with SIMD. As a result, energy
consumption may be reduced while computing speed is increased [15].
Optimal symmetric uniform quantizers for Gaussian, Laplacian, and Gamma distributions
have been studied extensively. In uniform quantizers the levels are equally spaced; in
contrast, the spacing between levels varies with non-uniform quantizers. The quantizers
described above can be computed directly. Han et al., on the other hand, proposed
weight-sharing quantization [16]. In their concept, K-means is employed to cluster
similar weights for separate layers. However, this solution requires a look-up table,
which increases memory usage and the number of memory accesses. It is a form of
quantization that cannot be computed directly.
Using quantization in the context of NNs is useful for the reasons stated above. The
next paragraph goes through the possible beneficial and detrimental consequences for
the NN. Figure 5 shows the two types of quantization, uniform and non-uniform [17].
Figure 5: Quantization types
2.3.3 Pruning
Pruning removes the neurons that carry little weight or are less representative,
improving the generalization of the model and decreasing its size and computational
cost. This method was proposed back in 1990 [17], when the authors pruned a large ANN
by estimating the sensitivity of the error function to each connection and eliminating
the connections with the lowest sensitivity.
Model pruning consists of removing certain parameters, that is, setting them
permanently to 0. The parameters that are already close to 0 are the ones pruned. This
keeps a model from overfitting, since parameters that were deemed useless during
training cannot be reactivated. There are different ways to prune a model. A certain
number of weights can be pruned during training, or a pre-trained model can be pruned
afterwards to simplify it and make it lighter. Figure 7 shows the different sparsity
structures in a 4-dimensional weight tensor [18].
Figure 7: Types of sparsity from irregular to regular.
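The rule of removing the parameters closest to 0 can be sketched as follows. This is a
simplified, unstructured magnitude pruning of a single weight matrix in plain Python,
given only as an illustration; the function name, the example weights, and the 50%
sparsity target are assumptions and do not come from the experiments in this thesis.

```python
def magnitude_prune(weights, sparsity):
    """Zero out the given fraction of weights with the smallest magnitude."""
    flat = sorted(abs(w) for row in weights for w in row)
    k = int(len(flat) * sparsity)            # number of weights to remove
    if k == 0:
        return [row[:] for row in weights]
    threshold = flat[k - 1]
    # Ties at the threshold are pruned too, so slightly more than k may be removed.
    return [[0.0 if abs(w) <= threshold else w for w in row] for row in weights]


layer = [[0.80, -0.02, 0.30],
         [0.01, -0.60, 0.05]]
pruned = magnitude_prune(layer, sparsity=0.5)
zeros = sum(w == 0.0 for row in pruned for w in row)
print(pruned, zeros)
```

The zeros produced here are irregular (unstructured) sparsity, the finest-grained of
the sparsity types; removing whole filters or channels would be the regular end of the
same range.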
According to [21], pruning can be done in the following way. Figure 8 illustrates the
three stages involved in the traditional pruning process [19].
Pruning is a strategy for introducing sparsity into NNs. The trimming of irrelevant
weights inside NNs, first advocated by LeCun et al., can bring several benefits
[19]:
• Improved generalization (reduce overfitting).
• Reduced memory footprint.
• Shorter inference delay.
Sparsity can and should be exploited to reduce memory usage and inference latency.
As previously stated, pruning can be exploited to reduce the memory footprint of a
sparse matrix or, more specifically, to increase the pace of computation with a
two-dimensional array. While filter and channel sparsity are simple to exploit, since
they amount to a smaller dense structure, finer-grained sparsity requires specialized
processors and customized implementations, whether in hardware or software [20].
Sparsity can be a characteristic of a NN, but it can also be observed in later stages of
ML on the edge:
can be integrated into an MCU architecture. The ARM Cortex-M4 and Cortex-M7
architectures, for example, have a DSP [23].
2.4.2 Smartphones
Smartphones are the most widely used devices nowadays, and numerous development
and improvement projects are in progress for this category of embedded devices. TF
supports both the Android and iOS platforms. AI in the form of ML or DL has a large
scope on both platforms. ML is used in NLP, computer vision, recommendation systems,
and many other smartphone applications [24].
2.4.3 Raspberry Pi
The Raspberry Pi is a single-board computer that developers mostly use to build
hardware projects, for home automation and edge computing, in industrial applications,
and in IoT. TF officially started supporting the Raspberry Pi back in 2018 [24], in
collaboration with the Raspberry Pi Foundation. TF allows its users to integrate their
ML or DL models on the Raspberry Pi with the help of TF Lite. Thanks to the
optimization and compression performed by the TF Lite converter when an ML or DL model
is converted into a TF Lite model, the model performs well on such a small device.
2.5 Environment
2.5.1 Google Colab
Several IDEs are available for working with TensorFlow in the Python
programming language. Google Colab is used for this thesis. Google Colab provides
both a notebook and a script interface, with free access to GPUs and CPUs for ML
developers. The official TF resources also provide a lot of supporting material that
runs on Colab. Colab also provides several utilities and pre-written functions that
save developers time [25].
2.6.3 Need of resources
This parameter is the amount of space the model needs on the resource-limited device;
the power consumption of the model is also taken into consideration in this work.
3 Methodology
This chapter describes the methods adopted in planning my thesis and the
workflow followed to carry out the implementation, evaluation, and conclusion of my
thesis.
A rigorous approach based on scientific study was used to investigate the ML
integration process. To get a better grasp of the thesis's purpose, the literature was
examined to gain a fundamental understanding of machine learning, TensorFlow,
TensorFlow Lite, and other required topics. Following that, the solutions suggested
by the literature review would be implemented through hands-on practice.
A quantitative approach was chosen for comparing the performance of the machine
learning models in the evaluation. The objective is to collect data and compare
measurements in order to arrive at conclusions. Measurements will be taken from the
start of the implementation and compared to produce a conclusion. The findings of my
investigation are shared in the discussion and conclusion chapters.
3.2.3 Implementation
After the literature study, it was time for the implementation part of my
thesis. The implementation would be done using the TensorFlow library on Google
Colab, and the evaluation would be done by comparing the models' performance-dependent
parameters in Colab on a computer and on an Android smartphone. The machine learning
model would first be designed using Keras, a neural network building library provided
within the TensorFlow library. Then the model would be trained and evaluated. Once I
had a trained model, I would carry out its conversion, optimization, deployment on a
smartphone, and evaluation.
Since TensorFlow Lite is a tool provided by TensorFlow, the conversion and optimization
of the machine learning model for use on a smartphone would be done using the
TensorFlow library and its tools, for example the TensorFlow Lite converter
function, the optimization function, and the TensorFlow Lite interpreter. After that,
the machine learning model would be in TensorFlow Lite format. The effect on the
model's evaluation parameters would be recorded at this stage by running the
TensorFlow Lite model in Colab on the computer with the help of the TensorFlow Lite
interpreter provided by TF. The last step would be the deployment and evaluation of
the TensorFlow Lite model on an Android smartphone. The figure below shows the
steps performed in the implementation.
3.2.5 Evaluation
Evaluation will be performed at three stages of my implementation:
• Evaluation of the ML model built in Keras on Colab.
• Evaluation of the TF Lite model on Colab.
• Evaluation of the TF Lite model on a smartphone.
The measurement setup lets me record the evaluated quantities in a table, which
in turn helps me compare them and draw conclusions.
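The quantities recorded at each stage can be measured with a few small helpers. The
following is a sketch using only the Python standard library; the function names are
hypothetical, and the dummy file and the trivial stand-in "model" exist only so that
the example runs on its own.

```python
import os
import tempfile
import time


def file_size_kb(path):
    """Size of a saved model file in kilobytes (1 KB = 1024 bytes here)."""
    return os.path.getsize(path) / 1024


def timed(predict, sample):
    """Run one prediction and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = predict(sample)
    return result, time.perf_counter() - start


def reduction_percent(base, new):
    """How much smaller `new` is than `base`, in percent."""
    return 100.0 * (base - new) / base


# Stand-in artefacts so the sketch runs on its own: a 2048-byte dummy file
# and a trivial "model" that returns the largest input value.
with tempfile.NamedTemporaryFile(delete=False, suffix=".bin") as handle:
    handle.write(b"\0" * 2048)
size_kb = file_size_kb(handle.name)
os.remove(handle.name)

prediction, seconds = timed(max, [0.1, 0.7, 0.2])
print(size_kb, prediction, seconds, reduction_percent(100.0, 40.0))
```

Applying the same helpers to the base model and to each TF Lite variant gives directly
comparable rows for the results table.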
4 Implementation
In this chapter I carry out hands-on ML model building, evaluation, compression,
optimization, and integration on an Android smartphone using the TensorFlow library.
Figure 10 shows the workflow of my implementation.
Figure 11: Base model designing
After designing the layer structure of my model, I defined the optimizer, loss, and
accuracy metric for the model. The optimizer defines the rules for improving the model
based on loss and accuracy, where the loss measures the difference between the actual
value and the prediction. If predictions deviate too much from the actual results, the
loss function returns a very large number. The accuracy metric measures how well the
predictions agree with the actual values; if the predictions are very close to the
actual values, the accuracy metric gives a value near one hundred percent. This was
done using the following Python commands.
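To show what the loss and the accuracy metric compute, here is a sketch in plain
Python. It assumes a cross-entropy loss over class probabilities, a common choice for
digit classification; the loss actually configured for the model is not stated in this
text, so the function names, probabilities, and labels below are illustrative only.

```python
import math


def sparse_categorical_crossentropy(probabilities, labels):
    """Mean negative log-probability assigned to the correct class."""
    eps = 1e-12                                   # guards against log(0)
    losses = [-math.log(max(p[y], eps)) for p, y in zip(probabilities, labels)]
    return sum(losses) / len(losses)


def accuracy(probabilities, labels):
    """Fraction of samples whose highest-scoring class equals the label."""
    hits = sum(max(range(len(p)), key=p.__getitem__) == y
               for p, y in zip(probabilities, labels))
    return hits / len(labels)


# Three samples, three classes; the last prediction is wrong.
probs = [[0.9, 0.05, 0.05],
         [0.1, 0.8, 0.1],
         [0.6, 0.3, 0.1]]
labels = [0, 1, 2]

loss = sparse_categorical_crossentropy(probs, labels)
acc = accuracy(probs, labels)
print(round(loss, 3), round(acc, 3))
```

The single wrong prediction contributes most of the loss, which illustrates how a large
deviation from the actual result produces a large loss value.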
The model summary was retrieved to inspect the structure and parameters of the model,
using the Python command shown in figure 12. The model had 111,146 trainable
parameters, which comprise the weights and biases of its layers.
4.2 Machine Learning model training and evaluation
The training data, the validation data, and the number of epochs were passed to the
model so that it could start training. In each epoch, the model trained on the
training data, validated on the validation data, and calculated the accuracy and loss
used for optimization; this process was repeated for the given number of epochs. Each
epoch was reviewed through the output logged during training, which showed details
such as the loss and accuracy for both training and validation. This helped to
monitor the progress of model training.
The model was evaluated so that the results could be compared later. The Python
commands used to evaluate the model are shown in figure 13.
First, an image was selected from the test images provided in the MNIST dataset; then
the image was converted to a tensor of the required shape and data type.
The image was displayed using Matplotlib and, after preprocessing, fed to the model
for prediction.
4.3.1 TensorFlow Lite Conversion
The TF Lite converter was used to convert the machine learning model into TF Lite
format in just two steps.
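The exact commands are not reproduced in this text. A common way to perform such a
conversion with the public TensorFlow 2.x API is sketched below; the function names
`convert_to_tflite` and `save_model` are hypothetical wrappers introduced for this
example, and the optional optimization flag is an assumption about how the optimized
variants could be produced, not a record of what was done.

```python
def convert_to_tflite(keras_model, optimize=False):
    """Convert a trained Keras model to TF Lite FlatBuffer bytes."""
    import tensorflow as tf  # imported lazily; requires TensorFlow 2.x

    # Step 1: create a converter from the in-memory Keras model.
    converter = tf.lite.TFLiteConverter.from_keras_model(keras_model)
    if optimize:
        # Optional: enable the default post-training optimizations.
        converter.optimizations = [tf.lite.Optimize.DEFAULT]
    # Step 2: run the conversion; the result is a bytes object.
    return converter.convert()


def save_model(tflite_bytes, path):
    """Write the converted model to disk and return its size in bytes."""
    with open(path, "wb") as handle:
        return handle.write(tflite_bytes)
```

With a trained Keras model `model`, calling
`save_model(convert_to_tflite(model, optimize=True), "model.tflite")` would write the
optimized model to disk; the file name is a placeholder.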
Figure 15: TF Lite model analysis.
The given input is shown as an image in figure 16, and the TF Lite model predicted the
correct number.
4.5 TensorFlow Lite Model integration on Android smartphone
To integrate my model into an Android application, TF's official GitHub repository was
used, which provides a demo application in which a model can be placed and evaluated.
The model was placed in the application and evaluated by running the application on my
smartphone. The results are shown in the screenshot in figure 17.
5 Results
The findings of the measurements gathered during the implementation phase are
presented in this chapter. The outcome is presented as a table, together with a snapshot
of the evaluation cell's output in the Colab; the Colab can be accessed from the link in
the appendix. Time measurements, accuracy results, and response time are also provided
in the form of grouped bar charts for comparison purposes.
In the second case, the TF Lite models were evaluated. A test run of a TF Lite model
requires the following three steps.
The helper function shown in figure 16 initializes the TF Lite interpreter, fetches the
model details and then starts a loop that predicts the images in the test data one by one.
Each prediction is then compared with the actual label to compute the accuracy of the
model, which serves as one of the performance parameters.
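A helper of this kind could look like the sketch below. It is not the helper function of
the thesis: the function names are hypothetical, and the predictor assumes a model with a
single float input whose shape matches one test image plus a batch dimension (an
integer-quantized input would additionally need scaling with the input quantization
parameters).

```python
def make_tflite_predictor(tflite_model_bytes):
    """Return a function that predicts the class of one image with TF Lite."""
    # Imported here so that evaluate() below can be used without TensorFlow installed.
    import numpy as np
    import tensorflow as tf

    # Initialize the interpreter and fetch the model's input/output details.
    interpreter = tf.lite.Interpreter(model_content=tflite_model_bytes)
    interpreter.allocate_tensors()
    input_details = interpreter.get_input_details()[0]
    output_details = interpreter.get_output_details()[0]

    def predict(image):
        # Add a batch dimension and cast to the dtype the model expects.
        batch = np.expand_dims(image, axis=0).astype(input_details["dtype"])
        interpreter.set_tensor(input_details["index"], batch)
        interpreter.invoke()
        scores = interpreter.get_tensor(output_details["index"])[0]
        return int(np.argmax(scores))

    return predict


def evaluate(predict, images, labels):
    """Predict the images one by one and return the fraction predicted correctly."""
    correct = sum(1 for image, label in zip(images, labels) if predict(image) == label)
    return correct / len(labels)


# Usage (requires TensorFlow, a converted model and test data):
# predict = make_tflite_predictor(tflite_model)
# print(evaluate(predict, test_images, test_labels))
```

Separating the predictor from the evaluation loop allows the same loop to be reused for
every model variant.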
The third case is a test run of the model on a smartphone, where I evaluated the model
by its performance in real-life use.
The base model took 1.3 seconds to predict, produced 97% correct results, and has a size
of 1341.44 KB. The TF Lite model variants produced during the implementation were then
evaluated, and the output is shown in figure 20.
Figure 18 shows the measurements of each model separately. Differences in the values of
the performance parameters are discussed further in the next chapter. I then evaluated
the model on an Android device by test running it in real-life use, where it performed
well. Figure 16 demonstrates it working on an Android smartphone.
Table 1 presents the models' response time, which shows how much time each model took
to predict a result.
Figure 21 presents a bar chart of the models' inference time, which shows how much time
each model took to predict a result.
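One way to obtain such a time measurement is sketched below. This is an assumption about
how the timing could be done, not the measurement code of the thesis; the function name and
the repeats parameter are hypothetical, and predict stands for any function that runs one
prediction.

```python
import time


def mean_inference_time(predict, inputs, repeats=3):
    """Average wall-clock time in seconds per prediction over several passes."""
    start = time.perf_counter()
    for _ in range(repeats):
        for sample in inputs:
            predict(sample)
    elapsed = time.perf_counter() - start
    return elapsed / (repeats * len(inputs))


# Example with a stand-in predictor; a real run would pass the model's
# prediction function and the test images instead.
print(mean_inference_time(lambda x: x * 2, list(range(1000))))
```

Averaging over several passes reduces the influence of one-off effects such as the first
call being slower than later ones.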
Figure 22 presents a bar chart of model accuracy, which shows the percentage of correct
predictions made by each model.
Figure 23 presents a bar chart of model size, which shows how much storage space each
model requires.
Figure 23: Bar chart of storage space required by each model in terms of size
6 Discussion
This chapter discusses the results based on the analysis of the implementation of my
thesis. The discussion also addresses other aspects, e.g., the project method, the
scientific discussion, and the ethical and societal aspects of ML integration on a
smartphone.
6.2.2 Study of literature
The key task that this thesis addresses was defined in this manner. Characterizing the
problem at the start of the project made it easier to grasp what was needed and where to
look for it.
6.2.3 Implementation
The implementation is divided into the five parts described in chapter 4. The first part
was designing the model structure: defining the layers, loss function, activation
function, optimizer, epochs, and other requirements to start the model training. This was
done with the Keras neural network library, which is provided within the TF library.
Designing a model without a strong background in ML and Keras was hard, but the TF blogs
and support helped a lot during the process.
The data used for training is also available within the TF library, which made it easier
for me to collect, label, clean, arrange and pre-process the data for my thesis
implementation and study. In the next part I trained the model on the provided training
data and evaluated it on the test data. Then I performed the standard TF Lite conversion
and evaluated the result, which showed almost the same accuracy with a large drop in
model size and response time.
In the next part I performed the TF Lite conversion with the dynamic range and
integer-only quantization approaches discussed in the theory chapter, converting and
optimizing the Keras model to a TF Lite model with the TF Lite converter provided by TF.
The integer-only variant is the base model converted to TF Lite with integer-only
quantization. Then I test-ran the converted models with the TF Lite interpreter, which
showed impressive performance: the accuracy of the model was maintained while the model
size and response time dropped. That a model 7x smaller performs like the original model
is solid evidence of TF's user-friendly and reliable environment.
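The idea behind 8-bit quantization can be illustrated with a small worked example in plain
Python. The sketch below uses a simplified per-tensor affine mapping, real value = scale *
(quantized value - zero point), which is the general form used by TF Lite; it is not the
converter's actual implementation, and the function names and weight values are
hypothetical.

```python
def quantization_params(values, qmin=-128, qmax=127):
    """Choose scale and zero point so that the value range maps onto int8."""
    lo = min(min(values), 0.0)  # the represented range must include 0.0
    hi = max(max(values), 0.0)
    scale = (hi - lo) / (qmax - qmin) or 1.0  # fall back to 1.0 if all values are 0
    zero_point = round(qmin - lo / scale)
    return scale, zero_point


def quantize(values, scale, zero_point, qmin=-128, qmax=127):
    """Map float values to integers in [qmin, qmax]."""
    return [max(qmin, min(qmax, round(v / scale) + zero_point)) for v in values]


def dequantize(q_values, scale, zero_point):
    """Map quantized integers back to approximate float values."""
    return [scale * (q - zero_point) for q in q_values]


# Hypothetical float32 weights (illustrative values only).
weights = [-1.0, -0.25, 0.0, 0.5, 1.0]
scale, zero_point = quantization_params(weights)
quantized = quantize(weights, scale, zero_point)
restored = dequantize(quantized, scale, zero_point)
max_error = max(abs(a - b) for a, b in zip(weights, restored))

# Each weight now needs 1 byte instead of 4, at the cost of a small rounding error.
print(quantized, max_error)
```

Storing each value in one byte instead of four is where most of the size reduction comes
from, while the rounding error per value stays below one quantization step.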
I deployed the trained model to an Android application and tested it on an Android
smartphone, where it predicted the handwritten digit in real time. It can be used in
applications with handwriting-to-text modules and can also work as Google Lens does with
text data. The main difficulty was the selection of approaches, as many were available
and someone with better background knowledge can choose better.
applications. The approaches described and practiced in this thesis can be used to
integrate ML on embedded devices, including Android smartphones.
7 Conclusions
After the fruitful outcome of my thesis, I arrive at the following conclusions.
I conclude that the scope of ML is growing very rapidly in devices with limited
resources, especially smartphones, as they are the most used devices nowadays.
TensorFlow and TensorFlow Lite are good tools for integrating ML on smartphones, as the
measurements taken during the implementation of my thesis were in favor of TF and TF
Lite.
TensorFlow Lite optimizes and compresses the model in such a way that its size and
response time drop without affecting the accuracy of the model. This is the main
challenge when integrating ML on devices with limited resources, where the scope for ML
applications is nonetheless very large.
TF provides facilities for data loading, data labeling, data preprocessing, model
designing, model building, model evaluation, model compression and support for almost
every platform in industry under one roof. I did not need any other ML tool while using
TensorFlow.
TensorFlow Lite's user support is also exceptional compared with that of comparable tool
providers in the market. Researchers and developers prefer TensorFlow Lite for ML
applications on devices with limited resources because of its highly supportive
environment.
Q2 - What are the challenges and limitations faced while deploying the Machine
Learning models on smartphones?
The main challenge was the limitation of resources on a smartphone as compared to
Colab. The challenge was to compress and optimize the base model in a way that does not
affect its accuracy. It was hard to retain the accuracy of the base model after this
much compression and optimization.
Q4 - What are the effects on a Machine Learning model’s performance when it's
compressed and optimized for an application in a smartphone as compared to the
original model working on a standard computer or laptop?
The size of the base model was 1342 KB, which was reduced to 112 KB when it was
converted to the TensorFlow Lite model; this greatly reduces the resources the model
needs. The accuracy remained the same despite the change in model size. Different image
material may change the accuracy.
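The size reduction stated in this answer can be expressed as a ratio and as a percentage
with a short calculation. The two sizes are taken from the text above; the variable names
are chosen for illustration.

```python
base_kb = 1342  # size of the base model in KB, as stated above
lite_kb = 112   # size of the converted TF Lite model in KB, as stated above

ratio = base_kb / lite_kb
saved_percent = 100 * (base_kb - lite_kb) / base_kb

print(f"{ratio:.1f}x smaller, {saved_percent:.1f}% less storage")
```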
31
References
1. David, R., et al., TensorFlow lite micro: Embedded machine learning for tinyml
systems. Proceedings of Machine Learning and Systems, 2021. 3: p. 800-811.
2. Demosthenous, G. and V. Vassiliades, Continual learning on the edge with
tensorflow lite. arXiv preprint arXiv:2105.01946, 2021.
3. David, R., et al., TensorFlow lite micro: Embedded machine learning on tinyml
systems. arXiv 2020. arXiv preprint arXiv:2010.08678.
4. Warden, P. and D. Situnayake, Tinyml: Machine learning with tensorflow lite on
arduino and ultra-low-power microcontrollers. 2019: O'Reilly Media.
5. Louis, M.S., et al. Towards deep learning using tensorflow lite on risc-v. in Third
Workshop on Computer Architecture Research with RISC-V (CARRV). 2019.
6. Tom Mitchell, 1997
7. Alpaydin, E., Introduction to Machine Learning. Third edit. 2014, MIT Press.
8. Gorospe, J., et al., A Generalization Performance Study Using Deep Learning
Networks in Embedded Systems. Sensors, 2021. 21(4): p. 1031.
9. Alsing, O., Mobile object detection using tensorflow lite and transfer learning.
2018.
10. Ma, Y., et al., Transfer learning for cross-company software defect prediction.
Information and Software Technology, 2012. 54(3): p. 248-256.
11. Tang, J., Intelligent Mobile Projects with TensorFlow: Build 10+ Artificial
Intelligence Apps Using TensorFlow Mobile and Lite for IOS, Android, and
Raspberry Pi. 2018: Packt Publishing Ltd.
12. Anjum, S., et al. A pull-reporting approach for floor opening detection using
deep-learning on embedded devices. in ISARC. Proceedings of the International
Symposium on Automation and Robotics in Construction. 2021. IAARC
Publications.
13. Osman, A., et al. TinyML Platforms Benchmarking. in International Conference
on Applications in Electronics Pervading Industry, Environment and Society.
2022. Springer.
14. riley, o.: p. p#1,2,5,6.
15. Zhang, J., et al. dabnn: A super-fast inference framework for binary neural
networks on arm devices. in Proceedings of the 27th ACM international
conference on multimedia. 2019.
16. Frajberg, D., et al. Accelerating deep learning inference on mobile systems. in
International Conference on AI and Mobile Services. 2019. Springer.
17. HardienJ, Deep Learning Book Series · 2.1 Scalars Vectors Matrices and Tensors.
2018.
18. Gorospe Jauregui, J., A Generalization Performance Study Using Deep Learning
Networks in Embedded Systems. 2021.
19. Tensor.2021;Availablefrom:
https://www.tensorflow.org/lite/performance/model_optimization.
20. Pearce, H., et al., Designing Neural Networks for Real-Time Systems. IEEE
Embedded Systems Letters, 2020. 13(3): p. 94-97.
21. Wang, Y., et al. Pruning from scratch. in Proceedings of the AAAI Conference on
Artificial Intelligence. 2020.
22. Martinez-Alpiste, I., et al., Smartphone-based real-time object recognition
architecture for portable and constrained systems. Journal of Real-Time Image
Processing, 2022. 19(1): p. 103-115.
32
23. Yang, X., et al. A compositional approach using Keras for neural networks in
real-time systems. in 2020 Design, Automation & Test in Europe Conference &
Exhibition (DATE). 2020. IEEE.
24. Gao, Y. and K.M. Mosalam, Deep transfer learning for image‐based structural
damage recognition. Computer‐Aided Civil and Infrastructure Engineering,
2018. 33(9): p. 748-768.
25. Khekare, G. and K. Solanki, REAL TIME OBJECT DETECTION WITH SPEECH
RECOGNITION USING TENSORFLOW LITE.
26. Salah Eddin Adi Department of Electrical and Electronic Engineering Design
and optimization of a TensorFlow Lite deep learning neural network for human
activityrecognition on a smartphone 202.
https://ieeexplore.ieee.org/abstract/document/9629549
27. Deploying Image Deblurring Across Mobile Devices: A Perspective of Quality
and Latency Cheng-Ming Chiang, Yu Tseng, Yu-Syuan Xu, Hsien-Kai Kuo, Yi-
Min Tsai, Guan-Yu Chen, Koan-Sin Tan, Wei-Ting Wang, Yu-Chieh Lin, Shou-
Yao Roy Tseng, Wei-Shiang Lin, Chia-Lin Yu, BY Shen, Kloze Kao, Chia-Ming
Cheng, Hung-Jen Chen; Proceedings of the IEEE/CVF Conference on Computer
Vision and Pattern Recognition (CVPR) Workshops, 2020, pp. 502-503
28. TensorFlow “ Post-training quantization”
https://www.tensorflow.org/lite/performance/post_training_quantization
33
A TF Lite models initialized, defined and evaluated
B Model training