
Transfer Learning

We humans are very good at transferring knowledge between tasks. Whenever we encounter a new problem or task, we recognize it and apply the relevant knowledge from our previous learning experiences, which makes the work easier and faster to finish. For instance, if you know how to ride a bicycle and are asked to ride a motorbike for the first time, your experience with the bicycle comes into play for tasks like balancing and steering, making things easier than they would be for a complete beginner. Such learning is very useful in real life because it lets us build on the experience we already have. Following the same idea, the term Transfer Learning was introduced in the field of machine learning. This approach involves taking knowledge learned on one task and applying it to solve a related target task. While most machine learning is designed to address a single task, the development of algorithms that facilitate transfer learning is a topic of ongoing interest in the machine-learning community.

What is Transfer Learning?


Transfer learning is a technique in machine learning where a
model trained on one task is used as the starting point for a
model on a second task. This can be useful when the second task
is similar to the first task, or when there is limited data available
for the second task. By using the learned features from the first
task as a starting point, the model can learn more quickly and
effectively on the second task. This can also help to
prevent overfitting, as the model will have already learned
general features that are likely to be useful in the second task.
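For example, in Keras (a minimal sketch, assuming TensorFlow is installed), a model pre-trained on ImageNet can be loaded and used as such a starting point:

from tensorflow import keras

# Load a VGG16 network with weights learned on ImageNet; its convolutional
# layers provide general image features that a new task can start from.
base_model = keras.applications.VGG16(weights="imagenet", include_top=False)
base_model.summary()  # the pre-trained layers that will be reused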

Why do we need Transfer Learning?


Many deep neural networks trained on images share a curious phenomenon: in the early layers of the network, the model learns low-level features such as edges, colours, and variations in intensity. These features do not appear to be specific to a particular dataset or task; whether we are processing images to detect a lion or a car, we still have to detect the same low-level features. They occur regardless of the exact cost function or image dataset. Thus, the features learned in one task, such as detecting lions, can be reused in other tasks, such as detecting humans.

How does Transfer Learning work?


This is a general summary of how transfer learning works:
 Pre-trained Model: Start with a model that has already been trained on a large dataset for a related task. Because it was trained on extensive data, this model has learned general features and patterns that are relevant to many related tasks.
 Base Model: The pre-trained model is known as the base model. It consists of layers that have learned hierarchical feature representations from the input data.
 Transfer Layers: In the pre-trained model, identify the set of layers that capture generic information relevant to both the new task and the original one. Because they tend to learn low-level, general information, these layers are usually found in the earlier part of the network, close to the input.
 Fine-tuning: Retrain the chosen layers on the dataset of the new task; this procedure is called fine-tuning. The goal is to preserve the knowledge from pre-training while allowing the model to adjust its parameters to better suit the demands of the new task.
The block diagram is shown below:
[Block diagram: Transfer Learning]

Low-level features learned for task A should be beneficial for learning a model for task B. This is what transfer learning is. Nowadays it is rare to see people training whole convolutional neural networks from scratch; it is common instead to use a model pre-trained on a large collection of images for a similar task, e.g. a model trained on ImageNet (1.2 million images with 1000 categories), and reuse its features to solve the new task. When dealing with transfer learning, we come across a phenomenon called freezing of layers.
A layer, whether it is a CNN layer, a hidden layer, a block of layers, or any subset of all the layers, is said to be frozen when it is no longer available for training. Hence, the weights of frozen layers are not updated during training, while layers that are not frozen follow the regular training procedure.
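For illustration, freezing layers in Keras (a minimal sketch, assuming TensorFlow is installed) looks like this:

from tensorflow import keras

# Load a pre-trained network and freeze all but its last few layers:
# frozen layers keep their pre-trained weights during training.
model = keras.applications.VGG16(weights="imagenet")
for layer in model.layers[:-4]:
    layer.trainable = False

for layer in model.layers:
    print(layer.name, "trainable:", layer.trainable)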
When we use transfer learning to solve a problem, we select a pre-trained model as our base model. There are then two possible approaches to using the knowledge from the pre-trained model. The first is to freeze a few layers of the pre-trained model and train the remaining layers on our new dataset for the new task. The second is to create a new model, but also extract some features from the layers of the pre-trained model and use them in the newly created model. In both cases, we take some of the learned features and train the rest of the model. This ensures that only the features likely to be common to both tasks are taken from the pre-trained model, while the rest of the model is adapted to the new dataset through training.

Frozen and Trainable Layers:

One may ask how to determine which layers need to be frozen and which need to be trained. The answer is simple: the more features you want to inherit from the pre-trained model, the more layers you have to freeze. For instance, suppose the pre-trained model detects certain flower species and we need to detect some new species. The new dataset shares many features with the pre-trained model's dataset, so we freeze more layers and reuse most of its knowledge in the new model. Now consider another case: a pre-trained model detects humans in images, and we want to use that knowledge to detect cars. Here the dataset is entirely different, so it is not good to freeze many layers, because doing so would retain not only low-level features but also high-level features such as noses and eyes, which are useless for the new dataset (car detection). Thus, we copy only the low-level features from the base network and train the entire network on the new dataset.
Let’s consider the situations where the size of the target dataset and its similarity to the base network's dataset vary.
 The target dataset is small and similar to the base network dataset: Since the target dataset is small, fine-tuning the whole pre-trained network on it may lead to overfitting. There may also be a change in the number of classes in the target task. So, in such a case, we remove the fully connected layers from the end (maybe one or two) and add a new fully connected layer matching the number of new classes. We then freeze the rest of the model and train only the newly added layers (a minimal sketch of this case is shown after the list).
 The target dataset is large and similar to the base training dataset: When the dataset is large, there is little risk of overfitting. Here, too, the last fully connected layer is removed and a new fully connected layer with the proper number of classes is added, but then the entire model is trained on the new dataset. This tunes the model to the new, large dataset while keeping the architecture the same.
 The target dataset is small and different from the base network dataset: Since the target dataset is different, the high-level features of the pre-trained model will not be useful. In such a case, remove most of the layers from the end of the pre-trained model, and add new layers matching the number of classes in the new dataset. This way we can use the low-level features from the pre-trained model and train the remaining layers to fit the new dataset. Sometimes it is also beneficial to train the entire network after adding the new layers at the end.
 The target dataset is large and different from the base network dataset: Since the target dataset is large and different, the best approach is to remove the last layers from the pre-trained network, add layers with the appropriate number of classes, and then train the entire network without freezing any layer.
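A minimal sketch of the first case (small, similar target dataset), assuming Keras and a hypothetical five-class target task:

from tensorflow import keras

# Pre-trained base without its original fully connected head; freeze it entirely.
base = keras.applications.ResNet50(weights="imagenet", include_top=False, pooling="avg")
base.trainable = False

num_new_classes = 5  # assumed number of classes in the new dataset
model = keras.Sequential([
    base,
    keras.layers.Dense(num_new_classes, activation="softmax"),  # new fully connected layer
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# Only the new head's weights are updated when model.fit(...) is called.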
Transfer learning is a very effective and fast way to begin working on a problem. It gives you a direction to move in, and most of the time the best results are also obtained with transfer learning.
Below is sample code, using Keras, for transfer learning and fine-tuning with a custom training loop.
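This is a minimal sketch rather than a complete program: the MobileNetV2 base, the 160x160 input size, the ten target classes, and the train_ds dataset of image/label batches are all assumptions made for illustration.

import tensorflow as tf
from tensorflow import keras

# 1. Pre-trained base model (weights learned on ImageNet), without its classifier head.
base_model = keras.applications.MobileNetV2(
    input_shape=(160, 160, 3), include_top=False, weights="imagenet")
base_model.trainable = False  # freeze the base during the first training phase

# 2. New classification head for the target task (assumed 10 classes).
inputs = keras.Input(shape=(160, 160, 3))
x = keras.applications.mobilenet_v2.preprocess_input(inputs)
x = base_model(x, training=False)      # keep BatchNorm layers in inference mode
x = keras.layers.GlobalAveragePooling2D()(x)
outputs = keras.layers.Dense(10)(x)    # logits for the new classes
model = keras.Model(inputs, outputs)

loss_fn = keras.losses.SparseCategoricalCrossentropy(from_logits=True)
optimizer = keras.optimizers.Adam(learning_rate=1e-4)

# 3. Custom training loop: only the unfrozen weights are updated.
def train_step(images, labels):
    with tf.GradientTape() as tape:
        logits = model(images, training=True)
        loss = loss_fn(labels, logits)
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss

# for epoch in range(num_epochs):
#     for images, labels in train_ds:
#         loss = train_step(images, labels)

# 4. Fine-tuning: unfreeze the top of the base model and continue training
#    with a lower learning rate so the pre-trained weights change only slightly.
base_model.trainable = True
for layer in base_model.layers[:100]:  # keep the earliest (most generic) layers frozen
    layer.trainable = False
optimizer = keras.optimizers.Adam(learning_rate=1e-5)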
Advantages of transfer learning:
 Speed up the training process: By using a pre-trained
model, the model can learn more quickly and effectively
on the second task, as it already has a good
understanding of the features and patterns in the data.
 Better performance: Transfer learning can lead to better
performance on the second task, as the model can
leverage the knowledge it has gained from the first task.
 Handling small datasets: When there is limited data
available for the second task, transfer learning can help
to prevent overfitting, as the model will have already
learned general features that are likely to be useful in the
second task.
Disadvantages of transfer learning:
 Domain mismatch: The pre-trained model may not be
well-suited to the second task if the two tasks are vastly
different or the data distribution between the two tasks is
very different.
 Overfitting: Transfer learning can lead to overfitting if
the model is fine-tuned too much on the second task, as
it may learn task-specific features that do not generalize
well to new data.
 Complexity: The pre-trained model and the fine-tuning
process can be computationally expensive and may
require specialized hardware.

What is one-shot learning?


One-shot learning is an ML-based object classification algorithm
that assesses the similarity and difference between two images.
It’s mainly used in computer vision.

The goal of one-shot learning is to teach the model to judge similarity between objects from a minimal number of examples. There may be only one image per class (or a very limited number of them, in which case the approach is often called few-shot learning). These examples are used to build a model that can then make predictions about further, unseen visuals.

For instance, to distinguish between apples and pears, a traditional AI model would need thousands of images taken at various angles, with different lighting, backgrounds, etc. In contrast, one-shot learning doesn’t require many examples of each category. It generalizes the information it has learned through experience with the same type of tasks, inferring similar objects and classifying unseen objects into their respective groups.

How does one-shot learning work?

If we need to add new classes for data classification to a traditional neural network, this presents a challenge. In this case, the neural network needs to be updated and retrained, which can be either expensive or impossible due to a lack of sufficient data and/or time.

But for tasks such as face recognition, we don’t always need to assign faces to predefined classes (person A, person B, person C, etc.). Often we just need to tell whether the person in front of the border gate counter is the same as in the presented ID. This means that the problem we have to solve is one of evaluating differences rather than classifying.

Sticking to the example of border control, we have two images: the camera input and the person’s passport photo. The neural network evaluates the degree of similarity between them.

Let’s take a look at how this is actually done and what types of neural networks are needed.
Matching networks for one-shot learning

One-shot learning for computer vision tasks is based on a special type of convolutional neural network (CNN) called the Siamese neural network (SNN). Classic CNNs adjust their parameters throughout the training process to correctly classify each image. Siamese neural networks are instead trained to evaluate the distance between the features of two input images.

Siamese neural networks run the inputs through two identical instances of the same network. Both are trained on the same dataset and then combined to produce an output as a function of their inputs.

Each of the two branches of this convolutional network is responsible for learning the features of one image, while a differentiating layer evaluates how those features relate to each other across the two inputs. The differentiating layer checks whether similar features were learned from both images.
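A minimal sketch of such a network in Keras (the layer sizes and the 105x105 grayscale inputs are assumptions made for illustration):

import tensorflow as tf
from tensorflow import keras

# Shared embedding network: both branches use this single instance,
# so they share exactly the same weights.
def make_embedding(input_shape=(105, 105, 1)):
    return keras.Sequential([
        keras.Input(shape=input_shape),
        keras.layers.Conv2D(64, 10, activation="relu"),
        keras.layers.MaxPooling2D(),
        keras.layers.Conv2D(128, 7, activation="relu"),
        keras.layers.MaxPooling2D(),
        keras.layers.Flatten(),
        keras.layers.Dense(256, activation="sigmoid"),
    ])

embedding = make_embedding()
img_a = keras.Input(shape=(105, 105, 1))
img_b = keras.Input(shape=(105, 105, 1))
feat_a = embedding(img_a)   # branch 1
feat_b = embedding(img_b)   # branch 2 (same weights as branch 1)

# Differentiating layer: element-wise distance between the two feature vectors,
# followed by a single unit that scores how similar the two images are.
distance = keras.layers.Lambda(lambda t: tf.abs(t[0] - t[1]))([feat_a, feat_b])
similarity = keras.layers.Dense(1, activation="sigmoid")(distance)

siamese = keras.Model(inputs=[img_a, img_b], outputs=similarity)
siamese.compile(optimizer="adam", loss="binary_crossentropy")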
Training an SNN for one-shot learning involves two stages:
verification and generalization.

In the verification stage, the triplet loss function is used. The model receives three images: an anchor, a positive image, and a negative image. The encoded features of the anchor and the positive image are very similar, whereas the features of the negative image differ. To achieve better training results, the triplets of anchor, positive, and negative images should look relatively similar, to help the model learn from “hard-to-recognize” examples.
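A sketch of this loss (an assumed, standard formulation written with TensorFlow) over batches of embedding vectors:

import tensorflow as tf

def triplet_loss(anchor, positive, negative, margin=0.2):
    # Squared Euclidean distances between the embedding vectors.
    pos_dist = tf.reduce_sum(tf.square(anchor - positive), axis=-1)
    neg_dist = tf.reduce_sum(tf.square(anchor - negative), axis=-1)
    # The loss is zero once each negative is at least `margin` farther
    # from the anchor than the corresponding positive.
    return tf.reduce_mean(tf.maximum(pos_dist - neg_dist + margin, 0.0))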

In the generalization stage, the model is trained to evaluate the probability that the input pairs belong to the same class. At this step, it’s essential to provide the model with images where the difference is very difficult to recognize. By increasing the complexity of these estimations, we speed up the model’s learning process.
Upon completion of these two stages, the model is ready to use: it can now compare new images against each other.

Benefits and limitations of Siamese neural networks

When working with these models, keep the following in mind.

Advantages of SNNs

 When it comes to recognizing images, faces, and other objects with strong similarities, Siamese neural networks have been shown to outperform other types of neural networks in terms of speed and accuracy.
 Like other NNs, Siamese networks can be initially trained on large datasets, but unlike other NNs, they do not need to be extensively retrained to detect new classes.
 In addition, because both branches share the same parameters, the model can achieve better generalization, especially when dealing with similar but not identical objects.
Challenges of SNNs

 The main disadvantage of Siamese networks is that they require much more computing power than other types of CNNs, since the two branches double the number of operations needed during training.
 There is also a large increase in memory requirements.

The main idea of SNNs is to map the original objects into a latent space where they can be forced to meet some predefined requirements. CNNs applied to images are the main application area for one-shot learning. However, the networks do not necessarily have to be convolutional, and there are no limitations on the type of problem, as long as the constraints can be specified in the latent space.

Note that other neural networks are also successfully used in one-shot learning for image and video recognition. These include memory-augmented NNs, spiking neural networks, Bayesian NNs, etc.

What is the difference between zero-shot, one-shot, and few-shot learning models?

Apart from one-shot learning, there exist other models that require just a few examples (few-shot learning) or no examples at all (zero-shot learning).

Few-shot learning is simply a variation of the one-shot learning model with several training images available.

The goal of zero-shot learning is to categorize unknown classes without any training data at all. The learning process here is based on the metadata of the images, i.e. the features relevant to the image. The process is similar to human cognition: say you read a detailed description of a giraffe in a book; there’s a high chance that you will be able to recognize it in a photo or when you see it in the real world.
Applications

One-shot learning algorithms have been used for tasks like image
classification, object detection and localization, speech
recognition, and more.

The most common applications are face recognition and signature verification. Apart from airport checks, the former can be used, for example, by law enforcement agencies to detect terrorists in crowded places and at mass events such as sports games, concerts, and festivals. Based on surveillance camera input, AI can identify people from police databases in a crowd. This technology is also applicable in banks and other institutions that need to recognize a person from their ID or a photo in their records. The same process works for signature verification.

One-shot learning is essential for computer vision, notably for drones and self-driving cars that must recognize objects in their environment. Another area is cross-lingual word recognition, where one-shot learning is applied to identify unknown words in the target language.

It can also be used effectively for detecting brain activity in brain scans.

Conclusion
The big advantage of the one-shot learning algorithm is that the
classification of images is performed based on their similarity, not
on the analysis of a large number of features. This significantly
reduces computational costs and time spent on training the model.

In practice, one-shot learning has especially big potential for face recognition anywhere, from exhibition entrances to the recognition of old manuscripts.

The technology keeps developing. The ‘less than one’-shot learning model and one-shot learning with memory-augmented neural networks are the next steps in the development of deep learning and its integration into real life.
