
Deep Transfer Learning and Multi-task Learning

Concepts are assembled from various online sources, with grateful acknowledgement to all those who made them available online.
Transfer Learning
• Transfer a model trained on source data A to
target data B
• Task transfer: in this case, the source and target data
can be the same
• Image classification -> image segmentation
• Machine translation -> sentiment analysis
• Time series prediction -> time series classification
• …
• Data transfer:
• Images of everyday objects -> medical images
• Chinese -> English
• Physiological signals of one patient -> another patient
• …
• Rationale: similar features can be useful in different tasks, or shared by different yet related data.
Taxonomy of Transfer Learning

                              Source Data
                     Labeled                          Unlabeled
Target    Labeled    Model fine-tuning,               Self-taught learning
Data                 Multi-task learning
          Unlabeled  Domain-adversarial training,     Self-taught clustering
                     Zero-shot learning

• Hongyi Li, Transfer Learning. https://speech.ee.ntu.edu.tw/~tlkagk/courses/ML_2016/Lecture/transfer%20(v3).pdf.


Taxonomy of Transfer Learning

                              Source Data
                     Labeled                          Unlabeled
Target    Labeled    Model fine-tuning,               Self-taught learning
Data                 Multi-task learning
          Unlabeled  Domain-adversarial training,     Self-taught clustering
                     Zero-shot learning

• Hongyi Li, Transfer Learning. https://speech.ee.ntu.edu.tw/~tlkagk/courses/ML_2016/Lecture/transfer%20(v3).pdf.


Model Fine Tuning
• For a model trained on a large amount of labeled source data, transfer it to target data for which only very little labeled data is available. E.g.

Application                  Source Data                           Target Data
Medical image segmentation   Segmentations of many images of       Segmentations of several
                             daily scenes                          medical images
Speech recognition           Audio data and transcriptions of      Limited audio data and
                             many historical speakers              transcriptions of a new speaker
Arrhythmia detection         Very long ECG signals of a large      ECG snippets from a new patient
                             number of historical patients

• Idea: Pre-train a model using labeled source data, then fine-tune the model with labeled target data.
• Caution: Do NOT overfit the limited amount of labeled target data!
• Hongyi Li, Transfer Learning. https://speech.ee.ntu.edu.tw/~tlkagk/courses/ML_2016/Lecture/transfer%20(v3).pdf.
Conservative Training

[Diagram: a network pre-trained on source data (Input Layer → Hidden Layers → Output Layer) beside a new network with the same architecture for target data]

• Use parameters of the pre-trained model to initialize the parameters of the new model;
• Further train the new model on target data. Limit the number of epochs to avoid over-fitting!
• Hongyi Li, Transfer Learning. https://speech.ee.ntu.edu.tw/~tlkagk/courses/ML_2016/Lecture/transfer%20(v3).pdf.
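• A minimal PyTorch sketch of conservative training (not from the original slides; the pre-trained model and a `target_loader` over the small labeled target set are assumed to exist): initialize the new model from the pre-trained one, then fine-tune with a small learning rate for only a few epochs.

```python
import copy
import torch
import torch.nn as nn

def conservative_finetune(pretrained_model: nn.Module, target_loader,
                          max_epochs: int = 3, lr: float = 1e-4) -> nn.Module:
    # Initialize the new model with the pre-trained parameters.
    new_model = copy.deepcopy(pretrained_model)

    # Fine-tune on target data with a small learning rate and few epochs
    # to avoid over-fitting the limited labeled target data.
    optimizer = torch.optim.Adam(new_model.parameters(), lr=lr)
    criterion = nn.CrossEntropyLoss()
    new_model.train()
    for _ in range(max_epochs):                  # limit the number of epochs!
        for inputs, labels in target_loader:
            optimizer.zero_grad()
            loss = criterion(new_model(inputs), labels)
            loss.backward()
            optimizer.step()
    return new_model
```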
Layer Transfer
[Diagram: a source-data network (Input Layer → Hidden Layers 1–3 → Output Layer) beside a target-data network of the same architecture, with Hidden Layers 1 and 2 frozen]

• Use parameters of the pre-trained model to initialize the parameters of the new model;
• Freeze the parameters of some hidden layers; only fine-tune the parameters of the other layers on target data. Limit the number of epochs to avoid over-fitting!
• Usually, freeze the first or last few layers.

• Hongyi Li, Transfer Learning. https://speech.ee.ntu.edu.tw/~tlkagk/courses/ML_2016/Lecture/transfer%20(v3).pdf.
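• A minimal PyTorch sketch of layer transfer (not from the original slides; the sub-module names "layer1" and "layer2" are hypothetical): freeze some layers of the pre-trained model so they keep their pre-trained parameters, and fine-tune only the rest.

```python
import torch
import torch.nn as nn

def freeze_layers(model: nn.Module, frozen_prefixes=("layer1", "layer2")):
    # Freeze the parameters of the named sub-modules; they keep their
    # pre-trained values during fine-tuning.
    for name, param in model.named_parameters():
        if name.startswith(frozen_prefixes):     # e.g. "layer1.0.weight"
            param.requires_grad = False

    # Give the optimizer only the parameters that remain trainable.
    trainable = [p for p in model.parameters() if p.requires_grad]
    return torch.optim.Adam(trainable, lr=1e-4)
```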


Open-source Pre-trained Models
• Using open-source pre-trained models for transfer
learning is an effective and efficient way to acquire
high-quality deep learning results for your
applications!
• Pre-trained Models for Natural Language
Processing (NLP)
• BERT
• GPT-3
• …
• Pre-trained Models for Computer Vision (CV)
• VGG-16
• ResNet50
• ViT
• …
BERT: Bidirectional Encoder Representations
from Transformers
• A natural language processing model proposed
by Google in 2018;
• pre-trained on 2,500 million words of Wikipedia
and 800 million words of Book Corpus;
• allows for training customized question answering models in a few hours using a single GPU;
• available at
https://github.com/google-research/bert.

• Variants: CodeBERT, RoBERTa, ALBERT, XLNet, …
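• A minimal sketch of reusing pre-trained BERT for a downstream classification task; it assumes the Hugging Face transformers library and the "bert-base-uncased" checkpoint rather than the Google repository linked above.

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load the pre-trained BERT encoder and attach a fresh classification head.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)

# Encode one toy sentence; in practice, fine-tune on labeled target data.
inputs = tokenizer("Transfer learning saves labeled data.", return_tensors="pt")
outputs = model(**inputs)
print(outputs.logits.shape)   # torch.Size([1, 2])
```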


GPT-3: Generative Pre-trained Transformer 3
• A natural language processing model
proposed by OpenAI in 2020;
• has 175 billion parameters, about 10 times more than any previous non-sparse language model;
• strong at tasks such as translation and question answering, as well as on-the-fly reasoning tasks like unscrambling words;
• has been applied to writing news articles, generating code, …
• available at https://openai.com/api/.
VGG-16
• A computer vision model proposed by the
Visual Geometry Group from Oxford;
• pre-trained on the ImageNet corpus; first
runner-up of ILSVRC (ImageNet Large Scale
Visual Recognition Competition) 2014 in the
classification task
• a CNN model with 16 layers and about 138
million parameters;
• has been built into popular deep learning
frameworks such as PyTorch and Keras.

• Variant: VGG-19
ResNet50
• A variant of the ResNet model, a computer
vision model proposed by Microsoft in 2015;
• pre-trained on the ImageNet corpus;
• a CNN model with 50 layers and about 26 million parameters;
• has been built into popular deep learning
frameworks such as PyTorch and Keras.
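• A minimal sketch of transfer learning with the ResNet50 weights built into torchvision (assuming torchvision >= 0.13 for the `weights` argument; the number of target classes is illustrative): keep the pre-trained backbone and replace only the final fully connected layer.

```python
import torch.nn as nn
from torchvision import models

num_classes = 5                                   # illustrative target task

# Load ResNet50 pre-trained on ImageNet.
model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)

# Optionally freeze the backbone (layer transfer).
for param in model.parameters():
    param.requires_grad = False

# Replace the ImageNet classifier head; the new layer is trainable by default.
model.fc = nn.Linear(model.fc.in_features, num_classes)
```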
ViT: Vision Transformer
• A computer vision (CV) model proposed by
Google in 2020;
• introduces the Transformer architecture, which has achieved huge success in natural language processing, into CV; the idea is to treat patches in images as words in text;
• can achieve better accuracy and efficiency than CNNs such as ResNet50;
• available at https://github.com/google-research/vision_transformer.

• Variants: Swin Transformer, PVTv2…


Taxonomy of Transfer Learning
                              Source Data
                     Labeled                          Unlabeled
Target    Labeled    Model fine-tuning,               Self-taught learning
Data                 Multi-task learning
          Unlabeled  Domain-adversarial training,     Self-taught clustering
                     Zero-shot learning

• Hongyi Li, Transfer Learning. https://speech.ee.ntu.edu.tw/~tlkagk/courses/ML_2016/Lecture/transfer%20(v3).pdf.


Multi-task Learning (MTL)
• Simultaneously undertaking multiple tasks
using a single network. E.g.
• Simultaneous ECG heartbeat segmentation and
classification
• …
• We do not necessarily need multiple main
tasks. Rather, we can have one main task and
several auxiliary tasks to support the main task.
• Domain adaptation
• Self-supervision
• ….
• Basic forms of MTL: hard or soft parameter
sharing.
Hard Parameter Sharing
[Diagram: Input Layer (Tasks A & B) → Shared Layers (Feature Extractor) → Task-specific layers (Task A / Task B) → Output Layer (Task A) / Output Layer (Task B)]

• Different tasks share some layers (i.e. the parameters of these layers), usually used for feature extraction from the input data.
• The output of the shared layers (usually learned features) is fed to different task-specific layers to obtain the final results.

• Sebastian Ruder, An Overview of Multi-Task Learning in Deep Neural Networks. https://ruder.io/multi-task/.
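• A minimal PyTorch sketch of hard parameter sharing (layer sizes and task heads are illustrative, not from the cited sources): one shared feature extractor feeds two task-specific heads, and the task losses are summed for a joint update.

```python
import torch
import torch.nn as nn

class HardSharingNet(nn.Module):
    def __init__(self, in_dim=32, hidden=64, n_classes_a=3, n_classes_b=2):
        super().__init__()
        # Shared layers (feature extractor) used by both tasks.
        self.shared = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU())
        # Task-specific output layers.
        self.head_a = nn.Linear(hidden, n_classes_a)
        self.head_b = nn.Linear(hidden, n_classes_b)

    def forward(self, x):
        features = self.shared(x)
        return self.head_a(features), self.head_b(features)

model = HardSharingNet()
criterion = nn.CrossEntropyLoss()
x = torch.randn(8, 32)                            # toy batch shared by A and B
y_a, y_b = torch.randint(0, 3, (8,)), torch.randint(0, 2, (8,))
out_a, out_b = model(x)
loss = criterion(out_a, y_a) + criterion(out_b, y_b)   # joint MTL loss
loss.backward()
```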


Soft Parameter Sharing
[Diagram: two parallel networks, one per task (Input Layer → Constrained Layers → Unconstrained layers → Output Layer), with the constrained layers of Task A and Task B linked to each other]

• Replace shared layers (with identical parameters) with constrained layers, which have similar or related parameters.
• The similarity or relatedness of the parameters can be controlled by a regularization term in the loss function, or through connections between the constrained layers of different tasks.

• Sebastian Ruder, An Overview of Multi-Task Learning in Deep Neural Networks. https://ruder.io/multi-task/.
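• A minimal PyTorch sketch of soft parameter sharing (the towers and weighting factor are illustrative): each task keeps its own layers, and an L2 regularization term pulls the corresponding constrained-layer parameters of the two tasks toward each other.

```python
import torch
import torch.nn as nn

def make_tower(in_dim=32, hidden=64, out_dim=2):
    return nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(),
                         nn.Linear(hidden, out_dim))

tower_a, tower_b = make_tower(out_dim=3), make_tower(out_dim=2)

def soft_sharing_penalty(model_a, model_b, weight=1e-2):
    # L2 distance between corresponding parameters of the two towers;
    # only shape-matching (constrained) layers are regularized.
    penalty = 0.0
    for p_a, p_b in zip(model_a.parameters(), model_b.parameters()):
        if p_a.shape == p_b.shape:
            penalty = penalty + (p_a - p_b).pow(2).sum()
    return weight * penalty

criterion = nn.CrossEntropyLoss()
x_a, x_b = torch.randn(8, 32), torch.randn(8, 32)   # task-specific inputs
y_a, y_b = torch.randint(0, 3, (8,)), torch.randint(0, 2, (8,))
loss = (criterion(tower_a(x_a), y_a) + criterion(tower_b(x_b), y_b)
        + soft_sharing_penalty(tower_a, tower_b))    # relatedness regularizer
loss.backward()
```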


Why does MTL work?
• Implicit data augmentation:
• If different tasks have different input data, then each task
can benefit from the extra knowledge encoded in the
input of other tasks.
• Even if all tasks share the same data, simultaneously
learning for multiple tasks can reduce the risk of
overfitting for each one of these tasks.
• Enhanced feature learning
• It may be the case that a specific task is so noisy that we
cannot learn the most relevant features if we only deal
with that particular task.
• Including other tasks makes it easier to uncover truly
relevant features.
• Besides, some features that are hard to learn for one task may be easy to learn for another. Training the tasks together lets the harder task benefit from features learned through the easier one.
• Sebastian Ruder, An Overview of Multi-Task Learning in Deep Neural Networks. https://ruder.io/multi-task/.
MTL Example: Image Segmentation and Depth Regression

Fusing semantic segmentation, instance segmentation and per-pixel depth regression tasks using
hard parameter sharing.

• Kendall, Alex, Yarin Gal, and Roberto Cipolla. "Multi-task learning using uncertainty to weigh losses for scene geometry and
semantics." Proceedings of the IEEE conference on computer vision and pattern recognition. 2018.
MTL Example: Cross Language Knowledge Transfer

Fusing language-specific tasks using multi-lingual feature transformation layers by hard parameter sharing.

• Huang, Jui-Ting, et al. "Cross-language knowledge transfer using multilingual deep neural network with shared hidden layers." 2013 IEEE International
Conference on Acoustics, Speech and Signal Processing. IEEE, 2013.
MTL Example: Correlated Time Series Forecasting

Fusing task-specific layers using shared layers by hard parameter sharing.

• Cirstea, Razvan-Gabriel, et al. "Correlated time series forecasting using multi-task deep neural networks." Proceedings of the 27th acm international
conference on information and knowledge management. 2018.
References
1. Hongyi Li, Transfer Learning.
https://speech.ee.ntu.edu.tw/~tlkagk/courses/ML_2016/Lecture/transfer%20(v3).pdf.
2. Sejuti Das. Top 8 Pre-Trained NLP Models Developers Must Know.
https://analyticsindiamag.com/top-8-pre-trained-nlp-models-developers-must-know/.
3. Devlin, Jacob, et al. "Bert: Pre-training of deep bidirectional transformers for language understanding."
arXiv preprint arXiv:1810.04805 (2018).
4. Brown, Tom, et al. "Language models are few-shot learners." Advances in Neural Information Processing Systems 33 (2020): 1877-1901.
5. Feng, Zhangyin, et al. "CodeBERT: A pre-trained model for programming and natural languages." arXiv preprint arXiv:2002.08155 (2020).
6. Liu, Yinhan, et al. "RoBERTa: A robustly optimized BERT pretraining approach." arXiv preprint arXiv:1907.11692 (2019).
7. Lan, Zhenzhong, et al. "ALBERT: A lite BERT for self-supervised learning of language representations." arXiv preprint arXiv:1909.11942 (2019).
8. Yang, Zhilin, et al. "XLNet: Generalized autoregressive pretraining for language understanding." Advances in Neural Information Processing Systems 32 (2019).
9. Simonyan, Karen, and Andrew Zisserman. "Very deep convolutional networks for large-scale image recognition." arXiv preprint arXiv:1409.1556 (2014).
10. He, Kaiming, et al. "Deep residual learning for image recognition." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016.
11. Dosovitskiy, Alexey, et al. "An image is worth 16x16 words: Transformers for image recognition at scale." arXiv preprint arXiv:2010.11929 (2020).
12. Liu, Ze, et al. "Swin Transformer: Hierarchical vision transformer using shifted windows." Proceedings of the IEEE/CVF International Conference on Computer Vision. 2021.
13. Wang, Wenhai, et al. "PVT v2: Improved baselines with pyramid vision transformer." Computational Visual Media 8.3 (2022): 415-424.
References
14. Sebastian Ruder, An Overview of Multi-Task Learning in Deep Neural Networks. https://ruder.io/multi-task/.
15. Kendall, Alex, Yarin Gal, and Roberto Cipolla. "Multi-task learning using uncertainty to weigh losses for scene geometry and semantics." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018.
16. Rebut, Julien, et al. "Raw high-definition radar for multi-task learning." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2022.
17. Huang, Jui-Ting, et al. "Cross-language knowledge transfer using multilingual deep neural network with shared hidden layers." 2013 IEEE International Conference on Acoustics, Speech and Signal Processing. IEEE, 2013.
18. Liu, Pengfei, Xipeng Qiu, and Xuanjing Huang. "Recurrent neural network for text classification with multi-task learning." arXiv preprint arXiv:1605.05101 (2016).
19. Cirstea, Razvan-Gabriel, et al. "Correlated time series forecasting using multi-task deep neural networks." Proceedings of the 27th ACM International Conference on Information and Knowledge Management. 2018.
