11 Deep Transfer Learning and Multi-Task Learning
Concepts are assembled from various online sources, with great acknowledgement to all those who made them available online.
Transfer Learning
• Transfer a model trained on source data A to target data B.
• Task transfer: the source and target data can be the same, but the tasks differ.
  • Image classification -> image segmentation
  • Machine translation -> sentiment analysis
  • Time series prediction -> time series classification
  • …
• Data transfer: the task stays the same, but the data differ.
  • Images of everyday objects -> medical images
  • Chinese -> English
  • Physiological signals of one patient -> another patient
  • …
• Rationale: similar features can be useful in different tasks, or shared by different yet related data; a minimal fine-tuning sketch follows below.
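A minimal sketch of this idea (assuming PyTorch): a small encoder stands in for a network pre-trained on source data A, its weights are frozen, and only a new head is trained on target data B. Every layer size, class count, and tensor below is an illustrative placeholder rather than any particular published setup.

import torch
import torch.nn as nn

# Encoder assumed to have been pre-trained on the source task (data A).
encoder = nn.Sequential(
    nn.Linear(64, 128), nn.ReLU(),
    nn.Linear(128, 32), nn.ReLU(),
)

# Freeze the transferred features; only the new head will be updated.
for p in encoder.parameters():
    p.requires_grad = False

# New task-specific head for the target task on data B (here: 5 classes).
head = nn.Linear(32, 5)
model = nn.Sequential(encoder, head)

optimizer = torch.optim.Adam(head.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

x = torch.randn(8, 64)           # a dummy batch of target-domain features
y = torch.randint(0, 5, (8,))    # dummy target labels
loss = loss_fn(model(x), y)
loss.backward()
optimizer.step()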
Taxonomy of Transfer Learning
• Source data labeled, target data labeled: model fine-tuning, multi-task learning
• Source data labeled, target data unlabeled: domain-adversarial training, zero-shot learning
• Source data unlabeled, target data labeled: self-taught learning
• Source data unlabeled, target data unlabeled: self-taught clustering
VGG
• A deep CNN for large-scale image recognition, proposed by Oxford in 2014;
• variant: VGG-19.
ResNet50
• A variant of the ResNet model, a computer vision model proposed by Microsoft in 2015;
• pre-trained on the ImageNet corpus;
• a CNN model with 50 layers and about 25 million parameters;
• has been built into popular deep learning frameworks such as PyTorch and Keras.
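As a hedged sketch, the snippet below loads the ImageNet-pre-trained ResNet50 that ships with torchvision (PyTorch), freezes the backbone, and replaces the final fully connected layer with a head for a hypothetical 10-class target dataset; the weights enum assumes a reasonably recent torchvision release (0.13 or later).

import torch
import torch.nn as nn
from torchvision import models

# Load ResNet50 with ImageNet weights (downloaded on first use).
model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)

# Freeze the convolutional backbone so only the new head is trained.
for p in model.parameters():
    p.requires_grad = False

# Replace the 1000-class ImageNet classifier with a 10-class head (hypothetical target task).
model.fc = nn.Linear(model.fc.in_features, 10)

# Forward pass on a dummy batch of 224x224 RGB images.
x = torch.randn(2, 3, 224, 224)
logits = model(x)
print(logits.shape)  # torch.Size([2, 10])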
ViT: Vision Transformer
• A computer vision (CV) model proposed by Google in 2020;
• introduces the Transformer architecture, which has achieved huge success in natural language processing, into CV; the idea is to treat patches in images as words in text;
• when pre-trained on sufficiently large datasets, can achieve better accuracy and training efficiency than CNNs such as ResNet50;
• available at https://github.com/google-research/vision_transformer.
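The sketch below illustrates only the "patches as words" tokenization step (assuming PyTorch): the image is split into 16x16 patches, each patch is flattened and linearly projected to a token embedding, and the token sequence is passed through a standard Transformer encoder layer. The class token, position embeddings, and pre-trained weights of the real ViT are omitted, and all sizes are illustrative.

import torch
import torch.nn as nn

patch, dim = 16, 768                              # ViT-Base-like patch size and embedding width
img = torch.randn(1, 3, 224, 224)                 # one dummy RGB image

# (1, 3, 224, 224) -> (1, 196, 768): 14x14 = 196 patch "tokens", each a flattened 3x16x16 patch.
patches = img.unfold(2, patch, patch).unfold(3, patch, patch)   # (1, 3, 14, 14, 16, 16)
patches = patches.permute(0, 2, 3, 1, 4, 5).reshape(1, 14 * 14, 3 * patch * patch)

# Linear patch embedding: each flattened patch becomes a dim-dimensional token.
tokens = nn.Linear(3 * patch * patch, dim)(patches)

# Feed the token sequence to a (randomly initialised) Transformer encoder layer.
encoder = nn.TransformerEncoderLayer(d_model=dim, nhead=12, batch_first=True)
out = encoder(tokens)
print(out.shape)  # torch.Size([1, 196, 768])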
MTL Example: Scene Understanding
Fusing semantic segmentation, instance segmentation, and per-pixel depth regression tasks using hard parameter sharing.
• Kendall, Alex, Yarin Gal, and Roberto Cipolla. "Multi-task learning using uncertainty to weigh losses for scene geometry and semantics." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018.
MTL Example: Cross Language Knowledge Transfer
Fusing language-specific tasks through shared multilingual feature transformation layers (hard parameter sharing).
• Huang, Jui-Ting, et al. "Cross-language knowledge transfer using multilingual deep neural network with shared hidden layers." 2013 IEEE International
Conference on Acoustics, Speech and Signal Processing. IEEE, 2013.
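A minimal sketch of the shared-hidden-layer pattern (assuming PyTorch): all languages pass through the same hidden layers, while each language keeps its own output layer. The feature dimension, layer sizes, language codes, and output sizes are illustrative and not taken from Huang et al.

import torch
import torch.nn as nn

# Hidden layers shared across all languages.
shared = nn.Sequential(nn.Linear(40, 256), nn.ReLU(),
                       nn.Linear(256, 256), nn.ReLU())

# One language-specific output layer per language (e.g., different label inventories).
heads = nn.ModuleDict({
    "en": nn.Linear(256, 1000),
    "zh": nn.Linear(256, 1200),
})

def forward(features, lang):
    # Shared transformation followed by the language-specific output layer.
    return heads[lang](shared(features))

logits = forward(torch.randn(4, 40), "zh")   # a dummy batch of acoustic feature vectors
print(logits.shape)                          # torch.Size([4, 1200])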
MTL Example: Correlated Time Series Forecasting
• Cirstea, Razvan-Gabriel, et al. "Correlated time series forecasting using multi-task deep neural networks." Proceedings of the 27th ACM International Conference on Information and Knowledge Management. 2018.
References
1. Hung-yi Lee, Transfer Learning. https://speech.ee.ntu.edu.tw/~tlkagk/courses/ML_2016/Lecture/transfer%20(v3).pdf.
2. Sejuti Das. Top 8 Pre-Trained NLP Models Developers Must Know. https://analyticsindiamag.com/top-8-pre-trained-nlp-models-developers-must-know/.
3. Devlin, Jacob, et al. "BERT: Pre-training of deep bidirectional transformers for language understanding." arXiv preprint arXiv:1810.04805 (2018).
4. Brown, Tom, et al. "Language models are few-shot learners." Advances in Neural Information Processing Systems 33 (2020): 1877-1901.
5. Feng, Zhangyin, et al. "CodeBERT: A pre-trained model for programming and natural languages." arXiv preprint arXiv:2002.08155 (2020).
6. Liu, Yinhan, et al. "RoBERTa: A robustly optimized BERT pretraining approach." arXiv preprint arXiv:1907.11692 (2019).
7. Lan, Zhenzhong, et al. "ALBERT: A lite BERT for self-supervised learning of language representations." arXiv preprint arXiv:1909.11942 (2019).
8. Yang, Zhilin, et al. "XLNet: Generalized autoregressive pretraining for language understanding." Advances in Neural Information Processing Systems 32 (2019).
9. Simonyan, Karen, and Andrew Zisserman. "Very deep convolutional networks for large-scale image recognition." arXiv preprint arXiv:1409.1556 (2014).
10. He, Kaiming, et al. "Deep residual learning for image recognition." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016.
11. Dosovitskiy, Alexey, et al. "An image is worth 16x16 words: Transformers for image recognition at scale." arXiv preprint arXiv:2010.11929 (2020).
12. Liu, Ze, et al. "Swin Transformer: Hierarchical vision transformer using shifted windows." Proceedings of the IEEE/CVF International Conference on Computer Vision. 2021.
13. Wang, Wenhai, et al. "PVT v2: Improved baselines with pyramid vision transformer." Computational Visual Media 8.3 (2022): 415-424.
References
14. Sebastian Ruder, An Overview of Multi-Task Learning in Deep Neural Networks. https://ruder.io/multi-task/.
15. Kendall, Alex, Yarin Gal, and Roberto Cipolla. "Multi-task learning using uncertainty to weigh losses for scene geometry and semantics." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018.
16. Rebut, Julien, et al. "Raw high-definition radar for multi-task learning." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2022.
17. Huang, Jui-Ting, et al. "Cross-language knowledge transfer using multilingual deep neural network with shared hidden layers." 2013 IEEE International Conference on Acoustics, Speech and Signal Processing. IEEE, 2013.
18. Liu, Pengfei, Xipeng Qiu, and Xuanjing Huang. "Recurrent neural network for text classification with multi-task learning." arXiv preprint arXiv:1605.05101 (2016).
19. Cirstea, Razvan-Gabriel, et al. "Correlated time series forecasting using multi-task deep neural networks." Proceedings of the 27th ACM International Conference on Information and Knowledge Management. 2018.