Learning to Remember Rare Events
Paper appeared at ICLR 2017, https://arxiv.org/abs/1703.03129
Authors:
Łukasz Kaiser, Ofir Nachum, Aurko Roy, and Samy Bengio
(Google Brain)
 Reviewed by Taegyun Jeon 
What can we learn from this paper?
1. Memory‑augmented deep neural network
2. Two tasks:
One‑shot learning (Omniglot dataset)
Life‑long one‑shot learning (large‑scale machine
translation)
3. TensorFlow implementation for the one‑shot learning task
Official code from Google Brain using TensorFlow
Problem Definition
rare events vs. the average case
Image from "One Shot Learning" (Jisung Kim @ TensorFlow‑KR 2nd Meetup Lightning Talk)
8 Tactics To Combat Imbalanced Training Data
1. Collect More Data
2. Try Changing Your Performance Metric
3. Try Resampling Your Dataset
4. Try Generate Synthetic Samples
5. Try Different Algorithms
6. Try Penalized Models
7. Try a Different Perspective
8. Try Getting Creative
"8 Tactics to Combat Imbalanced Classes in Your Machine Learning Dataset" @ Machine Learning Mastery 4
Problem Definition (for rare events)
Deep Neural Networks
Extend the training data
Re‑train them to handle such rare or new events
Very slow! (gradient‑based optimization)
Humans (life‑long fashion)
Learn from a single example
Key Concepts
Deep Neural Networks (+ Memory Module)
Previous Works
Meta‑Learning with Memory Augmented Neural Networks
Idea: Write the (image, label) pair into the memory
Matching Networks for One Shot Learning
Idea: Train a fully end‑to‑end nearest‑neighbor classifier
Note from A. Karpathy
Memory module
Define a memory of size  memory-size as a triple:
M = (K, V, A), where K ∈ R^{m×k}, V ∈ R^m, A ∈ R^m
m:  memory-size , k:  key-size .
Key: activations of a chosen layer of a neural network.
Value: ground‑truth target for the given example.
Age: tracks the age of each item stored in memory.
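As a minimal sketch of this structure (assuming NumPy; the sizes are the defaults used later in the Quick Start), the triple can be held in three arrays:

import numpy as np

m, k = 8192, 128                    # memory-size and key-size (assumed defaults)

K = np.random.randn(m, k)           # keys, one row per memory slot
K /= np.linalg.norm(K, axis=1, keepdims=True)   # keys are unit-normalized
V = np.zeros(m, dtype=np.int64)     # values: ground-truth targets (class labels)
A = np.zeros(m, dtype=np.int64)     # ages of the stored items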
Memory module (query)
Memory query q is a vector of size  key-size :
q ∈ R^k, ∣∣q∣∣ = 1
The nearest neighbor(*) of q in M:
NN(q, M) = argmax_i q ⋅ K[i]
Given a query q, the memory M computes the k nearest neighbors:
(n_1, ..., n_k) = NN_k(q, M)
and returns, as the main result, the value V[n_1].
(*) Since the keys are normalized, this is the nearest neighbor w.r.t. cosine similarity.
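A hedged sketch of this query step, reusing the K and V arrays from the sketch above (not the official implementation):

def nn_query(q, K, k_nn=5):
    # Keys and query are unit-normalized, so the dot product is the
    # cosine similarity; descending argsort yields (n_1, ..., n_k).
    sims = K @ q
    return np.argsort(-sims)[:k_nn]

q = np.random.randn(k)
q /= np.linalg.norm(q)              # ||q|| = 1
n = nn_query(q, K)
main_result = V[n[0]]               # the value V[n_1]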
Memory module (query)
Cosine similarity: d_i = q ⋅ K[n_i]
Return  softmax (d_1 ⋅ τ, ..., d_k ⋅ τ)
Inverse of softmax temperature: τ = 40
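Continuing the sketch, the similarities of the k retrieved keys pass through a softmax scaled by τ (a sketch; the official version lives in memory.py):

def memory_softmax(q, K, n, tau=40.0):
    d = K[n] @ q                    # d_i = q . K[n_i]
    z = d * tau - (d * tau).max()   # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()              # softmax(d_1*tau, ..., d_k*tau)

scores = memory_softmax(q, K, n)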
[Note] Softmax temperature, τ
The idea is to control the randomness of predictions:
High temperature (small τ): softmax outputs are closer to each other (more uniform).
Low temperature (large τ): softmax outputs become more and more "hardmax".
As the temperature approaches zero (τ → ∞), the probability of the output with the highest score tends to 1.
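A quick numeric illustration, continuing the NumPy sketch with hypothetical similarities d = (0.9, 0.8, 0.5):

d = np.array([0.9, 0.8, 0.5])
for tau in (1.0, 40.0):
    z = d * tau - (d * tau).max()
    p = np.exp(z) / np.exp(z).sum()
    print(tau, p.round(3))
# tau = 1.0  -> [0.388 0.351 0.26 ]  (outputs close to each other)
# tau = 40.0 -> [0.982 0.018 0.   ]  (nearly "hardmax")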
Memory module (episode)
Slide from "Meta‑learning with memory‑augmented neural networks" (Slideshare, H. Kim)
Memory module (train)
Memory loss
Given a query q and the desired correct (supervised) value v:
Classification: v would be the class label.
Seq2Seq: v would be the output token at the current time step.
Memory module (train)
loss(q, v, M) = [q ⋅ K[n_b] − q ⋅ K[n_p] + α]_+
K[n_p]: positive neighbor, V[n_p] = v
K[n_b]: negative neighbor, V[n_b] ≠ v
α: margin; the loss is zero once the positive similarity exceeds the negative one by at least α
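As a sketch under these definitions (the indices n_p and n_b come from the retrieved neighbors; α = 0.1 follows the paper):

def memory_loss(q, K, n_p, n_b, alpha=0.1):
    # Hinge loss: zero once the positive similarity exceeds the
    # negative similarity by at least the margin alpha.
    return max(0.0, q @ K[n_b] - q @ K[n_p] + alpha)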
Memory module (Update)
Case V[n_1] = v:
K[n_1] ← (q + K[n_1]) / ∣∣q + K[n_1]∣∣
A[n_1] ← 0
Case V[n_1] ≠ v:
if the memory has an empty slot n_empty, assign n′ = n_empty
if not, n′ = argmax_i A[n_i] (overwrite the oldest item)
K[n′] ← q, V[n′] ← v, and A[n′] ← 0.
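A minimal sketch of these two cases, continuing the arrays above (the paper also adds random noise when choosing the oldest slot and increments all ages each step; both are omitted here):

def memory_update(q, v, K, V, A, n):
    n1 = n[0]
    if V[n1] == v:                  # correct value: average q into the key
        merged = q + K[n1]
        K[n1] = merged / np.linalg.norm(merged)
        A[n1] = 0
    else:                           # wrong value: write into the oldest slot
        n_new = int(np.argmax(A))   # an empty slot would be preferred, if any
        K[n_new], V[n_new], A[n_new] = q, v, 0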
Memory module (train & update)
Experiments (Evaluation)
1. Evaluation on Omniglot dataset
2. Evaluation on synthetic task
3. Evaluation on English‑German translation model
Qualitative side: rarely‑occurring words
Quantitative side: BLEU score
Experiments (Omniglot Dataset)
Omniglot dataset
This dataset contains  1623 different handwritten characters
from  50 different alphabets.
Each of the 1623 characters was drawn online via Amazon's
Mechanical Turk by 20 different people.
Each image is paired with stroke data, a sequence of  [x,y,t]  coordinates with time  (t) in milliseconds.
Stroke data is available in MATLAB files only.
Omniglot dataset for one‑shot learning (github): https://github.com/brendenlake/omniglot
Experiments (Omniglot Dataset)
CNN Architecture
(Conv, ReLU), (Conv, ReLU), pool,
(Conv, ReLU), (Conv, ReLU), pool, FC, FC
Memory module
Output layer (Prediction)
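A hedged tf.keras sketch of such an embedder (filter counts, kernel sizes, and FC widths are assumptions; the official model.py defines its own LeNet‑style class):

import tensorflow as tf

def build_embedder(key_size=128):
    return tf.keras.Sequential([
        tf.keras.layers.Conv2D(64, 3, activation='relu', input_shape=(28, 28, 1)),
        tf.keras.layers.Conv2D(64, 3, activation='relu'),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Conv2D(128, 3, activation='relu'),
        tf.keras.layers.Conv2D(128, 3, activation='relu'),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(256, activation='relu'),
        tf.keras.layers.Dense(key_size),  # key vector fed to the memory module
    ])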
Experiments (Omniglot Dataset)
 way : the number of distinct classes (characters) in an episode
 shot : the number of examples seen per class
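For example, a hypothetical sampler for an N‑way, k‑shot episode from a dict mapping label → list of examples (the format get_data returns, per the train.py section below):

import random

def sample_episode(data, n_way=5, n_shot=1, n_query=1):
    # Pick n_way classes, then n_shot support and n_query query
    # examples per class.
    labels = random.sample(list(data), n_way)
    support, query = [], []
    for label in labels:
        xs = random.sample(data[label], n_shot + n_query)
        support += [(x, label) for x in xs[:n_shot]]
        query += [(x, label) for x in xs[n_shot:]]
    return support, query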
Experiments (GNMT)
Decoder path
Key: result of attention a_t
Combine the retrieved value with the LSTM output (at each decoder time‑step)
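A loose sketch of that flow, continuing the NumPy sketches above with stand‑in arrays (shapes and the combination step are assumptions, not the GNMT code):

a_t = np.random.randn(k)            # attention result at decoder step t
a_t /= np.linalg.norm(a_t)          # used as the memory query key
lstm_out = np.random.randn(256)     # decoder LSTM output at step t

value_t = V[nn_query(a_t, K, k_nn=1)[0]]          # retrieved memory value
value_emb = np.random.randn(256)                  # stand-in embedding of value_t
combined = np.concatenate([lstm_out, value_emb])  # one possible combination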
Experiments (GNMT)
Convolutional Gated Recurrent Unit (CGRU)
For more information: Read the Lunit tech blog
Conclusions
Long‑term memory module
Embedding input with a simple CNN (LeNet)
The returned k‑NN result could also be used by other layers.
Code Review (Github)
1.  data_utils.py : Data loading and other utilities.
2.  train.py : Script for training model.
3.  memory.py : Memory module for storing "nearest neighbors".
4.  model.py : Model using memory component.
Quick Start
1) First, download and set up the Omniglot data by running:
python data_utils.py
2) Then run the training script:
python train.py --memory_size=8192 \
  --batch_size=16 --validation_length=50 \
  --episode_width=5 --episode_length=30
3) The first validation batch may look like this (although it is
noisy):
0-shot: 0.040, 1-shot: 0.404, 2-shot: 0.516,
3-shot: 0.604, 4-shot: 0.656, 5-shot: 0.684
4) At step 500 you may see something like this:
0-shot: 0.036, 1-shot: 0.836, 2-shot: 0.900,
3-shot: 0.940, 4-shot: 0.944, 5-shot: 0.916
5) At step 4000 you may see something like this:
0-shot: 0.044, 1-shot: 0.960, 2-shot: 1.000,
3-shot: 0.988, 4-shot: 0.972, 5-shot: 0.992
0) Basic parameters
rep_dim: 128, dimension of keys to use in memory
episode_length: 100, length of episode
episode_width: 5, number of distinct labels in a single episode
memory_size: None, number of slots in memory.
batch_size: 16, batch size
num_episodes: 100000, number of training episodes
validation_frequency: 20, how often (in training episodes) to assess validation accuracy
validation_length: 10, number of episodes used to compute validation accuracy
seed: 888, random seed for training sampling
save_dir: '', directory to save model to
use_lsh: False, use locality‑sensitive hashing (NOTE: not fully tested)
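The train.py tips below note that these are passed with tf.flags; a partial sketch in the TF1‑era style (flag names mirror the list above):

import tensorflow as tf

FLAGS = tf.flags.FLAGS
tf.flags.DEFINE_integer('rep_dim', 128, 'dimension of keys to use in memory')
tf.flags.DEFINE_integer('episode_length', 100, 'length of episode')
tf.flags.DEFINE_integer('episode_width', 5, 'number of distinct labels in a single episode')
tf.flags.DEFINE_integer('memory_size', None, 'number of slots in memory')
tf.flags.DEFINE_integer('batch_size', 16, 'batch size')
tf.flags.DEFINE_bool('use_lsh', False, 'use locality-sensitive hashing (not fully tested)')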
1) data_utils.py
def preprocess_omniglot():
# Download and prepare raw Omniglot data.
def maybe_download_data():
# Download Omniglot repo if it does not exist.
def write_datafiles():
# Load and preprocess images from a directory and
# write them to a file.
def crawl_directory():
# Crawls data directory and returns stuff.
def resize_images():
# Resize images to new dimensions.
1) Tips from data_utils.py
Messages are managed with  logging ; the log  level  is adjustable.
Data is dumped and reused with  pickle  (what about TFRecord and queues?).
Simple external commands are run with  subprocess .
Only the train dataset is augmented with rotations (0, 90, 180, 270 degrees).
Images are resized (original: 105, resized: 28).
OUTPUT: train_omni.pkl (733M), test_omni.pkl (126M)
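A small sketch of the rotation‑augmentation and pickle tips (the file name matches the output above; the image list is a stand‑in):

import pickle
import numpy as np

def augment_with_rotations(images):
    # Rotate each 28x28 image by 0, 90, 180, and 270 degrees (train set only).
    return [np.rot90(img, r) for img in images for r in range(4)]

train_images = [np.zeros((28, 28), dtype=np.float32)]  # stand-in data
with open('train_omni.pkl', 'wb') as f:
    pickle.dump(augment_with_rotations(train_images), f)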
2) train.py
def data_utils.get_data():
# Get data in form suitable for episodic training.
# Returns: Train and test data as dictionaries mapping
# label to list of examples.
class Trainer():
def run():
self.sample_episode_batch()
outputs = self.model.episode_step()
2) Tips from train.py
Basic parameters are passed via  tf.flags .
Training‑related messages are reported via  logging .
 assert  is used to catch episode‑length errors.
Training and validation run together (20 : 1 ratio).
3) model.py
class LeNet(object):
# Standard CNN architecture
class Model(object):
# Model for coordinating between CNN embedder and
# Memory module.
3) model.py
Line 152‑158,  core_builder() :
embeddings = self.embedder.core_builder(x)
if keep_prob < 1.0:
  embeddings = tf.nn.dropout(embeddings, keep_prob)
memory_val, _, teacher_loss = self.memory.query(
    embeddings, y, use_recent_idx=use_recent_idx)
loss, y_pred = self.classifier.core_builder(memory_val, x, y)
return loss + teacher_loss, y_pred
3) Tips from model.py
 core_builder() : adds the memory module to the existing network.
For an input image  x , an embedding vector is produced with  LeNet .
 weight  and  bias  variables are created in advance with  tf.get_variable .
Each piece of model functionality is factored out as finely as possible.
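The tf.get_variable pattern looks roughly like this in TF1 (shapes and scope name are assumptions):

import tensorflow as tf

with tf.variable_scope('classifier', reuse=tf.AUTO_REUSE):
    # Created once up front; reused across calls to core_builder().
    weight = tf.get_variable('weight', shape=[256, 5],
                             initializer=tf.random_normal_initializer(stddev=0.01))
    bias = tf.get_variable('bias', shape=[5],
                           initializer=tf.zeros_initializer())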
4) memory.py
class Memory(object):
def get_hint_pool_idxs(...):
# Get small set of idxs to compute nearest neighbor
# queries on.
def query(...):
# Queries memory for nearest neighbor.
class LSHMemory(Memory):
# Memory employing locality sensitive hashing.
# Note: Not fully tested.
4) Tips from memory.py
Either  Memory  or  LSHMemory  can be chosen;  Memory  is recommended.
The memory behavior from the paper is implemented intuitively.
By changing only  memory_size  and  key_size , the module can be attached to almost any network.
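A hypothetical wiring example, inferred from the model.py excerpt above (the exact constructor signature lives in memory.py; the argument names here are assumptions):

import tensorflow as tf
from memory import Memory           # module from the official repo

embeddings = tf.placeholder(tf.float32, [16, 128])  # 128-d keys from any network
labels = tf.placeholder(tf.int32, [16])

memory = Memory(key_dim=128, memory_size=8192, vocab_size=5)
memory_val, _, teacher_loss = memory.query(embeddings, labels,
                                           use_recent_idx=False)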
Appendix (Reviews)
1. Lunit Tech Blog (by Hyo‑Eun Kim) (Link)
2. OpenReview (ICLR2017) (Link)
3. BAIR Blog: "Learning to Learn" (by Chelsea Finn) (Link)
4. Learning to remember rare events (by Hongbae Kim)
(Slideshare)
5. One Shot Learning (by Jisung Kim) (Slideshare)
Appendix (Implementations)
1. TensorFlow/models (GoogleBrain) (Github)