Advanced Machine Learning 1
Final Exam
To be admitted to the final exam, you have to hand in all but one of the Python exercises on time
Learning objectives
Introduction to Recurrent Neural Networks
Feed-Forward Artificial Neural Networks
Task: Prediction of next letter
Solution: Prediction of next letter
import numpy as np
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Input, Flatten

text = "The Research Master Data Science at HSBI rocks!"
chars = sorted(list(set(text)))

# Mapping from letters to numbers
char_to_index = {char: i for i, char in enumerate(chars)}
index_to_char = {i: char for i, char in enumerate(chars)}

seq_length = 3
sequences = []
labels = []

for i in range(len(text) - seq_length):
    seq = text[i:i+seq_length]
    label = text[i+seq_length]
    sequences.append([char_to_index[char] for char in seq])
    labels.append(char_to_index[label])

X = np.array(sequences)
y = np.array(labels)

# Transforming the text to numbers (one-hot encoding)
X_one_hot = tf.one_hot(X, len(chars))
y_one_hot = tf.one_hot(y, len(chars))

# Flatten the input for the FFN
model = Sequential()
model.add(Input((seq_length, len(chars))))
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dense(len(chars), activation='softmax'))

model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
model.fit(X_one_hot, y_one_hot, epochs=100)

# Generate the output
start_seq = "The Re"
generated_text = start_seq

for i in range(60):
    x = np.array([[char_to_index[char] for char in generated_text[-seq_length:]]])
    x_one_hot = tf.one_hot(x, len(chars))
    prediction = model.predict(x_one_hot)
    next_index = np.argmax(prediction)
    next_char = index_to_char[next_index]
    generated_text += next_char

print("Generated Text:")
print(generated_text)
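For intuition, tf.one_hot turns each integer index into a vector with a single one at that index; a minimal sketch (toy values, not from the slides):

import tensorflow as tf

indices = tf.constant([0, 2, 4])        # three character indices
one_hot = tf.one_hot(indices, depth=5)  # vocabulary of 5 characters
print(one_hot.numpy())
# [[1. 0. 0. 0. 0.]
#  [0. 0. 1. 0. 0.]
#  [0. 0. 0. 0. 1.]]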
Task: Prediction of last letter of a word
Two problems of feed-forward neural networks
Recurrent Neural Networks (RNN)
A closer look…
Input: $\mathbf{X}_t \in \mathbb{R}^{n \times d} = \mathbf{H}_t^{(0)}$

The output of the $l$-th hidden layer $\mathbf{H}_t^{(l)} \in \mathbb{R}^{n \times h_l}$ with $l = 1, \dots, L$ is:

$\mathbf{H}_t^{(l)} = \phi_l\left(\mathbf{H}_t^{(l-1)} \mathbf{W}_{h_{l-1} h_l}^{(l)} + \mathbf{H}_{t-1}^{(l)} \mathbf{W}_{h_l h_l}^{(l)} + \mathbf{b}_{h_l}^{(l)}\right)$

with $\mathbf{W}_{h_l h_l}^{(l)} \in \mathbb{R}^{h_l \times h_l}$ and $\mathbf{W}_{h_{l-1} h_l}^{(l)} \in \mathbb{R}^{h_{l-1} \times h_l}$

The output layer is:

$\mathbf{O}_t = \mathbf{H}_t^{(L)} \mathbf{W}_{h_L q} + \mathbf{b}_q$
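A minimal NumPy sketch of a single hidden-layer update (here $l = 1$, $\phi$ = tanh); all names and sizes are illustrative:

import numpy as np

n, d, h = 4, 8, 16               # batch size, input dimension, hidden units
X_t = np.random.randn(n, d)      # input at time step t (= H_t^(0))
H_prev = np.zeros((n, h))        # this layer's hidden state from step t-1
W_xh = np.random.randn(d, h)     # W^(1)_{h_0 h_1}: input-to-hidden weights
W_hh = np.random.randn(h, h)     # W^(1)_{h_1 h_1}: hidden-to-hidden weights
b_h = np.zeros(h)                # bias b^(1)_{h_1}

# H_t^(1) = phi( H_t^(0) W_xh + H_{t-1}^(1) W_hh + b_h )
H_t = np.tanh(X_t @ W_xh + H_prev @ W_hh + b_h)
print(H_t.shape)                 # (4, 16)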
Types of RNNs
Task: Applications of RNN
Applications
One-to-Many
Image captioning
Video Frame Prediction / Video Generation
Music Generation
Sentence Generation from Keywords
Time Series Forecasting
Poetry Generation from a Single Theme
Applications
Many-to-One
Sentiment analysis
Text Classification
Time Series Forecasting (Point Prediction)
Machine Translation (Encoder)
Video Classification
Anomaly Detection in Time Series
Named Entity Recognition (NER)
Activity Recognition (Wearable Sensors)
DNA/Protein Sequence Classification
Applications
Many-to-Many
Machine Translation
Speech Recognition (Sequence-to-Sequence)
Video Captioning
Time Series Forecasting (Multi-step)
Named Entity Recognition (NER)
Video Classification (Frame-Level Labels)
Music Generation (Note-by-Note)
Part-of-Speech Tagging
Image Generation from Text (Text-to-Image)
Task: Prediction of last letter of a word
Solution 1/2: Prediction of last letter of a word
import re
import numpy as np
import tensorflow as tf
from sklearn.model_selection import train_test_split
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import SimpleRNN, Dense, Input

with open('data/long_text.txt', 'r') as file:
    text = file.read()

# Get rid of signs and transform to lower case
text_without_signs = re.sub(r'[^a-zA-Z\s]', '', text).lower()

# Group words by length for training and testing in batches
def group_words_by_length(words, labels):
    # Create a dictionary to store words by their length
    length_groups_words = {}
    length_groups_labels = {}

    # Iterate over the list of words
    for i, word in enumerate(words):
        word_length = len(word)
        # assumed continuation (truncated on the slide): collect each word
        # and its label under its length key
        if word_length not in length_groups_words:
            length_groups_words[word_length] = []
            length_groups_labels[word_length] = []
        length_groups_words[word_length].append(word)
        length_groups_labels[word_length].append(labels[i])

    return length_groups_words, length_groups_labels
Solution 2/2: Prediction of last letter of a word
Padding to deal with different input sizes
A better way to deal with different input sizes is padding
The maximum input size for training must be set beforehand
All inputs smaller than that maximum input size are padded with a special sign (e.g. zero)
Padding can be done before or after the real input

Example:

[[711   6  71   0   0   0]
 [ 73   8   2  55   7   0]
 [ 83  91  45  64   3   7]]

The Keras function pad_sequences transforms a list (of length num_samples) of sequences (lists of integers) into a 2D NumPy array of shape (num_samples, num_timesteps).
num_timesteps is either the maxlen argument if provided, or the length of the longest sequence in the list.
Sequences that are shorter than num_timesteps are padded with value until they are num_timesteps long.
Sequences longer than num_timesteps are truncated so that they fit the desired length.
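A short usage sketch that reproduces the example above (padding='post' pads after the real input):

from tensorflow.keras.preprocessing.sequence import pad_sequences

sequences = [[711, 6, 71], [73, 8, 2, 55, 7], [83, 91, 45, 64, 3, 7]]

# Pad up to the longest sequence (maxlen=None), appending zeros at the end
padded = pad_sequences(sequences, padding='post')
print(padded)
# [[711   6  71   0   0   0]
#  [ 73   8   2  55   7   0]
#  [ 83  91  45  64   3   7]]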
Masking to help the RNN deal with padded inputs
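A minimal sketch of where a Masking layer sits in a Keras model (layer sizes are illustrative); time steps whose features all equal mask_value are skipped by the recurrent layer:

import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, Masking, SimpleRNN, Dense

model = Sequential()
model.add(Input((None, 28)))            # variable-length sequences, 28 features per step
model.add(Masking(mask_value=0.0))      # all-zero time steps are ignored downstream
model.add(SimpleRNN(64, activation='tanh'))
model.add(Dense(28, activation='softmax'))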
Task: Prediction of last letter of a word using padding and masking
Solution: Prediction of last letter of a word using padding and masking
import re
import numpy as np
import tensorflow as tf
from tensorflow.keras.preprocessing.sequence import pad_sequences
from sklearn.model_selection import train_test_split
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import SimpleRNN, Dense, Input, Masking

with open('data/long_text.txt', 'r') as file:
    text = file.read()

text_without_signs = re.sub(r'[^a-zA-Z\s]', '', text).lower()

chars = sorted(list(set(text_without_signs)))
char_to_index = {char: i+1 for i, char in enumerate(chars)}  # index 0 is reserved for padding
index_to_char = {i+1: char for i, char in enumerate(chars)}

words = text_without_signs.split()

sequences = []
labels = []
seq_length_max = 0

for word in words:
    sequences.append([char_to_index[letter] for letter in word][0:len(word)-1])
    labels.append(char_to_index[word[-1]])
    if len(word) > seq_length_max:
        seq_length_max = len(word)

X_train, X_test, y_train, y_test = train_test_split(sequences, labels, test_size=0.2, random_state=42)

padded_sequences = pad_sequences(X_train, maxlen=seq_length_max, padding='pre')

X = np.array(padded_sequences)
y = np.array(y_train)

X_one_hot = tf.one_hot(X, len(chars)+1)

def set_zero_to_zeros(X_one_hot):
    X_one_hot_np = X_one_hot.numpy()
    # Loop through and set the one-hot encoding of padding (0) to be all zeros
    for batch in range(X.shape[0]):  # loop through each sequence in the batch
        for idx, value in enumerate(X[batch]):
            if value == 0:  # check if the value is padding (0)
                X_one_hot_np[batch, idx] = np.zeros(len(chars) + 1)
    return tf.convert_to_tensor(X_one_hot_np)

X_one_hot = set_zero_to_zeros(X_one_hot)
y_one_hot = tf.one_hot(y, len(chars)+1)

model = Sequential()
model.add(Input((None, len(chars)+1)))
# mask the padding in training (alternative to zeroing the one-hot vectors above)
#model.add(Masking(mask_value=0))
model.add(SimpleRNN(256, activation='tanh'))
model.add(Dense(len(chars)+1, activation='softmax'))

model.compile(optimizer='adam', loss='CategoricalCrossentropy', metrics=['accuracy'])
model.fit(X_one_hot, y_one_hot, epochs=200)
Bag of Words (BoW)
Example: Bag of Words
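A minimal bag-of-words sketch using scikit-learn's CountVectorizer (the two example sentences are illustrative):

from sklearn.feature_extraction.text import CountVectorizer

corpus = ["the cat sat on the mat", "the dog sat on the log"]

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(corpus)

print(vectorizer.get_feature_names_out())
# ['cat' 'dog' 'log' 'mat' 'on' 'sat' 'the']
print(X.toarray())   # one row of word counts per sentence
# [[1 0 0 1 1 1 2]
#  [0 1 1 0 1 1 2]]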
Word Embeddings
Example: Word embeddings

import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE
from gensim.models import KeyedVectors

# Sample words to visualize (choose words related to animals, fruits, and countries for clarity)
words = ['dog', 'cat', 'apple', 'banana', 'france', 'germany', 'lion', 'tiger', 'paris', 'berlin']
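A hedged continuation of the visualization, assuming pretrained vectors in word2vec text format (the filename glove.6B.100d.word2vec.txt is a placeholder):

# Load pretrained vectors (path and format are assumptions for this sketch)
model = KeyedVectors.load_word2vec_format('glove.6B.100d.word2vec.txt', binary=False)

vectors = np.array([model[word] for word in words])

# Project the 100-dimensional embeddings down to 2D for plotting
tsne = TSNE(n_components=2, perplexity=5, random_state=42)
vectors_2d = tsne.fit_transform(vectors)

plt.scatter(vectors_2d[:, 0], vectors_2d[:, 1])
for word, (x, y) in zip(words, vectors_2d):
    plt.annotate(word, (x, y))
plt.show()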
Word2Vec
Word2Vec is an approach based on artificial neural networks for generating word embeddings
Developed by a team at Google
Word2Vec aims to capture the semantic relationships between words by mapping them to high-dimensional vectors
There are two neural embedding methods for Word2Vec:
Continuous Bag of Words (CBOW) and Skip-gram
Continuous Bag of Words (CBOW)
Skip-Gram Model
The Skip-Gram model also learns distributed representations of words in a continuous vector space
The main objective of Skip-Gram is to predict context words (words surrounding a target word) given a target word
This is the opposite of the Continuous Bag of Words (CBOW) model
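A minimal gensim sketch; the toy corpus is illustrative, and the sg parameter switches between the two methods:

from gensim.models import Word2Vec

sentences = [["the", "cat", "sat", "on", "the", "mat"],
             ["the", "dog", "chased", "the", "cat"]]

# sg=0 -> CBOW: predict the target word from its context
cbow = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=0)

# sg=1 -> Skip-gram: predict the context words from the target word
skipgram = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=1)

print(cbow.wv["cat"].shape)             # (50,)
print(skipgram.wv.most_similar("cat"))  # nearest neighbours in the toy space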
Using the Embedding Layer
The Embedding layer can be integrated into the RNN after the input layer
The layer is trained with backpropagation together with the rest of the model and learns a word embedding (representation) that fits the task at hand
Pretrained weights can be used to initialize the layer
Masking can be activated

tf.keras.layers.Embedding(
    input_dim,
    output_dim,
    mask_zero=False,
    weights=None,
)
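A short usage sketch, assuming a vocabulary of 1000 tokens; embedding_matrix is a hypothetical array standing in for real pretrained vectors:

import numpy as np
import tensorflow as tf

vocab_size, embed_dim = 1000, 50
embedding_matrix = np.random.randn(vocab_size, embed_dim)  # placeholder for pretrained vectors

layer = tf.keras.layers.Embedding(
    input_dim=vocab_size,        # vocabulary size (index 0 reserved for padding)
    output_dim=embed_dim,        # dimension of the learned vectors
    mask_zero=True,              # treat index 0 as padding and mask it downstream
    weights=[embedding_matrix],  # optional initialization from pretrained embeddings
)

tokens = tf.constant([[5, 42, 0, 0]])  # one padded sequence of length 4
print(layer(tokens).shape)             # (1, 4, 50)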
Task: Prediction of last letter of a word using embeddings
Solution: Prediction of last letter of a word using embeddings

# many-to-one rnn
import re
import numpy as np
import tensorflow as tf
from tensorflow.keras.preprocessing.sequence import pad_sequences
from sklearn.model_selection import train_test_split
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import SimpleRNN, Dense, Input, Embedding

with open('data/long_text.txt', 'r') as file:
    text = file.read()

text_without_signs = re.sub(r'[^a-zA-Z\s]', '', text).lower()

chars = sorted(list(set(text_without_signs)))
char_to_index = {char: i+1 for i, char in enumerate(chars)}  # index 0 is reserved for padding
index_to_char = {i+1: char for i, char in enumerate(chars)}

words = text_without_signs.split()

sequences = []
labels = []
seq_length_max = 0

for word in words:
    sequences.append([char_to_index[letter] for letter in word][0:len(word)-1])
    labels.append(char_to_index[word[-1]])
    if len(word) > seq_length_max:
        seq_length_max = len(word)

X_train, X_test, y_train, y_test = train_test_split(sequences, labels, test_size=0.2, random_state=42)

# Add zero padding to sequences at the beginning
padded_sequences = pad_sequences(X_train, maxlen=seq_length_max, padding='pre')

X = np.array(padded_sequences)
y = np.array(y_train)

vocab_size = len(chars) + 1

y_one_hot = tf.one_hot(y, vocab_size)

model = Sequential()
model.add(Input((None, )))
model.add(Embedding(input_dim=vocab_size, output_dim=12, mask_zero=True))
model.add(SimpleRNN(256, activation='tanh'))
model.add(Dense(vocab_size, activation='softmax'))

model.compile(optimizer='adam', loss='CategoricalCrossentropy', metrics=['accuracy'])
model.fit(X, y_one_hot, epochs=200)

# testing
padded_sequences = pad_sequences(X_test, maxlen=seq_length_max, padding='pre')
X = np.array(padded_sequences)
y_pred_rnn = model.predict(X)
y_pred_rnn = np.argmax(y_pred_rnn, axis=1)
y = np.array(y_test)

accuracy = (np.sum(y_pred_rnn == y)) / len(y)
print(f"Accuracy: {accuracy:.4f}")
Conclusion
Homework
Create an RNN to predict the length of a word. Use the long_text.txt data.
Questions?
Prof. Dr.-Ing. Christian Schwede
Fachbereich Ingenieurwissenschaften und Mathematik (Faculty of Engineering and Mathematics)
Campus Gütersloh
Programme Director of the Research Master Data Science
Member of the Board of the Institute for Data Science Solutions (IDAS)
[Link]@[Link]