Advanced Machine Learning

Introduction to Recurrent Neural Networks


Prof. Dr. Christian Schwede
Content - Overview

Class 1: Introduction to Recurrent Networks
    Recurrent Neural Networks
    Padding and Masking
    Embeddings
Recurrent Neural Networks
    Long short-term memory
    Gated recurrent unit
    Differentiable neural computer
Generative Methods
    Generative Adversarial Networks
    Variational Autoencoder
    Diffusion Models
    Generative pre-trained transformer
Graph Machine Learning
    Introduction into GML
    Graph Neural Networks

Final Exam

Oral exam of 30 minutes

To be accepted for the final exam, you have to hand in all but one of the Python exercises on time

Learning objectives

How can sequential data with variable length be processed with Artificial Neural Networks?

What are Recurrent Neural Networks and how do they work?

How can text be transformed to be used as feature vectors?

How can batch processing be applied with inputs of varying length?

Introduction to Recurrent Neural Networks
Feed-Forward Artificial Neural Networks

Feed-Forward Networks (FFN) are built to learn a function $\mathbf{y} = f(\mathbf{x})$ from data pairs $(\mathbf{x}, \mathbf{y})$ with $\mathbf{x} \in \mathbb{R}^n$, $\mathbf{y} \in \mathbb{R}^m$
Inputs are fed forward through the network from input to output, using weights and activation functions (e.g. sigmoid, ReLU, tanh, swish) to modify the output of every neuron
FFN are trained with gradient descent using back-propagation based on mini-batches (stochastic gradient descent)
With Convolutional Neural Networks (CNN), inputs with a matrix format such as images can be used

Task: Prediction of next letter

Build a FFN using TensorFlow that predicts the next letter in a sentence. Use "The Research Master Data Science at HSBI rocks!" as data and forget about the test/train split.

Solution: Prediction of next letter
import numpy as np
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Input, Flatten

text = "The Research Master Data Science at HSBI rocks!"
chars = sorted(list(set(text)))

# Mapping from letters to numbers
char_to_index = {char: i for i, char in enumerate(chars)}
index_to_char = {i: char for i, char in enumerate(chars)}

seq_length = 3
sequences = []
labels = []

for i in range(len(text) - seq_length):
    seq = text[i:i+seq_length]
    label = text[i+seq_length]
    sequences.append([char_to_index[char] for char in seq])
    labels.append(char_to_index[label])

X = np.array(sequences)
y = np.array(labels)

# Transforming the text to numbers (one-hot encoding)
X_one_hot = tf.one_hot(X, len(chars))
y_one_hot = tf.one_hot(y, len(chars))

# Flatten the input for the FFN
model = Sequential()
model.add(Input((seq_length, len(chars))))
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dense(len(chars), activation='softmax'))

model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
model.fit(X_one_hot, y_one_hot, epochs=100)

# Generate the output
start_seq = "The Re"
generated_text = start_seq

for i in range(60):
    x = np.array([[char_to_index[char] for char in generated_text[-seq_length:]]])
    x_one_hot = tf.one_hot(x, len(chars))
    prediction = model.predict(x_one_hot)
    next_index = np.argmax(prediction)
    next_char = index_to_char[next_index]
    generated_text += next_char

print("Generated Text:")
print(generated_text)

Task: Prediction of last letter of a word

How can we build a FFN to predict the last letter of a word?

Two Problems of feed-forward neural networks

The size of the input samples has to be fixed
The input data is not linked together (there is no memory across inputs)
But for "predicting the last letter of a word", the previous letters are required, hence there is a need to remember them

Recurrent Neural Networks (RNN)

Recurrent Neural Networks:
Work well with sequential data like time-series data, text data, or video and audio streams
The output from the previous step is fed as input to the current step: the hidden state $\mathbf{h}_t$ (memory state)
The fundamental processing unit in an RNN is a recurrent unit or cell
In each "time" step an additional input can be consumed and an additional output can be generated
Variable input and output lengths are possible

A closer look…
FFN (diagram: $\mathbf{X} \rightarrow \mathbf{H} \rightarrow \mathbf{O}$ with weight matrices $\mathbf{W}_{dh}$ and $\mathbf{W}_{hq}$)

Consider a simple FFN with one hidden layer
We propagate a batch $\mathbf{X} \in \mathbb{R}^{n \times d}$ with batch size $n$ and feature dimension $d$ as input
The output of the hidden layer is then $\mathbf{H} = \phi(\mathbf{X}\mathbf{W}_{dh} + \mathbf{b}_h)$, with $\phi$ as activation function
The last layer calculates $\mathbf{O} = \mathbf{H}\mathbf{W}_{hq} + \mathbf{b}_q$ as $q$-dimensional output $\mathbf{O} \in \mathbb{R}^{n \times q}$

RNN (diagram: as above, plus a recurrent connection feeding $\mathbf{H}_{t-1}$ back into $\mathbf{H}_t$ via $\mathbf{W}_{hh}$)

Now we propagate $\mathbf{X}_t \in \mathbb{R}^{n \times d}$ at every time step $t$
The output of the hidden layer at time step $t$ depends on the hidden state at time step $t-1$:
$\mathbf{H}_t = \phi(\mathbf{X}_t\mathbf{W}_{dh} + \mathbf{H}_{t-1}\mathbf{W}_{hh} + \mathbf{b}_h)$
The output layer stays the same
Note: the number of weights does not grow with time, since the weights $\mathbf{W}_{hh}$ are shared across all time steps
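To make the recurrence concrete, here is a minimal NumPy sketch of the forward pass above; the dimensions (n, d, h, q), the number of time steps, and the random data are illustrative assumptions, and tanh stands in for the activation $\phi$.

import numpy as np

n, d, h, q, T = 4, 3, 5, 2, 6           # batch size, input dim, hidden dim, output dim, time steps

rng = np.random.default_rng(0)
W_dh = 0.1 * rng.normal(size=(d, h))    # input-to-hidden weights
W_hh = 0.1 * rng.normal(size=(h, h))    # hidden-to-hidden weights (shared over all time steps)
W_hq = 0.1 * rng.normal(size=(h, q))    # hidden-to-output weights
b_h = np.zeros(h)
b_q = np.zeros(q)

H = np.zeros((n, h))                    # initial hidden state H_0
for t in range(T):
    X_t = rng.normal(size=(n, d))       # input batch at time step t (random stand-in data)
    H = np.tanh(X_t @ W_dh + H @ W_hh + b_h)   # H_t = phi(X_t W_dh + H_{t-1} W_hh + b_h)
    O_t = H @ W_hq + b_q                # O_t = H_t W_hq + b_q
print(O_t.shape)                        # (n, q)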
Deeper Recurrent Neural Networks

Input: $\mathbf{X}_t \in \mathbb{R}^{n \times d} = \mathbf{H}_t^{(0)}$
The output of the $l$-th hidden layer $\mathbf{H}_t^{(l)} \in \mathbb{R}^{n \times h_l}$ with $l = 1, \dots, L$ is:
$\mathbf{H}_t^{(l)} = \phi_l\!\left(\mathbf{H}_t^{(l-1)}\mathbf{W}_{h_{l-1}h_l}^{(l)} + \mathbf{H}_{t-1}^{(l)}\mathbf{W}_{h_l h_l}^{(l)} + \mathbf{b}_{h_l}^{(l)}\right)$
with $\mathbf{W}_{h_l h_l}^{(l)} \in \mathbb{R}^{h_l \times h_l}$ and $\mathbf{W}_{h_{l-1}h_l}^{(l)} \in \mathbb{R}^{h_{l-1} \times h_l}$
The output layer is:
$\mathbf{O}_t = \mathbf{H}_t^{(L)}\mathbf{W}_{h_L q} + \mathbf{b}_q$

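A hedged Keras sketch of a two-layer (L = 2) stacked RNN matching these equations; the feature dimension and layer sizes are illustrative assumptions.

import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, SimpleRNN, Dense

model = Sequential()
model.add(Input((None, 10)))                     # variable-length input, feature dimension d = 10
model.add(SimpleRNN(64, return_sequences=True))  # layer l = 1: emits H_t^(1) for every time step t
model.add(SimpleRNN(32))                         # layer l = 2: emits only the final hidden state
model.add(Dense(5))                              # output layer O_t = H_t^(L) W_hLq + b_q, with q = 5
model.summary()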
Types of RNNs

One-to-many: a single input and multiple outputs; example: image captioning
Many-to-one: several inputs and a single output; example: sentiment analysis of text (identify a feeling from a group of words)
Many-to-many: several inputs and several outputs, not necessarily with the same input and output length; example: translation of text

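In Keras, the many-to-one and many-to-many variants mostly come down to whether the recurrent layer returns its full output sequence; a minimal sketch with assumed dimensions:

import tensorflow as tf
from tensorflow.keras.layers import SimpleRNN

x = tf.random.normal((8, 20, 16))                        # 8 sequences, 20 time steps, 16 features

many_to_one = SimpleRNN(32)(x)                           # only the last hidden state: shape (8, 32)
many_to_many = SimpleRNN(32, return_sequences=True)(x)   # one output per time step: shape (8, 20, 32)

print(many_to_one.shape, many_to_many.shape)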
Task: Applications of RNN

What applications can you think of, and what type of RNN would each require?

Applications

One-to-Many
Image captioning
Video Frame Prediction / Video Generation
Music Generation
Sentence Generation from Keywords
Time Series Forecasting
Poetry Generation from a Single Theme

(Image caption example: "Crazy professor hiking in the lower Andes ;-)")

Applications

Many-to-One
Sentiment analysis
Text Classification
Time Series Forecasting (Point Prediction)
Machine Translation (Encoder)
Video Classification
Anomaly Detection in Time Series
Named Entity Recognition (NER)
Activity Recognition (Wearable Sensors)
DNA/Protein Sequence Classification

Applications

Many-to-Many
Machine Translation
Speech Recognition (Sequence-to-Sequence)
Video Captioning
Time Series Forecasting (Multi-step)
Named Entity Recognition (NER)
Video Classification (Frame-Level Labels)
Music Generation (Note-by-Note)
Part-of-Speech Tagging
Image Generation from Text (Text-to-Image)

Task: Prediction of last letter of a word

How can we build an RNN to predict the last letter of a word using the long_text.txt data file?
Use tf.keras.layers.SimpleRNN.
How can you train the model when words have different lengths?

Solution 1/2: Prediction of last letter of a word

import re
import numpy as np
import tensorflow as tf
from sklearn.model_selection import train_test_split
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import SimpleRNN, Dense, Input

with open('data/long_text.txt', 'r') as file:
    text = file.read()

# Get rid of signs and transform to lower case
text_without_signs = re.sub(r'[^a-zA-Z\s]', '', text).lower()

chars = sorted(list(set(text_without_signs)))
char_to_index = {char: i+1 for i, char in enumerate(chars)}
index_to_char = {i+1: char for i, char in enumerate(chars)}

# Create a list of words
words = text_without_signs.split()

sequences = []
labels = []

# Transform to numbers
for word in words:
    sequences.append([char_to_index[letter] for letter in word][0:len(word)-1])
    labels.append(char_to_index[word[-1]])

X_train, X_test, y_train, y_test = train_test_split(sequences, labels, test_size=0.2, random_state=42)

# Group words by length for training and testing in batches
def group_words_by_length(words, labels):
    # Create a dictionary to store words by their length
    length_groups_words = {}
    length_groups_labels = {}

    # Iterate over the list of words
    for i, word in enumerate(words):
        word_length = len(word)
        # If the length is not in the dictionary, add it as a key
        if word_length not in length_groups_words:
            length_groups_words[word_length] = []
            length_groups_labels[word_length] = []
        # Add the word to the corresponding length group
        length_groups_words[word_length].append(word)
        length_groups_labels[word_length].append(labels[i])

    length_groups_words = dict(sorted(length_groups_words.items()))
    length_groups_labels = dict(sorted(length_groups_labels.items()))
    # Convert the dictionary values into a list of lists
    return list(length_groups_words.values()), list(length_groups_labels.values())

Solution 2/2: Prediction of last letter of a word

# Variable length of input dimension
model = Sequential()
model.add(Input((None, len(chars)+1)))
model.add(SimpleRNN(256, activation='tanh'))
model.add(Dense(len(chars)+1, activation='softmax'))

model.compile(optimizer='adam', loss='CategoricalCrossentropy', metrics=['accuracy'])

# training
X_list, y_list = group_words_by_length(X_train, y_train)
# get rid of empty X (word with one letter)
X_list = X_list[1:]
y_list = y_list[1:]

# Train each batch of words with same length separately
for i in range(len(X_list)):
    X = np.array(X_list[i])
    y = np.array(y_list[i])
    X_one_hot = tf.one_hot(X, len(chars)+1)
    y_one_hot = tf.one_hot(y, len(chars)+1)
    model.fit(X_one_hot, y_one_hot, epochs=200)

# Test separately as well
X_list, y_list = group_words_by_length(X_test, y_test)
# get rid of empty X (word with one letter)
X_list = X_list[1:]
y_list = y_list[1:]

correct_rnn = 0.0
total_samples = 0

for i in range(len(X_list)):
    X = np.array(X_list[i])
    y = np.array(y_list[i])
    X_one_hot = tf.one_hot(X, len(chars)+1)
    y_one_hot = tf.one_hot(y, len(chars)+1)

    y_pred = model.predict(X_one_hot)
    y_pred = np.argmax(y_pred, axis=1)

    # Count correct predictions
    correct_rnn += np.sum(y_pred == y)
    total_samples += len(y)

accuracy = correct_rnn / total_samples
print(f"Test Accuracy: {accuracy:.4f}")

Test Accuracy: 0.2482

Padding to deal with different input sizes
A better way to deal with different input sizes is padding
The maximum input size for training must be set beforehand
All inputs smaller than the maximum input size are padded with a special sign (e.g. zero)
Padding can be done before or after the real input

Example:
[[711   6  71   0   0   0]
 [ 73   8   2  55   7   0]
 [ 83  91  45  64   3   7]]

tf.keras.preprocessing.sequence.pad_sequences(sequences=data, maxlen=seq_length_max, padding='pre', truncating='pre', value=0)

This function transforms a list (of length num_samples) of sequences (lists of integers) into a 2D NumPy array of shape (num_samples, num_timesteps).
num_timesteps is either the maxlen argument if provided, or the length of the longest sequence in the list.
Sequences that are shorter than num_timesteps are padded with value until they are num_timesteps long.
Sequences longer than num_timesteps are truncated so that they fit the desired length.

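A small sketch that reproduces the example matrix from this slide; note that the matrix shown is post-padded, so padding='post' is used here, while the call above pads in front.

from tensorflow.keras.preprocessing.sequence import pad_sequences

data = [[711, 6, 71],
        [73, 8, 2, 55, 7],
        [83, 91, 45, 64, 3, 7]]

# padding='post' appends the zeros; padding='pre' would put them in front instead
padded = pad_sequences(sequences=data, maxlen=6, padding='post', value=0)
print(padded)
# [[711   6  71   0   0   0]
#  [ 73   8   2  55   7   0]
#  [ 83  91  45  64   3   7]]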
Masking to help the RNN deal with padded inputs

Masking is a way to tell sequence-processing layers that certain timesteps in an input are missing, and thus should be skipped when processing the data
The masking layer acts as a Boolean filter that does not let masked inputs (e.g. inputs padded with zeros) pass
The effect is that the output of the RNN as well as the hidden state are passed unchanged to the next sequence step

There are three ways to introduce input masks in Keras models:
Add a keras.layers.Masking layer: model.add(Masking(mask_value=0))
Pass a mask argument manually when calling layers that support this argument (e.g. RNN layers)
Configure a keras.layers.Embedding layer with mask_zero=True

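A minimal sketch of the first option (an explicit Masking layer in front of the RNN); the vocabulary size and layer width are assumptions.

import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, Masking, SimpleRNN, Dense

vocab = 27                             # assumed: 26 letters + 1 padding index
model = Sequential()
model.add(Input((None, vocab)))        # variable-length one-hot input
model.add(Masking(mask_value=0.0))     # time steps whose features are all 0 are skipped by the RNN
model.add(SimpleRNN(64, activation='tanh'))
model.add(Dense(vocab, activation='softmax'))
model.summary()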
Task: Prediction of last letter of a word using padding and masking

Adjust the previous model by adding padding and a masking layer after the input layer of the RNN.

Solution: Prediction of last letter of a word using padding and masking
import re
import numpy as np
import tensorflow as tf
from tensorflow.keras.preprocessing.sequence import pad_sequences
from sklearn.model_selection import train_test_split
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import SimpleRNN, Dense, Input, Masking

with open('data/long_text.txt', 'r') as file:
    text = file.read()

text_without_signs = re.sub(r'[^a-zA-Z\s]', '', text).lower()

chars = sorted(list(set(text_without_signs)))
char_to_index = {char: i+1 for i, char in enumerate(chars)}
index_to_char = {i+1: char for i, char in enumerate(chars)}

words = text_without_signs.split()

sequences = []
labels = []
seq_length_max = 0

for word in words:
    sequences.append([char_to_index[letter] for letter in word][0:len(word)-1])
    labels.append(char_to_index[word[-1]])
    if len(word) > seq_length_max:
        seq_length_max = len(word)

X_train, X_test, y_train, y_test = train_test_split(sequences, labels, test_size=0.2, random_state=42)

# Add zero padding to sequences at the beginning
padded_sequences = pad_sequences(X_train, maxlen=seq_length_max, padding='pre')

X = np.array(padded_sequences)
y = np.array(y_train)

X_one_hot = tf.one_hot(X, len(chars)+1)

def set_zero_to_zeros(X_one_hot):
    X_one_hot_np = X_one_hot.numpy()
    # Loop through and set the one-hot encoding of padding (0) to be all zeros
    for batch in range(X.shape[0]):          # Loop through each sequence in the batch
        for idx, value in enumerate(X[batch]):
            if value == 0:                   # Check if the value is padding (0)
                X_one_hot_np[batch, idx] = np.zeros(len(chars) + 1)
    return tf.convert_to_tensor(X_one_hot_np)

X_one_hot = set_zero_to_zeros(X_one_hot)
y_one_hot = tf.one_hot(y, len(chars)+1)

model = Sequential()
model.add(Input((None, len(chars)+1)))
# mask the padding in training
#model.add(Masking(mask_value=0))
model.add(SimpleRNN(256, activation='tanh'))
model.add(Dense(len(chars)+1, activation='softmax'))

model.compile(optimizer='adam', loss='CategoricalCrossentropy', metrics=['accuracy'])
model.fit(X_one_hot, y_one_hot, epochs=200)

Accuracy with masking: 0.7698    Accuracy: 0.8179


How to pass more complex text as an input to an RNN?

We have already used One-Hot Encoding
One-hot encoding results in high-dimensional vectors, making it computationally expensive and memory-intensive, especially with large vocabularies
It does not capture semantic relationships between words; each word is treated as an isolated entity without considering its meaning or context
It is restricted to the vocabulary seen during training, making it unsuitable for handling out-of-vocabulary words

Bag of Words (BoW)

Bag of Words (BoW) is a text representation technique that represents a document as an unordered set of words and their respective frequencies
It discards the word order and captures the frequency of each word in the document, creating a vector representation
BoW ignores the order of words in the document, leading to a loss of sequential information and context
It is less effective for tasks where word order is crucial, such as natural language understanding
BoW representations are often sparse, leading to increased memory requirements and computational inefficiency (especially when dealing with large datasets)

Example: Bag of Words

from sklearn.feature_extraction.text import CountVectorizer

# Sample sentences (documents)
corpus = [
    'The quick brown fox jumps over the lazy dog',
    'Never jump over the lazy dog quickly',
    'Brown foxes are quick and lazy',
]

# Initialize the CountVectorizer (BoW model)
vectorizer = CountVectorizer()

# Fit and transform the corpus into a bag-of-words model
X = vectorizer.fit_transform(corpus)

# Show the vocabulary (index mapping)
print("Vocabulary:", vectorizer.vocabulary_)

# Show the Bag of Words representation
print("Bag of Words Matrix:\n", X.toarray())

# Show the feature names (words in the vocabulary)
print("Feature names:", vectorizer.get_feature_names_out())

Output:
Vocabulary: {'the': 13, 'quick': 10, 'brown': 1, 'fox': 4, 'jumps': 7, 'over': 9, 'lazy': 8, 'dog': 3, 'never': 5, 'jump': 6, 'quickly': 11, 'foxes': 2, 'are': 0, 'and': 12}

Bag of Words Matrix:
[[1 1 0 1 1 0 0 1 1 1 1 0 0 2]
 [1 0 0 1 0 1 1 0 1 1 0 1 0 2]
 [0 1 1 0 0 0 0 0 1 0 1 0 1 0]]

Feature names: ['and' 'are' 'brown' 'dog' 'fox' 'foxes' 'jump' 'jumps' 'lazy' 'over' 'quick' 'quickly' 'the' 'never']

Word Embeddings

Word Embedding is an approach for representing words and documents
A word embedding (word vector) is a numeric vector that represents a word in a lower-dimensional space
It allows words with similar meanings to have a similar representation

Need for Word Embedding?
To reduce dimensionality
To use a word to predict words with similar meaning
Inter-word semantics must be captured

Example: Word embeddings

import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE
from gensim.models import KeyedVectors

# Load pre-trained GloVe or Word2Vec embeddings (path to a pre-trained binary Word2Vec file)
word_vectors = KeyedVectors.load_word2vec_format('data/[Link]', binary=True)

# Sample words to visualize (choose words related to animals, fruits, and countries for clarity)
words = ['dog', 'cat', 'apple', 'banana', 'france', 'germany', 'lion', 'tiger', 'paris', 'berlin']

# Retrieve the word embeddings for the selected words
word_embeddings = np.array([word_vectors[word] for word in words])
n_samples = len(word_embeddings)

# Set perplexity to a value less than n_samples (e.g., 5 or 10)
perplexity_value = min(30, n_samples // 3)  # Adjust based on your dataset size

# Initialize t-SNE with adjusted perplexity
tsne = TSNE(n_components=2, perplexity=perplexity_value, random_state=42)

# Fit and transform the embeddings
word_embeddings_2d = tsne.fit_transform(word_embeddings)

# Plot the words in the 2D space
plt.figure(figsize=(8, 6))
plt.scatter(word_embeddings_2d[:, 0], word_embeddings_2d[:, 1], color='blue')

# Annotate the points with the corresponding words
for i, word in enumerate(words):
    plt.annotate(word, xy=(word_embeddings_2d[i, 0], word_embeddings_2d[i, 1]), fontsize=12)

plt.title("2D Visualization of Word Embeddings")
plt.grid(True)
plt.show()

Word2Vec

Word2Vec is an approach based on artificial neural networks for generating word embeddings
It was developed by a team at Google
Word2Vec aims to capture the semantic relationships between words by mapping them to high-dimensional vectors
There are two neural embedding methods for Word2Vec: Continuous Bag of Words (CBOW) and Skip-gram

Continuous Bag of Words (CBOW)

CBOW is a feedforward neural network with a single hidden layer
The input layer represents the context words
The output layer represents the target word at the center of the window of context words
The hidden layer contains the learned continuous vector representations (word embeddings) of the input words
The dimensionality of the hidden layer represents the size of the word embeddings

Skip-Gram Model

The Skip-Gram model also learns distributed representations of words in a continuous vector space
The main objective of Skip-Gram is to predict the context words (the words surrounding a target word) given a target word
This is the opposite of the Continuous Bag of Words (CBOW) model

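A hedged gensim sketch showing how the two variants are selected via the sg parameter; the toy corpus and hyperparameters are assumptions for illustration.

from gensim.models import Word2Vec

# Toy corpus: a list of tokenized sentences (illustrative only)
sentences = [["the", "quick", "brown", "fox"],
             ["the", "lazy", "dog"],
             ["brown", "foxes", "are", "quick"]]

# sg=0 trains CBOW (predict the center word from its context),
# sg=1 trains Skip-Gram (predict the context words from the center word)
cbow = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=0)
skipgram = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=1)

print(cbow.wv["fox"].shape)        # (50,) -- the learned embedding for "fox"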
Using the Embedding Layer

The Embedding layer can be integrated into the RNN after the input layer
The layer is trained with backpropagation together with the rest of the model and learns a word embedding (representation) that fits the task at hand
Pre-trained weights can be used to initialize the layer
Masking can be activated

tf.keras.layers.Embedding(
    input_dim,
    output_dim,
    mask_zero=False,
    weights=None,
)

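A hedged sketch of loading pre-trained vectors into the layer; the vocabulary size, embedding dimension, and random matrix are stand-ins for real pre-trained weights.

import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import Embedding

vocab_size, embed_dim = 1000, 50
pretrained = np.random.rand(vocab_size, embed_dim).astype("float32")   # stand-in for real vectors

embedding = Embedding(input_dim=vocab_size, output_dim=embed_dim,
                      mask_zero=True, trainable=False)    # index 0 is masked, vectors stay frozen
_ = embedding(tf.constant([[1, 2, 0]]))                   # call once so the weight matrix is built
embedding.set_weights([pretrained])                       # load the pre-trained matrix

print(embedding(tf.constant([[1, 2, 0]])).shape)          # (1, 3, 50)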
Task: Prediction of last letter of a word using embeddings

Adjust the previous model, replacing the masking layer with an embedding layer. Get rid of the one-hot encoding of the input.

Solution: Prediction of last letter of a word using embeddings

import re
import numpy as np
import tensorflow as tf
from tensorflow.keras.preprocessing.sequence import pad_sequences
from sklearn.model_selection import train_test_split
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import SimpleRNN, Dense, Input, Embedding

with open('data/long_text.txt', 'r') as file:
    text = file.read()

text_without_signs = re.sub(r'[^a-zA-Z\s]', '', text).lower()

chars = sorted(list(set(text_without_signs)))
char_to_index = {char: i+1 for i, char in enumerate(chars)}
index_to_char = {i+1: char for i, char in enumerate(chars)}

words = text_without_signs.split()

sequences = []
labels = []
seq_length_max = 0

for word in words:
    sequences.append([char_to_index[letter] for letter in word][0:len(word)-1])
    labels.append(char_to_index[word[-1]])
    if len(word) > seq_length_max:
        seq_length_max = len(word)

X_train, X_test, y_train, y_test = train_test_split(sequences, labels, test_size=0.2, random_state=42)

# Add zero padding to sequences at the beginning
padded_sequences = pad_sequences(X_train, maxlen=seq_length_max, padding='pre')

X = np.array(padded_sequences)
y = np.array(y_train)

vocab_size = len(chars) + 1
y_one_hot = tf.one_hot(y, vocab_size)

model = Sequential()
model.add(Input((None, )))
model.add(Embedding(input_dim=vocab_size, output_dim=12, mask_zero=True))
model.add(SimpleRNN(256, activation='tanh'))
model.add(Dense(vocab_size, activation='softmax'))

model.compile(optimizer='adam', loss='CategoricalCrossentropy', metrics=['accuracy'])
model.fit(X, y_one_hot, epochs=200)

# testing
padded_sequences = pad_sequences(X_test, maxlen=seq_length_max, padding='pre')
X = np.array(padded_sequences)
y_pred_rnn = model.predict(X)
y_pred_rnn = np.argmax(y_pred_rnn, axis=1)
y = np.array(y_test)

accuracy = (np.sum(y_pred_rnn == y)) / len(y)
print(f"Accuracy: {accuracy:.4f}")

Accuracy: 0.7629

Conclusion

RNNs are used for sequence data of variable length
They use a hidden state that is passed from time step to time step
There are different architectures and use cases according to the input and output dimensions
Padding and masking can be used to train with batches of data of variable length
Embeddings are a good way to transform words into numbers while preserving their meaning

Homework
Create an RNN to predict the length of a word. Use the long_text.txt data.

Questions?
Prof. Dr.-Ing. Christian Schwede
Faculty of Engineering and Mathematics (Fachbereich Ingenieurwissenschaften und Mathematik)
Campus Gütersloh
Programme Director, Research Master Data Science
Member of the Board, Institute for Data Science Solutions (IDAS)

[Link]@[Link]
