Advanced Machine Learning

Introduction to Recurrent Neural Networks


Prof. Dr. Christian Schwede
Content - Overview

Class 1: Introduction to Recurrent Networks
    Recurrent Neural Networks
    Padding and Masking
    Embeddings
Recurrent Neural Networks
    Long short-term memory
    Gated recurrent unit
    Differentiable neural computer
Generative Methods
    Generative Adversarial Networks
    Variational Autoencoder
    Diffusion Models
    Generative pre-trained transformer
Graph Machine Learning
    Introduction into GML
    Graph Neural Networks

Final Exam

Oral exam of 30 minutes

To be accepted for the final exam, you have to hand in all but one of the Python exercises on time

Learning objectives

How can sequential data with variable length be processed with Artificial Neural Networks?

What are Recurrent Neural Networks and how do they work?

How can text be transformed to be used as feature vectors?

How can batch processing be applied with inputs of varying length?

Introduction to Recurrent Neural Networks
Feed-Forward Artificial Neural Networks

Feed-Forward Networks (FFN) are built to learn a function $\mathbf{y} = f(\mathbf{x})$ from data pairs $(\mathbf{x}, \mathbf{y})$ with $\mathbf{x} \in \mathbb{R}^n$, $\mathbf{y} \in \mathbb{R}^m$
Inputs are fed forward through the network from input to output, using weights and activation functions (e.g. sigmoid, ReLU, tanh, swish) to modify the output of every neuron
FFN are trained with gradient descent using back-propagation based on mini-batches (stochastic gradient descent)
With Convolutional Neural Networks (CNN), inputs with a matrix format such as images can be used

Task: Prediction of next letter

Build a FFN using TensorFlow that predicts the next letter in a sentence. Use "The Research Master Data Science at HSBI rocks!" as data and forget about the test/train split.

Solution: Prediction of next letter
import numpy as np
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Input, Flatten

text = "The Research Master Data Science at HSBI rocks!"
chars = sorted(list(set(text)))

# Mapping from letters to numbers
char_to_index = {char: i for i, char in enumerate(chars)}
index_to_char = {i: char for i, char in enumerate(chars)}

seq_length = 3
sequences = []
labels = []

for i in range(len(text) - seq_length):
    seq = text[i:i+seq_length]
    label = text[i+seq_length]
    sequences.append([char_to_index[char] for char in seq])
    labels.append(char_to_index[label])

X = np.array(sequences)
y = np.array(labels)

# Transforming the text to numbers (one-hot encoding)
X_one_hot = tf.one_hot(X, len(chars))
y_one_hot = tf.one_hot(y, len(chars))

# Flatten the input for the FFN
model = Sequential()
model.add(Input((seq_length, len(chars))))
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dense(len(chars), activation='softmax'))

model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
model.fit(X_one_hot, y_one_hot, epochs=100)

# Generate the output
start_seq = "The Re"
generated_text = start_seq

for i in range(60):
    x = np.array([[char_to_index[char] for char in generated_text[-seq_length:]]])
    x_one_hot = tf.one_hot(x, len(chars))
    prediction = model.predict(x_one_hot)
    next_index = np.argmax(prediction)
    next_char = index_to_char[next_index]
    generated_text += next_char

print("Generated Text:")
print(generated_text)

Task: Prediction of last letter of a word

How can we build a FFN to predict the last letter of a word?

Two Problems of feed-forward neural networks

The size of the input samples has to be fixed
The input data is not linked together (there is no memory across inputs)
But for "predicting the last letter of a word", the previous letters are required, hence there is a need to remember them

Recurrent Neural Networks (RNN)

Recurrent Neural Networks:
Work well with sequential data like time-series data, text data, or video and audio streams
The output from the previous step is fed as input to the current step: the hidden state $\mathbf{h}_t$ (memory state)
The fundamental processing unit in an RNN is a recurrent unit or cell
In each "time" step an additional input can be consumed and an additional output can be generated
Variable input and output lengths are possible

A closer look…
FFN (diagram: $\mathbf{X} \rightarrow \mathbf{H} \rightarrow \mathbf{O}$ with weight matrices $\mathbf{W}_{dh}$ and $\mathbf{W}_{hq}$)

Consider a simple FFN with one hidden layer
We propagate a batch $\mathbf{X} \in \mathbb{R}^{n \times d}$ with batch size $n$ and feature dimension $d$ as input
The output of the hidden layer is then $\mathbf{H} = \phi(\mathbf{X}\mathbf{W}_{dh} + \mathbf{b}_h)$, with $\phi$ as activation function
The last layer calculates $\mathbf{O} = \mathbf{H}\mathbf{W}_{hq} + \mathbf{b}_q$ as $q$-dimensional output $\mathbf{O} \in \mathbb{R}^{n \times q}$

RNN (diagram: as above, plus a recurrent connection feeding $\mathbf{H}_{t-1}$ back into $\mathbf{H}_t$ via $\mathbf{W}_{hh}$)

Now we propagate $\mathbf{X}_t \in \mathbb{R}^{n \times d}$ at every time step $t$
The output of the hidden layer at time step $t$ depends on the hidden state at time step $t-1$:
$\mathbf{H}_t = \phi(\mathbf{X}_t\mathbf{W}_{dh} + \mathbf{H}_{t-1}\mathbf{W}_{hh} + \mathbf{b}_h)$
The output layer stays the same
Note: the number of weights does not grow with time, since the weights $\mathbf{W}_{hh}$ are shared across all time steps
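To make the recurrence concrete, here is a minimal NumPy sketch of the forward pass above; the dimensions (n, d, h, q), the number of time steps, and the random data are illustrative assumptions, and tanh stands in for the activation $\phi$.

import numpy as np

n, d, h, q, T = 4, 3, 5, 2, 6           # batch size, input dim, hidden dim, output dim, time steps

rng = np.random.default_rng(0)
W_dh = 0.1 * rng.normal(size=(d, h))    # input-to-hidden weights
W_hh = 0.1 * rng.normal(size=(h, h))    # hidden-to-hidden weights (shared over all time steps)
W_hq = 0.1 * rng.normal(size=(h, q))    # hidden-to-output weights
b_h = np.zeros(h)
b_q = np.zeros(q)

H = np.zeros((n, h))                    # initial hidden state H_0
for t in range(T):
    X_t = rng.normal(size=(n, d))       # input batch at time step t (random stand-in data)
    H = np.tanh(X_t @ W_dh + H @ W_hh + b_h)   # H_t = phi(X_t W_dh + H_{t-1} W_hh + b_h)
    O_t = H @ W_hq + b_q                # O_t = H_t W_hq + b_q
print(O_t.shape)                        # (n, q)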
Deeper Recurrent Neural Networks

Input: $\mathbf{X}_t \in \mathbb{R}^{n \times d} = \mathbf{H}_t^{(0)}$
The output of the $l$-th hidden layer $\mathbf{H}_t^{(l)} \in \mathbb{R}^{n \times h_l}$ with $l = 1, \dots, L$ is:
$\mathbf{H}_t^{(l)} = \phi_l\!\left(\mathbf{H}_t^{(l-1)}\mathbf{W}_{h_{l-1}h_l}^{(l)} + \mathbf{H}_{t-1}^{(l)}\mathbf{W}_{h_l h_l}^{(l)} + \mathbf{b}_{h_l}^{(l)}\right)$
with $\mathbf{W}_{h_l h_l}^{(l)} \in \mathbb{R}^{h_l \times h_l}$ and $\mathbf{W}_{h_{l-1}h_l}^{(l)} \in \mathbb{R}^{h_{l-1} \times h_l}$
The output layer is:
$\mathbf{O}_t = \mathbf{H}_t^{(L)}\mathbf{W}_{h_L q} + \mathbf{b}_q$

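A hedged Keras sketch of a two-layer (L = 2) stacked RNN matching these equations; the feature dimension and layer sizes are illustrative assumptions.

import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, SimpleRNN, Dense

model = Sequential()
model.add(Input((None, 10)))                     # variable-length input, feature dimension d = 10
model.add(SimpleRNN(64, return_sequences=True))  # layer l = 1: emits H_t^(1) for every time step t
model.add(SimpleRNN(32))                         # layer l = 2: emits only the final hidden state
model.add(Dense(5))                              # output layer O_t = H_t^(L) W_hLq + b_q, with q = 5
model.summary()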
Types of RNNs

One-to-many: a single input and multiple outputs; example: image captioning
Many-to-one: several inputs and a single output; example: sentiment analysis of text (identify a feeling from a group of words)
Many-to-many: several inputs and several outputs, not necessarily with the same input and output length; example: translation of text

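In Keras, the many-to-one and many-to-many variants mostly come down to whether the recurrent layer returns its full output sequence; a minimal sketch with assumed dimensions:

import tensorflow as tf
from tensorflow.keras.layers import SimpleRNN

x = tf.random.normal((8, 20, 16))                        # 8 sequences, 20 time steps, 16 features

many_to_one = SimpleRNN(32)(x)                           # only the last hidden state: shape (8, 32)
many_to_many = SimpleRNN(32, return_sequences=True)(x)   # one output per time step: shape (8, 20, 32)

print(many_to_one.shape, many_to_many.shape)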
Task: Applications of RNN

What applications can you think of, and what type of RNN would each require?

Applications

One-to-Many
Image captioning
Video Frame Prediction / Video Generation
Music Generation
Sentence Generation from Keywords
Time Series Forecasting
Poetry Generation from a Single Theme

(Image caption example: "Crazy professor hiking in the lower Andes ;-)")

Applications

Many-to-One
Sentiment analysis
Text Classification
Time Series Forecasting (Point Prediction)
Machine Translation (Encoder)
Video Classification
Anomaly Detection in Time Series
Named Entity Recognition (NER)
Activity Recognition (Wearable Sensors)
DNA/Protein Sequence Classification

Applications

Many-to-Many
Machine Translation
Speech Recognition (Sequence-to-Sequence)
Video Captioning
Time Series Forecasting (Multi-step)
Named Entity Recognition (NER)
Video Classification (Frame-Level Labels)
Music Generation (Note-by-Note)
Part-of-Speech Tagging
Image Generation from Text (Text-to-Image)

Task: Prediction of last letter of a word

How can we build an RNN to predict the last letter of a word using the long_text.txt data file?
Use tf.keras.layers.SimpleRNN.
How can you train the model when words have different lengths?

Solution 1/2: Prediction of last letter of a word

import re
import numpy as np
import tensorflow as tf
from sklearn.model_selection import train_test_split
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import SimpleRNN, Dense, Input

with open('data/long_text.txt', 'r') as file:
    text = file.read()

# Get rid of signs and transform to lower case
text_without_signs = re.sub(r'[^a-zA-Z\s]', '', text).lower()

chars = sorted(list(set(text_without_signs)))
char_to_index = {char: i+1 for i, char in enumerate(chars)}
index_to_char = {i+1: char for i, char in enumerate(chars)}

# Create a list of words
words = text_without_signs.split()

sequences = []
labels = []

# Transform to numbers
for word in words:
    sequences.append([char_to_index[letter] for letter in word][0:len(word)-1])
    labels.append(char_to_index[word[-1]])

X_train, X_test, y_train, y_test = train_test_split(sequences, labels, test_size=0.2, random_state=42)

# Group words by length for training and testing in batches
def group_words_by_length(words, labels):
    # Create a dictionary to store words by their length
    length_groups_words = {}
    length_groups_labels = {}

    # Iterate over the list of words
    for i, word in enumerate(words):
        word_length = len(word)
        # If the length is not in the dictionary, add it as a key
        if word_length not in length_groups_words:
            length_groups_words[word_length] = []
            length_groups_labels[word_length] = []
        # Add the word to the corresponding length group
        length_groups_words[word_length].append(word)
        length_groups_labels[word_length].append(labels[i])

    length_groups_words = dict(sorted(length_groups_words.items()))
    length_groups_labels = dict(sorted(length_groups_labels.items()))
    # Convert the dictionary values into a list of lists
    return list(length_groups_words.values()), list(length_groups_labels.values())

Solution 2/2: Prediction of last letter of a word

# Variable length of input dimension
model = Sequential()
model.add(Input((None, len(chars)+1)))
model.add(SimpleRNN(256, activation='tanh'))
model.add(Dense(len(chars)+1, activation='softmax'))

model.compile(optimizer='adam', loss='CategoricalCrossentropy', metrics=['accuracy'])

# training
X_list, y_list = group_words_by_length(X_train, y_train)
# get rid of empty X (word with one letter)
X_list = X_list[1:]
y_list = y_list[1:]

# Train each batch of words with same length separately
for i in range(len(X_list)):
    X = np.array(X_list[i])
    y = np.array(y_list[i])
    X_one_hot = tf.one_hot(X, len(chars)+1)
    y_one_hot = tf.one_hot(y, len(chars)+1)
    model.fit(X_one_hot, y_one_hot, epochs=200)

# Test separately as well
X_list, y_list = group_words_by_length(X_test, y_test)
# get rid of empty X (word with one letter)
X_list = X_list[1:]
y_list = y_list[1:]

correct_rnn = 0.0
total_samples = 0

for i in range(len(X_list)):
    X = np.array(X_list[i])
    y = np.array(y_list[i])
    X_one_hot = tf.one_hot(X, len(chars)+1)
    y_one_hot = tf.one_hot(y, len(chars)+1)

    y_pred = model.predict(X_one_hot)
    y_pred = np.argmax(y_pred, axis=1)

    # Count correct predictions
    correct_rnn += np.sum(y_pred == y)
    total_samples += len(y)

accuracy = correct_rnn / total_samples
print(f"Test Accuracy: {accuracy:.4f}")

Test Accuracy: 0.2482

Padding to deal with different input sizes
A better way to deal with different input sizes is padding
The maximum input size for training must be set beforehand
All inputs smaller than the maximum input size are padded with a special sign (e.g. zero)
Padding can be done before or after the real input

Example:
[[711   6  71   0   0   0]
 [ 73   8   2  55   7   0]
 [ 83  91  45  64   3   7]]

tf.keras.preprocessing.sequence.pad_sequences(sequences=data, maxlen=seq_length_max, padding='pre', truncating='pre', value=0)

This function transforms a list (of length num_samples) of sequences (lists of integers) into a 2D NumPy array of shape (num_samples, num_timesteps).
num_timesteps is either the maxlen argument if provided, or the length of the longest sequence in the list.
Sequences that are shorter than num_timesteps are padded with value until they are num_timesteps long.
Sequences longer than num_timesteps are truncated so that they fit the desired length.

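A small sketch that reproduces the example matrix from this slide; note that the matrix shown is post-padded, so padding='post' is used here, while the call above pads in front.

from tensorflow.keras.preprocessing.sequence import pad_sequences

data = [[711, 6, 71],
        [73, 8, 2, 55, 7],
        [83, 91, 45, 64, 3, 7]]

# padding='post' appends the zeros; padding='pre' would put them in front instead
padded = pad_sequences(sequences=data, maxlen=6, padding='post', value=0)
print(padded)
# [[711   6  71   0   0   0]
#  [ 73   8   2  55   7   0]
#  [ 83  91  45  64   3   7]]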
Masking to help the RNN deal with padded inputs

Masking is a way to tell sequence-processing layers that certain timesteps in an input are missing, and thus should be skipped when processing the data
The masking layer acts as a Boolean filter that does not let masked inputs (e.g. inputs padded with zeros) pass
The effect is that the output of the RNN as well as the hidden state are passed unchanged to the next sequence step

There are three ways to introduce input masks in Keras models:
Add a keras.layers.Masking layer: model.add(Masking(mask_value=0))
Pass a mask argument manually when calling layers that support this argument (e.g. RNN layers)
Configure a keras.layers.Embedding layer with mask_zero=True

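A minimal sketch of the first option (an explicit Masking layer in front of the RNN); the vocabulary size and layer width are assumptions.

import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, Masking, SimpleRNN, Dense

vocab = 27                             # assumed: 26 letters + 1 padding index
model = Sequential()
model.add(Input((None, vocab)))        # variable-length one-hot input
model.add(Masking(mask_value=0.0))     # time steps whose features are all 0 are skipped by the RNN
model.add(SimpleRNN(64, activation='tanh'))
model.add(Dense(vocab, activation='softmax'))
model.summary()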
Task: Prediction of last letter of a word using padding and masking

Adjust the previous model by adding padding and a masking layer after the input layer of the RNN.

Solution: Prediction of last letter of a word using padding and masking
import re
import numpy as np
import tensorflow as tf
from tensorflow.keras.preprocessing.sequence import pad_sequences
from sklearn.model_selection import train_test_split
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import SimpleRNN, Dense, Input, Masking

with open('data/long_text.txt', 'r') as file:
    text = file.read()

text_without_signs = re.sub(r'[^a-zA-Z\s]', '', text).lower()

chars = sorted(list(set(text_without_signs)))
char_to_index = {char: i+1 for i, char in enumerate(chars)}
index_to_char = {i+1: char for i, char in enumerate(chars)}

words = text_without_signs.split()

sequences = []
labels = []
seq_length_max = 0

for word in words:
    sequences.append([char_to_index[letter] for letter in word][0:len(word)-1])
    labels.append(char_to_index[word[-1]])
    if len(word) > seq_length_max:
        seq_length_max = len(word)

X_train, X_test, y_train, y_test = train_test_split(sequences, labels, test_size=0.2, random_state=42)

# Add zero padding to sequences at the beginning
padded_sequences = pad_sequences(X_train, maxlen=seq_length_max, padding='pre')

X = np.array(padded_sequences)
y = np.array(y_train)

X_one_hot = tf.one_hot(X, len(chars)+1)

def set_zero_to_zeros(X_one_hot):
    X_one_hot_np = X_one_hot.numpy()
    # Loop through and set the one-hot encoding of padding (0) to be all zeros
    for batch in range(X.shape[0]):          # Loop through each sequence in the batch
        for idx, value in enumerate(X[batch]):
            if value == 0:                   # Check if the value is padding (0)
                X_one_hot_np[batch, idx] = np.zeros(len(chars) + 1)
    return tf.convert_to_tensor(X_one_hot_np)

X_one_hot = set_zero_to_zeros(X_one_hot)
y_one_hot = tf.one_hot(y, len(chars)+1)

model = Sequential()
model.add(Input((None, len(chars)+1)))
# mask the padding in training
#model.add(Masking(mask_value=0))
model.add(SimpleRNN(256, activation='tanh'))
model.add(Dense(len(chars)+1, activation='softmax'))

model.compile(optimizer='adam', loss='CategoricalCrossentropy', metrics=['accuracy'])
model.fit(X_one_hot, y_one_hot, epochs=200)

Accuracy with masking: 0.7698    Accuracy: 0.8179


How to pass more complex text as an input to an RNN?

We have already used One-Hot Encoding
One-hot encoding results in high-dimensional vectors, making it computationally expensive and memory-intensive, especially with large vocabularies
It does not capture semantic relationships between words; each word is treated as an isolated entity without considering its meaning or context
It is restricted to the vocabulary seen during training, making it unsuitable for handling out-of-vocabulary words

Bag of Words (BoW)

Bag of Words (BoW) is a text representation technique that represents a document as an unordered set of words and their respective frequencies
It discards the word order and captures the frequency of each word in the document, creating a vector representation
BoW ignores the order of words in the document, leading to a loss of sequential information and context
It is less effective for tasks where word order is crucial, such as natural language understanding
BoW representations are often sparse, leading to increased memory requirements and computational inefficiency (especially when dealing with large datasets)

Example: Bag of Words

from sklearn.feature_extraction.text import CountVectorizer

# Sample sentences (documents)
corpus = [
    'The quick brown fox jumps over the lazy dog',
    'Never jump over the lazy dog quickly',
    'Brown foxes are quick and lazy',
]

# Initialize the CountVectorizer (BoW model)
vectorizer = CountVectorizer()

# Fit and transform the corpus into a bag-of-words model
X = vectorizer.fit_transform(corpus)

# Show the vocabulary (index mapping)
print("Vocabulary:", vectorizer.vocabulary_)

# Show the Bag of Words representation
print("Bag of Words Matrix:\n", X.toarray())

# Show the feature names (words in the vocabulary)
print("Feature names:", vectorizer.get_feature_names_out())

Output:
Vocabulary: {'the': 13, 'quick': 10, 'brown': 1, 'fox': 4, 'jumps': 7, 'over': 9, 'lazy': 8, 'dog': 3, 'never': 5, 'jump': 6, 'quickly': 11, 'foxes': 2, 'are': 0, 'and': 12}

Bag of Words Matrix:
[[1 1 0 1 1 0 0 1 1 1 1 0 0 2]
 [1 0 0 1 0 1 1 0 1 1 0 1 0 2]
 [0 1 1 0 0 0 0 0 1 0 1 0 1 0]]

Feature names: ['and' 'are' 'brown' 'dog' 'fox' 'foxes' 'jump' 'jumps' 'lazy' 'over' 'quick' 'quickly' 'the' 'never']

Word Embeddings

Word Embedding is an approach for representing words and documents
A word embedding (word vector) is a numeric vector that represents a word in a lower-dimensional space
It allows words with similar meanings to have a similar representation

Need for Word Embedding?
To reduce dimensionality
To use a word to predict words with similar meaning
Inter-word semantics must be captured

Example: Word embeddings

import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE
from gensim.models import KeyedVectors

# Load pre-trained GloVe or Word2Vec embeddings (path to a pre-trained binary Word2Vec file)
word_vectors = KeyedVectors.load_word2vec_format('data/[Link]', binary=True)

# Sample words to visualize (choose words related to animals, fruits, and countries for clarity)
words = ['dog', 'cat', 'apple', 'banana', 'france', 'germany', 'lion', 'tiger', 'paris', 'berlin']

# Retrieve the word embeddings for the selected words
word_embeddings = np.array([word_vectors[word] for word in words])
n_samples = len(word_embeddings)

# Set perplexity to a value less than n_samples (e.g., 5 or 10)
perplexity_value = min(30, n_samples // 3)  # Adjust based on your dataset size

# Initialize t-SNE with adjusted perplexity
tsne = TSNE(n_components=2, perplexity=perplexity_value, random_state=42)

# Fit and transform the embeddings
word_embeddings_2d = tsne.fit_transform(word_embeddings)

# Plot the words in the 2D space
plt.figure(figsize=(8, 6))
plt.scatter(word_embeddings_2d[:, 0], word_embeddings_2d[:, 1], color='blue')

# Annotate the points with the corresponding words
for i, word in enumerate(words):
    plt.annotate(word, xy=(word_embeddings_2d[i, 0], word_embeddings_2d[i, 1]), fontsize=12)

plt.title("2D Visualization of Word Embeddings")
plt.grid(True)
plt.show()

Word2Vec

Word2Vec is an approach based on artificial neural networks for generating word embeddings
It was developed by a team at Google
Word2Vec aims to capture the semantic relationships between words by mapping them to high-dimensional vectors
There are two neural embedding methods for Word2Vec: Continuous Bag of Words (CBOW) and Skip-gram

Continuous Bag of Words (CBOW)

CBOW is a feedforward neural network with a single hidden layer
The input layer represents the context words
The output layer represents the target word at the center of the window of context words
The hidden layer contains the learned continuous vector representations (word embeddings) of the input words
The dimensionality of the hidden layer represents the size of the word embeddings

Skip-Gram Model

The Skip-Gram model also learns distributed representations of words in a continuous vector space
The main objective of Skip-Gram is to predict the context words (the words surrounding a target word) given a target word
This is the opposite of the Continuous Bag of Words (CBOW) model

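A hedged gensim sketch showing how the two variants are selected via the sg parameter; the toy corpus and hyperparameters are assumptions for illustration.

from gensim.models import Word2Vec

# Toy corpus: a list of tokenized sentences (illustrative only)
sentences = [["the", "quick", "brown", "fox"],
             ["the", "lazy", "dog"],
             ["brown", "foxes", "are", "quick"]]

# sg=0 trains CBOW (predict the center word from its context),
# sg=1 trains Skip-Gram (predict the context words from the center word)
cbow = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=0)
skipgram = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=1)

print(cbow.wv["fox"].shape)        # (50,) -- the learned embedding for "fox"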
Using the Embedding Layer

The Embedding layer can be integrated into the RNN after the input layer
The layer is trained with backpropagation together with the rest of the model and learns a word embedding (representation) that fits the task at hand
Pre-trained weights can be used to initialize the layer
Masking can be activated

tf.keras.layers.Embedding(
    input_dim,
    output_dim,
    mask_zero=False,
    weights=None,
)

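A hedged sketch of loading pre-trained vectors into the layer; the vocabulary size, embedding dimension, and random matrix are stand-ins for real pre-trained weights.

import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import Embedding

vocab_size, embed_dim = 1000, 50
pretrained = np.random.rand(vocab_size, embed_dim).astype("float32")   # stand-in for real vectors

embedding = Embedding(input_dim=vocab_size, output_dim=embed_dim,
                      mask_zero=True, trainable=False)    # index 0 is masked, vectors stay frozen
_ = embedding(tf.constant([[1, 2, 0]]))                   # call once so the weight matrix is built
embedding.set_weights([pretrained])                       # load the pre-trained matrix

print(embedding(tf.constant([[1, 2, 0]])).shape)          # (1, 3, 50)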
Task: Prediction of last letter of a word using embeddings

Adjust the previous model, replacing the masking layer with an embedding layer. Get rid of the one-hot encoding of the input.

Solution: Prediction of last letter of a word using embeddings

import re
import numpy as np
import tensorflow as tf
from tensorflow.keras.preprocessing.sequence import pad_sequences
from sklearn.model_selection import train_test_split
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import SimpleRNN, Dense, Input, Embedding

with open('data/long_text.txt', 'r') as file:
    text = file.read()

text_without_signs = re.sub(r'[^a-zA-Z\s]', '', text).lower()

chars = sorted(list(set(text_without_signs)))
char_to_index = {char: i+1 for i, char in enumerate(chars)}
index_to_char = {i+1: char for i, char in enumerate(chars)}

words = text_without_signs.split()

sequences = []
labels = []
seq_length_max = 0

for word in words:
    sequences.append([char_to_index[letter] for letter in word][0:len(word)-1])
    labels.append(char_to_index[word[-1]])
    if len(word) > seq_length_max:
        seq_length_max = len(word)

X_train, X_test, y_train, y_test = train_test_split(sequences, labels, test_size=0.2, random_state=42)

# Add zero padding to sequences at the beginning
padded_sequences = pad_sequences(X_train, maxlen=seq_length_max, padding='pre')

X = np.array(padded_sequences)
y = np.array(y_train)

vocab_size = len(chars) + 1
y_one_hot = tf.one_hot(y, vocab_size)

model = Sequential()
model.add(Input((None, )))
model.add(Embedding(input_dim=vocab_size, output_dim=12, mask_zero=True))
model.add(SimpleRNN(256, activation='tanh'))
model.add(Dense(vocab_size, activation='softmax'))

model.compile(optimizer='adam', loss='CategoricalCrossentropy', metrics=['accuracy'])
model.fit(X, y_one_hot, epochs=200)

# testing
padded_sequences = pad_sequences(X_test, maxlen=seq_length_max, padding='pre')
X = np.array(padded_sequences)
y_pred_rnn = model.predict(X)
y_pred_rnn = np.argmax(y_pred_rnn, axis=1)
y = np.array(y_test)

accuracy = (np.sum(y_pred_rnn == y)) / len(y)
print(f"Accuracy: {accuracy:.4f}")

Accuracy: 0.7629

Conclusion

RNNs are used for sequence data of variable length
They use a hidden state that is passed from time step to time step
There are different architectures and use cases according to the input and output dimensions
Padding and masking can be used to train with batches of data of variable length
Embeddings are a good way to transform words into numbers while preserving their meaning

Homework
Create an RNN to predict the length of a word. Use the long_text.txt data.

Questions?
Prof. Dr.-Ing. Christian Schwede
Faculty of Engineering and Mathematics (Fachbereich Ingenieurwissenschaften und Mathematik)
Campus Gütersloh
Programme Director, Research Master Data Science
Member of the Board, Institute for Data Science Solutions (IDAS)

[Link]@[Link]
