
DEEP LEARNING

Introduction to Deep Learning and Deep Network Learning Issues

Adrian Horzyk
AGH University of Science and Technology, Krakow, Poland
[email protected]
Tasks for Deep Neural Networks

We use Deep Neural Networks for a specific group of tasks:

• Classification (of images, signals, etc.)
• Prediction (e.g. price, temperature, size, distance)
• Recognition (of speech, objects, etc.)
• Translation (from one language to another)
• Autonomous behaviors (driving of autonomous cars, flying of drones, …)
• Clustering of objects (grouping them according to their similarity)
• etc.

using supervised or unsupervised training of such networks.

We have to deal with structured and unstructured data:

Structured data are usually well described by attributes and collected in data tables
(relational databases), while unstructured data are images, (audio, speech) signals,
and (sequences of) texts (corpora).
Binary Classification
In binary classification, the result is described by two values:
• 1 – when the object of the class was recognized (e.g. the image is a cat),
• 0 – when the object was not recognized as belonging to the given class (e.g. the image is not a cat).

Example: an image is classified either as "is a cat" (1) or as "is not a cat" (0).

Image Representation
Training Examples
Logistic Regression
Computing Sigmoid Function

We use numpy vectorization to compute sigmoid and sigmoid_derivative for any input vector z:
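The slide's original listing is not preserved in this text version; a minimal numpy sketch of the two functions, using the names sigmoid and sigmoid_derivative from the sentence above, could look like this:

import numpy as np

def sigmoid(z):
    # Works element-wise for scalars, vectors, and matrices.
    return 1 / (1 + np.exp(-z))

def sigmoid_derivative(z):
    # d(sigma)/dz = sigma(z) * (1 - sigma(z))
    s = sigmoid(z)
    return s * (1 - s)

z = np.array([-1.0, 0.0, 1.0])
print(sigmoid(z))             # [0.26894142 0.5        0.73105858]
print(sigmoid_derivative(z))  # [0.19661193 0.25       0.19661193]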
Logistic Regression Cost Function
Loss Functions
Loss functions are used to evaluate the performance of models. The bigger your loss is,
the more your predictions (ŷ) differ from the true values (y). In deep learning, we use
optimization algorithms like Gradient Descent to train models and minimize the cost.
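The formulas from these two slides are not preserved in this extraction; the standard cross-entropy loss and cost used with logistic regression are:

L(ŷ, y) = −( y·log(ŷ) + (1 − y)·log(1 − ŷ) )
J(w, b) = (1/m) · Σ_{i=1..m} L(ŷ^{(i)}, y^{(i)})

where the loss L is computed for a single example, and the cost J averages the loss over all m training examples.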
Gradient Descent
We have to minimize the cost function J for a given training data set to achieve
predictions for the input data that are as correct as possible.

Here, w is shown as one-dimensional, but in real models it has many more dimensions.


Calculus of the Gradient Descent

The main idea of the Gradient Descent algorithm is to move in the direction opposite to the gradient (down the slope):
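The update rule itself was shown graphically on the slide; in the notation used here (α is the learning rate), the standard step is:

w := w − α · ∂J(w, b)/∂w
b := b − α · ∂J(w, b)/∂b

repeated until the cost J(w, b) stops decreasing (or a fixed number of iterations is reached).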
Derivative Rules

The Gradient Descent algorithm uses partial derivatives calculated according to the following rules:
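The rules themselves were shown as a figure; the standard ones needed here are the sum, product, and chain rules:

(f + g)'(z) = f'(z) + g'(z)
(f · g)'(z) = f'(z)·g(z) + f(z)·g'(z)
(f(g(z)))' = f'(g(z)) · g'(z)

The chain rule is the key one: it lets the gradient of the cost be propagated backwards through the composed operations of the network.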
Gradient Descent for Logistic Regression

We use a computational graph to present the forward and backward operations of a single
neuron implementing logistic regression on the weighted sum of its inputs x:
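The graph itself is not preserved in this text version; for a single example it encodes the standard forward and backward formulas:

Forward:  z = wᵀx + b,   a = ŷ = σ(z),   L(a, y) = −( y·log(a) + (1 − y)·log(1 − a) )
Backward: dz = ∂L/∂z = a − y,   dw_j = x_j·dz,   db = dz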
Gradient Descent for Training Dataset

The final logistic regression gradient descent algorithm repeatedly goes through all
training examples, updating the parameters, until the cost function is small enough
(a loop-based sketch of one such pass is given below).
To speed up computation, we should use vectorization instead of for-loops.
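The slide's pseudocode is not preserved here; a minimal loop-based sketch of one pass, assuming m training examples stored column-wise in X (shape (n, m)) and labels in Y (shape (1, m)), with the made-up helper name epoch_with_loops:

import numpy as np

def epoch_with_loops(w, b, X, Y, alpha):
    # w: (n, 1), b: scalar, alpha: learning rate
    n, m = X.shape
    dw = np.zeros((n, 1)); db = 0.0; J = 0.0
    for i in range(m):                          # loop over training examples
        z = np.dot(w[:, 0], X[:, i]) + b        # forward pass: weighted sum for example i
        a = 1 / (1 + np.exp(-z))
        J += -(Y[0, i] * np.log(a) + (1 - Y[0, i]) * np.log(1 - a))
        dz = a - Y[0, i]                        # backward pass
        dw += X[:, i].reshape(n, 1) * dz
        db += dz
    J /= m; dw /= m; db /= m
    w = w - alpha * dw                          # gradient descent update
    b = b - alpha * db
    return w, b, J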
Efficiency of Vectorization
When dealing with big data collections and big data vectors, we definitely should use
vectorization (which performs SIMD operations) to make computations faster.

Compare the time efficiency of the two approaches (a timing sketch is given below)!

Conclusion:
Whenever possible, avoid explicit for-loops and use vectorization: np.dot(w.T,x), np.dot(W,x), np.multiply(x1,x2),
np.outer(x1,x2), np.log(v), np.exp(v), np.abs(v), np.zeros(v), np.sum(v), np.max(v), np.min(v), etc.
Vectorization uses parallel CPU or GPU operations (called SIMD – single instruction, multiple data)
executed on cores working in parallel.
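The original timing code is not included in this text version; a minimal comparison of a Python loop against np.dot on two long random vectors could be timed like this:

import time
import numpy as np

n = 1_000_000
a = np.random.rand(n)
b = np.random.rand(n)

t0 = time.process_time()          # explicit for-loop version
s = 0.0
for i in range(n):
    s += a[i] * b[i]
t_loop = time.process_time() - t0

t0 = time.process_time()          # vectorized (SIMD) version
s_vec = np.dot(a, b)
t_vec = time.process_time() - t0

print(f"loop: {1000 * t_loop:.1f} ms, vectorized: {1000 * t_vec:.1f} ms")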
Vectorization of the Logistic Regression
Let's vectorize the previous algorithm (note that the scalar b is broadcast when added to the vector of weighted sums):
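A sketch of the fully vectorized pass over the whole training set, consistent with the loop sketch above (the helper name epoch_vectorized is again made up for illustration):

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def epoch_vectorized(w, b, X, Y, alpha):
    # X: (n, m), Y: (1, m), w: (n, 1), b: scalar
    m = X.shape[1]
    Z = np.dot(w.T, X) + b            # (1, m); the scalar b is broadcast over all examples
    A = sigmoid(Z)
    J = -np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A)) / m
    dZ = A - Y                        # (1, m)
    dw = np.dot(X, dZ.T) / m          # (n, 1)
    db = np.sum(dZ) / m
    w = w - alpha * dw
    b = b - alpha * db
    return w, b, J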
Broadcasting in Python
Broadcasting in numpy
Broadcasting is very useful for performing mathematical operations between arrays of
different shapes. The example below shows a simple broadcast operation; the normalization
example on the next slide also relies on broadcasting.
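A minimal illustration of broadcasting (the matrix A and the vector v are arbitrary example values):

import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])      # shape (2, 3)
v = np.array([10.0, 20.0, 30.0])     # shape (3,)

print(A + v)   # v is broadcast across both rows: [[11. 22. 33.], [14. 25. 36.]]
print(A * 2)   # the scalar 2 is broadcast to every element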
Normalization for Efficiency
We use normalization (np.linalg.norm) to achieve better performance, because gradient
descent converges faster after normalization (a sketch is given below):
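A sketch of row-wise normalization that relies on broadcasting (normalize_rows is a hypothetical helper and the matrix x is an arbitrary example):

import numpy as np

def normalize_rows(x):
    # Compute the L2 norm of each row; keepdims=True keeps the shape (rows, 1)
    x_norm = np.linalg.norm(x, axis=1, keepdims=True)
    return x / x_norm                 # (rows, cols) / (rows, 1) is broadcast row-wise

x = np.array([[0.0, 3.0, 4.0],
              [2.0, 6.0, 4.0]])
print(normalize_rows(x))              # every row now has unit length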
Lists vs. Vectors and Matrices

Be careful when creating vectors, because Python lists have no shape and are declared similarly to numpy arrays.
Column and Row Vectors

Be careful when creating vectors, because lists have no shape, and column and row vectors differ only in their shape (see the sketch below).
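A sketch illustrating both warnings (the values are arbitrary):

import numpy as np

lst = [1, 2, 3]                   # a Python list: no shape, no element-wise math
a   = np.array([1, 2, 3])         # rank-1 array, shape (3,)  -- avoid in deep learning code
row = np.array([[1, 2, 3]])       # row vector, shape (1, 3)
col = np.array([[1], [2], [3]])   # column vector, shape (3, 1)

print(a.shape, row.shape, col.shape)   # (3,) (1, 3) (3, 1)
print(np.dot(row, col))                # (1, 1) matrix: [[14]]
print(np.dot(col, row).shape)          # (3, 3) outer product -- shapes matter!
# lst.shape would raise AttributeError: a list has no shape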
Reshaping Image Matrices

When working with images in deep learning, we typically reshape them into a vector
representation using np.reshape():
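A minimal sketch, assuming a 64 x 64 RGB image (the size is only an example):

import numpy as np

image = np.random.rand(64, 64, 3)          # height x width x RGB channels
x = image.reshape(64 * 64 * 3, 1)          # flatten into a (12288, 1) column vector
print(image.shape, "->", x.shape)          # (64, 64, 3) -> (12288, 1)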
Shape and Reshape Vectors and Matrices
We commonly use the numpy functions np.shape() and np.reshape() in deep learning:
• X.shape is used to get the shape (dimension) of a vector or a matrix X.
• X.reshape(...) is used to reshape a vector or a matrix X into some other dimension(s).
Simple Neuron

We defined the fundamental elements and operations on a single neuron.
Simple Neural Network

Having defined the fundamental elements and operations, we can create a simple neural network.
Stacking Neurons Vertically and Vectorizing

Stacking values into vectors, and stacking vectors into matrices, is very important from the point of view of computational efficiency!
Stacking Examples Horizontally and Vectorizing

Stacking the vectors of training examples horizontally into matrices is very important from the point of view of computational efficiency!

After Vectorizing
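After vectorizing, the forward pass for a whole layer and all m examples at once becomes a single matrix expression; a minimal sketch for a two-layer network (the layer sizes below are just example values):

import numpy as np

def sigmoid(z): return 1 / (1 + np.exp(-z))

n_x, n_1, n_2, m = 3, 4, 1, 5            # example layer sizes and number of examples
X  = np.random.randn(n_x, m)             # training examples stacked horizontally as columns
W1 = np.random.randn(n_1, n_x) * 0.01    # neurons of layer 1 stacked vertically as rows
b1 = np.zeros((n_1, 1))
W2 = np.random.randn(n_2, n_1) * 0.01
b2 = np.zeros((n_2, 1))

Z1 = np.dot(W1, X) + b1                  # (n_1, m); b1 is broadcast over the m columns
A1 = np.tanh(Z1)
Z2 = np.dot(W2, A1) + b2                 # (n_2, m)
A2 = sigmoid(Z2)                         # predictions for all m examples at once
print(A2.shape)                          # (1, 5)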
Vectorization of Dot Product
In deep learning, you deal with very large datasets. Non-computationally-optimal functions become
a huge bottleneck in your algorithms and can result in models that take ages to run. To make sure that
your code is computationally efficient, you should use vectorization. Compare the following codes:
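The compared listings are not preserved in this extraction; a minimal sketch of the loop-based versus vectorized dot product (x1 and x2 are arbitrary example vectors):

import numpy as np

x1 = np.random.rand(10000)
x2 = np.random.rand(10000)

dot = 0.0                       # classic for-loop implementation
for i in range(len(x1)):
    dot += x1[i] * x2[i]

dot_vec = np.dot(x1, x2)        # vectorized implementation
print(np.isclose(dot, dot_vec)) # True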
Vectorization of Outer Product
In deep learning, you deal with very large datasets. Non-computationally-optimal functions become
a huge bottleneck in your algorithms and can result in models that take ages to run. To make sure that
your code is computationally efficient, you should use vectorization. Compare the following codes:
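A corresponding sketch for the outer product (again with made-up example data):

import numpy as np

x1 = np.random.rand(300)
x2 = np.random.rand(400)

outer = np.zeros((len(x1), len(x2)))     # classic nested-loop implementation
for i in range(len(x1)):
    for j in range(len(x2)):
        outer[i, j] = x1[i] * x2[j]

outer_vec = np.outer(x1, x2)             # vectorized implementation
print(np.allclose(outer, outer_vec))     # True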
Vectorization of Element-Wise Multiplication

In deep learning, you deal with very large datasets. Non-computationally-optimal functions become
a huge bottleneck in your algorithms and can result in models that take ages to run. To make sure that
your code is computationally efficient, you should use vectorization. Compare the following codes:
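A corresponding sketch for element-wise multiplication:

import numpy as np

x1 = np.random.rand(10000)
x2 = np.random.rand(10000)

mul = np.zeros(len(x1))                  # classic for-loop implementation
for i in range(len(x1)):
    mul[i] = x1[i] * x2[i]

mul_vec = np.multiply(x1, x2)            # vectorized implementation (same as x1 * x2)
print(np.allclose(mul, mul_vec))         # True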
Vectorization of General Dot Product
In deep learning, you deal with very large datasets. Non-computationally-optimal functions become
a huge bottleneck in your algorithms and can result in models that take ages to run. To make sure that
your code is computationally efficient, you should use vectorization. Compare the following codes:
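A corresponding sketch for a general (matrix-vector) dot product, with a hypothetical weight matrix W:

import numpy as np

W = np.random.rand(100, 1000)            # an arbitrary example matrix
x = np.random.rand(1000)

gdot = np.zeros(W.shape[0])              # classic nested-loop implementation
for i in range(W.shape[0]):
    for j in range(W.shape[1]):
        gdot[i] += W[i, j] * x[j]

gdot_vec = np.dot(W, x)                  # vectorized implementation
print(np.allclose(gdot, gdot_vec))       # True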
Activation Functions of Neurons
We use different activation functions for neurons in different layers:

COMPARISON OF ACTIVATION FUNCTIONS

• Sigmoid function is used in the output layer:
  g(z) = σ(z) = 1 / (1 + e^{-z})

• Tangent hyperbolic function is used in hidden layers:
  g(z) = tanh(z) = (e^{z} − e^{-z}) / (e^{z} + e^{-z})

• Rectified linear unit (ReLU) is used in hidden layers (FAST!):
  g(z) = ReLU(z) = max(0, z)

• Smooth ReLU (SoftPlus) is used in hidden layers:
  g(z) = SoftPlus(z) = log(1 + e^{z})

• Leaky ReLU is used in hidden layers:
  g(z) = LeakyReLU(z) = z if z > 0, 0.01·z if z ≤ 0
Activation Functions
Derivatives of Activation Functions

Derivatives are necessary for the use of gradient descent:

• Sigmoid function:
  g(z) = σ(z) = 1 / (1 + e^{-z})
  g'(z) = dg(z)/dz = g(z)·(1 − g(z)) = a·(1 − a)

• Tangent hyperbolic function:
  g(z) = tanh(z) = (e^{z} − e^{-z}) / (e^{z} + e^{-z})
  g'(z) = dg(z)/dz = 1 − (g(z))^2 = 1 − a^2

• Rectified linear unit (ReLU):
  g(z) = ReLU(z) = max(0, z)
  g'(z) = dg(z)/dz = 1 if z > 0, 0 if z ≤ 0

• Smooth ReLU (SoftPlus):
  g(z) = SoftPlus(z) = ln(1 + e^{z})
  g'(z) = dg(z)/dz = e^{z} / (1 + e^{z}) = 1 / (1 + e^{-z})

• Leaky ReLU:
  g(z) = LeakyReLU(z) = z if z > 0, 0.01·z if z ≤ 0
  g'(z) = dg(z)/dz = 1 if z > 0, 0.01 if z ≤ 0
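A compact numpy sketch of these functions and their derivatives (vectorized, so they work element-wise on whole layers; the names are my own, not from the slides):

import numpy as np

def sigmoid(z):        return 1 / (1 + np.exp(-z))
def sigmoid_prime(z):  a = sigmoid(z); return a * (1 - a)

def tanh(z):           return np.tanh(z)
def tanh_prime(z):     return 1 - np.tanh(z) ** 2

def relu(z):           return np.maximum(0, z)
def relu_prime(z):     return (z > 0).astype(float)

def softplus(z):       return np.log(1 + np.exp(z))
def softplus_prime(z): return 1 / (1 + np.exp(-z))

def leaky_relu(z):       return np.where(z > 0, z, 0.01 * z)
def leaky_relu_prime(z): return np.where(z > 0, 1.0, 0.01)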
Derivatives of Activation Functions
Neural Network Gradients
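The gradient formulas from this slide are not preserved in this extraction; for a two-layer network with a sigmoid output and m examples stacked as columns, the standard vectorized gradients can be sketched as follows (this is not the slide's original code):

import numpy as np

# Assumed forward-pass quantities: X (n_x, m), A1 = g1(Z1), A2 = sigmoid(Z2), labels Y (1, m);
# g1_prime_Z1 holds g1'(Z1), the derivative of the hidden activation evaluated at Z1.
def backward_pass(X, Y, A1, A2, W2, g1_prime_Z1):
    m = X.shape[1]
    dZ2 = A2 - Y                               # (n_2, m)
    dW2 = np.dot(dZ2, A1.T) / m
    db2 = np.sum(dZ2, axis=1, keepdims=True) / m
    dZ1 = np.dot(W2.T, dZ2) * g1_prime_Z1      # chain rule through the hidden activation
    dW1 = np.dot(dZ1, X.T) / m
    db1 = np.sum(dZ1, axis=1, keepdims=True) / m
    return dW1, db1, dW2, db2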
Random Initialization of Weights

The parameters must be initialized with small random numbers:

• W cannot be initialized to 0:
  W[l] = np.random.randn(n[l], n[l-1]) * 0.01
• Small random initial weight values allow for faster training, because the activation
  functions of neurons stimulated by values only a little greater than 0 usually have the
  biggest slopes, so each weight update results in large changes of the output values and
  lets the network move towards the solution faster.

• b can be initialized to 0:
  b[l] = np.zeros((n[l], 1))
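A sketch of such an initialization for all layers, assuming layer_dims = [n_x, n[1], ..., n[L]] (the helper name initialize_parameters is hypothetical):

import numpy as np

def initialize_parameters(layer_dims, seed=1):
    np.random.seed(seed)                 # reproducible example
    params = {}
    L = len(layer_dims) - 1              # number of layers with weights
    for l in range(1, L + 1):
        params["W" + str(l)] = np.random.randn(layer_dims[l], layer_dims[l - 1]) * 0.01
        params["b" + str(l)] = np.zeros((layer_dims[l], 1))
    return params

params = initialize_parameters([12288, 20, 7, 1])
print(params["W1"].shape, params["b3"].shape)   # (20, 12288) (1, 1)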
Going to Deeper NN Architectures

A deep neural network architecture means the use of many hidden layers between the input and output layers.
Dimensions of Stacked Matrices
Building Blocks of Deep Neural Networks
Stacking Building Blocks Subsequently
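The figures for these three slides are not preserved; a minimal sketch of how the forward building block is stacked layer after layer (W[l] has shape (n[l], n[l-1]), b[l] has shape (n[l], 1), g[l] is the layer's activation function; forward_propagation is a hypothetical helper that assumes parameters created as in the initialization sketch above):

import numpy as np

def forward_propagation(X, params, activations):
    # X: (n_x, m); params: dict with W1, b1, ..., WL, bL; activations: list of g[l] functions
    A = X
    caches = []
    L = len(activations)
    for l in range(1, L + 1):
        Z = np.dot(params["W" + str(l)], A) + params["b" + str(l)]
        A_next = activations[l - 1](Z)
        caches.append((A, Z))            # cache layer inputs and Z for the backward pass
        A = A_next
    return A, caches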
Parameters and Hyperparameters

We should distinguish between parameters and hyperparameters:

• Parameters of the model are established during the training process, e.g.:
  • W[l], b[l].
• Hyperparameters control how the parameters are learned and are established by the
  developer of the model, e.g.:
  • α – learning rate,
  • L – number of hidden layers,
  • n[l] – number of neurons in the layers,
  • g[l] – choice of activation functions for the layers,
  • number of iterations over the training data,
  • momentum,
  • minibatch size,
  • regularization parameters,
  • optimization parameters,
  • dropout parameters, …
Iterative Development of DL Solutions
Deep Learning solutions are usually developed in an iterative and empirical process
that consists of three main elements:
• Idea – we suppose that a selected model, training method, and some hyperparameters
  will let us solve the problem.
• Code – we implement and apply the idea in real code.
• Experiment – the experiments confirm our suppositions and assumptions or not, and
  allow us to update or change the idea until they return satisfactory results.
Let’s start with powerful computations!
✓ Questions?
✓ Remarks?
✓ Suggestions?
✓ Wishes?
Bibliography and Literature
1. Nikola K. Kasabov, Time-Space, Spiking Neural Networks and Brain-Inspired Artificial Intelligence, Springer Series on Bio- and Neurosystems, Vol. 7, Springer, 2019.
2. Ian Goodfellow, Yoshua Bengio, Aaron Courville, Deep Learning, MIT Press, 2016, ISBN 978-1-59327-741-3, or PWN 2018.
3. Holk Cruse, Neural Networks as Cybernetic Systems, 2nd and revised edition.
4. R. Rojas, Neural Networks, Springer-Verlag, Berlin, 1996.
5. Convolutional Neural Network (Stanford).
6. Visualizing and Understanding Convolutional Networks, Zeiler, Fergus, ECCV 2014.
7. IBM: https://www.ibm.com/developerworks/library/ba-data-becomes-knowledge-1/index.html
8. NVIDIA: https://developer.nvidia.com/discover/convolutional-neural-network
9. JUPYTER: https://jupyter.org/

Adrian Horzyk, [email protected]

