Day 3 take up convolutional neural network

ICEBREAKER
Phạm Nguyễn Anh Thư
@thu.phamhcmut
Let’s get to know each other!

Record shop
_____ ___ ____
Thật bất ngờ
___ ____ ____
Em dạo này
Tình đắng như ly cà phê
____ _____ ____ _ __ ____
___ ___ ___ ___ ___ ______
Để Mị nói cho mà nghe
____ ____ ____ ___ __ ____ ____
Đâu cần một bài ca tình yêu

how-to-AI Series: Unlock Potential
Day 3: Take UP Convolutional Neural Network
Nguyễn Luật Gia Khôi
@giakhoi.nguyenluat
Nguyễn Thế Bình
@binh.nguyen288

Outline
1. Break the ice
2. Convolutional Neural Network
3. Why use a CNN instead of an ANN?
4. Demo code
5. Kaggle Challenge

Matrix representation
1 -1 -1 -1 1
1 -1 1 -1 1
1 -1 -1 -1 1
1 1 1 -1 1
1 1 1 -1 1
1 1 1 -1 1
1 -1 -1 1 1
● -1: black
● 1: white

An ANN can handle this variation in how a digit is written
Sample 1:
1 -1 -1 -1 1
1 -1 1 -1 1
1 -1 -1 -1 1
1 1 1 -1 1
1 1 1 -1 1
1 1 1 -1 1
1 -1 -1 1 1
Sample 2:
-1 -1 -1 1 1
-1 1 -1 1 1
-1 -1 -1 1 1
1 1 -1 1 1
1 1 -1 1 1
1 -1 1 1 1
-1 1 1 1 1
Sample 3:
1 1 -1 1 1
1 -1 1 -1 1
1 -1 -1 1 1
1 1 -1 1 1
1 1 -1 1 1
1 1 -1 1 1
1 1 -1 1 1

1 -1 -1 -1 1
1 -1 1 -1 1
1 -1 -1 -1 1
1 1 1 -1 1
1 1 1 -1 1
1 1 1 -1 1
1 -1 -1 1 1
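Flattened input (row by row): 1, -1, -1, -1, 1, 1, -1, ..., 1, -1, 1, 1, -1, -1, 1, 1
Input layer: x1, x2, x3, x4, ..., x32, x33, x34, x35
Hidden layers: h1, h2, h3, h4, ... then h1, h2, h3, ..., h8, h9, h10
Output (probability per digit):
1: 0.01
9: 0.92
8: 0.003
2: 0.008
3: 0.015
0: 0.02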
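As a rough sketch of the flow on this slide, the 7 x 5 digit matrix can be flattened into a 35-element vector and pushed through a small fully connected network. The layer sizes (10 hidden units, 10 outputs), the tanh activation, and the random weights are assumptions for illustration only; an untrained network like this produces arbitrary probabilities, not the 0.92 shown for "9".

```python
import numpy as np

# The 7 x 5 digit "9" from the slide (-1 = black, 1 = white).
digit = np.array([
    [1, -1, -1, -1, 1],
    [1, -1,  1, -1, 1],
    [1, -1, -1, -1, 1],
    [1,  1,  1, -1, 1],
    [1,  1,  1, -1, 1],
    [1,  1,  1, -1, 1],
    [1, -1, -1,  1, 1],
])

def softmax(z):
    """Turn raw scores into probabilities that sum to 1."""
    e = np.exp(z - z.max())  # subtract the max for numerical stability
    return e / e.sum()

rng = np.random.default_rng(0)

x = digit.flatten()                  # x1 ... x35, read row by row
W1 = rng.normal(size=(10, 35))       # hypothetical hidden layer weights
b1 = np.zeros(10)
W2 = rng.normal(size=(10, 10))       # hypothetical output layer weights
b2 = np.zeros(10)

h = np.tanh(W1 @ x + b1)             # hidden activations h1 ... h10
probs = softmax(W2 @ h + b2)         # one probability per digit 0-9

print(x.shape, probs.shape)
```

Training would adjust W1, b1, W2, and b2 so that the probability for the correct digit becomes the largest.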

WAIT A SECOND, how about this?

Image size = 1920 x 1080 x 3
Input layer #neurons = 1920 x 1080 x 3 ≈ 6 million
Hidden layer #neurons = say, 4 million
Weights between input and first hidden layer =
6 million x 4 million = 24 trillion parameters
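The arithmetic can be checked with a few lines of Python. The slide rounds the input size down to 6 million; the exact product is slightly larger. The 4 bytes per weight (float32) used for the memory estimate is an assumption, not something the slide states.

```python
# Exact parameter count for one dense layer on a Full HD RGB image.
width, height, channels = 1920, 1080, 3
input_neurons = width * height * channels      # 6,220,800
hidden_neurons = 4_000_000                     # value chosen on the slide

weights = input_neurons * hidden_neurons       # one weight per input-hidden pair
bytes_needed = weights * 4                     # assuming float32 weights

print(f"input neurons: {input_neurons:,}")
print(f"weights:       {weights:,}")
print(f"memory:        {bytes_needed / 1e12:.1f} TB")
```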

Disadvantages of ANN in image classification
- Computationally expensive.
- Treats neighboring pixels the same as pixels far apart => ignores locality.
- Sensitive to the location of the object in the image.

How do humans recognize images?

Eyes, nose, ears => Head
Hands, legs => Body
Head, body => Koala

Nine
Curves
Long curve
Circular pattern

How can we make computers
recognize these patterns?

1 -1 -1 -1 1
1 -1 1 -1 1
1 -1 -1 -1 1
1 1 1 -1 1
1 1 1 -1 1
1 1 -1 1 1
1 -1 1 1 1
Filters: circular pattern filter, diagonal line filter (x2)

Input:
1 -1 -1 -1 1
1 -1 1 -1 1
1 -1 -1 -1 1
1 1 1 -1 1
1 1 1 -1 1
1 1 -1 1 1
1 -1 1 1 1
Circular pattern filter:
-1 -1 -1
-1 1 -1
-1 -1 -1
Input * filter =
-1 9 -1
-5 1 -3
-3 3 -3
-5 -1 -5
-3 -5 -3
Example (the entry 9): (-1) x (-1) + (-1) x (-1) + … + (-1) x (-1) + 1 x 1 = 9
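The sliding-window computation above can be reproduced with a minimal NumPy sketch. The function name conv2d_valid is hypothetical, and, as in most deep learning libraries, the filter is not flipped (strictly speaking this is cross-correlation; for this symmetric filter the result is the same).

```python
import numpy as np

image = np.array([
    [1, -1, -1, -1, 1],
    [1, -1,  1, -1, 1],
    [1, -1, -1, -1, 1],
    [1,  1,  1, -1, 1],
    [1,  1,  1, -1, 1],
    [1,  1, -1,  1, 1],
    [1, -1,  1,  1, 1],
])

circular_filter = np.array([
    [-1, -1, -1],
    [-1,  1, -1],
    [-1, -1, -1],
])

def conv2d_valid(image, kernel, stride=1):
    """Slide the kernel over the image without padding and sum the products."""
    kh, kw = kernel.shape
    out_h = (image.shape[0] - kh) // stride + 1
    out_w = (image.shape[1] - kw) // stride + 1
    out = np.zeros((out_h, out_w), dtype=image.dtype)
    for i in range(out_h):
        for j in range(out_w):
            r, c = i * stride, j * stride
            window = image[r:r + kh, c:c + kw]
            out[i, j] = np.sum(window * kernel)  # element-wise product, then sum
    return out

feature_map = conv2d_valid(image, circular_filter)
print(feature_map)
```

The largest value, 9, appears where the 3 x 3 window matches the filter exactly, which is the loop at the top of the digit.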

[Figure: the circular pattern filter (-1 -1 -1 / -1 1 -1 / -1 -1 -1) convolved with three different inputs; each x marks a location where the pattern is detected. The third input yields two detections.]

[Figure: an eye filter convolved with three images; each x marks a detected eye.]

[Figure: the eye filter, nose filter, and ear filter each produce a feature map (x marks a detection); a head filter applied to these feature maps detects the head.]

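[Figure: the feature maps (x marks a detection) are flattened and fed into a fully connected network (h1, h2, h3, ...).]
Output probabilities:
0.01 (Giraffe)
0.80 (Koala)
0.003 (Horse)
0.008 (Buffalo)
0.15 (Bear)
0.02 (Squirrel)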

Feature extraction -> Flatten -> Classification

Input:
1 -1 -1 -1 1
1 -1 1 -1 1
1 -1 -1 -1 1
1 1 1 -1 1
1 1 1 -1 1
1 1 -1 1 1
1 -1 1 1 1
Circular pattern filter:
-1 -1 -1
-1 1 -1
-1 -1 -1
Input * filter =
-1 9 -1
-5 1 -3
-3 3 -3
-5 -1 -5
-3 -5 -3
After ReLU:
0 9 0
0 1 0
0 3 0
0 0 0
0 0 0
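ReLU simply replaces every negative value with 0 and leaves positive values unchanged. A minimal NumPy sketch applied to the feature map above:

```python
import numpy as np

# Feature map produced by the circular pattern filter.
feature_map = np.array([
    [-1,  9, -1],
    [-5,  1, -3],
    [-3,  3, -3],
    [-5, -1, -5],
    [-3, -5, -3],
])

# ReLU(x) = max(0, x), applied element-wise.
activated = np.maximum(feature_map, 0)
print(activated)
```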

Stride & Padding
[Figure: input and output grids for two settings]
- stride = 2, padding = "valid"
- stride = 1, padding = "same"
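To see how stride and padding affect the output size, here is a small sketch for one spatial dimension. The helper name conv_output_size and the example sizes (input 6, kernel 3) are assumptions; the "same" rule follows the convention used by TensorFlow/Keras, where the input is zero-padded so that the output size depends only on the stride.

```python
import math

def conv_output_size(n, kernel, stride, padding):
    """Output length along one dimension for an input of length n."""
    if padding == "valid":
        # No padding: the kernel must fit entirely inside the input.
        return (n - kernel) // stride + 1
    if padding == "same":
        # Zero-padding is added so that only the stride shrinks the output.
        return math.ceil(n / stride)
    raise ValueError(f"unknown padding: {padding!r}")

print(conv_output_size(6, 3, stride=2, padding="valid"))  # 2
print(conv_output_size(6, 3, stride=1, padding="same"))   # 6
```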

Max pooling
pool_size = (2,2), stride = 2
Input:
0 9 0 0
0 1 0 0
0 3 0 0
0 0 0 0
0 0 0 0
Output:
9 0
3 0
output_shape = math.floor((input_shape - pool_size) / strides) + 1
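A minimal NumPy sketch of max pooling that uses the output_shape formula above. The function name max_pool2d is hypothetical; the last input row is dropped because a full 2 x 2 window no longer fits.

```python
import numpy as np

def max_pool2d(x, pool_size=(2, 2), stride=2):
    """Take the maximum of each pool_size window, moving by stride."""
    ph, pw = pool_size
    out_h = (x.shape[0] - ph) // stride + 1   # floor((5 - 2) / 2) + 1 = 2
    out_w = (x.shape[1] - pw) // stride + 1   # floor((4 - 2) / 2) + 1 = 2
    out = np.zeros((out_h, out_w), dtype=x.dtype)
    for i in range(out_h):
        for j in range(out_w):
            r, c = i * stride, j * stride
            out[i, j] = x[r:r + ph, c:c + pw].max()
    return out

feature_map = np.array([
    [0, 9, 0, 0],
    [0, 1, 0, 0],
    [0, 3, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
])

pooled = max_pool2d(feature_map)
print(pooled)
```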

Benefits of max pooling
- Reduces feature map size => reduces computational cost.
- Smaller feature maps mean fewer parameters in later layers => less overfitting.
- Makes the model more tolerant of small variations and distortions.

Complete pipeline
Conv + ReLU -> Pooling -> Conv + ReLU -> Pooling -> Flatten
Output shapes: w1 x h1 x c1 -> w2 x h2 x c1 -> w2 x h2 x c2 -> w3 x h3 x c2
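The shape bookkeeping in this pipeline can be traced with a short sketch. Everything concrete here is an assumption for illustration: a 28 x 28 x 1 input, "same" padding with stride 1 in the convolutions (so they change only the channel count), 32 and 64 filters, and 2 x 2 pooling with stride 2.

```python
def conv_same(shape, filters):
    """Conv with 'same' padding and stride 1: only the channel count changes."""
    w, h, _ = shape
    return (w, h, filters)

def pool(shape, size=2):
    """Pooling with stride equal to the pool size: only width and height shrink."""
    w, h, c = shape
    return ((w - size) // size + 1, (h - size) // size + 1, c)

shape = (28, 28, 1)                      # hypothetical input image
trace = []
for name, step in [
    ("Conv + ReLU", lambda s: conv_same(s, 32)),   # -> w1 x h1 x c1
    ("Pooling",     pool),                         # -> w2 x h2 x c1
    ("Conv + ReLU", lambda s: conv_same(s, 64)),   # -> w2 x h2 x c2
    ("Pooling",     pool),                         # -> w3 x h3 x c2
]:
    shape = step(shape)
    trace.append(shape)
    print(f"{name:12s} -> {shape}")

flat = shape[0] * shape[1] * shape[2]    # length of the flattened vector
print("Flatten      ->", flat)
```

The flattened vector is what the classification layers receive as input.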

- Captures spatial features with small local filters => sparse connections => less overfitting.
[Figure: eye, nose, and ear filters applied to an image; each x marks a detection.]
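Connection sparsity shows up directly in the parameter count: a filter's weights are shared across every position of the image, so a convolutional layer's size does not depend on the image resolution. A small sketch, assuming a hypothetical layer of 32 filters of size 3 x 3 applied to a 3-channel image:

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    """Weights plus one bias per filter; independent of image width and height."""
    return filters * (kernel_h * kernel_w * in_channels + 1)

conv_params = conv2d_params(3, 3, in_channels=3, filters=32)
print(conv_params)  # 896
```

That is 896 parameters for any image size, compared with the roughly 24 trillion weights of the fully connected layer on a 1920 x 1080 x 3 image.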

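- Translation invariant: a small shift of the pattern leaves the pooled output almost unchanged.
After Conv (pattern in the first row):
0 0 9 0
0 1 0 0
0 3 0 0
0 0 0 0
0 0 0 0
After Pooling:
1 9
3 0
After Conv (pattern shifted down by one row):
0 0 0 0
0 0 9 0
0 3 0 0
0 0 0 0
0 0 0 0
After Pooling:
0 9
3 0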
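The two feature maps above can be pooled in code to confirm that the strong response (9) lands in the same pooled cell even though it moved by one row. This sketch uses a reshape-based 2 x 2 max pool with stride 2; the helper name max_pool_2x2 is hypothetical.

```python
import numpy as np

def max_pool_2x2(x):
    """2 x 2 max pooling with stride 2; leftover rows or columns are dropped."""
    h, w = x.shape[0] // 2 * 2, x.shape[1] // 2 * 2
    blocks = x[:h, :w].reshape(h // 2, 2, w // 2, 2)
    return blocks.max(axis=(1, 3))

original = np.array([
    [0, 0, 9, 0],
    [0, 1, 0, 0],
    [0, 3, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
])

shifted = np.array([
    [0, 0, 0, 0],
    [0, 0, 9, 0],
    [0, 3, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
])

pooled_original = max_pool_2x2(original)
pooled_shifted = max_pool_2x2(shifted)
print(pooled_original)
print(pooled_shifted)

# The 9 sits in the same pooled cell in both cases.
same_cell = (np.argwhere(pooled_original == 9) == np.argwhere(pooled_shifted == 9)).all()
print(same_cell)
```

The tolerance is limited: a shift that crosses a pooling window boundary still moves the response to a neighboring cell.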

- Location invariant feature detection.

GDSC AI CHALLENGE
● Challenge: Build a model that automatically classifies images with their correct labels.
● Registration form is now open!
● Timeline: 12/12/2021 - 30/12/2021
● Rewards:
○ First prize: 1,000,000 VND
○ Second prize: 700,000 VND
○ Third prize: 500,000 VND
○ 2 AI potential prizes: 400,000 VND

Feedback form
https://tinyurl.com/37kk6dkw
