Max pooling is a downsampling technique that slides a window (e.g., 2x2) over the input feature map and extracts the maximum value from each window.
This process achieves two key goals:
- Dimensionality Reduction: Reduces computational complexity by shrinking the feature map size.
- Translation Invariance: Makes the model robust to small spatial shifts in input features.
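To make the sliding-window idea concrete, here is a minimal NumPy sketch of max pooling on a single 2-D feature map. The helper name max_pool_2d is hypothetical (it is not a TensorFlow or NumPy function), and the sketch assumes a square window with no padding. It also checks the translation-invariance point: an activation that moves by one pixel but stays inside the same window gives the same pooled output.

```python
import numpy as np

def max_pool_2d(feature_map, pool_size=2, stride=2):
    """Hypothetical helper: max pooling over a 2-D array with no padding."""
    height, width = feature_map.shape
    out_h = (height - pool_size) // stride + 1
    out_w = (width - pool_size) // stride + 1
    pooled = np.empty((out_h, out_w), dtype=feature_map.dtype)
    for i in range(out_h):
        for j in range(out_w):
            # Top-left corner of the current window
            r, c = i * stride, j * stride
            window = feature_map[r:r + pool_size, c:c + pool_size]
            pooled[i, j] = window.max()
    return pooled

# 4x4 feature map holding the values 1..16
feature_map = np.arange(1, 17, dtype=np.float32).reshape(4, 4)
pooled = max_pool_2d(feature_map)
print(pooled)

# A single activation shifted by one pixel within the same 2x2 window
a = np.zeros((4, 4), dtype=np.float32)
b = np.zeros((4, 4), dtype=np.float32)
a[0, 0] = 1.0
b[1, 1] = 1.0
same_after_pooling = np.array_equal(max_pool_2d(a), max_pool_2d(b))
print(same_after_pooling)
```

The invariance is only local: a shift that moves the activation across a window boundary does change the pooled output.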
In TensorFlow, tf.keras.layers.MaxPooling2D implements the max pooling operation. Key parameters include:
- pool_size: Size of the pooling window (e.g., (2, 2)).
- strides: Step size of the window (defaults to pool_size if not specified).
- padding: 'valid' (no padding) or 'same' (pads the input so that the output size is ceil(input_size / stride); the input size is retained only when strides is 1).
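The way these parameters interact can be sketched with a small output-size calculator for one spatial dimension. The function name pooled_length is hypothetical; the two formulas are the ones given in the Keras documentation for 'valid' and 'same' padding.

```python
import math

def pooled_length(input_length, pool_size, stride=None, padding="valid"):
    """Hypothetical helper: output length of max pooling along one dimension."""
    if stride is None:
        # Mirrors the layer's behavior: strides defaults to pool_size
        stride = pool_size
    if padding == "valid":
        return (input_length - pool_size) // stride + 1
    if padding == "same":
        return math.ceil(input_length / stride)
    raise ValueError("padding must be 'valid' or 'same'")

valid_even = pooled_length(4, 2)                             # 4 -> 2
valid_odd = pooled_length(5, 2)                              # last row/column is dropped
same_odd = pooled_length(5, 2, padding="same")               # padded so nothing is dropped
same_stride_1 = pooled_length(5, 2, stride=1, padding="same")  # input size retained
print(valid_even, valid_odd, same_odd, same_stride_1)
```

With 'valid' padding, an input whose size is not a multiple of the stride loses its trailing rows or columns, which is the main practical reason to choose 'same'.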
Implementation with TensorFlow
Here’s how to implement max pooling using tf.keras.layers.MaxPooling2D:
import tensorflow as tf
import numpy as np
# 4x4 grayscale image with shape (batch, height, width, channels)
input_image = np.array([[
    [[1], [2], [3], [4]],
    [[5], [6], [7], [8]],
    [[9], [10], [11], [12]],
    [[13], [14], [15], [16]]
]], dtype=np.float32)

# Model containing a single MaxPooling2D layer
model = tf.keras.Sequential([
    tf.keras.layers.MaxPooling2D(pool_size=(2, 2), strides=(2, 2), padding='valid')
])

# verbose=0 suppresses the progress bar so only the prints below are shown
output = model.predict(input_image, verbose=0)

print("Input Shape:", input_image.shape)
print("Output Shape:", output.shape)
# Drop the batch and channel axes to show the pooled 2x2 grid
print("Pooled Output:\n", output[0, :, :, 0])
Output:
Input Shape: (1, 4, 4, 1)
Output Shape: (1, 2, 2, 1)
Pooled Output:
 [[ 6.  8.]
 [14. 16.]]
A max pooling layer can help reduce overfitting, improves computational efficiency by shrinking the feature maps, and makes the model more robust to small translations of the input.