Hyperparameter Tuning
PRESENTED BY: SRUTHY P L
ROLL NO: 11
M.TECH VLSI & ES
MODEL ENGINEERING COLLEGE
THRIKKAKARA
Hyperparameters vs. Parameters
• Parameters: Internal variables learned by the model (e.g., weights,
biases).
• Hyperparameters: External configurations that control training (e.g.,
learning rate, dropout rate).
• Example: In a neural network,
- Parameters: weights, biases.
- Hyperparameters: number of layers, activation function, learning rate.
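A minimal PyTorch sketch of this distinction (assuming PyTorch is available; the layer sizes and learning rate below are illustrative choices, not recommendations):

import torch
import torch.nn as nn

# Hyperparameters: chosen before training, not learned from data
num_hidden = 64        # number of hidden units (illustrative)
learning_rate = 0.01   # step size used by the optimizer

# The weights and biases inside these layers are the parameters,
# learned from data during training
model = nn.Sequential(
    nn.Linear(10, num_hidden),
    nn.ReLU(),
    nn.Linear(num_hidden, 1),
)
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)

# Parameters are exactly what the optimizer updates
for name, p in model.named_parameters():
    print(name, p.shape)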
HYPERPARAMETERS
• Hyperparameters are settings that control the learning process of a
machine learning model.
• Unlike parameters (e.g., weights in neural networks),
hyperparameters are not learned from data.
• They are set before training starts and influence model performance.
• Examples: Learning rate, batch size, number of hidden layers, number
of trees in a random forest.
Why is Hyperparameter Tuning Important?
• Poor hyperparameters can lead to underfitting or overfitting.
• The right hyperparameters improve model accuracy and efficiency.
• Helps in optimizing training time and computational resources.
• Essential for deep learning models where training costs are high.
Common Hyperparameters in Machine Learning & Deep Learning
• Learning Rate (α): Controls how much to adjust weights in each step.
• Batch Size: Number of samples per update.
• Number of Epochs: Number of complete passes through the dataset.
• Number of Layers: Defines depth of deep learning models.
• Activation Functions: ReLU, Sigmoid, Tanh, etc.
• Dropout Rate: Prevents overfitting by randomly dropping
connections.
• Regularization Parameters: L1, L2 norms to control model complexity.
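As an illustration, these settings are often gathered into a single configuration before training starts; the names and values below are hypothetical defaults, not tuned recommendations:

# Hypothetical hyperparameter configuration for one training run
config = {
    "learning_rate": 1e-3,   # step size for weight updates
    "batch_size": 32,        # samples per gradient update
    "num_epochs": 20,        # full passes through the dataset
    "num_layers": 4,         # depth of the network
    "activation": "relu",    # ReLU, Sigmoid, Tanh, ...
    "dropout_rate": 0.5,     # fraction of connections randomly dropped
    "weight_decay": 1e-4,    # L2 regularization strength
}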
Learning Rate
• The learning rate is a hyperparameter that determines the step size at
which the network updates its parameters during training.
• A large learning rate can lead to rapid convergence but may result in
unstable and oscillating training.
• A small learning rate can ensure stable and smooth training but may
result in slower convergence.
• Therefore, it is important to experiment with different learning rates
and choose the one that gives the best trade-off between training speed
and stability.
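A small sketch of this trade-off on a toy one-dimensional objective (the quadratic below is only an illustration of how the step size scales each update):

# Toy objective f(w) = w**2 with gradient 2*w; the minimum is at w = 0
def run_gradient_descent(learning_rate, steps=10, w=5.0):
    for _ in range(steps):
        grad = 2 * w
        w = w - learning_rate * grad   # update scaled by the learning rate
    return w

print(run_gradient_descent(0.01))  # small rate: stable but slow progress
print(run_gradient_descent(0.4))   # moderate rate: fast, stable convergence
print(run_gradient_descent(1.1))   # too large: updates overshoot and diverge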
Number of Layers
• The number of layers in a CNN is a critical
hyperparameter that determines the depth of the
network.
• A deeper network can learn more complex features and
patterns from the data, but it is also more prone to
overfitting.
• Therefore, it is important to strike a balance between the
number of layers and the complexity of the problem.
• A good starting point is to use a small number of layers
and gradually increase the depth until the desired
performance is achieved.
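A sketch of how the depth can be exposed as a single hyperparameter in PyTorch (channel counts and the 64x64 input size are illustrative):

import torch
import torch.nn as nn

def build_cnn(num_conv_layers, channels=16):
    # Stack a configurable number of conv blocks; depth is the hyperparameter
    layers = []
    in_ch = 3  # RGB input
    for _ in range(num_conv_layers):
        layers += [nn.Conv2d(in_ch, channels, kernel_size=3, padding=1),
                   nn.ReLU(),
                   nn.MaxPool2d(2)]
        in_ch = channels
    return nn.Sequential(*layers)

# Start shallow and deepen only if validation performance demands it
shallow = build_cnn(num_conv_layers=2)
deeper = build_cnn(num_conv_layers=4)
print(shallow(torch.randn(1, 3, 64, 64)).shape)  # torch.Size([1, 16, 16, 16])
print(deeper(torch.randn(1, 3, 64, 64)).shape)   # torch.Size([1, 16, 4, 4])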
Filter Size
• The filter size is another important hyperparameter that determines
the receptive field of each convolutional layer.
• A larger filter size can capture more information from the input
image, but it also increases the number of parameters in the network.
• A smaller filter size can reduce the number of parameters, but it may
not be able to capture all the relevant features in the image.
• Therefore, it is important to experiment with different filter sizes
and choose the one that gives the best performance.
• A common starting point is a 3x3 filter, as in the sketch below.
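A sketch comparing filter sizes in PyTorch; the channel counts are illustrative, and the parameter counts follow directly from filter size x channels plus biases:

import torch.nn as nn

# Same in/out channels, different receptive fields
conv3 = nn.Conv2d(3, 32, kernel_size=3)  # 32*3*3*3 + 32 = 896 parameters
conv5 = nn.Conv2d(3, 32, kernel_size=5)  # 32*3*5*5 + 32 = 2432 parameters

def count_params(layer):
    return sum(p.numel() for p in layer.parameters())

print(count_params(conv3))  # smaller filter: fewer parameters
print(count_params(conv5))  # larger filter: wider receptive field, more parameters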
Stride
• The stride is a hyperparameter that determines the number of pixels by
which the filter moves across the input image.
• A larger stride can reduce the size of the output feature maps, but it can
also lead to information loss.
• A smaller stride can preserve more information, but it also increases the
computation time and memory requirements.
• Therefore, it is important to choose an appropriate stride that balances the
trade-off between information loss and computational efficiency.
• The default stride in a CNN is 1, as in the sketch below.
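A sketch of how the stride changes the output feature-map size, using the usual relation output = floor((input - kernel) / stride) + 1 with no padding (sizes are illustrative):

import torch
import torch.nn as nn

x = torch.randn(1, 3, 32, 32)  # a 32x32 RGB input

stride1 = nn.Conv2d(3, 8, kernel_size=3, stride=1)  # default stride of 1
stride2 = nn.Conv2d(3, 8, kernel_size=3, stride=2)  # larger stride downsamples

print(stride1(x).shape)  # (32 - 3)/1 + 1 = 30  -> torch.Size([1, 8, 30, 30])
print(stride2(x).shape)  # (32 - 3)//2 + 1 = 15 -> torch.Size([1, 8, 15, 15])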
Padding
• Padding is a technique used to preserve the spatial dimensions of the
input image while applying convolutional layers.
• It involves adding zeros around the border of the input image to create
a padded image that can be convolved with the filter.
• Padding can help preserve the information at the edges of the image
and prevent the loss of spatial resolution.
• However, it also increases the memory requirements and computation
time of the network.
• Therefore, it is important to experiment with different padding
techniques and choose the one that gives the best performance.
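A sketch of zero padding in PyTorch; with a 3x3 filter, padding=1 preserves the spatial dimensions of the input (sizes are illustrative):

import torch
import torch.nn as nn

x = torch.randn(1, 3, 32, 32)

no_pad = nn.Conv2d(3, 8, kernel_size=3, padding=0)    # no padding: edges are lost
same_pad = nn.Conv2d(3, 8, kernel_size=3, padding=1)  # zeros added around the border

print(no_pad(x).shape)    # torch.Size([1, 8, 30, 30]) - spatial size shrinks
print(same_pad(x).shape)  # torch.Size([1, 8, 32, 32]) - spatial size preserved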
Batch Size
• The batch size is a hyperparameter that determines the number of
samples that are processed by the network in each training iteration.
• A larger batch size can reduce the variance of the gradient estimates and
improve the stability of the training.
• However, it also increases the memory requirements and may lead to
slower convergence.
• A smaller batch size can reduce the memory requirements and improve
the convergence speed but may lead to noisy gradient estimates.
• Therefore, it is important to experiment with different batch sizes and
choose the one that gives the best trade-off between stability and speed.
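A sketch of the batch size as a data-loading hyperparameter in PyTorch (the random tensors stand in for a real training dataset):

import torch
from torch.utils.data import DataLoader, TensorDataset

# Stand-in dataset: 1000 samples with 20 features each
data = TensorDataset(torch.randn(1000, 20), torch.randint(0, 2, (1000,)))

batch_size = 32  # hyperparameter: samples processed per training iteration
loader = DataLoader(data, batch_size=batch_size, shuffle=True)

for features, labels in loader:
    print(features.shape)  # torch.Size([32, 20]) per iteration (last batch may be smaller)
    break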
Methods of Hyperparameter Tuning
• 1. Grid Search: Exhaustive search over a predefined hyperparameter
space.
• 2. Random Search: Randomly samples hyperparameter settings, often more
efficient than an exhaustive grid.
• 3. Bayesian Optimization: Uses probability models to find best
hyperparameters.
• 4. Hyperband: Optimizes computational budget using adaptive
resource allocation.
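A sketch of grid search and random search with scikit-learn, assuming scikit-learn and SciPy are available; the estimator and the search ranges below are illustrative:

from scipy.stats import randint
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# Grid search: exhaustively tries every combination in the grid
grid = GridSearchCV(RandomForestClassifier(random_state=0),
                    param_grid={"n_estimators": [50, 100], "max_depth": [3, 5, None]},
                    cv=3)
grid.fit(X, y)
print(grid.best_params_)

# Random search: samples a fixed number of configurations, often more efficient
rand = RandomizedSearchCV(RandomForestClassifier(random_state=0),
                          param_distributions={"n_estimators": randint(50, 200),
                                               "max_depth": [3, 5, None]},
                          n_iter=5, cv=3, random_state=0)
rand.fit(X, y)
print(rand.best_params_)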
Challenges in Hyperparameter Tuning
• Large search space makes tuning computationally expensive.
• Overfitting can occur if tuned too aggressively.
• Requires deep knowledge of the model and data.
• Trade-off between performance improvement and computational
cost.
Conclusion
• Hyperparameter tuning is crucial for optimizing machine learning &
deep learning models.
• Choosing the right tuning method improves performance and
efficiency.
• Understanding hyperparameters helps in better model design and
training.
• Use systematic approaches and tools to automate tuning for large-
scale projects.