
HYPERPARAMETER TUNING
Presented by: Sruthy P L
Roll No: 11
M.Tech VLSI & ES
Model Engineering College, Thrikkakara
Hyperparameters vs. Parameters
• Parameters: Internal variables learned by the model (e.g., weights,
biases).
• Hyperparameters: External configurations that control training (e.g.,
learning rate, dropout rate).
• Example: In a neural network (see the sketch below),
  - Parameters: weights, biases.
  - Hyperparameters: number of layers, activation function, learning rate.
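To make the distinction concrete, here is a minimal sketch, assuming TensorFlow/Keras purely for illustration: every value passed in by hand below is a hyperparameter, while the weights and biases that fit() would learn from data are the parameters (x_train and y_train are placeholder names for training data).

import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(784,)),
    tf.keras.layers.Dense(128, activation='relu'),    # hyperparameters: 128 units, ReLU
    tf.keras.layers.Dense(10, activation='softmax'),  # hyperparameter: softmax output
])
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),  # hyperparameter
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# model.fit(x_train, y_train) would learn the parameters (weights, biases);
# model.count_params() reports how many parameters the model has.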
HYPERPARAMETERS
• Hyperparameters are settings that control the learning process of a
machine learning model.
• Unlike parameters (e.g., weights in neural networks),
hyperparameters are not learned from data.
• They are set before training starts and influence model performance.
• Examples: Learning rate, batch size, number of hidden layers, number
of trees in a random forest.
Why is Hyperparameter Tuning Important?
• Poor hyperparameters can lead to underfitting or overfitting.
• The right hyperparameters improve model accuracy and efficiency.
• Helps in optimizing training time and computational resources.
• Essential for deep learning models where training costs are high.
Common Hyperparameters in Machine Learning & Deep Learning
• Learning Rate (α): Controls how much to adjust weights in each step.
• Batch Size: Number of samples per update.
• Number of Epochs: Number of complete passes through the dataset.
• Number of Layers: Defines depth of deep learning models.
• Activation Functions: ReLU, Sigmoid, Tanh, etc.
• Dropout Rate: Prevents overfitting by randomly dropping units during
training.
• Regularization Parameters: L1, L2 norms to control model complexity.
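Several of the hyperparameters listed above, such as the activation function, dropout rate, and regularization strength, are set directly when the layers are defined. A small sketch, assuming TensorFlow/Keras, with placeholder values:

import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(784,)),
    tf.keras.layers.Dense(256, activation='relu',                              # activation function
                          kernel_regularizer=tf.keras.regularizers.l2(1e-4)),  # L2 regularization
    tf.keras.layers.Dropout(0.5),                     # dropout rate: fraction of units dropped
    tf.keras.layers.Dense(10, activation='softmax'),
])
# the learning rate, batch size, and number of epochs are set later,
# in model.compile(...) and model.fit(...)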
Learning Rate
• The learning rate is a hyperparameter that determines the step size at
which the network updates its parameters during training.
• A large learning rate can lead to rapid convergence but may result in
unstable and oscillating training.
• A small learning rate can ensure stable and smooth training but may
result in slower convergence.
• Therefore, it is important to experiment with different learning rates
and choose the one that gives the best trade-off between training speed
and stability.
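A small sketch, assuming TensorFlow/Keras, of how the learning rate is set and how a few candidate values might be compared; the fit call is commented out because the training data (x_train, y_train) is a placeholder.

import tensorflow as tf

def build_model():
    # tiny placeholder model; the architecture is not the point here
    return tf.keras.Sequential([
        tf.keras.Input(shape=(784,)),
        tf.keras.layers.Dense(64, activation='relu'),
        tf.keras.layers.Dense(10, activation='softmax'),
    ])

for lr in [1e-1, 1e-2, 1e-3, 1e-4]:
    model = build_model()
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=lr),
                  loss='sparse_categorical_crossentropy', metrics=['accuracy'])
    # history = model.fit(x_train, y_train, validation_split=0.1, epochs=5)
    # compare history.history['val_loss'] across learning rates to pick the best trade-off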
Number of Layers
• The number of layers in a CNN is a critical
hyperparameter that determines the depth of the
network.
• A deeper network can learn more complex features and
patterns from the data, but it is also more prone to
overfitting.
• Therefore, it is important to strike a balance between the
number of layers and the complexity of the problem.
• A good starting point is to use a small number of layers
and gradually add more until the desired performance is
achieved.
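One way to follow that advice is to make the depth an explicit argument, as in this sketch (Keras assumed), and to add blocks only while validation performance keeps improving.

import tensorflow as tf

def make_cnn(num_blocks=2, num_classes=10):
    # num_blocks is the depth hyperparameter: each block adds a Conv2D + pooling stage
    layers = [tf.keras.Input(shape=(32, 32, 3))]
    filters = 32
    for _ in range(num_blocks):
        layers.append(tf.keras.layers.Conv2D(filters, (3, 3), padding='same', activation='relu'))
        layers.append(tf.keras.layers.MaxPooling2D((2, 2)))
        filters *= 2
    layers.append(tf.keras.layers.Flatten())
    layers.append(tf.keras.layers.Dense(num_classes, activation='softmax'))
    return tf.keras.Sequential(layers)

# e.g. compare make_cnn(2), make_cnn(3), make_cnn(4) on a held-out validation set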
Filter Size
• The filter size is another important hyperparameter that determines
the receptive field of each convolutional layer.
• A larger filter size can capture more information from the input
image, but it also increases the number of parameters in the network.
• A smaller filter size can reduce the number of parameters, but it may
not be able to capture all the relevant features in the image.
• Therefore, it is important to experiment with different filter sizes
and choose the one that gives the best performance.
• A common starting point is a 3×3 filter, as in the sketch below.
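In Keras (assumed here), the filter size is the kernel_size argument of Conv2D; the snippet below shows how quickly the per-layer parameter count grows with the filter size.

import tensorflow as tf

for k in [(3, 3), (5, 5), (7, 7)]:
    m = tf.keras.Sequential([
        tf.keras.Input(shape=(32, 32, 3)),
        tf.keras.layers.Conv2D(32, kernel_size=k, padding='same'),
    ])
    print(k, m.count_params())   # 3x3 -> 896, 5x5 -> 2432, 7x7 -> 4736 weights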
Stride
• The stride is a hyperparameter that determines the number of pixels by
which the filter moves across the input image.
• A larger stride can reduce the size of the output feature maps, but it can
also lead to information loss.
• A smaller stride can preserve more information, but it also increases the
computation time and memory requirements.
• Therefore, it is important to choose an appropriate stride that balances the
trade-off between information loss and computational efficiency.
• The default stride in a CNN is 1, as the sketch below shows.
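A quick Keras sketch (assumed) of the effect: with 'same' padding, the stride divides the spatial size of the output feature map, so a larger stride gives a smaller map.

import tensorflow as tf

x = tf.zeros((1, 32, 32, 3))                 # one dummy 32x32 RGB image
for s in [1, 2, 4]:
    conv = tf.keras.layers.Conv2D(8, (3, 3), strides=s, padding='same')
    print(s, conv(x).shape)                  # stride 1 -> 32x32, stride 2 -> 16x16, stride 4 -> 8x8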
Padding
• Padding is a technique used to preserve the spatial dimensions of the
input image while applying convolutional layers.
• It involves adding zeros around the border of the input image to create
a padded image that can be convolved with the filter.
• Padding can help preserve the information at the edges of the image
and prevent the loss of spatial resolution.
• However, it also increases the memory requirements and computation
time of the network.
• Therefore, it is important to experiment with different padding
techniques and choose the one that gives the best performance.
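In Keras (assumed), the two common choices are padding='same', which zero-pads the border so the spatial size is preserved, and padding='valid', which applies no padding and shrinks the output.

import tensorflow as tf

x = tf.zeros((1, 28, 28, 1))                                   # dummy 28x28 grayscale image
same_conv  = tf.keras.layers.Conv2D(8, (3, 3), padding='same')
valid_conv = tf.keras.layers.Conv2D(8, (3, 3), padding='valid')
print(same_conv(x).shape)    # (1, 28, 28, 8) -- spatial size preserved
print(valid_conv(x).shape)   # (1, 26, 26, 8) -- shrinks by kernel_size - 1 per dimension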
Batch Size
• The batch size is a hyperparameter that determines the number of
samples that are processed by the network in each training iteration.
• A larger batch size can reduce the variance of the gradient estimates and
improve the stability of the training.
• However, it also increases the memory requirements and may lead to
slower convergence.
• A smaller batch size can reduce the memory requirements and improve
the convergence speed but may lead to noisy gradient estimates.
• Therefore, it is important to experiment with different batch sizes and
choose the one that gives the best trade-off between stability and speed.
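In Keras (assumed), the batch size is simply an argument to model.fit(); x_train and y_train below are placeholder names for the training data, so the fit loop is commented out.

import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(784,)),
    tf.keras.layers.Dense(10, activation='softmax'),
])
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])

# for batch_size in [16, 64, 256]:
#     model.fit(x_train, y_train, batch_size=batch_size, epochs=5, validation_split=0.1)
#     # compare validation curves: small batches are noisier, large batches need more memory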
Methods of Hyperparameter Tuning
• 1. Grid Search: Exhaustive search over a predefined hyperparameter
space.
• 2. Random Search: Randomly selects hyperparameters, often more
efficient.
• 3. Bayesian Optimization: Uses probability models to find best
hyperparameters.
• 4. Hyperband: Optimizes computational budget using adaptive
resource allocation.
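A short sketch of the first two methods using scikit-learn (assumed available), tuning a random forest on synthetic data; Bayesian optimization and Hyperband are provided by separate libraries (e.g., Optuna or KerasTuner) and are not shown here.

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
param_grid = {'n_estimators': [100, 200, 400], 'max_depth': [None, 5, 10]}

# 1. Grid search: tries every combination (3 x 3 = 9 candidates, each cross-validated)
grid = GridSearchCV(RandomForestClassifier(random_state=0), param_grid, cv=5)
grid.fit(X, y)
print(grid.best_params_, grid.best_score_)

# 2. Random search: samples a fixed number of combinations, often cheaper for similar quality
rand = RandomizedSearchCV(RandomForestClassifier(random_state=0), param_grid,
                          n_iter=5, cv=5, random_state=0)
rand.fit(X, y)
print(rand.best_params_, rand.best_score_)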
Challenges in Hyperparameter Tuning
• Large search space makes tuning computationally expensive.
• Overfitting can occur if tuned too aggressively.
• Requires deep knowledge of the model and data.
• Trade-off between performance improvement and computational
cost.
Conclusion
• Hyperparameter tuning is crucial for optimizing machine learning &
deep learning models.
• Choosing the right tuning method improves performance and
efficiency.
• Understanding hyperparameters helps in better model design and
training.
• Use systematic approaches and tools to automate tuning for large-
scale projects.
