ML LAB Viva Questions with Answers
1. Question: What problem do ridge regression and lasso regression address in linear
regression models?
Answer: They address the issue of multicollinearity and overfitting by introducing
regularization terms to the linear regression equation.
2. Question: Explain the key difference between ridge regression and lasso regression.
Answer: Ridge regression adds a regularization term with the squared
magnitude of coefficients, while lasso regression adds the absolute magnitude.
Lasso can lead to sparsity by setting some coefficients to exactly zero.
3. Question: When would you choose ridge regression over lasso regression, and vice
versa?
Answer: Ridge regression is preferred when all features are expected to
contribute to the model, while lasso regression is suitable when feature selection
matters and some features can be dropped entirely.
4. Question: How does the regularization term in ridge regression contribute to
preventing overfitting?
Answer: The regularization term penalizes large coefficients, discouraging
complex models and reducing the risk of overfitting.
5. Question: What is the significance of the regularization parameter in ridge and lasso
regression?
Answer: The regularization parameter controls the strength of regularization. A
higher value increases the penalty on large coefficients, influencing the model's
complexity.
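The following is a minimal sketch of ridge and lasso regression using scikit-learn (assumed to be the library used in this lab); the alpha argument plays the role of the regularization parameter discussed in question 5, and the synthetic dataset is purely illustrative.

from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge, Lasso

# Synthetic regression data for illustration only
X, y = make_regression(n_samples=200, n_features=10, noise=10.0, random_state=0)

ridge = Ridge(alpha=1.0).fit(X, y)  # L2 penalty: shrinks all coefficients toward zero
lasso = Lasso(alpha=1.0).fit(X, y)  # L1 penalty: can drive some coefficients to exactly zero

print("Ridge coefficients:", ridge.coef_)
print("Lasso coefficients:", lasso.coef_)  # note the exact zeros produced by the L1 penalty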
1. Question: What is the ID3 algorithm, and how does it work in the context of decision
trees?
Answer: The Iterative Dichotomiser 3 (ID3) algorithm is a decision tree algorithm that
recursively selects the best attribute to split the data based on information gain, aiming
to maximize the homogeneity of the resulting subsets.
2. Question: How does information gain influence the decision-making process in the ID3
algorithm?
Answer: Information gain measures the effectiveness of an attribute in reducing
uncertainty about the classification. The ID3 algorithm selects the attribute with the
highest information gain for node splitting.
3. Question: Explain the concept of entropy and its role in the ID3 algorithm.
Answer: Entropy quantifies the impurity or disorder of a set. ID3 minimizes entropy by
selecting attributes that lead to subsets with lower entropy, resulting in more
homogeneous subsets.
4. Question: What are the limitations of the ID3 algorithm, and how can they be
addressed?
Answer: ID3 is sensitive to noise and outliers, tends to overfit, and is biased toward
attributes with many distinct values. Pruning techniques, such as pre-pruning or
post-pruning, can be applied to mitigate overfitting and improve generalization.
5. Question: How does the decision tree represent knowledge, and how can it be
visualized?
Answer: A decision tree represents knowledge through a tree-like structure where each
node corresponds to a test on an attribute, and each branch represents an outcome of
the test. Visualization tools like Graphviz can be used to create graphical representations.
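As a hands-on illustration, the sketch below trains an entropy-based decision tree with scikit-learn. Note that scikit-learn implements CART rather than ID3, but setting criterion="entropy" reproduces the information-gain idea described above; the Iris dataset and the depth limit are illustrative choices.

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()

# criterion="entropy" makes the splits information-gain based, as in ID3
tree = DecisionTreeClassifier(criterion="entropy", max_depth=3, random_state=0)
tree.fit(iris.data, iris.target)

# Text rendering of the learned tree; export_graphviz/Graphviz can produce a graphical version
print(export_text(tree, feature_names=list(iris.feature_names)))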
Week 8: Program to Demonstrate Naive Bayes Classifier and
Logistic Regression
a. Naive Bayes Classifier
1. Question: What makes the Naive Bayes Classifier "naive"?
Answer: The "naive" assumption in Naive Bayes is that features are conditionally
independent given the class label, simplifying the calculation of probabilities.
2. Question: How is the Naive Bayes Classifier applied in text classification tasks?
Answer: In text classification, Naive Bayes is used to calculate the probability of a
document belonging to a particular class based on the probabilities of individual words
occurring in that class.
3. Question: Explain Laplace smoothing and its role in Naive Bayes.
Answer: Laplace smoothing handles the problem of zero probabilities for features that
never appear with a given class in the training data. It adds a small constant to all
feature counts so that no class-conditional probability is exactly zero and the product
of likelihoods is never wiped out.
4. Question: In what scenarios is the Naive Bayes Classifier particularly suitable?
Answer: Naive Bayes is effective in text classification, spam filtering, and situations
where the independence assumption is reasonable.
5. Question: How does Naive Bayes handle continuous features, and what is the role of
probability density functions?
Answer: For continuous features, Naive Bayes assumes a probability density function,
often Gaussian, to estimate the likelihood of a given value.
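A small text-classification sketch with multinomial Naive Bayes is given below; the alpha argument of MultinomialNB is the Laplace smoothing constant from question 3, and the toy documents and labels are made up purely for illustration.

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy corpus for illustration only
docs = ["free prize win now", "meeting schedule for monday",
        "win free cash prize", "project status meeting notes"]
labels = ["spam", "ham", "spam", "ham"]

# alpha=1.0 applies Laplace (add-one) smoothing to the word counts
model = make_pipeline(CountVectorizer(), MultinomialNB(alpha=1.0))
model.fit(docs, labels)
print(model.predict(["free cash now"]))

# For continuous features, GaussianNB instead fits a Gaussian density per feature and class.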
1. Question: What is a Support Vector Machine (SVM), and how does it work in
classification?
Answer: SVM is a supervised machine learning algorithm that finds the hyperplane that
maximally separates classes in feature space. It aims to maximize the margin between
classes.
2. Question: Explain the concept of a hyperplane in SVM.
Answer: A hyperplane is a decision boundary that separates classes in feature space. In
SVM, the optimal hyperplane is the one that maximizes the margin, the distance
between the hyperplane and the nearest data points of each class.
3. Question: What is the significance of support vectors in SVM?
Answer: Support vectors are the data points that lie closest to the hyperplane and
influence its position. They play a crucial role in determining the optimal hyperplane.
4. Question: How does SVM handle nonlinearly separable data?
Answer: SVM can handle nonlinearly separable data by using kernel functions, which
implicitly map the input features into a higher-dimensional space, making them
separable.
5. Question: What are the key parameters in an SVM model, and how do they impact
model performance?
Answer: Key parameters include the choice of kernel, the regularization parameter (C),
and the kernel parameters. Tuning these parameters can significantly impact the SVM's
ability to generalize to new data.
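The sketch below fits an SVM with an RBF kernel on a nonlinearly separable toy dataset; kernel, C, and gamma are the tunable parameters mentioned in question 5, and the specific values are illustrative.

from sklearn.datasets import make_moons
from sklearn.svm import SVC

# Two interleaving half-moons: not linearly separable in the original feature space
X, y = make_moons(n_samples=200, noise=0.2, random_state=0)

clf = SVC(kernel="rbf", C=1.0, gamma="scale")  # kernel choice, C, and gamma are the key knobs
clf.fit(X, y)

print("Support vectors per class:", clf.n_support_)
print("Training accuracy:", clf.score(X, y))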
1. Question: What is the concept of ensemble learning, and why is it used in machine
learning?
Answer: Ensemble learning combines the predictions of multiple models to improve
overall performance and generalization, reducing the risk of overfitting.
2. Question: How does bagging (Bootstrap Aggregating) work in the context of ensemble
learning?
Answer: Bagging creates multiple subsets of the training data through bootstrap
sampling and trains a base model on each subset. The final prediction is obtained by
averaging or voting.
3. Question: Explain the boosting algorithm and its purpose in ensemble learning.
Answer: Boosting focuses on iteratively improving the performance of weak learners by
assigning higher weights to misclassified instances. It aims to create a strong learner
from a collection of weak learners.
4. Question: What is the significance of the learning rate in boosting algorithms?
Answer: The learning rate controls the contribution of each weak learner to the final
model. A lower learning rate requires more iterations but may lead to better
convergence.
5. Question: When would you choose bagging over boosting or vice versa?
Answer: Bagging is effective when the base models are high-variance (unstable or prone
to overfitting), since averaging reduces variance, while boosting is better at reducing
bias by sequentially strengthening weak learners, though it is more sensitive to noisy
data and outliers.
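To compare the two strategies, the sketch below runs bagging and AdaBoost on the same synthetic dataset; both use decision trees as the base learner by default in scikit-learn, and the number of estimators and the learning rate are illustrative values.

from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier, AdaBoostClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# Bagging: independent trees on bootstrap samples, predictions averaged/voted
bagging = BaggingClassifier(n_estimators=50, random_state=0)

# Boosting: trees trained sequentially; learning_rate scales each weak learner's contribution
boosting = AdaBoostClassifier(n_estimators=50, learning_rate=0.5, random_state=0)

print("Bagging CV accuracy:", cross_val_score(bagging, X, y, cv=5).mean())
print("Boosting CV accuracy:", cross_val_score(boosting, X, y, cv=5).mean())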
1. Question: How does a Random Forest differ from a traditional decision tree?
Answer: A Random Forest is an ensemble of decision trees, where each tree is trained
on a bootstrap sample of the data and considers a random subset of features at each
split. The final prediction is obtained by majority voting (classification) or averaging
(regression) over the individual trees.
2. Question: What is the purpose of feature bagging in a Random Forest?
Answer: Feature bagging involves considering only a random subset of features when
splitting nodes in individual decision trees. It enhances diversity among trees and
improves overall model generalization.
3. Question: How does a Random Forest handle overfitting compared to a single decision
tree?
Answer: The aggregation of predictions from multiple trees in a Random Forest tends to
reduce overfitting, as individual errors are likely to cancel out, leading to a more robust
model.
4. Question: Explain the concept of out-of-bag (OOB) error in Random Forest.
Answer: OOB error is the prediction error measured on the instances left out of a tree's
bootstrap sample. Aggregated over all trees, it serves as a built-in estimate of the
model's performance on unseen data.
5. Question: In what types of problems is a Random Forest particularly effective?
Answer: Random Forests are effective in a wide range of problems, including
classification, regression, and tasks with a large number of features.
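A minimal random-forest sketch follows, with the out-of-bag estimate from question 4 enabled; max_features="sqrt" is the feature-bagging setting, and the other values are illustrative.

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# max_features="sqrt" implements feature bagging; oob_score=True enables the OOB estimate
rf = RandomForestClassifier(n_estimators=200, max_features="sqrt",
                            oob_score=True, random_state=0)
rf.fit(X, y)

print("OOB accuracy estimate:", rf.oob_score_)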
1. Question: What is Principal Component Analysis (PCA), and what is its primary
objective?
Answer: PCA is a dimensionality reduction technique that aims to capture the most
important information in a dataset by transforming it into a new set of uncorrelated
variables called principal components.
2. Question: How does PCA identify the principal components, and what information do
they represent?
Answer: PCA identifies the principal components as linear combinations of the original
features. The first principal component represents the direction of maximum variance,
and subsequent components capture orthogonal directions of decreasing variance.
3. Question: Explain the concept of eigenvalues and eigenvectors in the context of PCA.
Answer: Eigenvalues represent the variance captured by each principal component, and
eigenvectors represent the directions of these components in the feature space.
4. Question: What is the trade-off between the number of principal components and the
explained variance?
Answer: Retaining more principal components preserves more of the explained variance
but keeps more dimensions, including noisy ones, which can reintroduce the risk of
overfitting. A balance must be struck to retain sufficient information while still
reducing dimensionality.
5. Question: How can PCA be applied to preprocess data before feeding it into a machine
learning model?
Answer: PCA can be used to reduce the dimensionality of the data, eliminating
redundant information and speeding up training without sacrificing too much predictive
power.
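The sketch below uses PCA as a preprocessing step inside a pipeline; n_components=0.95 keeps just enough components to explain 95% of the variance, a common but by no means mandatory choice.

from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_digits(return_X_y=True)

pca = PCA(n_components=0.95)  # retain enough components for 95% of the variance
model = make_pipeline(StandardScaler(), pca, LogisticRegression(max_iter=1000))
model.fit(X, y)

print("Components kept:", pca.n_components_)
print("Variance explained by the first five components:", pca.explained_variance_ratio_[:5])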
1. Question: What does DBSCAN stand for, and what distinguishes it from other clustering
algorithms?
Answer: DBSCAN stands for Density-Based Spatial Clustering of Applications
with Noise. Unlike K-Means, DBSCAN identifies clusters based on data density
and can discover clusters of arbitrary shapes.
2. Question: Explain the core concepts of DBSCAN, such as epsilon (ε) and minimum
points (MinPts).
Answer: ε (epsilon) is the radius that defines the neighborhood around a point, and
MinPts is the minimum number of points that must lie within that radius for the
region to count as dense (i.e., for the point to be a core point).
3. Question: How does DBSCAN classify data points as core points, border points, and
noise points?
Answer: Core points have at least MinPts data points within ε, border points are
within ε of a core point but have fewer than MinPts neighbors, and noise points
do not satisfy the density criteria.
4. Question: What are the advantages of DBSCAN in handling clusters of varying shapes
and sizes?
Answer: DBSCAN can identify clusters with irregular shapes and adapt to varying
densities within the dataset. It is not constrained by assumptions about cluster
shapes.
5. Question: How does DBSCAN handle outliers, and why is it considered robust to noise?
Answer: DBSCAN naturally identifies noise as data points that do not belong to
any cluster. Its density-based approach makes it robust to outliers, and clusters
are only formed where there is sufficient density.
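Finally, a short DBSCAN sketch: eps and min_samples correspond to the epsilon and MinPts parameters above, points labelled -1 are treated as noise, and the parameter values are illustrative for this toy dataset.

import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons

# Two crescent-shaped clusters that centroid-based methods like K-Means struggle with
X, _ = make_moons(n_samples=300, noise=0.05, random_state=0)

db = DBSCAN(eps=0.2, min_samples=5).fit(X)  # eps = epsilon radius, min_samples = MinPts
labels = db.labels_  # cluster index per point; -1 marks noise

n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
print("Clusters found:", n_clusters)
print("Noise points:", int(np.sum(labels == -1)))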