Machine Learning Lab Viva Questions and Answers (For M.Tech Students)
1. What is Machine Learning?
Answer: Machine Learning is a subset of Artificial Intelligence that provides systems the ability to
automatically learn and improve from experience without being explicitly programmed.
2. What are the types of Machine Learning?
Answer:
- Supervised Learning
- Unsupervised Learning
- Semi-supervised Learning
- Reinforcement Learning
3. What is the difference between supervised and unsupervised learning?
Answer:
- Supervised learning uses labeled data, while unsupervised learning works on unlabeled data.
- Example: Supervised - Linear Regression; Unsupervised - K-Means Clustering
4. What is overfitting and underfitting?
Answer:
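- Overfitting: The model learns noise and details specific to the training data, so it performs well on training data but poorly on unseen test data (high variance).
- Underfitting: The model is too simple to capture the underlying pattern, so it performs poorly on both training and test data (high bias).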
5. What are precision, recall, and F1-score?
Answer:
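- Precision: TP / (TP + FP), the fraction of predicted positives that are actually positive.
- Recall: TP / (TP + FN), the fraction of actual positives that are correctly predicted.
- F1-score: 2 * Precision * Recall / (Precision + Recall), the harmonic mean of Precision and Recall.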
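A small sketch, assuming scikit-learn is installed and using made-up binary labels, that computes the three formulas by hand and checks them against scikit-learn's metric functions:

```python
from sklearn.metrics import f1_score, precision_score, recall_score

# Hypothetical binary labels (1 = positive class)
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]

# Count the relevant confusion-matrix cells by hand
tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

print(precision, recall, f1)  # 0.75 0.75 0.75
# scikit-learn's built-in versions should agree
print(precision_score(y_true, y_pred), recall_score(y_true, y_pred), f1_score(y_true, y_pred))
```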
6. What is a confusion matrix?
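Answer: A confusion matrix is a table that summarizes the performance of a classification model on a set of test data with known labels by counting predicted classes against actual classes. For binary classification it has four cells: True Positives (TP), False Positives (FP), False Negatives (FN), and True Negatives (TN).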
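A minimal sketch with scikit-learn's confusion_matrix on the same kind of hypothetical labels; in scikit-learn's convention, rows are actual classes and columns are predicted classes:

```python
from sklearn.metrics import confusion_matrix

# Hypothetical binary labels (1 = positive class)
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]

# labels=[0, 1] fixes the row/column order: row = actual, column = predicted
cm = confusion_matrix(y_true, y_pred, labels=[0, 1])
tn, fp, fn, tp = cm.ravel()

print(cm)
# [[3 1]
#  [1 3]]
print("TN =", tn, "FP =", fp, "FN =", fn, "TP =", tp)
```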
7. What is the use of train_test_split in scikit-learn?
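Answer: train_test_split (from sklearn.model_selection) randomly splits arrays such as the features X and labels y into training and testing subsets, so the model can be evaluated on data it did not see during training. Common parameters: test_size (fraction held out for testing), random_state (for a reproducible split), and stratify (to preserve class proportions in both subsets).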
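A short usage sketch on a tiny synthetic dataset (10 samples, balanced binary labels, chosen only for illustration):

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(20).reshape(10, 2)  # 10 samples, 2 features
y = np.array([0, 1] * 5)          # balanced binary labels

# Hold out 20% for testing; stratify keeps the 50/50 class ratio in both parts
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)
print(X_train.shape, X_test.shape)  # (8, 2) (2, 2)
```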
8. What is cross-validation?
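Answer: Cross-validation is a technique for assessing how well a model will generalize to an independent data set. In k-fold cross-validation, the data is split into k folds; the model is trained on k-1 folds and evaluated on the remaining fold, and this is repeated k times so that each fold is used once for testing. The k scores are then averaged.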
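A sketch of 5-fold cross-validation, assuming scikit-learn and its bundled Iris dataset, with logistic regression as an arbitrary example model. The explicit loop shows the train-on-k-1 / test-on-1 procedure; cross_val_score is the one-line shortcut (for classifiers it uses stratified folds by default):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)

# Manual k-fold: train on k-1 folds, score on the held-out fold
kf = KFold(n_splits=5, shuffle=True, random_state=0)
fold_scores = []
for train_idx, test_idx in kf.split(X):
    model.fit(X[train_idx], y[train_idx])
    fold_scores.append(model.score(X[test_idx], y[test_idx]))
print("manual mean accuracy:", sum(fold_scores) / len(fold_scores))

# Equivalent shortcut
scores = cross_val_score(model, X, y, cv=5)
print("cross_val_score mean accuracy:", scores.mean())
```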
9. What is the difference between classification and regression?
Answer:
- Classification: Predicts categorical output (e.g., spam or not spam)
- Regression: Predicts continuous output (e.g., house price)
10. What is gradient descent?
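Answer: It is an iterative optimization algorithm used to minimize the loss function of a machine learning model. At each step, the parameters are moved a small amount in the direction of the negative gradient of the loss (the direction of steepest descent).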
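A minimal NumPy sketch of batch gradient descent fitting a line y = w*x + b by minimizing mean squared error. The data is hypothetical and noiseless (generated from y = 2x + 1), so w and b should approach 2 and 1:

```python
import numpy as np

# Hypothetical noiseless data from y = 2x + 1
X = np.linspace(0, 1, 50)
y = 2 * X + 1

w, b = 0.0, 0.0
lr = 0.5  # learning rate
for _ in range(2000):
    y_hat = w * X + b
    error = y_hat - y
    # Gradients of MSE = mean((y_hat - y)^2) with respect to w and b
    dw = 2 * np.mean(error * X)
    db = 2 * np.mean(error)
    # Step against the gradient
    w -= lr * dw
    b -= lr * db

print(round(w, 3), round(b, 3))  # approximately 2.0 1.0
```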
11. What is the role of the learning rate in gradient descent?
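Answer: The learning rate is the step size: it controls how much the weights are adjusted in each update, w = w - learning_rate * gradient. A value that is too large can overshoot the minimum or diverge; a value that is too small makes training very slow.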
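To illustrate the effect, here is a toy sketch running gradient descent on f(w) = w^2 (gradient 2w, minimum at w = 0) with three learning rates; the function and values are chosen only for illustration:

```python
def minimize_square(lr, steps=50, w0=5.0):
    """Gradient descent on f(w) = w**2, whose gradient is 2*w."""
    w = w0
    for _ in range(steps):
        w = w - lr * 2 * w
    return w

for lr in (0.01, 0.1, 1.1):
    # 0.01: still far from 0 (too slow); 0.1: converges; 1.1: diverges
    print(lr, minimize_square(lr))
```

Each step multiplies w by (1 - 2*lr), so any lr above 1 makes |w| grow instead of shrink.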
12. What is regularization?
Answer: A technique used to reduce overfitting by adding a penalty term on the model weights to the loss function, which discourages overly complex models (e.g., L1 and L2 regularization).
13. What is the difference between L1 and L2 regularization?
Answer:
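- L1 (Lasso): Adds the sum of the absolute values of the coefficients (lambda * sum|w|) to the loss; it can shrink some coefficients exactly to 0, effectively performing feature selection.
- L2 (Ridge): Adds the sum of the squared coefficients (lambda * sum w^2) to the loss; it shrinks coefficients toward 0 but rarely makes them exactly 0.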
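A sketch comparing scikit-learn's Lasso and Ridge on synthetic data where only the first two of five features matter; the data and alpha values are arbitrary assumptions. Lasso should set the irrelevant coefficients exactly to 0, while Ridge only makes them small:

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
# Hypothetical target: depends only on features 0 and 1, plus small noise
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.1, size=200)

lasso = Lasso(alpha=0.1).fit(X, y)  # L1 penalty
ridge = Ridge(alpha=1.0).fit(X, y)  # L2 penalty

print("Lasso:", np.round(lasso.coef_, 3))
print("Ridge:", np.round(ridge.coef_, 3))
```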
14. What is PCA?
Answer: Principal Component Analysis is a dimensionality reduction technique that transforms a large set of possibly correlated variables into a smaller set of uncorrelated variables (principal components), ordered so that the first few capture most of the variance in the data.
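A sketch reducing the 4-feature Iris dataset (bundled with scikit-learn) to 2 principal components. Standardizing first is a common choice here, since PCA is sensitive to feature scale; explained_variance_ratio_ reports how much variance each component keeps:

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X, _ = load_iris(return_X_y=True)             # 150 samples, 4 features
X_scaled = StandardScaler().fit_transform(X)  # zero mean, unit variance per feature

pca = PCA(n_components=2)
X_2d = pca.fit_transform(X_scaled)

print(X_2d.shape)                    # (150, 2)
print(pca.explained_variance_ratio_)  # variance fraction per component
```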
15. What is the KNN algorithm?
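Answer: K-Nearest Neighbors is a simple, instance-based (lazy) learning algorithm that classifies a data point by the majority label among its k nearest training points, typically measured by Euclidean distance. For regression, it averages the neighbors' values instead.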
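A from-scratch sketch of KNN classification with Euclidean distance and majority voting; knn_predict is a hypothetical helper name and the two-cluster data is made up:

```python
from collections import Counter

import numpy as np

def knn_predict(X_train, y_train, x, k=3):
    """Predict the label of x by majority vote of its k nearest training points."""
    distances = np.linalg.norm(X_train - x, axis=1)  # Euclidean distance to each point
    nearest = np.argsort(distances)[:k]              # indices of the k closest points
    votes = Counter(y_train[nearest].tolist())
    return votes.most_common(1)[0][0]

X_train = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
                    [5.0, 5.0], [5.2, 4.9], [4.8, 5.1]])
y_train = np.array([0, 0, 0, 1, 1, 1])

print(knn_predict(X_train, y_train, np.array([1.1, 1.0])))  # 0
print(knn_predict(X_train, y_train, np.array([5.0, 4.8])))  # 1
```

In a lab, sklearn.neighbors.KNeighborsClassifier(n_neighbors=k) provides the same behavior with faster neighbor search.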
16. What is a Decision Tree?
Answer: A decision tree is a flowchart-like structure used for classification and regression, where
internal nodes represent tests on features, branches represent outcomes, and leaves represent
classes or values.
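A sketch training a shallow decision tree on the bundled Iris dataset and printing its learned rules with export_text; max_depth=2 and the entropy criterion are illustrative choices that keep the printed tree small:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
tree = DecisionTreeClassifier(criterion="entropy", max_depth=2, random_state=0)
tree.fit(iris.data, iris.target)

# Internal nodes print as feature tests, leaves as predicted classes
rules = export_text(tree, feature_names=list(iris.feature_names))
print(rules)
print("training accuracy:", tree.score(iris.data, iris.target))
```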
17. What is entropy in a decision tree?
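Answer: Entropy measures the impurity or disorder of the class labels in a dataset: Entropy = -sum(p_i * log2(p_i)), where p_i is the proportion of class i. Lower entropy means higher purity (entropy is 0 for a node containing a single class). Decision tree algorithms such as ID3 choose the split with the highest information gain, i.e. the largest reduction in entropy.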
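A small sketch of entropy and information gain over lists of class labels; entropy and information_gain are hypothetical helper names. Writing p * log2(1/p) is equivalent to -p * log2(p) and avoids printing -0.0 for a pure node:

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy = -sum(p_i * log2(p_i)) over the class proportions p_i."""
    n = len(labels)
    return sum((c / n) * math.log2(n / c) for c in Counter(labels).values())

def information_gain(parent, left, right):
    """Entropy of the parent minus the size-weighted entropy of the two children."""
    n = len(parent)
    weighted = len(left) / n * entropy(left) + len(right) / n * entropy(right)
    return entropy(parent) - weighted

print(entropy(["yes", "yes", "no", "no"]))    # 1.0 (maximally mixed, two classes)
print(entropy(["yes", "yes", "yes", "yes"]))  # 0.0 (pure)
print(information_gain(["yes", "yes", "no", "no"], ["yes", "yes"], ["no", "no"]))  # 1.0
```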
18. What is a random forest?
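Answer: An ensemble learning method that builds multiple decision trees, each trained on a bootstrap sample of the data and considering a random subset of features at each split, and combines their predictions (majority vote for classification, averaging for regression) to get a more accurate and stable prediction.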
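A sketch using scikit-learn's RandomForestClassifier on the bundled breast cancer dataset; n_estimators sets the number of trees and max_features="sqrt" sets the size of the random feature subset tried at each split. The parameter values are illustrative, not tuned:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

forest = RandomForestClassifier(n_estimators=100, max_features="sqrt", random_state=0)
forest.fit(X_train, y_train)

print("number of trees:", len(forest.estimators_))
print("test accuracy:", forest.score(X_test, y_test))
```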
19. What is SVM?
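Answer: Support Vector Machine is a supervised learning model used for classification and regression. For classification, it finds the hyperplane that separates the classes with the maximum margin; the training points closest to this hyperplane are called support vectors.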
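A sketch fitting a linear SVM (scikit-learn's SVC) on a tiny hypothetical two-class dataset and inspecting the support vectors and the learned hyperplane w.x + b = 0:

```python
import numpy as np
from sklearn.svm import SVC

X = np.array([[1, 2], [2, 3], [3, 3], [6, 5], [7, 8], [8, 6]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = SVC(kernel="linear", C=1.0).fit(X, y)

print("support vectors:\n", clf.support_vectors_)
print("w =", clf.coef_, "b =", clf.intercept_)  # coef_ exists only for the linear kernel
print(clf.predict([[2, 2], [7, 7]]))            # [0 1]
```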
20. What are kernels in SVM?
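Answer: Kernels are functions that compute the similarity (inner product) between two data points as if they had been mapped into a higher-dimensional feature space, without computing that mapping explicitly (the kernel trick). This lets an SVM find a linear separation in the higher-dimensional space, which corresponds to a non-linear decision boundary in the original space. Common kernels: linear, polynomial, RBF.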
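A two-part sketch. First, it checks the kernel trick numerically for the degree-2 polynomial kernel k(x, z) = (x.z)^2, whose explicit feature map in 2-D is phi(x) = (x1^2, sqrt(2)*x1*x2, x2^2). Second, it compares a linear and an RBF SVM on concentric circles (make_circles), a dataset no straight line can separate; the dataset parameters are illustrative:

```python
import numpy as np
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Part 1: the kernel value equals an inner product in the mapped space
def phi(v):
    x1, x2 = v
    return np.array([x1 ** 2, np.sqrt(2) * x1 * x2, x2 ** 2])

x = np.array([1.0, 2.0])
z = np.array([3.0, 4.0])
k = (x @ z) ** 2           # computed in the original 2-D space
explicit = phi(x) @ phi(z)  # computed in the 3-D feature space
print(k, explicit)          # both equal 121 (up to floating-point rounding)

# Part 2: linear vs RBF kernel on non-linearly separable data
X, y = make_circles(n_samples=300, factor=0.3, noise=0.05, random_state=0)
linear_acc = SVC(kernel="linear").fit(X, y).score(X, y)
rbf_acc = SVC(kernel="rbf", gamma="scale").fit(X, y).score(X, y)
print("linear:", linear_acc, "rbf:", rbf_acc)
```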
21. What is a neural network?
Answer: A neural network is a computational model inspired by the human brain, consisting of layers
of neurons that learn to perform tasks by considering examples.
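A sketch of a small feed-forward network using scikit-learn's MLPClassifier with two hidden layers of 16 ReLU neurons, trained on the synthetic two-moons dataset; the architecture and dataset are illustrative assumptions:

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = make_moons(n_samples=500, noise=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

mlp = MLPClassifier(hidden_layer_sizes=(16, 16), activation="relu",
                    max_iter=2000, random_state=0)
mlp.fit(X_train, y_train)

print("layers (input + hidden + output):", mlp.n_layers_)  # 4
print("test accuracy:", mlp.score(X_test, y_test))
```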
22. What is backpropagation?
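Answer: A training algorithm for neural networks that applies the chain rule to calculate the gradient of the loss function with respect to every weight, propagating the error backward from the output layer toward the input layer. An optimizer such as gradient descent then uses these gradients to update the weights and minimize the error.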
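A from-scratch NumPy sketch of backpropagation for a one-hidden-layer network (4 tanh units, sigmoid output, binary cross-entropy) on the XOR toy problem. It first verifies one backpropagated gradient against a finite-difference estimate, then uses the gradients for plain gradient descent. The architecture, initialization, and learning rate are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# XOR toy dataset: 4 samples, 2 features
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

W1 = rng.normal(size=(2, 4))
b1 = np.zeros((1, 4))
W2 = rng.normal(size=(4, 1))
b2 = np.zeros((1, 1))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(W1, b1, W2, b2):
    h = np.tanh(X @ W1 + b1)     # hidden activations
    p = sigmoid(h @ W2 + b2)     # output probabilities
    return h, p

def loss(p):
    # Mean binary cross-entropy
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

def backward(h, p, W2):
    n = X.shape[0]
    dz2 = (p - y) / n            # dL/d(output logit) for sigmoid + cross-entropy
    dW2 = h.T @ dz2
    db2 = dz2.sum(axis=0, keepdims=True)
    dh = dz2 @ W2.T              # chain rule: error sent back to the hidden layer
    dz1 = dh * (1 - h ** 2)      # tanh'(z) = 1 - tanh(z)^2
    dW1 = X.T @ dz1
    db1 = dz1.sum(axis=0, keepdims=True)
    return dW1, db1, dW2, db2

# Gradient check: compare one backprop gradient with a central difference
h, p = forward(W1, b1, W2, b2)
analytic = backward(h, p, W2)[0][0, 0]
eps = 1e-6
W1_plus, W1_minus = W1.copy(), W1.copy()
W1_plus[0, 0] += eps
W1_minus[0, 0] -= eps
numeric = (loss(forward(W1_plus, b1, W2, b2)[1])
           - loss(forward(W1_minus, b1, W2, b2)[1])) / (2 * eps)
print("backprop:", analytic, "numeric:", numeric)

# Training: backprop computes gradients, gradient descent updates weights
initial_loss = loss(p)
lr = 0.5
for _ in range(5000):
    h, p = forward(W1, b1, W2, b2)
    dW1, db1, dW2, db2 = backward(h, p, W2)
    W1 -= lr * dW1
    b1 -= lr * db1
    W2 -= lr * dW2
    b2 -= lr * db2
final_loss = loss(forward(W1, b1, W2, b2)[1])
print("loss before:", initial_loss, "after:", final_loss)
```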
23. What are activation functions?
Answer: Functions applied to each neuron's output to introduce non-linearity; without them, a stack of layers would collapse into a single linear transformation. Examples: sigmoid, tanh, ReLU.
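A NumPy sketch of the three activation functions, followed by a check that two stacked linear layers with no activation between them reduce to one linear layer (W2 @ (W1 @ x) equals (W2 @ W1) @ x); the matrix shapes are arbitrary:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))  # output in (0, 1)

def tanh(z):
    return np.tanh(z)                 # output in (-1, 1)

def relu(z):
    return np.maximum(0.0, z)         # 0 for negative inputs, identity otherwise

z = np.array([-2.0, 0.0, 2.0])
print(sigmoid(z))
print(tanh(z))
print(relu(z))  # [0. 0. 2.]

# Without an activation, two linear layers are equivalent to one
rng = np.random.default_rng(0)
W1 = rng.normal(size=(3, 2))
W2 = rng.normal(size=(1, 3))
x = rng.normal(size=2)
two_layers = W2 @ (W1 @ x)
one_layer = (W2 @ W1) @ x
print(np.allclose(two_layers, one_layer))  # True
```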
24. What are epochs, batch size, and iterations?
Answer:
- Epoch: One complete pass through the training dataset.
- Batch Size: Number of samples processed before the model is updated.
- Iteration: One update of the model parameters, i.e. one batch processed. Iterations per epoch = ceil(number of training samples / batch size).
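A worked example of the arithmetic with hypothetical numbers (1000 samples, batch size 32, 10 epochs), assuming the last, smaller batch is kept rather than dropped:

```python
import math

n_samples = 1000  # hypothetical dataset size
batch_size = 32
epochs = 10

# 1000 / 32 = 31.25, so the 32nd batch holds the remaining 8 samples
iterations_per_epoch = math.ceil(n_samples / batch_size)
total_iterations = iterations_per_epoch * epochs
print(iterations_per_epoch, total_iterations)  # 32 320
```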
25. Commonly used libraries in ML labs and their purposes:
- NumPy: Core library for numerical computations and array handling.
- Pandas: Data analysis and manipulation of structured (tabular) data.
- Matplotlib: Plotting library for static, interactive, and animated visualizations.
- Seaborn: Statistical data visualization built on top of Matplotlib.
- scikit-learn: Classical machine learning algorithms (classification, regression, clustering), preprocessing, model selection, and evaluation.
- Keras: High-level neural network API, written in Python.
- TensorFlow: End-to-end open-source platform for machine learning and deep learning.
- PyTorch: Deep learning framework for fast and flexible experimentation.
- MLflow: Tool to manage the ML lifecycle, including experiment tracking and deployment.
- OpenCV: Library for computer vision and image processing.
- XGBoost/LightGBM: Libraries for gradient-boosted decision tree ensembles.
- Statsmodels: Statistical modeling, hypothesis testing, and data exploration.
- Joblib: Efficient saving and loading of large Python objects such as trained ML models.
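A sketch of saving and reloading a trained scikit-learn model with joblib.dump and joblib.load; the temporary directory and the file name model.joblib are hypothetical choices for illustration:

```python
import os
import tempfile

import joblib
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000).fit(X, y)

path = os.path.join(tempfile.mkdtemp(), "model.joblib")  # hypothetical file name
joblib.dump(model, path)     # serialize the trained model to disk
restored = joblib.load(path)  # load it back

print((restored.predict(X) == model.predict(X)).all())  # True
```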