AI & ML Interview Preparation
4. What are some of the techniques used for sampling? What is the main advantage of sampling?
Sampling techniques fall into two broad families: probability sampling, which includes simple random, stratified, cluster, and systematic sampling, and non-probability sampling, which includes convenience, quota, and snowball sampling. The main advantage of sampling is that reliable conclusions about a large population can be drawn from a much smaller, manageable subset of the data, saving time, cost, and computation.
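A minimal sketch of simple random and stratified sampling, assuming pandas and scikit-learn are available; the DataFrame and the 10% sample size are illustrative choices.

import pandas as pd
from sklearn.model_selection import train_test_split

# Illustrative data: 80 rows of group A, 20 rows of group B.
df = pd.DataFrame({"value": range(100), "group": ["A"] * 80 + ["B"] * 20})

# Simple random sampling: every row has an equal chance of selection.
simple = df.sample(n=10, random_state=0)

# Stratified sampling: preserve the 80/20 group proportions in the sample.
strat, _ = train_test_split(df, train_size=0.1, stratify=df["group"], random_state=0)

print(simple["group"].value_counts())
print(strat["group"].value_counts())  # 8 rows of A, 2 rows of B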
Overfitting: The model performs well only on the sample training data. When new data is given as input, it fails to generalize and produces poor predictions. This condition arises from low bias and high variance in the model. Decision trees are particularly prone to overfitting.
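As a minimal sketch of this behavior, the following scikit-learn snippet fits an unpruned decision tree; the synthetic dataset and its 10% label noise are illustrative assumptions.

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Illustrative noisy dataset: 10% of labels are flipped.
X, y = make_classification(n_samples=500, n_features=20, flip_y=0.1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# An unpruned tree memorizes the training data (low bias, high variance).
tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
print("train accuracy:", tree.score(X_train, y_train))  # close to 1.0
print("test accuracy:", tree.score(X_test, y_test))     # noticeably lower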
7. What is Cross-Validation?
Cross-validation is a technique for evaluating how well a model generalizes to unseen data. The dataset is split into several folds; the model is trained on all but one fold and validated on the held-out fold, and the process is repeated so that every fold serves as validation data exactly once (k-fold cross-validation). The average score across folds gives a more reliable performance estimate than a single train/test split.
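A minimal sketch of 5-fold cross-validation with scikit-learn; the iris dataset and the logistic regression estimator are illustrative choices.

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)

# One accuracy score per fold, then the average across all five folds.
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5)
print(scores, scores.mean())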
Covariance and correlation are both used for establishing a relationship and dependency between any two random variables, but they differ as follows: covariance depends on the units of the variables, whereas correlation is the normalized, dimensionless version bounded between -1 and 1:
covarianceXY = E[(X - μX)(Y - μY)]
correlationXY = E[(X - μX)(Y - μY)] / (σXσY) = covarianceXY / (σXσY), so that -1 ≤ correlationXY ≤ 1
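A quick numerical check of these definitions with NumPy; the sample data are illustrative.

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.0, 9.8])

cov_xy = np.cov(x, y)[0, 1]        # sample covariance
corr_xy = np.corrcoef(x, y)[0, 1]  # bounded in [-1, 1]

# Correlation equals covariance divided by the product of standard deviations.
print(corr_xy, cov_xy / (x.std(ddof=1) * y.std(ddof=1)))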
For example, a plot of a selected sample against the whole population can show that the sample does not entirely represent the population. This helps us question whether we have selected the right data for analysis.
11. Why is data cleaning crucial? How do you clean the data?
Data cleaning is crucial because a model is only as good as the data it is trained on. It identifies and fixes structural issues in the data, removes duplicates, and maintains the consistency of the dataset.
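A minimal sketch of common cleaning steps with pandas; the DataFrame and its defects are illustrative.

import pandas as pd

df = pd.DataFrame({"city": ["NYC", "NYC", "LA ", "SF"],
                   "price": ["100", "100", "250", "80"]})

df = df.drop_duplicates()              # remove exact duplicate rows
df["city"] = df["city"].str.strip()    # fix structural issues such as stray whitespace
df["price"] = df["price"].astype(int)  # enforce a consistent numeric dtype
print(df)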
12. What are the different feature selection methods?
A) Filter Methods: Features are ranked with statistical measures (for example correlation with the target, chi-square, or ANOVA scores) independently of any model, and the top-scoring features are kept.
B) Wrapper Methods: Subsets of features are evaluated by actually training a model on them.
Forward Selection: Features are added one by one, keeping each addition only if it improves the model.
Backward Selection: The model starts with all features, and the least useful ones are eliminated one by one while checking which subset works better.
C) Embedded Methods: Feature selection happens as part of model training itself, as with L1 (LASSO) regularization or the feature importances of tree-based models. One example of each family is sketched below.
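A minimal sketch of the three families, assuming scikit-learn; the breast-cancer dataset, k=5, and the particular estimators are illustrative choices.

from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import RFE, SelectFromModel, SelectKBest, f_classif
from sklearn.linear_model import Lasso, LogisticRegression

X, y = load_breast_cancer(return_X_y=True)

# Filter: score each feature independently with an ANOVA F-test, keep the top 5.
X_filter = SelectKBest(score_func=f_classif, k=5).fit_transform(X, y)

# Wrapper: recursive feature elimination around a model (backward selection).
X_wrapper = RFE(LogisticRegression(max_iter=5000), n_features_to_select=5).fit_transform(X, y)

# Embedded: L1 regularization zeroes out weak features during training.
X_embedded = SelectFromModel(Lasso(alpha=0.01, max_iter=10000)).fit_transform(X, y)

print(X_filter.shape, X_wrapper.shape, X_embedded.shape)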
13. How will you treat missing values during data analysis?
The impact of missing values can be known after identifying what kind of variables have the missing values.
If the data analyst finds any pattern in these missing values, then there is a chance of finding meaningful insights.
If no pattern is found, these missing values can either be ignored (the affected rows dropped) or replaced with default values such as the mean, minimum, maximum, or median of the variable, as sketched below.
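A minimal sketch of these treatments with pandas; the DataFrame is illustrative.

import numpy as np
import pandas as pd

df = pd.DataFrame({"age": [25, np.nan, 32, 40],
                   "salary": [50000, 60000, np.nan, 80000]})

print(df.isna().sum())                               # count missing values per column
df_drop = df.dropna()                                # ignore rows with missing values
df_mean = df.fillna(df.mean(numeric_only=True))      # replace with column means
df_median = df.fillna(df.median(numeric_only=True))  # or with column medians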
14. What is the difference between univariate and bivariate analysis?
Univariate Analysis: This analysis deals with summarizing and describing only one variable at a time.
Bivariate Analysis: This analysis deals with the statistical study of two variables at a given time, examining the relationship between them.
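A minimal sketch of both with pandas; the data are illustrative.

import pandas as pd

df = pd.DataFrame({"height": [150, 160, 170, 180],
                   "weight": [50, 60, 70, 85]})

print(df["height"].describe())           # univariate: summary statistics of one variable
print(df["height"].corr(df["weight"]))   # bivariate: relationship between two variables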
15. What is the difference between the Test set and validation set?
The test set is used to evaluate the performance of the trained model; it measures the model's predictive power on data it has never seen. The validation set is a part of the training data that is held out to tune hyperparameters and to detect overfitting before the final evaluation.
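A minimal sketch of a train/validation/test split with scikit-learn; the 60/20/20 proportions and the iris dataset are illustrative choices.

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# First split off 40%, then divide it evenly into validation and test sets.
X_train, X_tmp, y_train, y_tmp = train_test_split(X, y, test_size=0.4, random_state=42)
X_val, X_test, y_val, y_test = train_test_split(X_tmp, y_tmp, test_size=0.5, random_state=42)

# Train on X_train, tune hyperparameters against X_val, report final results on X_test.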
Kernel functions are generalized dot-product functions used for computing the dot product of vectors x and y in a high-dimensional feature space. The kernel trick solves a non-linear problem with a linear classifier by implicitly mapping linearly inseparable data into a higher-dimensional space where it becomes separable.
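For example, consider data arranged in two concentric circles: no straight line separates the classes in the original 2-D space, but an RBF kernel separates them easily. A minimal sketch with scikit-learn follows; make_circles and the RBF kernel are illustrative choices.

from sklearn.datasets import make_circles
from sklearn.svm import SVC

X, y = make_circles(n_samples=500, factor=0.3, noise=0.05, random_state=0)

linear_clf = SVC(kernel="linear").fit(X, y)  # no linear boundary exists
rbf_clf = SVC(kernel="rbf").fit(X, y)        # implicit high-dimensional mapping

print("linear kernel accuracy:", linear_clf.score(X, y))  # poor
print("RBF kernel accuracy:", rbf_clf.score(X, y))        # near perfect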
Before citing instances, let us understand what false positives and false negatives are.
False Positives are those cases that were wrongly identified as an event even though they were not. They are called Type I errors.
False Negatives are those cases that were wrongly identified as non-events despite being an event. They are called Type II errors.
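A minimal sketch relating both error types to a confusion matrix with scikit-learn; the labels are illustrative.

from sklearn.metrics import confusion_matrix

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 1, 1, 0, 0, 1, 1, 0]

# For binary labels, ravel() yields (tn, fp, fn, tp).
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print("false positives (Type I errors):", fp)
print("false negatives (Type II errors):", fn)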
20. Give one example where false positives and false negatives are equally important.
Banking: Lending loans is the main source of income for banks, but if the repayment rate is poor, there is a risk of huge losses instead of profits. Giving out loans is therefore a gamble: approving a loan for a customer who defaults (a false positive) causes a direct loss, while rejecting a reliable customer (a false negative) loses good business. Since banks can neither afford to acquire bad customers nor risk losing good ones, this is a classic example where false positives and false negatives are equally important.
21. What are the advantages of dimensionality reduction?
It reduces the storage space and the time needed for model execution.
It removes the issue of multi-collinearity, thereby improving the interpretation of the ML model's parameters.
It makes it easier to visualize data when the dimensions are reduced.
It avoids the curse of dimensionality (a minimal sketch follows this list).
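A minimal sketch of dimensionality reduction with PCA in scikit-learn; the digits dataset and the choice of 2 components are illustrative.

from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

X, _ = load_digits(return_X_y=True)  # 64 features per image

pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)     # 2 features, e.g. for visualization

print(X.shape, "->", X_reduced.shape)
print("variance explained:", pca.explained_variance_ratio_.sum())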
22. How is the grid search tuning strategy different from the random search tuning strategy?
Tuning strategies are used to find the right set of hyperparameters. Hyperparameters are model-specific properties that are fixed before the model is trained on the dataset, rather than learned from the data. Both grid search and random search are optimization techniques for finding efficient hyperparameters.
Grid Search:
Here, every combination of a preset list of hyperparameters is
tried out and evaluated.
The search pattern is similar to searching in a grid, where the values are arranged in a matrix and every cell is visited. Each parameter set is tried out and its accuracy is tracked; after every combination has been tried, the model with the highest accuracy is chosen as the best one.
The main drawback is that the technique suffers as the number of hyperparameters increases: the number of evaluations grows exponentially with each additional hyperparameter. This is the curse of dimensionality in a grid search.
Random Search:
In this technique, random combinations of hyperparameter values are tried and evaluated to find the best solution. To optimize the search, the objective is tested at random configurations across the parameter space.
In this method, the chances of finding optimal parameters are increased because the pattern followed is random: the model may land on well-optimized parameters without exhaustively evaluating every combination.
This search works best when the number of dimensions is low, as it then takes less time to find the right set. A sketch contrasting both strategies follows.
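A minimal sketch of the two strategies with scikit-learn; the SVC estimator, the parameter ranges, and the budget of 9 configurations are illustrative choices.

from scipy.stats import loguniform
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Grid search: every combination in the preset grid is evaluated (3 x 3 = 9 fits per CV split).
grid = GridSearchCV(SVC(), {"C": [0.1, 1, 10], "gamma": [0.01, 0.1, 1]}, cv=5)
grid.fit(X, y)

# Random search: a fixed budget of configurations drawn at random from distributions.
rand = RandomizedSearchCV(
    SVC(),
    {"C": loguniform(1e-2, 1e2), "gamma": loguniform(1e-3, 1e1)},
    n_iter=9, cv=5, random_state=0,
)
rand.fit(X, y)

print("grid best:", grid.best_params_)
print("random best:", rand.best_params_)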