
SYNAPSE - The AI & ML Club
Machine Learning Interview Preparation Q&A

1. What is Machine Learning? What are its applications?

Machine learning is a field of study that allows machines to learn and improve themselves from data without being explicitly programmed.

Here are some applications of machine learning -

- Analyzing images of products on a production line to automatically classify them
- Automatically classifying news articles as fake or real
- Automatically flagging offensive comments on discussion forums
- Creating a chatbot or a personal assistant

2. What are bias and variance in the context of machine learning?

Bias: Bias is the average difference between the model's average prediction and the true values. In other words, bias reflects the inability of a model to learn and capture the relationships in the training data. High bias can lead to underfitting.

Variance: Variance is the variability of a model's predictions across different subsets of the training data. A model with high variance pays too much attention to the training data, capturing noise along with the underlying patterns. High variance can lead to overfitting.

3. What is the bias-variance tradeoff?


The bias-variance trade-off is a fundamental concept in machine learning that describes the balance between a model's complexity and its predictive performance. Understanding this trade-off is crucial for judging whether a model will generalize well to unseen data: decreasing bias (by making the model more complex) tends to increase variance, and vice versa.
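For example, the trade-off can be observed by varying the complexity of a model and comparing training and test error. The sketch below is a minimal, hypothetical illustration using scikit-learn; the synthetic data, decision-tree model and depth values are assumptions chosen only for demonstration.

```python
# Hypothetical sketch: observing the bias-variance trade-off by varying model complexity.
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.RandomState(0)
X = np.sort(rng.uniform(0, 6, 200)).reshape(-1, 1)
y = np.sin(X).ravel() + rng.normal(scale=0.3, size=200)  # noisy non-linear data

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for depth in [1, 3, 10]:  # shallow = high bias, very deep = high variance
    model = DecisionTreeRegressor(max_depth=depth, random_state=0).fit(X_train, y_train)
    train_err = mean_squared_error(y_train, model.predict(X_train))
    test_err = mean_squared_error(y_test, model.predict(X_test))
    print(f"max_depth={depth}: train MSE={train_err:.3f}, test MSE={test_err:.3f}")
```

A very shallow tree typically shows high error on both sets (underfitting), while a very deep tree fits the training set almost perfectly but does worse on the test set (overfitting).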

4. What are some of the techniques used for sampling? What is the
main advantage of sampling?

Based on the use of statistics, there are two main categories of sampling techniques:

- Probability sampling techniques: simple random sampling, stratified sampling, clustered sampling.
- Non-probability sampling techniques: quota sampling, convenience sampling, snowball sampling, etc.

Data analysis cannot always be performed on the whole volume of data at once, especially with larger datasets. The main advantage of sampling is that it lets us take a smaller set of data that represents the whole population and perform the analysis on that. While doing this, it is essential to draw the sample carefully so that it truly represents the entire dataset.
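As a small, hypothetical sketch of two probability sampling techniques mentioned above, using pandas and scikit-learn (the column names and group sizes are made up for illustration):

```python
# Hypothetical sketch: simple random sampling vs. stratified sampling.
import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.DataFrame({
    "income": range(1000),
    "segment": ["A"] * 900 + ["B"] * 100,   # imbalanced strata
})

# Simple random sampling: every row has the same chance of selection.
simple_sample = df.sample(n=100, random_state=42)

# Stratified sampling: preserve the A/B proportions in the sample.
strat_sample, _ = train_test_split(
    df, train_size=100, stratify=df["segment"], random_state=42
)

print(simple_sample["segment"].value_counts(normalize=True))
print(strat_sample["segment"].value_counts(normalize=True))
```

The stratified sample keeps the 90/10 proportion of the population exactly, whereas the simple random sample only approximates it.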

5. List down the conditions for Overfitting and Underfitting.

Overfitting: The model performs well only on the sample training data. When new data is given as input, it fails to generalize and produces poor results. This occurs due to low bias and high variance in the model. Decision trees are more prone to overfitting.

Underfitting: Here, the model is so simple that it is not able to identify the correct relationships in the data, and hence it does not perform well even on the training data. This can happen due to high bias and low variance. Linear regression is more prone to underfitting.

6. What are Eigenvectors and Eigenvalues?

Eigenvectors of a square matrix are non-zero vectors whose direction is left unchanged when the matrix is applied to them; they are usually normalized to unit length (magnitude 1) and are also called right eigenvectors. Eigenvalues are the coefficients by which the corresponding eigenvectors are scaled, giving them different lengths or magnitudes.

A matrix can be decomposed into its eigenvectors and eigenvalues, a process called eigendecomposition. These are used in machine learning methods such as PCA (Principal Component Analysis) to extract valuable insights from a given matrix.
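For instance, NumPy can perform this eigendecomposition directly; a minimal sketch with an arbitrary 2x2 matrix:

```python
# Hypothetical sketch: eigendecomposition of a small matrix with NumPy.
import numpy as np

A = np.array([[4.0, 2.0],
              [1.0, 3.0]])

eigenvalues, eigenvectors = np.linalg.eig(A)  # columns of `eigenvectors` are the eigenvectors
print("eigenvalues:", eigenvalues)
print("eigenvectors (as columns):\n", eigenvectors)

# Verify A v = lambda v for the first eigenpair.
v, lam = eigenvectors[:, 0], eigenvalues[0]
print(np.allclose(A @ v, lam * v))  # True
```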

7. What is Cross-Validation?

Cross-validation is a statistical technique used to estimate how well a model generalizes to unknown data. The training data is split into several groups (folds), and the model is trained and validated against these groups in rotation, so that every sample is used for both training and validation across the different runs.
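A minimal sketch of k-fold cross-validation with scikit-learn; the dataset and model are placeholders chosen for illustration:

```python
# Hypothetical sketch: 5-fold cross-validation with scikit-learn.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)

scores = cross_val_score(model, X, y, cv=5)  # train/validate on 5 rotating splits
print("fold accuracies:", scores)
print("mean accuracy:", scores.mean())
```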

8. What are the differences between correlation and covariance?

Although these two terms are used for establishing a relationship and
dependency between any two random variables, the following are the
differences between them:

- Correlation: This measures the strength and direction of the relationship between two variables; it quantifies how strongly the variables are related on a standardized scale.
- Covariance: This represents the extent to which the variables change together. It captures the systematic relationship between a pair of variables, where changes in one are associated with changes in the other.

Mathematically, consider two random variables X and Y with means μX and μY, standard deviations σX and σY, and let E denote the expected value operator. Then:

covariance(X, Y) = E[(X − μX)(Y − μY)]
correlation(X, Y) = E[(X − μX)(Y − μY)] / (σX σY)

so that correlation(X, Y) = covariance(X, Y) / (σX σY).

Based on the above formulas, we can deduce that correlation is dimensionless, whereas covariance is expressed in units obtained from multiplying the units of the two variables.
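These quantities can be computed directly with NumPy; a small sketch with made-up data:

```python
# Hypothetical sketch: covariance vs. correlation with NumPy.
import numpy as np

rng = np.random.RandomState(0)
x = rng.normal(size=500)
y = 2.0 * x + rng.normal(scale=0.5, size=500)   # y depends on x

cov_xy = np.cov(x, y)[0, 1]          # units: units(x) * units(y)
corr_xy = np.corrcoef(x, y)[0, 1]    # dimensionless, in [-1, 1]

print("covariance:", cov_xy)
print("correlation:", corr_xy)
print("check:", np.isclose(corr_xy, cov_xy / (x.std(ddof=1) * y.std(ddof=1))))
```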


9. How do you approach solving any data analytics based project?

Generally, we follow the steps below:

- First, thoroughly understand the business requirement/problem.
- Next, explore the given data and analyze it carefully. If any data is missing, get the requirements clarified with the business.
- Perform data cleanup and preparation, which is then used for modeling. Here, missing values are handled and variables are transformed.
- Run the model against the data, build meaningful visualizations and analyze the results to get meaningful insights.
- Release the model implementation and track the results and performance over a specified period to analyze its usefulness.
- Perform cross-validation of the model.

10. What is selection bias and why does it matter?

Selection bias occurs when the part of the dataset picked for analysis is not chosen through proper randomization. The presence of this bias indicates that the sample analyzed does not represent the whole population that was meant to be analyzed.

For example, if the sample we select does not entirely represent the whole population we have, conclusions drawn from it may not generalize. Being aware of selection bias helps us question whether we have selected the right data for analysis or not.
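A tiny, hypothetical simulation of selection bias: sampling only from one subgroup skews the estimate of the population mean. The subgroup sizes and income values below are made up purely for illustration.

```python
# Hypothetical sketch: selection bias when the sample is not randomly drawn.
import numpy as np

rng = np.random.RandomState(0)
young = rng.normal(loc=30_000, scale=5_000, size=8_000)   # incomes of one subgroup
old = rng.normal(loc=60_000, scale=8_000, size=2_000)     # incomes of another subgroup
population = np.concatenate([young, old])

random_sample = rng.choice(population, size=500, replace=False)  # proper randomization
biased_sample = rng.choice(young, size=500, replace=False)       # only one subgroup selected

print("population mean:", population.mean())
print("random sample mean:", random_sample.mean())
print("biased sample mean:", biased_sample.mean())   # noticeably off
```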

11. Why is data cleaning crucial? How do you clean the data?

To gather proper insights when running an algorithm on any data, it is essential to have correct and clean data that contains only relevant information. Dirty data most often results in poor or incorrect insights and predictions, which can have damaging effects.

For example, suppose we are launching a big campaign to market a product, and our data analysis tells us to target a product that in reality has no demand. If the campaign is launched, it is bound to fail, resulting in a loss of the company's revenue. This is where the importance of proper and clean data comes into the picture.

- Cleaning data coming from different sources helps with data transformation and produces data that data scientists can actually work with.
- Properly cleaned data increases the accuracy of the model and leads to very good predictions.

- If the dataset is very large, it becomes cumbersome to work with directly. The data cleanup step can take a large share of project time (often cited as around 80%) when the data is huge, and it should not be mixed in with running the model. Cleaning the data before running the model therefore results in increased speed and efficiency of the modeling step.
- Data cleaning helps to identify and fix any structural issues in the data. It also helps in removing duplicates and maintaining the consistency of the data.
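A short, hypothetical pandas sketch of typical cleaning steps; the column names, values and rules are assumptions chosen only to illustrate the ideas above:

```python
# Hypothetical sketch: common data-cleaning steps with pandas.
import pandas as pd
import numpy as np

df = pd.DataFrame({
    "product": ["tv", "tv", "phone", None, "laptop"],
    "price": [300, 300, -50, 800, np.nan],
})

df = df.drop_duplicates()                               # remove exact duplicate rows
df = df.dropna(subset=["product"])                      # drop rows missing a key field
df.loc[df["price"] < 0, "price"] = np.nan               # flag structurally invalid values
df["price"] = df["price"].fillna(df["price"].median())  # impute remaining gaps

print(df)
```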

12. What are the available feature selection methods for selecting the right variables for building efficient predictive models?

When using a dataset in data science or machine learning, it often happens that not all the variables are necessary or useful for building a model. Smart feature selection methods are needed to avoid redundant features and increase the efficiency of the model. The three main categories of feature selection methods are:

A) Filter Methods:

- These methods rely only on the intrinsic properties of features, measured via univariate statistics rather than cross-validated performance. They are straightforward, generally faster and require fewer computational resources than wrapper methods.
- Common filter methods include the Chi-Square test, Fisher's Score, Correlation Coefficient, Variance Threshold, Mean Absolute Difference (MAD) and Dispersion Ratios.
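A brief sketch of two of the filter methods listed above, using scikit-learn; the dataset and thresholds are placeholders:

```python
# Hypothetical sketch: filter-based feature selection (variance threshold + chi-square).
from sklearn.datasets import load_iris
from sklearn.feature_selection import VarianceThreshold, SelectKBest, chi2

X, y = load_iris(return_X_y=True)

X_var = VarianceThreshold(threshold=0.2).fit_transform(X)       # drop near-constant features
X_chi2 = SelectKBest(score_func=chi2, k=2).fit_transform(X, y)  # keep 2 best by chi-square

print("original shape:", X.shape)
print("after variance threshold:", X_var.shape)
print("after chi-square selection:", X_chi2.shape)
```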

B) Wrapper Methods:

- These methods search greedily over possible feature subsets and assess the quality of each subset by training and evaluating a classifier with those features.
- The selection technique is built around the machine learning algorithm that the given dataset needs to fit.

There are three common types of wrapper methods:

- Forward Selection: Features are added one at a time, and new features keep being added until a good fit is obtained.
- Backward Selection: All features are included at the start, and the least useful ones are eliminated one by one while checking which subset works better.
- Recursive Feature Elimination: Features are recursively eliminated, and the remaining ones are repeatedly evaluated for how well they perform.

These methods are generally computationally intensive and require high-end resources for analysis, but they usually lead to better predictive models with higher accuracy than filter methods.
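Recursive Feature Elimination, one of the wrapper methods above, is available in scikit-learn; a minimal sketch in which the dataset, estimator and number of features to keep are illustrative choices:

```python
# Hypothetical sketch: Recursive Feature Elimination (a wrapper method).
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_scaled = StandardScaler().fit_transform(X)   # scaling helps the linear estimator converge

estimator = LogisticRegression(max_iter=5000)
rfe = RFE(estimator, n_features_to_select=5)   # repeatedly drop the weakest feature
rfe.fit(X_scaled, y)

print("selected feature mask:", rfe.support_)
print("feature ranking (1 = kept):", rfe.ranking_)
```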

C) Embedded Methods:

- Embedded methods combine the advantages of both filter and wrapper methods by taking feature interactions into account while keeping computational costs reasonable.
- These methods are iterative: in each model iteration, they carefully extract the features that contribute most to the training of that iteration.
- Examples of embedded methods: LASSO Regularization (L1), Random Forest Importance.
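A minimal sketch of the two embedded examples mentioned above (L1/LASSO and random-forest importances); the dataset and the regularization strength alpha are assumptions for illustration:

```python
# Hypothetical sketch: embedded feature selection via L1 regularization and tree importances.
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Lasso
from sklearn.ensemble import RandomForestRegressor
from sklearn.preprocessing import StandardScaler

X, y = load_diabetes(return_X_y=True)
X_scaled = StandardScaler().fit_transform(X)

# LASSO (L1) can drive the coefficients of weak features to exactly zero.
lasso = Lasso(alpha=2.0).fit(X_scaled, y)
print("features kept by LASSO:", (lasso.coef_ != 0).sum(), "of", X.shape[1])

# Random-forest importance ranks features by how much they reduce impurity.
forest = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)
print("top 3 features by importance:", forest.feature_importances_.argsort()[::-1][:3])
```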

13. How will you treat missing values during data analysis?

The impact of missing values can be assessed after identifying what kind of variables have the missing values.

- If the data analyst finds a pattern in these missing values, there is a chance of finding meaningful insights from it.
- If no pattern is found, the missing values can either be ignored or replaced with default values such as the mean, minimum, maximum or median.
- If the missing values belong to categorical variables, they are assigned a default category. If the data follows a normal distribution, mean values are assigned to the missing entries.
- If around 80% of the values are missing, it is up to the analyst to either replace them with default values or drop the variable entirely.
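A short pandas sketch of the strategies described above; the column names, values and the 80% threshold rule are assumptions used only to demonstrate the mechanics:

```python
# Hypothetical sketch: treating missing values with pandas.
import pandas as pd
import numpy as np

df = pd.DataFrame({
    "age": [25, np.nan, 40, 35, np.nan],
    "city": ["Pune", None, "Delhi", None, "Mumbai"],
    "score": [np.nan] * 4 + [10],          # mostly missing
})

df["age"] = df["age"].fillna(df["age"].mean())   # numeric, roughly normal -> mean
df["city"] = df["city"].fillna("Unknown")        # categorical -> default value

# Drop columns where 80% or more of the values are missing (analyst's choice).
df = df.loc[:, df.isna().mean() < 0.8]

print(df)
```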

14. What are the differences between univariate, bivariate and multivariate analysis?

Statistical analyses are classified based on the number of variables processed at a given time.

Univariate Analysis: This analysis deals with only one variable at a time.
Example - A pie chart of sales by territory.

Bivariate Analysis: This analysis deals with the statistical study of two variables at a given time.
Example - A scatter plot of sales versus advertising spend.

Multivariate Analysis: This analysis deals with the statistical study of more than two variables and their joint responses.
Example - A study of the relationship between people's social media habits and their self-esteem, which depends on multiple factors such as age, number of hours spent online, employment status, relationship status, etc.
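A compact, hypothetical pandas sketch of the three levels of analysis; the column names and distributions are made up:

```python
# Hypothetical sketch: univariate, bivariate and multivariate views of a dataset.
import pandas as pd
import numpy as np

rng = np.random.RandomState(0)
df = pd.DataFrame({
    "sales": rng.normal(100, 20, 200),
    "ad_spend": rng.normal(50, 10, 200),
    "hours_on_social_media": rng.normal(3, 1, 200),
})

print(df["sales"].describe())              # univariate: one variable at a time
print(df[["sales", "ad_spend"]].corr())    # bivariate: relationship between two variables
print(df.corr())                           # multivariate: all variables studied together
```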

15. What is the difference between the Test set and validation set?

The test set is used to evaluate the performance of the trained model; it measures the model's predictive power on unseen data and is used only once, at the end. The validation set is a part of the training data that is held out to tune hyperparameters and select the model, helping to avoid overfitting.
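A minimal sketch of carving a dataset into train, validation and test sets; the split ratios and dataset are assumptions for illustration:

```python
# Hypothetical sketch: train / validation / test split with scikit-learn.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# Hold out 20% as the final test set (touched only once, at the very end).
X_train_full, X_test, y_train_full, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Carve a validation set out of the remaining training data for tuning.
X_train, X_val, y_train, y_val = train_test_split(
    X_train_full, y_train_full, test_size=0.25, random_state=0
)

print(len(X_train), len(X_val), len(X_test))  # roughly 60% / 20% / 20%
```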

16. What do you understand by the kernel trick?

Kernel functions are generalized dot-product functions used to compute the dot product of vectors x and y in a high-dimensional feature space. The kernel trick solves a non-linear problem with a linear classifier by implicitly transforming linearly inseparable data into a higher-dimensional space where it becomes separable.
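A small sketch of the kernel trick in practice: an RBF-kernel SVM separates data that no straight line can. The synthetic "circles" dataset and kernel choices here are illustrative assumptions.

```python
# Hypothetical sketch: kernel trick with an SVM on linearly inseparable data.
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_circles(n_samples=500, factor=0.3, noise=0.05, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

linear_svm = SVC(kernel="linear").fit(X_train, y_train)
rbf_svm = SVC(kernel="rbf").fit(X_train, y_train)  # kernel computes dot products in a higher-dimensional space

print("linear kernel accuracy:", linear_svm.score(X_test, y_test))  # near chance level
print("RBF kernel accuracy:", rbf_svm.score(X_test, y_test))        # close to 1.0
```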

17. How will you balance/correct imbalanced data?

There are different techniques to correct or balance imbalanced data. This can be done by increasing the number of samples for the minority classes, or by decreasing the number of samples for classes with an extremely large number of data points. The following approaches are commonly used:

Use the right evaluation metrics: With imbalanced data, it is very important to use evaluation metrics that provide valuable information beyond plain accuracy.

- Precision: the proportion of selected instances that are actually relevant.
- Sensitivity (Recall): the proportion of relevant instances that are selected.
- Specificity: the proportion of actual negatives that are correctly identified.
- F1 score: the harmonic mean of precision and sensitivity.
- MCC (Matthews correlation coefficient): the correlation coefficient between the observed and predicted binary classifications.
- AUC (Area Under the Curve): summarizes the relationship between the true positive rate and the false positive rate.

For example, consider a training set in which 99.9% of the samples belong to class "0". A model that always predicts "0" would achieve 99.9% accuracy yet provide no valuable information. In such cases, the evaluation metrics listed above are far more informative than accuracy.
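The metrics listed above are all available in scikit-learn; a small sketch with placeholder labels and predictions chosen to mimic an imbalanced problem:

```python
# Hypothetical sketch: evaluation metrics that are informative on imbalanced data.
from sklearn.metrics import (precision_score, recall_score, f1_score,
                             matthews_corrcoef, roc_auc_score)

y_true = [0] * 95 + [1] * 5                          # heavily imbalanced ground truth
y_pred = [0] * 94 + [1] + [0] * 3 + [1] * 2          # a model that misses most positives
y_score = [0.1] * 94 + [0.6] + [0.2] * 3 + [0.8] * 2 # predicted probabilities

print("precision:", precision_score(y_true, y_pred))
print("recall (sensitivity):", recall_score(y_true, y_pred))
print("F1:", f1_score(y_true, y_pred))
print("MCC:", matthews_corrcoef(y_true, y_pred))
print("ROC AUC:", roc_auc_score(y_true, y_score))
```

Plain accuracy here would be 96% even though most of the positive class is missed, which is exactly the accuracy paradox described above.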

Training set resampling: It is also possible to balance the data by constructing a different training dataset through resampling. There are two approaches, chosen based on the use case and requirements:

- Under-sampling: This balances the data by reducing the size of the abundant class and is used when the overall data quantity is sufficient. A new, balanced dataset is obtained and used for further modeling.

- Over-sampling: This is used when the data quantity is not sufficient. It balances the dataset by increasing the size of the rare class: instead of discarding extra samples from the abundant class, new samples are generated for the rare class using methods such as repetition and bootstrapping.
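A minimal sketch of both approaches using scikit-learn's `resample` utility; the class labels and sizes are made up for demonstration:

```python
# Hypothetical sketch: balancing classes by resampling the training set.
import pandas as pd
from sklearn.utils import resample

df = pd.DataFrame({"feature": range(1000),
                   "label": [0] * 950 + [1] * 50})   # 95% majority, 5% minority
majority = df[df["label"] == 0]
minority = df[df["label"] == 1]

# Under-sampling: shrink the abundant class.
majority_down = resample(majority, replace=False, n_samples=len(minority), random_state=0)
balanced_down = pd.concat([majority_down, minority])

# Over-sampling: grow the rare class by sampling with replacement (bootstrapping).
minority_up = resample(minority, replace=True, n_samples=len(majority), random_state=0)
balanced_up = pd.concat([majority, minority_up])

print(balanced_down["label"].value_counts())
print(balanced_up["label"].value_counts())
```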

Perform K-fold cross-validation correctly: Cross-validation must be applied properly when using over-sampling. The data should be split into folds before over-sampling, and the over-sampling applied only to the training folds; if the whole dataset is over-sampled first, copies of the same samples can leak into the validation folds and the evaluation effectively overfits to a specific result. To make the results robust, resampling can be repeated with different ratios.

18. What are some examples where a false positive has proven more important than a false negative?

Before citing instances, let us understand what false positives and false negatives are.
- False positives are cases that were wrongly identified as an event even though they were not. They are called Type I errors.
- False negatives are cases that were wrongly identified as non-events despite being an event. They are called Type II errors.

Some examples where false positives are more important than false negatives:

In the medical field: Consider a lab report that predicts cancer for a patient who does not actually have cancer. This is a false positive. Starting chemotherapy for such a patient is dangerous, as it damages healthy cells and exposes the patient to serious, unnecessary harm.

In the e-commerce field: Suppose a company starts a campaign giving $100 gift vouchers to customers identified as having purchased $10,000 worth of items, expecting at least a 20% profit on those sales. If vouchers are mistakenly given to customers who haven't purchased anything but were wrongly marked as having spent $10,000, the campaign loses money. This is a false-positive error.

19. What are some examples where a false negative has proven more important than a false positive?

Some examples where false negatives are more important than false positives:

Criminal justice system: Letting a guilty person go free (a false negative) can allow them to cause further harm, which is why investigation and evidence gathering aim to minimize such misses.

Drug testing: It is more important to catch actual drug users (minimize false negatives) than to worry about someone being falsely accused of drug use (a false positive).

20. Give one example where false positives and false negatives are equally important.

Banking: Lending is one of the main sources of income for banks, but if the repayment rate isn't good, there is a risk of huge losses instead of profits. Giving out loans is therefore a gamble: banks can't risk losing good customers, but at the same time they can't afford to acquire bad customers. This is a classic example where false positives and false negatives are equally important.

21. What is the importance of dimensionality reduction?

Dimensionality reduction is the process of reducing the number of features in a dataset to avoid overfitting and reduce variance. It has four main advantages:

- It reduces the storage space required and the time needed for model execution.
- It removes the issue of multicollinearity, thereby improving the interpretability of the ML model's parameters.
- It makes data easier to visualize when the dimensions are reduced to two or three.
- It avoids the curse of dimensionality.
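A short PCA sketch with scikit-learn, continuing the eigendecomposition idea from question 6; the dataset and the 95% variance target are illustrative assumptions:

```python
# Hypothetical sketch: dimensionality reduction with PCA.
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X, _ = load_breast_cancer(return_X_y=True)
X_scaled = StandardScaler().fit_transform(X)   # PCA is sensitive to feature scale

pca = PCA(n_components=0.95)                   # keep enough components for 95% of variance
X_reduced = pca.fit_transform(X_scaled)

print("original features:", X.shape[1])
print("components kept:", X_reduced.shape[1])
print("explained variance ratios:", pca.explained_variance_ratio_.round(3))
```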

22. How is the grid search tuning strategy different from random search?

Tuning strategies are used to find the right set of hyperparameters. Hyperparameters are model-specific properties that are fixed before the model is trained on the dataset rather than learned from it. Both grid search and random search are optimization strategies for finding efficient hyperparameters.

Grid Search:
- Every combination from a preset list of hyperparameter values is tried out and evaluated.
- The search pattern is like searching a grid: the values form a matrix, each parameter combination is tried and its accuracy is tracked, and after every combination has been evaluated, the model with the highest accuracy is chosen as the best one.
- The main drawback is that the technique suffers as the number of hyperparameters grows: the number of evaluations can increase exponentially with each additional hyperparameter. This is the curse of dimensionality in grid search.

Random Search:
- In this technique, random combinations of hyperparameter values are tried and evaluated to find the best solution; the function is tested at random configurations in the parameter space.
- Because the sampling is random, there is a good chance of finding near-optimal parameters without exhaustively evaluating every combination.
- Random search works best when the number of dimensions is relatively low, as it takes less time to find a good set of values.
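A minimal sketch contrasting the two strategies with scikit-learn; the model, parameter grid and sampling distributions are placeholders chosen for illustration:

```python
# Hypothetical sketch: grid search vs. randomized search for hyperparameter tuning.
from scipy.stats import uniform
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Grid search: every combination in the preset grid is evaluated (3 x 3 = 9 candidates).
grid = GridSearchCV(SVC(), {"C": [0.1, 1, 10], "gamma": [0.01, 0.1, 1]}, cv=5)
grid.fit(X, y)
print("grid search best:", grid.best_params_, grid.best_score_)

# Random search: a fixed budget of random configurations is sampled from distributions.
rand = RandomizedSearchCV(SVC(), {"C": uniform(0.1, 10), "gamma": uniform(0.01, 1)},
                          n_iter=9, cv=5, random_state=0)
rand.fit(X, y)
print("random search best:", rand.best_params_, rand.best_score_)
```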
