Sample questions on the Part of Machine Learning
(a) Short answer questions
1. What is meant by “learning” in the context of machine learning?
The ability of an algorithm to automatically acquire knowledge from data, adapt its
behaviour and improve its performance over time without being explicitly programmed
for every scenario.
2. List out the types of machine learning.
Supervised, unsupervised, reinforcement
3. Distinguish between classification and regression.
Classification is concerned with assigning input data to predefined categories,
producing a discrete label as output, while regression focuses on predicting
continuous numerical values based on input features.
4. What are the differences between supervised and unsupervised learning?
In supervised learning the model is trained on labelled examples to map an input to a
known output, while in unsupervised learning there are no output labels to map to. The
model instead explores the data to find patterns, such as clusters.
5. What is meant by supervised classification?
It is a form of supervised learning where an algorithm learns from a dataset
consisting of input features and their corresponding target (class) labels.
6. Explain supervised learning with an example.
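You have a teacher who tells you the correct answer. You learn from examples that
are labelled with the correct answer. For example, a spam filter is trained on emails
that have already been labelled as spam or not spam.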
7. What do you mean by reinforcement learning?
The program is not taught what actions to take. The program must discover which
action yields the most reward.
8. What is an association rule?
An association rule is an if-then relation (X implies Y) between variables, found by
techniques for discovering interesting relations in large databases.
9. Explain the concept of Association rule learning. Give the names of two algorithms for
generating association rules.
This is a technique that aims at discovering interesting relationships between items in
a dataset. It involves identifying associations, dependencies or co-occurrences among
different items.
Examples of algorithms are: the Apriori algorithm, Eclat and the FP-Growth algorithm.
10. What is a classification problem in machine learning. Illustrate with an example.
It refers to a task where the goal is to assign input instances to predefined categories
based on their features.
E.g. developing a system that classifies emails as either spam or not spam based on
the email content. This requires training the model on labelled emails.
11. Give three examples of classification problems from real life situations.
Classifying medical images: e.g. an x-ray as normal or not.
Face recognition: the input is an image and the classes are the people to be recognized.
Optical character recognition: the problem of recognizing character
codes from their images.
12. What is a discriminant in a classification problem?
It is a rule or a function that is used to assign labels to new observations.
13. List three machine learning algorithms for solving classification problems.
Logistic regression algorithm
Decision tree algorithm
Random forest algorithm
14. What is a binary classification problem? Explain with an example. Give also an example
for a classification problem which is not binary.
This is a classification problem with only two class labels. Eg email spam
classification. Emails are classified as either spam or not spam
Example of a non-binary classification: flower species classification based on
sepal and petals.
15. What is regression problem. What are the different types of regression?
It refers to the type of supervised learning where the goal is to predict a continuous
numerical value based on the input features.
Types of regression are: linear, polynomial and logistic regression (logistic
regression, despite its name, is mostly used for classification).
16. In the context of classification problems explain with examples the following: (i)
hypothesis (ii) hypothesis space.
(i) refers to a proposed rule that maps input instances to their corresponding class
labels. E.g. in spam email classification, a hypothesis can be a rule that states that if
an email contains the phrase “buy now” it is spam.
(ii) refers to the set of all possible hypotheses that can be considered for a given
classification problem. E.g. it can contain different rules, each capturing different
characteristics of spam email.
17. Define the version space of a binary classification problem.
Refers to the set of all hypotheses in the hypothesis space that are consistent with the
available training data.
18. What is meant by overfitting of data? Explain with an example.
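Overfitting is the production of an analysis which corresponds too closely or
exactly to a particular set of data, so that it fails to generalise to new data. E.g. a
decision tree that memorises every training example, including its noise, scores
perfectly on the training set but poorly on unseen data.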
19. What is meant by overfitting and underfitting of data with examples.
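Overfitting is the production of an analysis which corresponds too closely or
exactly to a particular set of data, so that it fails to generalise to new data. E.g. a
decision tree that memorises every training example, including its noise.
Underfitting is the production of a machine learning model that is not complex
enough to accurately capture the relationships between a dataset's features and a
target variable. E.g. fitting a straight line to data that follows a curve.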
20. What is dimensionality reduction? How is it implemented?
It is the process of reducing the number of variables under consideration by
obtaining a smaller set of principal variables. It is implemented in two ways:
Feature selection: here we are interested in finding the k features that give the
most information.
Feature extraction: here we are interested in finding a new set of k features that
are combinations of the original n features.
21. Explain why dimensionality reduction is useful in machine learning.
It decreases the time and space complexity of training and of inference during
testing. It also saves the cost of extracting an input that is no longer needed.
22. What are the commonly used dimensionality reduction techniques in machine learning?
Principal Components Analysis (PCA) and Linear Discriminant Analysis
(LDA).
23. How is the subset selection method used for dimensionality reduction?
It is the process of selecting a subset of relevant features (variables, predictors)
for use in model construction.
In forward selection, we start with no variables and add them one by one, at
each step adding the one that decreases the error the most, until any further
addition does not decrease the error (or decreases it only slightly).
In sequential backward selection, we start with the set containing all features
and at each step remove the one feature whose removal decreases the error the
most (or increases it the least).
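Forward selection can be sketched in a few lines. This is only an illustration: the
feature names and the toy_error function are hypothetical stand-ins for training a real
model and measuring its validation error, and the tolerance value is an assumption.

```python
def forward_selection(features, error_of, tolerance=1e-3):
    """Greedily add the feature that lowers the error most, until nothing helps."""
    selected = []
    remaining = list(features)
    best_error = error_of(selected)
    while remaining:
        # Try each remaining feature and keep the one giving the lowest error.
        candidate = min(remaining, key=lambda f: error_of(selected + [f]))
        candidate_error = error_of(selected + [candidate])
        if best_error - candidate_error <= tolerance:
            break  # further addition does not decrease the error meaningfully
        selected.append(candidate)
        remaining.remove(candidate)
        best_error = candidate_error
    return selected, best_error


# Hypothetical usefulness of each feature; "height" carries no information.
USEFULNESS = {"age": 0.30, "income": 0.20, "height": 0.0}


def toy_error(subset):
    """Stand-in for the validation error of a model trained on `subset`."""
    return 0.6 - sum(USEFULNESS[f] for f in subset)


chosen, error = forward_selection(["height", "age", "income"], toy_error)
print(chosen, round(error, 3))
```

Backward selection follows the same pattern, starting from all features and removing one
at a time.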
24. What is cross-validation in machine learning?
It is a technique to evaluate predictive models by repeatedly partitioning the original
sample into a training set to train the model, and a test set to evaluate it.
25. What is meant by 5 x 2 cross-validation?
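The data is randomly divided into two equal halves; each half is used once as the
training set and once as the test set (2-fold cross-validation). This is repeated 5 times
with different random splits, giving 10 training and test pairs.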
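In its usual formulation, 5 x 2 cross-validation means five repetitions of 2-fold
cross-validation. The sketch below only generates the index splits; the function name and
the fixed seed are assumptions made for illustration.

```python
import random


def five_by_two_splits(n_samples, seed=0):
    """Yield 10 (train_indices, test_indices) pairs: 5 repetitions of 2-fold CV."""
    rng = random.Random(seed)
    indices = list(range(n_samples))
    half = n_samples // 2
    for _ in range(5):
        rng.shuffle(indices)  # a new random split for each repetition
        first, second = indices[:half], indices[half:]
        yield first, second  # train on the first half, test on the second
        yield second, first  # then swap the roles of the two halves


splits = list(five_by_two_splits(10))
for train, test in splits[:2]:
    print("train:", sorted(train), "test:", sorted(test))
```

Each of the 10 pairs would be used to train and evaluate the model once, and the 10
scores are then compared or averaged.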
26. What is meant by the confusion matrix of a binary classification problem.
It is a 2x2 matrix used to evaluate the performance of a binary classifier. It counts the
true positives, false positives, false negatives and true negatives.
27. Define the following terms: precision, recall, sensitivity, specificity.
Precision: measures the proportion of correctly predicted positive instances
among all instances that the model predicted as positive.
Recall: measures the proportion of correctly predicted positive instances
among all actual positive instances in the dataset.
Sensitivity: same as recall.
Specificity: measures the proportion of correctly predicted negative instances
among all actual negative instances.
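The four definitions reduce to simple ratios of the confusion matrix counts. A minimal
sketch, assuming hypothetical counts and non-zero denominators:

```python
def binary_metrics(tp, fp, fn, tn):
    """Compute precision, recall (sensitivity) and specificity from the four counts."""
    return {
        "precision": tp / (tp + fp),    # correct positives among predicted positives
        "recall": tp / (tp + fn),       # correct positives among actual positives
        "specificity": tn / (tn + fp),  # correct negatives among actual negatives
    }


# Hypothetical counts: 40 true positives, 10 false positives,
# 20 false negatives and 30 true negatives.
metrics = binary_metrics(tp=40, fp=10, fn=20, tn=30)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```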
28. What is ROC curve in machine learning?
It is a graphical representation that illustrates the performance of a binary
classification model at different classification thresholds, plotting the true positive
rate against the false positive rate.
29. What are true positive rates and false positive rates in machine learning?
True positive rate is another name for sensitivity or recall: the proportion of actual
positive instances correctly predicted as positive.
False positive rate measures the proportion of actual negative instances incorrectly
predicted as positive by the model.
(b) Long answer questions
1. Give a definition of the term “machine learning”. Explain with an example the concept of
learning in the context of machine learning.
The field of study that gives computers the ability to learn without being explicitly
programmed.
In handwriting recognition, we first train the model using a dataset of labelled
letters. We then test the model to see if it can recognize the letters. If the performance
of the model improves with experience, then the algorithm is said to be learning.
2. Describe the basic components of the machine learning process.
Data storage: facility to store information and retrieve. Computers store information
in hard disks.
Abstraction: this is the process of extracting knowledge about the stored data. This
involves creating a general concept of the data as a whole. The process of fitting a
model to a dataset is known as training.
Generalisation: turning the stored information into a form that can be utilized for
future action. These actions are to be carried out on tasks that are similar. The goal is
to identify those properties of the data that will be relevant to future tasks.
Evaluation: this is the process of giving feedback to the user to measure the utility of
learnt knowledge.
3. Describe in detail applications of machine learning in any three different knowledge
domains.
In telecommunication: call patterns are analysed for network optimization and
maximizing quality of service.
In medicine: learning programs are used for diagnosis.
In manufacturing: learning programs are used for optimization, control and
troubleshooting.
4. Describe with an example the concept of association rule learning. Explain how it is made
use of in real life situations.
This is a method for discovering interesting associations between variables in
large databases using some measure of interestingness.
For example, if someone buys tomatoes and onions, how likely is that person
to also buy cabbage? Shops use such rules to decide product placement and promotions.
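Two common measures of interestingness are support and confidence. The sketch below
computes both for the rule {tomatoes, onion} -> {cabbage}; the five transactions are
made up for illustration.

```python
# Hypothetical shopping baskets, one set of items per transaction.
TRANSACTIONS = [
    {"tomatoes", "onion", "cabbage"},
    {"tomatoes", "onion"},
    {"tomatoes", "onion", "cabbage", "milk"},
    {"onion", "milk"},
    {"tomatoes", "cabbage"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimated probability of `consequent` given that `antecedent` was bought."""
    both = support(antecedent | consequent, transactions)
    return both / support(antecedent, transactions)


rule_support = support({"tomatoes", "onion", "cabbage"}, TRANSACTIONS)
rule_confidence = confidence({"tomatoes", "onion"}, {"cabbage"}, TRANSACTIONS)
print(f"support={rule_support:.2f} confidence={rule_confidence:.2f}")
```

Algorithms such as Apriori search for all rules whose support and confidence exceed
chosen thresholds, rather than checking one rule at a time.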
5. What is the classification problem in machine learning? Describe three real life situations
in different domains where such problems arise.
This is the problem of identifying to which of a set of categories a new
observation belongs, on the basis of a training set of data containing observations
whose category membership is known.
In speech recognition: the input is acoustic and the classes are the words that can be
uttered.
In medical diagnosis: the inputs are the relevant information we have about the patient
and the classes are the illnesses.
In face recognition: the input is an image and the classes are the people to be
recognized.
6. What is meant by a discriminant of a classification problem? Illustrate the idea with
examples.
This is a rule that is used to assign labels to new observations.
For example, if a team scores 5 goals in the next game, then the team qualifies; else
it is eliminated.
7. Describe in detail with examples the different types of learning like the supervised
learning, etc.
Reinforcement: the program is not taught what actions to take. The program must
discover which action yields the most reward.
Supervised: You have a teacher who tells you the correct answer. You learn from
examples that are labelled with the correct answer.
Unsupervised: there is no output to map to. The model is not trained but explores the
data to see some patterns.
8. Define version space and illustrate it with an example.
It is the set of all hypotheses that are consistent with the training data in a machine
learning problem.
Example: classifying whether an animal is a mammal or a bird based on whether it can
fly and whether it gives birth to live young.
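The animal example can be made concrete by enumerating a small hypothesis space and
keeping only the hypotheses that agree with the training data. The hypothesis space
chosen here (every possible labelling of the four feature combinations) and the two
training examples are assumptions for illustration.

```python
from itertools import product

# Every combination of the two boolean features (can_fly, gives_birth).
INPUTS = list(product([False, True], repeat=2))

# Hypothesis space: each hypothesis assigns a label to each of the 4 inputs.
HYPOTHESES = [
    dict(zip(INPUTS, labels)) for labels in product(["bird", "mammal"], repeat=4)
]

# Hypothetical training data: a non-flying animal that gives birth is a mammal,
# a flying animal that does not give birth is a bird.
TRAINING = [((False, True), "mammal"), ((True, False), "bird")]

# Version space: the hypotheses consistent with every training example.
version_space = [h for h in HYPOTHESES if all(h[x] == y for x, y in TRAINING)]

print(len(HYPOTHESES), "hypotheses,", len(version_space), "in the version space")
```

Each new training example can only shrink the version space, since it rules out the
hypotheses that disagree with it.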
9. What is meant by “noise” in data? What are its sources and how it is affecting results?
This is any unwanted anomaly in the data.
Sources
Imprecision in the recording of input attributes.
Additional attributes which we have not taken into account.
Errors in labelling the data points, e.g. labelling positive as negative and negative as
positive.
How it affects results
A simple hypothesis may not explain the data.
Errors distort the data.
Learning may not produce accurate results.
10. What issues are to be considered while selecting a model for applying machine learning in
a given problem.
Nature of the Problem: Understanding the nature of the problem is crucial in selecting
the right machine learning model. Different types of problems require different types
of algorithms.
Data Complexity: The complexity of the data and the relationships between features
influence the choice of the machine learning model.
11. Explain cross-validation in machine learning. Explain the different types of cross-
validations.
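It is a technique to evaluate predictive models by repeatedly partitioning the original
sample into a training set to train the model, and a test set to evaluate it.
Hold-out: the data is split once into a training set and a test set.
K-fold: the data is divided into K equal parts; each part is used once as the test set
while the remaining K - 1 parts form the training set.
Leave-one-out: K-fold cross-validation where K equals the number of examples.
5 x 2: 2-fold cross-validation repeated 5 times with different random splits.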
12. What is meant by true positives etc.? What is meant by confusion matrix of a binary
classification problem? Explain how this can be extended to multi-class problems.
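True positives are positive instances that the model correctly predicts as positive;
true negatives, false positives and false negatives are defined in the same way.
The true positive rate is another name for sensitivity or recall. It measures the
proportion of correctly predicted positive instances among all actual positive
instances in the dataset.
The false positive rate measures the proportion of actual negative instances incorrectly
predicted as positive by the model.
Confusion matrix: it is a 2x2 matrix of these four counts, used to evaluate the
performance of a binary model. For K classes it becomes a K x K matrix in which entry
(i, j) counts the instances of class i that were predicted as class j.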
13. What are ROC space and ROC curve in machine learning? In ROC space, which points
correspond to perfect prediction, always positive prediction and always negative prediction?
Why?
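The ROC curve is a graphical representation that illustrates the performance of a binary
classification model at different classification thresholds. ROC space is the plane with
the false positive rate on the x-axis and the true positive rate on the y-axis.
The left bottom corner point (0; 0): always negative prediction, since nothing is
predicted positive and both rates are 0.
The right top corner point (1; 1): always positive prediction, since every instance is
predicted positive and both rates are 1.
The left top corner point (0; 1): perfect prediction, since there are no false positives
and no positives are missed.
Points along the diagonal: random performance.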
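The three corner points can be checked directly by computing the (false positive rate,
true positive rate) pair for each kind of classifier. The labels below are hypothetical,
and the sketch assumes the data contains at least one positive and one negative.

```python
def roc_point(y_true, y_pred):
    """Return (false positive rate, true positive rate) for binary 0/1 labels."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    return fp / (fp + tn), tp / (tp + fn)


Y_TRUE = [1, 1, 0, 0, 1, 0]  # hypothetical ground truth

always_negative = roc_point(Y_TRUE, [0] * len(Y_TRUE))
always_positive = roc_point(Y_TRUE, [1] * len(Y_TRUE))
perfect = roc_point(Y_TRUE, Y_TRUE)

print(always_negative, always_positive, perfect)
```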
Sample questions on the Part of Artificial Intelligence
1. What is Artificial Intelligence?
This is the science and engineering of making intelligent machines especially
computer programs
2. What is the basic difference between Robotics and Artificial Intelligence?
A robot is a physical machine that may or may not require intelligence to perform
specific tasks, while an AI is a program.
3. What Contributes to AI?
Computer science, biology, mathematics, engineering, neuroscience, sociology,
etc.
4. Explain Programming without and with AI
Without AI
A program can answer only the specific questions it was meant to solve.
Modification in the program leads to a change in its structure.
Modification is not quick and easy.
With AI
A program can answer the generic questions it was meant to solve.
You can modify the program without affecting its structure.
The program can be modified quickly and easily.
5. Applications of AI
Gaming: making the non-player characters intelligent.
Natural language processing: enables computers to understand and process human
language.
Vision systems: understand and interpret visual data.
Speech recognition: converts spoken language into written text.
6. What are the types of Intelligence?
Linguistic: the ability to speak, recognise and use language
Logical: the ability to use and understand relationships in the absence of action or objects
Musical: the ability to create and understand meanings made of sound
Bodily kinesthetic: the ability to use the whole or part of the body to solve a problem
7. What is Intelligence composed of?
Reasoning
Learning
Problem solving
Linguistic intelligence
8. Compare and contrast the two types of Reasoning?
Inductive: uses specific observations to make broad general statements.
Deductive: starts with a general statement and examines the possibilities to reach a
specific, logical conclusion.
9. Difference between Human and Machine Intelligence
Humans perceive by patterns while machines perceive by sets of rules and data.
Humans store and recall information by patterns while machines do so by using
searching algorithms.
10. Compare and contrast Speech and Voice Recognition.
Speech recognition aims at understanding and comprehending what was spoken, while
voice recognition aims to recognize who is speaking.
Speaker-independent speech recognition systems are difficult to develop, while voice
recognition systems are comparatively easy to develop.
11. Explain the various Task Domains of Artificial Intelligence.
Mundane task domain
Computer vision: deals with the interpretation and understanding of visual data, such as
images and videos.
Natural language processing: interacting with and understanding human language.
Game playing: developing AI algorithms to play games strategically and learn from
experience.
Expert task domain
Medical diagnosis using AI systems
Financial analysis using AI techniques
Formal task domain
Geometry
Integration and differentiation
Logic programming: a programming paradigm based on formal logic for knowledge
representation and computation.
12. What are Agent and Environment?
Agent: anything that can perceive its environment through sensors and act upon that
environment through effectors.
Environment refers to the external surroundings or context in which the agent
operates
13. What do you mean by Agent Terminology?
refers to the various terms and concepts used to describe and classify different types
of agents.
14. What is Ideal Rational Agent?
This is an agent that is capable of doing the expected actions to maximize its
performance measure on the basis of:
its percept sequence and
its built-in knowledge base.
15. Explain the Structure and the four different types of Agent models in AI. Explain with proper
diagrams.
Simple reflex agents
They choose actions based only on the current percept.
They work only if the environment is completely observable.
Model-based reflex agents
They use a model of the world to choose actions. They maintain an internal state.
Updating the state requires knowledge about how:
the world evolves
the agent's actions affect the world
Goal-based agents
They choose their actions in order to achieve their goals.
The goal-based approach is more flexible than the reflex agent.
Utility-based agents
They choose actions based on a preference (utility) for each state.
They are useful when there is some uncertainty, or when goals conflict.
16. Describe the Nature and the properties of Environments.
It represents the external context or surroundings in which an agent operates.
Properties of environment
Dynamic: the environment changes while the agent is acting.
Deterministic: the next state is completely determined by the current state and the
actions of the agent.
Observable: the agent can determine the complete state of the environment at each
point in time.
17. What are AI Search terminologies?
Space complexity: maximum number of nodes stored in memory.
Time complexity: maximum number of nodes that are created
Problem instance: initial state plus goal state
Problem space: the environment in which the search takes place
Depth of problem: the length of shortest path.
18. Explain in detail the various brute force search strategies with proper examples.
Breadth First Search: starts with the root node and explores all the neighbouring
nodes at the current level before moving to the next level. It is implemented with a
FIFO queue. Its complexity depends on the number of nodes.
Depth First Search: follows one branch as deep as possible before backtracking. It is
implemented with a LIFO stack. It might not terminate and can go on infinitely on one
path.
Bidirectional: searches forward from the initial state and backward from the goal state
until the two searches meet at a common state.
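The difference between breadth-first and depth-first search comes down to the data
structure holding the frontier: a FIFO queue for the former and a LIFO stack for the
latter. A minimal sketch on a small hypothetical tree:

```python
from collections import deque

# Hypothetical tree: A is the root, D, E and F are leaves.
GRAPH = {
    "A": ["B", "C"],
    "B": ["D", "E"],
    "C": ["F"],
    "D": [],
    "E": [],
    "F": [],
}


def bfs(graph, start):
    """Visit nodes level by level using a FIFO queue."""
    visited, order = {start}, []
    queue = deque([start])
    while queue:
        node = queue.popleft()  # oldest node first
        order.append(node)
        for neighbour in graph[node]:
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append(neighbour)
    return order


def dfs(graph, start):
    """Follow one branch to the bottom before backtracking, using a LIFO stack."""
    visited, order = set(), []
    stack = [start]
    while stack:
        node = stack.pop()  # newest node first
        if node in visited:
            continue
        visited.add(node)
        order.append(node)
        # Push in reverse so the leftmost child is explored first.
        stack.extend(reversed(graph[node]))
    return order


bfs_order = bfs(GRAPH, "A")
dfs_order = dfs(GRAPH, "A")
print("BFS:", bfs_order)
print("DFS:", dfs_order)
```

The visited set is what keeps both searches from looping forever on graphs with cycles.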
19. Explain briefly the various Informed (Heuristic) Search Strategies
Heuristic evaluation function: estimates the cost of the optimal path between two states.
Pure heuristic search: it expands nodes in the order of their heuristic values. The
shorter paths are saved while the longer ones are discarded.
Greedy Best First Search: it expands the node that is estimated to be closest to the
goal.
A*: expands the most promising paths first and avoids expanding paths that are already
expensive. It ranks nodes by f(n) = g(n) + h(n), the cost so far plus the estimated
cost to the goal.
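A* can be sketched on a small grid, ranking each cell by f = g + h, where g is the number
of moves made so far and h is an estimate of the moves remaining. The grid layout and the
choice of Manhattan distance as the heuristic are assumptions for illustration.

```python
import heapq


def a_star(grid, start, goal):
    """Return the fewest moves from start to goal on a 4-connected grid, or None.

    Cells containing 1 are walls; cells containing 0 are free.
    """

    def h(cell):
        # Manhattan distance never overestimates the true cost on this grid.
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    open_heap = [(h(start), 0, start)]  # entries are (f, g, cell)
    best_g = {start: 0}
    while open_heap:
        _, g, cell = heapq.heappop(open_heap)
        if cell == goal:
            return g
        if g > best_g.get(cell, float("inf")):
            continue  # a cheaper route to this cell was already found
        row, col = cell
        for nr, nc in ((row + 1, col), (row - 1, col), (row, col + 1), (row, col - 1)):
            if 0 <= nr < len(grid) and 0 <= nc < len(grid[0]) and grid[nr][nc] == 0:
                new_g = g + 1
                if new_g < best_g.get((nr, nc), float("inf")):
                    best_g[(nr, nc)] = new_g
                    heapq.heappush(open_heap, (new_g + h((nr, nc)), new_g, (nr, nc)))
    return None


GRID = [
    [0, 0, 0, 0],
    [1, 1, 0, 1],
    [0, 0, 0, 0],
]
steps = a_star(GRID, (0, 0), (2, 0))
print("moves needed:", steps)
```

The only gap in the wall is in the third column, so the route has to detour through it
and takes 6 moves.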
20. Explain briefly the various Local Search Algorithms
Travelling salesman problem: the objective is to find a low-cost tour that starts from a
city, visits all cities exactly once and ends at the starting city.
Local Beam Search: explores multiple paths simultaneously in the search space. It
starts with a set of randomly generated initial states, called a beam, and evaluates their
quality. In each iteration, the algorithm selects the most promising states from the
successors of the current beam.
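Local beam search can be sketched on a toy problem. Everything specific here is an
assumption for illustration: the states are the integers 0 to 20, a state's neighbours
are the integers next to it, and the score to maximise peaks at 7.

```python
import random


def local_beam_search(score, neighbours, initial_states, beam_width, iterations=50):
    """Keep the `beam_width` best states among the beam and its successors."""
    beam = list(initial_states)
    for _ in range(iterations):
        candidates = set(beam)
        for state in beam:
            candidates.update(neighbours(state))
        new_beam = sorted(candidates, key=score, reverse=True)[:beam_width]
        if new_beam == beam:
            break  # no candidate improved on the current beam
        beam = new_beam
    return max(beam, key=score)


def score(x):
    return -(x - 7) ** 2  # toy objective, highest at x = 7


def neighbours(x):
    return [n for n in (x - 1, x + 1) if 0 <= n <= 20]


rng = random.Random(0)  # fixed seed so the run is repeatable
start_states = rng.sample(range(21), 3)  # randomly generated initial beam
best = local_beam_search(score, neighbours, start_states, beam_width=3)
print("start:", start_states, "best:", best)
```

On an objective with several peaks, the beam can still settle on a local maximum, which
is the usual weakness of local search.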
21. What is Computer Vision? What are its characteristics and Applications?
Computer vision: refers to the field of study and technology that enables computers to
interpret, analyse, and understand visual data from images or videos
Application of CV
Face recognition
Biometrics
Gesture analysis
Process control
Characteristics
Image processing and analysis
Perception of visual data
Object recognition and tracking
Real time processing
22. What is Fuzzy Logic? Explain Fuzzy Logic Systems Architecture. Given an example.
Fuzzy logic is a method of reasoning that resembles human reasoning.
Fuzzy architecture
It has 4 parts:
Fuzzification module: transforms the crisp system inputs into fuzzy sets. It splits the
input signal into 5 steps: LP (large positive), MP (medium positive), S (small), MN
(medium negative) and LN (large negative).
Knowledge base: stores the IF-THEN rules.
Inference engine: simulates the human reasoning process by making fuzzy inferences on
the inputs using the IF-THEN rules.
Defuzzification module: transforms the fuzzy set obtained by the inference engine into
a crisp value.
See diagram and example
23. Describe Membership Function.
This is a function that allows you to quantify a linguistic term and represent a fuzzy
set graphically. It maps each element of the input range to a degree of membership
between 0 and 1.
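A triangular membership function is one common shape. The sketch below assumes a
hypothetical fuzzy set "warm" over temperatures in degrees Celsius, with feet at 15 and
35 and a peak at 25.

```python
def triangular(x, left, peak, right):
    """Degree of membership (0 to 1) in a triangular fuzzy set.

    Assumes left < peak < right.
    """
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)  # rising edge
    return (right - x) / (right - peak)    # falling edge


# Hypothetical linguistic term "warm".
warm_at_20 = triangular(20, 15, 25, 35)
warm_at_25 = triangular(25, 15, 25, 35)
warm_at_40 = triangular(40, 15, 25, 35)
print(warm_at_20, warm_at_25, warm_at_40)
```

A temperature of 20 is "warm" to degree 0.5, which is how a fuzzy set differs from a
crisp set where membership is either 0 or 1.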
24. What are the Application Areas of Fuzzy Logic.
Automotive system
Environmental control
Domestic goods
25. Explain the following.
a. Components of NLP
Natural language understanding: analyses natural language. Mapping a given natural
language input into a useful representation.
Natural language generation: this is the process of producing meaningful phrases and
sentences in the form of natural language from some internal representation.
b. Difficulties in NLU
Syntax level ambiguity: when a sentence can be parsed in different ways.
Lexical ambiguity: a word has multiple meanings.
Referential ambiguity: it is not clear which entity a word (e.g. a pronoun) is referring to.
c. NLP Terminology
Phonology: the study of organizing sound systematically.
Semantics: concerned with the meaning of words and how to combine words into
meaningful phrases and sentences.
Morphology: the study of the construction of words from primitive meaningful units.
d. Steps in the development of NLP
1. Lexical analysis: identifying and analysing the structure of words.
2. Syntactic analysis: analysis of the words in a sentence for grammar, and arranging
the words in a manner that shows the relationships among them.
3. Semantic analysis: draws the exact meaning of the text.
4. Pragmatic analysis: what was said is re-interpreted in terms of what was actually meant.
26. Explain the two Implementation Aspects of Syntactic Analysis
a. Context-Free Grammar
A context-free grammar (CFG) is a formal system used to describe the syntax or structure of a
language. It consists of a set of production rules that define how the elements of the language
can be combined to form valid sentences or phrases.
b. Top-Down Parser
A top-down parser is a parsing technique that starts from the root of a parse tree and works its
way down to the leaves.
It starts with the start symbol and attempts to rewrite it into a sequence of terminal
symbols that matches the classes of the words in the input.
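Both ideas can be combined in a short sketch: a tiny context-free grammar and a top-down
parser that starts from S and rewrites it until the word classes match the input. The
grammar, the lexicon and the function names are made up for illustration, and the
approach assumes the grammar has no left-recursive rules.

```python
# Production rules: each non-terminal maps to its alternative right-hand sides.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "N"]],
    "VP": [["V", "NP"], ["V"]],
}

# Word classes for the terminal words.
LEXICON = {
    "the": "Det", "a": "Det",
    "dog": "N", "cat": "N",
    "chased": "V", "slept": "V",
}


def parse(symbol, words, pos):
    """Yield every position reachable after deriving `symbol` from words[pos:]."""
    if symbol not in GRAMMAR:  # a word class: it must match the next word
        if pos < len(words) and LEXICON.get(words[pos]) == symbol:
            yield pos + 1
        return
    for production in GRAMMAR[symbol]:  # try each rewrite rule top-down
        positions = [pos]
        for part in production:
            positions = [end for p in positions for end in parse(part, words, p)]
        yield from positions


def accepts(sentence):
    """True if the whole sentence can be derived from the start symbol S."""
    words = sentence.lower().split()
    return len(words) in parse("S", words, 0)


print(accepts("the dog chased a cat"))
print(accepts("dog the chased"))
```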
27. What are Expert Systems? What are Capabilities and Components of an Expert System?
These are computer programs developed to solve complex problems in a particular
domain at the level of extraordinary human intelligence and expertise.
Characteristics
High response
Reliable
Understandable
High performance
Capabilities
demonstrating
explaining
interpreting
justifying the conclusion
28. What is Knowledge? Explain the Components of Knowledge Base, its representation and
acquisition.
Knowledge is the combination of data, information and past experience.
Knowledge base
It contains domain-specific and high-quality knowledge.
It is required to exhibit intelligence.
Components of knowledge base
Factual knowledge: information widely accepted by knowledge engineers and scholars.
Heuristic knowledge: it is about practice, accurate judgement, and one’s ability to
evaluate and guess.
Knowledge Representation
This is the method used to organize the knowledge in the knowledge base.
It is in the form of IF-THEN-ELSE rules.
Knowledge Acquisition
refers to the process of obtaining and incorporating information or expertise into an
AI system, typically through data collection, manual input, or learning algorithms, to
enhance its knowledge and performance.
29. What is an Inference Engine? What are the strategies used by the Inference Engine to arrive at
a solution?
An inference engine is a component of an AI system that applies logical rules and
reasoning techniques to draw conclusions or make inferences based on the available
knowledge or information
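It manipulates knowledge from the knowledge base to arrive at a particular solution.
In a rule-based system, it applies the rules repeatedly to the facts, including those
obtained from earlier rule applications.
The two usual strategies are forward chaining, which works from the known facts towards
a conclusion, and backward chaining, which works from a goal back to the facts that
support it.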
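Repeated rule application can be sketched with forward chaining, one common inference
strategy. The vehicle fault rules and fact names below are hypothetical.

```python
def forward_chain(facts, rules):
    """Apply IF-THEN rules repeatedly until no new fact can be derived."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            # Fire a rule when all its premises are known and it adds something new.
            if conclusion not in known and premises <= known:
                known.add(conclusion)
                changed = True
    return known


# Hypothetical rules as (IF premises, THEN conclusion) pairs.
RULES = [
    ({"engine_cranks", "no_spark"}, "ignition_fault"),
    ({"ignition_fault"}, "check_spark_plugs"),
]

derived = forward_chain({"engine_cranks", "no_spark"}, RULES)
print(sorted(derived))
```

The second rule can only fire because the first one added "ignition_fault", which is the
reuse of facts obtained from earlier rule applications.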
30. List down the Applications of Expert System.
Automotive design
Monitoring systems: continuously comparing data with the observed system.
Finding out faults in vehicles
31. What are the general steps in the Development of an Expert System?
Identify the problem: the problem must be suitable for an expert system to solve.
Also establish the cost-effectiveness of the system.
Design the system: identify the technology to be used.
Develop the prototype.
Test and refine the prototype
Develop and complete an expert system. Document it, train users and test.
32. What are Artificial Neural Networks (ANNs), its basic structure, types, and working
principle.
These are computing systems made up of a number of simple, highly interconnected
processing elements which process information by their dynamic state response to
external inputs.
Types
Feedforward: information flows in one direction. No feedback loops. Used in pattern
generation, recognition and classification.
Feedback: here feedback loops are allowed.
Working of ANNs
Each arrow represents a connection between two neurons and indicates
the pathway for the flow of information.
Each connection has a weight, a number which controls the
signal between the two neurons.
If the network generates the desired output, there is no need to adjust the
weights; otherwise the weights are adjusted to improve the result.
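The weight adjustment idea can be sketched with a single neuron trained by the perceptron
rule, which is one simple learning rule among many. The AND gate data, the learning rate
and the number of epochs are assumptions for illustration.

```python
def predict(weights, bias, inputs):
    """Fire (output 1) when the weighted sum of the inputs exceeds zero."""
    total = bias + sum(w * x for w, x in zip(weights, inputs))
    return 1 if total > 0 else 0


def train_perceptron(samples, epochs=20, learning_rate=1.0):
    """Adjust the weights only when the neuron produces the wrong output."""
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for inputs, target in samples:
            error = target - predict(weights, bias, inputs)
            if error == 0:
                continue  # desired output produced: leave the weights alone
            weights = [w + learning_rate * error * x for w, x in zip(weights, inputs)]
            bias += learning_rate * error
    return weights, bias


# Training data for a logical AND gate: ((input1, input2), desired output).
AND_GATE = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

weights, bias = train_perceptron(AND_GATE)
outputs = [predict(weights, bias, inputs) for inputs, _ in AND_GATE]
print(weights, bias, outputs)
```

A single neuron like this can only learn linearly separable functions; networks with
hidden layers are needed for problems such as XOR.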
33. Explain the learning strategies of ANNs.
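Supervised learning: the network is trained with a teacher, on examples for which the
desired output is known.
Unsupervised learning: required when there is no example dataset with known answers.
The network finds patterns in the data on its own.
Reinforcement learning: based on observation. The network makes a decision, observes
its environment and adjusts its weights if the observation is negative.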
34. Explain Bayesian Networks (BN) with an example problem and derive at the Probability
table.
These are graphical structures used to represent the probabilistic relationships among a
set of random variables.
Each node represents a random variable with a specific proposition, and each edge
represents a direct dependence between two variables.
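A small worked example helps with deriving the probability table. The network below is
an assumed textbook-style example, not one taken from these notes: Rain and Sprinkler are
independent parent nodes of WetGrass, and every probability value is made up.

```python
from itertools import product

# Hypothetical prior probabilities of the two parent nodes.
P_RAIN = 0.2
P_SPRINKLER = 0.1

# Hypothetical conditional probability table: P(WetGrass | Rain, Sprinkler).
P_WET_GIVEN = {
    (True, True): 0.99,
    (True, False): 0.9,
    (False, True): 0.8,
    (False, False): 0.0,
}


def joint(rain, sprinkler, wet):
    """P(rain, sprinkler, wet) = P(rain) * P(sprinkler) * P(wet | rain, sprinkler)."""
    p_parents = (P_RAIN if rain else 1 - P_RAIN) * (
        P_SPRINKLER if sprinkler else 1 - P_SPRINKLER
    )
    p_wet = P_WET_GIVEN[(rain, sprinkler)]
    return p_parents * (p_wet if wet else 1 - p_wet)


# Full joint probability table over the three variables (8 rows).
table = {combo: joint(*combo) for combo in product([True, False], repeat=3)}

# Marginal P(WetGrass) and posterior P(Rain | WetGrass), both read off the table.
p_wet = sum(p for (rain, sprinkler, wet), p in table.items() if wet)
p_rain_given_wet = sum(p for (rain, _, wet), p in table.items() if rain and wet) / p_wet

print(f"P(wet) = {p_wet:.4f}, P(rain | wet) = {p_rain_given_wet:.4f}")
```

The network structure is what allows the joint probability to be written as a product of
one small table per node instead of one large table for all variables.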
35. List the Applications of Neural Networks.
Medical: cancer cell analysis
Speech recognition
Process control