MODULE I 1 Mark Questions and Answers
1. Define Machine Learning in terms of Performance (P), Experience (E), and Task (T).
- According to Tom Mitchell, a computer program is said to learn from experience E with respect to some
class of tasks T and performance measure P if its performance at tasks in T, as measured by P, improves
with experience E.
2. How does semi-supervised learning relate to both supervised and unsupervised learning?
- Semi-supervised learning is a combination of both supervised and unsupervised learning. It uses a small
amount of labeled data (as in supervised learning) along with a large amount of unlabeled data (as in
unsupervised learning).
3. What is meant by a multiclass classifier?
- A multiclass classifier is a type of classification algorithm that can classify data into more than two
categories or classes. For example, classifying types of crops, types of music, or different species of animals.
Unlike binary classifiers, which only handle two classes (e.g., yes/no, spam/not spam), multiclass classifiers
can handle multiple classes.
4. Illustrate the concept of qualitative data.
- Qualitative data provides information about the quality or characteristics of an object or entity, which
cannot be measured numerically. It is also called categorical data. Examples include:
- Nominal data: Data with named categories but no inherent order (e.g., blood group, gender).
- Ordinal data: Data with named categories that can be ordered (e.g., customer satisfaction levels: very
happy, happy, unhappy).
5. How would you solve the Email spam filtering problem using Supervised Learning?
- In supervised learning, the email spam filtering problem is solved by training a model using a labeled
dataset of emails, where each email is marked as either "spam" or "not spam." The model learns to classify
new emails based on the patterns it identifies in the training data.
OR
By applying a Naïve Bayes classification model trained on the labeled emails.
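The Naïve Bayes approach can be sketched in a few lines of Python. This is a minimal sketch of a multinomial Naïve Bayes classifier with add-one (Laplace) smoothing, assuming emails are split into lowercase words by whitespace; the helper names (train_naive_bayes, classify) and the four-email training set are hypothetical and only illustrate the idea.

```python
import math
from collections import Counter

def train_naive_bayes(emails, labels):
    """Train a multinomial Naive Bayes model (hypothetical helper).

    emails: list of strings; labels: list of "spam" / "not spam".
    Returns class priors, per-class word counts, and the vocabulary.
    """
    classes = set(labels)
    priors = {c: labels.count(c) / len(labels) for c in classes}
    word_counts = {c: Counter() for c in classes}
    for text, label in zip(emails, labels):
        word_counts[label].update(text.lower().split())
    vocab = set()
    for counts in word_counts.values():
        vocab.update(counts)
    return priors, word_counts, vocab

def classify(text, priors, word_counts, vocab):
    """Return the class with the highest log-posterior score for `text`."""
    best_class, best_score = None, -math.inf
    for c in priors:
        total = sum(word_counts[c].values())
        score = math.log(priors[c])
        for word in text.lower().split():
            # Laplace smoothing keeps unseen words from zeroing the probability.
            score += math.log((word_counts[c][word] + 1) / (total + len(vocab)))
        if score > best_score:
            best_class, best_score = c, score
    return best_class

# Tiny hypothetical labeled training set, for illustration only.
emails = ["win free money now", "free prize claim now",
          "meeting at noon tomorrow", "project report attached"]
labels = ["spam", "spam", "not spam", "not spam"]
model = train_naive_bayes(emails, labels)
print(classify("claim your free money", *model))  # spam
```

Log-probabilities are summed instead of multiplying raw probabilities so that long emails do not underflow to zero.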
6. How would you improve model performance?
Model performance can be improved through various techniques, including:
• Try different models or algorithms.
• Improve data quality (e.g., handle outliers, add features).
• Use techniques like cross-validation to estimate how well the model generalizes and to compare models
or tune hyperparameters.
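Cross-validation can be illustrated with a small k-fold sketch. Assuming contiguous folds (in practice the data is usually shuffled first) and a hypothetical majority-class baseline as the "model", the code trains on k − 1 folds, scores accuracy on the remaining fold, and averages the k scores. All function names here are illustrative, not a library API.

```python
from collections import Counter

def k_fold_indices(n, k):
    """Split indices 0..n-1 into k contiguous folds; yield (train, test) index lists."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n) if i < start or i >= start + size]
        yield train, test
        start += size

def majority_class_model(train_X, train_y):
    """Hypothetical baseline model: always predict the most common training label."""
    label = Counter(train_y).most_common(1)[0][0]
    return lambda x: label

def cross_validate(fit, X, y, k=5):
    """Return the mean accuracy of `fit` across k folds."""
    scores = []
    for train, test in k_fold_indices(len(X), k):
        predict = fit([X[i] for i in train], [y[i] for i in train])
        correct = sum(predict(X[i]) == y[i] for i in test)
        scores.append(correct / len(test))
    return sum(scores) / len(scores)

X = list(range(10))
y = [0] * 7 + [1] * 3
print(cross_validate(majority_class_model, X, y, k=5))  # 0.7
```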
7. Distinguish between classification and regression techniques.
- Classification:
- Used for predicting categorical outcomes (e.g., spam/not spam, yes/no).
- Output is a class label (e.g., "cat" or "dog").
- Examples: Email spam detection, image recognition.
- Regression:
- Used for predicting continuous values (e.g., temperature, salary, price).
- Output is a numeric value (e.g., 25.6, 100.5).
- Examples: Predicting house prices, forecasting sales.
MODULE II 1 Mark Questions and Answers
1. Define Feature engineering in Machine learning.
Feature engineering is the process of selecting, transforming, or creating new features from raw data to
improve the performance of a machine learning model.
2. What are the two types of feature transformation? Give an example of each.
• Feature Construction: creating new features from the existing ones.
Example: Suppose you have a dataset with the features "height" and "weight." You could construct a
new feature "BMI" (Body Mass Index) from them as BMI = weight (kg) / height (m)².
• Feature Extraction: deriving a new set of features from the original ones.
Example: In image processing, extracting edges from an image converts the original image into a
feature set that highlights only the edges.
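Feature construction of the BMI feature can be shown directly. This is a minimal sketch assuming each record is a dict with "height" in metres and "weight" in kilograms; the function name add_bmi_feature and the sample records are hypothetical.

```python
def add_bmi_feature(rows):
    """Return copies of the rows with a constructed 'bmi' feature.

    Assumes 'weight' is in kilograms and 'height' is in metres.
    """
    result = []
    for row in rows:
        new_row = dict(row)  # copy so the original records are not modified
        new_row["bmi"] = row["weight"] / row["height"] ** 2
        result.append(new_row)
    return result

people = [{"height": 1.75, "weight": 70.0}, {"height": 1.60, "weight": 64.0}]
for person in add_bmi_feature(people):
    print(round(person["bmi"], 1))  # 22.9, then 25.0
```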
3. What is the main idea of feature engineering?
Feature engineering converts raw input data into structured, useful features that a model can learn from,
preparing the data so that machine learning tasks achieve better performance.
4. State Bayes' theorem with a mathematical formula.
P(A∣B) = P(B∣A) × P(A) / P(B), provided P(B) > 0, where:
P(A∣B) is the probability of event A given event B.
P(B∣A) is the probability of event B given event A.
P(A) and P(B) are the probabilities of events A and B, respectively.
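A worked numerical example makes the formula concrete. The probabilities below are hypothetical: A = "email is spam", B = "email contains the word 'free'". P(B) is obtained from the law of total probability, then Bayes' theorem gives the posterior P(A∣B).

```python
def bayes_posterior(p_b_given_a, p_a, p_b):
    """Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)."""
    return p_b_given_a * p_a / p_b

# Hypothetical numbers, for illustration only.
p_spam = 0.2              # P(A)
p_free_given_spam = 0.5   # P(B|A)
p_free_given_ham = 0.05   # P(B|not A)
# Total probability: P(B) = P(B|A)P(A) + P(B|not A)P(not A) = 0.14
p_free = p_free_given_spam * p_spam + p_free_given_ham * (1 - p_spam)
print(round(bayes_posterior(p_free_given_spam, p_spam, p_free), 3))  # 0.714
```

So even though only 20% of emails are spam in this hypothetical setting, seeing "free" raises the probability of spam to about 71%.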
5.Define feature subset selection.
Feature subset selection is the process of choosing a subset of relevant features from the original set to
reduce dimensionality, improve model performance, and minimize computational costs.
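One simple way to select a feature subset is a filter method. This minimal sketch, assuming numeric features stored as a dict of equal-length lists, ranks features by the absolute Pearson correlation with the target and keeps the top k; the helper names and the sample data are hypothetical, and other selection criteria (wrapper or embedded methods) are equally valid.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0  # a constant feature carries no information about the target
    return cov / (sx * sy)

def select_top_k(features, target, k):
    """Keep the k feature names most correlated (in absolute value) with the target.

    features: dict mapping feature name -> list of values.
    """
    ranked = sorted(features, key=lambda name: abs(pearson(features[name], target)),
                    reverse=True)
    return ranked[:k]

# Hypothetical data: "size" drives the target, "noise" barely does, "constant" never varies.
features = {"size": [50, 80, 100, 120], "noise": [3, 1, 4, 1], "constant": [7, 7, 7, 7]}
target = [150, 240, 300, 360]
print(select_top_k(features, target, 2))  # ['size', 'noise']
```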
6. What changes would you make to handle categorical features in machine learning?
• Encode categorical variables as numeric features, e.g., one-hot encoding for nominal categories (one 0/1
column per category).
• Use ordinal encoding (assigning ordered integers) for ordered categories.
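Both encodings can be sketched using the examples from the qualitative-data answer (blood group as nominal data, satisfaction level as ordinal data). The helper names are hypothetical; sorting the one-hot categories alphabetically is just one convenient convention.

```python
def one_hot_encode(values):
    """Map each nominal category to a 0/1 indicator vector (categories sorted)."""
    categories = sorted(set(values))
    return categories, [[1 if v == c else 0 for c in categories] for v in values]

def ordinal_encode(values, order):
    """Map ordered categories to integers following the given order."""
    rank = {category: i for i, category in enumerate(order)}
    return [rank[v] for v in values]

blood_groups = ["A", "B", "O", "A"]
print(one_hot_encode(blood_groups))
# (['A', 'B', 'O'], [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 0, 0]])

satisfaction = ["happy", "very happy", "unhappy"]
print(ordinal_encode(satisfaction, ["unhappy", "happy", "very happy"]))
# [1, 2, 0]
```

One-hot encoding avoids implying an order between nominal categories, while ordinal encoding deliberately preserves the order.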
7. What might you include in a feature extraction technique to reduce the number of features while
minimizing redundancy?
• Principal Component Analysis (PCA): Reduces dimensions by transforming correlated features into
uncorrelated principal components that capture the most variance.
• Linear Discriminant Analysis (LDA): Projects the data onto the directions that best separate the classes.
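The first principal component of PCA can be estimated without libraries. This is a minimal sketch, assuming a small list of numeric rows with non-zero variance: it builds the sample covariance matrix, finds its top eigenvector by power iteration, and projects the centered data onto it, turning d features into one. A production implementation would normally use a linear-algebra library (e.g. an SVD) instead; the function name is hypothetical.

```python
import math

def first_principal_component(data, iterations=100):
    """Estimate the first principal component of `data` (list of rows) via power iteration."""
    n, d = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in data]
    # Sample covariance matrix (d x d).
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    v = [1.0] * d  # starting vector; assumed not orthogonal to the top eigenvector
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            raise ValueError("data has zero variance")
        v = [x / norm for x in w]
    # Project each centered row onto the component: 1 feature instead of d.
    projected = [sum(r[j] * v[j] for j in range(d)) for r in centered]
    return v, projected

# Two perfectly redundant features (second = 2 * first).
v, projected = first_principal_component([[1, 2], [2, 4], [3, 6], [4, 8]])
print([round(c, 3) for c in v])  # [0.447, 0.894]
```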
MODULE III 1 Mark Questions and Answers
1. What is supervised learning? Give an example.
Supervised learning is a type of machine learning where a model is trained using labeled data. The
algorithm learns to map inputs to the correct outputs based on given training examples.
Example: Predicting house prices based on features like size, location, and number of bedrooms using
historical price data.
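The house-price example can be reduced to its simplest form: learning a mapping from one input (size) to a labeled output (price). This is a minimal sketch, assuming a single feature and a straight-line model fitted by least squares; the helper fit_line and the three data points are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept from labeled examples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx
    return slope, intercept

# Hypothetical labeled training data: house size (m^2) -> price (thousands).
sizes = [50, 100, 150]
prices = [100, 200, 300]
slope, intercept = fit_line(sizes, prices)
print(slope * 120 + intercept)  # 240.0, the predicted price for an unseen 120 m^2 house
```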
2. Define the term problem identification in classification learning.
Problem identification is the first step in building a supervised learning model: identifying a clear and
well-defined problem with specific goals and benefits.
3. Mention the various steps involved in data preprocessing.
Data Cleaning (handling missing values, removing duplicates)
Data Transformation (normalization, encoding categorical data)
Feature Selection (choosing relevant features that contribute to prediction accuracy)
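The cleaning and transformation steps can be sketched on a tiny numeric table. Assuming rows are lists of floats with None marking a missing value (and every column having at least one value), the hypothetical preprocess function removes duplicate rows, fills missing values with the column mean, and applies min-max normalization to [0, 1]. Mean imputation and min-max scaling are just two common choices among many.

```python
def preprocess(rows):
    """Sketch of data cleaning + transformation on numeric rows (floats or None)."""
    # Data cleaning: drop exact duplicate rows, keeping the first occurrence.
    unique, seen = [], set()
    for row in rows:
        key = tuple(row)
        if key not in seen:
            seen.add(key)
            unique.append(list(row))  # copy so the input is not modified
    d = len(unique[0])
    # Data cleaning: replace missing values (None) with the column mean.
    for j in range(d):
        present = [r[j] for r in unique if r[j] is not None]
        mean = sum(present) / len(present)
        for r in unique:
            if r[j] is None:
                r[j] = mean
    # Data transformation: min-max normalization of each column to [0, 1].
    for j in range(d):
        lo = min(r[j] for r in unique)
        hi = max(r[j] for r in unique)
        for r in unique:
            r[j] = 0.0 if hi == lo else (r[j] - lo) / (hi - lo)
    return unique

rows = [[1.0, 10.0], [3.0, None], [1.0, 10.0], [5.0, 30.0]]
print(preprocess(rows))  # [[0.0, 0.0], [0.5, 0.5], [1.0, 1.0]]
```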
4. How do you choose the best learning algorithm for a problem? Give an example.
The best algorithm is chosen based on factors such as dataset size, problem complexity, and interpretability
requirements.
Example: For spam email detection, a Naive Bayes classifier could be chosen because it works well on text
classification problems.
5. List various classification algorithms used in supervised classification.
Decision Tree
Random Forest
Support Vector Machine (SVM)
K-Nearest Neighbors (KNN)
Naive Bayes
Logistic Regression
Neural Networks
6. Differentiate between training and test data.
Training Data: Used to train the model by providing labeled examples.
Test Data: Held out from training and used to evaluate how well the trained model performs on unseen data.
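The split between training and test data can be sketched as a shuffled hold-out. The helper split_train_test is hypothetical (it is not scikit-learn's train_test_split, whose signature differs); it assumes a 25% test fraction and a fixed seed so the split is reproducible.

```python
import random

def split_train_test(X, y, test_fraction=0.25, seed=0):
    """Shuffle example indices and hold out a fraction of them as test data."""
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)
    n_test = int(round(len(X) * test_fraction))
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    return ([X[i] for i in train_idx], [y[i] for i in train_idx],
            [X[i] for i in test_idx], [y[i] for i in test_idx])

X = list(range(8))
y = [2 * x for x in X]
X_train, y_train, X_test, y_test = split_train_test(X, y)
print(len(X_train), len(X_test))  # 6 2
```

The model is fitted only on (X_train, y_train); (X_test, y_test) is used afterwards to measure performance on data the model has not seen.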
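7. What defines similarity between two data elements in KNN?
In KNN, similarity between two data points is defined using a distance metric: the smaller the distance,
the more similar the points. The most common metric is Euclidean distance. For x = (x1, ..., xn) and
y = (y1, ..., yn):
d(x, y) = √((x1 − y1)² + (x2 − y2)² + ... + (xn − yn)²)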
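Euclidean distance (the square root of the sum of squared coordinate differences) plugs directly into a KNN classifier. This is a minimal sketch, assuming majority voting among the k = 3 nearest training points; the helper names and the six labeled points are hypothetical.

```python
import math
from collections import Counter

def euclidean(a, b):
    """Euclidean distance: square root of the summed squared coordinate differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_predict(train_X, train_y, query, k=3):
    """Predict the majority label among the k training points nearest to `query`."""
    nearest = sorted(range(len(train_X)), key=lambda i: euclidean(train_X[i], query))[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]

train_X = [(1, 1), (1, 2), (2, 1), (6, 6), (7, 6), (6, 7)]
train_y = ["A", "A", "A", "B", "B", "B"]
print(euclidean((0, 0), (3, 4)))              # 5.0
print(knn_predict(train_X, train_y, (2, 2)))  # A
```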
MODULE IV 1 Mark Questions and Answers
1. Define unsupervised learning. Give example.
- Definition: Unsupervised learning is a machine learning concept where unlabeled and unclassified
information is analyzed to discover hidden knowledge. The algorithms work on the data without any labeled
training examples and identify patterns, groupings, and other interesting knowledge from the data.
- Example: Clustering is an example of unsupervised learning, where data is grouped into clusters based
on similarity without any prior labels.
2. Mention applications of unsupervised learning.
- Segmentation of target consumer populations by advertisement consulting agencies.
- Anomaly or fraud detection in the banking sector.
- Image processing and image segmentation (e.g., face recognition).
- Grouping of important characteristics in genes for genetics research.
- Dimensionality reduction in sample data by data scientists.
- Document clustering and identifying potential labeling options.
3. Define the term clustering.
- Definition: Clustering refers to a set of techniques for finding subgroups or clusters in a dataset based on
the characteristics of the objects within that dataset. Objects within a group are similar to each other but
different from objects in other groups.
4. Mention different fields where cluster analysis is used effectively.
- Text data mining (e.g., text categorization, document summarization).
- Customer segmentation (e.g., based on demographics, buying habits).
- Anomaly checking (e.g., fraudulent bank transactions, unauthorized computer intrusions).
- Data mining (e.g., simplifying large datasets by grouping features).
5. What are different types of clustering techniques?
- Partitioning methods (e.g., k-means, k-medoids).
- Hierarchical methods (e.g., agglomerative, divisive clustering).
- Density-based methods (e.g., DBSCAN).
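The partitioning idea behind k-means can be sketched compactly. This is a minimal version of Lloyd's algorithm, assuming the initial centroids are supplied by the caller and a fixed number of iterations; it alternates between assigning each point to its nearest centroid and moving each centroid to the mean of its points. The function name and data are hypothetical, and math.dist needs Python 3.8 or later.

```python
import math

def k_means(points, initial_centroids, iterations=10):
    """Lloyd's k-means: assign points to the nearest centroid, then recompute means.

    Returns the final centroids and the cluster assignment from the last iteration.
    """
    centroids = [list(c) for c in initial_centroids]
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        for i, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[i] = [sum(coord) / len(members) for coord in zip(*members)]
    return centroids, clusters

points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
centroids, clusters = k_means(points, [(0, 0), (10, 10)])
print(clusters)  # [[(1, 1), (1, 2), (2, 1)], [(8, 8), (8, 9), (9, 8)]]
```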
6. Define the term support and confidence in Association rule.
- Support: The support of an item set X in a transaction database T is the percentage or number of
transactions in T that contain X. For example, if 60% of transactions contain an item, its support is 60%.
- Confidence: The confidence of a rule X => Y is the percentage of transactions in T containing X that
also contain Y, i.e., support(X ∪ Y) / support(X). It measures the reliability of the inference made by the rule.
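Support and confidence can be computed directly from a list of transactions. The five market-basket transactions below are hypothetical; confidence is computed as count(X ∪ Y) / count(X), which equals support(X ∪ Y) / support(X).

```python
def count_containing(itemset, transactions):
    """Number of transactions that contain every item in `itemset`."""
    return sum(set(itemset) <= set(t) for t in transactions)

def support(itemset, transactions):
    """Fraction of transactions that contain `itemset`."""
    return count_containing(itemset, transactions) / len(transactions)

def confidence(x, y, transactions):
    """Confidence of rule X => Y: count(X ∪ Y) / count(X)."""
    return count_containing(set(x) | set(y), transactions) / count_containing(x, transactions)

transactions = [{"bread", "milk"}, {"bread", "butter"}, {"bread", "milk", "butter"},
                {"milk"}, {"bread", "milk"}]
print(support({"bread"}, transactions))               # 0.8
print(confidence({"bread"}, {"milk"}, transactions))  # 0.75
```

Here bread appears in 4 of 5 transactions (support 80%), and 3 of those 4 also contain milk, so the rule bread => milk has 75% confidence.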
7. What is dendrogram?
- Definition: A dendrogram is a tree-like structure used to represent the step-by-step creation of
hierarchical clustering. It shows how clusters are merged iteratively in agglomerative clustering or split
iteratively in divisive clustering.
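The information a dendrogram draws is the sequence of merges and the distance at which each merge happens. This minimal sketch, assuming 1-D points and single linkage (distance between clusters = closest pair of members), records that merge sequence for agglomerative clustering; the function name and points are hypothetical, and the brute-force search is only suitable for tiny inputs.

```python
def agglomerative_merges(points):
    """Single-linkage agglomerative clustering on 1-D points.

    Returns a list of (cluster_a, cluster_b, distance) merge steps, i.e. the
    information a dendrogram displays (the distance is the merge height).
    """
    clusters = [[p] for p in points]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges

for a, b, height in agglomerative_merges([1, 2, 6, 7, 15]):
    print(a, "+", b, "at height", height)
```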
MODULE V 1 Mark Questions and Answers
1. What are the two main parts of the human nervous system?
The human nervous system has two main parts:
✓ the central nervous system (CNS), consisting of the brain and spinal cord
✓ the peripheral nervous system (PNS), consisting of nerves and ganglia outside the brain and spinal cord.
2. What are the different types of activation functions in an artificial neural network?
Common activation functions include:
• Identity Function
• Threshold /Step Function
• ReLU (Rectified Linear Unit)
• Sigmoid Function
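The four activation functions listed above can be written as plain Python functions. This is a minimal sketch assuming scalar inputs and a default threshold of 0 for the step function; the sigmoid shown is the binary (logistic) sigmoid, written in a numerically stable form so that very large negative inputs do not overflow math.exp.

```python
import math

def identity(x):
    """Identity function: output equals input."""
    return x

def step(x, threshold=0.0):
    """Threshold (step) function: 1 if x >= threshold, else 0."""
    return 1 if x >= threshold else 0

def relu(x):
    """ReLU: max(0, x)."""
    return max(0.0, x)

def sigmoid(x):
    """Binary (logistic) sigmoid 1 / (1 + e^-x), output in (0, 1)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)  # same value, rewritten to avoid overflow for very negative x
    return e / (1.0 + e)

for f in (identity, step, relu, sigmoid):
    print(f.__name__, [f(x) for x in (-2.0, 0.0, 2.0)])
```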
3. Mathematically express the threshold function.
The threshold function outputs 1 when the input x is greater than or equal to a threshold θ (usually zero)
and 0 when it is less than the threshold:
f(x) = 1 if x ≥ θ
f(x) = 0 if x < θ
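4. What are the two types of sigmoid function?
1. Binary sigmoid function: f(x) = 1 / (1 + e^(−x)), with output in the range (0, 1)
2. Bipolar sigmoid function: f(x) = (1 − e^(−x)) / (1 + e^(−x)), with output in the range (−1, 1)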
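The bipolar sigmoid (1 − e^(−x)) / (1 + e^(−x)) is algebraically identical to tanh(x / 2), and also to 2 × (binary sigmoid) − 1. The sketch below, with hypothetical function names, evaluates it via math.tanh (which avoids overflow for large |x|) and checks it against the direct definition for a few moderate inputs.

```python
import math

def bipolar_sigmoid(x):
    """Bipolar sigmoid (1 - e^-x) / (1 + e^-x), range (-1, 1).

    Evaluated as tanh(x / 2), an algebraically identical form that avoids
    overflow in math.exp for large |x|.
    """
    return math.tanh(x / 2)

def bipolar_from_definition(x):
    """Direct use of the definition; fine for moderate |x|."""
    e = math.exp(-x)
    return (1 - e) / (1 + e)

for x in (-2.0, 0.0, 1.5):
    print(x, round(bipolar_sigmoid(x), 4), round(bipolar_from_definition(x), 4))
```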
5. What is Deep Learning?
Deep learning is a particular kind of machine learning that achieves great power and flexibility by
learning to represent the world as a nested hierarchy of concepts, with each concept defined in relation to
simpler concepts, and more abstract representations computed in terms of less abstract ones.
6. What is a Deep Neural Network?
A deep neural network is a neural network with a certain level of complexity: it has multiple hidden
layers between the input and output layers. This depth makes it capable of modeling and processing
complex non-linear relationships.
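A forward pass shows what "multiple hidden layers" means in practice: each layer computes a weighted sum of the previous layer's outputs plus a bias and applies a non-linear activation. This is a minimal pure-Python sketch with two ReLU hidden layers and an identity output layer; the weights, biases, layer sizes, and helper names (dense, forward) are hypothetical, and no training is shown.

```python
def relu(x):
    return max(0.0, x)

def dense(inputs, weights, biases, activation):
    """One fully connected layer: activation(w · inputs + b) for each neuron."""
    return [activation(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

def forward(x, layers):
    """Pass input x through a list of (weights, biases, activation) layers in order."""
    for weights, biases, activation in layers:
        x = dense(x, weights, biases, activation)
    return x

# Hypothetical weights: 2 inputs -> 3 hidden -> 2 hidden -> 1 output.
layers = [
    ([[0.5, -0.2], [0.1, 0.4], [-0.3, 0.8]], [0.0, 0.1, 0.0], relu),
    ([[0.2, 0.5, 0.3], [0.7, -0.1, 0.4]], [0.05, 0.0], relu),
    ([[1.0, -1.0]], [0.0], lambda z: z),  # identity output layer
]
print(forward([1.0, 2.0], layers))
```

Without the non-linear activations, the stacked layers would collapse into a single linear map, which is why the ReLU between layers matters.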
7. List the various steps involved in the working of a deep learning technique.
1) Understand the problem.
2) Identify the relevant data.
3) Choose an appropriate deep learning algorithm.
4) Train the algorithm on the dataset.
5) Perform final testing of the trained model on the test data.
***************************************************************************************