
FACULTY OF ENGINEERING AND TECHNOLOGY

Concepts of Artificial Intelligence and its Applications
2024-25
III year ECE

www.jainuniversity.ac.in www.set.jainuniversity.ac.in

SUPERVISED MACHINE LEARNING

Learning Issues, Supervised Learning Foundations, Basic Models for
Supervised Learning, Overfitting, Limitations, Social Impact.

Machine Learning

Objectives
1) What is Learning?

2) What is Machine Learning?

3) Steps in machine learning.

4) Types of machine learning.

5) Applications of Machine Learning.


What is Learning?

“To gain knowledge or understanding of, or skill in, by
study, instruction or experience.”
 Learning a set of new facts.

 Learning HOW to do something.

 Improving the ability to do something already learned.

What is Machine Learning?
 Machine Learning is the study of methods for programming
computers to learn.

 Building machines that automatically learn from experience.

 Machine learning usually refers to the changes in systems that
perform tasks associated with artificial intelligence (AI). Such
tasks involve recognition, diagnosis, planning, robot control,
prediction, etc.

What is Machine Learning?

[Diagram: training data is fed to a learning algorithm, which produces a trained machine; at run time, the trained machine takes a query and returns an answer.]
Steps in machine learning
1) Data collection.

2) Representation.

3) Modeling.

4) Estimation.

5) Validation.
General structure of a learning system
[Diagram: a learning system takes in data, runs a learning process that drives problem solving, and a teacher plus performance evaluation of the results feed back into the learning process.]
Advantages of ML
1) Solving vision problems through statistical inference.

2) Deriving intelligence from data rather than hand-coded
common-sense AI.

3) Reducing constraints over time, moving toward complete
autonomy.
Disadvantages of ML

1) Application-specific algorithms.

2) Real-world problems have too many variables, and
sensors might be too noisy.

3) Computational complexity.
Types of machine learning
1) Unsupervised Learning.

2) Semi-Supervised Learning.

3) Supervised Learning.
Unsupervised Learning

 Studies how input patterns can be represented to
reflect the statistical structure of the overall
collection of input patterns.
 No outputs are used (unlike supervised learning
and reinforcement learning).
 The learner is provided only unlabeled data.
 No feedback is provided from the environment.
Unsupervised Learning

 Advantage
 Most of the laws of science were developed through
unsupervised learning.

 Disadvantage
 The identification of the features itself is a complex
problem in many situations.
Semi-Supervised Learning
 It lies between supervised and unsupervised learning
techniques in the amount of labeled and unlabeled data
required for training.
 The goal is to reduce the amount of supervision
required compared to supervised learning.
 At the same time, it improves the results of unsupervised
clustering to meet the expectations of the user.
Semi-Supervised Learning

 Semi-supervised learning is an area of increasing
importance in Machine Learning.
 Automatic methods of collecting data make it more
important than ever to develop methods that make
use of unlabeled data (see the sketch below).
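A minimal sketch of one such method, assuming scikit-learn is available: self-training wraps a base classifier, pseudo-labels the unlabeled points it is most confident about, and retrains. By scikit-learn convention, unlabeled targets are marked with -1. The tiny dataset below is purely illustrative.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.semi_supervised import SelfTrainingClassifier

# Illustrative data: two labeled clusters plus two unlabeled points
X = np.array([[1, 1], [2, 1], [8, 8], [9, 8], [5, 4], [6, 5]])
y = np.array([0, 0, 1, 1, -1, -1])  # -1 marks unlabeled samples

# The base classifier is retrained on its own confident pseudo-labels
model = SelfTrainingClassifier(LogisticRegression())
model.fit(X, y)
print(model.predict([[7, 6]]))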

Applications of Machine Learning
 Drug discovery
 Medical diagnosis (photo, MRI, CT)
 Iris verification
 Radar imaging
 Speech recognition
 Fingerprint recognition
 Signature verification
 Face recognition
 Target recognition
 Robotics vision
 Traffic monitoring
Linear Regression
 Regression: predicts continuous output variables based on
independent input variables, such as the prediction of house
prices from different parameters like house age, distance
from the main road, location, area, etc.
 Linear regression is a type of supervised machine
learning algorithm that computes the linear relationship between
the dependent variable and one or more independent features by
fitting a linear equation to observed data.
Linear Regression
 When there is only one independent feature, it is known
as Simple Linear Regression, and when there is more than
one feature, it is known as Multiple Linear Regression.

 Similarly, when there is only one dependent variable, it is
considered Univariate Linear Regression, while when there
is more than one dependent variable, it is known
as Multivariate Regression.
Why is Linear Regression important?
 Its simplicity is a virtue: linear regression is
transparent, easy to implement, and serves as a
foundational concept for more complex
algorithms.
 Linear Regression is a supervised learning algorithm in
machine learning that predicts a continuous output variable
based on one or more input features.
Simple Linear Regression
 This is the simplest form of linear regression, and it involves
only one independent variable and one dependent variable. The
equation for simple linear regression is:
 y = β0 + β1X

where:
 y is the dependent variable
 X is the independent variable
 β0 is the intercept
 β1 is the slope
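For a single feature, the best-fit coefficients can be computed in closed form by ordinary least squares (minimizing the sum of squared errors):

β1 = Σ(xi − x̄)(yi − ȳ) / Σ(xi − x̄)²
β0 = ȳ − β1x̄

where x̄ and ȳ are the means of X and y. For the house-price example later in this section (x̄ = 3 bedrooms, ȳ = 200000), this gives β1 = 500000 / 10 = 50000 and β0 = 200000 − 50000 × 3 = 50000, matching the fitted output shown there.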

Multiple Linear Regression
 This involves more than one independent variable and one dependent variable.
The equation for multiple linear regression is:
y = β0 + β1X1 + β2X2 + … + βnXn
where:
 y is the dependent variable
 X1, X2, …, Xn are the independent variables
 β0 is the intercept
 β1, β2, …, βn are the slopes
 The goal of the algorithm is to find the best-fit line equation that can
predict the values based on the independent variables.
Linear Regression

y is called the dependent or target variable, and X is called the independent variable,
also known as the predictor of y.
Linear Regression Algorithm
 1. Initialize model parameters (β0, β1)
 2. Calculate predicted values (y_pred = β0 + β1x)
 3. Calculate error (ε = y_true - y_pred)
 4. Calculate cost function (MSE = (1/n) * Σ(ε^2))
 5. Update model parameters using optimization
algorithm (e.g., Gradient Descent)
 6. Repeat steps 2-5 until convergence
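A minimal NumPy sketch of these steps (batch gradient descent on the MSE cost; the learning rate and iteration count are illustrative assumptions, not prescribed values):

import numpy as np

X = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([2, 3, 5, 7, 11], dtype=float)

b0, b1 = 0.0, 0.0            # 1. Initialize model parameters
lr = 0.01                    # learning rate (assumed)
for _ in range(10000):       # 6. Repeat until (approximate) convergence
    y_pred = b0 + b1 * X     # 2. Predicted values
    err = y - y_pred         # 3. Error
    mse = np.mean(err ** 2)  # 4. Cost function (MSE)
    b0 += lr * 2 * np.mean(err)      # 5. Gradient descent update
    b1 += lr * 2 * np.mean(err * X)
print(b0, b1, mse)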
Linear Regression Algorithm
import numpy as np
from sklearn.linear_model import LinearRegression
import matplotlib.pyplot as plt
# Generate sample data (scikit-learn expects a 2-D feature array)
X = np.array([[1], [2], [3], [4], [5]])
y = np.array([2, 3, 5, 7, 11])
Linear Regression Algorithm
# Create and train model
model = LinearRegression()
model.fit(X, y)
# Make predictions
predictions = model.predict(X)
# Plot data and regression line
plt.scatter(X, y)
plt.plot(X, predictions, color='red')
plt.show()
Applications
1. Predicting house prices
2. Forecasting sales
3. Analyzing stock prices
4. Energy consumption prediction
5. Medical diagnosis

Advantages
1. Interpretability
2. Simplicity
3. Computational efficiency
4. Wide range of applications

Disadvantages
1. Assumptions may not hold
2. Sensitive to outliers
3. Limited to linear relationships

Real-world examples
1. Google's self-driving cars (predicting steering angles)
2. Netflix's recommendation system (predicting user ratings)
3. Weather forecasting (predicting temperature and precipitation)

Example
 Problem Statement: Predict house prices based on
the number of bedrooms.
 Dataset:
| Bedrooms | Price   |
| 1        | 100,000 |
| 2        | 150,000 |
| 3        | 200,000 |
| 4        | 250,000 |
| 5        | 300,000 |

Python code
import numpy as np
from sklearn.linear_model import LinearRegression
import matplotlib.pyplot as plt
# Define dataset
X = np.array([[1], [2], [3], [4], [5]])  # 2-D feature array for scikit-learn
y = np.array([100000, 150000, 200000, 250000, 300000])
# Create and train model
model = LinearRegression()
model.fit(X, y)
Python code
# Make predictions
predictions = model.predict(X)
# Plot data and regression line
plt.scatter(X, y)
plt.plot(X,predictions,color='red')
plt.xlabel('Bedrooms')
plt.ylabel('Price')
plt.show()

Python code
# Print coefficients
print('Intercept (β0):', model.intercept_)
print('Slope (β1):', model.coef_[0])
Output:
Intercept (β0): 50000.0
Slope (β1): 50000.0
The linear regression model predicts house prices based on the
number of bedrooms. The fitted equation is: Price = 50000 + 50000 ×
Bedrooms. This means that for each additional bedroom, the price
increases by $50,000.
Logistic Regression
 Logistic Regression is a supervised learning algorithm used
to predict the probability of an event occurring (binary
classification). It models the relationship between a
dependent variable (target) and one or more independent
variables (features).

Logistic Regression
Key Components:
 1. Logistic Function (Sigmoid): Maps input to probability
between 0 and 1.
 2. Cost Function (Log Loss): Measures difference between

predicted and actual probabilities.


 3. Optimization Algorithm: Finds optimal model
parameters.

Logistic Regression
 Logistic Regression Equation:
p = 1 / (1 + e^(-z))
 where:
- p: Probability of the positive class
- z: Linear combination of features (w·x + b)
- w: Weights
- x: Features
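As a quick numeric check of the sigmoid (with illustrative values): for w = 2, x = 1, and b = -1, z = 2 × 1 + (-1) = 1, so p = 1 / (1 + e^(-1)) ≈ 0.73, i.e., the model assigns about a 73% probability to the positive class.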

Logistic Regression
 Types of Logistic Regression:
1. Binary Logistic Regression: Two-class classification.
2. Multinomial Logistic Regression: Multi-class classification.
3. Ordinal Logistic Regression: Ordered multi-class
classification.

Logistic Regression Algorithm
1. Import necessary libraries and load the dataset.
2. Preprocess data (handle missing values, normalize/scale
features).
3. Split data into training (~70%) and testing (~30%) sets.
4. Create a Logistic Regression model.
5. Train the model using the training data.
6. Evaluate the model using the testing data.

Logistic Regression
Common Evaluation Metrics:
 Accuracy
 Precision
 Recall
 F1-score
 ROC-AUC

Logistic Regression
 Advantages:
1. Interpretable: Model weights indicate feature importance.
2. Efficient: Computationally fast.
3. Simple: Easy to implement.
Disadvantages:
1. Assumes linearity in the relationships between features and
target.
2. Sensitive to outliers, which affect model performance.
3. Not suitable for complex, non-linear relationships.
Logistic Regression
Applications:
 Credit Risk Assessment

 Medical Diagnosis

 Customer Churn Prediction

 Spam Detection

 Image Classification

Logistic Regression
 Example: Predicting Diabetes based on Health
Indicators.
 Dataset:
| Feature  | Description              |
| Age      | Patient's age            |
| BMI      | Body Mass Index          |
| BP       | Blood Pressure           |
| Glucose  | Blood Glucose level      |
| Diabetes | Target variable (Yes/No) |
Logistic Regression
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
# Load dataset (assumes a diabetes.csv file with the columns above)
df = pd.read_csv('diabetes.csv')
Logistic Regression
# Preprocess data
scaler = StandardScaler()
df[['Age', 'BMI', 'BP', 'Glucose']] = scaler.fit_transform(df[['Age', 'BMI', 'BP',
'Glucose']])
# Split data
X = df.drop('Diabetes', axis=1)
y = df['Diabetes']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,
random_state=42)
# Create Logistic Regression model
logreg = LogisticRegression(max_iter=1000)
Logistic Regression
# Train model
logreg.fit(X_train, y_train)
# Evaluate model
y_pred = logreg.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))
print("Classification Report:")
print(classification_report(y_test, y_pred))
print("Confusion Matrix:")
print(confusion_matrix(y_test, y_pred))

Logistic Regression
Output:
Accuracy: 0.85
Classification Report:
|             | Precision | Recall | F1-score |
| Diabetes    | 0.83      | 0.86   | 0.84     |
| No Diabetes | 0.86      | 0.83   | 0.85     |
Confusion Matrix:
[[55 10]
 [12 53]]

Interpretation: The Logistic Regression model achieved an accuracy of 85% in
predicting diabetes based on health indicators.
Logistic Regression
Feature Importance:
| Feature | Coefficient |
| Age     | 0.23 |
| BMI     | 0.31 |
| BP      | 0.17 |
| Glucose | 0.45 |
Glucose has the highest coefficient, indicating its significant impact on
diabetes prediction.
This example demonstrates how Logistic Regression can be applied to binary
classification problems in healthcare.
Parameters
 Accuracy is the proportion of correctly predicted
instances out of the total instances in a test dataset.
 Accuracy = (TP + TN) / (TP + TN + FP + FN)
TP = True Positives (instances correctly predicted as positive)
TN = True Negatives (instances correctly predicted as negative)
FP = False Positives (instances incorrectly predicted as positive)
FN = False Negatives (instances incorrectly predicted as negative)
Parameters
 Types of Accuracy:
 1. Training Accuracy: Accuracy on training data.
 2. Testing Accuracy: Accuracy on unseen test
data.
 3. Validation Accuracy: Accuracy on validation
data.

Parameters
Importance:
 1. Evaluates model performance.

 2. Compares models.

 3. Identifies overfitting/underfitting.

Parameters
Precision:
 Precision is the ratio of true positives (TP) to the
sum of true positives (TP) and false positives (FP),
measuring the accuracy of positive predictions.
 Precision = TP / (TP + FP)
Parameters
Importance:
 1. Evaluates the model's ability to avoid false
positives.
 2. Critical in applications with severe
consequences (e.g., medical diagnosis).
 3. Balances with recall to achieve optimal
performance.
Parameters
Recall:
 Recall is the ratio of true positives (TP) to the
sum of true positives (TP) and false negatives (FN),
measuring the model's ability to detect all
actual positive instances.
 Recall = TP / (TP + FN)
Parameters
Precision vs Recall:
 1. Precision focuses on accuracy; recall focuses
on completeness.
 2. Precision is sensitive to false positives; recall is
sensitive to false negatives.
 3. Precision and recall are inversely related.
Parameters
F1-Score:
 The F1-Score is the harmonic mean of precision
and recall, measuring a model's accuracy on
imbalanced datasets.
 F1 = 2 * (Precision * Recall) / (Precision + Recall)
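As a quick worked example using the diabetes figures above: with precision 0.83 and recall 0.86, F1 = 2 × (0.83 × 0.86) / (0.83 + 0.86) ≈ 0.845, which rounds to the 0.84 reported in the classification report.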

Parameters
Confusion matrix:
 A confusion matrix is a table that evaluates the
performance of a classification model by
comparing predicted classes against actual
classes.
Parameters
Confusion matrix:
 1. Accuracy: (TP + TN) / (TP + TN + FP + FN)
 2. Precision: TP / (TP + FP)
 3. Recall: TP / (TP + FN)
 4. F1-Score: 2 * (Precision * Recall) / (Precision + Recall)
 5. False Positive Rate: FP / (FP + TN)
 6. False Negative Rate: FN / (FN + TP)
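A minimal sketch computing these metrics in Python from the confusion matrix reported earlier (taking "Diabetes" as the positive class; small rounding differences from the slide's figures are expected):

# Values read off the confusion matrix [[55 10] [12 53]]
TP, FN, FP, TN = 55, 10, 12, 53

accuracy = (TP + TN) / (TP + TN + FP + FN)
precision = TP / (TP + FP)
recall = TP / (TP + FN)
f1 = 2 * precision * recall / (precision + recall)
print(accuracy, precision, recall, f1)  # ≈ 0.83, 0.82, 0.85, 0.83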


Decision Tree Algorithm
A decision tree is a supervised learning algorithm
that uses a tree-like model to classify data or make
predictions.
How it Works:
1. Root Node: Start with the entire dataset.
2. Splitting: Choose the best feature to split the data.
3. Decision Nodes: Create child nodes based on the split.
4. Leaf Nodes: Make predictions or classifications.
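The "best feature to split" in step 2 is usually chosen with an impurity measure such as Gini impurity, Gini = 1 − Σ pi², where pi is the fraction of samples in class i at the node. For the five-row car dataset below (4 Yes, 1 No), the root impurity is 1 − (0.8² + 0.2²) = 0.32, and the split that most reduces this impurity is preferred.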
Decision Tree Algorithm
 Example:
Predicting whether someone will buy a car based on age and income.
 Dataset:
| Age | Income | Buys Car |
| 25  | 50000  | Yes |
| 30  | 60000  | Yes |
| 40  | 70000  | Yes |
| 20  | 30000  | No  |
| 35  | 55000  | Yes |
Decision Tree Algorithm
 Decision Tree:
 +---------------+
 | Age < 30 |
 +---------------+
 |
 |
 v
 +---------------+ +---------------+
 | Income < 55000 | | Buys Car = Yes |
 +---------------+ +---------------+
 |
 |
 v
 +---------------+ +---------------+
 | Income >= 55000| | Buys Car = No |
 +---------------+ +---------------+

74
Decision Tree Algorithm
 Algorithm Steps:
1. Choose Age as the root node attribute.
2. Split the data into Age < 30 and Age >= 30 (all Age >= 30 samples buy a car).
3. For Age < 30, choose Income as the next attribute.
4. Split the data into Income < 40000 (No) and Income >= 40000 (Yes).
Decision Tree Algorithm
 Python Implementation:
import pandas as pd
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
# Load dataset
df = pd.DataFrame({'Age': [25, 30, 40, 20, 35],
                   'Income': [50000, 60000, 70000, 30000, 55000],
                   'Buys Car': [1, 1, 1, 0, 1]})
Decision Tree Algorithm
 # Define features and target
X = df[['Age', 'Income']]
y = df['Buys Car']
 # Train/Test Split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    random_state=42)
Decision Tree Algorithm
 # Create Decision Tree model
clf = DecisionTreeClassifier(random_state=42)
 # Train model

clf.fit(X_train, y_train)
 # Evaluate model

accuracy = clf.score(X_test, y_test)


print("Accuracy:", accuracy)

Decision Tree Algorithm
 Advantages:
1. Easy to interpret
2. Handles categorical features
3. Fast training and prediction
 Disadvantages:

1. Prone to overfitting
2. Not suitable for complex relationships

Decision Tree Algorithm
 Real-World Applications:
1. Credit risk assessment
2. Medical diagnosis
3. Customer segmentation
4. Image classification
5. Natural Language Processing

Random forest Algorithm
 Random Forest is an ensemble learning algorithm that
combines multiple decision trees to improve prediction
accuracy and reduce overfitting.
 Key components:
 1. Bootstrapping: Randomly select samples from the training data.
 2. Decision Tree Creation: Train a decision tree on the
bootstrapped samples.
 3. Feature Randomization: Randomly select features for each
decision tree.
 4. Voting: Combine predictions from multiple decision trees.
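A minimal hand-written sketch of these mechanics (bootstrapping plus majority voting over a handful of trees; the data, tree count, and feature subset size are illustrative assumptions, not the library's defaults):

import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(42)
X = np.array([[25, 50000], [30, 60000], [35, 70000], [20, 40000], [40, 80000]])
y = np.array([1, 1, 1, 0, 1])

trees = []
for _ in range(10):                              # forest of 10 trees (illustrative)
    idx = rng.integers(0, len(X), len(X))        # 1. bootstrap sample, with replacement
    tree = DecisionTreeClassifier(max_features=1)  # 3. random feature subset per split
    tree.fit(X[idx], y[idx])                     # 2. train one tree on the sample
    trees.append(tree)

# 4. majority vote across the trees for a new point
votes = np.array([t.predict([[32, 55000]])[0] for t in trees])
print(np.bincount(votes).argmax())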
Random forest Algorithm
 How it Works:
 1. Train decision trees on bootstrapped samples.
 2. Each decision tree predicts an outcome.
 3. Combine predictions using voting.

Random forest Algorithm
 Example: Suppose we want to predict whether someone
will buy a car based on age, income, credit score, and
location.
| Age | Income | Credit Score | Location | Bought Car |
| 25  | 50000  | 700 | Urban    | Yes |
| 30  | 60000  | 800 | Suburban | Yes |
| 35  | 70000  | 900 | Rural    | Yes |
| 20  | 40000  | 600 | Urban    | No  |
| 40  | 80000  | 950 | Suburban | Yes |
Random forest Algorithm
 Random Forest Model:
 Create 100 decision trees with random feature subsets.
 Each decision tree predicts whether someone will buy a car.
 Combine predictions using voting.
Random forest Algorithm
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
import pandas as pd
# Load data (assumes a car_data.csv file with the columns above)
df = pd.read_csv('car_data.csv')
# Encode the categorical Location column as numeric dummy variables
df = pd.get_dummies(df, columns=['Location'])
# Split data into features and target
X = df.drop('Bought Car', axis=1)
y = df['Bought Car']
Random forest Algorithm
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

# Train model
rf = RandomForestClassifier(n_estimators=100, random_state=42)
rf.fit(X_train, y_train)
Random forest Algorithm
# Make predictions
y_pred = rf.predict(X_test)

# Evaluate model
accuracy = rf.score(X_test, y_test)
print("Accuracy:", accuracy)
Random forest Algorithm
 Advantages:
1. Improves prediction accuracy
2. Reduces overfitting
3. Handles high-dimensional data
4. Robust to missing values
5. Parallelizable
 Disadvantages:
1. Computationally expensive
2. Difficult to interpret
3. Requires hyperparameter tuning
Random forest Algorithm
 Real-World Applications:
 Image classification
 Natural language processing
 Recommender systems
 Credit risk assessment
 Medical diagnosis

overfitting
 Overfitting occurs when a model is too complex and
performs well on training data but poorly on new, unseen
data.
 Causes:

1. Complex models
2. Small training datasets
3. Noise in training data
4. Feature correlation
5. Poor regularization
Underfitting
 Underfitting occurs when a model is too simple and fails to
capture the underlying patterns in the training data,
resulting in poor performance on both training and test
data.
 Causes:
 Simple models
 Insufficient training data
 Lack of relevant features
 Inadequate model complexity
 Poor feature engineering
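A common way to diagnose both problems is to compare training and test accuracy: a large gap suggests overfitting, while low scores on both suggest underfitting. A rough sketch on synthetic data (the dataset and depth values are illustrative assumptions):

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data purely for illustration
X, y = make_classification(n_samples=300, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for depth in (1, None):  # depth 1 tends to underfit; unlimited depth may overfit
    clf = DecisionTreeClassifier(max_depth=depth, random_state=0)
    clf.fit(X_train, y_train)
    print(depth, clf.score(X_train, y_train), clf.score(X_test, y_test))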
KNN algorithm
 KNN (K-Nearest Neighbors) is a supervised
machine learning algorithm that classifies new
data points based on the similarity to nearby
data points.

Key Components
 1. Features: Input variables (e.g., age, income).
 2. Target Variable: Output variable (e.g., bought
car).
 3. Distance Metric: Measure of similarity (e.g.,
Euclidean).
 4. K: Number of nearest neighbors.

Types
 1. Classification KNN: Predict categorical labels.
 2. Regression KNN: Predict continuous values.

KNN algorithm
 1. Collect and preprocess data. (features and
target variable)
 2. Choose K (number of nearest neighbors).
 3. Calculate distance between new data point
and existing data points.
 4. Select K nearest neighbors.
 5. Assign label based on majority vote.
Distance Metrics
 1. Euclidean Distance
 2. Manhattan Distance (L1 Distance)
 3. Minkowski Distance
 4. Cosine Similarity
 5. Hamming Distance
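As a small illustration of the first two metrics for points a = (32, 55) and b = (30, 60) (income expressed in thousands, an assumed scaling so that one feature does not dominate):

Euclidean: sqrt((32 − 30)² + (55 − 60)²) = sqrt(29) ≈ 5.39
Manhattan: |32 − 30| + |55 − 60| = 7

In practice, features are usually standardized before computing any of these distances.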

KNN algorithm
Example: Suppose we want to predict whether someone will buy a car based
on their age and income.
Training Data:
| Age | Income | Bought Car |
| 25  | 50000  | Yes |
| 30  | 60000  | Yes |
| 35  | 70000  | Yes |
| 20  | 40000  | No  |
| 40  | 80000  | Yes |
| 45  | 90000  | Yes |
KNN algorithm
Test Data:
| Age | Income |
| 32  | 55000  |

KNN with K=3:
1. Calculate distances between the test point and each training point.
2. Select the 3 nearest neighbors.
3. Assign a label based on majority vote.
KNN algorithm
Distance Calculations (the distances shown assume scaled features; with raw
values, the income term would dominate the Euclidean distance):
| Age | Income | Distance |
| 25  | 50000  | 7.07  |
| 30  | 60000  | 5.00  |
| 35  | 70000  | 7.07  |
| 20  | 40000  | 12.53 |
| 40  | 80000  | 10.00 |
| 45  | 90000  | 12.73 |
KNN algorithm
3 Nearest Neighbors:
| Age | Income | Bought Car |
| 30  | 60000  | Yes |
| 25  | 50000  | Yes |
| 35  | 70000  | Yes |

Prediction: Based on majority vote, the prediction is:
Yes, the person will buy a car.
python implementation
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Training data (in practice, scale the features so income does not dominate)
X_train = np.array([[25, 50000], [30, 60000], [35, 70000],
                    [20, 40000], [40, 80000], [45, 90000]])
y_train = np.array([1, 1, 1, 0, 1, 1])
python implementation
# Test data
X_test = np.array([[32, 55000]])
# Create KNN model
knn = KNeighborsClassifier(n_neighbors=3)
# Train model
knn.fit(X_train, y_train)

python implementation
# Make prediction
prediction = knn.predict(X_test)
print(prediction)

Advantages
 1. Simple to implement.
 2. Effective for non-linear relationships.
 3. Handles multi-dimensional data (though very high dimensions degrade performance).

Disadvantages
 1. Computationally expensive.
 2. Sensitive to choice of K.
 3. Vulnerable to outliers.

Real-World Applications
 1. Image classification
 2. Text classification
 3. Recommendation systems
 4. Customer segmentation
 5. Predictive maintenance

FACULTY OF ENGINEERING AND TECHNOLOGY

www.jainuniversity.ac.in www.set.jainuniversity.ac.in
