Introduction to AI

The document provides an overview of data science, focusing on machine learning, including definitions, types of data, and various algorithms such as supervised, unsupervised, and reinforcement learning. It discusses model selection, evaluation, and techniques like regularization to prevent overfitting. Key algorithms covered include kNN, decision trees, SVM, and random forests, along with their applications and challenges in training AI models.

EEX4373 - Data Science

Dr. H.M.M.Caldera
Electrical and Computer Engineering
The Open University of Sri Lanka
Agenda
• A general overview and terminology
• Variables
• Feature representation
• Introduction to machine learning
• Model selection and evaluation
• Classification: kNN, decision trees, SVM
• Ensemble methods: random forests
• Regularization
A general overview and terminology
Variables

Independent variable: An explanatory variable is a type of independent variable.

Dependent variable: A response variable is a type of dependent variable.
More...
• Your explanatory variable is
academic motivation at the start of
the school year.
• Your response variable is GPA at
the end of the school year.
In the AI context,

Variables = Features?
• A "variable" is a broader term that refers to any quantity that can be
measured or controlled. It can include not only features but also
other types of data, such as labels or target variables in supervised
learning.
• "Features" specifically refer to the input variables or attributes that
are used to describe the data in a machine learning model. These
features are the characteristics or properties of the data that the
model uses to make predictions or classifications.
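The distinction can be sketched in a few lines of Python. The records and column names below are hypothetical, chosen to echo the motivation/GPA example above: every column is a variable, but only the input columns are features.

```python
# Hypothetical student records: every column is a variable.
records = [
    {"motivation": 7.5, "study_hours": 12, "gpa": 3.4},
    {"motivation": 5.0, "study_hours": 6, "gpa": 2.8},
    {"motivation": 9.0, "study_hours": 15, "gpa": 3.9},
]

feature_names = ["motivation", "study_hours"]  # inputs (features)
target_name = "gpa"                            # output (label / target variable)

# Feature matrix X: one row per record, one column per feature.
X = [[row[name] for name in feature_names] for row in records]
# Target vector y: the value a supervised model would learn to predict.
y = [row[target_name] for row in records]

print(X)  # [[7.5, 12], [5.0, 6], [9.0, 15]]
print(y)  # [3.4, 2.8, 3.9]
```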
Introduction to machine learning
Machine Learning
Machine Learning is a subfield of Artificial Intelligence
(AI) that empowers computers to learn from data and improve
their performance over time, all without explicit programming.

Image credit - https://i.stack.imgur.com/

Definition

Arthur Samuel coined the term “Machine Learning” in 1959 while at IBM.

His definition, in general terms:

“The field of study that gives computers the ability to learn without being explicitly programmed.”
Overview of AI
• Once you have a large set of data, you can perform many tasks, such as:
• Identification/Detection
• Classification
• Prediction
• How can AI transform “data” into a valuable asset?
• AI mainly involves:
• Predictive analytics
General Applications
How do machines learn?
• It is like the human learning process
• By examples
• By experience

• Example task: tell whether the following object is a mop or not
Data Types in ML
• At a high level, data can be categorized into two main types:
• Numerical data
• Categorical data
More detailed:
1. Numeric Data: Numeric data is the most commonly used data
type in machine learning. It includes continuous variables like
age, height, weight, etc., and discrete variables like the number
of siblings, number of cars owned, etc.
2. Categorical Data: Categorical data includes variables that have
discrete values like color, gender, and nationality. These
variables can be nominal (no order) or ordinal (ordered).
3. Text Data: Text data includes text documents like emails, social
media posts, and customer reviews. Natural Language
Processing (NLP) is used to process this type of data.
4. Image Data: Image data includes images that are used for tasks
like image recognition, object detection, and facial recognition.
Convolutional Neural Networks (CNNs) are used to process this
type of data.
5. Audio Data: Audio data includes audio files that are used for
tasks like speech recognition and audio classification. Recurrent
Neural Networks (RNNs) are used to process this type of data.
6. Time-Series Data: Time-series data includes data that is
collected over time, like stock prices, weather data, and sensor
data. Time-series analysis is used to process this type of data.
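Most models expect numbers, so categorical values are usually encoded before training. The sketch below shows one common approach for each kind of categorical variable. The helper names `one_hot` and `ordinal` and the sample values are assumptions made for illustration, not part of the slides.

```python
def one_hot(values):
    """Encode nominal values as 0/1 indicator vectors (no order implied)."""
    categories = sorted(set(values))
    encoded = [[1 if v == c else 0 for c in categories] for v in values]
    return encoded, categories


def ordinal(values, order):
    """Encode ordinal values as their rank in a caller-supplied order."""
    rank = {level: i for i, level in enumerate(order)}
    return [rank[v] for v in values]


colors = ["red", "green", "red", "blue"]   # nominal: no natural order
sizes = ["small", "large", "medium"]       # ordinal: small < medium < large

encoded_colors, color_categories = one_hot(colors)
encoded_sizes = ordinal(sizes, ["small", "medium", "large"])

print(color_categories)  # ['blue', 'green', 'red']
print(encoded_colors)    # [[0, 0, 1], [0, 1, 0], [0, 0, 1], [1, 0, 0]]
print(encoded_sizes)     # [0, 2, 1]
```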
ML process
• The ML process is carried out step by step.
• Training data: a set of data that is used to train the algorithm. It can be labelled or not.
Types of Machine Learning
• Supervised Learning
• Unsupervised Learning
• Reinforcement Learning
Supervised Learning
• Under supervised learning, the algorithm is trained on a labeled
dataset, which means that the input data is paired with
corresponding output labels.
• i.e. Learn to predict target values from labelled data
• The goal is for the model to learn the mapping between inputs
and outputs, making predictions or classifications on new,
unseen data.
• Common tasks include regression (predicting a continuous
value) and classification (assigning a label to input data).
• Examples of supervised learning algorithms include linear
regression, support vector machines, decision trees, and neural
networks.
• More - What is Supervised Learning? | IBM
Unsupervised Learning
• Unsupervised learning involves working with unlabeled data,
where the algorithm must discover patterns, relationships, or
structures within the data without explicit guidance.
• Clustering and dimensionality reduction are common tasks in
unsupervised learning.
• Examples of unsupervised learning algorithms include k-means
clustering, hierarchical clustering, principal component analysis
(PCA), and autoencoders.
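As an illustration of clustering, here is a minimal k-means sketch in plain Python. It assumes 2-D points and seeds the centroids with the first k points, a simplification chosen here for reproducibility (practical implementations usually pick seeds more carefully). The sample points are made up.

```python
def mean_point(cluster):
    """Average of a non-empty list of 2-D points."""
    return (sum(p[0] for p in cluster) / len(cluster),
            sum(p[1] for p in cluster) / len(cluster))


def kmeans(points, k, iterations=10):
    """Basic k-means loop: assign points to the nearest centroid, then update."""
    centroids = [points[i] for i in range(k)]  # assumption: first k points as seeds
    clusters = []
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            sq_dists = [(p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2 for c in centroids]
            clusters[sq_dists.index(min(sq_dists))].append(p)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [mean_point(cl) if cl else centroids[i]
                     for i, cl in enumerate(clusters)]
    return centroids, clusters


points = [(1.0, 1.0), (8.0, 8.0), (1.5, 2.0), (9.0, 8.5), (1.0, 0.5), (8.5, 9.0)]
centroids, clusters = kmeans(points, k=2)
print(centroids)
print(clusters)
```

No labels are supplied anywhere: the two groups emerge only from the distances between points.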
Reinforcement Learning
• Reinforcement learning focuses on training models to make
sequential decisions by interacting with an environment. The
model receives feedback in the form of rewards or penalties
based on its actions, allowing it to learn optimal strategies over
time.
• Reinforcement learning is often used in scenarios where an
agent must learn to navigate an environment and take actions
to maximize cumulative rewards.
• Examples of reinforcement learning algorithms include
Q-learning, deep Q-networks (DQN), and policy gradient
methods.
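The reward-driven loop described above can be sketched with tabular Q-learning. The environment is a hypothetical five-state corridor invented for this example: the agent starts at state 0 and receives a reward of 1 only when it reaches state 4. The learning rate, discount factor, and episode count are illustrative choices.

```python
import random

N_STATES = 5      # states 0..4; state 4 is the goal (hypothetical toy environment)
ACTIONS = [0, 1]  # 0 = move left, 1 = move right


def step(state, action):
    """Deterministic corridor: reward 1.0 only when the goal state is reached."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def train(episodes=500, alpha=0.5, gamma=0.9, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # Q-table: q[state][action]
    for _episode in range(episodes):
        state = 0
        for _t in range(100):                  # cap on episode length
            action = rng.choice(ACTIONS)       # purely random exploration
            next_state, reward, done = step(state, action)
            # Q-learning update: move the estimate toward
            # reward + discounted value of the best next action.
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if done:
                break
    return q


q_table = train()
# Greedy policy for the non-terminal states: pick the action with the highest Q-value.
policy = [row.index(max(row)) for row in q_table[:-1]]
print(policy)
```

The learned policy should prefer moving right in every non-terminal state, since that is the shortest route to the reward.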
ML Algorithms
• Supervised learning
  • Regression
    • Simple linear regression
    • Multiple linear regression
    • Polynomial
  • Classification
• Unsupervised learning
• Reinforcement learning
Model selection and evaluation
Supervised learning algorithms
• Linear regression
• Logistic regression
• Support vector machines
• K-NN
• Naïve Bayes
• Decision tree
• Random forest
Unsupervised learning algorithms

• K-means clustering
• Hierarchical clustering
• Principal Component Analysis
• Independent Component Analysis
• Singular Value Decomposition
Supervised learning
• Regression
• Classification
Regression problem

When the output to be predicted is a continuous value, regression is the best fit.
Linear regression
• Originally defined as a statistical method
• Models the relationship between a dependent variable and one or more
independent variables
• In the AI context, it is treated as a supervised
learning algorithm
Regression
• Single explanatory variable (simple linear regression)
• Multiple explanatory variables (multiple linear regression)
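For the single-variable case, the least-squares line can be computed directly. The sketch below uses the standard closed-form formulas for slope and intercept. The function name and the data are hypothetical; the data follows y = 2x + 1 exactly, so the fit is easy to check by eye.

```python
def fit_simple_linear_regression(x, y):
    """Ordinary least squares for y ≈ slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope = covariance(x, y) / variance(x); the 1/n factors cancel.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


x = [1.0, 2.0, 3.0, 4.0]
y = [3.0, 5.0, 7.0, 9.0]  # generated from y = 2x + 1

slope, intercept = fit_simple_linear_regression(x, y)
print(slope, intercept)            # 2.0 1.0
print(slope * 5.0 + intercept)     # prediction for a new input x = 5
```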
Classification

When the output must be assigned to one of a set of classes,
classification is the best option.
Popular classification algorithms
• Logistic Regression: Despite its name, logistic regression is a classification
algorithm commonly used for binary classification tasks.
• Decision Trees: These are tree-like structures where each internal node
represents a decision based on a feature, and each leaf node represents a
class label.
• Support Vector Machines (SVM): SVM aims to find a hyperplane that best
separates data into different classes.
• K-Nearest Neighbors (KNN): This algorithm classifies data points based on
the majority class among their k-nearest neighbors in the feature space.
• Random Forest: A collection of decision trees that work together to
improve the overall accuracy and robustness of the model.
• Naive Bayes: Based on Bayes' theorem, this algorithm is particularly
effective for text classification tasks.
Logistic regression
• Useful when the dependent variable is categorical
• Output is a binary output (i.e., 0/1 or True/False)
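A minimal sketch of binary logistic regression with one feature, trained by gradient descent on the log loss. The data is hypothetical (hours studied relative to the class average, and whether the student passed), and the learning rate and epoch count are illustrative choices rather than tuned values.

```python
import math


def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(x, y, lr=0.1, epochs=1000):
    """Gradient descent on the average log loss for p = sigmoid(w * x + b)."""
    w, b = 0.0, 0.0
    n = len(x)
    for _ in range(epochs):
        errors = [sigmoid(w * xi + b) - yi for xi, yi in zip(x, y)]
        grad_w = sum(e * xi for e, xi in zip(errors, x)) / n
        grad_b = sum(errors) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Hypothetical data: hours studied minus the class average, and pass (1) / fail (0).
hours = [-2.5, -1.5, -0.5, 0.5, 1.5, 2.5]
passed = [0, 0, 0, 1, 1, 1]

w, b = fit_logistic(hours, passed)
# The model outputs a probability; thresholding at 0.5 gives the binary class.
predictions = [1 if sigmoid(w * h + b) >= 0.5 else 0 for h in hours]
print(predictions)  # [0, 0, 0, 1, 1, 1]
```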
Decision Tree
• An inverted tree design
• Used for both classification and regression tasks.
Root Node: The topmost node in the tree, representing the first decision based on a specific feature. This decision splits the dataset into subsets.
Internal Nodes: Nodes in the middle of the tree that represent decisions based on different features. Each internal node contributes to further partitioning the dataset.
Branches: The edges connecting nodes, indicating the outcome of a decision. Each branch corresponds to a specific value or range of values for a feature.
Leaf Nodes: The terminal nodes at the bottom of the tree, representing the final class label (in classification) or the predicted value (in regression). Each leaf node is associated with a specific decision or outcome.
More…
• Splitting Criteria:
• The criteria used to decide how to split the dataset at each internal
node. Common criteria include Gini impurity (for classification) and
mean squared error (for regression).
• Pruning:
• The process of removing branches or nodes from the tree to prevent
overfitting. Pruning helps create a more generalized model that
performs well on new, unseen data.
• Decision Rules:
• The paths from the root node to the leaf nodes represent decision rules.
These rules define how the algorithm makes predictions based on the
input features.
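To make the splitting criterion concrete, the sketch below computes Gini impurity and searches for the threshold on one numeric feature that gives the lowest weighted impurity. The function names and the age/purchase data are hypothetical, and a real tree would repeat this search over every feature at every node.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions (0 means pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def split_gini(values, labels, threshold):
    """Weighted Gini impurity after splitting on value <= threshold."""
    left = [l for v, l in zip(values, labels) if v <= threshold]
    right = [l for v, l in zip(values, labels) if v > threshold]
    n = len(labels)
    return len(left) / n * gini(left) + len(right) / n * gini(right)


def best_threshold(values, labels):
    """Try each observed value (except the largest) as a split point."""
    candidates = sorted(set(values))[:-1]
    return min(candidates, key=lambda t: split_gini(values, labels, t))


ages = [22, 25, 30, 35, 40, 45]
buys = ["no", "no", "no", "yes", "yes", "yes"]

print(gini(buys))                    # 0.5 (two equally likely classes)
print(best_threshold(ages, buys))    # 30
print(split_gini(ages, buys, 30))    # 0.0 (both sides are pure)
```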
Random Forest
• A collection of decision trees
• Each tree considers a random subset of the features, and the outputs of all trees are combined to produce the final prediction
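The ensemble idea can be sketched in a heavily simplified form. Instead of full decision trees, this example uses one-split "stumps", each trained on a bootstrap sample and a single randomly chosen feature, and combines them by majority vote. The simplification, the function names, and the data are all assumptions made for illustration.

```python
import random
from collections import Counter


def majority(labels):
    """Most common label in a non-empty list."""
    return Counter(labels).most_common(1)[0][0]


def fit_stump(rows, labels, feature):
    """One-split tree on a single feature, chosen to minimise training errors."""
    best = None
    for threshold in sorted({row[feature] for row in rows}):
        left = [l for r, l in zip(rows, labels) if r[feature] <= threshold]
        right = [l for r, l in zip(rows, labels) if r[feature] > threshold]
        left_label = majority(left)
        right_label = majority(right) if right else left_label
        errors = (sum(l != left_label for l in left)
                  + sum(l != right_label for l in right))
        if best is None or errors < best[0]:
            best = (errors, feature, threshold, left_label, right_label)
    return best[1:]


def fit_forest(rows, labels, n_trees=25, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]  # bootstrap sample
        feature = rng.randrange(len(rows[0]))           # random feature per stump
        forest.append(fit_stump([rows[i] for i in idx],
                                [labels[i] for i in idx], feature))
    return forest


def predict(forest, row):
    """Each stump votes; the majority label wins."""
    votes = [left if row[f] <= t else right for f, t, left, right in forest]
    return majority(votes)


rows = [(1, 1), (2, 1.5), (1.5, 2), (2.5, 2.5), (7, 8), (8, 7.5), (7.5, 9), (9, 8)]
labels = ["A"] * 4 + ["B"] * 4
forest = fit_forest(rows, labels)
print(predict(forest, (0.5, 0.5)), predict(forest, (10, 10)))
```

Because each stump sees different data and a different feature, individual mistakes tend to be outvoted, which is the intuition behind the robustness of random forests.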
Support Vector Machines
• Support Vector Machines (SVM) are a powerful and versatile class of
supervised machine learning algorithms used for both classification
and regression tasks.
• SVMs are particularly effective in high-dimensional spaces and are
well-suited for situations where the data has complex relationships.
• They work by finding a hyperplane that best separates the data into
different classes or predicts a numerical value in the case of
regression.
Hyperplane: In a binary classification problem, SVM aims to find the hyperplane that best separates the data points of one class from another. A hyperplane is a subspace with one fewer dimension than the input space. In two dimensions, the hyperplane is a line, and in three dimensions, it is a plane.
Margin: The margin is the distance between the hyperplane and the nearest data point from either class. SVM seeks to maximize this margin, providing a robust separation between classes.
Support Vectors: Support vectors are the data points that lie closest to the decision boundary (hyperplane). These points are crucial in determining the position and orientation of the hyperplane.
Image credit - https://www.mdpi.com/1996-1073/14/12/3393
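The geometric terms can be checked numerically. The sketch below does not train an SVM: it assumes a hyperplane w·x + b = 0 is already given (the values of w and b and the four points are hypothetical) and computes each point's distance to it, the margin, and the support vectors.

```python
import math


def signed_distance(point, w, b):
    """Signed distance from a point to the hyperplane w·x + b = 0."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return (sum(wi * xi for wi, xi in zip(w, point)) + b) / norm


points = [(0.0, 0.0), (2.0, 1.0), (4.0, 3.0), (6.0, 5.0)]
labels = [-1, -1, 1, 1]
w, b = (1.0, 1.0), -5.0  # hypothetical hyperplane: x1 + x2 - 5 = 0

signed = [signed_distance(p, w, b) for p in points]
# A point is on the correct side when its label and signed distance agree in sign.
all_correct = all(label * d > 0 for label, d in zip(labels, signed))

distances = [abs(d) for d in signed]
margin = min(distances)  # distance to the nearest point of either class
support_vectors = [p for p, d in zip(points, distances) if math.isclose(d, margin)]

print(all_correct)       # True
print(margin)            # about 1.414 (that is, 2 / sqrt(2))
print(support_vectors)   # [(2.0, 1.0), (4.0, 3.0)]
```

Moving either of the two outer points would leave the margin unchanged, which is why only the support vectors determine the hyperplane.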
KNN
• K-Nearest Neighbors (KNN) is a simple and
intuitive machine learning algorithm used
for both classification and regression tasks.
• It falls under the category of
instance-based or lazy learning algorithms
because it doesn't explicitly learn a model
during the training phase. Instead, it makes
predictions based on the similarity of new
instances to existing labeled instances in
the training dataset.
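Because kNN has no training phase, a classifier fits in a few lines. The sketch below assumes Euclidean distance and a plain majority vote; the function name, the points, and the labels are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # "Lazy" learning: all work happens at prediction time.
    neighbours = sorted(
        (math.dist(p, query), label) for p, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in neighbours[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


train_points = [(1, 1), (1, 2), (2, 1), (6, 6), (7, 6), (6, 7)]
train_labels = ["A", "A", "A", "B", "B", "B"]

print(knn_predict(train_points, train_labels, (1.5, 1.5)))  # A
print(knn_predict(train_points, train_labels, (6.5, 6.5)))  # B
```

The choice of k is a hyperparameter: a small k follows the training data closely, while a larger k smooths the decision boundary.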
Some common issues in training an AI
algorithm
• Overfitting
• Underfitting
• Hyperparameter Tuning
• Class Imbalance
• Ethical Concerns and Bias

Image credit - Interactive Regression Lens for Exploring Scatter Plots (researchgate.net)
Outliers
Regularization
• Regularization is a technique used in machine learning and artificial
intelligence to prevent overfitting and improve the generalization of a
model. Overfitting occurs when a model learns the training data too
well, including its noise and outliers, to the extent that it performs
poorly on new, unseen data.
• Regularization methods add a penalty term to the model's objective
function, discouraging it from fitting the training data too closely and
promoting a simpler, more generalized solution. The regularization
term is typically based on the model's parameters, and it penalizes
large parameter values.
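In symbols, a regularized objective is often written as below. The notation is an assumption introduced here, not taken from the slides: θ denotes the model parameters, L(θ) the loss on the training data, Ω(θ) a penalty that grows with the size of the parameters, and λ the regularization strength.

```latex
J(\theta) = L(\theta) + \lambda \, \Omega(\theta), \qquad \lambda \ge 0
```

With λ = 0 the model is fitted to the training data alone; increasing λ trades training fit for smaller parameters.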
Types of regularization
• L1 Regularization (Lasso): In L1 regularization, the penalty term is the
sum of the absolute values of the model's coefficients multiplied by a
regularization parameter. This type of regularization can lead to some
coefficients being exactly zero, effectively performing feature
selection by eliminating less important features.
• L2 Regularization (Ridge): L2 regularization adds a penalty term that
is the sum of the squared coefficients multiplied by a
regularization parameter. It tends to shrink the coefficients toward
zero but rarely results in exactly zero coefficients. L2 regularization is
useful for preventing large weights that may cause numerical
instability.
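Both penalties are easy to compute, and the shrinking effect of L2 can be seen in the simplest possible model. The sketch below assumes a one-feature linear model without an intercept, y ≈ w·x, for which the ridge solution has the closed form w = Σxy / (Σx² + λ). The function names, weights, and data are hypothetical.

```python
def l1_penalty(weights, lam):
    """Lasso penalty: lam times the sum of absolute coefficient values."""
    return lam * sum(abs(w) for w in weights)


def l2_penalty(weights, lam):
    """Ridge penalty: lam times the sum of squared coefficient values."""
    return lam * sum(w * w for w in weights)


def ridge_slope(x, y, lam):
    """Minimiser of sum((y - w*x)^2) + lam * w^2 for a no-intercept model."""
    return sum(xi * yi for xi, yi in zip(x, y)) / (sum(xi * xi for xi in x) + lam)


weights = [0.5, -2.0, 0.0, 3.0]
print(l1_penalty(weights, 0.1))  # 0.1 * 5.5   (about 0.55)
print(l2_penalty(weights, 0.1))  # 0.1 * 13.25 (about 1.325)

x = [1.0, 2.0, 3.0]
y = [2.0, 4.0, 6.0]              # generated from y = 2x
print(ridge_slope(x, y, 0.0))    # 2.0: no penalty recovers the true slope
print(ridge_slope(x, y, 14.0))   # 1.0: a strong penalty shrinks the slope toward zero
```

The slope shrinks as λ grows but stays non-zero, matching the behaviour described for L2 above. Driving a coefficient exactly to zero is the characteristic effect of the L1 penalty.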
Learn More…
• Course | 6.036 | MIT Open Learning Library
• https://machinelearningmastery.com/start-here/#python
Questions
Thank you

PO Box 21, Nawala, Nugegoda, Sri Lanka

Phone: +94 11 288 100
www.ou.ac.lk
