Introduction - Final

The document outlines the objectives and outcomes of a Machine Learning course, covering basic concepts, supervised and unsupervised algorithms, ensemble techniques, and dimensionality reduction. It discusses the applications of machine learning in various fields, the types of learning, and the challenges faced in the domain, such as data labeling and the shortage of experts. Additionally, it provides insights into the steps involved in developing machine learning applications, including data collection and preprocessing techniques.

Uploaded by

neha.surti
Copyright © All Rights Reserved

Objectives

• To introduce the basic concepts and techniques of Machine Learning
• To introduce various supervised and unsupervised algorithms
• To introduce various ensemble techniques for combining ML models
• To introduce the concept of dimensionality reduction and its techniques

August 12, 2025 1


Outcomes

• Identify a Machine Learning technique for the given problem and understand the concepts of Training Error, Generalization Error, Overfitting and Underfitting.
• Apply Regression and Decision Tree techniques on the given data and examine the performance of the model.
• Compare and contrast Ensemble approaches for combining multiple Machine Learning techniques.
• Determine the Support Vector Machine variant that can be applied to the given data.
• Apply Unsupervised Learning techniques on the given data to get insights from unlabeled data.
• Use Dimensionality Reduction techniques to deal with data that has a large number of attributes.


Syllabus

What is machine learning

• Machine learning is an area of computer science that involves teaching computers to do things naturally by learning through experience.
• A computer program is said to learn from experience (E) with respect to some class of tasks (T) and performance measure (P) if its performance at tasks in T, as measured by P, improves with experience E.


Features

• Class of task
• Performance measure
• Source of experience
• Example: robot navigation in a maze


Common definitions

• Machine learning is used to parse data, learn from it and then make a determination or prediction about something in the world.
• Machine learning lies at the intersection of computer science, engineering and statistics, and often appears in other disciplines.


Components of ML study

[Figure: Computer Science, Statistics, and Engineering]

Examples of machine learning

• Facebook continuously notices the friends in your list, the profiles you often visit, your interests, your workplace, the groups you are in, and so on. Based on this information, Facebook gives you friend suggestions.
• Suppose you purchase a mobile phone online from Amazon. The site immediately recommends a cover for the phone you purchased.


How does ML algorithm work?

• Machine learning uses algorithms to find patterns in data,
• then uses a model that recognizes those patterns to make predictions on new data.

[Figure: Training data -> Algorithm (finds the patterns) -> Model (recognizes the patterns); New data -> Model -> Predictions]

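The train-then-predict flow on this slide can be sketched in a few lines of Python. This is only an illustration: the straight-line model, the function names `train` and `predict`, and the sample data are assumptions, not part of the original slides.

```python
# Minimal sketch of the flow: training data -> algorithm -> model -> predictions.
# The straight-line model and the data below are illustrative assumptions.

def train(xs, ys):
    """Training algorithm: find the pattern (slope and intercept) in the data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x_new):
    """Model: apply the learned pattern to new data."""
    slope, intercept = model
    return slope * x_new + intercept


# Hypothetical training data that follows y = 2x + 1 exactly.
training_x = [1, 2, 3, 4]
training_y = [3, 5, 7, 9]

model = train(training_x, training_y)   # learn the pattern
prediction = predict(model, 10)         # use it on new data
print(model, prediction)                # (2.0, 1.0) 21.0
```
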
Machine learning can be implemented in the

• Healthcare sector
• Pharmaceutical companies


Where is machine learning used

• Marketing and sales
• Search engines
• Transportation: Based on travel history and patterns of travel across various routes, machine learning can help transportation companies predict potential problems that could arise on certain routes and advise their customers to opt for a different route.

Types of machine learning

• Supervised learning: Suppose you have a fruit basket and your task is to arrange the fruit by type.
• You can group the fruits based on any physical characteristic.
• Rule 1: If the color of the fruit is red and the size of the fruit is small, then the fruit is a cherry.
• Rule 2: If the color of the fruit is red and the size of the fruit is big, then the fruit is an apple.
• Rule 3: If the color of the fruit is green and the size of the fruit is small, then the fruit is a grape.
• Rule 4: If the color of the fruit is green and the size of the fruit is big, then the fruit is a watermelon.

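The fruit rules above can be written as a small lookup, shown here as a sketch. The function name `classify_fruit` and the "unknown" fallback for combinations the rules do not cover are assumptions added for illustration.

```python
def classify_fruit(color, size):
    """Apply the four color/size rules from the fruit-basket example."""
    rules = {
        ("red", "small"): "cherry",
        ("red", "big"): "apple",
        ("green", "small"): "grape",
        ("green", "big"): "watermelon",
    }
    key = (color.lower(), size.lower())
    # "unknown" is an assumed fallback; the slides do not cover other cases.
    return rules.get(key, "unknown")


print(classify_fruit("Red", "small"))   # cherry
print(classify_fruit("green", "Big"))   # watermelon
```

In supervised learning these rules are not written by hand; they are learned from labelled examples such as (red, small) -> cherry.
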
Decision Tree Induction

Reinforcement Learning

• This learning is similar to supervised learning.
• In supervised learning, the correct target output values are known for each input pattern.
• In some cases, however, less information is available.
• For example, the network might be told only that its actual output is 50% correct.
• Thus only critic information is available, not the exact information.

• Learning based on this critic information is called reinforcement learning, and the feedback sent is called the reinforcement signal.
• Reinforcement learning can be viewed as a form of supervised learning in the sense that the network receives some feedback from its environment.
• The reinforcement signals are processed in the critic signal generator, and the resulting critic signals are sent to the network for adjustment of the weights.
• Reinforcement learning is also called learning with a critic, as opposed to learning with a teacher, which refers to supervised learning.
• Reinforcement learning works much like the way you learn by yourself without any guidance, basically through trial and error.
• When you get something right, you get a reward, you feel happy and you move ahead.
• When you get something wrong, you get a penalty. You take a step back and then try to avoid the incorrect path while exploring another, correct path.
• Example: robots equipped with sensors that learn about their surrounding environment.
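The reward-and-penalty idea can be sketched as a tiny trial-and-error loop. Everything here is a simplifying assumption made for illustration: the two hypothetical paths, the fixed +1/-1 reinforcement signals, the epsilon-greedy choice and the step size. It is not the critic-signal network described on the slides.

```python
import random


def learn_by_trial(rewards, episodes=200, epsilon=0.1, step=0.1, seed=0):
    """Estimate how good each action is using only reward/penalty feedback."""
    rng = random.Random(seed)                   # seeded for repeatable runs
    values = {action: 0.0 for action in rewards}
    for _ in range(episodes):
        if rng.random() < epsilon:
            action = rng.choice(list(values))   # explore another path
        else:
            action = max(values, key=values.get)  # follow the best-known path
        signal = rewards[action]                # reinforcement signal (critic)
        # Move the estimate a small step toward the signal received.
        values[action] += step * (signal - values[action])
    return values


# Hypothetical environment: path_a is penalised, path_b is rewarded.
rewards = {"path_a": -1.0, "path_b": 1.0}
values = learn_by_trial(rewards)
best_path = max(values, key=values.get)
print(best_path)  # path_b
```

The learner is never told which path is correct; it only sees a reward or a penalty after each attempt and adjusts its estimates accordingly.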


Unsupervised Learning

• K-Means clustering


Issues in machine learning

• Data labelling: Today there is a large amount of data that is unlabelled and raw, whereas supervised machine learning works on labelled data.
• Without adequate data labels in the training dataset, it is not feasible to build a robust machine learning model.
• Companies are putting in thousands of man-hours to label data so that it can be used for machine learning.
• This is an active area of research, in which labels can be attached to the data as it is used.


Shortage of experts

• Machine learning is an emerging field and there are not many experts around the world. You require experts who can
   1) Understand the wide variety of data
   2) Model the data correctly so as to meet the desired objectives
   3) Build and manage the software and hardware tools and techniques required for machine learning


Obtaining massive training datasets

• It is difficult to obtain massive training datasets for various areas of machine learning.
• You may lack historical data, and the quality of the data in the training dataset also matters.
• If the dataset obtained does not represent a fair sample size, the resulting machine learning model could be erroneous.
• Suppose you are trying to build a machine learning model to predict a particular type of cancer from a given set of symptoms, lifestyle and blood-related parameters. You may then require quality data for thousands of patients who have had that particular type of cancer, with the details of their symptoms, lifestyle and blood-related parameters.


Hard to explain problems and results

• Complex machine learning models, often built by experts, may not be self-explanatory when used by non-experts in the field.
• For example, if you tell a healthy person that she has an 80% chance of getting a particular disease, she may require additional details behind that statement.


Limited possibilities to reuse the model

• It is difficult to reuse an existing machine learning model for other use cases. Companies have to invest time and resources to build a new model for each new use case.


Steps in developing machine learning application

• Collect data: Some of the popular, publicly available dataset resources are as follows
   1) Kaggle datasets
   2) Amazon Web Services
   3) Machine learning repository
   4) Google TensorFlow
   5) Microsoft
   6) OpenML

Prepare the input data

• Once you have the data, you need to ensure that it is in the right format so that it can be processed by the chosen algorithm and computer programs.


Data Preprocessing?

• Data processing: processing that involves the transformation of raw data into useful information.

• Why is pre-processing required?

1. Real-world data are generally
   • incomplete: some attribute values are missing
   • noisy: containing errors or outliers


2. Tasks in Data Preprocessing

• Data cleaning
   • Fill in missing values, smooth noisy data, identify or remove outliers
• Data integration
   • Integrating data from multiple sources
• Data transformation
   • Normalization
• Data reduction
   • Obtains a representation that is reduced in volume but produces the same or similar analytical results


[Figure: preprocessing]


Data Cleaning

• Data cleaning tasks
   • Fill in missing values
   • Identify outliers and smooth out noisy data


How to Handle Missing Data?

• Ignore the tuple: usually done when the class label is missing (assuming the task is classification)
• Fill in the missing value manually: tedious and often infeasible
• Use a global constant to fill in the missing value: e.g., “unknown”
• Use the attribute mean or median to fill in the missing value
• Use the most probable value to fill in the missing value: using techniques like regression, Bayesian classification, decision trees or clustering algorithms
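Several of these strategies can be sketched with plain Python lists. The helper names `fill_missing` and `drop_unlabelled`, the use of `None` to mark a missing value, and the sample data are assumptions made for illustration only.

```python
from statistics import mean, median


def fill_missing(values, strategy="mean", constant="unknown"):
    """Return a copy of values with None entries replaced."""
    known = [v for v in values if v is not None]
    if strategy == "mean":
        fill = mean(known)          # attribute mean
    elif strategy == "median":
        fill = median(known)        # attribute median
    elif strategy == "constant":
        fill = constant             # global constant such as "unknown"
    else:
        raise ValueError(f"unsupported strategy: {strategy}")
    return [fill if v is None else v for v in values]


def drop_unlabelled(rows, label_key="label"):
    """Ignore the tuple: drop rows whose class label is missing."""
    return [row for row in rows if row.get(label_key) is not None]


# Hypothetical attribute with one missing value.
ages = [10, None, 20, 60]
print(fill_missing(ages, "mean"))      # [10, 30, 20, 60]
print(fill_missing(ages, "median"))    # [10, 20, 20, 60]
print(fill_missing(ages, "constant"))  # [10, 'unknown', 20, 60]

rows = [{"age": 10, "label": "P"}, {"age": 20, "label": None}]
print(drop_unlabelled(rows))           # [{'age': 10, 'label': 'P'}]
```

Filling with the most probable value would need a predictive model (regression, a Bayesian classifier or a decision tree) and is not shown here.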


Example of Weather

Outlook    Temperature  Humidity  Windy  Class
sunny      hot          high      false  N
sunny      hot          high      true   N
overcast   hot          high      false  P
rain       mild         high      false  P
rain       cool         normal    false  P
rain       cool         normal    true   N
overcast   cool         normal    true   P
sunny      mild         high      false  N
sunny      cool         normal    false  P
rain       mild         normal    false  P
sunny      mild         normal    true   P
overcast   mild         high      true   P
overcast   hot          normal    false  P
rain       mild         high      true   N


How to Handle Noisy Data?

• Binning method:
   • first sort the data and partition it into (equi-depth) bins
   • then smooth by bin means, bin medians, bin boundaries, etc.
• Clustering
   • detect and remove outliers
• Regression


Binning

• Consider sorted data, for example prices in INR:
   4, 8, 9, 15, 21, 21, 24, 25, 26, 28, 29, 34
• Number of bins N = 3
• Bin 1: 4, 8, 9, 15
• Bin 2: 21, 21, 24, 25
• Bin 3: 26, 28, 29, 34


Smooth by bin means

• Replace each value in a bin with the bin's mean value (here rounded to the nearest integer; the exact means are 9, 22.75 and 29.25)
• Bin 1: 9, 9, 9, 9
• Bin 2: 23, 23, 23, 23
• Bin 3: 29, 29, 29, 29


Smoothing by bin median

• Bin 1: 8.5, 8.5, 8.5, 8.5
• Bin 2: 22.5, 22.5, 22.5, 22.5
• Bin 3: 28.5, 28.5, 28.5, 28.5


Smoothing by bin boundaries

• Bin 1: 4, 4, 4, 15
• Bin 2: 21, 21, 25, 25
• Bin 3: 26, 26, 26, 34
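The three smoothing methods can be reproduced on the price data with a short sketch. The function names are hypothetical, the bin count is assumed to divide the data evenly, and ties in boundary smoothing are assumed to go to the lower boundary. Note that the exact bin means are 9, 22.75 and 29.25, which the slide rounds to 9, 23 and 29.

```python
from statistics import mean, median


def equal_depth_bins(sorted_values, n_bins):
    """Partition sorted data into bins holding the same number of values."""
    size = len(sorted_values) // n_bins   # assumes the values divide evenly
    return [sorted_values[i * size:(i + 1) * size] for i in range(n_bins)]


def smooth_by_means(bins):
    """Replace every value in a bin with the bin mean."""
    return [[mean(b)] * len(b) for b in bins]


def smooth_by_medians(bins):
    """Replace every value in a bin with the bin median."""
    return [[median(b)] * len(b) for b in bins]


def smooth_by_boundaries(bins):
    """Replace every value with the closer of the bin's min and max."""
    smoothed = []
    for b in bins:
        low, high = b[0], b[-1]
        # Assumed tie rule: a value equally close to both goes to the lower one.
        smoothed.append([low if v - low <= high - v else high for v in b])
    return smoothed


prices = [4, 8, 9, 15, 21, 21, 24, 25, 26, 28, 29, 34]
bins = equal_depth_bins(prices, 3)
print(bins)
print(smooth_by_means(bins))
print(smooth_by_medians(bins))
print(smooth_by_boundaries(bins))
```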


Data Integration

• Karl Pearson's correlation coefficient
• Covariance

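As a rough illustration of the two measures named on this slide, the sketch below computes the covariance and the Pearson correlation coefficient of two attributes. The population form (dividing by n), the function names and the sample attributes are assumptions; the formulas on the original slides were not recoverable.

```python
from math import sqrt


def covariance(xs, ys):
    """Population covariance of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n


def pearson_r(xs, ys):
    """Correlation coefficient: covariance scaled by both standard deviations."""
    return covariance(xs, ys) / sqrt(covariance(xs, xs) * covariance(ys, ys))


# Hypothetical attributes taken from two sources; b is always a - 1.
a = [2, 4, 6, 8]
b = [1, 3, 5, 7]
print(covariance(a, b))  # 5.0
print(pearson_r(a, b))   # 1.0
```

A coefficient close to +1 or -1 suggests that one of the two attributes is redundant and could be dropped when the sources are integrated.
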
Data Reduction

• Dimension reduction technique


Example of Decision Tree Induction

Initial attribute set: {A1, A2, A3, A4, A5, A6}

                A4?
              /     \
           A1?       A6?
          /   \     /   \
   Class 1 Class 2 Class 1 Class 2

=> Reduced attribute set: {A1, A4, A6}
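The idea in this example is that only the attributes tested somewhere in the induced tree are kept. A minimal sketch follows, assuming a hypothetical nested-dictionary representation of the tree in which leaves are class-label strings.

```python
def attributes_used(tree):
    """Collect the attributes tested at the internal nodes of a tree."""
    if isinstance(tree, str):          # a leaf holds only a class label
        return set()
    used = {tree["attribute"]}
    for child in tree["children"]:
        used |= attributes_used(child)
    return used


# The tree from the slide: A4 at the root, A1 and A6 below it.
tree = {
    "attribute": "A4",
    "children": [
        {"attribute": "A1", "children": ["Class 1", "Class 2"]},
        {"attribute": "A6", "children": ["Class 1", "Class 2"]},
    ],
}

reduced = sorted(attributes_used(tree))
print(reduced)  # ['A1', 'A4', 'A6']
```

Attributes A2, A3 and A5 never appear in the tree, so they are dropped from the reduced attribute set.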


Numerosity Reduction

Numerosity reduction refers to reducing the volume of data by choosing smaller forms of data representation.

1. Histograms
• A popular data reduction technique
• Divide the data into buckets
• The range of a bucket is called its width

[Figure: histogram; x-axis from 10000 to 90000, y-axis from 0 to 40]

Histogram

• D = [1, 2, 3, 4, 2, 2, 3, 3, 3, 3, 1, 1, 1, 1, 1, 4, 4, 5, 5, 5, 6, 6, 6, 7, 7, 7, 1]
• 1: 7 times
• 2: 3 times
• 3: 5 times
• 4: 3 times
• 5: 3 times
• 6: 3 times
• 7: 3 times

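The counts above can be checked with `collections.Counter`. The second helper, `bucket_counts`, is a hypothetical addition that groups the values into buckets of a chosen width, as described on the numerosity reduction slide; the width of 2 is an arbitrary choice.

```python
from collections import Counter

D = [1, 2, 3, 4, 2, 2, 3, 3, 3, 3, 1, 1, 1, 1, 1,
     4, 4, 5, 5, 5, 6, 6, 6, 7, 7, 7, 1]

# One bucket per distinct value (bucket width 1).
counts = Counter(D)
for value in sorted(counts):
    print(f"{value}: {'*' * counts[value]} ({counts[value]})")


def bucket_counts(data, width):
    """Count values per bucket; each bucket is keyed by its lowest value."""
    start = min(data)
    buckets = Counter(start + ((v - start) // width) * width for v in data)
    return dict(sorted(buckets.items()))


# Buckets 1-2, 3-4, 5-6 and 7-8.
print(bucket_counts(D, 2))  # {1: 10, 3: 8, 5: 6, 7: 3}
```

Storing four bucket counts instead of 27 raw values is the reduction in volume that the histogram provides.
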
Data Transformation

• Normalization


Z-score

$$v' = \frac{v - \mathrm{mean}_A}{\mathrm{stand\_dev}_A}$$

• Sample data: [10, 20, 30]
• Mean = 20
• Variance = ((10 - 20)^2 + (20 - 20)^2 + (30 - 20)^2) / 3 = 200 / 3 ≈ 66.67
• Std dev = square root of variance ≈ 8.16
• v' = -1.22, 0, 1.22

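The worked example can be verified with a short sketch. It assumes the population variance (dividing by n), which is what gives the slide's values of about 66.67 and 8.16; the function name `z_scores` is hypothetical.

```python
from math import sqrt


def z_scores(values):
    """Z-score normalization: (v - mean_A) / stand_dev_A for each value."""
    n = len(values)
    mean_a = sum(values) / n
    variance = sum((v - mean_a) ** 2 for v in values) / n  # population variance
    std_dev = sqrt(variance)
    return [(v - mean_a) / std_dev for v in values]


scores = z_scores([10, 20, 30])
print([round(z, 2) for z in scores])  # [-1.22, 0.0, 1.22]
```

After normalization the attribute has mean 0 and standard deviation 1, so attributes measured on different scales become comparable.
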
Analyze the input data

• You need to ensure that the examples are complete (there are no missing values).


Train the algorithm
Test the algorithm

August 12, 2025 Data Mining: Concepts and Techniques 63


Use the algorithm

• You have spent a lot of time collecting and cleaning the data and then building and testing the model; now the model is put to use.


Periodic revisit

• You should periodically review the results that the model is producing and evaluate whether there are opportunities for improving it in light of new data. You may make minor adjustments to the model or retrain it with the latest data to fine-tune it.
