Data in ML

This document provides an introduction to machine learning. It defines machine learning as a computer program that improves its performance on tasks through experience. It discusses important machine learning concepts like training sets, instances, features, and feature vectors. It also describes different types of machine learning including supervised learning for classification and regression, unsupervised learning, and reinforcement learning. Finally, it discusses some challenges in machine learning like data collection, dimensionality, and data issues like noise, outliers, and imbalance.

Uploaded by

Purnama Gaming

INTRODUCTION TO MACHINE LEARNING

Dr. Gede Angga Pradipta M.Eng


What Is Machine Learning?

“A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.”

(Tom Mitchell, 1997)


Some Important Terminology
• Training/Evaluation Set
- The set of data used to discover potentially predictive relationships.
• Instance (sample)
- An instance is an item to process (e.g. classify). It can be a document, a picture, a sound, a video, a row in a database or CSV file, or anything else you can describe with a fixed set of quantitative traits.
• Features/attributes
- The distinct traits that can be used to describe each item in a quantitative manner.
• Feature vector
- An n-dimensional vector of numerical features that represents some object.
• Feature extraction
- Preparation of the feature vector.
- Transforms the data from a high-dimensional space to a space of fewer dimensions.
Attributes (features): Name, Balance, Age, Employed
Class/label (target attribute): Write-off

Name      Balance   Age   Employed   Write-off
Mike      $2000     31    No         Yes
Mary      $13500    30    Yes        Yes
Claudio   $200      21    No         No
Robert    $5000     28    Yes        Yes
David     $10000    42    Yes        Yes

Each row is one instance (example). For the third row:

Feature vector is: <Claudio, 200, 21, No>
Class label (value of target attribute) is: No
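As a minimal sketch, the example table can be held in plain Python and each instance split into its feature vector and class label. The function names and the numeric encoding (dropping the name, mapping Employed Yes/No to 1/0) are illustrative assumptions, not something the slides prescribe.

```python
# Rows of the example table: (Name, Balance, Age, Employed, Write-off).
rows = [
    ("Mike", 2000, 31, "No", "Yes"),
    ("Mary", 13500, 30, "Yes", "Yes"),
    ("Claudio", 200, 21, "No", "No"),
    ("Robert", 5000, 28, "Yes", "Yes"),
    ("David", 10000, 42, "Yes", "Yes"),
]


def split_instance(row):
    """Separate one instance into its feature vector and its class label."""
    *features, label = row          # the last column is the target attribute
    return tuple(features), label


def to_numeric(features):
    """Assumed encoding: drop the name, map Employed Yes/No to 1.0/0.0."""
    _name, balance, age, employed = features
    return [float(balance), float(age), 1.0 if employed == "Yes" else 0.0]


features, label = split_instance(rows[2])
print(features, label)        # ('Claudio', 200, 21, 'No') No
print(to_numeric(features))   # [200.0, 21.0, 0.0]
```

The name is an identifier rather than a quantitative trait, which is why this sketch leaves it out of the numeric vector.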
WHY MACHINE LEARNING ?
Traditional Code Vs Machine Learning
Machine Learning Pipeline
• Defining Problem
- Data acquisition (image, text, sound, video, etc.)
- ML problem checklist
• Preparing Data
- Removing noise (e.g. Tomek links)
- Oversampling data
- Feature engineering
- Feature selection
- Data transformation
• Check Algorithms
- Performance measurement
- Evaluate different models
• Improve Result
- Tuning parameters
- Build ensemble model
• Present Result
- Present final performance
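The data-preparation step names Tomek links as one way to remove noise. A Tomek link is a pair of samples from different classes that are each other's nearest neighbour. The sketch below only detects such pairs on made-up 2-D points; the function names and data are assumptions, and which member of a link to drop (one or both) is a separate design choice.

```python
import math


def nearest_neighbor(i, points):
    """Index of the point closest to points[i], excluding i itself."""
    return min(
        (j for j in range(len(points)) if j != i),
        key=lambda j: math.dist(points[i], points[j]),
    )


def tomek_links(points, labels):
    """Return pairs (i, j), i < j, that are mutual nearest neighbours
    and carry different class labels."""
    links = []
    for i in range(len(points)):
        j = nearest_neighbor(i, points)
        if i < j and labels[i] != labels[j] and nearest_neighbor(j, points) == i:
            links.append((i, j))
    return links


# Made-up data: two classes that touch between indices 2 and 3.
points = [(0.0, 0.0), (0.1, 0.0), (1.0, 0.0), (1.1, 0.0), (3.0, 0.0), (3.2, 0.0)]
labels = ["A", "A", "A", "B", "B", "B"]

print(tomek_links(points, labels))  # [(2, 3)]
```

Samples 2 and 3 sit on the class boundary, so they are the candidates for removal as borderline or noisy points.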
Types of machine learning
Supervised Machine Learning
Supervised learning (Regression)

• Regression is a measure of the relation between the mean value of one variable (e.g. salary) and the corresponding values of other variables (e.g. experience).
• Regression analysis is a statistical process for estimating the relationships among variables.
• Regression means predicting the output value using training data.
• Basic algorithm: linear regression
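A minimal sketch of linear regression with one input variable, fitted by ordinary least squares. The experience and salary numbers are made up for illustration, and `fit_line` and `predict` are hypothetical helper names.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(x, slope, intercept):
    """Predict the output value for a new input."""
    return slope * x + intercept


# Made-up training data: years of experience vs. salary (in thousands).
experience = [1, 2, 3, 4, 5]
salary = [35, 40, 45, 50, 55]

slope, intercept = fit_line(experience, salary)
print(slope, intercept)               # 5.0 30.0
print(predict(6, slope, intercept))   # 60.0
```

The training data determine the line, and the fitted line is then used to predict the output for an unseen input (6 years of experience).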
Supervised learning (Classification)

Model families shown:
• Neural network model
• Geometric model
• Logical/rule-based model
• Probabilistic model

Differences between classification and regression

Classification
• A classification problem requires that examples be classified into one of two or more classes.
• A classification can have real-valued or discrete input variables.
• A problem with two classes is often called a two-class or binary classification problem.
• A problem with more than two classes is often called a multi-class classification problem.
• A problem where an example is assigned multiple classes is called a multi-label classification problem.

Regression
• A regression problem requires the prediction of a quantity.
• A regression can have real-valued or discrete input variables.
• A problem with multiple input variables is often called a multivariate regression problem.
• A regression problem where input variables are ordered by time is called a time series forecasting problem.
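The distinctions above depend mostly on what the target values look like. As a rough sketch, the hypothetical helper below names the problem type from a list of targets. Its rules (floats mean regression, collections mean multi-label) are simplifying assumptions for illustration, not a general test.

```python
def describe_task(targets):
    """Name the problem type implied by a list of target values (heuristic)."""
    if not targets:
        raise ValueError("need at least one target value")
    # Each example carries several classes at once.
    if all(isinstance(t, (set, frozenset, list, tuple)) for t in targets):
        return "multi-label classification"
    # Assumption: real-valued targets are quantities to predict.
    if all(isinstance(t, float) for t in targets):
        return "regression"
    n_classes = len(set(targets))
    if n_classes == 2:
        return "binary classification"
    return "multi-class classification"


print(describe_task(["Yes", "No", "Yes"]))          # binary classification
print(describe_task(["cat", "dog", "bird"]))        # multi-class classification
print(describe_task([{"news", "sport"}, {"news"}])) # multi-label classification
print(describe_task([35.0, 40.5, 52.25]))           # regression
```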
Unsupervised learning

Source: scikit-learn.org
Reinforcement Learning
Some challenges in machine learning
• Data collection/acquisition is difficult
• Not all features are useful for finding a good model (curse of dimensionality)
• Some datasets contain noise, outliers, or redundant data
• Small datasets and imbalanced data
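For the imbalanced-data challenge, one simple remedy is random oversampling: duplicating minority-class samples until every class is as frequent as the largest one. The sketch below is one possible implementation under that assumption. The data reuse the Balance and Write-off columns of the example table, and `random_oversample` is a hypothetical name.

```python
import random
from collections import Counter


def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen minority samples until all classes are equal in size."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_samples, out_labels = list(samples), list(labels)
    for cls, count in counts.items():
        pool = [s for s, l in zip(samples, labels) if l == cls]
        for _ in range(target - count):
            out_samples.append(rng.choice(pool))
            out_labels.append(cls)
    return out_samples, out_labels


balances = [2000, 13500, 200, 5000, 10000]
write_off = ["Yes", "Yes", "No", "Yes", "Yes"]

print(Counter(write_off))        # 4 "Yes" vs 1 "No"
new_x, new_y = random_oversample(balances, write_off)
print(Counter(new_y))            # 4 "Yes" and 4 "No"
```

Oversampling should be applied to the training portion only. Duplicating samples before splitting would leak copies of the same instance into the evaluation data.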
Curse of Dimensionality
• As the number of features (dimensions) grows, the amount of data needed to generalize accurately grows exponentially (C. Bishop, 2006).

As the dimensionality increases, the classifier's performance increases until the optimal number of features is reached. Further increasing the dimensionality without increasing the number of training samples results in a decrease in classifier performance.

https://www.datasciencecentral.com/
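The exponential growth can be made concrete with a small counting argument, sketched here under an assumed setup: each feature is cut into a fixed number of bins, and we ask how many samples are needed to keep a fixed number of samples per cell of the resulting grid. The bin count and samples-per-cell values are arbitrary choices for illustration.

```python
def n_cells(bins_per_feature, n_features):
    """Number of grid cells when every feature is cut into the same number of bins."""
    return bins_per_feature ** n_features


def samples_needed(per_cell, bins_per_feature, n_features):
    """Samples required to average `per_cell` samples in every cell."""
    return per_cell * n_cells(bins_per_feature, n_features)


def density(n_samples, bins_per_feature, n_features):
    """Average samples per cell for a fixed dataset size."""
    return n_samples / n_cells(bins_per_feature, n_features)


for d in (1, 2, 3, 10):
    print(d, samples_needed(5, 10, d), density(1000, 10, d))
```

With 10 bins per feature, going from 1 to 3 features raises the requirement from 50 to 5000 samples, while a fixed set of 1000 samples drops from 100 per cell to 1 per cell.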
Curse of Dimensionality
• If we keep adding features, the dimensionality of the feature space grows and the training data become sparser and sparser.
• Because of this sparsity, it becomes much easier to find a separating hyperplane: the likelihood that a training sample lies on the wrong side of the best hyperplane becomes infinitely small as the number of features becomes infinitely large.

https://www.datasciencecentral.com/
Curse of Dimensionality

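A classifier that separates the training samples perfectly in such a high-dimensional space, but performs poorly on unseen data, has learned the specifics of the training set rather than the underlying pattern. This is called overfitting and is a direct result of the curse of dimensionality.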

https://www.datasciencecentral.com/
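Overfitting can be observed with a small experiment, sketched below under loudly stated assumptions: the features and labels are purely random (so there is nothing to learn), there are many features and few training samples, and a 1-nearest-neighbour classifier stands in for the hyperplane classifier discussed above. All names and sizes are illustrative.

```python
import math
import random

rng = random.Random(0)
N_FEATURES = 50   # many features ...
N_TRAIN = 20      # ... but few training samples


def random_set(n):
    """Random feature vectors with random binary labels (no real pattern)."""
    xs = [[rng.random() for _ in range(N_FEATURES)] for _ in range(n)]
    ys = [rng.choice([0, 1]) for _ in range(n)]
    return xs, ys


def predict_1nn(x, train_x, train_y):
    """Label of the training sample closest to x."""
    nearest = min(range(len(train_x)), key=lambda k: math.dist(x, train_x[k]))
    return train_y[nearest]


def accuracy(xs, ys, train_x, train_y):
    hits = sum(predict_1nn(x, train_x, train_y) == y for x, y in zip(xs, ys))
    return hits / len(xs)


train_x, train_y = random_set(N_TRAIN)
test_x, test_y = random_set(200)

train_acc = accuracy(train_x, train_y, train_x, train_y)
test_acc = accuracy(test_x, test_y, train_x, train_y)
print("training accuracy:", train_acc)
print("test accuracy:", test_acc)
```

The model reproduces every training label because each training point is its own nearest neighbour, yet on fresh random data it can do no better than guessing on average. That gap between training and test performance is the signature of overfitting.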
How to Evaluate/Validate an ML Model
• K-fold cross-validation
• Train-test data split
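Both validation schemes come down to deciding which samples are used for training and which are held out. The following is a minimal standard-library sketch. The function names, the 20% default test fraction, and the contiguous fold layout are assumptions made for illustration.

```python
import random


def train_test_split(items, test_fraction=0.2, seed=0):
    """Shuffle once, then hold out a fraction of the items for testing."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def k_fold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) for each of the k folds."""
    # Spread the remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, val
        start += size


train, test = train_test_split(range(10))
print(len(train), len(test))          # 8 2

for train_idx, val_idx in k_fold_indices(10, 3):
    print(val_idx)                    # [0, 1, 2, 3] then [4, 5, 6] then [7, 8, 9]
```

In k-fold cross-validation every sample is used for validation exactly once, and the k scores are typically averaged. A single train-test split is cheaper but depends more on which samples happen to land in the test set.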


How to evaluate ML Model
THANK YOU