Data in ML
Class/label
Target
Attributes
Check model
• Performance measurement
Improve result
• Evaluate different algorithms
• Tuning parameter
• Build ensemble model
Types of machine learning
Supervised Machine Learning
Supervised learning (Regression)
Source: scikit-learn.org
Reinforcement Learning
Some challenges of machine learning
• Data collection/acquisition is difficult
• Not all features are useful for finding a good model (Curse of Dimensionality)
• Some datasets contain noise, outliers, or redundant data
• Small data and imbalanced data
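To see why imbalanced data is a challenge, here is a minimal sketch, assuming a made-up dataset of 950 negative and 50 positive labels (the counts are illustrative and not taken from the slides). A "model" that always predicts the majority class reaches high accuracy while finding none of the positives:

```python
# Hypothetical imbalanced labels: 950 negatives (0) and 50 positives (1).
labels = [0] * 950 + [1] * 50

# A trivial "model" that ignores its input and always predicts the majority class.
predictions = [0] * len(labels)

# Accuracy: fraction of predictions that match the true label.
correct = sum(1 for y, p in zip(labels, predictions) if y == p)
accuracy = correct / len(labels)

# Recall: fraction of true positives that the model actually finds.
true_positives = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 1)
recall = true_positives / sum(labels)

print(f"accuracy = {accuracy:.2f}")  # 0.95
print(f"recall   = {recall:.2f}")    # 0.00
```

Accuracy alone can therefore be misleading on imbalanced data, so a per-class measure such as recall is worth checking as well.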
Curse of Dimensionality
• As the number of features or dimensions grows, the amount of data we need
to generalize accurately grows exponentially (C. Bishop, 2006)
Source: https://www.datasciencecentral.com/
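The exponential growth can be made concrete with a small sketch, assuming each feature is split into 10 equal bins and we want at least one training sample per grid cell (both the bin count and the helper name `cells_to_cover` are illustrative assumptions):

```python
def cells_to_cover(bins_per_feature: int, n_features: int) -> int:
    """Number of grid cells when every feature is split into equal bins."""
    return bins_per_feature ** n_features


# With 10 bins per feature, each added feature multiplies the cell count by 10.
for n_features in (1, 2, 3, 10):
    print(n_features, cells_to_cover(10, n_features))
```

Under these assumptions, 3 features already give 1,000 cells and 10 features give 10 billion, so a fixed-size dataset leaves almost every cell empty.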
• If we keep adding features, the dimensionality of the feature space grows, and the space becomes
sparser and sparser.
• Due to this sparsity, it becomes much easier to find a separating hyperplane, because the
likelihood that a training sample lies on the wrong side of the best hyperplane becomes
infinitely small when the number of features becomes infinitely large.
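The separability claim above can be illustrated with a minimal sketch, assuming random Gaussian points with purely random labels and a plain perceptron as the linear classifier. The sample count, feature counts, and helper names are illustrative choices, not taken from the slides.

```python
import random


def perceptron_separates(points, labels, max_epochs=1000):
    """Return True if a perceptron reaches zero training errors within max_epochs."""
    weights = [0.0] * len(points[0])
    bias = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for x, y in zip(points, labels):
            activation = sum(w * xi for w, xi in zip(weights, x)) + bias
            if y * activation <= 0:  # sample is on the wrong side (or on the boundary)
                weights = [w + y * xi for w, xi in zip(weights, x)]
                bias += y
                mistakes += 1
        if mistakes == 0:
            return True  # a separating hyperplane was found
    return False


def random_dataset(n_samples, n_features, rng):
    """Gaussian points with labels drawn at random from {-1, +1}."""
    points = [[rng.gauss(0.0, 1.0) for _ in range(n_features)]
              for _ in range(n_samples)]
    labels = [rng.choice([-1, 1]) for _ in range(n_samples)]
    return points, labels


rng = random.Random(0)
for n_features in (2, 5, 20, 100):
    points, labels = random_dataset(20, n_features, rng)
    print(n_features, perceptron_separates(points, labels))
```

With only a few features, 20 randomly labeled samples are usually not linearly separable. Once the number of features is much larger than the number of samples, a separating hyperplane exists even though the labels carry no signal.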
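A classifier that separates the training data perfectly in a high-dimensional feature space but performs poorly on unseen data is overfitting, and this is a direct result of the curse of dimensionality.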
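Overfitting can be reproduced in a few lines. The sketch below is an illustration under stated assumptions, not the method from the slides: it uses a 1-nearest-neighbour classifier, 50 random features, and labels that are pure noise, so there is nothing real to learn.

```python
import random


def nearest_label(query, points, labels):
    """Label of the training point closest to query (1-nearest neighbour)."""
    best = min(
        range(len(points)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(query, points[i])),
    )
    return labels[best]


def accuracy(queries, true_labels, points, labels):
    """Fraction of queries whose predicted label matches the true label."""
    hits = sum(nearest_label(q, points, labels) == y
               for q, y in zip(queries, true_labels))
    return hits / len(queries)


rng = random.Random(0)
n_features = 50  # many features relative to the 100 training samples


def sample(n):
    xs = [[rng.random() for _ in range(n_features)] for _ in range(n)]
    ys = [rng.choice([0, 1]) for _ in range(n)]  # labels are pure noise
    return xs, ys


train_x, train_y = sample(100)
test_x, test_y = sample(200)

train_acc = accuracy(train_x, train_y, train_x, train_y)
test_acc = accuracy(test_x, test_y, train_x, train_y)
print(f"train accuracy = {train_acc:.2f}")
print(f"test accuracy  = {test_acc:.2f}")
```

The model memorizes the training set, so training accuracy is perfect, while accuracy on fresh samples stays near chance level. That gap between training and test performance is the signature of overfitting.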
How to evaluate/Validate ML Model