Unit-1_ML

The document provides an introduction to Machine Learning, detailing its definition, applications, and various types including supervised, unsupervised, semi-supervised, and reinforcement learning. It outlines the design of learning systems, approaches like artificial neural networks and support vector machines, and discusses issues such as inadequate training data and overfitting. Additionally, it raises questions regarding the effectiveness of training data and algorithms in the learning process.

Uploaded by

Kartikeya

Introduction to Machine Learning

Prepared By: Deepti Singh


Machine Learning
• It is a subfield of Artificial Intelligence that uses algorithms trained on data sets to
create self-learning models that are capable of predicting outcomes and
classifying information without human intervention.
• Applications:
- Predicting stock market fluctuations
- Translating text from one language to another
- Analyzing big data
- Suggesting products to consumers based on their past purchases
- etc.
Types of Machine Learning:
• There are four types of machine learning:

Types of Learning:
- Supervised. Techniques: 1. Classification 2. Regression
- Unsupervised. Techniques: 1. Clustering 2. Association
- Semi-Supervised. Techniques: 1. Self-training Models 2. Graph based Models
- Reinforcement. Techniques: 1. Q-Learning 2. Deep Q-Networks (DQN)

Fig. 1 Types of Machine Learning


Supervised Learning:
• Algorithms are trained on labelled datasets that include tags
describing each piece of data.
• It uses the learned patterns to predict label values for additional unlabelled data.
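As a minimal sketch of this idea (not a method taken from the slides), the example below trains a nearest-centroid classifier on a few labelled points and then predicts labels for unlabelled values. The data, the labels "small" and "large", and the function names are all made up for illustration.

```python
# Labelled training data: (feature, label). The values are made up for illustration.
train = [(1.0, "small"), (1.5, "small"), (2.0, "small"),
         (8.0, "large"), (8.5, "large"), (9.0, "large")]

def fit_centroids(examples):
    """Learn one mean feature value (centroid) per label."""
    sums, counts = {}, {}
    for x, label in examples:
        sums[label] = sums.get(label, 0.0) + x
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}

def predict(centroids, x):
    """Predict the label whose centroid is closest to x."""
    return min(centroids, key=lambda label: abs(centroids[label] - x))

centroids = fit_centroids(train)
# Unlabelled inputs: the model assigns each one a label.
predictions = [predict(centroids, x) for x in [1.2, 7.0]]
```

Here the "pattern" learned from the labels is simply the average feature value of each class; real classifiers learn richer patterns, but the train-then-predict flow is the same.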
Unsupervised Learning:
• It is a technique in which an algorithm discovers patterns and
relationships using unlabeled data.
• The primary goal is to discover hidden patterns, similarities, or
clusters within the data, which can then be used for various purposes,
such as data exploration, visualization, dimensionality reduction, and
more.
Reinforcement Learning
• It is a learning method in which the learner interacts with the environment by
producing actions and discovering errors.
• It trains an agent to act in an environment by maximizing rewards
through trial and error.
• Agents learn by:
- Exploring actions: trying different actions
- Receiving feedback:
  1. Reward for a correct action
  2. Punishment for an incorrect action
- Improving performance: refining strategies over time.
Semi-Supervised Learning
• It is the intermediate ground between supervised learning (with
labelled training data) and unsupervised learning (with no labelled
training data), and it uses a combination of labelled and
unlabelled datasets during the training period.
• The main aim of semi-supervised learning is to effectively use all the
available data.
Well Posed Learning Problems
• A computer program is said to learn from experience E with respect
to some class of tasks T and performance measure P, if its performance
at tasks in T, as measured by P, improves with experience E.
• Definition by Tom M. Mitchell
• A handwriting recognition learning problem:
- Task T: recognizing and classifying handwritten words within images.
- Performance measure P: percent of words correctly classified.
- Training experience E: a database of handwritten words with given
classifications.
Designing a Learning System:
1. Choosing the training Experience
2. Choosing the target function
3. Choosing the representation for the target function
4. Choosing a function approximation algorithm
5. Final design.
Designing a Learning System
• Let us consider designing a program that learns to play checkers, with the goal of entering it
in the world checkers tournament.
1. Choosing the Training Experience.
- One key attribute is whether the training experience provides direct or indirect
feedback regarding the choices made by the performance system.
- The degree to which the learner controls the sequence of training examples.
- How well it represents the distribution of examples over which the final system
performance P must be measured.

2. Choosing the Target Function: determines exactly what type of knowledge will
be learned and how it will be used by the performance program.
3. Choosing a Representation for the Target Function: the machine already knows all the
possible (legal) moves, so the next step is to choose the best move using a representation
such as linear equations or a hierarchical graph.
4. Choosing a Function Approximation Algorithm.
5. Final Design
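Steps 3 and 4 can be illustrated with a small sketch. It assumes the target function is represented as a linear equation over board features, and it uses the LMS (least mean squares) weight update as one possible function approximation algorithm. The slides do not name a specific update rule, so LMS, the two features, the training value and the learning rate are all assumptions made for illustration.

```python
def v_hat(weights, features):
    """Linear evaluation of a board: w0 + w1*x1 + ... + wn*xn."""
    return weights[0] + sum(w * x for w, x in zip(weights[1:], features))

def lms_update(weights, features, v_train, lr=0.1):
    """One LMS step: move each weight in proportion to the error and its feature."""
    error = v_train - v_hat(weights, features)
    new = [weights[0] + lr * error]  # bias term uses a constant feature of 1
    new += [w + lr * error * x for w, x in zip(weights[1:], features)]
    return new

# Hypothetical board features, e.g. (own pieces, opponent pieces), and a training value.
weights = [0.0, 0.0, 0.0]
features = [3.0, 1.0]
v_train = 1.0
for _ in range(50):
    weights = lms_update(weights, features, v_train, lr=0.05)
```

After repeated updates on this single example, `v_hat(weights, features)` approaches `v_train`. A real learner would cycle over many board positions rather than one.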
1. Choosing the training experience:
- It is the experience from which the system will learn.
- The failure or success of the learner depends on the choice of training experience.
- Three attributes impact failure or success:
  • Whether the training experience provides direct or indirect feedback; with
    indirect feedback the learner faces the credit assignment problem.
  • The degree to which the learner controls the sequence of training
    examples.
  • How well it represents the distribution of examples over which the
    final system performance must be measured.
3. Choosing a representation for the target
function.
4. Choosing a function approximation algorithm
5. Final Design

Fig. 2 Final Design in Design Learning System


Design Choices (Checker Program)

Fig. 3 Summary of Choices in designing the checker learning problem


Introduction to Machine Learning Approaches:
• Artificial Neural Networks
• Clustering
• Decision Tree
• Bayesian Belief Networks(BBN)
• Support Vector Machine
• Genetic Algorithm
1. Artificial Neural Network

Fig. 4 Biological Neuron Fig. 5 Artificial Neural Networks


• An ANN is a computational model composed of interconnected nodes
or “neurons”, which process information in a manner similar to the
human brain.
• Warren McCulloch & Walter Pitts modelled the first artificial neural
circuit in 1943, but it gained popularity in mainstream data operations
around the 1980s.
Function: Neurons process information using weights, biases and
activation functions (such as sigmoid).
Training: It uses backpropagation to adjust weights and minimize
errors.
Application: Image Processing, NLP, Financial Forecasting, Medical
Diagnosis etc.
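The function and training steps above can be sketched with a single sigmoid neuron. This is a simplified illustration, not a full network: with one neuron, the weight adjustment is just the chain rule applied once, whereas backpropagation applies the same rule through several layers. The OR dataset, the squared-error loss, the learning rate and the epoch count are assumptions chosen for the example.

```python
import math

def sigmoid(z):
    """Sigmoid activation: squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def forward(weights, bias, inputs):
    """Weighted sum plus bias, passed through the activation function."""
    return sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias)

# Toy dataset: logical OR
data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]
weights, bias, lr = [0.0, 0.0], 0.0, 1.0

for _ in range(5000):
    for inputs, target in data:
        out = forward(weights, bias, inputs)
        # Gradient of the squared error 0.5*(out - target)^2 w.r.t. the weighted sum
        delta = (out - target) * out * (1.0 - out)
        weights = [w - lr * delta * x for w, x in zip(weights, inputs)]
        bias -= lr * delta

# Round each output to 0 or 1 to read it as a class prediction
outputs = [round(forward(weights, bias, x)) for x, _ in data]
```

The loop repeatedly nudges the weights and bias in the direction that reduces the error on each example, which is the "adjust weights and minimize errors" idea in its smallest form.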
2. Clustering
• It is used to group similar data points into clusters based on their
similarities.
- Unlike Supervised learning, clustering does not require labelled
data.
• It is particularly useful for exploring and understanding complex
datasets without predefined categories.
• It helps in discovering patterns & insights from data.
• Applications: Customer segmentation, image compression, anomaly
detection etc.
Types of Clustering:
Hard Clustering: each data point belongs to exactly one cluster.
Soft Clustering: each data point can belong to multiple clusters with varying degrees of membership.

Fig. 6 Hard Clustering and Soft Clustering
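The difference between the two types can be sketched for one-dimensional points and fixed cluster centres. The centres, the sample point and the inverse-distance membership formula are assumptions made for illustration; real soft clustering methods (for example fuzzy c-means or Gaussian mixtures) define membership differently.

```python
def hard_assign(point, centroids):
    """Hard clustering: index of the single nearest centroid."""
    dists = [abs(point - c) for c in centroids]
    return dists.index(min(dists))

def soft_assign(point, centroids, eps=1e-9):
    """Soft clustering: membership degrees that sum to 1 (closer means higher)."""
    inv = [1.0 / (abs(point - c) + eps) for c in centroids]
    total = sum(inv)
    return [v / total for v in inv]

centroids = [0.0, 10.0]                 # hypothetical cluster centres
hard = hard_assign(4.0, centroids)      # one cluster index
soft = soft_assign(4.0, centroids)      # one membership degree per cluster
```

For the point 4.0, the hard assignment picks only the first cluster, while the soft assignment gives it a larger share of the first cluster and a smaller share of the second.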


3. Decision Tree
• It is a popular supervised learning algorithm used for classification
and regression tasks.
• It is used to model and predict outcomes based on input data.
• It is a tree-like structure where
- each internal node tests an attribute,
- each branch corresponds to an attribute value,
- and each leaf node represents the final decision or prediction.
• Working:
1. Splitting: The dataset splits into subsets based on feature values.
2. Decision Rule: At each node, a decision rule is applied to split the
data.
3. Predictions: The process continues until leaf nodes are reached, which
provide the final predictions.
* Decision trees are easy to interpret & visualize, making them a valuable tool for
various applications.

Fig. 7 Tree Structure to represent whether a person is fit or not
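A prediction with an already built tree can be sketched as a walk from the root to a leaf. The figure is not reproduced here, so the attributes and outcomes below are hypothetical and only illustrate the node, branch and leaf roles described above.

```python
# Hypothetical tree; the attributes and outcomes are illustrative only.
tree = {
    "attribute": "age_under_30",
    "branches": {
        True: {"attribute": "eats_junk_food",
               "branches": {True: "unfit", False: "fit"}},
        False: {"attribute": "exercises",
                "branches": {True: "fit", False: "unfit"}},
    },
}

def predict(node, sample):
    """Follow the branch matching each tested attribute until a leaf is reached."""
    while isinstance(node, dict):
        node = node["branches"][sample[node["attribute"]]]
    return node

young = predict(tree, {"age_under_30": True, "eats_junk_food": False})
older = predict(tree, {"age_under_30": False, "exercises": False})
```

Each dictionary is an internal node that tests one attribute, each key in `branches` is an attribute value, and each string is a leaf holding the final decision. Learning the tree from data (choosing which attribute to split on) is a separate step not shown here.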


4. Bayesian Belief Networks (BBN)
• It is a graphical model that represents the probabilistic relationships
among variables.
• It is used to handle uncertainty and make predictions or decisions
based on probabilities.
• Graphical Representation: Variables are represented as nodes in a
directed acyclic graph (DAG), and their dependencies are shown as
edges.
• Conditional Probabilities: Each node’s probability depends on its
parent nodes, expressed as P(Variable | Parents).
• Probabilistic Model: Built from probability distributions, BBNs apply
probability theory for tasks like prediction and anomaly detection.
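As a small sketch of how these pieces fit together, the example below builds a three-node network and answers a query by enumeration. The network (Rain and Sprinkler as parents of WetGrass) and every probability in it are invented for illustration. The joint probability is the product of each node's P(node | parents).

```python
from itertools import product

# Hypothetical network: Rain -> WetGrass <- Sprinkler. All numbers are made up.
p_rain = {True: 0.2, False: 0.8}
p_sprinkler = {True: 0.1, False: 0.9}
p_wet_given = {  # P(WetGrass=True | Rain, Sprinkler)
    (True, True): 0.99, (True, False): 0.9,
    (False, True): 0.8, (False, False): 0.0,
}

def joint(rain, sprinkler, wet):
    """Joint probability as the product of each node's P(node | parents)."""
    p_wet = p_wet_given[(rain, sprinkler)]
    return p_rain[rain] * p_sprinkler[sprinkler] * (p_wet if wet else 1.0 - p_wet)

# Inference by enumeration: P(Rain=True | WetGrass=True)
p_wet_true = sum(joint(r, s, True) for r, s in product([True, False], repeat=2))
p_rain_and_wet = sum(joint(True, s, True) for s in [True, False])
p_rain_given_wet = p_rain_and_wet / p_wet_true
```

Observing wet grass raises the probability of rain well above its prior of 0.2, which is the kind of reasoning under uncertainty a BBN is used for. Enumeration is only practical for tiny networks; larger ones need dedicated inference algorithms.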
5. Reinforcement Learning
• Reinforcement Learning is a feedback-based machine learning
technique in which an agent learns to behave in an environment by
performing actions and seeing the results of those actions.
• For each good action, the agent gets positive feedback, and for each
bad action, the agent gets negative feedback or a penalty.
• The agent learns automatically from feedback without any labeled
data, unlike supervised learning.
• Since there is no labeled data, the agent is bound to learn from its
experience only.
• Agents learn by:
- Exploring actions: trying different actions
- Receiving feedback:
  1. Reward for a correct action
  2. Punishment for an incorrect action
- Improving performance: refining strategies over time.
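The explore, receive feedback, improve loop can be sketched with tabular Q-learning on a toy environment. The environment (five states on a line with a goal at the right end), the reward values and the hyperparameters `alpha`, `gamma` and `epsilon` are all assumptions chosen for illustration.

```python
import random

N_STATES, GOAL = 5, 4          # states 0..4 on a line; state 4 is the goal
ACTIONS = [-1, +1]             # move left or right

def step(state, action):
    """Environment: returns (next_state, reward, done)."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    if nxt == GOAL:
        return nxt, 1.0, True      # reward for reaching the goal
    return nxt, -0.1, False        # small penalty for every other move

random.seed(0)
q = [[0.0, 0.0] for _ in range(N_STATES)]   # Q-value per (state, action)
alpha, gamma, epsilon = 0.5, 0.9, 0.2

for _ in range(200):               # episodes
    state, done = 0, False
    while not done:
        # Explore with probability epsilon, otherwise exploit the best known action
        if random.random() < epsilon:
            a = random.randrange(2)
        else:
            a = q[state].index(max(q[state]))
        nxt, reward, done = step(state, ACTIONS[a])
        # Q-learning update: move the estimate toward reward + discounted best next value
        target = reward if done else reward + gamma * max(q[nxt])
        q[state][a] += alpha * (target - q[state][a])
        state = nxt

# Greedy policy per non-goal state: 0 means left, 1 means right
policy = [q[s].index(max(q[s])) for s in range(GOAL)]
```

After training, the greedy policy moves right in every state, which is the shortest route to the reward. No labelled examples were used; the agent only saw rewards and penalties.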
6. Genetic Algorithm
• It is an adaptive heuristic search algorithm inspired by Darwin’s
theory of evolution in nature.
- Used to solve optimization problems in ML.
- Helps in solving complex problems that would otherwise take a long time to solve (due
to a large search space).
- It is inspired by the process of natural selection & evolution.
Operators of GA:
• Initialization: Start with a randomly generated population of potential
solutions (called chromosomes).
• Selection:
- Evaluate the fitness of each individual in the population.
- The individuals with good fitness scores are allowed to pass their
genes to successive generations.
• Crossover: Two individuals are selected using the selection operator and
crossover sites are chosen randomly. Then the genes at these
crossover sites are exchanged, thus creating a completely new
individual (offspring).
• Mutation: The key idea is to insert random genes in the offspring to
maintain diversity in the population and avoid premature
convergence.
• Replacement: Replace the old population with the new generation.
• Iteration: Repeat the selection, crossover, mutation & replacement
steps until a satisfactory solution is found.
Applications:
• Designing electronic circuits
• Image Processing
• Etc.
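The GA operators can be put together in a short sketch. Several choices here are assumptions made for illustration and are not from the slides: the toy objective (count of 1-bits, often called OneMax), tournament selection, single-point crossover, a fixed number of generations instead of a "satisfactory solution" check, and keeping the best individual unchanged in each generation (elitism).

```python
import random

random.seed(1)
GENES, POP, GENERATIONS = 12, 20, 60

def fitness(chromosome):
    """Toy objective (OneMax): count of 1-bits; an assumption for illustration."""
    return sum(chromosome)

def select(population):
    """Tournament selection: the fitter of two random individuals."""
    a, b = random.sample(population, 2)
    return a if fitness(a) >= fitness(b) else b

def crossover(p1, p2):
    """Single-point crossover at a random site."""
    site = random.randint(1, GENES - 1)
    return p1[:site] + p2[site:]

def mutate(chromosome, rate=0.05):
    """Flip each gene with a small probability to keep diversity."""
    return [1 - g if random.random() < rate else g for g in chromosome]

# Initialization: random population of chromosomes
population = [[random.randint(0, 1) for _ in range(GENES)] for _ in range(POP)]
initial_best = max(fitness(c) for c in population)

for _ in range(GENERATIONS):
    best = max(population, key=fitness)          # keep the best individual (elitism)
    children = [mutate(crossover(select(population), select(population)))
                for _ in range(POP - 1)]
    population = [best] + children               # replacement

final_best = max(fitness(c) for c in population)
```

Because the best individual is carried over unchanged, `final_best` can never be lower than `initial_best`. Without elitism the best fitness may temporarily drop between generations.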
Support Vector Machine:
• Support Vector Machine (SVM) is a supervised machine learning
algorithm used for classification and regression tasks.
• SVM is particularly well-suited for classification tasks.
• It aims to find the optimal hyperplane in an N-dimensional space to
separate data points into different classes.
• It works for both linear and non-linear data.
Fig. 8 Multiple hyperplanes separate the data
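For the linear case, finding a separating hyperplane can be sketched as follows. This is a simplified illustration that minimizes the regularized hinge loss with plain per-example subgradient steps; it is not how production SVM libraries are implemented, and it does not cover non-linear data, which is normally handled with kernels. The data points and the values of `lam`, `lr` and `epochs` are assumptions.

```python
# Toy 2-D data with labels +1 / -1 (made up, linearly separable)
data = [((2.0, 2.0), 1), ((3.0, 3.0), 1), ((2.5, 3.5), 1),
        ((0.0, 0.0), -1), ((1.0, 0.5), -1), ((0.5, 1.0), -1)]

def decision(w, b, x):
    """Signed score; its sign is the predicted side of the hyperplane."""
    return w[0] * x[0] + w[1] * x[1] + b

def train(data, lam=0.001, lr=0.01, epochs=2000):
    """Subgradient descent on the hinge loss with L2 regularization on w."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, y in data:
            if y * decision(w, b, x) < 1:      # inside the margin or misclassified
                w = [wi + lr * (y * xi - lam * wi) for wi, xi in zip(w, x)]
                b += lr * y
            else:                               # correct with margin: only shrink w
                w = [wi - lr * lam * wi for wi in w]
    return w, b

w, b = train(data)
predictions = [1 if decision(w, b, x) > 0 else -1 for x, _ in data]
```

The learned `w` and `b` define the hyperplane `w[0]*x1 + w[1]*x2 + b = 0`. Points are pushed to have a score of at least 1 in magnitude on their own side, which is what creates the margin.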
Issues in Machine Learning
• Inadequate Training Data: ML models require a large amount of high-quality
data to learn effectively. However, in many domains, obtaining
such data is difficult due to factors like privacy concerns, costs of data
collection, and data sparsity.
• Poor Quality of Data: The quality of data directly impacts the
performance of machine learning models.
Poor-quality data is incomplete, noisy or inconsistent, and this
leads to inaccurate predictions and flawed outcomes.
• Non-Representative Training Data: It occurs when the training
dataset does not accurately reflect the real-world distribution of data.
Consequences:
- Poor Generalization: Models perform well in controlled environments
but poorly in real-world applications.
- Bias in prediction: If the training data is not representative, the
model’s predictions will be biased towards certain outcomes,
potentially leading to unfair or inaccurate results.
• Overfitting and Underfitting: Overfitting occurs when a model
becomes too complex and fits the noise in the training data rather
than the underlying patterns, which results in poor generalization to new
data.
Underfitting occurs when a model is too simple to capture the underlying
patterns in the data.
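The contrast can be sketched by comparing training and test error for three models on the same data. Everything here is an assumption chosen for illustration: the noisy samples of the trend y = 2x, a constant model standing in for underfitting, a least-squares line as a reasonable fit, and a "memorizer" that copies the nearest training label standing in for overfitting.

```python
def mse(model, data):
    """Mean squared error of a model over (x, y) pairs."""
    return sum((model(x) - y) ** 2 for x, y in data) / len(data)

# Made-up noisy training samples of the trend y = 2x, and noise-free test points
train = [(0, 0.5), (1, 1.4), (2, 4.6), (3, 5.5), (4, 8.6), (5, 9.4)]
test = [(0.5, 1.0), (1.5, 3.0), (2.5, 5.0), (3.5, 7.0), (4.5, 9.0)]

def fit_constant(data):                      # too simple: ignores x
    mean_y = sum(y for _, y in data) / len(data)
    return lambda x: mean_y

def fit_line(data):                          # least-squares straight line
    n = len(data)
    mx = sum(x for x, _ in data) / n
    my = sum(y for _, y in data) / n
    slope = (sum((x - mx) * (y - my) for x, y in data)
             / sum((x - mx) ** 2 for x, _ in data))
    return lambda x: my + slope * (x - mx)

def fit_memorizer(data):                     # too flexible: copies the nearest training label
    return lambda x: min(data, key=lambda p: abs(p[0] - x))[1]

results = {}
for name, fit in [("constant", fit_constant), ("line", fit_line),
                  ("memorizer", fit_memorizer)]:
    model = fit(train)
    results[name] = (mse(model, train), mse(model, test))  # (train error, test error)
```

The memorizer reaches zero training error because it reproduces the noise, yet its test error is higher than the line's. The constant model has high error on both sets because it cannot capture the trend at all.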
• Monitoring and Maintenance: Once an ML model is deployed,
continuous monitoring is essential to ensure that it remains accurate
and relevant. As the data landscape changes, models may begin to
drift from their original performance level.
• Data Bias: It occurs when the training data used to build a model is
not representative of the broader population, leading to biased
predictions. This can result in models that discriminate against certain groups or
fail to generalize to all users.
• Etc.
Our checkers example raises a number of generic
questions about machine learning.
• On the basis of training data
1. How much training data is sufficient?
2. What general bounds can be found to relate the confidence in learned
hypotheses to the amount of training experience and the character of the
learner's hypothesis space?
• Based on Algorithms:
1. What algorithms exist for learning general target functions from specific
training examples?
2. In what settings will particular algorithms converge to the desired
function, given sufficient training data?
3. Which algorithms perform best for which types of problems and
representations?
• On the basis of prior knowledge
1. When and how can prior knowledge held by the learner guide the process
of generalizing from examples?
2. Can prior knowledge be helpful even when it is only approximately
correct?
• On the basis of training experience
1. What is the best strategy for choosing a useful next training experience?
2. How does the choice of this strategy alter the complexity of the learning problem?
• What is the best way to reduce the learning task to one or more function
approximation problems?
• How can the learner automatically alter its representation to improve its
ability to represent and learn the target function?
Data Science versus Machine Learning
