machine-1

The document provides a comprehensive overview of machine learning, covering its introduction, types, and various approaches such as regression, decision trees, and neural networks. It discusses the history of machine learning, its applications across different fields, advantages, and disadvantages, emphasizing the importance of data in training models. Key concepts include supervised and unsupervised learning, performance measures, and the evolution of algorithms over time.

Uploaded by

dipgarai.sandip
Machine Learning

• (UNIT-1: INTRODUCTION) Learning, Types of Learning, Well-defined learning problems, Designing a Learning System, History of ML, Introduction of Machine Learning Approaches - (Artificial Neural Network, Clustering, Reinforcement Learning, Decision Tree Learning, Bayesian networks, Support Vector Machine, Genetic Algorithm), Issues in Machine Learning and Data Science vs Machine Learning.
• (UNIT-2: REGRESSION & BAYESIAN LEARNING) REGRESSION: Linear Regression and Logistic Regression. BAYESIAN LEARNING - Bayes theorem, Concept learning, Bayes Optimal Classifier, Naïve Bayes classifier, Bayesian belief networks, EM algorithm. SUPPORT VECTOR MACHINE: Introduction, Types of support vector kernel - (Linear kernel, polynomial kernel, and Gaussian kernel), Hyperplane - (Decision surface), Properties of SVM, and Issues in SVM.
• (UNIT-3: DECISION TREE LEARNING) DECISION TREE LEARNING - Decision tree learning algorithm, Inductive bias, Inductive inference with decision trees, Entropy and information theory, Information gain, ID-3 Algorithm, Issues in Decision tree learning. INSTANCE-BASED LEARNING - k-Nearest Neighbour Learning, Locally Weighted Regression, Radial basis function networks, Case-based learning.
• (UNIT-4: ARTIFICIAL NEURAL NETWORKS) ARTIFICIAL NEURAL NETWORKS - Perceptrons, Multilayer perceptron, Gradient descent & the Delta rule, Multilayer networks, Derivation of Backpropagation Algorithm, Generalization, Unsupervised Learning - SOM Algorithm and its variant; DEEP LEARNING - Introduction, concept of convolutional neural network, Types of layers - (Convolutional Layers, Activation function, pooling, fully connected), Concept of Convolution (1D and 2D) layers, Training of network, Case study of CNN, e.g. on Diabetic Retinopathy, Building a smart speaker, Self-driving car etc.
• (UNIT-5: REINFORCEMENT LEARNING) REINFORCEMENT LEARNING - Introduction to Reinforcement Learning, Learning Task, Example of Reinforcement Learning in Practice, Learning Models for Reinforcement - (Markov Decision process, Q Learning - Q Learning function, Q Learning Algorithm), Application of Reinforcement Learning, Introduction to Deep Q Learning. GENETIC ALGORITHMS: Introduction, Components, GA cycle of reproduction, Crossover, Mutation, Genetic Programming, Models of Evolution and Learning, Applications.

(UNIT-1: INTRODUCTION)

• Learning, Types of Learning
• Well-defined learning problems, Designing a Learning System
• History of ML
• Introduction of Machine Learning Approaches:
  • Artificial Neural Network
  • Clustering
  • Reinforcement Learning
  • Decision Tree Learning
  • Bayesian networks
  • Support Vector Machine
  • Genetic Algorithm
• Issues in Machine Learning
• Data Science vs Machine Learning
Definition of Learning
• Learning involves changes in behaviour due to experience, focusing on adapting rather than relying on instinct or temporary states.
• Components of a Learning System:
  • Performance Element: Determines actions based on existing strategies.
  • Learning Element: Improves the performance element by analyzing past outcomes. Influences include:
    • Components of Performance: Understanding existing capabilities.
    • Feedback Mechanism: Using feedback to enhance performance.
    • Knowledge Representation: How information is organized and accessed.
• Acquisition of New Knowledge:
  • Essential to learning; involves understanding new information, similar to how students learn new mathematical techniques.
• Problem Solving:
  • Integrates new knowledge and deduces solutions when not all data is available, akin to a doctor diagnosing illnesses with limited information.
Performance measures for learning
• Generality
  • Generality refers to a machine learning model's ability to perform well across various datasets and environments, not just the one it was trained on. For instance, a facial recognition system that can accurately identify faces in diverse lighting conditions and angles demonstrates good generality.
• Efficiency
  • Efficiency in machine learning measures how quickly a model can learn from data. A spam detection algorithm that quickly adapts to new types of spam emails with minimal training data exhibits high efficiency.
• Robustness
  • Robustness is the ability of a model to handle errors, noise, and unexpected data without failing. A voice recognition system that can understand commands in a noisy room shows robustness.
• Efficacy
  • Efficacy is the overall effectiveness of a machine learning model in performing its intended tasks. An autonomous driving system that safely navigates city traffic and avoids accidents under various conditions demonstrates high efficacy.
• Ease of Implementation
  • This measures how straightforward it is to develop and deploy a machine learning model. A recommendation system that can be integrated into an existing e-commerce platform using standard algorithms and software libraries highlights ease of implementation.
Supervised Learning
• Supervised learning involves training a machine learning model using labeled data, which means each data item is already associated with the correct answer.
• Example: Consider teaching a child to identify fruits. You show them pictures of various fruits, like apples and bananas, while telling them, "This is an apple," and "This is a banana." Over time, the child learns to identify fruits correctly based on the examples given.
• Key Steps in Supervised Learning:
  • Input and Output Pairing: Each input (e.g., a fruit picture) is paired with its correct label (e.g., "apple").
  • Training: The model learns by comparing its prediction with the actual label and adjusting itself to improve accuracy.
  • Error Correction: If the model predicts incorrectly (e.g., calls an apple a banana), it adjusts its internal parameters to reduce the error.
  • Outcome: The model eventually learns to map inputs (fruit images) to the correct outputs (fruit names).
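The training and error-correction steps above can be sketched with a perceptron-style learner. This is only an illustration under stated assumptions, not a method taken from these notes: the two numeric features (weight in hundreds of grams and a yellowness score), the toy data, the learning rate, and the names `predict` and `train` are all made up for the example.

```python
# Toy labeled data (hypothetical): (weight in 100 g, yellowness 0..1) -> label.
# Label 1 means "banana", label 0 means "apple".
DATA = [
    ((1.5, 0.10), 0),
    ((1.7, 0.20), 0),
    ((1.6, 0.15), 0),
    ((1.2, 0.90), 1),
    ((1.1, 0.85), 1),
    ((1.3, 0.95), 1),
]


def predict(weights, bias, features):
    """Return 1 if the weighted sum of the features is positive, else 0."""
    total = bias + sum(w * x for w, x in zip(weights, features))
    return 1 if total > 0 else 0


def train(data, learning_rate=0.1, epochs=200):
    """Perceptron rule: change the parameters only when a prediction is wrong."""
    weights = [0.0, 0.0]
    bias = 0.0
    for _ in range(epochs):
        mistakes = 0
        for features, label in data:
            error = label - predict(weights, bias, features)  # -1, 0 or +1
            if error != 0:
                mistakes += 1
                weights = [w + learning_rate * error * x
                           for w, x in zip(weights, features)]
                bias += learning_rate * error
        if mistakes == 0:  # every training example is now labeled correctly
            break
    return weights, bias


weights, bias = train(DATA)
```

Calling `predict(weights, bias, (1.15, 0.9))` then classifies a new, unlabeled fruit using the learned parameters. The loop stops early here because the toy data can be separated by a straight line; real data usually cannot, and other learners are used.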
Unsupervised learning
• Unsupervised learning involves training a model without any labels, which means the model tries to identify patterns and data groupings on its own.
• Example: Imagine placing a mix of different coins on a table and asking a child to sort them. Without being given any criteria, the child might start grouping the coins by size, color, or denomination on their own.
• Key Steps in Unsupervised Learning:
  • Input Without Labels: The model receives data without any explicit instructions on what to do with it.
  • Pattern Recognition: The model analyzes the data and tries to find any natural groupings or patterns (e.g., clustering coins based on size or color).
  • Self-Organization: The model organizes data into different categories based on the patterns it perceives.
  • Outcome: The model creates its own system of categorization without external guidance.
Well-defined learning problems
• A well-defined learning problem allows a computer program to improve at a specific task through experience. This is characterized by three key elements:
  • Task (T): The specific activity or challenge the program is expected to perform.
  • Performance Measure (P): The criteria used to gauge the program's effectiveness at the task.
  • Experience (E): The data or interactions from which the program learns.
• Checkers Game:
  • Task (T): Playing the game of checkers.
  • Performance Measure (P): The percentage of games won against various opponents.
  • Experience (E): Engaging in numerous practice games, possibly including self-play.
• Handwriting Recognition:
  • Task (T): Identifying and categorizing handwritten words in images.
  • Performance Measure (P): The accuracy rate, measured as the percentage of words correctly recognized.
  • Experience (E): Analysis of a large dataset of labeled handwritten word images.
• Autonomous Driving Robot:
  • Task (T): Navigating public four-lane highways using vision-based sensors.
  • Performance Measure (P): The average distance the robot travels without making a mistake, as determined by a human supervisor.
  • Experience (E): Processing sequences of images and corresponding steering commands previously collected from human drivers.
Overview of the history of Machine Learning
Early Developments:
• 1943: Neurophysiologist Warren McCulloch and mathematician Walter Pitts introduced the concept of a neural network by modeling neurons with electrical circuits.
• 1952: Arthur Samuel developed the first computer program capable of learning from its activities.
• 1958: Frank Rosenblatt created the Perceptron, the first artificial neural network, which was designed for pattern and shape recognition.
• 1959: Bernard Widrow and Marcian Hoff developed two neural network models: ADALINE, which could detect binary patterns, and MADALINE, which was used to reduce echo on phone lines.
Advancements in the 1980s and 1990s:
• 1982: John Hopfield proposed a network with bidirectional lines that mimicked actual neuronal structures.
• 1986: The backpropagation algorithm was popularized, allowing the use of multiple layers in neural networks and enhancing their learning capabilities.
• 1997: IBM's Deep Blue, a chess-playing computer, famously beat the reigning world chess champion, Garry Kasparov.
• 1998: AT&T Bell Laboratories achieved significant progress in digit recognition, notably enhancing the ability to recognize handwritten postcodes for the US Postal Service.
21st Century Innovations:
• The 21st century has seen a significant surge in machine learning, driven by both industry and academia, to boost computational capabilities and innovation.
• Notable projects include:
  • Google Brain (2012): A deep learning project.
  • AlexNet (2012): A deep convolutional neural network.
  • DeepFace (2014) and DeepMind (2014): Projects that advanced facial recognition and AI decision-making.
  • OpenAI (2015), ResNet (2015), and U-Net (2015): Each contributed to advancements in AI capabilities, from gameplay to medical imaging.

Machine learning
• Machine learning is a subset of artificial intelligence (AI) that enables computers to learn from and make decisions based on data, without being explicitly programmed.
• Definition: Machine learning involves developing algorithms that allow computers to process and learn from data automatically.
• Purpose: The aim is to enable computers to learn from their experiences and improve their performance over time without human intervention.
• Functionality: Machine learning algorithms analyze vast amounts of data, enabling them to perform tasks more efficiently and accurately. This could be anything from predicting consumer behavior to detecting fraudulent transactions.
• Integration: Combining machine learning with AI and cognitive technologies enhances its ability to process and interpret large volumes of complex data.
• Example: Consider a streaming service like Netflix. Machine learning is used to analyze your viewing habits and the habits of others with similar tastes. Based on this data, the system recommends movies and shows that you might like. Here, the algorithm learns from the accumulated data to make increasingly accurate predictions over time, thereby enhancing user experience without manual intervention. This demonstrates machine learning's capability to adapt and improve autonomously, making it a powerful tool in many tech-driven applications.

Machine learning has a wide range of applications across different fields. Here are some key applications along with examples:
• Image Recognition:
  • Application: Image recognition involves identifying objects, features, or patterns within digital images or videos.
  • Example: Used in facial recognition systems for security purposes or to detect defective products on assembly lines in manufacturing.
• Speech Recognition:
  • Application: Speech recognition technology converts spoken words into text, facilitating user interaction with devices and applications.
  • Example: Virtual assistants like Siri and Alexa use speech recognition to understand user commands and provide appropriate responses.
• Medical Diagnosis:
  • Application: Machine learning assists in diagnosing diseases by analyzing clinical parameters and their combinations.
  • Example: Predicting diseases such as diabetes or cancer by examining patient data and previous case histories to identify patterns that precede diagnoses.
• Statistical Arbitrage:
  • Application: In finance, statistical arbitrage involves automated trading strategies that capitalize on patterns identified in trading data.
  • Example: Algorithmic trading platforms that analyze historical stock data to make buy or sell decisions in milliseconds to capitalize on market inefficiencies.
• Learning Associations:
  • Application: This process uncovers relationships between variables in large databases, often revealing hidden patterns.
  • Example: Market basket analysis in retail, which analyzes purchasing patterns to understand product associations and optimize store layouts.
• Information Extraction:
  • Application: Information extraction involves pulling structured information from unstructured data, like text.
  • Example: Extracting key pieces of information from legal documents or news articles to summarize content or populate databases automatically.
Advantages of Machine Learning:
• Identifies Trends and Patterns:
  • Example: Streaming services like Netflix analyze viewer data to identify viewing patterns and recommend shows and movies that individual users are likely to enjoy.
• Automation:
  • Example: Autonomous vehicles use machine learning to interpret sensory data and make driving decisions without human input, improving transportation efficiency and safety.
• Continuous Improvement:
  • Example: Credit scoring systems evolve by learning from new customer data, becoming more accurate in predicting creditworthiness over time.
• Handling Complex Data:
  • Example: Financial institutions use machine learning algorithms to detect fraudulent transactions by analyzing complex patterns of customer behavior that would be difficult for humans to process.
Disadvantages of Machine Learning:
• Data Acquisition:
  • Example: In healthcare, acquiring large datasets of patient medical records that are comprehensive and privacy-compliant is challenging and expensive.
• Time and Resources:
  • Example: Developing a machine learning model for predicting stock market trends requires extensive computational resources and time to analyze years of market data before it can be deployed.
• Interpretation of Results:
  • Example: In genomics research, interpreting the vast amounts of data produced by machine learning algorithms requires highly specialized knowledge to ensure findings are accurate and meaningful.
• High Error-Susceptibility:
  • Example: Early stages of facial recognition technology showed high error rates, particularly in accurately identifying individuals from minority groups, leading to potential biases and inaccuracies.
Machine Learning Approaches
Artificial Neural Network
• Overview of ANNs:
  • Inspiration: ANNs mimic the structure and function of the nervous systems in animals, particularly how neurons transmit signals.
  • Functionality: These networks are used for machine learning and pattern recognition, handling complex data inputs effectively.
• Components of ANNs:
  • Neurons: Modeled as nodes within a network.
  • Connections: Nodes are linked by arcs that represent synapses, with weights that signify the strength of each connection.
  • Processing: The network processes signals in a way analogous to neural activity in biological brains.
• Operation:
  • Signal Transmission: Connections in the network facilitate the propagation of data, similar to synaptic transmission in biology.
  • Information Processing: ANNs adjust the weights of connections to learn from data and make informed decisions.
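To make the nodes, weighted connections, and signal transmission above concrete, here is a minimal sketch of signals flowing through a tiny network. It shows only the forward pass, not how weights are learned. The sigmoid activation, the two-layer shape, the hand-picked weights, and the names `neuron`, `forward` and `NETWORK` are assumptions chosen for illustration.

```python
import math


def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron(inputs, weights, bias):
    """One node: weighted sum of the incoming signals, then an activation."""
    return sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias)


def forward(inputs, layers):
    """Propagate signals layer by layer.

    Each layer is a list of (weights, bias) pairs, one pair per neuron.
    """
    signals = inputs
    for layer in layers:
        signals = [neuron(signals, weights, bias) for weights, bias in layer]
    return signals


# Hypothetical weights, chosen by hand for illustration only.
NETWORK = [
    [([0.5, -0.4], 0.1), ([0.3, 0.8], -0.2)],  # hidden layer: 2 neurons
    [([1.0, -1.0], 0.0)],                      # output layer: 1 neuron
]

output = forward([1.0, 0.0], NETWORK)
```

Learning would consist of nudging the numbers in `NETWORK` so that `output` moves closer to a desired value, which is what the gradient descent and backpropagation topics listed under Unit 4 address.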
Clustering
• Definition: Clustering is the process of sorting items into groups based on their similarities, forming distinct clusters where items within each cluster are more alike to each other than to those in other clusters.
• Visual Representation: Imagine organizing fruits into groups by type, such as grouping apples together, oranges in another group, and bananas in a separate one, visually representing how clusters segregate similar items.
• Characteristics: Clusters act like exclusive clubs, where members share common traits but differ significantly from members of other clusters, illustrating the distinctiveness of each group.
• Multidimensional Space: Clusters are akin to islands in an expansive ocean, with densely populated points representing similar items within each cluster, and low-density water symbolizing the dissimilar items separating clusters.
• Machine Learning Perspective: Clustering entails discovering patterns without explicit guidance, akin to exploring a forest without a map, where similarities guide the grouping process.
• Unsupervised Learning: Clustering is a form of unsupervised learning: it learns through observation, not instruction. It's like solving a puzzle without knowing what the final picture looks like.
• Data Reduction:
  • Example: Imagine sorting a massive collection of books into genres (fiction, non-fiction, sci-fi, etc.). Clustering reduces the data into manageable chunks for easier processing.
• Hypothesis Generation:
  • Example: Grouping customer purchase data to generate hypotheses about shopping preferences, which can then be tested with additional research.
• Hypothesis Testing:
  • Example: Using clustering to verify if certain customer segments show different purchasing behaviors, confirming or disproving existing hypotheses.
• Prediction Based on Groups:
  • Example: Suppose we have a dataset of customer demographics and spending habits. By clustering similar customers, we can predict the behavior of new customers based on their group's characteristics. For instance, if a new customer shares similarities with the "budget-conscious" cluster, we can predict their spending patterns accordingly.
Differentiating Clustering and Classification

| S. No. | Clustering | Classification |
|---|---|---|
| 1. | Clustering analyzes data objects without a known class label. | In classification, data are grouped by analyzing data objects whose class label is known. |
| 2. | There is no prior knowledge of the attributes of the data to form clusters. | There is some prior knowledge of the attributes of each classification. |
| 3. | It is done by grouping only the input data because the output is not predefined. | It is done by classifying the output based on the values of the input data. |
| 4. | The number of clusters is not known before clustering. The clusters are identified after the completion of clustering. | The number of classes is known before classification, as there is a predefined output based on the input data. |
| 5. | Unknown class label. | Known class label. |
| 6. | It is considered unsupervised learning because there is no prior knowledge of the class labels. | It is considered supervised learning because the class labels are known beforehand. |

• Hierarchical Clustering:
  • Agglomerative Hierarchical Clustering: Treats each data point as its own cluster, then merges clusters into larger ones. For example, a dataset of academic papers starts with each paper as its own cluster, then papers on similar topics merge into bigger clusters.
  • Divisive Hierarchical Clustering: Starts with all data points in one cluster and splits them into smaller clusters. For instance, starting with one cluster of all store customers, the cluster is split based on purchasing behavior until each customer forms their own cluster.
• Partitional Clustering:
  • Centroid-based Clustering (e.g., K-means): Partitions data into clusters, each represented by a centroid. Clusters minimize the distance between data points and their centroid, optimizing intra-cluster similarity and inter-cluster dissimilarity. For example, retail customers can be clustered by buying patterns, with each cluster's centroid reflecting average behavior.
  • Model-based Clustering: Uses a statistical model for each cluster, finding the best fit to the data. For instance, Gaussian mixture models assume data points in each cluster are Gaussian distributed. This method is used in image processing to model different textures as coming from different Gaussian distributions.
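The centroid-based idea above can be sketched as a small K-means loop: assign each point to its nearest centroid, move each centroid to the mean of its points, and repeat until nothing changes. The customer data (visits per month, average basket value), the choice of the first k points as starting centroids, and the function names are assumptions for illustration; practical implementations usually pick starting centroids more carefully.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean(points):
    """Coordinate-wise average of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def k_means(points, k, iterations=100):
    centroids = list(points[:k])  # naive initialization (assumption)
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k),
                          key=lambda i: squared_distance(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: each centroid moves to the mean of its cluster.
        new_centroids = [mean(c) if c else centroids[i]
                         for i, c in enumerate(clusters)]
        if new_centroids == centroids:  # converged
            break
        centroids = new_centroids
    return centroids, clusters


# Hypothetical customers: (visits per month, average basket value).
CUSTOMERS = [(1, 10), (9, 80), (2, 12), (1.5, 11), (10, 85), (9.5, 78)]
centroids, clusters = k_means(CUSTOMERS, k=2)
```

Each entry of `centroids` plays the role of the "average behavior" of one customer group described above.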

• Density-based Clustering (e.g., DBSCAN):
  • This method clusters points that are closely packed together, marking as outliers the points that lie alone in low-density regions. This is useful in geographical data analysis, for example to identify regions of high economic activity based on the point density of businesses.
• Grid-based Clustering:
  • This method quantizes the space into a finite number of cells that form a grid structure and then performs clustering on the grid structure. This is effective for large spatial data sets, as it speeds up the clustering process. For example, in meteorological data, clustering can be applied to grid squares to categorize regional weather patterns.
• Spectral Clustering:
  • Uses the spectrum (eigenvalues and eigenvectors) of a similarity matrix to reduce dimensionality before clustering in fewer dimensions. This technique is particularly useful when the clusters have a complex shape, unlike centroid-based clustering, which assumes roughly spherical clusters. For example, in social network analysis, spectral clustering can help identify communities based on the patterns of relationships between members.

Decision Tree
• A decision tree is a model used in data mining, statistics, and machine learning to predict an outcome based on input variables. It resembles a tree structure with branches and leaves, where each internal node represents a "decision" based on a feature, each branch represents the outcome of that decision, and each leaf node represents the final outcome or class label.
• Advantages and Limitations:
  • Advantages:
    • Easy to interpret and visualize.
    • Requires little data preparation compared to other algorithms.
    • Can handle both numerical and categorical data.
  • Limitations:
    • Prone to overfitting, especially with many branches.
    • Can be biased towards features with more levels.
    • Decisions are based on heuristics, hence might not provide the best split in some cases.
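The node, branch, and leaf structure described above can be sketched as nested dictionaries, with classification as a walk from the root to a leaf. The weather features, the tree itself, and the name `classify` are hypothetical and hand-written for illustration; the tree here is not learned from data.

```python
# Internal nodes are dicts naming the feature tested and one subtree per
# outcome; leaves are plain strings holding the class label.
TREE = {
    "feature": "outlook",
    "branches": {
        "sunny": {
            "feature": "humidity",
            "branches": {"high": "stay in", "normal": "play"},
        },
        "overcast": "play",
        "rainy": {
            "feature": "windy",
            "branches": {"yes": "stay in", "no": "play"},
        },
    },
}


def classify(tree, example):
    """Follow the branch matching the example's feature value until a leaf."""
    node = tree
    while isinstance(node, dict):
        value = example[node["feature"]]
        node = node["branches"][value]
    return node


label = classify(TREE, {"outlook": "sunny", "humidity": "high", "windy": "no"})
```

Because every prediction is a readable chain of feature tests, the path taken for `label` doubles as its explanation, which is the interpretability advantage listed above.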

Bayesian belief networks
• Bayesian belief networks are tools for representing and reasoning under conditions of uncertainty. They capture the probabilistic relationships among a set of variables and allow for the inference of probabilities even with partial information.
• Structure: The core components of a Bayesian belief network include:
  • Directed Acyclic Graph (DAG): Each node in the graph represents a random variable, which can be either discrete or continuous. These variables often correspond to attributes in data. Arrows or arcs between nodes represent causal influences.
  • Conditional Probability Tables (CPTs): Each node has an associated table that quantifies the effect of the parents on the node.
• Usage:
  • Learning: Bayesian networks can be trained using data to learn the conditional dependencies.
  • Inference: Once trained, the network can be used for inference, such as predicting the likelihood of lung cancer given that a patient is a smoker with no family history.
  • Classification: Bayesian networks can classify new cases based on learned probabilities.
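The lung cancer inference mentioned above can be sketched with a three-node network in which Smoker and Family History are both parents of Lung Cancer. Every probability below is invented for illustration, and the sketch assumes there is no arc between Smoker and Family History, so the two are independent.

```python
# P(family history = True); hypothetical value.
P_FAMILY_HISTORY = 0.1

# CPT: P(lung cancer = True | smoker, family history); hypothetical values.
CPT_CANCER = {
    (True, True): 0.30,
    (True, False): 0.10,
    (False, True): 0.05,
    (False, False): 0.01,
}


def p_cancer(smoker, family_history=None):
    """P(lung cancer | evidence).

    If family history is observed, this is a direct CPT lookup. If it is
    unobserved (None), it is summed out using its prior probability.
    """
    if family_history is not None:
        return CPT_CANCER[(smoker, family_history)]
    return (CPT_CANCER[(smoker, True)] * P_FAMILY_HISTORY
            + CPT_CANCER[(smoker, False)] * (1.0 - P_FAMILY_HISTORY))


smoker_no_history = p_cancer(smoker=True, family_history=False)
smoker_unknown_history = p_cancer(smoker=True)
```

The second call shows inference with partial information: the answer is still defined when one parent is unobserved, because the network weighs both possibilities by how likely they are.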
Reinforcement learning
• Reinforcement learning is a type of machine learning where an agent learns to make decisions by performing actions and receiving feedback in the form of rewards or penalties. This method is similar to how individuals learn from the consequences of their actions in real life.
• Key Concepts in Reinforcement Learning:
  • Environment: The world in which the agent operates.
  • State: The current situation of the agent.
  • Actions: What the agent can do.
  • Rewards: Feedback from the environment, which can be positive (reinforcements) or negative (punishments).
• Imagine a robot navigating a maze. The robot has to find the shortest path to a destination without prior knowledge of the layout. Each step it takes provides new information:
  • If it moves closer to the destination, it receives a positive reward.
  • If it hits a wall or moves away from the goal, it receives a negative reward.
• Through trial and error, the robot learns the optimal path by maximizing its cumulative rewards.
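The maze example above can be sketched with tabular Q-learning, the algorithm named in the Unit 5 syllabus, on a deliberately tiny "maze": a corridor of five cells with the destination at the right end. The corridor layout, the reward values (including the bonus for reaching the goal), the learning rate, the discount factor, and purely random exploration are all assumptions chosen to keep the sketch short.

```python
import random

N_STATES = 5            # corridor cells 0..4; cell 4 is the destination
GOAL = N_STATES - 1
ACTIONS = (-1, +1)      # step left, step right
ALPHA, GAMMA = 0.5, 0.9  # learning rate and discount factor (assumed values)


def step(state, action):
    """Return (next_state, reward) for one move in the corridor."""
    target = state + action
    if target < 0:
        return state, -1.0      # bumped into the wall
    if target == GOAL:
        return target, 10.0     # reached the destination
    return target, (1.0 if action > 0 else -1.0)  # closer / further away


def train(episodes=500, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state = 0
        while state != GOAL:
            action = rng.choice(ACTIONS)  # trial and error: a random move
            next_state, reward = step(state, action)
            best_next = max(q[(next_state, a)] for a in ACTIONS)
            # Q-learning update: move the estimate toward
            # reward + discounted value of the best next action.
            q[(state, action)] += ALPHA * (
                reward + GAMMA * best_next - q[(state, action)])
            state = next_state
    return q


def greedy_policy(q):
    """Best known action in every non-goal cell."""
    return [max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(GOAL)]


policy = greedy_policy(train())
```

After training, `policy` lists the preferred move in each cell, so the robot's learned behavior can be read off directly from the table of estimated cumulative rewards.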
Support Vector Machine
• A Support Vector Machine (SVM) is a powerful machine learning algorithm most commonly used in classification problems.
• SVM constructs a hyperplane or set of hyperplanes in a high-dimensional space, which can be used for classification. The goal is to find the best hyperplane, the one that has the largest distance to the nearest training data points of any class (functional margin), in order to improve the classification performance on unseen data.
• Applications of SVM:
  • Text and Hypertext Classification: For filtering spam and categorizing text-based content such as news articles.
  • Image Classification: Useful in categorizing images into different groups (e.g., animals, cars, fruits).
  • Handwritten Character Recognition: Used to recognize letters and digits from handwritten documents.
  • Biological Sciences: Applied in protein classification and cancer classification based on gene expression data.
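The "largest distance to the nearest training points" criterion above can be illustrated by measuring that distance for two candidate hyperplanes and keeping the better one. This sketch only compares hand-picked candidates; an actual SVM solves an optimization problem to find the maximum-margin hyperplane. The points, the two candidates, and the name `margin` are assumptions for illustration.

```python
import math

# Hypothetical 2-D training points with class labels +1 / -1.
POINTS = [
    ((1.0, 1.0), -1),
    ((2.0, 1.0), -1),
    ((4.0, 4.0), +1),
    ((5.0, 4.0), +1),
]


def margin(w, b, points):
    """Smallest signed distance from any point to the hyperplane w.x + b = 0.

    A negative result means at least one point lies on the wrong side.
    """
    norm = math.sqrt(sum(wi * wi for wi in w))
    return min(
        label * (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm
        for x, label in points
    )


# Two candidate separating lines, each given as (w, b).
CANDIDATES = {
    "A": ((1.0, 0.0), -2.5),   # vertical line x1 = 2.5
    "B": ((1.0, 1.0), -5.5),   # diagonal line x1 + x2 = 5.5
}

best = max(CANDIDATES, key=lambda name: margin(*CANDIDATES[name], points=POINTS))
```

Both lines separate the two classes, but the one stored in `best` leaves more room between itself and the closest point, which is the property an SVM seeks because it tends to generalize better to unseen data.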
Genetic Algorithm
• A genetic algorithm (GA) is a search heuristic inspired by Charles Darwin's theory of natural selection. It is used to find optimal or near-optimal solutions to complex problems which might otherwise take a long time to solve.
• Overview of Genetic Algorithm:
  • Purpose: Genetic algorithms are used to solve optimization and search problems by mimicking the process of natural selection.
  • Process: This involves a population of individuals which evolve towards a better solution by combining the characteristics of high-quality individuals.
Flowchart of Genetic Algorithm Process:
1. Initialize Population: Start with a randomly generated population of n individuals.
2. Fitness Evaluation: Evaluate the fitness of each individual in the population. The fitness score determines how good an individual solution is at solving the problem.
3. Selection: Select pairs of individuals (parents) based on their fitness scores. Higher fitness scores generally mean a higher chance of selection.
4. Crossover (Recombination): Combine the features of selected parents to create offspring. This simulates sexual reproduction.
5. Mutation: Introduce random changes to individual offspring to maintain genetic diversity within the population.
6. Replacement: Replace the older generation with the new generation of offspring.
7. Termination: Repeat the process until a maximum number of generations is reached or a satisfactory fitness level is achieved.
Example of Genetic Algorithm: Imagine we want to optimize the design of an aerodynamic car. The objective is to minimize air resistance, which directly impacts fuel efficiency.
• Encoding: Each car design is encoded as a string of numbers (genes), representing different design parameters like shape, size, and materials.
• Initial Population: Generate a random set of car designs.
• Fitness Evaluation: Use a simulation to calculate the air resistance of each design.
• Selection: Choose designs with the lowest air resistance.
• Crossover: Create new designs by mixing the features of selected designs.
• Mutation: Slightly alter the designs to explore a variety of design possibilities.
• Repeat: Continue the process to evolve increasingly efficient designs over multiple generations.
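The seven flowchart steps can be sketched on a toy problem. Instead of a car simulation, which is beyond a short example, the fitness here simply counts the 1-bits in a bit string. The problem, the population size, the mutation rate, tournament selection, and single-point crossover are all assumptions, one common choice among many. Numbered comments map each line to the steps of the flowchart.

```python
import random

GENES = 12             # length of each bit-string individual (hypothetical)
POP_SIZE = 20
MUTATION_RATE = 0.02


def fitness(individual):
    """Toy objective: number of 1-bits (higher is better)."""
    return sum(individual)


def select(population, rng):
    """Tournament selection: the fitter of two random individuals wins."""
    a, b = rng.sample(population, 2)
    return a if fitness(a) >= fitness(b) else b


def crossover(mother, father, rng):
    """Single-point crossover: head of one parent, tail of the other."""
    point = rng.randrange(1, GENES)
    return mother[:point] + father[point:]


def mutate(individual, rng):
    """Flip each gene with a small probability."""
    return [1 - g if rng.random() < MUTATION_RATE else g for g in individual]


def evolve(max_generations=100, seed=0):
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(GENES)]
                  for _ in range(POP_SIZE)]                       # step 1
    best = max(population, key=fitness)                           # step 2
    for _ in range(max_generations):                              # step 7
        if fitness(best) == GENES:                                # step 7
            break
        offspring = []
        while len(offspring) < POP_SIZE:
            mother = select(population, rng)                      # step 3
            father = select(population, rng)                      # step 3
            child = crossover(mother, father, rng)                # step 4
            offspring.append(mutate(child, rng))                  # step 5
        population = offspring                                    # step 6
        best = max(population + [best], key=fitness)              # step 2
    return best


best_design = evolve()
```

For the car example, `fitness` would be replaced by the negated air resistance from a simulation, and the bits by the encoded design parameters; the loop itself would stay the same.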
Issues in Machine Learning
• Data Quality:
  • Importance of Quality: High-quality data is crucial for developing effective ML models. Poor data can lead to inaccurate predictions and unreliable outcomes.
  • Challenges:
    • Data Evaluation and Integration: Ensuring data is clean, well-integrated, and representative. For example, a model trained to recognize faces needs a diverse dataset that reflects various ethnicities, ages, and lighting conditions.
    • Data Exploration and Governance: Implementing robust data governance to maintain the integrity and usability of data over time.
• Transparency:
  • Model Explainability: ML models, especially complex ones like deep neural networks, can act as "black boxes," where it's unclear how decisions are made.
  • Example: In a credit scoring model, it's crucial for regulatory and fairness reasons to explain why a loan application was denied, which can be challenging with highly complex ML models.
• Manpower:
  • Skill Requirement: Effective use of ML requires a combination of skills in data science, software development, and domain expertise.
  • Bias Avoidance: Having diverse teams is important to prevent biases in model development.
  • Example: An organization implementing an ML solution for customer service might need experts in natural language processing, software engineering, and customer interaction to develop a comprehensive tool.
• Other Issues:
  • Misapplication of Technology: ML is not suitable for every problem, and its misuse can lead to wasted resources or poor decisions.
    • Example: Employing deep learning for a simple data analysis task, where traditional statistical methods would be more appropriate and less costly.
  • Innovation Misuse: The hype around new ML techniques can lead to premature adoption without proper understanding or necessity.
    • Example: The early overuse of deep learning in situations where simpler models could suffice, like predicting straightforward outcomes from small datasets.
  • Traceability and Reproducibility: Ensuring that ML experiments are reproducible and that results can be traced back to specific data and configuration settings.
    • Example: A research team must be able to replicate an ML experiment's results using the same datasets and parameters to verify findings and ensure reliability.
Data Science vs Machine Learning

| S. No. | Data Science | Machine Learning |
|---|---|---|
| 1. | Involves data cleansing, preparation, and analysis. | Practice of using algorithms to learn from and make predictions based on data. |
| 2. | Deals with a variety of data operations. | A subset of Artificial Intelligence focused on statistical models and algorithms. |
| 3. | Focuses on sourcing, cleaning, and processing data to extract meaningful insights. | Programs learn from data and improve autonomously without explicit instructions. |
| 4. | Tools include SAS, Tableau, Apache Spark, MATLAB. | Tools include Amazon Lex, IBM Watson Studio, Microsoft Azure ML Studio. |
| 5. | Applied in fraud detection, healthcare analysis, and business optimization. | Used in recommendation systems like Spotify, facial recognition technologies. |

(UNIT-2: REGRESSION & BAYESIAN LEARNING)

• REGRESSION: Linear Regression and Logistic Regression.
• BAYESIAN LEARNING - Bayes theorem, Concept learning, Bayes Optimal
Classifier, Naïve Bayes classifier, Bayesian belief networks, EM algorithm.
• SUPPORT VECTOR MACHINE: Introduction, Types of support vector kernel
- (Linear kernel, polynomial kernel, and Gaussian kernel), Hyperplane -
(Decision surface), Properties of SVM, and Issues in SVM.
Regression
• Regression is a statistical technique used to analyze the relationship
between a dependent variable and one or more independent variables. It
is widely used in areas such as finance and economics to predict
outcomes and understand variable interactions.

Types of Regression
• Simple Linear Regression:
• This method uses one independent variable to predict the
outcome of a dependent variable. The formula is Y = a + bX + u, where:
• Y is the dependent variable we want to predict.
• X is the independent variable used for prediction.
• a is the intercept of the regression line (value of Y when X is 0).
• b is the slope of the regression line, representing the change in
Y for a one-unit change in X.
• u is the regression residual, which is the error in the prediction.
• Example: Predicting house prices (Y) based on house size (X). A larger
house size generally increases the house price.
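The intercept a and slope b in Y = a + bX + u are usually estimated by ordinary least squares, which picks the line that minimizes the sum of squared residuals. The following is a minimal sketch of that calculation in plain Python; the function name and the house-size data are made up for illustration and do not come from the notes.

```python
def fit_simple_linear(xs, ys):
    """Return (a, b) minimizing the sum of squared residuals for Y = a + bX."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: co-variation of X and Y divided by the variation of X.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    # Intercept: the fitted line passes through the point of means.
    a = mean_y - b * mean_x
    return a, b


# Hypothetical data: house size (X) and house price (Y).
sizes = [50, 80, 100, 120, 150]
prices = [150, 240, 290, 360, 450]

a, b = fit_simple_linear(sizes, prices)

# Residuals u = Y - (a + bX): the part of each price the line does not explain.
residuals = [y - (a + b * x) for x, y in zip(sizes, prices)]

# Prediction for a house of size 110 that is not in the data.
predicted_110 = a + b * 110
```

With a least-squares fit that includes an intercept, the residuals always sum to zero, which is a quick sanity check on the fitted a and b.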
• Multiple Linear Regression:
• Involves two or more independent variables to predict the outcome.
The formula is
Y = a + b_1 X_1 + b_2 X_2 + ... + b_n X_n + u, where:
• Each X_i represents a different independent variable.
• Each b_i is the coefficient for the corresponding independent
variable, showing how much Y changes when that variable
changes by one unit, holding the other variables constant.
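With several independent variables, the coefficients a, b_1, ..., b_n can be obtained from the normal equations (X^T X) w = X^T y, where each row of X is (1, X_1, ..., X_n). The sketch below solves them with a small Gaussian elimination routine so that it needs no external library. The helper names and the data are hypothetical, and in practice a numerical library would normally be used instead.

```python
def solve_linear_system(A, y):
    """Solve A x = y by Gaussian elimination with partial pivoting."""
    n = len(A)
    # Augmented matrix [A | y], copied so the inputs are not modified.
    M = [row[:] + [y[i]] for i, row in enumerate(A)]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def fit_multiple_linear(rows, ys):
    """Return [a, b_1, ..., b_n] from the normal equations (X^T X) w = X^T y."""
    X = [[1.0] + list(row) for row in rows]  # the leading 1 models the intercept a
    k = len(X[0])
    XtX = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    Xty = [sum(r[i] * y for r, y in zip(X, ys)) for i in range(k)]
    return solve_linear_system(XtX, Xty)


# Hypothetical data generated from Y = 10 + 2*X_1 + 5*X_2 with no noise (u = 0).
rows = [(1, 1), (2, 1), (3, 2), (4, 3), (5, 5)]
ys = [10 + 2 * x1 + 5 * x2 for x1, x2 in rows]

a, b1, b2 = fit_multiple_linear(rows, ys)
```

Because the data were generated without noise, the fit recovers the generating coefficients up to rounding error. On real data the residual u is non-zero and the estimates are only approximate.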
Logistic Regression
Definition of Logistic Regression:
• Logistic regression is a statistical method and a type of supervised machine
learning algorithm. It is used to estimate the probability that a given input
belongs to a certain category (typically a binary outcome).
• Characteristics of the Dependent Variable:
• The target variable in logistic regression is binary, meaning it has two
possible outcomes. These outcomes are usually coded as 1
(indicating success or the presence of a feature, like "yes") and 0
(indicating failure or the absence of a feature, like "no").

• Applications:
• Logistic regression is widely applied in fields such as medicine,
finance, and marketing. It helps in binary classification tasks such as
detecting whether an email is spam or not, predicting whether a
patient has a disease like diabetes, or determining if a transaction
might be fraudulent.
• Example:
• Predicting Disease Occurrence: Suppose a medical researcher wants
to predict the likelihood that individuals have diabetes based on their
age and BMI. Here, the outcome variable Y is whether the person has
diabetes (1) or not (0), and the predictors X_1 and X_2 are age and BMI,
respectively. The logistic regression model would help estimate the
probability of diabetes for different age groups and BMI levels, using
historical data to determine the coefficients b_1 and b_2 for age and
BMI.
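In the diabetes example, the model passes a linear combination of the predictors through the logistic (sigmoid) function, P(Y = 1) = 1 / (1 + e^-(a + b_1 X_1 + b_2 X_2)), so the output always lies between 0 and 1. The sketch below only illustrates this prediction step. The intercept a and the coefficient values are invented for illustration and are not fitted to any real data; in practice they would be estimated from historical records.

```python
import math


def sigmoid(z):
    """Logistic function: maps any real number to a value in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def diabetes_probability(age, bmi, a=-8.0, b1=0.05, b2=0.15):
    """Estimate P(Y = 1) from age and BMI.

    a, b1 and b2 are hypothetical values chosen for illustration only.
    """
    return sigmoid(a + b1 * age + b2 * bmi)


p_young_lean = diabetes_probability(age=25, bmi=21)
p_older_high = diabetes_probability(age=60, bmi=34)

# A common convention (an assumption here) is to predict class 1
# when the estimated probability is at least 0.5.
label = 1 if p_older_high >= 0.5 else 0
```

The 0.5 threshold is a design choice, not part of the model. In a medical setting a lower threshold might be preferred so that fewer true cases are missed.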
Aspect | Linear Regression | Logistic Regression
Type of Model | Supervised regression model | Supervised classification model
Prediction Outcome | Predicts continuous values | Predicts binary outcomes (0 or 1)
Mathematical Model | Uses a linear function of the inputs | Applies the logistic (sigmoid) function to a linear function of the inputs
Purpose | Estimates values of a dependent variable | Estimates probability of an event
Example | Predicting house prices based on size | Predicting whether a patient has a disease or not
Concept learning
• Concept learning is the process of inferring a Boolean-valued function
(the target concept) from labeled training examples in supervised learning.
It involves identifying patterns or rules that correctly classify instances
into predefined categories by searching through a space of possible
hypotheses, using methods such as decision trees or neural networks, and
selecting the hypothesis that best fits the data.
Example | Sky | AirTemp | Humidity | Wind | Water | Forecast | EnjoySport
1 | Sunny | Warm | Normal | Strong | Warm | Same | Yes
2 | Sunny | Warm | High | Strong | Warm | Same | Yes
3 | Rainy | Cold | High | Strong | Warm | Change | No
4 | Sunny | Warm | High | Strong | Cool | Change | Yes

• The given table represents a dataset where Tom's enjoyment of his favorite
water sports is recorded based on various weather conditions. The goal is
to determine under what conditions Tom enjoys water sports.
Attributes and Their Values
• Sky: Sunny, Rainy
• AirTemp: Warm, Cold
• Humidity: Normal, High
• Wind: Strong
• Water: Warm, Cool
• Forecast: Same, Change
• EnjoySport: Indicates whether Tom enjoys water sports under these
conditions. Possible values: Yes, No
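One classic way to learn a concept from a table like this is the Find-S algorithm. The notes above do not name it, so it is used here only as an illustrative choice. It starts from the first positive example and replaces an attribute value with "?" (meaning any value is acceptable) whenever a later positive example disagrees. A minimal sketch on the EnjoySport data:

```python
def find_s(examples):
    """Return the most specific hypothesis that covers all positive examples."""
    hypothesis = None
    for attributes, label in examples:
        if label != "Yes":
            continue  # Find-S ignores negative examples
        if hypothesis is None:
            # Start with the first positive example itself.
            hypothesis = list(attributes)
        else:
            # Generalize every attribute that disagrees with this example.
            hypothesis = [h if h == v else "?" for h, v in zip(hypothesis, attributes)]
    return hypothesis


def matches(hypothesis, attributes):
    """True if the hypothesis classifies the instance as positive."""
    return all(h == "?" or h == v for h, v in zip(hypothesis, attributes))


# Attribute order: Sky, AirTemp, Humidity, Wind, Water, Forecast.
examples = [
    (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"), "Yes"),
    (("Sunny", "Warm", "High", "Strong", "Warm", "Same"), "Yes"),
    (("Rainy", "Cold", "High", "Strong", "Warm", "Change"), "No"),
    (("Sunny", "Warm", "High", "Strong", "Cool", "Change"), "Yes"),
]

hypothesis = find_s(examples)
# hypothesis is ["Sunny", "Warm", "?", "Strong", "?", "?"]
```

Read as a rule, the learned hypothesis says that Tom enjoys water sports when the sky is sunny, the air is warm, and the wind is strong, regardless of humidity, water temperature, or forecast. Find-S never uses the negative example, which is one of its known limitations.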
• Advantages:
• Generalization: Learns broad rules from specific examples.
• Interpretability: Produces human-readable rules or models.
• Disadvantages:
• Overfitting: Risk of overly complex models.
• Requires Labeled Data: Needs a lot of labeled examples.
• Applications:
• Email Spam Detection: Classifies emails as spam or not.
• Medical Diagnosis: Predicts diseases from patient data.
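Bayes Optimal Classifier
• The Bayes Optimal Classifier is a theoretical model in machine learning
that predicts the class with the highest posterior probability, where
that probability is obtained by weighting the predictions of all candidate
hypotheses by their posterior probabilities given the training data. It
uses Bayes' theorem to combine prior knowledge with observed data, and
no other classifier using the same hypothesis space and prior knowledge
can make more accurate predictions on average.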
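A small worked example shows how this differs from simply trusting the single most probable hypothesis. The classifier scores each class v as the sum over hypotheses h of P(v | h) * P(h | D) and picks the class with the largest score. The three hypotheses and their posterior probabilities below are invented for illustration.

```python
def bayes_optimal_label(posteriors, predictions, labels):
    """Return the label v maximizing sum over h of P(v | h) * P(h | D)."""
    scores = {v: 0.0 for v in labels}
    for h, p_h in posteriors.items():
        for v in labels:
            scores[v] += predictions[h].get(v, 0.0) * p_h
    return max(scores, key=scores.get), scores


# Hypothetical posteriors P(h | D) for three hypotheses.
posteriors = {"h1": 0.4, "h2": 0.3, "h3": 0.3}

# P(v | h): h1 predicts "Yes" for the new instance, h2 and h3 predict "No".
predictions = {
    "h1": {"Yes": 1.0},
    "h2": {"No": 1.0},
    "h3": {"No": 1.0},
}

label, scores = bayes_optimal_label(posteriors, predictions, ["Yes", "No"])

# The single most probable hypothesis on its own would predict "Yes".
map_hypothesis = max(posteriors, key=posteriors.get)
```

Here h1 is the most probable single hypothesis and predicts "Yes", but the combined weight of h2 and h3 (0.6) outvotes it, so the Bayes optimal prediction is "No".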
