
PANIMALAR ENGINEERING COLLEGE

DEPARTMENT OF ELECTRONICS AND COMMUNICATION ENGINEERING

21EC1401 - ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING (LAB INTEGRATED)
UNIT IV - Introduction to Machine Learning
UNIT IV NOTES
Syllabus:
Preliminaries, what is machine learning; varieties of machine learning, learning input/output
functions, bias, sample application. Boolean functions and their classes, CNF, DNF, decision
lists. Version spaces for learning, version graphs, learning as search of a version space.

1.MACHINE LEARNING(ML)
ML is a branch of artificial intelligence that:
1. Uses computing-based systems to make sense out of data
2. Extracts patterns, fits data to functions, classifies data, etc.
ML systems can learn and improve:
1. With historical data, time and experience
2. By bridging theoretical computer science and real, noisy data.

DEFINITION:
Machine learning is a subset of artificial intelligence (AI) that focuses on using statistical techniques to build intelligent computer systems that learn from available data.
1.1.WHEN DO WE USE MACHINE LEARNING?
ML is used when:
• Human expertise does not exist (navigating on Mars)
• Humans can’t explain their expertise (speech recognition)
• Models must be customized (personalized medicine)
• Models are based on huge amounts of data (genomics)

1.2.DIFFERENCE BETWEEN AI & ML


1.3.DISCIPLINES CONTRIBUTING TO MACHINE LEARNING:
⮚ Statistics
⮚ Brain Models
⮚ Adaptive Control Theory
⮚ Evolutionary Models
⮚ Artificial Intelligence
⮚ Psychological Models
2.VARIETIES OF MACHINE LEARNING:
• Functions
• Logic programs and rule sets
• Finite-state machines
• Grammars
• Problem solving systems
3.LEARNING INPUT AND OUTPUT FUNCTIONS
● Assume there is a function f, and the task of the learner is to guess what it
is.
● Our hypothesis about the function to be learned is denoted by h.
● Both f and h are functions of a vector-valued input x = (x1, x2, ..., xi, ..., xn), which has n components.
● Input = x
● Output = h(x)
● The hypothesized function, h, is selected from a class of functions H.
● The function f also belongs to this class or to a subset of this class.
● We select h based on a training set, Ξ, of m input vector examples.
3.1.TYPES OF LEARNING
There are two major settings in which we learn a function.
1. Supervised Learning.
2. Unsupervised Learning.
3.1.1.SUPERVISED LEARNING
● In supervised learning, we know (sometimes only approximately) the values
of f for the m samples in the training set, Ξ.
● If a hypothesis, h, closely agrees with f for the members of Ξ, then this hypothesis will be a good guess for f, especially if Ξ is large.
● Curve-fitting is a simple example of supervised learning of a function.
● The values of a two-dimensional function, f, at the four sample points shown by the solid circles are given.

● We need to fit these four points with a function, h, drawn from the set H of second-degree functions.
● There is a two-dimensional parabolic surface above the x1, x2 plane that fits the points.
● This parabolic function, h, is the hypothesis about the function, f, that produced the four samples.
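A minimal sketch of this curve-fitting example in Python (the four sample points and their values are hypothetical, since the original figure is not reproduced; numpy is assumed to be available):

import numpy as np

samples = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])   # (x1, x2) sample points
values  = np.array([0.5, 1.0, 1.5, 3.0])                               # values of f at the samples

# Design matrix for h(x1, x2) = a + b*x1 + c*x2 + d*x1*x2,
# one simple family of second-degree surfaces drawn from H
x1, x2 = samples[:, 0], samples[:, 1]
A = np.column_stack([np.ones_like(x1), x1, x2, x1 * x2])
coeffs, *_ = np.linalg.lstsq(A, values, rcond=None)
print(coeffs)            # the parameters of the hypothesis h
print(A @ coeffs)        # h agrees with f at the four training points

Because there are four parameters and four sample points, the fitted h agrees exactly with f on the training set; how well it guesses elsewhere depends on how representative the samples are.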
3.1.2.UNSUPERVISED LEARNING
● There exists a training set of vectors without function values for them.
● The problem in this case is to partition the training set into subsets in some appropriate way.
● Unsupervised learning is used in taxonomic problems, i.e., to invent ways to classify data into meaningful categories.
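A minimal sketch of unsupervised partitioning with a few iterations of k-means (the data, the choice k = 2, and the initial centres are hypothetical; numpy is assumed to be available):

import numpy as np

X = np.array([[1.0, 1.1], [0.9, 1.0], [1.2, 0.8],     # unlabelled training vectors
              [5.0, 5.2], [5.1, 4.9], [4.8, 5.0]])
k = 2
centers = X[[0, 3]]                                    # hypothetical initial centres
for _ in range(10):
    # assign each vector to its nearest centre, then recompute the centres
    labels = np.argmin(((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1), axis=1)
    centers = np.array([X[labels == j].mean(axis=0) for j in range(k)])
print(labels)            # e.g. [0 0 0 1 1 1]: the invented categories

The resulting labels define the invented categories; no function values were ever supplied.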

3.1.3.SPEED-UP LEARNING
Changing an existing function into an equivalent one that is computationally
more efficient. This type of learning is called speed-up learning.
3.2.OUTPUTS
The output may be
● Real numbers
● Categorical value
● Vector-valued outputs
● Boolean output values
REAL NUMBERS:
The process embodying the function, h, is called a function estimator, and the
output is called an output value or estimate.

CATEGORICAL VALUE:
The process embodying h is variously called a classifier, a recognizer, or a
categorizer, and the output itself is called a label, a class, a category, or a decision.

VECTOR-VALUED OUTPUTS:
Vector-valued outputs are also possible, with components being real numbers or categorical values.

BOOLEAN OUTPUT VALUES:
A training pattern having value 1 is called a positive instance, and a training sample having value 0 is called a negative instance.
Learning a Boolean function is sometimes called concept learning, and the
function is called a concept.

3.3.INPUT VECTORS
The input vector is called by a variety of names, such as:
1. input vector
2. pattern vector
3. feature vector
4. sample
5. example
6. instance
The components, xi , of the input vector are variously called features, attributes,
input variables and components.

● The values of the components can be of three main types. They might be
1. Real-valued numbers
2. Discrete-valued numbers (e.g., the Boolean values 1, 0)
3. Categorical values (e.g., the Boolean values True, False)
(e.g.,) Categorical values may be
1. ordered
2. unordered
✔ Class, major, sex and advisor are attributes that can be used to represent a student.
✔ A particular student can be represented by a vector such as (sophomore, history, male, Higgins), whose values are unordered.
✔ Values such as (small, medium, large) are ordered.
(e.g.) For attribute value representation
Major : history
Sex : male
Class : sophomore
Advisor : higgins
Age : 19
An important specialization uses Boolean values, which can be regarded as a
special case of either discrete numbers (1,0) or of categorical variables (True, False).
3.4.TRAINING REGIMES
Several ways exist in which the training set Ξ can be used to produce a
hypothesized function.
1. Batch Method : Entire training set is used all at once to compute the function ‘h’.
2. Incremental Method : One member of the training set is selected at a time and used to modify the current hypothesis; then another member is selected, and so on. Selection may be done at random or cyclically.
3. Online Method :
● Uses the training set members as they become available.
● Used when the next training instance is some function of the current hypothesis and the previous instance.
(e.g.) A classifier used to decide on a robot's next action given its current set of sensory inputs.
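A minimal sketch contrasting the batch and incremental regimes for a simple running-mean hypothesis (the training set and the choice of hypothesis are hypothetical):

training_set = [2.0, 4.0, 6.0, 8.0]

# Batch: the whole training set is used at once to compute h
h_batch = sum(training_set) / len(training_set)

# Incremental: one member at a time modifies the current hypothesis
h = 0.0
for i, x in enumerate(training_set, start=1):
    h += (x - h) / i            # running-mean update after the i-th example
print(h_batch, h)               # both regimes arrive at the same hypothesis here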
3.5.PERFORMANCE EVALUATION
In supervised learning, evaluation is done on a separate set of inputs and function values called the testing set.
A hypothesized function is said to generalize when it guesses well on the testing set. Common measures are:
1) Mean Squared Error (MSE)
2) Total number of errors.
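A minimal sketch of evaluating a hypothesis on a held-out testing set with the two measures above (the testing data, the hypothesis h, and the 0.5 error tolerance are hypothetical):

test_inputs  = [1.0, 2.0, 3.0, 4.0]
test_targets = [2.1, 3.9, 6.2, 7.8]          # values of f on the testing set
h = lambda x: 2.0 * x                        # hypothesized function

errors = [h(x) - y for x, y in zip(test_inputs, test_targets)]
mse = sum(e * e for e in errors) / len(errors)                 # mean squared error
total_errors = sum(abs(e) > 0.5 for e in errors)               # predictions counted as wrong
print("MSE:", mse, "errors:", total_errors)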
3.6.NOISE
Vectors in the training set may be corrupted by noise. There are two kinds of noise:
1.Class Noise:
It randomly alters the value of the function.
2. Attribute Noise:
It randomly alters the value of the components of the input vector.
4.LEARNING REQUIRES BIAS
• Machine learning is a branch of Artificial Intelligence, which allows machines
to perform data analysis and make predictions.
• If the machine learning model is not accurate, it can make prediction errors, and these prediction errors are usually known as bias and variance.
• The main aim of ML/data science analysts is to reduce these errors in order to
get more accurate results.

There are two types of errors in machine learning:


• Reducible errors: These errors can be reduced to improve the model accuracy. Such errors can further be classified into bias and variance.
• Irreducible errors: These errors will always be present in the model regardless of which algorithm has been used. They are caused by unknown variables whose influence cannot be removed.
4.1.HOW BIAS AIDS LEARNING

• The hypercube in the figure represents a Boolean function.


• Each vertex represents a different input pattern.
• Six sample patterns in a training set are shown.
• Small squares represent ‘1’
• Small circles represent ‘0’
• Although the training set does not contain all possible patterns, the function can still be determined if a bias is imposed, i.e., if the hypothesis class is restricted, as the sketch below illustrates.
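A small sketch of how bias determines the function: with six of the eight vertices of a 3-cube labelled (a hypothetical training set), 2^2 = 4 unrestricted Boolean functions remain consistent, but restricting the hypothesis class to terms (conjunctions of literals) leaves exactly one:

from itertools import product

n = 3
vertices = list(product([0, 1], repeat=n))

# Hypothetical training set: 6 of the 8 vertices of the 3-cube are labelled
train = {
    (0, 0, 0): 0, (0, 0, 1): 0, (0, 1, 0): 0,
    (1, 0, 0): 0, (1, 0, 1): 1, (1, 1, 1): 1,
}

# Without bias, any labelling of the two unlabelled vertices is possible
unlabelled = [v for v in vertices if v not in train]
print("consistent functions with no bias:", 2 ** len(unlabelled))     # 4

# With bias: restrict the hypothesis class to terms (conjunctions of literals)
def term_value(term, x):
    # a term is a dict {variable index: required value}
    return int(all(x[i] == v for i, v in term.items()))

terms = [{i: v for i, v in enumerate(mask) if v is not None}
         for mask in product([None, 0, 1], repeat=n)]                 # None = variable absent
consistent = [t for t in terms
              if all(term_value(t, x) == y for x, y in train.items())]
print("consistent terms:", consistent)     # only {0: 1, 2: 1}, i.e. the term x1·x3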

5.BOOLEAN FUNCTIONS
5.1.REPRESENTATION OF BOOLEAN ALGEBRA
Boolean algebra is a convenient notation for representing Boolean functions.
Boolean algebra uses

⮚ The connective and (conjunction)
⮚ The connective inclusive or (disjunction)
⮚ The complement or negation of a variable

❖ The and function of two variables is written x1 · x2. The connective "·" is usually suppressed, and the and function is written x1x2.
1. x1x2 has value 1 if and only if both x1 and x2 have value 1.
2. x1x2 has value 0 if either x1 or x2 has value 0.

❖ The (inclusive) or function of two variables is written x1 + x2.
1. x1 + x2 has value 1 if and only if either or both of x1 and x2 have value 1.
2. x1 + x2 has value 0 if both x1 and x2 have value 0.

● The complement or negation of a variable, x, is written x̄ (also ¬x).
1. x̄ has value 1 if and only if x has value 0.
2. x̄ has value 0 if x has value 1.
● A Boolean formula consisting of a single variable, such as x1, is called an atom.
● One consisting of either a single variable or its complement, such as x̄1, is called a literal.

Negation does not simply distribute over the operators · and +. Instead, we have DeMorgan's laws (which can be verified by using the above definitions):
¬(x1x2) = x̄1 + x̄2
¬(x1 + x2) = x̄1x̄2

5.2.DIAGRAMMATIC REPRESENTATIONS
A Boolean function can be represented by labeling the vertices of a cube. For a function of n variables, we would need an n-dimensional hypercube. The following figure shows the truth tables for AND, OR and NOT.

We show some 2-dimensional examples.

1. Vertices having value 1 are labeled with a small square.


2. Vertices having value 0 are labeled with a small circle.
Here we show 3-dimensional examples.
Using the hypercube representation, a 3-dimensional cube has 2^3 = 8 vertices.
In general, 2- and 3-dimensional cubes provide some intuition about the properties of
certain Boolean functions. Of course, we cannot visualize hypercubes (for n > 3), and
there are many surprising properties of higher dimensional spaces, so we must be
careful in using intuitions gained in low dimensions.
One diagrammatic technique for dimensions slightly higher than 3 is the
Karnaugh map.
A Karnaugh map is an array of values of a Boolean function in which the
horizontal rows are indexed by the values of some of the variables and the vertical
columns are indexed by the rest. The rows and columns are arranged in such a way that
entries that are adjacent in the map correspond to vertices that are adjacent in the
hypercube representation.
Here we show an example of the 4-dimensional even parity function in Fig.
(An even parity function is a Boolean function that has value 1 if there are an even
number of its arguments that have value 1; otherwise it has value 0.)
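A short sketch that computes the 4-variable even parity function and prints it in a Karnaugh-map layout (rows and columns in Gray-code order, so adjacent cells differ in exactly one variable):

from itertools import product

def even_parity(bits):
    # 1 if an even number of the inputs are 1, else 0
    return int(sum(bits) % 2 == 0)

gray = [(0, 0), (0, 1), (1, 1), (1, 0)]           # Gray-code order for two variables
print("x1x2 \\ x3x4  " + "   ".join(f"{a}{b}" for a, b in gray))
for r in gray:                                    # rows indexed by (x1, x2)
    row = [even_parity(r + c) for c in gray]      # columns indexed by (x3, x4)
    print(f"     {r[0]}{r[1]}        " + "    ".join(map(str, row)))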
6.CLASSES OF BOOLEAN FUNCTIONS
6.1 TERMS AND CLAUSES
To use absolute bias in machine learning, we limit the class of hypotheses. In
learning Boolean functions, we frequently use some of the common sub-classes of those
functions. Therefore, it will be important to know about these subclasses.
1. One basic subclass is called terms.
2. A term is any function written in the form l1 · l2 · · · lk, where the li are literals. Such a form is called a conjunction of literals. Two illustrative example terms are x1x̄2 and x̄1x2x3.
3. The size of a term is the number of literals it contains. The examples are of sizes 2 and 3, respectively.
4. A clause is any function written in the form l1 + l2 + · · · + lk, where the li are literals. Such a form is called a disjunction of literals.

6.2.DNF FUNCTIONS
A Boolean function is said to be in disjunctive normal form (DNF) if it can be written as a disjunction of terms. Two illustrative examples in DNF are:
f1 = x1x2 + x̄1x3x̄4
f2 = x1x̄2 + x2x3 + x̄1x2x̄3
A DNF expression is called a k-term DNF expression if it is a disjunction of k terms; it is in the class k-DNF if the size of its largest term is k. The examples above are 2-term and 3-term expressions, respectively. Both expressions are in the class 3-DNF. Each term in a DNF expression for a function is called an implicant because it "implies" the function (if the term has value 1, so does the function).

1. A term, t, is an implicant of a function, f, if f has value 1 whenever t does.
2. A term, t, is a prime implicant of f if the term, t′, formed by taking any literal out of t, is no longer an implicant of f. (The implicant cannot be "divided" by any term and remain an implicant.)
Thus, x1x̄2 and x2x3 are prime implicants of f2 above, but x̄1x2x̄3 is not, because removing x̄3 from it leaves x̄1x2, which is still an implicant of f2.
The relationship between implicants and prime implicants can be illustrated geometrically using the cube representation of Boolean functions. For example, consider a three-term DNF function, f, illustrated in the following figure. Each of the three planes in the figure "cuts off" a group of vertices having value 1, but none cuts off any vertices having value 0.
These planes are pictorial devices used to isolate certain lower dimensional
subfaces of the cube. Two of them isolate one-dimensional edges, and the third isolates
a zero-dimensional vertex. Each group of vertices on a subface corresponds to
one of the implicants of the function, f, and thus each implicant corresponds to a subface
of some dimension.
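The implicant and prime-implicant definitions can be checked by enumerating all input vectors. The sketch below uses a hypothetical three-variable function f = x1·x2 + x2·x3, not the function from the figure:

from itertools import product

def term(t, x):
    # a term is a dict {variable index: required value}; value 1 iff every listed bit matches
    return int(all(x[i] == v for i, v in t.items()))

def f(x):                                   # hypothetical DNF: f = x1·x2 + x2·x3
    return term({0: 1, 1: 1}, x) | term({1: 1, 2: 1}, x)

def is_implicant(t, f, n=3):
    # t implies f: whenever t has value 1, so does f
    return all(f(x) for x in product([0, 1], repeat=n) if term(t, x))

def is_prime_implicant(t, f, n=3):
    if not is_implicant(t, f, n):
        return False
    # removing any single literal must break the implication
    return all(not is_implicant({j: v for j, v in t.items() if j != i}, f, n)
               for i in t)

print(is_prime_implicant({0: 1, 1: 1}, f))          # True:  x1·x2 is prime
print(is_prime_implicant({0: 1, 1: 1, 2: 1}, f))    # False: x1·x2·x3 is an implicant but not prime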

If we can express a function in DNF form, we can use the consensus method to
find an expression for the function in which each term is a prime implicant.

• Consensus:
xi · f1 + x̄i · f2 = xi · f1 + x̄i · f2 + f1 · f2
where f1 and f2 are terms such that no literal appearing in f1 appears complemented in f2. f1 · f2 is called the consensus of xi · f1 and x̄i · f2.
Examples: x1 is the consensus of x1x2 and x1x̄2. The terms x̄1x2 and x1x̄2 have no consensus, since each term has more than one literal appearing complemented in the other.

• Subsumption:
xi· f1 + f1 = f1
where f1 is a term. We say that f1 subsumes xi· f1.
Consider an example: the figure shows a derivation of a set of prime implicants in a consensus tree. The circled numbers adjoining the terms indicate the order in which the consensus and subsumption operations were performed.
Shaded boxes surrounding a term indicate that it was subsumed. The final form of the function, in which all terms are prime implicants, consists of all of the non-subsumed terms in the consensus tree.
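A minimal sketch of the consensus operation on two terms, each represented as a set of (variable index, value) literals:

def consensus(t1, t2):
    # a literal (0, 1) means x1 and (1, 0) means NOT x2; the consensus exists
    # only if exactly one variable appears with opposite values in the two terms
    opposed = [i for (i, v) in t1 if (i, 1 - v) in t2]
    if len(opposed) != 1:
        return None
    i = opposed[0]
    return frozenset((j, v) for (j, v) in t1 | t2 if j != i)

x1x2  = frozenset({(0, 1), (1, 1)})       # x1·x2
x1nx2 = frozenset({(0, 1), (1, 0)})       # x1·(not x2)
nx1x2 = frozenset({(0, 0), (1, 1)})       # (not x1)·x2
print(consensus(x1x2, x1nx2))             # frozenset({(0, 1)}) -> the term x1
print(consensus(nx1x2, x1nx2))            # None: two literals clash, so no consensus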

6.3.CNF FUNCTIONS

Disjunctive normal form has a dual: conjunctive normal form (CNF). A


Boolean function is said to be in CNF if it can be written as a conjunction of
clauses.
An example in CNF is: f = (x1 +x2)(x2 +x3 +x4). A CNF expression is
called a k-clause CNF expression if it is a conjunction of k clauses; it is in the
class k-CNF if the size of its largest clause is k. The example is a 2-clause
expression in 3-CNF.
6.4.DECISION LIST

A decision list is written as an ordered list of pairs:
(tq, vq), (tq-1, vq-1), . . . , (t2, v2), (T, v1)
where
● the vi are either 0 or 1, and the ti are terms in (x1, . . . , xn);
● T is a term whose value is 1 (regardless of the values of the xi);
● the value of a decision list is the value of vi for the first ti in the list that has value 1. (At least one ti will have value 1, because the last one does; v1 can be regarded as the default value of the decision list.)
● The decision list is of size k, if the size of the largest term in it is k. The
class of decision lists of size k or less is called k-DL.
An example decision list (one consistent choice is evaluated in the sketch below) is a function f that has value 0 for x1 = 0, x2 = 0, and x3 = 1, and value 1 for x1 = 1, x2 = 0, and x3 = 1. This function is in 3-DL.
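A minimal sketch that evaluates a decision list. The particular 3-DL list used here is hypothetical, chosen only to agree with the two evaluations above; it is not the expression from the original figure:

def term(t, x):
    # a term is a dict {variable index: required value}; T is the empty term
    return int(all(x[i] == v for i, v in t.items()))

def decision_list(dl, x):
    # value of the first pair whose term evaluates to 1 (the last term T always does)
    for t, v in dl:
        if term(t, x):
            return v

dl = [({0: 0, 1: 0, 2: 1}, 0),   # (x̄1 x̄2 x3, 0)
      ({0: 1, 2: 1}, 1),         # (x1 x3, 1)
      ({}, 0)]                   # (T, 0): the default value

print(decision_list(dl, (0, 0, 1)))  # 0, as stated above
print(decision_list(dl, (1, 0, 1)))  # 1, as stated above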

7. VERSION SPACE LEARNING FOR ML

Machine learning comprises several learning methods, approaches and techniques. One basic learning method uses version spaces, which are most prominent in Boolean function learning.
Let us assume that we are given a list of inputs X and their corresponding outputs f(X), along with a pre-defined hypothesis space Hv. With version space learning, we look at each input-output pair and rule out the hypotheses in Hv that are not consistent with it, finally arriving at a subset of Hv that is consistent with our data. How do we say a hypothesis h is consistent with our dataset? h is said to be consistent with the dataset if h(x) = f(x) for all x in X.

Now, how do we find out which hypothesis functions to rule out?
When we are trying to classify an input x1, it is classified as 0 or 1 (remember, we are looking at Boolean functions) depending upon the majority of the outputs of the functions in the version space Hv.
If x1 is classified correctly, we move on to the next data point; but if a mistake is made, we can drop all the functions that contributed incorrectly. With this majority-vote scheme we can make no more than log2(|H|) mistakes, where |H| is the number of hypotheses in the original hypothesis set H. This is called a mistake bound, which is an important concept in machine learning theory.
The mistake bound tells us that a learning procedure cannot make more mistakes than this upper bound. Thus, the lower the mistake bound, the better the learning approach.
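A minimal sketch of this majority-vote (halving) procedure on a toy hypothesis space: all 16 Boolean functions of two variables, with a hypothetical target f = x1 + x2:

from itertools import product
import math

n = 2
inputs = list(product([0, 1], repeat=n))
# hypothesis space H: every Boolean function of 2 variables, stored as a truth table
H = [dict(zip(inputs, bits)) for bits in product([0, 1], repeat=len(inputs))]

target = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 1}    # hypothetical f = x1 + x2

version_space = list(H)
mistakes = 0
for x in inputs:                                          # data points arrive one at a time
    votes = sum(h[x] for h in version_space)
    prediction = int(2 * votes >= len(version_space))     # majority vote of the version space
    if prediction != target[x]:
        mistakes += 1
    # keep only hypotheses consistent with the revealed label
    version_space = [h for h in version_space if h[x] == target[x]]

print("mistakes:", mistakes, " bound log2|H| =", math.log2(len(H)))
print("hypotheses left:", len(version_space))             # only the target survives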
7.1.VERSION GRAPHS

Boolean functions can be ordered by generality. A Boolean function, f, is more general than a function, g, if f has value 1 for all of the arguments for which g has value 1, and f is not equal to g.

A version graph is used in order to represent this relationship of generality


between the various hypothesis functions of the version space.
Each hypothesis hi is represented as a node in the version graph, with an arc going from less general hypotheses to more general ones.
Since the function f(x) = 0 (which we represent here using "0") has the value 0 for all values of x, it is the most specific (least general) function and lies at the base of the version graph. Similarly, since f(x) = 1 (represented here with "1") has the value 1 for all values of x, it is the most general function and lies at the top of the version graph.

As we continue the training process, the hypotheses that are not consistent are ruled out and we get a less crowded version graph.
In the version graph thus obtained, the functions that are maximally general (excluding "1") are known as the general boundary set (gbs).
The functions that are maximally specific (excluding "0") are known as the specific boundary set (sbs).
The boundary sets are crucial for any hypothesis space as they provide an explicit
way to define whether a function is a part of the space or not.
This determination is possible because every function in the version space must be more general than the sbs and more specific than the gbs.
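A small sketch of the generality test that orders the version graph (the two example functions are hypothetical):

from itertools import product

def more_general(f, g, n):
    # True if Boolean function f is more general than g: f covers g and f != g
    covers  = all(f(x) >= g(x) for x in product([0, 1], repeat=n))
    differs = any(f(x) != g(x) for x in product([0, 1], repeat=n))
    return covers and differs

f = lambda x: x[0]             # x1
g = lambda x: x[0] and x[1]    # x1·x2
print(more_general(f, g, 2))   # True:  x1 is more general than x1·x2
print(more_general(g, f, 2))   # False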
7.2.LEARNING AS SEARCH OF A VERSION SPACE
Learning the solution space of a problem can be thought of as a search problem: one can either take a top-down approach and apply specialization operators to a general function until it is consistent with the dataset, or take a bottom-up approach in which a specific function is repeatedly transformed using generalization operators until a consistent solution space is obtained.

7.3.THE CANDIDATE ELIMINATION METHOD

The Candidate-Elimination algorithm is similar to the List-Then-Eliminate algorithm but uses a more compact representation of the version space:
– it represents the version space by its most general and most specific members.
The Candidate-Elimination algorithm represents the version space by recording only its most general members (G) and its most specific members (S).
The general boundary, G, of the version space is the set of its maximally general members.
The specific boundary, S, of the version space is the set of its maximally specific members.
Candidate-Elimination algorithm proceeds by
– initialising G and S to the maximally general and maximally specific
hypotheses in H
– considering each training example in turn and
∗ using positive examples to drive the maximally specific boundary up
∗ using negative examples to drive the maximally general boundary down
G ← maximally general hypotheses in H
S ← maximally specific hypotheses in H
● Concept learning: the basic learning task of the machine (learning a concept from training data).
● General hypothesis: does not specify any feature values; G = {'?', '?', '?', ..., '?'}, with one '?' per attribute.
● Specific hypothesis: specifies particular feature values; S = {'ϕ', 'ϕ', 'ϕ', ..., 'ϕ'}, with one 'ϕ' per attribute.
● Version space: an intermediate between the general hypothesis and the specific hypothesis. It is not just one hypothesis but the set of all possible hypotheses consistent with the training dataset.

Representations:
o The most specific hypothesis is represented using ϕ.
o The most general hypothesis is represented using ?.

Algorithm:
Step 1: Load the data set.
Step 2: Initialize the General Hypothesis G and the Specific Hypothesis S.
Step 3: For each training example:
Step 4: If the example is positive:
            if attribute_value == hypothesis_value:
                do nothing
            else:
                replace the attribute value in S with '?' (generalizing it)
Step 5: If the example is negative:
            make the general hypotheses more specific.
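A compact sketch of these steps in Python, maintaining both boundaries. It uses the '?'/'0' (ϕ) encoding above and the EnjoySport instances worked through in the trace below; it is one reasonable way to flesh out Step 4 and Step 5, not the only one:

def more_general(h1, h2):
    # True if hypothesis h1 is more general than (or equal to) h2
    return all(a == '?' or a == b for a, b in zip(h1, h2))

def candidate_elimination(examples):
    n = len(examples[0][0])
    S = ['0'] * n                       # most specific hypothesis (phi everywhere)
    G = [['?'] * n]                     # most general hypothesis
    for x, positive in examples:
        if positive:                    # Step 4: minimally generalize S, prune G
            S = [xi if si in ('0', xi) else '?' for si, xi in zip(S, x)]
            G = [g for g in G if more_general(g, S)]
        else:                           # Step 5: minimally specialize members of G
            new_G = []
            for g in G:
                if all(gi == '?' or gi == xi for gi, xi in zip(g, x)):
                    for i in range(n):  # g wrongly covers the negative example
                        if g[i] == '?' and S[i] not in ('0', '?', x[i]):
                            h = list(g); h[i] = S[i]
                            new_G.append(h)
                else:
                    new_G.append(g)
            G = new_G
    return S, G

enjoysport = [
    (('sunny', 'warm', 'normal', 'strong', 'warm', 'same'),   True),
    (('sunny', 'warm', 'high',   'strong', 'warm', 'same'),   True),
    (('rainy', 'cold', 'high',   'strong', 'warm', 'change'), False),
    (('sunny', 'warm', 'high',   'strong', 'cool', 'change'), True),
]
S, G = candidate_elimination(enjoysport)
print("S =", S)     # ['sunny', 'warm', '?', 'strong', '?', '?']
print("G =", G)     # [['sunny', ?, ...], [?, 'warm', ...]]

Running this reproduces the S4 and G4 boundaries of the trace that follows.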

The version space is intermediate between the general and the specific hypotheses. It is not a single hypothesis but the set of all feasible hypotheses based on the training data.
With regard to hypothesis space H and training examples D, the version space, denoted as VSH,D, is the subset of hypotheses from H that are consistent with the training instances in D.
For example, consider the following dataset: the classic EnjoySport example.

Algorithmic steps:

Initially : G = [?, ?, ?, ?, ?, ?]
S = [Null, Null, Null, Null, Null, Null]

For instance 1 : <'sunny','warm','normal','strong','warm ','same'> and positive output.


G1 = G
S1 = ['sunny','warm','normal','strong','warm ','same']

For instance 2 : <'sunny','warm','high','strong','warm ','same'> and positive output.


G2 = G
S2 = ['sunny','warm',?,'strong','warm ','same']

For instance 3 : <'rainy','cold','high','strong','warm ','change'> and negative output.

G3 = [['sunny', ?, ?, ?, ?, ?], [?, 'warm', ?, ?, ?, ?], [?, ?, ?, ?, ?, 'same']]


S3 = S2

For instance 4 : <'sunny','warm','high','strong','cool','change'> and positive output.


G4 = G3
S4 = ['sunny','warm',?,'strong', ?, ?]

Finally, by combining G4 and S4, the algorithm produces the output.

Output :

G = [['sunny', ?, ?, ?, ?, ?], [?, 'warm', ?, ?, ?, ?]]


S = ['sunny','warm',?,'strong', ?, ?]
The Candidate Elimination Algorithm (CEA) is an improvement over the Find-S algorithm for classification tasks. Here are some advantages and disadvantages of CEA in comparison with Find-S:

Advantages of CEA over Find-S:

1. Improved accuracy: CEA considers both positive and negative examples to


generate the hypothesis, which can result in higher accuracy when dealing with
noisy or incomplete data.
2. Flexibility: CEA can handle more complex classification tasks, such as those
with multiple classes or non-linear decision boundaries.
3. More efficient: CEA reduces the number of hypotheses by generating a set of
general hypotheses and then eliminating them one by one. This can result in
faster processing and improved efficiency.
4. Better handling of continuous attributes: CEA can handle continuous
attributes by creating boundaries for each attribute, which makes it more suitable
for a wider range of datasets.

Disadvantages of CEA in comparison with Find-S:

1. More complex: CEA is a more complex algorithm than Find-S, which may
make it more difficult for beginners or those without a strong background in
machine learning to use and understand.
2. Higher memory requirements: CEA requires more memory to store the set of
hypotheses and boundaries, which may make it less suitable for memory-
constrained environments.
3. Slower processing for large datasets.

Candidate Elimination Algorithm in Machine Learning
The Candidate Elimination Algorithm is used to find the set of consistent hypotheses, that is, the version space.

Example  Size   Color  Shape     Class/Label
1        Big    Red    Circle    No
2        Small  Red    Triangle  No
3        Small  Red    Circle    Yes
4        Big    Blue   Circle    No
5        Small  Blue   Circle    Yes

Solution:

S0: (0, 0, 0) Most Specific Boundary

G0: (?, ?, ?) Most Generic Boundary

The first example is negative; the hypothesis at the specific boundary is consistent, hence we retain it, and the hypothesis at the generic boundary is inconsistent, hence we write all consistent hypotheses by removing one "?" at a time.

S1: (0, 0, 0)

G1: (Small, ?, ?), (?, Blue, ?), (?, ?, Triangle)

The second example is negative; the hypothesis at the specific boundary is consistent, hence we retain it, and the hypotheses at the generic boundary that cover this example are inconsistent, hence we write all consistent specializations by removing one "?" at a time.
S2: (0, 0, 0)

G2: (Small, Blue, ?), (Small, ?, Circle), (?, Blue, ?), (Big, ?, Triangle), (?,
Blue, Triangle)

The third example is positive; the hypothesis at the specific boundary is inconsistent, hence we extend the specific boundary, and the consistent hypotheses at the generic boundary are retained while inconsistent hypotheses are removed from the generic boundary.

S3: (Small, Red, Circle)

G3: (Small, ?, Circle)

The fourth example is negative; the hypotheses at both the specific and the generic boundary are consistent with it (neither covers this negative example), hence both are retained unchanged.

S4: (Small, Red, Circle)

G4: (Small, ?, Circle)

The fifth example is positive; the hypothesis at the specific boundary is inconsistent, hence we extend the specific boundary, and the consistent hypothesis at the generic boundary is retained while inconsistent hypotheses are removed from the generic boundary.

S5: (Small, ?, Circle)

G5: (Small, ?, Circle)

The version space learned by the Candidate Elimination Algorithm for the given data set is:
S = G = (Small, ?, Circle)
