AIML - Unit 4 Notes
1.MACHINE LEARNING(ML)
ML is a branch of artificial intelligence that:
1. Uses computing-based systems to make sense of data
2. Extracts patterns, fits data to functions, classifies data, etc.
ML systems can learn and improve:
1. With historical data, time, and experience
2. By bridging theoretical computer science and real, noisy data
DEFINITION:
Machine learning is a subset of artificial intelligence (AI) that focuses on using statistical techniques
to build intelligent computer systems that learn from available data.
1.1.WHEN DO WE USE MACHINE LEARNING?
ML is used when:
• Human expertise does not exist (navigating on Mars)
• Humans can’t explain their expertise (speech recognition)
• Models must be customized (personalized medicine)
• Models are based on huge amounts of data (genomics)
3.1.1.SUPERVISED LEARNING
● Suppose a training set gives four sample points together with their function values. We need to fit these four points with a function, h, drawn from the set H of second-degree functions.
● There is a two-dimensional parabolic surface above the x1, x2 plane that fits the points.
● This parabolic function h is the hypothesis about the function f that produced the four samples.
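As a minimal sketch of this idea (the four sample points and their values below are made up for illustration), we can fit a second-degree hypothesis h to (x1, x2) samples using NumPy least squares:

import numpy as np

# Four made-up training samples: inputs (x1, x2) and their function values f(x1, x2)
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
y = np.array([1.0, 2.0, 3.0, 5.0])

# Hypothesis h from H, the set of second-degree functions:
#   h(x1, x2) = a*x1^2 + b*x2^2 + c*x1*x2 + d*x1 + e*x2 + g
A = np.column_stack([X[:, 0]**2, X[:, 1]**2, X[:, 0]*X[:, 1],
                     X[:, 0], X[:, 1], np.ones(len(X))])
# With six coefficients and only four samples, many surfaces fit exactly;
# lstsq returns the minimum-norm one.
coeffs, *_ = np.linalg.lstsq(A, y, rcond=None)

def h(x1, x2):
    # The fitted parabolic hypothesis h, an estimate of the unknown f
    a, b, c, d, e, g = coeffs
    return a*x1**2 + b*x2**2 + c*x1*x2 + d*x1 + e*x2 + g

print(h(1.0, 1.0))  # ~5.0: h reproduces the sample value at (1, 1)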
3.1.2.UNSUPERVISED LEARNING
● There exists a training set of vectors without function values for them.
● The problem in this case is to partition the training set into subsets in some appropriate way (a sketch follows below).
● Unsupervised methods are used in taxonomic problems, i.e., to invent ways to classify data into meaningful categories.
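As a sketch of such a partitioning (the unlabeled vectors below are made up; scikit-learn's KMeans is one standard choice of method):

import numpy as np
from sklearn.cluster import KMeans

# Unlabeled training set: vectors with no function values attached
Xi = np.array([[1.0, 1.1], [0.9, 1.0], [5.0, 5.2],
               [5.1, 4.9], [9.0, 0.9], [8.8, 1.1]])

# Partition the training set into three subsets (clusters)
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(Xi)
print(kmeans.labels_)  # e.g. [0 0 1 1 2 2]: the invented categories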
3.1.3.SPEED-UP LEARNING
Changing an existing function into an equivalent one that is computationally
more efficient is called speed-up learning.
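For instance (an illustrative sketch, not from the notes), memoization changes a correct but slow recursive function into an equivalent, much faster one:

from functools import lru_cache

def fib_slow(n):
    # Correct but computationally expensive (exponential time)
    return n if n < 2 else fib_slow(n - 1) + fib_slow(n - 2)

@lru_cache(maxsize=None)
def fib_fast(n):
    # Equivalent function: same input-output behaviour, but cached
    # subresults make it run in linear time
    return n if n < 2 else fib_fast(n - 1) + fib_fast(n - 2)

assert fib_fast(20) == fib_slow(20)  # equivalent, just faster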
3.2.OUTPUTS
The output may be:
● Real numbers
● Categorical values
● Vector-valued outputs
● Boolean values
REAL NUMBERS:
The process embodying the function, h, is called a function estimator, and the
output is called an output value or estimate.
CATEGORICAL VALUE:
The process embodying h is variously called a classifier, a recognizer, or a
categorizer, and the output itself is called a label, a class, a category, or a decision.
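A small illustrative sketch (both functions are hypothetical) contrasting the two kinds of output:

def estimator(x):
    # Real-number output: an output value, or estimate
    return 2.0 * x + 1.0

def classifier(x):
    # Categorical output: a label / class / category / decision
    return "positive" if estimator(x) >= 0 else "negative"

print(estimator(1.5))   # 4.0 -> an estimate
print(classifier(1.5))  # 'positive' -> a label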
3.3.INPUT VECTORS
The input vector is called by a variety of names, such as:
1. input vector
2. pattern vector
3. feature vector
4. sample
5. example
6. instance
The components, xi , of the input vector are variously called features, attributes,
input variables and components.
● The values of the components can be of three main types:
1. Real-valued numbers
2. Discrete-valued numbers (e.g., Boolean values 1, 0)
3. Categorical values (e.g., Boolean values True, False)
(e.g.) Categorical values may be
1. ordered
2. unordered
✔ Class, major, sex, and advisor are attributes that can be used to represent a student.
✔ A particular student can be represented by a vector such as (sophomore, history, male, higgins) – unordered values.
✔ (small, medium, large) – ordered values.
(e.g.) Attribute-value representation of the same student:
Major : history
Sex : male
Class : sophomore
Advisor : higgins
Age : 19
An important specialization uses Boolean values, which can be regarded as a
special case of either discrete numbers (1,0) or of categorical variables (True, False).
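A sketch of the student example above as an attribute-value structure in Python:

# Attribute-value representation of the student from the example above
student = {
    "Major":   "history",    # categorical, unordered
    "Sex":     "male",       # categorical, unordered
    "Class":   "sophomore",  # categorical, ordered (freshman < ... < senior)
    "Advisor": "higgins",    # categorical, unordered
    "Age":     19,           # discrete-valued number
}

# The same student as a pattern (feature) vector of attribute values
x = ("sophomore", "history", "male", "higgins")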
3.4.TRAINING REGIMES
Several ways exist in which the training set Ξ can be used to produce a
hypothesized function.
1. Batch Method: The entire training set is used all at once to compute the function h.
2. Incremental Method: One member is selected at a time from the training set and
used to modify the current hypothesis; then another member is selected, and so on.
The selection can be random or cyclic.
3. Online Method:
● Uses the training set members as they become available.
● Used when the next training instance is some function of the current hypothesis and
the previous instance.
(e.g.) A classifier used to decide on a robot's next action given its current set of
sensory inputs.
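A minimal sketch contrasting the batch and incremental regimes on a made-up one-parameter task (the online regime applies the same incremental update, but as samples arrive):

import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=100)
y = 3.0 * X + rng.normal(0, 0.1, size=100)  # noisy samples of f(x) = 3x

# 1. Batch method: the entire training set at once (closed-form least squares)
w_batch = (X @ y) / (X @ X)

# 2. Incremental method: one member at a time, selected at random,
#    each used to modify the current hypothesis w
w, lr = 0.0, 0.1
for i in rng.permutation(len(X)):
    w += lr * (y[i] - w * X[i]) * X[i]  # nudge w toward the selected sample

print(w_batch, w)  # both close to 3.0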
3.5.PERFORMANCE EVALUATION
In supervised learning, evaluation is done on a separate set of inputs and
function values called the testing set.
A hypothesized function is said to generalize when it guesses well on the testing
set. Common measures are:
1) Mean Squared Error (MSE)
2) Total number of errors
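Both measures are straightforward to compute; a sketch:

import numpy as np

def mse(h_outputs, true_values):
    # Mean squared error of the hypothesis on the testing set
    h_outputs = np.asarray(h_outputs, dtype=float)
    true_values = np.asarray(true_values, dtype=float)
    return np.mean((h_outputs - true_values) ** 2)

def error_count(h_labels, true_labels):
    # Total number of errors, for categorical outputs
    return sum(a != b for a, b in zip(h_labels, true_labels))

print(mse([1.1, 1.9, 3.2], [1.0, 2.0, 3.0]))  # 0.02
print(error_count(["a", "b"], ["a", "c"]))    # 1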
3.6.NOISE
Vectors in the training set may be corrupted by noise. There are two main kinds:
1.Class Noise:
It randomly alters the value of the function.
2. Attribute Noise:
It randomly alters the value of the components of the input vector.
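A sketch of injecting both kinds of noise into made-up data (the noise rates are arbitrary):

import numpy as np

rng = np.random.default_rng(42)
X = rng.normal(size=(100, 3))  # input vectors
y = (X[:, 0] > 0).astype(int)  # clean Boolean function values

# Class noise: randomly flip some function values (5% rate here)
flip = rng.random(len(y)) < 0.05
y_noisy = np.where(flip, 1 - y, y)

# Attribute noise: randomly perturb components of the input vectors
X_noisy = X + rng.normal(0, 0.1, size=X.shape)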
4.LEARNING REQUIRES BIAS
• Machine learning is a branch of Artificial Intelligence, which allows machines
to perform data analysis and make predictions.
• If the machine learning model is not accurate, it can make prediction errors,
and these prediction errors are usually known as bias and variance.
• The main aim of ML/data science analysts is to reduce these errors in order to
get more accurate results.
5.BOOLEAN FUNCTIONS
5.1.REPRESENTATION OF BOOLEAN ALGEBRA
Boolean algebra is a convenient notation for representing Boolean functions.
Boolean algebra uses
⮚ Conjunction, · ("and")
⮚ Inclusive disjunction, + ("or")
⮚ Complement (negation) of a variable
5.2.DIAGRAMMATIC REPRESENTATIONS
A Boolean function can be represented by labeling the vertices of a cube. For a
function of n variables, we would need an n-dimensional hypercube. The following truth
tables define AND, OR, and NOT.
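x1 x2 | AND | OR          x | NOT
 0  0 |  0  |  0          0 |  1
 0  1 |  0  |  1          1 |  0
 1  0 |  0  |  1
 1  1 |  1  |  1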
6.2.DNF FUNCTIONS
A Boolean function is said to be in disjunctive normal form (DNF) if it can be
written as a disjunction of terms, where each term is a conjunction of literals. For
example, f = x1·x2 + ¬x1·x3 is in DNF, but f = x1·(x2 + x3) is not (although it can
be put in DNF by distributing the conjunction).
A term t is an implicant of a function f if t has value 1 only where f has value 1;
a prime implicant is an implicant from which no literal can be deleted without it
ceasing to be an implicant. The relationship between implicants and prime implicants
can be illustrated geometrically using the cube representation for Boolean functions.
For example, for the function f illustrated in the following figure, each of the three
planes in the figure "cuts off" a group of vertices having value 1, but none cuts off
any vertices having value 0.
These planes are pictorial devices used to isolate certain lower dimensional
subfaces of the cube. Two of them isolate one-dimensional edges, and the third isolates
a zero-dimensional vertex. Each group of vertices on a subface corresponds to
one of the implicants of the function, f, and thus each implicant corresponds to a subface
of some dimension.
If we can express a function in DNF, we can use the consensus method to
find an expression for the function in which each term is a prime implicant. The
method rests on two identities:
• Consensus:
xi·f1 + ¬xi·f2 = xi·f1 + ¬xi·f2 + f1·f2
where f1 and f2 are terms; the added term f1·f2 is called the consensus of xi·f1 and ¬xi·f2.
• Subsumption:
xi·f1 + f1 = f1
where f1 is a term. We say that f1 subsumes xi·f1, and the subsumed term may be dropped.
Consider the example shown in the consensus tree of the figure, which derives a
set of prime implicants. The circled numbers adjoining the terms indicate the order in
which the consensus and subsumption operations were performed. Shaded boxes
surrounding a term indicate that it was subsumed. The final form of the function, in
which all terms are prime implicants, consists of all the non-subsumed terms in the
consensus tree.
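A sketch of the consensus method in Python, under the assumption that a term is represented as a set of literals, each literal a (variable, polarity) pair; the helper names are made up for illustration:

from itertools import combinations

# A term is a frozenset of literals; a literal is (variable, polarity).
# Example: x1·¬x2 -> frozenset({('x1', True), ('x2', False)})

def consensus(t1, t2):
    # Return the consensus of two terms, or None if it does not exist
    opposed = [v for (v, p) in t1 if (v, not p) in t2]
    if len(opposed) != 1:  # consensus needs exactly one opposed variable
        return None
    v = opposed[0]
    return frozenset(lit for lit in t1 | t2 if lit[0] != v)

def prime_implicants(terms):
    # Iterated consensus: add consensus terms, drop subsumed terms
    terms, changed = set(terms), True
    while changed:
        changed = False
        for t1, t2 in combinations(list(terms), 2):
            c = consensus(t1, t2)
            if c is not None and not any(t <= c for t in terms):
                terms.add(c)
                changed = True
        # Subsumption: xi·f1 + f1 = f1, so drop any term containing another
        terms = {t for t in terms if not any(s < t for s in terms)}
    return terms

# f = x1·x2 + ¬x1·x3 has prime implicants x1·x2, ¬x1·x3, and x2·x3
f = [frozenset({('x1', True), ('x2', True)}),
     frozenset({('x1', False), ('x3', True)})]
print(prime_implicants(f))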
6.3.CNF FUNCTIONS
A Boolean function is said to be in conjunctive normal form (CNF) if it can be
written as a conjunction of clauses, where each clause is a disjunction of literals. For
example, f = (x1 + ¬x2)·(x2 + x3) is in CNF.
6.4.DECISION LISTS
A decision list is written as an ordered list of pairs:
(t1, v1), (t2, v2), . . . , (T, vr)
where
● the vi are either 0 or 1, and the ti are terms in (x1, . . . , xn)
● T is a term whose value is 1 (regardless of the values of the xi)
● The value of a decision list is the value of vi for the first ti in the list that
has value 1. (At least one ti will have value 1, because the last one does;
vr can be regarded as a default value of the decision list.)
● The decision list is of size k if the size of the largest term in it is k. The
class of decision lists of size k or less is called k-DL.
An example decision list (illustrative) is (¬x1·x2, 1), (x3, 0), (1, 1), where the final
constant term 1 supplies the default value.
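A sketch of evaluating the illustrative decision list above, with terms represented as Python predicates (the constant-1 term ends the list):

decision_list = [
    (lambda x: (not x[0]) and x[1], 1),  # ¬x1·x2 -> 1
    (lambda x: bool(x[2]),          0),  # x3     -> 0
    (lambda x: True,                1),  # the term 1: default value
]

def evaluate(dl, x):
    # Value of the decision list: vi of the first ti whose value is 1
    for t, v in dl:
        if t(x):
            return v

print(evaluate(decision_list, (0, 1, 1)))  # 1 (first term fires)
print(evaluate(decision_list, (1, 0, 1)))  # 0 (second term fires)
print(evaluate(decision_list, (1, 0, 0)))  # 1 (default)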
As we continue the training process, the hypotheses that are not consistent are
ruled out and we get a less crowded version graph.
In the version graph thus obtained, some functions are maximally general
(excluding the constant function "1"); these are known as the general boundary set (gbs).
Other functions are maximally specific (excluding the constant function "0"); these
are known as the specific boundary set (sbs).
The boundary sets are crucial for any hypothesis space, as they provide an explicit
way to determine whether a function is part of the space or not.
This determination is possible because every function in the version space must be
more general than the sbs and more specific than the gbs.
7.2.LEARNING AS SEARCH OF A VERSION SPACE
Learning the solution space of a problem can be thought of as a search problem:
one can either take a top-down approach, applying specialization operators to a
general function until it is consistent with the dataset, or take a bottom-up approach,
in which a specific function is repeatedly transformed using generalization operators
until a consistent solution space is obtained.
Representations:
o The most specific hypothesis is represented using ϕ.
o The most general hypothesis is represented using ?.
Algorithm:
Step 1: Load the data set.
Step 2: Initialize the general hypothesis (G) and the specific hypothesis (S).
Step 3: For each training example:
Step 4: If the example is positive:
    if attribute_value == hypothesis_value:
        Do nothing
    else:
        replace the attribute value in S with '?' (generalizing it)
Step 5: If the example is negative:
    Make the general hypothesis more specific (a runnable sketch follows the
    EnjoySport example below).
The version space is a cross between a general and a specific hypothesis: rather than
committing to a single hypothesis, it maintains a list of all feasible hypotheses based
on the training data.
With regard to hypothesis space H and training examples D, the version space,
denoted VS_H,D, is the subset of hypotheses from H that are consistent with the
training instances in D.
For example, consider the classic EnjoySport dataset: four training examples over
the attributes Sky, AirTemp, Humidity, Wind, Water, and Forecast.
Algorithmic steps:
Initially : G = [?, ?, ?, ?, ?, ?]
S = [Null, Null, Null, Null, Null, Null]
Output (for the standard EnjoySport examples):
S = [Sunny, Warm, ?, Strong, ?, ?]
G = [[Sunny, ?, ?, ?, ?, ?], [?, Warm, ?, ?, ?, ?]]
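A simplified, runnable sketch of the algorithm on Mitchell's classic EnjoySport examples (it covers this conjunctive hypothesis space; a full CEA also prunes non-minimal or inconsistent specializations):

def candidate_elimination(examples):
    n = len(examples[0][0])
    S = ['Null'] * n  # most specific hypothesis
    G = [['?'] * n]   # most general hypothesis
    for x, label in examples:
        if label == 'Yes':  # positive example: generalize S
            for i in range(n):
                if S[i] == 'Null':
                    S[i] = x[i]
                elif S[i] != x[i]:
                    S[i] = '?'
            # drop members of G inconsistent with the positive example
            G = [g for g in G if all(g[i] in ('?', x[i]) for i in range(n))]
        else:  # negative example: specialize G against S
            G_new = []
            for g in G:
                for i in range(n):
                    if g[i] == '?' and S[i] not in ('?', 'Null') and S[i] != x[i]:
                        h = list(g)
                        h[i] = S[i]
                        G_new.append(h)
            G = G_new or G
    return S, G

data = [  # Mitchell's four EnjoySport training examples
    (('Sunny', 'Warm', 'Normal', 'Strong', 'Warm', 'Same'),   'Yes'),
    (('Sunny', 'Warm', 'High',   'Strong', 'Warm', 'Same'),   'Yes'),
    (('Rainy', 'Cold', 'High',   'Strong', 'Warm', 'Change'), 'No'),
    (('Sunny', 'Warm', 'High',   'Strong', 'Cool', 'Change'), 'Yes'),
]
S, G = candidate_elimination(data)
print('S =', S)  # ['Sunny', 'Warm', '?', 'Strong', '?', '?']
print('G =', G)  # [['Sunny','?','?','?','?','?'], ['?','Warm','?','?','?','?']]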
Disadvantages of CEA:
1. More complex: CEA is a more complex algorithm than Find-S, which may
make it more difficult for beginners or those without a strong background in
machine learning to use and understand.
2. Higher memory requirements: CEA requires more memory to store the sets of
hypotheses and boundaries, which may make it less suitable for memory-
constrained environments.
3. Slower processing: CEA can be slow on large datasets, since each training
example may require updating and pruning both boundary sets.
Candidate Elimination Algorithm in Machine Learning
The Candidate Elimination Algorithm is used to find the set of consistent
hypotheses, that is, the version space.
Solution (intermediate boundary sets for a (Size, Color, Shape) dataset):
S1: (0, 0, 0)
G2: (Small, Blue, ?), (Small, ?, Circle), (?, Blue, ?), (Big, ?, Triangle), (?, Blue, Triangle)
Final: S = G = (Small, ?, Circle), i.e., the version space has converged to a single hypothesis.