Machine Learning Lab Manual
Course Objectives:
1. To introduce students to the basic concepts and techniques of Machine Learning.
2. To develop skills in using recent machine learning software for solving practical problems.
Course Outcomes:
List of Experiments:
1. Implement and demonstrate the FIND-S algorithm for finding the most specific
hypothesis based on a given set of training data samples. Read the training data
from a .CSV file.
2. For a given set of training data examples stored in a .CSV file, implement and
demonstrate the Candidate-Elimination algorithm to output a description of the set
of all hypotheses consistent with the training examples.
3. Write a program to demonstrate the working of the decision tree based ID3 algorithm.
Use an appropriate data set for building the decision tree and apply this knowledge
to classify a new sample.
6. Write a program to implement the naïve Bayesian classifier for a sample training
data set stored as a .CSV file.
7. Assuming a set of documents that need to be classified, use the naïve Bayesian
Classifier model to perform this task. Built-in Java classes/API can be used to write
the program. Calculate the accuracy, precision, and recall for your data set.
8. Write a program to construct a Bayesian network considering medical data.
Use this model to demonstrate the diagnosis of heart patients using standard
Heart Disease Data Set. You can use Java/Python ML library classes/API.
10. Write a program to implement k-Nearest Neighbour algorithm to classify the iris data set. Print
both correct and wrong predictions.
11. Implement the non-parametric Locally Weighted Regression algorithm in order to fit data
points. Select appropriate data set for your experiment and draw graphs.
Exercise-1: Implement and demonstrate the FIND-S algorithm for finding the most
specific hypothesis based on a given set of training data samples. Read the training data
from a .CSV file.
Program Code:
import csv

with open('tennis.csv', 'r') as f:
    reader = csv.reader(f)
    your_list = list(reader)

# the most specific hypothesis: every attribute fully constrained ('0')
h = [['0', '0', '0', '0', '0', '0']]

for i in your_list:
    print(i)
    if i[-1] == "True":  # consider only positive examples
        j = 0
        for x in i:
            if x != "True":
                if x != h[0][j] and h[0][j] == '0':
                    h[0][j] = x   # first positive example: copy the attribute value
                elif x != h[0][j] and h[0][j] != '0':
                    h[0][j] = '?' # conflicting values: generalize this attribute
            j = j + 1

print("Most specific hypothesis is")
print(h)
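The program expects a tennis.csv file in the working directory. A minimal example, using the EnjoySport data that also appears in Exercise-2, with the target value True/False in the last column (illustrative contents; any file with the same layout works):

sunny,warm,normal,strong,warm,same,True
sunny,warm,high,strong,warm,same,True
rainy,cold,high,strong,warm,change,False
sunny,warm,high,strong,cool,change,True

On this file FIND-S should report [['sunny', 'warm', '?', 'strong', '?', '?']] as the most specific hypothesis.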
Output
Exercise-2: For a given set of training data examples stored in a .CSV file, implement
and demonstrate the Candidate-Elimination algorithm to output a description of the set
of all hypotheses consistent with the training examples.
Program Code:
class Holder:
    factors = {}     # maps each attribute to the tuple of values it can take
    attributes = ()  # attribute names

    '''
    Constructor of class Holder holding two parameters,
    self refers to the instance of the class
    '''
    def __init__(self, attr):
        self.attributes = attr
        for i in attr:
            self.factors[i] = []

    def add_values(self, factor, values):
        self.factors[factor] = values


class CandidateElimination:
    Positive = {}  # Initialize positive empty dictionary
    Negative = {}  # Initialize negative empty dictionary

    def __init__(self, data, fact):
        self.num_factors = len(data[0][0])
        self.factors = fact.factors
        self.attr = fact.attributes
        self.dataset = data

    def run_algorithm(self):
        '''
        Iterate over the dataset, keeping the specific boundary S and the
        general boundary G consistent with every trial_set seen so far
        '''
        G = self.initializeG()
        S = self.initializeS()
        for trial_set in self.dataset:
            if self.is_positive(trial_set):  # if the trial_set is a positive example
                G = self.remove_inconsistent_G(G, trial_set[0])  # remove inconsistent data from the general boundary
                S_new = S[:]  # copy of the specific boundary
                print(S_new)
                for s in S:
                    if not self.consistent(s, trial_set[0]):
                        S_new.remove(s)
                        generalization = self.generalize_inconsistent_S(s, trial_set[0])
                        generalization = self.get_general(generalization, G)
                        if generalization:
                            S_new.append(generalization)
                S = S_new[:]
                S = self.remove_more_general(S)
                print(S)
            else:  # if it is negative (reconstructed to mirror the positive case)
                S = self.remove_inconsistent_S(S, trial_set[0])  # remove inconsistent data from the specific boundary
                G_new = G[:]  # copy of the general boundary
                print(G_new)
                for g in G:
                    if self.consistent(g, trial_set[0]):
                        G_new.remove(g)
                        specializations = self.specialize_inconsistent_G(g, trial_set[0])
                        specializations = self.get_specific(specializations, S)
                        if specializations != []:
                            G_new += specializations
                G = G_new[:]
                G = self.remove_more_specific(G)
                print(G)
    def initializeS(self):
        ''' Initialize the specific boundary '''
        S = tuple(['-' for factor in range(self.num_factors)])  # 6 constraints in the vector
        return [S]

    def initializeG(self):
        ''' Initialize the general boundary '''
        G = tuple(['?' for factor in range(self.num_factors)])  # 6 constraints in the vector
        return [G]

    def is_positive(self, trial_set):
        ''' Check if a given training trial_set is positive '''
        if trial_set[1] == 'Y':
            return True
        elif trial_set[1] == 'N':
            return False
        else:
            raise TypeError("invalid target value")
    def match_factor(self, value1, value2):
        ''' Check whether two factor values match; needed while checking the
        consistency of a training trial_set with a hypothesis '''
        if value1 == '?' or value2 == '?':
            return True
        elif value1 == value2:
            return True
        return False

    def consistent(self, hypothesis, instance):
        ''' Check whether the instance is part of the hypothesis '''
        for i, factor in enumerate(hypothesis):
            if not self.match_factor(factor, instance[i]):
                return False
        return True
    def remove_inconsistent_G(self, hypotheses, instance):
        ''' For a positive trial_set, the hypotheses in G
        inconsistent with it should be removed '''
        G_new = hypotheses[:]
        for g in hypotheses:
            if not self.consistent(g, instance):
                G_new.remove(g)
        return G_new

    def remove_inconsistent_S(self, hypotheses, instance):
        ''' For a negative trial_set, the hypotheses in S
        consistent with it should be removed '''
        S_new = hypotheses[:]
        for s in hypotheses:
            if self.consistent(s, instance):
                S_new.remove(s)
        return S_new

    def remove_more_general(self, hypotheses):
        ''' After generalizing S for a positive trial_set, any hypothesis in S
        more general than another in S should be removed '''
        S_new = hypotheses[:]
        for old in hypotheses:
            for new in S_new:
                if old != new and self.more_general(new, old):
                    S_new.remove(new)
        return S_new

    def remove_more_specific(self, hypotheses):
        ''' After specializing G for a negative trial_set, any hypothesis in G
        more specific than another in G should be removed '''
        G_new = hypotheses[:]
        for old in hypotheses:
            for new in G_new:
                if old != new and self.more_specific(new, old):
                    G_new.remove(new)
        return G_new
    def generalize_inconsistent_S(self, hypothesis, instance):
        ''' When an inconsistent hypothesis for a positive trial_set is seen in the
        specific boundary S, it should be generalized to be consistent with the
        trial_set ... we will get one hypothesis '''
        hypo = list(hypothesis)  # convert tuple to list for mutability
        for i, factor in enumerate(hypo):
            if factor == '-':
                hypo[i] = instance[i]
            elif not self.match_factor(factor, instance[i]):
                hypo[i] = '?'
        generalization = tuple(hypo)  # convert list back to tuple for immutability
        return generalization

    def specialize_inconsistent_G(self, hypothesis, instance):
        ''' When an inconsistent hypothesis for a negative trial_set is seen in the
        general boundary G, it should be specialized to be consistent with the
        trial_set ... we will get a set of hypotheses '''
        specializations = []
        hypo = list(hypothesis)  # convert tuple to list for mutability
        for i, factor in enumerate(hypo):
            if factor == '?':
                values = self.factors[self.attr[i]]
                for j in values:
                    if instance[i] != j:
                        hyp = hypo[:]
                        hyp[i] = j
                        hyp = tuple(hyp)  # convert list back to tuple for immutability
                        specializations.append(hyp)
        return specializations
    def get_general(self, generalization, G):
        ''' Checks if there is a more general hypothesis in G for a
        generalization of an inconsistent hypothesis in S in case of a
        positive trial_set, and returns the valid generalization '''
        for g in G:
            if self.more_general(g, generalization):
                return generalization
        return None

    def get_specific(self, specializations, S):
        ''' Checks if there is a more specific hypothesis in S for each
        hypothesis in the specializations of an inconsistent hypothesis in G
        in case of a negative trial_set, and returns the valid specializations '''
        valid_specializations = []
        for hypo in specializations:
            for s in S:
                if self.more_specific(s, hypo) or s == self.initializeS()[0]:
                    valid_specializations.append(hypo)
        return valid_specializations

    def exists_general(self, hypothesis, G):
        ''' Check if there exists a more general hypothesis in the
        general boundary of the version space '''
        for g in G:
            if self.more_general(g, hypothesis):
                return True
        return False

    def exists_specific(self, hypothesis, S):
        ''' Check if there exists a more specific hypothesis in the
        specific boundary of the version space '''
        for s in S:
            if self.more_specific(s, hypothesis):
                return True
        return False
    def more_general(self, hyp1, hyp2):
        ''' Check whether hyp1 is more general than hyp2 '''
        for i, j in zip(hyp1, hyp2):
            if i == '?':
                continue
            elif j == '?':
                return False
            elif i != j:
                return False
        return True

    def more_specific(self, hyp1, hyp2):
        ''' Check whether hyp1 is more specific than hyp2 '''
        return self.more_general(hyp2, hyp1)
dataset = [
    (('sunny', 'warm', 'normal', 'strong', 'warm', 'same'), 'Y'),
    (('sunny', 'warm', 'high', 'strong', 'warm', 'same'), 'Y'),
    (('rainy', 'cold', 'high', 'strong', 'warm', 'change'), 'N'),
    (('sunny', 'warm', 'high', 'strong', 'cool', 'change'), 'Y')
]

attributes = ('Sky', 'Temp', 'Humidity', 'Wind', 'Water', 'Forecast')
f = Holder(attributes)
f.add_values('Sky', ('sunny', 'rainy', 'cloudy'))  # Sky can be sunny, rainy or cloudy
f.add_values('Temp', ('cold', 'warm'))             # Temp can be cold or warm
f.add_values('Humidity', ('normal', 'high'))       # Humidity can be normal or high
f.add_values('Wind', ('weak', 'strong'))           # Wind can be weak or strong
f.add_values('Water', ('warm', 'cold'))            # Water can be warm or cold
f.add_values('Forecast', ('same', 'change'))       # Forecast can be same or change

a = CandidateElimination(dataset, f)  # pass the dataset and attribute values to the algorithm class
a.run_algorithm()                     # run the Candidate-Elimination algorithm
Output
[('sunny', 'warm', 'normal', 'strong', 'warm', 'same')]
[('sunny', 'warm', 'normal', 'strong', 'warm','same')]
[('sunny', 'warm', '?', 'strong', 'warm', 'same')]
[('?', '?', '?', '?', '?', '?')]
[('sunny', '?', '?', '?', '?', '?'), ('?', 'warm', '?', '?', '?', '?'), ('?', '?', '?', '?', '?', 'same')]
[('sunny', 'warm', '?', 'strong', 'warm', 'same')]
[('sunny', 'warm', '?', 'strong', '?', '?')]
[('sunny', 'warm', '?', 'strong', '?', '?')]
[('sunny', '?', '?', '?', '?', '?'), ('?', 'warm', '?', '?', '?', '?')]
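Reading the trace: the earlier lines show the specific boundary S being generalized as the positive examples arrive, the middle lines show the general boundary G being specialized by the negative example, and the last two lines give the learned version space, bounded by S = [('sunny', 'warm', '?', 'strong', '?', '?')] and G = [('sunny', '?', '?', '?', '?', '?'), ('?', 'warm', '?', '?', '?', '?')]. Every hypothesis between these boundaries is consistent with all four training examples.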
Exercise-3: Write a program to demonstrate the working of the decision tree based
ID3 algorithm. Use an appropriate data set for building the decision tree and apply this
knowledge to classify a new sample.
Program Code:
import numpy as np
import math
from data_loader import read_data

class Node:
    def __init__(self, attribute):
        self.attribute = attribute
        self.children = []
        self.answer = ""

def subtables(data, col, delete):
    ''' Split the data into one sub-table per distinct value of the given column '''
    dic = {}
    items = np.unique(data[:, col])
    for x in range(items.shape[0]):
        count = np.sum(data[:, col] == items[x])
        dic[items[x]] = np.empty((int(count), data.shape[1]), dtype="|S32")
        pos = 0
        for y in range(data.shape[0]):
            if data[y, col] == items[x]:
                dic[items[x]][pos] = data[y]
                pos += 1
        if delete:
            dic[items[x]] = np.delete(dic[items[x]], col, 1)
    return items, dic

def entropy(S):
    ''' Entropy(S) = -sum over classes of p * log2(p) '''
    items = np.unique(S)
    if items.size == 1:  # pure node
        return 0
    sums = 0
    for x in range(items.shape[0]):
        ratio = np.sum(S == items[x]) / (S.size * 1.0)
        sums += -ratio * math.log(ratio, 2)
    return sums

def gain_ratio(data, col):
    ''' Information gain of the column divided by its intrinsic value '''
    items, dic = subtables(data, col, delete=False)
    total_size = data.shape[0]
    entropies = np.zeros((items.shape[0], 1))
    intrinsic = np.zeros((items.shape[0], 1))
    for x in range(items.shape[0]):
        ratio = dic[items[x]].shape[0] / (total_size * 1.0)
        entropies[x] = ratio * entropy(dic[items[x]][:, -1])
        intrinsic[x] = ratio * math.log(ratio, 2)
    total_entropy = entropy(data[:, -1]) - np.sum(entropies)
    iv = -1 * np.sum(intrinsic)
    return total_entropy / iv

def create_node(data, metadata):
    if (np.unique(data[:, -1])).shape[0] == 1:  # all examples share one class
        node = Node("")
        node.answer = np.unique(data[:, -1])[0]
        return node
    gains = np.zeros((data.shape[1] - 1, 1))
    for col in range(data.shape[1] - 1):
        gains[col] = gain_ratio(data, col)
    split = np.argmax(gains)  # attribute with the highest gain ratio
    node = Node(metadata[split])
    metadata = np.delete(metadata, split, 0)
    items, dic = subtables(data, split, delete=True)
    for x in range(items.shape[0]):
        child = create_node(dic[items[x]], metadata)
        node.children.append((items[x], child))
    return node

def empty(size):
    ''' Indentation string used when printing the tree '''
    s = ""
    for x in range(size):
        s += "   "
    return s
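The listing defines empty() for indentation but omits the driver that builds the tree and prints the output shown below; a minimal sketch, assuming the training file is saved as tennis.csv and read with the read_data helper from Data_loader.py:

def print_tree(node, level):
    # a leaf stores its class label in node.answer
    if node.answer != "":
        print(empty(level) + str(node.answer))
        return
    print(empty(level) + str(node.attribute))
    for value, child in node.children:
        print(empty(level + 1) + str(value))
        print_tree(child, level + 2)

metadata, traindata = read_data("tennis.csv")
data = np.array(traindata)
node = create_node(data, metadata)
print_tree(node, 0)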
Data_loader.py
import csv

def read_data(filename):
    with open(filename, 'r') as csvfile:
        datareader = csv.reader(csvfile, delimiter=',')
        headers = next(datareader)
        metadata = []
        traindata = []
        for name in headers:
            metadata.append(name)
        for row in datareader:
            traindata.append(row)
    return (metadata, traindata)
Tennis.csv
outlook,temperature,humidity,wind,answer
sunny,hot,high,weak,no
sunny,hot,high,strong,no
overcast,hot,high,weak,yes
rain,mild,high,weak,yes
rain,cool,normal,weak,yes
rain,cool,normal,strong,no
overcast,cool,normal,strong,yes
sunny,mild,high,weak,no
sunny,cool,normal,weak,yes
rain,mild,normal,weak,yes
sunny,mild,normal,strong,yes
overcast,mild,high,strong,yes
overcast,hot,normal,weak,yes
rain,mild,high,strong,no
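As a quick sanity check on the entropy function: this data set has 9 yes and 5 no examples, so the entropy at the root is -(9/14)log2(9/14) - (5/14)log2(5/14) ≈ 0.940 bits.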
Output
outlook
overcast
b'yes'
rain
wind
b'strong'
b'no'
b'weak'
b'yes'
sunny
humidity
b'high'
b'no'
b'normal'
b'yes'
Exercise-4: Build an Artificial Neural Network by implementing the Backpropagation
algorithm and test the same using appropriate data sets.
Program Code:
import numpy as np
#Sigmoid Function
def sigmoid (x):
return 1/(1 + np.exp(-x))
#Derivative of Sigmoid Function (x is assumed to already be a sigmoid activation)
def derivatives_sigmoid(x):
    return x * (1 - x)
#Training data from the table below: (sleep, study) hours and expected exam score
X = np.array(([2, 9], [1, 5], [3, 6]), dtype=float)
y = np.array(([92], [86], [89]), dtype=float)
X = X / np.amax(X, axis=0)  # normalize each input feature to [0, 1]
y = y / 100                 # scale the target to [0, 1]
#Variable initialization
epoch=5 #Setting training iterations
lr=0.1 #Setting learning rate
inputlayer_neurons = 2 #number of features in data set
hiddenlayer_neurons = 3 #number of neurons in the hidden layer
output_neurons = 1 #number of neurons at output layer
#weight and bias initialization
wh=np.random.uniform(size=(inputlayer_neurons,hiddenlayer_neurons))
bh=np.random.uniform(size=(1,hiddenlayer_neurons))
wout=np.random.uniform(size=(hiddenlayer_neurons,output_neurons))
bout=np.random.uniform(size=(1,output_neurons))
for i in range(epoch):
    print("-----------Epoch-", i + 1, "Starts----------")
    #Forward Propagation
    hinp = np.dot(X, wh) + bh
    hlayer_act = sigmoid(hinp)           # hidden layer activations
    outinp = np.dot(hlayer_act, wout) + bout
    output = sigmoid(outinp)             # predicted output
    #Backpropagation
    EO = y - output
    outgrad = derivatives_sigmoid(output)
    d_output = EO * outgrad
    EH = d_output.dot(wout.T)
    hiddengrad = derivatives_sigmoid(hlayer_act) #how much hidden layer wts contributed to error
    d_hiddenlayer = EH * hiddengrad
    #Weight updates
    wout += hlayer_act.T.dot(d_output) * lr
    wh += X.T.dot(d_hiddenlayer) * lr
    print("Input:\n" + str(X))
    print("Actual Output:\n" + str(y))
    print("Predicted Output:\n", output)
    print("-----------Epoch-", i + 1, "Ends----------\n")
Training data (the inputs are normalized before training):

Example   Sleep   Study   Expected % in Exams
1         2       9       92
2         1       5       86
3         3       6       89
Output:
-----------Epoch- 1 Starts----------
Input:
[[0.66666667 1. ]
[0.33333333 0.55555556]
[1. 0.66666667]]
Actual Output:
[[0.92]
[0.86]
[0.89]]
Predicted Output:
[[0.81951208]
[0.8007242 ]
[0.82485744]]
-----------Epoch- 1 Ends----------
Exercise-6: Write a program to implement the naïve Bayesian classifier for a sample
training data set stored as a .CSV file.
Program Code:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

if __name__ == "__main__":
    # Change the file path to the location of your CSV file
    file_path = "your_dataset.csv"
    data = pd.read_csv(file_path)
    # the last column is assumed to hold the class label, the rest numeric features
    X, y = data.iloc[:, :-1], data.iloc[:, -1]
    Xtrain, Xtest, ytrain, ytest = train_test_split(X, y, test_size=0.2, random_state=42)
    classifier = GaussianNB().fit(Xtrain, ytrain)  # Gaussian naive Bayes (assumed)
    accuracy = accuracy_score(ytest, classifier.predict(Xtest))
    print(f"Accuracy: {accuracy}")
Output:
Accuracy: 0.85
Exercise-7: Assuming a set of documents that need to be classified, use the naïve
Bayesian Classifier model to perform this task. Built-in Java classes/API can be used to
write the program. Calculate the accuracy, precision, and recall for your data set.
Program Code:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn import metrics
msg = pd.read_csv('naivetext.csv', names=['message', 'label'])
msg['labelnum'] = msg.label.map({'pos': 1, 'neg': 0})  # assuming 'pos'/'neg' labels in the file
X = msg.message
y = msg.labelnum
Xtrain, Xtest, ytrain, ytest = train_test_split(X, y)

# turn each document into a bag-of-words count vector
count_vect = CountVectorizer()
Xtrain_dtm = count_vect.fit_transform(Xtrain)
Xtest_dtm = count_vect.transform(Xtest)

clf = MultinomialNB().fit(Xtrain_dtm, ytrain)
predicted = clf.predict(Xtest_dtm)

print('Accuracy:', metrics.accuracy_score(ytest, predicted))
print('Precision:', metrics.precision_score(ytest, predicted))
print('Recall:', metrics.recall_score(ytest, predicted))
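The program expects a naivetext.csv with one document per row and its label in the second column; a small illustrative example (hypothetical contents; the labels must be 'pos' or 'neg' for the mapping above):

I love this sandwich,pos
This is an amazing place,pos
I am tired of this stuff,neg
I can't deal with this,neg
He is my sworn enemy,neg
This is an awesome view,pos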
data = pd.read_csv("ds4.csv")
heart_disease = pd.DataFrame(data)
print(heart_disease)
model = BayesianModel([
    ('age', 'Lifestyle'),
    ('Gender', 'Lifestyle'),
    ('Family', 'heartdisease'),
    ('Lifestyle', 'diet'),
    ('diet', 'cholestrol'),
    ('cholestrol', 'heartdisease')
])
model.fit(heart_disease, estimator=MaximumLikelihoodEstimator)
HeartDisease_infer = VariableElimination(model)
q = HeartDisease_infer.query(variables=['heartdisease'], evidence={
'age': int(input('Enter Age: ')),
'Gender': int(input('Enter Gender: ')),
'Family': int(input('Enter Family History: ')),
'diet': int(input('Enter Diet: ')),
'Lifestyle': int(input('Enter Lifestyle: ')),
'cholestrol': int(input('Enter Cholestrol: '))
})
print(q)
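Note that the query reads every evidence value as an integer code (for example, Gender entered as 0 or 1), so the values typed at the prompts must follow the same numeric encoding used in ds4.csv.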
Output:
Exercise-10: Write a program to implement k-Nearest Neighbour algorithm to classify
the iris data set. Print both correct and wrong predictions.
Program Code:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn import metrics

# load the iris data (assuming an iris.csv without a header row)
names = ['sepal-length', 'sepal-width', 'petal-length', 'petal-width', 'Class']
dataset = pd.read_csv("iris.csv", names=names)
X = dataset.iloc[:, :-1]
y = dataset.iloc[:, -1]
print(X.head())

Xtrain, Xtest, ytrain, ytest = train_test_split(X, y, test_size=0.10)
classifier = KNeighborsClassifier(n_neighbors=5).fit(Xtrain, ytrain)  # k = 5 (assumed)
ypred = classifier.predict(Xtest)
i=0
print ("\n-------------------------------------------------------------------------")
print ('%-25s %-25s %-25s' % ('Original Label', 'Predicted Label', 'Correct/Wrong'))
print ("-------------------------------------------------------------------------")
for label in ytest:
print ('%-25s %-25s' % (label, ypred[i]), end="")
if (label == ypred[i]):
print (' %-25s' % ('Correct'))
else:
print (' %-25s' % ('Wrong'))
i=i+1
print ("-------------------------------------------------------------------------")
print("\nConfusion Matrix:\n",metrics.confusion_matrix(ytest, ypred))
print ("-------------------------------------------------------------------------")
print("\nClassification Report:\n",metrics.classification_report(ytest, ypred))
print ("-------------------------------------------------------------------------")
print('Accuracy of the classifier is %0.2f' % metrics.accuracy_score(ytest, ypred))
print ("-------------------------------------------------------------------------")
Output
sepal-length sepal-width petal-length petal-width
0 5.1 3.5 1.4 0.2
1 4.9 3.0 1.4 0.2
2 4.7 3.2 1.3 0.2
3 4.6 3.1 1.5 0.2
4 5.0 3.6 1.4 0.2
-------------------------------------------------------------------------
Original Label Predicted Label Correct/Wrong
-------------------------------------------------------------------------
Iris-versicolor Iris-versicolor Correct
Iris-virginica Iris-versicolor Wrong
Iris-virginica Iris-virginica Correct
Iris-versicolor Iris-versicolor Correct
Iris-setosa Iris-setosa Correct
Iris-versicolor Iris-versicolor Correct
Iris-setosa Iris-setosa Correct
Iris-setosa Iris-setosa Correct
Iris-virginica Iris-virginica Correct
Iris-virginica Iris-versicolor Wrong
Iris-virginica Iris-virginica Correct
Iris-setosa Iris-setosa Correct
Iris-virginica Iris-virginica Correct
Iris-virginica Iris-virginica Correct
Iris-versicolor Iris-versicolor Correct
-------------------------------------------------------------------------
Confusion Matrix:
[[4 0 0]
[0 4 0]
[0 2 5]]
-------------------------------------------------------------------------
Classification Report:
precision recall f1-score support
-------------------------------------------------------------------------
Accuracy of the classifier is 0.87
-------------------------------------------------------------------------
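Reading the output: the confusion matrix rows correspond to the actual classes (4 setosa, 4 versicolor and 7 virginica samples in the test split); the only errors are the 2 virginica samples predicted as versicolor, so 13 of the 15 predictions are correct, which matches the reported accuracy of 0.87.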