
Course Code: P21XXXXX
Course Name: Machine Learning Lab
Course Structure (L-T-P-C): 0-0-3-1.5

Internal Marks: 15    External Marks: 35

Course Objectives:

1. To introduce students to the basic concepts and techniques of Machine Learning.

2. To develop skills in using recent machine learning software for solving practical
problems.

3. To gain experience in doing independent study and research.

Course Outcomes:

1. Design Java/Python programs for various learning algorithms.

2. Apply appropriate data sets to the machine learning algorithms.

3. Identify and apply machine learning algorithms to solve real-world problems.

List of Experiments:

1. Implement and demonstrate the FIND-S algorithm for finding the most specific
hypothesis based on a given set of training data samples. Read the training data
from a .CSV file.

2. For a given set of training data examples stored in a .CSV file, implement and
demonstrate the Candidate-Elimination algorithm to output a description of the set
of all hypotheses consistent with the training examples.

3. Write a program to demonstrate the working of the decision tree based ID3 algorithm.
Use an appropriate data set for building the decision tree and apply this knowledge
to classify a new sample.

4. Build a prediction model to perform logistic regression.

5. Build an Artificial Neural Network by implementing the Backpropagation
algorithm and test the same using appropriate data sets.

6. Write a program to implement the naïve Bayesian classifier for a sample training
data set stored as a .CSV file.

7. Assuming a set of documents that need to be classified, use the naïve Bayesian
Classifier model to perform this task. Built-in Java classes/API can be used to write
the program. Calculate the accuracy, precision, and recall for your data set.
8. Write a program to construct a Bayesian network considering medical data.
Use this model to demonstrate the diagnosis of heart patients using the standard
Heart Disease Data Set. You can use Java/Python ML library classes/API.

9. Perform clustering using the k-means clustering algorithm.

10. Write a program to implement k-Nearest Neighbour algorithm to classify the iris data set. Print
both correct and wrong predictions.

11. Implement the non-parametric Locally Weighted Regression algorithm in order to fit data
points. Select an appropriate data set for your experiment and draw graphs.

Exercise-1: Implement and demonstrate the FIND-S algorithm for finding the most
specific hypothesis based on a given set of training data samples. Read the training data
from a .CSV file.
Program Code:
import csv

# Read the training examples from the CSV file
with open('tennis.csv', 'r') as f:
    reader = csv.reader(f)
    your_list = list(reader)

# Start with the most specific hypothesis (every attribute maximally constrained)
h = [['0', '0', '0', '0', '0', '0']]

for i in your_list:
    print(i)
    if i[-1] == "True":              # use only the positive examples
        j = 0
        for x in i:
            if x != "True":          # skip the target column
                if x != h[0][j] and h[0][j] == '0':
                    h[0][j] = x      # first positive example: adopt its value
                elif x != h[0][j] and h[0][j] != '0':
                    h[0][j] = '?'    # conflicting values: generalize
                else:
                    pass
            j = j + 1

print("Most specific hypothesis is")
print(h)

Output

'Sunny', 'Warm', 'Normal', 'Strong', 'Warm', 'Same',True


'Sunny', 'Warm', 'High', 'Strong', 'Warm', 'Same',True
'Rainy', 'Cold', 'High', 'Strong', 'Warm', 'Change',False
'Sunny', 'Warm', 'High', 'Strong', 'Cool','Change',True

Most specific hypothesis is


[['Sunny', 'Warm', '?', 'Strong', '?', '?']]
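
For reference, the tennis.csv file assumed by this program would contain one training example per row, with the target value in the last column (reconstructed here from the examples printed above):

Sunny,Warm,Normal,Strong,Warm,Same,True
Sunny,Warm,High,Strong,Warm,Same,True
Rainy,Cold,High,Strong,Warm,Change,False
Sunny,Warm,High,Strong,Cool,Change,True
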
Exercise-2: For a given set of training data examples stored in a .CSV file, implement
and demonstrate the Candidate-Elimination algorithm to output a description of the set
of all hypotheses consistent with the training examples.

Program Code:

class Holder:
    factors = {}        # dictionary mapping each attribute to its possible values
    attributes = ()     # tuple of attribute names

    '''
    Constructor of class Holder holding two parameters, self refers to the instance of the class
    '''
    def __init__(self, attr):
        self.attributes = attr
        for i in attr:
            self.factors[i] = []

    def add_values(self, factor, values):
        self.factors[factor] = values


class CandidateElimination:
    Positive = {}   # positive examples
    Negative = {}   # negative examples

    def __init__(self, data, fact):
        self.num_factors = len(data[0][0])
        self.factors = fact.factors
        self.attr = fact.attributes
        self.dataset = data

    def run_algorithm(self):
        '''
        Initialize the specific and general boundaries, and loop the dataset against the algorithm
        '''
        G = self.initializeG()
        S = self.initializeS()

        count = 0
        for trial_set in self.dataset:
            if self.is_positive(trial_set):
                # positive example: remove inconsistent hypotheses from the general boundary
                G = self.remove_inconsistent_G(G, trial_set[0])
                S_new = S[:]    # working copy of the specific boundary
                print(S_new)
                for s in S:
                    if not self.consistent(s, trial_set[0]):
                        S_new.remove(s)
                        generalization = self.generalize_inconsistent_S(s, trial_set[0])
                        generalization = self.get_general(generalization, G)
                        if generalization:
                            S_new.append(generalization)
                S = S_new[:]
                S = self.remove_more_general(S)
                print(S)
            else:
                # negative example: remove inconsistent hypotheses from the specific boundary
                S = self.remove_inconsistent_S(S, trial_set[0])
                G_new = G[:]    # working copy of the general boundary
                print(G_new)
                for g in G:
                    if self.consistent(g, trial_set[0]):
                        G_new.remove(g)
                        specializations = self.specialize_inconsistent_G(g, trial_set[0])
                        specializations = self.get_specific(specializations, S)
                        if specializations != []:
                            G_new += specializations
                G = G_new[:]
                G = self.remove_more_specific(G)
                print(G)

        print(S)
        print(G)

    def initializeS(self):
        ''' Initialize the specific boundary '''
        S = tuple(['-' for factor in range(self.num_factors)])  # one constraint per attribute
        return [S]

    def initializeG(self):
        ''' Initialize the general boundary '''
        G = tuple(['?' for factor in range(self.num_factors)])  # one constraint per attribute
        return [G]

    def is_positive(self, trial_set):
        ''' Check if a given training trial_set is positive '''
        if trial_set[1] == 'Y':
            return True
        elif trial_set[1] == 'N':
            return False
        else:
            raise TypeError("invalid target value")

    def match_factor(self, value1, value2):
        ''' Check whether two attribute values match; needed while checking the
        consistency of a training trial_set with a hypothesis '''
        if value1 == '?' or value2 == '?':
            return True
        elif value1 == value2:
            return True
        return False

    def consistent(self, hypothesis, instance):
        ''' Check whether the instance is part of the hypothesis '''
        for i, factor in enumerate(hypothesis):
            if not self.match_factor(factor, instance[i]):
                return False
        return True

    def remove_inconsistent_G(self, hypotheses, instance):
        ''' For a positive trial_set, the hypotheses in G inconsistent with it should be removed '''
        G_new = hypotheses[:]
        for g in hypotheses:
            if not self.consistent(g, instance):
                G_new.remove(g)
        return G_new

    def remove_inconsistent_S(self, hypotheses, instance):
        ''' For a negative trial_set, the hypotheses in S inconsistent with it should be removed '''
        S_new = hypotheses[:]
        for s in hypotheses:
            if self.consistent(s, instance):
                S_new.remove(s)
        return S_new

    def remove_more_general(self, hypotheses):
        ''' After generalizing S for a positive trial_set, any hypothesis in S
        more general than another hypothesis in S should be removed '''
        S_new = hypotheses[:]
        for old in hypotheses:
            for new in S_new:
                if old != new and self.more_general(new, old):
                    S_new.remove(new)
        return S_new

    def remove_more_specific(self, hypotheses):
        ''' After specializing G for a negative trial_set, any hypothesis in G
        more specific than another hypothesis in G should be removed '''
        G_new = hypotheses[:]
        for old in hypotheses:
            for new in G_new:
                if old != new and self.more_specific(new, old):
                    G_new.remove(new)
        return G_new

    def generalize_inconsistent_S(self, hypothesis, instance):
        ''' When an inconsistent hypothesis for a positive trial_set is seen in the
        specific boundary S, it is generalized to be consistent with the trial_set;
        this yields one hypothesis '''
        hypo = list(hypothesis)  # convert tuple to list for mutability
        for i, factor in enumerate(hypo):
            if factor == '-':
                hypo[i] = instance[i]
            elif not self.match_factor(factor, instance[i]):
                hypo[i] = '?'
        generalization = tuple(hypo)  # convert list back to tuple for immutability
        return generalization

    def specialize_inconsistent_G(self, hypothesis, instance):
        ''' When an inconsistent hypothesis for a negative trial_set is seen in the
        general boundary G, it is specialized to be consistent with the trial_set;
        this yields a set of hypotheses '''
        specializations = []
        hypo = list(hypothesis)  # convert tuple to list for mutability
        for i, factor in enumerate(hypo):
            if factor == '?':
                values = self.factors[self.attr[i]]
                for j in values:
                    if instance[i] != j:
                        hyp = hypo[:]
                        hyp[i] = j
                        hyp = tuple(hyp)  # convert list back to tuple for immutability
                        specializations.append(hyp)
        return specializations

    def get_general(self, generalization, G):
        ''' Checks if there is a more general hypothesis in G for a generalization of an
        inconsistent hypothesis in S (positive trial_set) and returns the valid generalization '''
        for g in G:
            if self.more_general(g, generalization):
                return generalization
        return None

    def get_specific(self, specializations, S):
        ''' Checks if there is a more specific hypothesis in S for each hypothesis in the
        specializations of an inconsistent hypothesis in G (negative trial_set) and
        returns the valid specializations '''
        valid_specializations = []
        for hypo in specializations:
            for s in S:
                if self.more_specific(s, hypo) or s == self.initializeS()[0]:
                    valid_specializations.append(hypo)
        return valid_specializations

    def exists_general(self, hypothesis, G):
        ''' Check whether there exists a more general hypothesis in the general boundary of the version space '''
        for g in G:
            if self.more_general(g, hypothesis):
                return True
        return False

    def exists_specific(self, hypothesis, S):
        ''' Check whether there exists a more specific hypothesis in the specific boundary of the version space '''
        for s in S:
            if self.more_specific(s, hypothesis):
                return True
        return False

    def more_general(self, hyp1, hyp2):
        ''' Check whether hyp1 is more general than hyp2 '''
        hyp = zip(hyp1, hyp2)
        for i, j in hyp:
            if i == '?':
                continue
            elif j == '?':
                if i != '?':
                    return False
            elif i != j:
                return False
            else:
                continue
        return True

    def more_specific(self, hyp1, hyp2):
        ''' hyp1 being more specific than hyp2 is equivalent to hyp2 being more general than hyp1 '''
        return self.more_general(hyp2, hyp1)


dataset = [(('sunny','warm','normal','strong','warm','same'),'Y'),
           (('sunny','warm','high','strong','warm','same'),'Y'),
           (('rainy','cold','high','strong','warm','change'),'N'),
           (('sunny','warm','high','strong','cool','change'),'Y')]
attributes = ('Sky','Temp','Humidity','Wind','Water','Forecast')

f = Holder(attributes)
f.add_values('Sky',('sunny','rainy','cloudy'))     # Sky can be sunny, rainy or cloudy
f.add_values('Temp',('cold','warm'))               # Temp can be cold or warm
f.add_values('Humidity',('normal','high'))         # Humidity can be normal or high
f.add_values('Wind',('weak','strong'))             # Wind can be weak or strong
f.add_values('Water',('warm','cold'))              # Water can be warm or cold
f.add_values('Forecast',('same','change'))         # Forecast can be same or change

a = CandidateElimination(dataset, f)  # pass the dataset and attribute values to the algorithm class
a.run_algorithm()                     # and call the run_algorithm method

Output
[('sunny', 'warm', 'normal', 'strong', 'warm', 'same')]
[('sunny', 'warm', 'normal', 'strong', 'warm','same')]
[('sunny', 'warm', '?', 'strong', 'warm', 'same')]
[('?', '?', '?', '?', '?', '?')]
[('sunny', '?', '?', '?', '?', '?'), ('?', 'warm', '?', '?', '?', '?'), ('?', '?', '?', '?', '?', 'same')]
[('sunny', 'warm', '?', 'strong', 'warm', 'same')]
[('sunny', 'warm', '?', 'strong', '?', '?')]
[('sunny', 'warm', '?', 'strong', '?', '?')]
[('sunny', '?', '?', '?', '?', '?'), ('?', 'warm', '?', '?', '?', '?')]
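
The program above hard-codes the training examples, while the exercise statement asks for the data to be read from a .CSV file. A minimal loader could look like the sketch below; it assumes a hypothetical file named trainingdata.csv whose first six columns are the attribute values and whose last column is the Y/N target:

import csv

def load_dataset(path):
    # Each row: Sky,Temp,Humidity,Wind,Water,Forecast,Target (Y or N)
    dataset = []
    with open(path, 'r') as f:
        for row in csv.reader(f):
            if not row:
                continue                       # skip blank lines
            dataset.append((tuple(row[:-1]), row[-1]))
    return dataset

# dataset = load_dataset('trainingdata.csv')   # then proceed with Holder/CandidateElimination as above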

Exercise-3: Write a program to demonstrate the working of the decision tree based
ID3 algorithm. Use an appropriate data set for building the decision tree and apply this
knowledge to classify a new sample.
Program Code:
import numpy as np
import math
from data_loader import read_data

class Node:
    def __init__(self, attribute):
        self.attribute = attribute
        self.children = []
        self.answer = ""

    def __str__(self):
        return self.attribute

def subtables(data, col, delete):
    # Split the data into one sub-table per distinct value of the given column
    dict = {}
    items = np.unique(data[:, col])
    count = np.zeros((items.shape[0], 1), dtype=np.int32)

    for x in range(items.shape[0]):
        for y in range(data.shape[0]):
            if data[y, col] == items[x]:
                count[x] += 1

    for x in range(items.shape[0]):
        dict[items[x]] = np.empty((int(count[x]), data.shape[1]), dtype="|S32")
        pos = 0
        for y in range(data.shape[0]):
            if data[y, col] == items[x]:
                dict[items[x]][pos] = data[y]
                pos += 1
        if delete:
            dict[items[x]] = np.delete(dict[items[x]], col, 1)

    return items, dict

def entropy(S):
    items = np.unique(S)
    if items.size == 1:
        return 0

    counts = np.zeros((items.shape[0], 1))
    sums = 0

    for x in range(items.shape[0]):
        counts[x] = sum(S == items[x]) / (S.size * 1.0)

    for count in counts:
        sums += -1 * count * math.log(count, 2)
    return sums

def gain_ratio(data, col):
    items, dict = subtables(data, col, delete=False)

    total_size = data.shape[0]
    entropies = np.zeros((items.shape[0], 1))
    intrinsic = np.zeros((items.shape[0], 1))

    for x in range(items.shape[0]):
        ratio = dict[items[x]].shape[0] / (total_size * 1.0)
        entropies[x] = ratio * entropy(dict[items[x]][:, -1])
        intrinsic[x] = ratio * math.log(ratio, 2)

    total_entropy = entropy(data[:, -1])
    iv = -1 * sum(intrinsic)

    for x in range(entropies.shape[0]):
        total_entropy -= entropies[x]

    return total_entropy / iv

def create_node(data, metadata):
    # If all examples have the same class, return a leaf node with that answer
    if (np.unique(data[:, -1])).shape[0] == 1:
        node = Node("")
        node.answer = np.unique(data[:, -1])[0]
        return node

    # Otherwise split on the attribute with the highest gain ratio
    gains = np.zeros((data.shape[1] - 1, 1))
    for col in range(data.shape[1] - 1):
        gains[col] = gain_ratio(data, col)

    split = np.argmax(gains)

    node = Node(metadata[split])
    metadata = np.delete(metadata, split, 0)

    items, dict = subtables(data, split, delete=True)

    for x in range(items.shape[0]):
        child = create_node(dict[items[x]], metadata)
        node.children.append((items[x], child))

    return node

def empty(size):
    s = ""
    for x in range(size):
        s += " "
    return s

def print_tree(node, level):
    if node.answer != "":
        print(empty(level), node.answer)
        return
    print(empty(level), node.attribute)
    for value, n in node.children:
        print(empty(level + 1), value)
        print_tree(n, level + 2)

metadata, traindata = read_data("tennis.csv")
data = np.array(traindata)
node = create_node(data, metadata)
print_tree(node, 0)

data_loader.py

import csv

def read_data(filename):
    with open(filename, 'r') as csvfile:
        datareader = csv.reader(csvfile, delimiter=',')
        headers = next(datareader)
        metadata = []
        traindata = []
        for name in headers:
            metadata.append(name)
        for row in datareader:
            traindata.append(row)
    return (metadata, traindata)

tennis.csv

outlook,temperature,humidity,wind,answer
sunny,hot,high,weak,no
sunny,hot,high,strong,no
overcast,hot,high,weak,yes
rain,mild,high,weak,yes
rain,cool,normal,weak,yes
rain,cool,normal,strong,no
overcast,cool,normal,strong,yes
sunny,mild,high,weak,no
sunny,cool,normal,weak,yes
rain,mild,normal,weak,yes
sunny,mild,normal,strong,yes
overcast,mild,high,strong,yes
overcast,hot,normal,weak,yes
rain,mild,high,strong,no
Output
outlook
overcast
b'yes'
rain
wind
b'strong'
b'no'
b'weak'
b'yes'
sunny
humidity
b'high'
b'no'
b'normal'
b'yes'
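
As an optional cross-check (not part of the exercise), the same play-tennis data can be fed to scikit-learn's decision tree with entropy as the split criterion. A minimal sketch, assuming the tennis.csv file shown above:

import pandas as pd
from sklearn.preprocessing import LabelEncoder
from sklearn.tree import DecisionTreeClassifier, export_text

df = pd.read_csv("tennis.csv")

# Encode each categorical column as integers
encoders = {col: LabelEncoder().fit(df[col]) for col in df.columns}
encoded = df.apply(lambda col: encoders[col.name].transform(col))

X = encoded.drop(columns=["answer"])
y = encoded["answer"]

clf = DecisionTreeClassifier(criterion="entropy").fit(X, y)
print(export_text(clf, feature_names=list(X.columns)))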

Exercise-4: Build a prediction model to perform logistic regression.


Program Code:

# Importing the required modules and classes


from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Loading our dataset


data = load_iris()
# Splitting the independent and dependent variables
X = data.data
Y = data.target
print("The size of the complete dataset is: ", len(X))

# Creating an instance of LogisticRegression class for implementing logistic regression


log_reg = LogisticRegression()

# Segregating the training and testing dataset


X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.3, random_state = 10)
# Performing the logistic regression on train dataset
log_reg.fit(X_train, Y_train)

# Printing the accuracy score


print("Accuracy score of the predictions made by the model: ", accuracy_score(log_reg.predict(X_te
st), Y_test))
Output

The size of the complete dataset is: 150
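
As an optional extension (a sketch that reuses the log_reg, X_test and Y_test objects defined in the program above), the per-class behaviour of the model can also be inspected:

from sklearn.metrics import classification_report, confusion_matrix

Y_pred = log_reg.predict(X_test)
print("Confusion matrix:\n", confusion_matrix(Y_test, Y_pred))
print("Classification report:\n", classification_report(Y_test, Y_pred))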

Exercise-5: Build an Artificial Neural Network by implementing the Backpropagation
algorithm and test the same using appropriate data sets.
Program Code:

import numpy as np

X = np.array(([2, 9], [1, 5], [3, 6]), dtype=float)


y = np.array(([92], [86], [89]), dtype=float)
X = X/np.amax(X,axis=0) #maximum of X array longitudinally
y = y/100

#Sigmoid Function
def sigmoid (x):
return 1/(1 + np.exp(-x))
#Derivative of Sigmoid Function
def derivatives_sigmoid(x):
return x * (1 - x)
#Variable initialization
epoch=5 #Setting training iterations
lr=0.1 #Setting learning rate
inputlayer_neurons = 2 #number of features in data set
hiddenlayer_neurons = 3 #number of hidden layers neurons
output_neurons = 1 #number of neurons at output layer
#weight and bias initialization

wh=np.random.uniform(size=(inputlayer_neurons,hiddenlayer_neurons))
bh=np.random.uniform(size=(1,hiddenlayer_neurons))
wout=np.random.uniform(size=(hiddenlayer_neurons,output_neurons))
bout=np.random.uniform(size=(1,output_neurons))

#draws a random range of numbers uniformly of dim x*y


for i in range(epoch):
    # Forward Propagation
    hinp1 = np.dot(X, wh)
    hinp = hinp1 + bh
    hlayer_act = sigmoid(hinp)
    outinp1 = np.dot(hlayer_act, wout)
    outinp = outinp1 + bout
    output = sigmoid(outinp)

    # Backpropagation
    EO = y - output
    outgrad = derivatives_sigmoid(output)
    d_output = EO * outgrad
    EH = d_output.dot(wout.T)
    hiddengrad = derivatives_sigmoid(hlayer_act)  # how much the hidden layer weights contributed to the error
    d_hiddenlayer = EH * hiddengrad

    wout += hlayer_act.T.dot(d_output) * lr       # dot product of next-layer error and current-layer output
    wh += X.T.dot(d_hiddenlayer) * lr

    print("-----------Epoch-", i+1, "Starts----------")
    print("Input: \n" + str(X))
    print("Actual Output: \n" + str(y))
    print("Predicted Output: \n", output)
    print("-----------Epoch-", i+1, "Ends----------\n")

print("Input: \n" + str(X))


print("Actual Output: \n" + str(y))
print("Predicted Output: \n" ,output)
Training Examples:

Example   Sleep   Study   Expected % in Exams
1         2       9       92
2         1       5       86
3         3       6       89

Normalize the input:

Example   Sleep               Study              Expected % in Exams
1         2/3 = 0.66666667    9/9 = 1            0.92
2         1/3 = 0.33333333    5/9 = 0.55555556   0.86
3         3/3 = 1             6/9 = 0.66666667   0.89

Output:
-----------Epoch- 1 Starts----------
Input:
[[0.66666667 1. ]
[0.33333333 0.55555556]
[1. 0.66666667]]
Actual Output:
[[0.92]
[0.86]
[0.89]]
Predicted Output:
[[0.81951208]
[0.8007242 ]
[0.82485744]]
-----------Epoch- 1 Ends----------
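
One detail worth noting in the program above: derivatives_sigmoid is applied to the layer activations (output and hlayer_act), because it computes s*(1 - s) where s is already a sigmoid value. A small self-contained sketch confirming this against a numerical derivative:

import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def derivatives_sigmoid(a):
    # expects a = sigmoid(x), i.e. the activation, not the raw input
    return a * (1 - a)

z = np.array([-2.0, 0.0, 1.5])
eps = 1e-6
numerical = (sigmoid(z + eps) - sigmoid(z - eps)) / (2 * eps)   # central difference of sigmoid
analytic = derivatives_sigmoid(sigmoid(z))                      # s * (1 - s) with s = sigmoid(z)
print(np.allclose(numerical, analytic))                         # True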

Exercise-6: Write a program to implement the naïve Bayesian classifier for a sample
training data set stored as a .CSV file.
Program Code:

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

# Load the dataset


def load_data(file_path):
df = pd.read_csv(file_path)
return df

# Preprocess the data


def preprocess_data(data):
# Assuming the last column is the target variable and the rest are features
X = data.iloc[:, :-1]
y = data.iloc[:, -1]
return X, y

# Split the dataset into training and testing sets


def split_data(X, y, test_size=0.2, random_state=42):
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=test_size, random_state=random_state)
return X_train, X_test, y_train, y_test

# Train the naive Bayes classifier


def train_naive_bayes(X_train, y_train):
model = GaussianNB()
model.fit(X_train, y_train)
return model

# Make predictions on the test set


def make_predictions(model, X_test):
predictions = model.predict(X_test)
return predictions
# Evaluate the accuracy of the model
def evaluate_accuracy(y_true, y_pred):
accuracy = accuracy_score(y_true, y_pred)
return accuracy

if __name__ == "__main__":
# Change the file path to the location of your CSV file
file_path = "your_dataset.csv"

# Load and preprocess the data


data = load_data(file_path)
X, y = preprocess_data(data)

# Split the data into training and testing sets


X_train, X_test, y_train, y_test = split_data(X, y)

# Train the naive Bayes classifier


model = train_naive_bayes(X_train, y_train)

# Make predictions on the test set


predictions = make_predictions(model, X_test)

# Evaluate the accuracy of the model


accuracy = evaluate_accuracy(y_test, predictions)

print(f"Accuracy: {accuracy}")

Output:

Accuracy: 0.85
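
GaussianNB expects numeric feature columns, with the class label in the last column (as assumed by preprocess_data). Purely as an illustration, a hypothetical your_dataset.csv could look like the following (these column names and values are made up, not part of the manual):

feature1,feature2,feature3,label
5.1,3.5,1.4,0
4.9,3.0,1.3,0
6.3,3.3,6.0,1
5.8,2.7,5.1,1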

Exercise-7: Assuming a set of documents that need to be classified, use the naïve
Bayesian Classifier model to perform this task. Built-in Java classes/API can be used to
write the program. Calculate the accuracy, precision, and recall for your data set.
Program Code:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn import metrics

msg=pd.read_csv('naivetext.csv',names=['message','label'])

print('The dimensions of the dataset',msg.shape)


msg['labelnum']=msg.label.map({'pos':1,'neg':0})
X=msg.message
y=msg.labelnum

#splitting the dataset into train and test data


xtrain,xtest,ytrain,ytest=train_test_split(X,y)
print ('\n the total number of Training Data :',ytrain.shape)
print ('\n the total number of Test Data :',ytest.shape)

#output the words or Tokens in the text documents


cv = CountVectorizer()
xtrain_dtm = cv.fit_transform(xtrain)
xtest_dtm=cv.transform(xtest)
print('\n The words or Tokens in the text documents \n')
print(cv.get_feature_names())
df=pd.DataFrame(xtrain_dtm.toarray(),columns=cv.get_feature_names())

# Training Naive Bayes (NB) classifier on training data.


clf = MultinomialNB().fit(xtrain_dtm,ytrain)
predicted = clf.predict(xtest_dtm)

#printing accuracy, Confusion matrix, Precision and Recall


print('\n Accuracy of the classifier is',metrics.accuracy_score(ytest,predicted))
print('\n Confusion matrix')
print(metrics.confusion_matrix(ytest,predicted))
print('\n The value of Precision', metrics.precision_score(ytest,predicted))
print('\n The value of Recall', metrics.recall_score(ytest,predicted))
Output:

The dimensions of the dataset (18, 2)


1. I love this sandwich
2. This is an amazing place
3. I feel very good about these beers
4. This is my best work
5. What an awesome view
6. I do not like this restaurant
7. I am tired of this stuff
8. I can’t deal with this
9. He is my sworn enemy
10. My boss is horrible
11. This is an awesome place
12. I do not like the taste of this juice
13. I love to dance
14. I am sick and tired of this place
15. What a great holiday
16. That is a bad locality to stay
17. We will have good fun tomorrow
18. I went to my enemy’s house today
Name: message, dtype: object
0 1
1 1
2 1
3 1
4 1
5 0
6 0
7 0
8 0
9 0
10 1
11 0
12 1
13 0
14 1
15 0
16 1
17 0
Name: labelnum, dtype: int64
The total number of Training Data: (13,)
The total number of Test Data: (5,)
The words or Tokens in the text documents
[‘about’, ‘am’, ‘amazing’, ‘an’, ‘and’, ‘awesome’, ‘beers’, ‘best’, ‘can’, ‘deal’, ‘do’, ‘enemy’, ‘feel’, ‘fun’,
‘good’, ‘great’, ‘have’, ‘he’, ‘holiday’, ‘house’, ‘is’, ‘like’, ‘love’, ‘my’, ‘not’, ‘of’, ‘place’,‘restaurant’,
‘sandwich’, ‘sick’, ‘sworn’, ‘these’, ‘this’, ‘tired’, ‘to’, ‘today’, ‘tomorrow’, ‘very’, ‘view’, ‘we’, ‘went’,
‘what’, ‘will’, ‘with’, ‘work’]
Accuracy of the classifier is 0.8
Confusion matrix
[[2 1]
[0 2]]
The value of Precision 0.6666666666666666
The value of Recall 1.0
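
A note on the inputs and the scikit-learn API: naivetext.csv is assumed to hold one document per row followed by its pos/neg label, for example (reconstructed from the messages and labels printed above):

I love this sandwich,pos
This is an amazing place,pos
I do not like this restaurant,neg
My boss is horrible,neg

In scikit-learn 1.0 and later, CountVectorizer.get_feature_names() was replaced by get_feature_names_out(); if the calls above raise an AttributeError, substitute the newer name:

print(cv.get_feature_names_out())
df = pd.DataFrame(xtrain_dtm.toarray(), columns=cv.get_feature_names_out())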

Exercise-8: Write a Python program to construct a Bayesian network considering
medical data. Use this model to demonstrate the diagnosis of heart patients using the
standard Heart Disease Data Set.
Program Code:
import pandas as pd
from pgmpy.estimators import MaximumLikelihoodEstimator
from pgmpy.models import BayesianModel
from pgmpy.inference import VariableElimination

data = pd.read_csv("ds4.csv")
heart_disease = pd.DataFrame(data)
print(heart_disease)

model = BayesianModel([
    ('age', 'Lifestyle'),
    ('Gender', 'Lifestyle'),
    ('Family', 'heartdisease'),
    ('diet', 'cholestrol'),
    ('Lifestyle', 'diet'),
    ('cholestrol', 'heartdisease')
])

model.fit(heart_disease, estimator=MaximumLikelihoodEstimator)

HeartDisease_infer = VariableElimination(model)

print('For Age enter SuperSeniorCitizen:0, SeniorCitizen:1, MiddleAged:2, Youth:3, Teen:4')


print('For Gender enter Male:0, Female:1')
print('For Family History enter Yes:1, No:0')
print('For Diet enter High:0, Medium:1')
print('for LifeStyle enter Athlete:0, Active:1, Moderate:2, Sedentary:3')
print('for Cholesterol enter High:0, BorderLine:1, Normal:2')

q = HeartDisease_infer.query(variables=['heartdisease'], evidence={
'age': int(input('Enter Age: ')),
'Gender': int(input('Enter Gender: ')),
'Family': int(input('Enter Family History: ')),
'diet': int(input('Enter Diet: ')),
'Lifestyle': int(input('Enter Lifestyle: ')),
'cholestrol': int(input('Enter Cholestrol: '))
})

print(q)

Output:

S.no Age Gender Family Diet Lifestyle Cholestrol Heartdisease


0 0 0 1 1 3 0 1
1 0 1 1 1 3 0 1
2 1 0 0 0 2 1 1
3 4 0 1 1 3 2 0
4 3 1 1 0 0 2 0
5 2 0 1 1 1 0 1
6 4 0 1 0 2 0 1
7 0 0 1 1 3 0 1
8 3 1 1 0 0 2 0
9 1 1 0 0 0 2 1
10 4 1 0 1 2 0 1
11 4 0 1 1 3 2 0
12 2 1 0 0 0 0 0
13 2 0 1 1 1 0 1
14 3 1 1 0 0 1 0
15 0 0 1 0 0 2 1
16 1 1 0 1 2 1 1
17 3 1 1 1 0 1 0
18 4 0 1 1 3 2 0

For Age enter SuperSeniorCitizen:0, SeniorCitizen:1, MiddleAged:2, Youth:3, Teen:4


For Gender enter Male:0, Female:1
For Family History enter Yes:1, No:0
For Diet enter High:0, Medium:1
For Lifestyle enter Athlete:0, Active:1, Moderate:2, Sedentary:3
For Cholesterol enter High:0, BorderLine:1, Normal:2
Enter Age: 0
Enter Gender: 0
Enter Family History: 0
Enter Diet: 0
Enter Lifestyle: 3
Enter Cholestrol: 0
+-----------------+---------------------+
| heartdisease | phi(heartdisease) |
+=================+=====================+
| heartdisease(0) | 0.5000 |
+-----------------+---------------------+
| heartdisease(1) | 0.5000 |
+-----------------+---------------------+
Finding Elimination Order: : : 0it [00:00, ?it/s]
0it [00:00, ?it/s]
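
A version note (not part of the manual): in recent pgmpy releases the BayesianModel class has been renamed to BayesianNetwork, so the import above may fail on newer installations. A defensive import such as the following sketch keeps the rest of the program unchanged:

try:
    from pgmpy.models import BayesianNetwork as BayesianModel   # newer pgmpy versions
except ImportError:
    from pgmpy.models import BayesianModel                      # older pgmpy versions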

Exercise-9: Perform clustering using the k-means clustering algorithm.
Program Code:


# Import necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Generate synthetic data for demonstration


data, _ = make_blobs(n_samples=300, centers=4, random_state=42)

# Instantiate the KMeans model with the desired number of clusters


kmeans = KMeans(n_clusters=4)

# Fit the model to the data


kmeans.fit(data)

# Get cluster centers and labels


centers = kmeans.cluster_centers_
labels = kmeans.labels_

# Plot the data points and cluster centers


plt.scatter(data[:, 0], data[:, 1], c=labels, cmap='viridis', edgecolor='k')
plt.scatter(centers[:, 0], centers[:, 1], c='red', marker='X', s=200, label='Cluster Centers')
plt.title('K-Means Clustering')
plt.legend()
plt.show()

Output:

[Scatter plot of the 300 synthetic data points coloured by cluster assignment, with the four cluster centers marked as red "X" markers; plot title: "K-Means Clustering".]


Exercise-10: Write a program to implement the k-Nearest Neighbor algorithm to classify
the iris data set. Print both correct and wrong predictions.
Program Code:
import numpy as np
import pandas as pd
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
from sklearn import metrics

names = ['sepal-length', 'sepal-width', 'petal-length', 'petal-width', 'Class']

# Read dataset to pandas dataframe


dataset = pd.read_csv("9-dataset.csv", names=names)
X = dataset.iloc[:, :-1]
y = dataset.iloc[:, -1]
print(X.head())
Xtrain, Xtest, ytrain, ytest = train_test_split(X, y, test_size=0.10)

classifier = KNeighborsClassifier(n_neighbors=5).fit(Xtrain, ytrain)

ypred = classifier.predict(Xtest)

i=0
print ("\n-------------------------------------------------------------------------")
print ('%-25s %-25s %-25s' % ('Original Label', 'Predicted Label', 'Correct/Wrong'))
print ("-------------------------------------------------------------------------")
for label in ytest:
print ('%-25s %-25s' % (label, ypred[i]), end="")
if (label == ypred[i]):
print (' %-25s' % ('Correct'))
else:
print (' %-25s' % ('Wrong'))
i=i+1
print ("-------------------------------------------------------------------------")
print("\nConfusion Matrix:\n",metrics.confusion_matrix(ytest, ypred))
print ("-------------------------------------------------------------------------")
print("\nClassification Report:\n",metrics.classification_report(ytest, ypred))
print ("-------------------------------------------------------------------------")
print('Accuracy of the classifier is %0.2f' % metrics.accuracy_score(ytest,ypred))
print ("-------------------------------------------------------------------------")

Output
sepal-length sepal-width petal-length petal-width
0 5.1 3.5 1.4 0.2
1 4.9 3.0 1.4 0.2
2 4.7 3.2 1.3 0.2
3 4.6 3.1 1.5 0.2
4 5.0 3.6 1.4 0.2

-------------------------------------------------------------------------
Original Label Predicted Label Correct/Wrong
-------------------------------------------------------------------------
Iris-versicolor Iris-versicolor Correct
Iris-virginica Iris-versicolor Wrong
Iris-virginica Iris-virginica Correct
Iris-versicolor Iris-versicolor Correct
Iris-setosa Iris-setosa Correct
Iris-versicolor Iris-versicolor Correct
Iris-setosa Iris-setosa Correct
Iris-setosa Iris-setosa Correct
Iris-virginica Iris-virginica Correct
Iris-virginica Iris-versicolor Wrong
Iris-virginica Iris-virginica Correct
Iris-setosa Iris-setosa Correct
Iris-virginica Iris-virginica Correct
Iris-virginica Iris-virginica Correct
Iris-versicolor Iris-versicolor Correct
-------------------------------------------------------------------------

Confusion Matrix:
[[4 0 0]
[0 4 0]
[0 2 5]]
-------------------------------------------------------------------------

Classification Report:
precision recall f1-score support

Iris-setosa 1.00 1.00 1.00 4


Iris-versicolor 0.67 1.00 0.80 4
Iris-virginica 1.00 0.71 0.83 7

avg / total 0.91 0.87 0.87 15

-------------------------------------------------------------------------
Accuracy of the classifier is 0.87
-------------------------------------------------------------------------
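
The program above reads the iris data from a local 9-dataset.csv file. If that file is not at hand, the same data ships with scikit-learn; a minimal sketch using load_iris with the same classifier and split (class labels will appear as 0/1/2 rather than species names):

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn import metrics

iris = load_iris()
Xtrain, Xtest, ytrain, ytest = train_test_split(iris.data, iris.target, test_size=0.10)

classifier = KNeighborsClassifier(n_neighbors=5).fit(Xtrain, ytrain)
ypred = classifier.predict(Xtest)

print("Accuracy:", metrics.accuracy_score(ytest, ypred))
print("Confusion matrix:\n", metrics.confusion_matrix(ytest, ypred))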

Exercise-11: Implement the non-parametric Locally Weighted Regression algorithm in
order to fit data points. Select an appropriate data set for your experiment and draw
graphs.
Program Code:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

df = pd.read_csv('/content/tips.csv')
features = np.array(df.total_bill)
labels = np.array(df.tip)

def kernel(data, point, xmat, k):
    # Diagonal weight matrix: points close to 'point' get weight near 1,
    # distant points get weight near 0 (Gaussian kernel with bandwidth k)
    m, n = np.shape(xmat)
    ws = np.mat(np.eye((m)))
    for j in range(m):
        diff = point - data[j]
        ws[j, j] = np.exp(diff * diff.T / (-2.0 * k**2))
    return ws

def local_weight(data, point, xmat, ymat, k):
    # Solve the locally weighted least-squares problem for one query point
    wei = kernel(data, point, xmat, k)
    return (data.T * (wei * data)).I * (data.T * (wei * ymat.T))

def local_weight_regression(xmat, ymat, k):
    # Fit a separate locally weighted model at every training point
    m, n = np.shape(xmat)
    ypred = np.zeros(m)
    for i in range(m):
        ypred[i] = xmat[i] * local_weight(xmat, xmat[i], xmat, ymat, k)
    return ypred

m = features.shape[0]
mtip = np.mat(labels)
data = np.hstack((np.ones((m, 1)), np.mat(features).T))   # add a bias column of ones
ypred = local_weight_regression(data, mtip, 0.5)           # k = 0.5 is the kernel bandwidth

indices = data[:, 1].argsort(0)
xsort = data[indices][:, 0]

fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)
ax.scatter(features, labels, color='blue')
ax.plot(xsort[:, 1], ypred[indices], color='red', linewidth=3)
plt.xlabel('Total bill')
plt.ylabel('Tip')
plt.show()

Output:

[Scatter plot of total bill versus tip with the locally weighted regression curve drawn in red.]
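
The smoothness of the fitted curve is controlled by the kernel bandwidth k (0.5 in the program above): smaller values follow the data more closely, larger values give a smoother, more global fit. A sketch comparing a few bandwidths, assuming the functions and the data, mtip, features and labels objects defined in the program above:

import matplotlib.pyplot as plt
import numpy as np

for k, colour in [(0.2, 'green'), (0.5, 'red'), (2.0, 'orange')]:
    ypred_k = local_weight_regression(data, mtip, k)
    order = np.array(data[:, 1]).flatten().argsort()          # sort by total bill for plotting
    plt.plot(np.array(features)[order], ypred_k[order], color=colour, label='k = %s' % k)

plt.scatter(features, labels, color='blue', s=10)
plt.xlabel('Total bill')
plt.ylabel('Tip')
plt.legend()
plt.show()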
