MACHINE LEARNING (3170724)
7th Semester
LABORATORY MANUAL
SARDAR PATEL COLLEGE OF ENGINEERING
BAKROL, ANAND
CERTIFICATE

This is to certify that Mr./Ms. ____________________ of Enrollment No. ____________
has satisfactorily completed his/her term work in the subject ____________________
for the term ending in 20___/20___.

Date:
Practical-1
Code-
import pandas as pd
file=pd.read_csv("F:\programmes\python_programmes\ML_Practicals\
Tennis.csv")
print(file)
Output-
Practical-2
Aim-Implement and demonstrate the FIND-S algorithm for finding the most
specific hypothesis based on a given set of training data samples. Read the
training data from a .CSV file. Create the file Weather.csv (e.g., in Excel) and save it in the same path.
Code-
import pandas as pd
def find_s(positive_examples):
    """
    Find-S algorithm implementation.

    Parameters:
    - positive_examples: List of positive examples where each example is a list of attribute values.

    Returns:
    - Most specific hypothesis that covers all positive examples.
    """
    # Initialize the hypothesis to the first positive example
    hypothesis = positive_examples[0].copy()
    # Generalize the hypothesis attribute by attribute over the remaining positive examples
    for example in positive_examples[1:]:
        for i in range(len(hypothesis)):
            if hypothesis[i] != example[i]:
                hypothesis[i] = '?'
    return hypothesis
def load_data_from_csv(filename):
    """
    Load positive examples from a CSV file.

    Parameters:
    - filename: Path to the CSV file.

    Returns:
    - List of positive examples.
    """
    # Read the CSV file into a DataFrame
    df = pd.read_csv(filename)
    # Keep only the positive rows (assumes the last column holds 'Yes'/'No' labels)
    # and drop that label column before returning the attribute values
    positive = df[df.iloc[:, -1].astype(str).str.strip().str.lower() == 'yes']
    positive_examples = positive.iloc[:, :-1].values.tolist()
    return positive_examples
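A minimal usage sketch (assuming Weather.csv from the aim sits next to the script and its last column holds Yes/No labels):

positive_examples = load_data_from_csv("Weather.csv")
print("Most specific hypothesis:", find_s(positive_examples))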
Output:
Practical-3
Aim- For a given set of training data examples stored in a .CSV file, implement and
demonstrate the Candidate-Elimination algorithm to output a description of
the set of all hypotheses consistent with the training examples. Create the file Training_examples.csv (e.g., in Excel) and save it in the same path.
Code-
import pandas as pd
# Candidate-Elimination Algorithm
def candidate_elimination(examples):
    X = examples.iloc[:, :-1].values  # Extract features (all columns except the last one)
    y = examples.iloc[:, -1].values   # Extract target class (last column)
    # Initialize the most specific hypothesis (S) and the most general hypothesis (G)
    specific_hypothesis = X[0].copy()
    general_hypothesis = [['?' for _ in range(len(specific_hypothesis))]]
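    # --- Hedged continuation: the printed listing truncates here. This sketch assumes the
    # --- last CSV column holds 'Yes'/'No' labels, as in the classic Enjoy-Sport data.
    specific_hypothesis = list(specific_hypothesis)
    n_attrs = len(specific_hypothesis)
    for i, example in enumerate(X):
        if str(y[i]).strip().lower() == 'yes':
            # Positive example: generalize S where it disagrees, drop inconsistent hypotheses from G
            for j in range(n_attrs):
                if specific_hypothesis[j] != example[j]:
                    specific_hypothesis[j] = '?'
            general_hypothesis = [g for g in general_hypothesis
                                  if all(g[j] in ('?', example[j]) for j in range(n_attrs))]
        else:
            # Negative example: minimally specialize G so it excludes this example
            new_general = []
            for g in general_hypothesis:
                for j in range(n_attrs):
                    if g[j] == '?' and specific_hypothesis[j] not in ('?', example[j]):
                        candidate = list(g)
                        candidate[j] = specific_hypothesis[j]
                        new_general.append(candidate)
            general_hypothesis = new_general
    return specific_hypothesis, general_hypothesis

# Example usage (Training_examples.csv from the aim, saved alongside the script)
examples = pd.read_csv("Training_examples.csv")
S, G = candidate_elimination(examples)
print("Final specific hypothesis:", S)
print("Final general hypotheses:", G)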
Output:
Practical-4
Code-
import matplotlib.pyplot as plt
x= [5,7,8,7,2,17,2,9,4,11,12,9,6]
y= [99,86,87,111,86,103,87,94,78,85,86,77,76]
plt.scatter(x,y)
plt.show()
Output:
Code-
import matplotlib.pyplot as plt
from scipy import stats
x= [5,7,8,7,2,17,2,9,4,11,12,9,6]
y= [99,86,87,111,86,103,87,94,78,85,86,77,76]
slope, intercept, r, p, std_err = stats.linregress(x, y)
def myfunc(x):
    return slope * x + intercept
mymodel = list(map(myfunc, x))
plt.scatter(x, y)
plt.plot(x, mymodel)
plt.show()
Output:
Practical-5
(a) Binary
Code-
import numpy
from sklearn import linear_model
# Illustrative data (the printed listing omits x and y; these values are only an example:
# x is a single feature, y the 0/1 class label)
x = numpy.array([3.78, 2.44, 2.09, 0.14, 1.72, 1.65, 4.92, 4.37, 4.96, 4.52, 3.69, 5.88]).reshape(-1, 1)
y = numpy.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1])
logr = linear_model.LogisticRegression()
logr.fit(x, y)
Output:
(B) Multinomial
Code-
from numpy import mean
from numpy import std
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import RepeatedStratifiedKFold
from sklearn.linear_model import LogisticRegression
# define dataset
X, y = make_classification(n_samples=1000, n_features=10, n_informative=5, n_redundant=5,
n_classes=3, random_state=1)
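The printed listing stops after the dataset; a minimal sketch of the remaining steps (multinomial model, repeated stratified cross-validation, mean accuracy) might look like this:

# define the multinomial logistic regression model
model = LogisticRegression(multi_class='multinomial', solver='lbfgs')
# define the model evaluation procedure
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)
# evaluate the model and report the mean and standard deviation of accuracy
n_scores = cross_val_score(model, X, y, scoring='accuracy', cv=cv, n_jobs=-1)
print('Mean Accuracy: %.3f (%.3f)' % (mean(n_scores), std(n_scores)))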
Output:
(C) Ordinal
Code-
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import OrdinalEncoder
# Example data
x = np.array([5, 10, 15, 20, 25, 30, 35, 40, 45, 50]).reshape(-1, 1)
y = np.array([1, 1, 1, 2, 2, 2, 3, 3, 3, 3]) # 1: Low, 2: Medium, 3: High
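The listing ends at the data; one hedged way to finish it (treating the ordinal labels with a plain LogisticRegression as a stand-in, since scikit-learn has no built-in ordinal regression) is:

# Fit a logistic regression on the ordinal labels (a simple stand-in for true ordinal regression)
model = LogisticRegression()
model.fit(x, y)
# Predict the ordinal category for a few new values
x_new = np.array([12, 28, 48]).reshape(-1, 1)
print("Predicted categories:", model.predict(x_new))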
Output:
Practical-6
Code-
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import make_pipeline
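# --- Hedged reconstruction: the printed listing omits the data and the fitting step
# --- that the prediction code below relies on. The data here is illustrative only.
np.random.seed(0)
X = np.sort(5 * np.random.rand(40, 1), axis=0)
y = 0.5 * X.ravel() ** 2 - X.ravel() + 2 + np.random.randn(40) * 0.5

# Polynomial feature expansion followed by ordinary linear regression
poly_features = PolynomialFeatures(degree=2, include_bias=False)
X_poly = poly_features.fit_transform(X)
model = LinearRegression()
model.fit(X_poly, y)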
# Make predictions
X_fit = np.linspace(0, 5, 100).reshape(-1, 1)
X_fit_poly = poly_features.transform(X_fit)
y_pred = model.predict(X_fit_poly)
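# --- Hedged sketch of a plot of the data and the fitted polynomial curve (not in the printout)
plt.scatter(X, y, color='blue', label='Data')
plt.plot(X_fit, y_pred, color='red', label='Polynomial fit (degree 2)')
plt.legend()
plt.show()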
Output:
Practical-7
Aim-Write a program to demonstrate the working of the decision tree based ID3
algorithm. Use an appropriate data set for building the decision tree and apply
this knowledge to classify a new sample.
Code-
import numpy as np
import math
from Data_loader import read_data
class Node:
    def __init__(self, attribute):
        self.attribute = attribute
        self.children = []
        self.answer = ""
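# --- Hedged reconstruction of the helper whose header is missing from the printout:
# --- subtables(data, col, delete) groups the rows of `data` by the values in column `col`.
def subtables(data, col, delete):
    dict = {}
    items = np.unique(data[:, col])
    count = np.zeros((items.shape[0], 1), dtype=np.int32)
    for x in range(items.shape[0]):
        for y in range(data.shape[0]):
            if data[y, col] == items[x]:
                count[x] += 1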
    for x in range(items.shape[0]):
        dict[items[x]] = np.empty((int(count[x]), data.shape[1]), dtype="|S32")
        pos = 0
        for y in range(data.shape[0]):
            if data[y, col] == items[x]:
                dict[items[x]][pos] = data[y]
                pos += 1
        if delete:
            dict[items[x]] = np.delete(dict[items[x]], col, 1)
    return items, dict
def entropy(S):
    items = np.unique(S)
    if items.size == 1:
        return 0
    counts = np.zeros((items.shape[0], 1))
    sums = 0
    for x in range(items.shape[0]):
        counts[x] = sum(S == items[x]) / (S.size * 1.0)
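    # --- Hedged completion of entropy() and the start of gain_ratio(); the printout
    # --- is truncated between these two loops.
    for x in range(items.shape[0]):
        sums += -1 * counts[x] * math.log(counts[x], 2)
    return sums

def gain_ratio(data, col):
    items, dict = subtables(data, col, delete=False)
    total_size = data.shape[0]
    entropies = np.zeros((items.shape[0], 1))
    intrinsic = np.zeros((items.shape[0], 1))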
    for x in range(items.shape[0]):
        ratio = dict[items[x]].shape[0] / (total_size * 1.0)
        entropies[x] = ratio * entropy(dict[items[x]][:, -1])
        intrinsic[x] = ratio * math.log(ratio, 2)
    total_entropy = entropy(data[:, -1])
    iv = -1 * sum(intrinsic)
    for x in range(entropies.shape[0]):
        total_entropy -= entropies[x]
    return total_entropy / iv
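# --- Hedged reconstruction of the start of create_node(), missing from the printout:
# --- stop at a leaf when all rows share one answer, otherwise pick the best attribute to split on.
def create_node(data, metadata):
    if (np.unique(data[:, -1])).shape[0] == 1:
        node = Node("")
        node.answer = np.unique(data[:, -1])[0]
        return node
    gains = np.zeros((data.shape[1] - 1, 1))
    for col in range(data.shape[1] - 1):
        gains[col] = gain_ratio(data, col)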
    split = np.argmax(gains)
    node = Node(metadata[split])
    metadata = np.delete(metadata, split, 0)
    items, dict = subtables(data, split, delete=True)
    for x in range(items.shape[0]):
        child = create_node(dict[items[x]], metadata)
        node.children.append((items[x], child))
    return node
def empty(size):
    s = ""
    for x in range(size):
        s += " "
    return s
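# --- Hedged reconstruction of the start of print_tree(), missing from the printout:
def print_tree(node, level):
    if node.answer != "":
        print(empty(level), node.answer)
        return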
    print(empty(level), node.attribute)
    for value, n in node.children:
        print(empty(level + 1), value)
        print_tree(n, level + 2)
metadata, traindata = read_data(r"F:\programmes\python_programmes\ML_Practicals\Tennis.csv")
data = np.array(traindata)
node = create_node(data, metadata)
print_tree(node, 0)
Data_loader.py
import csv
def read_data(filename):
    with open(filename, 'r') as csvfile:
        datareader = csv.reader(csvfile, delimiter=',')
        headers = next(datareader)
        metadata = []
        traindata = []
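        # --- Hedged completion (the printout truncates here): collect the header
        # --- names as metadata and the remaining rows as training data.
        for name in headers:
            metadata.append(name)
        for row in datareader:
            traindata.append(row)
    return metadata, traindata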
Tennis.csv
Outlook,Temperature,Humidity,Wind,answer
sunny,hot,high,weak,no
sunny,hot,high,strong,no
overcast,hot,high,weak,yes
rain,mild,high,weak,yes
rain,cool,normal,weak,yes
rain,cool,normal,strong,no
overcast,cool,normal,strong,yes
sunny,mild,high,weak,no
sunny,cool,normal,weak,yes
rain,mild,normal,weak,yes
sunny,mild,normal,strong,yes
overcast,mild,high,strong,yes
overcast,hot,normal,weak,yes
rain,mild,high,strong,no
Output-
Practical-8
Aim- Implement SVR (Support Vector Regression) and SVM (Support Vector Machine).
Code-
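The printed listing begins at the sample data but later relies on imports, scaled features, and a train/test split; a minimal sketch of the missing import lines (an assumption based on the names used below) is:

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR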
# Sample dataset
X = np.array([[1], [2], [3], [4], [5], [6], [7], [8], [9], [10]])
y = np.array([3, 5, 8, 9, 10, 13, 15, 18, 20, 24])
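# --- Hedged reconstruction of the scaling step the listing omits (X_scaled and y_scaled
# --- are used below); SVR generally benefits from standardized features.
scaler_X = StandardScaler()
scaler_y = StandardScaler()
X_scaled = scaler_X.fit_transform(X)
y_scaled = scaler_y.fit_transform(y.reshape(-1, 1)).ravel()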
# Train-Test Split
X_train, X_test, y_train, y_test = train_test_split(X_scaled, y_scaled, test_size=0.2,
random_state=0)
# SVR Model
svr_regressor = SVR(kernel='rbf') # You can use 'linear', 'poly', or 'rbf' for the kernel
svr_regressor.fit(X_train, y_train)
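# --- Hedged sketch of an evaluation step (not in the printout): predict on the test split
# --- and convert the scaled predictions back to the original units.
y_pred_scaled = svr_regressor.predict(X_test)
y_pred = scaler_y.inverse_transform(y_pred_scaled.reshape(-1, 1)).ravel()
print("SVR predictions on the test set:", y_pred)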
Output:
Code-
# Import necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
from mlxtend.plotting import plot_decision_regions
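# --- Hedged reconstruction of the steps the printout omits between the imports and the
# --- report below: load iris (first two features, so the 2-D plot works), split, scale,
# --- tune an SVC with GridSearchCV, and predict on the test set.
iris = datasets.load_iris()
X = iris.data[:, :2]
y = iris.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
param_grid = {'C': [0.1, 1, 10], 'gamma': [0.01, 0.1, 1], 'kernel': ['rbf', 'linear']}
grid = GridSearchCV(SVC(), param_grid, cv=5)
grid.fit(X_train_scaled, y_train)
best_svm = grid.best_estimator_
y_pred = best_svm.predict(X_test_scaled)
print("Best parameters:", grid.best_params_)
print("Accuracy:", accuracy_score(y_test, y_pred))
print("\nConfusion Matrix:")
print(confusion_matrix(y_test, y_pred))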
print("\nClassification Report:")
print(classification_report(y_test, y_pred))
# Plot decision regions for the training data (using only the first two features for 2D plot)
plt.figure(figsize=(8, 6))
plot_decision_regions(X_train_scaled, y_train, clf=best_svm, legend=2)
plt.title('SVM Decision Boundaries (Training Set)')
plt.xlabel(iris.feature_names[0])
plt.ylabel(iris.feature_names[1])
plt.show()
Output:
Practical-9
Code-
# Recursive Binary Search function
def binary_search(arr, low, high, x):
    # Base case: If the range is valid
    if high >= low:
        mid = (high + low) // 2
        if arr[mid] == x:           # element found at mid
            return mid
        # If the element is smaller than mid, it can only be present in the left subarray
        elif arr[mid] > x:
            return binary_search(arr, low, mid - 1, x)
        else:                       # otherwise search the right subarray
            return binary_search(arr, mid + 1, high, x)
    else:
        return -1                   # element is not present in the array
# Example usage
arr = [2, 3, 4, 10, 40]
x = 10
# Function call
result = binary_search(arr, 0, len(arr) - 1, x)
if result != -1:
print(f"Element is present at index {result}")
else:
print("Element is not present in array")
Output:
Practical-10
Code-
import numpy as np
import matplotlib.pyplot as plt
Output:
Practical-11
Code-
import numpy as np
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
import matplotlib.pyplot as plt
import seaborn as sns
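# --- Hedged reconstruction of the data-preparation and training steps missing from
# --- the printout: load the wine data, split, scale, and fit a k-NN classifier.
data = load_wine()
X, y = data.data, data.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
k = 5
knn = KNeighborsClassifier(n_neighbors=k)
knn.fit(X_train, y_train)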
# Make predictions
y_pred = knn.predict(X_test)
# Calculate accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy of KNN with k={k}: {accuracy:.2f}')
# Confusion matrix heatmap (the start of this call is cut off in the printout)
cm = confusion_matrix(y_test, y_pred)
plt.figure(figsize=(8, 6))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues',
            xticklabels=data.target_names,
            yticklabels=data.target_names)
plt.xlabel('Predicted Label')
plt.ylabel('True Label')
plt.title('Confusion Matrix')
plt.show()
Output:
Practical-12
K-Nearest-Neighbour Algorithm:
1. Load the data
2. Initialize the value of k
3. For getting the predicted class, iterate from 1 to the total number of training data points:
1. Calculate the distance between test data and each row of training data. Here we will
use Euclidean distance as our distance metric since it’s the most popular method.
The other metrics that can be used are Chebyshev, cosine, etc.
2. Sort the calculated distances in ascending order based on distance values
3. Get top k rows from the sorted array
4. Get the most frequent class of these rows, i.e., get the labels of the selected K entries
5. Return the predicted class
If regression, return the mean of the K labels
If classification, return the mode of the K labels
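A compact from-scratch sketch of the steps above (illustrative only; Euclidean distance and a majority vote):

import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_query, k=3):
    # Step 1: Euclidean distance from the query point to every training row
    distances = np.sqrt(((X_train - x_query) ** 2).sum(axis=1))
    # Steps 2-3: sort by distance and take the top k rows
    nearest = np.argsort(distances)[:k]
    # Steps 4-5: return the most frequent label among the k neighbours
    return Counter(y_train[nearest]).most_common(1)[0][0]

# Tiny illustrative example
X_train = np.array([[1, 1], [1, 2], [5, 5], [6, 5]])
y_train = np.array(['A', 'A', 'B', 'B'])
print(knn_predict(X_train, y_train, np.array([5, 6]), k=3))   # -> 'B'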
Confusion matrix:
Note: Class 1 = Positive, Class 2 = Negative.
False Positive Rate: When it's actually no, how often does it predict yes?
FP/actual no = 10/60 = 0.17
True Negative Rate: When it's actually no, how often does it predict no?
TN/actual no = 50/60 = 0.83
equivalent to 1 minus False Positive Rate
also known as "Specificity"
Precision: When it predicts yes, how often is it correct?
TP/predicted yes = 100/110 = 0.91
Prevalence: How often does the yes condition actually occur in our sample?
actual yes/total = 105/165 = 0.64
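These rates refer to a 2x2 confusion matrix that is not reproduced in the printout; the counts they imply (n = 165) are:

                Predicted: No    Predicted: Yes    Total
Actual: No      TN = 50          FP = 10           60
Actual: Yes     FN = 5           TP = 100          105
Total           55               110               165

From the same table, Accuracy = (TP + TN) / total = 150/165 ≈ 0.91.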
Source Code:
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import confusion_matrix
from sklearn.metrics import accuracy_score
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
import pandas as pd
dataset = pd.read_csv("iris.csv")
# Split features and labels (these two lines are missing from the printout)
X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, -1].values
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0, test_size=0.25)
classifier = KNeighborsClassifier(n_neighbors=8, p=3, metric='euclidean')
classifier.fit(X_train, y_train)
y_pred = classifier.predict(X_test)
cm = confusion_matrix(y_test, y_pred)
print('Confusion matrix is as follows\n',cm)
print('Accuracy Metrics')
print(classification_report(y_test,y_pred))
print(" correct predicition",accuracy_score(y_test,y_pred))
print(" worng predicition",(1-accuracy_score(y_test,y_pred)))
Output :
Confusion matrix is as follows
[[13 0 0]
[ 0 15 1]
[ 0 0 9]]
Accuracy Metrics
precision recall f1-score support
Iris-setosa 1.00 1.00 1.00 13
Iris-versicolor 1.00 0.94 0.97 16
Iris-virginica 0.90 1.00 0.95 9
Practical-13
Aim- Write a program to implement the naïve Bayesian classifier for a sample
training dataset stored as a .CSV file. Compute the accuracy of the
classifier, considering a few test data sets.
Code-
import csv
import random
import math
def loadcsv(filename):
lines = csv.reader(open(filename, "r"));
dataset = list(lines)
for i in range(len(dataset)):
#converting strings into numbers for processing
dataset[i] = [float(x) for x in dataset[i]]
return dataset
def separatebyclass(dataset):
separated = {} #dictionary of classes 1 and 0
#creates a dictionary of classes 1 and 0 where the values are
#the instances belonging to each class
for i in range(len(dataset)):
vector = dataset[i]
if (vector[-1] not in separated):
separated[vector[-1]] = []
separated[vector[-1]].append(vector)
return separated
def mean(numbers):
return sum(numbers)/float(len(numbers))
def stdev(numbers):
avg = mean(numbers)
variance = sum([pow(x-avg,2) for x in numbers])/float(len(numbers)-1)
return math.sqrt(variance)
def summarizebyclass(dataset):
separated = separatebyclass(dataset);
#print(separated)
summaries = {}
for classvalue, instances in separated.items():
#for key,value in dic.items()
#summaries is a dic of tuples(mean,std) for each class value
summaries[classvalue] = summarize(instances)  # summarize() computes the mean and std for each attribute
return summaries
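The printout jumps from summarizebyclass() to the tail of predict(); a hedged reconstruction of the helpers in between (per-attribute summaries, the Gaussian likelihood, per-class probabilities, and the start of predict()) is:

def summarize(dataset):
    # (mean, stdev) for every attribute column, dropping the class column
    summaries = [(mean(attribute), stdev(attribute)) for attribute in zip(*dataset)]
    del summaries[-1]
    return summaries

def calculateprobability(x, mean_value, stdev_value):
    # Gaussian probability density of x under the given mean and standard deviation
    exponent = math.exp(-(math.pow(x - mean_value, 2) / (2 * math.pow(stdev_value, 2))))
    return (1 / (math.sqrt(2 * math.pi) * stdev_value)) * exponent

def calculateclassprobabilities(summaries, inputvector):
    # Multiply the per-attribute likelihoods for each class
    probabilities = {}
    for classvalue, classsummaries in summaries.items():
        probabilities[classvalue] = 1
        for i in range(len(classsummaries)):
            mean_value, stdev_value = classsummaries[i]
            probabilities[classvalue] *= calculateprobability(inputvector[i], mean_value, stdev_value)
    return probabilities

def predict(summaries, inputvector):
    # Pick the class with the highest probability
    probabilities = calculateclassprobabilities(summaries, inputvector)
    bestLabel, bestProb = None, -1
    for classvalue, probability in probabilities.items():
        if bestLabel is None or probability > bestProb:
            bestProb = probability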
bestLabel = classvalue
return bestLabel
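The helpers used later in main() are also missing from the printout; a hedged sketch:

def splitdataset(dataset, splitratio):
    # Randomly move splitratio of the rows into the training set
    trainsize = int(len(dataset) * splitratio)
    trainset = []
    copy = list(dataset)
    while len(trainset) < trainsize:
        index = random.randrange(len(copy))
        trainset.append(copy.pop(index))
    return [trainset, copy]

def getpredictions(summaries, testset):
    return [predict(summaries, testset[i]) for i in range(len(testset))]

def getaccuracy(testset, predictions):
    correct = sum(1 for i in range(len(testset)) if testset[i][-1] == predictions[i])
    return (correct / float(len(testset))) * 100.0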
def main():
    filename = r'F:\programmes\python_programmes\ML_Practicals\dataset.csv'
    splitratio = 0.67
    dataset = loadcsv(filename)
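    # --- Hedged completion of main() (the printout truncates here): split the data,
    # --- train the class summaries, predict on the test split, and report accuracy.
    trainingset, testset = splitdataset(dataset, splitratio)
    print('Split {0} rows into train = {1} and test = {2} rows'.format(len(dataset), len(trainingset), len(testset)))
    summaries = summarizebyclass(trainingset)
    predictions = getpredictions(summaries, testset)
    accuracy = getaccuracy(testset, predictions)
    print('Accuracy of the classifier is: {0}%'.format(accuracy))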
main()
Output:
Practical-14
Code-
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score
import matplotlib.pyplot as plt
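# --- Hedged reconstruction of the missing steps: load iris (two features so the plot is 2-D),
# --- split, fit k-NN, and build the mesh grid whose points are classified below.
iris = load_iris()
X = iris.data[:, :2]
y = iris.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)
print("Test accuracy:", accuracy_score(y_test, knn.predict(X_test)))
h = 0.02  # mesh step size
x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
xx, yy = np.meshgrid(np.arange(x_min, x_max, h), np.arange(y_min, y_max, h))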
Z = knn.predict(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)
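# --- Hedged sketch of a decision-boundary plot using the grid predictions above
plt.contourf(xx, yy, Z, alpha=0.4)
plt.scatter(X[:, 0], X[:, 1], c=y, edgecolor='k')
plt.xlabel(iris.feature_names[0])
plt.ylabel(iris.feature_names[1])
plt.title('k-NN decision boundaries (k = 5)')
plt.show()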
Output:
Practical-15
Aim- Assuming a set of documents that need to be classified, use the naïve
Bayesian Classifier model to perform this task. Built-in Java classes/ API can
be used to write the program. Calculate the accuracy, precision, and recall for
your dataset.
Code-
# Import necessary libraries
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score, precision_score, recall_score, classification_report
import numpy as np
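The listing stops after the imports; a minimal sketch of the remaining steps, using a small illustrative document set (the actual dataset is not reproduced in the printout):

# Illustrative documents and labels (1 = positive sentiment, 0 = negative sentiment)
documents = [
    "I loved this movie, it was fantastic",
    "What a great and enjoyable film",
    "Absolutely terrible, a waste of time",
    "I hated the plot and the acting was poor",
    "Brilliant direction and a wonderful cast",
    "Boring, dull and far too long",
]
labels = np.array([1, 1, 0, 0, 1, 0])

# Vectorize the text, train a multinomial naive Bayes model, and evaluate it
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(documents)
X_train, X_test, y_train, y_test = X[:4], X[4:], labels[:4], labels[4:]
model = MultinomialNB()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))
print("Precision:", precision_score(y_test, y_pred))
print("Recall:", recall_score(y_test, y_pred))
print(classification_report(y_test, y_pred))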
Output:
Practical-16
Code-
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split, StratifiedKFold, cross_val_score
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report
from sklearn.preprocessing import LabelEncoder
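The printout contains only the imports; based on them, one plausible sketch (illustrative data generated in place of the unnamed CSV) is:

# Illustrative data standing in for the CSV the original presumably loads
df = pd.DataFrame({
    'feature1': np.random.rand(100),
    'feature2': np.random.rand(100),
    'label':    np.random.choice(['cat', 'dog'], size=100),
})
X = df[['feature1', 'feature2']]
y = LabelEncoder().fit_transform(df['label'])

# Hold-out evaluation
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42, stratify=y)
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
print("Hold-out accuracy:", accuracy_score(y_test, y_pred))
print(classification_report(y_test, y_pred))

# Stratified k-fold cross-validation
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(model, X, y, cv=skf)
print("Cross-validation accuracy: %.3f (+/- %.3f)" % (scores.mean(), scores.std()))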
Output:
Practical-17
Code-
from itertools import combinations
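# --- Hedged reconstruction of the truncated function: enumerate candidate itemsets
# --- of every size and keep those whose support count meets min_support.
def get_frequent_itemsets(transactions, min_support):
    frequent_itemsets = {}
    items = sorted({item for transaction in transactions for item in transaction})
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            count = sum(1 for transaction in transactions
                        if set(candidate).issubset(transaction))
            if count >= min_support:
                frequent_itemsets[candidate] = count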
    return frequent_itemsets
# Example usage
if __name__ == "__main__":
    transactions = [
        ['milk', 'bread', 'cookies'],
        ['milk', 'diaper', 'bread', 'cookies'],
        ['milk', 'diaper', 'bread'],
        ['bread', 'cookies'],
        ['milk', 'diaper', 'cookies'],
        ['milk', 'bread', 'diaper']
    ]
    min_support = 2
    frequent_itemsets = get_frequent_itemsets(transactions, min_support)
    print("Frequent Itemsets:")
    for itemset, count in frequent_itemsets.items():
        print(f"{set(itemset)}: {count}")
Output:
Practical-18
Code-
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
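# --- Hedged reconstruction of the parts the printout omits: the locally weighted
# --- regression helpers and the data loading. The data file name is an assumption
# --- (any tips dataset with 'total_bill' and 'tip' columns works).
def kernel(point, xmat, k):
    m, n = np.shape(xmat)
    weights = np.mat(np.eye(m))
    for j in range(m):
        diff = point - xmat[j]
        weights[j, j] = np.exp(diff * diff.T / (-2.0 * k ** 2))
    return weights

def localWeight(point, xmat, ymat, k):
    wei = kernel(point, xmat, k)
    W = (xmat.T * (wei * xmat)).I * (xmat.T * (wei * ymat.T))
    return W

def localWeightRegression(xmat, ymat, k):
    m, n = np.shape(xmat)
    ypred = np.zeros(m)
    for i in range(m):
        ypred[i] = xmat[i] * localWeight(xmat[i], xmat, ymat, k)
    return ypred

# Load the data (file name assumed)
data = pd.read_csv('tips.csv')
bill = np.array(data.total_bill)
tip = np.array(data.tip)
mbill = np.mat(bill)
mtip = np.mat(tip)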
m= np.shape(mbill)[1]
one = np.mat(np.ones(m))
X = np.hstack((one.T,mbill.T))
#set k here
ypred = localWeightRegression(X,mtip,0.5)
SortIndex = X[:,1].argsort(0)
xsort = X[SortIndex][:,0]
fig = plt.figure()
ax = fig.add_subplot(1,1,1)
ax.scatter(bill,tip, color='green')
ax.plot(xsort[:,1],ypred[SortIndex], color = 'red', linewidth=5)
plt.xlabel('Total bill')
plt.ylabel('Tip')
plt.show()
Output-
Practical-19
Code-
import numpy as np
from nltk.tokenize import word_tokenize
from collections import defaultdict
data = ['She loves pizza, pizza is delicious.','She is a good person.','good people are the best.']
sentences = []
vocab = []
for sent in data:
    x = word_tokenize(sent)
    sentence = [w.lower() for w in x if w.isalpha()]
    sentences.append(sentence)
    for word in sentence:
        if word not in vocab:
            vocab.append(word)

len_vector = len(vocab)

index_word = {}
i = 0
for word in vocab:
    index_word[word] = i
    i += 1

def bag_of_words(sent):
    count_dict = defaultdict(int)
    vec = np.zeros(len_vector)
    for item in sent:
        count_dict[item] += 1
    for key, item in count_dict.items():
        vec[index_word[key]] = item
    return vec
vector = bag_of_words(sentences[0])
print(vector)
Output-
Practical-20
Code-
import numpy as np
import matplotlib.pyplot as plt
import numpy.random as rnd
mu = np.array([10,13])
sigma = np.array([[3.5, -1.8], [-1.8,3.5]])
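# --- Hedged reconstruction of the missing data step: draw samples from the bivariate
# --- normal defined by mu and sigma, then mean-centre them.
rnd.seed(0)
data = rnd.multivariate_normal(mu, sigma, 1000)
mean_data = data - np.mean(data, axis=0)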
cov = np.cov(mean_data.T)
cov = np.round(cov, 2)
print("Covariance matrix ", cov.shape, "\n")
indices = np.arange(0,len(eig_val), 1)
indices = ([x for _,x in sorted(zip(eig_val, indices))])[::-1]
eig_val = eig_val[indices]
eig_vec = eig_vec[:,indices]
print("Sorted Eigen vectors ", eig_vec)
print("Sorted Eigen values ", eig_val, "\n")
sum_eig_val = np.sum(eig_val)
explained_variance = eig_val/ sum_eig_val
print(explained_variance)
cumulative_variance = np.cumsum(explained_variance)
print(cumulative_variance)
plt.plot(explained_variance,cumulative_variance)
plt.show()
Output:-
Practical-21
Code-
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import FastICA
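# --- Hedged reconstruction of the missing steps: generate three source signals,
# --- mix them with a random matrix, and unmix them with FastICA.
np.random.seed(0)
n_samples = 2000
time = np.linspace(0, 8, n_samples)
s1 = np.sin(2 * time)                           # sinusoidal signal
s2 = np.sign(np.sin(3 * time))                  # square signal
s3 = np.random.laplace(size=n_samples)          # noisy signal
S = np.c_[s1, s2, s3]
A = np.array([[1.0, 1.0, 1.0],
              [0.5, 2.0, 1.0],
              [1.5, 1.0, 2.0]])                 # mixing matrix
X = S @ A.T                                     # observed mixed signals

ica = FastICA(n_components=3, random_state=0)
S_ = ica.fit_transform(X)                       # recovered independent components
A_ = ica.mixing_                                # estimated mixing matrix

plt.figure(figsize=(8, 8))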
# Original signals
plt.subplot(4, 1, 1)
plt.title('Original Signals')
plt.plot(S)
# Mixed signals
plt.subplot(4, 1, 2)
plt.title('Mixed Signals')
plt.plot(X)
# Recovered signals
plt.subplot(4, 1, 3)
plt.title('Independent Components (Recovered Signals)')
plt.plot(S_)
# Mixing matrix
plt.subplot(4, 1, 4)
plt.title('Estimated Mixing Matrix')
plt.imshow(A_, aspect='auto', cmap='viridis')
plt.colorbar()
plt.tight_layout()
plt.show()
Output:
Practical-22
Code-
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture
import sklearn.metrics as metrics
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
names = ['Sepal_Length', 'Sepal_Width', 'Petal_Length', 'Petal_Width', 'Targets']  # column names assumed
dataset = pd.read_csv(r"F:\programmes\python_programmes\ML_Practicals\iris_dataset.csv",
                      names=names)
X = dataset.iloc[:, :-1]
y = dataset.Targets  # assumed to hold integer class labels 0/1/2, used to colour the "Real" plot
plt.figure(figsize=(14,7))
colormap=np.array(['red','lime','black'])
# REAL PLOT
plt.subplot(1,3,1)
plt.title('Real')
plt.scatter(X.Petal_Length,X.Petal_Width,c=colormap[y])
# K-PLOT
model=KMeans(n_clusters=3, random_state=0).fit(X)
plt.subplot(1,3,2)
plt.title('KMeans')
plt.scatter(X.Petal_Length,X.Petal_Width,c=colormap[model.labels_])
# GMM PLOT
gmm=GaussianMixture(n_components=3, random_state=0).fit(X)
y_cluster_gmm=gmm.predict(X)
plt.subplot(1,3,3)
plt.title('GMM Classification')
plt.scatter(X.Petal_Length,X.Petal_Width,c=colormap[y_cluster_gmm])
Output-
Practical-23
Aim- Apply EM algorithm to cluster a set of data stored in a .CSV file. Use the
same data set for clustering using k-Means algorithm. Compare the results of
these two algorithms and comment on the quality of clustering. You can add
Java/ Python ML library classes/ API in the program.
Code-
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler
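# --- Hedged reconstruction of the omitted middle of this program. The CSV name is an
# --- assumption; the aim only says the data is stored in a .CSV file.
data = pd.read_csv('data.csv')
X = StandardScaler().fit_transform(data.values)

# k-Means clustering
kmeans = KMeans(n_clusters=3, random_state=0, n_init=10)
kmeans_labels = kmeans.fit_predict(X)

# EM clustering via a Gaussian mixture model
gmm = GaussianMixture(n_components=3, random_state=0)
gmm_labels = gmm.fit_predict(X)

# Compare clustering quality with silhouette scores
print("Silhouette score (k-Means):", silhouette_score(X, kmeans_labels))
print("Silhouette score (EM / GMM):", silhouette_score(X, gmm_labels))

# Visualize only when the data is two-dimensional
if X.shape[1] == 2:
    plt.figure(figsize=(12, 5))
    plt.subplot(1, 2, 1)
    plt.scatter(X[:, 0], X[:, 1], c=gmm_labels)
    plt.title('EM (GMM) Clustering')
    plt.xlabel('Feature 1')
    plt.ylabel('Feature 2')
    plt.subplot(1, 2, 2)
    plt.scatter(X[:, 0], X[:, 1], c=kmeans_labels)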
    plt.title('k-Means Clustering')
    plt.xlabel('Feature 1')
    plt.ylabel('Feature 2')
    plt.tight_layout()
    plt.show()
else:
    print("Data has more than 2 dimensions. Visualization is not applicable.")
Output:
Practical-24
Code-
import numpy as np
X = np.array(([2, 9], [1, 5], [3, 6]), dtype=float)
y = np.array(([92], [86], [89]), dtype=float)
X = X/np.amax(X, axis=0) # maximum of X array longitudinally
y = y/100
# Sigmoid Function
def sigmoid(x):
    return 1 / (1 + np.exp(-x))
# Variable initialization
epoch = 7000 # Setting training iterations
lr = 0.1 # Setting learning rate
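# --- Hedged reconstruction of the network setup and forward pass the printout omits
# --- (2 input, 3 hidden, 1 output neurons, as in the classic lab program).
inputlayer_neurons = 2
hiddenlayer_neurons = 3
output_neurons = 1
wh = np.random.uniform(size=(inputlayer_neurons, hiddenlayer_neurons))
bh = np.random.uniform(size=(1, hiddenlayer_neurons))
wout = np.random.uniform(size=(hiddenlayer_neurons, output_neurons))
bout = np.random.uniform(size=(1, output_neurons))

def derivatives_sigmoid(x):
    return x * (1 - x)

for i in range(epoch):
    # Forward propagation
    hinp = np.dot(X, wh) + bh
    hlayer_act = sigmoid(hinp)
    outinp = np.dot(hlayer_act, wout) + bout
    output = sigmoid(outinp)
    # Backpropagation: error at the output layer
    EO = y - output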
    outgrad = derivatives_sigmoid(output)
    d_output = EO * outgrad
    EH = d_output.dot(wout.T)
    hiddengrad = derivatives_sigmoid(hlayer_act)   # how much hidden layer weights contributed to the error
    d_hiddenlayer = EH * hiddengrad
    wout += hlayer_act.T.dot(d_output) * lr        # dot product of next-layer error and current-layer output
    # bout += np.sum(d_output, axis=0, keepdims=True) * lr
    wh += X.T.dot(d_hiddenlayer) * lr
    # bh += np.sum(d_hiddenlayer, axis=0, keepdims=True) * lr

print("Input: \n" + str(X))
print("Actual Output: \n" + str(y))
print("Predicted Output: \n", output)
Output-
Practical-25
Code-
import numpy as np

# The def line is missing from the printout; the function name below is reconstructed.
def euclidean_distance(A1, B2):
    """
    Parameters:
    A1 (tuple): A tuple representing the coordinates of point A1 (x1, y1).
    B2 (tuple): A tuple representing the coordinates of point B2 (x2, y2).

    Returns:
    float: The Euclidean distance between A1 and B2.
    """
    return np.sqrt((B2[0] - A1[0])**2 + (B2[1] - A1[1])**2)
# Example usage
A1 = (1, 2) # Coordinates of point A1
B2 = (4, 6) # Coordinates of point B2
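# Compute and print the distance (the call itself is missing from the printout)
distance = euclidean_distance(A1, B2)
print(f"Euclidean distance between A1 and B2: {distance}")   # -> 5.0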
Output: