
ML Lab Manual Devansh (1)

This document is a laboratory manual for the Machine Learning course at Sardar Patel College of Engineering. It outlines various practical experiments, including implementations of algorithms such as FIND-S, Candidate-Elimination, linear regression, logistic regression, and decision trees, among others. Each experiment includes aims, code snippets, and expected outputs, providing a comprehensive guide for students to learn and apply machine learning techniques.

Uploaded by devanshpatel0144
Copyright © All Rights Reserved

SARDAR PATEL EDUCATION CAMPUS

SARDAR PATEL COLLEGE OF ENGINEERING


(SPCE), BAKROL-124

GUJARAT TECHNOLOGICAL UNIVERSITY


BACHELOR OF ENGINEERING

DEPARTMENT OF COMPUTER ENGINEERING

MACHINE LEARNING (3170724)

7th Semester

LABORATORY MANUAL
SARDAR PATEL COLLEGE OF ENGINEERING
BAKROL, ANAND

CERTIFICATE

This is to certify that Mr./Miss. ____________________ of Enrollment No. ______________
has satisfactorily completed his/her term work in the subject ____________________
for the term ending in 20__/20__.

Date:

Signature of Teacher Head of Department


TABLE OF CONTENTS

Sr No. | Experiment Title | Experiment Date | Submission Date | Sign

1. Read the training data from a .CSV file.
2. Implement and demonstrate the FIND-S algorithm for finding the most specific hypothesis based on a given set of training data samples. Read the training data from a .CSV file. Create the file Weather.csv in Excel and save it in the same path.
3. For a given set of training data examples stored in a .CSV file, implement and demonstrate the Candidate-Elimination algorithm to output a description of the set of all hypotheses consistent with the training examples. Create the file Training_examples.csv in Excel and save it in the same path.
4. Implement linear regression: (a) with attributes X and Y; (b) with the linregress() function.
5. Implement logistic regression: (a) binary, (b) multinomial, (c) ordinal.
6. Write a program for polynomial regression.
7. Write a program to demonstrate the working of the decision-tree-based ID3 algorithm. Use an appropriate dataset for building the decision tree and apply this knowledge to classify a new sample.
8. Implement SVR (Support Vector Regression).
9. Python program for recursive binary search.
10. Implement the non-parametric Locally Weighted Regression algorithm in order to fit data points. Select an appropriate data set for your experiment and draw graphs.
11. Implement a Python program to demonstrate the KNN algorithm.
12. Write a program to implement the k-Nearest Neighbour algorithm to classify the iris data set. Print both correct and wrong predictions. Python ML library classes can be used for this problem.
13. Write a program to implement the naïve Bayesian classifier for a sample training dataset stored as a .CSV file. Compute the accuracy of the classifier, considering a few test data sets.
14. Write a program to construct a Bayesian network considering medical data.
15. Assuming a set of documents that need to be classified, use the naïve Bayesian classifier model to perform this task. Built-in Java classes/API can be used to write the program. Calculate the accuracy, precision, and recall for your dataset.
16. Implementation of the random forest algorithm.
17. Write a program for the Apriori algorithm.
18. Implement the non-parametric Locally Weighted Regression algorithm in order to fit data points. Select an appropriate data set for your experiment and draw graphs.
19. Implementing the Bag of Words model in Python.
20. Implementation of the Principal Component Analysis (PCA) algorithm.
21. Implementation of Independent Component Analysis (ICA).
22. Implement a Python program to demonstrate the K-Means and EM algorithms for machine learning.
23. Apply the EM algorithm to cluster a set of data stored in a .CSV file. Use the same data set for clustering using the k-Means algorithm. Compare the results of these two algorithms and comment on the quality of clustering. You can add Java/Python ML library classes/API in the program.
24. Build an Artificial Neural Network by implementing the Backpropagation algorithm and test the same using appropriate data sets.
25. Write a Python program to measure the Euclidean distance between two different variables A1 & B2.
Machine Learning(3170724) 211240107034

Practical-1

Aim- Read the training data from a .CSV file.

Code-
import pandas as pd
# Raw string (r"...") so the Windows backslashes are not treated as escape sequences
file = pd.read_csv(r"F:\programmes\python_programmes\ML_Practicals\Tennis.csv")
print(file)

Output-

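To try the read step without a file on disk, here is a minimal sketch that parses a few rows of the Tennis data (the same rows listed under Practical 7) from an in-memory string with Python's standard csv module instead of pandas. The constant TENNIS_CSV and the helper read_training_data are illustrative names, not part of the practical.

```python
import csv
import io

# A few rows of Tennis.csv, embedded as a string for illustration only
TENNIS_CSV = """Outlook,Temperature,Humidity,Wind,answer
sunny,hot,high,weak,no
sunny,hot,high,strong,no
overcast,hot,high,weak,yes
rain,mild,high,weak,yes
"""

def read_training_data(text):
    """Return (header, rows) parsed from CSV text."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)           # first line holds the column names
    rows = [row for row in reader]  # remaining lines are the training examples
    return header, rows

header, rows = read_training_data(TENNIS_CSV)
print(header)
for row in rows:
    print(row)
```

Swapping io.StringIO(text) for open(path, newline="") reads the real file the same way.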

Practical-2

Aim- Implement and demonstrate the FIND-S algorithm for finding the most
specific hypothesis based on a given set of training data samples. Read the
training data from a .CSV file. Create the file Weather.csv in Excel and save it in
the same path as the program.

Code-
import pandas as pd

def find_s(positive_examples):
    """
    Find-S algorithm implementation.

    Parameters:
    - positive_examples: List of positive examples where each example is a list of attribute values.

    Returns:
    - Most specific hypothesis that covers all positive examples.
    """
    # Initialize the hypothesis to the first positive example
    hypothesis = positive_examples[0].copy()

    for example in positive_examples[1:]:
        for i in range(len(hypothesis)):
            # If the attribute value differs, generalize that attribute to '?'
            if hypothesis[i] != example[i]:
                hypothesis[i] = '?'

    return hypothesis

def load_data_from_csv(filename):
    """
    Load positive examples from a CSV file.

    Parameters:
    - filename: Path to the CSV file.

    Returns:
    - List of positive examples.
    """
    # Read the CSV file into a DataFrame
    df = pd.read_csv(filename)

    # Keep only the positive examples (target column 'enjoy sport' equal to 'yes')
    positive_examples = df[df['enjoy sport'].str.strip().str.lower() == 'yes']

    # Drop the target attribute and convert the DataFrame to a list of lists
    positive_examples = positive_examples.drop(columns=['enjoy sport'])
    positive_examples = positive_examples.values.tolist()

    return positive_examples

# Path to the CSV file created for this practical (see Aim)
filename = 'Weather.csv'

# Load data from CSV
positive_examples = load_data_from_csv(filename)

# Find the most specific hypothesis
hypothesis = find_s(positive_examples)
print("Most specific hypothesis:", hypothesis)

Output:

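A self-contained sketch of FIND-S that needs no CSV file: it runs on the textbook EnjoySport training data embedded as a list (the last column is the label), ignores negative examples, and generalizes each mismatching attribute to '?'. The data layout and the names used here are illustrative assumptions, not the contents of Weather.csv.

```python
# Textbook EnjoySport training data; the last column is the label
TRAINING_DATA = [
    ["Sunny", "Warm", "Normal", "Strong", "Warm", "Same", "Yes"],
    ["Sunny", "Warm", "High", "Strong", "Warm", "Same", "Yes"],
    ["Rainy", "Cold", "High", "Strong", "Warm", "Change", "No"],
    ["Sunny", "Warm", "High", "Strong", "Cool", "Change", "Yes"],
]

def find_s(examples):
    """Return the most specific hypothesis covering every positive example."""
    hypothesis = None  # None stands for the empty (most specific) hypothesis
    for *attributes, label in examples:
        if label != "Yes":
            continue  # FIND-S ignores negative examples
        if hypothesis is None:
            hypothesis = list(attributes)  # first positive example
        else:
            for i, value in enumerate(attributes):
                if hypothesis[i] != value:
                    hypothesis[i] = "?"  # generalize the mismatching attribute
    return hypothesis

print(find_s(TRAINING_DATA))
```

Starting from None rather than from the first row lets the function skip negative rows that appear before any positive one.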

Practical-3

Aim- For a given set of training data examples stored in a .CSV file, implement and
demonstrate the Candidate-Elimination algorithm to output a description of
the set of all hypotheses consistent with the training examples. Create the
file Training_examples.csv in Excel and save it in the same path as the program.

Code-
import pandas as pd

# Load the CSV file into a DataFrame
file_path = 'Training_examples.csv'  # Adjust the path to where your file is located
data = pd.read_csv(file_path)

# Standardize column names (lowercase and replace spaces with underscores)
data.columns = [col.strip().lower().replace(' ', '_') for col in data.columns]

# Check whether a hypothesis covers (matches) an example; '?' matches any value
def consistent(hypothesis, example):
    for i in range(len(hypothesis)):
        if hypothesis[i] != '?' and hypothesis[i] != example[i]:
            return False
    return True

# Candidate-Elimination Algorithm (simplified: S is a single hypothesis and
# G is specialized using attribute values taken from S)
def candidate_elimination(examples):
    X = examples.iloc[:, :-1].values  # Features (all columns except the last one)
    y = examples.iloc[:, -1].astype(str).str.strip().str.lower().values  # Target class (last column)

    # Initialize S with the first positive example and G with the most general hypothesis
    first_positive = next(i for i in range(len(y)) if y[i] == 'yes')
    specific_hypothesis = list(X[first_positive])
    general_hypothesis = [['?'] * len(specific_hypothesis)]

    # Iterate through each example in the dataset
    for i, example in enumerate(X):
        if y[i] == 'yes':  # Positive example
            # Generalize S minimally so that it covers the example
            for j in range(len(specific_hypothesis)):
                if specific_hypothesis[j] != example[j]:
                    specific_hypothesis[j] = '?'
            # Remove members of G that do not cover the positive example
            general_hypothesis = [g for g in general_hypothesis if consistent(g, example)]
        else:  # Negative example
            new_general_hypothesis = []
            for g in general_hypothesis:
                if not consistent(g, example):
                    # g already rejects the negative example; keep it unchanged
                    new_general_hypothesis.append(g)
                    continue
                # Minimal specializations: fix one '?' to S's value where S disagrees with the example
                for j in range(len(g)):
                    if g[j] == '?' and specific_hypothesis[j] not in ('?', example[j]):
                        new_hypothesis = g.copy()
                        new_hypothesis[j] = specific_hypothesis[j]
                        new_general_hypothesis.append(new_hypothesis)
            general_hypothesis = new_general_hypothesis

    return specific_hypothesis, general_hypothesis

# Apply the Candidate-Elimination Algorithm to the dataset
S, G = candidate_elimination(data)

# Output the final hypotheses
print("Final Specific Hypothesis (S):", S)
print("Final General Hypothesis (G):", G)

Output:

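A compact, self-contained sketch of a simplified Candidate-Elimination procedure on the EnjoySport data: S is kept as a single hypothesis, and each negative example specializes the members of G that still cover it by copying one attribute value from S. It assumes a positive example appears before the first negative one; the data and names are illustrative, not the contents of Training_examples.csv.

```python
TRAINING_DATA = [
    ["Sunny", "Warm", "Normal", "Strong", "Warm", "Same", "Yes"],
    ["Sunny", "Warm", "High", "Strong", "Warm", "Same", "Yes"],
    ["Rainy", "Cold", "High", "Strong", "Warm", "Change", "No"],
    ["Sunny", "Warm", "High", "Strong", "Cool", "Change", "Yes"],
]

def covers(h, x):
    """True if hypothesis h matches example x ('?' matches anything)."""
    return all(hv == "?" or hv == xv for hv, xv in zip(h, x))

def candidate_elimination(examples):
    n = len(examples[0]) - 1
    S = None              # specific boundary, kept as a single hypothesis
    G = [["?"] * n]       # general boundary
    for *x, label in examples:
        if label == "Yes":
            # Generalize S minimally to cover x
            S = list(x) if S is None else [s if s == v else "?" for s, v in zip(S, x)]
            # Drop general hypotheses that fail to cover a positive example
            G = [g for g in G if covers(g, x)]
        else:
            new_G = []
            for g in G:
                if not covers(g, x):
                    new_G.append(g)  # already excludes the negative example
                    continue
                # Fix one '?' to S's value wherever S disagrees with x
                for i in range(n):
                    if g[i] == "?" and S is not None and S[i] not in ("?", x[i]):
                        h = g.copy()
                        h[i] = S[i]
                        new_G.append(h)
            G = new_G
    return S, G

S, G = candidate_elimination(TRAINING_DATA)
print("S:", S)
print("G:", G)
```

On this data the boundaries end at S = ['Sunny', 'Warm', '?', 'Strong', '?', '?'] and G = [['Sunny', '?', '?', '?', '?', '?'], ['?', 'Warm', '?', '?', '?', '?']]; every hypothesis between them is consistent with the four examples.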

Practical-4

Aim- Implement the linear regression.

(A) With attribute X and Y

Code-
import matplotlib.pyplot as plt

x= [5,7,8,7,2,17,2,9,4,11,12,9,6]
y= [99,86,87,111,86,103,87,94,78,85,86,77,76]

plt.scatter(x,y)
plt.show()

Output:


(B) With the linregress() function (scipy.stats).

Code-
import matplotlib.pyplot as plt
from scipy import stats

x= [5,7,8,7,2,17,2,9,4,11,12,9,6]
y= [99,86,87,111,86,103,87,94,78,85,86,77,76]
slope, intercept, r, p, std_err = stats.linregress(x, y)
def myfunc(x):
    # Value on the fitted regression line
    return slope * x + intercept
mymodel = list(map(myfunc, x))
plt.scatter(x, y)
plt.plot(x, mymodel)
plt.show()

Output:

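scipy.stats.linregress fits an ordinary least-squares line. Its slope and intercept can be reproduced by hand with slope = sum((x - x_mean)(y - y_mean)) / sum((x - x_mean)^2) and intercept = y_mean - slope * x_mean. A minimal pure-Python sketch on the same x and y lists (fit_line is an illustrative helper name):

```python
def fit_line(x, y):
    """Ordinary least-squares fit y = slope * x + intercept (the model linregress fits)."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return slope, intercept

x = [5, 7, 8, 7, 2, 17, 2, 9, 4, 11, 12, 9, 6]
y = [99, 86, 87, 111, 86, 103, 87, 94, 78, 85, 86, 77, 76]
slope, intercept = fit_line(x, y)
print(f"slope = {slope:.4f}, intercept = {intercept:.4f}")
```

For the same inputs, these values should agree with the slope and intercept returned by stats.linregress up to floating-point rounding.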

Practical-5

Aim- Implement logistic regression.

(a) Binary

Code-
import numpy
from sklearn import linear_model

# Reshape x into a column (n_samples, 1), since scikit-learn expects 2-D feature arrays


x = numpy.array([3.78, 2.44, 2.09, 0.14, 1.72, 1.65, 4.92, 4.37, 4.96, 4.52, 3.69, 5.88]).reshape(-1,1)
y = numpy.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1])

logr = linear_model.LogisticRegression()
logr.fit(x,y)

# Predict whether a tumor of size 3.5 mm is cancerous (1) or not (0):


predicted = logr.predict(numpy.array([3.5]).reshape(-1,1))
print(predicted)

Output:

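Binary logistic regression models P(y = 1 | x) = 1 / (1 + exp(-(w*x + b))) and predicts class 1 when that probability is at least 0.5. The sketch below fits w and b on the same tumor-size data by plain batch gradient descent on the log-loss; the learning rate and epoch count are arbitrary illustrative choices. scikit-learn's LogisticRegression applies L2 regularization by default (C=1.0) and uses a different solver, so its coefficients will differ somewhat from these.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, lr=0.1, epochs=5000):
    """Fit p(y=1 | x) = sigmoid(w * x + b) by batch gradient descent on the log-loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            error = sigmoid(w * x + b) - y  # derivative of the log-loss w.r.t. w*x + b
            grad_w += error * x
            grad_b += error
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b

xs = [3.78, 2.44, 2.09, 0.14, 1.72, 1.65, 4.92, 4.37, 4.96, 4.52, 3.69, 5.88]
ys = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
w, b = fit_logistic(xs, ys)
p = sigmoid(w * 3.5 + b)
print(f"P(cancerous | size 3.5) = {p:.3f} -> class {int(p >= 0.5)}")
```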
(B) Multinomial

Code-
from numpy import mean
from numpy import std
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import RepeatedStratifiedKFold
from sklearn.linear_model import LogisticRegression

# define dataset
X, y = make_classification(n_samples=1000, n_features=10, n_informative=5, n_redundant=5,
n_classes=3, random_state=1)

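# define the multinomial logistic regression model
# (with the lbfgs solver and three or more classes, scikit-learn fits a multinomial/softmax model;
# the older multi_class='multinomial' argument is deprecated in recent versions)
model = LogisticRegression(solver='lbfgs')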

# define the model evaluation procedure


cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)

# evaluate the model and collect the scores


n_scores = cross_val_score(model, X, y, scoring='accuracy', cv=cv, n_jobs=-1)

# report the model performance


print('Mean Accuracy: %.3f (%.3f)' % (mean(n_scores), std(n_scores)))

Output:

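Multinomial logistic regression computes one linear score per class and turns the scores into probabilities with the softmax function; the predicted class is the one with the highest probability. A minimal sketch with hypothetical scores for a single sample (not values from the model above):

```python
import math

def softmax(scores):
    """Convert a list of class scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores w_k . x + b_k for one sample and three classes
scores = [2.0, 1.0, 0.1]
probs = softmax(scores)
predicted_class = max(range(len(probs)), key=probs.__getitem__)
print([round(p, 3) for p in probs], "-> class", predicted_class)
```

Subtracting the maximum score does not change the result but keeps math.exp from overflowing for large scores.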
(C) Ordinal

Code-
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import OrdinalEncoder

# Example data
x = np.array([5, 10, 15, 20, 25, 30, 35, 40, 45, 50]).reshape(-1, 1)
y = np.array([1, 1, 1, 2, 2, 2, 3, 3, 3, 3])  # 1: Low, 2: Medium, 3: High

# Ordinal encoding: map the labels 1 < 2 < 3 to 0 < 1 < 2 in the stated order
encoder = OrdinalEncoder(categories=[[1, 2, 3]])
y_encoded = encoder.fit_transform(y.reshape(-1, 1)).flatten()

# One-vs-rest logistic regression model
# (note: this model treats the three classes as unordered categories)
model = OneVsRestClassifier(LogisticRegression())
model.fit(x, y_encoded)

# Predict the class for a new input
x_test = np.array([[22]])
predicted_class = model.predict(x_test)

# Decode the predicted class back to the original labels
predicted_label = encoder.inverse_transform(predicted_class.reshape(-1, 1))

print(predicted_label)  # Prints the predicted label: 1 (Low), 2 (Medium) or 3 (High)

Output:

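The listing above fits a one-vs-rest classifier, which treats Low, Medium and High as unordered labels. A proportional-odds (cumulative logit) model uses the order instead: P(y <= k | x) = sigmoid(theta_k - w*x) with theta_1 < theta_2, and P(y = k) = P(y <= k) - P(y <= k-1). The sketch below evaluates that model with hand-picked, hypothetical parameters (w = 0.2, thresholds 3.5 and 6.5) rather than fitted ones, just to show how the class probabilities arise; fitting them would need an optimizer (for example, the OrderedModel class in statsmodels).

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def ordinal_probabilities(x, w, thresholds):
    """Class probabilities under a proportional-odds (cumulative logit) model.

    P(y <= k | x) = sigmoid(theta_k - w * x), with thresholds sorted ascending.
    """
    cumulative = [sigmoid(t - w * x) for t in thresholds] + [1.0]
    probs, previous = [], 0.0
    for c in cumulative:
        probs.append(c - previous)  # P(y = k) = P(y <= k) - P(y <= k - 1)
        previous = c
    return probs

# Hypothetical, hand-picked parameters (NOT fitted to the data above)
W = 0.2
THRESHOLDS = [3.5, 6.5]  # class boundaries at x = 17.5 and x = 32.5
LABELS = ["Low", "Medium", "High"]

for x in (5, 22, 50):
    probs = ordinal_probabilities(x, W, THRESHOLDS)
    best = max(range(len(probs)), key=probs.__getitem__)
    print(x, [round(p, 3) for p in probs], LABELS[best])
```

Because all classes share one slope w, increasing x can only shift probability mass toward higher classes, which is the ordering the one-vs-rest model ignores.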

Practical-6

Aim- Write a program for polynomial regression.

Code-
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import make_pipeline

# Generate some example data


np.random.seed(0)
X = np.sort(5 * np.random.rand(100, 1), axis=0)
y = 2 - 1.5 * X + 0.5 * X**2 + np.random.randn(100, 1) * 0.5

# Create polynomial features


degree = 2
poly_features = PolynomialFeatures(degree=degree)
X_poly = poly_features.fit_transform(X)

# Create a linear regression model


model = LinearRegression()
model.fit(X_poly, y)

# Make predictions
X_fit = np.linspace(0, 5, 100).reshape(-1, 1)
X_fit_poly = poly_features.transform(X_fit)
y_pred = model.predict(X_fit_poly)

# Plot the results


plt.scatter(X, y, color='blue', label='Data')
plt.plot(X_fit, y_pred, color='red', label=f'Polynomial Regression (degree={degree})')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.show()


Output:

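PolynomialFeatures(degree=2) followed by LinearRegression amounts to ordinary least squares on the columns [1, x, x^2]. A minimal sketch that solves the normal equations (X^T X) b = X^T y directly with Gaussian elimination, checked on noise-free samples of the same curve y = 2 - 1.5x + 0.5x^2 (polyfit here is an illustrative helper, unrelated to numpy.polyfit):

```python
def polyfit(xs, ys, degree):
    """Least-squares polynomial coefficients [b0, b1, ..., b_degree] via the normal equations."""
    n = degree + 1
    # Build A = X^T X and v = X^T y, where row i of X is [1, x_i, x_i^2, ...]
    A = [[sum(x ** (r + c) for x in xs) for c in range(n)] for r in range(n)]
    v = [sum(y * x ** r for x, y in zip(xs, ys)) for r in range(n)]
    # Gaussian elimination with partial pivoting
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        v[col], v[pivot] = v[pivot], v[col]
        for r in range(col + 1, n):
            factor = A[r][col] / A[col][col]
            for c in range(col, n):
                A[r][c] -= factor * A[col][c]
            v[r] -= factor * v[col]
    # Back substitution
    coeffs = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = v[r] - sum(A[r][c] * coeffs[c] for c in range(r + 1, n))
        coeffs[r] = s / A[r][r]
    return coeffs

# Noise-free samples of y = 2 - 1.5x + 0.5x^2 on 0.0, 0.5, ..., 5.0
xs = [0.5 * i for i in range(11)]
ys = [2 - 1.5 * x + 0.5 * x ** 2 for x in xs]
print([round(c, 6) for c in polyfit(xs, ys, 2)])
```

Forming X^T X squares the condition number of the problem, so for higher degrees a QR- or SVD-based least-squares solver is the safer choice.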

Practical-7

Aim- Write a program to demonstrate the working of the decision-tree-based ID3
algorithm. Use an appropriate data set for building the decision tree and apply
this knowledge to classify a new sample.

Code-

import numpy as np
import math
from Data_loader import read_data

class Node:
    def __init__(self, attribute):
        self.attribute = attribute
        self.children = []  # list of (attribute value, child Node) pairs
        self.answer = ""    # class label if this node is a leaf

    def __str__(self):
        return self.attribute

def subtables(data, col, delete):
    # Split the rows of data by the distinct values of column col
    tables = {}
    items = np.unique(data[:, col])
    count = np.zeros(items.shape[0], dtype=np.int32)
    for x in range(items.shape[0]):
        for y in range(data.shape[0]):
            if data[y, col] == items[x]:
                count[x] += 1

    for x in range(items.shape[0]):
        # Keep the same string dtype as the input data
        tables[items[x]] = np.empty((int(count[x]), data.shape[1]), dtype=data.dtype)
        pos = 0
        for y in range(data.shape[0]):
            if data[y, col] == items[x]:
                tables[items[x]][pos] = data[y]
                pos += 1

        if delete:
            tables[items[x]] = np.delete(tables[items[x]], col, 1)
    return items, tables

def entropy(S):
    items = np.unique(S)
    if items.size == 1:
        return 0
    counts = np.zeros(items.shape[0])
    sums = 0
    for x in range(items.shape[0]):
        counts[x] = sum(S == items[x]) / (S.size * 1.0)
    for count in counts:
        sums += -1 * count * math.log(count, 2)
    return sums

def gain_ratio(data, col):
    items, tables = subtables(data, col, delete=False)
    total_size = data.shape[0]
    entropies = np.zeros(items.shape[0])
    intrinsic = np.zeros(items.shape[0])

    for x in range(items.shape[0]):
        ratio = tables[items[x]].shape[0] / (total_size * 1.0)
        entropies[x] = ratio * entropy(tables[items[x]][:, -1])
        intrinsic[x] = ratio * math.log(ratio, 2)

    total_entropy = entropy(data[:, -1])
    iv = -1 * sum(intrinsic)  # split information

    for x in range(entropies.shape[0]):
        total_entropy -= entropies[x]
    # Guard against division by zero when the attribute has a single value
    return total_entropy / iv if iv > 0 else 0.0

def create_node(data, metadata):
    # All rows have the same class: make a leaf
    if (np.unique(data[:, -1])).shape[0] == 1:
        node = Node("")
        node.answer = np.unique(data[:, -1])[0]
        return node

    gains = np.zeros(data.shape[1] - 1)
    for col in range(data.shape[1] - 1):
        gains[col] = gain_ratio(data, col)

    split = np.argmax(gains)
    node = Node(metadata[split])
    metadata = np.delete(metadata, split, 0)
    items, tables = subtables(data, split, delete=True)
    for x in range(items.shape[0]):
        child = create_node(tables[items[x]], metadata)
        node.children.append((items[x], child))
    return node

def empty(size):
    # Indentation used when printing the tree
    return " " * size

def print_tree(node, level):
    if node.answer != "":
        print(empty(level), node.answer)
        return

    print(empty(level), node.attribute)
    for value, n in node.children:
        print(empty(level + 1), value)
        print_tree(n, level + 2)

metadata, traindata = read_data(r"F:\programmes\python_programmes\ML_Practicals\Tennis.csv")
data = np.array(traindata)
node = create_node(data, metadata)
print_tree(node, 0)

Data_loader.py

import csv

def read_data(filename):
    with open(filename, 'r') as csvfile:
        datareader = csv.reader(csvfile, delimiter=',')
        headers = next(datareader)
        metadata = []
        traindata = []

        for name in headers:
            metadata.append(name)
        for row in datareader:
            traindata.append(row)
    return (metadata, traindata)


Tennis.csv

Outlook,Temperature,Humidity,Wind,answer
sunny,hot,high,weak,no
sunny,hot,high,strong,no
overcast,hot,high,weak,yes
rain,mild,high,weak,yes
rain,cool,normal,weak,yes
rain,cool,normal,strong,no
overcast,cool,normal,strong,yes
sunny,mild,high,weak,no
sunny,cool,normal,weak,yes
rain,mild,normal,weak,yes
sunny,mild,normal,strong,yes
overcast,mild,high,strong,yes
overcast,hot,normal,weak,yes
rain,mild,high,strong,no

Output-

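ID3 as usually described picks the attribute with the highest information gain, Gain(S, A) = Entropy(S) - sum over values v of (|S_v| / |S|) * Entropy(S_v); the listing above additionally divides that quantity by the split information, which is the gain ratio used by C4.5. A minimal sketch computing plain information gain for the root split on the Tennis.csv rows (the helper names are illustrative):

```python
import math
from collections import Counter

ROWS = """sunny,hot,high,weak,no
sunny,hot,high,strong,no
overcast,hot,high,weak,yes
rain,mild,high,weak,yes
rain,cool,normal,weak,yes
rain,cool,normal,strong,no
overcast,cool,normal,strong,yes
sunny,mild,high,weak,no
sunny,cool,normal,weak,yes
rain,mild,normal,weak,yes
sunny,mild,normal,strong,yes
overcast,mild,high,strong,yes
overcast,hot,normal,weak,yes
rain,mild,high,strong,no"""
ATTRIBUTES = ["Outlook", "Temperature", "Humidity", "Wind"]
DATA = [line.split(",") for line in ROWS.splitlines()]

def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def information_gain(rows, col):
    """Entropy of the labels minus the weighted entropy after splitting on column col."""
    labels = [r[-1] for r in rows]
    remainder = 0.0
    for value in set(r[col] for r in rows):
        subset = [r[-1] for r in rows if r[col] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return entropy(labels) - remainder

gains = {name: information_gain(DATA, i) for i, name in enumerate(ATTRIBUTES)}
for name, g in gains.items():
    print(f"{name}: {g:.3f}")
print("Root attribute:", max(gains, key=gains.get))
```

With 9 "yes" and 5 "no" rows the label entropy is about 0.940, and Outlook has the largest gain, so it becomes the root.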

Practical-8

Aim- Implement SVR (Support Vector Regression) and SVM (Support Vector Machine).

(a) SVR (Support Vector Regression)

Code-

# Import necessary libraries


import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR
from sklearn.metrics import mean_squared_error

# Sample dataset
X = np.array([[1], [2], [3], [4], [5], [6], [7], [8], [9], [10]])
y = np.array([3, 5, 8, 9, 10, 13, 15, 18, 20, 24])

# Feature Scaling (SVR works better with normalized data)


sc_X = StandardScaler()
sc_y = StandardScaler()
X_scaled = sc_X.fit_transform(X)
y_scaled = sc_y.fit_transform(y.reshape(-1, 1)).flatten()

# Train-Test Split
X_train, X_test, y_train, y_test = train_test_split(X_scaled, y_scaled, test_size=0.2,
random_state=0)

# SVR Model
svr_regressor = SVR(kernel='rbf') # You can use 'linear', 'poly', or 'rbf' for the kernel
svr_regressor.fit(X_train, y_train)

# Predicting on test data


y_pred = svr_regressor.predict(X_test)

# Reverse scaling to get actual values (reshape to 2D before applying inverse_transform)


y_pred_actual = sc_y.inverse_transform(y_pred.reshape(-1, 1))
y_test_actual = sc_y.inverse_transform(y_test.reshape(-1, 1))

# Flatten the results for evaluation


y_pred_actual = y_pred_actual.flatten()
y_test_actual = y_test_actual.flatten()

# Evaluate the model


mse = mean_squared_error(y_test_actual, y_pred_actual)
print(f"Mean Squared Error: {mse}")


# Visualizing the results


# X and y are already on the original scale, so they are plotted directly
plt.scatter(X, y, color='red', label='Original Data')
plt.plot(X, sc_y.inverse_transform(svr_regressor.predict(X_scaled).reshape(-1, 1)), color='blue', label='SVR Fit')
plt.title('Support Vector Regression')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.show()

Output:

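SVR fits a function that keeps most residuals inside a tube of half-width epsilon (scikit-learn's SVR uses epsilon=0.1 by default); residuals inside the tube contribute nothing to the loss. A minimal sketch of this epsilon-insensitive loss on a few illustrative numbers (the full SVR objective also adds a regularization term on the weights, omitted here):

```python
def epsilon_insensitive_loss(y_true, y_pred, epsilon=0.1):
    """Average SVR-style loss: errors inside the epsilon tube cost nothing."""
    losses = [max(0.0, abs(t - p) - epsilon) for t, p in zip(y_true, y_pred)]
    return sum(losses) / len(losses)

y_true = [3.0, 5.0, 8.0]
y_pred = [3.05, 5.5, 7.0]
print(epsilon_insensitive_loss(y_true, y_pred, epsilon=0.1))
```

The first prediction is off by only 0.05, inside the tube, so it adds nothing; the other two contribute 0.4 and 0.9.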

(b) SVM (Support Vector Machine)

Code-
# Import necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
from mlxtend.plotting import plot_decision_regions

# Load the Iris dataset


iris = datasets.load_iris()
X = iris.data[:, :2] # Using only the first two features for visualization purposes
y = iris.target

# Split the dataset into training and testing sets


X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Standardize the dataset (important for SVM)


scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Create an SVM classifier


svm_classifier = SVC()

# Set up hyperparameters for tuning


param_grid = {
'C': [0.1, 1, 10, 100], # Regularization parameter
'kernel': ['linear', 'rbf', 'poly'], # Different types of kernels
'gamma': ['scale', 'auto'] # Kernel coefficient
}

# Perform grid search with cross-validation to find the best parameters


grid_search = GridSearchCV(svm_classifier, param_grid, cv=5)
grid_search.fit(X_train_scaled, y_train)

# Print the best parameters


print("Best parameters found by GridSearchCV:", grid_search.best_params_)

# Train the final model with the best parameters


best_svm = grid_search.best_estimator_

# Make predictions on the test data


y_pred = best_svm.predict(X_test_scaled)

# Evaluate the model


accuracy = accuracy_score(y_test, y_pred)

print("\nAccuracy on test data:", accuracy)

# Confusion matrix and classification report


print("\nConfusion Matrix:")
print(confusion_matrix(y_test, y_pred))

print("\nClassification Report:")
print(classification_report(y_test, y_pred))

# Plot decision regions for the training data (using only the first two features for 2D plot)
plt.figure(figsize=(8, 6))
plot_decision_regions(X_train_scaled, y_train, clf=best_svm, legend=2)
plt.title('SVM Decision Boundaries (Training Set)')
plt.xlabel(iris.feature_names[0])
plt.ylabel(iris.feature_names[1])
plt.show()

# Plot decision regions for the test data


plt.figure(figsize=(8, 6))
plot_decision_regions(X_test_scaled, y_test, clf=best_svm, legend=2)
plt.title('SVM Decision Boundaries (Test Set)')
plt.xlabel(iris.feature_names[0])
plt.ylabel(iris.feature_names[1])
plt.show()

Output:

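With a linear kernel, a fitted SVM classifies a point by the sign of w . x + b and is trained to keep the hinge loss max(0, 1 - y(w . x + b)) small while also keeping the weights small. Using hand-picked, hypothetical weights (not those found by the grid search above), the sketch shows that a point can be classified correctly yet still incur loss when it lies inside the margin:

```python
def decision_function(w, b, x):
    """Signed score w . x + b of a linear SVM."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def hinge_loss(w, b, X, y):
    """Mean hinge loss; labels y must be +1 or -1."""
    return sum(max(0.0, 1 - yi * decision_function(w, b, xi)) for xi, yi in zip(X, y)) / len(X)

# Hypothetical weights and points, chosen by hand for illustration
w, b = [1.0, -1.0], 0.0
X = [[2.0, 0.0], [0.0, 2.0], [0.5, 0.0]]
y = [1, -1, 1]
predictions = [1 if decision_function(w, b, x) >= 0 else -1 for x in X]
print(predictions, hinge_loss(w, b, X, y))
```

All three points are on the correct side, but the third has score 0.5 < 1, so it sits inside the margin and is the only one that contributes to the loss.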

Practical-9

Aim- Python program for recursive binary search.

Code-
# Recursive binary search: returns the index of x in the sorted list arr, or -1
def binary_search(arr, low, high, x):
    # Keep searching only while the range [low, high] is non-empty
    if high >= low:
        mid = (high + low) // 2

        # If the element is present at the middle itself
        if arr[mid] == x:
            return mid

        # If the element is smaller than arr[mid], it can only be present in the left subarray
        elif arr[mid] > x:
            return binary_search(arr, low, mid - 1, x)

        # Else, the element can only be present in the right subarray
        else:
            return binary_search(arr, mid + 1, high, x)

    # Base case: the range is empty, so the element is not present in the array
    else:
        return -1

# Example usage
arr = [2, 3, 4, 10, 40]
x = 10

# Function call
result = binary_search(arr, 0, len(arr) - 1, x)

if result != -1:
    print(f"Element is present at index {result}")
else:
    print("Element is not present in array")

Output:

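One way to check the recursive function is to compare it with Python's standard bisect module. The variant below takes low and high as optional arguments so the first call needs only the list and the target; that signature is an illustrative choice, not the one used in the listing.

```python
import bisect

def binary_search(arr, x, low=0, high=None):
    """Recursive binary search; returns an index of x in sorted arr, or -1."""
    if high is None:
        high = len(arr) - 1
    if low > high:
        return -1  # empty range: x is not present
    mid = (low + high) // 2
    if arr[mid] == x:
        return mid
    if arr[mid] > x:
        return binary_search(arr, x, low, mid - 1)
    return binary_search(arr, x, mid + 1, high)

arr = [2, 3, 4, 10, 40]
for x in arr + [1, 5, 41]:
    i = bisect.bisect_left(arr, x)
    expected = i if i < len(arr) and arr[i] == x else -1
    assert binary_search(arr, x) == expected
print("recursive search agrees with bisect")
```

Each call halves the range, so the recursion depth is about log2(n), well within Python's default recursion limit for any list that fits in memory.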

Practical-10

Aim- Implement the non-parametric Locally Weighted Regression algorithm in
order to fit data points. Select an appropriate data set for your experiment and
draw graphs.

Code-
import numpy as np
import matplotlib.pyplot as plt

# Locally Weighted Regression (LWR) function
def locally_weighted_regression(X, y, tau, x_query):
    """
    Perform Locally Weighted Regression (LWR)
    :param X: Training data features (2D array of shape (m, 1))
    :param y: Training data target values
    :param tau: Bandwidth (smoothing parameter)
    :param x_query: Query point as a 1-element array (e.g. one row of X)
    :return: Predicted value for x_query
    """
    # Add a column of ones to X for the intercept term
    m = X.shape[0]
    X_ = np.column_stack((np.ones(m), X))  # Shape: (m, 2)

    # Create x_query with an intercept term
    x_query_ = np.array([1, x_query[0]])  # Extract the scalar value from the x_query array

    # Compute weights (using a Gaussian kernel)
    weights = np.exp(-np.sum((X - x_query)**2, axis=1) / (2 * tau**2))

    # Create the diagonal weight matrix
    W = np.diag(weights)

    # Solve for theta using the weighted normal equation
    theta = np.linalg.pinv(X_.T @ W @ X_) @ (X_.T @ W @ y)

    # Return the predicted value
    return x_query_ @ theta

# Generate a synthetic dataset
np.random.seed(0)  # reproducible noise
X = np.linspace(0, 10, 100).reshape(-1, 1)  # Reshape X to be a 2D array
y = np.sin(X).flatten() + np.random.normal(0, 0.1, X.shape[0])  # Sine wave with noise

# LWR prediction at every training point
tau = 0.5  # Bandwidth parameter
y_pred = np.array([locally_weighted_regression(X, y, tau, x_query) for x_query in X])

# Plotting the results
plt.figure(figsize=(10, 6))
plt.scatter(X, y, color='blue', label='Data Points')
plt.plot(X, y_pred, color='red', label='LWR Fit', linewidth=2)
plt.xlabel('X')
plt.ylabel('y')
plt.title(f'Locally Weighted Regression (tau={tau})')
plt.legend()
plt.show()

Output:

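For each query point x0, LWR weights the training points by w_i = exp(-(x_i - x0)^2 / (2 tau^2)) and solves the weighted least-squares problem theta = (X^T W X)^-1 X^T W y. With a single input feature this is a 2x2 linear system, so it can be solved in closed form without numpy. A minimal pure-Python sketch (lwr_predict is an illustrative name):

```python
import math

def lwr_predict(xs, ys, x0, tau):
    """Locally weighted linear regression prediction at x0 (one input feature)."""
    weights = [math.exp(-(x - x0) ** 2 / (2 * tau ** 2)) for x in xs]
    # Entries of X^T W X and X^T W y, where each row of X is [1, x]
    s0 = sum(weights)
    s1 = sum(w * x for w, x in zip(weights, xs))
    s2 = sum(w * x * x for w, x in zip(weights, xs))
    t0 = sum(w * y for w, y in zip(weights, ys))
    t1 = sum(w * x * y for w, x, y in zip(weights, xs, ys))
    det = s0 * s2 - s1 * s1
    intercept = (s2 * t0 - s1 * t1) / det  # Cramer's rule for the 2x2 system
    slope = (s0 * t1 - s1 * t0) / det
    return intercept + slope * x0

xs = [i / 10 for i in range(101)]  # 0.0 .. 10.0
ys = [math.sin(x) for x in xs]
print(round(lwr_predict(xs, ys, 5.0, tau=0.5), 4), round(math.sin(5.0), 4))
```

A smaller tau follows the curve more closely, while a larger tau averages over a wider window and flattens peaks and troughs.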

Practical-11

Aim- Implement python program to demonstrate KNN algorithm.

Code-
import numpy as np
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
import matplotlib.pyplot as plt
import seaborn as sns

# Load the Wine dataset


data = load_wine()
X = data.data
y = data.target

# Split the data into training and test sets


X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Standardize the features


scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Initialize the KNN classifier


k=5
knn = KNeighborsClassifier(n_neighbors=k)

# Fit the model


knn.fit(X_train, y_train)

# Make predictions
y_pred = knn.predict(X_test)

# Calculate accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy of KNN with k={k}: {accuracy:.2f}')

# Print confusion matrix and classification report


conf_matrix = confusion_matrix(y_test, y_pred)
class_report = classification_report(y_test, y_pred, target_names=data.target_names)
print("Confusion Matrix:\n", conf_matrix)
print("\nClassification Report:\n", class_report)

# Heatmap of the confusion matrix


plt.figure(figsize=(8, 6))
sns.heatmap(conf_matrix, annot=True, fmt='d', cmap='Blues', xticklabels=data.target_names,
            yticklabels=data.target_names)
plt.xlabel('Predicted Label')
plt.ylabel('True Label')
plt.title('Confusion Matrix')
plt.show()

Output:


Practical-12

Aim- Write a program to implement the k-Nearest Neighbour algorithm to classify
the iris data set. Print both correct and wrong predictions. Python ML library
classes can be used for this problem.

K-Nearest-Neighbour Algorithm:
1. Load the data.
2. Initialize the value of k.
3. To get the predicted class of a test point, iterate over all training data points:
   3.1 Calculate the distance between the test point and each row of the training data. Here we use
       Euclidean distance as the distance metric, since it is the most popular choice; other metrics
       such as Chebyshev or cosine distance can also be used.
   3.2 Sort the calculated distances in ascending order.
   3.3 Take the top k rows from the sorted array.
   3.4 Get the labels of these k rows.
   3.5 Return the predicted class:
       If regression, return the mean of the k labels.
       If classification, return the mode (most frequent value) of the k labels.

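The steps above can be sketched without any library. knn_predict is an illustrative helper, and the tiny 2-D dataset is made up (it is not the iris data); the loop labels each test prediction as correct or wrong, in the spirit of the aim.

```python
import math
from collections import Counter

def euclidean(a, b):
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def knn_predict(train_X, train_y, query, k=3, regression=False):
    """Predict by majority vote (classification) or mean (regression) of the k nearest rows."""
    # Distances to every training row, sorted ascending; keep the k closest
    nearest = sorted(zip((euclidean(x, query) for x in train_X), train_y))[:k]
    labels = [label for _, label in nearest]
    if regression:
        return sum(labels) / k
    return Counter(labels).most_common(1)[0][0]

# Tiny hypothetical 2-D dataset (not the iris data)
train_X = [[1.0, 1.0], [1.2, 0.8], [0.8, 1.1], [5.0, 5.0], [5.2, 4.8], [4.9, 5.3]]
train_y = ["A", "A", "A", "B", "B", "B"]
test_X = [[1.1, 1.0], [4.8, 5.1], [3.5, 3.5]]
test_y = ["A", "B", "A"]

for x, actual in zip(test_X, test_y):
    predicted = knn_predict(train_X, train_y, x, k=3)
    status = "correct" if predicted == actual else "wrong"
    print(f"{x}: actual={actual}, predicted={predicted} ({status})")
```

The third test point is closer to all three "B" rows, so it is reported as a wrong prediction.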
Confusion matrix:
Note: Class 1 = Positive, Class 2 = Negative.

• Positive (P): Observation is positive (for example: is an apple).
• Negative (N): Observation is not positive (for example: is not an apple).
• True Positive (TP): Observation is positive, and is predicted to be positive.
• False Negative (FN): Observation is positive, but is predicted negative. (Also known as a "Type II error.")
• True Negative (TN): Observation is negative, and is predicted to be negative.
• False Positive (FP): Observation is negative, but is predicted positive. (Also known as a "Type I error.")

The worked numbers below use an example confusion matrix with TP = 100, TN = 50, FP = 10 and FN = 5
(165 observations in total: 105 actual yes, 60 actual no, 110 predicted yes).

Accuracy: Overall, how often is the classifier correct?
(TP+TN)/total = (100+50)/165 = 0.91
Misclassification Rate: Overall, how often is it wrong?
(FP+FN)/total = (10+5)/165 = 0.09
Equivalent to 1 minus Accuracy; also known as the "Error Rate".
True Positive Rate: When it's actually yes, how often does it predict yes?
TP/actual yes = 100/105 = 0.95
Also known as "Sensitivity" or "Recall".
False Positive Rate: When it's actually no, how often does it predict yes?
FP/actual no = 10/60 = 0.17
True Negative Rate: When it's actually no, how often does it predict no?
TN/actual no = 50/60 = 0.83
Equivalent to 1 minus False Positive Rate; also known as "Specificity".
Precision: When it predicts yes, how often is it correct?
TP/predicted yes = 100/110 = 0.91
Prevalence: How often does the yes condition actually occur in our sample?
actual yes/total = 105/165 = 0.64

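The formulas above can be checked with a short function; the counts TP = 100, TN = 50, FP = 10, FN = 5 are the ones implied by the worked arithmetic, and classification_metrics is an illustrative name.

```python
def classification_metrics(tp, tn, fp, fn):
    """Rates derived from the four cells of a binary confusion matrix."""
    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total,
        "misclassification_rate": (fp + fn) / total,
        "true_positive_rate": tp / (tp + fn),   # recall / sensitivity
        "false_positive_rate": fp / (fp + tn),
        "true_negative_rate": tn / (tn + fp),   # specificity
        "precision": tp / (tp + fp),
        "prevalence": (tp + fn) / total,
    }

metrics = classification_metrics(tp=100, tn=50, fp=10, fn=5)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```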
Source Code:
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import confusion_matrix
from sklearn.metrics import accuracy_score
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
import pandas as pd

dataset = pd.read_csv("iris.csv")
# Assumes the feature columns come first and the species label is the last column
X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, -1].values
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0, test_size=0.25)

classifier = KNeighborsClassifier(n_neighbors=8, metric='euclidean')
classifier.fit(X_train, y_train)

# Predict the test results
y_pred = classifier.predict(X_test)

cm = confusion_matrix(y_test, y_pred)
print('Confusion matrix is as follows\n', cm)
print('Accuracy Metrics')
print(classification_report(y_test, y_pred))
print(" correct prediction", accuracy_score(y_test, y_pred))
print(" wrong prediction", (1 - accuracy_score(y_test, y_pred)))


Output:
Confusion matrix is as follows
 [[13  0  0]
 [ 0 15  1]
 [ 0  0  9]]

Accuracy Metrics
                 precision    recall  f1-score   support

    Iris-setosa       1.00      1.00      1.00        13
Iris-versicolor       1.00      0.94      0.97        16
 Iris-virginica       0.90      1.00      0.95         9

    avg / total       0.98      0.97      0.97        38

 correct prediction 0.9736842105263158
 wrong prediction 0.02631578947368418


Practical-13

Aim- Write a program to implement the naïve Bayesian classifier for a sample
training dataset stored as a .CSV file. Compute the accuracy of the
classifier, considering a few test data sets.

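The core of Gaussian naive Bayes can be sketched compactly: estimate a prior and a per-attribute mean and standard deviation for each class, then pick the class with the largest log prior plus summed log densities. Unlike calculateclassprobabilities in the listing that follows, which multiplies only the attribute densities, this sketch includes the class prior and works with logarithms to reduce the risk of numerical underflow. The dataset and helper names are hypothetical.

```python
import math

def gaussian_pdf(x, mean, stdev):
    """Normal density, used for each continuous attribute."""
    return math.exp(-((x - mean) ** 2) / (2 * stdev ** 2)) / (math.sqrt(2 * math.pi) * stdev)

def fit(rows):
    """Per-class prior and (mean, sample stdev) of each attribute; the label is the last column."""
    by_class = {}
    for row in rows:
        by_class.setdefault(row[-1], []).append(row[:-1])
    model = {}
    for label, samples in by_class.items():
        stats = []
        for column in zip(*samples):
            mean = sum(column) / len(column)
            var = sum((v - mean) ** 2 for v in column) / (len(column) - 1)
            stats.append((mean, math.sqrt(var)))
        model[label] = (len(samples) / len(rows), stats)
    return model

def predict(model, x):
    """Class maximizing log P(class) plus the summed log-densities of the attributes."""
    def score(label):
        prior, stats = model[label]
        return math.log(prior) + sum(math.log(gaussian_pdf(v, m, s)) for v, (m, s) in zip(x, stats))
    return max(model, key=score)

# Tiny hypothetical dataset: two numeric attributes, class label 0 or 1
train = [[1.0, 2.1, 0], [1.2, 1.9, 0], [0.9, 2.0, 0],
         [3.0, 4.2, 1], [3.2, 3.9, 1], [2.9, 4.1, 1]]
test = [[1.1, 2.0, 0], [3.1, 4.0, 1], [2.0, 3.0, 1]]

model = fit(train)
correct = sum(predict(model, row[:-1]) == row[-1] for row in test)
print(f"Accuracy: {correct / len(test) * 100:.1f}%")
```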
Code-
import csv
import random
import math

def loadcsv(filename):
    with open(filename, "r") as f:
        dataset = list(csv.reader(f))
    for i in range(len(dataset)):
        # converting strings into numbers for processing
        dataset[i] = [float(x) for x in dataset[i]]
    return dataset

def splitdataset(dataset, splitratio):
    # e.g. splitratio = 0.67 gives a 67% training set
    trainsize = int(len(dataset) * splitratio)
    trainset = []
    copy = list(dataset)
    while len(trainset) < trainsize:
        # pick a random remaining row and move it into the training set
        index = random.randrange(len(copy))
        trainset.append(copy.pop(index))
    return [trainset, copy]

def separatebyclass(dataset):
    # dictionary mapping each class value (e.g. 1 and 0) to the
    # instances belonging to that class
    separated = {}
    for i in range(len(dataset)):
        vector = dataset[i]
        if vector[-1] not in separated:
            separated[vector[-1]] = []
        separated[vector[-1]].append(vector)
    return separated

def mean(numbers):
    return sum(numbers) / float(len(numbers))

def stdev(numbers):
    # sample standard deviation (divides by n - 1)
    avg = mean(numbers)
    variance = sum([pow(x - avg, 2) for x in numbers]) / float(len(numbers) - 1)
    return math.sqrt(variance)

def summarize(dataset):
    # (mean, stdev) for every column of the dataset
    summaries = [(mean(attribute), stdev(attribute)) for attribute in zip(*dataset)]
    del summaries[-1]  # drop the summary of the label column
    return summaries

def summarizebyclass(dataset):
    separated = separatebyclass(dataset)
    # summaries maps each class value to a list of (mean, stdev) tuples, one per attribute
    summaries = {}
    for classvalue, instances in separated.items():
        summaries[classvalue] = summarize(instances)
    return summaries

def calculateprobability(x, mean, stdev):
    # Gaussian (normal) probability density of x
    exponent = math.exp(-(math.pow(x - mean, 2) / (2 * math.pow(stdev, 2))))
    return (1 / (math.sqrt(2 * math.pi) * stdev)) * exponent

def calculateclassprobabilities(summaries, inputvector):
    # probabilities holds the likelihood of the test vector under each class
    probabilities = {}
    for classvalue, classsummaries in summaries.items():
        probabilities[classvalue] = 1
        for i in range(len(classsummaries)):
            mu, sd = classsummaries[i]  # mean and stdev of attribute i for this class
            x = inputvector[i]          # attribute i of the test vector
            # multiply the per-attribute normal densities (naive independence assumption)
            probabilities[classvalue] *= calculateprobability(x, mu, sd)
    return probabilities

def predict(summaries, inputvector):
    probabilities = calculateclassprobabilities(summaries, inputvector)
    bestLabel, bestProb = None, -1
    # assign the class with the highest probability
    for classvalue, probability in probabilities.items():
        if bestLabel is None or probability > bestProb:
            bestProb = probability
            bestLabel = classvalue
    return bestLabel

def getpredictions(summaries, testset):
    predictions = []
    for i in range(len(testset)):
        result = predict(summaries, testset[i])
        predictions.append(result)
    return predictions

def getaccuracy(testset, predictions):
    correct = 0
    for i in range(len(testset)):
        if testset[i][-1] == predictions[i]:
            correct += 1
    return (correct / float(len(testset))) * 100.0

def main():
    filename = r'F:\programmes\python_programmes\ML_Practicals\dataset.csv'
    splitratio = 0.67
    dataset = loadcsv(filename)

    trainingset, testset = splitdataset(dataset, splitratio)
    print('Split {0} rows into train={1} and test={2} rows'.format(
        len(dataset), len(trainingset), len(testset)))

    # prepare model
    summaries = summarizebyclass(trainingset)
    # test model: predict the test rows from the training summaries
    predictions = getpredictions(summaries, testset)
    accuracy = getaccuracy(testset, predictions)
    print('Accuracy of the classifier is : {0}%'.format(accuracy))

if __name__ == '__main__':
    main()

Output:


Practical-14

Aim- Write a program to construct a Bayesian network considering medical data.
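
The listing in this practical trains a k-nearest-neighbours classifier on noisy Iris data rather than building a Bayesian network. As a hedged illustration of what the aim describes, here is a minimal three-node network (Smoker -> HeartDisease -> ChestPain) queried by exact enumeration. The variables and all probability values are hypothetical, chosen only for illustration, and are not taken from any medical dataset. The joint probability factorises along the edges as P(S) P(H | S) P(C | H). For example, P(HeartDisease=1 | ChestPain=1) = 0.076 / (0.076 + 0.0905) ≈ 0.4565.

```python
import itertools

# Hypothetical CPTs for Smoker -> HeartDisease -> ChestPain (illustrative numbers only)
P_SMOKER = 0.3                               # P(Smoker=1)
P_HEART_GIVEN_SMOKER = {1: 0.2, 0: 0.05}     # P(HeartDisease=1 | Smoker)
P_PAIN_GIVEN_HEART = {1: 0.8, 0: 0.1}        # P(ChestPain=1 | HeartDisease)

def bernoulli(p_one, value):
    """P(X=value) for a binary variable with P(X=1) = p_one."""
    return p_one if value == 1 else 1 - p_one

def joint(s, h, c):
    """P(Smoker=s, HeartDisease=h, ChestPain=c), factorised along the network's edges."""
    return (bernoulli(P_SMOKER, s)
            * bernoulli(P_HEART_GIVEN_SMOKER[s], h)
            * bernoulli(P_PAIN_GIVEN_HEART[h], c))

def query_heart(evidence):
    """P(HeartDisease=1 | evidence) by summing the joint over all consistent assignments."""
    totals = {0: 0.0, 1: 0.0}
    for s, h, c in itertools.product([0, 1], repeat=3):
        assignment = {"smoker": s, "heart": h, "pain": c}
        if all(assignment[var] == val for var, val in evidence.items()):
            totals[h] += joint(s, h, c)
    return totals[1] / (totals[0] + totals[1])

print(round(query_heart({"pain": 1}), 4))               # 0.4565
print(round(query_heart({"pain": 1, "smoker": 1}), 4))  # 0.6667
```

Enumeration is exponential in the number of variables. It is only practical for small networks like this one, but it makes the factorisation explicit.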

Code-
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score
import matplotlib.pyplot as plt

# Load the Iris dataset
iris = load_iris()
X = iris.data  # Features
y = iris.target  # Labels

# Add Gaussian noise (mean 0, std 0.5) to every feature
np.random.seed(42)
noise = np.random.normal(0, 0.5, X.shape)
X_noisy = X + noise

# Split the noisy data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X_noisy, y, test_size=0.3, random_state=42)

# Initialize the KNN classifier with k=3
knn = KNeighborsClassifier(n_neighbors=3)

# Fit the model to the noisy training data
knn.fit(X_train, y_train)

# Make predictions on the noisy test set
y_pred = knn.predict(X_test)

# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy: {accuracy:.2f}')

# Optional: visualize the decision boundaries in 2D
# For simplicity, only the first two features are used for visualization
X_train_2d = X_train[:, :2]
X_test_2d = X_test[:, :2]

# Fit the model again using the 2D data
knn.fit(X_train_2d, y_train)
y_pred_2d = knn.predict(X_test_2d)

# Plot the decision boundaries
h = .02  # step size in the mesh
x_min, x_max = X_train_2d[:, 0].min() - 1, X_train_2d[:, 0].max() + 1
y_min, y_max = X_train_2d[:, 1].min() - 1, X_train_2d[:, 1].max() + 1
xx, yy = np.meshgrid(np.arange(x_min, x_max, h), np.arange(y_min, y_max, h))

Z = knn.predict(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)

plt.contourf(xx, yy, Z, alpha=0.8)
plt.scatter(X_train_2d[:, 0], X_train_2d[:, 1], c=y_train, edgecolor='k', marker='o')  # training points
plt.scatter(X_test_2d[:, 0], X_test_2d[:, 1], c=y_test, marker='o')  # test points, drawn without a black edge
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.title('KNN Decision Boundaries with Noisy Data')
plt.show()

Output:


Practical-15

Aim- Assuming a set of documents that need to be classified, use the naïve
Bayesian Classifier model to perform this task. Built-in Java classes/API can
be used to write the program. Calculate the accuracy, precision, and recall for
your dataset.
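
For each class, a multinomial naïve Bayes model estimates a smoothed probability for every vocabulary term. It then picks the class with the highest log prior plus summed log likelihoods of the document's words. Here is a minimal from-scratch sketch on raw word counts, not TF-IDF, using add-alpha (Laplace) smoothing. The four-document corpus and the function names are made up for illustration.

```python
import math
from collections import Counter

def train_multinomial_nb(docs, labels, alpha=1.0):
    """Log priors and add-alpha smoothed log word probabilities per class."""
    vocab = {w for d in docs for w in d.lower().split()}
    log_prior, log_lik = {}, {}
    for c in set(labels):
        words = [w for d, lab in zip(docs, labels) if lab == c for w in d.lower().split()]
        counts = Counter(words)
        denom = len(words) + alpha * len(vocab)
        log_prior[c] = math.log(labels.count(c) / len(labels))
        log_lik[c] = {w: math.log((counts[w] + alpha) / denom) for w in vocab}
    return log_prior, log_lik, vocab

def predict_nb(model, doc):
    log_prior, log_lik, vocab = model
    words = [w for w in doc.lower().split() if w in vocab]  # unseen words are ignored
    scores = {c: log_prior[c] + sum(log_lik[c][w] for w in words) for c in log_prior}
    return max(scores, key=scores.get)

# Hypothetical four-document corpus
docs = ["python code compiles", "java code runs", "rain falls today", "sun shines today"]
labels = ["tech", "tech", "weather", "weather"]
model = train_multinomial_nb(docs, labels)
print(predict_nb(model, "python java"), predict_nb(model, "rain today"))  # tech weather
```

The `alpha` parameter here plays the same role as `MultinomialNB(alpha=...)`. It keeps a word that never appeared in a class from forcing that class's probability to zero.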

Code-
# Import necessary libraries
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score, precision_score, recall_score, classification_report
import numpy as np

# Step 1: Create balanced training data (four "tech" and four "weather" documents)
train_docs = [
    "Python is a popular programming language for web development",
    "JavaScript enables dynamic content on web pages",
    "Rain showers are expected this weekend",
    "The sun will shine brightly tomorrow",
    "Java is used extensively for enterprise software",
    "Thunderstorms might disrupt our travel plans",
    "Cloudy skies are common in the monsoon season",
    "Developers love Python and JavaScript equally",
]

train_labels = ["tech", "tech", "weather", "weather", "tech", "weather", "weather", "tech"]

# Step 2: Provide test documents
test_docs = [
    "Python development is rewarding",  # Clearly tech
    "It rained all day today",  # Clearly weather
    "Bright sunny days are ideal for debugging",  # Ambiguous but likely tech
]

test_labels = ["tech", "weather", "tech"]

# Step 3: Convert text data into TF-IDF feature vectors
# (English stop words removed, unigrams and bigrams, at most 100 features)
vectorizer = TfidfVectorizer(stop_words='english', max_features=100, ngram_range=(1, 2))
X_train = vectorizer.fit_transform(train_docs)
X_test = vectorizer.transform(test_docs)

# Step 4: Train the Naive Bayes model; alpha is the additive smoothing parameter
model = MultinomialNB(alpha=0.1)  # smaller alpha means less smoothing
model.fit(X_train, train_labels)

# Step 5: Predict the class of test documents
predicted_labels = model.predict(X_test)

# Step 6: Calculate metrics (weighted averages over both classes)
accuracy = accuracy_score(test_labels, predicted_labels)
precision = precision_score(test_labels, predicted_labels, average='weighted', zero_division=0)
recall = recall_score(test_labels, predicted_labels, average='weighted', zero_division=0)

# Display the metrics and results
print(f"Accuracy: {accuracy * 100:.2f}%")
print(f"Precision: {precision:.2f}")
print(f"Recall: {recall:.2f}")
print("\nClassification Report:")
print(classification_report(test_labels, predicted_labels, zero_division=0))

# Optional: view the first 10 features of the vocabulary
# (feature names are in alphabetical order, not ranked by importance)
print("\nFirst 10 Features:")
print(vectorizer.get_feature_names_out()[:10])

Output:


Practical-16

Aim- Implementation of the random forest algorithm.
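
A random forest combines many decision trees. Each tree is trained on a bootstrap sample of the rows and considers only a random subset of features at each split. The sketch below builds that by hand from scikit-learn's `DecisionTreeClassifier` on the Iris data, with `max_features="sqrt"` providing the per-split feature sampling. It uses hard majority voting. Scikit-learn's own `RandomForestClassifier` instead averages the trees' predicted class probabilities, so results can differ slightly. The tree count and seed are arbitrary choices.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

def fit_forest(X, y, n_trees=25, seed=0):
    rng = np.random.default_rng(seed)
    trees = []
    for _ in range(n_trees):
        idx = rng.integers(0, len(X), size=len(X))         # bootstrap: sample rows with replacement
        tree = DecisionTreeClassifier(max_features="sqrt",  # random feature subset at each split
                                      random_state=int(rng.integers(1_000_000)))
        trees.append(tree.fit(X[idx], y[idx]))
    return trees

def predict_forest(trees, X):
    votes = np.stack([tree.predict(X) for tree in trees])  # shape (n_trees, n_samples)
    # majority vote per sample; assumes labels are non-negative integers
    return np.array([np.bincount(column).argmax() for column in votes.T])

X, y = load_iris(return_X_y=True)
forest = fit_forest(X, y)
print("Training accuracy:", (predict_forest(forest, X) == y).mean())
```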

Code-
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split, StratifiedKFold, cross_val_score
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report
from sklearn.preprocessing import LabelEncoder

# Load the dataset
file_path = "C:/Users/Admin/OneDrive/Desktop/7th Sem Practicals/Machine Learning Practicals/data.csv"
data = pd.read_csv(file_path)

# Inspect data types
print(data.dtypes)

# One-hot encode categorical feature columns (all columns except the last)
X = pd.get_dummies(data.iloc[:, :-1])  # Encode features
y = data.iloc[:, -1]  # Target

# Apply Label Encoding to the target if it's categorical
if y.dtype == 'object':
    le = LabelEncoder()
    y = le.fit_transform(y)

# Shuffle the target labels. This deliberately breaks the link between
# features and labels, so accuracy is expected to fall to roughly chance level.
y = np.random.permutation(y)

# Add Gaussian noise (std 0.5) to all features; cast to float first
# because get_dummies may produce boolean columns
X = X.astype(float) + np.random.normal(0, 0.5, X.shape)

# Split the data into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Random Forest with 15 trees, each limited to a depth of 4
clf = RandomForestClassifier(n_estimators=15, max_depth=4, random_state=42)
clf.fit(X_train, y_train)

# Make predictions and evaluate the model
y_pred = clf.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)

print(f"\nAccuracy: {accuracy * 100:.2f}%")
print("\nClassification Report:\n", classification_report(y_test, y_pred))

# Cross-validation with Stratified K-Folds
skf = StratifiedKFold(n_splits=3)
scores = cross_val_score(clf, X, y, cv=skf)
print(f"Cross-Validation Accuracy: {scores.mean() * 100:.2f}%")

Output:


Practical-17

Aim- Write a program for the Apriori algorithm.
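
Apriori relies on the downward-closure property: every subset of a frequent itemset is itself frequent. Any candidate with an infrequent subset can therefore be pruned without counting it. A brute-force counter that checks every possible itemset is a handy cross-check on small examples. The sketch below uses the same six transactions and `min_support = 2` as this practical's example, and `brute_force_frequent` is a hypothetical helper name. Its cost is exponential in the number of distinct items, so it is only meant for verification.

```python
from itertools import combinations

def brute_force_frequent(transactions, min_support):
    """Support count of every itemset (of any size) that meets min_support."""
    items = sorted({item for transaction in transactions for item in transaction})
    result = {}
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            count = sum(1 for transaction in transactions if set(candidate).issubset(transaction))
            if count >= min_support:
                result[frozenset(candidate)] = count
    return result

transactions = [
    ['milk', 'bread', 'cookies'],
    ['milk', 'diaper', 'bread', 'cookies'],
    ['milk', 'diaper', 'bread'],
    ['bread', 'cookies'],
    ['milk', 'diaper', 'cookies'],
    ['milk', 'bread', 'diaper'],
]
frequent = brute_force_frequent(transactions, min_support=2)
for itemset, count in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

Comparing this dictionary with the result of `get_frequent_itemsets` on the same input is a quick correctness check.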

Code-
from itertools import combinations

def get_frequent_itemsets(transactions, min_support):
    # Count the support of every individual item, stored as 1-item frozensets
    support_counts = {}
    for transaction in transactions:
        for item in transaction:
            key = frozenset([item])
            support_counts[key] = support_counts.get(key, 0) + 1

    # Filter out items that do not meet the minimum support
    frequent_itemsets = {itemset: count for itemset, count in support_counts.items()
                         if count >= min_support}
    itemsets = list(frequent_itemsets.keys())

    # Generate larger itemsets, one size (k) at a time
    k = 2
    while itemsets:
        # Join step: combine pairs of frequent (k-1)-itemsets; a set avoids
        # generating (and later double-counting) the same candidate twice
        current_itemsets = set()
        for i in range(len(itemsets)):
            for j in range(i + 1, len(itemsets)):
                new_itemset = itemsets[i] | itemsets[j]
                # Prune step: keep a k-itemset only if all its (k-1)-subsets are frequent
                if len(new_itemset) == k and all(
                        frozenset(subset) in frequent_itemsets
                        for subset in combinations(new_itemset, k - 1)):
                    current_itemsets.add(new_itemset)

        # Count support for the candidate itemsets
        support_counts = {}
        for transaction in transactions:
            for itemset in current_itemsets:
                if itemset.issubset(transaction):
                    support_counts[itemset] = support_counts.get(itemset, 0) + 1

        # Keep the candidates that meet the minimum support
        itemsets = [itemset for itemset, count in support_counts.items() if count >= min_support]

        # Add frequent itemsets to the result
        frequent_itemsets.update({itemset: support_counts[itemset] for itemset in itemsets})
        k += 1

    return frequent_itemsets

# Example usage
if __name__ == "__main__":
    transactions = [
        ['milk', 'bread', 'cookies'],
        ['milk', 'diaper', 'bread', 'cookies'],
        ['milk', 'diaper', 'bread'],
        ['bread', 'cookies'],
        ['milk', 'diaper', 'cookies'],
        ['milk', 'bread', 'diaper']
    ]

    min_support = 2
    frequent_itemsets = get_frequent_itemsets(transactions, min_support)

    print("Frequent Itemsets:")
    for itemset, count in frequent_itemsets.items():
        print(f"{set(itemset)}: {count}")

Output:


Practical-18

Aim- Implement the non-parametric Locally Weighted Regression algorithm in
order to fit data points. Select an appropriate data set for your experiment and
draw graphs.
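
Each prediction solves a weighted least-squares problem, theta = (X^T W X)^(-1) X^T W y. Here W is diagonal, with Gaussian weights exp(-(x - x0)^2 / (2 k^2)) centred on the query point x0, and k is the bandwidth. Below is a vectorised sketch that uses plain NumPy arrays instead of `np.matrix` and calls the bandwidth `tau`. It is tested on synthetic sine data because `hotel_bill.csv` is not included. The function name `lwr_predict` is illustrative.

```python
import numpy as np

def lwr_predict(x_query, x, y, tau):
    """Locally weighted linear regression prediction at a single point x_query."""
    X = np.column_stack([np.ones_like(x), x])         # design matrix with a bias column
    w = np.exp(-(x - x_query) ** 2 / (2 * tau ** 2))  # Gaussian kernel weight per training point
    XtW = X.T * w                                     # same as X.T @ diag(w), without building diag(w)
    theta = np.linalg.solve(XtW @ X, XtW @ y)         # weighted normal equations
    return theta[0] + theta[1] * x_query

# Synthetic data: one period of a sine wave
x = np.linspace(0, 2 * np.pi, 200)
y = np.sin(x)
curve = np.array([lwr_predict(q, x, y, tau=0.3) for q in x])
print(np.max(np.abs(curve - y)))
```

Using `np.linalg.solve` instead of explicitly inverting X^T W X is the usual, numerically safer way to solve the normal equations. A smaller `tau` follows the data more closely but is noisier.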

Code-
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np

def kernel(point, xmat, k):
    # Gaussian weight for every training row, based on its distance to `point`
    m, n = np.shape(xmat)
    weights = np.asmatrix(np.eye(m))
    for j in range(m):
        diff = point - xmat[j]
        weights[j, j] = np.exp(diff * diff.T / (-2.0 * k**2)).item()
    return weights

def localWeight(point, xmat, ymat, k):
    # weighted least squares: W = (X^T w X)^-1 X^T w y
    wei = kernel(point, xmat, k)
    W = (xmat.T * (wei * xmat)).I * (xmat.T * (wei * ymat.T))
    return W

def localWeightRegression(xmat, ymat, k):
    m, n = np.shape(xmat)
    ypred = np.zeros(m)
    for i in range(m):
        ypred[i] = (xmat[i] * localWeight(xmat[i], xmat, ymat, k)).item()
    return ypred

# load data points
data = pd.read_csv(r'F:\programmes\python_programmes\ML_Practicals\hotel_bill.csv')
bill = np.array(data.total_bill)
tip = np.array(data.tip)

# prepare the design matrix: a column of ones (bias) next to the bill column
mbill = np.asmatrix(bill)
mtip = np.asmatrix(tip)

m = np.shape(mbill)[1]
one = np.asmatrix(np.ones(m))
X = np.hstack((one.T, mbill.T))

# set the bandwidth k here
ypred = localWeightRegression(X, mtip, 0.5)

# sort by bill so the fitted curve is drawn from left to right
SortIndex = np.argsort(bill)

fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)
ax.scatter(bill, tip, color='green')
ax.plot(bill[SortIndex], ypred[SortIndex], color='red', linewidth=5)
plt.xlabel('Total bill')
plt.ylabel('Tip')
plt.show()

Output-


Practical-19

Aim- Implementing the Bag of Words model in Python.
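
The same build-the-vocabulary-then-count procedure can be written without NLTK. This sketch swaps in a simple regular-expression tokenizer (`[a-z]+` on lowercased text). That is an assumption: it behaves like `word_tokenize` followed by `isalpha()` for these ASCII sentences, but not for text in general. The vocabulary keeps first-appearance order, as in this practical's listing.

```python
import re

def tokenize(sentence):
    """Lowercase and keep runs of ASCII letters only."""
    return re.findall(r"[a-z]+", sentence.lower())

def build_vocab(docs):
    """Map each word to an index, in order of first appearance."""
    vocab = {}
    for doc in docs:
        for word in tokenize(doc):
            vocab.setdefault(word, len(vocab))
    return vocab

def bag_of_words(doc, vocab):
    """Count vector of length len(vocab); out-of-vocabulary words are ignored."""
    vec = [0] * len(vocab)
    for word in tokenize(doc):
        if word in vocab:
            vec[vocab[word]] += 1
    return vec

data = ['She loves pizza, pizza is delicious.', 'She is a good person.', 'good people are the best.']
vocab = build_vocab(data)
print(list(vocab))
print(bag_of_words(data[0], vocab))
```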

Code-
import numpy as np
from nltk.tokenize import word_tokenize
from collections import defaultdict

# word_tokenize needs NLTK's Punkt tokenizer data; download it once with
# nltk.download('punkt') (newer NLTK releases may require 'punkt_tab')

data = ['She loves pizza, pizza is delicious.', 'She is a good person.', 'good people are the best.']

# tokenize each sentence, keep alphabetic tokens only, and build the vocabulary
sentences = []
vocab = []
for sent in data:
    x = word_tokenize(sent)
    sentence = [w.lower() for w in x if w.isalpha()]
    sentences.append(sentence)
    for word in sentence:
        if word not in vocab:
            vocab.append(word)

# map each vocabulary word to its position in the vector
len_vector = len(vocab)
index_word = {}
i = 0
for word in vocab:
    index_word[word] = i
    i += 1

def bag_of_words(sent):
    # count occurrences of each word, then place each count at the word's index
    count_dict = defaultdict(int)
    vec = np.zeros(len_vector)
    for item in sent:
        count_dict[item] += 1
    for key, item in count_dict.items():
        vec[index_word[key]] = item
    return vec

vector = bag_of_words(sentences[0])
print(vector)

Output-


Practical-20

Aim- Implementation of Principal Component Analysis (PCA) algorithm.
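
For a covariance matrix of the form [[a, b], [b, a]], the eigenvalues are a + b and a - b, with eigenvectors along (1, 1)/√2 and (1, -1)/√2. For the covariance used in this practical (a = 3.5, b = -1.8), the population eigenvalues are 5.3 for direction (1, -1)/√2 and 1.7 for direction (1, 1)/√2. The first component should therefore explain about 5.3 / 7.0 ≈ 76% of the variance. The sketch below checks a manual eigen-decomposition against scikit-learn's `PCA`. It uses `np.linalg.eigh`, which is meant for symmetric matrices, and it does not round the covariance. Only the variance ratios are compared, because eigenvector signs can differ between implementations.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
data = rng.multivariate_normal([10, 13], [[3.5, -1.8], [-1.8, 3.5]], size=1000)

# Manual PCA: eigen-decomposition of the sample covariance matrix
centered = data - data.mean(axis=0)
eig_val, eig_vec = np.linalg.eigh(np.cov(centered.T))  # eigh returns eigenvalues in ascending order
order = np.argsort(eig_val)[::-1]                       # re-sort in descending order
eig_val, eig_vec = eig_val[order], eig_vec[:, order]
manual_ratio = eig_val / eig_val.sum()

# scikit-learn's PCA on the raw data (it centers the data internally)
pca = PCA(n_components=2).fit(data)

print("manual :", np.round(manual_ratio, 4))
print("sklearn:", np.round(pca.explained_variance_ratio_, 4))
```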

Code-
import numpy as np
import matplotlib.pyplot as plt
import numpy.random as rnd

# mean vector and covariance matrix of a 2-D Gaussian
mu = np.array([10, 13])
sigma = np.array([[3.5, -1.8], [-1.8, 3.5]])

print("Mu ", mu.shape)
print("Sigma ", sigma.shape)

# draw 1000 samples
org_data = rnd.multivariate_normal(mu, sigma, size=(1000))
print("Data shape ", org_data.shape)

# center the data
mean = np.mean(org_data, axis=0)
print("Mean ", mean.shape)
mean_data = org_data - mean
print("Data after subtracting mean ", mean_data.shape, "\n")

# covariance matrix of the centered data (np.cov expects features as rows, hence .T)
cov = np.cov(mean_data.T)
cov = np.round(cov, 2)
print("Covariance matrix ", cov.shape, "\n")

# eigen decomposition: the eigenvectors are the principal directions
eig_val, eig_vec = np.linalg.eig(cov)
print("Eigen vectors ", eig_vec)
print("Eigen values ", eig_val, "\n")

# sort eigenvalues (and their eigenvectors) in descending order
indices = np.arange(0, len(eig_val), 1)
indices = ([x for _, x in sorted(zip(eig_val, indices))])[::-1]
eig_val = eig_val[indices]
eig_vec = eig_vec[:, indices]
print("Sorted Eigen vectors ", eig_vec)
print("Sorted Eigen values ", eig_val, "\n")

# fraction of the total variance explained by each component
sum_eig_val = np.sum(eig_val)
explained_variance = eig_val / sum_eig_val
print(explained_variance)
cumulative_variance = np.cumsum(explained_variance)
print(cumulative_variance)

# project the centered data onto the principal components
pca_data = np.dot(mean_data, eig_vec)
print("Transformed data ", pca_data.shape)

# cumulative explained variance against the number of components
components = np.arange(1, len(eig_val) + 1)
plt.plot(components, cumulative_variance, marker='o')
plt.xlabel('Number of components')
plt.ylabel('Cumulative explained variance')
plt.show()

Output:-


Practical-21

Aim- Implementation of Independent Component Analysis (ICA).
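
ICA recovers the sources only up to permutation, sign and scale. The recovered signals in the plot will therefore not line up one-to-one with the originals. One way to check the separation is to correlate every original source with every recovered component. This sketch repeats the practical's setup with two assumptions: it uses 2000 samples instead of 200 to make the estimate more stable, and it fixes `random_state` for reproducibility.

```python
import numpy as np
from sklearn.decomposition import FastICA

np.random.seed(0)
time = np.linspace(0, 8, 2000)
S = np.c_[np.sin(2 * time),            # sine wave
          np.sign(np.sin(3 * time)),   # square wave
          np.random.rand(2000)]        # uniform random noise
A = np.array([[1, 1, 0.5], [0.5, 1, 1], [1, 0.5, 1]])  # same mixing matrix as the practical
X = S.dot(A.T)

S_ = FastICA(n_components=3, random_state=0).fit_transform(X)

# |correlation| between each original source (rows) and each recovered component (columns)
corr = np.abs(np.corrcoef(S.T, S_.T)[:3, 3:])
best_match = corr.argmax(axis=1)  # recovered component that best matches each source
print(np.round(corr.max(axis=1), 3), best_match)
```

A best-match correlation close to 1 for every source, with each source matched to a different component, indicates a good separation.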

Code-
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import FastICA

# Step 1: Generate sample data
# Set random seed for reproducibility
np.random.seed(0)

# Generate Sine and Square signals
time = np.linspace(0, 8, 200)
s1 = np.sin(2 * time)  # Sine wave
s2 = np.sign(np.sin(3 * time))  # Square wave
s3 = np.random.rand(200)  # Random noise

# Stack signals together
S = np.c_[s1, s2, s3]

# Mix the signals
A = np.array([[1, 1, 0.5], [0.5, 1, 1], [1, 0.5, 1]])  # Mixing matrix
X = S.dot(A.T)  # Mixed signals

# Step 2: Apply ICA
ica = FastICA(n_components=3)  # Number of components to extract
S_ = ica.fit_transform(X)  # Independent components
A_ = ica.mixing_  # Estimated mixing matrix

# Step 3: Plot the results
plt.figure(figsize=(12, 8))

# Original signals
plt.subplot(4, 1, 1)
plt.title('Original Signals')
plt.plot(S)

# Mixed signals
plt.subplot(4, 1, 2)
plt.title('Mixed Signals')
plt.plot(X)

# Recovered signals
plt.subplot(4, 1, 3)
plt.title('Independent Components (Recovered Signals)')
plt.plot(S_)

# Mixing matrix
plt.subplot(4, 1, 4)
plt.title('Estimated Mixing Matrix')
plt.imshow(A_, aspect='auto', cmap='viridis')
plt.colorbar()

plt.tight_layout()
plt.show()

Output:


Practical-22

Aim- Implement a Python program to demonstrate the K-Means and EM algorithms for
machine learning.
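
Cluster labels from `KMeans` and `GaussianMixture` are arbitrary integers. As a result, `accuracy_score(y, model.labels_)` in this practical can be low even for a perfect clustering whose IDs are merely permuted. The sketch below first matches clusters to classes with the Hungarian algorithm (`scipy.optimize.linear_sum_assignment`) and then scores the matched labels. `matched_accuracy` is a hypothetical helper name. A permutation-invariant alternative that needs no matching is `sklearn.metrics.adjusted_rand_score`.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.metrics import confusion_matrix

def matched_accuracy(y_true, cluster_ids):
    """Accuracy after relabelling clusters with the best one-to-one cluster-to-class mapping."""
    cm = confusion_matrix(y_true, cluster_ids)  # rows: true class, columns: cluster id
    rows, cols = linear_sum_assignment(-cm)     # negate to maximise the matched counts
    return cm[rows, cols].sum() / cm.sum()

# A clustering that is perfect but numbered differently from the classes
print(matched_accuracy([0, 0, 1, 1, 2, 2], [2, 2, 0, 0, 1, 1]))  # 1.0
```

In the practical, `matched_accuracy(y, model.labels_)` and `matched_accuracy(y, y_cluster_gmm)` could then be reported alongside the raw scores.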

Code-
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture
import sklearn.metrics as metrics
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

names = ['Sepal_Length', 'Sepal_Width', 'Petal_Length', 'Petal_Width', 'Class']

dataset = pd.read_csv(r"F:\programmes\python_programmes\ML_Practicals\iris_dataset.csv", names=names)

X = dataset.iloc[:, :-1]

# encode the class names as integers
label = {'Iris-setosa': 0, 'Iris-versicolor': 1, 'Iris-virginica': 2}
y = [label[c] for c in dataset.iloc[:, -1]]

plt.figure(figsize=(14, 7))
colormap = np.array(['red', 'lime', 'black'])

# REAL PLOT
plt.subplot(1, 3, 1)
plt.title('Real')
plt.scatter(X.Petal_Length, X.Petal_Width, c=colormap[y])

# K-PLOT
model = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
plt.subplot(1, 3, 2)
plt.title('KMeans')
plt.scatter(X.Petal_Length, X.Petal_Width, c=colormap[model.labels_])

# Cluster IDs are arbitrary, so these scores are only meaningful when the
# cluster numbering happens to match the class encoding above
print('The accuracy score of K-Mean: ', metrics.accuracy_score(y, model.labels_))
print('The Confusion matrix of K-Mean:\n', metrics.confusion_matrix(y, model.labels_))

# GMM PLOT
gmm = GaussianMixture(n_components=3, random_state=0).fit(X)
y_cluster_gmm = gmm.predict(X)
plt.subplot(1, 3, 3)
plt.title('GMM Classification')
plt.scatter(X.Petal_Length, X.Petal_Width, c=colormap[y_cluster_gmm])

print('The accuracy score of EM: ', metrics.accuracy_score(y, y_cluster_gmm))
print('The Confusion matrix of EM:\n ', metrics.confusion_matrix(y, y_cluster_gmm))

plt.show()

Output-


Practical-23

Aim- Apply the EM algorithm to cluster a set of data stored in a .CSV file. Use the
same data set for clustering using the k-Means algorithm. Compare the results of
these two algorithms and comment on the quality of clustering. You can add
Java/Python ML library classes/API in the program.
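
When the CSV has more than two columns, the listing in this practical skips the plot. The sketch below projects the standardized data onto its first two principal components, so both clusterings can still be drawn with `plt.scatter(X_2d[:, 0], X_2d[:, 1], c=...)`. It also reports how closely the two clusterings agree, using the adjusted Rand index. Synthetic data from `make_blobs` (300 rows, 5 features, 3 groups) stands in for the student-performance file, which is an assumption for illustration.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture
from sklearn.metrics import silhouette_score, adjusted_rand_score

# Synthetic stand-in for the CSV: 300 rows, 5 numeric features, 3 true groups
X, _ = make_blobs(n_samples=300, n_features=5, centers=3, random_state=42)
X_scaled = StandardScaler().fit_transform(X)

km_labels = KMeans(n_clusters=3, n_init=10, random_state=42).fit_predict(X_scaled)
gmm_labels = GaussianMixture(n_components=3, random_state=42).fit_predict(X_scaled)

# 2-D projection for plotting when there are more than two features
X_2d = PCA(n_components=2).fit_transform(X_scaled)

km_sil = silhouette_score(X_scaled, km_labels)
gmm_sil = silhouette_score(X_scaled, gmm_labels)
agreement = adjusted_rand_score(km_labels, gmm_labels)  # 1.0 = identical partitions
print(f"k-Means silhouette: {km_sil:.3f}  GMM silhouette: {gmm_sil:.3f}  ARI: {agreement:.3f}")
```

The silhouette score ranges from -1 to 1, and higher means tighter, better-separated clusters. An adjusted Rand index near 1 means k-Means and EM found essentially the same partition, whatever their cluster numbering.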

Code-
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Step 1: Load the dataset
# Replace file_path with the path to your own CSV file
file_path = "C:/Users/Admin/OneDrive/Desktop/7th Sem Practicals/Machine Learning Practicals/Student_performance_data _.csv"
data = pd.read_csv(file_path)

# Step 2: Preprocess the data
# Standardize every column to zero mean and unit variance (assumes all columns are numeric)
scaler = StandardScaler()
data_scaled = scaler.fit_transform(data)

# Step 3: Apply k-Means Clustering
kmeans = KMeans(n_clusters=3, random_state=42)  # Change n_clusters as needed
kmeans_labels = kmeans.fit_predict(data_scaled)

# Step 4: Apply Gaussian Mixture Model (GMM) Clustering, which is fitted with the EM algorithm
gmm = GaussianMixture(n_components=3, random_state=42)  # Change n_components as needed
gmm_labels = gmm.fit_predict(data_scaled)

# Step 5: Evaluate clustering performance (silhouette ranges from -1 to 1; higher is better)
kmeans_silhouette = silhouette_score(data_scaled, kmeans_labels)
gmm_silhouette = silhouette_score(data_scaled, gmm_labels)

print(f'K-Means Silhouette Score: {kmeans_silhouette:.4f}')
print(f'GMM Silhouette Score: {gmm_silhouette:.4f}')

# Step 6: Visualize Clustering Results (only for 2D data)
if data.shape[1] == 2:
    plt.figure(figsize=(12, 6))

    # k-Means clustering results
    plt.subplot(1, 2, 1)
    plt.scatter(data_scaled[:, 0], data_scaled[:, 1], c=kmeans_labels, cmap='viridis', marker='o')
    plt.title('k-Means Clustering')
    plt.xlabel('Feature 1')
    plt.ylabel('Feature 2')

    # GMM clustering results
    plt.subplot(1, 2, 2)
    plt.scatter(data_scaled[:, 0], data_scaled[:, 1], c=gmm_labels, cmap='viridis', marker='o')
    plt.title('GMM Clustering')
    plt.xlabel('Feature 1')
    plt.ylabel('Feature 2')

    plt.tight_layout()
    plt.show()
else:
    print("Data does not have exactly 2 dimensions. Visualization is not applicable.")

Output:


Practical-24

Aim- Build an Artificial Neural Network by implementing the Backpropagation
algorithm and test the same using appropriate data sets.
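
A standard way to confirm that backpropagation is implemented correctly is a gradient check. It compares the analytic gradients with central finite differences of the loss. The sketch assumes the squared-error loss L = 0.5 * sum((y - output)^2). For this loss, `d_output` in the practical's program equals -dL/d(output pre-activation), so its weight updates are gradient-descent steps. The 2-3-1 sigmoid network and the input scaling match the practical. The helper names and the random seed are illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def loss_and_grads(X, y, wh, bh, wout, bout):
    """Squared-error loss 0.5 * sum((y - output)**2) and its analytic gradients."""
    h = sigmoid(X @ wh + bh)
    out = sigmoid(h @ wout + bout)
    loss = 0.5 * np.sum((y - out) ** 2)
    d_out = (out - y) * out * (1 - out)        # dL/d(output pre-activation)
    d_hidden = (d_out @ wout.T) * h * (1 - h)  # dL/d(hidden pre-activation)
    grads = {"wh": X.T @ d_hidden, "bh": d_hidden.sum(axis=0, keepdims=True),
             "wout": h.T @ d_out, "bout": d_out.sum(axis=0, keepdims=True)}
    return loss, grads

def numeric_grad(f, w, eps=1e-6):
    """Central-difference gradient of f() with respect to array w (perturbed in place, then restored)."""
    g = np.zeros_like(w)
    for idx in np.ndindex(w.shape):
        old = w[idx]
        w[idx] = old + eps
        plus = f()
        w[idx] = old - eps
        minus = f()
        w[idx] = old
        g[idx] = (plus - minus) / (2 * eps)
    return g

rng = np.random.default_rng(0)
X = np.array([[2, 9], [1, 5], [3, 6]], dtype=float)
X = X / X.max(axis=0)                      # same scaling as the practical
y = np.array([[92], [86], [89]]) / 100
params = {"wh": rng.uniform(size=(2, 3)), "bh": rng.uniform(size=(1, 3)),
          "wout": rng.uniform(size=(3, 1)), "bout": rng.uniform(size=(1, 1))}

_, analytic = loss_and_grads(X, y, **params)
max_diffs = {}
for name, w in params.items():
    numeric = numeric_grad(lambda: loss_and_grads(X, y, **params)[0], w)
    max_diffs[name] = np.max(np.abs(numeric - analytic[name]))
print(max_diffs)  # all differences should be tiny (round-off level)
```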

Code-
import numpy as np

X = np.array(([2, 9], [1, 5], [3, 6]), dtype=float)
y = np.array(([92], [86], [89]), dtype=float)
X = X / np.amax(X, axis=0)  # scale each feature by its column maximum
y = y / 100

# Sigmoid Function
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# Derivative of Sigmoid Function (x is already a sigmoid output)
def derivatives_sigmoid(x):
    return x * (1 - x)

# Variable initialization
epoch = 7000  # Setting training iterations
lr = 0.1  # Setting learning rate
inputlayer_neurons = 2  # number of features in data set
hiddenlayer_neurons = 3  # number of hidden layer neurons
output_neurons = 1  # number of neurons at output layer

# weight and bias initialization: uniform random values in [0, 1) with the given shapes
wh = np.random.uniform(size=(inputlayer_neurons, hiddenlayer_neurons))
bh = np.random.uniform(size=(1, hiddenlayer_neurons))
wout = np.random.uniform(size=(hiddenlayer_neurons, output_neurons))
bout = np.random.uniform(size=(1, output_neurons))

for i in range(epoch):
    # Forward Propagation
    hinp1 = np.dot(X, wh)
    hinp = hinp1 + bh
    hlayer_act = sigmoid(hinp)
    outinp1 = np.dot(hlayer_act, wout)
    outinp = outinp1 + bout
    output = sigmoid(outinp)

    # Backpropagation
    EO = y - output  # error at the output layer
    outgrad = derivatives_sigmoid(output)
    d_output = EO * outgrad
    EH = d_output.dot(wout.T)  # how much each hidden unit contributed to the error
    hiddengrad = derivatives_sigmoid(hlayer_act)
    d_hiddenlayer = EH * hiddengrad

    # update weights and biases: dot product of next-layer error and current-layer output
    wout += hlayer_act.T.dot(d_output) * lr
    bout += np.sum(d_output, axis=0, keepdims=True) * lr
    wh += X.T.dot(d_hiddenlayer) * lr
    bh += np.sum(d_hiddenlayer, axis=0, keepdims=True) * lr

print("Input: \n" + str(X))
print("Actual Output: \n" + str(y))
print("Predicted Output: \n", output)

Output-


Practical-25

Aim- Write a Python program to measure the Euclidean distance between two
different variables A1 & B2.
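
The function in this practical is written for two coordinates only. The same formula, sqrt(sum of (a_i - b_i)^2 over all coordinates), extends to any number of dimensions. Below is a sketch with NumPy, shown alongside two existing one-liners: the standard library's `math.dist` (Python 3.8+) and `np.linalg.norm` of the difference vector. `euclidean_distance_nd` is an illustrative name. For A1 = (1, 2) and B2 = (4, 6), the differences are 3 and 4, so the distance is sqrt(9 + 16) = 5.

```python
import math
import numpy as np

def euclidean_distance_nd(a, b):
    """Euclidean distance between two points with the same number of coordinates."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    if a.shape != b.shape:
        raise ValueError("points must have the same number of coordinates")
    return float(np.sqrt(np.sum((a - b) ** 2)))

A1, B2 = (1, 2), (4, 6)
print(euclidean_distance_nd(A1, B2))                 # 5.0
print(math.dist(A1, B2))                             # 5.0
print(float(np.linalg.norm(np.subtract(A1, B2))))    # 5.0
print(euclidean_distance_nd((0, 0, 0), (1, 2, 2)))   # 3.0
```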

Code-
import numpy as np

def euclidean_distance(A1, B2):
    """
    Calculate the Euclidean distance between two points A1 and B2.

    Parameters:
    A1 (tuple): A tuple representing the coordinates of point A1 (x1, y1).
    B2 (tuple): A tuple representing the coordinates of point B2 (x2, y2).

    Returns:
    float: The Euclidean distance between A1 and B2.
    """
    return np.sqrt((B2[0] - A1[0])**2 + (B2[1] - A1[1])**2)

# Example usage
A1 = (1, 2)  # Coordinates of point A1
B2 = (4, 6)  # Coordinates of point B2

distance = euclidean_distance(A1, B2)
print(f"The Euclidean distance between A1 {A1} and B2 {B2} is: {distance:.2f}")

Output:
