21ECE305J – Machine Learning Algorithms
A Laboratory Record
Submitted by
Register No.:
Name:
in partial fulfillment for the award of the degree of
BACHELOR OF TECHNOLOGY
IN
ELECTRONICS AND COMMUNICATION ENGINEERING
DEPARTMENT OF ELECTRONICS AND COMMUNICATION
ENGINEERING FACULTY OF ENGINEERING AND TECHNOLOGY
SRM INSTITUTE OF SCIENCE AND TECHNOLOGY
VADAPALANI CAMPUS, CHENNAI – 26
OCTOBER 2024
BONAFIDE CERTIFICATE
Register No: Date:
Certified that this laboratory record is the bonafide record of work done by _________,
III Year, B.Tech. Electronics and Communication Engineering, who
carried out the laboratory work of Subject Code/Title: 21ECE305J –
MACHINE LEARNING ALGORITHMS under my supervision in the
academic year 2024-2025.
Date: Faculty-in-Charge Head of the Department
Submitted for the University Examination held in __________ at
SRM INSTITUTE OF SCIENCE AND TECHNOLOGY
Date: Examiner -I Examiner -II
Index
Ex No Date Experiment Sign
1 Linear Regression
2 Logistic Regression
3 K-Fold
4 Support Vector Machine
5 K-Nearest Neighbor
6 K-Means Clustering
7 Hierarchical Clustering
8 Principal Component Analysis
EX NO: 1 Implementation of Linear Regression
DATE:
Aim: To perform linear regression on dataset samples.
Software Required: Google Colab, Python IDE, Jupyter Notebook
Program:
A) Linear regression using random data sample
import numpy as np
import matplotlib.pyplot as plt
np.random.seed(0)
m = 50 # creating 50 samples
X = np.linspace(0,10,m).reshape(m,1)
y = X + np.random.randn(m,1)
print(X)
len(X)
len(y)
plt.scatter(X,y)
from sklearn.linear_model import LinearRegression
model = LinearRegression()
model.fit(X,y)
print("Accuracy:", model.score(X,y))
Prediction = model.predict(X)
plt.scatter(X,y)
plt.plot(X,Prediction,'r')
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
mae = mean_absolute_error(y, Prediction) # compare actual targets y with predictions
mse = mean_squared_error(y, Prediction)
rmse = np.sqrt(mse)
r_squared = r2_score(y, Prediction)
print("Mean Absolute Error (MAE)=", mae)
print("Mean Squared Error (MSE)=", mse)
print("R-squared (R-squ)=", r_squared)
print("Root Mean Squared Error (RMSE)=", rmse)
Output:
Accuracy: 0.8539729126652087
Mean Absolute Error (MAE)= 0.37381532213368635
Mean Squared Error (MSE)= 0.19259123079935717
R-squared (R-squ)= 0.977795363978427
Root Mean Squared Error (RMSE)= 0.4388521741991911
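As a quick check (not part of the recorded output), the fitted slope and intercept can be read off the trained model; a minimal sketch reusing the model object from part A:

# The fitted line is y ≈ coef_ * x + intercept_
print("Slope:", model.coef_[0][0])
print("Intercept:", model.intercept_[0])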
B) Linear regression on salary dataset
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
data=pd.read_csv("Salary_Data.csv")
print(data)
data=data.dropna() #Remove NAN whole row
print(data)
len(data)
x=data[["YearsExperience"]]
y=data[["Salary"]]
plt.scatter(x,y) #plotting experience & salary data
from sklearn.model_selection import train_test_split
x_train,x_test,y_train,y_test=train_test_split(x,y,test_size=0.2) # splitting training & testing data
print(len(y_train)) #training dataset (80%)
len(x_train)
print(len(x_test)) #testing dataset(20%)
len(x_test)
from sklearn.linear_model import LinearRegression
model = LinearRegression()
model.fit(x_train,y_train) #training experience & salary
predict=model.predict(x_test) #predicting salary by giving testset of experience
print(predict) #predicted value
y_test #original value
plt.scatter(x_test,y_test)
plt.plot(x_test,predict,'r')#plotting graph btw predicted and original value
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
mae=mean_absolute_error(y_test, predict)
mse=mean_squared_error(y_test, predict)
rmse=np.sqrt(mse)
r2=r2_score(y_test, predict)
print("MAE:",mae)
print("MSE:",mse)
print("RMSE:",rmse)
print("R2:",r2)
Result: Thus linear regression on the dataset samples was evaluated successfully
Output:
MAE: 5950.500000000003
MSE: 47645779.63765114
RMSE: 6902.592240430485
R2: 0.9124538165906179
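For reference, the trained model can also score an unseen input; a minimal sketch, where 5.0 years of experience is an assumed illustrative value, not taken from the record:

# Predict salary for an assumed 5.0 years of experience (illustrative input)
new_x = pd.DataFrame({"YearsExperience": [5.0]})
print(model.predict(new_x))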
EX NO: 2 Implementation of Logistic Regression
DATE:
Aim: To perform Logistic Regression and predict digit from digits dataset
Software Required: Google Colab, Python IDE, Jupyter Notebook
Program:
from sklearn.datasets import load_digits
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
digits=load_digits()
dir(digits)
len(digits.data[1])
plt.gray()
plt.matshow(digits.images[3])
digits.target[3]
x_train,x_test,y_train,y_test=train_test_split(digits.data,digits.target,test_size=0.2)
LGR=LogisticRegression(max_iter=30) # few iterations; may emit a ConvergenceWarning
LGR.fit(x_train,y_train)
from sklearn.metrics import confusion_matrix
prediction=LGR.predict(x_test)
cm=confusion_matrix(y_test,prediction)
import seaborn as sn
plt.figure(figsize=(5,5))
sn.heatmap(cm,annot=True)
plt.xlabel('prediction')
plt.ylabel('Actual')
import numpy as np
correct_pred=np.trace(cm)
total_pred=np.sum(cm)
accuracy=correct_pred/total_pred
print(accuracy)
Result: Thus logistic regression to predict digits from the digits dataset was
implemented successfully
Output:
accuracy=0.9611111111111111
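Beyond overall accuracy, per-class precision and recall can be printed from the same predictions; a minimal sketch using scikit-learn's classification_report:

from sklearn.metrics import classification_report
print(classification_report(y_test, prediction))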
EX NO: 3 Implementation of K-FOLD
DATE:
Aim: To perform K-Fold on Sample Dataset
Software Required: Google Colab, Python IDE, Jupyter Notebook
Program:
A) K-Fold sample program
from sklearn.model_selection import KFold
import numpy as np
data=np.arange(1,21)
data
k=5
kf=KFold(n_splits=5,shuffle=True,random_state=42)
for fold,(train_index,test_index) in enumerate(kf.split(data),1):
    train_index,test_index=data[train_index],data[test_index] # map index positions to data values
    print(f"Fold:{fold}")
    print(f"Train index:{train_index}")
    print(f"Test index:{test_index}")
Output:
Fold:1
Train index: [ 3 4 5 6 7 8 9 10 11 12 13 14 15 17 19 20]
Test index: [ 1 2 16 18]
Fold:2
Train index: [ 1 2 3 5 7 8 10 11 13 14 15 16 17 18 19 20]
Test index: [ 4 6 9 12]
Fold:3
Train index: [ 1 2 4 5 6 7 8 9 10 11 12 13 15 16 18 20]
Test index: [ 3 14 17 19]
Fold:4
Train index: [ 1 2 3 4 6 7 8 9 11 12 14 15 16 17 18 19]
Test index: [ 5 10 13 20]
Fold:5
Train index: [ 1 2 3 4 5 6 9 10 12 13 14 16 17 18 19 20]
Test index: [ 7 8 11 15]
B) K-Fold on Linear Regression Using Random Dataset
from sklearn.model_selection import KFold
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
import numpy as np
np.random.seed(42)
x=2*np.random.randn(100,1)
y=4+3*x+np.random.randn(100,1)
model=LinearRegression()
kf=KFold(n_splits=5,shuffle=True,random_state=42)
mse_scores=[]
fold_index=1
for train_index,test_index in kf.split(x):
    print(f"Fold:{fold_index}")
    print(f"Train index:{train_index}")
    print(f"Test index:{test_index}")
    X_train,X_test=x[train_index],x[test_index]
    Y_train,Y_test=y[train_index],y[test_index]
    model.fit(X_train,Y_train)
    y_pred=model.predict(X_test)
    mse=mean_squared_error(Y_test,y_pred)
    mse_scores.append(mse)
    fold_index += 1
mean_mse=np.mean(mse_scores)
std_mse=np.std(mse_scores)
print(f"MSE SCORE:{mse_scores}")
print(f"MEAN MASE:{mean_mse}")
print(f"Standard deviation:{std_mse}")
Output:
MSE SCORE:[]
MEAN MSE:0.7479020958894542
Standard Deviation:0.0
C) K-Fold on Logistic Regression Using Digits Dataset
from sklearn.model_selection import KFold, cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import load_digits
from sklearn.metrics import accuracy_score
import numpy as np
digits = load_digits()
X = digits.data
Y = digits.target
model = LogisticRegression(max_iter=10000)
kf = KFold(n_splits=5, shuffle=True, random_state=42)
accuracy_scores = []
for train_index, test_index in kf.split(X):
    print(f"Train Index: {train_index}")
    print(f"Test Index: {test_index}")
    X_train, X_test = X[train_index], X[test_index]
    Y_train, Y_test = Y[train_index], Y[test_index]
    model.fit(X_train, Y_train)
    Y_pred = model.predict(X_test)
    # Calculate accuracy for this fold
    accuracy = accuracy_score(Y_test, Y_pred)
    accuracy_scores.append(accuracy)
mean_accuracy=np.mean(accuracy_scores)
std_accuracy=np.std(accuracy_scores)
print(f"Accuracy Scores for each fold: {accuracy_scores}")
print(f"Mean Accuracy: {mean_accuracy}")
print(f"Standard Deviation of Accuracy: {std_accuracy}")
Result: Thus K-Fold cross-validation on linear and logistic regression has been
evaluated successfully
Output:
Accuracy Scores for each fold: [0.9526462395543176]
Mean Accuracy: 0.9526462395543176
Standard Deviation of Accuracy: 0.0
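Part C imports cross_val_score but never uses it; the whole loop above can be condensed into one call. A minimal sketch reusing the model and kf objects already defined:

scores = cross_val_score(model, X, Y, cv=kf)
print("Accuracy per fold:", scores)
print("Mean accuracy:", scores.mean())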
EX NO: 4 Implementation of Support Vector Machine (SVM)
DATE:
Aim: To perform SVM on Digits Dataset
Software Required: Google Colab, Python IDE, Jupyter Notebook
Program:
from sklearn.datasets import load_digits
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
digits=load_digits()
dir(digits)
len(digits.data[1])
plt.gray()
plt.matshow(digits.images[3])
digits.target[3]
x_train,x_test,y_train,y_test=train_test_split(digits.data,digits.target,test_size=0.2)
svm=SVC()
svm.fit(x_train,y_train)
from sklearn.metrics import confusion_matrix
prediction=svm.predict(x_test)
cm=confusion_matrix(y_test,prediction)
import seaborn as sn
plt.figure(figsize=(5,5))
sn.heatmap(cm,annot=True)
plt.xlabel('prediction')
plt.ylabel('Actual')
import numpy as np
correct_pred=np.trace(cm)
total_pred=np.sum(cm)
accuracy=correct_pred/total_pred
print(accuracy)
Result:
Thus the SVM on Digit Dataset has been implemented successfully
Output:
Accuracy=0.9805555555555555
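SVC defaults to the RBF kernel; other kernels can be compared on the same split. A minimal sketch (accuracies will vary with the random split):

# Compare SVM kernels on the held-out test split
for kernel in ['linear', 'poly', 'rbf']:
    clf = SVC(kernel=kernel)
    clf.fit(x_train, y_train)
    print(kernel, clf.score(x_test, y_test))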
EX NO: 5 Implementation of K-Nearest Neighbor (KNN) on Titanic Dataset
DATE:
Aim: To perform KNN on the Titanic dataset
Software Required: Google Colab, Python IDE, Jupyter Notebook
Program:
import seaborn as sns
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.neighbors import KNeighborsClassifier
import numpy as np
ship = sns.load_dataset('titanic')
ship.shape
ship = ship[['survived','pclass','sex','age']]
ship = ship.dropna(axis=0) # drop rows with missing values
ship['sex'] = ship['sex'].replace(['male','female'],[0,1]) # encode sex as 0/1
ship.head()
knn = KNeighborsClassifier()
y = ship['survived']
X = ship.drop('survived',axis=1)
knn.fit(X,y)
knn.score(X,y)
def survivedPerson(knn,pclass=3,sex=1,age=30):
    x = np.array([pclass,sex,age]).reshape(1,3)
    print(knn.predict(x))
survivedPerson(knn)
Result: Thus KNN on the Titanic dataset has been implemented successfully
Output:
Accuracy = 0.8305322128851541
SurvivedPerson = [0]
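Note that knn.score(X, y) above measures accuracy on the same data the model was fit on, which tends to overestimate performance; a minimal sketch of a held-out evaluation, assuming an 80/20 split:

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
knn_holdout = KNeighborsClassifier()
knn_holdout.fit(X_train, y_train)
print("Held-out accuracy:", knn_holdout.score(X_test, y_test))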
EX NO: 6 Implementation of KMeans on Make Blobs Dataset
DATE:
Aim: To perform KMeans on blobs dataset
Software Required: Google Colab, Python IDE, Jupyter Notebook
Program:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
# 1. Generate synthetic data (or use your own dataset)
X, y = make_blobs(n_samples=300, centers=4, cluster_std=0.60, random_state=0)
# 2. Apply KMeans Clustering
kmeans = KMeans(n_clusters=4)
kmeans.fit(X)
# 3. Get the cluster labels and centroids
labels = kmeans.labels_
centroids = kmeans.cluster_centers_
# 4. Plot the clusters
plt.scatter(X[:, 0], X[:, 1], c=labels, cmap='viridis', s=50)
plt.scatter(centroids[:, 0], centroids[:, 1], c='red', s=200, marker='X')
plt.title('K-Means Clustering (K=4)')
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.show()
Result: KMeans on Make Blobs Dataset has been implemented successfully
Output: (scatter plot of the four clusters with red centroid markers, titled 'K-Means Clustering (K=4)')
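Here K=4 is known because the blobs were generated with four centers; when K is unknown, the elbow method is a common heuristic. A minimal sketch plotting inertia against K on the same data:

# Elbow method: within-cluster sum of squares (inertia) versus K
inertias = []
for k in range(1, 9):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias.append(km.inertia_)
plt.plot(range(1, 9), inertias, 'o-')
plt.xlabel('Number of clusters K')
plt.ylabel('Inertia')
plt.show()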
EX NO: 7 Implementation of Hierarchical Clustering
DATE:
Aim: To perform Hierarchical Clustering on blobs dataset
Software Required: Google Colab, Python IDE, Jupyter Notebook
Program:
import numpy as np
import pandas as pd
from sklearn.datasets import make_blobs
from sklearn.cluster import AgglomerativeClustering
from scipy.cluster.hierarchy import dendrogram, linkage
import matplotlib.pyplot as plt
X, y = make_blobs(n_samples=200, centers=4, cluster_std=0.60, random_state=0)
model = AgglomerativeClustering(n_clusters=4,linkage='ward')
labels = model.fit_predict(X)
plt.scatter(X[:, 0], X[:, 1], c=labels, cmap='rainbow')
plt.title('Hierarchical Clustering')
plt.show()
Z = linkage(X, method='ward')
plt.figure(figsize=(10, 7))
plt.title("Dendrogram for Hierarchical Clustering")
dendrogram(Z)
plt.show()
Result: Hierarchical Clustering on Blobs Dataset has been implemented
successfully
Output: (cluster scatter plot titled 'Hierarchical Clustering' and the corresponding dendrogram)
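The dendrogram can also be cut programmatically to obtain flat cluster labels; a minimal sketch using scipy's fcluster on the linkage matrix Z from above:

from scipy.cluster.hierarchy import fcluster
# Cut the tree so that at most 4 clusters remain
flat_labels = fcluster(Z, t=4, criterion='maxclust')
print(flat_labels[:10])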
EX NO: 8 Implementation of Principal Component Analysis (PCA)
DATE:
Aim: To perform Principal Component Analysis on Wholesale Customer dataset.
Software Required: Google Colab, Python IDE, Jupyter Notebook
Program:
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
file_path = '/content/Wholesale customers data.csv'
data = pd.read_csv(file_path)
features = data.drop(columns=['Channel', 'Region'])
scaler = StandardScaler()
scaled_features = scaler.fit_transform(features)
pca = PCA(n_components=2)
pca_transformed = pca.fit_transform(scaled_features)
explained_variance = pca.explained_variance_ratio_
pca_df = pd.DataFrame(pca_transformed, columns=['PC1', 'PC2'])
pca_df['Channel'] = data['Channel']
print("PCA Results:")
print(pca_df.head())
print("Explained Variance:")
print(f"PC1: {explained_variance[0]:.2f}, PC2: {explained_variance[1]:.2f}")
import matplotlib.pyplot as plt
plt.figure(figsize=(8, 6))
colors = {1: 'red', 2: 'blue'}
plt.scatter(pca_df['PC1'], pca_df['PC2'], c=pca_df['Channel'].map(colors), alpha=0.5)
plt.xlabel('Principal Component 1')
plt.ylabel('Principal Component 2')
plt.title('PCA of Wholesale Customers Data')
plt.show()
Result: Thus Principal Component Analysis on the Wholesale Customers dataset has been
implemented successfully
Output:
PCA Results:
PC1 PC2 Channel
0 0.193291 -0.305100 2
1 0.434420 -0.328413 2
2 0.811143 0.815096 2
3 -0.778648 0.652754 1
4 0.166287 1.271434 2
Explained Variance:
PC1: 0.44, PC2: 0.28
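To decide how many components to keep, the cumulative explained variance of a full PCA fit can be inspected; a minimal sketch on the same scaled features:

import numpy as np
pca_full = PCA().fit(scaled_features)
print("Cumulative explained variance:", np.cumsum(pca_full.explained_variance_ratio_))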