Machine Learning (BCSL606) Lab Manual
COURSE INFORMATION
PROGRAMME: COMPUTER SCIENCE AND ENGINEERING
SEMESTER: VI
CREDITS: 01
L-T-P-S: 0-0-2-0
CONTACT HOURS/WEEK: 2
COURSE SYLLABUS:
1. Develop a program to create histograms for all numerical features and analyze the distribution of each feature. Generate box plots for all numerical features and identify any outliers. Use California Housing dataset. (2 hours)
2. Develop a program to compute the correlation matrix to understand the relationships between pairs of features. Visualize the correlation matrix using a heatmap to know which variables have strong positive/negative correlations. Create a pair plot to visualize pairwise relationships between features. Use California Housing dataset. (2 hours)
3. Develop a program to implement Principal Component Analysis (PCA) for reducing the dimensionality of the Iris dataset from 4 features to 2. (2 hours)
4. For a given set of training data examples stored in a .CSV file, implement and demonstrate the Find-S algorithm to output a description of the set of all hypotheses consistent with the training examples. (2 hours)
5. Develop a program to implement the k-Nearest Neighbour algorithm to classify 100 randomly generated values of x in the range [0,1]. Perform the following on the generated dataset. (2 hours)
a. Label the first 50 points {x1,……,x50} as follows: if (xi ≤ 0.5), then xi ∊ Class1, else xi ∊ Class2
b. Classify the remaining points, x51,……,x100, using KNN. Perform this for k=1,2,3,4,5,20,30
Prepared by: Mr. Janardhana Bhat K, Ms. Babitha Ganesh,and Ms. Kavya A M, Dept. of CSE,
Canara Engineering College, Mangalore
Machine Learning Lab (BCSL606)
6. Implement the non-parametric Locally Weighted Regression algorithm in order to fit data points. Select an appropriate data set for your experiment and draw graphs. (2 hours)
7. Develop a program to demonstrate the working of Linear Regression and Polynomial Regression. Use Boston Housing Dataset for Linear Regression and Auto MPG Dataset (for vehicle fuel efficiency prediction) for Polynomial Regression. (3 hours)
8. Develop a program to demonstrate the working of the decision tree algorithm. Use Breast Cancer Data set for building the decision tree and apply this knowledge to classify a new sample. (2 hours)
9. Develop a program to implement the Naive Bayesian classifier considering Olivetti Face Data set for training. Compute the accuracy of the classifier, considering a few test data sets. (3 hours)
10. Develop a program to implement k-means clustering using Wisconsin Breast Cancer data set and visualize the clustering result. (3 hours)
TOTAL HOURS: 28
COURSE PRE-REQUISITES:
1. MATHEMATICS FOR COMPUTER SCIENCE (BCS301, Semester III): Students should have knowledge of probability distributions, specific discrete and continuous distributions, statistical inference, and the basics of hypothesis testing, which are essential to understand the working of ML models.
2. DATA STRUCTURES AND APPLICATIONS (BCS304, Semester III): Students should have a basic understanding of data and its representations.
COURSE DESCRIPTION:
This Machine Learning lab provides hands-on experience in data visualization, supervised and
unsupervised learning, and dimensionality reduction techniques. Students will analyze datasets like
California Housing, Iris, Boston Housing, and Breast Cancer using histograms, box plots, correlation
heatmaps, and pair plots. Key ML algorithms, including Find-S, k-NN, Decision Trees, Naïve Bayes,
Linear & Polynomial Regression, PCA, and k-Means Clustering, will be implemented to develop a
strong foundation in model building and evaluation.
COURSE OBJECTIVES:
This course will enable students to gain practical experience in the design, development, implementation, analysis, and evaluation/testing of machine learning programs. The objectives are:
1. To become familiar with data and visualize univariate, bivariate, and multivariate data using statistical techniques and dimensionality reduction.
2. To understand various machine learning algorithms such as similarity-based learning, regression, decision trees, and clustering.
3. To become familiar with learning theories and probability-based models, and to develop the skills required for decision-making in dynamic environments.
Experiment-1
Problem statement
Develop a program to create histograms for all numerical features and analyze the distribution of
each feature. Generate box plots for all numerical features and identify any outliers. Use California
Housing dataset.
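Box plots flag as outliers the points lying more than 1.5 times the interquartile range (IQR = Q3 - Q1) below the first quartile or above the third quartile. A small worked example on made-up numbers (not taken from the California Housing data) shows the arithmetic:

```python
import pandas as pd

# Illustrative values only: nine ordinary readings and one extreme one
values = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])
q1, q3 = values.quantile(0.25), values.quantile(0.75)   # linear interpolation between points
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
print(q1, q3, lower, upper)                              # 3.25 7.75 -3.5 14.5
outliers = values[(values < lower) | (values > upper)]
print(outliers.tolist())                                 # [100]
```

The same two fences, computed per feature, are what identify the outliers in each numerical column of the dataset.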
Program
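# Import necessary libraries
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import fetch_california_housing
# Fetch data: the California Housing dataset contains housing data for different regions in California
housing_data = fetch_california_housing(as_frame=True)
data = housing_data.frame
numerical_features = data.select_dtypes(include='number').columns
# Histograms: analyze the distribution of each numerical feature
data[numerical_features].hist(bins=30, figsize=(15, 10))
plt.tight_layout()
plt.show()
# Box plots: visually identify outliers in each numerical feature (9 columns -> 3 x 3 grid)
plt.figure(figsize=(15, 10))
for i, feature in enumerate(numerical_features):
    plt.subplot(3, 3, i + 1)
    sns.boxplot(x=data[feature])
    plt.title(feature)
plt.tight_layout()
plt.show()
# Outliers: values beyond 1.5 * IQR from the quartiles (the rule box-plot whiskers use)
for feature in numerical_features:
    Q1, Q3 = data[feature].quantile(0.25), data[feature].quantile(0.75)
    IQR = Q3 - Q1
    outliers = data[(data[feature] < Q1 - 1.5 * IQR) | (data[feature] > Q3 + 1.5 * IQR)]
    print(f"Outliers in {feature}:")
    print(outliers[feature].sort_values())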
Output
Questions:
Experiment-2
Problem statement
Develop a program to compute the correlation matrix to understand the relationships between pairs of
features. Visualize the correlation matrix using a heatmap to know which variables have strong
positive/negative correlations. Create a pair plot to visualize pairwise relationships between features. Use
California Housing dataset.
Program
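A minimal sketch of this experiment, assuming seaborn for the heatmap and the pair plot, could look like the following. The helper strongest_pairs and the 1,000-row random subsample used for the pair plot (to keep the 9 x 9 grid fast to draw) are choices made here for illustration; the dataset is downloaded by scikit-learn on first use.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
from sklearn.datasets import fetch_california_housing


def strongest_pairs(corr: pd.DataFrame, top: int = 5) -> pd.Series:
    """Return the `top` feature pairs ranked by absolute correlation (each pair listed once)."""
    cols = list(corr.columns)
    pairs = {(a, b): corr.loc[a, b] for i, a in enumerate(cols) for b in cols[i + 1:]}
    return pd.Series(pairs).sort_values(key=lambda s: s.abs(), ascending=False).head(top)


def main():
    data = fetch_california_housing(as_frame=True).frame  # 8 features + MedHouseVal
    corr = data.corr()                                      # Pearson correlation matrix

    # Heatmap: red cells = strong positive, blue cells = strong negative correlation
    plt.figure(figsize=(10, 8))
    sns.heatmap(corr, annot=True, fmt=".2f", cmap="coolwarm", vmin=-1, vmax=1)
    plt.title("Correlation Matrix - California Housing")
    plt.tight_layout()
    plt.show()
    print("Strongest feature pairs:\n", strongest_pairs(corr))

    # Pair plot on a random subsample so the grid renders quickly
    sns.pairplot(data.sample(n=1000, random_state=42), diag_kind="kde",
                 plot_kws={"alpha": 0.4, "s": 10})
    plt.show()


if __name__ == "__main__":
    main()
```

Ranking pairs by absolute value puts strong negative correlations alongside strong positive ones, which is what the problem statement asks to identify.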
OUTPUT:
Questions:
Experiment-3
Problem statement
Develop a program to implement Principal Component Analysis (PCA) for reducing the dimensionality of the
Iris dataset from 4 features to 2.
Program
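import pandas as pd
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
import matplotlib.pyplot as plt
# Load the Iris dataset: 150 samples, 4 features, 3 species
iris = load_iris()
# Reduce the 4 features to 2 principal components
pca = PCA(n_components=2)
principalComponents = pca.fit_transform(iris.data)
finalDf = pd.DataFrame(principalComponents, columns=['principal component 1', 'principal component 2'])
finalDf['target'] = iris.target
# Scatter plot of the projected data, one colour per species
fig, ax = plt.subplots(figsize=(8, 8))
ax.set_xlabel('Principal Component 1')
ax.set_ylabel('Principal Component 2')
ax.set_title('2 Component PCA of the Iris Dataset')
targets = [0, 1, 2]
colors = ['r', 'g', 'b']
for target, color in zip(targets, colors):
    indicesToKeep = finalDf['target'] == target
    ax.scatter(finalDf.loc[indicesToKeep, 'principal component 1'],
               finalDf.loc[indicesToKeep, 'principal component 2'], c=color, s=50)
ax.legend(iris.target_names)
ax.grid()
plt.show()
# Print explained variance ratio
print('Explained variance ratio:', pca.explained_variance_ratio_)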
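To see what PCA computes, the same projection can be reproduced with NumPy alone: centre the data, eigendecompose its covariance matrix, and keep the two eigenvectors with the largest eigenvalues. The helper name pca_numpy is illustrative; because each principal axis is only defined up to sign, the comparison with scikit-learn uses absolute values.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA


def pca_numpy(X: np.ndarray, n_components: int = 2):
    """Project X onto its top principal components via the covariance eigendecomposition."""
    X_centered = X - X.mean(axis=0)                 # PCA works on mean-centred data
    cov = np.cov(X_centered, rowvar=False)          # 4 x 4 covariance matrix for Iris
    eigvals, eigvecs = np.linalg.eigh(cov)          # symmetric matrix: eigenvalues ascending
    order = np.argsort(eigvals)[::-1][:n_components]
    components = eigvecs[:, order]                  # columns = principal axes
    explained_ratio = eigvals[order] / eigvals.sum()
    return X_centered @ components, explained_ratio


X = load_iris().data
projected, ratio = pca_numpy(X)
sk = PCA(n_components=2).fit(X)
print("NumPy explained variance ratio:  ", ratio)
print("sklearn explained variance ratio:", sk.explained_variance_ratio_)
print("Projections match up to sign:", np.allclose(np.abs(projected), np.abs(sk.transform(X))))
```

The explained variance ratio is each kept eigenvalue divided by the sum of all eigenvalues, i.e. the share of total variance retained by that component.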
OUTPUT:
Questions:
Experiment-4
Problem statement
For a given set of training data examples stored in a .CSV file, implement and demonstrate the Find-S
algorithm to output a description of the set of all hypotheses consistent with the training examples.
Program
import csv
# Possible values of each attribute (only the number of attributes is used below)
attributes = [['Sunny','Rainy'],
              ['Warm','Cold'],
              ['Normal','High'],
              ['Strong','Weak'],
              ['Warm','Cool'],
              ['Same','Change']]
num_attributes = len(attributes)  # 6
print("\n The most general hypothesis : ['?','?','?','?','?','?']\n")
print("\n The most specific hypothesis : ['0','0','0','0','0','0']\n")
a = []
print("\n The Given Training Data Set \n")
# Each row of ws.csv: six attribute values followed by the class label ('Yes'/'No')
with open('ws.csv', 'r') as csvFile:
    reader = csv.reader(csvFile)
    for row in reader:
        a.append(row)
        print(row)
print("\n The initial value of hypothesis: ")
hypothesis = ['0'] * num_attributes
print(hypothesis)
# Initialise the hypothesis with the first training example (assumed to be positive)
for j in range(0, num_attributes):
    hypothesis[j] = a[0][j]
print("Hypothesis:", hypothesis)
# Generalise the hypothesis with every positive example; negative examples are ignored
print("\n Find S: Finding a Maximally Specific Hypothesis\n")
for i in range(0, len(a)):
    if a[i][num_attributes] == 'Yes':
        for j in range(0, num_attributes):
            if a[i][j] != hypothesis[j]:
                hypothesis[j] = '?'  # values differ: generalise to "any value"
    print(" For Training Example No :{0} the hypothesis is ".format(i), hypothesis)
print("\n The Maximally Specific Hypothesis for the given Training Examples :\n")
print(hypothesis)
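The program above reads ws.csv and assumes that its first row is a positive example. A self-contained sketch that starts from the first positive example and needs no file is shown below; the function name find_s and the four training rows (modelled on the classic EnjoySport example) are illustrative and are not the contents of ws.csv.

```python
def find_s(examples, positive_label="Yes"):
    """Return the maximally specific hypothesis consistent with the positive examples.

    Each example is a list of attribute values followed by the class label.
    Returns None if there are no positive examples.
    """
    hypothesis = None
    for *attrs, label in examples:
        if label != positive_label:
            continue                       # Find-S ignores negative examples
        if hypothesis is None:
            hypothesis = list(attrs)       # first positive example = most specific hypothesis
        else:
            # Keep matching values, generalise mismatches to '?'
            hypothesis = [h if h == a else "?" for h, a in zip(hypothesis, attrs)]
    return hypothesis


training = [
    ["Sunny", "Warm", "Normal", "Strong", "Warm", "Same", "Yes"],
    ["Sunny", "Warm", "High", "Strong", "Warm", "Same", "Yes"],
    ["Rainy", "Cold", "High", "Strong", "Warm", "Change", "No"],
    ["Sunny", "Warm", "High", "Strong", "Cool", "Change", "Yes"],
]
print(find_s(training))  # ['Sunny', 'Warm', '?', 'Strong', '?', '?']
```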
OUTPUT:
The most general hypothesis : ['?','?','?','?','?','?']
The most specific hypothesis : ['0','0','0','0','0','0']
The Given Training Data Set
Questions:
Experiment-5
Problem statement
Develop a program to implement the k-Nearest Neighbour algorithm to classify 100 randomly generated
values of x in the range [0,1]. Perform the following on the generated dataset.
a. Label the first 50 points {x1,……,x50} as follows: if (xi ≤ 0.5), then xi ∊ Class1, else xi ∊ Class2
b. Classify the remaining points, x51,……,x100, using KNN. Perform this for k=1,2,3,4,5,20,30
Program
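import numpy as np
from sklearn.neighbors import KNeighborsClassifier
data = np.random.rand(100, 1)  # 100 random values of x in [0, 1]
train_data, test_data = data[:50], data[50:]
# Label the first 50 points: Class 1 if x <= 0.5, else Class 2
train_labels = np.where(train_data.flatten() <= 0.5, 1, 2)
for k in [1, 2, 3, 4, 5, 20, 30]:
    knn = KNeighborsClassifier(n_neighbors=k).fit(train_data, train_labels)
    predicted_labels = knn.predict(test_data)  # classify x51..x100
    print(f"K = {k}")
    print("Test Value\tPredicted Label")
    for val, label in zip(test_data.flatten(), predicted_labels):
        print(f"{val:.3f}\t\t{int(label)}")
    print()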
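The rule that KNeighborsClassifier applies can also be written out directly: for each test value, take the k training values closest to it and return the majority label. Below is a from-scratch sketch for the one-dimensional case; the function name knn_predict and its tie-breaking rule (on a tied vote the smaller label wins) are choices made here and are not necessarily scikit-learn's behaviour, which matters for even k such as k=2 or k=4.

```python
import numpy as np


def knn_predict(train_x, train_y, test_x, k):
    """Classify each value in test_x by majority vote among its k nearest training values."""
    predictions = []
    for x in test_x:
        distances = np.abs(train_x - x)                   # 1-D Euclidean distance
        nearest = np.argsort(distances, kind="stable")[:k]
        labels, counts = np.unique(train_y[nearest], return_counts=True)
        predictions.append(labels[np.argmax(counts)])     # tie -> smallest label
    return np.array(predictions)


rng = np.random.default_rng(0)
x = rng.random(100)
train_x, test_x = x[:50], x[50:]
train_y = np.where(train_x <= 0.5, 1, 2)                  # Class 1 if x <= 0.5, else Class 2
true_y = np.where(test_x <= 0.5, 1, 2)
for k in [1, 2, 3, 4, 5, 20, 30]:
    pred = knn_predict(train_x, train_y, test_x, k)
    print(f"k = {k:2d}: agreement with the x <= 0.5 rule = {np.mean(pred == true_y):.2f}")
```

Points near the 0.5 boundary are the ones most likely to change class as k grows, since larger neighbourhoods reach further across the boundary.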
OUTPUT:
K = 1 K = 2 K = 3 K = 4
Test Value Predicted Label Test Value Predicted Label Test Value Predicted Label Test Value Predicted Label
0.524 2 0.524 2 0.524 2 0.524 2
0.607 2 0.607 2 0.607 2 0.607 2
0.196 1 0.196 1 0.196 1 0.196 1
0.469 1 0.469 1 0.469 2 0.469 2
0.276 1 0.276 1 0.276 1 0.276 1
0.890 2 0.890 2 0.890 2 0.890 2
0.932 2 0.932 2 0.932 2 0.932 2
0.749 2 0.749 2 0.749 2 0.749 2
0.111 1 0.111 1 0.111 1 0.111 1
0.904 2 0.904 2 0.904 2 0.904 2
0.677 2 0.677 2 0.677 2 0.677 2
0.399 1 0.399 1 0.399 1 0.399 1
0.327 1 0.327 1 0.327 1 0.327 1
0.386 1 0.386 1 0.386 1 0.386 1
0.380 1 0.380 1 0.380 1 0.380 1
0.981 2 0.981 2 0.981 2 0.981 2
0.506 2 0.506 2 0.506 2 0.506 2
0.336 1 0.336 1 0.336 1 0.336 1
0.521 2 0.521 2 0.521 2 0.521 2
0.294 1 0.294 1 0.294 1 0.294 1
0.987 2 0.987 2 0.987 2 0.987 2
0.946 2 0.946 2 0.946 2 0.946 2
0.860 2 0.860 2 0.860 2 0.860 2
0.707 2 0.707 2 0.707 2 0.707 2
0.343 1 0.343 1 0.343 1 0.343 1
0.822 2 0.822 2 0.822 2 0.822 2
0.655 2 0.655 2 0.655 2 0.655 2
0.443 1 0.443 1 0.443 1 0.443 1
0.816 2 0.816 2 0.816 2 0.816 2
0.778 2 0.778 2 0.778 2 0.778 2
0.248 1 0.248 1 0.248 1 0.248 1
0.176 1 0.176 1 0.176 1 0.176 1
0.251 1 0.251 1 0.251 1 0.251 1
0.726 2 0.726 2 0.726 2 0.726 2
0.551 2 0.551 2 0.551 2 0.551 2
0.209 1 0.209 1 0.209 1 0.209 1
0.059 1 0.059 1 0.059 1 0.059 1
0.769 2 0.769 2 0.769 2 0.769 2
0.907 2 0.907 2 0.907 2 0.907 2
0.998 2 0.998 2 0.998 2 0.998 2
0.352 1 0.352 1 0.352 1 0.352 1
0.173 1 0.173 1 0.173 1 0.173 1
0.293 1 0.293 1 0.293 1 0.293 1
0.125 1 0.125 1 0.125 1 0.125 1
0.859 2 0.859 2 0.859 2 0.859 2
0.861 2 0.861 2 0.861 2 0.861 2
0.742 2 0.742 2 0.742 2 0.742 2
0.985 2 0.985 2 0.985 2 0.985 2
0.840 2 0.840 2 0.840 2 0.840 2
0.214 1 0.214 1 0.214 1 0.214 1
K = 5 K = 20 K = 30
Test Value Predicted Label Test Value Predicted Label Test Value Predicted Label
0.524 2 0.524 2 0.524 2
0.607 2 0.607 2 0.607 2
0.196 1 0.196 1 0.196 1
0.469 2 0.469 2 0.469 2
0.276 1 0.276 1 0.276 1
0.890 2 0.890 2 0.890 2
0.932 2 0.932 2 0.932 2
0.749 2 0.749 2 0.749 2
0.111 1 0.111 1 0.111 1
0.904 2 0.904 2 0.904 2
0.677 2 0.677 2 0.677 2
0.399 1 0.399 1 0.399 1
0.327 1 0.327 1 0.327 1
0.386 1 0.386 1 0.386 1
0.380 1 0.380 1 0.380 1
0.981 2 0.981 2 0.981 2
0.506 2 0.506 2 0.506 2
0.336 1 0.336 1 0.336 1
0.521 2 0.521 2 0.521 2
0.294 1 0.294 1 0.294 1
0.987 2 0.987 2 0.987 2
0.946 2 0.946 2 0.946 2
0.860 2 0.860 2 0.860 2
0.707 2 0.707 2 0.707 2
0.343 1 0.343 1 0.343 1
0.822 2 0.822 2 0.822 2
0.655 2 0.655 2 0.655 2
0.443 1 0.443 2 0.443 1
0.816 2 0.816 2 0.816 2
0.778 2 0.778 2 0.778 2
0.248 1 0.248 1 0.248 1
0.176 1 0.176 1 0.176 1
0.251 1 0.251 1 0.251 1
0.726 2 0.726 2 0.726 2
0.551 2 0.551 2 0.551 2
0.209 1 0.209 1 0.209 1
0.059 1 0.059 1 0.059 1
0.769 2 0.769 2 0.769 2
0.907 2 0.907 2 0.907 2
0.998 2 0.998 2 0.998 2
0.352 1 0.352 1 0.352 1
0.173 1 0.173 1 0.173 1
0.293 1 0.293 1 0.293 1
0.125 1 0.125 1 0.125 1
0.859 2 0.859 2 0.859 2
0.861 2 0.861 2 0.861 2
0.742 2 0.742 2 0.742 2
0.985 2 0.985 2 0.985 2
0.840 2 0.840 2 0.840 2
0.214 1 0.214 1 0.214 1
Questions:
Experiment-6
Problem statement
Implement the non-parametric Locally Weighted Regression algorithm in order to fit data points. Select an
appropriate data set for your experiment and draw graphs.
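Locally weighted regression refits a weighted least-squares line for every query point $x_0$, so that nearby training points have more influence than distant ones. Assuming a Gaussian kernel with bandwidth $\tau$ (a common choice; the symbols are introduced here for illustration), the weights and the local fit are:

```latex
w_i(x_0) = \exp\!\left(-\frac{(x_i - x_0)^2}{2\tau^2}\right), \qquad
W = \operatorname{diag}\big(w_1(x_0), \dots, w_m(x_0)\big)

\hat{\theta}(x_0) = \left(X^\top W X\right)^{-1} X^\top W y, \qquad
\hat{y}(x_0) = \begin{bmatrix} 1 & x_0 \end{bmatrix} \hat{\theta}(x_0)
```

Here $X$ is the $m \times 2$ design matrix whose rows are $[1, x_i]$ and $y$ holds the $m$ targets. A small $\tau$ lets the curve follow local wiggles; as $\tau \to \infty$ every weight tends to 1 and the fit reduces to ordinary least squares.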
Program
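import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
def local_weight_regression(X, y, tau):
    # For each query point, fit a weighted least-squares line with Gaussian-kernel weights (bandwidth tau)
    ypred = np.zeros(X.shape[0])
    for i in range(X.shape[0]):
        w = np.exp(-np.sum((X - X[i]) ** 2, axis=1) / (2 * tau ** 2))
        W = np.diag(w)
        theta = np.linalg.pinv(X.T @ W @ X) @ X.T @ W @ y
        ypred[i] = X[i] @ theta
    return ypred
# Tips data set (assumed to be available locally as 'tips.csv' with total_bill and tip columns)
data = pd.read_csv('tips.csv')
bill = np.array(data.total_bill)
tip = np.array(data.tip)
X_array = np.column_stack((np.ones(len(bill)), bill))  # bias column + total_bill
ypred = local_weight_regression(X_array, tip, tau=3)  # tau = 3 is an illustrative bandwidth
SortIndex = X_array[:,1].argsort(0)  # indices that sort total_bill in ascending order
xsort = X_array[SortIndex,1]  # sorted total_bill values
fig = plt.figure()
ax = fig.add_subplot(1,1,1)
ax.scatter(bill, tip, color='green')  # raw data points
ax.plot(xsort, ypred[SortIndex], color='red', linewidth=5)  # fitted LWR curve
plt.xlabel('Total bill')
plt.ylabel('Tip')
plt.show()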
OUTPUT:
Questions:
Experiment-7
Problem statement
Develop a program to demonstrate the working of Linear Regression and Polynomial Regression. Use Boston
Housing Dataset for Linear Regression and Auto MPG Dataset (for vehicle fuel efficiency prediction) for
Polynomial Regression.
Program
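import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.datasets import fetch_california_housing
# ---- Part 1: Linear Regression ----
# load_boston was removed in scikit-learn 1.2, so the California Housing dataset is used in its place
california = fetch_california_housing()
X_california = california.data
y_california = california.target
X_train_c, X_test_c, y_train_c, y_test_c = train_test_split(X_california, y_california, test_size=0.2,
                                                            random_state=42)
lr = LinearRegression()
lr.fit(X_train_c, y_train_c)
y_pred_c = lr.predict(X_test_c)
mse_lr = mean_squared_error(y_test_c, y_pred_c)
print(f"California Housing - Linear Regression MSE: {mse_lr:.2f}")
plt.figure(figsize=(10, 5))
plt.scatter(y_test_c, y_pred_c, alpha=0.3)
plt.xlabel("Actual median house value (100,000 USD)")
plt.ylabel("Predicted median house value (100,000 USD)")
plt.title("Linear Regression - California Housing")
plt.tight_layout()
plt.show()
# ---- Part 2: Polynomial Regression ----
# Load from CSV (column names below assume the UCI Auto MPG layout)
df = pd.read_csv("auto_mpg.csv")
# 'horsepower' uses '?' for missing values in the UCI file; convert to numbers and drop those rows
df['horsepower'] = pd.to_numeric(df['horsepower'], errors='coerce')
df = df.dropna()
X_auto = df[['displacement', 'horsepower', 'weight', 'acceleration']]
y_auto = df['mpg']
# Train-test split
X_train_a, X_test_a, y_train_a, y_test_a = train_test_split(X_auto, y_auto, test_size=0.2,
                                                            random_state=42)
# Scaling
scaler_a = StandardScaler()
X_train_a_scaled = scaler_a.fit_transform(X_train_a)
X_test_a_scaled = scaler_a.transform(X_test_a)
# Degree-2 polynomial feature expansion
poly = PolynomialFeatures(degree=2)
X_train_poly = poly.fit_transform(X_train_a_scaled)
X_test_poly = poly.transform(X_test_a_scaled)
# Polynomial Regression
lr_poly = LinearRegression()
lr_poly.fit(X_train_poly, y_train_a)
y_pred_poly = lr_poly.predict(X_test_poly)
# Evaluation
mse_poly = mean_squared_error(y_test_a, y_pred_poly)
print(f"Auto MPG - Polynomial Regression (degree=2) MSE: {mse_poly:.2f}")
# Visualization
plt.figure(figsize=(10, 5))
plt.scatter(X_test_a['weight'], y_test_a, color='blue', label='Actual MPG')
plt.scatter(X_test_a['weight'], y_pred_poly, color='red', alpha=0.6, label='Predicted MPG (Poly)')
plt.xlabel("Weight")
plt.ylabel("MPG")
plt.title("Polynomial Regression (Degree=2) - Auto MPG")
plt.legend()
plt.tight_layout()
plt.show()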
OUTPUT:
Questions:
Experiment-8
Problem statement
Develop a program to demonstrate the working of the decision tree algorithm. Use Breast Cancer Data set for
building the decision tree and apply this knowledge to classify a new sample.
Program
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
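A minimal self-contained sketch of the remaining steps (splitting, training, evaluation, and classifying a new sample) follows. The 80/20 split, random_state=42, the default Gini criterion, and the use of the first test record as the "new sample" are assumptions, so the printed accuracy need not match the output below.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# 569 samples, 30 numerical features; target 0 = malignant, 1 = benign
data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.2, random_state=42)

clf = DecisionTreeClassifier(random_state=42)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)

print("Accuracy:", accuracy_score(y_test, y_pred))
print(classification_report(y_test, y_pred, target_names=data.target_names))
print("Confusion matrix:\n", confusion_matrix(y_test, y_pred))

# Classify a new sample: here the first test record stands in for unseen data
new_sample = X_test[0].reshape(1, -1)
prediction = clf.predict(new_sample)[0]
print("New sample is classified as:", data.target_names[prediction])
```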
OUTPUT:
Accuracy: 0.9473684210526315
Questions:
1. How many features are present in the Breast Cancer dataset used in this program?
2. How does splitting the data into training and test sets help detect overfitting?
3. Which algorithm is used in this program for classification?
4. What do the target values 0 and 1 represent in the dataset?
5. Why is it important to evaluate the model on data not seen during training?
Experiment-9
Problem statement
Develop a program to implement the Naive Bayesian classifier considering Olivetti Face Data set for training.
Compute the accuracy of the classifier, considering a few test data sets.
Program
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
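A minimal self-contained sketch of the remaining steps follows. It interprets "a few test data sets" as several random stratified train/test splits; that interpretation, the helper name evaluate, the seeds, and the 2 x 5 grid of sample predictions are assumptions made here. fetch_olivetti_faces downloads the data on first use.

```python
import matplotlib.pyplot as plt
from sklearn.datasets import fetch_olivetti_faces
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix


def evaluate(X, y, test_size=0.2, seed=42):
    """Train Gaussian Naive Bayes on one stratified split; return (accuracy, model, (X_test, y_test))."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, stratify=y, random_state=seed)
    model = GaussianNB().fit(X_train, y_train)
    return accuracy_score(y_test, model.predict(X_test)), model, (X_test, y_test)


def main():
    faces = fetch_olivetti_faces(shuffle=True, random_state=42)
    X, y = faces.data, faces.target   # 400 images of 40 people, each 64x64 flattened to 4096 values
    for seed in (0, 1, 2):            # several test sets from different random splits
        acc, _, _ = evaluate(X, y, seed=seed)
        print(f"Split {seed}: accuracy = {acc:.3f}")

    acc, model, (X_test, y_test) = evaluate(X, y)
    y_pred = model.predict(X_test)
    print(classification_report(y_test, y_pred, zero_division=0))
    print("Confusion matrix shape:", confusion_matrix(y_test, y_pred).shape)

    # Show a few test faces with their true and predicted person ids
    fig, axes = plt.subplots(2, 5, figsize=(12, 5))
    for ax, img, true, pred in zip(axes.ravel(), X_test, y_test, y_pred):
        ax.imshow(img.reshape(64, 64), cmap="gray")
        ax.set_title(f"True: {true}, Pred: {pred}")
        ax.axis("off")
    plt.tight_layout()
    plt.show()


if __name__ == "__main__":
    main()
```

Stratifying keeps all 40 people represented in every test set, which matters because each person has only 10 images.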
OUTPUT:
Questions:
1. What is the shape of the original images, and how does it change after flattening?
2. Why do we flatten the image data before feeding it into the Naive Bayes classifier?
3. What are the limitations of using Naive Bayes for image classification?
4. What will happen if you change the test_size from 0.2 to 0.5?
5. How does the Naive Bayes classifier work, and why might it struggle with image data?
Experiment-10
Problem statement
Develop a program to implement k-means clustering using Wisconsin Breast Cancer data set and visualize the
clustering result.
Program
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_breast_cancer
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
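A minimal self-contained sketch of the clustering and visualisation follows. Choosing k = 2 (to mirror the two diagnoses), n_init=10, random_state=42, and mapping each cluster to its majority true label before measuring agreement are assumptions made here; PCA is used only to draw the 30-dimensional data in two dimensions.

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

data = load_breast_cancer()
X = StandardScaler().fit_transform(data.data)   # k-means is distance-based, so scale features
y = data.target

kmeans = KMeans(n_clusters=2, n_init=10, random_state=42)
clusters = kmeans.fit_predict(X)

# Cluster ids are arbitrary: map each cluster to its majority true label before scoring
mapping = {c: np.bincount(y[clusters == c]).argmax() for c in np.unique(clusters)}
mapped = np.array([mapping[c] for c in clusters])
print("Agreement with true diagnosis:", np.mean(mapped == y))

# Project to 2-D with PCA purely for visualisation
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X)
centers_2d = pca.transform(kmeans.cluster_centers_)

fig, axes = plt.subplots(1, 2, figsize=(12, 5))
axes[0].scatter(X_2d[:, 0], X_2d[:, 1], c=clusters, cmap="viridis", s=15)
axes[0].scatter(centers_2d[:, 0], centers_2d[:, 1], c="red", marker="X", s=200, label="Centroids")
axes[0].set_title("K-Means clusters (PCA projection)")
axes[0].legend()
axes[1].scatter(X_2d[:, 0], X_2d[:, 1], c=y, cmap="coolwarm", s=15)
axes[1].set_title("True labels (0 = malignant, 1 = benign)")
for ax in axes:
    ax.set_xlabel("Principal Component 1")
    ax.set_ylabel("Principal Component 2")
plt.tight_layout()
plt.show()
```

Placing the cluster plot next to the true-label plot makes it easy to see where the unsupervised grouping disagrees with the diagnosis.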
OUTPUT:
Questions: