Machine Learning - Lab Manual

The document is a lab manual for a Machine Learning course, detailing objectives, experiments, and programming tasks using Python. It covers various machine learning techniques such as regression, decision trees, KNN, and clustering, along with required software and hardware specifications. Additionally, it includes viva questions for each week to assess understanding of the concepts taught.
Lab Manual

Machine Learning Lab

B.Tech R23 III-I SEM
(Computer Science & Engineering(AI & ML))
CS408PC: MACHINE LEARNING LAB

B.Tech. II Year I Sem.
L T P C
0 0 2 1

Course Objective:

· The objective of this lab is to give an overview of various machine learning techniques and to demonstrate them using Python.

List of Experiments:

1. Write a Python program to compute Central Tendency Measures (Mean, Median, Mode) and Measures of Dispersion (Variance, Standard Deviation)

2. Study of Python Basic Libraries such as Statistics, Math, Numpy and Scipy

3. Study of Python Libraries for ML application such as Pandas and Matplotlib

4. Write a Python program to implement Simple Linear Regression

5. Implementation of Multiple Linear Regression for House Price Prediction using sklearn

6. Implementation of Decision tree using sklearn and its parameter tuning

7. Implementation of KNN using sklearn

8. Implementation of Logistic Regression using sklearn

9. Implementation of K-Means Clustering

10. Performance analysis of Classification Algorithms on a specific dataset (Mini Project)

ADDITIONAL EXPERIMENTS :

1. Write a Python program to implement Logistic Regression for iris using sklearn and plot the confusion matrix.

2. Consider a dataset, use Random Forest to predict the output class. Vary the number of trees as follows and compare the results: i. 20 ii. 50 iii. 100


PROGRAMS
Week 1

1. Central Tendency and Dispersion Measures in Python


Aim

To write a Python program to compute:

· Measures of Central Tendency: Mean, Median, Mode

· Measures of Dispersion: Variance, Standard Deviation

Software Requirements

o Python 3.x

o statistics module (inbuilt)

o Any IDE (VS Code, Jupyter Notebook, etc.)

Hardware Requirements

o 2 GB RAM minimum

o 1 GHz Processor

o Windows/Linux/Mac OS

Source Code :

import statistics as stats

# Sample dataset

data = [5, 10, 15, 10, 20, 10, 25]

# Central Tendency Measures

mean = stats.mean(data)

median = stats.median(data)

mode = stats.mode(data)

# Measures of Dispersion (sample variance and sample standard deviation)

variance = stats.variance(data)
std_dev = stats.stdev(data)

# Display results

print("Data:", data)

print("Mean:", mean)

print("Median:", median)

print("Mode:", mode)

print("Variance:", variance)

print("Standard Deviation:", std_dev)

Output

Data: [5, 10, 15, 10, 20, 10, 25]

Mean: 13.571428571428571

Median: 10

Mode: 10

Variance: 47.619 (approx.)

Standard Deviation: 6.901 (approx.)
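
As a cross-check on the library calls, the dispersion measures can also be computed by hand. The following is a minimal sketch and not part of the original program; the helper names sample_variance and population_variance are hypothetical. It assumes the sample definition (divide by n - 1), which is what statistics.variance and statistics.stdev use, and contrasts it with the population definition (divide by n) used by statistics.pvariance.

```python
import math
import statistics as stats

data = [5, 10, 15, 10, 20, 10, 25]


def population_variance(values):
    """Average squared deviation from the mean (divide by n)."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / len(values)


def sample_variance(values):
    """Sum of squared deviations divided by n - 1 (Bessel's correction)."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)


print("Sample variance (manual):", sample_variance(data))
print("Sample variance (statistics):", stats.variance(data))
print("Population variance (manual):", population_variance(data))
print("Population variance (statistics):", stats.pvariance(data))
print("Sample standard deviation:", math.sqrt(sample_variance(data)))
```

The two definitions differ only in the divisor, so the gap between them shrinks as the dataset grows.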

Viva Questions:

1. What is the difference between mean and median?

2. When is mode preferred over mean?

3. Define variance in simple terms.

4. How is standard deviation related to variance?

5. Which Python module provides functions to compute statistical values?


Week 2 :
2. Study of Python Basic Libraries: Statistics, Math, NumPy, and SciPy
Aim

To explore and understand basic Python libraries: statistics, math, numpy, and scipy.

Software Requirements

o Python 3.x

o Libraries: math, statistics, numpy, scipy

Hardware Requirements

o 2 GB RAM minimum

o 1 GHz Processor

o Windows/Linux/Mac OS

Source Code:

import math

import statistics as stats

import numpy as np

from scipy import stats as scipy_stats

# Math module demo

print("Square root of 16:", math.sqrt(16))

# Statistics module demo

data = [1, 2, 3, 4, 5]

print("Mean:", stats.mean(data))

# NumPy demo

np_array = np.array([1, 2, 3, 4, 5])

print("NumPy Array Mean:", np.mean(np_array))

# SciPy demo
print("Mode using SciPy:", scipy_stats.mode(data, keepdims=False).mode)

Output

Square root of 16: 4.0

Mean: 3

NumPy Array Mean: 3.0

Mode using SciPy: 1
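
To illustrate how a NumPy array differs from a native Python list, the sketch below applies the same element-wise operation to both. It is an illustrative addition, and the variable names are made up. It also shows that np.var defaults to the population variance (ddof=0), while ddof=1 gives the sample variance.

```python
import numpy as np

values = [1, 2, 3, 4, 5]
arr = np.array(values)

# A Python list needs an explicit loop or comprehension for element-wise math.
doubled_list = [v * 2 for v in values]

# A NumPy array applies the operation to every element at once.
doubled_arr = arr * 2

print("List result:", doubled_list)
print("Array result:", doubled_arr.tolist())

# np.var divides by n by default (ddof=0); ddof=1 divides by n - 1.
print("Population variance:", np.var(arr))
print("Sample variance:", np.var(arr, ddof=1))
```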

Viva Questions

1. What is the purpose of the math module?

2. How is NumPy different from native Python lists?

3. What does the scipy.stats module offer?

4. Which module would you use for scientific computing?

5. How do you calculate mean using NumPy?


Week 3 :

3. Study of Python Libraries for ML Applications: Pandas and Matplotlib
Aim:

To study Python libraries used in machine learning applications, namely pandas and matplotlib.

Software Requirements

o Python 3.x

o Libraries: pandas, matplotlib

Hardware Requirements

o 4 GB RAM recommended

o 1.5 GHz Processor

o Windows/Linux/Mac OS

Source Code :

import pandas as pd
import matplotlib.pyplot as plt

# Creating a DataFrame
data = {'Name': ['A', 'B', 'C', 'D'],
        'Marks': [88, 92, 79, 85]}
df = pd.DataFrame(data)

# Display the DataFrame
print("DataFrame:")
print(df)

# Plotting a bar chart
plt.bar(df['Name'], df['Marks'], color='skyblue')
plt.title('Student Marks')
plt.xlabel('Name')
plt.ylabel('Marks')
plt.show()

Output

DataFrame:
  Name  Marks
0    A     88
1    B     92
2    C     79
3    D     85

(A bar chart is displayed showing marks of students.)
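
Besides building a DataFrame from a dictionary, pandas is commonly used to load tabular files. The sketch below is an illustrative addition: the CSV text is made up, and io.StringIO stands in for a file path so that the example runs without any file on disk. With a real file, the path would be passed to pd.read_csv instead.

```python
import io

import pandas as pd

# Stand-in for a CSV file on disk; the contents are made up for illustration.
csv_text = """Name,Marks
A,88
B,92
C,79
D,85
"""

df = pd.read_csv(io.StringIO(csv_text))

print(df.head())
print("Average marks:", df["Marks"].mean())
print("Top scorer:", df.loc[df["Marks"].idxmax(), "Name"])
```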

Viva Questions

1. What is a DataFrame in pandas?

2. How do you read CSV files using pandas?

3. How can matplotlib be used in ML visualization?

4. Which function is used to display plots in matplotlib?

5. What are common use cases for pandas in ML?


Week 4:

4. Simple Linear Regression in Python

Aim

To implement Simple Linear Regression using sklearn

Software Requirements
• Python 3.x
• scikit-learn, matplotlib, pandas

Hardware Requirements
• 4 GB RAM
• 1.5 GHz Processor or higher
• Windows/Linux/Mac OS

Source Code

import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression

# Sample dataset
data = {'Experience': [1, 2, 3, 4, 5], 'Salary': [30000, 35000, 40000, 45000, 50000]}
df = pd.DataFrame(data)

X = df[['Experience']] # Feature
y = df['Salary'] # Target

model = LinearRegression()
model.fit(X, y)

# Predict salary for 6 years of experience
# (a DataFrame keeps the feature name used during training)
pred = model.predict(pd.DataFrame({'Experience': [6]}))

print("Predicted salary for 6 years experience:", pred[0])

# Plotting
plt.scatter(X, y, color='blue')
plt.plot(X, model.predict(X), color='red')
plt.title("Experience vs Salary")
plt.xlabel("Years of Experience")
plt.ylabel("Salary")
plt.show()

Output

Predicted salary for 6 years experience: 55000.0

(A graph showing the regression line)
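
The fitted line can also be derived without sklearn from the closed-form least-squares formulas: slope = Sxy / Sxx and intercept = mean(y) - slope * mean(x). The sketch below is an illustrative addition using the same experience and salary values; the function name fit_simple_linear_regression is hypothetical.

```python
experience = [1, 2, 3, 4, 5]
salary = [30000, 35000, 40000, 45000, 50000]


def fit_simple_linear_regression(x, y):
    """Return (slope, intercept) of the least-squares line y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sxy: co-variation of x and y; Sxx: variation of x.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_simple_linear_regression(experience, salary)
print("Slope:", slope)
print("Intercept:", intercept)
print("Prediction for 6 years:", slope * 6 + intercept)
```

Here the slope is the salary increase per additional year of experience, and the intercept is the salary the line assigns to zero years.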

Viva Questions
1. What is simple linear regression?
2. What is the equation of a straight line in regression?
3. Which method is used to fit the regression model?
4. What does the slope represent?
5. How do we make predictions using the model?
Week 5:

5. Multiple Linear Regression for House Price Prediction

Aim

To implement Multiple Linear Regression using sklearn for predicting house prices.

Software Requirements
• Python 3.x
• pandas, sklearn

Hardware Requirements
• 4 GB RAM
• 1.5 GHz Processor
• Windows/Linux/Mac OS

Source Code

import pandas as pd
from sklearn.linear_model import LinearRegression

# Dataset
data = {
    'Area': [1000, 1500, 2000, 2500, 3000],
    'Bedrooms': [2, 3, 4, 4, 5],
    'Price': [300000, 400000, 500000, 550000, 600000]
}
df = pd.DataFrame(data)

X = df[['Area', 'Bedrooms']] # Features
y = df['Price'] # Target

model = LinearRegression()
model.fit(X, y)

# Predict price for a 2800 sq. ft. house with 4 bedrooms
# (a DataFrame keeps the feature names used during training)
prediction = model.predict(pd.DataFrame({'Area': [2800], 'Bedrooms': [4]}))
print("Predicted house price:", prediction[0])
Output

Predicted house price: 554000.0 (approx.)
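
The same model can be fitted directly with NumPy, which makes the intercept and the per-feature coefficients visible. This is a minimal sketch added for illustration; it assumes ordinary least squares with an intercept, which is what LinearRegression fits by default, and the variable names are made up.

```python
import numpy as np

area = np.array([1000, 1500, 2000, 2500, 3000], dtype=float)
bedrooms = np.array([2, 3, 4, 4, 5], dtype=float)
price = np.array([300000, 400000, 500000, 550000, 600000], dtype=float)

# Design matrix with a leading column of ones for the intercept term.
A = np.column_stack([np.ones_like(area), area, bedrooms])

# Least-squares solution of A @ coef ~= price.
coef, *_ = np.linalg.lstsq(A, price, rcond=None)
intercept, w_area, w_bedrooms = coef

prediction = intercept + w_area * 2800 + w_bedrooms * 4
print("Intercept:", intercept)
print("Coefficient for Area:", w_area)
print("Coefficient for Bedrooms:", w_bedrooms)
print("Predicted price for 2800 sq. ft., 4 bedrooms:", prediction)
```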

Viva Questions

1. What is the difference between simple and multiple linear regression?


2. What is multicollinearity?
3. How many independent variables can be used?
4. What are features and labels?
5. Which library is used for regression in Python?


Week 6:

6. Decision Tree Implementation with Parameter Tuning

Aim

To implement a Decision Tree Classifier using sklearn with parameter tuning.

Software Requirements
• Python 3.x
• sklearn, pandas

Hardware Requirements
• 4 GB RAM
• 1.5 GHz Processor
• Windows/Linux/Mac OS

Source Code

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Load dataset
iris = load_iris()
X = iris.data
y = iris.target

# Train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

# Create model with chosen hyperparameters
model = DecisionTreeClassifier(max_depth=3, criterion='entropy')
model.fit(X_train, y_train)

# Predict
y_pred = model.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))

Output

Accuracy: 0.9333 (may vary)
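
The program above fixes max_depth and criterion by hand. One common way to tune them systematically is a cross-validated grid search. The sketch below is an illustrative addition that uses sklearn's GridSearchCV; the candidate values in param_grid and the random_state settings are assumptions chosen for the example, not recommendations from the manual.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42
)

# Candidate hyperparameter values (illustrative only).
param_grid = {
    "max_depth": [2, 3, 4, 5],
    "criterion": ["gini", "entropy"],
    "min_samples_split": [2, 5, 10],
}

# Every combination is scored with 5-fold cross-validation on the training set.
search = GridSearchCV(
    DecisionTreeClassifier(random_state=42),
    param_grid,
    cv=5,
    scoring="accuracy",
)
search.fit(X_train, y_train)

print("Best parameters:", search.best_params_)
print("Best cross-validation accuracy:", search.best_score_)
print("Test accuracy:", search.score(X_test, y_test))
```

The test set is kept out of the search so that the final accuracy is measured on data the tuning never saw.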

Viva Questions

1. What is a decision tree?


2. What is max_depth in a tree?
3. What are gini and entropy?
4. What is overfitting in decision trees?
5. How do you tune hyperparameters?
Week 7:

7. K-Nearest Neighbors (KNN) using Sklearn

Aim

To implement the K-Nearest Neighbors (KNN) algorithm using sklearn.

Software Requirements
• Python 3.x
• sklearn

Hardware Requirements
• 4 GB RAM
• 1.5 GHz Processor

Source Code

from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Load dataset
iris = load_iris()
X = iris.data
y = iris.target

# Train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

# Create KNN model
model = KNeighborsClassifier(n_neighbors=3)
model.fit(X_train, y_train)

# Prediction
y_pred = model.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))
Output

Accuracy: 1.0 (may vary)
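
To show what the classifier does internally, the sketch below implements the basic KNN rule from scratch: measure the Euclidean distance from the query to every training point, keep the k closest, and take a majority vote. It is an illustrative addition, not sklearn's implementation; the function name knn_predict and the toy data are made up, and math.dist requires Python 3.8 or later.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest neighbours."""
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Sort by distance only, so labels never need to be compared.
    distances.sort(key=lambda pair: pair[0])
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Toy 2-D data made up for illustration: two well-separated groups.
points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5), (6.0, 6.0), (6.5, 7.0), (7.0, 6.5)]
labels = ["A", "A", "A", "B", "B", "B"]

print(knn_predict(points, labels, (1.2, 1.4)))
print(knn_predict(points, labels, (6.4, 6.6)))
```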

Viva Questions

1. What is KNN?
2. How does KNN work?
3. What is the effect of changing K?
4. Is KNN supervised or unsupervised?
5. What is the distance metric used in KNN?
Week 8:

8. Logistic Regression using Sklearn

Aim

To implement Logistic Regression for classification using sklearn.

Software Requirements
• Python 3.x
• sklearn

Hardware Requirements
• 4 GB RAM
• 1.5 GHz Processor

Source Code

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Load data
iris = load_iris()
X = iris.data
y = (iris.target == 0).astype(int) # Binary classification: class 0 vs. the rest

# Train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

# Train logistic regression
model = LogisticRegression()
model.fit(X_train, y_train)

# Predict
y_pred = model.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))
Output

Accuracy: 1.0 (may vary)
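
Binary logistic regression passes a linear score through the sigmoid function to obtain a probability between 0 and 1, then applies a threshold to get a class label. The sketch below is an illustrative addition; the function names and the 0.5 threshold are assumptions for the example. The sigmoid is written in a numerically stable form so that large negative inputs do not overflow.

```python
import math


def sigmoid(z):
    """Map any real number to the interval (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # For negative z, use an equivalent form that avoids exp() of a large number.
    e = math.exp(z)
    return e / (1.0 + e)


def predict_label(z, threshold=0.5):
    """Return 1 if the sigmoid probability reaches the threshold, else 0."""
    return 1 if sigmoid(z) >= threshold else 0


for z in [-4, -1, 0, 1, 4]:
    print(f"z = {z:>2}: probability = {sigmoid(z):.4f}, label = {predict_label(z)}")
```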

Viva Questions

1. What is logistic regression used for?


2. Is logistic regression a classification algorithm?
3. What is the sigmoid function?
4. What is the range of output of logistic regression?
5. How does logistic regression differ from linear regression?
Week 9:

9. Implementation of K-Means Clustering

Aim:
To implement the K-Means Clustering algorithm using sklearn and visualize the clusters.

Software Requirements:
• Python 3.x
• numpy
• matplotlib
• sklearn

Hardware Requirements:
• 4 GB RAM
• 1.5 GHz processor or higher
• OS: Windows/Linux/macOS

Source Code:

import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Generate sample data
X, y = make_blobs(n_samples=300, centers=4, cluster_std=0.60, random_state=0)

# Create KMeans model
kmeans = KMeans(n_clusters=4, random_state=0)
kmeans.fit(X)
y_kmeans = kmeans.predict(X)

# Plotting the clusters
plt.scatter(X[:, 0], X[:, 1], c=y_kmeans, s=50, cmap='viridis')
centers = kmeans.cluster_centers_
plt.scatter(centers[:, 0], centers[:, 1], c='red', s=200, alpha=0.75, marker='X')
plt.title("K-Means Clustering Result")
plt.show()

Output

A scatter plot with 4 colored clusters and red “X” markers for centroids.
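
The centroid update that K-Means repeats can be sketched in plain Python: assign each point to its nearest centroid, move each centroid to the mean of its assigned points, and stop when nothing changes. This is an illustrative addition and not sklearn's implementation; the function names, toy points, and starting centroids are made up, and math.dist requires Python 3.8 or later.

```python
import math


def assign_clusters(points, centroids):
    """Return, for each point, the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
        for p in points
    ]


def update_centroids(points, assignments, centroids):
    """Move each centroid to the mean of the points assigned to it."""
    new_centroids = []
    for i, old in enumerate(centroids):
        members = [p for p, a in zip(points, assignments) if a == i]
        if not members:
            new_centroids.append(old)  # an empty cluster keeps its centroid
            continue
        new_centroids.append(
            tuple(sum(m[d] for m in members) / len(members) for d in range(len(old)))
        )
    return new_centroids


def inertia(points, assignments, centroids):
    """Sum of squared distances from each point to its assigned centroid."""
    return sum(math.dist(p, centroids[a]) ** 2 for p, a in zip(points, assignments))


points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (8.0, 9.0), (9.0, 8.0)]
centroids = [(0.0, 0.0), (10.0, 10.0)]  # arbitrary starting guesses

for step in range(10):
    assignments = assign_clusters(points, centroids)
    new_centroids = update_centroids(points, assignments, centroids)
    if new_centroids == centroids:  # converged: centroids stopped moving
        break
    centroids = new_centroids

print("Assignments:", assignments)
print("Centroids:", centroids)
print("Inertia:", inertia(points, assignments, centroids))
```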

Viva Questions

1. What is the K in K-Means?


2. Is K-Means supervised or unsupervised?
3. How do you choose the number of clusters?
4. What does inertia mean in K-Means?
5. How are centroids updated?
Week 10:

10. Performance Analysis of Classification Algorithms on a Dataset (Mini Project)

Aim:
To compare the performance of multiple classification algorithms (Logistic Regression, KNN, Decision
Tree) on the Iris dataset.

Software Requirements:
• Python 3.x
• sklearn
• pandas
• matplotlib
• seaborn (optional)

Hardware Requirements:
• 4 GB RAM
• 1.5 GHz processor
• Windows/Linux/macOS

Source Code:

import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

# Load dataset
iris = load_iris()
X = iris.data
y = iris.target

# Train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Models
models = {
    'Logistic Regression': LogisticRegression(max_iter=200),
    'Decision Tree': DecisionTreeClassifier(),
    'KNN': KNeighborsClassifier()
}

# Train, predict, and evaluate
for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    print(f'{name} Accuracy: {accuracy:.2f}')

Output

Example (varies slightly):


Logistic Regression Accuracy: 1.00
Decision Tree Accuracy: 1.00
KNN Accuracy: 1.00
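
Accuracy alone can hide poor performance when the classes are imbalanced. The sketch below is an illustrative addition with made-up labels: a model that always predicts the majority class reaches 95% accuracy while finding none of the positive cases. The function name classification_metrics is hypothetical.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Return (accuracy, precision, recall) for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs)
    # Define precision and recall as 0.0 when their denominators are zero.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


# Made-up imbalanced labels: 95 negatives and 5 positives.
y_true = [0] * 95 + [1] * 5
# A "model" that always predicts the majority class.
y_pred = [0] * 100

accuracy, precision, recall = classification_metrics(y_true, y_pred)
print(f"Accuracy:  {accuracy:.2f}")
print(f"Precision: {precision:.2f}")
print(f"Recall:    {recall:.2f}")
```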

Viva Questions
1. Why do we use train-test split?
2. What evaluation metric is used here?
3. Which model performed the best?
4. What is overfitting and how can you prevent it?
5. Can accuracy be misleading? If yes, when?


ADDITIONAL EXPERIMENTS:

1. Write a Python program to implement Logistic Regression for iris using sklearn and plot the confusion matrix.

Aim:
To implement Logistic Regression on the Iris dataset using sklearn and plot the confusion matrix.

Software Requirements:
• Python 3.x
• sklearn
• pandas
• matplotlib
• seaborn (optional)

Hardware Requirements:
• 4 GB RAM
• 1.5 GHz processor
• Windows/Linux/macOS

Source Code:

import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay

# Load the Iris dataset
iris = datasets.load_iris()
X = iris.data
y = iris.target

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Initialize and train the Logistic Regression model
logreg = LogisticRegression(max_iter=200)
logreg.fit(X_train, y_train)

# Predict the labels for the test set
y_pred = logreg.predict(X_test)

# Compute the confusion matrix
cm = confusion_matrix(y_test, y_pred)

# Display the confusion matrix
disp = ConfusionMatrixDisplay(confusion_matrix=cm, display_labels=iris.target_names)
disp.plot(cmap=plt.cm.Blues)
plt.show()

Output :

The confusion matrix will be a 3x3 grid, corresponding to the three Iris species:
Rows: True classes (actual species)
Columns: Predicted classes (predicted species)
Each cell in the matrix indicates the number of instances where the true class is on the row and the predicted class is on the column.
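
The row and column convention described above can be checked by building a confusion matrix by hand. The sketch below is an illustrative addition with made-up labels; the function name confusion_matrix_counts is hypothetical. Each (true, predicted) pair increments the cell at row true, column predicted, so correct predictions land on the diagonal.

```python
def confusion_matrix_counts(y_true, y_pred, n_classes):
    """Return an n_classes x n_classes matrix: rows are true classes, columns predicted."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix


# Made-up labels for three classes (0, 1, 2).
y_true = [0, 0, 1, 1, 1, 2, 2, 2, 2]
y_pred = [0, 0, 1, 2, 1, 2, 2, 1, 2]

cm = confusion_matrix_counts(y_true, y_pred, n_classes=3)
for row in cm:
    print(row)

# Accuracy is the diagonal total divided by the number of samples.
correct = sum(cm[i][i] for i in range(3))
print("Accuracy:", correct / len(y_true))
```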
2. Consider a dataset, use Random Forest to predict the output class. Vary the number of trees as follows and compare the results: i. 20 ii. 50 iii. 100

Aim:
To use a Random Forest classifier to predict the output class on the Iris dataset and compare the accuracy obtained with 20, 50, and 100 trees.

Software Requirements:
• Python 3.x
• sklearn
• pandas
• matplotlib
• seaborn (optional)

Hardware Requirements:
• 4 GB RAM
• 1.5 GHz processor
• Windows/Linux/macOS

Source Code:

import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

# Load the Iris dataset
iris = datasets.load_iris()
X = iris.data
y = iris.target

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# List of different numbers of trees to evaluate
n_estimators_list = [20, 50, 100]
accuracies = []

# Train and evaluate a Random Forest classifier for each number of trees
for n_estimators in n_estimators_list:
    clf = RandomForestClassifier(n_estimators=n_estimators, random_state=42)
    clf.fit(X_train, y_train)
    y_pred = clf.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    accuracies.append(accuracy)

# Plot the results
plt.figure(figsize=(8, 6))
plt.plot(n_estimators_list, accuracies, marker='o', linestyle='-', color='b')
plt.title('Random Forest Accuracy vs. Number of Trees')
plt.xlabel('Number of Trees')
plt.ylabel('Accuracy')
plt.grid(True)
plt.show()

Output :

(A line plot of accuracy against the number of trees is displayed.)

Example values (may vary):

n_estimators_list = [20, 50, 100]

accuracies = [0.9556, 0.9778, 0.9778]
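
A single train-test split on a small dataset can make the comparison between tree counts noisy. One possible refinement, sketched below as an illustrative addition, is to average the accuracy over 5-fold cross-validation using sklearn's cross_val_score. The choice of 5 folds and random_state=42 are assumptions for the example.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

iris = load_iris()
X, y = iris.data, iris.target

results = {}
for n_trees in [20, 50, 100]:
    clf = RandomForestClassifier(n_estimators=n_trees, random_state=42)
    # Each score is the accuracy on one held-out fold.
    scores = cross_val_score(clf, X, y, cv=5, scoring="accuracy")
    results[n_trees] = scores.mean()
    print(f"{n_trees} trees: mean accuracy {scores.mean():.4f} (std {scores.std():.4f})")
```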
