ML Lab Manual

The document outlines various practice programs focused on data analysis and machine learning using Python, including tasks such as creating histograms, computing correlation matrices, implementing PCA, and using algorithms like k-Nearest Neighbour and Naive Bayesian classifier. Each program includes specific datasets, such as the California Housing and Iris datasets, and provides source code examples for implementation. Additionally, it covers data visualization techniques using libraries like Matplotlib and Seaborn.


Contents:

1. Practice Programs
2. Develop a program to create histograms for all numerical features and analyze the distribution of each feature. Generate box plots for all numerical features and identify any outliers. Use the California Housing dataset.
3. Develop a program to compute the correlation matrix to understand the relationships between pairs of features. Visualize the correlation matrix using a heatmap to know which variables have strong positive/negative correlations. Create a pair plot to visualize pairwise relationships between features. Use the California Housing dataset.
4. Develop a program to implement Principal Component Analysis (PCA) for reducing the dimensionality of the Iris dataset from 4 features to 2.
5. For a given set of training data examples stored in a .CSV file, implement and demonstrate the Find-S algorithm to output a description of the set of all hypotheses consistent with the training examples.
6. Develop a program to implement the k-Nearest Neighbour algorithm to classify 100 randomly generated values of x in the range [0,1]: (a) label the first 50 points {x1,…,x50} as Class1 if xi ≤ 0.5, else Class2; (b) classify the remaining points x51,…,x100 using KNN for k = 1, 2, 3, 4, 5, 20, 30.
7. Implement the non-parametric Locally Weighted Regression algorithm in order to fit data points. Select an appropriate data set for your experiment and draw graphs.
8. Develop a program to demonstrate the working of Linear Regression and Polynomial Regression. Use the Boston Housing dataset for Linear Regression and the Auto MPG dataset (for vehicle fuel efficiency prediction) for Polynomial Regression.
9. Develop a program to demonstrate the working of the decision tree algorithm. Use the Breast Cancer dataset for building the decision tree and apply this knowledge to classify a new sample.
10. Develop a program to implement the Naive Bayesian classifier considering the Olivetti Face dataset for training. Compute the accuracy of the classifier, considering a few test data sets.
11. Develop a program to implement k-means clustering using the Wisconsin Breast Cancer dataset and visualize the clustering result.
12. Viva Questions
Practice Programs:

1. Write a Python Script to Create a DataFrame

import pandas as pd

# Creating a DataFrame from a dictionary
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 30, 35, 40],
    'City': ['New York', 'San Francisco', 'Los Angeles', 'Chicago']
}
df = pd.DataFrame(data)

# Display the DataFrame
print(df)

2. Write a Python Script to Read and Write CSV Files

# Save DataFrame to a CSV file
df.to_csv('data.csv', index=False)

# Read DataFrame from a CSV file
df_read = pd.read_csv('data.csv')
print(df_read)

3. Write a Python Script to Perform Basic DataFrame Operations

# Show first 2 rows
print(df.head(2))

# Show last 2 rows
print(df.tail(2))

# Get summary statistics
print(df.describe())

# Get column names
print(df.columns)

# Get DataFrame shape (rows, columns)
print(df.shape)

# Get data types of each column
print(df.dtypes)

4. Write a Python Script for Selecting and Filtering Data

# Select a single column
print(df['Name'])

# Select multiple columns
print(df[['Name', 'Age']])

# Filter rows based on a condition
print(df[df['Age'] > 30])
5. Write a Python Script for Adding and Modifying Columns

# Add a new column
df['Salary'] = [50000, 60000, 70000, 80000]

# Modify an existing column
df['Age'] = df['Age'] + 1  # Increase age by 1
print(df)

6. Write a Python Script for Sorting and Grouping Data

# Sort DataFrame by Age in ascending order
print(df.sort_values(by='Age'))

# Group data by City and find the mean Age
print(df.groupby('City')['Age'].mean())

7. Write a Python Script for Handling Missing Values

import numpy as np

# Introduce missing values
df.loc[1, 'Age'] = np.nan

# Check for missing values
print(df.isnull().sum())

# Fill missing values with the column mean (plain assignment avoids the
# chained-assignment warning that inplace=True triggers in recent pandas)
df['Age'] = df['Age'].fillna(df['Age'].mean())
print(df)

8. Write a Python Script for Applying Functions to a DataFrame

# Apply a function to a column
df['Age_Category'] = df['Age'].apply(lambda x: 'Young' if x < 30 else 'Old')
print(df)
9. Write a Python Script to Plot a Line Plot (Trends over Time)
import matplotlib.pyplot as plt
import numpy as np

# Sample Data
x = np.arange(1, 11)
y = np.sin(x)

# Line Plot
plt.plot(x, y, marker='o', linestyle='-', color='b', label='Sine Wave')
plt.xlabel("X Values")
plt.ylabel("Y Values")
plt.title("Simple Line Plot")
plt.legend()
plt.grid(True)
plt.show()
10. Write a Python Script to Plot a Bar Chart (Category Comparison)

import matplotlib.pyplot as plt

# Sample Data
categories = ['A', 'B', 'C', 'D', 'E']
values = [10, 25, 15, 30, 20]

# Bar Plot
plt.bar(categories, values, color=['red', 'blue', 'green', 'purple', 'orange'])
plt.xlabel("Categories")
plt.ylabel("Values")
plt.title("Bar Chart Example")
plt.show()

11. Write a Python Script to Plot a Histogram (Distribution of Data)

import numpy as np
import matplotlib.pyplot as plt

# Generate Random Data
data = np.random.randn(1000)

# Histogram
plt.hist(data, bins=30, color='skyblue', edgecolor='black')
plt.xlabel("Value")
plt.ylabel("Frequency")
plt.title("Histogram of Random Data")
plt.show()

12. Write a Python Script to Plot a Scatter Plot (Relationship between Two Variables)

import numpy as np
import matplotlib.pyplot as plt

# Generate Data
x = np.random.rand(100)
y = np.random.rand(100)

# Scatter Plot
plt.scatter(x, y, c='red', alpha=0.6)
plt.xlabel("X Values")
plt.ylabel("Y Values")
plt.title("Scatter Plot Example")
plt.show()

13. Write a Python Script to Plot a Box Plot (Detecting Outliers)

import seaborn as sns
import numpy as np
import matplotlib.pyplot as plt

# Generate Random Data
data = np.random.randn(100)

# Box Plot
sns.boxplot(data=data, color='lightblue')
plt.title("Box Plot Example")
plt.show()

14. Write a Python Script to Plot a Pair Plot (Multiple Feature Relationships - Iris Dataset)

import seaborn as sns
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris

# Load Iris Dataset
iris = load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)
df['species'] = iris.target

# Pair Plot
sns.pairplot(df, hue='species', palette='coolwarm')
plt.show()

15. Write a Python Script to Plot a Heatmap (Correlation Matrix - Titanic Dataset)

import seaborn as sns
import pandas as pd
import matplotlib.pyplot as plt

# Load Sample Dataset
df = sns.load_dataset("titanic").dropna()

# Compute Correlation (numeric_only=True skips the string columns,
# which recent pandas versions no longer drop silently)
corr_matrix = df.corr(numeric_only=True)

# Heatmap
plt.figure(figsize=(8, 6))
sns.heatmap(corr_matrix, annot=True, cmap="coolwarm", fmt=".2f")
plt.title("Correlation Heatmap")
plt.show()

Program 1: Develop a program to create histograms for all numerical features and analyze
the distribution of each feature. Generate box plots for all numerical features and identify
any outliers. Use California Housing dataset.

Source Code:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import fetch_california_housing

# Load California Housing dataset
data = fetch_california_housing()
df = pd.DataFrame(data.data, columns=data.feature_names)

# Display basic dataset information
print("Dataset Overview:")
print(df.info())
print("\nSummary Statistics:")
print(df.describe())

# Set plot style
sns.set_style("whitegrid")

# Create histograms for all numerical features
# (df.hist creates its own figure, so no separate plt.figure call is needed)
df.hist(bins=30, figsize=(12, 8), edgecolor='black')
plt.suptitle("Histograms of Numerical Features in California Housing Dataset", fontsize=14)
plt.show()

# Create box plots for all numerical features to identify outliers
plt.figure(figsize=(14, 8))
for i, col in enumerate(df.columns):
    plt.subplot(3, 3, i + 1)
    sns.boxplot(x=df[col], color="skyblue", width=0.6, fliersize=3)
    plt.title(col, fontsize=12)
plt.tight_layout()
plt.suptitle("Box Plots of Numerical Features", fontsize=14, y=1.02)
plt.show()

# Identify outliers using the IQR method
Q1 = df.quantile(0.25)
Q3 = df.quantile(0.75)
IQR = Q3 - Q1
outliers = (df < (Q1 - 1.5 * IQR)) | (df > (Q3 + 1.5 * IQR))
print("\nOutlier Detection:")
print(outliers.sum())

Output:

Dataset Overview:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 20640 entries, 0 to 20639
Data columns (total 8 columns):
 #   Column      Non-Null Count  Dtype
---  ------      --------------  -----
 0   MedInc      20640 non-null  float64
 1   HouseAge    20640 non-null  float64
 2   AveRooms    20640 non-null  float64
 3   AveBedrms   20640 non-null  float64
 4   Population  20640 non-null  float64
 5   AveOccup    20640 non-null  float64
 6   Latitude    20640 non-null  float64
 7   Longitude   20640 non-null  float64
dtypes: float64(8)

Summary Statistics:
             MedInc      HouseAge      AveRooms     AveBedrms    Population
count  20640.000000  20640.000000  20640.000000  20640.000000  20640.000000
mean       3.870671     28.639486      5.429000      1.096675   1425.476744
std        1.899822     12.585558      2.474173      0.473911   1132.462122
min        0.499900      1.000000      0.846154      0.333333      3.000000
25%        2.563400     18.000000      4.440716      1.006079    787.000000
50%        3.534800     29.000000      5.229129      1.048780   1166.000000
75%        4.743250     37.000000      6.052381      1.099526   1725.000000
max       15.000100     52.000000    141.909091     34.066667  35682.000000

           AveOccup      Latitude     Longitude
count  20640.000000  20640.000000  20640.000000
mean       3.070655     35.631861   -119.569704
std       10.386050      2.135952      2.003532
min        0.692308     32.540000   -124.350000
25%        2.429741     33.930000   -121.800000
50%        2.818116     34.260000   -118.490000
75%        3.282261     37.710000   -118.010000
max     1243.333333     41.950000   -114.310000



Outlier Detection:
MedInc         681
HouseAge         0
AveRooms       511
AveBedrms     1424
Population    1196
AveOccup       711
Latitude         0
Longitude        0
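
The same IQR bounds can also be used to filter out the flagged rows. A minimal sketch continuing from the Q1, Q3 and IQR variables in the source code above (the 1.5 multiplier is the conventional cutoff, not a requirement of the program statement):

lower = Q1 - 1.5 * IQR
upper = Q3 + 1.5 * IQR
# Keep only rows where every feature lies inside [lower, upper]
df_filtered = df[~((df < lower) | (df > upper)).any(axis=1)]
print("Rows before:", len(df), "- after removing outliers:", len(df_filtered))
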
Program 2: Develop a program to compute the correlation matrix to understand the
relationships between pairs of features. Visualize the correlation matrix using a heatmap to
know which variables have strong positive/negative correlations. Create a pair plot to
visualize pairwise relationships between features. Use California Housing dataset.

Source Code:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import fetch_california_housing

# Load California Housing dataset
data = fetch_california_housing()
df = pd.DataFrame(data.data, columns=data.feature_names)

# Set plot style
sns.set_style("whitegrid")

# Compute and visualize the correlation matrix
plt.figure(figsize=(10, 6))
corr_matrix = df.corr()
sns.heatmap(corr_matrix, annot=True, cmap='coolwarm', fmt='.2f', linewidths=0.5)
plt.title("Feature Correlation Heatmap", fontsize=14)
plt.show()

# Create pair plot to visualize pairwise relationships between features
sns.pairplot(df, diag_kind='kde', plot_kws={'alpha': 0.5})
plt.suptitle("Pair Plot of Features", fontsize=14, y=1.02)
plt.show()

# Identify skewness of numerical features
skew_values = df.skew()
print("\nSkewness of Features:")
print(skew_values)

Output:

Skewness of Features:
MedInc         1.646657
HouseAge       0.060331
AveRooms      20.697869
AveBedrms     31.316956
Population     4.935858
AveOccup      97.639561
Latitude       0.465953
Longitude     -0.297801
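
Strongly right-skewed features such as AveRooms, AveBedrms, Population and AveOccup are often log-transformed before modelling. A minimal sketch continuing from the df above (np.log1p is one common choice of transform, not part of the program statement):

skewed_cols = ['AveRooms', 'AveBedrms', 'Population', 'AveOccup']
df_log = df.copy()
df_log[skewed_cols] = np.log1p(df_log[skewed_cols])  # log(1 + x), finite at zero
print(df_log[skewed_cols].skew())  # recompute skewness after the transform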


Program 3: Develop a program to implement Principal Component Analysis (PCA) for
reducing the dimensionality of the Iris dataset from 4 features to 2.

Source Code:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler

# Load Iris dataset
data = load_iris()
df = pd.DataFrame(data.data, columns=data.feature_names)

# Standardize the data
scaler = StandardScaler()
df_scaled = scaler.fit_transform(df)

# Apply PCA to reduce dimensionality from 4 to 2
pca = PCA(n_components=2)
principal_components = pca.fit_transform(df_scaled)

# Create a new DataFrame with principal components
pca_df = pd.DataFrame(principal_components, columns=['PC1', 'PC2'])
pca_df['Target'] = data.target

# Visualize the PCA results
plt.figure(figsize=(8, 6))
for target, label in enumerate(data.target_names):
    subset = pca_df[pca_df['Target'] == target]
    plt.scatter(subset['PC1'], subset['PC2'], label=label, alpha=0.7)
plt.xlabel('Principal Component 1')
plt.ylabel('Principal Component 2')
plt.title('PCA of Iris Dataset')
plt.legend()
plt.grid(True)
plt.show()

# Print explained variance ratio
print("Explained Variance Ratio:", pca.explained_variance_ratio_)

Output:

Explained Variance Ratio: [0.72962445 0.22850762]
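
To see how the original four features contribute to each principal component, the fitted loadings can be inspected. A minimal sketch continuing from the pca object above:

loadings = pd.DataFrame(pca.components_, columns=data.feature_names, index=['PC1', 'PC2'])
print(loadings)  # each row gives the weight of every original feature in that component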


Program 4: For a given set of training data examples stored in a .CSV file, implement and
demonstrate the Find-S algorithm to output a description of the set of all hypotheses
consistent with the training examples.

Source Code:

import csv

num_attributes = 6
a = []

print("\n The Given Training Data Set \n")
with open('data.csv', 'r') as csvfile:
    reader = csv.reader(csvfile)
    for row in reader:
        a.append(row)
        print(row)

print("\n The initial value of hypothesis: ")
hypothesis = ['0'] * num_attributes
print(hypothesis)

# Initialise the hypothesis with the first training instance
for j in range(0, num_attributes):
    hypothesis[j] = a[0][j]

print("\n Find S: Finding a Maximally Specific Hypothesis\n")
for i in range(0, len(a)):
    if a[i][num_attributes] == 'yes':  # generalise only on positive examples
        for j in range(0, num_attributes):
            if a[i][j] != hypothesis[j]:
                hypothesis[j] = '?'  # attribute value differs: generalise it
            else:
                hypothesis[j] = a[i][j]
print(" For Training instance No:{0} the hypothesis is ".format(i), hypothesis)

print("\n The Maximally Specific Hypothesis for a given Training Examples :\n")
print(hypothesis)

Output:

The Given Training Data Set

['sunny', 'warm', 'normal', 'strong', 'warm', 'same', 'yes']
['sunny', 'warm', 'high', 'strong', 'warm', 'same', 'yes']
['rainy', 'cold', 'high', 'strong', 'warm', 'change', 'no']
['sunny', 'warm', 'high', 'strong', 'cool', 'change', 'yes']

The initial value of hypothesis:
['0', '0', '0', '0', '0', '0']

Find S: Finding a Maximally Specific Hypothesis

For Training instance No:3 the hypothesis is ['sunny', 'warm', '?', 'strong', '?', '?']

The Maximally Specific Hypothesis for a given Training Examples :

['sunny', 'warm', '?', 'strong', '?', '?']


Program 5: Develop a program to implement k-Nearest Neighbour algorithm to classify the
randomly generated 100 values of x in the range of [0,1]. Perform the following based on
dataset generated.

a. Label the first 50 points {x1,……,x50} as follows: if (xi ≤ 0.5), then xi ε Class1, else xi ε Class2

b. Classify the remaining points, x51,……,x100 using KNN. Perform this for k=1,2,3,4,5,20,30

Source Code:

import numpy as np
import matplotlib.pyplot as plt
from sklearn.neighbors import KNeighborsClassifier

# Generate 100 random values in the range [0,1]
x = np.random.rand(100, 1)

# Label the first 50 points based on the given condition
labels = np.array([1 if xi[0] <= 0.5 else 2 for xi in x[:50]])

# Prepare training and test sets
X_train, y_train = x[:50], labels  # First 50 for training
X_test = x[50:]  # Remaining 50 for classification

# Test for different values of k
k_values = [1, 2, 3, 4, 5, 20, 30]

plt.figure(figsize=(10, 6))
for k in k_values:
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(X_train, y_train)
    y_pred = knn.predict(X_test)
    # Visualization of classification results
    plt.scatter(X_test, y_pred, label=f'k={k}', alpha=0.7)

# Mark training points for reference
plt.scatter(X_train, y_train, color='red', marker='x', label='Training Data')
plt.xlabel('X values')
plt.ylabel('Predicted Class')
plt.title('KNN Classification for Different k-values')
plt.legend()
plt.show()

# Print classification results for each k
for k in k_values:
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(X_train, y_train)
    y_pred = knn.predict(X_test)
    print(f'Predictions for k={k}:', y_pred)

Output:

Predictions for k=1: [1 2 2 2 2 1 1 2 2 1 1 2 1 1 1 1 2 1 2 1 1 1 1 2 2 1 1 1 2 1 2 2 1 2 1 2 1 1 1 1 1 2 1 1 1 1 2 2 1]
Predictions for k=2: [1 2 2 2 2 1 1 2 2 1 1 2 1 1 1 1 2 1 2 1 1 1 1 2 2 1 1 1 2 1 2 2 1 2 1 2 1 1 1 1 1 2 1 1 1 1 2 2 1]
Predictions for k=3: [1 2 2 2 2 1 1 2 2 1 1 2 1 1 1 1 2 1 2 1 1 1 1 2 2 1 1 1 2 1 2 2 1 2 1 2 1 1 1 2 1 2 1 1 1 1 2 2 1]
Predictions for k=4: [1 2 2 2 2 1 1 2 2 1 1 2 1 1 1 1 2 1 2 1 1 1 1 2 2 1 1 1 2 1 2 2 1 2 1 2 1 1 1 1 1 2 1 1 1 1 2 2 1]
Predictions for k=5: [1 2 2 2 2 1 1 2 2 1 1 2 1 1 1 1 2 1 2 1 1 1 1 2 2 1 1 1 2 1 2 2 1 2 1 2 1 1 1 1 1 2 1 1 1 1 2 2 1]
Predictions for k=20: [1 2 2 2 2 1 1 2 2 1 1 2 1 1 1 1 2 1 2 1 1 1 1 2 2 1 1 1 2 1 1 2 1 2 1 2 1 1 1 1 1 1 1 1 1 1 2 2 1]
Predictions for k=30: [1 2 2 2 2 1 1 2 2 1 1 2 1 1 1 1 2 1 2 1 1 1 1 2 2 1 1 1 2 1 1 2 1 2 1 2 1 1 1 1 1 1 1 1 1 1 2 2 1]
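
Because the test points come from the same rule used to label the training points (Class1 if xi ≤ 0.5, else Class2), their true classes are known and the accuracy for each k can be checked directly. A minimal sketch continuing from the variables above:

from sklearn.metrics import accuracy_score

y_true = np.array([1 if xi[0] <= 0.5 else 2 for xi in X_test])  # ground truth from the generating rule
for k in k_values:
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(X_train, y_train)
    print(f'k={k}: accuracy = {accuracy_score(y_true, knn.predict(X_test)):.2f}')
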
Program 6: Implement the non-parametric Locally Weighted Regression algorithm in order
to fit data points. Select an appropriate data set for your experiment and draw graphs.
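
For reference, the weight the source code assigns to a training point x_i for a query point x is the standard Gaussian kernel of Locally Weighted Regression,

    w_i = exp(-(x_i - x)^2 / (2 * tau^2)),

and the local parameters solve a weighted least-squares problem, theta = (X^T W X)^(-1) X^T W y with W = diag(w_1, ..., w_m). The bandwidth tau controls locality: a small tau follows the data closely, while a large tau approaches ordinary linear regression.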

Source Code:

import numpy as np
import matplotlib.pyplot as plt

# Generate synthetic dataset
np.random.seed(42)
X = np.linspace(0, 10, 100)
y = np.sin(X) + np.random.normal(0, 0.1, 100)  # Sinusoidal data with noise

# Define Locally Weighted Regression function
def locally_weighted_regression(x_query, X, y, tau):
    # Diagonal weight matrix: Gaussian kernel centred on the query point
    W = np.diag(np.exp(-((X[:, 1] - x_query[1]) ** 2) / (2 * tau ** 2)))
    theta = np.linalg.pinv(X.T @ W @ X) @ X.T @ W @ y  # Weighted least-squares solution
    return x_query @ theta

# Fit Locally Weighted Regression for different values of tau
tau_values = [0.1, 0.5, 1, 5]
X_ones = np.c_[np.ones(X.shape[0]), X]  # Add bias term

plt.figure(figsize=(10, 6))
plt.scatter(X, y, label='Data', color='blue', alpha=0.5)
for tau in tau_values:
    y_pred = np.array([locally_weighted_regression(np.array([1, x_i]), X_ones, y, tau)
                       for x_i in X])
    plt.plot(X, y_pred, label=f'tau={tau}')
plt.xlabel('X values')
plt.ylabel('Y values')
plt.title('Locally Weighted Regression with Different Bandwidths')
plt.legend()
plt.show()

Output: (figure only: the noisy sinusoidal data with one fitted LWR curve per tau value)
Program 7: Develop a program to demonstrate the working of Linear Regression and
Polynomial Regression. Use Boston Housing Dataset for Linear Regression and Auto MPG
Dataset (for vehicle fuel efficiency prediction) for Polynomial Regression.

Source Code:

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import make_pipeline
from sklearn.datasets import fetch_california_housing
from sklearn.metrics import mean_squared_error

# The Boston Housing dataset was removed from scikit-learn (1.2+), so the
# California Housing dataset is used here as a substitute for Linear Regression
housing = fetch_california_housing()
X_housing = housing.data[:, :2]  # Selecting first two features for simplicity
y_housing = housing.target

# Split data
X_train, X_test, y_train, y_test = train_test_split(X_housing, y_housing, test_size=0.2, random_state=42)

# Train Linear Regression Model
linear_reg = LinearRegression()
linear_reg.fit(X_train, y_train)
y_pred = linear_reg.predict(X_test)

# Evaluate Model
mse = mean_squared_error(y_test, y_pred)
print(f'Linear Regression MSE: {mse}')

# Plot Predictions vs Actual
plt.scatter(y_test, y_pred, color='blue', alpha=0.5)
plt.xlabel('Actual Prices')
plt.ylabel('Predicted Prices')
plt.title('Linear Regression: Actual vs Predicted Prices')
plt.show()

# Load Auto MPG Dataset for Polynomial Regression
auto_mpg = pd.read_csv("https://raw.githubusercontent.com/mwaskom/seaborn-data/master/mpg.csv").dropna()
X_auto = auto_mpg[['horsepower']].values
y_auto = auto_mpg['mpg'].values

# Split data
X_train, X_test, y_train, y_test = train_test_split(X_auto, y_auto, test_size=0.2, random_state=42)

# Train Polynomial Regression Model
degree = 3  # Choosing a cubic polynomial model
poly_model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
poly_model.fit(X_train, y_train)
y_pred_poly = poly_model.predict(X_test)

# Evaluate Model
mse_poly = mean_squared_error(y_test, y_pred_poly)
print(f'Polynomial Regression MSE: {mse_poly}')

# Plot Polynomial Regression Results
X_sorted = np.sort(X_test, axis=0)
y_sorted = poly_model.predict(X_sorted)
plt.scatter(X_test, y_test, color='blue', alpha=0.5, label='Actual')
plt.plot(X_sorted, y_sorted, color='red', label=f'Polynomial Degree {degree}')
plt.xlabel('Horsepower')
plt.ylabel('MPG')
plt.title('Polynomial Regression: Horsepower vs MPG')
plt.legend()
plt.show()

Output:

Linear Regression MSE: 0.6629874283048177

Polynomial Regression MSE: 18.460267222145088


Program 8: Develop a program to demonstrate the working of the decision tree algorithm.
Use Breast Cancer Data set for building the decision tree and apply this knowledge to
classify a new sample.

Source Code:

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score, classification_report

# Load Breast Cancer Dataset
cancer = load_breast_cancer()
X = cancer.data
y = cancer.target

# Split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train Decision Tree Model
decision_tree = DecisionTreeClassifier(random_state=42)
decision_tree.fit(X_train, y_train)

# Predict on test data
y_pred = decision_tree.predict(X_test)

# Evaluate Model
accuracy = accuracy_score(y_test, y_pred)
print(f'Decision Tree Accuracy: {accuracy}')
print(classification_report(y_test, y_pred))

# Classify a new sample
new_sample = np.array([X_test[0]])  # Using first test sample as an example
predicted_class = decision_tree.predict(new_sample)
print(f'Predicted class for new sample: {cancer.target_names[predicted_class[0]]}')

Output:

Decision Tree Accuracy: 0.9473684210526315
              precision    recall  f1-score   support

           0       0.93      0.93      0.93        43
           1       0.96      0.96      0.96        71

    accuracy                           0.95       114
   macro avg       0.94      0.94      0.94       114
weighted avg       0.95      0.95      0.95       114

Predicted class for new sample: benign
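
The fitted tree can also be drawn to show which features drive the splits. A minimal sketch using scikit-learn's plot_tree on the decision_tree object above (max_depth=2 is an arbitrary limit chosen only to keep the figure readable):

from sklearn.tree import plot_tree

plt.figure(figsize=(12, 6))
plot_tree(decision_tree, max_depth=2, feature_names=list(cancer.feature_names),
          class_names=list(cancer.target_names), filled=True, fontsize=8)
plt.title('Top Levels of the Breast Cancer Decision Tree')
plt.show()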


Program 9: Develop a program to implement the Naive Bayesian classifier considering
Olivetti Face Data set for training. Compute the accuracy of the classifier, considering a few
test data sets.

Source Code:

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.datasets import fetch_olivetti_faces
from sklearn.metrics import accuracy_score, classification_report

# Load Olivetti Faces Dataset
faces = fetch_olivetti_faces(shuffle=True, random_state=42)
X = faces.data
y = faces.target

# Split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train Naive Bayes Classifier
naive_bayes = GaussianNB()
naive_bayes.fit(X_train, y_train)

# Predict on test data
y_pred = naive_bayes.predict(X_test)

# Evaluate Model
accuracy = accuracy_score(y_test, y_pred)
print(f'Naive Bayes Accuracy: {accuracy}')
print(classification_report(y_test, y_pred))

# Classify a new sample
new_sample = np.array([X_test[0]])  # Using first test sample as an example
predicted_class = naive_bayes.predict(new_sample)
print(f'Predicted class for new sample: {predicted_class[0]}')

Output:

Naive Bayes Accuracy: 0.775
              precision    recall  f1-score   support

           0       1.00      1.00      1.00         2
           1       1.00      1.00      1.00         1
           2       0.33      1.00      0.50         1
           3       0.00      0.00      0.00         3
           4       1.00      0.50      0.67         4
           5       1.00      1.00      1.00         2
           7       1.00      1.00      1.00         3
           8       1.00      0.67      0.80         3
           9       0.50      1.00      0.67         2
          10       1.00      1.00      1.00         1
          11       1.00      1.00      1.00         1
          12       0.50      0.67      0.57         3
          13       1.00      0.50      0.67         2
          14       0.00      0.00      0.00         4
          15       1.00      1.00      1.00         1
          16       0.67      1.00      0.80         2
          17       1.00      1.00      1.00         2
          18       1.00      1.00      1.00         3
          19       0.40      1.00      0.57         2
          20       1.00      1.00      1.00         3
          21       1.00      0.50      0.67         2
          22       1.00      0.40      0.57         5
          23       1.00      0.50      0.67         2
          24       1.00      1.00      1.00         1
          25       0.67      1.00      0.80         2
          26       1.00      1.00      1.00         1
          27       1.00      1.00      1.00         4
          28       0.00      0.00      0.00         0
          29       1.00      1.00      1.00         2
          30       1.00      1.00      1.00         1
          31       1.00      0.67      0.80         3
          32       1.00      1.00      1.00         1
          34       0.00      0.00      0.00         0
          35       1.00      1.00      1.00         2
          36       1.00      1.00      1.00         2
          38       1.00      1.00      1.00         3
          39       0.57      1.00      0.73         4

    accuracy                           0.78        80
   macro avg       0.80      0.79      0.77        80
weighted avg       0.82      0.78      0.76        80

Predicted class for new sample: 18
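
GaussianNB treats each of the 4,096 pixels as an independent feature, which partly explains the modest accuracy. One common variation is to compress and decorrelate the pixels with PCA before the classifier; a minimal sketch continuing from the split above (n_components=100 is an untuned, illustrative choice, and the effect on accuracy depends on the split):

from sklearn.pipeline import make_pipeline
from sklearn.decomposition import PCA

pipeline = make_pipeline(PCA(n_components=100, random_state=42), GaussianNB())
pipeline.fit(X_train, y_train)
print('PCA + Naive Bayes Accuracy:', accuracy_score(y_test, pipeline.predict(X_test)))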


Program 10: Develop a program to implement k-means clustering using Wisconsin Breast
Cancer data set and visualize the clustering result.

Source Code:

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA

# Load Breast Cancer Dataset
cancer = load_breast_cancer()
X = cancer.data

# Apply K-Means Clustering (n_init set explicitly to avoid the changed-default
# warning in recent scikit-learn versions)
kmeans = KMeans(n_clusters=2, n_init=10, random_state=42)
kmeans.fit(X)
labels = kmeans.labels_

# Reduce dimensions for visualization using PCA
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X)

# Scatter plot of the clusters
plt.figure(figsize=(8, 6))
plt.scatter(X_pca[:, 0], X_pca[:, 1], c=labels, cmap='viridis', alpha=0.5)
plt.title('K-Means Clustering on Breast Cancer Data')
plt.xlabel('Principal Component 1')
plt.ylabel('Principal Component 2')
plt.colorbar(label='Cluster Label')
plt.show()

Output: (figure only: the two K-Means clusters plotted in the space of the first two principal components)
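
Since the Wisconsin dataset also ships with true benign/malignant labels, the unsupervised clusters can be compared against them. A minimal sketch using the adjusted Rand index, which is invariant to how the cluster numbers are permuted:

from sklearn.metrics import adjusted_rand_score

ari = adjusted_rand_score(cancer.target, labels)
print(f'Adjusted Rand Index vs. true diagnosis labels: {ari:.3f}')
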
Viva Questions:

1. What is the difference between supervised and unsupervised learning?
2. What are the key assumptions of the Naive Bayes classifier?
3. How does the k-Nearest Neighbors (k-NN) algorithm work?
4. What is the curse of dimensionality, and how does PCA help mitigate it?
5. What is the significance of the correlation matrix in data analysis?
6. How does the Find-S algorithm work for hypothesis learning?
7. What is the difference between parametric and non-parametric regression?
8. Why is feature scaling important in machine learning?
9. How do you evaluate the performance of a clustering algorithm?
10. What is the difference between K-Means clustering and hierarchical clustering?
11. How does Locally Weighted Regression differ from traditional regression models?
12. How does k-NN classify a new data point?
13. What are the advantages and disadvantages of Decision Trees?
14. How does the Naive Bayes classifier handle continuous data?
15. What is the role of the Gaussian assumption in Naive Bayes?
16. What are the hyperparameters in K-Means clustering, and how do they affect results?
17. What is the role of eigenvalues and eigenvectors in PCA?
18. How does polynomial regression differ from linear regression?
19. Why do we use test-train splits in machine learning models?
20. What are some real-world applications of K-Means clustering?
