
AD3461 MACHINE LEARNING LABORATORY L T P C 0 0 4 2

COURSE OBJECTIVES:
• To understand the data sets and apply suitable algorithms for selecting the appropriate
features for analysis.
• To learn to implement supervised machine learning algorithms on standard datasets and
evaluate the performance.
• To experiment the unsupervised machine learning algorithms on standard datasets and
evaluate the performance.
• To build the graph based learning models for standard data sets.
• To compare the performance of different ML algorithms and select the suitable one based
on the application.

LIST OF EXPERIMENTS:
1. For a given set of training data examples stored in a .CSV file, implement and demonstrate
the Candidate-Elimination algorithm to output a description of the set of all hypotheses
consistent with the training examples.
2. Write a program to demonstrate the working of the decision tree based ID3 algorithm. Use
an appropriate data set for building the decision tree and apply this knowledge to classify a
new sample.
3. Build an Artificial Neural Network by implementing the Backpropagation algorithm and
test the same using appropriate data sets.
4. Write a program to implement the naïve Bayesian classifier for a sample training data set
stored as a .CSV file and compute the accuracy with a few test data sets.
5. Implement naïve Bayesian Classifier model to classify a set of documents and
measure the accuracy, precision, and recall.
6. Write a program to construct a Bayesian network to diagnose CORONA infection using
standard WHO Data Set.
7. Apply EM algorithm to cluster a set of data stored in a .CSV file. Use the same data set for
clustering using the k-Means algorithm. Compare the results of these two algorithms.
8. Write a program to implement k-Nearest Neighbour algorithm to classify the iris data set.
Print both correct and wrong predictions.
9. Implement the non-parametric Locally Weighted Regression algorithm in order to fit data
points. Select an appropriate data set for your experiment and draw graphs.

EX.NO:1 IMPLEMENTATION OF CANDIDATE-ELIMINATION ALGORITHM
DATE:

AIM
To implement the Candidate-Elimination algorithm to determine the most specific
hypothesis (S) and the set of most general hypotheses (G) from a given training dataset
consisting of positive and negative examples.

ALGORITHM:
1. Initialize Hypotheses:

 S: Start with the most specific hypothesis (initialized here from the first training example).

 G: Start with the most general hypothesis (all wildcards, ?).

2. For each example:

 If the example is positive ("Yes"):

o Generalize S to match the example.
o Remove hypotheses from G that don't match the example.

 If the example is negative ("No"):

o Specialize G to exclude the example.
o Remove hypotheses from S that match the negative example.

3. Repeat until all examples are processed.

4. Output the final S and G hypotheses.
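
As a worked illustration on the EnjoySport data used below: after the first positive
example, S = ⟨Sunny, Warm, Normal, Strong, Warm, Same⟩; the second positive example
generalizes it to ⟨Sunny, Warm, ?, Strong, Warm, Same⟩; the negative third example leaves
S unchanged; and the fourth positive example yields S = ⟨Sunny, Warm, ?, Strong, ?, ?⟩,
which matches the final output shown later.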

PROGRAM
import numpy as np
import pandas as pd

# Load dataset
data = pd.read_csv('C:/Users/CSE/Desktop/ENJOYSPORT.CSV')

# Display dataset
print("\n Loaded Dataset:\n")
print(data)

# Extract features and target labels
X = np.array(data.iloc[:, :-1])  # Feature set (concepts)
y = np.array(data.iloc[:, -1])   # Target values
print("\nFeature Set (Concepts):\n", X)
print("\nTarget Values:\n", y)

# Candidate Elimination Algorithm
def candidate_elimination(X, y):
    S = X[0].copy()  # Initialize specific hypothesis from the first example
    G = [["?" for _ in range(len(S))] for _ in range(len(S))]  # General hypotheses

    for i, x in enumerate(X):
        if y[i].lower() == "yes":  # Positive instance (case-insensitive: the CSV mixes "Yes"/"yes")
            for j in range(len(S)):
                if x[j] != S[j]:
                    S[j] = "?"
                    G[j][j] = "?"
        else:  # Negative instance
            for j in range(len(S)):
                if x[j] != S[j]:
                    G[j][j] = S[j]
                else:
                    G[j][j] = "?"

    # Remove hypotheses from G that remain fully general
    G = [h for h in G if h != ["?"] * len(S)]
    return S, G

# Run algorithm
S_final, G_final = candidate_elimination(X, y)

# Display final hypotheses
print("\n Final Specific Hypothesis:\n", S_final)
print("\nFinal General Hypothesis:\n", G_final)

DATASET

Sky    Temperature  Humidity  Wind    Water  Forecast  EnjoySport
Sunny  Warm         Normal    Strong  Warm   Same      Yes
Sunny  Warm         High      Strong  Warm   Same      Yes
Rainy  Cold         High      Strong  Warm   Change    No
Sunny  Warm         High      Strong  Cool   Change    Yes

OUTPUT:
Loaded Dataset:
SKY AIRTEMP HUMIDITY WIND WATER FORECAST ENJOY SPORT
0 Sunny Warm Normal Strong Warm Same Yes
1 Sunny Warm High Strong Warm Same Yes
2 Rainy Cold High Strong Warm Change No
3 Sunny Warm High Strong Cool Change yes

Feature Set (Concepts):


[['Sunny' 'Warm' 'Normal' 'Strong' 'Warm' 'Same']
['Sunny' 'Warm' 'High' 'Strong' 'Warm' 'Same']
['Rainy' 'Cold' 'High' 'Strong' 'Warm' 'Change']
['Sunny' 'Warm' 'High' 'Strong' 'Cool' 'Change']]

Target Values:
['Yes' 'Yes' 'No' 'yes']
Final Specific Hypothesis:
['Sunny' 'Warm' '?' 'Strong' '?' '?']
Final General Hypothesis:
[['Sunny', '?', '?', '?', '?', '?'], ['?', 'Warm', '?', '?', '?', '?']]

RESULT
Thus the Candidate-Elimination algorithm was implemented and the output executed
and verified.

EX.NO:2 IMPLEMENTATION OF ID3 DECISION TREE ALGORITHM

DATE:

AIM:

To implement the ID3 Decision Tree Algorithm for classification, using a dataset to
build a decision tree and classify a new sample.

ALGORITHM

Step 1: Load the Dataset

 Read the dataset from a CSV file.


 Split it into features (attributes) and the target (class label).

Step 2: Calculate Entropy and Compute Information Gain

 Entropy measures the impurity or uncertainty in a dataset:

H(S) = − Σ p_i · log2(p_i)

 Information Gain (IG) helps to choose the best feature for splitting:

IG(S, A) = H(S) − Σ_v (|S_v| / |S|) · H(S_v)
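
As a quick sanity check of the entropy formula (a minimal sketch, using the four
EnjoySport labels shown later — three Yes, one No):

import math

# Entropy of the labels [Yes, Yes, No, Yes]
p_yes, p_no = 3 / 4, 1 / 4
H = -(p_yes * math.log2(p_yes) + p_no * math.log2(p_no))
print(round(H, 3))  # 0.811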

Step 3: Select the Best Feature: The feature with the highest information gain is chosen.

Step 4: Split Data and Build Subtrees

 The dataset is divided based on the best feature.


 Recursively repeat Steps 2-4 for each subset.

Step 5: Stopping Conditions

 If all samples belong to one class, return that class (Leaf Node).
 If no features remain, return the majority class.

Step 6: Build Decision Tree Recursively: Construct the decision tree by repeating Steps 2-5
until all branches end in leaf nodes.

Step 7: Classify New Samples

 Traverse the decision tree using the attribute values of the new sample and assign the
class label of the matching leaf node.

PROGRAM

import csv
import math

# Load dataset
file_path = "C:/Users/CSE/Desktop/ENJOYSPORT.CSV"
data = list(csv.reader(open(file_path, "r")))
headers, dataset = data[0], data[1:]

# Function to calculate entropy
def entropy(data):
    labels = [row[-1] for row in data]
    return -sum((labels.count(label) / len(labels)) * math.log2(labels.count(label) / len(labels))
                for label in set(labels))

# Find best feature to split on (lowest weighted child entropy = highest information gain)
def best_feature(data):
    return min(range(len(data[0]) - 1),
               key=lambda i: sum(entropy(subset) * len(subset) / len(data)
                                 for subset in split_data(data, i).values()))

# Split dataset by feature
def split_data(data, col):
    subsets = {}
    for row in data:
        subsets.setdefault(row[col], []).append(row)
    return subsets

# Build the decision tree
def build_tree(data, features):
    labels = [row[-1] for row in data]
    if len(set(labels)) == 1: return labels[0]  # Pure class, return label
    if not features: return max(set(labels), key=labels.count)  # No features left, return majority

    best_idx = best_feature(data)
    tree = {features[best_idx]: {}}
    for value, subset in split_data(data, best_idx).items():
        tree[features[best_idx]][value] = build_tree(
            [row[:best_idx] + row[best_idx + 1:] for row in subset],
            features[:best_idx] + features[best_idx + 1:])
    return tree

# Print tree
def print_tree(tree, level=0):
    if not isinstance(tree, dict):
        print(" " * level + f"➡ {tree}")
        return
    for key, value in tree.items():
        print(" " * level + f"[{key}]")
        for subkey, subtree in value.items():
            print(" " * (level + 1) + f"→ {subkey}")
            print_tree(subtree, level + 2)

# Classify test samples
def classify(tree, sample, features):
    if not isinstance(tree, dict): return tree
    key = next(iter(tree))
    return classify(tree[key].get(sample[features.index(key)], "Unknown"), sample, features)

# Build and print tree
decision_tree = build_tree(dataset, headers[:-1])
print("\n **Decision Tree Structure**")
print_tree(decision_tree)

# Load and classify test data
test_file = "C:/Users/CSE/Desktop/TESTENJOYSPORT.CSV"
test_data = list(csv.reader(open(test_file, "r")))[1:]

print("\n **Predictions for Test Data**")
for sample in test_data:
    print(f"Test Sample: {sample} → **Prediction: {classify(decision_tree, sample, headers[:-1])}**")

ENJOYSPORT

Sky       Temperature  Humidity  Wind    Water  Forecast  EnjoySport
Sunny     Warm         Normal    Strong  Warm   Same      Yes
Sunny     Warm         High      Strong  Warm   Same      Yes
Rainy     Cold         High      Strong  Warm   Change    No
Sunny     Warm         High      Strong  Cool   Change    Yes
Rainy     Warm         High      Strong  Cool   Change    No
Overcast  Cold         Normal    Strong  Warm   Change    Yes
Sunny     Warm         Normal    Weak    Warm   Same      Yes

TESTENJOYSPORT

Sky       Temperature  Humidity  Wind    Water  Forecast
Sunny     Cold         Normal    Weak    Warm   Same
Rainy     Warm         High      Strong  Cool   Change
Overcast  Warm         Normal    Weak    Warm   Same
Rainy     Cold         High      Strong  Warm   Change

OUTPUT
**Decision Tree Structure**
[Sky]
→ Sunny
➡ Yes
→ Rainy
➡ No
→ Overcast
➡ Yes

**Predictions for Test Data**


Test Sample: ['Sunny', 'Cold', 'Normal', 'Weak', 'Warm', 'Same'] → **Prediction: Yes**
Test Sample: ['Rainy', 'Warm', 'High', 'Strong', 'Cool', 'Change'] → **Prediction: No**
Test Sample: ['Overcast', 'Warm', 'Normal', 'Weak', 'Warm', 'Same'] → **Prediction: Yes**
Test Sample: ['Rainy', 'Cold', 'High', 'Strong', 'Warm', 'Change'] → **Prediction: No**

RESULT
Thus the ID3 decision tree algorithm was implemented and the output executed and
verified.

EX.NO:3 BUILD AN ARTIFICIAL NEURAL NETWORK USING THE BACKPROPAGATION ALGORITHM
DATE:

AIM
To build an Artificial Neural Network (ANN) using the Backpropagation Algorithm
and test it with an appropriate dataset (sleep hours, study hours, and exam scores).

ALGORITHM
1. Import Libraries

 Load necessary libraries (numpy, pandas, sklearn.preprocessing).

2. Load and Preprocess Data

 Read the dataset (sleep.csv).


 Extract input features (Sleep Hours, Study Hours) and output (Exam Score).
 Normalize inputs and output for better training.

3. Initialize Neural Network

 Set the number of neurons (2 input, 3 hidden, 1 output).


 Initialize weights and biases randomly.

4. Define Activation Function

 Use sigmoid function to introduce non-linearity.
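Concretely, sigmoid(x) = 1 / (1 + e^(−x)), and its derivative can be written in terms of
its own output as sigmoid'(x) = sigmoid(x) × (1 − sigmoid(x)) — the form used by
sigmoid_derivative() in the program below.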

5. Train the Model (Backpropagation Algorithm)

 Repeat for a fixed number of iterations:


1. Forward Pass: Compute hidden and output layer activations.
2. Calculate Error: Find the difference between predicted and actual values.

3. Backward Pass: Adjust weights and biases using gradient descent.
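
In the program below, these updates take the form d_output = (y − ŷ) · σ'(ŷ) at the
output layer, d_hidden = (d_output · W2ᵀ) · σ'(h) at the hidden layer, and each weight
matrix moves along its gradient, e.g. W2 ← W2 + η · hiddenᵀ · d_output with learning
rate η = 0.1.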

6. Make Predictions

 Use the trained model to predict exam scores.

7. Display Results

 Print the predicted scores.

PROGRAM
import numpy as np
import pandas as pd

# Activation Function & Derivative
def sigmoid(x): return 1 / (1 + np.exp(-x))
def sigmoid_derivative(x): return x * (1 - x)

# Load Data
df = pd.read_csv("C:/Users/CSE/Desktop/sleep.csv")
X = df.iloc[:, :-1].values  # Input features (note: this also includes the Student ID
                            # column; use df.iloc[:, 1:-1] to train on Sleep/Study hours only)
y = df.iloc[:, -1].values.reshape(-1, 1) / 100  # Normalize Output

# Debug Shape Issue
print("Shape of X:", X.shape)  # (N, 3) with the Student column, (N, 2) without it
print("Shape of y:", y.shape)  # Expected (N, 1)

# Initialize Weights & Biases
np.random.seed(42)
W1 = np.random.rand(X.shape[1], 3)  # Inputs → 3 Hidden Neurons
b1 = np.random.rand(1, 3)
W2 = np.random.rand(3, 1)           # 3 Hidden → 1 Output
b2 = np.random.rand(1, 1)

# Train the Model
for _ in range(5000):
    hidden = sigmoid(X @ W1 + b1)
    output = sigmoid(hidden @ W2 + b2)

    d_output = (y - output) * sigmoid_derivative(output)
    d_hidden = d_output @ W2.T * sigmoid_derivative(hidden)

    W2 += hidden.T @ d_output * 0.1
    W1 += X.T @ d_hidden * 0.1
    b2 += d_output.sum(axis=0, keepdims=True) * 0.1
    b1 += d_hidden.sum(axis=0, keepdims=True) * 0.1

# Predictions
predictions = sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2) * 100
print("Predicted Exam Scores:", predictions.flatten().round(2))
sleep.csv

Student  Sleep Hours  Study Hours  Exam Score (%)
1        2            9            92
2        1            5            86
3        3            6            89

OUTPUT
Shape of X: (3, 3)
Shape of y: (3, 1)
Predicted Exam Scores: [89.07 88.55 89.3]

RESULT
Thus the Artificial Neural Network (ANN) was built using the Backpropagation
Algorithm and the output executed and verified.

EX.NO:4 IMPLEMENTATION OF NAIVE BAYESIAN CLASSIFIER
DATE:

AIM

To implement the Naïve Bayesian Classifier using a dataset stored in a .CSV file,
preprocess the data, train the model, and evaluate its accuracy.

ALGORITHM

Step 1: Load the Dataset

 Import necessary libraries (pandas, sklearn).


 Load the dataset using pd.read_csv().
 Display the first few rows using data.head().

Step 2: Preprocess the Data

 Separate features (X) and target variable (y).


 Convert categorical variables (if any) into numerical form using label encoding or
one-hot encoding.

Step 3: Split the Dataset

 Divide the dataset into training (80%) and testing (20%) sets using train_test_split().
 Display the sizes of training and testing sets.

Step 4: Train the Naïve Bayes Classifier

 Initialize the Gaussian Naïve Bayes model using GaussianNB().

 Train the model with model.fit(X_train, y_train).
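
Under the hood, Gaussian Naïve Bayes assumes each feature is conditionally independent
given the class and normally distributed, so it predicts the class c maximizing
P(c) × Π_j N(x_j; μ_jc, σ²_jc), where μ_jc and σ²_jc are the per-class mean and variance
of feature j estimated from the training data.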

Step 5: Make Predictions

 Predict test results using model.predict(X_test).

Step 6: Evaluate the Model

 Compute the accuracy score using accuracy_score(y_test, y_pred).


 Display the model accuracy.

PROGRAM
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

# Load dataset
data = pd.read_csv("C:/Users/CSE/Desktop/manju desk/iris.csv")

# Display sample data
print("Sample Dataset:\n", data.head())

# Features (X) and Target (y)
X = data.iloc[:, :-1]
y = data.iloc[:, -1].astype('category').cat.codes  # Convert categorical target to numeric

# Split dataset (80% Train, 20% Test)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42, shuffle=True)

# Display dataset sizes
print(f"\nTraining Set Size: {X_train.shape}")
print(f"Testing Set Size: {X_test.shape}")

# Train Naïve Bayes model
model = GaussianNB()
model.fit(X_train, y_train)

# Predict and evaluate
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)

# Display results
print(f"\nModel Accuracy: {accuracy * 100:.2f}%")

OUTPUT:
Sample Dataset:
Id SepalLengthCm SepalWidthCm PetalLengthCm PetalWidthCm Species
0 1 5.1 3.5 1.4 0.2 Iris-setosa
1 2 4.9 3.0 1.4 0.2 Iris-setosa
2 3 4.7 3.2 1.3 0.2 Iris-setosa
3 4 4.6 3.1 1.5 0.2 Iris-setosa
4 5 5.0 3.6 1.4 0.2 Iris-setosa

Training Set Size: (120, 5)


Testing Set Size: (30, 5)

Model Accuracy: 100.00%

RESULT
Thus the Naïve Bayesian Classifier was implemented and the output executed and
verified.

EX.NO:5 NAIVE BAYES ALGORITHMS FOR LEARNING AND CLASSIFYING TEXT
DATE:

AIM

To implement a Naïve Bayes Classifier to classify a set of documents and measure its
accuracy, precision, and recall.

ALGORITHM

1: Import Required Libraries

 Import necessary libraries such as sklearn.feature_extraction.text,


sklearn.naive_bayes, sklearn.pipeline, sklearn.model_selection, and sklearn.metrics.

2: Prepare the Dataset

 Define a set of text documents as input data.


 Assign corresponding labels to classify the documents (e.g., 1 for positive, 0 for
negative).

3: Split the Data

 Split the dataset into training (70%) and testing (30%) sets using train_test_split().

4: Build the Naïve Bayes Model

 Create a pipeline consisting of:


1. CountVectorizer() - Converts text into a matrix of token counts.

2. MultinomialNB() - Naïve Bayes classifier for text data.
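
As a minimal sketch of what the first pipeline stage produces (toy sentences, not the
experiment's data; get_feature_names_out() requires scikit-learn >= 1.0):

from sklearn.feature_extraction.text import CountVectorizer

vec = CountVectorizer()
X = vec.fit_transform(["I love Python", "Python is great"])
print(vec.get_feature_names_out())  # ['great' 'is' 'love' 'python'] — single-letter 'I' is dropped
print(X.toarray())                  # token counts per document: [[0 0 1 1], [1 1 0 1]]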

5: Train the Model

 Fit the model using the training data.

6: Predict on the Test Data

 Use the trained model to predict labels for the test dataset.

7: Evaluate the Model
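
 Compute accuracy, precision, and recall on the test predictions using
accuracy_score(), precision_score(), and recall_score(). For reference, with TP, FP,
TN, FN denoting true/false positives and negatives: Accuracy = (TP + TN) / (TP + TN +
FP + FN), Precision = TP / (TP + FP), and Recall = TP / (TP + FN).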

PROGRAM

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, precision_score, recall_score

# Sample dataset
documents = [
    "I love programming in Python",
    "Python is a great programming language",
    "I enjoy machine learning and data science",
    "Data science is fun and interesting",
    "I hate bugs in the code",
    "Debugging is a frustrating process",
    "Errors and exceptions make coding hard"
]
labels = [1, 1, 1, 1, 0, 0, 0]  # 1 for positive, 0 for negative sentiment

# Split data
X_train, X_test, y_train, y_test = train_test_split(documents, labels, test_size=0.3, random_state=42)

# Create and train the model
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(X_train, y_train)

# Predict and evaluate
y_pred = model.predict(X_test)
print(f"Accuracy: {accuracy_score(y_test, y_pred):.2f}")
print(f"Precision: {precision_score(y_test, y_pred):.2f}")
print(f"Recall: {recall_score(y_test, y_pred):.2f}")

OUTPUT:
Accuracy: 0.33
Precision: 0.50
Recall: 0.50

RESULT
Thus the Naïve Bayes Classifier was implemented to classify a set of documents, its
accuracy, precision, and recall were measured, and the output verified.

EX.NO:6 BAYESIAN NETWORK
DATE:

AIM

To construct a Bayesian Network using the WHO COVID-19 dataset for diagnosing
COVID-19 infection based on symptoms and test results.

ALGORITHM

Step 1: Load Required Libraries

 Import necessary Python libraries such as pandas, pgmpy, and numpy.

Step 2: Load the Dataset

 Read the who_covid_data.csv file using pandas.


 Ensure the dataset has relevant features such as Fever, Cough, Fatigue,
Shortness_of_Breath, COVID, Test_Result.

Step 3: Define the Bayesian Network Structure

 Create a Bayesian Network graph using BayesianNetwork().


 Define relationships between variables, such as:
o Fever → COVID
o Cough → COVID
o Fatigue → COVID
o Shortness_of_Breath → COVID

o COVID → Test_Result
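
This structure factorizes the joint distribution as P(Fever, Cough, Fatigue,
Shortness_of_Breath, COVID, Test_Result) = P(Fever) · P(Cough) · P(Fatigue) ·
P(Shortness_of_Breath) · P(COVID | Fever, Cough, Fatigue, Shortness_of_Breath) ·
P(Test_Result | COVID).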

Step 4: Train the Model

 Use Maximum Likelihood Estimation (MLE) to train the model on the dataset.
 Fit the model using model.fit().

Step 5: Perform Inference

 Use VariableElimination() to perform probabilistic queries.


 Example query: Find the probability of COVID given Fever and Cough.

Step 6: Display Results

PROGRAM
import pandas as pd
from pgmpy.models import BayesianNetwork
from pgmpy.estimators import MaximumLikelihoodEstimator
from pgmpy.inference import VariableElimination

# Load the dataset
dataset = pd.read_csv('C:/Users/CSE/Desktop/WHO.csv')

# Define the Bayesian Network structure
model = BayesianNetwork([
    ('Fever', 'COVID'),
    ('Cough', 'COVID'),
    ('Fatigue', 'COVID'),
    ('Shortness_of_Breath', 'COVID'),
    ('COVID', 'Test_Result')
])

# Fit the model using Maximum Likelihood Estimation
model.fit(dataset, estimator=MaximumLikelihoodEstimator)

# Perform inference on the model
inference = VariableElimination(model)

# Example query: Probability of having COVID given Fever and Cough
query_result = inference.query(variables=['COVID'], evidence={'Fever': 1, 'Cough': 1})
print(query_result)
OUTPUT:
+----------+--------------+
| COVID | phi(COVID) |
+==========+==============+
| COVID(0) | 0.4500 |
+----------+--------------+
| COVID(1) | 0.5500 |
+----------+--------------+

RESULT
Thus the Bayesian Network was constructed and the output executed and verified.

EX.NO:7 EXPECTATION-MAXIMIZATION USING GAUSSIAN MIXTURE MODEL AND K-MEANS CLUSTERING
DATE:

AIM
To implement Expectation-Maximization (EM) using Gaussian Mixture Model
(GMM) and K-Means clustering on the Iris dataset, compare their results, and analyze the
effectiveness of both clustering techniques using the silhouette score.

ALGORITHM

Step 1: Load the Dataset

 Import the Iris dataset from sklearn.datasets.


 Extract the features (sepal length, sepal width, petal length, petal width).

Step 2: Preprocess the Data

 Standardize the data using StandardScaler() to ensure better clustering performance.

Step 3: Apply K-Means Clustering

 Initialize K-Means with n_clusters=3 (since the Iris dataset has 3 species).
 Fit the model and predict the cluster labels.

Step 4: Apply Expectation-Maximization (GMM)

 Initialize Gaussian Mixture Model (GMM) with n_components=3.


 Fit the model and predict the cluster labels using soft clustering.
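
Here "soft clustering" means the E-step assigns each point x_i a responsibility
γ_ik = π_k N(x_i; μ_k, Σ_k) / Σ_j π_j N(x_i; μ_j, Σ_j) for every component k, and the
M-step re-estimates the weights, means, and covariances from these responsibilities;
predict() then reports each point's highest-responsibility component.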

Step 5: Evaluate the Clustering Performance

 Compute the Silhouette Score for both K-Means and GMM to measure cluster
quality.
 Compare the scores to determine which algorithm performs better.
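
The silhouette value for a single sample is s = (b − a) / max(a, b), where a is its mean
distance to points in its own cluster and b is its mean distance to points in the nearest
other cluster; scores near 1 indicate compact, well-separated clusters.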

Step 6: Visualize the Clusters

 Plot 3D scatter plots to visualize clusters from both algorithms.


 Use color coding to differentiate clusters.

Step 7: Compare Results

 Analyze the clustering effectiveness, performance, and suitability for the dataset.

PROGRAM

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn import datasets
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import silhouette_score

# Load the Iris dataset (the CSV copy is read for reference; the features used
# below come from sklearn's built-in copy of the same data)
dataset = pd.read_csv('C:/Users/CSE/Desktop/manju desk/iris.csv')
iris = datasets.load_iris()
X = iris.data         # Features (sepal length, sepal width, petal length, petal width)
y_true = iris.target  # True labels (for comparison)

# Standardize the features
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Apply K-Means Clustering
kmeans = KMeans(n_clusters=3, random_state=42, n_init=10)
kmeans_labels = kmeans.fit_predict(X_scaled)

# Apply Expectation-Maximization (Gaussian Mixture Model)
gmm = GaussianMixture(n_components=3, random_state=42)
gmm_labels = gmm.fit_predict(X_scaled)

# Compute Silhouette Scores
kmeans_silhouette = silhouette_score(X_scaled, kmeans_labels)
gmm_silhouette = silhouette_score(X_scaled, gmm_labels)

# Print the results
print("Silhouette Score for K-Means:", kmeans_silhouette)
print("Silhouette Score for EM (GMM):", gmm_silhouette)

# Visualizing Clusters
plt.figure(figsize=(12, 5))

# K-Means Clustering
plt.subplot(1, 2, 1)
sns.scatterplot(x=X_scaled[:, 0], y=X_scaled[:, 1], hue=kmeans_labels, palette='viridis')
plt.title("K-Means Clustering")
plt.xlabel("Feature 1")
plt.ylabel("Feature 2")

# EM Clustering
plt.subplot(1, 2, 2)
sns.scatterplot(x=X_scaled[:, 0], y=X_scaled[:, 1], hue=gmm_labels, palette='coolwarm')
plt.title("EM (GMM) Clustering")
plt.xlabel("Feature 1")
plt.ylabel("Feature 2")

plt.show()

OUTPUT:

(The program prints the silhouette scores for K-Means and EM (GMM) and displays the two
cluster scatter plots side by side.)
RESULT
Thus Expectation-Maximization (EM) using a Gaussian Mixture Model (GMM) and K-Means
clustering were implemented and the output executed and verified.

EX.NO:8 IMPLEMENTATION OF K-NEAREST NEIGHBORS (K-NN) ALGORITHM
DATE:

AIM
To implement the k-Nearest Neighbors (k-NN) algorithm to classify the Iris dataset,
print both correct and incorrect predictions, and evaluate the model's performance using
accuracy and a classification report.

ALGORITHM
Step 1: Load the Dataset

 Import the Iris dataset from sklearn.datasets.


 Extract features (sepal length, sepal width, petal length, petal width) and target labels
(species).

Step 2: Split the Data

 Divide the dataset into training (80%) and testing (20%) subsets using
train_test_split().

Step 3: Standardize the Data

 Normalize feature values using StandardScaler() to improve model performance.

Step 4: Train the k-NN Model

 Initialize the KNeighborsClassifier with k=5.
 Train the model using the training data.
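
With the default settings used below, the classifier finds the k = 5 training points
closest to a query in Euclidean distance and predicts the majority class among them.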

Step 5: Make Predictions

 Use the trained model to predict labels for the test set.

Step 6: Evaluate the Model

 Compare predicted labels with actual labels from the test set.
 Identify and print correct and incorrect predictions.
 Calculate the accuracy score using accuracy_score().
 Display the classification report (precision, recall, F1-score).

Step 7: Display the Results

PROGRAM
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, classification_report

# Load the Iris dataset (the CSV copy is read for reference; the features used
# below come from sklearn's built-in copy of the same data)
dataset = pd.read_csv('C:/Users/CSE/Desktop/manju desk/iris.csv')
iris = datasets.load_iris()
X = iris.data    # Features: Sepal & Petal length/width
y = iris.target  # Target labels (species)

# Split the dataset into 80% training and 20% testing data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Standardize features for better performance
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Initialize k-NN classifier with k=5
knn = KNeighborsClassifier(n_neighbors=5)

# Train the model
knn.fit(X_train, y_train)

# Make predictions
y_pred = knn.predict(X_test)

# Identify correct and incorrect predictions
correct_preds = (y_pred == y_test)
wrong_preds = (y_pred != y_test)

# Print correct predictions
print("\n Correct Predictions:")
for i in range(len(y_test)):
    if correct_preds[i]:
        print(f"Sample {i+1}: True={iris.target_names[y_test[i]]}, Predicted={iris.target_names[y_pred[i]]}")

# Print incorrect predictions
print("\n Incorrect Predictions:")
for i in range(len(y_test)):
    if wrong_preds[i]:
        print(f"Sample {i+1}: True={iris.target_names[y_test[i]]}, Predicted={iris.target_names[y_pred[i]]}")

# Print accuracy score
accuracy = accuracy_score(y_test, y_pred)
print(f"\nClassification Accuracy: {accuracy * 100:.2f}%")

# Print classification report
print("\n Classification Report:")
print(classification_report(y_test, y_pred, target_names=iris.target_names))

OUTPUT:
Correct Predictions:
Sample 1: True=versicolor, Predicted=versicolor
Sample 2: True=setosa, Predicted=setosa
Sample 3: True=virginica, Predicted=virginica
Sample 4: True=versicolor, Predicted=versicolor
Sample 5: True=versicolor, Predicted=versicolor
Sample 6: True=setosa, Predicted=setosa
Sample 7: True=versicolor, Predicted=versicolor
Sample 8: True=virginica, Predicted=virginica
Sample 9: True=versicolor, Predicted=versicolor
Sample 10: True=versicolor, Predicted=versicolor
Sample 11: True=virginica, Predicted=virginica
Sample 12: True=setosa, Predicted=setosa
Sample 13: True=setosa, Predicted=setosa
Sample 14: True=setosa, Predicted=setosa
Sample 15: True=setosa, Predicted=setosa
Sample 16: True=versicolor, Predicted=versicolor
Sample 17: True=virginica, Predicted=virginica
Sample 18: True=versicolor, Predicted=versicolor
Sample 19: True=versicolor, Predicted=versicolor
Sample 20: True=virginica, Predicted=virginica
Sample 21: True=setosa, Predicted=setosa
Sample 22: True=virginica, Predicted=virginica
Sample 23: True=setosa, Predicted=setosa
Sample 24: True=virginica, Predicted=virginica
Sample 25: True=virginica, Predicted=virginica

Sample 26: True=virginica, Predicted=virginica
Sample 27: True=virginica, Predicted=virginica
Sample 28: True=virginica, Predicted=virginica
Sample 29: True=setosa, Predicted=setosa
Sample 30: True=setosa, Predicted=setosa

Incorrect Predictions:

Classification Accuracy: 100.00%

Classification Report:
precision recall f1-score support

setosa 1.00 1.00 1.00 10


versicolor 1.00 1.00 1.00 9
virginica 1.00 1.00 1.00 11

accuracy 1.00 30
macro avg 1.00 1.00 1.00 30
weighted avg 1.00 1.00 1.00 30

RESULT
Thus the k-Nearest Neighbors (k-NN) algorithm was implemented and the output executed
and verified.

EX.NO:9 IMPLEMENT THE LOCALLY WEIGHTED REGRESSION (LWR) ALGORITHM
DATE:

AIM
To implement the Locally Weighted Regression (LWR) algorithm, a non-parametric
regression technique, and fit data points from the Iris dataset. The experiment
visualizes how LWR adjusts the regression curve based on the bandwidth parameter (tau).

ALGORITHM

Step 1: Import Required Libraries

 Import numpy, pandas, matplotlib.pyplot, and sklearn.datasets for data handling and
visualization.

Step 2: Load the Dataset

 Load the Iris dataset using datasets.load_iris() from Scikit-learn.


 Extract Petal Length as the independent variable (X).
 Extract Petal Width as the dependent variable (y).

Step 3: Standardize the Data

 Use StandardScaler() to normalize the feature values.

Step 4: Define the Locally Weighted Regression (LWR) Function
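
The function (implemented in the program below) weights each training point by
w_i = exp(−(x_i − x)² / (2τ²)) and solves the weighted normal equation
θ = (XᵀWX)⁻¹ XᵀWy, so points near the query x dominate the local fit; τ controls how
quickly their influence decays.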

Step 5: Perform Predictions

 Generate 100 test points (X_test).


 Compute predicted values (y_pred) for each test point using LWR.

Step 6: Visualize the Results

 Scatter plot the training data (red points).


 Plot the fitted regression curve (blue line).

PROGRAM
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.preprocessing import StandardScaler

# Load the Iris dataset
iris = datasets.load_iris()
X = iris.data[:, 2].reshape(-1, 1)  # Petal Length
y = iris.data[:, 3]                 # Petal Width

# Standardize the feature
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Locally Weighted Regression function
def locally_weighted_regression(x_test, X, y, tau):
    """Computes the LWR prediction for a single test point."""
    m = X.shape[0]
    W = np.exp(-np.sum((X - x_test) ** 2, axis=1) / (2 * tau ** 2))  # Gaussian weights
    W = np.diag(W)  # Convert to diagonal matrix

    # Solve for theta using the weighted normal equation: (X'WX)θ = X'Wy
    X_bias = np.hstack((np.ones((m, 1)), X))  # Add bias term
    theta = np.linalg.pinv(X_bias.T @ W @ X_bias) @ (X_bias.T @ W @ y)

    # Predict output for the given test point
    x_test_bias = np.hstack(([1], x_test))
    return x_test_bias @ theta  # Return prediction

# Make predictions at different points
X_test = np.linspace(X_scaled.min(), X_scaled.max(), 100).reshape(-1, 1)
y_pred = np.array([locally_weighted_regression(x, X_scaled, y, tau=0.3) for x in X_test])

# Plot results
plt.scatter(X_scaled, y, color='red', label="Training Data")
plt.plot(X_test, y_pred, color='blue', label="LWR Fit (tau=0.3)")
plt.xlabel("Standardized Petal Length")
plt.ylabel("Petal Width")
plt.title("Locally Weighted Regression on Iris Dataset")
plt.legend()
plt.show()

OUTPUT:

(The program displays the training data as a red scatter plot with the fitted blue LWR
curve for tau = 0.3.)
RESULT
Thus the Locally Weighted Regression (LWR) algorithm was implemented and the output
executed and verified.
