MANUAL
COURSE OBJECTIVES:
• To understand data sets and apply suitable algorithms for selecting the appropriate
features for analysis.
• To learn to implement supervised machine learning algorithms on standard datasets and
evaluate their performance.
• To experiment with unsupervised machine learning algorithms on standard datasets and
evaluate their performance.
• To build graph-based learning models for standard data sets.
• To compare the performance of different ML algorithms and select the most suitable one
based on the application.
LIST OF EXPERIMENTS:
1. For a given set of training data examples stored in a .CSV file, implement and demonstrate
the Candidate-Elimination algorithm to output a description of the set of all hypotheses
consistent with the training examples.
2. Write a program to demonstrate the working of the decision tree based ID3 algorithm. Use
an appropriate data set for building the decision tree and apply this knowledge to classify a
new sample.
3. Build an Artificial Neural Network by implementing the Backpropagation algorithm and
test the same using appropriate data sets.
4. Write a program to implement the naïve Bayesian classifier for a sample training data set
stored as a .CSV file and compute the accuracy with a few test data sets.
5. Implement a naïve Bayesian classifier model to classify a set of documents and
measure the accuracy, precision, and recall.
6. Write a program to construct a Bayesian network to diagnose CORONA infection using
standard WHO Data Set.
7. Apply EM algorithm to cluster a set of data stored in a .CSV file. Use the same data set for
clustering using the k-Means algorithm. Compare the results of these two algorithms.
8. Write a program to implement k-Nearest Neighbour algorithm to classify the iris data set.
Print both correct and wrong predictions.
9. Implement the non-parametric Locally Weighted Regression algorithm in order to fit data
points. Select an appropriate data set for your experiment and draw graphs.
EX.NO:1 IMPLEMENTATION OF CANDIDATE-ELIMINATION ALGORITHM
DATE:
AIM
To implement the Find-S and Candidate-Elimination algorithm to determine the most
specific hypothesis (S) and the most general hypotheses (G) from a given training dataset
consisting of positive and negative examples.
ALGORITHM:
1. Initialize Hypotheses: set the specific hypothesis S to the first positive training example
and the general boundary G to the set of most general hypotheses (all "?").
2. For each positive example: generalize S by replacing any attribute value that disagrees
with the example by "?", and relax the corresponding entries of G.
3. For each negative example: specialize G by recording, for every attribute where the
example differs from S, the value of S; keep "?" elsewhere.
4. Remove from G any hypothesis that has become fully general (all "?").
5. Output the final specific hypothesis S and the set of general hypotheses G.
PROGRAM
import numpy as np
import pandas as pd
# Load dataset
data = pd.read_csv('C:/Users/CSE/Desktop/ENJOYSPORT.CSV')
# Display dataset
print("\n Loaded Dataset:\n")
print(data)
# Extract features and target labels
X = np.array(data.iloc[:, :-1]) # Feature set (concepts)
y = np.array(data.iloc[:, -1]) # Target values
print("\nFeature Set (Concepts):\n", X)
print("\nTarget Values:\n", y)
# Candidate-Elimination algorithm
def candidate_elimination(X, y):
    # Initialize S to the first example and G to the set of most general hypotheses
    S = list(X[0])
    G = [["?" for _ in range(len(S))] for _ in range(len(S))]
    for i, x in enumerate(X):
        if str(y[i]).lower() == "yes":  # Positive instance: generalize S
            for j in range(len(S)):
                if x[j] != S[j]:
                    S[j] = "?"
                    G[j][j] = "?"
        else:  # Negative instance: specialize G
            for j in range(len(S)):
                if x[j] != S[j]:
                    G[j][j] = S[j]
                else:
                    G[j][j] = "?"
    # Remove redundant (fully general) hypotheses from G
    G = [h for h in G if h != ["?"] * len(S)]
    return S, G
# Run algorithm
S_final, G_final = candidate_elimination(X, y)
# Display final hypotheses
print("\n Final Specific Hypothesis:\n", S_final)
print("\nFinal General Hypothesis:\n", G_final)
DATASET
OUTPUT:
Loaded Dataset:
SKY AIRTEMP HUMIDITY WIND WATER FORECAST ENJOY SPORT
0 Sunny Warm Normal Strong Warm Same Yes
1 Sunny Warm High Strong Warm Same Yes
2 Rainy Cold High Strong Warm Change No
3 Sunny Warm High Strong Cool Change yes
Target Values:
['Yes' 'Yes' 'No' 'yes']
Final Specific Hypothesis:
['Sunny' 'Warm' '?' 'Strong' '?' '?']
Final General Hypothesis:
[['Sunny', '?', '?', '?', '?', '?'], ['?', 'Warm', '?', '?', '?', '?']]
RESULT
Thus the program for the implementation of the Candidate-Elimination algorithm was
executed and the output verified.
EX.NO:2 IMPLEMENTATION OF ID3 DECISION TREE ALGORITHM
DATE:
AIM:
To implement the ID3 Decision Tree Algorithm for classification, using a dataset to
build a decision tree and classify a new sample.
ALGORITHM
Step 1: Load the Dataset: Read the training examples from the .CSV file and separate the
feature columns from the class label.
Step 2: Compute Entropy and Information Gain:
Entropy measures the impurity or uncertainty in a dataset:
H(S) = − Σ pᵢ log₂(pᵢ)
Information Gain (IG) helps to choose the best feature for splitting:
IG(S, A) = H(S) − Σᵥ (|Sᵥ| / |S|) H(Sᵥ)
Step 3: Select the Best Feature: The feature with the highest information gain is chosen.
Step 4: Handle the Base Cases:
If all samples belong to one class, return that class (leaf node).
If no features remain, return the majority class.
Step 5: Split the Dataset: Partition the samples on each value of the selected feature.
Step 6: Build the Decision Tree Recursively: Construct the decision tree by repeating
Steps 2-5 until all branches end in leaf nodes.
Step 7: Classify a New Sample: Traverse the decision tree using the attribute values of the
new sample and assign the class label of the matching leaf node.
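As a quick illustration of the entropy formula (the counts here are hypothetical and not taken
from the ENJOYSPORT data): a set with 9 positive and 5 negative samples has
H(S) = −(9/14) log₂(9/14) − (5/14) log₂(5/14) ≈ 0.940, while a pure set (all one class) has H(S) = 0.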
PROGRAM
import csv
import math
# Load dataset
file_path = "C:/Users/CSE/Desktop/ENJOYSPORT.CSV"
data = list(csv.reader(open(file_path, "r")))
headers, dataset = data[0], data[1:]
# Recursively build the ID3 decision tree
def build_tree(data, features):
    labels = [row[-1] for row in data]
    if len(set(labels)) == 1: return labels[0]  # Pure class, return label
    if not features: return max(set(labels), key=labels.count)  # No features left, return majority
    best_idx = best_feature(data)
    tree = {features[best_idx]: {}}
    for value, subset in split_data(data, best_idx).items():
        tree[features[best_idx]][value] = build_tree(
            [row[:best_idx] + row[best_idx+1:] for row in subset],
            features[:best_idx] + features[best_idx+1:])
    return tree
# Print tree
def print_tree(tree, level=0):
    if not isinstance(tree, dict):
        print(" " * level + f"➡ {tree}")
        return
    for key, value in tree.items():
        print(" " * level + f"[{key}]")
        for subkey, subtree in value.items():
            print(" " * (level + 1) + f"→ {subkey}")
            print_tree(subtree, level + 2)
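The listing above calls three helper functions (entropy, split_data, best_feature) and a tree-building
driver that do not appear on the surviving pages; the following is a minimal sketch consistent with
those calls (the names headers and dataset come from the loading step above), not the manual's
original code:

# Entropy of the class labels in a data subset
def entropy(rows):
    labels = [row[-1] for row in rows]
    return -sum((labels.count(c) / len(labels)) * math.log2(labels.count(c) / len(labels))
                for c in set(labels))

# Partition rows by the values of the feature at index idx
def split_data(rows, idx):
    partitions = {}
    for row in rows:
        partitions.setdefault(row[idx], []).append(row)
    return partitions

# Index of the feature with the highest information gain
def best_feature(rows):
    base = entropy(rows)
    gains = [base - sum(len(s) / len(rows) * entropy(s) for s in split_data(rows, i).values())
             for i in range(len(rows[0]) - 1)]
    return gains.index(max(gains))

# Build and display the tree from the loaded dataset
features = headers[:-1]
decision_tree = build_tree(dataset, features)
print("**Decision Tree Structure**")
print_tree(decision_tree)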
ENJOYSPORT

Sky      Temperature  Humidity  Wind    Water  Forecast  EnjoySport
Sunny    Warm         Normal    Strong  Warm   Same      Yes
Sunny    Warm         High      Strong  Warm   Same      Yes
Rainy    Cold         High      Strong  Warm   Change    No
Sunny    Warm         High      Strong  Cool   Change    Yes
Rainy    Warm         High      Strong  Cool   Change    No
Overcast Cold         Normal    Strong  Warm   Change    Yes
Sunny    Warm         Normal    Weak    Warm   Same      Yes

TESTENJOYSPORT

Sky      Temperature  Humidity  Wind    Water  Forecast
Sunny    Cold         Normal    Weak    Warm   Same
Rainy    Warm         High      Strong  Cool   Change
Overcast Warm         Normal    Weak    Warm   Same
Rainy    Cold         High      Strong  Warm   Change
OUTPUT
**Decision Tree Structure**
[Sky]
→ Sunny
➡ Yes
→ Rainy
➡ No
→ Overcast
➡ Yes
RESULT
Thus the program for the implementation of the ID3 decision tree algorithm was executed
and the output verified.
EX.NO:3 BUILDING AN ARTIFICIAL NEURAL NETWORK USING BACKPROPAGATION ALGORITHM
DATE:
AIM
To build an Artificial Neural Network (ANN) using the Backpropagation Algorithm
and test it with an appropriate dataset (sleep hours, study hours, and exam scores).
ALGORITHM
1. Import Libraries
3. Backward Pass: Adjust weights and biases using gradient descent.
6. Make Predictions
7. Display Results
PROGRAM
import numpy as np
import pandas as pd
# Load Data
df = pd.read_csv("C:/Users/CSE/Desktop/sleep.csv")
X = df.iloc[:, :-1].values  # Input features (Student, Sleep Hours, Study Hours)
y = df.iloc[:, -1].values.reshape(-1, 1) / 100  # Normalize output (Exam Score)
print("Shape of X:", X.shape)
print("Shape of y:", y.shape)
# Sigmoid activation function
def sigmoid(z):
    return 1 / (1 + np.exp(-z))
# Random initialisation of weights and biases (hidden layer size of 3 is an assumption)
W1 = np.random.rand(X.shape[1], 3)
b1 = np.random.rand(1, 3)
W2 = np.random.rand(3, 1)
b2 = np.random.rand(1, 1)
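# NOTE: the source listing does not show the backpropagation training loop described in the
# ALGORITHM section; the following is a minimal sketch with assumed hyper-parameters
# (learning rate 0.1, 10000 epochs), not the manual's original code.
lr, epochs = 0.1, 10000
for _ in range(epochs):
    # Forward pass
    hidden = sigmoid(X @ W1 + b1)
    output = sigmoid(hidden @ W2 + b2)
    # Backward pass: gradient of the squared error through the sigmoid layers
    d_output = (output - y) * output * (1 - output)
    d_hidden = (d_output @ W2.T) * hidden * (1 - hidden)
    # Gradient-descent updates of weights and biases
    W2 -= lr * hidden.T @ d_output
    b2 -= lr * d_output.sum(axis=0, keepdims=True)
    W1 -= lr * X.T @ d_hidden
    b1 -= lr * d_hidden.sum(axis=0, keepdims=True)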
# Predictions
predictions = sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2) * 100
print("Predicted Exam Scores:", predictions.flatten().round(2))
sleep.csv

Student  Sleep Hours  Study Hours  Exam Score (%)
1        2            9            92
2        1            5            86
3        3            6            89
OUTPUT
Shape of X: (3, 3)
Shape of y: (3, 1)
Predicted Exam Scores: [89.07 88.55 89.3]
RESULT
Thus the program to build an Artificial Neural Network (ANN) using the Backpropagation
algorithm was executed and the output verified.
EX.NO:4 IMPLEMENTATION OF NAÏVE BAYESIAN CLASSIFIER
DATE:
AIM
To implement the Naïve Bayesian Classifier using a dataset stored in a .CSV file,
preprocess the data, train the model, and evaluate its accuracy.
ALGORITHM
Load the Iris dataset from the .CSV file and separate the features from the Species label.
Divide the dataset into training (80%) and testing (20%) sets using train_test_split().
Display the sizes of the training and testing sets.
Train the model with model.fit(X_train, y_train).
Predict the test-set labels and compute the accuracy with accuracy_score().
PROGRAM
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score
# Load dataset
data = pd.read_csv("C:/Users/CSE/Desktop/manju desk/iris.csv")
# Display a sample of the dataset
print("Sample Dataset:")
print(data.head())
# Separate features and target label (the Id column is excluded from the features)
X = data.drop(columns=["Id", "Species"])
y = data["Species"]
# Split into 80% training and 20% testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
print(f"\nTraining Set Size: {X_train.shape}")
print(f"Testing Set Size: {X_test.shape}")
# Train the Gaussian Naive Bayes model and evaluate it
model = GaussianNB()
model.fit(X_train, y_train)
accuracy = accuracy_score(y_test, model.predict(X_test))
# Display results
print(f"\nModel Accuracy: {accuracy * 100:.2f}%")
OUTPUT:
Sample Dataset:
Id SepalLengthCm SepalWidthCm PetalLengthCm PetalWidthCm Species
0 1 5.1 3.5 1.4 0.2 Iris-setosa
1 2 4.9 3.0 1.4 0.2 Iris-setosa
2 3 4.7 3.2 1.3 0.2 Iris-setosa
3 4 4.6 3.1 1.5 0.2 Iris-setosa
4 5 5.0 3.6 1.4 0.2 Iris-setosa
RESULT
Thus the program for the implementation of the Naïve Bayesian classifier was executed and
the output verified.
EX.NO:5 NAÏVE BAYESIAN CLASSIFIER FOR DOCUMENT CLASSIFICATION
DATE:
AIM
To implement a Naïve Bayes Classifier to classify a set of documents and measure its
accuracy, precision, and recall.
ALGORITHM
Split the dataset into training (70%) and testing (30%) sets using train_test_split().
2. MultinomialNB() - Naïve Bayes classifier for text data.
Use the trained model to predict labels for the test dataset.
PROGRAM
from sklearn.model_selection import train_test_split
# Sample dataset
documents = [
"I love programming in Python",
"Python is a great programming language",
"I enjoy machine learning and data science",
"Data science is fun and interesting",
"I hate bugs in the code",
"Debugging is a frustrating process",
"Errors and exceptions make coding hard"
]
# Labels for each document (assumed: 1 = positive sentiment, 0 = negative sentiment)
labels = [1, 1, 1, 1, 0, 0, 0]
# Split data
X_train, X_test, y_train, y_test = train_test_split(documents, labels, test_size=0.3,
                                                    random_state=42)
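The source listing ends at the train/test split; a minimal sketch of the remaining steps described in
the ALGORITHM section (the use of CountVectorizer is an assumption, as is the binary
positive/negative labelling above), not the manual's original code:

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score, precision_score, recall_score

# Convert the documents into token-count vectors
vectorizer = CountVectorizer()
X_train_vec = vectorizer.fit_transform(X_train)
X_test_vec = vectorizer.transform(X_test)

# Train the Multinomial Naive Bayes classifier and predict the test labels
model = MultinomialNB()
model.fit(X_train_vec, y_train)
y_pred = model.predict(X_test_vec)

# Evaluate the classifier
print(f"Accuracy: {accuracy_score(y_test, y_pred):.2f}")
print(f"Precision: {precision_score(y_test, y_pred, zero_division=0):.2f}")
print(f"Recall: {recall_score(y_test, y_pred, zero_division=0):.2f}")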
OUTPUT:
Accuracy: 0.33
Precision: 0.50
Recall: 0.50
RESULT
Thus the program to implement a Naïve Bayes classifier to classify a set of documents and
measure its accuracy, precision, and recall was executed and the output verified.
EX.NO:6 CONSTRUCTION OF A BAYESIAN NETWORK TO DIAGNOSE CORONA INFECTION
DATE:
AIM
To construct a Bayesian network to diagnose CORONA infection using a standard WHO
data set, train it with Maximum Likelihood Estimation, and perform inference on the trained
model.
ALGORITHM
Define the structure of the Bayesian network as a set of directed edges, e.g.:
o COVID → Test_Result
Use Maximum Likelihood Estimation (MLE) to train the model on the dataset.
Fit the model using model.fit().
Perform inference on the trained network using VariableElimination().
PROGRAM
import pandas as pd
from pgmpy.models import BayesianNetwork
from pgmpy.estimators import MaximumLikelihoodEstimator
from pgmpy.inference import VariableElimination
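# NOTE: the source listing omits the data-loading and network-definition steps. The sketch
# below is an assumption, not the manual's original code: a hypothetical CSV ('corona.csv')
# with discrete columns Fever, Cough, COVID and Test_Result, and an illustrative edge
# structure (only COVID -> Test_Result is named in the ALGORITHM section).
data = pd.read_csv("corona.csv")

# Define the network structure (edges are illustrative)
model = BayesianNetwork([("Fever", "COVID"),
                         ("Cough", "COVID"),
                         ("COVID", "Test_Result")])

# Learn the conditional probability tables with Maximum Likelihood Estimation
model.fit(data, estimator=MaximumLikelihoodEstimator)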
# Perform inference on the model
inference = VariableElimination(model)
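The query step is not shown in the source listing; a hedged example of how the trained network
might be queried (the evidence values are hypothetical):

# Query P(COVID) given that the patient has fever and cough (evidence values assumed)
result = inference.query(variables=["COVID"], evidence={"Fever": 1, "Cough": 1})
print(result)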
RESULT
Thus the program to construct a Bayesian network was executed and the output verified.
EX.NO:7 EM ALGORITHM AND K-MEANS CLUSTERING
DATE:
AIM
To implement Expectation-Maximization (EM) using Gaussian Mixture Model
(GMM) and K-Means clustering on the Iris dataset, compare their results, and analyze the
effectiveness of both clustering techniques using the silhouette score.
ALGORITHM
Initialize K-Means with n_clusters=3 (since the Iris dataset has 3 species).
Fit the model and predict the cluster labels.
Step 5: Evaluate the Clustering Performance
Compute the Silhouette Score for both K-Means and GMM to measure cluster
quality.
Compare the scores to determine which algorithm performs better.
Analyze the clustering effectiveness, performance, and suitability for the dataset.
PROGRAM
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn import datasets
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import silhouette_score
# Load the Iris dataset and standardize the features
iris = datasets.load_iris()
X = iris.data
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
# K-Means clustering with 3 clusters (one per Iris species)
kmeans = KMeans(n_clusters=3, random_state=42, n_init=10)
kmeans_labels = kmeans.fit_predict(X_scaled)
# EM clustering with a Gaussian Mixture Model (3 components)
gmm = GaussianMixture(n_components=3, random_state=42)
gmm_labels = gmm.fit_predict(X_scaled)
# Visualizing Clusters
plt.figure(figsize=(12, 5))
# K-Means Clustering
plt.subplot(1, 2, 1)
sns.scatterplot(x=X_scaled[:, 0], y=X_scaled[:, 1], hue=kmeans_labels, palette='viridis')
plt.title("K-Means Clustering")
plt.xlabel("Feature 1")
plt.ylabel("Feature 2")
# EM Clustering
plt.subplot(1, 2, 2)
sns.scatterplot(x=X_scaled[:, 0], y=X_scaled[:, 1], hue=gmm_labels, palette='coolwarm')
plt.title("EM (GMM) Clustering")
plt.xlabel("Feature 1")
plt.ylabel("Feature 2")
plt.show()
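Step 5 of the algorithm calls for a silhouette-score comparison, which is not present in the listing
above; a minimal sketch using the already-imported silhouette_score and the cluster labels
computed earlier:

# Compare clustering quality with the silhouette score (higher is better)
print("Silhouette Score (K-Means):", round(silhouette_score(X_scaled, kmeans_labels), 3))
print("Silhouette Score (EM / GMM):", round(silhouette_score(X_scaled, gmm_labels), 3))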
OUTPUT:
RESULT
Thus the program to implement Expectation-Maximization (EM) clustering using a Gaussian
Mixture Model (GMM) and K-Means clustering was executed and the output verified.
EX.NO:8 IMPLEMENTATION OF K-NEAREST NEIGHBOUR ALGORITHM
DATE:
AIM
To implement the k-Nearest Neighbors (k-NN) algorithm to classify the Iris dataset,
print both correct and incorrect predictions, and evaluate the model's performance using
accuracy and a classification report.
ALGORITHM
Step 1: Load the Dataset
Divide the dataset into training (80%) and testing (20%) subsets using
train_test_split().
Initialize the KNeighborsClassifier with k=5.
Train the model using the training data.
Use the trained model to predict labels for the test set.
Compare predicted labels with actual labels from the test set.
Identify and print correct and incorrect predictions.
Calculate the accuracy score using accuracy_score().
Display the classification report (precision, recall, F1-score).
PROGRAM
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, classification_report
# Load the Iris dataset
iris = datasets.load_iris()
X, y = iris.data, iris.target
# Split the dataset into 80% training and 20% testing data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Standardize features for better performance
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
# Train the k-NN classifier with k = 5
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)
# Make predictions
y_pred = knn.predict(X_test)
# Print correct and incorrect predictions
print("Correct Predictions:")
for i in range(len(y_test)):
    if y_pred[i] == y_test[i]:
        print(f"Sample {i+1}: True={iris.target_names[y_test[i]]}, Predicted={iris.target_names[y_pred[i]]}")
print("\nIncorrect Predictions:")
for i in range(len(y_test)):
    if y_pred[i] != y_test[i]:
        print(f"Sample {i+1}: True={iris.target_names[y_test[i]]}, Predicted={iris.target_names[y_pred[i]]}")
# Evaluate the model
print(f"\nAccuracy: {accuracy_score(y_test, y_pred):.2f}")
print("\nClassification Report:")
print(classification_report(y_test, y_pred, target_names=iris.target_names))
OUTPUT:
Correct Predictions:
Sample 1: True=versicolor, Predicted=versicolor
Sample 2: True=setosa, Predicted=setosa
Sample 3: True=virginica, Predicted=virginica
Sample 4: True=versicolor, Predicted=versicolor
Sample 5: True=versicolor, Predicted=versicolor
Sample 6: True=setosa, Predicted=setosa
Sample 7: True=versicolor, Predicted=versicolor
Sample 8: True=virginica, Predicted=virginica
Sample 9: True=versicolor, Predicted=versicolor
Sample 10: True=versicolor, Predicted=versicolor
Sample 11: True=virginica, Predicted=virginica
Sample 12: True=setosa, Predicted=setosa
Sample 13: True=setosa, Predicted=setosa
Sample 14: True=setosa, Predicted=setosa
Sample 15: True=setosa, Predicted=setosa
Sample 16: True=versicolor, Predicted=versicolor
Sample 17: True=virginica, Predicted=virginica
Sample 18: True=versicolor, Predicted=versicolor
Sample 19: True=versicolor, Predicted=versicolor
Sample 20: True=virginica, Predicted=virginica
Sample 21: True=setosa, Predicted=setosa
Sample 22: True=virginica, Predicted=virginica
Sample 23: True=setosa, Predicted=setosa
Sample 24: True=virginica, Predicted=virginica
Sample 25: True=virginica, Predicted=virginica
Sample 26: True=virginica, Predicted=virginica
Sample 27: True=virginica, Predicted=virginica
Sample 28: True=virginica, Predicted=virginica
Sample 29: True=setosa, Predicted=setosa
Sample 30: True=setosa, Predicted=setosa
Incorrect Predictions:
Classification Report:
precision recall f1-score support
accuracy 1.00 30
macro avg 1.00 1.00 1.00 30
weighted avg 1.00 1.00 1.00 30
RESULT
Thus the program to implement the k-Nearest Neighbors (k-NN) algorithm was executed and
the output verified.
EX.NO:9 IMPLEMENTATION OF LOCALLY WEIGHTED REGRESSION ALGORITHM
DATE:
AIM
To implement the Locally Weighted Regression (LWR) algorithm, a non-
parametric regression technique, and fit data points from the Iris dataset. The experiment
will visualize how LWR adjusts the regression curve based on different bandwidth
parameters (tau).
ALGORITHM
Import numpy, pandas, matplotlib.pyplot, and sklearn.datasets for data handling and
visualization.
Step 4: Define the Locally Weighted Regression (LWR) Function
PROGRAM
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.preprocessing import StandardScaler
# Locally Weighted Regression function
def locally_weighted_regression(x_test, X, y, tau):
    """Computes the LWR prediction for a single test point."""
    m = X.shape[0]
    W = np.exp(-np.sum((X - x_test) ** 2, axis=1) / (2 * tau ** 2))  # Gaussian weights
    W = np.diag(W)  # Convert to diagonal matrix
    # Add a bias column and solve the weighted normal equations: theta = (X^T W X)^(-1) X^T W y
    X_b = np.hstack([np.ones((m, 1)), X])
    x_test_b = np.hstack([1, x_test])
    theta = np.linalg.pinv(X_b.T @ W @ X_b) @ X_b.T @ W @ y
    return x_test_b @ theta
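The data-preparation and plotting driver is missing from the surviving listing; a minimal sketch,
assuming petal length (standardized) is used to predict petal width and an illustrative set of
bandwidth values tau, not the manual's original code:

# Use one Iris feature to predict another (the feature choice is an assumption)
iris = datasets.load_iris()
X = iris.data[:, 2].reshape(-1, 1)   # petal length
y = iris.data[:, 3]                  # petal width

# Standardize the input feature
X = StandardScaler().fit_transform(X)

# Fit LWR curves for several bandwidths and plot them
x_range = np.linspace(X.min(), X.max(), 100).reshape(-1, 1)
plt.scatter(X, y, color="gray", alpha=0.5, label="Data points")
for tau in [0.1, 0.5, 1.0]:          # illustrative bandwidth values
    y_pred = np.array([locally_weighted_regression(x, X, y, tau) for x in x_range])
    plt.plot(x_range, y_pred, label=f"tau = {tau}")
plt.xlabel("Petal length (standardized)")
plt.ylabel("Petal width")
plt.title("Locally Weighted Regression on the Iris dataset")
plt.legend()
plt.show()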
OUTPUT:
RESULT
Thus the program to implement the Locally Weighted Regression (LWR) algorithm was
executed and the output verified.