DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
BONAFIDE CERTIFICATE
Name of the Student:
Register No.
This is to certify that this is a bonafide record of the work done by the above student with
Roll No. __________ of __________ Semester B.E. Degree in ____________________ in
the ____________________ Laboratory during the academic year 2022 – 2023.
Staff-In-Charge Head of the Dept.
Date:
Submitted for the Practical Examination held on
Internal Examiner External Examiner
CS3461- OPERATING SYSTEMS LABORATORY
INDEX
EX.NO. DATE NAME OF THE EXPERIMENT MARKS SIGN
PROGRAM 1:
IMPLEMENT THE NAIVE BAYES CLASSIFIER TO CLASSIFY ENGLISH TEXT:
AIM:
To implement the Naive Bayes classifier to classify English text.
DESCRIPTION:
The challenge of text classification is to attach labels to bodies of text, e.g., tax document, medical
form, etc. based on the text itself. For example, think of your spam folder in your email. How does your
email provider know that a particular message is spam or “ham” (not spam)? We’ll take a look at one
natural language processing technique for text classification called Naive Bayes.
SOURCE CODE:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score, confusion_matrix, precision_score, recall_score
msg = pd.read_csv('document.csv', names=['message', 'label'])
print("Total Instances of Dataset: ", msg.shape[0])
msg['labelnum'] = msg.label.map({'pos': 1, 'neg': 0})
X = msg.message
y = msg.labelnum
Xtrain, Xtest, ytrain, ytest = train_test_split(X, y)
count_v = CountVectorizer()
Xtrain_dm = count_v.fit_transform(Xtrain)
Xtest_dm = count_v.transform(Xtest)
# Note: in scikit-learn >= 1.0 use count_v.get_feature_names_out() instead.
df = pd.DataFrame(Xtrain_dm.toarray(), columns=count_v.get_feature_names())
clf = MultinomialNB()
clf.fit(Xtrain_dm, ytrain)
pred = clf.predict(Xtest_dm)
print('Accuracy Metrics:')
print('Accuracy: ', accuracy_score(ytest, pred))
print('Recall: ', recall_score(ytest, pred))
print('Precision: ', precision_score(ytest, pred))
print('Confusion Matrix: \n', confusion_matrix(ytest, pred))
document.csv:
I love this sandwich,pos
This is an amazing place,pos
I feel very good about these beers,pos
This is my best work,pos
What an awesome view,pos
I do not like this restaurant,neg
I am tired of this stuff,neg
I can't deal with this,neg
He is my sworn enemy,neg
My boss is horrible,neg
This is an awesome place,pos
I do not like the taste of this juice,neg
I love to dance,pos
I am sick and tired of this place,neg
What a great holiday,pos
That is a bad locality to stay,neg
We will have good fun tomorrow,pos
I went to my enemy's house today,neg
OUTPUT:
Total Instances of Dataset: 18
Accuracy Metrics:
Accuracy: 0.6
Recall: 0.6666666666666666
Precision: 0.6666666666666666
Confusion Matrix:
[[1 1]
[1 2]]
VIVA QUESTIONS & ANSWERS:
1. How does the Naive Bayes algorithm work?
Let's understand it using an example. Below is a training data set of weather conditions and the
corresponding target variable 'Play' (suggesting the possibility of playing). We need to classify
whether players will play or not based on the weather condition. Follow the steps below to perform it.
Step 1: Convert the data set into a frequency table.
Step 2: Create a likelihood table by finding the probabilities, e.g., the probability of Overcast is 0.29 and
the probability of playing is 0.64.
Step 3: Use the Naive Bayes equation to calculate the posterior probability for each class. The class
with the highest posterior probability is the outcome of the prediction.
Problem: Players will play if the weather is sunny. Is this statement correct?
We can solve it using the method of posterior probability discussed above.
P(Yes | Sunny) = P(Sunny | Yes) * P(Yes) / P(Sunny)
Here we have P(Sunny | Yes) = 3/9 = 0.33, P(Sunny) = 5/14 = 0.36, and P(Yes) = 9/14 = 0.64.
Now, P(Yes | Sunny) = 0.33 * 0.64 / 0.36 = 0.60, which is the higher posterior probability.
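The same arithmetic can be checked with a few lines of Python (a small illustrative sketch using only the counts quoted above):
# Sketch: verify the worked example above from the quoted counts.
p_sunny_given_yes = 3 / 9     # P(Sunny | Yes)
p_yes = 9 / 14                # P(Yes)
p_sunny = 5 / 14              # P(Sunny)
p_yes_given_sunny = p_sunny_given_yes * p_yes / p_sunny
print(round(p_yes_given_sunny, 2))   # prints 0.6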
Naive Bayes uses a similar method to predict the probabilities of different classes based on various attributes.
This algorithm is mostly used in text classification and in problems with multiple classes.
2. Applications of Naive Bayes Algorithms:
Real-time Prediction: Naive Bayes is an eager learning classifier and it is very fast, so it can be used
for making predictions in real time.
Multi-class Prediction: This algorithm is also well known for its multi-class prediction capability; we
can predict the probabilities of multiple classes of the target variable.
Text Classification / Spam Filtering / Sentiment Analysis: Naive Bayes classifiers are widely used in text
classification (due to good results on multi-class problems and the independence assumption) and often
achieve a higher success rate than other algorithms. As a result, they are widely used in spam filtering
(identifying spam e-mail) and sentiment analysis (in social media analysis, to identify positive and
negative customer sentiment).
Recommendation Systems: A Naive Bayes classifier combined with collaborative filtering builds a
recommendation system that uses machine learning and data mining techniques to filter unseen
information and predict whether a user would like a given resource.
PROGRAM 2:
TO IMPLEMENT CLASSIFICATION WITH K-NEAREST NEIGHBORS
AIM:
To implement classification with K-Nearest Neighbors.
DESCRIPTION:
Write a program to implement classification with K-Nearest Neighbors. In this exercise, you will use
scikit-learn's KNN classifier to classify real vs. fake news headlines. The aim is for you to read the
scikit-learn API and get comfortable with training/validation splits. Use the California Housing dataset.
Distance Metrics
K-Nearest-Neighbour Algorithm:
Step 1: Load the data.
Step 2: Initialize the value of k.
Step 3: To get the predicted class, iterate from 1 to the total number of training data points:
1. Calculate the distance between the test data and each row of the training data. Here we use Euclidean
distance as our distance metric since it is the most popular method; other metrics that can be used are
Chebyshev, cosine, etc. (see the sketch after this list).
2. Sort the calculated distances in ascending order based on the distance values.
3. Take the top k rows from the sorted array.
4. Get the most frequent class of these rows, i.e., the labels of the selected k entries.
5. Return the predicted class: for regression, return the mean of the k labels; for classification,
return the mode of the k labels.
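A minimal NumPy sketch of the distance metrics named in step 1 is shown below; the function names are illustrative only and are not part of the lab listing.
import numpy as np

# Illustrative distance metrics between two feature vectors a and b.
def euclidean(a, b):
    return np.sqrt(np.sum((a - b) ** 2))

def chebyshev(a, b):
    return np.max(np.abs(a - b))

def cosine_distance(a, b):
    # 1 minus the cosine similarity of a and b
    return 1 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

a = np.array([1.0, 2.0, 3.0])
b = np.array([2.0, 4.0, 6.0])
print(euclidean(a, b), chebyshev(a, b), cosine_distance(a, b))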
Confusion matrix:
Note:
• Class 1: Positive
• Class 2: Negative
• Positive (P): Observation is positive (for example: is an apple).
• Negative (N): Observation is not positive (for example: is not an apple).
• True Positive (TP): Observation is positive and is predicted to be positive.
• False Negative (FN): Observation is positive but is predicted negative. (Also known as a "Type II error".)
• True Negative (TN): Observation is negative and is predicted to be negative.
• False Positive (FP): Observation is negative but is predicted positive. (Also known as a "Type I error".)
ACCURACY = (TP + TN) / (TP + TN + FP + FN)
RECALL = TP / (TP + FN)
PRECISION = TP / (TP + FP)
F-MEASURE = (2 * RECALL * PRECISION) / (RECALL + PRECISION)
EXAMPLE:
n = 165         PREDICTED: NO    PREDICTED: YES    Total
ACTUAL: NO      TN = 50          FP = 10           60
ACTUAL: YES     FN = 5           TP = 100          105
Total           55               110
Accuracy: Overall, how often is the classifier correct?
(TP+TN)/total = (100+50)/165 = 0.91
Misclassification Rate: Overall, how often is it wrong?
(FP+FN)/total = (10+5)/165 = 0.09
equivalent to 1 minus Accuracy; also known as the "Error Rate".
True Positive Rate: When it's actually yes, how often does it predict yes?
TP/actual yes = 100/105 = 0.95
also known as "Sensitivity" or "Recall".
False Positive Rate: When it's actually no, how often does it predict yes?
FP/actual no = 10/60 = 0.17
True Negative Rate: When it's actually no, how often does it predict no?
TN/actual no = 50/60 = 0.83
equivalent to 1 minus the False Positive Rate; also known as "Specificity".
Precision: When it predicts yes, how often is it correct?
TP/predicted yes = 100/110 = 0.91.
Prevalence: How often does the yes condition actually occur in our sample?
actual yes/total = 105/165 = 0.64
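The numbers in this example can be reproduced directly in Python (a small sketch using the TP/TN/FP/FN values from the table above):
# Reproduce the example metrics from the 2x2 confusion matrix above.
TP, TN, FP, FN = 100, 50, 10, 5
total = TP + TN + FP + FN                 # 165
accuracy = (TP + TN) / total              # ~0.91
error_rate = (FP + FN) / total            # ~0.09
recall = TP / (TP + FN)                   # ~0.95 (sensitivity, true positive rate)
precision = TP / (TP + FP)                # ~0.91
specificity = TN / (TN + FP)              # ~0.83
f_measure = 2 * recall * precision / (recall + precision)
print(accuracy, error_rate, recall, precision, specificity, f_measure)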
SOURCE CODE:
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import confusion_matrix
from sklearn.metrics import accuracy_score
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
import pandas as pd
dataset = pd.read_csv("iris.csv")
# Features are the measurement columns; the class label is assumed to be the last column of iris.csv.
X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, -1].values
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0, test_size=0.25)
classifier=KNeighborsClassifier(n_neighbors=8,p=3,metric='euclidean')
classifier.fit(X_train,y_train)
# Predict the test results
y_pred=classifier.predict(X_test)
cm=confusion_matrix(y_test,y_pred)
print('Confusion matrix is as follows\n',cm)
print('Accuracy Metrics')
print(classification_report(y_test,y_pred))
print(" correct predicition",accuracy_score(y_test,y_pred))
print(" worng predicition",(1-accuracy_score(y_test,y_pred)))
OUTPUT:
Confusion matrix is as follows
[[13 0 0]
[ 0 15 1]
[ 0 0 9]]
Accuracy Metrics:
Precision Recall F1-score Support
Iris-setosa 1.00 1.00 1.00 13
Iris-versicolor 1.00 0.94 0.97 16
Iris-virginica 0.90 1.00 0.95 9
Avg / total 0.98 0.97 0.97 38
Correct Prediction: 0.9736842105263158
Wrong Prediction: 0.02631578947368418
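Note: the choice of n_neighbors = 8 in the listing above is arbitrary. A hedged sketch for selecting k by cross-validation, reusing X and y as defined in the listing, might look like this:
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Pick k by 5-fold cross-validated accuracy (X, y as in the listing above).
best_k, best_score = None, 0.0
for k in range(1, 16):
    score = cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=5).mean()
    if score > best_score:
        best_k, best_score = k, score
print("Best k:", best_k, "CV accuracy:", round(best_score, 3))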
PROGRAM 3:
IMPLEMENT A LINEAR REGRESSION WITH A REAL DATASET
AIM:
To implement Linear Regression with a real dataset, experiment with different features in
building a model, and tune the model's hyperparameters.
DESCRIPTION:
1. Define the business objective.
2. Make sense of the data from a high level.
data types (number, text, object, etc.)
continuous/discrete
basic stats (min, max, std, median, etc.) using boxplot
frequency via histogram
scales and distributions of different features
3. Create the training and test sets using proper sampling methods, e.g., random vs. stratified sampling.
4. Correlation analysis (pair-wise and attribute combinations).
5. Data cleaning (missing data, outliers, data errors).
6. Data transformation via pipelines (categorical text to number using one hot encoding, feature
scaling via normalization/standardization, feature combinations).
7. Train and cross validate different models and select the most promising one (Linear Regression,
Decision Tree, and Random Forest were tried in this tutorial).
8. Fine-tune the model by trying different combinations of hyperparameters.
9. Evaluate the model with the best estimators on the test set.
10. Launch, monitor, and refresh the model and system.
SOURCE CODE:
# This Python 3 environment comes with many helpful analytic libraries installed.
# It is defined by the kaggle/python docker image: https://github.com/kaggle/docker-python.
import numpy as np
import pandas as pd
%matplotlib inline
import matplotlib.pyplot as plt
import seaborn as sns
# Input data files are available in the "../input/" directory.
# For example, running this (by clicking run or pressing Shift+Enter) will list the files in the
# input directory.
import os
print(os.listdir("../input"))
# Any results you write to the current directory are saved as output.
['anscombe.csv', 'housing.csv']
# loading data
data_path = "../input/housing.csv"
housing = pd.read_csv(data_path)
# see the basic info
housing.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 20640 entries, 0 to 20639
Data columns (total 10 columns):
longitude 20640 non-null float64
latitude 20640 non-null float64
housing_median_age 20640 non-null float64
total_rooms 20640 non-null float64
total_bedrooms 20433 non-null float64
population 20640 non-null float64
households 20640 non-null float64
median_income 20640 non-null float64
median_house_value 20640 non-null float64
ocean_proximity 20640 non-null object
dtypes: float64(9), object(1)
memory usage: 1.6+ MB
Input(3): housing.head(10)
Input(4): housing.describe()
Input(5): housing.boxplot(['median_house_value'], figsize=(10, 10))
Input(6): housing.hist(bins=50, figsize=(15, 15))
Output(6):
Input(7): housing['ocean_proximity'].value_counts()
Output(7):
<1H OCEAN 9136
INLAND 6551
NEAR OCEAN 2658
NEAR BAY 2290
ISLAND 5
Name: ocean_proximity, dtype: int64
Input(8):
op_count = housing['ocean_proximity'].value_counts()
plt.figure(figsize=(10,5))
sns.barplot(op_count.index, op_count.values, alpha=0.7)
plt.title('Ocean Proximity Summary')
plt.ylabel('Number of Occurrences', fontsize=12)
plt.xlabel('Ocean Proximity', fontsize=12)
plt.show()
# housing['ocean_proximity'].value_counts().hist()
Output(8):
Input(9): housing['median_income'].hist()
Output(9): <matplotlib.axes._subplots.AxesSubplot at 0x7f264523cb00>
Input(10): housing['median_income'].hist()
Output(10): <matplotlib.axes._subplots.AxesSubplot at 0x7f264523cb00>
Input(11):housing.plot(kind='scatter', x='longitude', y='latitude', alpha=0.1)
Output(11): <matplotlib.axes._subplots.AxesSubplot at 0x7f2645224e10>
Input(12):
# Pearson's r, aka, standard correlation coefficient for every pair
corr_matrix = housing.corr()
# Check the how much each attribute correlates with the median house value
corr_matrix['median_house_value'].sort_values(ascending=False)
Output(12):
median_house_value 1.000000
median_income 0.687160
total_rooms 0.135097
housing_median_age 0.114110
households 0.064506
total_bedrooms 0.047689
population -0.026920
longitude -0.047432
latitude -0.142724
Name: median_house_value, dtype: float64.
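The listing above stops at correlation analysis. A minimal sketch of the remaining steps in the aim (split, fit a linear regression, evaluate) on the same housing DataFrame is given below; it assumes the simplest possible handling of the categorical column and missing values rather than the full pipeline of steps 5 to 8, and reuses the imports from the listing above.
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Keep the numeric columns only and drop rows with missing total_bedrooms (simplest handling).
num = housing.drop(columns=['ocean_proximity']).dropna()
X = num.drop(columns=['median_house_value'])
y = num['median_house_value']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
lin_reg = LinearRegression().fit(X_train, y_train)
rmse = np.sqrt(mean_squared_error(y_test, lin_reg.predict(X_test)))
print("Test RMSE:", rmse)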
PROGRAM 4:
THE SCIKIT-LEARN KNN CLASSIFIER TO CLASSIFY REAL vs. FAKE NEWS
HEADLINES.
AIM:
The aim of this question is for you to read the scikit-learn API and get comfortable with training/validation
splits. Use California Housing Datasets.
DESCRIPTION:
Classification with Nearest Neighbours: in this exercise, you will use scikit-learn's KNN classifier to
classify real vs. fake news headlines. The aim is for you to read the scikit-learn API and get
comfortable with training/validation splits. Use the California Housing dataset.
SOURCE CODE:
import csv
import random
import math
import operator
def loadDataset(filename, split, trainingSet=[], testSet=[]):
    with open(filename, 'r') as csvfile:
        lines = csv.reader(csvfile)
        dataset = list(lines)
        for x in range(len(dataset) - 1):
            for y in range(4):
                dataset[x][y] = float(dataset[x][y])
            if random.random() < split:
                trainingSet.append(dataset[x])
            else:
                testSet.append(dataset[x])

def euclideanDistance(instance1, instance2, length):
    distance = 0
    for x in range(length):
        distance += pow((instance1[x] - instance2[x]), 2)
    return math.sqrt(distance)

def getNeighbors(trainingSet, testInstance, k):
    distances = []
    length = len(testInstance) - 1
    for x in range(len(trainingSet)):
        dist = euclideanDistance(testInstance, trainingSet[x], length)
        distances.append((trainingSet[x], dist))
    distances.sort(key=operator.itemgetter(1))
    neighbors = []
    for x in range(k):
        neighbors.append(distances[x][0])
    return neighbors

def getResponse(neighbors):
    classVotes = {}
    for x in range(len(neighbors)):
        response = neighbors[x][-1]
        if response in classVotes:
            classVotes[response] += 1
        else:
            classVotes[response] = 1
    sortedVotes = sorted(classVotes.items(), key=operator.itemgetter(1), reverse=True)
    return sortedVotes[0][0]

def getAccuracy(testSet, predictions):
    correct = 0
    for x in range(len(testSet)):
        if testSet[x][-1] == predictions[x]:
            correct += 1
    return (correct / float(len(testSet))) * 100.0

def main():
    # prepare data
    trainingSet = []
    testSet = []
    split = 0.67
    loadDataset('knndat.data', split, trainingSet, testSet)
    print('Train set: ' + repr(len(trainingSet)))
    print('Test set: ' + repr(len(testSet)))
    # generate predictions
    predictions = []
    k = 3
    for x in range(len(testSet)):
        neighbors = getNeighbors(trainingSet, testSet[x], k)
        result = getResponse(neighbors)
        predictions.append(result)
        print('> predicted=' + repr(result) + ', actual=' + repr(testSet[x][-1]))
    accuracy = getAccuracy(testSet, predictions)
    print('Accuracy: ' + repr(accuracy) + '%')

main()
OUTPUT:
Confusion matrix is as follows:
[[11 0 0]
[0 9 1]
[0 1 8]]
Accuracy metrics:
              Precision   Recall   F1-score   Support
0             1.00        1.00     1.00       11
1             0.90        0.90     0.90       10
2             0.89        0.89     0.89       9
Avg / Total   0.93        0.93     0.93       30
PROGRAM 5:
ANALYZE DELTAS BETWEEN TRAINING SET AND VALIDATION SET RESULTS
AIM:
To experiment with validation sets and test sets using the given datasets.
DESCRIPTION:
To experiment with validation sets and test sets using the datasets. Split a training set into a smaller
training set and a validation set. Analyze deltas between training set and validation set results. Test the
trained model with a test set to determine whether your trained model is overfitting. Detect and fix a
common training problem.
SOURCE CODE:
import matplotlib.pyplot as plt
import numpy as np
from sklearn.model_selection import cross_validate, train_test_split
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
np.random.seed(42)
# Generate data and plot
N = 300
x = np.linspace(0, 7*np.pi, N)
smooth = 1 + 0.5*np.sin(x)
y = smooth + 0.2*np.random.randn(N)
plt.plot(x, y)
plt.plot(x, smooth)
plt.xlabel("x")
plt.ylabel("y")
plt.ylim(0,2)
plt.show()
# Train-test split, intentionally use shuffle=False
X = x.reshape(-1,1)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, shuffle=False)
# Create two models: Polynomial and linear regression
degree = 2
polyreg = make_pipeline(PolynomialFeatures(degree), LinearRegression(fit_intercept=False))
linreg = LinearRegression()
# Cross-validation
scoring = "neg_root_mean_squared_error"
polyscores = cross_validate(polyreg, X_train, y_train, scoring=scoring, return_estimator=True)
linscores = cross_validate(linreg, X_train, y_train, scoring=scoring, return_estimator=True)
# Which one is better? Linear and polynomial
print("Linear regression score:", linscores["test_score"].mean())
print("Polynomial regression score:", polyscores["test_score"].mean())
print("Difference:", linscores["test_score"].mean() - polyscores["test_score"].mean())
print("Coefficients of polynomial regression and linear regression:")
# Show the coefficients of the first fitted polynomial regression;
# these start from the constant term, in ascending order of powers.
print(polyscores["estimator"][0].steps[1][1].coef_)
# And show the intercept and coefficient of the first fitted linear regression.
print(linscores["estimator"][0].intercept_, linscores["estimator"][0].coef_)
# Plot and compare
plt.plot(x, y)
plt.plot(x, smooth)
plt.plot(x, polyscores["estimator"][0].predict(X))
plt.plot(x, linscores["estimator"][0].predict(X))
plt.ylim(0,2)
plt.xlabel("x")
plt.ylabel("y")
plt.show()
# Retrain the model and evaluate
import sklearn
linreg = sklearn.base.clone(linreg)
linreg.fit(X_train, y_train)
print("Test set RMSE:", mean_squared_error(y_test, linreg.predict(X_test), squared=False))
print("Mean validation RMSE:", -linscores["test_score"].mean()).
OUTPUT:
PROGRAM 6:
IMPLEMENT A BINARY CLASSIFICATION MODEL
AIM:
To implement a binary classification model that answers a binary question such as "Are houses in this
neighborhood above a certain price?"
DESCRIPTION:
Implement a binary classification model. Binary question such as "Are houses in this neighborhood above a
certain price?"(use data from exercise 1). Modify the classification threshold and determine how that
modification influences the model. Experiment with different classification metrics to determine your
model's effectiveness.
SOURCE CODE:
import numpy as np
import pandas as pd
import tensorflow as tf

train_df = pd.read_csv("https://download.mlcc.google.com/mledu-datasets/california_housing_train.csv")
test_df = pd.read_csv("https://download.mlcc.google.com/mledu-datasets/california_housing_test.csv")
train_df = train_df.reindex(np.random.permutation(train_df.index))
# shuffle the training set
threshold = 265000 # This is the 75th percentile for median house values.
train_df_norm["median_house_value_is_high"] = ? Your code here
test_df_norm["median_house_value_is_high"] = ? Your code here
# Print out a few example cells from the beginning and
# middle of the training set, just to make sure that
# your code created only 0s and 1s in the newly created
# median_house_value_is_high column
train_df_norm["median_house_value_is_high"].head(8000)
inputs = {
# Features used to train the model on.
'median_income': tf.keras.Input(shape=(1,)),
'total_rooms': tf.keras.Input(shape=(1,))
}
# The following variables are the hyperparameters.
learning_rate = 0.001
epochs = 20
batch_size = 100
classification_threshold = 0.35
label_name = "median_house_value_is_high"
# Modify the following definition of METRICS to generate
# not only accuracy and precision, but also recall:
METRICS = [
tf.keras.metrics.BinaryAccuracy(name='accuracy',
threshold=classification_threshold),
tf.keras.metrics.Precision(thresholds=classification_threshold,
name='precision'
),
? # write code here
]
# Establish the model's topography.
my_model = create_model(inputs, learning_rate, METRICS)
# Train the model on the training set.
epochs, hist = train_model(my_model, train_df_norm, epochs,
label_name, batch_size)
# Plot metrics vs. Epochs
list_of_metrics_to_plot = ['accuracy', 'precision', 'recall']
plot_curve(epochs, hist, list_of_metrics_to_plot)
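One possible way to fill the "? Your code here" placeholders above, assuming the normalized DataFrames train_df_norm / test_df_norm and the helper functions create_model, train_model and plot_curve from the original exercise are already defined, is sketched here:
# Label is 1.0 when the (un-normalized) median house value exceeds the threshold, else 0.0.
train_df_norm["median_house_value_is_high"] = (train_df["median_house_value"] > threshold).astype(float)
test_df_norm["median_house_value_is_high"] = (test_df["median_house_value"] > threshold).astype(float)

# Third entry for the METRICS list: recall at the same classification threshold.
tf.keras.metrics.Recall(thresholds=classification_threshold, name='recall')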
OUTPUT:
PROGRAM 7:
IMPLEMENT THE FINITE WORDS CLASSIFICATION SYSTEM USING
BACKPROPAGATION ALGORITHM
AIM:
To implement the finite words classification system using Back-propagation algorithm.
DESCRIPTION:
What is back propagation?
We can define the back propagation algorithm as an algorithm that trains a given feed-forward neural
network for a given input pattern whose classification is known to us. When each entry of the sample set
is presented to the network, the network examines its output response to that input pattern. The output
response is then compared with the expected output, the error value is measured, and the connection
weights are adjusted based on the measured error.
Back propagation was first introduced in the 1960s and was popularized about 30 years later by David
Rumelhart, Geoffrey Hinton, and Ronald Williams in their well-known 1986 paper, which discussed
several neural networks. Today, back propagation remains the standard way of training neural networks:
we fine-tune the weights of a neural network based on the error rate obtained in the previous run.
Applying the technique correctly reduces error rates and makes the model more reliable. Back
propagation trains the neural network using the chain rule. In simple terms, after each forward pass
through the network, the algorithm performs a backward pass to adjust the model's parameters (weights
and biases). A typical supervised learning algorithm attempts to find a function that maps input data to
the right output. Back propagation works with a multi-layered neural network and learns the internal
representations of the input-to-output mapping.
The Back propagation algorithm is a supervised learning method for multi layer feed forward
networks from the field of Artificial Neural Networks.
Feed-forward neural networks are inspired by the information processing of one or more neural
cells, called a neuron. A neuron accepts input signals via its dendrites, which pass the electrical signal
down to the cell body. The axon carries the signal out to synapses, which are the connections of a cell’s
axon to other cell’s dendrites.
The principle of the back propagation approach is to model a given function by modifying internal
weightings of input signals to produce an expected output signal. The system is trained using a supervised
learning method, where the error between the system’s output and a known expected output is presented to
the system and used to modify its internal state.
Technically, the back propagation algorithm is a method for training the weights in a multilayer
feed-forward neural network. As such, it requires a network structure to be defined of one or more layers
where one layer is fully connected to the next layer. A standard network structure is one input layer, one
hidden layer, and one output layer. Back propagation can be used for both classification
and regression problems.
In classification problems, the best results are achieved when the network has one neuron in the output
layer for each class value. For example, in a 2-class (binary) classification problem with class values
A and B, the expected outputs would be transformed into binary vectors with one column for each class
value, such as [1, 0] and [0, 1] for A and B respectively. This is called a one-hot encoding.
How does back propagation work?
Let us take a look at how back propagation works. A typical network has an input layer, one or more
hidden layers (for example, hidden layer I and hidden layer II), and a final output layer. So, the main
three layers are:
1. Input layer
2. Hidden layer
3. Output layer.
Each layer has its own way of working and its own way of taking action, such that we are able to get the
desired results and correlate these scenarios to our conditions. Let us discuss the other details needed to
summarize this algorithm.
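As a small numerical illustration of the weight-update idea described above, the following NumPy sketch performs one forward pass and one backward pass (chain rule) for an assumed toy 2-2-1 network with sigmoid activations and squared-error loss; it is not part of the lab program below.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
x = np.array([0.5, 0.8])            # one input pattern
t = np.array([1.0])                 # expected output
W1 = rng.normal(size=(2, 2))        # input -> hidden weights (biases omitted for brevity)
W2 = rng.normal(size=(1, 2))        # hidden -> output weights
lr = 0.5                            # learning rate

# Forward pass
h = sigmoid(W1 @ x)                 # hidden activations
y = sigmoid(W2 @ h)                 # network output

# Backward pass: error terms via the chain rule
delta_out = (y - t) * y * (1 - y)              # output-layer error term
delta_hid = (W2.T @ delta_out) * h * (1 - h)   # hidden-layer error term

# Adjust the connection weights based on the measured error
W2 -= lr * np.outer(delta_out, h)
W1 -= lr * np.outer(delta_hid, x)
print("squared error before update:", (0.5 * (y - t) ** 2).item())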
SOURCE CODE:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score, confusion_matrix, precision_score, recall_score

msg = pd.read_csv('document.csv', names=['message', 'label'])
print("Total Instances of Dataset: ", msg.shape[0])
msg['labelnum'] = msg.label.map({'pos': 1, 'neg': 0})
X = msg.message
y = msg.labelnum
Xtrain, Xtest, ytrain, ytest = train_test_split(X, y)
count_v = CountVectorizer()
Xtrain_dm = count_v.fit_transform(Xtrain)
Xtest_dm = count_v.transform(Xtest)
# Note: in scikit-learn >= 1.0 use count_v.get_feature_names_out() instead.
df = pd.DataFrame(Xtrain_dm.toarray(), columns=count_v.get_feature_names())
clf = MLPClassifier(solver='lbfgs', alpha=1e-5, hidden_layer_sizes=(5, 2), random_state=1)
clf.fit(Xtrain_dm, ytrain)
pred = clf.predict(Xtest_dm)
print('Accuracy Metrics:')
print('Accuracy: ', accuracy_score(ytest, pred))
print('Recall: ', recall_score(ytest, pred))
print('Precision: ', precision_score(ytest, pred))
print('Confusion Matrix: \n', confusion_matrix(ytest, pred))
document.csv:
I love this sandwich,pos
This is an amazing place,pos
I feel very good about these beers,pos
This is my best work,pos
What an awesome view,pos
I do not like this restaurant,neg
I am tired of this stuff,neg
I can't deal with this,neg
He is my sworn enemy,neg
My boss is horrible,neg
This is an awesome place,pos
I do not like the taste of this juice,neg
I love to dance,pos
I am sick and tired of this place,neg
What a great holiday,pos
That is a bad locality to stay,neg
We will have good fun tomorrow,pos
I went to my enemy's house today,neg
OUTPUT:
Total Instances of Dataset: 18
Accuracy Metrics:
Accuracy: 0.8
Recall: 1.0
Precision: 0.75
Confusion Matrix:
[[1 1]
[0 3]]
VIVA QUESTIONS:
What is machine learning?
Define supervised learning.
Define unsupervised learning.
Define semi-supervised learning.
Define reinforcement learning.
What do you mean by a hypothesis in classification?
What is clustering?
Define precision, accuracy and recall.
Define entropy.
Define regression.
How is KNN different from k-means clustering?
What is concept learning?
Define specific boundary and general boundary.
Define target function.
Define decision tree.
What is an ANN?
Explain gradient descent approximation.
State Bayes theorem.
Define Bayesian belief networks.
Differentiate hard and soft clustering.
Define variance.
What is inductive machine learning?
Why is the K-nearest neighbour algorithm a lazy learning algorithm?
Why is naïve Bayes called "naïve"?
Mention classification algorithms.
Define pruning.
Differentiate clustering and classification.
Mention clustering algorithms.
Define bias.