Workshop - NLP - Ipynb - Colaboratory

This document shows how to build a sentiment analysis model with scikit-learn and standard NLP preprocessing: tokenization, stopword removal, lemmatization, and CountVectorizer. It loads the data, encodes the labels as binary sentiment, trains a random forest classifier on the vectorized text, and evaluates the model on held-out test data.


#import all necessary libraries

!pip install scikit-plot


from scikitplot.metrics import plot_confusion_matrix
import torch
import pandas as pd
import seaborn as sns
import re
import nltk
nltk.download('wordnet')
nltk.download('stopwords')
nltk.download('omw-1.4')
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score, confusion_matrix, classification_report

Collecting scikit-plot
  Downloading scikit_plot-0.3.7-py3-none-any.whl (33 kB)
Installing collected packages: scikit-plot
Successfully installed scikit-plot-0.3.7
[nltk_data] Downloading package wordnet to /root/nltk_data...
[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Unzipping corpora/stopwords.zip.
[nltk_data] Downloading package omw-1.4 to /root/nltk_data...

# select the GPU if available (torch is used only for this check; the scikit-learn model below runs on CPU)
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

device

device(type='cuda')

df = pd.read_csv("/content/train.txt", delimiter=';', names=['text','label'])
df

                                                    text    label
0                                i didnt feel humiliated  sadness
1      i can go from feeling so hopeless to so damned...  sadness
2       im grabbing a minute to post i feel greedy wrong    anger
3      i am ever feeling nostalgic about the fireplac...     love
4                                   i am feeling grouchy    anger
...                                                  ...      ...
17995  im having ssa examination tomorrow in the morn...  sadness
17996  i constantly worry about their fight against n...      joy
17997  i feel its important to share this info for th...      joy
17998  i truly feel that if you are passionate enough...      joy
17999  i feel like i just wanna buy any cute make up ...      joy

18000 rows × 2 columns

sns.countplot(df['label'])

/usr/local/lib/python3.7/dist-packages/seaborn/_decorators.py:43: FutureWarning: Pass the following variable as a keyword arg: x.
  FutureWarning
<matplotlib.axes._subplots.AxesSubplot at 0x7f5baa2f20d0>

df.label.unique()

array(['sadness', 'anger', 'love', 'surprise', 'fear', 'joy'],
      dtype=object)

# map the six emotion labels to binary sentiment: 1 = positive, 0 = negative
def custom_encoder(df):
    df.replace(to_replace="surprise", value=1, inplace=True)
    df.replace(to_replace="love", value=1, inplace=True)
    df.replace(to_replace="joy", value=1, inplace=True)
    df.replace(to_replace="fear", value=0, inplace=True)
    df.replace(to_replace="anger", value=0, inplace=True)
    df.replace(to_replace="sadness", value=0, inplace=True)

custom_encoder(df['label'])
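For reference, the six separate replace calls can be collapsed into a single call with a dict — a minimal equivalent sketch, not part of the original notebook:

# equivalent one-liner: map all six emotion labels to binary sentiment in one call
df['label'].replace({"surprise": 1, "love": 1, "joy": 1,
                     "fear": 0, "anger": 0, "sadness": 0}, inplace=True)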

sns.countplot(df.label)

/usr/local/lib/python3.7/dist-packages/seaborn/_decorators.py:43: FutureWarning: Pass the following variable as a keyword arg: x.
  FutureWarning
<matplotlib.axes._subplots.AxesSubplot at 0x7f5baa656590>
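The same class balance can be read off numerically rather than from the plot (an aside, not in the original notebook):

# counts of the two encoded classes: 1 = positive, 0 = negative
print(df.label.value_counts())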

lm = WordNetLemmatizer()

stops = set(stopwords.words('english'))
# stops

def text_transformation(df_col):
    corpus = []
    for item in df_col:
        # keep letters only: strip digits and symbols ($, Rs., etc.)
        new_item = re.sub('[^a-zA-Z]', ' ', str(item))
        new_item = new_item.lower()
        new_item = new_item.split()
        # drop stopwords, lemmatize what remains
        new_item = [lm.lemmatize(word) for word in new_item if word not in stops]
        corpus.append(' '.join(str(x) for x in new_item))
    return corpus

corpus = text_transformation(df['text'])
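As a quick sanity check (not in the original notebook), the cleaning step can be inspected on a single made-up sentence — symbols and stopwords are dropped and the remaining words are lemmatized:

# hypothetical example sentence; 'batteries' should lemmatize to 'battery'
sample = ["I was feeling 100% happier with my new batteries!"]
print(text_transformation(sample))
# expected output (approximately): ['feeling happier new battery']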

cv = CountVectorizer(ngram_range=(1,2))
traindata = cv.fit_transform(corpus)
X = traindata
y = df.label
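An illustrative aside: ngram_range=(1,2) builds a unigram-plus-bigram vocabulary, so the feature space is large. Its size can be checked directly (exact counts depend on the corpus):

# sparse document-term matrix: one row per document, one column per unigram/bigram
print(traindata.shape)       # (18000, n_features)
print(len(cv.vocabulary_))   # same count as traindata.shape[1]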


rfc = RandomForestClassifier(max_features='auto',
max_depth=None,
n_estimators=500,
min_samples_split=5,
min_samples_leaf=1)
rfc.fit(X,y)

RandomForestClassifier(min_samples_split=5, n_estimators=500)
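The forest is fit on the entire training split. Before touching the test file, a quick generalization estimate could come from cross-validation — a sketch, not in the original notebook (and slow with 500 trees):

from sklearn.model_selection import cross_val_score

# 5-fold cross-validated accuracy on the training data
scores = cross_val_score(rfc, X, y, cv=5, scoring='accuracy')
print(scores.mean())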

test_df = pd.read_csv('/content/test.txt',delimiter=';',names=['text','label'])

X_test,y_test = test_df.text,test_df.label
# encode the labels into two classes, 0 and 1 (in place; custom_encoder returns None, so don't assign its result)
custom_encoder(y_test)
#pre-processing of text
test_corpus = text_transformation(X_test)
#convert text data into vectors
testdata = cv.transform(test_corpus)
#predict the target
predictions = rfc.predict(testdata)

plot_confusion_matrix(y_test,predictions)
acc_score = accuracy_score(y_test,predictions)
pre_score = precision_score(y_test,predictions)
rec_score = recall_score(y_test,predictions)
print('Accuracy_score: ',acc_score)
print('Precision_score: ',pre_score)
print('Recall_score: ',rec_score)
print("-"*50)
cr = classification_report(y_test,predictions)
print(cr)
Accuracy_score:  0.9615
Precision_score:  0.9616648411829135
Recall_score:  0.9543478260869566
--------------------------------------------------
              precision    recall  f1-score   support

           0       0.96      0.97      0.96      1080
           1       0.96      0.95      0.96       920

    accuracy                           0.96      2000
   macro avg       0.96      0.96      0.96      2000
weighted avg       0.96      0.96      0.96      2000
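The headline scores can be recomputed by hand from the confusion matrix, which makes the definitions concrete (for a binary problem, sklearn's ravel() order is tn, fp, fn, tp):

tn, fp, fn, tp = confusion_matrix(y_test, predictions).ravel()
print('precision:', tp / (tp + fp))   # fraction of predicted positives that are correct
print('recall:   ', tp / (tp + fn))   # fraction of actual positives that are recovered
print('accuracy: ', (tp + tn) / len(y_test))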

def expression_check(prediction_input):
    if prediction_input == 0:
        print("Input statement has Negative Sentiment.")
    elif prediction_input == 1:
        print("Input statement has Positive Sentiment.")
    else:
        print("Invalid Statement.")

# take an input statement and apply the same transformations used on the training data
def sentiment_predictor(input):
    input = text_transformation(input)
    transformed_input = cv.transform(input)
    prediction = rfc.predict(transformed_input)
    # predict returns an array; pass its single element so the comparisons above are scalar
    expression_check(prediction[0])

input1 = ["Worst laptop I have ever seen"]
input2 = ["Synapse is good"]

sentiment_predictor(input1)
sentiment_predictor(input2)

Input statement has Negative Sentiment.
Input statement has Positive Sentiment.
