Text Classification

Text classification is the process of assigning predefined categories to textual documents, with applications including spam filtering and sentiment analysis. The workflow involves data preparation, text normalization, feature extraction, model training, and evaluation using metrics like accuracy and F1 score. Various algorithms such as Multinomial Naïve Bayes and Support Vector Machines are used for classification.

05/04/2024

MDAN 54233

Text Classification
Supunmali Ahangama

Outline

• What is text classification?
• Applications
• Types of classification
• Steps of text classification

Text Classification

• Also known as document classification.
– A document is represented as textual data such as sentences or paragraphs belonging to the English language.
• Text classification is assigning text documents into one or more classes or categories, assuming that there is a predefined set of classes.

Applications

• News articles categorization
• Spam filtering
• Music or movie genre categorization
• Customer support request categorization
• Sentiment analysis
• Language detection

[Figure: News articles categorization]

[Figure: Spam classification]

Source: https://www.datacamp.com/tutorial/text-classification-python

[Figure: Categorize customer support requests]

Source: https://www.datacamp.com/tutorial/text-classification-python

Two major types

• Content based classification
– Analyzing the actual content of the information to determine its category or class.
– E.g., Spam/Ham by analyzing the words and phrases in emails
• Request based classification
– Classifying based on the requests or queries made by users.

Automated Text Classification

• Learning Algorithms
– Supervised machine learning
– Unsupervised machine learning
– Semi-supervised learning
– Reinforcement learning

Supervised learning

• Classification
– the outcomes to be predicted are distinct categories (outcome variable is a categorical variable).
• Regression
– the outcome to be predicted is a continuous numeric variable.

Example: “I am happy as I am doing the Text Analytics Module” → Positive: 1

Types of Classification

Based on the number of classes that can be predicted on any data point:
• Binary classification
• Multi-class classification (multinomial classification)
• Multi-label classification

Source: https://www.datacamp.com/tutorial/text-classification-python
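The three types differ mainly in what the target for one document looks like. The sketch below illustrates this under assumptions: the label names in `LABELS` and the helper `multi_hot` are hypothetical and are not taken from the slides.

```python
# Hypothetical label set, chosen only for illustration.
LABELS = ["sports", "politics", "technology"]

def multi_hot(doc_labels, all_labels=LABELS):
    """Encode a set of labels as a 0/1 vector, one position per known label."""
    return [1 if label in doc_labels else 0 for label in all_labels]

# Binary: exactly one of two classes, e.g. spam (1) vs. ham (0).
binary_target = 1

# Multi-class: exactly one class out of more than two, stored as a class index.
multiclass_target = LABELS.index("politics")

# Multi-label: any number of classes may apply to the same document.
multilabel_target = multi_hot({"sports", "technology"})

print(binary_target, multiclass_target, multilabel_target)  # 1 1 [1, 0, 1]
```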

Text Classification - Workflow

1. Prepare train and test datasets
2. Text normalization
3. Feature extraction
4. Model training
5. Model prediction and evaluation
6. Model deployment
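Step 1 of the workflow can be sketched in plain Python as follows. This is a minimal illustration, not the course's own code: the helper `train_test_split`, its `test_ratio` and `seed` parameters, and the four-document corpus are assumptions made for the example.

```python
import random

def train_test_split(docs, labels, test_ratio=0.25, seed=42):
    """Shuffle (document, label) pairs and split them into train and test parts."""
    pairs = list(zip(docs, labels))
    random.Random(seed).shuffle(pairs)          # fixed seed for a repeatable split
    n_test = max(1, int(len(pairs) * test_ratio))
    test, train = pairs[:n_test], pairs[n_test:]
    return train, test

# Tiny illustrative corpus: 1 = positive, 0 = negative.
docs = [
    "I am happy as I am following this course",
    "I am happy",
    "I am sad as I am not good in this course",
    "I am sad",
]
labels = [1, 1, 0, 0]

train, test = train_test_split(docs, labels)
print(len(train), len(test))  # 3 1
```

The test part is held back until evaluation so that the model is scored on documents it has not seen during training.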

Text Normalization

Some of the commonly used steps:
• Expanding contractions
• Text standardization through lemmatization
• Removing special characters and symbols
• Removing stopwords

Refer to the previous lesson “Pre-processing” for more details.

Feature Extraction

• In ML terminology, features are unique, measurable attributes (properties) for each data point (observation) in a dataset.
• Features are usually numeric in nature and can be absolute numeric values or categorical features encoded as binary features.
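The text normalization steps can be sketched with the standard library alone. The `CONTRACTIONS` and `STOPWORDS` tables below are small hypothetical samples, and lemmatization is left out because it normally needs an external NLP library.

```python
import re

# Small illustrative tables (assumptions); real projects use much larger ones.
CONTRACTIONS = {"i'm": "i am", "don't": "do not", "can't": "cannot", "it's": "it is"}
STOPWORDS = {"i", "am", "as", "this", "the", "a", "an", "is", "in", "do"}

def normalize(text):
    """Lowercase, expand contractions, strip symbols, and drop stopwords."""
    text = text.lower()
    for short, full in CONTRACTIONS.items():
        text = text.replace(short, full)
    # Replace everything except letters, digits, and whitespace with a space.
    text = re.sub(r"[^a-z0-9\s]", " ", text)
    tokens = text.split()
    return [tok for tok in tokens if tok not in STOPWORDS]

print(normalize("I'm happy, as I don't hate this course!"))
# ['happy', 'not', 'hate', 'course']
```

Note that "not" is deliberately kept out of the stopword list here, since negation usually matters for tasks such as sentiment analysis.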

Feature Extraction Techniques

• Bag of Words model
• TF-IDF model
• Advanced word vectorization models

Refer to the previous lesson “Text Vectorization” for more details.

Vocabulary

Documents: [doc_1, doc_2, ….., doc_m]

I am happy as I am following this course
….
….
I hated the book

V = [I, am, happy, as, following, this, course, ….., hated, the, book]
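Building the vocabulary V can be sketched as collecting each distinct token in the order it first appears. The helper name `build_vocabulary` is an assumption, tokens are split on whitespace only, and only the two documents shown explicitly on the slide are used, so the elided part of V does not appear.

```python
def build_vocabulary(documents):
    """Return the distinct tokens of a corpus in first-seen order."""
    vocabulary = []
    seen = set()
    for doc in documents:
        for token in doc.split():
            if token not in seen:
                seen.add(token)
                vocabulary.append(token)
    return vocabulary

documents = [
    "I am happy as I am following this course",
    "I hated the book",
]
V = build_vocabulary(documents)
print(V)
# ['I', 'am', 'happy', 'as', 'following', 'this', 'course', 'hated', 'the', 'book']
```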

Feature Extraction

Binary code based on availability

I am happy as I am following this course

[I, am, happy, as, following, this, course, ….., hated, the, book]
[1, 1, 1, 1, 1, 1, 1, ….., 0, 0, 0]

• A lot of zeros. It is known as sparse representation.

Feature Extraction

Absolute term frequency

I am happy as I am following this course

[I, am, happy, as, following, this, course, ….., hated, the, book]
[2, 2, 1, 1, 1, 1, 1, ….., 0, 0, 0]

• A lot of zeros. It is known as sparse representation.
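Both encodings can be reproduced with a few lines of Python. This is a sketch under assumptions: the vocabulary `V` below contains only the words shown explicitly on the slides (the elided middle part is omitted), and the function names are illustrative.

```python
from collections import Counter

# Shortened vocabulary: only the words visible on the slides.
V = ["I", "am", "happy", "as", "following", "this", "course", "hated", "the", "book"]

def binary_vector(doc, vocabulary):
    """1 if the vocabulary word occurs in the document, else 0."""
    tokens = set(doc.split())
    return [1 if word in tokens else 0 for word in vocabulary]

def count_vector(doc, vocabulary):
    """Number of times each vocabulary word occurs in the document."""
    counts = Counter(doc.split())
    return [counts[word] for word in vocabulary]

doc = "I am happy as I am following this course"
print(binary_vector(doc, V))  # [1, 1, 1, 1, 1, 1, 1, 0, 0, 0]
print(count_vector(doc, V))   # [2, 2, 1, 1, 1, 1, 1, 0, 0, 0]
```

With a realistic vocabulary of thousands of words, almost every position is zero for any single document, which is why these vectors are called sparse.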

Feature Extraction with Frequencies - Example

Positive
I am happy as I am following this course
I am happy

Negative
I am sad, I am not good in coding
I am sad

Get the frequency for each distinct word.

Positive
I am happy as I am following this course
I am happy

Negative
I am sad as I am not good in this course
I am sad

Vocabulary   PosFreq(1)   NegFreq(0)
I            3            3
am           3            3
happy        2            0
as           1            1
following    1            0
this         1            1
course       1            1
sad          0            2
not          0            1
good         0            1
in           0            1
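The frequency table can be reproduced by counting tokens per class. The sketch below uses the negative examples that match the table ("I am sad as I am not good in this course" and "I am sad"). The helper name `class_frequencies` is an assumption.

```python
from collections import Counter

positive_docs = ["I am happy as I am following this course", "I am happy"]
negative_docs = ["I am sad as I am not good in this course", "I am sad"]

def class_frequencies(docs):
    """Count how often each token occurs across all documents of one class."""
    counts = Counter()
    for doc in docs:
        counts.update(doc.split())
    return counts

pos_freq = class_frequencies(positive_docs)
neg_freq = class_frequencies(negative_docs)

vocabulary = ["I", "am", "happy", "as", "following", "this",
              "course", "sad", "not", "good", "in"]
for word in vocabulary:
    # A Counter returns 0 for words that never occurred in that class.
    print(f"{word:<10} {pos_freq[word]:>3} {neg_freq[word]:>3}")
```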
I am happy as I am following this course

Vocabulary   PosFreq(1)
I            3
am           3
happy        2
as           1
following    1
this         1
course       1
sad          0
not          0
good         0
in           0

Sum of PosFreq(1) over the distinct words of the sentence: 3 + 3 + 2 + 1 + 1 + 1 + 1 = 12

I am sad as I am not good in this course

Vocabulary   NegFreq(0)
I            3
am           3
happy        0
as           1
following    0
this         1
course       1
sad          2
not          1
good         1
in           1

Sum of NegFreq(0) over the distinct words of the sentence: 3 + 3 + 1 + 1 + 1 + 2 + 1 + 1 + 1 = 14

I am happy as I am following this course

Vocabulary   PosFreq(1)   NegFreq(0)
I            3            3
am           3            3
happy        2            0
as           1            1
following    1            0
this         1            1
course       1            1
sad          0            2
not          0            1
good         0            1
in           0            1

Sum of PosFreq(1) over the distinct words of the sentence: 12
Sum of NegFreq(0) over the distinct words of the sentence: 9

The feature vector is [1, 12, 9].
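The feature vector [1, 12, 9] for "I am happy as I am following this course" can be reproduced as [bias, sum of positive frequencies, sum of negative frequencies], where each distinct word of the sentence is counted once. Treating the leading 1 as a bias term is an assumption, as are the function names. The corpus is the one from the frequency table.

```python
from collections import Counter

def class_frequencies(docs):
    """Count how often each token occurs across all documents of one class."""
    counts = Counter()
    for doc in docs:
        counts.update(doc.split())
    return counts

def extract_features(sentence, pos_freq, neg_freq):
    """Return [bias, summed positive frequency, summed negative frequency]."""
    unique_words = set(sentence.split())      # each distinct word counted once
    pos_sum = sum(pos_freq[w] for w in unique_words)
    neg_sum = sum(neg_freq[w] for w in unique_words)
    return [1, pos_sum, neg_sum]              # leading 1 assumed to be a bias term

pos_freq = class_frequencies(["I am happy as I am following this course", "I am happy"])
neg_freq = class_frequencies(["I am sad as I am not good in this course", "I am sad"])

features = extract_features("I am happy as I am following this course", pos_freq, neg_freq)
print(features)  # [1, 12, 9]
```

This reduces every document to three numbers regardless of vocabulary size, in contrast to the sparse bag-of-words vectors.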

Classification Algorithms

There are many algorithms. The algorithms covered in this lesson are:
• Multinomial Naïve Bayes
• Support vector machines
• Logistic regression
• Random forest

Processes in Classification

• Training
• Evaluation (test)
– Evaluate performance with the ground truth (actual class labels)
• Tuning (hyperparameter tuning or optimization)
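As an illustration of the training and prediction steps, here is a from-scratch sketch of Multinomial Naïve Bayes on the small happy/sad corpus. It assumes whitespace tokenization and Laplace smoothing with `alpha=1.0`; the function names are illustrative and this is not presented as the course's implementation.

```python
import math
from collections import Counter

def train_multinomial_nb(docs, labels, alpha=1.0):
    """Estimate log priors and Laplace-smoothed log likelihoods per class."""
    classes = sorted(set(labels))
    vocabulary = sorted({tok for doc in docs for tok in doc.split()})
    log_prior, log_likelihood = {}, {}
    for c in classes:
        class_docs = [d for d, y in zip(docs, labels) if y == c]
        log_prior[c] = math.log(len(class_docs) / len(docs))
        counts = Counter(tok for d in class_docs for tok in d.split())
        total = sum(counts.values()) + alpha * len(vocabulary)
        log_likelihood[c] = {
            w: math.log((counts[w] + alpha) / total) for w in vocabulary
        }
    return log_prior, log_likelihood

def predict(doc, log_prior, log_likelihood):
    """Return the class with the highest posterior log score."""
    scores = {}
    for c in log_prior:
        score = log_prior[c]
        for tok in doc.split():
            # Tokens never seen in training are skipped (a simplifying assumption).
            if tok in log_likelihood[c]:
                score += log_likelihood[c][tok]
        scores[c] = score
    return max(scores, key=scores.get)

docs = [
    "I am happy as I am following this course",
    "I am happy",
    "I am sad as I am not good in this course",
    "I am sad",
]
labels = [1, 1, 0, 0]  # 1 = positive, 0 = negative

log_prior, log_likelihood = train_multinomial_nb(docs, labels)
print(predict("I am happy", log_prior, log_likelihood))     # 1
print(predict("I am not good", log_prior, log_likelihood))  # 0
```

Here `alpha` is a hyperparameter of the kind adjusted in the tuning step, and smoothing keeps a word that is absent from one class from forcing that class's probability to zero.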

Evaluating Classification Models

• To check how well these models are performing
• E.g.
– Accuracy
– Precision
– Recall
– F1 score

[Figure: Evaluation]
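The four metrics can be computed from the counts of true/false positives and negatives. The sketch below uses the standard definitions for the binary case; the function name and the example label lists are made up for illustration.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, and F1 for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Hypothetical ground truth and predictions: TP = 3, TN = 3, FP = 1, FN = 1.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]

metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

With these counts all four metrics equal 0.75. On imbalanced data they usually diverge, which is why accuracy alone can be misleading.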

Activity

• Calculate the accuracy, recall, precision, and F score

Summary

• Text classification is a natural language processing task that involves assigning predefined categories or labels to textual documents.
• In data preprocessing, the raw text data is cleaned, normalized, and transformed into a numerical format suitable for machine learning algorithms.
• Feature extraction involves selecting relevant features that can represent the documents effectively.
• Model selection is the process of choosing the best algorithm or approach to classify the text data.
• Finally, evaluation is done to measure the performance of the classification model using various metrics such as accuracy, precision, recall, and F1-score.
