
Social Media Integration for Real-Time Disaster Management

Marri Bharadwaj
Computer Science & Engineering (CSE), IIT Jodhpur
[email protected]

Dr. Chandana N
Center for Emerging Technologies for Sustainable Development (CETSD), IIT Jodhpur
[email protected]

Abstract—Early identification of disaster-related events on social media channels like Twitter considerably improves emergency response and risk reduction efforts. In this paper, I studied and compared the performance of two deep learning models, Long Short-Term Memory (LSTM) and Bidirectional Encoder Representations from Transformers (BERT), on the binary classification of tweets as disaster-related or non-disaster-related. The LSTM model utilises FastText embeddings with gated memory cells to make full use of sequential information, while BERT relies on deep contextual embeddings enabled by attention. Despite BERT's groundbreaking architecture, I noted that the LSTM model performs better than BERT on short tweets, with an F1 score of 0.79 against BERT's 0.74. Most of this performance advantage arises from the short length of tweets, which allowed the LSTM model to be trained and optimised more effectively. The study points to the potential of NLP-enabled deep learning models for real-time disaster monitoring, with solutions scalable enough for deployment in emergency management systems.

Index Terms—Disaster classification, LSTM, BERT, Natural Language Processing, Social Media Mining, Tweet Analysis, Emergency Response, Deep Learning, Real-Time Detection, FastText.
I. INTRODUCTION

Disasters, both natural and human-made, threaten human life, infrastructure, and environmental stability. Climate change and urban vulnerabilities are increasing the frequency and intensity of disasters, creating a growing need for rapid detection, communication, and response mechanisms. Conventional disaster management models typically rely on slow messages and formal alerts that do not always communicate what is happening on the ground at the time.

Social media, especially Twitter, has emerged as a dynamic space where people share live updates, personal experiences, and requests for assistance. These user-generated insights can act as signals of an impending crisis or an existing emergency. The challenge lies in filtering the vast quantity of unstructured text data to locate relevant, credible information. [1]

This project seeks to demonstrate that Natural Language Processing (NLP) techniques can be used to classify tweets as disaster-related or not, and thereby contribute to real-time disaster monitoring efforts. I used deep learning models trained on annotated data to show how machine learning models can read human language in real time and derive urgency and relevance from it.

The larger rationale behind this work is public good. An effective tweet classification model may serve as a useful adjunct for emergency services, public health agencies, and humanitarian organisations in monitoring unfolding disasters, analysing their path, and coordinating an urgent response. It reflects the increasing overlap of technology and society, where machine learning is ever more vital for humanitarian assistance and risk reduction. [2]

II. DATA COLLECTION

To build and test a sound disaster detection model using NLP approaches, I assembled a dataset of social media posts related to natural and man-made disasters. The posts were collected mostly from platforms such as Twitter, Facebook, and Reddit, with the express goal of capturing as much diversity in real-world context, informal language style, and reporting behaviour as possible.

The final dataset comprises 7,613 posts, each labeled as disaster-related (target = 1) or non-disaster (target = 0). The posts cover a variety of disaster events:
• Natural disasters: earthquakes, hurricanes, floods, wildfires, tsunamis, tornadoes, blizzards, volcanic eruptions, landslides, heat waves, droughts, and storms.
• Man-made disasters: terrorist attacks, industrial accidents, oil spills, radiation emergencies, explosions, and transportation accidents.

The dataset was collected as a CSV file with the following variables (a minimal loading sketch follows the list):
• Text: the post or tweet text.
• Keyword: a relevant keyword extracted from the post (when applicable).
• Location: the location from which the post was created (often partial or missing).
• Target: a binary variable indicating whether the post is about a real disaster (1) or not (0); available only in the training set.
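As a minimal illustration (the file name train.csv is a stand-in; the columns follow the description above), the dataset can be loaded and profiled with pandas:

```python
import pandas as pd

# Illustrative path; the training CSV described above.
df = pd.read_csv("train.csv")  # columns: keyword, location, text, target

# Class balance: roughly 43% disaster (1) vs 57% non-disaster (0).
print(df["target"].value_counts(normalize=True))

# Sparsity of the keyword and location fields.
print(df[["keyword", "location"]].isna().sum())
```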
Fig. 1. Class Distribution
Fig. 2. Frequency Histogram

A. Class Distribution

An initial analysis of the target labels revealed a fairly even split: around 43% of the posts point to actual disasters and 57% to non-disaster events. This balance is beneficial for training machine learning models, as it reduces the likelihood of a bias towards either category. Figure 1 visualises the class distribution.

B. Keyword Analysis

Of the 7,613 posts, only 61 lack a keyword. The dataset comprises 222 unique keywords, many of them multi-word, corresponding to particular disaster types. A histogram of keyword frequencies (Figure 2) shows a long-tail distribution: a few keywords appear frequently, while many appear only a handful of times. Figure 3 shows the proportion of posts classified as a real disaster for each keyword; the high variance of these proportions suggests that keywords are good predictive features that correspond well with reports of real disasters.

Fig. 3. Proportion of Posts across the Classes
C. Location Analysis

Due to a high level of inconsistency and noise in the location variable, including 2,533 missing values and over 3,300 unique named locations, I conducted extensive cleaning and assessment of this field. Examples of the issues include informal entries such as "Earth" and "Somewhere in the sky," inconsistent naming, and special characters. Data cleaning involved the following steps:
• Removing non-Latin characters.
• Lowercasing all text.
• Standardising country names, e.g., changing "united states" and "us" to "usa".
• Removing excess whitespace and formatting issues.

Even after cleaning, the location variable remained very sparse and highly ambiguous. As Figure 4 shows, all but a few locations are unique or missing entirely, and the distribution across classes is so uniform that the variable carries little predictive signal. It is therefore omitted from model training.

Fig. 4. Frequency of Location Occurrences

III. METHODOLOGY AND PIPELINE

A. Text Pre-processing and Word Embeddings

To refine the model's capacity to comprehend and generalise from textual data, I employed FastText embeddings to vectorise the words in the corpus. FastText differs from older embeddings such as Word2Vec or GloVe in that it uses subword (character n-gram) information, which allows it to generalise to out-of-vocabulary (OOV) words by composing representations for unseen words. [3]
At first, the raw text had comparatively low embedding coverage: FastText vectors covered 51.5% of the vocabulary and 81.8% of the text. The limited coverage was primarily due to tokens containing irregular characters, symbols, and punctuation. To improve coverage and generalisation, I applied a number of text pre-processing steps (a sketch of the cleaning routine follows the list):
• Removal of URLs: all hyperlinks were removed using regular expressions, as they were predominantly noise and inconsequential for understanding content.
• Expansion of contractions: common contractions such as "won't", "can't", and "I'm" were expanded to their full forms ("will not", "can not", "I am") to improve matching against the embedding vocabulary.
• Lower-casing: all text was converted to lower case, which improved coverage and reduced the vocabulary size.
• Removal of characters and symbols: all non-Latin characters, including punctuation and special characters, were removed using regular expressions.
• Removal of stopwords: common stopwords (e.g., "the", "is", "and") were removed to keep only the words that carry meaning.
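A minimal sketch of such a cleaning routine is shown below; the contraction map and stopword list are illustrative subsets, not the exact resources used in the project:

```python
import re

CONTRACTIONS = {"won't": "will not", "can't": "can not", "i'm": "i am"}  # illustrative subset
STOPWORDS = {"the", "is", "and", "a", "an", "to", "of", "in"}            # illustrative subset

def clean_text(text: str) -> str:
    text = text.lower()                                  # lower-casing
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)   # remove URLs
    for short, full in CONTRACTIONS.items():             # expand contractions
        text = text.replace(short, full)
    text = re.sub(r"[^a-z\s]", " ", text)                # keep only Latin letters and whitespace
    words = [w for w in text.split() if w not in STOPWORDS]  # remove stopwords
    return " ".join(words)

print(clean_text("I'm fleeing the #wildfire near town https://t.co/abc"))
# -> "i am fleeing wildfire near town"
```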
After these preprocessing steps, vocabulary coverage reached 78.0% and text coverage improved to 93.8%, according to the embedding coverage checks against the Common Crawl FastText vectors. This was an important step because I wanted the majority of words in the dataset to have valid vectors assigned to them, which directly supports the model in capturing meaning.
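The coverage figures themselves can be computed with a small helper along these lines (a sketch; embeddings is assumed to be a dict-like word-to-vector mapping loaded from the FastText .vec file):

```python
from collections import Counter

def embedding_coverage(texts, embeddings):
    """Return (vocabulary coverage, text coverage) against an embedding table.

    Vocabulary coverage: share of unique words that have a pretrained vector.
    Text coverage: share of all word occurrences that have a pretrained vector.
    """
    vocab = Counter(word for text in texts for word in text.split())
    known = {w: n for w, n in vocab.items() if w in embeddings}
    return len(known) / len(vocab), sum(known.values()) / sum(vocab.values())

# Example with a toy embedding table:
texts = ["flood warning issued", "flood waters rising"]
embeddings = {"flood": [0.1], "warning": [0.2], "rising": [0.3]}
print(embedding_coverage(texts, embeddings))  # (0.6, 0.666...)
```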
I also preprocessed the keyword column by replacing encoded spaces (%20) with regular spaces and filling missing values with the placeholder "empty". This allowed the keyword field to be treated as an additional feature in the model.

B. Input Preparation

The cleaned and preprocessed text was then shaped into inputs for model training. I split it into train and test sets (96-4 split) using scikit-learn's train_test_split, which kept a small sample for final analysis and evaluation while leaving enough data to train effectively.

Preparing the inputs for the LSTM involved the following steps (a condensed sketch follows the list):
• Tokenisation:
  – The text column was tokenised with Tokenizer() and converted into sequences of integers.
  – The keyword column was tokenised separately and represented in matrix format.
• Padding: the token sequences were padded so that all samples have a uniform input size.
• Meta-data features: additional float-type metadata features (such as word count) were concatenated with the keyword matrix to form a larger set of float features.
• Embedding matrix: I initialised the embedding matrix with FastText vectors, so that every word found in the vocabulary received its corresponding 300-dimensional vector; words not found in FastText were assigned the zero vector. This left 3,386 unknown words, a significant drop from the unprocessed state.
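A condensed sketch of this preparation, with stand-ins for the cleaned tweets (train_texts) and the loaded FastText vectors (fasttext_vectors):

```python
import numpy as np
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

train_texts = ["forest fire near la ronge canada", "i love fruits"]  # stand-in data
fasttext_vectors = {}  # stand-in for vectors loaded from the FastText .vec file

MAX_LEN, EMB_DIM = 40, 300

tokenizer = Tokenizer()
tokenizer.fit_on_texts(train_texts)
sequences = tokenizer.texts_to_sequences(train_texts)  # words -> integer ids
X = pad_sequences(sequences, maxlen=MAX_LEN)           # uniform input length

# Embedding matrix: FastText vector if the word is known, zero vector otherwise.
embedding_matrix = np.zeros((len(tokenizer.word_index) + 1, EMB_DIM))
for word, idx in tokenizer.word_index.items():
    vector = fasttext_vectors.get(word)
    if vector is not None:
        embedding_matrix[idx] = vector
```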
C. LSTM-Based Model with Attention

To create a strong neural baseline for the tweet-level classification task, I used a Bidirectional Long Short-Term Memory (BiLSTM) network with an attention mechanism. This architecture captures the forward and backward contextual dependencies in the text while attending to the most informative parts of the tweet. [4]

1) Data Pre-processing: The preprocessing pipeline was as follows:
• Converting tweets to lowercase and removing URLs, hashtags, mentions, emojis, special characters, and extra whitespace.
• Tokenising the cleaned tweets with a torchtext Field object with a fixed maximum of 40 tokens.
• Padding/truncating sequences to ensure uniform input lengths.
• Building a vocabulary from the training data and initialising embeddings with the pre-trained 300-dimensional FastText word vectors.
• Assigning randomly initialised embeddings to tokens not found in the pre-trained vectors.

2) Model Architecture: The proposed architecture takes a modular approach (a PyTorch sketch follows the list):
• Embedding layer: initialised with FastText vectors so the model can leverage their contextual richness and subword structure.
• Bidirectional LSTM: a single-layer BiLSTM with a hidden size of 128 processes the input in both the forward and backward direction to capture the complete context.
• Attention mechanism: a dot-product style attention computes weights across the hidden states of all time steps and forms a context vector as their weighted sum. This lets the model assign more weight to the relevant parts of the tweet, as determined by the attention weights.
• Fully connected layer: the attention-weighted context vector is passed through a linear transformation followed by a sigmoid activation, producing the output probability of the tweet being disaster-related.

The ability of the BiLSTM-with-attention model to pick up relevant evidence anywhere in the sequence, rather than only in its first words, makes it a powerful approach for this classification task.
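A minimal PyTorch sketch of this architecture (the 300-dimensional embeddings and hidden size of 128 follow the description above; the metadata branch is omitted for brevity, and the implementation details are assumptions):

```python
import torch
import torch.nn as nn

class BiLSTMAttention(nn.Module):
    def __init__(self, embedding_matrix, hidden_size=128):
        super().__init__()
        weights = torch.tensor(embedding_matrix, dtype=torch.float)
        self.embedding = nn.Embedding.from_pretrained(weights, freeze=False)
        self.lstm = nn.LSTM(weights.size(1), hidden_size,
                            batch_first=True, bidirectional=True)
        self.query = nn.Parameter(torch.randn(2 * hidden_size))  # attention query
        self.fc = nn.Linear(2 * hidden_size, 1)

    def forward(self, x):                      # x: (batch, seq_len) token ids
        h, _ = self.lstm(self.embedding(x))    # (batch, seq_len, 2*hidden)
        scores = h @ self.query                # dot-product attention scores
        weights = torch.softmax(scores, dim=1).unsqueeze(-1)
        context = (weights * h).sum(dim=1)     # weighted-sum context vector
        return torch.sigmoid(self.fc(context)).squeeze(-1)  # P(disaster)
```

A single learned query vector dotted with each hidden state keeps the attention lightweight, which suits sequences capped at 40 tokens.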
Fig. 5. LSTM Architecture

I set the following training parameters (a training-loop sketch follows the list):
• Loss function: Binary Cross-Entropy (BCE)
• Optimiser: Adam with a learning rate of 10^-3
• Batch size: 32
• Epochs: 10
• Validation split: 10% of the training data
• Device: CUDA-enabled GPU for faster training

To alleviate overfitting, I monitored validation accuracy and applied early stopping with patience on the best validation loss.
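A sketch of the training loop under these parameters, reusing the BiLSTMAttention class and embedding matrix sketched earlier; train_loader and val_loader are assumed PyTorch DataLoaders of (token-id, label) batches, and the patience value of 2 is illustrative:

```python
import copy
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
model = BiLSTMAttention(embedding_matrix).to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = torch.nn.BCELoss()

best_loss, patience, stale = float("inf"), 2, 0
for epoch in range(10):
    model.train()
    for xb, yb in train_loader:                    # assumed DataLoader
        optimizer.zero_grad()
        loss = criterion(model(xb.to(device)), yb.float().to(device))
        loss.backward()
        optimizer.step()

    model.eval()                                   # mean validation loss
    with torch.no_grad():
        val_loss = sum(criterion(model(xb.to(device)), yb.float().to(device)).item()
                       for xb, yb in val_loader) / len(val_loader)

    if val_loss < best_loss:                       # early stopping on best val loss
        best_loss, stale = val_loss, 0
        best_state = copy.deepcopy(model.state_dict())
    else:
        stale += 1
        if stale >= patience:
            break

model.load_state_dict(best_state)
```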
D. BERT Model Fine-Tuning

To further evaluate model efficacy against an alternative to the LSTM-based architecture, I produced a fine-tuned BERT model. Bidirectional Encoder Representations from Transformers (BERT) has established state-of-the-art performance on a number of natural language processing (NLP) tasks, including those requiring contextual awareness. [5]

This implementation uses the pre-trained bert-base-uncased model from HuggingFace's Transformers library. With its transformer-based embeddings and pre-training on a large unsupervised English corpus, it produces rich, contextual embeddings for each token in the input.

1) Data Tokenisation: I tokenised the raw text into model inputs using BERT's tokeniser:
• Each sentence was split into subword tokens.
• A [CLS] token was added at the beginning of each sentence and a [SEP] token at the end of each sequence.
• The tokeniser also produced attention masks and token type IDs, which tell the model which tokens to attend to and how to read sentence pairs (not needed in this task, but supported).

Each input was truncated or padded to a maximum length of 128 tokens and then converted to and wrapped in PyTorch tensor datasets for efficient batching and loading.
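A minimal sketch of this tokenisation step with HuggingFace's tokeniser (the sample text and label are illustrative):

```python
import torch
from torch.utils.data import TensorDataset
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

texts, labels = ["Forest fire near La Ronge Sask. Canada"], [1]  # illustrative
enc = tokenizer(texts, padding="max_length", truncation=True,
                max_length=128, return_tensors="pt")

# enc holds input_ids (with [CLS]/[SEP] inserted), attention_mask,
# and token_type_ids; wrap them as tensors for batched loading.
dataset = TensorDataset(enc["input_ids"], enc["attention_mask"],
                        torch.tensor(labels))
```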
2) Model Architecture: I fine-tuned a BertForSequenceClassification model for the binary classification task:
• A classification head with a single linear layer and sigmoid activation was added on top of the pooled [CLS] output produced by the BERT model.
• The model was trained with binary cross-entropy loss and optimised with the AdamW optimiser, a commonly used training regime for transformer architectures.
• A learning rate scheduler (get_linear_schedule_with_warmup) was used during training to improve convergence.

3) Training Strategy:
• I trained with a batch size of 16 for 4 epochs.
• To track overfitting and generalisation, I monitored validation performance at the end of each epoch.
• The model was trained on a CUDA-enabled GPU to speed up convergence.

Predictions were made during inference by applying torch.sigmoid(logits) to generate probability scores, which were then thresholded to produce binary labels.
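Putting the pieces together, a condensed fine-tuning and inference sketch; the 2e-5 learning rate and 10% warmup proportion are assumed values, and train_loader is an assumed DataLoader over the tensor dataset built above:

```python
import torch
from torch.optim import AdamW
from transformers import BertForSequenceClassification, get_linear_schedule_with_warmup

model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=1)            # single logit for BCE
optimizer = AdamW(model.parameters(), lr=2e-5)    # assumed learning rate
total_steps = len(train_loader) * 4               # 4 epochs
scheduler = get_linear_schedule_with_warmup(
    optimizer, num_warmup_steps=int(0.1 * total_steps),
    num_training_steps=total_steps)
criterion = torch.nn.BCEWithLogitsLoss()

model.train()
for epoch in range(4):
    for input_ids, attention_mask, labels in train_loader:
        optimizer.zero_grad()
        logits = model(input_ids, attention_mask=attention_mask).logits.squeeze(-1)
        loss = criterion(logits, labels.float())
        loss.backward()
        optimizer.step()
        scheduler.step()

# Inference: sigmoid over logits, then a 0.5 threshold for binary labels
# (reusing the last batch purely for illustration).
model.eval()
with torch.no_grad():
    probs = torch.sigmoid(
        model(input_ids, attention_mask=attention_mask).logits.squeeze(-1))
    preds = (probs > 0.5).long()
```

Computing the loss manually with BCEWithLogitsLoss matches the single-logit sigmoid head described above, rather than the library's default cross-entropy over two class logits.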
Fig. 6. BERT Architecture

IV. RESULTS AND COMPARISON

The findings show that the LSTM model captures the sequential dependencies of the text well and generalises reasonably to unseen data, providing a good balance of precision and recall.

TABLE I
COMPARISON OF LSTM AND BERT MODELS

Model   Accuracy (%)   Precision (%)   Recall (%)   F1 Score (%)
LSTM    80.33          78.87           78.87        78.87
BERT    75.08          72.00           76.06        73.98

I conducted hyperparameter tuning (learning rate and warmup proportion) and visualised the results to determine how the hyperparameters affect the performance metrics. I also plotted training and validation curves, which show that the model converged steadily over the epochs in accuracy, precision, recall, and F1 score, with no signs of overfitting.

Fig. 7. Fluctuations in LSTM Training
Fig. 8. Consistent BERT Training

Although BERT's transformer architecture is widely considered state-of-the-art for many NLP tasks, its performance in this case was slightly below the LSTM model's. This is a result of the nature of the dataset: the relatively short tweets favour a sequence-based model such as the LSTM over fine-tuning a large transformer.
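For reference, the reported metrics can be computed from thresholded predictions with scikit-learn; y_true and y_pred below are illustrative stand-ins for the held-out labels and model outputs:

```python
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

y_true = [1, 0, 1, 1, 0, 0]   # illustrative held-out labels
y_pred = [1, 0, 1, 0, 0, 1]   # illustrative thresholded predictions

precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="binary")
print(f"Accuracy:  {accuracy_score(y_true, y_pred):.4f}")
print(f"Precision: {precision:.4f}  Recall: {recall:.4f}  F1: {f1:.4f}")
```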
A. Comparative Insights
• On the majority of metrics, the LSTM model performed noticeably better than BERT. This is especially true for precision and F1 score, the most meaningful metrics for binary disaster classification.
• BERT performed better on recall, which is advantageous for applications that must minimise false negatives; this matters in practical deployments, where missing a real disaster report is costly.
• BERT's average losses remained higher than the LSTM's, suggesting its predictions were less confident. This could be attributed to the short length of the tweets collected.

V. APPLICATIONS AND IMPACT

The suggested system is a promising approach to real-time disaster detection, because social media, especially Twitter, is both decentralised and rapid in its mode of communication: people will often tweet about experiencing a disaster minutes after it has occurred, sometimes before news agencies, or even governmental agencies, have had time to react. If a user tweets about an earthquake, for example, a BERT-based model can classify that tweet as disaster-related (or not) with near-zero response time, so it can feed an early-warning component of situational awareness repositories in emergency management systems.

Furthermore, this approach could be incorporated into emergency response systems through real-time APIs connected to Twitter's streaming output. The model could continually monitor, classify, and flag tweets about potential disasters for authorities in the field; a sketch of such a filter follows the list below. If geolocation information is available, it can further strengthen the system by identifying the general area impacted.

The consequences of such a system are broad-based:
• Public health: early detection of a disaster may expedite the mobilisation of medical teams or supplies, particularly in health-related crises such as epidemics, or in natural disasters where injuries are likely.
• Crisis response teams: classifying disaster tweets can help prioritise resource allocation by identifying and tracking a disaster as it unfolds.
• General public: alerts on social media or dashboards can deliver timely notifications that give people the opportunity to act on self-protective recommendations such as evacuating or sheltering.
• Policy officials: trends from disaster-related tweets can feed a methodical strategy for action and future resilience.
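As a sketch of how such a filter could sit behind a streaming API (the classify callable stands in for either trained model, and the alert threshold is illustrative):

```python
from typing import Optional

def flag_if_disaster(tweet: dict, classify, threshold: float = 0.8) -> Optional[dict]:
    """Route one streamed tweet to responders when the model is confident.

    classify is any callable returning P(disaster) for a text, e.g. a thin
    wrapper around the trained LSTM or BERT model.
    """
    prob = classify(tweet["text"])
    if prob >= threshold:
        return {"text": tweet["text"],
                "location": tweet.get("location"),  # aids area mapping when present
                "probability": prob}
    return None

# Example with a toy classifier:
print(flag_if_disaster({"text": "earthquake just hit downtown"}, classify=lambda t: 0.93))
```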
VI. CONCLUSION

In this project, I constructed and analysed tweet classification models using LSTM and BERT, a platform that can assist researchers in differentiating disaster-related from non-disaster messages. The research proceeded by iteratively developing and fine-tuning the LSTM and BERT models and measuring performance with the standard evaluation metrics of accuracy, precision, recall, and F1 score. Overall, both models performed well at capturing semantic patterns in social media posts.

Interestingly, across the experiments I conducted, the LSTM-based model slightly outperformed BERT on the collected dataset. This can be explained by the relatively short length and simple structure of the tweets in the dataset. As a lighter and easier-to-train model, the LSTM was able to capture the relevant semantic patterns without the deeper context modelling provided by BERT. Accordingly, in real applications where posts are more diverse, full of slang, and in need of deeper context, BERT would be expected to outperform lighter deep learning models thanks to its strong capacity for language representation.

Despite the promise of these results, limitations exist. In particular, BERT remains computationally expensive to train and requires substantial GPU resources and time. This computational cost poses a substantial barrier to real-time or broadly deployed applications without the appropriate infrastructure.

In future work, I hope to:
• Incorporate more diverse and robust datasets with a broader range of disaster types and linguistic styles.
• Extend the system to handle multilingual tweets, improving its applicability worldwide.
• Include geolocation metadata to improve real-time disaster mapping and localised alerts.
• Investigate model distillation or lightweight transformers for better inference speed and real-time deployment.

REFERENCES

[1] M. Imran, C. Castillo, F. Diaz, and S. Vieweg, "Processing social media messages in mass emergency: A survey," ACM Comput. Surv., vol. 47, no. 4, pp. 1-38, 2015. https://doi.org/10.1145/2771588
[2] S. Vieweg et al., "Microblogging during two natural hazards events: What Twitter may contribute to situational awareness," in Proc. SIGCHI Conf. Human Factors Comput. Syst., 2010, pp. 1079-1088. https://doi.org/10.1145/1753326.1753486
[3] J. Pennington, R. Socher, and C. D. Manning, "GloVe: Global vectors for word representation," in Proc. EMNLP, 2014, pp. 1532-1543. https://nlp.stanford.edu/pubs/glove.pdf
[4] S. Hochreiter and J. Schmidhuber, "Long short-term memory," Neural Computation, vol. 9, no. 8, pp. 1735-1780, 1997. https://doi.org/10.1162/neco.1997.9.8.1735
[5] J. Devlin, M. W. Chang, K. Lee, and K. Toutanova, "BERT: Pre-training of deep bidirectional transformers for language understanding," arXiv preprint arXiv:1810.04805, 2018. https://arxiv.org/abs/1810.04805
