
ChatGPT's Scientific Writings: A Case Study on Traffic Safety

Boniphace Kutela, Ph.D., P.E (Corresponding Author)


Assistant Research Scientist
Texas A&M Transportation Institute
701 N Post Oak Ln # 430, Houston, TX 77024
Email: [email protected]

Kelvin J. Msechu, RSP1


Traffic/ITS Engineer
Atkins North America Inc.
28175 Haggerty Rd, Novi, MI 48377
Email: [email protected]

Subasish Das, Ph.D.


Assistant Professor
Texas State University
San Marcos, TX- 78254
Email: [email protected]

Emmanuel Kidando, Ph.D., P.E.


Department of Civil and Environmental Engineering
Cleveland State University
2121 Euclid Avenue, Cleveland, OH 44115
Email: e. [email protected]



Abstract
The use of advanced language models, such as ChatGPT, is emerging in various fields. Such models can
assist students with homework, help prepare business plans, write code, and even create surveys. It has also
been found that these programs can even generate fake abstracts and manuscripts. To understand the future
of scientific writing in the era of ChatGPT, it is important to explore ChatGPT and human-generated texts
for manuscript preparation. This study aimed to evaluate the capability of ChatGPT to prepare a manuscript
for publication by comparing its output with actual published content using supervised and unsupervised
text mining approaches. This study used the introduction sections of 327 published articles on traffic safety.
To obtain the ChatGPT-generated introduction section, a prompt with instructions and the title of 327
manuscripts was supplied to ChatGPT. Five supervised text classifiers, Support Vector Machine (SVM),
Random Forest (RF), Naive Bayes (NB), Logitboost, and Neural Network (NNet), were applied to classify
human-generated versus ChatGPT-generated introductions. Two unsupervised approaches, Text Network
Analysis (TNA) and Text Cluster, were used to identify the difference in textual contents in human-
generated and ChatGPT-generated introductions. Results indicate a significant disparity between human-
generated and ChatGPT-generated introductions. The accuracy of the supervised text classifiers ranged
between 96.4% and 99.5%; SVM, RF, and NNet had higher precision. The key features from
SVM, RF, and NNet indicated more generic keywords with small variations across the classifiers. The
results from the topmost frequent keywords showed a great variation of keywords for the ChatGPT and
human-generated texts. Furthermore, the cluster analysis results indicated key clusters that are only
observed in ChatGPT-generated introductions but not in human-generated introductions, and vice versa.
The findings from this study can contribute to understanding the broader perspective of using advanced
language models in scientific writing.

Keywords: ChatGPT, Scientific Writings, Traffic Safety, Text mining



Introduction
Artificial intelligence (AI) is revolutionizing the way people live and perform various activities daily. The
use of advanced technologies, such as smartphones and smart watches, has changed how people live and
work. Furthermore, the use of voice command devices such as Siri and Alexa has also altered day-to-day
activities.
In November 2022, OpenAI released ChatGPT, an advanced language model with which users interact by supplying written directives; the model then produces text according to the instructions given (ChatGPT & Perlman, 2022; Noever & Ciolino, 2022; Wenzlaff & Spaeth, 2022). The tool has gained attention among academicians, economists, social scientists, engineers, and computer scientists (ChatGPT & Perlman, 2022; Gao et al., 2022). Most of the concerns have been about
whether ChatGPT will replace some of the human-generated activities, such as writing codes/algorithms,
preparing poems, movie transcripts, etc. (ChatGPT & Perlman, 2022; Qadir, 2022). Some scholars even
argue that some jobs will be replaced by ChatGPT (Qadir, 2022), whereas others disagree, indicating that the tool is not capable of taking over most human jobs and thus will have minimal impact (Aydın & Karaarslan, 2022; Chen & Eger, 2022; Frye, 2022; Wenzlaff & Spaeth, 2022). On the other hand, ChatGPT is based on an advanced language model refined with reinforcement learning, which is expected to improve as new observations are included in retraining the model.
In the academic atmosphere, the major issue discussed has been its application in schools (Graham,
2022; Mollick & Mollick, 2022; Stokel-Walker, 2022; Susnjak, 2022; Zhai, 2022), professional exams such
as medical exams and the bar exam (Bommarito & Katz, 2022). The school-based arguments center on the ethical use of the tool and on whether students should be trained to use it or left to discover it on their own. A study by Mollick and Mollick concluded that students should be exposed to the tool, whereas other work has argued otherwise (Mollick & Mollick, 2022). A few studies noted the advantages of teaching students about the tool and its limitations, suggesting that such instruction would help them avoid cheating on assignments and exams (Qadir, 2022). Other studies suggest the need for new types of assignments that focus on creativity and skills that are not easily mimicked by AI (Graham, 2022; Zhai, 2022). Some school districts have gone as far as banning the use of ChatGPT for academic activities; these include the New York City schools, Seattle Public Schools, Los Angeles Unified School District, and Fairfax County Public Schools in Virginia, among others (Johnson, 2023).
Moreover, several papers have explored the impact of ChatGPT in peer-reviewed publication
settings. The key questions have been on the impact of ChatGPT on manuscript preparation and ethical
issues, among others (ChatGPT & Perlman, 2022; Gao et al., 2022; Jabotinsky & Sarel, 2022; McKee &
Noever, 2022; Qadir, 2022; Stokel-Walker, 2022; Susnjak, 2022). Studies that focus on ethical issues are
driven by the question of whether it is ethical to use ChatGPT to prepare a paper for publication. Such
preparation can be for the entire paper or for a portion of a paper. In this case, there have been some
disagreements. Thunström (2022) attempted to write an academic paper about ChatGPT using ChatGPT and then tried to publish it, whereas Aydın & Karaarslan (2022) opposed such an approach on the grounds that the academic system is not ready for such a tool. Other scholars have prepared manuscripts with the tool and submitted them to journals for publication (Zhai, 2022). So far, no official statement on the use of ChatGPT for manuscript preparation has been released by major publishers.
Although there have been debates on whether ChatGPT can create a manuscript that is publishable,
a few studies have focused on proof of concept, attempting to establish whether ChatGPT-generated materials differ from human-generated scientific writing. For instance, a study by Gao et al. (2022)
compared the abstracts written by humans to those generated by ChatGPT. The authors evaluated the



abstracts using an AI detector based on the plagiarism check and human reviewers. The results showed that
while the abstracts were written clearly, only 8% followed the specific journal’s formatting requirements.
Most abstracts received high scores, suggesting they were likely to have been generated by AI. Moreover, the authors
suggest that journals and medical conferences should include AI output detectors in the editorial process
and disclose the use of AI technology to maintain rigorous scientific standards (Gao et al., 2022). Although
this study had interesting findings, more exploration is still needed. Specifically, it is relatively hard even for a human to generate the abstract of a paper given only the title, because an abstract normally contains background information, the methodological approach, findings, and other key elements. Additionally, their study did not explore the key features that distinguish human and ChatGPT texts, making the results relatively hard to interpret for readers seeking granular findings. Efforts have also been made to detect ChatGPT-created content (Cerullo, 2023); tools such as GPTZero (GPTZero, 2023) were developed to detect AI-generated texts. However, such tools might require advanced knowledge to use and might not be scalable. Nevertheless, ChatGPT is relatively new, and its capability
in the publication process is not well explored.
In this study, the introduction sections of manuscripts are used to explore the difference between
human-generated texts in peer-reviewed journals and ChatGPT-generated texts. The authors hypothesized
that ChatGPT can prepare a concise introduction section from the input of a well-defined title. Further, this
study adds to the body of literature regarding the methodological approach needed to explore the key
difference between human-generated scientific content and ChatGPT texts.
The remaining sections are presented as follows. The next section presents the study methodology
and discusses the data description and analytical approaches. The results and discussion section follows,
then the conclusion and future studies are presented last.

Methodology
As described earlier, this study intends to explore whether ChatGPT can produce publication-ready
materials comparable to human-generated text in published journals. This section presents the
methodological approach used to attain the study objectives. It is divided into two main subsections: data description and analytical methods.

Data Description
To explore the capability of ChatGPT in generating publication-ready materials, two types of data are necessary: human-written text and ChatGPT-generated text. In this study, the authors utilized the introduction sections of published papers as the human-written text. Then, each paper's title and other descriptions were used to create the ChatGPT-generated text. The introduction section was selected because of the relative ease of developing it from a well-defined title. The following prompt was used to generate the ChatGPT introduction sections.
I want you to develop an introduction section of a manuscript for publication. I will give you a
number of titles then I want you to give me the introduction section of the paper. You need to adopt
a persona of a highly skilled writer in traffic safety. In your writeup, include the actual citations,
actual references, and actual traffic safety statistics. The first title is “Title of the paper”.
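This prompt was supplied for each title, presumably through the ChatGPT interface. Purely as an illustrative sketch (not the authors' procedure), the same step could be scripted; the snippet below assumes the openai Python client (v1+), a placeholder model name, and a hypothetical list of manuscript titles.

```python
# Illustrative sketch only: sending the prompt above to a chat model once per title.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

PROMPT = (
    "I want you to develop an introduction section of a manuscript for publication. "
    "You need to adopt a persona of a highly skilled writer in traffic safety. "
    "In your writeup, include the actual citations, actual references, and actual "
    "traffic safety statistics. The first title is \"{title}\"."
)

def generate_introduction(title: str) -> str:
    """Return a ChatGPT-style introduction for one manuscript title."""
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",  # placeholder model name, not necessarily what was used
        messages=[{"role": "user", "content": PROMPT.format(title=title)}],
    )
    return response.choices[0].message.content

# manuscript_titles is a hypothetical list of the 327 titles described below
# introductions = [generate_introduction(t) for t in manuscript_titles]
```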
The top-cited research papers in traffic safety were selected. This was performed by using the Web of
Science database whereby “traffic safety”, “traffic crashes”, and “traffic accidents” were used to obtain the
traffic safety-related manuscripts. Then, we ranked the papers in descending order based on the number of
citations. A minimum of 30 citations was considered as a cutoff point, yielding 525 papers. The highly cited



papers were selected because various researchers have used them extensively; thus, their contents are more likely to be reflected in the text from which ChatGPT derives its features. Additionally, each paper's title was
reviewed, and 102 papers had titles that were not traffic safety-related, and thus they were removed.
Furthermore, 96 articles published by journals with limited access to the introduction section were also
removed. In the end, 327 papers from 102 journals were selected for further analysis, most of which were
from Accident Analysis and Prevention, Journal of Safety Research, and Analytic Methods in Accident
Research (see Table 1).
Table 1 Distribution of Traffic Safety Papers
Journal                                                                    Number of papers
Accident Analysis and Prevention 121
Journal of Safety Research 27
Analytic Methods in Accident Research 26
Injury Prevention 15
Safety Science 11
Transportation Research Part C-Emerging Technologies 5
Others 122
Total 327
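For illustration only, the screening steps above (citation ranking, the 30-citation cutoff, and the manual removal of unrelated or inaccessible papers) could be organized as in the sketch below; the file and column names are hypothetical.

```python
# Hypothetical sketch of the paper-screening workflow; assumes a Web of Science
# export saved as a CSV with 'title', 'journal', and 'citations' columns.
import pandas as pd

records = pd.read_csv("wos_traffic_safety_export.csv")

# Rank by citation count and apply the 30-citation cutoff (about 525 papers)
candidates = records.sort_values("citations", ascending=False)
candidates = candidates[candidates["citations"] >= 30]

# 'relevant' and 'intro_available' are hypothetical flags set during manual review:
# titles unrelated to traffic safety (102) and papers whose introduction section
# was not accessible (96) are dropped, leaving the final 327 papers.
selected = candidates[candidates["relevant"] & candidates["intro_available"]]

print(len(selected))                                  # expected: 327
print(selected["journal"].value_counts().head(6))     # counts as in Table 1
```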

To prepare the ChatGPT-generated introductions, the titles of the 327 original manuscripts were supplied to the prompt mentioned earlier. Each ChatGPT-generated introduction was then matched to its corresponding human-written introduction for further analysis.

Analytical Methods

This study applied two analytical methods to the text data, supervised and unsupervised text
mining. The supervised text mining utilized five text classifiers, Support Vector Machine (SVM), Random
Forest (RF), Naïve Bayes (NB), Logitboost, and Neural Network (NNet), to understand the key features
associated with the ChatGPT-generated text. The selection of these five classifiers was based on their
capability to predict text data (Arteaga et al., 2020; Joachims, 1998; Pranckevičius & Marcinkevičius, 2017;
Yuan et al., 2019). To perform the supervised text mining, all the ChatGPT-generated introductions were
categorized as "1," and the human-generated introductions were categorized as "0". Three performance measures, accuracy, precision, and F1-score, described in Equations 1, 2, and 3, respectively, were used to evaluate the performance of the five classifiers. Since the objective was to determine how well the ChatGPT texts were identified, the precision scores are particularly important. To assess the performance of the classifiers, the dataset was divided into two portions: 70% for training and 30% for testing. In this
study, the assumption is that if the supervised text classifiers can have a high precision/recall score, then
the ChatGPT content is significantly different from the human-written content. The top 30 important
features from each classifier were used for the interpretation of the results.
$$\text{Accuracy} = \frac{TP + TN}{\text{Total population}} \qquad (1)$$

$$\text{Precision} = \frac{TP}{TP + FP} \qquad (2)$$

$$\text{F1 score} = \frac{2 \times \text{Recall} \times \text{Precision}}{\text{Recall} + \text{Precision}} \qquad (3)$$
Whereby,



• TP: True Positive, actual positive values correctly classified as positive.
• TN: True Negative, actual negative values correctly classified as negative.
• FP: False Positive, actual negative values incorrectly classified as positive.
• FN: False Negative, actual positive values incorrectly classified as negative.
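As a minimal sketch (not the authors' exact pipeline), the supervised step can be approximated with scikit-learn using TF-IDF features and rough equivalents of the five classifiers; here GradientBoostingClassifier stands in for Logitboost, MLPClassifier for NNet, and the data file name is hypothetical.

```python
# Minimal sketch of the supervised text-classification step described above.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.naive_bayes import MultinomialNB
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score, precision_score, f1_score

# One row per introduction: 'text' and 'label' (1 = ChatGPT, 0 = human)
df = pd.read_csv("introductions.csv")  # hypothetical file name

X_train, X_test, y_train, y_test = train_test_split(
    df["text"], df["label"], test_size=0.30, random_state=42, stratify=df["label"])

vectorizer = TfidfVectorizer(stop_words="english", lowercase=True)
Xtr = vectorizer.fit_transform(X_train)
Xte = vectorizer.transform(X_test)

classifiers = {
    "SVM": LinearSVC(),
    "RF": RandomForestClassifier(n_estimators=500, random_state=42),
    "NB": MultinomialNB(),
    "Logitboost": GradientBoostingClassifier(random_state=42),  # stand-in for LogitBoost
    "NNet": MLPClassifier(hidden_layer_sizes=(50,), max_iter=500, random_state=42),
}

for name, clf in classifiers.items():
    clf.fit(Xtr, y_train)
    pred = clf.predict(Xte)
    print(f"{name}: accuracy={accuracy_score(y_test, pred):.3f} "
          f"precision={precision_score(y_test, pred):.3f} "
          f"f1={f1_score(y_test, pred):.3f}")
```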

Although supervised text mining provides insights into the classification of texts based on the writer
(i.e., ChatGPT vs. human), it does not provide enough information on the content of the writeups (Kutela,
Das, et al., 2021; Kutela, Magehema, et al., 2022; Kutela, Novat, et al., 2022). In this study, the content is
as important as knowing the information source. Thus, unsupervised text mining was deemed necessary.
Two types of unsupervised text mining methods were applied: text network analysis (TNA) and text clustering. The following paragraphs discuss these two approaches.
The first method, Text Network Analysis (TNA), has been used effectively in various fields such as
as literature and linguistics (Hunter, 2014), traffic safety and operations (Boniphace Kutela & Teng, 2021;
Kwayu et al., 2021), and bibliometrics of transportation studies (Jiang et al., 2020). Using nodes and edges,
TNA establishes relationships between keywords within a corpus (see Figure 1). TNA’s strength lies in its
ability to visualize keywords and establish a connection among them (Jiang et al., 2020; B. Kutela et al.,
2021; Boniphace Kutela et al., 2021; Paranyushkin, 2011). The size of the nodes and the edges represent
the frequency and co-occurrence of keywords in the network, respectively.

Figure 1 A Skeleton of the Text Network, showing nodes (keywords), edges (co-occurrences), and communities of keywords (B. Kutela et al., 2021)

When performing TNA, several processes are applied to the data. The first is normalization, in which unstructured data are converted to structured data, all symbols are removed, and all text is converted to lowercase. The output from this process is then used to create a matrix of keywords
along with their frequencies of occurrence. The constructed matrix is then visualized with keywords as
nodes of various sizes depending on their recorded frequencies. Various metrics can be used for



comparative analysis. However, document and collocated frequency are the two metrics applied to compare
human-generated and ChatGPT-generated introductions in this study. Document frequency assesses the
number of documents with the keyword of interest. This is different from the keyword frequency, which
focuses on the number of times the keyword appears in the document. Further, collocation frequency shows
the number of times the keywords are located next to each other. Collocation frequency provides more
insights than individual keywords as it focuses more on the closeness between two keywords in the corpus.
The collocation of keywords in a text network plays a great part in forming text clusters normally referred
to as a community of keywords. A community of keywords represents a group collectively clustered in the
text network. A text network can have two or more communities (see Figure 1).
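As a rough, assumed illustration of these steps (normalization, document frequency, adjacent-keyword collocation, and keyword communities), rather than the authors' actual TNA tooling, a small sketch using networkx is given below.

```python
# Assumed illustration of the TNA preprocessing and network construction steps.
import re
from collections import Counter

import networkx as nx
from networkx.algorithms import community

docs = [
    "Road traffic crashes remain a major public health concern.",
    "This study analyzes injury severity in road traffic crashes.",
]  # placeholder corpus; in the study these would be the introduction sections

def normalize(doc: str) -> list[str]:
    """Lowercase, strip symbols, and tokenize into keywords."""
    return re.findall(r"[a-z]+", doc.lower())

tokens = [normalize(d) for d in docs]

# Document frequency: number of documents containing each keyword
docfreq = Counter(word for t in tokens for word in set(t))

# Collocation frequency: adjacent keyword pairs within each document
colloc = Counter(pair for t in tokens for pair in zip(t, t[1:]))

# Co-occurrence network: nodes are keywords, edge weights are collocation counts
G = nx.Graph()
for (w1, w2), n in colloc.items():
    if w1 != w2:
        G.add_edge(w1, w2, weight=n)

# Communities of keywords (the groups sketched in Figure 1)
communities = community.greedy_modularity_communities(G, weight="weight")
print(docfreq.most_common(5), len(communities))
```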

Lastly, text clustering using a simple Reinert textual clustering method was applied to identify the key clusters for each source of the introduction section (Barnier, 2022). This method follows a data preparation approach similar to that of TNA; however, its product is a set of clusters, each representing a certain theme.
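The clustering itself relies on the rainette R package's Reinert implementation; purely as a loose, hypothetical stand-in (not that implementation), the sketch below clusters introductions on a TF-IDF document-term matrix with scikit-learn and lists the highest-weighted terms per cluster.

```python
# Hypothetical stand-in for Reinert-style clustering (not the rainette implementation):
# agglomerative clustering of a TF-IDF document-term matrix.
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.feature_extraction.text import TfidfVectorizer

introductions = [
    "road traffic crashes remain a leading public health concern worldwide",
    "this study develops a crash severity prediction model using police reports",
    "driver distraction and alcohol are major behavioural risk factors",
    "macro level models relate traffic fatalities to exposure and road density",
    "interventions and policies aimed at vulnerable road users are reviewed",
    "statistical and machine learning methods for crash frequency analysis",
]  # placeholder corpus; in the study these would be the introductions per source

vec = TfidfVectorizer(stop_words="english")
X = vec.fit_transform(introductions).toarray()

n_clusters = min(6, len(introductions))  # six clusters, as reported in Figure 5
labels = AgglomerativeClustering(n_clusters=n_clusters).fit_predict(X)

terms = np.array(vec.get_feature_names_out())
for k in range(n_clusters):
    centroid = X[labels == k].mean(axis=0)        # mean TF-IDF weight per term
    print(f"cluster {k}:", terms[np.argsort(centroid)[::-1][:10]])
```

The next section presents the results and discussion.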

Results and Discussions


This section presents the results and discussion of this study. The results of the supervised classifiers are presented first, followed by the text network results and the text clusters.

Text Classifiers Results


Table 2 presents the performance scores of the classifiers. It can be observed that all classifiers have high scores. For example, the accuracy ranged from 96.4% for the Logitboost classifier to 99.5% for the NNet classifier. The consistency of the classifiers indicates that the texts generated by ChatGPT are significantly different from the ones generated by a human. The precision is 100% for two classifiers, RF and NNet, while the F-1 score ranges between 96.5% and 99.5%. The high accuracy and precision scores suggest that the keywords produced by ChatGPT are significantly different from the ones produced by a human for the same paper title.
Table 2 Performance Scores of Text Classifiers
Performance measure    SVM       RF        NNet      NB        Logitboost
Accuracy               98.5%     99.0%     99.5%     97.4%     96.4%
Precision              99.0%     100.0%    100.0%    96.0%     95.0%
F-1 Score              98.5%     99.0%     99.5%     97.5%     96.5%

Since the NB and Logitboost performances were relatively lower than the rest, the SVM, RF, and NNet classifiers
were used to determine important features. Figure 2 presents the 30 topmost important features of the SVM,
RF, and NNet classifiers. According to the results in Figure 2, the classifiers have relatively similar topmost
keywords, but their ranking varies. This observation explains why the classifiers have a relatively similar
performance scores. Therefore, from this analysis, all three classifiers suggest that ChatGPT-generated introduction sections can be flagged based on their auto-generated textual content. Overall,
15 keywords (50%) among the 30 top-listed key features appear in all three classifiers. Six unique features
were observed under the SVM classifier, two under RF, and eleven under NNet. Some features were shared
between two classifiers. For instance, SVM and RF shared nine features, while RF and NNet shared four



features. The keywords development, interventions, inform, and understand appear among the top 10 ranked features for all three classifiers.

Figure 2 The 30 Topmost Important Features
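Continuing the hypothetical scikit-learn sketch from the Methodology section, top-ranked features could be extracted roughly as follows (RF via impurity-based importances, the linear SVM via coefficient magnitudes); this is not the authors' exact feature-ranking procedure.

```python
# Assumes 'vectorizer' and the fitted 'classifiers' from the earlier sketch.
import numpy as np

feature_names = np.array(vectorizer.get_feature_names_out())

# Random Forest: impurity-based feature importances
rf = classifiers["RF"]
top_rf = feature_names[np.argsort(rf.feature_importances_)[::-1][:30]]

# Linear SVM: absolute magnitude of the learned coefficients
svm = classifiers["SVM"]
top_svm = feature_names[np.argsort(np.abs(svm.coef_.ravel()))[::-1][:30]]

# MLPClassifier exposes no built-in importances; sklearn.inspection.permutation_importance
# could be used for a comparable ranking.
print("RF top-30 features:", top_rf)
print("SVM top-30 features:", top_svm)
```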


The observations from the supervised text mining section provide several insights. First, ChatGPT-
generated text can easily be identified by text classifiers. This is indicated by the high prediction accuracies
presented in Table 2. This finding suggests that ChatGPT cannot be used to produce entire manuscript-
ready content. This finding can be utilized by plagiarism software companies to flag ChatGPT-generated
manuscripts. Secondly, in addition to the ChatGPT materials being generic, they are likely to be based on
the keywords available in a title. This means the algorithm is less likely to capture other safety-related
keywords that are not supplied in the title. Similar findings were reported in a study that developed a tool
to detect ChatGPT-written text (Cerullo, 2023).
Supervised text mining is a great tool to assess whether two objects of interest are similar. However, the
tool does not provide great details on the content of the objects. Thus, other unsupervised text mining
methods are normally applied. In this study, text networks and text clusters are the two methods applied to
explore the content of the introductions. The next section presents the text network results.

Text Networks Results


Figure 3 presents the text network of the keywords from the titles of the publications used in this
study. As previously mentioned, publications used to understand the performance of ChatGPT in



manuscript preparation were based on traffic safety. This is very evident in Figure 3; the text network is
heavily centered on five large nodes with the keywords traffic, crash, crashes, injury, and safety. In fact,
about 50% of all titles contained the keyword traffic, while the keywords crash, severity, and crashes appeared in about 25% to 30% of the titles. This is reinforced by the thicker edges that run between these keywords and from other, smaller nodes toward them. Linked keywords with a higher frequency of co-occurrence include road and traffic, traffic and analysis, traffic and accident, traffic and crashes, injury and severity, crash and severity, and traffic and safety, to mention a few. This can be deduced from the thickness of the edge linking two keywords.

Figure 3 Text Network of the Human-generated Manuscript Titles


In addition to the titles, introduction sections written by humans and by ChatGPT were evaluated. Figure
4 presents the text network of the human-generated introduction section, while Table 3 presents the
performance metrics of the text networks. The text network of human-generated introductions (Figure 4(a))
is heavily centered on the keywords traffic, crashes, crash, safety, and road. This is because the studies
used in this paper are traffic safety related. Further, the human-generated introduction sections constitute
keywords such as severity, injury, fatalities, and injuries, indicating the introduction also covers injury
severity in various studies. The methodology keywords (models, methods, and predictions) are also present in these sections. Some other keywords have relatively low representation, including intersections,
vehicles, health, risk, number, pedestrian, data, driving, and driver, among others.



(a) Human-generated Introductions in Published Manuscripts

(b) ChatGPT-generated Introductions


Figure 4 Text Network of Human-generated and ChatGPT-generated Introductions
Figure 4 (b) presents the text network for ChatGPT-generated introductions. The text network is heavily
centered on the keywords crashes, traffic, safety, road, and crash, similar to the human-generated text



network. However, the human-generated text network nodes appear to be larger than those presented on
the ChatGPT network. This outcome implies that these keywords appear more frequently in human-
generated introduction sections than in the ChatGPT-generated introduction sections. Observing the left
side of the ChatGPT-generated text network (Figure 4(b)), the linked keywords include develop and interventions, develop and strategies, reduce and risk, improve and safety, health and development, health and health, and health and concern. These linked keywords can be related to improving traffic
safety solutions and the well-being of road users. On the other hand, the right side of the text network
contains linked keywords such as use and data, provide and insights, provide and results, using and data,
identify and characteristics, and statistical and analysis, which relate to analysis and recommendations/findings.
Although Figure 4 shows that the two networks share some similarities and portray some
differences, a comparative analysis of the two networks can be performed using the keyword and
collocation frequencies. According to the results in Table 3, among the top 20 keywords, only eight are
common for both sides. Even for the eight common keywords, the ranking varies significantly. For instance,
the keyword crash in the human-generated metrics is ranked fourth, appearing in 242 introductions, while it is ranked 14th in the ChatGPT-generated metrics, appearing in 206 introductions. Keywords such as aim, develop, and reduce, among others, appear only in the ChatGPT-generated metrics. On the other hand, the keywords however, number, research, differ, and road, among others, appear only in the human-generated metrics. This
observation indicates that ChatGPT is capable of replicating some keywords.
Table 3 Text Network Metrics
     ChatGPT-generated Metrics || Human-generated Metrics
SN | Feature | Docfreq | Collocation | Count || Feature | Docfreq | Collocation | Count
1 | studies | 299 | world health organization | 112 || traffic | 302 | road traffic crashes | 105
2 | traffic | 286 | provide important insights | 108 || studies | 292 | road traffic accidents | 57
3 | aim | 284 | can inform development | 100 || use | 260 | road traffic injuries | 38
4 | develop | 275 | interventions policies aimed | 91 || crash | 242 | national highway traffic | 37
5 | use | 261 | road traffic crashes | 80 || safety | 235 | highway traffic safety | 37
6 | data | 247 | development targeted interventions | 74 || result | 234 | world health organization | 37
7 | safety | 246 | public health concern | 70 || road | 232 | traffic safety administration | 35
8 | reduce | 238 | national highway traffic | 65 || factor | 231 | paper organized follows | 34
9 | inform | 231 | highway traffic safety | 65 || data | 226 | leading cause death | 33
10 | factor | 227 | traffic safety administration | 65 || vehicle | 216 | motor vehicle crashes | 32
11 | understand | 222 | development interventions policies | 64 || research | 209 | crash prediction models | 31
12 | identified | 221 | improve traffic safety | 64 || injuries | 206 | organized follows section | 27
13 | can | 212 | major public health | 62 || severe | 206 | safety administration nhtsa | 26
14 | crash | 206 | results provide important | 59 || however | 205 | crash injury severity | 24
15 | intervention | 204 | targeted interventions reduce | 59 || increase | 204 | traffic accident risk | 23
16 | improve | 195 | inform development strategies | 58 || one | 202 | involved traffic crashes | 23
17 | import | 192 | risk traffic accidents | 55 || related | 197 | crash risk prediction | 21
18 | risk | 181 | inform development targeted | 55 || differ | 192 | aggressive driving behavior | 21
19 | result | 179 | development strategies improve | 54 || number | 190 | driver injury severity | 20
20 | include | 178 | improve road safety | 54 || risk | 189 | road traffic accident | 20
Key: Docfreq = Document frequency

In addition to the individual keywords, the collocated keyword results can be used to distinguish ChatGPT-generated introductions from human-generated introductions. In this case, collocations of three keywords were used to distinguish the two sets of introductions. The results in Table 3 show that the two



approaches differ significantly. Only four pairs of collocated keywords, world health organization, road
traffic crashes, highway traffic safety, and national highway traffic, are common for both approaches.
Conversely, the human-generated introductions have collocated keywords that point to the section where the authors describe the layout of the paper; keywords such as paper organized follows and organized follows section are typical examples. On the other hand, the ChatGPT-generated introductions contain collocated keywords associated with the intended use of the study, such as interventions policies aimed, development interventions policies, and targeted interventions reduce. These findings indicate that the ChatGPT algorithm tends to summarize the possible uses of each study, which is not the case for most human-generated scientific writeups.

The text network and associated metrics provide details of the content of the introductions, which facilitates comparison of the two. However, to understand the content as a whole, clusters of keywords are important. The next section presents the text clusters of the keywords.

Text Clusters Results


Figure 5 presents the six clusters of ChatGPT-generated and human-written introductions.
According to the clusters for ChatGPT-generated introductions (Figure 5(a)), the first cluster contains
keywords such as policies, understanding, informing, interventions, development, and improving. Such
keywords can be associated with policies and actions taken to increase traffic safety. The second cluster
presents keywords related to models and analytical approaches. Such keywords include models, Bayesian,
regression, prediction, and techniques. Most of the keywords in the third cluster, such as database,
information, location, characteristics, and density, can be associated with traffic safety data/information
collected in traffic studies. Keywords relating to operational actions taken to increase traffic safety can be
observed in the fourth cluster, with keywords such as strategies, reduce, planners, driving, and developing.
The fifth cluster produced from the ChatGPT-generated introduction contains keywords such as middle-
income, seeking, practitioners, countries, gap, findings, literature, and policymakers. These keywords
relate to traffic study findings and recommendation texts. The last cluster, cluster number six, contains
keywords that explain a wider geographic implication of traffic safety and keywords related to large
communities such as health, world, organization, public, worldwide, million, and national.
Figure 5(b) presents clusters derived from human-generated introductions. From the figure, it can
be observed that the first cluster contains keywords that can be linked to results/outcomes of a traffic crash;
such keywords are psychological, accidents, injury, hospital, trauma, severity, and police. On the other
hand, the second cluster contains keywords such as countries, million, world, injuries, deaths, and global.
These keywords are linked to explanations of traffic safety in a larger geographic context. Keywords
relating to human behavior and performance factors that can lead to traffic crashes can be observed in the
third cluster; the keywords include seatbelt, policies, alcohol, distraction, young, enforcement, new, and
behaviors. The fourth cluster includes keywords such as models, macro-level, variables, factors, function,
investigated, and performance; such keywords explain traffic analysis-related terms. Closely comparable
to cluster four, the fifth cluster's keywords are related to traffic safety analysis methods, and keywords such
as prediction, method, research, classification, techniques, analysis, and statistical can be observed. The
final cluster, cluster six, contains keywords such as bends, curve, road, traffic, single, density, and rural;
these keywords are associated with roadway characteristics.



(a) Clusters of the ChatGPT-generated Introductions

(b) Clusters of the Human-generated Introductions


Figure 5 Text Clusters of Human-generated and ChatGPT-generated Introductions
Some similarities and differences can be observed from the clusters generated by ChatGPT and
human-generated introductions. For example, similarities can be observed in keywords related to macro-level impacts of traffic safety, and both sources frame these impacts in a wider geographic sense
(ChatGPT’s cluster 6, human-generated cluster 2), using common keywords such as global, globally, world,
health, death, fatalities, and million. Further, there are similarities in traffic modeling and analysis methods
(ChatGPT’s cluster 2, human-generated cluster 5), common keywords observed in the ChatGPT and
human-generated clusters are prediction, methods, techniques, data, statistical, and analysis. Conversely,
differences in the generated introductions are noted. For example, ChatGPT keywords focused more on



educative action to increase traffic safety (cluster 1), traffic safety data (cluster 3), operational actions taken
to increase traffic safety (cluster 4), and traffic study-related texts (cluster 5). On the other hand, keywords
from human-generated clusters focused on outcomes after traffic crashes (cluster 1), human behavior
leading to traffic crashes (cluster 3), traffic analysis-related terms (cluster 4), and roadway characteristics
(cluster 6). Observing the differences between these clusters, it can be inferred that the ChatGPT clusters
are more focused on generalized actions relating to traffic safety, such as education, whereas human-
generated clusters are more focused on outcomes and actionable parameters related to traffic safety, such
as human behavior leading to traffic accidents and roadway characteristics.

Conclusions and Future Needs


This study evaluated the potential of ChatGPT to produce scholarly writeups that could be
considered for publication. The introduction sections of 327 published articles were compared with the
introductions prepared using ChatGPT. The ChatGPT-generated introductions were obtained using the papers' titles. Supervised and unsupervised text mining approaches were then applied to the text data to explore the patterns and key features that distinguish ChatGPT-generated text from human-generated text. The findings indicate that the introduction sections generated by ChatGPT differ significantly from those written by humans. This was first revealed by the high prediction accuracy of the classifiers used to classify the text, and then by the contents of the introduction sections, which
were explored using unsupervised approaches. The findings have the following implications:

• Since ChatGPT produces outputs with a relatively distinct pattern, overreliance on it would result in published papers having a similar, easily identifiable pattern. Plagiarism software that incorporates AI can easily flag ChatGPT-based content. It is therefore suggested to avoid ChatGPT-based content for scholarly writing.
• ChatGPT should provide warnings or restrictions on generating fake texts for scientific writings.
Both plagiarism and AI-generated fake texts can harm the rigor of scientific works.
• Since there are some similarities in terms of the contents of the text (ChatGPT vs. human-written
texts), a person who intends to use ChatGPT for academic/publishing purposes needs to carefully
review the ChatGPT-generated content and edit it as necessary. That is, ChatGPT content should be used only as guiding material and not as final material.
• Since there have been some developments in detecting ChatGPT-created content, journal editors
should adopt these tools to detect such content. The authors need to provide a disclaimer at the end
of the manuscript if ChatGPT has been used for some contents of the manuscript.
This study has successfully shown how introduction sections generated by ChatGPT differ from those written by humans in published papers. The study used traffic safety-related manuscripts to answer the research question, and the results showed a clear difference between the two. However, this might be attributed to the fact that traffic safety studies are not especially popular. It would be of interest to explore the performance of ChatGPT in other areas, such as social science, economics, and politics, which are relatively popular, are used by numerous people, and have millions of papers already published. Additionally, the prompt may produce different results depending on the level of detail supplied. Future studies may evaluate differences in ChatGPT's performance given prompts with different types and levels of detail.



Authors’ Contribution
The authors confirm their contribution to the paper as follows: study conception and design: B. Kutela and K. Msechu; data collection and analysis: B. Kutela, K. Msechu, S. Das, and E. Kidando; interpretation of results: B. Kutela, K. Msechu, S. Das, and E. Kidando; draft manuscript preparation: B. Kutela, K. Msechu, and S. Das. All authors reviewed the results and approved the final version of the manuscript. None of the sections of this manuscript was prepared by ChatGPT.

Funding
The authors received no funding from any source.

References
Arteaga, C., Paz, A., & Park, J. W. (2020). Injury severity on traffic crashes: A text mining with an
interpretable machine-learning approach. Safety Science, 132.
https://doi.org/10.1016/j.ssci.2020.104988
Aydın, Ö., & Karaarslan, E. (2022). OpenAI ChatGPT Generated Literature Review: Digital Twin in
Healthcare. SSRN Electronic Journal. https://doi.org/10.2139/SSRN.4308687
Barnier, J. (2022). Introduction to rainette.
https://juba.github.io/rainette/articles/introduction_en.html#double-clustering
Bommarito, M. J., & Katz, D. M. (2022). GPT Takes the Bar Exam. SSRN Electronic Journal.
https://doi.org/10.2139/SSRN.4314839
Cerullo, M. (2023). Princeton student says his new app helps teachers find ChatGPT cheats - CBS News.
https://www.cbsnews.com/news/chatgpt-princeton-student-gptzero-app-edward-tian/
ChatGPT, O. A. A., & Perlman, A. (2022). The Implications of OpenAI’s Assistant for Legal Services
and Society. SSRN Electronic Journal. https://doi.org/10.2139/SSRN.4294197
Chen, Y., & Eger, S. (2022). Transformers Go for the LOLs: Generating (Humourous) Titles from
Scientific Abstracts End-to-End. ArXiv. https://doi.org/10.48550/arxiv.2212.10522
Frye, B. L. (2022). Should Using an AI Text Generator to Produce Academic Writing Be Plagiarism?
https://papers.ssrn.com/abstract=4292283
Gao, C. A., Howard, F. M., Markov, N. S., Dyer, E. C., Ramesh, S., Luo, Y., & Pearson, A. T. (2022).
Comparing scientific abstracts generated by ChatGPT to original abstracts using an artificial
intelligence output detector, plagiarism detector, and blinded human reviewers. BioRxiv,
2022.12.23.521610. https://doi.org/10.1101/2022.12.23.521610
GPTZero. (2023). GPTZero: Humans Deserve to Know the Truth. https://gptzero.me/
Graham, F. (2022). Daily briefing: Will ChatGPT kill the essay assignment? Nature.
https://doi.org/10.1038/D41586-022-04437-2
Hunter, S. (2014). A Novel Method of Network Text Analysis. Open Journal of Modern Linguistics, 4,
350–366. https://doi.org/10.4236/ojml.2014.42028



Jabotinsky, H. Y., & Sarel, R. (2022). Co-authoring with an AI? Ethical Dilemmas and Artificial
Intelligence. SSRN Electronic Journal. https://doi.org/10.2139/SSRN.4303959
Jiang, C., Bhat, C. R., & Lam, W. H. K. (2020). A bibliometric overview of Transportation Research Part
B: Methodological in the past forty years (1979–2019). Transportation Research Part B:
Methodological, 138, 268–291. https://doi.org/10.1016/j.trb.2020.05.016
Joachims, T. (1998). Text categorization with Support Vector Machines: Learning with many relevant
features (pp. 137–142). https://doi.org/10.1007/bfb0026683
Johnson, A. (2023). ChatGPT In Schools: Here’s Where It’s Banned—And How It Could Potentially Help
Students. Forbes. https://www.forbes.com/sites/ariannajohnson/2023/01/18/chatgpt-in-schools-
heres-where-its-banned-and-how-it-could-potentially-help-students/?sh=73873a316e2c
Kutela, B., Das, S., & Dadashova, B. (2021). Mining patterns of autonomous vehicle crashes involving
vulnerable road users to understand the associated factors. Accident Analysis & Prevention, 106473.
https://doi.org/10.1016/J.AAP.2021.106473
Kutela, B., Langa, N., Mwende, S., Kidando, E., Kitali, A. E., & Bansal, P. (2021). A text mining
approach to elicit public perception of bike-sharing systems. Travel Behaviour and Society, 24, 113–
123. https://doi.org/10.1016/j.tbs.2021.03.002
Kutela, B., Magehema, R. T., Langa, N., Steven, F., & Mwekh’iga, R. J. (2022). A comparative analysis
of followers’ engagements on bilingual tweets using regression-text mining approach. A case of
Tanzanian-based airlines. International Journal of Information Management Data Insights, 2(2),
100123. https://doi.org/10.1016/J.JJIMEI.2022.100123
Kutela, B., Novat, N., Adanu, E. K., Kidando, E., & Langa, N. (2022). Analysis of residents’ stated
preferences of shared micro-mobility devices using regression-text mining approach. Transportation
Planning and Technology, 45(2), 159–178. https://doi.org/10.1080/03081060.2022.2089145
Kutela, B., Novat, N., & Langa, N. (2021). Exploring geographical distribution of transportation research
themes related to COVID-19 using text network approach. Sustainable Cities and Society, 67.
https://doi.org/10.1016/j.scs.2021.102729
Kutela, B., & Teng, H. (2021). The Use of Dynamic Message Signs (DMSs) on the Freeways: An
Empirical Analysis of DMSs Logs and Survey Data. Journal of Transportation Technologies,
11(01), 90–107. https://doi.org/10.4236/jtts.2021.111006
Kwayu, K. M., Kwigizile, V., Lee, K., Oh, J.-S., & Oh, J.-S. (2021). Discovering latent themes in traffic
fatal crash narratives using text mining analytics and network topology. Accident Analysis and
Prevention, 150, 105899. https://doi.org/10.1016/j.aap.2020.105899
McKee, F., & Noever, D. (2022). Chatbots in a Botnet World. ArXiv.
https://doi.org/10.48550/arxiv.2212.11126
Mollick, E. R., & Mollick, L. (2022). New Modes of Learning Enabled by AI Chatbots: Three Methods
and Assignments. SSRN Electronic Journal. https://doi.org/10.2139/SSRN.4300783
Noever, D., & Ciolino, M. (2022). The Turing Deception. ArXiv.
https://doi.org/10.48550/arxiv.2212.06721



Paranyushkin, D. (2011). Identifying the Pathways for Meaning Circulation using Text Network
Analysis. Venture Fiction Practices, 2(4). www.noduslabs.com
Pranckevičius, T., & Marcinkevičius, V. (2017). Comparison of Naïve Bayes, Random Forest, Decision
Tree, Support Vector Machines, and Logistic Regression Classifiers for Text Reviews
Classification. Baltic J. Modern Computing, 5(2), 221–232.
https://doi.org/10.22364/bjmc.2017.5.2.05
Qadir, J. (2022). Engineering Education in the Era of ChatGPT: Promise and Pitfalls of Generative AI
for Education. https://doi.org/10.36227/TECHRXIV.21789434.V1
Stokel-Walker, C. (2022). AI bot ChatGPT writes smart essays - should professors worry? Nature.
https://doi.org/10.1038/D41586-022-04397-7
Susnjak, T. (2022). ChatGPT: The End of Online Exam Integrity?
https://doi.org/10.48550/arxiv.2212.09292
Thunström, A. O. (2022). We Asked GPT-3 to Write an Academic Paper about Itself--Then We Tried to
Get It Published - Scientific American. https://www.scientificamerican.com/article/we-asked-gpt-3-
to-write-an-academic-paper-about-itself-mdash-then-we-tried-to-get-it-published/
Wenzlaff, K., & Spaeth, S. (2022). Smarter than Humans? Validating how OpenAI’s ChatGPT Model
Explains Crowdfunding, Alternative Finance and Community Finance. SSRN Electronic Journal.
https://doi.org/10.2139/SSRN.4302443
Yuan, J., Abdel-Aty, M., Gong, Y., & Cai, Q. (2019). Real-Time Crash Risk Prediction using Long
Short-Term Memory Recurrent Neural Network. Transportation Research Record: Journal of the
Transportation Research Board, 2673(4), 314–326. https://doi.org/10.1177/0361198119840611
Zhai, X. (2022). ChatGPT User Experience: Implications for Education View project ChatGPT User
Experience: Implications for Education. https://www.researchgate.net/publication/366463233
