ChatGPT's Scientific Writings: A Case Study on Traffic Safety
Methodology
As described earlier, this study explores whether ChatGPT can produce publication-ready
materials comparable to human-written text in published journals. This section presents the
methodological approach used to attain the study objectives and is divided into two parts:
data description and analytical methods.
Data Description
To explore the capability of ChatGPT in generating publication-ready materials, two types of data are
necessary, human-written text and ChatGPT-generated text. In this study, the authors utilized the
introduction section of published papers as the human-written text. Then, the paper’s title and other
descriptions were used to create the ChatGPT-generated text. The introduction section of the paper was
selected because of the easiness of developing the introduction section for a well-defined title. The
following prompt was used to generate the ChatGPT introduction section.
I want you to develop an introduction section of a manuscript for publication. I will give you a
number of titles then I want you to give me the introduction section of the paper. You need to adopt
a persona of a highly skilled writer in traffic safety. In your writeup, include the actual citations,
actual references, and actual traffic safety statistics. The first title is “Title of the paper”.
The top-cited research papers in traffic safety were selected using the Web of Science database,
with the search terms "traffic safety", "traffic crashes", and "traffic accidents" used to retrieve
traffic safety-related manuscripts. The papers were then ranked in descending order by number of
citations, and a minimum of 30 citations was used as the cutoff point, yielding 525 papers.
To prepare the ChatGPT-generated introductions, the titles of 327 of these original manuscripts
were supplied to the prompt above. Each ChatGPT-generated introduction was then matched to its
corresponding human-written introduction for further analysis.
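The ranking-and-cutoff step described above can be sketched as follows. The column names and sample records are illustrative assumptions, not the study's actual Web of Science export.

```python
import pandas as pd

# Hypothetical Web of Science export: each row is one paper with its
# title and citation count (schema is illustrative, not the study's).
papers = pd.DataFrame({
    "title": ["Paper A", "Paper B", "Paper C", "Paper D"],
    "citations": [120, 45, 12, 30],
})

# Rank in descending order of citations, then apply the 30-citation cutoff.
ranked = papers.sort_values("citations", ascending=False)
highly_cited = ranked[ranked["citations"] >= 30].reset_index(drop=True)

print(highly_cited["title"].tolist())  # papers retained after the cutoff
```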
Analytical Methods
This study applied two analytical methods to the text data: supervised and unsupervised text
mining. The supervised text mining utilized five text classifiers, Support Vector Machine (SVM), Random
Forest (RF), Naïve Bayes (NB), LogitBoost, and Neural Network (NNet), to identify the key features
associated with the ChatGPT-generated text. These five classifiers were selected based on their
demonstrated capability for text classification (Arteaga et al., 2020; Joachims, 1998; Pranckevičius & Marcinkevičius, 2017;
Yuan et al., 2019). To perform the supervised text mining, all ChatGPT-generated introductions were
labeled "1" and the human-written introductions "0". Three performance
measures, accuracy, recall, and F1-score, described in Equations 1, 2, and 3, respectively, were used to
evaluate the five classifiers. However, since the objective was to determine how well the
ChatGPT texts were identified, the recall scores are the most important. To assess classifier performance,
the dataset was split into two portions: 70% for training and 30% for testing. The assumption
in this study is that if the supervised text classifiers achieve a high precision/recall score, then
the ChatGPT content is significantly different from the human-written content. The top 30 important
features from each classifier were used to interpret the results.
Accuracy = (TP + TN) / (TP + TN + FP + FN)    (1)
Recall = TP / (TP + FN)    (2)
F1-score = (2 × Recall × Precision) / (Recall + Precision)    (3)
where TP, TN, FP, and FN denote true positives, true negatives, false positives, and false negatives, respectively, and Precision = TP / (TP + FP).
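As a worked example of Equations 1–3, the following computes the three measures from a hypothetical confusion matrix; the counts are illustrative, not the study's results.

```python
# Hypothetical confusion counts for the binary task
# (1 = ChatGPT-generated, 0 = human-written).
TP, TN, FP, FN = 90, 80, 20, 10

accuracy = (TP + TN) / (TP + TN + FP + FN)          # Equation 1
recall = TP / (TP + FN)                             # Equation 2
precision = TP / (TP + FP)
f1 = 2 * recall * precision / (recall + precision)  # Equation 3

print(accuracy, recall, precision, f1)  # -> 0.85, 0.9, ~0.818, ~0.857
```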
Although supervised text mining provides insights into classifying texts by writer
(i.e., ChatGPT vs. human), it does not provide enough information about the content of the write-ups (Kutela,
Das, et al., 2021; Kutela, Magehema, et al., 2022; Kutela, Novat, et al., 2022). In this study, the content is
as important as knowing the information source; thus, unsupervised text mining was deemed necessary.
Two unsupervised text mining methods were applied: text network analysis (TNA) and text clustering.
The next section discusses these two approaches.
The first method, Text Network Analysis (TNA) has been effectively used in various fields such
as literature and linguistics (Hunter, 2014), traffic safety and operations (Boniphace Kutela & Teng, 2021;
Kwayu et al., 2021), and bibliometrics of transportation studies (Jiang et al., 2020). Using nodes and edges,
TNA establishes relationships between keywords within a corpus (see Figure 1). TNA’s strength lies in its
ability to visualize keywords and establish a connection among them (Jiang et al., 2020; B. Kutela et al.,
2021; Boniphace Kutela et al., 2021; Paranyushkin, 2011). The size of the nodes and the edges represent
the frequency and co-occurrence of keywords in the network, respectively.
[Figure 1: Illustration of a text network, showing a node/keyword, an edge/co-occurrence, and a community.]
When performing TNA, several processing steps are applied to the data. The first is
normalization, in which unstructured data are converted to structured data, all symbols are removed, and all
text is converted to lowercase. The output of this step is used to create a matrix of keywords
along with their frequencies of occurrence. The constructed matrix is then visualized with keywords as
nodes whose sizes depend on their recorded frequencies. Various metrics can then be computed to
characterize the resulting network.
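The preprocessing steps above (normalization, keyword/co-occurrence counting, network construction) can be sketched as follows. The two sample documents and the networkx representation are illustrative assumptions, not the study's actual TNA tooling.

```python
# Sketch of TNA preprocessing: normalize text, count keyword frequencies
# and pairwise co-occurrences, then build a network where node size
# reflects frequency and edge weight reflects co-occurrence.
import re
from collections import Counter
from itertools import combinations
import networkx as nx

docs = [
    "Road traffic crashes: a growing safety concern!",
    "Traffic safety analysis of road crashes.",
]

# Normalization: lowercase, strip symbols, split into keywords.
tokenized = [re.sub(r"[^a-z\s]", "", d.lower()).split() for d in docs]

freq = Counter(t for doc in tokenized for t in set(doc))
cooc = Counter()
for doc in tokenized:
    for a, b in combinations(sorted(set(doc)), 2):
        cooc[(a, b)] += 1

G = nx.Graph()
for word, f in freq.items():
    G.add_node(word, size=f)    # node size = keyword frequency
for (a, b), w in cooc.items():
    G.add_edge(a, b, weight=w)  # edge weight = co-occurrence count

print(G.number_of_nodes(), G.number_of_edges())
```

A real pipeline would also remove stop words ("a", "of") before building the network; they are kept here to keep the sketch minimal.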
Lastly, text clustering using the simple Reinert textual clustering method was applied to identify the key
clusters for each source of introduction (Barnier, 2022). This method follows a data
preparation approach similar to that of TNA; however, its product is a set of clusters, each
representing a certain theme. The next section presents the results and discussion.
Since the NB and LogitBoost performance was relatively lower than the rest, the SVM, RF, and NNet classifiers
were used to determine the important features. Figure 2 presents the 30 topmost important features of the SVM,
RF, and NNet classifiers. According to the results in Figure 2, the classifiers share relatively similar topmost
keywords, but their rankings vary. This observation explains why the classifiers have relatively similar
performance scores. Therefore, all three classifiers suggest that ChatGPT-generated introductions
can be flagged by their auto-generated textual content. Overall,
15 keywords (50%) of the 30 top-listed features appear in all three classifiers. Six features were
unique to the SVM classifier, two to RF, and eleven to NNet. Some features were shared
between exactly two classifiers: SVM and RF shared nine features, while RF and NNet shared four.
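The overlap counts reported above can be sanity-checked with simple set arithmetic; the feature names below are synthetic labels, not the classifiers' actual keywords.

```python
# 15 features shared by all three classifiers, 9 shared only by SVM and
# RF, 4 only by RF and NNet, and 6/2/11 unique to SVM/RF/NNet: each
# classifier's top list should sum to 30.
all3 = {f"all{i}" for i in range(15)}
svm_rf = {f"sr{i}" for i in range(9)}
rf_nnet = {f"rn{i}" for i in range(4)}
svm_only = {f"s{i}" for i in range(6)}
rf_only = {f"r{i}" for i in range(2)}
nnet_only = {f"n{i}" for i in range(11)}

svm = all3 | svm_rf | svm_only
rf = all3 | svm_rf | rf_nnet | rf_only
nnet = all3 | rf_nnet | nnet_only

print(len(svm), len(rf), len(nnet))  # each top list has 30 features
print(len(svm & rf & nnet))          # 15 features shared by all three
```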
In addition to the individual keywords, the collocated-keyword results can be used to distinguish
ChatGPT-generated introductions from human-written introductions. In this case, three collocated
keywords were used to distinguish the two. The results in Table 3 show that the two sources
can be distinguished by their collocated keywords.
The text network and its associated metrics detail the content of the introductions, which
facilitates comparison of the two. However, to understand the content as a whole,
clusters of keywords are important. The next section presents the text clusters of the keywords.
• Since ChatGPT produces output with a recognizable pattern, overreliance on it would result in published
papers sharing a similar, easily identifiable style. AI-assisted plagiarism software
can readily flag ChatGPT-based content. It is therefore suggested to avoid ChatGPT-based content in
scholarly writing.
• ChatGPT should provide warnings or restrictions on generating fabricated text for scientific writing.
Both plagiarism and AI-generated fabricated text can harm the rigor of scientific work.
• Since there are some content similarities between ChatGPT-generated and human-written
texts, a person who intends to use ChatGPT for academic or publishing purposes needs to carefully
review the ChatGPT-generated content and edit it as necessary. That is, ChatGPT content
should only be used as guiding material, not as final material.
• Since tools for detecting ChatGPT-created content have been developed, journal editors
should adopt them to detect such content. Authors should provide a disclaimer at the end
of the manuscript if ChatGPT was used to generate any of its content.
This study has shown how ChatGPT-generated introduction sections of published papers differ
from those written by humans, using traffic safety-related manuscripts to answer the research
question. The results showed a clear difference between the two. However, this might be attributed to the
fact that traffic safety is a relatively niche field. It would be of interest to explore the performance
of ChatGPT in more popular areas, such as social science, economics, and politics, which are
used by numerous people and have millions of published papers.
Additionally, the prompt may produce different results depending on the level of detail supplied. Future
studies may evaluate differences in ChatGPT's performance given prompts of different types and levels of
detail.
Funding
The authors received no funding from any source.
References
Arteaga, C., Paz, A., & Park, J. W. (2020). Injury severity on traffic crashes: A text mining with an
interpretable machine-learning approach. Safety Science, 132.
https://doi.org/10.1016/j.ssci.2020.104988
Aydın, Ö., & Karaarslan, E. (2022). OpenAI ChatGPT Generated Literature Review: Digital Twin in
Healthcare. SSRN Electronic Journal. https://doi.org/10.2139/SSRN.4308687
Barnier, J. (2022). Introduction to rainette.
https://juba.github.io/rainette/articles/introduction_en.html#double-clustering
Bommarito, M. J., & Katz, D. M. (2022). GPT Takes the Bar Exam. SSRN Electronic Journal.
https://doi.org/10.2139/SSRN.4314839
Cerullo, M. (2023). Princeton student says his new app helps teachers find ChatGPT cheats - CBS News.
https://www.cbsnews.com/news/chatgpt-princeton-student-gptzero-app-edward-tian/
ChatGPT, O. A. A., & Perlman, A. (2022). The Implications of OpenAI’s Assistant for Legal Services
and Society. SSRN Electronic Journal. https://doi.org/10.2139/SSRN.4294197
Chen, Y., & Eger, S. (2022). Transformers Go for the LOLs: Generating (Humourous) Titles from
Scientific Abstracts End-to-End. ArXiv. https://doi.org/10.48550/arxiv.2212.10522
Frye, B. L. (2022). Should Using an AI Text Generator to Produce Academic Writing Be Plagiarism?
https://papers.ssrn.com/abstract=4292283
Gao, C. A., Howard, F. M., Markov, N. S., Dyer, E. C., Ramesh, S., Luo, Y., & Pearson, A. T. (2022).
Comparing scientific abstracts generated by ChatGPT to original abstracts using an artificial
intelligence output detector, plagiarism detector, and blinded human reviewers. BioRxiv,
2022.12.23.521610. https://doi.org/10.1101/2022.12.23.521610
GPTZero. (2023). GPTZero: Humans Deserve to Know the Truth. https://gptzero.me/
Graham, F. (2022). Daily briefing: Will ChatGPT kill the essay assignment? Nature.
https://doi.org/10.1038/D41586-022-04437-2
Hunter, S. (2014). A Novel Method of Network Text Analysis. Open Journal of Modern Linguistics, 4,
350–366. https://doi.org/10.4236/ojml.2014.42028