XAI meets LLMs: A Survey of the Relation between Explainable AI and Large Language Models

arXiv:2407.15248v1 [[Link]] 21 Jul 2024

Erik Cambria
School of Computer Science and Engineering, Nanyang Technological University, Singapore
cambria@[Link]

Lorenzo Malandri
Dept. of Statistics and Quantitative Methods, University of Milano-Bicocca, Milan, Italy
[Link]@[Link]

Fabio Mercorio
Dept. of Statistics and Quantitative Methods, University of Milano-Bicocca, Milan, Italy
[Link]@[Link]

Navid Nobani
Dept. of Statistics and Quantitative Methods, University of Milano-Bicocca, Milan, Italy
[Link]@[Link]

Andrea Seveso
Dept. of Statistics and Quantitative Methods, University of Milano-Bicocca, Milan, Italy
[Link]@[Link]

Abstract—In this survey, we address the key challenges in Large Language Models (LLM) research, focusing on the importance of interpretability. Driven by increasing interest from AI and business sectors, we highlight the need for transparency in LLMs. We examine the dual paths in current LLM research and eXplainable Artificial Intelligence (XAI): enhancing performance through XAI and the emerging focus on model interpretability. Our paper advocates for a balanced approach that values interpretability equally with functional advancements. Recognizing the rapid development in LLM research, our survey includes both peer-reviewed and preprint (arXiv) papers, offering a comprehensive overview of XAI's role in LLM research. We conclude by urging the research community to advance both LLM and XAI fields together.

Index Terms—Explainable Artificial Intelligence, Interpretable Machine Learning, Large Language Models, Natural Language Processing

I. INTRODUCTION

THE emergence of LLMs has significantly impacted Artificial Intelligence (AI), given their excellence in several Natural Language Processing (NLP) applications. Their versatility reduces the need for handcrafted features, enabling applications
across various domains. Their heightened creativity in content generation and contextual understanding contributes to advancements in creative writing and conversational AI. Additionally, extensive pre-training on large amounts of data enables LLMs to exhibit strong generalisation capacities without further domain-specific data from the user Zhao et al. [2023a], Amin et al. [2023]. For those reasons, LLMs are swiftly becoming mainstream tools, deeply integrated into many industry sectors, such as medicine (see, e.g., Thirunavukarasu et al. [2023]) and finance (see, e.g., Wu et al. [2023a]), to name a few.

However, their emergence also raises ethical concerns, necessitating ongoing efforts to address issues related to bias, misinformation, and responsible AI deployment. LLMs are notoriously complex "black-box" systems. Their inner workings are opaque, and their intricate complexity makes their interpretation challenging Kaadoud et al. [2021], Cambria et al. [2023a]. Such opaqueness can lead to the production of inappropriate content or misleading outputs Weidinger et al. [2021]. Finally, the lack of visibility into their training data can further hinder trust and accountability in critical applications Liu [2023].

In this context, XAI is a crucial bridge between complex LLM-based systems and human understanding of their behaviour. Developing XAI frameworks for LLMs is essential for building user trust, ensuring accountability and fostering a responsible and ethical use of those models.

In this article, we review and categorise current XAI for LLMs in a structured manner. Emphasising the importance of clear and truthful explanations, as suggested by Sevastjanova and El-Assady [2022], this survey aims to guide future research towards enhancing LLMs' explainability and trustworthiness in practical applications.

A. Contribution

The contribution of our work is threefold:
1) We introduce a novel categorisation framework for assessing the body of research concerning the explainability of LLMs. The framework provides a clear and organised overview of the state of the art.
2) We conduct a comprehensive survey of peer-reviewed and preprint papers based on the ArXiv and DBLP databases, going beyond using common research tools.
3) We critically assess current practices, identifying research gaps and issues and articulating potential future research trajectories.

B. Research questions

In this survey, we explore the coexistence of XAI methods with LLMs and how these two fields are interrelated. Our investigation revolves around these key questions:
Q1 How are XAI techniques currently being integrated with LLMs?
Q2 What are the emerging trends in converging LLMs with XAI methodologies?
Q3 What are the gaps in the current related literature, and what areas require further research?

II. THE NEED FOR EXPLANATIONS IN LLMS

In the XAI field, the intersection with LLMs presents unique challenges and opportunities. This survey paper aims to dissect these challenges, extending the dialogue beyond the conventional understanding of XAI's objective, which is to illuminate the inner mechanisms of opaque models for various stakeholders while avoiding the introduction of new uncertainties (see, e.g., Cambria et al. [2023b], Burkart and Huber [2021]).

Despite their advancements, LLMs struggle with complexity and opacity, raising design, deployment and interpretation issues. Inspired by Weidinger et al. [2021], this paper categorises LLM challenges into user-visible and invisible ones.

a) Visible User Challenges: Directly perceivable challenges for users without specialised tools.

b) Trust and Transparency: Trust issues arise in crucial domains, e.g., healthcare Mercorio et al. [2020], Gozzi et al. [2022], Alimonda et al. [2022] or finance Xing et al. [2020], Castelnovo et al. [2023], Yeo et al. [2023], due to the opacity of black-box models, including LLMs.
XAI must offer transparent, ethically aligned explanations for wider acceptance, especially under stringent regulations that mandate explainability (e.g., the EU's GDPR Novelli et al. [2024]). This impacts regulatory compliance and public credibility, with examples in European skill intelligence projects requiring XAI for decision explanations Malandri et al. [2022a, 2024, 2022b,c].

c) Misuse and Critical Thinking Impacts: LLMs' versatility risks misuse, such as content creation for harmful purposes and evading moderation Shen et al. [2023]. Over-reliance on LLMs may also erode critical thinking and independent analysis, as seen in educational contexts (see, e.g., Abd-Alrazaq et al. [2023]).

d) Invisible User Challenges: Challenges requiring deeper model understanding.

e) Ethical and Privacy Concerns: Ethical dilemmas from LLM use, such as fairness and hate speech issues, and privacy risks like sensitive data exposure, require proactive measures and ethical guidelines Weidinger et al. [2021], Yan et al. [2023], Salimi and Saheb [2023].

f) Inaccuracies and Hallucinations: LLMs can generate false information, posing risks in various sectors like education, journalism, and healthcare. Addressing these issues involves improving LLM accuracy, educating users, and developing fact-checking systems Rawte et al. [2023], Azaria and Mitchell [2023].

III. METHODOLOGY

Systematic Mapping Studies (SMSs) are comprehensive surveys that categorise and summarise a range of published works in a specific research area, identifying literature gaps, trends, and future research needs. They are especially useful in large or under-explored fields where a detailed Systematic Literature Review (SLR) may not be feasible.

SMS and SLR follow a three-phase method (planning, conducting, reporting) but differ in their approach, as SMSs address broader questions, cover a wider range of publications with a less detailed review, and aim to provide an overview of the research field. In contrast, SLRs focus on specific questions, thoroughly review fewer publications, and strive for precise, evidence-based outcomes Barn et al. [2017].

Following Martínez-Gárate et al. [2023], we designed our SMS for XAI and LLMs, including peer-reviewed and preprint papers. The latter choice reflects our view that, in rapidly evolving fields like computer science, including preprints offers access to the latest research, which is essential for a comprehensive review Oikonomidi et al. [2020].

We followed these steps to structure our SMS: Section I-B proposes and defines the research questions; Section III-A describes how the paper retrieval has been performed; Section III-B describes the paper selection process based on the defined criteria; Section III-C explains how we dealt with false-positive results; and finally, in Section IV, we describe the obtained results.

A. Paper retrieval

a) Overview: Instead of utilising common scientific search engines such as Google Scholar, we employed a custom search methodology described in the following part. By scrutinising the titles and abstracts of the obtained papers, we conducted targeted searches using a predefined set of keywords pertinent to LLMs and XAI. This manual and deliberate search strategy was chosen to minimise the risk of overlooking relevant studies that automated search algorithms might miss and to ensure our SMS dataset's accuracy and relevance. Through this rigorous process, we constructed a well-defined corpus of literature poised for in-depth analysis and review. Figure 1 provides an overview of this process.

b) Peer-reviewed papers: We initiated this step by identifying top-tier Q1 journals within the "Artificial Intelligence" category of 2022 (the last year available at the start of the study), providing us with 58 journals from which to draw relevant publications.

Subsequently, we utilised the XML dump [1] from the dblp computer science bibliography to get the titles of all papers published in the identified Q1 journals, except for ten journals not covered by dblp. Once we gathered these paper titles, we proceeded to find their abstracts. To do so, we initially used the last available citation network of AMiner [2], but given that this dump lacks the majority of 2023 publications, we leveraged the Scopus API, a detailed database of scientific abstracts and citations, to retrieve the missing abstracts corresponding to the amassed titles.

[1] [Link]
[2] [Link] [Link]
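
To make the dblp step concrete, the following minimal Python sketch (not the authors' code; the journal names and file path are illustrative) shows one way to pull the titles of papers published in a chosen set of journals from the dblp XML dump. The real dump references dblp.dtd for its character entities, so lxml is used with DTD loading enabled.

from lxml import etree

# Illustrative subset of the 58 Q1 "Artificial Intelligence" journals.
Q1_JOURNALS = {"Artificial Intelligence", "Information Fusion"}

def dblp_titles(dump_path, journals):
    """Return titles of <article> records whose <journal> is in `journals`."""
    titles = []
    # Stream the multi-gigabyte dump instead of loading it into memory.
    context = etree.iterparse(dump_path, events=("end",), tag="article",
                              load_dtd=True, resolve_entities=True)
    for _, record in context:
        if record.findtext("journal") in journals:
            title = record.findtext("title")
            if title:
                titles.append(title)
        record.clear()  # free processed elements to keep memory bounded
    return titles

# Example: titles = dblp_titles("dblp.xml", Q1_JOURNALS)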

c) Pre-print papers: We scraped all computer science papers present in the Arxiv database from 2010 until October 2023, resulting in 548,711 papers. Consequently, we used the Arxiv API to get the abstracts of these papers.
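
As a hedged illustration of this abstract-retrieval step (this is not the authors' pipeline, and the identifiers below are placeholders), the public arXiv export API can be queried in batches and its Atom response parsed for the summary field:

import urllib.request
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"

def arxiv_abstracts(arxiv_ids):
    """Fetch id -> abstract for a batch of arXiv identifiers via the export API."""
    url = ("http://export.arxiv.org/api/query?id_list=" + ",".join(arxiv_ids)
           + "&max_results=" + str(len(arxiv_ids)))
    with urllib.request.urlopen(url) as response:
        feed = ET.fromstring(response.read())
    return {
        entry.findtext(ATOM + "id"): entry.findtext(ATOM + "summary").strip()
        for entry in feed.findall(ATOM + "entry")
    }

# Example (placeholder identifiers): arxiv_abstracts(["2112.04359", "2303.18223"])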

B. Paper selection

We employed a comprehensive set of keywords to filter the collected papers for relevance to LLMs and XAI. The search terms were carefully chosen to encompass the various terminologies and phrases commonly associated with each field [3].

[3] The keywords for XAI included: ['xai', 'explain', 'explanation', 'interpret', 'black box', 'black-box', 'blackbox', 'transparent model understanding', 'feature importance', 'accountable ai', 'ethical ai', 'trustworthy ai', 'fairness', 'ai justification', 'causal inference', 'ai audit']. The keywords for LLMs were: ['llm', 'large language model', 'gpt-3', 'gpt-2', 'gpt3', 'gpt2', 'bert', 'language model pre-training', 'fine-tuning language models', 'generative pre-trained transformer', 'llama', 'bard', 'roberta', 't5', 'xlnet', 'megatron', 'electra', 'deberta', 'ernie', 'albert', 'bart', 'blenderbot', 'open pre-trained transformer', 'mt-nlg', 'turing-nlg', 'pegasus', 'gpt-3.5', 'gpt-4', 'gpt3.5', 'gpt4', 'cohere', 'claude', 'jurassic-1', 'openllama', 'falcon', 'dolly', 'mpt', 'guanaco', 'bloom', 'alpaca', 'openchatkit', 'gpt4all', 'flan-t5', 'orca'].

In our search, we applied a logical OR operator within the members of each list to capture any of the terms within a single category, and an AND operator was used between the two lists to ensure that only papers containing terms from both categories were retrieved for our analysis.
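
A minimal sketch of this boolean filter is shown below (with abbreviated keyword lists standing in for the full lists given in footnote [3]): a paper is kept only when its title or abstract matches at least one XAI term and at least one LLM term.

# Abbreviated stand-ins for the full keyword lists in footnote [3].
XAI_TERMS = ["xai", "explain", "explanation", "interpret", "black box", "black-box"]
LLM_TERMS = ["llm", "large language model", "gpt-3", "gpt-4", "bert", "llama"]

def matches_any(text, terms):
    text = text.lower()
    return any(term in text for term in terms)  # OR within a single list

def is_candidate(title, abstract):
    text = title + " " + abstract
    # AND between the two lists: both categories must be present.
    return matches_any(text, XAI_TERMS) and matches_any(text, LLM_TERMS)

# Example: is_candidate("Explaining LLM outputs", "...") -> True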

C. Dealing with false positives

Upon completion of the initial retrieval phase, we identified a total of 1,030 manuscripts. Since some research keywords possess a broad meaning (for instance, the words 'explain' and 'interpret' can be used in contexts different from that of XAI), we retrieved a few false-positive papers, i.e., papers not dealing with both XAI and LLMs. We excluded these false positives, namely publications that address only XAI or only LLMs, or neither of them. To do so, we manually analysed the title and abstract of each paper. This meticulous vetting process resulted in 233 papers relevant to XAI and LLMs. Given that including all these papers in our survey was not feasible, we selected the most relevant ones based on their average number of citations per year. The whole research process resulted in 35 selected articles.
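
The ranking step can be sketched as follows (a hedged illustration, not the authors' exact implementation; the field names and citation source are assumptions): each remaining paper is scored by its citation count divided by its age in years, and the top 35 are kept.

from datetime import date

def avg_citations_per_year(paper, today=None):
    """Citations divided by the paper's age in years (at least one year)."""
    today = today or date.today()
    age = max(1, today.year - paper["year"] + 1)
    return paper["citations"] / age

def select_top(papers, k=35):
    return sorted(papers, key=avg_citations_per_year, reverse=True)[:k]

# Example: select_top([{"title": "A", "year": 2021, "citations": 120},
#                      {"title": "B", "year": 2023, "citations": 90}], k=1)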

Fig. 1: The process used for getting the papers related to our keywords, including the definition of research questions, paper retrieval, paper selection, elimination of false positives and classifying papers in the pre-defined categories.

IV. RETRIEVAL RESULTS

We divide papers into two macro-categories: Application papers, i.e., papers that somehow generated explanations, either towards explainability or to use them as a feature for another task, and Discussion papers, i.e., papers that do not engage with explanation generation but address an issue or research gap regarding explainable LLM models.

A. Application Papers

The first macro-category includes papers using LLMs in a methodology, tool, or task. Based on how LLMs are used, we further divide this category into two sub-categories as follows: "To explain", i.e., papers which try to explain how LLMs work and provide an insight into the opaque nature of these models, and "As feature", i.e., papers which use the explanations and features generated by LLMs to improve the results of various tasks. The following parts discuss these sub-categories.

1) To Explain: Most papers, i.e., 17 out of 35, fit into this sub-category, with most addressing the need for more interpretable and transparent LLMs.

For instance, Vig [2019] introduces a visualisation tool for understanding the attention mechanism in Transformer models like BERT and GPT-2. Their proposed tool provides insights at multiple scales, from individual neurons to whole model layers, helping to detect model bias, locate relevant attention heads, and link neurons to model behaviour.
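
As a hedged usage sketch of this kind of tool (BertViz's head_view call; the sentence is arbitrary, and the visualisation renders in a notebook-style environment):

import torch
from transformers import AutoModel, AutoTokenizer
from bertviz import head_view

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased", output_attentions=True)

inputs = tokenizer("The cat sat on the mat", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])

# Interactive per-head attention visualisation (rendered in a notebook).
head_view(outputs.attentions, tokens)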

Swamy et al. [2021] present a methodology for interpreting the knowledge acquisition and linguistic skills of BERT-based language models by extracting knowledge graphs from these models at different stages of their training. Knowledge graphs are often used for explainable extrapolation reasoning Lin et al. [2023].

Wu et al. [2021] propose Polyjuice, a general-purpose counterfactual generator. This tool generates diverse, realistic counterfactuals by fine-tuning GPT-2 on multiple datasets, allowing for controlled perturbations regarding type and location.

Wang et al. [2022] investigate the mechanistic interpretability of GPT-2 small, particularly its ability to identify indirect objects in sentences. The study involves circuit analysis and reverse engineering of the model's computational graph, identifying specific attention heads and their roles in this task.

Menon and Vondrick [2022] introduce a novel approach for visual classification using descriptions generated by LLMs. This method, which they term "classification by description," involves using LLMs like GPT-3 to generate descriptive features of visual categories. These features are then used to classify images more accurately while providing more transparent results than traditional methods that rely solely on category names.
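
To illustrate the idea, here is a hedged sketch (not the authors' implementation): the descriptor lists are hand-written stand-ins for GPT-3 output, and CLIP is used as the vision-language scorer; the class whose descriptors best match the image wins, and the per-descriptor scores double as an explanation.

import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Illustrative descriptors standing in for LLM-generated ones.
DESCRIPTORS = {
    "hen": ["a bird with a small red comb", "brown feathers", "two thin legs with claws"],
    "tiger": ["orange fur with black stripes", "a large feline body", "long whiskers"],
}

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def classify_by_description(image: Image.Image) -> str:
    scores = {}
    for label, descriptions in DESCRIPTORS.items():
        inputs = processor(text=descriptions, images=image,
                           return_tensors="pt", padding=True)
        with torch.no_grad():
            logits = model(**inputs).logits_per_image  # shape: (1, n_descriptions)
        scores[label] = logits.mean().item()
    return max(scores, key=scores.get)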

Gao et al. [2023a] examine ChatGPT's capabilities in causal reasoning using tasks like Event Causality Identification (ECI), Causal Discovery (CD), and Causal Explanation Generation (CEG). The authors claim that while ChatGPT is effective as a causal explainer, it struggles with causal reasoning and often exhibits causal hallucinations. The study also investigates the impact of In-Context Learning (ICL) and Chain-of-Thought (CoT) techniques, concluding that ChatGPT's causal reasoning ability is highly sensitive to the structure and wording of prompts.

Paper | Tool | Star | Fork | Update | Target | Agnostic | Goal
--- To Explain ---
Vig [2019] | BertViz | 6.1k | 734 | 08/23 | Transformers | ✓ | C E IMP INT R
Swamy et al. [2021] | Experiments | 19 | 2 | 05/22 | BERT-based LM | ✗ | C E IMP INT R
Wu et al. [2021] | Polyjuice | 90 | 16 | 08/22 | - | ✓ | C E IMP INT R
Wang et al. [2022] | TransformerLens | 48 | 161 | 01/23 | GPT2-small | ✗ | C E IMP INT R
Menon and Vondrick [2022] | - | - | - | - | Vision-LM | ✓ | C E IMP INT R
Gao et al. [2023a] | Experiments | 17 | 0 | 10/23 | ChatGPT | ✗ | C E IMP INT R
Pan et al. [2023] | - | - | - | - | LLMs | ✓ | C E IMP INT R
Conmy et al. [2023] | ACDC | 105 | 23 | 11/23 | Transformers | ✓ | C E IMP INT R
He et al. [2022] | RR | 38 | 2 | 02/23 | LLMs | ✓ | C E IMP INT R
Yoran et al. [2023] | MCR | 71 | 9 | 01/24 | LLMs | ✓ | C E IMP INT R
Sarti et al. [2023] | Inseq | 250 | 26 | 01/24 | SeqGen models | ✓ | C E IMP INT R
Wu et al. [2023b] | Boundless DAS | 0 | 17 | 01/24 | LLMs | ✓ | C E IMP INT R
Li et al. [2023] | XICL | 1 | 3 | 11/23 | LLMs | ✓ | C E IMP INT R
Chen et al. [2023] | LMExplainer | - | - | - | LLMs | ✓ | C E IMP INT R
Gao et al. [2023b] | Chat-REC | - | - | - | Rec. systems | ✗ | C E IMP INT R
Zhang et al. [2022] | DSRLM | 9 | 1 | 07/23 | LLMs | ✓ | C E IMP INT R
Singh et al. [2023] | SASC | 61 | 14 | 01/24 | LLMs | ✓ | C E IMP INT R
--- As Feature ---
Li et al. [2022] | - | - | - | - | LLMs | ✓ | C E IMP INT R
Ye and Durrett [2022] | TextualExplInContext | 11 | 2 | 02/23 | LLMs | ✓ | C E IMP INT R
Turpin et al. [2023] | Experiments | 25 | 9 | 03/23 | LLMs | ✓ | C E IMP INT R
Kang et al. [2023] | AutoSD | - | - | - | Debugging models | ✗ | C E IMP INT R
Krishna et al. [2023] | AMPLIFY | - | - | - | LLMs | ✓ | C E IMP INT R
Yang et al. [2023] | Labo | 51 | 4 | 12/23 | CBM | ✗ | C E IMP INT R
Bitton-Guetta et al. [2023] | WHOOPS! | - | - | - | LLMs | ✓ | C E IMP INT R
Shi et al. [2023] | Chatgraph | 2 | 0 | 07/23 | LLMs | ✓ | C E IMP INT R
TABLE I: Synthesis of recent application papers, summarising engagement indicators as of January 2024,
update timelines, model specificity, and the overarching aims of each study. In the first section of the
table, To Explain papers are listed, and As Feature works in the second. Stars, forks, and last updates
are not reported (-) for papers lacking associated repositories. Target is the specific focus of the study,
such as a particular type of language model. Agnostic indicates whether the study is model-agnostic or
not. The goal represents the primary objective of each study: comparison of models (C), explanation (E),
improvement (IMP), interpretability (INT), and reasoning (R).

Pan et al. [2023] propose a framework that aims to enhance LLMs with explicit, structured knowledge from KGs, addressing issues like hallucinations and lack of interpretability. The paper outlines three main approaches: KG-enhanced LLMs, LLM-augmented KGs, and synergised LLMs with KGs. This unification improves the performance and explainability of AI systems in various applications.

Conmy et al. [2023] focus on automating a part of the mechanistic interpretability workflow in neural networks. Using algorithms like Automatic Circuit Discovery (ACDC), the authors automate the identification of sub-graphs in neural models that correspond to specific behaviours or functionalities.

He et al. [2022] present a novel post-processing approach for LLMs that leverages external knowledge to enhance the faithfulness of explanations and improve overall performance. This approach, called Rethinking with Retrieval, uses CoT prompting to generate reasoning paths refined with relevant external knowledge. The authors claim that their method significantly improves the performance of LLMs on complex reasoning tasks by producing more accurate and reliable explanations.

Multi-Chain Reasoning (MCR), introduced by Yoran et al. [2023], improves question-answering in LLMs by prompting them to meta-reason over multiple reasoning chains. This approach helps select relevant facts, mix information from different chains, and generate better explanations for the answers. The paper demonstrates MCR's superior performance over previous methods, especially in multi-hop question-answering.

Inseq Sarti et al. [2023] is a Python library that facilitates interpretability analyses of sequence generation models. The toolkit focuses on extracting model internals and feature importance scores, particularly for transformer architectures. It centralises access to various feature attribution methods, intuitively representable with visualisations such as heatmaps Aminimehr et al. [2023], promoting fair and reproducible evaluations of sequence generation models.
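
A hedged usage sketch of the library (the model name, attribution method and input are illustrative choices, not taken from the surveyed paper):

import inseq

# Load a generation model together with an attribution method.
model = inseq.load_model("gpt2", "integrated_gradients")

# Attribute the generated continuation to the input tokens.
attribution = model.attribute("The capital of France is")
attribution.show()  # token-level heatmap of feature importance scores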

Boundless Distributed Alignment Search (Boundless DAS), introduced by Wu et al. [2023b], is a method for identifying interpretable causal structures in LLMs. In their paper, the authors demonstrate that the Alpaca model, a 7B-parameter LLM, solves numerical reasoning problems by implementing simple algorithms with interpretable boolean variables.

Li et al. [2023] investigate how various demonstrations influence ICL in LLMs by exploring the impact of contrastive input-label demonstration pairs, including label flipping, input perturbation, and adding complementary explanations. The study employs saliency maps to qualitatively and quantitatively analyse how these demonstrations affect the predictions of LLMs.

LMExplainer Chen et al. [2023] is a method for interpreting the decision-making processes of LMs. This approach combines a knowledge graph and a graph attention neural network to explain the reasoning behind an LM's predictions.

Gao et al. [2023b] propose a novel recommendation system framework, Chat-REC, which integrates LLMs for generating more interactive and explainable recommendations. The system converts user profiles and interaction histories into prompts for LLMs, enhancing the recommendation process with the ICL capabilities of LLMs.

DSR-LM, proposed by Zhang et al. [2022], is a framework combining differentiable symbolic reasoning with pre-trained language models. The authors claim their framework improves logical reasoning in language models through a symbolic module that performs deductive reasoning, enhancing accuracy on deductive reasoning tasks.

2) As Feature: Papers in this sub-category do not directly aim to provide more transparent models or explain LLM-based models. Instead, they use LLMs to generate reasoning and descriptions, which are used as input to a secondary task.

For instance, Li et al. [2022] explore how LLMs' explanations can enhance the reasoning capabilities of smaller language models (SLMs). They introduce a multi-task learning framework where SLMs are trained with explanations from LLMs, leading to improved performance in reasoning tasks.

Ye and Durrett [2022] evaluate the reliability of explanations generated by LLMs in few-shot learning scenarios. The authors claim that LLM explanations often do not significantly improve learning performance and can be factually unreliable, highlighting the potential misalignment between LLM reasoning and factual correctness in their explanations.

Turpin et al. [2023] investigate the reliability of CoT reasoning. The authors claim that while CoT can improve task performance, it can also systematically misrepresent the true reason behind a model's prediction. They demonstrate this through experiments showing how biasing features in model inputs, such as reordering multiple-choice options, can heavily influence CoT explanations without being acknowledged in the explanation itself.

Kang et al. [2023] introduce an approach for automating the debugging process called Automated Scientific Debugging (AutoSD). This approach leverages LLMs to generate hypotheses about bugs in code and uses debuggers to interact with the buggy code. This leads to automated conclusions and patch generation and provides clear explanations for the debugging decisions, potentially leading to more efficient and accurate decisions by developers.

Krishna et al. [2023] present a framework called Amplifying Model Performance by Leveraging In-Context Learning with Post Hoc Explanations (AMPLIFY), aiming to improve the performance of LLMs on complex reasoning and language understanding tasks by automating the generation of rationales. It leverages post hoc explanation methods, which output attribution scores indicating the influence of each input feature on model predictions, to construct natural language rationales. These rationales provide corrective signals to LLMs.
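
The general recipe can be sketched as follows (a simplified, hedged stand-in for AMPLIFY rather than its actual implementation; the attribution scores are assumed to come from any post hoc explainer):

def rationale_from_attributions(tokens, scores, top_k=3):
    """Turn per-token attribution scores into a short natural-language rationale."""
    ranked = sorted(zip(tokens, scores), key=lambda pair: abs(pair[1]), reverse=True)
    keywords = [token for token, _ in ranked[:top_k]]
    return "The most important words for the prediction are: " + ", ".join(keywords) + "."

# The rationale is then prepended to the prompt as a corrective signal, e.g.:
# rationale_from_attributions(["great", "movie", "boring"], [0.7, 0.1, -0.6])
# -> "The most important words for the prediction are: great, boring, movie."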

Yang et al. [2023] introduce Language Guided Bottlenecks (LaBo), a method for constructing high-performance Concept Bottleneck Models (CBMs) without manual specification of concepts. LaBo leverages GPT-3 to generate factual sentences about categories, forming candidate concepts for CBMs. These concepts are then aligned with images using CLIP Radford et al. [2021] to form a bottleneck layer. The method efficiently searches for bottlenecks using a submodular utility, focusing on discriminative and diverse information. The authors claim their method outperforms black-box linear probes in few-shot classification tasks across 11 diverse datasets, showing comparable or better performance with more data.

Bitton-Guetta et al. [2023] introduce WHOOPS!, a new dataset and benchmark designed to test AI models' visual commonsense reasoning abilities. The dataset comprises images intentionally defying commonsense, created using image generation tools like Midjourney. The paper assesses AI models on tasks such as image captioning, cross-modal matching, visual question answering, and the challenging task of explanation generation, where models must identify and explain the unusualness of an image. Results show that even advanced models like GPT-3 and BLIP-2 struggle with these tasks, highlighting a gap in AI's visual commonsense reasoning compared to human performance.

B. Discussion Papers

Unlike the Application papers, this category includes papers that address the topic of XAI through LLMs, and vice versa, but do not necessarily provide any specific methodology, framework or application. This category, in turn, is divided into two subcategories: Issues, i.e., works which raise a concern, and Benchmark and Metrics, which mainly focus on the evaluation and assessment of XAI methods in the LLM field.

1) Issues: Bowman [2023] critically examines LLMs, highlighting their unpredictability and the emergent nature of their capabilities with scaling. They underscore the challenges in steering and interpreting LLMs and the necessity for a nuanced understanding of their limitations and potential.

Liu et al. [2023] offer a survey and a set of guidelines for assessing the alignment of LLMs with human values and intentions. They categorise and detail aspects of LLM trustworthiness, including reliability, safety, fairness, resistance to misuse, explainability, adherence to social norms, and robustness.

Liao and Vaughan [2023] emphasise the need for transparency in LLMs from a human-centred perspective. The authors discuss the unique challenges of achieving transparency with LLMs, differentiating them from smaller, more specialised models. The paper proposes a roadmap for research, emphasising the importance of understanding and addressing the transparency needs of diverse stakeholders in the LLM ecosystem. It advocates for developing and designing transparency approaches that consider these stakeholder needs, the novel applications of LLMs, and their various usage patterns and associated challenges.

Lastly, Xie et al. [2023] highlight the limitations of ChatGPT in explainability and stability in the context of financial market analysis through a zero-shot analysis. The authors suggest the need for more specialised training or fine-tuning.

2) Benchmark and Metrics: Lu et al. [2022] introduce SCIENCEQA, a new dataset for multimodal science question answering. This dataset includes around 21k questions with diverse science topics and annotations, featuring lectures and explanations to aid in understanding the reasoning process. The authors demonstrate how language models, particularly LLMs, can be trained to generate these lectures and explanations as part of a CoT process, enhancing their reasoning capabilities. The study shows that CoT improves question-answering performance and provides insights into the potential of LLMs to mimic human-like multi-step reasoning in complex, multimodal domains.

Golovneva et al. [2022] introduce ROSCOE, a set of metrics designed to evaluate the step-by-step reasoning of language models, especially in scenarios without a golden reference. This work includes a taxonomy of reasoning errors and a comprehensive evaluation of ROSCOE against baseline metrics across various reasoning tasks. The authors demonstrate ROSCOE's effectiveness in assessing semantic consistency, logicality, informativeness, fluency, and factuality in model-generated rationales.

Zhao et al. [2023b] present a comprehensive survey on explainability techniques for LLMs, focusing on Transformer-based models. It categorises these techniques based on traditional fine-tuning and prompting paradigms, detailing methods for generating local and global explanations. The paper addresses the challenges and potential directions for future research in explainability, highlighting LLMs' unique complexities and capabilities compared to conventional deep-learning models. Nevertheless, the survey mainly focuses on XAI in general and has minimal coverage of the relationship between XAI and LLMs.

V. DISCUSSION

Our analysis indicates that a limited number of the reviewed publications directly tackle the challenges highlighted in Section II. For example, the work by Liu et al. [2023] focuses on trust-related concerns in LLMs, whereas Gao et al. [2023a] investigates the issue of misinformation propagation by LLMs. This scant attention to the identified problems suggests an imperative for substantial engagement from the XAI community to confront these issues adequately.

a) Open-Source Engagement: Our survey shows that more studies are moving beyond the traditional approach of merely describing methodologies in text. Instead, they release them as tangible tools or open-source code, frequently hosted on platforms such as GitHub. This evolution is a commendable step toward enhancing transparency and reproducibility in computer science research. The trend suggests a growing inclination among authors to release their code and publicly publish their tools, a notable change from a few years ago. However, we should also mention the inconsistency in the level of community engagement with these repositories. While some repositories attract substantial interest, fostering further development and improvement, others remain underutilised. This disparity in engagement raises important questions about the factors influencing community interaction with these resources.

b) Target: Predominantly, most works have directed their attention towards LLMs rather than concentrating on more specialised or narrower subjects within AI-based systems. This broad approach contrasts with the relatively few studies that focus specifically on Transformers or are confined to examining particular categories of systems, such as recommendation systems. This overarching focus on LLMs represents a positive and impactful trend within the AI community. Given the rapid development and increasing prominence of LLM systems in academic and practical applications, this broader focus is timely and crucial for driving our understanding and capabilities in this domain forward. It ensures that research keeps pace with the advancements in the field, fostering a comprehensive and forward-looking approach essential for AI technologies' continued growth and evolution.

c) Goal: Our analysis, as delineated in Table I, reveals a bifurcation in the objectives of the LLM studies under review. On the one hand, a subset of these works is primarily dedicated to explaining and enhancing the interpretability of these 'black box' models. On the other hand, a larger contingent is more task-oriented, focusing on augmenting specific tasks and models, with interpretability emerging merely as a byproduct. This dichotomy in research focus underscores a pivotal trend: a pressing need to shift more attention towards demystifying the inner workings of LLMs. Rather than solely leveraging these models to boost task performance, their inherently opaque nature should not be overlooked. The pursuit of performance improvements must be balanced with efforts to unravel and clarify the underlying mechanisms of LLMs. This approach is crucial for fostering a deeper understanding of these complex systems, ensuring their application is effective and transparent. Such a balanced focus is essential for advancing the field technically and maintaining ethical and accountable AI development.

VI. CONCLUSION

Our SMS reveals that only a handful of works are dedicated to developing explanation methods for LLM-based systems. This finding is particularly salient, considering the rapidly growing prominence of LLMs in various applications. Our study, therefore, serves a dual purpose in this context. Firstly, it acts as a navigational beacon for the XAI community, highlighting the fertile areas where efforts to create interpretable and transparent LLM-based systems can effectively address the challenges the broader AI community faces. Secondly, it is a call to action, urging researchers and practitioners to venture into this relatively underexplored domain. The need for explanation methods in LLM-based systems is not just a technical necessity but also a step towards responsible AI practice. By focusing on this area, the XAI community can contribute significantly to making AI systems more efficient, trustworthy and accountable.

Our call for action is as follows: Firstly, researchers employing LLM models must acknowledge and address the potential long-term challenges posed by the opacity of these systems. The importance of explainability should be elevated from a mere 'nice-to-have' feature to an integral aspect of the development process. This involves a proactive approach to incorporate explainability in the design and implementation phases of LLM-based systems. Such a shift in perspective is essential to ensure that these models are effective, transparent and accountable.

Secondly, we urge researchers in the XAI field to broaden their investigative scope. The focus should not only be on devising methodologies capable of handling the complexity of LLM-based systems but also on enhancing the presentation layer of these explanations. Currently, the explanations provided are often too complex for non-technical stakeholders. Therefore, developing approaches that render these explanations more accessible and understandable to a wider audience is imperative. This dual approach will make LLMs more understandable and user-friendly and bridge the gap between technical efficiency and ethical responsibility in AI development.

REFERENCES

Wayne Xin Zhao, Kun Zhou, Junyi Li, Tianyi Tang, Xiaolei Wang, Yupeng Hou, Yingqian Min, Beichen Zhang, Junjie Zhang, Zican Dong, et al. A survey of large language models. arXiv:2303.18223, 2023a.

Mostafa Amin, Erik Cambria, and Björn Schuller. Can ChatGPT's responses boost traditional natural language processing? IEEE Intelligent Systems, 38(5):5–11, 2023.

Arun James Thirunavukarasu, Darren Shu Jeng Ting, Kabilan Elangovan, Laura Gutierrez, Ting Fang Tan, and Daniel Shu Wei Ting. Large language models in medicine. Nature Medicine, pages 1–11, 2023.

Shijie Wu, Ozan Irsoy, Steven Lu, Vadim Dabravolski, Mark Dredze, Sebastian Gehrmann, Prabhanjan Kambadur, David Rosenberg, and Gideon Mann. BloombergGPT: A large language model for finance. arXiv:2303.17564, 2023a.

Ikram Chraibi Kaadoud, Lina Fahed, and Philippe Lenca. Explainable AI: a narrative review at the crossroad of knowledge discovery, knowledge representation and representation learning. In MRC, volume 2995, pages 28–40. ceur-ws.org, 2021.

Erik Cambria, Rui Mao, Melvin Chen, Zhaoxia Wang, and Seng-Beng Ho. Seven pillars for the future of artificial intelligence. IEEE Intelligent Systems, 38(6):62–69, 2023a.

Laura Weidinger, John Mellor, Maribeth Rauh, Conor Griffin, Jonathan Uesato, Po-Sen Huang, Myra Cheng, Mia Glaese, Borja Balle, Atoosa Kasirzadeh, et al. Ethical and social risks of harm from language models. arXiv:2112.04359, 2021.

Yang Liu. The importance of human-labeled data in the era of LLMs. In Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence, pages 7026–7032, 2023.

Rita Sevastjanova and Mennatallah El-Assady. Beware the rationalization trap! When language model explainability diverges from our mental models of language, 2022.

Erik Cambria, Lorenzo Malandri, Fabio Mercorio, Mario Mezzanzanica, and Navid Nobani. A survey on XAI and natural language explanations. Information Processing & Management, 60(1):103111, 2023b.

Nadia Burkart and Marco F Huber. A survey on the explainability of supervised machine learning. Journal of Artificial Intelligence Research, 70:245–317, 2021.

Fabio Mercorio, Mario Mezzanzanica, and Andrea Seveso. exdil: A tool for classifying and explaining hospital discharge letters. In International Cross-Domain Conference for Machine Learning and Knowledge Extraction, pages 159–172. Springer, 2020.

Noemi Gozzi, Lorenzo Malandri, Fabio Mercorio, and Alessandra Pedrocchi. XAI for myo-controlled prosthesis: Explaining EMG data for hand gesture classification. Knowledge-Based Systems, 240:108053, 2022.

Nicola Alimonda, Luca Guidotto, Lorenzo Malandri, Fabio Mercorio, Mario Mezzanzanica, and Giovanni Tosi. A survey on XAI for cyber physical systems in medicine. In 2022 IEEE International Conference on Metrology for Extended Reality, Artificial Intelligence and Neural Engineering (MetroXRAINE), pages 265–270. IEEE, 2022.

Frank Xing, Lorenzo Malandri, Yue Zhang, and Erik Cambria. Financial sentiment analysis: an investigation into common mistakes and silver bullets. In Proceedings of the 28th International Conference on Computational Linguistics, pages 978–987, 2020.

Alessandro Castelnovo, Nicole Inverardi, Lorenzo Malandri, Fabio Mercorio, Mario Mezzanzanica, and Andrea Seveso. Leveraging group contrastive explanations for handling fairness. In World Conference on Explainable Artificial Intelligence, pages 332–345. Springer, 2023.

Wei Jie Yeo, Wihan van der Heever, Rui Mao, Erik Cambria, Ranjan Satapathy, and Gianmarco Mengaldo. A comprehensive review on financial explainable AI. arXiv:2309.11960, 2023.

Claudio Novelli, Federico Casolari, Philipp Hacker, Giorgio Spedicato, and Luciano Floridi. Generative AI in EU law: Liability, privacy, intellectual property, and cybersecurity. EU Law: Liability, Privacy, Intellectual Property, and Cybersecurity (January 14, 2024), 2024.

Lorenzo Malandri, Fabio Mercorio, Mario Mezzanzanica, Navid Nobani, and Andrea Seveso. ContrXT: Generating contrastive explanations from any text classifier. Information Fusion, 81:103–115, 2022a. doi: 10.1016/[Link].2021.11.016. URL [Link]

Lorenzo Malandri, Fabio Mercorio, Mario Mezzanzanica, and Andrea Seveso. Model-contrastive explanations through symbolic reasoning. Decision Support Systems, 176:114040, 2024.

Lorenzo Malandri, Fabio Mercorio, Mario Mezzanzanica, Navid Nobani, and Andrea Seveso. Contrastive explanations of text classifiers as a service. In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: System Demonstrations, pages 46–53, 2022b.

Lorenzo Malandri, Fabio Mercorio, Mario Mezzanzanica, Navid Nobani, Andrea Seveso, et al. The good, the bad, and the explainer: a tool for contrastive explanations of text classifiers. In IJCAI, pages 5936–5939. AAAI Press, 2022c.

Xinyue Shen, Zeyuan Chen, Michael Backes, Yun Shen, and Yang Zhang. "Do anything now": Characterizing and evaluating in-the-wild jailbreak prompts on large language models. arXiv:2308.03825, 2023.

Alaa Abd-Alrazaq, Rawan AlSaad, Dari Alhuwail, Arfan Ahmed, Padraig Mark Healy, Syed Latifi, Sarah Aziz, Rafat Damseh, Sadam Alabed Alrazak, Javaid Sheikh, et al. Large language models in medical education: Opportunities, challenges, and future directions. JMIR Medical Education, 9(1):e48291, 2023.

Lixiang Yan, Lele Sha, Linxuan Zhao, Yuheng Li, Roberto Martinez-Maldonado, Guanliang Chen, Xinyu Li, Yueqiao Jin, and Dragan Gašević. Practical and ethical challenges of large language models in education: A systematic scoping review. British Journal of Educational Technology, 2023.

Ali Salimi and Hady Saheb. Large language models in ophthalmology scientific writing: Ethical considerations blurred lines or not at all? American Journal of Ophthalmology, 2023.

Vipula Rawte, Amit Sheth, and Amitava Das. A survey of hallucination in large foundation models, 2023.

Amos Azaria and Tom Mitchell. The internal state of an LLM knows when it's lying. arXiv:2304.13734, 2023.

Balbir Barn, Souvik Barat, and Tony Clark. Conducting systematic literature reviews and systematic mapping studies. In Innovations in Software Engineering Conference, pages 212–213, 2017.

Ángel Antonio Martínez-Gárate, José Alfonso Aguilar-Calderón, Carolina Tripp-Barba, and Aníbal Zaldívar-Colado. Model-driven approaches for conversational agents development: A systematic mapping study. IEEE Access, 2023.

Theodora Oikonomidi, Isabelle Boutron, Olivier Pierre, Guillaume Cabanac, Philippe Ravaud, and Covid-19 Nma Consortium. Changes in evidence for studies assessing interventions for covid-19 reported in preprints: meta-research study. BMC Medicine, 18:1–10, 2020.

Jesse Vig. A multiscale visualization of attention in the transformer model. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics: System Demonstrations, pages 37–42, 2019.

Vinitra Swamy, Angelika Romanou, and Martin Jaggi. Interpreting language models through knowledge graph extraction. In NeurIPS, 2021.

T Wu, M Tulio Ribeiro, J Heer, and D Weld. Polyjuice: Generating counterfactuals for explaining, evaluating, and improving models. In ACL-IJCNLP, 2021.

Kevin Ro Wang, Alexandre Variengien, Arthur Conmy, Buck Shlegeris, and Jacob Steinhardt. Interpretability in the wild: a circuit for indirect object identification in GPT-2 small. In NeurIPS ML Safety Workshop, 2022.

Sachit Menon and Carl Vondrick. Visual classification via description from large language models. In The Eleventh International Conference on Learning Representations, 2022.

Jinglong Gao, Xiao Ding, Bing Qin, and Ting Liu. Is ChatGPT a good causal reasoner? A comprehensive evaluation. arXiv:2305.07375, 2023a.

Shirui Pan, Linhao Luo, Yufei Wang, Chen Chen, Jiapu Wang, and Xindong Wu. Unifying large language models and knowledge graphs: A roadmap. arXiv:2306.08302, 2023.

Arthur Conmy, Augustine N Mavor-Parker, Aengus Lynch, Stefan Heimersheim, and Adrià Garriga-Alonso. Towards automated circuit discovery for mechanistic interpretability. arXiv:2304.14997, 2023.

Hangfeng He, Hongming Zhang, and Dan Roth. Rethinking with retrieval: Faithful large language model inference. arXiv:2301.00303, 2022.

Ori Yoran, Tomer Wolfson, Ben Bogin, Uri Katz, Daniel Deutch, and Jonathan Berant. Answering questions by meta-reasoning over multiple chains of thought. arXiv:2304.13007, 2023.

Gabriele Sarti, Nils Feldhus, Ludwig Sickert, and Oskar van der Wal. Inseq: An interpretability toolkit for sequence generation models. arXiv:2302.13942, 2023.

Zhengxuan Wu, Atticus Geiger, Christopher Potts, and Noah D Goodman. Interpretability at scale: Identifying causal mechanisms in Alpaca. arXiv:2305.08809, 2023b.

Zongxia Li, Paiheng Xu, Fuxiao Liu, and Hyemi Song. Towards understanding in-context learning with contrastive demonstrations and saliency maps. arXiv:2307.05052, 2023.

Zichen Chen, Ambuj K Singh, and Misha Sra. LMExplainer: a knowledge-enhanced explainer for language models. arXiv:2303.16537, 2023.

Yunfan Gao, Tao Sheng, Youlin Xiang, Yun Xiong, Haofen Wang, and Jiawei Zhang. Chat-REC: Towards interactive and explainable LLMs-augmented recommender system. arXiv:2303.14524, 2023b.

Hanlin Zhang, Ziyang Li, Jiani Huang, Mayur Naik, and Eric Xing. Improved logical reasoning of language models via differentiable symbolic programming. In First Workshop on Pre-training: Perspectives, Pitfalls, and Paths Forward at ICML 2022, 2022.

Chandan Singh, Aliyah R Hsu, Richard Antonello, Shailee Jain, Alexander G Huth, Bin Yu, and Jianfeng Gao. Explaining black box text modules in natural language with language models. arXiv:2305.09863, 2023.

Shiyang Li, Jianshu Chen, Yelong Shen, Zhiyu Chen, Xinlu Zhang, Zekun Li, Hong Wang, Jing Qian, Baolin Peng, Yi Mao, et al. Explanations from large language models make small reasoners better. arXiv:2210.06726, 2022.

Xi Ye and Greg Durrett. The unreliability of explanations in few-shot prompting for textual reasoning. NeurIPS, 35:30378–30392, 2022.

Miles Turpin, Julian Michael, Ethan Perez, and Samuel R Bowman. Language models don't always say what they think: Unfaithful explanations in chain-of-thought prompting. arXiv:2305.04388, 2023.

Sungmin Kang, Bei Chen, Shin Yoo, and Jian-Guang Lou. Explainable automated debugging via large language model-driven scientific debugging. arXiv:2304.02195, 2023.

Satyapriya Krishna, Jiaqi Ma, Dylan Slack, Asma Ghandeharioun, Sameer Singh, and Himabindu Lakkaraju. Post hoc explanations of language models can improve language models. arXiv:2305.11426, 2023.

Yue Yang, Artemis Panagopoulou, Shenghao Zhou, Daniel Jin, Chris Callison-Burch, and Mark Yatskar. Language in a bottle: Language model guided concept bottlenecks for interpretable image classification. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 19187–19197, 2023.

Nitzan Bitton-Guetta, Yonatan Bitton, Jack Hessel, Ludwig Schmidt, Yuval Elovici, Gabriel Stanovsky, and Roy Schwartz. Breaking common sense: WHOOPS! A vision-and-language benchmark of synthetic and compositional images. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 2616–2627, 2023.

Yucheng Shi, Hehuan Ma, Wenliang Zhong, Gengchen Mai, Xiang Li, Tianming Liu, and Junzhou Huang. Chatgraph: Interpretable text classification by converting ChatGPT knowledge to graphs. arXiv:2305.03513, 2023.

Qika Lin, Jun Liu, Rui Mao, Fangzhi Xu, and Erik Cambria. TECHS: Temporal logical graph networks for explainable extrapolation reasoning. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1281–1293, 2023.

Amirhossein Aminimehr, Pouya Khani, Amirali Molaei, Amirmohammad Kazemeini, and Erik Cambria. Tbexplain: A text-based explanation method for scene classification models with the statistical prediction correction. arXiv:2307.10003, 2023.

Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et al. Learning transferable visual models from natural language supervision. In International Conference on Machine Learning, pages 8748–8763. PMLR, 2021.

Samuel R Bowman. Eight things to know about large language models. arXiv:2304.00612, 2023.

Yang Liu, Yuanshun Yao, Jean-Francois Ton, Xiaoying Zhang, Ruocheng Guo, Hao Cheng, Yegor Klochkov, Muhammad Faaiz Taufiq, and Hang Li. Trustworthy LLMs: a survey and guideline for evaluating large language models' alignment. In Socially Responsible Language Modelling Research, 2023.

Q Vera Liao and Jennifer Wortman Vaughan. AI transparency in the age of LLMs: A human-centered research roadmap. arXiv:2306.01941, 2023.

Qianqian Xie, Weiguang Han, Yanzhao Lai, Min Peng, and Jimin Huang. The wall street neophyte: A zero-shot analysis of ChatGPT over multimodal stock movement prediction challenges. arXiv:2304.05351, 2023.

Pan Lu, Swaroop Mishra, Tanglin Xia, Liang Qiu, Kai-Wei Chang, Song-Chun Zhu, Oyvind Tafjord, Peter Clark, and Ashwin Kalyan. Learn to explain: Multimodal reasoning via thought chains for science question answering. NeurIPS, 35:2507–2521, 2022.

Olga Golovneva, Moya Peng Chen, Spencer Poff, Martin Corredor, Luke Zettlemoyer, Maryam Fazel-Zarandi, and Asli Celikyilmaz. ROSCOE: A suite of metrics for scoring step-by-step reasoning. In The Eleventh International Conference on Learning Representations, 2022.

Haiyan Zhao, Hanjie Chen, Fan Yang, Ninghao Liu, Huiqi Deng, Hengyi Cai, Shuaiqiang Wang, Dawei Yin, and Mengnan Du. Explainability for large language models: A survey. ACM TIST, 2023b.
