Multilingual Image Schema Analysis

Abstract

In embodied cognition, physical experiences are believed to shape abstract cognition, such as natural language and reasoning. Image schemas were introduced as spatio-temporal cognitive building blocks that capture these recurring sensorimotor experiences. The few existing approaches for automatic detection of image schemas in natural language rely on specific assumptions about word classes as indicators of spatio-temporal events. Furthermore, the lack of sufficiently large, annotated datasets makes evaluation and supervised learning difficult. We propose to build on the recent success of large multilingual pretrained language models and a small dataset of examples from image schema literature to train a supervised classifier that classifies natural language expressions of varying lengths into image schemas. Despite most of the training data being in English with few examples for German, the model performs best in German. Additionally, we analyse the model's zero-shot performance in Russian, French, and Mandarin. To further investigate the model's behaviour, we utilize local linear approximations for prediction probabilities that indicate which words in a sentence the model relies on for its final classification decision. Code and dataset are publicly available.1

1 https://tinyurl.com/24haedv5

Figure 1: Example of the image schema CONTAINMENT: from experiencing different types of a CONTAINER in early infancy (left), to the development of the schema (middle), to the usage in language on abstract topics (right)

1 Introduction

In the tradition of embodied cognition, image schemas have been proposed by Lakoff (1987) and Johnson (1987) as spatio-temporal cognitive building blocks that capture recurring sensorimotor experiences. For instance, in early infancy we experience many objects with the properties of a CONTAINER, i.e., having an inside and an outside separated by a boundary. The image schema CONTAINMENT captures this experience and is subsequently used to make sense of new experiences while at the same time also influencing how we think and talk about abstract concepts, such as thinking, emotions, or life (see Figure 1).

In order to systematically analyse the occurrence of image schemas in natural language, we propose to build on the recent success of multilingual pretrained language models and a small set of examples from image schema literature (Hurtienne, 2017) to train a supervised classifier based on XLM-RoBERTa (XLM-R) (Conneau et al., 2020) to classify natural language expressions into image schemas. An image schema detection model such as ours could help linguists to explore the use of image schemas efficiently and effectively in large text corpora. It can guide researchers who, for instance, investigate how the use of image schemas differs across languages and cultures (e.g., Choi and Bowerman, 1991; Papafragou et al., 2006), how the language of children with spatial impairments differs (e.g., Lakusta and Landau, 2005), or which image schemas occur in various literary works (e.g., Freeman, 2002). Moreover, we hope that analysing image schemas in large text corpora allows us to contribute to image schema theory directly and to investigate how we think and talk about abstract concepts.
Our proposed method has significant advantages over previously proposed methods. Several corpus linguistic studies (e.g., Dodge and Lakoff, 2005) and unsupervised machine learning approaches (e.g., Gromann and Hedblom, 2017) for image schema extraction rely on specific parts of speech (POS) as indicators of spatio-temporal events. These approaches using POS tags conventionally portray prepositions as excellent spatial indicators and verbs as movement indicators (e.g., Gromann and Hedblom, 2017; Kordjamshidi et al., 2011). However, spatial language might be expressed with prepositions (He walked across the room) or without (He crossed the room) (Dodge and Lakoff, 2005). In both examples, the underlying image-schematic structure is that of SOURCE-PATH-GOAL, i.e., the way through the room. Since not all spatial expressions in language rely on prepositions, a more general, word-class-independent method is needed, which we propose in the form of a supervised training procedure based on a multilingual pretrained language model.

In contrast to these previous methods, we make use of a small annotated image schema corpus that not only allows us to extract image schemas in different languages without relying on manually created patterns, but also provides a gold standard to evaluate our model. Natural language examples of image schemas in the literature have been collected in a repository (Hurtienne, 2017). However, this database is rather inconsistent in its formatting and image schema annotation. Thus, we cleaned it and complemented it with other examples from MetaNet (Dodge et al., 2015). Our classification method is trained and primarily evaluated in English and German. We also analyse the model's zero-shot performance on a small set of sentences in French, Russian, and Mandarin, representing different language families. To further investigate the model's behaviour, we utilize the explainable artificial intelligence model LIME (Ribeiro et al., 2016), which provides local linear approximations of prediction probabilities for each word in the input expression in relation to each available target class, i.e., image schema. Thereby, we can provide an analysis of which words in the input sequence the model primarily relies on to make its predictions.

2 Related Work

Most previous automated approaches for image schema extraction rely on handwritten rules and pattern matching to annotate natural text with image schemas (e.g., Bennett and Cialone, 2014). However, such rules and patterns have to be specified for each image schema as well as for each language, resulting in a substantial manual effort. Moreover, such patterns lead to low recall and have no mechanisms to handle polysemous words. The only existing machine learning approach clusters triples of syntactically dependent nouns, verbs, and prepositions in order to group them by image schema in an unsupervised manner (Gromann and Hedblom, 2017; Wachowiak, 2020). Since this approach relies on assumptions about word classes, especially prepositions, the range of expressions that can be considered is limited. Fields with thematically related objectives are metaphor extraction and spatial role labeling, where recent state-of-the-art approaches rely on pretrained neural language models (e.g., Dankers et al., 2020; Leong et al., 2020) and on contextualized embeddings created for trajector, landmark, and preposition candidates (Ramrakhiyani et al., 2019).

3 Foundation

Embodied cognition, a field that builds on the hypothesis that cognitive processes are grounded in perception and sensorimotor interactions with the world, has experienced significant traction in cognitive linguistics. In this tradition, Lakoff (1987) and Johnson (1987) introduce image schemas as cognitive concepts that are firmly rooted in sensorimotor experiences that eventually shape higher-level cognition, including natural language.

3.1 Image Schemas

An image schema according to Johnson (1987, p. xiv) "is a recurring, dynamic pattern of our perceptual interactions and motor programs that gives coherence and structure to our experience." Schema here follows the notion of Langacker (1987) to abstract away from less important details to core commonalities of experiences. Image relates to imagistic in the sense of sensory experiences building on information from different perceptual modalities (Talmy, 2005). They are directly meaningful, preconceptual structures that represent experiential gestalts, i.e., parts that flexibly organize experiences into coherent wholes. Repeated physical experiences starting in early infancy form concepts that manifest themselves in language. For instance, we learn early on that many objects function as a CONTAINER, for instance, a glass, a fridge, or a basket, while other objects, such as tables, do not show the same properties.
Image Schema | Definition | Conceptual Metaphor | Example
CENTER-PERIPHERY | Experience of objects or events as central, while others are peripheral or even outside (Gibbs Jr et al., 1994, p. 237). The periphery depends on the center but not vice versa (Lakoff, 1987, p. 274). | AFFECTION IS PHYSICAL CLOSENESS | He keeps everyone at arm's length. (Lakoff et al., 1991, p. 155)
CONTACT | Relates to two entities physically touching without depending on each other (Cienki, 2008, p. 36). | COMMUNICATION IS ESTABLISHED BY PHYSICAL CONTACT | She's in touch with him. (Hurtienne, 2017)
CONTAINMENT | Experience of boundedness, entailing an interior, exterior, and a boundary (Johnson, 1987). | MIND AS CONTAINER FOR IDEAS | Who put that idea in your head? (Jäkel, 2003, pp. 156-157)
FORCE | Implies the exertion of physical strengths in one or more directions (Cienki, 2008, p. 431). | HAPPINESS IS A NATURAL FORCE | He was swept off his feet. (Kövecses, 2010, p. 100)
PART-WHOLE | Wholes consisting of parts and a configuration of parts (Lakoff, 1987, p. 273). | COHERENT IS WHOLE | His thoughts are scattered. (Lakoff et al., 1991, p. 138)
SCALE | Quantitatively it refers to the grouping of discrete objects and substances that can be increased and decreased in amount; qualitatively it refers to the degree of intensity (Johnson, 1987, p. 122). | IMPORTANT IS BIG | Maslow is a towering figure in humanistic psychology. (Tolaas, 1991, p. 207)
SOURCE-PATH-GOAL | Source or starting point, goal or endpoint, a series of contiguous locations connecting both, and movement (Johnson, 1987, p. 113). | PURPOSES ARE DESTINATIONS | He finally reached his goals. (Kövecses, 2010, p. 163)
VERTICALITY | A tendency to employ an UP-DOWN orientation (Johnson, 1987, p. xiv). | LIFE IS UP | He's at the peak of health. (Lakoff and Johnson, 1980, p. 15)

Table 1: Image schemas considered in this work, with definitions, related conceptual metaphors, and example expressions
Having learned the image schema CONTAINMENT, it is later on reflected in our language about physical objects, but also about abstract concepts, for example, in expressions such as He's gone out of his mind. The image schemas we consider in this work, selected based on available natural language examples in the literature, are defined, related to conceptual metaphors, and exemplified in Table 1.

3.2 Image Schemas and Natural Language

Instead of only pertaining to the physical realm, image schemas are metaphorically projected onto abstract target domains (Lakoff, 1987). In other words, conceptual metaphors map structures learned in the physical source domain, i.e., spatial in the case of image-schematic metaphors, to an abstract target domain. To take up a previous example, the expression He's gone out of his mind relates to the conceptual metaphor MIND AS CONTAINER, in which the physical properties of CONTAINMENT, in the sense of having an inside, outside, and a boundary, are transferred to the abstract concept of "mind", assigning it similar properties. Thus, image schemas function as structuring devices for language and thought (Kimmel, 2009). Similarities in underlying image-schematic structures across expressions and even across languages can help guide the analysis of language. For instance, the same metaphor and image schema can be observed in the Russian expression ...стереотипах, которые нам вбивались в голову в советское время...2 (stereotypes that were hammered into our heads during Soviet times). The image schema CONTAINMENT is frequently used to talk about emotions, for example in French, Je suis cachée au bord des larmes3 (I'm hiding on the verge of tears),
German, ...nicht aus der Ruhe bringen (not be upset; literally: not get out of one's calm) (Baldauf, 1997, p. 135), or Chinese, 他怒火中烧 (Ta nu-huo zhong shao; He has angry fire burning inside him) (Yu, 1995, p. 62).

2 In VTimes on 31 October 2020.
3 Part of the lyrics of anxiété by Pomme.

Linguistic analyses of image schemas have been criticized for suffering from circularity, in the sense that language analysis represents a means for forming inferences about the mind, body, and their interrelations, the results of which then motivate different arguments on linguistic phenomena (Gibbs and Colston, 1995, pp. 245-246). Natural language might not provide evidence on the origin of image schemas; however, its analysis can foster an understanding of image schema usage in natural languages (Dodge and Lakoff, 2005). This idea is further supported by neuroscientific evidence. For instance, Durand et al. (2018) found that motor areas in the brain are activated when processing action words. Their research focuses on verb anomia, described as difficulty retrieving words, and showed an added value of combining language and sensorimotor strategies to effectively foster recovery from verb anomia.

3.3 Language Models

Many of the recent successes in natural language processing can be accredited to deep neural language models. Such models learn rich, contextualized language representations during a pretraining stage, in which they learn to predict a masked word given its context, a task for which large amounts of training data are readily available. In a second stage, these models can be finetuned for specific tasks like classification or question answering by adding additional layers on top of the output of the language model, thus utilizing the previously learned representations. Such a model is then optimized end-to-end, i.e., no additional manually created feature extraction pipeline is needed; the neural network takes in text as it is and learns by itself to pay attention to the features important for a specific task. One of the most prominent language models is BERT (Devlin et al., 2019), which is based on the now ubiquitous Transformer architecture (Vaswani et al., 2017). Multilingual variants of BERT use multiple languages in the pretraining phase, for instance multilingual BERT and XLM-R (Conneau et al., 2020), which was pretrained on text in 100 different languages and uses an improved training paradigm. Depending on the task, multilingual models show decent zero-shot performance on languages they were originally pretrained on, but that were not part of the training set in the finetuning stage.
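The masked-word objective just described can be illustrated directly with the publicly released xlm-roberta-base checkpoint; this is a minimal sketch assuming the Hugging Face transformers library, not part of our method.

```python
# Minimal sketch of the masked-word pretraining objective described above,
# assuming the Hugging Face transformers library and the public
# xlm-roberta-base checkpoint; illustration only, not part of our method.
from transformers import pipeline

fill = pipeline("fill-mask", model="xlm-roberta-base")
# XLM-R predicts the masked word from its (potentially multilingual) context.
for candidate in fill("He walked across the <mask>.")[:3]:
    print(candidate["token_str"], round(candidate["score"], 3))
```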
Image Schema | EN | DE
CENTER-PERIPHERY | 96 | 40
CONTACT | 30 | 0
CONTAINMENT | 451 | 154
FORCE | 273 | 26
PART-WHOLE | 30 | 0
SCALE | 52 | 10
SOURCE-PATH-GOAL | 367 | 99
VERTICALITY | 236 | 85
Total | 1,535 | 414

Table 2: Sample distribution across languages and image schemas

4 Data

The data combined from the image schema repository (Hurtienne, 2017) and MetaNet (Dodge et al., 2015) consist of a total of 1,949 samples: 1,535 in English and 414 in German. The exact distribution per image schema can be seen in Table 2. The cleaning of the image schema repository consisted of deduplicating and ensuring a consistent, processable format and annotation. Additionally, the authors of this paper, and Chao Xu for Mandarin, manually curated small test datasets of image-schematic language in Russian, French, and Mandarin, consisting of 35, 40, and 55 samples respectively, for evaluating the zero-shot performance of the classifier. Sources for the additional language samples consisted of image schema literature, novels, and online news articles.

5 Method

5.1 Supervised Classification Model

We use the English and German data described in Section 4 for finetuning XLM-R in order to classify natural language sequences into image schemas. The model input consists of natural language expressions, which are classified into one of the eight image schemas described in Section 3.1 by adding a fully connected layer on top of XLM-R's output with one output neuron representing each class. We train the model with 80% of the available data, leaving the other 20% for testing. We use a stratified train-test split guaranteeing the same distribution of labels in training and test set.
In order to see if the model achieves consistent results, we cross-validate it by training it on five different stratified random splits and report the averaged results for accuracy and F1 scores. All Russian, French, and Mandarin samples are only in the test data and never seen during training. XLM-R exists in different sizes depending on the number of parameters. For our experiments we choose the variant called XLM-R Base. This model is trained for 12 epochs utilizing the Adam optimizer with a learning rate of 3e-5 and a batch size of 16.
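A condensed sketch of this training setup is shown below, assuming the Hugging Face transformers library and scikit-learn. The eight toy expressions stand in for the dataset of Section 4, and details such as padding strategy and logging are simplified.

```python
# Sketch of the supervised classifier: XLM-R Base with a classification
# head, stratified 80/20 split, Adam with lr 3e-5, batch size 16, 12 epochs.
# The toy `texts`/`labels` stand in for the dataset of Section 4.
import torch
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from transformers import (XLMRobertaForSequenceClassification,
                          XLMRobertaTokenizer)

texts = ["He finally reached his goals.", "He fell from power.",
         "She is on the path to success.", "He walked across the room.",
         "Who put that idea in your head?", "He's gone out of his mind.",
         "He is in deep trouble.", "The idea is stuck in her head."]
labels = ["SOURCE-PATH-GOAL"] * 4 + ["CONTAINMENT"] * 4

encoder = LabelEncoder()
y = encoder.fit_transform(labels)
# Stratified split keeps the label distribution equal in train and test set.
X_train, X_test, y_train, y_test = train_test_split(
    texts, y, test_size=0.2, stratify=y, random_state=0)

tokenizer = XLMRobertaTokenizer.from_pretrained("xlm-roberta-base")
# A fully connected layer on top of XLM-R's output, one neuron per class.
model = XLMRobertaForSequenceClassification.from_pretrained(
    "xlm-roberta-base", num_labels=len(encoder.classes_))
optimizer = torch.optim.Adam(model.parameters(), lr=3e-5)

model.train()
for epoch in range(12):
    for i in range(0, len(X_train), 16):
        batch_x = X_train[i:i + 16]
        batch_y = torch.tensor(y_train[i:i + 16])
        enc = tokenizer(batch_x, padding=True, truncation=True,
                        return_tensors="pt")
        loss = model(**enc, labels=batch_y).loss
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

# Inference: the predicted image schema is the argmax over the class logits.
model.eval()
with torch.no_grad():
    enc = tokenizer(X_test, padding=True, truncation=True,
                    return_tensors="pt")
    preds = model(**enc).logits.argmax(dim=-1)
print(encoder.inverse_transform(preds.numpy()))
```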
5.2 Unsupervised Baseline Classifier

To see how our model compares to other image schema extraction methods, we re-implement a recent approach that clusters instances of spatial language based on the underlying image schema (Gromann and Hedblom, 2017; Wachowiak, 2020). This approach uses the neural dependency parser Stanza (Qi et al., 2020) to find prepositions as markers of spatial language as well as their connected verbs and nouns. Examples of resulting triples are: <fell, from, power> or <stir, in, ingredient>. In a second step, each word of the triple is represented by its GloVe embedding (Pennington et al., 2014). These embeddings are averaged or summed, resulting in a 300-dimensional vector for each triple. Lastly, similar vectors are grouped using spectral clustering (Ng et al., 2001) based on the implementation made available by scikit-learn (Pedregosa et al., 2011). Since we have a labeled dataset, we simply annotate each cluster with the label that is most frequent among the contained triples. We can thus compute accuracy and F1 scores telling us how well the clusters separate different image schemas compared to the novel supervised approach. If the unsupervised method were to be applied to a new and unlabelled dataset, this annotation would have to be made manually. We compute the clusters and their respective scores for different hyper-parameter combinations and report the best resulting score (a sketch of the pipeline follows the list):

• Triple representation: summed vectors, averaged vectors

• Number of clusters: 8, 16

• Affinity matrix construction: nearest neighbors, radial basis function

• Label assignment: k-means, discretization

The data used for clustering consists of all English samples, including both training and test data, as the unsupervised approach does not require any training.
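A rough sketch of this pipeline is shown below; the GloVe file path and the toy sentences are placeholders, and the dependency pattern is a simplified reading of the verb-preposition-noun extraction.

```python
# Rough sketch of the unsupervised baseline: Stanza dependency parsing to
# extract <verb, preposition, noun> triples, averaged GloVe vectors, and
# spectral clustering. The GloVe path and toy sentences are placeholders,
# and the dependency pattern is a simplified approximation.
import numpy as np
import stanza
from sklearn.cluster import SpectralClustering

# stanza.download("en")  # required once to fetch the English models
nlp = stanza.Pipeline("en", processors="tokenize,pos,lemma,depparse")

def extract_triples(sentences):
    triples = []
    for sentence in sentences:
        for sent in nlp(sentence).sentences:
            words = sent.words
            for w in words:
                if w.upos != "ADP" or w.head == 0:
                    continue
                noun = words[w.head - 1]     # word the preposition attaches to
                if noun.upos != "NOUN" or noun.head == 0:
                    continue
                verb = words[noun.head - 1]  # verb governing that noun
                if verb.upos == "VERB":
                    triples.append((verb.lemma, w.lemma, noun.lemma))
    return triples

glove = {}                                   # word -> 300-dimensional vector
with open("glove.6B.300d.txt", encoding="utf-8") as f:  # placeholder path
    for line in f:
        parts = line.rstrip().split(" ")
        glove[parts[0]] = np.asarray(parts[1:], dtype=float)

samples = ["He fell from power.", "Stir the flour in the bowl.",
           "She walked across the room.", "He jumped over the fence."]
triples = extract_triples(samples)
X = np.stack([np.mean([glove[w] for w in t if w in glove], axis=0)
              for t in triples])             # averaged triple embeddings

# Best reported setting on the full triple set: 16 clusters, averaged
# embeddings, nearest-neighbors affinity, discretized label assignment.
# The toy values below merely keep this snippet runnable on four triples.
cluster_ids = SpectralClustering(n_clusters=2, affinity="rbf",
                                 random_state=0).fit_predict(X)
```

With gold labels available, each cluster would then be tagged with its most frequent image schema to compute accuracy and F1 scores.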
5.3 LIME Explanations

For a detailed analysis of the model's decisions, we use LIME (Ribeiro et al., 2016), which is a method for interpreting machine learning models by approximating local decisions with an interpretable model that assigns weights to the different input features. A local decision refers to a classification of a single input instance, whose features, in our case, are the words that make up the sequence. Such an interpretable model is built for a specific input sample by being trained on perturbations of that sample and the corresponding outputs of the original model. A perturbed text sample, for instance, leaves out one or more words contained in the original sample. The thus generated explanations indicate which words the classifier based its decision on, i.e., which words indicate an image schema. Looking at the explanations of wrong model decisions can show us for which cases the model requires additional training data or which dataset samples are faulty, thus leading to insights that lie beyond the power of strictly numerical metrics, such as accuracy.

Additionally, we utilize LIME in order to gather global statistics about typical indicators for a specific image schema class. For each sample in the test set we look at the classification made by our model and add the words of the input sequence as well as the corresponding feature weights computed by LIME to a list for this image schema class. After iterating over all test samples, we rank the words for each image schema class by their average feature weight, thus obtaining a list of words that are strong indicators for a specific image schema according to the model.
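The sketch below illustrates both uses, assuming `model`, `tokenizer`, and `encoder` from the Section 5.1 sketch; `test_texts` is a stand-in for the test expressions, and the exact perturbation settings of our experiments may differ.

```python
# Sketch of the LIME analysis: local word weights for single predictions,
# then averaged per predicted class as the simple global statistic described
# above. `model`, `tokenizer`, and `encoder` come from the finetuning sketch
# in Section 5.1; `test_texts` is a stand-in for the test expressions.
from collections import defaultdict

import numpy as np
import torch
from lime.lime_text import LimeTextExplainer

def predict_proba(texts):
    """Class probabilities for a list of texts, in the format LIME expects."""
    enc = tokenizer(list(texts), padding=True, truncation=True,
                    return_tensors="pt")
    with torch.no_grad():
        logits = model(**enc).logits
    return torch.softmax(logits, dim=-1).numpy()

explainer = LimeTextExplainer(class_names=list(encoder.classes_))
test_texts = X_test                  # placeholder: held-out split from above

word_weights = defaultdict(lambda: defaultdict(list))
for text in test_texts:
    pred = int(np.argmax(predict_proba([text])[0]))
    # Local explanation: LIME perturbs the text (dropping words) and fits an
    # interpretable linear model whose weights score each word.
    exp = explainer.explain_instance(text, predict_proba, labels=(pred,))
    for word, weight in exp.as_list(label=pred):
        word_weights[encoder.classes_[pred]][word].append(weight)

# Global statistic: rank words by their average weight per image schema.
for schema, words in word_weights.items():
    ranked = sorted(words, key=lambda w: -np.mean(words[w]))
    print(schema, ranked[:10])
```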
6 Results

6.1 Scores Supervised Classifier

Language | Accuracy | Macro Precision | Macro Recall | Macro F1 | Weighted Precision | Weighted Recall | Weighted F1
English | 68.6 | 0.690 | 0.606 | 0.630 | 0.694 | 0.686 | 0.682
German | 79.8 | 0.728 | 0.736 | 0.724 | 0.816 | 0.798 | 0.802
Russian | 61.2 | 0.636 | 0.592 | 0.574 | 0.660 | 0.612 | 0.598
French | 56.6 | 0.636 | 0.538 | 0.518 | 0.662 | 0.566 | 0.542
Mandarin | 63.2 | 0.772 | 0.632 | 0.690 | 0.772 | 0.632 | 0.690

Table 3: Cross-validated accuracy and macro- and weighted-average precision, recall, and F1 scores per test language

Table 4: F1 scores for the individual classes of the test set (English and German)

The cross-validated results for the test sets in English, German, Russian, French, and Mandarin can be seen in Table 3. The highest scores are achieved in German with an average accuracy of 79.8%, followed by the accuracy in English with 68.6%, Mandarin with 63.2%, Russian with 61.2%, and French with 56.6%. The macro F1 score, which gives equal importance to all classes, is consistently lower than the accuracy and the weighted F1 score, showing that the classes having more training data were learned better. In comparison, a simple majority classifier always predicting CONTAINMENT would only achieve an accuracy of 31.0%, a weighted F1 score of 0.147, and a macro F1 score of 0.059 on the combined English and German test set.
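For reference, a majority baseline of this kind can be scored with scikit-learn as sketched below, reusing the split from the Section 5.1 sketch.

```python
# Sketch of the majority baseline: always predict the most frequent class
# and score it; reuses the split from the finetuning sketch in Section 5.1.
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score, f1_score

baseline = DummyClassifier(strategy="most_frequent").fit(X_train, y_train)
pred = baseline.predict(X_test)
print("accuracy:", accuracy_score(y_test, pred))
print("weighted F1:", f1_score(y_test, pred, average="weighted"))
print("macro F1:", f1_score(y_test, pred, average="macro"))
```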
In order to further detail the results, we present the class F1 scores in Table 4 as well as the confusion matrix in Figure 2, which were computed for one of the trained models on a mixed test set consisting of the German and English samples. The model performs best for the classes backed by the most training data, i.e., CONTAINMENT, SOURCE-PATH-GOAL, and VERTICALITY. Although a lot of data samples belong to the image schema FORCE, it only has a class F1 score of 0.55, which is due to the high confusion with SOURCE-PATH-GOAL. For the classes with very little training data the model achieves a lower F1 score, although never below 0.5.

6.2 Scores Unsupervised Baseline

Of all English samples in the dataset, only 36.5% contained a verb-preposition-noun triple. This low percentage highlights how important a word-class-independent approach is. After clustering the resulting 613 triples, the highest score is achieved with 16 clusters, averaged triple embeddings, nearest-neighbors for computing the affinity matrix, and discretization. From the resulting clusters, 7 are labeled as CONTAINMENT, 4 as FORCE, 4 as SOURCE-PATH-GOAL, and 1 as VERTICALITY. The obtained accuracy is 43.5%, thus much lower than the results obtained by XLM-R. The low macro-averaged F1 score of 0.20 shows the method's inability to properly deal with the class imbalance.

Choosing a higher number of output clusters increases the scores, but also requires substantial manual analysis when applied to unlabeled real-world data. For example, with 32 clusters, the accuracy increases to 49.8%.
True Label \ Predicted Label | CENTER-PERIPHERY | CONTACT | CONTAINMENT | FORCE | PART-WHOLE | SCALE | SOURCE-PATH-GOAL | VERTICALITY
CENTER-PERIPHERY | 15 (63%) | 0 (0%) | 4 (3%) | 0 (0%) | 0 (0%) | 0 (0%) | 7 (7%) | 1 (1%)
CONTACT | 1 (4%) | 3 (75%) | 0 (0%) | 2 (4%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%)
CONTAINMENT | 5 (21%) | 0 (0%) | 99 (77%) | 9 (18%) | 1 (25%) | 0 (0%) | 6 (6%) | 1 (1%)
FORCE | 1 (4%) | 0 (0%) | 10 (8%) | 30 (60%) | 0 (0%) | 1 (14%) | 14 (14%) | 4 (6%)
PART-WHOLE | 0 (0%) | 0 (0%) | 3 (2%) | 0 (0%) | 3 (75%) | 0 (0%) | 0 (0%) | 0 (0%)
SCALE | 1 (4%) | 0 (0%) | 3 (2%) | 0 (0%) | 0 (0%) | 5 (71%) | 0 (0%) | 4 (6%)
SOURCE-PATH-GOAL | 1 (4%) | 1 (25%) | 4 (3%) | 7 (14%) | 0 (0%) | 1 (14%) | 74 (72%) | 5 (7%)
VERTICALITY | 0 (0%) | 0 (0%) | 6 (5%) | 2 (4%) | 0 (0%) | 0 (0%) | 2 (2%) | 54 (78%)

Figure 2: Confusion matrix for the image schema extraction model on the test set (English and German); each cell gives the count and its share of all predictions in that column

Figure 3: Words LIME finds as strong indicators for specific image schema classes

6.3 LIME Explanations

Looking at the LIME explanations for some wrongly classified samples, especially for those belonging to classes regularly confused according to the confusion matrix in Figure 2, we gained crucial insights regarding the inner workings of the model and issues in the dataset. Firstly, some of the salient points of the confusion matrix are due to common image schema collocations, i.e., two or more image schemas occurring together in the same sentence. Examples of this are the four expressions with the gold label SCALE which were classified as VERTICALITY by the model. In all samples the two image schemas are collocated, e.g., in the expression He's head and shoulders above everyone in the industry, where LIME correctly
identifies the word above as a strong indicator for the image schema VERTICALITY. However, due to its quantitative, comparing nature, the phrase also belongs to the image schema SCALE, as stated in the gold standard. Interestingly, the confusion never goes the other way around, i.e., samples belonging to VERTICALITY are never classified as SCALE, which is most likely due to VERTICALITY being supported by more training data, so that the model develops a certain bias towards that class. Other samples show some unintended learned behavior exhibited by the model. The expression to have an open marriage, having the gold label CONTAINMENT, is classified as SOURCE-PATH-GOAL by the model, although LIME identifies open as an indicator for CONTAINMENT. LIME's output suggests that the model identified marriage as a concept that is often talked about in terms relating to the image schema SOURCE-PATH-GOAL, such as in conceptual metaphors like LOVE IS A JOURNEY. However, as this is not the case in the given context, the classifier makes a wrong decision.

Figure 3 shows the features with the highest indicator scores for the image schemas VERTICALITY and CONTAINMENT, averaged over all samples in the test set. The words shown for VERTICALITY are all correctly identified as strong markers. Only further down the list, beyond what is shown in the figure, does one find false positives, for instance, wings, which is only related to themes where VERTICALITY plays a role. The words identified as strong indicators for CONTAINMENT contain more clear false positives, such as white or answer. The word white occurs in two natural language expressions labeled as CONTAINMENT in the dataset, while answer occurs four times, yet surprisingly never in a phrase labeled as CONTAINMENT.
7 Discussion

Task Design. A shortcoming of the current model and dataset is not considering multiple labels for one natural language expression. Thus, the task should be changed to a multi-label classification task supported by a corresponding dataset, which could be created by manually adapting the current annotations.

Moreover, instead of relying on additional explanations to identify constituents of image-schematic language, one could try to approach image schema extraction as a token-level classification task, in which a label is not attributed to a full sentence but to each word or continuation of a word in a sentence individually. The classifier's output would then directly indicate which words of a sentence are used in an image-schematic way. However, one has to be careful not to treat words, especially prepositions, in isolation from their context. For instance, the word on often indicates spatial language, as in the phrase on the path to, but it can also be used in non-spatial contexts, e.g., the book on biology. When creating labels on a token level, words need to be carefully and consistently annotated with image schemas, ideally following very explicit and clear annotation guidelines.
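Purely as an illustration of this alternative task design (not implemented in this work), word-level tags could be aligned to XLM-R's subword positions as sketched below; the tag set and the example annotation are hypothetical.

```python
# Hypothetical sketch of the token-level variant discussed above; the tag
# set, example annotation, and alignment scheme are illustrative only.
import torch
from transformers import (XLMRobertaForTokenClassification,
                          XLMRobertaTokenizerFast)

tags = ["O", "CONTAINMENT", "SOURCE-PATH-GOAL", "VERTICALITY"]
tokenizer = XLMRobertaTokenizerFast.from_pretrained("xlm-roberta-base")
model = XLMRobertaForTokenClassification.from_pretrained(
    "xlm-roberta-base", num_labels=len(tags))

# Word-level annotation: only the image-schematic words receive a tag.
words = ["He", "is", "on", "the", "path", "to", "success"]
word_tags = ["O", "O", "SOURCE-PATH-GOAL", "SOURCE-PATH-GOAL",
             "SOURCE-PATH-GOAL", "SOURCE-PATH-GOAL", "O"]

enc = tokenizer(words, is_split_into_words=True, return_tensors="pt")
# Copy each word's tag onto all of its subword pieces; -100 marks special
# tokens that the loss function ignores.
token_labels = [-100 if i is None else tags.index(word_tags[i])
                for i in enc.word_ids()]
loss = model(**enc, labels=torch.tensor([token_labels])).loss
```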
Figure 4: Learning curve computed in decimal intervals from 10% to 100% of the training data, showing the average score and 95% confidence interval of three trained models

Dataset Improvements. Moreover, the dataset is missing data for some common image schemas, e.g., SUPPORT or BALANCE. In general, more datapoints, especially for CONTACT and PART-WHOLE as the two classes with the fewest datapoints and the lowest class F1 scores, would likely lead to an increase in the model's performance. This is also indicated by the learning curve in Figure 4, which still shows an increasing weighted F1 score given a higher number of overall training samples. For the model to function in the wild, it additionally requires training samples which are labeled as non-image-schematic language, as it will otherwise label every sentence as image-schematic. Furthermore, LIME revealed certain samples where the model made the correct decision based on relevant features, but the gold standard had erroneous labels, which led to some corrections made to the dataset.
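The curve itself can be produced along the following lines, where `train_and_evaluate` is a hypothetical wrapper around the training and scoring procedure of Section 5.

```python
# Sketch of the learning-curve experiment behind Figure 4: for each decile
# of the training data, train three models and aggregate the weighted F1.
# `train_and_evaluate` is a hypothetical wrapper around Section 5's loop.
import numpy as np

curve = []
for fraction in np.arange(0.1, 1.01, 0.1):
    n = int(fraction * len(X_train))
    scores = [train_and_evaluate(X_train[:n], y_train[:n], seed=s)
              for s in range(3)]
    curve.append((fraction, np.mean(scores), np.std(scores)))
```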
Global Explanations. To gain first insights into the global behavior of the model, we introduced a simple algorithm for averaging LIME results over multiple samples. However, changing the procedure to rank words by taking into account how often they indicate a specific image schema class would also allow us to gather information about which parts of speech are most commonly used in natural language expressions of a specific image schema. Such improved forms of global aggregations of local explanations were, for instance, designed and evaluated in the form of the Submodular Pick algorithm proposed by Ribeiro et al. (2016) or the Global Average and Global Homogeneity-Weighted Importance proposed by van der Linden et al. (2019), which we plan to implement and test in the context of image schema extraction in the future.

8 Conclusion

We introduce a novel approach to perform image schema extraction from natural language based on multilingual, pretrained neural language models. Thereby, a supervised training procedure can be implemented by finetuning the pretrained model with only a few training samples, without making any prior assumptions about word classes. The model shows a strong cross-validated performance in English and German, and even shows the ability to generalize to languages unseen during finetuning. Explanations generated by the explainable AI approach reveal insights and shortcomings regarding the model's behavior as well as the dataset annotation. To further improve the differentiation between image-schematic classes, a more equal distribution of training data would be beneficial. In terms of future work, we intend to add non-image-schematic samples to further enable the trained classifier to distinguish image-schematic from non-image-schematic expressions. In addition, the task should be devised as a multi-label classification task to account for the frequent phenomenon of image schema collocations. Lastly, we would like to improve the aggregation of local explanations and utilize it in order to systematically analyse image-schematic language in a text corpus.

Acknowledgements

We would like to thank Chao Xu from Shandong University for his incredible help with compiling the Chinese Mandarin test set for this study.

References

Christa Baldauf. 1997. Metapher und Kognition: Grundlagen einer neuen Theorie der Alltagsmetapher. Lang.

B. Bennett and C. Cialone. 2014. Corpus guided sense cluster analysis: A methodology for ontology development (with examples from the spatial domain). In 8th International Conference on Formal Ontology in Information Systems (FOIS), volume 267 of Frontiers in Artificial Intelligence and Applications, pages 213–226. IOS Press.

Soonja Choi and Melissa Bowerman. 1991. Learning to express motion events in English and Korean: The influence of language-specific lexicalization patterns. Cognition, 41(1-3):83–121.

Alan Cienki. 2008. Image schemas and gesture. In From perception to meaning, pages 421–442. De Gruyter Mouton.

Alexis Conneau, Kartikay Khandelwal, Naman Goyal, Vishrav Chaudhary, Guillaume Wenzek, Francisco Guzmán, Edouard Grave, Myle Ott, Luke Zettlemoyer, and Veselin Stoyanov. 2020. Unsupervised cross-lingual representation learning at scale. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 8440–8451, Online. Association for Computational Linguistics.

Verna Dankers, Karan Malhotra, Gaurav Kudva, Volodymyr Medentsiy, and Ekaterina Shutova. 2020. Being neighbourly: Neural metaphor identification in discourse. In Proceedings of the Second Workshop on Figurative Language Processing, pages 227–234, Online. Association for Computational Linguistics.

Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 4171–4186, Minneapolis, Minnesota. Association for Computational Linguistics.

Ellen Dodge and George Lakoff. 2005. Image schemas: From linguistic analysis to neural grounding. In Beate Hampe and Joseph E. Grady, editors, From perception to meaning: Image schemas in cognitive linguistics, pages 57–91. Mouton de Gruyter, Berlin.

Ellen K. Dodge, Jisup Hong, and Elise Stickles. 2015. MetaNet: Deep semantic automatic metaphor analysis. In Proceedings of the Third Workshop on Metaphor in NLP, pages 40–49.

Edith Durand, Pierre Berroir, and Ana Ines Ansaldo. 2018. The neural and behavioral correlates of anomia recovery following POEM – personalized observation, execution, and mental imagery therapy: A proof of concept. Neural Plasticity.
Margaret H. Freeman. 2002. Momentary stays, exploding forces: A cognitive linguistic approach to the poetics of Emily Dickinson and Robert Frost. Journal of English Linguistics, 30(1):73–90.

Raymond W. Gibbs and Herbert L. Colston. 1995. The cognitive psychological reality of image schemas and their transformations. Cognitive Linguistics, 6:347–378.

Raymond W. Gibbs Jr, Dinara A. Beitel, Michael Harrington, and Paul E. Sanders. 1994. Taking a stand on the meanings of stand: Bodily experience as motivation for polysemy. Journal of Semantics, 11(4):231–251.

Dagmar Gromann and Maria M. Hedblom. 2017. Kinesthetic mind reader: A method to identify image schemas in natural language. In Proceedings of Advancements in Cognitive Systems.

Jörn Hurtienne. 2017. Image schema database (ISCAT).

Olaf Jäkel. 2003. Wie Metaphern Wissen schaffen: Die kognitive Metapherntheorie und ihre Anwendung in Modell-Analysen der Diskursbereiche Geistestätigkeit, Wirtschaft, Wissenschaft und Religion. Kovač, Hamburg.

Mark Johnson. 1987. The Body in the Mind: The Bodily Basis of Meaning, Imagination, and Reasoning. The University of Chicago Press.

Michael Kimmel. 2009. Analyzing image schemas in literature. Cognitive Semiotics, 5(1-2):159–188.

Parisa Kordjamshidi, Martijn Van Otterlo, and Marie-Francine Moens. 2011. Spatial role labeling: Towards extraction of spatial relations from natural language. ACM Transactions on Speech and Language Processing (TSLP), 8(3):4.

Zoltán Kövecses. 2010. Metaphor: A Practical Introduction. Oxford University Press, USA.

George Lakoff. 1987. Women, Fire, and Dangerous Things: What Categories Reveal about the Mind. The University of Chicago Press.

George Lakoff, Jane Espenson, and Alan Schwartz. 1991. Master metaphor list. Second draft copy. University of California, Berkeley.

George Lakoff and Mark Johnson. 1980. Metaphors We Live By. University of Chicago Press.

Laura Lakusta and Barbara Landau. 2005. Starting at the end: The importance of goals in spatial language. Cognition, 96(1):1–33.

Ronald W. Langacker. 1987. Foundations of Cognitive Grammar: Theoretical Prerequisites, volume 1. Stanford University Press.

Chee Wee Leong, Beata Beigman Klebanov, Chris Hamill, Egon Stemle, Rutuja Ubale, and Xianyang Chen. 2020. A report on the 2020 VUA and TOEFL metaphor detection shared task. In Proceedings of the Second Workshop on Figurative Language Processing, pages 18–29.

Andrew Ng, Michael Jordan, and Yair Weiss. 2001. On spectral clustering: Analysis and an algorithm. In Advances in Neural Information Processing Systems, volume 14. MIT Press.

Anna Papafragou, Christine Massey, and Lila Gleitman. 2006. When English proposes what Greek presupposes: The cross-linguistic encoding of motion events. Cognition, 98(3):B75–B87.

F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. 2011. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830.

Jeffrey Pennington, Richard Socher, and Christopher Manning. 2014. GloVe: Global vectors for word representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1532–1543, Doha, Qatar. Association for Computational Linguistics.

Peng Qi, Yuhao Zhang, Yuhui Zhang, Jason Bolton, and Christopher D. Manning. 2020. Stanza: A Python natural language processing toolkit for many human languages. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: System Demonstrations.

Nitin Ramrakhiyani, Girish Palshikar, and Vasudeva Varma. 2019. A simple neural approach to spatial role labelling. In Advances in Information Retrieval, pages 102–108, Cham. Springer International Publishing.

Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. 2016. "Why should I trust you?": Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 1135–1144.

Leonard Talmy. 2005. The fundamental system of spatial schemas in language. In Beate Hampe and Joseph E. Grady, editors, From perception to meaning: Image schemas in cognitive linguistics, volume 29 of Cognitive Linguistics Research, pages 199–234. Walter de Gruyter.

Jon Tolaas. 1991. Notes on the origin of some spatialization metaphors. Metaphor and Symbol, 6(3):203–218.

Ilse van der Linden, Hinda Haned, and Evangelos Kanoulas. 2019. Global aggregations of local explanations for black box models. arXiv e-prints, arXiv:1907.03039.
Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Proceedings of the 31st International Conference on Neural Information Processing Systems, NIPS'17, pages 6000–6010, Red Hook, NY, USA. Curran Associates Inc.

Lennart Wachowiak. 2020. Semi-automatic extraction of image schemas from natural language. In Proceedings of the MEi:CogSci Conference 2020, page 105. Comenius University, Bratislava.

Ning Yu. 1995. Metaphorical expressions of anger and happiness in English and Chinese. Metaphor and Symbol, 10(2):59–92.