Dependency Grammar: Classification and Exploration
Ralph Debusmann and Marco Kuhlmann
1.1 Introduction
The basic assumptions behind the notion of dependency are summarized in the fol-
lowing sentences from the seminal work of Tesnière [51]:
The sentence is an organized whole; its constituent parts are the words. Every word that
functions as part of a sentence is no longer isolated as in the dictionary: the mind perceives
connections between the word and its neighbours; the totality of these connections forms
the scaffolding of the sentence. The structural connections establish relations of dependency
among the words. Each such connection in principle links a superior term and an inferior
term. The superior term receives the name governor (régissant); the inferior term receives
the name dependent (subordonné). (ch. 1, §§ 2–4; ch. 2, §§ 1–2)
We represent (occurrences of) words by circles, and dependencies among them by arrows: the source of an arrow marks the governor of the corresponding dependency,
the target marks the dependent. Furthermore, following Hays [21], we use dotted
lines to indicate the left-to-right ordering of the words in the sentence.
Fig. 1.2: Three dependency structures: (a) D1 over the words r a a b b, (b) D2 over r a b a b, (c) D3 over r a b b a
A dependency structure is well-nested if no two disjoint subtrees interleave: there must not be nodes i1, i2 in the first subtree and nodes j1, j2 in the second such that i1 < j1 < i2 < j2. The dependency structure depicted in Fig. 1.2c is well-nested, while the structure depicted in Fig. 1.2b is not.
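The two measures can be made concrete in a few lines of code. The following Python sketch (our own illustration, not part of the chapter) computes the block-degree of a dependency structure, encoded as a map from each word position to its governor, and tests well-nestedness by checking whether any two disjoint yields interleave; the toy structure at the end is hypothetical.

from itertools import combinations

def yields(heads):
    # heads maps each position (1-based) to its governor; the root maps to 0.
    # Returns, for every node, the set of positions in its subtree (its yield).
    children = {}
    for dep, gov in heads.items():
        children.setdefault(gov, []).append(dep)
    def collect(v):
        y = {v}
        for c in children.get(v, []):
            y |= collect(c)
        return y
    return {v: collect(v) for v in heads}

def block_degree(heads):
    # Maximal number of contiguous blocks into which a yield falls; 1 = projective.
    worst = 1
    for y in yields(heads).values():
        pos = sorted(y)
        worst = max(worst, 1 + sum(1 for a, b in zip(pos, pos[1:]) if b > a + 1))
    return worst

def is_well_nested(heads):
    # No two disjoint yields may interleave, i.e. there are no positions
    # i1 < j1 < i2 < j2 with i1, i2 in one yield and j1, j2 in the other.
    def interleave(a, b):
        labels = [lab for _, lab in sorted([(p, 0) for p in a] + [(p, 1) for p in b])]
        collapsed = [lab for i, lab in enumerate(labels) if i == 0 or labels[i - 1] != lab]
        return len(collapsed) >= 4
    ys = list(yields(heads).values())
    return not any(not (a & b) and interleave(a, b) for a, b in combinations(ys, 2))

# Hypothetical toy structure: 3 is the root, 4 depends on 3, but the dependent 2
# of node 4 sits to the left of 3, so the yield of 4 falls into two blocks.
example = {1: 3, 2: 4, 3: 0, 4: 3}
print(block_degree(example), is_well_nested(example))   # 2 True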
To investigate the practical relevance of the three structural constraints, we did
an empirical evaluation on two versions of the Prague Dependency Treebank [33]
(Table 1.1). This evaluation shows that while projectivity is too strong a constraint
on dependency structures (it excludes almost 23% of the analyses in both versions
of the treebank), already a small step beyond projectivity covers virtually all of
the data. In particular, even the rather restricted class of well-nested dependency
structures with block-degree at most 2 has a coverage of almost 99.5%.
Table 1.1: Structural properties of dependency structures in the Prague Dependency Treebank
One of the fundamental questions that we can ask about a grammar formalism is whether it adequately models natural language. We can answer this question by studying the generative capacity of the formalism: when we interpret grammars as generators of sets of linguistic structures (such as strings, parse trees, or predicate-argument structures), then we can call a grammar adequate if it generates exactly those structures that we consider relevant for the description of natural language.
Grammars may be adequate with respect to one type of expression, but inadequate
with respect to another. Here we are interested in the generative capacity of gram-
mars when we interpret them as generators for sets of dependency structures:
Which grammars generate which sets of dependency structures?
An answer to this question is interesting for at least two reasons. First, dependency
structures make an attractive measure of the generative capacity of a grammar: they
are more informative than strings, but less formalism-specific and arguably closer to
a semantic representation than parse trees. Second, an answer to the question allows
us to tap the rich resource of formal results about generative grammar formalisms
and to transfer them to the work on dependency grammar. Specifically, it enables
us to import the expertise in developing parsing algorithms for lexicalized grammar
formalisms. This can help us identify the polynomial fragments of non-projective
dependency parsing.
Fig. 1.3: A context-free grammar (with productions such as SUBJ → Dan, OBJ → MOD parsnips, and MOD → fresh) and a parse tree generated by this grammar
The left half of Fig. 1.4 shows the derivation tree for the parse tree from our example.
If the underlying grammar is lexicalized, then there is a one-to-one correspondence
between the nodes in the derivation tree and the positions in the derived string: each
occurrence of a production participating in the derivation contributes exactly one
terminal symbol to this string. If we order the nodes of the derivation tree according
to the string positions of their corresponding terminal symbols, we get a dependency
tree. For our example, this procedure results in the tree depicted in Fig. 1.1. We say
that this dependency structure is induced by the derivation d.
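The induction step can be sketched in a few lines of Python (ours; the DerivNode class, the verb ate and the position numbers are illustrative assumptions, since the example sentence itself appears only in the figures):

class DerivNode:
    # One node of a derivation tree of a lexicalized grammar: the string
    # position of its lexical anchor, the anchor word, and its children.
    def __init__(self, anchor_pos, word, children=()):
        self.anchor_pos = anchor_pos
        self.word = word
        self.children = list(children)

def induce_dependencies(root):
    # Read off the induced dependency tree: every derivation step governs the
    # steps below it, and nodes are identified with their anchor positions.
    edges = []
    def walk(node):
        for child in node.children:
            edges.append((node.anchor_pos, child.anchor_pos, child.word))
            walk(child)
    walk(root)
    return sorted(edges)

# Toy derivation for "Dan ate fresh parsnips" (positions 1..4; the verb is our
# own guess at the sentence behind Fig. 1.3):
d = DerivNode(2, "ate", [
        DerivNode(1, "Dan"),
        DerivNode(4, "parsnips", [DerivNode(3, "fresh")]),
    ])
print(induce_dependencies(d))   # [(2, 1, 'Dan'), (2, 4, 'parsnips'), (4, 3, 'fresh')]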
Not all practically relevant dependency structures can be induced by derivations
in lexicalized context-free grammars. A famous counterexample is provided by the
verb-argument dependencies in German and Dutch subordinate clauses: context-
free grammar can only characterize the ‘nested’ dependencies of German, but not
the ‘cross-serial’ assignments of Dutch. This observation goes along with argu-
ments [25, 49] that certain constructions in Swiss German require grammar for-
malisms that adequately model these constructions to generate the so-called copy
language, which is beyond even the string-generative capacity of CFGs. If we ac-
cept this analysis, then we must conclude that context-free grammars are not ade-
quate for the description of natural language, and that we should look out for more
powerful formalisms. This conclusion is widely accepted today. Unfortunately, the
first class in Chomsky’s hierarchy of formal languages that does contain the copy
language, the class of context-sensitive languages, also contains many languages
that are considered to be beyond human capacity. Also, while CFGs can be parsed
in polynomial time, parsing of context-sensitive grammars is PSPACE-complete. In
search of a class of grammars that extends context-free grammar by the minimal
amount of generative power that is needed to account for natural language, sev-
eral so-called mildly context-sensitive grammar formalisms have been developed;
perhaps the best-known among these is Tree Adjoining Grammar (TAG) [27]. The
class of string languages generated by TAGs contains the copy language, but unlike
context-sensitive grammars, TAGs can be parsed in polynomial time. More impor-
tant to us than their increased string-generative capacity however is their stronger
power with respect to dependency representations: derivations in (lexicalized) TAGs
can induce the ‘cross-serial’ dependencies of Dutch [26]. The principal goal of our
classificatory work is to make the relations between grammars and the dependency
structures that they can induce precise.
We can now lift our results from individual dependency structures to sets of such
structures. The key to this transfer is the concept of regular sets of dependency struc-
tures [31], which we define as the recognizable subsets of dependency algebras in
the sense of Mezei and Wright [37]. Based on the isomorphism between dependency
algebras and term algebras, we obtain a natural grammar formalism for dependency
structures from the concept of a regular term grammar.
Definition 1. A regular dependency grammar is a construct G = (N, Σ, S, P), where N is a ranked alphabet of non-terminal symbols, Σ is a finite set of order annotations, S ∈ N1 is a distinguished start symbol, and P is a finite set of productions of the form A → t, where A ∈ Nk is a non-terminal symbol and t ∈ TΣ,k is a well-formed term over Σ of sort k, for some k ∈ N.
To illustrate the definition, we give two examples of regular dependency grammars.
The sets of dependency structures generated by these grammars mimic the verb-
argument relations found in German and Dutch subordinate clauses, respectively:
grammar G1 generates structures with nested dependencies, grammar G2 generates
structures with crossing dependencies. We only give the two sets of productions.
(The productions use order annotations such as ⟨0⟩, ⟨10⟩ and ⟨120⟩ for G1, and ⟨0⟩, ⟨1, 0⟩ and ⟨1202⟩ for G2; see Fig. 1.6.)
Fig. 1.6: Terms and structures generated by two regular dependency grammars
Figure 1.6 shows terms generated by these grammars, and the corresponding depen-
dency structures.
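The following Python sketch (our own reconstruction, using the reading of order annotations that Sect. 1.7 spells out for rule tuples such as ⟨01, 21⟩) evaluates such a term bottom-up into a dependency structure; the concrete term mimics the nested German-style order and is modeled on the left-hand term of Fig. 1.6:

def evaluate(term, next_id=None):
    # Interpret a term over order annotations as a dependency structure.
    # A term is (annotation, subterms); an annotation is a tuple of strings,
    # one per block of the resulting yield: '0' stands for the head, digit i
    # for the next block contributed by the i-th subterm.
    if next_id is None:
        next_id = iter(range(1, 1000))
    annotation, subterms = term
    head = next(next_id)
    subresults = [evaluate(t, next_id) for t in subterms]
    edges = [(head, sub_head) for _, sub_head, _ in subresults]
    for _, _, sub_edges in subresults:
        edges += sub_edges
    pending = [list(blocks) for blocks, _, _ in subresults]
    blocks = []
    for spec in annotation:
        block = []
        for symbol in spec:
            block += [head] if symbol == '0' else pending[int(symbol) - 1].pop(0)
        blocks.append(block)
    return blocks, head, edges

# A term in the spirit of the nested-dependency grammar: <120>( <0>, <10>( <0> ) )
nested = (('120',), [(('0',), []), (('10',), [(('0',), [])])])
blocks, root, edges = evaluate(nested)
print(blocks[0])   # [2, 4, 3, 1] -- noun noun verb verb, the nested order
print(edges)       # [(1, 2), (1, 3), (3, 4)]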
The sets of dependency structures generated by regular dependency grammars
have all the characteristic properties of mildly context-sensitive languages. Further-
more, it turns out that the structural constraints that we have discussed above have
direct implications for their string-generative capacity and parsing complexity. First,
the block-degree measure gives rise to an infinite hierarchy of ever more powerful
string languages; adding the well-nestedness constraint leads to a proper decrease of
string-generative power on nearly all levels of this hierarchy [32]. Certain string lan-
guages enforce structural properties in the dependency languages that project them:
For every natural number k, the language COUNT(k) := { a_1^n b_1^n · · · a_k^n b_k^n | n ∈ N } requires every regular set of dependency structures that projects it to contain structures with a block-degree of at least k. Similarly, the language
RESP(k) := { a_1^m b_1^m c_1^n d_1^n · · · a_k^m b_k^m c_k^n d_k^n | m, n ∈ N }
requires every regular set of dependency structures with block-degree at most k that
projects it to contain structures that are not well-nested. Second, while the parsing
problem of regular dependency languages is polynomial in the length of the input
string, the problem in which we take the grammar to be part of the input is still
NP-complete. Interestingly, for well-nested dependency languages, parsing is poly-
nomial even with the size of the grammar taken into account [30].
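For illustration, a one-function Python sketch (ours, assuming the definition of RESP(k) given above) that spells out a member string of RESP(2):

def resp(k, m, n):
    # One string of RESP(k) = { a1^m b1^m c1^n d1^n ... ak^m bk^m ck^n dk^n },
    # written with indexed symbols such as 'a1'.
    return [sym + str(i) for i in range(1, k + 1)
                         for sym, count in (("a", m), ("b", m), ("c", n), ("d", n))
                         for _ in range(count)]

print(resp(2, m=2, n=1))
# ['a1', 'a1', 'b1', 'b1', 'c1', 'd1', 'a2', 'a2', 'b2', 'b2', 'c2', 'd2']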
To give an intuition, let us start with an example multigraph depicted in Fig. 1.7. The
multigraph has three dimensions called DEP (for dependency tree), QS (for quantifier
scope analysis) and DEP/QS (for the DEP/QS syntax-semantics interface). It is not necessary
to fully understand what we intend to model with these dimensions; they just serve
as an illustrative example, and are elucidated in more detail in Sect. 1.6 below.
In an XDG multigraph, each dimension is a dependency graph made up of a set of
nodes associated with indices, words and node attributes. The indices and words are
shared across all dimensions. For instance, the second node on the DEP dimension
is associated with the index 2, the word loves, and the node attributes in, out and
order. On the DEP / QS dimension, the node has the same index and word and the
node attribute dom. Node attributes always denote sets of tuples over finite domains
of atoms; their typical use is to model finite relations like functions and orders. The
nodes are connected by labeled edges. On the QS dimension for example, there is an
edge from node 3 to node 1 labeled sc, and another one from node 1 to node 2, also
labeled sc.
In the example, the DEP dimension states that everybody is the subject of loves,
and somebody the object. The in and out attributes represent the licensed incoming
and outgoing edges. For example, node 2 must not have any incoming edges, and it
must have one outgoing edge labeled subj and one labeled obj. The order attribute
represents a total order among the head (↑) and its dependents: the subj dependents
must precede the head, and the head must precede the obj dependents.
The QS dimension is an analysis of the scopal relationships of the quantifiers in
the sentence. It models the reading where somebody takes scope over everybody,
which in turn takes scope over loves. The DEP / QS analysis represents the syntax-
semantics interface between DEP and QS. The attribute dom is a set of those depen-
dents on the DEP dimension that must dominate the head on the QS dimension. For
example, the subj and obj dependents of node 2 on DEP must dominate 2 on QS.
Fig. 1.7: XDG multigraph for the sentence Everybody loves somebody, with the three dimensions DEP (top), QS (middle) and DEP/QS (bottom)
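A minimal way to picture such a multigraph is as nested dictionaries; the following encoding (our own, not the XDK's internal representation; "^" stands for the head marker ↑) spells out the example of Fig. 1.7:

multigraph = {
    "words": {1: "everybody", 2: "loves", 3: "somebody"},
    "DEP": {
        "edges": [(2, 1, "subj"), (2, 3, "obj")],
        "attrs": {
            1: {"in": {("subj", "!")}, "out": set(), "order": set()},
            2: {"in": set(), "out": {("subj", "!"), ("obj", "!")},
                "order": {("subj", "^"), ("subj", "obj"), ("^", "obj")}},
            3: {"in": {("obj", "!")}, "out": set(), "order": set()},
        },
    },
    "QS": {
        "edges": [(3, 1, "sc"), (1, 2, "sc")],   # somebody > everybody > loves
        "attrs": {},
    },
    "DEP/QS": {
        "edges": [],
        "attrs": {2: {"dom": {"subj", "obj"}}},
    },
}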
1.4.2 Grammars
XDG is a model-theoretic framework: a grammar first delineates the set of all candidate structures, and then eliminates all structures that are not well-formed according to a set of constraints. The remaining structures are the models of the
grammar. This contrasts with approaches such as the regular dependency grammars
of Sect. 1.3.3, where the models are generated using a set of productions.
An XDG grammar G = (MT, lex, P) has three components: a multigraph type MT,
a lexicon lex, and a set of principles P. The multigraph type specifies the dimensions,
words, edge labels and node attributes, and thus delineates the set of candidate struc-
tures of the grammar. The lexicon is a function from the words of the grammar to
sets of lexical entries, which determine the node attributes of the nodes with that
word. The principles are a set of formulas in first-order logic constituting the con-
straints of the grammar. Principles can talk about precedence, edges, dominances (the transitive closure¹ of the edge relation), the words associated with the nodes, and the
node attributes. Here is an example principle forbidding cycles on dimension DEP.
It states that no node may dominate itself:
∀v : ¬(v →+_DEP v)    (1.1)
The second example principle stipulates a constraint for all edges from v to v′ labeled l on dimension DEP: if l is in the set denoted by the lexical attribute dom of v on DEP/QS, then v′ must dominate v on QS:
¹ Transitive closures cannot be expressed in first-order logic. As the only transitive closure that we need in practice is that of the edge relation, we have decided to encode it in the multigraph model and thus stay within first-order logic [10].
∀v : ∀v′ : ∀l : (v −l→_DEP v′ ∧ l ∈ dom_DEP/QS(v)) ⇒ v′ →+_QS v    (1.2)
Observe that the principle is indeed satisfied in Fig. 1.7: the attribute dom for node
2 on DEP / QS includes subj and obj, and both the subj and the obj dependents of node
2 on DEP dominate node 2 on QS.
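The check can be replayed mechanically. The following sketch (ours) computes dominance as the transitive closure of the edge relation and verifies principles (1.1) and (1.2) on the example:

from itertools import product

# Edges and the dom attribute of the Fig. 1.7 multigraph (same toy encoding as above).
dep_edges = [(2, 1, "subj"), (2, 3, "obj")]
qs_edges = [(3, 1, "sc"), (1, 2, "sc")]
dom_attr = {2: {"subj", "obj"}}          # dom on DEP/QS; other nodes: empty
nodes = {1, 2, 3}

def dominates(edges):
    # Transitive closure of the edge relation (labels ignored).
    reach = {(v, w) for v, w, _ in edges}
    changed = True
    while changed:
        new = {(a, d) for (a, b), (c, d) in product(reach, reach) if b == c}
        changed = not new <= reach
        reach |= new
    return reach

dep_dom, qs_dom = dominates(dep_edges), dominates(qs_edges)

# Principle (1.1): no node on DEP dominates itself.
assert all((v, v) not in dep_dom for v in nodes)

# Principle (1.2): if an edge v -l-> v' on DEP carries a label in dom(v),
# then v' must dominate v on QS.
assert all((w, v) in qs_dom
           for v, w, l in dep_edges if l in dom_attr.get(v, set()))
print("principles (1.1) and (1.2) hold for the example")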
A multigraph is a model of a grammar G = (MT, lex, P) iff it is one of the candi-
date structures delineated by MT, it selects precisely one lexical entry from lex for
each node, and it satisfies all principles in P.
The string language L(G) of a grammar G is the set of yields of its models. The recognition problem is the question: given a grammar G and a string s, is s in L(G)?
We have investigated the complexity of three kinds of recognition problems [9]: the universal recognition problem, where both G and s are variable, is PSPACE-complete; the fixed recognition problem, where G is fixed and s is variable, is NP-complete; and the instance recognition problem, where the principles are fixed and the lexicon and s are variable, is also NP-complete. XDG parsing is NP-complete as well.
XDG is at least as expressive as CFG [8]. We have proven that the string languages
of XDG grammars are closed under union and intersection [10]. In Sect. 1.7, we
give a constructive proof that XDG is at least as expressive as the class of regular
dependency grammars introduced in Sect. 1.3.3. Through an encoding of LCFRS into regular dependency grammars, this entails that XDG is at least as expressive as LCFRS. As XDG is able to model scrambling (see Sect. 1.5.2), which LCFRS cannot [3], it is indeed more expressive than LCFRS.
The first application for the multi-dimensionality of XDG in CHORUS is the design
of a new, elegant model of complex word order phenomena such as scrambling.
1.5.1 Scrambling
In German, the word order in subordinate sentences is such that all verbs are posi-
tioned at the right end in the so-called verb cluster, and are preceded by all the non-
verbal dependents in the so-called Mittelfeld. Whereas the mutual order of the verbs
is fixed, that of the non-verbal dependents in the Mittelfeld is totally free.² This
leads to the phenomenon of scrambling. We show an example in Fig. 1.8, where the
subscripts indicate the dependencies between the verbs and their arguments.
In the dependency analysis in Fig. 1.9 (top), we can see that scrambling gives
rise to non-projectivity. In fact, scrambling even gives rise to an unbounded block-
² These are of course simplifications: the order of the verbs can be subject to alternations such as Oberfeldumstellung, and although all linearizations of the non-verbal dependents are grammatical, some of them are clearly marked.
degree (see Sect. 1.2), which means that it can neither be modeled by LCFRS, nor
by regular dependency grammars.
Fig. 1.9: Dependency analysis (top) and topological analysis (bottom) of the scrambling example
As we have proven [8], scrambling can be modeled in XDG. But how? There is
no straightforward way of articulating appropriate word order constraints on the
DEP dimension directly. At this point, we can make use of the multi-dimensionality
of XDG. The idea is to keep the dependency analysis on the DEP dimension as it
is, and move all ordering constraints to an additional dimension called TOP. The
models on TOP are projective trees which represent the topological structure of the
sentence as in Topological Dependency Grammar (TDG) [15]. A TOP analysis of
the example sentence is depicted in Fig. 1.9 (bottom). Here, the non-verbal depen-
dents Nilpferde, Maria and Hans are dependents of the finite verb soll labeled mf
for “Mittelfeld”. The verbal dependent of soll, helfen, and that of helfen, füttern, are
labeled vcf for “verb cluster field”. With this additional dimension, articulating the
appropriate word order constraints is straightforward: all mf dependents of the finite
verb must precede its vcf dependents, and the mutual order of the mf dependents is
unconstrained.
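Stated operationally, this constraint is easy to check; the sketch below (ours, with a guessed linearization: the three nouns at positions 1-3, the verb cluster füttern, helfen, soll at positions 4-6) tests it on a TOP tree given as labeled edges:

def top_order_ok(top_edges):
    # top_edges: (head position, dependent position, label) triples.
    # Every mf dependent of a head must precede every vcf dependent of that head;
    # nothing is said about the order among the mf dependents themselves.
    return all(d_mf < d_vcf
               for h1, d_mf, l1 in top_edges if l1 == "mf"
               for h2, d_vcf, l2 in top_edges if l2 == "vcf" and h2 == h1)

# Toy TOP tree: nouns at positions 1-3 in the Mittelfeld, verb cluster
# fuettern (4), helfen (5), soll (6); soll is the root.
top = [(6, 1, "mf"), (6, 2, "mf"), (6, 3, "mf"), (6, 5, "vcf"), (5, 4, "vcf")]
print(top_order_ok(top))   # True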
The relation between the DEP and TOP dimensions is such that the trees on TOP
are a flattening of the corresponding trees on DEP. We can express this in XDG by
requiring that the dominance relation on TOP is a subset of the dominance relation
on DEP:
∀v : ∀v′ : v →+_TOP v′ ⇒ v →+_DEP v′
This principle is called the climbing principle [15], and gets its name from the ob-
servation that the non-verbal dependents seem to “climb up” from their position on
DEP to a higher position on TOP. For example, in Fig. 1.9, the noun Nilpferde is a
dependent of füttern on DEP, and climbs up to become a dependent of the finite verb
soll on TOP.
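The climbing principle itself amounts to a subset test between two transitive closures, as in this sketch (ours; the DEP edges are our guess at the analysis in Fig. 1.9, using the same positions as above):

def dominance(edges):
    # Transitive closure of a set of (head, dependent) pairs.
    reach = set(edges)
    while True:
        new = {(a, d) for a, b in reach for c, d in reach if b == c} - reach
        if not new:
            return reach
        reach |= new

# Guessed DEP analysis (soll -> Hans, helfen; helfen -> Maria, fuettern;
# fuettern -> Nilpferde) and the TOP tree from the previous sketch.
dep = {(6, 3), (6, 5), (5, 2), (5, 4), (4, 1)}
top = {(6, 1), (6, 2), (6, 3), (6, 5), (5, 4)}

print(dominance(top) <= dominance(dep))   # True: the climbing principle is satisfied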
Just using the climbing principle is too permissive. For example, in German,
extraction of determiners and adjectives out of noun phrases must be ruled out,
whereas relative clauses can be extracted. To this end, we apply a principle called
the barriers principle [15], which allows each word to “block” certain dependents from
climbing up. This allows us to express that nouns block their determiner and adjec-
tive dependents from climbing up, but not their relative clause dependents.
The dominance constraint comprises three labeling literals and two dominance literals. A labeling literal such as X1 : everybody(X1′) assigns labels to node variables and constrains the daughters: X1 must have the label everybody, and it must have one daughter, viz. X1′. The dominance literals X1′ ◁∗ X2 and X3′ ◁∗ X2 stipulate that the node variables X1′ and X3′ must dominate (or be equal to) the node variable X2, expressing that the node variables corresponding to everybody and somebody must dominate loves, but that their mutual dominance relationship is unknown.
The models of dominance constraints are trees called configurations. The ex-
ample dominance constraint (1.3) represents the two configurations displayed in
Fig. 1.10 (a) (strong reading) and (b) (weak reading).
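Since the two literals amount to requiring that both quantifiers dominate loves, checking a candidate configuration is straightforward; the sketch below (ours) does so for both scope trees:

def dominates(parent, x, y):
    # True iff x dominates or equals y in the tree encoded as a child -> parent map.
    while y is not None:
        if x == y:
            return True
        y = parent.get(y)
    return False

def satisfies_constraint(parent):
    # Nodes: 1 = everybody, 2 = loves, 3 = somebody.  The two dominance literals
    # boil down to: both quantifiers must dominate loves.
    return dominates(parent, 1, 2) and dominates(parent, 3, 2)

strong = {1: 3, 2: 1}   # somebody > everybody > loves (the reading modeled in Fig. 1.7)
weak = {3: 1, 2: 3}     # everybody > somebody > loves
print(satisfies_constraint(strong), satisfies_constraint(weak))   # True True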
Fig. 1.10: (a) Configuration representing the strong reading, (b) the weak reading of Everybody
loves somebody, (c) corresponding XDG dependency tree for the strong reading, (d) weak reading.
The set of QS tree structures which satisfy this principle corresponds precisely to the
set of configurations of the dominance constraint in (1.3), i.e., the two dependency
trees in Fig. 1.10 (c) and (d).
³ In our simple example, the labeling literals have at most one daughter. In a more realistic setting
[8], we distinguish the daughters of labeling literals with more than one daughter using distinct
edge labels.
Fig. 1.11: XDG dependency tree (top) and XDG block graph (bottom) for the string aaabbb
In the second step, we examine the rules of REGDG. They are best explained by example. Consider the rule (1.5), which is expanded by the second a in Fig. 1.11. First, the rule stipulates that a head
with incoming edge label A associated with the word a must have two dependents:
one labeled A and one labeled B. Second, the rule stipulates the order of the yields
of the dependents and the head, where the yields are divided into contiguous sets
of nodes called blocks. In the order tuples (e.g. ⟨01, 21⟩), 0 represents the head, 1 the blocks in the yield of the first dependent (here: A), and 2 the blocks in the yield of the second dependent (here: B). The tuple ⟨01, 21⟩ from the example rule then states: the yield of the A dependent must consist of two blocks (two occurrences of 1 in the tuple) and that of the B dependent of one block (one occurrence of 2); the head must precede the first block of the A dependent, which must precede the first (and only) block of the B dependent, which must precede the second block of the A dependent; and the yield of the head must be divided into two blocks, where the gap is between the first block of the A dependent and the first (and only) block of the B dependent.
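The reading of such an order tuple can be made explicit with a small helper (ours) that assembles the blocks of a head's yield from the blocks of its dependents' yields, replayed here on node 2 of Fig. 1.11:

def assemble_blocks(order_tuple, head, dependent_blocks):
    # order_tuple: e.g. ("01", "21"); "0" is the head itself, digit i the next
    # unused block of the i-th dependent (our own reading of the rule format).
    pending = [list(blocks) for blocks in dependent_blocks]
    result = []
    for spec in order_tuple:
        block = []
        for symbol in spec:
            block += [head] if symbol == "0" else pending[int(symbol) - 1].pop(0)
        result.append(block)
    return result

# Node 2 with its A dependent (blocks [3] and [6]) and its B dependent (block [5]):
print(assemble_blocks(("01", "21"), 2, [[[3], [6]], [[5]]]))
# [[2, 3], [5, 6]] -- the two blocks of node 2's yield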
As REGDG make statements not only about dependency structures but also about the yields of the nodes, we exploit the multi-dimensionality of XDG and introduce
a second dimension called BLOCK. The structures on the BLOCK dimension are
graphs representing the function from nodes to their yields on DEP. That is, each
edge from v to v′ on BLOCK corresponds to a sequence of zero or more edges from v to v′ on DEP:
∀v : ∀v′ : v →_BLOCK v′ ⇔ v →∗_DEP v′
An edge from v to v′ labeled i on BLOCK states that v′ is in the ith block of the yield of v on DEP.
We model the requirement that the blocks are contiguous sets of nodes by a principle stipulating that for all pairs of edges with the same label l, one from v to v′ and one from v to v″, the set of nodes between v′ and v″ must also be in the yield of v:
∀v : ∀v′ : ∀v″ : ∀l : (v −l→_BLOCK v′ ∧ v −l→_BLOCK v″) ⇒ (∀v‴ : v′ < v‴ ∧ v‴ < v″ ⇒ v →∗_BLOCK v‴)
Fig. 1.11 (bottom)⁵ shows an example BLOCK graph complementing the DEP
tree in Fig. 1.11 (top). On DEP, the yield of the second a (node 2) consists of itself
and the third a (node 3) in the first block, and the second b and the third b (nodes
5 and 6) in the second block. Hence, on the BLOCK dimension, the node has four
dependents: itself and the third a are dependents labeled 1, and the second b and the
third b are dependents labeled 2.
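Deriving the BLOCK dimension from a DEP tree is then a matter of splitting each yield into maximal contiguous blocks; in the sketch below (ours), the DEP edges below node 2 follow the text, while the remaining edges are our own guess at Fig. 1.11 (top):

def yield_blocks(children, v):
    # Blocks (maximal contiguous position sets) of v's yield in a DEP tree
    # given as a head -> list-of-dependents map.
    nodes = {v}
    stack = [v]
    while stack:
        for c in children.get(stack.pop(), []):
            nodes.add(c)
            stack.append(c)
    blocks = []
    for p in sorted(nodes):
        if blocks and p == blocks[-1][-1] + 1:
            blocks[-1].append(p)
        else:
            blocks.append([p])
    return blocks

def block_edges(children):
    # Derive the BLOCK dimension: an edge v -i-> w for every w in the i-th
    # block of v's yield on DEP.
    return [(v, w, i) for v in children
                      for i, block in enumerate(yield_blocks(children, v), start=1)
                      for w in block]

dep = {1: [2, 4], 2: [3, 5], 3: [6], 4: [], 5: [], 6: []}
for v, w, i in block_edges(dep):
    if v == 2:
        print(f"2 -{i}-> {w}")
# 2 -1-> 2, 2 -1-> 3, 2 -2-> 5, 2 -2-> 6, as described in the text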
We model the rules of the REGDG in XDG in four steps. First, we lexically con-
strain the incoming and outgoing edges of the nodes on DEP. For example, to model
the example rule (1.5), we stipulate that the node associated with the word a must
have precisely one incoming edge labeled A, and one A and one B dependent, as
shown in Fig. 1.12 (a).
Fig. 1.12: (a) Constraints on DEP, (b) BLOCK, and (c) on both DEP and BLOCK
must order the 1 dependents of the A dependent to the left of the 1 dependents of the
B dependent, and these in turn to the left of the 2 dependents of the A dependent.
Fourth, we lexically model the location of the gaps between the blocks. In the
example rule (1.5), there is one gap between the first block of the A dependent and
the first (and only) block of the B dependent, as indicated in Fig. 1.12 (c).
1.8.1 Parser
In this section, we answer the question whether the constraint parser of the XDK ac-
tually scales up for large-scale parsing. We find a positive answer to this question by
showing that the parser can be fine-tuned for parsing the large-scale TAG grammar
XTAG [55], such that most of the time, it finds the first parses of a sentence before a
fast TAG chart parser with polynomial time complexity. This is surprising given that
the XDK constraint parser has exponential time complexity in the worst case.
For our experiment, we applied the most recent version of the XTAG grammar from February 2001, which has a full form lexicon of 45171 words and 1230 elementary trees. The average lexical ambiguity is 28 elementary trees per word, and the maximum lexical ambiguity is 360 (for get). Verbs are typically assigned more
than 100 elementary trees. We developed an encoding of the XTAG grammar into
XDG based on ideas from [12] and our encoding of regular dependency grammars
(Sect. 1.7), and implemented these ideas in the XDK.
We tested the XDK with this grammar on a subset of section 23 of the Penn
Treebank, where we manually replaced words not in the XTAG lexicon by appropri-
ate words from the XTAG lexicon. We compared our results with the official XTAG
parser: the LEM parser [44], a chart parser implementation with polynomial com-
plexity. For the LEM parser, we measured the time required for building up the
chart, and for the XDK parser, the time required for the first solution and the first
1000 solutions. Unlike the LEM parser, the XDK parser does not build up a
chart representation for the efficient enumeration of parses. Hence one of the most
interesting questions was how long the XDK parser would take to find not only the
first but the first 1000 parses.
We did not use the supertagger included in the LEM package, which significantly
increases its efficiency at the cost of accuracy [44]. We must also note that longer
sentences are assigned up to millions of parses by the XTAG grammar, making it
unlikely that the first 1000 parses found by the constraint parser also include the best
parses. This could be remedied with sophisticated search techniques for constraint
parsing [14, 39].
We parsed 596 sentences of section 23 of the Penn Treebank whose length ranged
from 1 to 30 on an Athlon 64 3000+ processor with 1 GByte of RAM. The average
sentence length was 12.36 words. From these 596 sentences, we first removed all
those which took longer than a timeout of 30 minutes using either the LEM or the
XDK parser. The LEM parser exceeded the timeout in 132 cases, and the XDK in 94
cases, where 52 of the timeouts were shared among both parsers. As a result, we
had to remove 174 sentences to end up with 422 sentences where neither LEM nor
the XDK had exceeded the timeout. They have an average length of 10.73 words.
The results of parsing these remaining 422 sentences are shown in Table 1.2.
Here, the second column shows the time the LEM parser required for building up the chart, and the percentage of exceeded timeouts. The third and fourth columns show the times required by the standard XDK parser (using the constraint engine of MOZART/OZ 1.3.2) for finding the first parse and the first 1000 parses, and the percentage of exceeded timeouts. The fifth and sixth columns show the times when replacing the standard MOZART/OZ constraint engine with the new, faster GECODE 2.0.0 constraint library [46], and again the percentage of exceeded timeouts.
Interestingly, despite the polynomial complexity of the LEM parser, the XDK parser not only ran into the 30-minute timeout less often, but was also faster than LEM on the remaining sentences. Using the standard MOZART/OZ constraint engine,
the XDK found the first parse 3.2 times faster, and using GECODE, 16.8 times faster.
Even finding the first 1000 parses was 1.7 (MOZART / OZ) and 7.8 (GECODE) times
faster. The gap between LEM and the XDK parser increased with increased sentence
length. Of the sentences between 16 and 30 words, the LEM parser exceeded the
timeout in 82.14% of the cases, compared to 45.54% (MOZART / OZ) and 38.39%
(GECODE). Finding the first parse of the sentences between 16 and 30 words was
8.9 times faster using MOZART / OZ, and 41.1 times faster using GECODE. The XDK
parser also found the first 1000 parses of the longer sentences faster than LEM: 5.2
times faster using MOZART / OZ and 19.8 times faster using GECODE.
Table 1.2: Parse times of the LEM parser and of the XDK parser using the MOZART/OZ and GECODE constraint engines
1.9 Conclusion
The goals of the research reported in this chapter were to classify dependency gram-
mars in terms of their generative capacity and parsing complexity, and to explore
their expressive power in the context of a practical system. To reach the first goal,
we have developed the framework of regular dependency grammars, which provides
a link between dependency structures on the one hand, and mildly context-sensitive
grammar formalisms such as TAG on the other. To reach the second goal, we have
designed a new meta grammar formalism, XDG, implemented a grammar develop-
ment environment for it, and used this to give novel accounts of linguistic phenom-
ena such as word order variation, and to develop a powerful syntax-semantics inter-
face. Taken together, our research has provided fundamental insights into both the
theoretical and the practical aspects of dependency grammars, and a more accurate
picture of their usability.
References
1. 45th Annual Meeting of the Association for Computational Linguistics (ACL) (2007)
2. Bader, R., Foeldesi, C., Pfeiffer, U., Steigner, J.: Modellierung grammatischer Phänomene der
deutschen Sprache mit Topologischer Dependenzgrammatik (2004). Softwareprojekt, Saar-
land University
3. Becker, T., Rambow, O., Niv, M.: The derivational generative power, or, scrambling is beyond
LCFRS. Tech. rep., University of Pennsylvania (1992)
4. Bodirsky, M., Kuhlmann, M., Möhl, M.: Well-nested drawings as models of syntactic struc-
ture. In: Tenth Conference on Formal Grammar and Ninth Meeting on Mathematics of Lan-
guage. Edinburgh, UK (2005)
5. Culotta, A., Sorensen, J.: Dependency tree kernels for relation extraction. In: 42nd Annual
Meeting of the Association for Computational Linguistics (ACL), pp. 423–429. Barcelona,
Spain (2004). DOI 10.3115/1218955.1219009
6. Debusmann, R.: A declarative grammar formalism for dependency grammar. Diploma thesis,
Saarland University (2001). http://www.ps.uni-sb.de/Papers/abstracts/da.html
7. Debusmann, R.: Multiword expressions as dependency subgraphs. In: Proceedings of the ACL
2004 Workshop on Multiword Expressions: Integrating Processing. Barcelona/ES (2004)
8. Debusmann, R.: Extensible dependency grammar: A modular grammar formalism based on
multigraph description. Ph.D. thesis, Universität des Saarlandes (2006)
9. Debusmann, R.: The complexity of First-Order Extensible Dependency Grammar. Tech. rep.,
Saarland University (2007)
10. Debusmann, R.: Scrambling as the intersection of relaxed context-free grammars in a model-
theoretic grammar formalism. In: ESSLLI 2007 Workshop Model Theoretic Syntax at 10.
Dublin/IE (2007)
11. Debusmann, R., Duchier, D., Koller, A., Kuhlmann, M., Smolka, G., Thater, S.: A relational
syntax-semantics interface based on dependency grammar. In: Proceedings of COLING 2004.
Geneva/CH (2004)
12. Debusmann, R., Duchier, D., Kuhlmann, M., Thater, S.: TAG as dependency grammar. In:
Proceedings of TAG+7. Vancouver/CA (2004)
13. Debusmann, R., Duchier, D., Niehren, J.: The XDG grammar development kit. In: Proceed-
ings of the MOZ04 Conference, Lecture Notes in Computer Science, vol. 3389, pp. 190–201.
Springer, Charleroi/BE (2004)
14. Dienes, P., Koller, A., Kuhlmann, M.: Statistical A* dependency parsing. In: Prospects and
Advances in the Syntax/Semantics Interface. Nancy/FR (2003)
15. Duchier, D., Debusmann, R.: Topological dependency trees: A constraint-based account of
linear precedence. In: Proceedings of ACL 2001. Toulouse/FR (2001)
16. Egg, M., Koller, A., Niehren, J.: The Constraint Language for Lambda Structures. Journal of
Logic, Language, and Information (2001)
17. Eisner, J., Satta, G.: Efficient parsing for bilexical context-free grammars and Head Automaton
Grammars. In: 37th Annual Meeting of the Association for Computational Linguistics (ACL),
pp. 457–464. College Park, MD, USA (1999). DOI 10.3115/1034678.1034748
18. Gaifman, H.: Dependency systems and phrase-structure systems. Information and Control 8,
304–337 (1965)
19. Hajič, J., Panevová, J., Hajičová, E., Sgall, P., Pajas, P., Štěpánek, J., Havelka, J., Mikulová,
M.: Prague Dependency Treebank 2.0. Linguistic Data Consortium, 2006T01 (2006)
20. Havelka, J.: Beyond projectivity: Multilingual evaluation of constraints and measures on non-
projective structures. In: 45th Annual Meeting of the Association for Computational Linguis-
tics (ACL) [1], pp. 608–615. URL http://www.aclweb.org/anthology/P/P07/P07-1077.pdf
21. Hays, D.G.: Dependency theory: A formalism and some observations. Language 40(4), 511–
525 (1964). DOI 10.2307/411934
22. Holan, T., Kuboň, V., Oliva, K., Plátek, M.: Two useful measures of word order complexity.
In: Workshop on Processing of Dependency-Based Grammars, pp. 21–29. Montréal, Canada
(1998)
23. Hotz, G., Pitsch, G.: On parsing coupled-context-free languages. Theoretical Computer Sci-
ence 161(1–2), 205–233 (1996). DOI 10.1016/0304-3975(95)00114-X
24. Hudson, R.A.: English Word Grammar. B. Blackwell, Oxford/UK (1990)
25. Huybregts, R.: The weak inadequacy of context-free phrase structure grammars. In:
G. de Haan, M. Trommelen, W. Zonneveld (eds.) Van periferie naar kern, pp. 81–99. Foris,
Dordrecht, The Netherlands (1984)
26. Joshi, A.K.: Tree Adjoining Grammars: How much context-sensitivity is required to provide
reasonable structural descriptions? In: Natural Language Parsing, pp. 206–250. Cambridge
University Press (1985)
27. Joshi, A.K., Schabes, Y.: Tree-Adjoining Grammars. In: G. Rozenberg, A. Salomaa (eds.)
Handbook of Formal Languages, vol. 3, pp. 69–123. Springer (1997)
28. Koller, A., Striegnitz, K.: Generation as dependency parsing. In: Proceedings of ACL 2002.
Philadelphia/US (2002)
29. Kruijff, G.J.M.: Dependency grammar. In: Encyclopedia of Language and Linguistics, 2nd
edn., pp. 444–450. Elsevier (2005)
30. Kuhlmann, M.: Dependency structures and lexicalized grammars. Doctoral dissertation, Saar-
land University, Saarbrücken, Germany (2007)
31. Kuhlmann, M., Möhl, M.: Mildly context-sensitive dependency languages. In: 45th Annual
Meeting of the Association for Computational Linguistics (ACL) [1], pp. 160–167. URL
http://www.aclweb.org/anthology/P07-1021
32. Kuhlmann, M., Möhl, M.: The string-generative capacity of regular dependency languages.
In: Twelfth Conference on Formal Grammar. Dublin, Ireland (2007)
33. Kuhlmann, M., Nivre, J.: Mildly non-projective dependency structures. In: 21st International
Conference on Computational Linguistics and 44th Annual Meeting of the Association for
Computational Linguistics (COLING-ACL), Main Conference Poster Sessions, pp. 507–514.
Sydney, Australia (2006). URL http://www.aclweb.org/anthology/P06-2000
34. Marcus, S.: Algebraic Linguistics: Analytical Models, Mathematics in Science and Engineer-
ing, vol. 29. Academic Press, New York, USA (1967)
35. McDonald, R., Satta, G.: On the complexity of non-projective data-driven dependency parsing.
In: Tenth International Conference on Parsing Technologies (IWPT), pp. 121–132. Prague,
Czech Republic (2007). URL http://www.aclweb.org/anthology/W/W07/W07-2216
36. Mel’čuk, I.: Dependency Syntax: Theory and Practice. State Univ. Press of New York, Al-
bany/US (1988)
37. Mezei, J.E., Wright, J.B.: Algebraic automata and context-free sets. Information and Control
11(1–2), 3–29 (1967). DOI 10.1016/S0019-9958(67)90353-1
38. Mozart Consortium: The Mozart-Oz website (2007). http://www.mozart-oz.org/
39. Narendranath, R.: Evaluation of the stochastic extension of a constraint-based dependency
parser (2004). Bachelorarbeit, Saarland University
40. Neuhaus, P., Bröker, N.: The complexity of recognition of linguistically adequate dependency
grammars. In: 35th Annual Meeting of the Association for Computational Linguistics (ACL),
pp. 337–343. Madrid, Spain (1997). DOI 10.3115/979617.979660
41. Nivre, J.: Constraints on non-projective dependency parsing. In: Eleventh Conference of
the European Chapter of the Association for Computational Linguistics (EACL), pp. 73–80.
Trento, Italy (2006)
42. Nivre, J., Hall, J., Kübler, S., McDonald, R., Nilsson, J., Riedel, S., Yuret, D.: The
CoNLL 2007 shared task on dependency parsing. In: Joint Conference on Empir-
ical Methods in Natural Language Processing and Computational Natural Language
Learning (EMNLP-CoNLL), pp. 915–932. Prague, Czech Republic (2007). URL
http://www.aclweb.org/anthology/D/D07/D07-1096
43. Quirk, C., Menezes, A., Cherry, C.: Dependency treelet translation: Syntactically informed
phrasal SMT. In: 43rd Annual Meeting of the Association for Computational Linguistics
(ACL), pp. 271–279. Ann Arbor, USA (2005). DOI 10.3115/1219840.1219874
44. Sarkar, A.: Complexity of Lexical Descriptions and its Relevance to Natural Language Pro-
cessing: A Supertagging Approach, chap. Combining SuperTagging with Lexicalized Tree-
Adjoining Grammar Parsing. MIT Press (2007)
45. Schulte, C.: Programming Constraint Services, Lecture Notes in Artificial Intelligence, vol.
2302. Springer-Verlag (2002)
46. Schulte, C., Lagerkvist, M., Tack, G.: GECODE—Generic Constraint Development Environ-
ment (2007). http://www.gecode.org/
47. Setz, J.: A principle compiler for Extensible Dependency Grammar. Tech. rep., Saarland
University (2007). Bachelorarbeit
48. Sgall, P., Hajicova, E., Panevova, J.: The Meaning of the Sentence in its Semantic and Prag-
matic Aspects. D. Reidel, Dordrecht/NL (1986)
49. Shieber, S.M.: Evidence against the context-freeness of natural language. Linguistics and
Philosophy 8(3), 333–343 (1985). DOI 10.1007/BF00630917
50. Smolka, G.: The Oz programming model. In: J. van Leeuwen (ed.) Computer Science To-
day, Lecture Notes in Computer Science, vol. 1000, pp. 324–343. Springer-Verlag, Berlin/DE
(1995)
51. Tesnière, L.: Éléments de syntaxe structurale. Klinksieck, Paris, France (1959)
52. Veselá, K., Havelka, J., Hajičová, E.: Condition of projectivity in the underlying dependency
structures. In: 20th International Conference on Computational Linguistics (COLING), pp.
289–295. Geneva, Switzerland (2004). DOI 10.3115/1220355.1220397
53. Vijay-Shanker, K., Weir, D.J., Joshi, A.K.: Characterizing structural descriptions pro-
duced by various grammatical formalisms. In: 25th Annual Meeting of the Association
for Computational Linguistics (ACL), pp. 104–111. Stanford, CA, USA (1987). DOI
10.3115/981175.981190
54. Weir, D.J.: Characterizing mildly context-sensitive grammar formalisms.
Ph.D. thesis, University of Pennsylvania, Philadelphia, USA (1988). URL
http://wwwlib.umi.com/dissertations/fullcit/8908403
55. XTAG Research Group: A Lexicalized Tree Adjoining Grammar for English. Tech. Rep.
IRCS-01-03, IRCS, University of Pennsylvania (2001)
56. Yli-Jyrä, A.: Multiplanarity – a model for dependency structures in treebanks. In: Second
Workshop on Treebanks and Linguistic Theories (TLT), pp. 189–200. Växjö, Sweden (2003)