
Lecture 6

Uploaded by

Beekan Gammadaa
Copyright
© © All Rights Reserved

Formal Grammar

Today
• Formal Grammars
– Context-free grammar
– Grammars for English
– Treebanks
– Dependency grammars

Syntax
• By grammar, or syntax, we have in mind
the kind of implicit knowledge of your
native language that you had mastered by
the time you were 3 years old without
explicit instruction
• Not the kind of stuff you were later taught
in “grammar” school

Syntax
• Why should you care?
• Grammars (and parsing) are key
components in many applications
– Grammar checkers
– Dialogue management
– Question answering
– Information extraction
– Machine translation

Syntax
• Key notions that we’ll cover
– Constituency
– Grammatical relations and Dependency
• Heads
• Key formalism
– Context-free grammars
• Resources
– Treebanks

Constituency
• The basic idea here is that groups of words
within utterances can be shown to act as
single units.
• And in a given language, these units form
coherent classes that can be shown to
behave in similar ways
– With respect to their internal structure
– And with respect to other units in the language

Constituency
• Internal structure
– We can describe an internal structure to the class
(might have to use disjunctions of somewhat
unlike sub-classes to do this).
• External behavior
– For example, we can say that noun phrases can
come before verbs

Constituency
• For example, it makes sense to say that
the following are all noun phrases in
English...

• Why? One piece of evidence is that they
can all precede verbs.
– This is external evidence

Grammars and Constituency
• Of course, there’s nothing easy or obvious about
how we come up with the right set of constituents and
the rules that govern how they combine...
• That’s why there are so many different theories of
grammar and competing analyses of the same
data.
• The approach to grammar, and the analyses,
adopted here are very generic (and don’t
correspond to any modern linguistic theory of
grammar).

Context-Free Grammars
• Context-free grammars (CFGs)
– Also known as
• Phrase structure grammars
• Backus-Naur form
• Consist of
– Rules
– Terminals
– Non-terminals

Context-Free Grammars
• Terminals
– We’ll take these to be words (for now)
• Non-Terminals
– The constituents in a language
• Like noun phrase, verb phrase and sentence
• Rules
– Rules are productions (rewrite rules) that consist of a
single non-terminal on the left and any number of
terminals and non-terminals on the right.

Some NP Rules
• Here are some rules for our noun phrases

• Together, these describe two kinds of NPs.
– One that consists of a determiner followed by a nominal
– And another that says that proper names are NPs.
– The third rule illustrates two things
• An explicit disjunction
– Two kinds of nominals
• A recursive definition
– Same non-terminal on the right and left side of the rule

L0 Grammar
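
A small grammar of this kind can be written down as plain data. The sketch below is only one possible encoding and is not the slide's L0 grammar: the rule set, the tiny lexicon, and the names `GRAMMAR`, `is_nonterminal` and `terminals` are illustrative assumptions. The NP and Nominal entries follow the description of the NP rules given earlier.

```python
# A toy CFG as a dict: each non-terminal maps to a list of right-hand sides.
# The rules and lexicon are illustrative assumptions, not the slide's L0 grammar.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "Nominal"], ["ProperNoun"]],
    # Explicit disjunction (two alternatives) and recursion (Nominal on both sides).
    "Nominal": [["Noun"], ["Nominal", "Noun"]],
    "VP": [["Verb"], ["Verb", "NP"]],
    "Det": [["a"], ["the"]],
    "Noun": [["flight"], ["morning"]],
    "ProperNoun": [["Denver"]],
    "Verb": [["left"], ["prefer"]],
}

def is_nonterminal(symbol):
    """A symbol is a non-terminal if the grammar has rules for it."""
    return symbol in GRAMMAR

def terminals(grammar):
    """Collect every right-hand-side symbol that has no rules of its own."""
    return {sym for rhss in grammar.values() for rhs in rhss for sym in rhs
            if sym not in grammar}

print(sorted(terminals(GRAMMAR)))
```

Keeping words and constituents in the same table mirrors the slides' choice to treat words as terminals.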

Generativity
• As with FSAs and FSTs, you can view these
rules as either analysis or synthesis
machines
– Generate strings in the language
– Reject strings not in the language
– Impose structures (trees) on strings in the
language
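
The "synthesis" view can be sketched as a random generator: start from S and keep rewriting non-terminals until only words remain. The grammar, the lexicon and the function name `generate` are assumptions made for illustration; the grammar is deliberately non-recursive so that every expansion terminates.

```python
import random

# Toy grammar: non-terminal -> list of alternative right-hand sides.
# Rules and words are illustrative assumptions.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "Noun"], ["ProperNoun"]],
    "VP": [["Verb"], ["Verb", "NP"]],
    "Det": [["a"], ["the"]],
    "Noun": [["flight"], ["plane"]],
    "ProperNoun": [["Denver"]],
    "Verb": [["left"], ["arrived"]],
}

def generate(symbol, rng):
    """Expand `symbol` top-down, picking one alternative at random each time."""
    if symbol not in GRAMMAR:          # terminal: emit the word
        return [symbol]
    words = []
    for child in rng.choice(GRAMMAR[symbol]):
        words.extend(generate(child, rng))
    return words

rng = random.Random(0)                 # seeded so runs are repeatable
for _ in range(3):
    print(" ".join(generate("S", rng)))
```

Every string this prints is in the language of the toy grammar, although not every one is sensible English, which previews the overgeneration issue discussed later.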

Derivations
• A derivation is a sequence of rules applied to a string
that accounts for that string
– Covers all the elements in the string
– Covers only the elements in the string
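
As a worked illustration, the sketch below replays a leftmost derivation of "a flight left" and records each intermediate form. The rule sequence, the symbol names and the helper `derive` are assumptions chosen for the example, not taken from the slide's figure.

```python
def derive(start, steps, nonterminals):
    """Apply each (lhs, rhs) rule to the leftmost non-terminal.

    Returns the list of sentential forms, from [start] to the final string.
    """
    form = [start]
    forms = [form]
    for lhs, rhs in steps:
        # Find the leftmost non-terminal in the current form.
        i = next(k for k, sym in enumerate(form) if sym in nonterminals)
        if form[i] != lhs:
            raise ValueError(f"expected a rule for {form[i]}, got {lhs}")
        form = form[:i] + list(rhs) + form[i + 1:]
        forms.append(form)
    return forms

NONTERMINALS = {"S", "NP", "VP", "Det", "Nominal", "Noun", "Verb"}
STEPS = [
    ("S", ["NP", "VP"]),
    ("NP", ["Det", "Nominal"]),
    ("Det", ["a"]),
    ("Nominal", ["Noun"]),
    ("Noun", ["flight"]),
    ("VP", ["Verb"]),
    ("Verb", ["left"]),
]

forms = derive("S", STEPS, NONTERMINALS)
for form in forms:
    print(" ".join(form))
```

The last form contains exactly the words of the string, no more and no fewer, which is what the two "covers" conditions require.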

Definition
• More formally, a CFG consists of
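
A standard textbook statement of the definition, offered here as an assumption about what the slide displayed, is a 4-tuple:

```latex
\begin{align*}
G &= (N, \Sigma, R, S) \\
N &: \text{a finite set of non-terminal symbols} \\
\Sigma &: \text{a finite set of terminal symbols, with } N \cap \Sigma = \emptyset \\
R &: \text{a finite set of rules } A \to \beta,\quad A \in N,\ \beta \in (N \cup \Sigma)^{*} \\
S &\in N : \text{the designated start symbol}
\end{align*}
```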

Parsing
• Parsing is the process of taking a string and
a grammar and returning one or more parse
trees for that string
• It is completely analogous to running a
finite-state transducer with a tape
– It’s just more powerful
• Remember this means that there are languages we
can capture with CFGs that we can’t capture with
finite-state methods
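
To make the "one or more trees" point concrete, here is a minimal sketch of a top-down backtracking parser that returns every parse of a sentence. It is not an efficient or standard parsing algorithm, and it assumes the grammar has no left-recursive rules (those would make it loop forever). The grammar, the example sentence and the function names are illustrative assumptions.

```python
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "Noun", "PP"], ["Det", "Noun"], ["Pronoun"]],
    "VP": [["Verb", "NP", "PP"], ["Verb", "NP"]],
    "PP": [["Prep", "NP"]],
    "Det": [["the"]],
    "Noun": [["man"], ["telescope"]],
    "Pronoun": [["I"]],
    "Verb": [["saw"]],
    "Prep": [["with"]],
}

def parse(symbol, words, start):
    """Yield (tree, end) for every way `symbol` can cover words[start:end]."""
    if symbol not in GRAMMAR:            # terminal: must match the next word
        if start < len(words) and words[start] == symbol:
            yield symbol, start + 1
        return
    for rhs in GRAMMAR[symbol]:
        yield from expand(symbol, rhs, words, start, [])

def expand(symbol, rhs, words, pos, children):
    """Match the remaining right-hand-side symbols one at a time."""
    if not rhs:
        yield (symbol, children), pos
        return
    for child, nxt in parse(rhs[0], words, pos):
        yield from expand(symbol, rhs[1:], words, nxt, children + [child])

def parse_sentence(sentence):
    """Return all trees rooted in S that cover the whole sentence."""
    words = sentence.split()
    return [tree for tree, end in parse("S", words, 0) if end == len(words)]

trees = parse_sentence("I saw the man with the telescope")
print(len(trees))
```

The example sentence is ambiguous under this grammar because the PP can attach to the verb phrase or to the noun phrase, so the parser returns more than one tree.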

An English Grammar Fragment
• Sentences
• Noun phrases
– Agreement
• Verb phrases
– Subcategorization

Sentence Types
• Declaratives: A plane left.
S → NP VP
• Imperatives: Leave!
S → VP
• Yes-No Questions: Did the plane leave?
S → Aux NP VP
• WH Questions: When did the plane leave?
S → WH-NP Aux NP VP

Noun Phrases
• Let’s consider the following rule in more
detail...
NP → Det Nominal
• Most of the complexity of English noun
phrases is hidden in this rule.
• Consider the derivation for the following
example
– All the morning flights from Denver to Tampa
leaving before 10

Noun Phrases

NP Structure
• Clearly this NP is really about flights.
That’s the central crucial noun in this NP.
Let’s call that the head.
• We can dissect this kind of NP into the
stuff that can come before the head, and the
stuff that can come after it.

Determiners
• Noun phrases can start with determiners...
• Determiners can be
– Simple lexical items: the, this, a, an, etc.
• A car
– Or simple possessives
• John’s car
– Or complex recursive versions of that
• John’s sister’s husband’s son’s car

Nominals
• Contains the head and any pre- and post-
modifiers of the head.
– Pre-
• Quantifiers, cardinals, ordinals...
– Three cars
• Adjectives
– large cars
• Ordering constraints
– Three large cars

Postmodifiers
• Three kinds
– Prepositional phrases
• From Seattle
– Non-finite clauses
• Arriving before noon
– Relative clauses
• That serve breakfast
• Same general (recursive) rule to handle these
– Nominal → Nominal PP
– Nominal → Nominal GerundVP
– Nominal → Nominal RelClause

Agreement
• By agreement, we have in mind constraints
that hold among various constituents that take
part in a rule or set of rules
• For example, in English, determiners and the
head nouns in NPs have to agree in their
number.

This flight      *This flights
Those flights    *Those flight

Problem
• Our earlier NP rules are clearly deficient
since they don’t capture this constraint
– NP → Det Nominal
• Accepts, and assigns correct structures to,
grammatical examples (this flight)
• But it’s also happy with incorrect examples (*these
flight)
– Such a rule is said to overgenerate.
– We’ll come back to this in a bit
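
The overgeneration can be seen in a few lines of code. The sketch below contrasts the plain rule with one common fix, splitting the rule by number (NP → SgDet SgNominal | PlDet PlNominal). That fix is an assumption for illustration and is not necessarily the approach the lecture returns to; the lexicon and function names are likewise hypothetical.

```python
# Lexicon with a number feature; entries are illustrative assumptions.
DET = {"this": "sg", "that": "sg", "these": "pl", "those": "pl"}
NOUN = {"flight": "sg", "flights": "pl"}

def np_plain(det, noun):
    """NP -> Det Nominal with no agreement check: overgenerates."""
    return det in DET and noun in NOUN

def np_agree(det, noun):
    """The same rule split by number: NP -> SgDet SgNominal | PlDet PlNominal."""
    return det in DET and noun in NOUN and DET[det] == NOUN[noun]

for det, noun in [("this", "flight"), ("these", "flight"), ("those", "flights")]:
    print(det, noun, np_plain(det, noun), np_agree(det, noun))
```

Splitting rules this way works, but it doubles the number of NP rules for every feature that must agree, which is why the approach scales poorly.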

Verb Phrases
• English VPs consist of a head verb along with
zero or more following constituents, which we’ll
call arguments.

Subcategorization
• But, even though there are many valid VP
rules in English, not all verbs are allowed to
participate in all those VP rules.
• We can subcategorize the verbs in a
language according to the sets of VP rules
that they participate in.
• This is a modern take on the traditional
notion of transitive/intransitive.
• Modern grammars may have hundreds of such
classes.

Subcategorization
• Sneeze: John sneezed
• Find: Please find [a flight to NY]NP
• Give: Give [me]NP[a cheaper fare]NP
• Help: Can you help [me]NP[with a flight]PP
• Prefer: I prefer [to leave earlier]TO-VP
• Told: I was told [United has a flight]S
• …

Subcategorization
• *John sneezed the book
• *I prefer United has a flight
• *Give with a flight

• As with agreement phenomena, we need a
way to formally express the constraints

Why?

• Right now, the various rules for VPs
overgenerate.
– They permit the presence of strings containing verbs
and arguments that don’t go together
– For example, given VP → V NP,
“sneezed the book” is a VP, since “sneezed” is a verb
and “the book” is a valid NP
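
One way to express the constraint is to record, for each verb, the argument sequences it allows and to check a candidate VP against that table. The frames below follow the examples on the earlier slides; the dictionary layout and the function name `licenses` are assumptions made for this sketch.

```python
# Subcategorization frames: the argument sequences each verb allows.
# Frames follow the slide examples; the data layout is an assumption.
SUBCAT = {
    "sneezed": [[]],
    "find": [["NP"]],
    "give": [["NP", "NP"]],
    "help": [["NP", "PP"]],
    "prefer": [["TO-VP"]],
    "told": [["S"]],
}

def licenses(verb, args):
    """True if `verb` is allowed to take exactly this sequence of arguments."""
    return list(args) in SUBCAT.get(verb, [])

print(licenses("sneezed", []))          # John sneezed
print(licenses("sneezed", ["NP"]))      # *John sneezed the book
print(licenses("prefer", ["S"]))        # *I prefer United has a flight
print(licenses("give", ["PP"]))         # *Give with a flight
```

A plain rule such as VP → V NP has no access to this table, which is exactly why it accepts "sneezed the book".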

Treebanks
• Treebanks are corpora in which each sentence has
been paired with a parse tree (presumably the right
one).
• These are generally created
– By first parsing the collection with an automatic parser
– And then having human annotators correct each parse
as necessary.
• This generally requires detailed annotation
guidelines that provide a POS tagset, a grammar
and instructions for how to deal with particular
grammatical constructions.

Penn Treebank
• Penn TreeBank is a widely used treebank.
• Most well known is the Wall Street Journal section of the
Penn TreeBank.
• 1 M words from the 1987-1989 Wall Street Journal.

Treebank Grammars
• Treebanks implicitly define a grammar for
the language covered in the treebank.
• Simply take the local rules that make up the
sub-trees in all the trees in the collection
and you have a grammar.
• Not complete, but if you have a decent-sized
corpus, you’ll have a grammar with decent
coverage.
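
Reading a grammar off a treebank can be sketched as follows: parse each bracketed tree, then count the local rule at every node. The two-sentence "treebank" is made up for illustration, and the helper names `read_tree` and `rules` are assumptions; only the bracketed notation and the tag names follow Penn Treebank conventions.

```python
from collections import Counter

def read_tree(text):
    """Parse a bracketed tree such as "(S (NP (DT the) (NN plane)) (VP (VBD left)))"."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def node():
        nonlocal pos
        pos += 1                              # skip "("
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())       # a sub-tree
            else:
                children.append(tokens[pos])  # a word
                pos += 1
        pos += 1                              # skip ")"
        return (label, children)

    return node()

def rules(tree, counts):
    """Count the local rule at this node, then recurse into the sub-trees."""
    label, children = tree
    rhs = tuple(c[0] if isinstance(c, tuple) else c for c in children)
    counts[(label, rhs)] += 1
    for child in children:
        if isinstance(child, tuple):
            rules(child, counts)
    return counts

# A made-up two-tree "treebank" for illustration.
TREEBANK = [
    "(S (NP (DT the) (NN plane)) (VP (VBD left)))",
    "(S (NP (DT a) (NN flight)) (VP (VBD left)))",
]
counts = Counter()
for line in TREEBANK:
    rules(read_tree(line), counts)
for (lhs, rhs), n in sorted(counts.items()):
    print(lhs, "->", " ".join(rhs), n)
```

The counts are kept because rule frequencies are what statistical parsers later turn into probabilities.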

Treebank Grammars
• Such grammars tend to be very flat due to
the fact that they tend to avoid recursion.
– To ease the annotators’ burden
• For example, the Penn Treebank has 4500
different rules for VPs. Among them...

Heads in Trees
• Finding heads in treebank trees is a task that
arises frequently in many applications.
– Particularly important in statistical parsing
• We can visualize this task by annotating the
nodes of a parse tree with the heads of each
corresponding node.

Lexically Decorated Tree

Head Finding
• The standard way to do head finding is to
use a simple set of tree traversal rules
specific to each non-terminal in the
grammar.
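
A minimal sketch of such traversal rules is shown below. Each parent label gets a search direction and a priority list of child labels, and the head word is found by following head children down to a word. The rule table here is a deliberately tiny, hypothetical one and is much smaller than the tables used in practice; labels missing from `HEAD_RULES` are not handled.

```python
# Simplified head rules: parent label -> (search direction, child priorities).
# These entries are illustrative assumptions, not a published rule table.
HEAD_RULES = {
    "S": ("left", ["VP", "NP"]),
    "VP": ("left", ["VBD", "VB", "VP"]),
    "NP": ("right", ["NN", "NNS", "NNP", "NP"]),
    "PP": ("left", ["IN"]),
}

def head_word(tree):
    """Return the lexical head of a (label, children) tree."""
    label, children = tree
    if isinstance(children[0], str):        # preterminal: the word itself
        return children[0]
    direction, priorities = HEAD_RULES[label]
    ordered = children if direction == "left" else children[::-1]
    for wanted in priorities:
        for child in ordered:
            if child[0] == wanted:
                return head_word(child)
    return head_word(ordered[0])            # fallback: first child in search order

TREE = ("S", [
    ("NP", [("DT", ["the"]), ("NN", ["morning"]), ("NN", ["flight"])]),
    ("VP", [("VBD", ["left"])]),
])
print(head_word(TREE))          # head of the sentence
print(head_word(TREE[1][0]))    # head of the subject NP
```

Searching NPs from the right is what picks "flight" rather than "morning" as the head of "the morning flight".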

Noun Phrases

Treebank Uses
• Treebanks (and head finding) are
particularly critical to the development of
statistical parsers
• Also valuable to Corpus Linguistics
– Investigating the empirical details of various
constructions in a given language

Dependency Grammars
• In CFG-style phrase-structure grammars the
main focus is on constituents.
• But it turns out you can get a lot done with
just binary relations among the words in an
utterance.
• In a dependency grammar framework, a
parse is a tree where
– The nodes stand for the words in an utterance
– The links between the words represent
dependency relations between pairs of words.
• Relations may be typed (labeled), or not.
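
A dependency parse of this kind can be stored as a list of (head, dependent, relation) triples. The sketch below uses an artificial ROOT at position 0 and checks the tree property: every word has exactly one head and can be traced back to ROOT. The sentence, the relation labels and the function name `is_tree` are assumptions chosen for illustration.

```python
# A dependency parse as (head, dependent, relation) triples over word positions.
# Position 0 is an artificial ROOT; the sentence and labels are illustrative.
WORDS = ["ROOT", "I", "prefer", "the", "morning", "flight"]
ARCS = [
    (0, 2, "root"),
    (2, 1, "nsubj"),
    (2, 5, "obj"),
    (5, 3, "det"),
    (5, 4, "compound"),
]

def is_tree(n_words, arcs):
    """Check that each word has exactly one head and is reachable from ROOT."""
    heads = {}
    for head, dep, _ in arcs:
        if dep in heads or head not in range(n_words):
            return False                 # two heads, or a head outside the sentence
        heads[dep] = head
    if set(heads) != set(range(1, n_words)):
        return False                     # some word has no head
    for word in range(1, n_words):
        node, seen = word, set()
        while node != 0:                 # walk up towards ROOT
            if node in seen:
                return False             # cycle: never reaches ROOT
            seen.add(node)
            node = heads[node]
    return True

print(is_tree(len(WORDS), ARCS))
for head, dep, rel in ARCS:
    print(f"{rel}({WORDS[head]}, {WORDS[dep]})")
```

Dropping the third element of each triple gives the untyped variant mentioned above.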
Summary
• Context-free grammars can be used to model
various facts about the syntax of a language.
• When paired with parsers, such grammars
constitute a critical component in many
applications.
• Constituency is a key phenomenon easily captured
with CFG rules.
– But agreement and subcategorization do pose
significant problems
• Treebanks pair sentences in a corpus with their
corresponding trees.

