
Why we Respect our Teachers

A Note on Language Learnability and Active Learning


Purushottam Kar
Department of Computer Science and Engineering
Indian Institute of Technology Kanpur
Kanpur, INDIA
[email protected]

ABSTRACT

Language acquisition - especially in human infants - is a problem that intrigues the layman and baffles the expert. This article takes a computational viewpoint toward the problem and investigates the learnability of languages. We look at some results which show that learning is impossible even under very weak criteria. We next look at a framework in which learning takes place with a helpful teacher and demonstrate the role such a teacher can play in easing the learning problem. The discussion defines language learning formally, surveys classical results and points toward recent advances in the field.

Categories and Subject Descriptors

I.2.6 [Learning]: Language Acquisition; A.1 [General Literature]: INTRODUCTORY AND SURVEY

General Terms

Learning Theory

Keywords

Active Learning, Language Learnability

1. INTRODUCTION

Many of us have witnessed a younger sibling (or a niece/nephew) learn to speak and wondered about it. The wondrous feat achieved by these infants, namely learning a medium of communication used by adults far more experienced and neurologically developed, has captured the attention of researchers for quite some time now. Even the lay person finds himself devoting a minute or two to this problem. If one ponders the conditions in which language learning takes place, one easily notices a paradox - the infant is simply exposed to utterances in its mother tongue, utterances that are often ill-formed and spontaneous. Most of these utterances are not even directed at the child (in fact, all that the child gets directed at itself are mollycoddles - nonsensical blabbers that caretakers make in order to show their affection toward the infant - which, in the author's view, can only obfuscate things further).

Despite being immersed in such a hostile environment, the infant ends up learning the language of its caretakers and eventually becomes a proficient speaker. In order to understand this problem better, let us try to pose it in formal terms. Of course this will involve making certain simplifications, which we shall justify as we go along. We shall strive to keep the exposition simple and shall supplement arguments with schematic diagrams to better convey the key ideas.

2. LANGUAGES AND LEARNING

The first simplification that we will make is that of interpreting a language as a set. To see why, let us take the set of all words present in our favorite English dictionary - which would be sufficient to communicate most intentions; of course our favorite IITK lingo would be missing, but let us choose to live with this handicap - and call this set Σ. Let Σ^n denote the set of all sentences[1] of length n whose words are taken from Σ. Let Σ∗ = ∪_{i>0} Σ^i be the set of all sentences of finite length that use words from Σ. Clearly Σ∗ contains all English sentences. However, it also contains sentences like "This celebrating Jubilee is institute our Golden its year." and "How it why now where is." which are not well formed English sentences. Thus we see that the English "language" can be thought of as that subset E ⊂ Σ∗ which contains only well formed grammatical English sentences.[2] For the rest of the article, whenever we refer to a language, it will always be a set L ⊆ Σ∗.

[1] We shall use the terms "string", "utterance" and "sentence" interchangeably in this discussion.
[2] Of course, the debate on whether to consider sentences like "Colorless green ideas sleep furiously." - which, although grammatically correct, do not make any sense (or do they?) - can be waged here, and the author invites readers to wage these debates among themselves.

A language L is said to be finite if |L| < ∞.[3] Clearly English is not a finite language since, given a sentence s ∈ E, one can always create a grammatically correct sentence like "My friend thinks that s." of length greater than that of s. Hence one can construct sentences of arbitrarily large lengths, which makes |E| = ∞.

[3] For a set S, |S| denotes its cardinality: loosely speaking, the number of elements in S.
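To make this set-theoretic view concrete, here is a minimal Python sketch. The three-word alphabet and the tiny "language" below are made-up illustrative choices of this sketch, not anything prescribed by the formalism.

    import itertools

    # A toy alphabet of "words" (purely illustrative; any finite set works).
    SIGMA = ("the", "cat", "sleeps")

    def sigma_n(n):
        """All sentences of length n over SIGMA, i.e. the set Σ^n."""
        return {" ".join(words) for words in itertools.product(SIGMA, repeat=n)}

    # A toy "language": a subset of Σ∗ we declare to be well formed.
    L = {"the cat sleeps", "the cat the cat sleeps"}

    # Σ^3 has 3^3 = 27 sentences; only one of them lies in our toy L.
    for s in sorted(sigma_n(3)):
        print(f"{s!r}: {'well formed' if s in L else 'ill formed'}")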
So suppose the caretakers speak a language Lt ⊂ Σ∗ which the infant must identify or approximate in some sense. What the infant receives is a finite number of utterances s1, s2, ..., sn where each si ∈ Lt. We will call such a sequence a finite text. This is a reasonable assumption since the infant seldom receives ill-formed utterances which are tagged as ill-formed [Niy06]. Now the job of the infant is, given a finite text, to identify the target language Lt. Suppose the infant has no prior information about the nature or properties of Lt. All it knows is that Lt is some language in Σ∗ which contains all the utterances that it has just heard. In this case the infant is faced with the following dilemma - there are an infinite number of such languages: which one should the infant identify as the target?[4] We shall return to this question in a short while (in Section 2.2) after building some more notational apparatus to better discuss the problem.

[4] Note that we are assuming that the infant knows Σ. This is a simplifying assumption but is not too unreasonable, as concept and word learning predate syntax acquisition [Pin90], although these are not distinct stages.

2.1 Languages and their Grammars

A missing detail in the above discussion involves representation. Since we have already agreed that English (or for that matter any language that supports recursive embeddings - in particular all natural, i.e. human, languages) is an infinite set, representation becomes a problem. In other words, how does the infant represent the infinite set it has learnt - it certainly cannot store the entire set explicitly. However, we have an intuitive solution to this. All of us have a representation of English in our minds, and a finite one, since our minds are finite objects. There do exist several ways of finitely representing infinite sets, two commonly used ones being automata and grammars. An automaton is like a computer algorithm which can accept input and give output. An automaton corresponding to a language is simply an algorithm that answers YES if and only if it is given a string in that language. A grammar, on the other hand, is a set of rules that can be used to generate strings. A grammar corresponding to a language generates all and only the strings in that language.[5]

[5] Under the Church-Turing Hypothesis, only recursively enumerable sets admit such finite representations - but we do not have to worry about this technicality - all our languages will be far from raising recursive enumerability questions.

For example, take the following language over binary strings: L1 = {0^n 111 0^m | n, m ≥ 0}. It is a Regular Language generated by the following grammar, which is a Regular Expression: G1 = 0∗1110∗. This grammar generates strings which consist of some (or possibly no) zeros, followed by three ones, followed by some (or possibly no) zeros. It is clear that G1 generates L1. It is a simple task to write an algorithm that answers YES if and only if it is given a string in L1.

Next take the following Context-free Language: L2 = {0^n 1^n | n > 0}. This is generated by the following Context-free Grammar G2:

S → 0S1
S → 01

This grammar generates the string 01 and, for every string s that can be generated, the grammar also generates 0s1. Thus 01 can be generated, which in turn paves the way for the generation of 0011, and so on. Again, it is clear that G2 generates L2. It is easy to write an algorithm that says YES on strings in L2 and NO on others.

Whether humans use grammars, algorithmic procedures or some other means to represent languages in their minds is a matter of intense study in a very exciting field called Cognitive Linguistics. However, for us it is sufficient that the infant have some way to posit a hypothesis (i.e. its guess of what Lt is) effectively. Given a grammar g hypothesized by the infant, let Lg be the corresponding language. For purposes of evaluation, let us assume that we have a notion of distance between languages. Thus, given two languages L1 and L2, we have a distance measure d : (L1, L2) ↦ R. Many such distance measures can be considered, a natural one being a measure that depends on the symmetric difference of the two languages when interpreted as sets, i.e. L1 △ L2 = L1\L2 ∪ L2\L1. This distance measure would penalize the infant if it learns a grammar that classifies a large portion of Lt as ungrammatical and a large chunk outside of Lt as well-formed (see Figure 1). One can be even stricter and define d(L1, L2) = 0 if and only if L1 = L2, and 1 otherwise.

[Figure 1: A distance measure between languages. The shaded portion is Lt △ Lg. (a) This infant has probably learnt French. (b) This infant is close to learning English though.]

A language Lt will be said to have been learnt on a finite text τ (consisting of strings from Lt) as per a distance measure d if the infant (assuming it starts off with an "initial" hypothesis g0 corresponding to the language L0) outputs a grammar gτ on being exposed to τ such that d(Lgτ, Lt) = 0. A language that can be learnt on any given finite text (so long as the text contains strings from Lt alone) is said to be learnable. A class of languages L (a class of languages is simply a set of languages) is said to be learnable if each L ∈ L is learnable (see Figure 2).
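Both recognizers, and the symmetric-difference distance, are easy to realize in code. The following Python sketch is illustrative; in particular, truncating Σ∗ to binary strings of length at most 8, so that d can be computed over a finite universe, is a device of this sketch and not part of the formal definition.

    import itertools
    import re

    # Recognizer for L1 = {0^n 111 0^m | n, m >= 0}: the regular expression 0*1110*.
    def in_L1(s: str) -> bool:
        return re.fullmatch(r"0*1110*", s) is not None

    # Recognizer for the context-free language L2 = {0^n 1^n | n > 0}.
    def in_L2(s: str) -> bool:
        n = len(s) // 2
        return n > 0 and s == "0" * n + "1" * n

    # The symmetric-difference style distance d, computed here (for
    # illustration only) over binary strings of length at most max_len.
    def d(lang_a, lang_b, max_len=8):
        universe = ("".join(bits)
                    for k in range(1, max_len + 1)
                    for bits in itertools.product("01", repeat=k))
        # Count the strings on which the two languages disagree.
        return sum(1 for s in universe if lang_a(s) != lang_b(s))

    print(in_L1("0011100"), in_L2("000111"))  # True True
    print(d(in_L1, in_L2))  # the two languages differ on many short strings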
2.2 Language Learnability

Let us formalize the dilemma faced by the infant discussed earlier. The infant is provided with a finite text and has to posit a language as its hypothesis. The problem for the infant is that the target language could be any language that contains the strings it received. In other words, if the infant were to be given an assurance before learning started that the language it has to learn will only come from a Target Language Class L, then the class in this case is L = 2^Σ∗, which in effect gives the infant no a priori information about Lt.[6] One might wonder how such "assurances" can be given to an infant. It turns out that if one believes in the Universal Grammar Hypothesis [Cho65], then such an assurance is inbuilt in all of us. The hypothesis, very broadly speaking, states that certain universal properties are shared by the grammars of all human languages.

[6] Recall that for any set X, 2^X denotes the power set containing all subsets of X, including the empty one. Thus the set 2^Σ∗ contains all languages. Actually we are just concerned with languages that admit finite representations, but we beg to gloss over this point.

Coming back to our problem, the set of languages in 2^Σ∗ which contain the finite text received by the infant (and this holds for any finite text) is vast, and these languages are very different from each other according to the distance measures discussed earlier (in fact, they would differ widely as per any distance measure that encodes the generalization performance of the infant's learnt grammar). Hence the infant has no surety of arriving at a grammar that even closely approximates the target language, even if it chooses a language that contains all the utterances it has heard. Thus we arrive at the following result:

Theorem 1. The language class L = 2^Σ∗ is not learnable with finite texts.

2.3 Learning with Infinite Resources

Were we expecting too much from the infant in the earlier section? Can we relax the learning conditions a bit and see if learning can take place? In particular, can we give the infant more sentences from which to learn the language? Can we restrict the target language class so as to increase the chances of arriving at the target language? We shall see in the following discussion that even if we present the entire language to the infant (by giving it an infinite number of sentences) and restrict the target class to one step beyond the trivial, learnability continues to elude us.

First of all, let us give the infant infinite texts. An infinite text τ for a language L is an infinite sequence of strings s1, s2, ..., sn, ..., all of which are in L, such that every element of L appears at least once in τ. By τk we shall denote the finite text comprising the first k elements of τ.

Let gτk be the infant's hypothesis after receiving τk for k > 0. Then we say that a language Lt is learnt on an infinite text τ as per a distance measure d in the limit if lim_{k→∞} d(Lgτk, Lt) = 0, i.e. if the infant converges to the target in the limit. Similarly, we define what it means for a language and a class of languages to be identifiable in the limit (see Figure 3).

[Figure 2: Learnability of Languages. (a) Lt learnable on τ; (b) Lt learnable on any text.]
[Figure 3: The infant will eventually converge to Lt.]
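To see what this criterion asks for in the simplest case, the following toy Python simulation (with a made-up finite target language) runs a learner that always conjectures exactly the set of strings it has seen so far; for any finite target, such a learner identifies the target in the limit.

    import itertools

    # A made-up finite target language Lt (purely illustrative).
    Lt = {"a", "ab", "abb"}

    def text(lang):
        """An infinite text for lang: cycles through its elements forever."""
        return itertools.cycle(sorted(lang))

    def d(l1, l2):
        """Symmetric-difference distance between two finite languages."""
        return len(l1 ^ l2)

    hypothesis = set()  # the "initial hypothesis" L0
    for k, s in zip(range(1, 10), text(Lt)):
        hypothesis.add(s)  # conjecture: exactly the strings seen so far
        print(f"after {k} strings: d(hypothesis, Lt) = {d(hypothesis, Lt)}")
    # The distance hits 0 after 3 strings and stays there: convergence in the limit.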
Clearly these conditions are weaker than those in Theorem 1. However, in a seminal paper, Gold [Gol67] demonstrated that even under these weakened conditions and increased resources, not only does the language class L = 2^Σ∗ continue to be non-learnable, but the non-learnability persists even if the infant is given some prior knowledge about the target language by restricting the target class. We do not give Gold's original proof here, but one that follows from results by Blum and Blum [BB75] in a manner presented in [Niy06].

Theorem 2 (Locking Text Theorem). A language Lt is learnable only if for every ε > 0 there exists a finite "locking" text τ composed of strings in Lt such that d(Lgτ, Lt) < ε and, for all finite texts σ composed of strings in Lt, d(Lg(τ◦σ), Lt) < ε, where τ◦σ is the concatenation of the two finite sequences.

Essentially, the theorem says that in order for a language to be learnable, there must exist finite texts that take the infant ε-close to the target and "lock" it there. Thus, after viewing the locking text, no matter what subsequent utterances it observes, the infant never makes a hypothesis that is farther off than ε, i.e. no further exposure can mislead it. We shall prove the theorem by contradiction. We shall show that if locking texts do not exist, then we can construct an infinite text on which the infant will never converge to the target. Since convergence is necessary on every infinite text for a language to be called learnable, we shall have proved the theorem.

Proof. (Sketch) Notice that if there did not exist locking texts for every ε > 0, then for some δ > 0 there is no corresponding locking text; that is to say, no finite text τ taking the infant δ-close to the target is able to lock it there. Thus for every such text τ that takes the infant close to the target, there must exist a "violator" text σ such that although d(Lgτ, Lt) < δ, after encountering σ, d(Lg(τ◦σ), Lt) > δ. See Figure 4 for a schematic.[7]

[7] Note that if there is no locking text for δ then there are none for any ε < δ either.

[Figure 4: τ brings the infant close but the violator σ spoils the show - i.e. τ cannot be a locking text for δ.]

We can use these violator texts to create an infinite text ζ for which lim_{k→∞} d(Lgζk, Lt) ≠ 0. We do the following: whenever we observe the infant getting δ-close to the target on a finite text, we feed the infant the violator text corresponding to the text the infant has seen until now, to force it to give a hypothesis that is at least δ far off from Lt.

For example, if the infant gets δ-close on a finite text τ1, feed it the corresponding violator text (say σ1) so that d(Lg(τ1◦σ1), Lt) > δ. Now it is possible that after listening to some more utterances (in the form of another finite text τ′), the infant again comes close to Lt. Let us call the text seen until now τ2, i.e. let τ2 = τ1 ◦ σ1 ◦ τ′. This means d(Lgτ2, Lt) < δ. But there is no reason to worry, because even for τ2 there would exist some violator text σ2 (since no text can lock the infant in a close neighborhood of Lt), which we will next feed the infant to again take it far away from Lt. Hence we have d(Lg(τ2◦σ2), Lt) > δ. See Figure 5 for a schematic.

[Figure 5: Each time the infant tries to perform well, we can make it perform badly, since the infant is not able to lock in its good performance.]

Thus at each step we are assured of the existence of violator texts, since there are no locking texts. This way the infant would at best constantly oscillate in and out of the δ-neighborhood of Lt and can never converge to Lt.

Thus, in order for a language to be learnable (given any text), there must exist finite texts that take the infant arbitrarily close to the target and lock it there. However, notice that the existence of such locking texts does not guarantee learnability - it is just that their absence negates any possibility of learning.

This immediately gives us Gold's celebrated result.

Theorem 3. Any language class L that contains all finite languages and at least one infinite language is not learnable in the limit with infinite texts.

Proof. (Sketch) Consider such a family L and an infinite language L∞ ∈ L. Suppose L were learnable; then there must exist finite locking texts for L∞ for every ε > 0. In particular, consider the one corresponding to ε = 1/2 and call it τ. Note that the sets of strings appearing in locking texts are themselves finite languages and hence are contained in L (since L contains all finite languages). Thus L contains Lτ, the set of strings in τ. Now suppose the infant wants to learn Lτ and the infinite text it gets starts with τ itself. Then we have a problem: although the infant wanted to learn Lτ, it will get locked to L∞, as τ is a locking text for L∞. Thus the infant cannot learn Lτ in the limit, and hence L is not learnable in the limit, as it contains a language that is not learnable in the limit.

A little analysis will tell us that in the above situation, although we ended up proving that a finite language is not learnable, it is actually the infinite language that is the troublemaker, since finite languages are trivial to learn using just finite texts. However, if infinite languages are a problem, then we are in a fix, since our English language is an infinite one and is believed to be part of a language class called the Context-Free Languages[8], which unfortunately contains all finite languages and also contains English, an infinite language.

[8] There is some degree of context-sensitivity in English, but we yet again choose not to address this issue further.

The same holds true for the class of Regular Languages, which is arguably the simplest possible non-trivial (read interesting) class of languages. Hence we have the following result, which dashes all hopes of learnability for interesting language classes.

Theorem 4. The classes of Regular and Context-Free languages are not learnable in the limit.

2.4 Approximately Learning Languages

For those who consider this to be as bad as things can get, the author apologizes for providing yet another set of relaxations which fail to make these language classes learnable. Even if one does not expect the infant to learn the target language exactly, but only some nice approximation (i.e. a grammar that will be correct, say, 90% of the time), even learning such an approximation in a reasonable amount of time is impossible if one believes that a certain mathematical conjecture holds [KV94]. However, this is a much more difficult result to prove and we do not attempt to even state it formally, let alone prove it.

However, the mathematical conjecture in itself is fairly interesting. The conjecture is called the Discrete Cube Roots Assumption and it essentially says that a certain invertible function is easy to compute but hard to invert.[9] What makes this conjecture interesting is that all secure Internet communication places faith in its validity.

[9] Think of the multiplication function that takes two primes and outputs their product. Multiplication is easy but factorization is not.
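As a toy Python illustration of such an asymmetry (with deliberately tiny, insecure parameters chosen for this sketch), cubing modulo N = p·q takes one modular exponentiation, whereas inverting it without knowing p and q is done below by brute-force search, which scales hopelessly as N grows.

    # Toy "easy to compute, hard to invert" function: cubing modulo N = p*q.
    # The primes are chosen with p, q ≡ 2 (mod 3) so that cubing mod N is a
    # bijection; real systems use numbers hundreds of digits long.
    p, q = 1019, 1031
    N = p * q

    def cube_mod(x):
        """The 'forward' direction: easy (one modular exponentiation)."""
        return pow(x, 3, N)

    def cube_root_mod(y):
        """The 'inverse' direction without the factors: brute-force search."""
        for x in range(N):  # scans up to N ≈ 10^6 candidates here
            if pow(x, 3, N) == y:
                return x
        return None

    secret = 123456
    print(cube_root_mod(cube_mod(secret)) == secret)  # True, but slowly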
If this conjecture proves to be false, then none of the secure protocols in use today would be secure anymore, and attackers would easily be able to decode encrypted data sent over the Internet.

Thus the joy of being able to learn in an approximate sense would have to be accompanied by the realization that the next time we key in our credit card details into a payment portal, it would be very simple for an attacker to get hold of all the details.

3. ENTER THE TEACHER

It turns out that all we need to get rid of the non-learnability results given in the previous section is the presence of a teacher! A teacher who can provide answers to certain special types of queries made by the learner facilitates learning to the extent that it can take place not only in a finite number of steps but actually in a fairly small number of steps. These results were presented in the seminal papers [Ang87, Sak90], which proved respectively that the classes of Regular and Context-free languages are learnable with the help of teachers capable of answering two types of queries (a toy sketch of this query interface appears below):

1. Membership Queries: The learner gives the teacher a string s and asks whether s ∈ Lt or not. The teacher replies with a YES/NO.

2. Equivalence Queries: The learner gives the teacher its current hypothesis grammar g and asks whether Lg = Lt or not. The teacher either answers YES (in which case the learner is done) or gives a counterexample string s ∈ Lg △ Lt.

These results outline learning algorithms which, when presented with the problem of learning a target language Lt, start asking questions of the teacher. The algorithms process the replies given by the teacher and formulate hypotheses and new questions to ask. If, on an equivalence query, the teacher replies with a YES, the algorithms halt. It turns out that if the target language Lt is encoded by the grammar gt, then there exists a universal constant c > 0 such that these algorithms take no more than |gt|^c steps to converge to the target grammar.[10]

[10] |g| denotes the size of the grammar g - i.e. how much space it takes to write down the rules of the grammar.

These results, although stimulating, are fairly involved and well beyond the scope of this article. However, we do realize the importance of teachers in learning situations such as these (and also in our real lives). Since these results came up, researchers have improved upon them and made them
more amenable to practical application. For example, we now have genetic algorithms [SPMF01], greedy algorithms [ZL78, Wel84] and kernel based algorithms [CKM07][11] for grammatical inference.

[11] See Purushottam Kar. An Introduction to Support Vector Machines and their Applications. Notes on Engineering Research and Development, xx(yy):pp-qq, 2009, for a discussion on kernel based algorithms.
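Here is the toy Python sketch of the query interface promised above. The Teacher class and the naive learner are illustrative inventions of this sketch - the learner is emphatically not Angluin's algorithm; it merely patches a finite hypothesis with each counterexample, which suffices only for finite targets - but the membership/equivalence interface is the one just described.

    # A toy teacher answering membership and equivalence queries for a target
    # language Lt, here just a finite set of strings.
    class Teacher:
        def __init__(self, target):
            self.target = set(target)  # Lt

        def membership(self, s):
            """Membership query: is s in Lt?"""
            return s in self.target

        def equivalence(self, hypothesis):
            """Equivalence query: (True, None), or (False, counterexample)."""
            diff = self.target ^ set(hypothesis)  # symmetric difference Lg △ Lt
            if not diff:
                return True, None
            return False, min(diff)  # any counterexample will do

    teacher = Teacher({"01", "0011", "000111"})
    hypothesis = set()
    queries = 0
    while True:
        done, counterexample = teacher.equivalence(hypothesis)
        queries += 1
        if done:
            break
        # The counterexample lies in exactly one of Lg, Lt; fix the hypothesis.
        if teacher.membership(counterexample):
            hypothesis.add(counterexample)
        else:
            hypothesis.discard(counterexample)

    print(f"learnt {sorted(hypothesis)} in {queries} equivalence queries")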

There has also been a lot of research on child language development and, although researchers are very far from having the final word on child language acquisition, we now have a better idea of how human infants form word-concept correlations and acquire syntax. However, this topic merits a dedicated article, and we conclude this one with a vote of thanks to all our teachers for making the learning process fun and simple.
4. ACKNOWLEDGMENTS

The author thanks Professor Achla M. Raina, Charles Benjamin Strauber and Vedula Vijaya Saradhi for comments on an earlier version of the paper.

5. REFERENCES

[Ang87] Dana Angluin. Learning Regular Sets from Queries and Counterexamples. Information and Computation, 75:87-106, 1987.
[BB75] Manuel Blum and Lenore Blum. Towards a Mathematical Theory of Inductive Inference. Information and Control, 28:125-155, 1975.
[Cho65] Noam Chomsky. Aspects of the Theory of Syntax. The MIT Press, 1965.
[CKM07] Corinna Cortes, Leonid Kontorovich, and Mehryar Mohri. Learning Languages with Rational Kernels. In 20th Annual Conference on Learning Theory, 2007.
[Gol67] E. Mark Gold. Language Identification in the Limit. Information and Control, 10(5):447-474, 1967.
[KV94] Michael J. Kearns and Umesh V. Vazirani. An Introduction to Computational Learning Theory. The MIT Press, 1994.
[Niy06] Partha Niyogi. The Computational Nature of Language Learning and Evolution. The MIT Press, 2006.
[Pin90] Steven Pinker. Language Acquisition. In An Invitation to Cognitive Science: Language, volume 1. The MIT Press, 1990.
[Sak90] Yasubumi Sakakibara. Learning Context-free Grammars from Structural Data in Polynomial Time. Theoretical Computer Science, 76:223-242, 1990.
[SPMF01] Ingo Schröder, Horia F. Pop, Wolfgang Menzel, and Kilian A. Foth. Learning Grammar Weights Using Genetic Algorithms. In Recent Advances in Natural Language Processing, 2001.
[Wel84] T. A. Welch. A Technique for High-Performance Data Compression. Computer, 17(6):8-19, 1984.
[ZL78] Jacob Ziv and Abraham Lempel. Compression of Individual Sequences via Variable-Rate Coding. IEEE Transactions on Information Theory, 1978.
