Natural Language Processing

                  Daniel Dahlmeier

NUS Graduate School for Integrative Sciences and Engineering
              danielhe@comp.nus.edu.sg


            CSTalks 2 November 2011
Acknowledgments




  Examples and figures from Michael Collins’ lecture notes:
  http://www.cs.columbia.edu/~mcollins.


  Some other figures are from Wikipedia: http://www.wikipedia.org.


  The rest I randomly found on the web.
Examples
                    What is NLP?
                     Background
                       NLP tasks
                   Why is it hard?
                    Related Stuff
                      Conclusion



Google translate







IBM’s Watson computer wins at Jeopardy!







Siri







What is Natural Language Processing?


   Natural Language Processing (NLP) or Computational Linguistics
   Language processing that goes beyond a “bag of words” representation.
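
To make the "bag of words" limitation concrete, here is a minimal sketch (the helper name bag_of_words is made up for illustration). It reduces a sentence to word counts, which discards word order, so two sentences with opposite meanings can end up with identical representations.

```python
from collections import Counter

def bag_of_words(sentence):
    """Return word counts, ignoring case and word order."""
    return Counter(sentence.lower().split())

a = bag_of_words("dog bites man")
b = bag_of_words("man bites dog")

# Both sentences contain the same words, so the bags are equal
# even though the meanings differ.
print(a == b)
```

Anything that depends on who did what to whom needs more structure than this, which is where the tasks later in the talk come in.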

   Example
       Translate from one language into the other.
       Answer natural language questions.
       Parse the syntactic/semantic structure of a sentence.

   The other NLP
   NLP = neuro-linguistic programming.





Background(s): Artificial Intelligence




   Talk to your computer
       Dave: Hello, HAL. Do you read me, HAL?
       HAL: Affirmative, Dave. I read you.
       Dave: Open the pod bay doors, HAL.
       HAL: I’m sorry, Dave. I’m afraid I can’t do that.

   The computer needs to ...
       Understand the user: Natural Language Understanding.
       Generate a well-formed reply: Natural Language Generation.



Background(s): Artificial Intelligence (cont.)




   Turing Test
       An experimenter C talks to two parties A and B via a terminal.
       If C cannot distinguish which party is a computer and which is a
       human, we should consider the computer to be intelligent.
       Natural language is deeply intertwined with intelligence.



Background(s): Linguistics




   Generative Linguistics
       Humans can produce and understand an infinite number of
       sentences by means of a finite set of rules.
       Language is produced through a generative, recursive process in the
       human brain.
       The principles that underlie this process are universal to all
       languages (universal grammar).



Background(s): the Web



       “We are drowning in information but starved for knowledge.”
       by Edward Osborne Wilson

   Too much text to read...
       Wikipedia: over 3.7 million articles (English).
       PubMed: over 20 million citations.
       WWW: billions of pages, trillions of words.







Part-of-speech Tagging



   Part-of-speech tagging
       Input: a sentence.
       Output: a part-of-speech tag sequence, e.g., noun, verb, adjective,...

   Example
   Profits/N soared/V at/P Boeing/N Co./N ,/, easily/ADV topping/V
   forecasts/N on/P Wall/N Street/N ./.
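
The word/TAG notation above is easy to read back into a program. The following is a small sketch (parse_tagged is a hypothetical helper, not part of any library) that splits each token at its last slash and then picks out the nouns.

```python
def parse_tagged(text):
    """Split 'word/TAG' tokens into (word, tag) pairs."""
    pairs = []
    for token in text.split():
        # Split at the LAST slash so tokens like ',/,' and './.' work.
        word, tag = token.rsplit("/", 1)
        pairs.append((word, tag))
    return pairs

tagged = ("Profits/N soared/V at/P Boeing/N Co./N ,/, easily/ADV topping/V "
          "forecasts/N on/P Wall/N Street/N ./.")
pairs = parse_tagged(tagged)
nouns = [word for word, tag in pairs if tag == "N"]
print(nouns)
```

This only reads an already tagged sentence. Producing the tags in the first place is the actual tagging problem.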







Named-entity recognition


   Named-entity recognition
       Input: a sentence.
       Output: a BIO named-entity tag sequence (B = beginning, I = inside,
       O = outside of an entity) over entity types, e.g., PERSON,
       ORGANIZATION, OTHER.

   Example
   Profits/O soared/O at/O Boeing/B-ORG Co./I-ORG ,/O easily/O
   topping/O forecasts/O on/O Wall/O Street/O ./O
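
As a rough sketch of how BIO tags turn back into entities (extract_entities is a made-up name, and the handling of a stray I- tag is my own assumption), the code below groups a B- tag with the I- tags of the same type that follow it.

```python
def extract_entities(tagged_tokens):
    """Collect (entity text, entity type) pairs from BIO-tagged tokens."""
    entities = []
    words, etype = [], None
    for word, tag in tagged_tokens:
        if tag.startswith("B-"):
            # A new entity starts; close any entity still open.
            if words:
                entities.append((" ".join(words), etype))
            words, etype = [word], tag[2:]
        elif tag.startswith("I-") and words and tag[2:] == etype:
            # Continue the currently open entity.
            words.append(word)
        else:
            # 'O' (or a stray I- tag, treated as outside here).
            if words:
                entities.append((" ".join(words), etype))
            words, etype = [], None
    if words:
        entities.append((" ".join(words), etype))
    return entities

text = ("Profits/O soared/O at/O Boeing/B-ORG Co./I-ORG ,/O easily/O "
        "topping/O forecasts/O on/O Wall/O Street/O ./O")
tokens = [tuple(t.rsplit("/", 1)) for t in text.split()]
print(extract_entities(tokens))
```

The B/I distinction matters when two entities of the same type are adjacent: a second B- tag marks where the next one begins.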







Word Sense Disambiguation



   Word sense disambiguation
       Input: a sentence.
       Output: the sense of each word in the sentence.

   Example
   I/sense1 can/sense1 can/sense2 a/sense1 can/sense3 .







Parsing
   Parsing
       Input: a sentence.
       Output: the syntactic tree structure of the sentence.

   Example
   Boeing is located in Seattle.
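
The tree figure for this slide is not reproduced here. As a stand-in, the sketch below writes down one plausible analysis of the sentence as nested tuples (the labels S, NP, VP, PP, N, V, P and the exact shape are my assumption, and the final period is left out) and prints it in bracketed form.

```python
# One plausible parse, as nested (label, child, ...) tuples.
# Leaves are plain strings.
TREE = ("S",
        ("NP", ("N", "Boeing")),
        ("VP", ("V", "is"),
               ("VP", ("V", "located"),
                      ("PP", ("P", "in"),
                             ("NP", ("N", "Seattle"))))))

def leaves(node):
    """Return the words of the tree from left to right."""
    if isinstance(node, str):
        return [node]
    words = []
    for child in node[1:]:
        words.extend(leaves(child))
    return words

def bracketed(node):
    """Render the tree in bracketed (Lisp-like) notation."""
    if isinstance(node, str):
        return node
    return "(" + " ".join([node[0]] + [bracketed(c) for c in node[1:]]) + ")"

print(bracketed(TREE))
print(" ".join(leaves(TREE)))
```

Reading off the leaves recovers the input sentence, while the nesting records which words group together.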







Machine translation


   Machine Translation
       Input: a sentence in language F.
       Output: the translated sentence in language E.

   Example
   Input: Syriens Präsident Baschar al-Assad hat den Westen davor
   gewarnt, sich in die Angelegenheiten seines Landes einzumischen.

   Output: Syrian President Bashar al-Assad has warned the West against
   interfering in the affairs of his country.






Why is it hard? (example from L. Lee)




       “At last, a computer that understands you like your mother”







Ambiguity of Natural Language



          “At last, a computer that understands you like your mother”

   This could mean...
     1   It understands you as well as your mother understands you.
     2   It understands (that) you like your mother.
     3   It understands you as well as it understands your mother.
   For readings 1 and 3: does it understand you well, or poorly?







Ambiguity at the Acoustic Level




          “At last, a computer that understands you like your mother”

   This sounds like...
     1   “... a computer that understands you like your mother.”
     2   “... a computer that understands you lie cured mother.”







Ambiguity at the Syntactic (structure) Level



       “At last, a computer that understands you like your mother”







Ambiguity at the Syntactic (structure) Level
                   “List all flights on Tuesday.”
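
The sentence has two structures: "on Tuesday" can attach to "flights" (the flights that leave on Tuesday) or to "list" (do the listing on Tuesday). The sketch below counts parses with a CKY-style chart. The toy grammar and lexicon are invented for this illustration and are not taken from the talk.

```python
from collections import defaultdict

# Toy grammar in Chomsky normal form (hypothetical, illustration only).
BINARY_RULES = [
    ("VP", "V", "NP"),    # list + [all flights ...]
    ("VP", "VP", "PP"),   # [list all flights] + [on Tuesday]
    ("NP", "Det", "N"),   # all + flights
    ("NP", "NP", "PP"),   # [all flights] + [on Tuesday]
    ("PP", "P", "NP"),    # on + Tuesday
]
LEXICON = {
    "list": ["V"],
    "all": ["Det"],
    "flights": ["N"],
    "on": ["P"],
    "tuesday": ["NP"],
}

def count_parses(words, root="VP"):
    """Count the distinct parse trees rooted in `root` for `words`."""
    n = len(words)
    # chart[(i, j)][symbol] = number of ways symbol derives words[i:j]
    chart = defaultdict(lambda: defaultdict(int))
    for i, word in enumerate(words):
        for tag in LEXICON[word.lower()]:
            chart[(i, i + 1)][tag] += 1
    for width in range(2, n + 1):
        for start in range(n - width + 1):
            end = start + width
            for mid in range(start + 1, end):
                for parent, left, right in BINARY_RULES:
                    chart[(start, end)][parent] += (
                        chart[(start, mid)][left] * chart[(mid, end)][right]
                    )
    return chart[(0, n)][root]

print(count_parses("List all flights on Tuesday".split()))
```

With this grammar the full sentence gets two parses, one per attachment, while "List all flights" alone gets one. Real grammars produce far more readings for longer sentences, which is part of why parsing is hard.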







Ambiguity at the Semantic (meaning) Level


   Definition of “mother”
     1   a woman who has given birth to a child
     2   a stringy slimy substance consisting of yeast cells and bacteria; is
         added to cider or wine to produce vinegar.

   More ambiguity
         They put money in the bank (= buried in mud?).
         I saw her duck with a telescope (= a duck carrying a telescope?).







Ambiguity at the Discourse (multi-clause) Level



   Anaphora resolution
   Alice says they’ve built a computer that understands you like your
   mother.
   But she ...
       ... doesn’t know any details (Alice)
       ... doesn’t understand me at all (my mother)







Related Stuff

   Machine Learning
        Machine learning is what made large-scale, open-domain NLP
        applications possible.

   Information Retrieval
        Both need to “understand” language.

   Linguistics
        Interested in the nature of language.

   Psychology / Cognitive Science
        Both interested in human cognitive capabilities.





Conclusion


   What I have told you...
       What NLP is about.
       Some NLP tasks that people work on.
       Why it’s not that easy.

   What I haven’t told you
       How do you solve all these problems?
       How well does it work?
       What is left to be done?






Would you like to know more?

   NLP courses at NUS
       CS4248: Natural Language Processing
       CS6207: Advanced Natural Language Processing

   Books




   Jurafsky and Martin, Speech and Language Processing (2nd Edition)

