Lec1 Intro-2

The document serves as an introduction to a Computer & Linguistics course, outlining the syllabus, evaluation methods, and key topics including Natural Language Processing (NLP) and Computational Linguistics (CL). It emphasizes the importance of NLP in analyzing unstructured data and facilitating human-computer interaction. The course will cover various aspects of NLP, including text processing techniques and applications, along with relevant textbooks and resources.

2025-03-04

Lecture 1: Introduction I
Computer & Linguistics
Jee Eun Kim
Mar. 05, 2025

Outline
▪ Class Introduction
▪ Introduction to NLP
▪ Brief Introduction to AI

CLASS INTRODUCTION

Lecturer
▪ Jee Eun Kim
– Contact Info
• e-mail (preferred)
– [email protected]
• Telephone
– 02-2173-3110 (O)
• Office
– Faculty Building #312
• Office Hour
– Upon Request/Appointment
» Through e-mail: preferred

03/05/2025 Computer & Linguistics 2025 2 03/05/2025 Computer & Linguistics 2025 4

Books
▪ Textbook
– Speech and Language Processing (SLP)
• Daniel Jurafsky & James H. Martin. Prentice Hall.
– Relevant chapters are uploaded to e-class
» 2nd ed. 2008. (Published)
» 3rd ed. 2025. (Online draft version)
▪ Reference
– Language and Computers
• Markus Dickinson, Chris Brew & Detmar Meurers. Wiley-Blackwell, 2012.

e-class
▪ Computer & English Linguistics
▪ Folders
– Lecture Notes
• Lecture notes are to be uploaded before class
– Assignments
• Homework to be uploaded
▪ File format for class notes
– pdf format: to read/view the file
• Go to http://get.adobe.com/kr/reader/
• Install Adobe Reader

Syllabus
▪ Mar. 05
– Introduction
▪ Mar. 12
– Introduction to NLTK
– Introduction to Text Processing
▪ Mar. 19 ~ Apr. 02
– Text Processing I-1
• Regular Expression
• Finite-State Automata
▪ Apr. 09 & 16
– Text Processing I-2
• Morphology & Subword Tokenization

Syllabus (cont’d)
▪ Apr. 23
– Midterm
• using e-class in the classroom
▪ Apr. 30 & May 07
– Text Processing II-1
• Context-Free Grammars (CFG)
• Dependency Grammars
▪ May 14 & 21
– Text Processing II-2
• Constituency Parsing I

Syllabus (cont’d)
▪ May 28
– Text Processing II-2
• Constituency Parsing II
– Student Presentations on the final assignment I
▪ Jun. 04
– Student Presentations on the final assignment II
▪ Jun. 11
– Make-up week: No class
▪ Jun. 18
– Final Exam (using e-class in the classroom)

Evaluation
▪ Attendance
– 10%
▪ Assignments
– 20% (final assignment: 10%)
▪ Midterm Exam
– 35%
▪ Final Exam
– 35%
▪ Extra Credit
– Can be earned by giving a presentation on the problems of the assignments (except the final one)

INTRODUCTION TO NLP

Names of the Field
▪ Computational Linguistics (CL)
▪ Natural Language Processing (NLP)
▪ Human Language Technology (HLT)
▪ Language Engineering (LE)
▪ Natural Language Understanding (NLU)
▪ etc.

Related Conferences
▪ ACL: Association for Computational Linguistics
▪ ANLP: Applied NLP
▪ COLING: COmputational LINGuistics
▪ EACL: European chapter of ACL
▪ LREC: Language Resources & Evaluation Conference
▪ NAACL HLT: North American Chapter of the Association for Computational Linguistics - Human Language Technologies
▪ SIGDAT: Special Interest Group for Linguistic DATa and Corpus-based Approaches to Natural Language Processing
▪ TREC: Text REtrieval Conference

CL vs. NLP
▪ Often used interchangeably in the field of AI
– They share some common goals and methods
▪ Distinctions
– Approaches to analyzing & processing human language
• CL is primarily concerned with the scientific study of language from a computational perspective
• NLP is primarily concerned with the practical application of language processing technology
– Level of abstraction
• CL focuses on the theoretical and mathematical foundations of language processing
• NLP focuses on the practical implementation of algorithms and models that can process natural language

CL vs. NLP (cont’d)
▪ Distinctions (cont’d)
– Methods
• CL relies on a combination of linguistics, mathematics, and computer science to develop algorithms and models that can analyze and generate language
• NLP typically relies on machine learning (ML) techniques, such as deep learning (DL) and statistical modeling, to analyze and generate natural language
▪ Both fields play an important role in the development of AI systems that can understand and interact with humans using natural language

CL vs. NLP (cont’d)
▪ CL
– A field of linguistics
• Focusing on the study of language from a computational perspective
• Involving the development of algorithms and models that enable computers to analyze and process human language at various levels, such as phonetics, syntax, semantics, and pragmatics
• Used in a wide range of applications, from machine translation to speech recognition

CL vs. NLP (cont’d)
▪ NLP
– A field of AI
• Focusing on the interaction between humans and computers using natural language
• Involving the development of algorithms and models that enable computers to understand, interpret, and generate human language
• Used in a wide range of applications, from virtual assistants to sentiment analysis

Computational Linguistics
▪ Interdisciplinary science
– Requiring knowledge of language, cognition and computation
– Concerned with the computational aspects of the human language faculty
▪ A subfield of cognitive science that overlaps with the field of Artificial Intelligence (AI)
– Aiming at creating computational models of human cognition
▪ Growing because the world is becoming an “information society”
– A very active field because it is closely connected to the development of human language technology

Computational Linguistics (cont’d)
[Figure: NLP / Computational Linguistics shown at the intersection of related fields. Labels recovered from the figure: Logic, Philosophy, Language and Psychology, Linguistics, Artificial Intelligence, Phonetics, Signal Processing, HCI, Electrical Engineering, Computer Science, Language Engineering]

Computational Linguistics (cont’d)
▪ What is the ultimate goal of the CL field? (repeat)
– Building a computer system that could understand and produce human language as well as humans can
▪ Why use computers and not humans to process or produce language?
– Humans are often unavailable, too slow, too expensive or too busy doing tasks they do better than machines

Computational Linguistics (cont’d)
▪ When and where to use computers to process or produce language
– Finding relevant documents in large collections of text
• Information Retrieval
– Translating from one language to another
• Machine Translation
– Answering questions about a subject area
• Expert Systems with Natural Language Interfaces
– Helping humans in learning languages spoken by other humans
• Computer-Assisted Language Learning

Computational Linguistics (cont’d)
▪ How to get computers to process or produce language
– Simulation
• Build a psychologically valid working model of human thinking that includes language understanding and use
• Then, use a computer to implement the model in the most efficient manner possible
• This proves to be a vast and very far from solved problem, currently broken into many subproblems addressed in Cognitive Science, Artificial Intelligence, Biology, Psychology and Linguistics
– Emulation
• Doesn't need an understanding of the human brain or any resemblance to how it works
• Attempts to do as well as a human on tasks involving language
• Still an enormous task that needs to handle reasoning, understand how the language works, and provide encyclopedic knowledge

Computational Linguistics (cont’d)
▪ Definition: Summing up
– Computational linguistics is the scientific study of language from a computational perspective
– Computational linguists are interested in providing computational models of various kinds of linguistic phenomena
• These models may be
1. KNOWLEDGE-BASED (hand-crafted) or
2. DATA-DRIVEN (statistical or empirical)

What Is NLP?
▪ Natural Language Processing (NLP)
– A more specific term, in comparison with CL, referring to the sub-field of computer science that deals with methods to analyze, model, and understand human language
▪ A branch of AI
– Dealing with the interaction between computers and humans using natural language (NL)
▪ A branch of data science
– Consisting of systematic processes for analyzing, understanding, and deriving information from text data in a smart and efficient manner

What is NLP? (cont’d)
▪ The ultimate objective of NLP
– To read, decipher, understand, and make sense of human languages in a manner that is valuable
▪ NLP techniques
– Mostly rely on machine learning to derive meaning from human languages

What is NLP? (cont’d)
▪ A subset of AI
– Finding growing importance due to the increasing amount of unstructured language data
– The rapid growth of social media and digital data creates significant challenges in analyzing vast user data to generate insights
– Interactive automation systems such as chatbots are unable to fully replace humans due to their lack of understanding of semantics and context
▪ To tackle these issues, NL models are utilizing advanced ML to better understand unstructured voice and text data

What is NLP? (cont’d)
▪ The fusion of human cognition with AI, neural networks, and data flow

What is NLP? (cont’d)
▪ Interaction among humans and AI-driven chatbots, virtual assistants, & real-time text processing machines using NLP

What is NLP? (cont’d)
▪ NLP tasks and language blocks

What is NLP? (cont’d)
▪ In an AI Tree

What is NLP? (cont’d)
▪ NLP applications

What is NLP? (cont’d)
▪ Levels of difficulties in NLP tasks

What is NLP? (cont’d)
▪ Fundamentals in NLP
– Unveiling the captivating world where computers and human language converge
• Empowering machines to comprehend, analyze & generate natural language, the very essence of our communication

Why is NLP Important?
▪ According to industry estimates (as of Feb. 2025), only 21% of the available data is present in structured form
– Data is being generated as we speak, as we tweet, as we send messages on WhatsApp and in various other activities
– The majority of this data exists in textual form, which is highly unstructured in nature
– A few notable examples include tweets / posts on social media, user-to-user chat conversations, news, blogs and articles, product or service reviews and patient records in the healthcare sector
• A few more recent ones include chatbots and other voice-driven bots

Why is NLP Important? (cont’d)
▪ Despite the high dimensionality of the data, the information present in it is not directly accessible unless it is processed (read and understood) manually or analyzed by an automated system
➢ In order to produce significant and actionable insights from text data, it is important to get acquainted with the techniques and principles of NLP

Why is NLP Important? (cont’d)
▪ Why pursue NLP? (sum-up)
– More than 80% of the data in this world is unstructured in nature, which includes text
• Requiring text mining & NLP to make sense out of this data
– NLP helps you extract insights from customers' emails, tweets, and text messages
– NLP can power many applications
• Language translation
• Question answering systems
• Chatbots
• Document summarizers
• etc.

Why is NLP Important? (cont’d)
▪ 5 important reasons to pursue NLP
1. Facilitating Communication
• NLP enables seamless interaction between humans and computers, powering chatbots, virtual assistants, and machine translation systems
2. Extracting Meaningful Information
• NLP helps extract insights from unstructured text data, including sentiment analysis, named entity recognition, and text summarization
3. Deriving Insights
• NLP algorithms analyze textual data to derive patterns and insights, valuable for tasks like market research, social media analysis, and customer feedback analysis

Why is NLP Important? (cont’d)
▪ 5 important reasons to pursue NLP (cont’d)
4. Automating Tasks
• NLP automates language-related tasks such as answering queries, categorizing documents, and generating reports, enhancing efficiency and accuracy
5. Personalizing Experiences
• NLP enables personalized recommendations, content filtering, and targeted advertising by understanding user preferences and behaviors from their language usage

Why is NLP Important? (cont’d)
▪ Interface between man and machine

Why is NLP Important? (cont’d)
▪ Information Extraction and Understanding

Why is NLP Important? (cont’d)
▪ Improved Convenience and Productivity

Why is NLP Difficult?
▪ Multidisciplinary
– Linguistics
• How words, phrases, and sentences are formed
– Psycholinguistics
• How people understand and communicate using human language
– Philosophy
• Relates to the semantics of language and the notion of meaning; NLP requires considerable knowledge about the world
– Computer Science
• Deals with model formation and implementation
– Mathematics and Statistics
• Deals with probabilities, statistical distribution and hypothesis testing of language phenomena
– Artificial Intelligence
• Relates to knowledge representation and reasoning

NLP in The CS Taxonomy
▪ Computers
– Databases
– Artificial Intelligence
• Robotics
• Natural Language Processing
» Information Retrieval
» Machine Translation
» Language Analysis (Semantics, Parsing)
• Search
– Algorithms
– Networking

Approaches to NLP
▪ Rule-based NLP
– aka heuristics-based or knowledge-based NLP
– Relying on predefined rules and patterns to process natural language
• The rules are typically crafted by linguists or domain experts based on linguistic principles and knowledge
– Involving designing algorithms that encode linguistic rules to perform tasks
• POS tagging
• NER (Named Entity Recognition)
• Syntactic parsing
• etc.

Approaches to NLP
▪ Rule-based NLP (cont’d)
– The Pros
• Producing accurate linguistic descriptions of language
• Effective for handling specific linguistic phenomena/tasks
– Including describing idiosyncratic linguistic phenomena
– The Cons
• Requiring extensive manual effort to design/maintain the rules
– Producing a knowledge acquisition bottleneck
• Struggling with handling the complexity & variability of NL
– Performing poorly on naturally occurring text
» Often too strict to characterize people’s use of language
» People tend to stretch and bend rules in order to meet their communicative needs

Approaches to NLP
▪ Rule-based NLP (cont’d)
– Examples
• Dictionary-based sentiment analysis
• WordNet for lexical relations
• Common sense world knowledge (Ontology)
• Regular Expressions
• Context-free grammar, etc.
– Rules based on domain-specific knowledge can efficiently reduce the mistakes that are sometimes very expensive

Approaches to NLP (cont’d)
▪ Statistical NLP
– Leveraging statistical models and algorithms
• To learn patterns and structures from large amounts of annotated text data
– Trained on labeled datasets
• To automatically extract features and make predictions for various NLP tasks
– Seeking to solve the acquisition bottleneck
• By automatically learning preferences from corpora
– e.g., lexical or syntactic preferences

Approaches to NLP (cont’d)
▪ Statistical NLP
– Compared to rule-based NLP
• Claimed to be more flexible & robust
– Behaving gracefully in the presence of errors and new data
– Dealing with “real” data
• Handling more complex linguistic phenomena
– Generalizing well
• Requiring significant amounts of labeled data for training, and may struggle with handling ambiguity and linguistic variability
– The size of data becomes enormous
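
The contrast between rule-based and statistical NLP drawn on these slides can be illustrated with dictionary-based sentiment analysis. The sketch below is only a toy under stated assumptions: the hand-crafted lexicon, the four-review corpus, and the function names (`tokenize`, `score`, `learn_lexicon`) are all invented for illustration and are not taken from the course materials. The first lexicon is written by hand (knowledge-based); the second is derived automatically from labeled text (data-driven).

```python
import re
from collections import Counter

# Hypothetical hand-crafted sentiment lexicon (rule-based / knowledge-based).
HAND_LEXICON = {"good": 1, "great": 1, "love": 1, "bad": -1, "awful": -1, "hate": -1}


def tokenize(text):
    """Lowercase the text and return its alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def score(text, lexicon):
    """Sum the lexicon weights of the tokens; unknown words count as 0."""
    return sum(lexicon.get(tok, 0) for tok in tokenize(text))


def learn_lexicon(labeled_docs):
    """Derive word weights from a labeled corpus (statistical approach).

    A word's weight is (#positive docs containing it) minus (#negative docs
    containing it): a deliberately crude stand-in for a trained model.
    """
    weights = Counter()
    for text, label in labeled_docs:
        sign = 1 if label == "pos" else -1
        for tok in set(tokenize(text)):
            weights[tok] += sign
    return dict(weights)


# Toy labeled corpus, invented for illustration.
CORPUS = [
    ("the battery life is great", "pos"),
    ("great screen and sleek design", "pos"),
    ("the screen is sluggish", "neg"),
    ("sluggish and awful keyboard", "neg"),
]

learned = learn_lexicon(CORPUS)
review = "sleek but sluggish"
print(score(review, HAND_LEXICON))  # 0: the hand-written rules know neither word
print(score(review, learned))       # -1: weights picked up from the corpus
```

The hand-crafted lexicon is precise on the words it covers but silent on everything else (the knowledge acquisition bottleneck), while the learned lexicon covers whatever the corpus contains at the cost of needing labeled data.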

Approaches to NLP (cont’d)
▪ Statistical NLP (cont’d)
1. (Traditional) Machine Learning (ML)
• Three common steps
① Extracting features from texts
» Feature engineering
• Word type, surrounding words, capitalized, plural, etc.
② Using the feature representation to train a model
» Training data: a corpus with markup
» Fitting the model's parameters on the training data, followed by evaluation on held-out test data
③ Evaluating and refining the model
» Inference (applying model to test data)
• Characterized by finding most probable words, next word, best category, etc.

Approaches to NLP (cont’d)
▪ Statistical NLP (cont’d)
2. Deep Learning (DL)
• Using neural network (NN) architectures with multiple layers to model and process NL data
• Able to learn hierarchical representations of text data
• Automatically learning intricate patterns & dependencies in the data, making them highly effective for tasks
– language modeling, sequence-to-sequence generation, contextual word embeddings, etc.
• Having led to significant advancements in NLP and continuing to be an active area of research in the field
– machine translation, text summarization, question answering, etc.

Approaches to NLP (cont’d)
▪ Statistical NLP (cont’d)
2. Deep Learning (DL) (cont’d)
• Similar to "traditional" ML, but with a few differences
– Feature engineering is generally skipped
» Networks "learn" important features, which is one of the claimed big benefits of using NNs for NLP
– Streams of raw parameters ("words", actually vector representations of words), without engineered features, are fed into NNs
• Challenges
– Requiring substantial computational resources
» Including a very large training corpus
– The ongoing need to address biases in large training datasets

Approaches to NLP (cont’d)
▪ Traditional Machine Learning (ML)
▪ Deep Learning (DL)
▪ Traditional ML is still used in many services
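
The three steps of traditional ML listed on these slides (feature extraction, training, inference) can be sketched with a tiny Naive Bayes classifier that decides whether a token is a personal name. Everything here is an assumption made for illustration: the NAME/OTHER tag set, the three annotated sentences, the crude "ends in s" plural test, and the simplified add-one smoothing are not from the lecture.

```python
import math
from collections import Counter, defaultdict


def extract_features(tokens, i):
    """Step 1: hand-engineered features for the token at position i."""
    word = tokens[i]
    return {
        "capitalized": word[0].isupper(),
        "plural": word.lower().endswith("s"),  # crude plural test (assumption)
        "prev": tokens[i - 1].lower() if i > 0 else "<s>",
    }


def train(examples):
    """Step 2: count label and (feature, value) frequencies per label."""
    label_counts = Counter()
    feat_counts = defaultdict(Counter)
    for feats, label in examples:
        label_counts[label] += 1
        for name, value in feats.items():
            feat_counts[label][(name, value)] += 1
    return label_counts, feat_counts


def predict(model, feats):
    """Step 3 (inference): return the most probable label."""
    label_counts, feat_counts = model
    total = sum(label_counts.values())
    best_label, best_logp = None, -math.inf
    for label, count in label_counts.items():
        logp = math.log(count / total)
        for name, value in feats.items():
            # Add-one smoothing; the +2 denominator is a simplification that
            # treats every feature as if it had two possible values.
            logp += math.log((feat_counts[label][(name, value)] + 1) / (count + 2))
        if logp > best_logp:
            best_label, best_logp = label, logp
    return best_label


# Toy "corpus with markup", invented for illustration.
TRAIN = [
    (["Dr", "Kim", "teaches", "linguistics"], ["OTHER", "NAME", "OTHER", "OTHER"]),
    (["Students", "met", "Dr", "Park"], ["OTHER", "OTHER", "OTHER", "NAME"]),
    (["Parsers", "analyze", "sentences"], ["OTHER", "OTHER", "OTHER"]),
]
examples = [
    (extract_features(tokens, i), tag)
    for tokens, tags in TRAIN
    for i, tag in enumerate(tags)
]
model = train(examples)

test_tokens = ["Dr", "Lee", "parses", "sentences"]
predicted = [predict(model, extract_features(test_tokens, i)) for i in range(len(test_tokens))]
print(list(zip(test_tokens, predicted)))
```

On this toy data the unseen word "Lee" is labeled NAME because it is capitalized and follows "Dr", which shows how engineered features let a model generalize beyond the words it saw in training. A deep-learning model would skip `extract_features` and learn its own representation from word vectors.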

Approaches to NLP (cont’d)

Approaches to NLP (cont’d)
▪ Hybrid approach
– Utilizing advantages of each approach
• The rule-based precision
• Machine learning’s adaptability
• The transformative power of neural networks
– Each technique has its role in deciphering the complexities of language
• Offering a spectrum of solutions for various NLP tasks
• Staying tuned to emerging trends in the fascinating realm of NLP

Techniques Used in NLP
▪ Syntax
– Syntax refers to the arrangement of words in a sentence such that they make grammatical sense
– In NLP, syntactic analysis is used to assess how the natural language aligns with the grammatical rules
– Computer algorithms are used to apply grammatical rules to a group of words and derive meaning from them

Techniques Used in NLP (cont’d)
▪ Syntax (cont’d)
– Techniques for syntactic analysis
• Word segmentation / Tokenization
– It involves dividing a large piece of continuous text into distinct units
• Stemming
– It involves cutting the inflected words to their root form
• Morphological segmentation
– It involves dividing words into individual units called morphemes
• Lemmatization
– It entails reducing the various inflected forms of a word into a single form (typically a dictionary entry) for easy analysis
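
The four techniques just listed (tokenization, stemming, morphological segmentation, lemmatization) can be sketched in a few lines of standard-library Python. This is a minimal illustration, not how a toolkit such as NLTK implements them: the suffix list and the small lemma dictionary are hypothetical and far from complete.

```python
import re

SUFFIXES = ("ing", "ed", "es", "s")  # hypothetical suffix list, far from complete
LEMMAS = {"went": "go", "better": "good", "mice": "mouse", "was": "be"}  # toy dictionary


def tokenize(text):
    """Tokenization: split text into word tokens and single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", text)


def segment(word):
    """Morphological segmentation: split off one known suffix, if any.

    At least three characters must remain so that short words stay intact.
    """
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return [word[: -len(suffix)], suffix]
    return [word]


def stem(word):
    """Stemming: keep only the part of the word before the stripped suffix."""
    return segment(word)[0]


def lemmatize(word):
    """Lemmatization: dictionary lookup first, suffix stripping as fallback."""
    lower = word.lower()
    return LEMMAS.get(lower, stem(lower))


tokens = tokenize("The mice went jumping over walls.")
print(tokens)                          # ['The', 'mice', 'went', 'jumping', 'over', 'walls', '.']
print(segment("jumping"))              # ['jump', 'ing']
print([stem(t) for t in tokens])       # ['The', 'mice', 'went', 'jump', 'over', 'wall', '.']
print([lemmatize(t) for t in tokens])  # ['the', 'mouse', 'go', 'jump', 'over', 'wall', '.']
```

The example shows the difference the slide describes: stemming only cuts suffixes, so irregular forms such as "mice" and "went" are left unchanged, whereas lemmatization maps them to their dictionary entries.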

Techniques Used in NLP (cont’d)
▪ Techniques for syntactic analysis (cont’d)
– Techniques for syntactic analysis (cont’d)
• Sentence breaking
– It involves placing sentence boundaries on a large piece of text
• Part-of-speech tagging
– It involves identifying the part of speech for every word
• Parsing
– It involves undertaking grammatical analysis for the provided sentence

Techniques Used in NLP (cont’d)
▪ Semantics
– It refers to the meaning that is conveyed by a text
– Semantic analysis is one of the difficult aspects of NLP that has not been fully resolved yet
– It involves applying computer algorithms to understand the meaning and interpretation of words and how sentences are structured

Techniques Used in NLP (cont’d)
▪ Semantics (cont’d)
– Techniques for semantic analysis
• Named entity recognition (NER)
– It involves determining the parts of a text that can be identified and categorized into preset groups
– Examples of such groups include names of people and names of places
• Word sense disambiguation
– It involves giving meaning to a word based on the context
• Natural Language Generation (NLG)
– It involves using databases to derive semantic intentions and convert them into human language

NLP Levels
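
Sentence breaking and named entity recognition, both described on the slides above, can be approximated with simple rules. The sketch below is a rule-based illustration only: the abbreviation list and the gazetteer (a lookup table of known entity names) are hypothetical, and real systems typically combine such resources with statistical models.

```python
import re

ABBREVIATIONS = {"dr.", "mr.", "ms.", "prof.", "e.g.", "i.e."}  # hypothetical list
GAZETTEER = {  # toy entries mapping names to preset groups
    "Alan Turing": "PERSON",
    "John McCarthy": "PERSON",
    "Seoul": "LOCATION",
}


def split_sentences(text):
    """Break after ., ! or ? followed by whitespace, unless the word just
    before the break is a known abbreviation."""
    sentences, start = [], 0
    for match in re.finditer(r"[.!?]\s+", text):
        candidate = text[start:match.end()].strip()
        last_word = candidate.split()[-1].lower()
        if last_word in ABBREVIATIONS:
            continue  # e.g. "Dr. Kim": not a sentence boundary
        sentences.append(candidate)
        start = match.end()
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences


def find_entities(sentence):
    """Return (entity, group) pairs for gazetteer entries found in the sentence."""
    found = []
    for name, label in GAZETTEER.items():
        if re.search(r"\b" + re.escape(name) + r"\b", sentence):
            found.append((name, label))
    return found


text = "Dr. Kim teaches in Seoul. Alan Turing asked whether machines can think."
for sentence in split_sentences(text):
    print(sentence, find_entities(sentence))
```

The abbreviation check is what keeps "Dr. Kim" inside one sentence; without it, a period followed by a space would always be treated as a boundary. The gazetteer finds only names it already lists, which is the coverage limitation that motivates statistical NER.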

NLP Tasks

Applications of CL/NLP

Multimodal NLP
▪ Emergence of Multimodal NLP
– By integrating multiple data modalities
– The availability of diverse data types catalyzed this shift from unimodal to multimodal NLP
• Data types: images, audio, and videos
– Researchers realized that combining these modalities could significantly enhance language understanding and context
▪ A growing field of study which is expected to become increasingly significant as more data becomes available across multiple modalities

NLP in Industry … Has Taken Off
▪ Search (written and spoken)
▪ Online advertisement matching
▪ Automated/assisted translation
▪ Sentiment analysis for marketing or finance/trading
▪ Speech recognition
▪ Chatbots / dialog agents
– Automating customer support
– Controlling devices
– Ordering goods

NLP Software: Present
▪ IBM Watson
– Question Answering
• Quiz show Jeopardy contender: 2011
– https://www.youtube.com/watch?v=P18EdAKuC1U
– https://www.youtube.com/watch?v=_Xcmh1LQB9I

NLP Software: Present (cont’d)
▪ IBM
– NLP
• https://natural-language-classifier-demo.ng.bluemix.net/?cm_mc_uid=86657345270115215799647&cm_mc_sid_50200000=67404251549376437907&cm_mc_sid_52640000=20772841549376437918
– Watson Assistant: Chatbot
• https://www.ibm.com/cloud/watson-assistant/
– Watson Speech to Text (STT)
• https://www.ibm.com/cloud/watson-speech-to-text
– Watson Tone Analyzer
• https://www.ibm.com/cloud/watson-tone-analyzer
– Watson Knowledge Studio
• https://www.ibm.com/cloud/watson-knowledge-studio

NLP Software: Present (cont’d)
▪ IBM (cont’d)
– AI
• Project Debater
– https://www.research.ibm.com/artificial-intelligence/project-debater/
– https://fortune.com/2020/03/11/ibm-debate-a-i-watson/

NLP Software: Present (cont’d)
▪ Google
– Natural Language Processing
• https://cloud.google.com/natural-language
– Entity Recognition
– Sentiment Analysis
– Syntactic Parsing
– Text Categorization
– Machine Translation
• Google Translate
– https://cloud.google.com/translate/#how-automl-translationbeta-works

NLP Software: Present (cont’d)
▪ Google (cont’d)
– SyntaxNet
• https://research.googleblog.com/2016/05/announcing-syntaxnet-worlds-most.html
– Gemini (Bard)
• An AI-powered chatbot tool to simulate human conversations using NLP & ML
– Powered by the Gemini large language model (LLM)
• Can be integrated into websites, messaging platforms or applications to provide realistic, natural language responses to user questions, in addition to supplementing Google Search
• https://gemini.google.com/app

NLP Software: Present (cont’d)
▪ Naver
– Machine Translation
• Papago
– https://papago.naver.com/?sk=ko&tk=en
– AI Platform w/ Line
• Clova
– https://clova.ai/en/research/research-area-detail.html?id=0

NLP Software: Present (cont’d)
▪ OpenAI
– ChatGPT
• A chatbot which interacts in a conversational way
– Lacking knowledge of anything that happened after September 2021
• 3.5: Nov. 30, 2022
– Its dialogue format allows
» Answering follow-up questions
» Admitting its mistakes
» Challenging incorrect premises
» Rejecting inappropriate requests
– It can only give the direction we must take to solve the problem, rather than providing the actual solution
– Failing to understand the subtle nuances

NLP Software: Present (cont’d)
▪ OpenAI (cont’d)
– ChatGPT
• 4.0: March 14, 2023
– Multimodal: it can process images
» Accepting both text and image prompts
– More processing power
» Can solve various problems and equations in subjects as varied as calculus, geometry and algebra
– Much more nuanced
» Can produce improved poems or essays with much better coherence and creativity
» Can now retain up to 25,000 words of chat for context, while 3.5 retains a mere 3,000 words
– Accuracy improvement
» More accurate, less prone to “hallucinations”

NLP Software: Present (cont’d)
▪ OpenAI (cont’d)
– ChatGPT
– GPT-4o: May 2024
• Omni
• Upgrades:
– Spoken conversation/reaction time
– Creation of pictures, videos and images
– OpenAI(GPT) o1: Dec. 2024
• Reportedly scored about 120 on an IQ test
– Reported as surpassing the human average IQ of 100 for the first time
• Greatly improved ability to logically reason through complex tasks
– OpenAI(GPT) o3: Feb. 2025
• Described by some as the first model to reach AGI (Artificial General Intelligence), a claim that remains contested
– State-of-the-art reasoning model

Recent Trends in NLP
▪ NLP has heavily benefited from recent advances in machine learning, especially from deep learning techniques
– Speech Processing (SP)
• The translation of spoken language into text
• The translation of text into spoken language
– Natural Language Understanding (NLU)
• The computer's ability to understand what we say
– Natural Language Generation (NLG)
• The generation of natural language by a computer

NLP Trends For 2025
▪ By leveraging NLP, we not only break the language barrier but also push machines to an extent where they can understand the intent behind the query without much of an explanation
– https://www.shaip.com/blog/nlp-trends-2025/
1. Real-Time Language Translation
– Based on current advancements in NLP, these models can achieve up to 98% accuracy when translating spoken and written languages

NLP Trends For 2025 (cont’d)
2. Deep Learning Models for Specialized Tasks
– Transformer models like GPT-4 and BERT are achieving excellent accuracy
• In 2025, they are expected to open up new possibilities
– These models can now handle niche tasks like drafting legal contracts & analyzing patients' medical records with close to human-like precision
– When fine-tuned, they can be customized for industries like finance and law

19
2025-03-04

NLP Trends For 2025 (cont’d)

3. Better Emotional Intelligence
– Modern AI models now go beyond merely identifying positive or negative sentiments
– They can detect a wide range of emotions such as anger, joy, frustration, and more
– This capability allows for a deeper understanding of human interactions

NLP Trends For 2025 (cont’d)

4. Better Healthcare
– Hospitals with NLP can extract data from unstructured sources like clinical notes and medical reports
• Doctors can identify patterns in patients’ clinical history, predict diseases, and suggest treatments
5. Conversational AI Gets Even Better
– Apple integrated ChatGPT into Siri
– Google integrated Gemini into Google Assistant
– These chatbots will be capable enough to distinguish between sarcasm and genuine requests

NLP Trends For 2025 (cont’d)

6. Ethical AI will be Prioritized more than Ever
– As NLP becomes more and more powerful, it will raise concerns about biases and privacy
– It will eventually raise concerns, as models trained on biased data can discriminate in hiring and lending
– To address this, we might witness the formation of multiple regulatory authorities that mandate transparency, forcing companies to disclose training data sources

NLP Trends For 2025 (cont’d)

7. E-Commerce gets Personalized
– Companies would be able to use NLP
• to analyze browsing patterns &
• to provide tailored recommendations to the user
• Using semantic search and personalized suggestions
8. The Age of Hybrid AI Systems
– NLP, once matured enough
• Will be integrated into computer vision applications
– such as automated medical report generation and real-time image captioning
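Trend 3 (Better Emotional Intelligence) contrasts plain positive/negative sentiment with finer-grained emotion labels. The sketch below illustrates only that difference in output. It assumes a tiny hand-made word list (`EMOTION_LEXICON`, hypothetical and not from the lecture); modern systems learn such associations from data rather than looking words up.

```python
from collections import Counter

# Hypothetical toy lexicon: word -> emotion label (an assumption for illustration).
EMOTION_LEXICON = {
    "furious": "anger", "angry": "anger", "outraged": "anger",
    "happy": "joy", "delighted": "joy", "love": "joy",
    "stuck": "frustration", "annoying": "frustration", "useless": "frustration",
}

def detect_emotions(text: str) -> Counter:
    """Count lexicon hits per emotion in a lower-cased, whitespace-split text."""
    tokens = [tok.strip(".,!?;:\"'") for tok in text.lower().split()]
    return Counter(EMOTION_LEXICON[tok] for tok in tokens if tok in EMOTION_LEXICON)

def dominant_emotion(text: str) -> str:
    """Return the most frequent emotion, or 'neutral' when no lexicon word appears."""
    counts = detect_emotions(text)
    return counts.most_common(1)[0][0] if counts else "neutral"
```

For example, `dominant_emotion("The app is useless, I am stuck.")` yields a frustration label, which a two-way sentiment classifier would only report as negative.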


NLP Trends For 2025 (cont’d)

9. Multilingual Model Support
– As of now, NLP systems can handle 300+ languages, and with initiatives like Google’s Universal Speech Model (USM), the aim is to cover 1,000 languages
• Currently, USM supports 400+ languages including some low-resource languages like Amharic and Assamese, enhancing accessibility in regions like Africa and South Asia

INTRODUCTION TO AI

Artificial Intelligence

▪ The term was coined by John McCarthy in 1956
▪ The science and engineering of creating intelligent machines
– The branch of computer science that develops machines and software with human-like intelligence
– The field is defined as "the study and design of intelligent agents", where an intelligent agent is a system that perceives its environment and takes actions that maximize its chances of success
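The "intelligent agent" definition above (perceive the environment, then take the action that maximizes the chance of success) can be sketched as a short loop. This is a minimal illustration under stated assumptions: the thermostat environment, `TARGET_TEMP`, `EFFECTS`, and the utility function are all hypothetical and do not come from the lecture.

```python
from typing import Callable, Dict, List

def choose_action(percept: float,
                  actions: List[str],
                  utility: Callable[[float, str], float]) -> str:
    """Pick the action with the highest utility for the current percept."""
    return max(actions, key=lambda action: utility(percept, action))

# Hypothetical environment: a room whose temperature the agent perceives.
TARGET_TEMP = 21.0
EFFECTS: Dict[str, float] = {"heat": +1.0, "cool": -1.0, "idle": 0.0}

def thermostat_utility(temp: float, action: str) -> float:
    """Higher is better: negative distance to the target after the action."""
    return -abs((temp + EFFECTS[action]) - TARGET_TEMP)

def run_agent(start_temp: float, steps: int) -> List[str]:
    """Perceive, act, and update the environment for a fixed number of steps."""
    temp, history = start_temp, []
    for _ in range(steps):
        action = choose_action(temp, list(EFFECTS), thermostat_utility)
        temp += EFFECTS[action]
        history.append(action)
    return history
```

Here "success" is reduced to a single number (closeness to the target temperature); real agents differ mainly in how rich the percepts, actions, and success measure are.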


Imitation Game…

▪ Movie?
▪ Fiction?
▪ Nonfiction?
▪ Who is he?

Turing Test

▪ Proposed by Alan Turing in 1950
▪ Can machines think?
▪ Are there imaginable digital computers which would do well in the imitation game?
▪ It suggests accepting the proposition that a computer can think
– If the computer’s responses cannot be distinguished from those of a human

Timeline of AI

▪ https://verloop.io/blog/the-timeline-of-artificial-intelligence-from-the-1940s/
▪ https://www.theverge.com/2016/3/11/11208078/lee-se-dol-go-google-kasparov-jennings-ai
• https://www.deepmind.com/research/highlighted-research/alphago
• DeepMind
– https://deepmind.com/
▪ https://aiartists.org/ai-timeline-art

Timeline of AI (cont’d)

▪ The Timeline of AI – From the 1940s
– https://verloop.io/blog/the-timeline-of-artificial-intelligence-from-the-1940s/
▪ OpenAI
– 2022: ChatGPT 3.5
– 2023: ChatGPT 4.0
– 2024: GPT-4o, GPT o1
– 2025: GPT o3
▪ Google
– 2023: Bard (Gemini)
▪ Microsoft
– 2023: Copilot


AI, ML, & DL

Hot Issues Regarding Software

AI, ML & DL (cont’d)

▪ Artificial Intelligence (AI)
▪ Machine Learning (ML)
▪ Deep Learning (DL)


AI, ML & DL (cont’d)

▪ AI
– Enables the machine to think
▪ ML
– Uses statistical tools to explore and analyze the data
• Supervised Learning
• Unsupervised Learning (Clustering)
• Reinforcement Learning
– Semi-supervised Learning
▪ DL
– Multi-layer Neural Network Architecture
• ANN (Artificial Neural Network)
• CNN (Convolutional Neural Network): Transfer Learning
• RNN (Recurrent Neural Network)

AI, ML, DL & DS

▪ DS
– Data Science
• Statistics
• Probability
• Linear Algebra

AI, ML, DL & DS (cont’d)

Artificial Intelligence Again

[Figure: Rule-based System, Classic Machine Learning, Representation Learning, Deep Learning]
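The ML bullets above distinguish supervised learning from unsupervised learning (clustering). A minimal sketch of the difference on one-dimensional data follows. It assumes a nearest-centroid classifier for the supervised case and a two-cluster k-means for the unsupervised case; both are simple stand-ins chosen for illustration, not methods named in the lecture, and all function names are hypothetical.

```python
from statistics import mean
from typing import Dict, List, Tuple

def fit_centroids(examples: List[Tuple[float, str]]) -> Dict[str, float]:
    """Supervised: average the feature values of each labeled class."""
    by_label: Dict[str, List[float]] = {}
    for value, label in examples:
        by_label.setdefault(label, []).append(value)
    return {label: mean(values) for label, values in by_label.items()}

def predict(centroids: Dict[str, float], value: float) -> str:
    """Assign the label whose centroid is closest to the value."""
    return min(centroids, key=lambda label: abs(centroids[label] - value))

def two_means(values: List[float], iterations: int = 10) -> List[List[float]]:
    """Unsupervised: split unlabeled values into two clusters (1-D k-means, k=2)."""
    low, high = min(values), max(values)  # initial cluster centers
    clusters: List[List[float]] = [[], []]
    for _ in range(iterations):
        clusters = [[], []]
        for v in values:
            clusters[0 if abs(v - low) <= abs(v - high) else 1].append(v)
        if not clusters[0] or not clusters[1]:
            break  # everything fell into one cluster; centers cannot be updated
        low, high = mean(clusters[0]), mean(clusters[1])
    return clusters
```

The supervised functions need labels (`"short"`, `"long"`, or any other strings) to learn from, while `two_means` receives only the raw numbers and has to discover the grouping itself.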


Machine Learning

▪ Subfield of Artificial Intelligence
▪ Does a machine learn by itself?
– Systems that can learn from data
▪ What is it?
– Programming computers to optimize a performance criterion for some task using example data or past experience
– The ability to learn without being explicitly programmed
▪ Why learning?
– No known exact method
• Vision, speech recognition, robotics, spam filters, etc.
– Exact method too expensive
• Statistical physics

Deep Learning

▪ A subfield of Machine Learning
– Machine learning is a subfield of AI
▪ A set of algorithms
– Let the machine learn in the way humans learn
▪ “Deep Learning waves have lapped at the shores of computational linguistics for several years now, but 2015 seems like the year when the full force of the tsunami hit the major Natural Language Processing (NLP) conferences.”

Deep Learning (cont’d)

▪ Deep Learning =
– Lots of Training Data
+
– Parallel Computation
+
– Scalable, Smart Algorithms
▪ Deep Learning =
– Large volume of Big Data + Increased Computing Power
Prototyping the human brain and cognition process

What’s Next?

▪ Text Processing I-1
