Distributional Semantics
Rabindra Nath Nandi
Clustering maps of biomedical articles
Distributional Semantics
Distributional semantics is a research area that develops and studies theories
and methods for quantifying and categorizing semantic similarities between
linguistic items based on their distributional properties in large samples of
language data.
Distributional hypothesis: linguistic items with similar distributions have similar
meanings.
The idea that "a word is characterized by the company it keeps" was popularized by Firth.
Distributional Semantics
❖ Depends on Statistical Semantics
Statistical semantics applies the methods of statistics to the problem of
determining the meaning of words or phrases, ideally through unsupervised
learning, to a degree of precision at least sufficient for the purpose of information
retrieval.
Theory
Distributional semantics favors the use of linear algebra as a computational tool and
representational framework.
The basic approach is to collect distributional information in high-dimensional
vectors, and to define distributional/semantic similarity in terms of vector similarity.
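A minimal sketch of this approach, assuming a whitespace tokenizer, a context window of two words, and cosine similarity as the vector similarity measure. The toy corpus and the names cooccurrence_vectors and cosine are hypothetical, chosen only for illustration.

```python
import math
from collections import Counter, defaultdict

def cooccurrence_vectors(sentences, window=2):
    """Map each word to a Counter of the words seen within `window` positions of it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

# Hypothetical toy corpus; a real corpus would be far larger.
toy_corpus = [
    "the cat drinks milk",
    "the dog drinks water",
    "the cat chases the mouse",
    "the dog chases the ball",
]
vectors = cooccurrence_vectors(toy_corpus)

# "cat" and "dog" share most of their contexts, so they score higher
# than "cat" and "milk", which merely occur next to each other.
print(cosine(vectors["cat"], vectors["dog"]))
print(cosine(vectors["cat"], vectors["milk"]))
```

Words that appear in similar contexts end up with similar vectors, which is the distributional hypothesis expressed in terms of vector similarity.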
Basic Theory of NLP
Corpus: A corpus is a large body of natural language text used for accumulating
statistics on natural language.
Lexicon: A collection of information about the words of a language, including the
lexical categories to which they belong. A lexicon is usually structured as a
collection of lexical entries, like ("pig" N V ADJ). "pig" is familiar as a noun (N), but also
occurs as a verb ("Jane pigged herself on pizza") and as an adjective, in the phrase
"pig iron".
Distributional semantic models (DSMs)
❖ The idea of using corpus-based statistics to extract information about semantic
properties of words and other linguistic units is extremely common in
computational linguistics.
❖ 1) Word-document distribution, 2) topic-document distribution, 3) word distribution as a
semantic space.
Use cases:
Information retrieval, document clustering, quick document understanding
Distributional Semantics Models
❖ Term frequency–inverse document frequency (tf-idf)
❖ Latent Semantic Analysis (LSA)
❖ Latent Dirichlet Allocation (LDA)
❖ Word embedding (word2vec)
Distributional Semantics Models
❖ Term frequency–inverse document frequency (tf-idf)
Term frequency: The number of times a term occurs in a document is called its
term frequency.
Inverse document frequency: The inverse document frequency factor
diminishes the weight of terms that occur very frequently in the
document set and increases the weight of terms that occur rarely.
Distributional Semantics Models
The idf of a rare term is high, whereas the idf of a frequent term is likely to be low.
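A small sketch of tf-idf weighting, assuming one common variant: tf is the raw count of a term in a document, and idf(t) = log(N / df(t)), where N is the number of documents and df(t) is the number of documents containing t. The slides do not state which variant they use, so the formula, the function name tf_idf, and the toy documents are assumptions for illustration.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return (per-document tf-idf weights, idf table) for whitespace-tokenized docs."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each term occur?
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    # Assumed idf variant: log(N / df). A term found in every document gets idf 0.
    idf = {term: math.log(n_docs / count) for term, count in df.items()}

    weights = []
    for tokens in tokenized:
        tf = Counter(tokens)  # raw term counts in this document
        weights.append({term: count * idf[term] for term, count in tf.items()})
    return weights, idf

# Hypothetical toy documents.
toy_docs = [
    "the gene regulates the protein",
    "the protein binds the receptor",
    "the clinical trial tested the drug",
]
weights, idf = tf_idf(toy_docs)

for term in ("the", "protein", "gene"):
    print(term, idf[term])
```

Here "the" occurs in every document and so carries no weight, while "gene", which occurs in only one document, receives the highest idf of the three terms printed.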
Distributional Semantics Models
❖ Latent Semantic Analysis (LSA)
- Dimensionality reduction
- Finding latent relationships between words and documents
- Words and documents are sorted by their relationships
Distributional Semantics Models
❖ Latent Semantic Analysis (LSA)
Dimensionality Reduction
Reduce the target-word-by-context matrix to a lower-dimensional matrix (a
matrix with fewer, linearly independent, columns/dimensions).
Two main reasons:
1) Smoothing: capture “latent dimensions” that generalize over
sparser surface dimensions (Singular Value Decomposition, or SVD)
2) Efficiency/space: sometimes the matrix is so large that you don’t even want to
construct it explicitly (Random Indexing)
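The smoothing case can be sketched with a truncated SVD, assuming NumPy is available. The term-by-document counts, the choice k = 2, and the helper names truncated_svd and cosine are hypothetical; Random Indexing is not shown here.

```python
import numpy as np

# Hypothetical toy counts: rows are terms, columns are documents.
terms = ["gene", "protein", "receptor", "trial", "drug"]
X = np.array([
    [2, 1, 0, 0],
    [1, 2, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 2, 1],
    [0, 0, 1, 2],
], dtype=float)

def truncated_svd(matrix, k):
    """Keep only the k largest singular values and their singular vectors."""
    U, s, Vt = np.linalg.svd(matrix, full_matrices=False)
    return U[:, :k], s[:k], Vt[:k, :]

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

U_k, s_k, Vt_k = truncated_svd(X, k=2)

term_vectors = U_k * s_k      # one k-dimensional row per term
doc_vectors = Vt_k.T * s_k    # one k-dimensional row per document
X_k = U_k @ np.diag(s_k) @ Vt_k  # rank-k approximation of X

# Terms from the same group of documents end up close in the latent space.
print(cosine(term_vectors[0], term_vectors[1]))  # gene vs. protein
print(cosine(term_vectors[0], term_vectors[4]))  # gene vs. drug
```

In this toy matrix the two latent dimensions line up with the two groups of documents, so "gene" and "protein" point in nearly the same direction while "gene" and "drug" are nearly orthogonal.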
Distributional Semantics Models
Ranking System Design:
Query => {terms}
Docs => {term-document, term-concepts}
Query on Docs => find relevant docs
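One way this design could look in code, as a sketch under stated assumptions: the query and each document are turned into bag-of-words term vectors and documents are ranked by cosine similarity to the query. Only the term-document representation is used; the term-concept (latent) representation is left out. The function names and toy documents are hypothetical.

```python
import math
from collections import Counter

def bag_of_words(text):
    """Query or document => {terms}, as raw term counts."""
    return Counter(text.lower().split())

def cosine(u, v):
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def rank(query, docs):
    """Return (doc_index, score) pairs, most relevant document first."""
    q = bag_of_words(query)
    scored = [(i, cosine(q, bag_of_words(doc))) for i, doc in enumerate(docs)]
    return sorted(scored, key=lambda pair: -pair[1])

# Hypothetical toy documents.
toy_docs = [
    "gene expression in tumor cells",
    "clinical trial of a new drug",
    "tumor gene mutation analysis",
]
ranking = rank("tumor gene", toy_docs)
for doc_index, score in ranking:
    print(doc_index, round(score, 3), toy_docs[doc_index])
```

Replacing the raw counts with tf-idf weights or latent LSA vectors would change only how the vectors are built; the ranking step stays the same.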
Thank you
