Distributional semantics is a research area that uses statistical analysis of linguistic contexts to develop theories and methods for determining the semantic similarities between words and linguistic items based on their distributional properties in large text corpora. It is based on the distributional hypothesis that words with similar distributions have similar meanings. Distributional semantic models represent words as vectors in a high-dimensional semantic space based on their co-occurrence with other words, allowing semantic similarity to be measured using vector similarity methods. Common distributional semantic models include term frequency-inverse document frequency (tf-idf), latent semantic analysis (LSA), latent Dirichlet allocation (LDA), and word embeddings.
Related topics: