Unsupervised-learning approaches: HyperLex
The HyperLex algorithm is an unsupervised approach for Word Sense
Disambiguation (WSD) that operates without a predefined sense inventory or
labelled data. Instead, it induces word senses directly from a large text corpus by
leveraging graph theory and the "small-world" property of word co-occurrence
networks.
Core principles
The central idea behind HyperLex is that different senses of a word tend to co-occur
with different sets of related words. These co-occurrences form distinct, highly
interconnected clusters within a larger co-occurrence graph.
Small-world graphs: Word co-occurrence graphs exhibit "small-world" properties: although the overall graph is very large and sparse, any node can be reached from any other node via a short path, and neighbourhoods are highly clustered. It is this local clustering that HyperLex exploits.
Highly connected components: Within this small-world graph, the different senses
of an ambiguous word appear as tightly interconnected "bundles" of co-occurring
words, also known as high-density components.
Hubs: The most central and highly connected words within these high-density
components are called "hubs." These hubs act as prototypes for each distinct word
sense.
The HyperLex process
1. Corpus selection: First, a sub-corpus is extracted containing all paragraphs or
sentences where the target ambiguous word appears.
2. Graph construction: A co-occurrence graph is built for the target word using this
sub-corpus.
1. Nodes: The nodes of the graph represent the content words (nouns, verbs,
adjectives) that co-occur with the target word within the context window (e.g., a
paragraph).
2. Edges: An edge is drawn between two words if they co-occur in the same
paragraph. In the original algorithm the weight of the edge between words A and B
is w = 1 − max[P(A | B), P(B | A)], so it behaves like a distance: strongly
associated pairs receive weights close to 0, weakly associated pairs weights
close to 1.
3. Hub detection: The algorithm identifies the high-density, highly-connected
components within the graph. The most central nodes within these components are
designated as "hubs". These hubs represent the prototypes for each of the target
word's senses.
4. Disambiguation: To disambiguate a new instance of the target word, its context
words are compared to the known hubs. The sense associated with the closest hub
(or the most similar hub-component) is assigned to the word.
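The graph construction and weighting in step 2 can be sketched in a few lines of Python. The snippet below computes edge weights of the form w = 1 − max(P(a|b), P(b|a)) over a list of tokenized contexts; the toy contexts and the counting details (no window, no part-of-speech filtering) are simplifying assumptions, not the full algorithm.

```python
from collections import Counter
from itertools import combinations

def edge_weights(contexts):
    """Weighted co-occurrence edges: w = 1 - max(P(a|b), P(b|a)).
    A weight near 0 means a strong association, near 1 a weak one."""
    word_freq = Counter()   # number of contexts containing each word
    pair_freq = Counter()   # number of contexts containing each word pair
    for words in contexts:
        unique = sorted(set(words))
        word_freq.update(unique)
        pair_freq.update(combinations(unique, 2))
    return {
        (a, b): 1 - max(pair_freq[a, b] / word_freq[b],
                        pair_freq[a, b] / word_freq[a])
        for (a, b) in pair_freq
    }
```

Pairs that always appear together get weight 0, so the weight can be used as a distance when searching the graph for high-density components.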
Strengths and weaknesses
Strengths
o No labeled data required: Because it is an unsupervised method, HyperLex does
not require any sense-tagged training data.
o Corpus-based sense induction: The senses are induced directly from the corpus,
making them specific to the domain of the text and more flexible than fixed-sense
inventories from lexicons like WordNet.
o Handles rare senses: It is capable of isolating and identifying infrequent word
uses by detecting hubs and high-density components, something that earlier
word-vector methods struggled with.
o Effective for information retrieval: HyperLex was originally developed for
information retrieval and showed excellent performance in identifying relevant
contexts for ambiguous query words.
Weaknesses
o Parameter sensitivity: The algorithm's performance is heavily influenced by a
set of heuristic parameters, such as the context window size and minimum
co-occurrence frequency.
o Limited granularity: While it is effective at distinguishing coarse-grained,
polysemous uses, HyperLex may struggle to differentiate very fine-grained word
senses.
o Heuristic limitations: Finding hubs and high-density components in large graphs
is an NP-hard problem in general, so HyperLex relies on approximate algorithms
and heuristics.
o Complexity of interpretation: Because it does not use a predefined sense
inventory, the "senses" or "uses" discovered by HyperLex are simply clusters of
co-occurring words organized around hubs. Mapping these induced senses to
standard, human-understandable senses requires a separate step.
Example
For a practical example of the HyperLex algorithm, let's consider the ambiguous
word "bank" using a large, raw text corpus. The algorithm will induce its different
senses without any prior knowledge or human labeling.
Step 1: Sub-corpus extraction
First, we collect all paragraphs or sentences from a large corpus where the word
"bank" appears.
Example paragraphs:
o Context 1: "He walked along the river bank and watched the boats sail by."
o Context 2: "She deposited her savings at the local bank."
o Context 3: "The company took a loan from the investment bank."
o Context 4: "Birds nested in the mud bank after the flood receded."
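Step 1 amounts to filtering the corpus for paragraphs that mention the target word. A minimal sketch (the whitespace tokenizer and punctuation stripping are simplifying assumptions):

```python
def extract_subcorpus(paragraphs, target):
    """Keep only (tokenized) paragraphs that contain the target word."""
    subcorpus = []
    for paragraph in paragraphs:
        tokens = [w.strip('.,"').lower() for w in paragraph.split()]
        if target in tokens:
            subcorpus.append(tokens)
    return subcorpus

corpus = [
    "He walked along the river bank and watched the boats sail by.",
    "She deposited her savings at the local bank.",
    "The weather was sunny all week.",
]
```

Here extract_subcorpus(corpus, "bank") keeps the first two paragraphs and drops the third.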
Step 2: Co-occurrence graph construction
A co-occurrence graph is built using the content words found in the sub-corpus,
excluding the target word "bank".
Nodes: The nodes of the graph would be content words like: river, boats, sail,
savings, deposited, local, company, loan, investment, birds, nested, mud, flood.
Edges: Edges connect words that co-occur within the same paragraph. The weight
of the edge indicates the strength of the relationship. Stronger, less frequent co-
occurrences are given more weight.
o Edges in Sense A (river): (river, boats), (river, sail), (boats, sail).
o Edges in Sense B (financial): (deposited, savings), (savings, local),
(company, loan), (loan, investment).
o Edges in Sense C (mud): (birds, nested), (birds, mud), (nested, flood).
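The nodes and edges above can be built mechanically. In the sketch below the four contexts are already reduced to content words by hand (stop-word removal and the exclusion of "bank" are assumed to have happened), and an edge simply records co-occurrence within a context:

```python
from itertools import combinations

# content words per context, target word "bank" excluded (hand-filtered)
contexts = [
    ["river", "boats", "sail"],            # Context 1
    ["deposited", "savings", "local"],     # Context 2
    ["company", "loan", "investment"],     # Context 3
    ["birds", "nested", "mud", "flood"],   # Context 4
]

nodes = sorted({word for context in contexts for word in context})
edges = sorted({pair for context in contexts
                for pair in combinations(sorted(context), 2)})
```

Each context yields a small clique; with more text, the cliques belonging to the same sense overlap and merge into a dense component.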
Step 3: Hub detection
The algorithm analyzes the graph to find high-density components or clusters. The
most central, highly connected nodes within these clusters are the "hubs."
Root Hubs detected:
o Hub for Sense A (river): river
o Hub for Sense B (financial): deposited
o Hub for Sense C (mud): mud
o The induced senses would be represented by these hubs and their surrounding
clusters of co-occurring words.
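Hub detection can be approximated by ranking nodes by degree and greedily keeping only nodes that are not neighbours of an already chosen hub, so each hub comes from a different dense region. The original algorithm instead selects root hubs iteratively and removes each hub's neighbourhood from the graph; this greedy sketch over hand-made toy edges just illustrates the idea.

```python
from collections import defaultdict

# toy edges: three small clusters standing in for the example's three senses
edges = [
    ("river", "boats"), ("river", "sail"), ("boats", "sail"),
    ("deposited", "savings"), ("deposited", "loan"), ("savings", "loan"),
    ("mud", "birds"), ("mud", "nested"), ("birds", "nested"), ("mud", "flood"),
]
edge_set = set(edges)

degree = defaultdict(int)
for a, b in edges:
    degree[a] += 1
    degree[b] += 1

hubs = []
for node in sorted(degree, key=degree.get, reverse=True):
    # keep a node only if it is not a neighbour of an existing hub
    if all((node, h) not in edge_set and (h, node) not in edge_set for h in hubs):
        hubs.append(node)
```

In this toy graph mud surfaces first only because it has the highest degree; with a real corpus the ranking would reflect actual co-occurrence counts.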
Step 4: Disambiguation
To disambiguate a new sentence, its context words are compared to the induced
hubs. The sentence is assigned the sense corresponding to the closest-matching
hub.
New Sentence: "The investor secured a loan from the bank."
Disambiguation process:
o The context words are investor, secured, and loan.
o The algorithm compares these words with the induced hubs: river, deposited,
and mud.
o The word loan shows a strong co-occurrence relation with the deposited hub's
cluster, which represents the financial sense.
Result: The algorithm assigns the financial institution sense to "bank" in this
sentence because its context aligns with the deposited hub cluster.
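The scoring in step 4 can be sketched as word overlap between the context and each hub's cluster; the cluster contents below are hand-coded from the example, not learned.

```python
# induced senses: each root hub with its surrounding cluster of co-occurring words
senses = {
    "river":     {"river", "boats", "sail"},
    "deposited": {"deposited", "savings", "loan", "investment"},
    "mud":       {"mud", "birds", "nested", "flood"},
}

def disambiguate(context_words, senses):
    """Assign the hub whose cluster shares the most words with the context."""
    return max(senses, key=lambda hub: len(senses[hub] & set(context_words)))
```

For the sentence above, the context {investor, secured, loan} overlaps only with the deposited cluster, so the financial sense is chosen.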
Step 5: Interpretation (manual)
Since HyperLex operates without a dictionary, a human would need to inspect the
discovered "senses" and their corresponding hubs to understand their meaning.
Sense A (Hub: river): This cluster of words (river, boats, sail) corresponds
to the "river bank" sense.
Sense B (Hub: deposited): This cluster (deposited, savings, loan, investment)
corresponds to the "financial institution" sense.
Sense C (Hub: mud): This cluster (mud, birds, nested, flood) corresponds to
the less frequent "mud bank" or "sand bank" sense. This demonstrates HyperLex's
ability to find infrequent senses.