Word Net

WordNet is a semantic lexicon for English that organizes words into synsets and describes their relationships, particularly hypernyms. The document outlines the structure and requirements for implementing a WordNet digraph, including data file formats and performance specifications for methods like distance and sca. It also discusses the concept of semantic relatedness, outcast detection, and provides guidelines for implementing related data types with specific APIs.

Uploaded by

ftinews.al

WORDNET

WordNet is a semantic lexicon for the English language that computational linguists and
cognitive scientists use extensively.
For example, WordNet was a key component in IBM’s Jeopardy-playing Watson computer
system. WordNet groups words into sets of synonyms called synsets.
For example, { AND circuit, AND gate } is a synset that represents a logical gate that fires only
when all of its inputs fire. WordNet also describes semantic relationships between synsets.
One such relationship is the is-a relationship, which connects a hyponym (more specific
synset) to a hypernym (more general synset). For example, the synset { gate, logic gate } is a
hypernym of { AND circuit, AND gate } because an AND gate is a kind of logic gate.

The WordNet digraph.

Your first task is to build the WordNet digraph: each vertex v is an integer that represents a
synset, and each directed edge v→w represents that w is a hypernym of v. The WordNet
digraph is a rooted DAG: it is acyclic and has one vertex—the root—that is an ancestor of
every other vertex. However, it is not necessarily a tree because a synset can have more than
one hypernym.
Here is a small subgraph of the WordNet digraph:
The WordNet input file formats.
We now describe the two data files that you will use to create the WordNet digraph. The files
are in comma-separated values (CSV) format: each line contains a sequence of fields,
separated by commas.
• List of synsets. The file [Link] contains all noun synsets in WordNet, one per line.
Line i of the file (counting from 0) contains the information for synset i. The first field
is the synset id, which is always the integer i; the second field is the synonym set (or
synset); and the third field is its dictionary definition (or gloss), which is not relevant
to this assignment.

For example, line 36 means that the synset { AND_circuit, AND_gate } has an id
number of 36 and its gloss is a circuit in a computer that fires only when all of its
inputs fire. The individual nouns that constitute a synset are separated by spaces. If a
noun contains more than one word, the underscore character connects the words (and
not the space character).
• List of hypernyms. The file [Link] contains the hypernym relationships. Line i
of the file (counting from 0) contains the hypernyms of synset i. The first field is the
synset id, which is always the integer i; subsequent fields are the id numbers of the
synset’s hypernyms.

For example, line 36 means that synset 36 (AND_circuit AND_gate) has 43273 (gate
logic_gate) as its only hypernym. Line 34 means that synset 34 (AIDS
acquired_immune_deficiency_syndrome) has two hypernyms: 48504
(immunodeficiency) and 49019 (infectious_disease).
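The two line formats described above can be parsed with plain string splitting. The following is a minimal sketch, not the assignment's required code: the class name WordNetLines and its method names are hypothetical, and the sample lines in the comments are reconstructed from the prose examples above rather than copied from the data files.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration; not part of the assignment's API.
final class WordNetLines {
    private WordNetLines() { }

    // Returns the leading id field of a synsets or hypernyms line.
    static int parseId(String line) {
        int comma = line.indexOf(',');
        String first = (comma < 0) ? line : line.substring(0, comma);
        return Integer.parseInt(first.trim());
    }

    // "36,AND_circuit AND_gate,a circuit in a computer ..." -> [AND_circuit, AND_gate]
    static List<String> parseNouns(String synsetLine) {
        // A limit of 3 keeps commas inside the gloss from creating extra fields.
        String[] fields = synsetLine.split(",", 3);
        // Nouns are separated by spaces; underscores join multi-word nouns.
        return Arrays.asList(fields[1].trim().split("\\s+"));
    }

    // "34,48504,49019" -> [48504, 49019]; field 0 (the synset id) is skipped.
    static List<Integer> parseHypernyms(String hypernymLine) {
        String[] fields = hypernymLine.split(",");
        List<Integer> hypernyms = new ArrayList<>();
        for (int i = 1; i < fields.length; i++) {
            hypernyms.add(Integer.parseInt(fields[i].trim()));
        }
        return hypernyms;
    }
}
```

A line that contains only an id, with no further fields, yields an empty hypernym list.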

WordNet data type.

Implement an immutable data type WordNet with the following API:

Corner cases. Throw an IllegalArgumentException in the following situations:

• Any argument to the constructor or an instance method is null
• Any of the noun arguments in distance() or sca() is not a WordNet noun.
You may assume that the input files are in the specified format and that the underlying
digraph is a rooted DAG.
Unit testing. Your main() method must call each public constructor and method directly and
help verify that they work as prescribed (e.g., by printing results to standard output).
Performance requirements. Your implementation must achieve the following performance
requirements. In the requirements below, assume that the number of characters in a noun or
synset is bounded by a constant.
• Your data type must use space linear in the input size (size of synsets and hypernyms
files).
• The constructor must take time linearithmic (or better) in the input size.
• The method isNoun() must run in time logarithmic (or better) in the number of nouns.
• The methods distance() and sca() must make exactly one call to the lengthSubset()
and ancestorSubset() methods in ShortestCommonAncestor, respectively.
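One way to meet the isNoun() requirement and the corner cases above is to keep a map from each noun to the ids of the synsets that contain it. The sketch below assumes a hash table; the class NounIndex and its methods add() and synsetsOf() are hypothetical names for illustration and are not part of the required API.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical helper for illustration; not part of the assignment's API.
final class NounIndex {
    private final Map<String, Set<Integer>> idsByNoun = new HashMap<>();

    // Records that the noun appears in the synset with the given id.
    void add(String noun, int synsetId) {
        if (noun == null) throw new IllegalArgumentException("noun is null");
        idsByNoun.computeIfAbsent(noun, k -> new HashSet<>()).add(synsetId);
    }

    // Expected constant time per lookup, since noun length is bounded by a constant.
    boolean isNoun(String word) {
        if (word == null) throw new IllegalArgumentException("word is null");
        return idsByNoun.containsKey(word);
    }

    // Returns a read-only view of the ids of all synsets containing the noun.
    Set<Integer> synsetsOf(String noun) {
        if (!isNoun(noun)) throw new IllegalArgumentException("not a WordNet noun: " + noun);
        return Collections.unmodifiableSet(idsByNoun.get(noun));
    }
}
```

With such an index, distance() and sca() could each look up the two synset-id sets and pass them to a single lengthSubset() or ancestorSubset() call. To keep the enclosing data type immutable, add() would only be called while the constructor reads the synsets file.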
Shortest common ancestor.
An ancestral path between two vertices v and w in a rooted DAG is a directed path from v to a
common ancestor x, together with a directed path from w to the same ancestor x. A shortest
ancestral path is an ancestral path of minimum total length. We refer to the common ancestor
in a shortest ancestral path as a shortest common ancestor. Note that a shortest common
ancestor always exists because the root is an ancestor of every vertex. Note also that an
ancestral path is a path, but not a directed path.

We generalize the notion of shortest common ancestor to subsets of vertices. A shortest ancestral
path of two subsets of vertices A and B is a shortest ancestral path among all pairs of vertices
v and w, with v in A and w in B. As an example, the following figure ([Link])
identifies several (but not all) ancestral paths between the red and blue vertices, including the
shortest one.
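The subset definition above suggests one possible approach: run a breadth-first search from all vertices of A at once, another from all vertices of B, and pick the vertex reachable from both that minimizes the sum of the two distances. The sketch below is an illustration under assumptions, not the required ShortestCommonAncestor type: it represents the digraph as adjacency lists (adj.get(v) holds the hypernyms of v) instead of a graph library class, and the class and method names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.List;
import java.util.Queue;

// Hypothetical sketch; the digraph is given as adjacency lists, adj.get(v) = heads of edges v -> w.
final class AncestralPathSketch {
    private AncestralPathSketch() { }

    // Multi-source BFS: dist[x] = fewest edges from any source to x, or -1 if unreachable.
    static int[] bfs(List<List<Integer>> adj, Iterable<Integer> sources) {
        int[] dist = new int[adj.size()];
        Arrays.fill(dist, -1);
        Queue<Integer> queue = new ArrayDeque<>();
        for (int s : sources) {
            if (dist[s] == -1) {
                dist[s] = 0;
                queue.add(s);
            }
        }
        while (!queue.isEmpty()) {
            int v = queue.remove();
            for (int w : adj.get(v)) {
                if (dist[w] == -1) {
                    dist[w] = dist[v] + 1;
                    queue.add(w);
                }
            }
        }
        return dist;
    }

    // Returns {length, ancestor} of a shortest ancestral path between subsets a and b,
    // or {-1, -1} if the subsets have no common ancestor.
    static int[] shortest(List<List<Integer>> adj, Iterable<Integer> a, Iterable<Integer> b) {
        int[] distA = bfs(adj, a);
        int[] distB = bfs(adj, b);
        int bestLength = -1;
        int bestAncestor = -1;
        for (int x = 0; x < adj.size(); x++) {
            if (distA[x] == -1 || distB[x] == -1) continue; // x is not a common ancestor
            int length = distA[x] + distB[x];
            if (bestLength == -1 || length < bestLength) {
                bestLength = length;
                bestAncestor = x;
            }
        }
        return new int[] { bestLength, bestAncestor };
    }
}
```

Each call performs two searches and one scan over the vertices, so it runs in time proportional to E + V. A single-vertex query is the special case where each subset contains one vertex.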
Shortest common ancestor data type.
Implement an immutable data type ShortestCommonAncestor with the following API:

Corner cases. Throw an IllegalArgumentException in the following situations:

• The argument to the constructor is not a rooted DAG
• Any argument is null
• Any vertex argument is outside its prescribed range
• Any iterable argument contains zero vertices
• Any iterable argument contains a null item
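The first corner case requires a rooted-DAG check. One possible test, sketched below, relies on this observation: in an acyclic digraph every vertex has a directed path to some vertex with no outgoing edges, so the digraph is a rooted DAG exactly when it is acyclic and has exactly one such vertex. The sketch uses adjacency lists and Kahn's topological-sort algorithm for the cycle test; the class and method names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.List;
import java.util.Queue;

// Hypothetical sketch; adj.get(v) lists the heads w of the edges v -> w.
final class RootedDagCheck {
    private RootedDagCheck() { }

    static boolean isRootedDag(List<List<Integer>> adj) {
        int n = adj.size();
        int sinks = 0;                      // vertices with no outgoing edges (root candidates)
        int[] indegree = new int[n];
        for (int v = 0; v < n; v++) {
            if (adj.get(v).isEmpty()) sinks++;
            for (int w : adj.get(v)) indegree[w]++;
        }
        if (sinks != 1) return false;       // zero roots or more than one root

        // Kahn's algorithm: repeatedly remove vertices that have no incoming edges.
        Queue<Integer> queue = new ArrayDeque<>();
        for (int v = 0; v < n; v++) {
            if (indegree[v] == 0) queue.add(v);
        }
        int removed = 0;
        while (!queue.isEmpty()) {
            int v = queue.remove();
            removed++;
            for (int w : adj.get(v)) {
                if (--indegree[w] == 0) queue.add(w);
            }
        }
        return removed == n;                // every vertex is removed only if there is no cycle
    }
}
```

The check visits each vertex and edge a constant number of times, which is consistent with an O(E+V) constructor.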
Unit testing. Your main() method must call each public constructor and method directly and
help verify that they work as prescribed (e.g., by printing results to standard output).
Basic performance requirements. Your implementation must achieve the following worst-
case performance requirements, where E and V are the number of edges and vertices in the
digraph, respectively.
• Your data type must use O(E+V) space.
• All methods and the constructor must take O(E+V) time.
Test client.
The following test client takes the name of a digraph input file as a command-line
argument; creates the digraph; reads vertex pairs from standard input; and prints the length of
the shortest ancestral path between the two vertices, along with a shortest common ancestor:
Here is a sample execution (the yellow text indicates what you type):

Measuring the semantic relatedness of two nouns.

Semantic relatedness refers to the degree to which two concepts are related. Measuring
semantic relatedness is a challenging problem. For example, you consider George W. Bush
and John F. Kennedy (two U.S. presidents) to be more closely related than George W. Bush
and chimpanzee (two primates). It might not be clear whether George W. Bush and Eric
Arthur Blair are more related than two arbitrary people. However, both George W. Bush and
Eric Arthur Blair (a.k.a. George Orwell) are famous communicators and, therefore, closely
related.
We define the semantic relatedness of two WordNet nouns x and y as follows:
• A = set of synsets in which x appears
• B = set of synsets in which y appears
• distance(x, y) = length of shortest ancestral path of subsets A and B
• sca(x, y) = a shortest common ancestor of subsets A and B
This is the notion of distance that you will use to implement the distance() and sca() methods
in the WordNet data type.
Outcast detection.
Given a list of WordNet nouns x_1, x_2, ..., x_n, which noun is the least related to the others? To
identify an outcast, compute the sum of the distances between each noun and every other one:
d_i = distance(x_i, x_1) + distance(x_i, x_2) + ... + distance(x_i, x_n)
and return a noun x_t for which d_t is maximum. Note that distance(x_i, x_i) = 0, so it will not
contribute to the sum.
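The rule above translates into a double loop over the nouns. The sketch below is a hedged illustration, not the required Outcast type: it takes the distance measure as a function argument so that it stands alone, and the class name OutcastSketch is hypothetical.

```java
import java.util.function.ToIntBiFunction;

// Hypothetical sketch; the distance measure is passed in as a function.
final class OutcastSketch {
    private OutcastSketch() { }

    // Returns a noun whose summed distance to all nouns in the list is maximum.
    static String outcast(String[] nouns, ToIntBiFunction<String, String> distance) {
        String best = null;
        int bestSum = -1;
        for (String candidate : nouns) {
            int sum = 0;
            for (String other : nouns) {
                // distance(candidate, candidate) is 0, so including it does not change the sum.
                sum += distance.applyAsInt(candidate, other);
            }
            if (sum > bestSum) {
                bestSum = sum;
                best = candidate;
            }
        }
        return best;
    }
}
```

In the assignment setting, the function argument would presumably be the distance() method of a WordNet object. When several nouns tie for the maximum, this sketch returns the first one in the list, which the definition permits.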
Implement an immutable data type Outcast with the following API:

Corner cases. Assume that the argument to outcast() contains only valid WordNet nouns and
that it contains at least two such nouns.
Test client. The following test client takes from the command line the name of a synset file,
the name of a hypernym file, followed by the names of outcast files, and prints an outcast in
each file:
Here is a sample execution:

Analysis of running time.

Analyze the potential effectiveness of your approach to this problem by answering the
following questions:
• What is the order of growth of the worst-case running time of the length(),
lengthSubset(), ancestor(), and ancestorSubset() methods in
ShortestCommonAncestor?
• What is the order of growth of the best-case running time of the length(),
lengthSubset(), ancestor(), and ancestorSubset() methods in
ShortestCommonAncestor?
Give your answers as a function of the number of vertices V and the number of edges E in the
digraph.

The classes to be submitted are:

[Link], [Link], and [Link].
The lab must be submitted by 17.01.2025:
[Link]
Note: You may use the graph-related classes and methods, such as [Link], etc.
