Session 6

This document discusses information extraction from unstructured text sources. It defines information extraction as the automatic extraction of structured information such as entities, relationships, and attributes from unstructured sources. The document outlines various techniques for information extraction including named entity recognition to identify important entities, relation classification to determine relationships between entities, and knowledge graphs to store extracted information in a structured format.

Uploaded by

arash.hasanpour
Computational Knowledge Analysis –

Natural Language Processing with Python


Session 6
Information Extraction, Relation Classification, and Knowledge Graphs
12.06.2023
Dr. Maria Becker
Summer Term 2023
Areas of NLP: Syntactic vs. Semantic Analysis
• Syntactic and semantic analysis are the two main techniques used in
natural language processing:
• Syntactic analysis uses grammatical rules to assess the structure of a
sentence as a basis for its meaning
• Syntactic techniques include morphological segmentation, POS tagging, chunking,
dependency parsing…

• Semantic analysis deals with the meaning of words and how they are used. NLP applies
algorithms to understand the meaning and structure of sentences
• Semantic techniques include word sense disambiguation, named entity recognition, sentiment
analysis, text summarization…
Introduction to Information Extraction

• Watch the video by Chris Manning (9 minutes): Introduction to Information Extraction
• https://www.youtube.com/watch?v=kKaGLGAQrmw
What is Information Extraction?

• Information Extraction / Information Retrieval (mostly used as
synonyms): automatic extraction of structured information such as
entities, relationships between entities, and attributes describing
entities from unstructured, possibly noisy sources
• Opened up new avenues for querying, organizing, and analyzing data
• Enables much richer forms of queries on the abundant unstructured
sources than possible with keyword searches alone
Information Extraction: From Unstructured to Structured Texts

From: Jurafsky/Martin (2021): Speech and Language Processing.


Why Information Extraction?
• Information Extraction enables
• finding entities
• classifying entities
• classifying relations between entities
• storing entities and their relations in a database (knowledge graphs)
• with the ultimate goal of making unstructured texts machine-readable
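The step from unstructured text to a machine-readable record can be sketched in a few lines of Python. This is only a minimal illustration, assuming a single hand-written pattern; the pattern, the function name `extract_records`, and the example sentences (with made-up organizations and people) are hypothetical and not part of the lecture material.

```python
import re

# Hypothetical pattern: "<Organization> was founded in <year> by <Person>".
FOUNDED = re.compile(
    r"(?P<org>[A-Z][\w&]*(?: [A-Z][\w&]*)*) was founded in (?P<year>\d{4}) by "
    r"(?P<founder>[A-Z][a-z]+(?: [A-Z][a-z]+)*)"
)


def extract_records(text):
    """Return one structured record (a dict) per pattern match in the text."""
    return [
        {
            "organization": m["org"],
            "year": int(m["year"]),
            "founder": m["founder"],
        }
        for m in FOUNDED.finditer(text)
    ]


# Made-up example text; the names are placeholders.
text = (
    "Acme Robotics was founded in 1999 by Jane Doe. "
    "After a slow start, Example Labs was founded in 2005 by John Smith."
)
records = extract_records(text)
for record in records:
    print(record)
```

Hand-written patterns like this are brittle; they only show what "structured" means here: each match becomes a record with named fields that a database or a query can use.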
Applications of Information Extraction
• News Tracking: automatically tracking specific event types from news sources
• Customer Care: Any customer-oriented enterprise collects many forms of
unstructured data from customer interaction
• Personal information management (PIM) systems: seek to organize personal data
like documents, emails, projects and people in a structured inter-linked format
• Comparison Shopping: creating comparison shopping web sites that automatically
crawl merchant web sites to find products and their prices which can then be used
for comparison shopping
• Ad Placement on Webpages: advertisements of a product next to the text that
both mentions the product and expresses a positive opinion about it
• Scientific Applications: e.g., extracting biological objects such as proteins and genes
from paper repositories such as PubMed
Subtasks of Information Extraction
• IE can involve several subtasks:
• Template filling
• Event extraction
• Table information extraction
• Terminology extraction
• Coreference resolution
• Named entity recognition
• Relationship extraction
Step 1: Extract all (important) entities from texts –
Named Entity Recognition
• Subtask of information extraction
• Goal: detect and classify named entities mentioned in unstructured text into
predefined categories
• Common Categories:
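The detect-and-classify goal can be illustrated with a dictionary lookup. This is a minimal sketch, assuming a tiny hand-written gazetteer and two illustrative labels (`PER` for persons, `LOC` for locations); the labels, the function name `recognize_entities`, and the example sentence are assumptions, and practical NER systems rely on statistical or neural taggers rather than fixed lists.

```python
import re

# Hypothetical gazetteer: surface form -> category label (labels are illustrative).
GAZETTEER = {
    "Jane Doe": "PER",
    "Chicago": "LOC",
    "Boston": "LOC",
    "Massachusetts": "LOC",
}


def recognize_entities(text):
    """Return (start, end, surface form, label) for each gazetteer hit, in text order."""
    # Try longer names first so a multi-word name wins over a shorter one it contains.
    names = sorted(GAZETTEER, key=len, reverse=True)
    pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, names)) + r")\b")
    return [
        (m.start(), m.end(), m.group(), GAZETTEER[m.group()])
        for m in pattern.finditer(text)
    ]


# Made-up example sentence.
sentence = "Jane Doe moved from Chicago to Boston."
entities = recognize_entities(sentence)
for start, end, surface, label in entities:
    print(f"{surface!r} [{start}:{end}] -> {label}")
```

A lookup like this cannot handle unseen names or ambiguous words, which is the main reason NER is usually treated as a sequence labeling problem.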
Step 2: Relation Classification and Knowledge Graphs

• Watch the video (13 minutes): Introduction to Relation Extraction
• https://www.youtube.com/watch?v=4AjieiJ1CXo
From Knowledge Relations to Knowledge Graphs
• Entities and the relations between them can be stored in knowledge graphs
• A knowledge graph (or semantic network) represents a network of entities
(i.e. objects, events, situations, or concepts) and illustrates the relationships
between them
• This information is usually stored in a graph database and visualized as a
graph structure, prompting the term knowledge “graph”
• A knowledge graph is made up of three main components: nodes, edges,
and labels
• Any object, place, or person can be a node
• An edge defines the relationship between the nodes
• Knowledge graphs are very useful for many downstream NLP tasks such as
automatic question answering
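The three components (nodes, edges, labels) map directly onto a small data structure. The following is a minimal sketch in plain Python, not a graph database; the class name `KnowledgeGraph`, its methods, and the example facts are assumptions chosen for illustration.

```python
from collections import defaultdict


class KnowledgeGraph:
    """Directed graph whose edges carry a relation label."""

    def __init__(self):
        self.nodes = set()
        # head node -> list of (relation label, tail node)
        self._outgoing = defaultdict(list)

    def add(self, head, relation, tail):
        """Store one labeled edge: head --relation--> tail."""
        self.nodes.update((head, tail))
        self._outgoing[head].append((relation, tail))

    def query(self, head, relation):
        """Answer 'head --relation--> ?' by following matching edges."""
        return [tail for rel, tail in self._outgoing.get(head, []) if rel == relation]


kg = KnowledgeGraph()
# Illustrative facts stored as (node, edge label, node).
kg.add("Boston", "AtLocation", "Massachusetts")
kg.add("Chicago", "IsA", "city")
kg.add("car", "IsA", "vehicle")
kg.add("gearshift", "PartOf", "car")

# A question such as "Where is Boston located?" becomes an edge lookup.
print(kg.query("Boston", "AtLocation"))
```

Question answering over a real knowledge graph additionally needs a step that maps the natural-language question onto a node and a relation; the lookup itself stays this simple.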
Example of a Knowledge Graph: ConceptNet

• Semantic network containing commonsense knowledge
• Diverse and simple facts about the world,
people and everyday life
• Collected from volunteers on the Internet
• Nodes represent words/phrases, edges
represent relations
• Triples: ⟨left term, relation, right term⟩
• English version (5.6):
• set of 37 relations
• about 1,900,000 nodes
ConceptNet Relations

ConceptNet Relations & Examples
Relation            Example
ExternalURL         knowledge → dbpedia.org
FormOf              slept → sleep
IsA                 car → vehicle; Chicago → city
PartOf              gearshift → car
HasA                bird → wing; pen → ink
UsedFor             bridge → cross water
CapableOf           knife → cut
AtLocation          Boston → Massachusetts
Causes              exercise → sweat
HasSubevent         eating → chewing
HasFirstSubevent    sleep → close eyes
HasLastSubevent     cook → clean up kitchen
HasPrerequisite     dream → sleep
HasProperty         ice → cold
MotivatedByGoal     compete → win
ObstructedBy        sleep → noise
Desires             person → love
CreatedBy           cake → bake
Synonym             sunlight ↔ sunshine
Antonym             black ↔ white; hot ↔ cold
DerivedFrom         pocketbook → book
SymbolOf            red → fervor
DefinedAs           peace → absence of war
Entails             run → move
MannerOf            auction → sale
LocatedNear         chair ↔ table
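Triples like the ones above can be chained into paths that connect two concepts. The sketch below is an assumption-laden illustration: it uses a handful of the example triples, treats every relation as a directed edge, and finds a shortest chain with breadth-first search. The function name `find_path` is hypothetical, and this is not the ConceptNet API.

```python
from collections import deque

# A few ⟨left term, relation, right term⟩ triples taken from the examples.
TRIPLES = [
    ("gearshift", "PartOf", "car"),
    ("car", "IsA", "vehicle"),
    ("dream", "HasPrerequisite", "sleep"),
    ("sleep", "ObstructedBy", "noise"),
    ("sleep", "HasFirstSubevent", "close eyes"),
]


def find_path(triples, start, goal):
    """Breadth-first search for a shortest chain of triples from start to goal.

    Returns a list of triples, an empty list if start == goal,
    or None if no directed path exists.
    """
    outgoing = {}
    for left, relation, right in triples:
        outgoing.setdefault(left, []).append((relation, right))

    queue = deque([(start, [])])
    visited = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for relation, neighbour in outgoing.get(node, []):
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append((neighbour, path + [(node, relation, neighbour)]))
    return None


path = find_path(TRIPLES, "dream", "noise")
print(path)
```

Symmetric relations such as Synonym or Antonym (marked ↔) would need an edge in both directions; the sketch leaves them out to stay short.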
Use case scenario: Commonsense knowledge graphs are
helpful for argument relation classification (Paul et al., 2020)
Injecting Knowledge Relations into a
Neural Argument Classifier

Knowledge paths from ConceptNet


Examples
Next session
• Next session (19.06.2023) will take place in Darmstadt
• The session will include some exercises, so please bring your laptops
• This week there is no homework
