
Text mining and the Semantic Web

Dr Diana Maynard
NLP Group
Department of Computer Science
University of Sheffield

http://nlp.shef.ac.uk

Structure of this lecture

Text Mining and the Semantic Web

Text Mining Components / Methods
Information Extraction
Evaluation
Visualisation
Summary

University of Manchester 15 March

Introduction to Text Mining and the Semantic Web

What is Text Mining?

Text mining is about knowledge discovery from large collections of unstructured text.
It is not the same as data mining, which is more about discovering patterns in structured data stored in databases.
Similar techniques are sometimes used, but text mining has many additional constraints caused by the unstructured nature of the text and the use of natural language.
Information extraction (IE) is a major component of text mining.
IE is about extracting facts and structured information from unstructured text.

Challenge of the Semantic Web

The Semantic Web requires machine-processable, repurposable data to complement hypertext.
Such metadata can be divided into two types of information: explicit and implicit. IE is mainly concerned with implicit (semantic) metadata.
More on this later.

Text mining components and methods

Text mining stages

Document selection and filtering (IR techniques)
Document pre-processing (NLP techniques)
Document processing (NLP / ML / statistical techniques)

Stages of document processing

Document selection involves identification and retrieval of potentially relevant documents from a large set (e.g. the web) in order to reduce the search space. Standard or semantically-enhanced IR techniques can be used for this.
Document pre-processing involves cleaning and preparing the documents, e.g. removal of extraneous information, error correction, spelling normalisation, tokenisation, POS tagging, etc.
Document processing consists mainly of information extraction.
For the Semantic Web, this is realised in terms of metadata extraction.
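
The three stages above can be sketched as a small pipeline. This is only an illustration under stated assumptions: the function names, the keyword filter standing in for IR, and the "capitalised token" rule standing in for information extraction are all hypothetical simplifications, not the techniques used by any system named in these slides.

```python
import re

def select_documents(docs, keywords):
    """Document selection: keep documents mentioning any keyword (a crude stand-in for IR)."""
    lowered = [k.lower() for k in keywords]
    return [d for d in docs if any(k in d.lower() for k in lowered)]

def preprocess(text):
    """Document pre-processing: strip markup residue, normalise whitespace, tokenise."""
    text = re.sub(r"<[^>]+>", " ", text)       # remove extraneous markup
    text = re.sub(r"\s+", " ", text).strip()   # normalise whitespace
    return re.findall(r"\w+|[^\w\s]", text)    # words and punctuation as tokens

def process(tokens):
    """Document processing: a toy extraction step returning capitalised tokens as candidate names."""
    return [t for t in tokens if t[0].isupper()]

docs = [
    "<p>Dr Head became the new director of Shiny Rockets Corp.</p>",
    "the weather was mild today",
]
selected = select_documents(docs, ["director"])
tokens = preprocess(selected[0])
candidates = process(tokens)
print(candidates)
```

Each stage narrows the input for the next one, which is the point of the staging: selection reduces the search space before the more expensive language processing runs.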

Metadata extraction
Metadata extraction consists of two types:
Explicit metadata extraction involves information describing the document, such as that contained in the header information of HTML documents (titles, abstracts, authors, creation date, etc.)
Implicit metadata extraction involves semantic information deduced from the material itself, i.e. endogenous information such as names of entities and relations contained in the text. This essentially involves Information Extraction techniques, often with the help of an ontology.
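
As a minimal sketch of explicit metadata extraction, the following reads the title and `<meta>` tags from an HTML header using Python's standard `html.parser` module. The class name `HeadMetadataParser` and the sample page are hypothetical; implicit metadata would need the IE techniques discussed later and is not attempted here.

```python
from html.parser import HTMLParser

class HeadMetadataParser(HTMLParser):
    """Collects the <title> text and name/content pairs from <meta> tags."""

    def __init__(self):
        super().__init__()
        self.metadata = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            name, content = attrs.get("name"), attrs.get("content")
            if name and content is not None:
                self.metadata[name.lower()] = content

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.metadata["title"] = self.metadata.get("title", "") + data.strip()

# Hypothetical page used only for illustration.
html_doc = """<html><head>
<title>Text mining and the Semantic Web</title>
<meta name="author" content="Diana Maynard">
</head><body><p>Body text.</p></body></html>"""

parser = HeadMetadataParser()
parser.feed(html_doc)
explicit_metadata = parser.metadata
print(explicit_metadata)
```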

Information Extraction (IE)

IE is not IR
IR pulls documents from large text collections (usually the Web) in response to specific keywords or queries. You analyse the documents.
IE pulls facts and structured information from the content of large text collections. You analyse the facts.

IE for Document Access

With traditional query engines, getting the facts can be hard and slow:
Where has the Queen visited in the last year?
Which places on the East Coast of the US have had cases of West Nile Virus?
Which search terms would you use to get this kind of information?
How can you specify you want someone's home page?
IE returns information in a structured way.
IR returns documents containing the relevant information somewhere (if you're lucky).

IE as an alternative to IR
IE returns knowledge at a much deeper level than traditional IR
Constructing a database through IE and linking it back to the documents can provide a valuable alternative search tool.
Even if results are not always accurate, they can be valuable if linked back to the original text
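
One way to picture "a database linked back to the documents" is a fact table that stores, for every extracted fact, the document id and character offsets it came from. The sketch below uses Python's standard `sqlite3` module; the table layout, the `director_of` relation name and the hand-written fact are assumptions made for illustration, standing in for the output of a real IE system.

```python
import sqlite3

# Hypothetical source collection: document id -> text.
documents = {
    "doc1": "Dr Head became the new director of Shiny Rockets Corp.",
}

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE facts ("
    "subject TEXT, relation TEXT, object TEXT, "
    "doc_id TEXT, start_offset INTEGER, end_offset INTEGER)"
)

# A hand-written fact standing in for the output of an IE system.
sentence = documents["doc1"]
conn.execute(
    "INSERT INTO facts VALUES (?, ?, ?, ?, ?, ?)",
    ("Dr Head", "director_of", "Shiny Rockets Corp", "doc1", 0, len(sentence)),
)

def find_facts(relation):
    """Return each matching fact together with the source text it was extracted from."""
    rows = conn.execute(
        "SELECT subject, object, doc_id, start_offset, end_offset "
        "FROM facts WHERE relation = ?",
        (relation,),
    ).fetchall()
    return [
        (subject, obj, documents[doc_id][start:end])
        for subject, obj, doc_id, start, end in rows
    ]

results = find_facts("director_of")
print(results)
```

Because every row carries its provenance, a user who doubts an extracted fact can read the original sentence, which is why imperfect extraction can still be useful.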

Some example applications

HaSIE
KIM
Threat Tracker

HaSIE
Application developed by the University of Sheffield, which aims to find out how companies report health and safety information
Answers questions such as:
How many members of staff died or had accidents in the last year?
Is there anyone responsible for health and safety?
What measures have been put in place to improve health and safety in the workplace?

HaSIE
Identification of such information is too time-consuming and arduous to be done manually.
IR systems can't cope with this because they return whole documents, which could be hundreds of pages long.
The system identifies relevant sections of each document, pulls out sentences about health and safety issues, and populates a database with relevant information.

[Figure: HaSIE]

KIM
KIM is a software platform developed by Ontotext for semantic annotation of text.
KIM performs automatic ontology population and semantic annotation for Semantic Web and KM applications
Indexing and retrieval (an IE-enhanced search technology)
Query and exploration of formal knowledge

[Figure: Ontotext's KIM query and results]

Threat Tracker
Application developed by Alias-i which finds and relates information in documents
Intended for use by information analysts who use unstructured news feeds and standing collections as sources
Used by DARPA for tracking possible information about terrorists etc.
Identification of entities, aliases, relations etc. enables you to build up chains of related people and things

[Figure: Threat Tracker]

What is Named Entity Recognition?
Identification of proper names in texts, and their classification into a set of predefined categories of interest:
Persons
Organisations (companies, government organisations, committees, etc.)
Locations (cities, countries, rivers, etc.)
Date and time expressions
Various other types as appropriate

Why is NE important?
NE provides a foundation from which to build more complex IE systems
Relations between NEs can provide tracking, ontological information and scenario building
Tracking (co-reference): "Dr Head", "John", "he"
Ontologies: "Manchester, CT"
Scenario: "Dr Head became the new director of Shiny Rockets Corp"

Two kinds of approaches

Knowledge Engineering
rule based
developed by experienced language engineers
make use of human intuition
require only small amount of training data
development can be very time consuming
some changes may be hard to accommodate

Learning Systems
use statistics or other machine learning
developers do not need LE expertise
require large amounts of annotated training data
some changes may require re-annotation of the entire training corpus

Typical NE pipeline
Pre-processing (tokenisation, sentence splitting, morphological analysis, POS tagging)
Entity finding (gazetteer lookup, NE grammars)
Coreference (alias finding, orthographic coreference etc.)
Export to database / XML
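
The entity-finding and coreference steps can be illustrated with a toy rule-based sketch. Everything here is an assumption made for the example: the gazetteer entries, the single "title + capitalised word" grammar rule, and the surname-matching alias rule. Real pipelines such as ANNIE use far richer resources, and this is not their implementation.

```python
import re

# Hypothetical gazetteer: known names and their entity types.
GAZETTEER = {
    "Manchester": "Location",
    "Sheffield": "Location",
    "Shiny Rockets Corp": "Organisation",
}
TITLES = ("Dr", "Mr", "Mrs", "Ms", "Prof")

def find_entities(text):
    """Entity finding: gazetteer lookup plus one hand-written NE grammar rule."""
    entities = []
    for name, etype in GAZETTEER.items():
        for m in re.finditer(re.escape(name), text):
            entities.append((m.start(), m.end(), m.group(), etype))
    # Grammar rule: a title followed by a capitalised word is tagged as a Person.
    title_pattern = r"\b(?:%s)\.? [A-Z][a-z]+" % "|".join(TITLES)
    for m in re.finditer(title_pattern, text):
        entities.append((m.start(), m.end(), m.group(), "Person"))
    return sorted(entities)

def link_aliases(text, entities):
    """Orthographic coreference: link a later bare surname to an earlier 'Title Surname' mention."""
    links = []
    for start, end, mention, etype in entities:
        if etype != "Person":
            continue
        surname = mention.split()[-1]
        for m in re.finditer(r"\b%s\b" % re.escape(surname), text):
            if m.start() >= end:
                links.append((m.group(), m.start(), mention))
    return links

text = ("Dr Head became the new director of Shiny Rockets Corp. "
        "Head previously worked in Manchester.")
entities = find_entities(text)
aliases = link_aliases(text, entities)
for start, end, mention, etype in entities:
    print(start, end, mention, etype)
print(aliases)
```

The offsets kept with each entity are what an export step would write to a database or to XML annotations.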

GATE and ANNIE

GATE (General Architecture for Text Engineering) is a framework for language processing
ANNIE (A Nearly-New Information Extraction system) is a suite of language processing tools, which provides NE recognition
GATE also includes:
plugins for language processing, e.g. parsers, machine learning tools, stemmers, IR tools, IE components for various languages etc.
tools for visualising and manipulating ontologies
ontology-based information extraction tools
evaluation and benchmarking tools

[Figure: GATE]

Information Extraction for the Semantic Web

Traditional IE is based on a flat structure, e.g. recognising Person, Location, Organisation, Date, Time etc.
For the Semantic Web, we need information in a hierarchical structure
The idea is that we attach semantic metadata to the documents, pointing to concepts in an ontology
Information can be exported as an ontology annotated with instances, or as text annotated with links to the ontology

Richer NE Tagging
Attachment of instances in the text to concepts in the domain ontology
Disambiguation of instances, e.g. Cambridge, MA vs Cambridge, UK
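
A minimal sketch of both ideas, assuming a tiny hand-made ontology and instance store: the mention is attached to a concept in the hierarchy, and ambiguity is resolved by counting context words that overlap with each candidate's cue words. The concept names, instance ids and cue words are all hypothetical, and cue-word overlap is just one simple disambiguation heuristic.

```python
# Hypothetical mini-ontology: each concept maps to its parent (None marks the root).
ONTOLOGY = {
    "Entity": None,
    "Location": "Entity",
    "City": "Location",
    "Country": "Location",
}

# Hypothetical instance store: surface form -> candidate instances.
INSTANCES = {
    "Cambridge": [
        {"id": "Cambridge_UK", "concept": "City", "cues": {"uk", "england", "cam"}},
        {"id": "Cambridge_MA", "concept": "City", "cues": {"ma", "massachusetts", "boston", "mit"}},
    ],
}

def ancestors(concept):
    """Return the path from a concept up to the root of the ontology."""
    path = []
    while concept is not None:
        path.append(concept)
        concept = ONTOLOGY[concept]
    return path

def annotate(mention, context):
    """Attach a mention to the candidate instance whose cue words best match the context."""
    words = {w.strip(".,").lower() for w in context.split()}
    candidates = INSTANCES.get(mention, [])
    if not candidates:
        return None
    # On a tie, max() keeps the first candidate listed.
    best = max(candidates, key=lambda c: len(c["cues"] & words))
    return {
        "mention": mention,
        "instance": best["id"],
        "concept_path": ancestors(best["concept"]),
    }

annotation = annotate("Cambridge", "She moved to Cambridge, MA to work near Boston.")
print(annotation)
```

Unlike a flat Location tag, the resulting annotation carries the full path through the hierarchy, so a query for any Location would also find this City.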

Magpie
Developed by the Open University
Plugin for standard web browser
Automatically associates an ontology-based semantic layer with web resources, allowing relevant services to be linked
Provides means for a structured and informed exploration of the web resources
e.g. looking at a list of publications, we can find information about an author such as projects they work on, other people they work with, etc.

[Figure: Magpie in action]

Evaluation

Evaluation metrics and tools

Evaluation metrics mathematically define how to measure the system's performance against a human-annotated gold standard
A scoring program implements the metric and provides performance measures:
for each document and over the entire corpus
for each type of NE
may also evaluate changes over time
A gold standard reference set also needs to be provided; this may be time-consuming to produce
Visualisation tools show the results graphically and enable easy comparison

Methods of evaluation
Traditional IE is evaluated in terms of Precision and Recall
Precision: how accurate were the answers the system produced?
correct answers / answers produced
Recall: how good was the system at finding everything it should have found?
correct answers / total possible correct answers
There is usually a tradeoff between precision and recall, so a weighted harmonic mean of the two (F-measure) is generally also used.
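
The definitions above translate directly into a small scoring function. This sketch assumes annotations are compared by exact match on (text, type) pairs and uses the standard F-beta formula, where `beta=1` weights precision and recall equally. The function name and the sample annotations are made up for illustration.

```python
def precision_recall_f(system, gold, beta=1.0):
    """Score system annotations against a gold standard using exact matching."""
    system, gold = set(system), set(gold)
    correct = len(system & gold)
    precision = correct / len(system) if system else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    b2 = beta ** 2
    f_measure = (1 + b2) * precision * recall / (b2 * precision + recall)
    return precision, recall, f_measure

# Hypothetical annotations: (mention text, entity type).
gold = [
    ("Dr Head", "Person"),
    ("Shiny Rockets Corp", "Organisation"),
    ("Manchester", "Location"),
    ("15 March", "Date"),
]
system = [
    ("Dr Head", "Person"),
    ("Manchester", "Location"),
    ("Shiny", "Person"),  # spurious answer
]

precision, recall, f1 = precision_recall_f(system, gold)
print(round(precision, 3), round(recall, 3), round(f1, 3))
```

Here two of the three answers produced are correct (precision 2/3) and two of the four gold annotations are found (recall 1/2), which gives an F1 of 4/7.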

[Figure: GATE AnnotationDiff Tool]

Metrics for Richer IE

Precision and Recall are not sufficient for ontology-based IE, because the distinction between right and wrong is less obvious
Recognising a Person as a Location is clearly wrong, but recognising a Research Assistant as a Lecturer is not so wrong
Similarity metrics also need to be integrated, so that a wrong answer scores higher the closer it is to the correct concept in the hierarchy
A cost-based approach is also possible, where different weights can be given to each concept in the hierarchy and to different types of error, and combined to form a single score
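
The slides do not name a specific similarity metric, so the sketch below swaps in a Wu-Palmer-style measure as one plausible choice: twice the depth of the deepest shared ancestor, divided by the sum of the depths of the two concepts (the root has depth 1). The concept hierarchy is hypothetical.

```python
# Hypothetical concept hierarchy: concept -> parent (None marks the root).
PARENT = {
    "Entity": None,
    "Location": "Entity",
    "Person": "Entity",
    "AcademicStaff": "Person",
    "ResearchAssistant": "AcademicStaff",
    "Lecturer": "AcademicStaff",
}

def path_to_root(concept):
    """Return the concept followed by all of its ancestors up to the root."""
    path = []
    while concept is not None:
        path.append(concept)
        concept = PARENT[concept]
    return path

def depth(concept):
    """Depth of a concept, counting the root as depth 1."""
    return len(path_to_root(concept))

def similarity(key, response):
    """Wu-Palmer-style score in (0, 1]; 1.0 means the response concept equals the key."""
    response_ancestors = set(path_to_root(response))
    # Deepest concept on the key's path that is also an ancestor of the response.
    common = next(c for c in path_to_root(key) if c in response_ancestors)
    return 2 * depth(common) / (depth(key) + depth(response))

near_miss = similarity("ResearchAssistant", "Lecturer")
clear_error = similarity("ResearchAssistant", "Location")
print(near_miss, clear_error)
```

With this hierarchy the Lecturer response scores 0.75 and the Location response about 0.33, so the near miss is penalised less than the clear error. A cost-based variant would replace the depth counts with per-concept and per-error weights.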

Visualisation of Results

Cluster Map example
Traditionally used to show documents classified according to topic
Here it shows instances classified according to concept
Enables analysis, comparison and querying of results
Examples here created by Marta Sabou (Free University of Amsterdam) using Aduna software

The principle: Venn diagrams
Documents classified according to topic

Jobs by region
Instances classified by concept

Concept distribution
Shows the relative importance of different concepts

Correct and incorrect instances attached to concepts

Summary
Introduction to text mining and the Semantic Web
How traditional information extraction techniques, including visualisation and evaluation, can be extended to deal with the complexity of the Semantic Web
How text mining can help the progression of the Semantic Web

Research questions
Automatic annotation tools are currently mainly domain- and ontology-dependent, and work best on a small scale
Tools designed for large-scale applications lose out on accuracy
Ontology population works best when the ontology already exists, but how do we ensure accurate ontology generation?
Need large-scale evaluation programs

Some useful links

NaCTeM (National Centre for Text Mining)
http://www.nactem.ac.uk
GATE
http://gate.ac.uk
KIM
http://www.ontotext.com/kim/
h-TechSight
http://www.h-techsight.org
Magpie
http://www.kmi.open.ac.uk/projects/magpie
