A review of the state of the art in
Machine Learning on the Semantic Web
Simon Price
University of Bristol
http://www.cs.bris.ac.uk/~price
Outline
• Introduction to the Semantic Web
– Semantic Web layers
– URI, RDF(S), OWL
– Web Services and the Semantic Web
• Applications of Machine Learning
– Creating the Semantic Web
– Using the Semantic Web
• Summary and pointers to further info.
Introduction to the Semantic Web
Definition
"The Semantic Web is the representation of data
on the World Wide Web. It is a collaborative
effort led by W3C with participation from a large
number of researchers and industrial partners. It
is based on the Resource Description Framework
(RDF), which integrates a variety of applications
using XML for syntax and URIs for naming."
Uniform Resource Identifier (URI)
• URI addressing scheme
http://...
ftp://...
mailto:..., etc.
• Each URI identifies a resource (or a specific point
within a resource)
• Typically the resource is somewhere on the Web, but
it may be an entity that cannot be retrieved over a network,
e.g.
- human beings, corporations, bound books in a library,
- concepts, topics, relations, ...
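As a minimal sketch of the addressing scheme, Python's standard `urllib.parse.urlsplit` separates a URI into its scheme and remaining parts. The example URIs are illustrative placeholders and are not taken from the slides.

```python
from urllib.parse import urlsplit

# Illustrative URIs (assumptions for this sketch, not from the slides).
EXAMPLE_URIS = [
    "http://www.example.org/index.html#section2",
    "ftp://ftp.example.org/pub/readme.txt",
    "mailto:someone@example.org",
]


def describe(uri: str) -> dict:
    """Split a URI into scheme, authority, path and fragment."""
    parts = urlsplit(uri)
    return {
        "scheme": parts.scheme,
        "authority": parts.netloc,
        "path": parts.path,
        "fragment": parts.fragment,  # a specific point within the resource
    }


for uri in EXAMPLE_URIS:
    print(describe(uri))
```

Note that a `mailto:` URI has no authority part at all, which is one reason a URI cannot be assumed to name something retrievable over the network.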
URIs - Good news. Bad news.
• Good news: decentralisation
– anyone can create a URI
– allows rapid growth of Web
• Bad news: decentralisation
– no centralised register or clearing house
– multiple URIs can refer to same entity
– testing for equality (or equivalence) poses interesting problems
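The equality problem can be illustrated with a small sketch. Syntactic normalisation (lower-casing the scheme and host, dropping a default port) catches some trivially equivalent spellings, but it cannot tell that two genuinely different URIs denote the same entity. The `DEFAULT_PORTS` table and the example URIs are assumptions made for this sketch, and userinfo is ignored for brevity.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumed default ports for a few common schemes.
DEFAULT_PORTS = {"http": 80, "https": 443, "ftp": 21}


def normalise(uri: str) -> str:
    """Return a syntactically normalised form of a URI (sketch only)."""
    parts = urlsplit(uri)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    netloc = host
    # Keep the port only when it differs from the scheme's default.
    if parts.port is not None and parts.port != DEFAULT_PORTS.get(scheme):
        netloc = f"{host}:{parts.port}"
    # An empty path on a host is treated as "/".
    path = parts.path or ("/" if host else "")
    return urlunsplit((scheme, netloc, path, parts.query, parts.fragment))


def same_uri(a: str, b: str) -> bool:
    """True when two URIs are equal after normalisation."""
    return normalise(a) == normalise(b)


# Different spellings of one URI are recognised as equal...
print(same_uri("HTTP://WWW.Example.ORG:80/index.html",
               "http://www.example.org/index.html"))
# ...but two distinct URIs that may name the same page are not.
print(same_uri("http://www.example.org/index.html",
               "http://example.org/index.html"))
```

Deciding that the second pair refers to one entity needs knowledge from outside the URIs themselves, which is where the "interesting problems" begin.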
Resource Description Framework (RDF)
• A language of triples built from URIs (the object may also be a literal).
• An RDF statement has the form:
{ subject, predicate, object }
• e.g. "http://www.example.org/index.html has a creator whose
value is the literal John Smith" could be represented as a plain
text triple:
subject http://www.example.org/index.html
predicate http://purl.org/dc/elements/1.1/creator
object John Smith
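The triple above can be held as a simple data structure. The sketch below is one possible representation, not a standard API: the `Triple` type and the `obj_is_literal` flag are names invented here, and the output follows the N-Triples convention of angle brackets for URIs and double quotes for literals.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One RDF statement (hypothetical helper type for this sketch)."""
    subject: str
    predicate: str
    obj: str
    obj_is_literal: bool = False


def to_ntriples(t: Triple) -> str:
    """Serialise a triple in N-Triples style."""
    if t.obj_is_literal:
        # Escape backslashes and double quotes inside the literal.
        escaped = t.obj.replace("\\", "\\\\").replace('"', '\\"')
        obj = f'"{escaped}"'
    else:
        obj = f"<{t.obj}>"
    return f"<{t.subject}> <{t.predicate}> {obj} ."


statement = Triple(
    "http://www.example.org/index.html",
    "http://purl.org/dc/elements/1.1/creator",
    "John Smith",
    obj_is_literal=True,
)
print(to_ntriples(statement))
```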
Representing RDF
• Default syntax is RDF/XML (not human-friendly)
• SQL triple stores are commonly used
• RDF toolkits: Jena (HP) and Redland (Dave Beckett)
• Prolog: SWI-Prolog (40M triples per 100MB RAM)
e.g.
rdf( 'http://www.example.org/index.html',
     'http://purl.org/dc/elements/1.1/creator',
     'John Smith' ).
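To illustrate the SQL triple store option, here is a minimal sketch using Python's built-in `sqlite3` module. The three-column table layout, the `match` helper and the second (Jane Doe) triple are assumptions made for illustration; real triple stores add indexing and typed literals. Passing `None` for a position plays roughly the role of an unbound variable in the Prolog `rdf/3` fact above.

```python
import sqlite3

DC_CREATOR = "http://purl.org/dc/elements/1.1/creator"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triples (subject TEXT, predicate TEXT, object TEXT)")
conn.executemany(
    "INSERT INTO triples VALUES (?, ?, ?)",
    [
        ("http://www.example.org/index.html", DC_CREATOR, "John Smith"),
        # Hypothetical second statement, added only to make queries interesting.
        ("http://www.example.org/about.html", DC_CREATOR, "Jane Doe"),
    ],
)


def match(conn, subject=None, predicate=None, obj=None):
    """Return all triples matching the pattern; None means 'any value'."""
    clauses, params = [], []
    for column, value in (("subject", subject),
                          ("predicate", predicate),
                          ("object", obj)):
        if value is not None:
            clauses.append(f"{column} = ?")
            params.append(value)
    sql = "SELECT subject, predicate, object FROM triples"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    sql += " ORDER BY subject, predicate, object"
    return conn.execute(sql, params).fetchall()


# Who created index.html?
print(match(conn, subject="http://www.example.org/index.html",
            predicate=DC_CREATOR))
```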
Semantic Web Layers
RDF Schema
• A language for describing properties and classes of
RDF resources
• Includes semantics for generalisation-hierarchies of
such properties and classes
• Simple data typing model:
– is-a relationships and properties
– some range and domain restriction
Notes:
1. RDF Schema was recently renamed "RDF Vocabulary Description Language"
2. In the literature, RDF + RDF Schema is often referred to as RDF(S)
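The generalisation-hierarchy semantics can be sketched as a small forward-chaining loop: `rdfs:subClassOf` is transitive, and an instance of a class is also an instance of every superclass. The `ex:` class and instance names are hypothetical, and prefixed names stand in for full URIs to keep the example short.

```python
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"

# Hypothetical toy vocabulary and one instance.
triples = {
    ("ex:Steel", SUBCLASS_OF, "ex:Metal"),
    ("ex:Metal", SUBCLASS_OF, "ex:Material"),
    ("ex:girder42", RDF_TYPE, "ex:Steel"),
}


def rdfs_closure(triples):
    """Apply the two subclass rules repeatedly until nothing new is derived."""
    inferred = set(triples)
    changed = True
    while changed:
        new = set()
        for s, p, o in inferred:
            for s2, p2, o2 in inferred:
                if p2 != SUBCLASS_OF or o != s2:
                    continue
                if p == SUBCLASS_OF:
                    # Transitivity: A subClassOf B, B subClassOf C.
                    new.add((s, SUBCLASS_OF, o2))
                elif p == RDF_TYPE:
                    # Inheritance: x type A, A subClassOf B.
                    new.add((s, RDF_TYPE, o2))
        changed = not new <= inferred
        inferred |= new
    return inferred


closure = rdfs_closure(triples)
print(("ex:girder42", RDF_TYPE, "ex:Material") in closure)
```

The nested loop is quadratic in the number of triples, which is acceptable for a sketch but not for a store of realistic size.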
Ontology Vocabulary Layer
• Huge number of different ontologies:
– simple: thesauri, taxonomies
– complex: DAML+OIL, OWL
• OWL supersedes the older DAML+OIL
• OWL goes further than RDF Schema, adding:
– relations between classes
– cardinality
– equality
– richer typing
– characteristics of properties
– enumerated classes
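Of the additions listed above, equality bears directly on the earlier problem of several URIs naming one entity. OWL provides the `owl:sameAs` property for stating that two URIs denote the same individual. The sketch below groups URIs into equivalence classes with a union-find structure; the three person URIs are hypothetical, and a real OWL reasoner does far more than this.

```python
SAME_AS = "owl:sameAs"

# Hypothetical statements from three independent sources.
triples = [
    ("http://a.example/people#jsmith", SAME_AS, "http://b.example/staff/john-smith"),
    ("http://b.example/staff/john-smith", SAME_AS, "http://c.example/id/42"),
    ("http://a.example/people#jsmith", "ex:worksFor", "http://a.example/org"),
]


def equivalence_classes(triples):
    """Group URIs linked (directly or indirectly) by owl:sameAs."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            # Use the lexicographically smaller URI as the representative.
            parent[max(root_a, root_b)] = min(root_a, root_b)

    for s, p, o in triples:
        if p == SAME_AS:
            union(s, o)

    groups = {}
    for node in list(parent):
        groups.setdefault(find(node), set()).add(node)
    return list(groups.values())


for group in equivalence_classes(triples):
    print(sorted(group))
```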
Web Ontology Language (OWL)
• OWL Lite - hierarchical classification (ideal for
thesauri and other taxonomies).
• OWL DL - description logics (computationally
complete but inference services are restricted to
classification and subsumption).
• OWL Full - full syntactic freedom of RDF (no
computational guarantees).
Web Services and the Semantic Web
• Web Services
– XML-based interfaces to programs accessible via the Web
– Operating-system-neutral Remote Procedure Call (RPC) protocol
• Today's Web Services
– Business-orientated, simple, short transactional operations
– Domain-specific XML vocabularies (not RDF)
• Tomorrow's Web Services
– Combination of simple services to achieve complex operations
– Automated discovery, selection and pipelining of Web Services
• Semantic Web + Machine Learning may have an
important role to play in the future of Web Services
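One way to picture automated pipelining is as a search over service descriptions. The sketch below assumes each service declares a single input type and a single output type, and finds the shortest chain from what you have to what you want by breadth-first search. Every service name and type name is hypothetical; the slides do not prescribe this (or any) algorithm.

```python
from collections import deque
from typing import NamedTuple, Optional


class Service(NamedTuple):
    """Hypothetical service description: one input type, one output type."""
    name: str
    consumes: str
    produces: str


SERVICES = [
    Service("geocode", "PostalAddress", "Coordinates"),
    Service("nearest_station", "Coordinates", "StationId"),
    Service("timetable", "StationId", "Timetable"),
    Service("weather", "Coordinates", "Forecast"),
]


def find_pipeline(services, have: str, want: str) -> Optional[list]:
    """Return the shortest list of service names turning `have` into `want`."""
    queue = deque([(have, [])])
    seen = {have}
    while queue:
        current, path = queue.popleft()
        if current == want:
            return path
        for svc in services:
            if svc.consumes == current and svc.produces not in seen:
                seen.add(svc.produces)
                queue.append((svc.produces, path + [svc.name]))
    return None  # no chain of services connects the two types


print(find_pipeline(SERVICES, "PostalAddress", "Timetable"))
```

Matching on exact type names is the weak point of such a scheme: shared vocabularies are what would let independently written services agree that one service's output is another's input.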
Applications of Machine Learning
• Attempts to apply Machine Learning are being made
within each of the Semantic Web layers.
• Research activity within each layer can be divided
into two parts, the application of Machine Learning in:
– creating the Semantic Web
– using the Semantic Web
• Most activity to date is in creating the Semantic Web
Creating the Semantic Web
• Why can't people do this themselves?
– People are frequently unaware of metadata standards
– People are (usually) unwilling to spend time creating metadata
• May be no direct benefit (to them)
• Boring
– People are incapable of applying metadata consistently
• Consistency varies from person to person
• Consistency varies in the same person over time
– There's already a huge backlog of unlabelled data on the existing web!
– Also, someone else's metadata may not be what you want
• e.g. Site content rating from supplier may be unreliable
Automatic Generation of Metadata
• Paper describes examples of ML research that use:
– Inductive Logic Programming (on popular science articles)
• F-score close to that of a human expert. Precision between 0.7 and 1.0
– Hidden Markov Models (on marked-up MUC and MEDLINE texts)
• Reported as adequate and portable across domains, but unable to scale
because the probability distribution fragments. SVMs suggested.
– Association Analysis (using a Web Directory to label examples)
• Work in progress; looks for terms in the text that indicate a directory
path, e.g. .../Manufacturing/Materials/Metals/Steel/..
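For readers unfamiliar with the evaluation measures quoted above, the sketch below computes precision, recall and the balanced F-score (F1) for a set of automatically generated metadata labels against a hand-labelled reference. The slides do not say which F-score variant was reported, so F1 is an assumption, and the label sets are made up for illustration.

```python
def precision_recall_f1(predicted: set, actual: set):
    """Compare predicted labels with reference labels.

    precision = correct predictions / all predictions
    recall    = correct predictions / all reference labels
    F1        = harmonic mean of precision and recall
    """
    true_positives = len(predicted & actual)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical labels for one document.
predicted = {"steel", "metals", "manufacturing", "alloy"}
actual = {"steel", "metals", "manufacturing", "materials"}

p, r, f1 = precision_recall_f1(predicted, actual)
print(f"precision={p:.2f} recall={r:.2f} F1={f1:.2f}")
```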
Application of ML to Ontologies
• Ontology Vocabulary Layer is currently a popular
area of Semantic Web research
• Most ontologies hand-crafted
• Creating ontologies is far more complex than RDF
metadata extraction
– ILP has been used to revise and maintain ontologies, but not to create them
– Association rule learning has been used to partly automate creation
– Regular expression (FSA) rewriting guided by Minimum Description
Length has been used to create Document Type Definitions (DTDs) for XML docs.
• Ontology mapping
– Hard problem
– Some work using Naive Bayes
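The slides give no detail on how Naive Bayes was applied to ontology mapping, so the following is only a guess at the general shape of such an approach: train a multinomial Naive Bayes text classifier on instances of the concepts in one ontology, then map a concept from a second ontology to whichever target concept most of its instances are classified as. All concept names and instance texts are invented for this sketch.

```python
import math
from collections import Counter, defaultdict


def train(labelled_docs):
    """labelled_docs: iterable of (concept, instance_text) pairs."""
    word_counts = defaultdict(Counter)
    doc_counts = Counter()
    vocabulary = set()
    for label, text in labelled_docs:
        words = text.lower().split()
        word_counts[label].update(words)
        doc_counts[label] += 1
        vocabulary.update(words)
    return word_counts, doc_counts, vocabulary


def classify(model, text):
    """Return the most probable concept for a text (Laplace smoothing)."""
    word_counts, doc_counts, vocabulary = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(doc_counts):
        score = math.log(doc_counts[label] / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word not in vocabulary:
                continue  # ignore words never seen in training
            count = word_counts[label][word]
            score += math.log((count + 1) / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def map_concept(model, instance_texts):
    """Map a source concept to the target concept most instances vote for."""
    votes = Counter(classify(model, t) for t in instance_texts)
    return votes.most_common(1)[0][0]


# Hypothetical target ontology B with example instances.
model = train([
    ("b:Academic", "professor lectures research students"),
    ("b:Academic", "lecturer teaches undergraduate course"),
    ("b:Publication", "journal article volume pages"),
    ("b:Publication", "conference paper proceedings pages"),
])

# Hypothetical instances of two concepts from source ontology A.
print(map_concept(model, ["professor teaches research course"]))
print(map_concept(model, ["article in conference proceedings"]))
```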
Using the Semantic Web
• Not much ML research in this area (yet)
• Datasets exist
– RSS newsfeeds and Weblogs/Blogs
– DAML repository
– Dave Beckett's RDF Resource Guide
• Locating suitable data can be a problem
• Semantic Web Mining has been conjectured
– combines Semantic Web with Web Mining
– Relational Data Mining (RDM) suggested to exploit structure in data
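RSS 1.0 feeds are written in RDF, which is why newsfeeds count as a ready source of data. The sketch below pulls triples out of a tiny, made-up RSS 1.0 fragment using only the standard library. It is a deliberate simplification: it treats every child element of an `item` as a literal-valued property and is not a general RDF/XML parser.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RSS_NS = "http://purl.org/rss/1.0/"

# Made-up feed fragment for illustration.
FEED = """
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://purl.org/rss/1.0/"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <item rdf:about="http://www.example.org/news/1">
    <title>First story</title>
    <link>http://www.example.org/news/1</link>
    <dc:creator>John Smith</dc:creator>
  </item>
</rdf:RDF>
"""


def rss_to_triples(xml_text: str):
    """Return (subject, predicate, literal) triples for each RSS item."""
    root = ET.fromstring(xml_text.strip())
    triples = []
    for item in root.findall(f"{{{RSS_NS}}}item"):
        subject = item.get(f"{{{RDF_NS}}}about")
        for child in item:
            # ElementTree tags look like "{namespace}local"; join the two
            # parts to form the predicate URI.
            predicate = child.tag[1:].replace("}", "", 1)
            triples.append((subject, predicate, (child.text or "").strip()))
    return triples


feed_triples = rss_to_triples(FEED)
for triple in feed_triples:
    print(triple)
```

Once a feed is in this form, each item is a set of relations rather than a flat text record, which is the structure that Relational Data Mining is suggested to exploit.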
Summary
• Semantic Web is rapidly evolving
• Key languages:
– RDF
– vocabularies built on top of RDF
• Publicly available RDF datasets exist
– in applications like RSS
– and repositories like DAML
• RDF maps well to Prolog (and SQL)
• Machine Learning looks promising for both the
creation and use of the Semantic Web
Simon Price
University of Bristol
http://www.cs.bris.ac.uk/~price
