Motivation
Data on the Web
Some eye-catching opener illustrating growth and/or diversity of web data
Curation and profiling of Linked Data
KnowEscape workshop, Open Knowledge Conference 2013 (OKCon2013)
Stefan Dietze¹, Besnik Fetahu¹, Mathieu d’Aquin²
¹ L3S Research Center (Germany); ² The Open University (UK)
http://linkedup-project.eu
http://purl.org/dietze
@stefandietze
19/09/2013, Stefan Dietze
Success models:
data & applications
 LinkedUp Challenge
to identify innovative
tools & applications
 Evaluation methods
and approaches
http://www.linkedup-challenge.org/
“LinkedUp” – Linking Web Data for Education
Data curation
Technology transfer
& community-building
 Collecting & exposing open
data of educational relevance
=> LinkedUp Data Catalog
 Profiling and linking of Web
Data for education
=> educational data graph
 Disseminating knowledge &
building communities
(educators, computer
scientists, data engineers)
 Gathering stakeholder
feedback: use cases and
requirements
http://linkedup-challenge.org/#usecases
http://data.linkededucation.org
http://linkedup-project.eu/events
European project aimed at
advancing take-up of open data
and related technologies
http://linkedup-project.eu
Problem: too many datasets, too little information
http://datahub.io/dataset/bbc
60,000,000 triples
Using/exploiting Linked Data in Education ?
 Lack of reliable dataset metadata about
 Resource types
 Topics & disciplines
 Quality, currentness & availability
 Provenance
 Lack of links and cross-dataset references
 Lack of scalable query methods
 LOD: 300+ datasets, 32+ billion
distinct RDF statements
 DataHub: 6000+ open datasets
 Goal: dataset metadata & search for data consumers
 “LinkedUp/Linked Education cloud” as “expanded” subset of LOD cloud at
The DataHub (http://datahub.io/groups/linked-education)
 RDF (VoID) catalog of datasets = dataset of datasets (Linked Education
Catalog): classification of datasets according to, e.g., represented types,
disciplines/topics, data quality, accessibility
 Links and coreferences => unified view on data => Linked Education Graph
 Infrastructure, unified (SPARQL) endpoint & APIs for distributed/federated
querying
Data curation and dataset profiling
LinkedUp approach
Educational Datasets
LinkedUp
Catalog
LinkedUp
Links
Automated processing to generate:
 Descriptive VoID/RDF Dataset Catalog
 Data links
Assessing the Educational Linked Data
Landscape, D’Aquin, M., Adamou, A.,
Dietze, S., ACM Web Science 2013
(WebSci2013), Paris, France, May 2013.
[WEBSCI'13]
Linked Data “Observatory” for linking and profiling
Pipeline stages:
 Endpoint Retrieval & Graph Extraction
 Schema Extraction and Mapping
 Sample Graph Extraction (per dataset)
 NER & NED (per resource)
 Interlinking & Co-Resolution (cross-dataset)
 Category Mapping, Normalisation, Filtering
Outputs:
 Dataset Catalog/Index
 Links/Cross-references
Example: rdfs:label “…ECB…” ?
Dataset metadata (RDF/VoID):
 Schema mappings
(types, properties)
 Entities & categories
 Topic relevance scores
 Availability, currentness data (tbc)
dbpedia:Finance
dbpedia:Sports
dbpedia:England-Wales-Cricket-Board
dbpedia:European_Central_Bank
Combining a co-occurrence-based and a
semantic measure for entity linking, B. P.
Nunes, S. Dietze, M.A. Casanova, R.
Kawase, B. Fetahu, and W. Nejdl, ESWC
2013 - 10th Extended Semantic Web
Conference, (May 2013).
Generating structured Profiles of Linked
Data Graphs, Fetahu, B; Adamou, A.,
Dietze, S., d’Aquin, M., Nunes, B.P.,
ISWC2013 – 12th International Semantic
Web Conference; under review.
[ESWC'13]
[ISWC'13]
Schema assessment and mapping
Co-occurrence of
data types
(in 146 datasets:
144 Vocabularies,
588 highly
overlapping types,
719 Properties)
Assessing the Educational Linked Data Landscape,
D’Aquin, M., Adamou, A., Dietze, S., ACM Web Science
2013 (WebSci2013), Paris, France, May 2013.
<po:Programme …>
<po:title>Secret Universe –
The Life of the Cell</po:title>
…
</po:Programme…>
BBC Programme
<sioc:Item …>
<title>Viral diseases &
bacteria</title>
…
</sioc:Item …>
SlideShare Set
po:Programme
sioc:Item
?
http://datahub.io/group/linked-education
Co-occurrence graph
after mapping
(201 frequent types
mapped into 79 classes)
Assessing the Educational Linked Data Landscape,
D’Aquin, M., Adamou, A., Dietze, S., ACM Web Science
2013 (WebSci2013), Paris, France, May 2013.
bibo:Slideshow
bibo:Film
bibo:Document
LinkedUp Data Catalog in a nutshell
http://datahub.io/group/linked-education
http://data.linkededucation.org/linkedup/catalog/
 VoID dataset catalog: browse, explore and query for
datasets/types
 Federated queries using type mappings
<yo:Video 8748720>
<dc:title>Pluto & the
Dwarf Planets</dc:title>
…
</yo:Video 8748720>
Video
<sioc:Item 2139393292>
<title>Planetary motion
& gravity</title>
…
</sioc:Item 2139393292>
Slideset
Topics/categories addressed?
Relatedness of resources/entities?
(types, semantics)
<po:Programme519215>
<po:Series>Wonders of the Solar
System</po:Series>
<po:Episode>Emp. of the Sun</po:Episode>
<po:Actor>Brian Cox</po:Actor>
</po:Programme519215 >
Programme
Combining a co-occurrence-based and a semantic measure
for entity linking, B. P. Nunes, S. Dietze, M.A. Casanova, R.
Kawase, B. Fetahu, and W. Nejdl., ESWC 2013 - 10th Extended
Semantic Web Conference, (May 2013).
Generating structured Profiles of Linked Data Graphs,
Fetahu, B; Adamou, A., Dietze, S., d’Aquin, M., Nunes, B.P.,
ISWC2013 – 12th International Semantic Web Conference; under
review.
Dataset topic profiling: data heterogeneity?
Data disambiguation, linking & profiling
Brian Cox?
Sun?
Pluto?
db:Pluto
(Dwarf Planet)
db:Astronomical Objects
db:Sun
Data disambiguation, linking & profiling
db:Astronomy
<yov:Lecture8748720>
<title>Pluto & the Dwarf
Planets</title>
…
</yov:Lecture8748720>
Online Lecture
db:Astronomy
 Computation of connectivity scores
between resources/entities
 Method: combination of
 (i) a semantic (graph-based) connectivity
score (SCS) with
 (ii) a Web co-occurrence-based measure
(CBM), similar to the Normalised Google Distance (NGD)
 For (i): adaptation of the Katz index from social network
analysis (SNA) to (linked) data graphs (considering the
number and lengths of paths of transversal
properties)
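The graph-based part (i) can be illustrated with a minimal sketch. It assumes a plain truncated Katz index, not the adapted variant described in [ESWC'13]: for each length l up to a cut-off, the number of walks of length l between two entities is counted and damped by beta^l. The toy graph, the underscore-style node names, `beta` and `max_len` are illustrative assumptions.

```python
from collections import defaultdict


def katz_score(edges, source, target, beta=0.5, max_len=3):
    """Truncated Katz index: sum over l = 1..max_len of beta**l * (#walks of length l)."""
    # Build an undirected adjacency list from (node, node) pairs.
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)

    # walks[n] = number of walks of the current length from source to n.
    walks = {source: 1}
    score = 0.0
    for length in range(1, max_len + 1):
        nxt = defaultdict(int)
        for node, count in walks.items():
            for neighbour in adj[node]:
                nxt[neighbour] += count
        walks = nxt
        score += (beta ** length) * walks.get(target, 0)
    return score


# Hypothetical toy graph loosely based on the entities named on the slides.
edges = [
    ("db:Pluto", "db:Astronomical_Objects"),
    ("db:Sun", "db:Astronomical_Objects"),
    ("db:Astronomical_Objects", "db:Astronomy"),
    ("db:Sun", "db:Astronomy"),
]
score = katz_score(edges, "db:Pluto", "db:Sun")
```

Shorter connections dominate the score because longer walks are damped more strongly; a smaller `beta` sharpens that effect.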
Data linking
Dataset categorisation: computation of
normalised (DBpedia) category relevance
scores for datasets
db:Sun
SCS = 0.32
CBM = 0.24
http://purl.org/vol/doc/
http://purl.org/vol/ns/
Combining a co-occurrence-based and a semantic
measure for entity linking, B. P. Nunes, S. Dietze, M.A.
Casanova, R. Kawase, B. Fetahu, and W. Nejdl., ESWC 2013
- 10th Extended Semantic Web Conference, (May 2013).
Dataset profiling
 Goal: extracting representative metadata (“topic profile”) for each dataset
 Approach: computation of normalised (DBpedia) category relevance scores
 Using representative sample resource sets per resource type & dataset
Generating structured Profiles of Linked Data
Graphs, Fetahu, B; Adamou, A., Dietze, S., d’Aquin,
M., Nunes, B.P., ISWC2013 – 12th International
Semantic Web Conference; under review.
DBpedia category graph
Dataset profiling: topic extraction process (1/2)
Step 1 – NER:
 Online NER & NED vs. incremental similarity-based “NER”:
 Online NER: DBpedia Spotlight
 Incremental & similarity-based NER: compare (via Jaccard
index) textual descriptions of already extracted entities with
literal values of a resource instance
(assumption: entities are likely to recur within a dataset)
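The incremental, similarity-based step can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the tokenisation, the threshold of 0.3, the function names and the entity descriptions are all hypothetical.

```python
import re


def tokens(text):
    """Lower-case word tokens of a string, as a set."""
    return set(re.findall(r"\w+", text.lower()))


def jaccard(a, b):
    """Jaccard index |A ∩ B| / |A ∪ B| of two token sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def match_known_entities(literals, known_entities, threshold=0.3):
    """Return already extracted entities whose description resembles any literal."""
    matches = []
    for entity, description in known_entities.items():
        desc_tokens = tokens(description)
        best = max((jaccard(tokens(lit), desc_tokens) for lit in literals), default=0.0)
        if best >= threshold:
            matches.append((entity, best))
    return sorted(matches, key=lambda m: m[1], reverse=True)


# Hypothetical descriptions of entities extracted earlier from the same dataset.
known = {
    "dbpedia:Pluto": "Pluto dwarf planet in the Kuiper belt",
    "dbpedia:European_Central_Bank": "central bank of the euro area",
}
literals = ["Pluto & the Dwarf Planets"]
result = match_known_entities(literals, known)
```

When no known entity passes the threshold, a resource would presumably fall back to the online NER service; that fallback is not shown here.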
Step 2 – Computation of profile (ranked categories)
 Entities => DBpedia categories = “Topics”: extraction of topics
from DBpedia entities via dcterms:subject
 Expand the set of topics by leveraging hierarchical category
organization (skos:broader)
 Normalised topic score (over topics and datasets), computed from:
 # of entities for dataset D
 # of entities for all datasets
 # of entities for topic t in dataset D
 # of entities for topic t for all datasets
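Step 2 can be sketched as follows: entities are mapped to categories, the category set is expanded along the hierarchy, and the four counts above are collected. The dictionaries stand in for dcterms:subject and skos:broader lookups, and all names and data are hypothetical. The way `illustrative_score` combines the counts is an assumption made for illustration only; it is not the formula from the slide.

```python
from collections import Counter


def expand_topics(topics, broader, depth=1):
    """Add parent categories reachable via skos:broader within `depth` steps."""
    expanded = set(topics)
    frontier = set(topics)
    for _ in range(depth):
        frontier = {p for t in frontier for p in broader.get(t, ())} - expanded
        expanded |= frontier
    return expanded


def topic_counts(entities_by_dataset, subjects, broader, depth=1):
    """Count, per dataset, how many entities are associated with each topic."""
    counts = {}
    for dataset, entities in entities_by_dataset.items():
        per_topic = Counter()
        for entity in entities:
            per_topic.update(expand_topics(subjects.get(entity, ()), broader, depth))
        counts[dataset] = per_topic
    return counts


def illustrative_score(topic, dataset, counts, entities_by_dataset):
    """ASSUMED combination of the four counts (not the authors' formula)."""
    t_in_d = counts[dataset][topic]                    # entities for t in dataset D
    t_all = sum(c[topic] for c in counts.values())     # entities for t, all datasets
    n_d = len(entities_by_dataset[dataset])            # entities for dataset D
    n_all = sum(len(e) for e in entities_by_dataset.values())  # entities, all datasets
    if t_all == 0 or n_d == 0:
        return 0.0
    # Share of the topic's entities found in D, relative to D's share of all entities.
    return (t_in_d / t_all) / (n_d / n_all)


# Hypothetical stand-ins for dcterms:subject and skos:broader.
subjects = {
    "db:Pluto": ["cat:Dwarf_planets"],
    "db:Sun": ["cat:Stars"],
    "db:ECB": ["cat:Central_banks"],
}
broader = {
    "cat:Dwarf_planets": ["cat:Astronomical_objects"],
    "cat:Stars": ["cat:Astronomical_objects"],
}
entities_by_dataset = {"A": ["db:Pluto", "db:Sun"], "B": ["db:Sun", "db:ECB"]}

counts = topic_counts(entities_by_dataset, subjects, broader)
score = illustrative_score("cat:Astronomical_objects", "A", counts, entities_by_dataset)
```

Ranking the topics of a dataset by such a score yields its profile; a larger `depth` pulls in more general categories at the cost of less specific topics.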
http://data.linkededucation.org/linkedup/categories-explorer
http://data.linkededucation.org/
Dataset profile explorer
http://data.linkededucation.org/request/pipeline/sparql
LinkedUp Data Catalog – hands-on
in a nutshell
http://data.linkededucation.org
http://data.linkededucation.org/linkedup/catalog/sparql
http://data.linkededucation.org/request/pipeline/sparql
Querying FOR datasets
• Retrieving datasets for categories
PREFIX void: <http://rdfs.org/ns/void#>
PREFIX vol: <http://purl.org/vol/ns/>
PREFIX dcterms: <http://purl.org/dc/terms/>
SELECT ?datasetname ?link ?score WHERE
{ ?linkset a void:Linkset.
?linkset vol:hasLink ?link.
?link vol:linksResource <http://dbpedia.org/resource/Category:Technology>.
?link vol:hasScore ?score.
?dataset a void:Dataset.
?linkset void:target ?dataset.
?dataset dcterms:title ?datasetname.
FILTER (?score > 0.5) }
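A query like the one above can be parameterised and sent over the SPARQL protocol, which accepts the query text in the `query` parameter of a GET request. The sketch below only builds the request URL and makes no network call. The function names are hypothetical, and treating http://data.linkededucation.org/request/pipeline/sparql as the endpoint that holds the category linksets is an assumption.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

PREFIXES = """PREFIX void: <http://rdfs.org/ns/void#>
PREFIX vol: <http://purl.org/vol/ns/>
PREFIX dcterms: <http://purl.org/dc/terms/>
"""


def datasets_for_category_query(category_iri, min_score=0.5):
    """Build the dataset-by-category query for one DBpedia category IRI."""
    # Reject characters that could break out of the <...> IRI delimiters.
    if any(ch in category_iri for ch in "<> \"\n"):
        raise ValueError("not a plain IRI: %r" % category_iri)
    return PREFIXES + f"""SELECT ?datasetname ?link ?score WHERE {{
  ?linkset a void:Linkset ;
           vol:hasLink ?link ;
           void:target ?dataset .
  ?link vol:linksResource <{category_iri}> ;
        vol:hasScore ?score .
  ?dataset a void:Dataset ;
           dcterms:title ?datasetname .
  FILTER (?score > {float(min_score)})
}}"""


def request_url(endpoint, query):
    """SPARQL protocol GET request URL: the query goes in the `query` parameter."""
    return endpoint + "?" + urlencode({"query": query})


# Assumed endpoint, taken from the URLs listed on the slides.
ENDPOINT = "http://data.linkededucation.org/request/pipeline/sparql"
query = datasets_for_category_query("http://dbpedia.org/resource/Category:Technology")
url = request_url(ENDPOINT, query)
```

Raising `min_score` narrows the result to datasets for which the category is more strongly represented.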
• Retrieve datasets describing schools:
select distinct ?endpoint ?cl where
{ ?ds void:sparqlEndpoint ?endpoint.
{ {?ds void:classPartition [ void:class ?cl ]}
UNION {?ds void:subset [ void:classPartition [ void:class ?cl ] ]} }
{ {?cl owl:equivalentClass aiiso:School}
UNION {?cl rdfs:subClassOf aiiso:School}
UNION {FILTER ( str(?cl) = str(aiiso:School) )} } }
Querying THE datasets
• Federated queries using mappings between aiiso:School and other “school” types
prefix void: <http://rdfs.org/ns/void#>
prefix aiiso: <http://purl.org/vocab/aiiso/schema#>
prefix owl: <http://www.w3.org/2002/07/owl#>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
select distinct ?endpoint ?school ?cl where { … as above ….
service silent ?endpoint { ?school a ?cl } }
type mappings!
topic profiles/scores!
query federation!
Outlook in a nutshell
 Merging the two VoID datasets
 Datasets and type mappings (LinkedUp Catalog)
 Category annotations (data.linkededucation.org)
 Extracting statistical observations (RDF Data Cube)
 Feeding data back into the DataHub
 Application to entire LOD cloud group on DataHub
 Consideration of additional profiling features
 Quality aspects
 Dataset and link dynamics
 Temporal and spatial coverage (=> http://www.duraark.eu)
fake example
LinkedUp Vidi Competition
Tools and demos that analyse or integrate open web data for educational purposes
• Wanted: applications and tools that address real educational needs
• Anyone can participate - researchers, students, developers, industry
• Challenging focused tracks with clear goals
• More data, more challenging, more support, more prizes
More info: http://linkedup-challenge.org/
Launch on 4 November 2013
Submission deadline is 14 February 2014
20,000 Euro prize money
Thank you!
Contact
 http://purl.org/dietze | @stefandietze
See also (data)
 http://datahub.io/group/linked-education
 http://data.linkededucation.org
 http://data.linkededucation.org/linkedup/catalog/
 http://lak.linkededucation.org
See also (general)
 http://linkedup-project.eu
 http://linkedup-challenge.org
 http://linkededucation.org
 http://linkeduniversities.org


KnowEscape workshop, OKCon 2013

  • 1. Motivation Data on the Web Some eyecatching opener illustrating growth and or diversity of web data Curation and profiling of Linked Data KnowEscape workshop, Open Knowledge Conference 2013 (OKCon2013) Stefan Dietze1, Besnik Fetahu1, Mathieu d’Aquin2 1 L3S Research Center (Germany); 2 The Open University (UK) http://linkedup-project.eu http://purl.org/dietze @stefandietze 19/09/2013 1Stefan Dietze
  • 2. 17/09/2013 2Stefan Dietze Success models: data & applications  LinkedUp Challenge to identify innovative tools & applications  Evaluation methods and approaches http://www.linkedup-challenge.org/ “LinkedUp” – Linking Web Data for Education L Data curation Technology transfer & community-building  Collecting & exposing open data of educational relevance => LinkedUp Data Catalog  Profiling and linking of Web Data for education => educational data graph  Disseminating knowledge & building communities (educators, computer scientists, data engineers)  Gathering stakeholder feedback: use cases, and requirements http://linkedup-challenge.org/#usecases http://data.linkededucation.org http://linkedup-project.eu/events European project aimed at advancing take-up of open data and related technologies http://linkedup-project.eu
  • 3. Problem: too many datasets, too few information Stefan Dietze 19/09/13 http://datahub.io/dataset/bbc 60.000.000 triples Using/exploiting Linked Data in Education ?  Lack of reliable dataset metadata about  Resource types  Topics & disciplines  Quality, currentness & availability  Provenance  Lack of links and cross-dataset references  Lack of scalable query methods  LOD: 300+ datasets, 32++ billion distinct RDF statements  DataHub: 6000+ open datasets
  • 4.  Goal: dataset metadata & search for data consumers  “LinkedUp/Linked Education cloud” as “expanded” subset of LOD cloud at The DataHub (http://datahub.io/groups/linked-education)  RDF (VoID) catalog of datasets = dataset of datasets (Linked Education Catalog): classification of datasets according to, eg, represented types, disciplines/topics, data quality, accessability  Links and coreferences => unified view on data => Linked Education Graph  Infrastructure, unified (SPARQL) endpoint & APIs for distributed/federated querying Data curation and dataset profiling LinkedUp approach Educational Datasets LinkedUp Catalog LinkedUp Links Automated processing to generate:  Descriptive VoID/RDF Dataset Catalog  Data links 19/09/2013 4Stefan Dietze
  • 5. Assessing the Educational Linked Data Landscape, D’Aquin, M., Adamou, A., Dietze, S., ACM Web Science 2013 (WebSci2013), Paris, France, May 2013. [WEBSCI‘13] 19/09/2013 5Stefan Dietze Linked Data „Observatory“ for linking and profiling Endpoint Retrieval & Graph Extraction Schema Extraction and Mapping Sample Graph Extraction (per dataset) NER & NED (per resource) Interlinking & Co- Resolution (cross-dataset) Category Mapping, Normalisation, Filtering Dataset Catalog/Index Links/ Cross-references rdfs:label:„…ECB….“ ? Dataset metadata (RDF/VoID):  Schema mappings (types, properties)  Entities & categories  Topic relevance scores  Availability, currentness data (tbc) dbpedia:Finance dbpedia:Sports dbpedia:England-Wales-Cricket-Board dbpedia:European_Central_Bank Combining a co-occurrence-based and a semantic measure for entity linking, B. P. Nunes, S. Dietze, M.A. Casanova, R. Kawase, B. Fetahu, and W. Nejdl. , ESWC 2013 - 10th Extended Semantic Web Conference, (May 2013). Generating structured Profiles of Linked Data Graphs, Fetahu, B; Adamou, A., Dietze, S., d’Aquin, M., Nunes, B.P., ISWC2013 – 12th International Semantic Web Conference; under review. [ESCW‘13] [ISWC‘13]
  • 6. Schema assessment and mapping Co-occurence of data types (in 146 datasets: 144 Vocabularies, 588 highly overlapping types, 719 Properties) Assessing the Educational Linked Data Landscape, D’Aquin, M., Adamou, A., Dietze, S., ACM Web Science 2013 (WebSci2013), Paris, France, May 2013. <po:Programme …> <po:title>Secret Universe – The Life of the Cell</po:title> … </po:Programme…> BBC Programme <sioc:Item …> <label>Viral diseases & bacteria</title> … </sioc:Item ….> SlideShare Set po:Programme sioc:Item ? http://datahub.io/group/linked-education 19/09/2013 6Stefan Dietze
  • 7. Schema assessment and mapping
      Co-occurrence of data types (in 146 datasets: 144 vocabularies, 588 highly overlapping types, 719 properties)
      Co-occurrence graph after mapping (201 frequent types mapped into 79 classes)
      Classes shown: bibo:Slideshow, bibo:Film, bibo:Document
      Example resources:
      - BBC Programme (po:Programme): <po:Programme …> <po:title>Secret Universe – The Life of the Cell</po:title> … </po:Programme…>
      - SlideShare Set (sioc:Item): <sioc:Item …> <title>Viral diseases & bacteria</title> … </sioc:Item ….>
      Assessing the Educational Linked Data Landscape, D’Aquin, M., Adamou, A., Dietze, S., ACM Web Science 2013 (WebSci2013), Paris, France, May 2013.
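The type co-occurrence counting behind such a graph can be sketched as follows. This is a minimal illustration, assuming that two types "co-occur" when the same dataset uses both; the dataset names and the `foaf:Person` type are hypothetical toy data, not figures from the slides.

```python
from collections import Counter
from itertools import combinations


def type_cooccurrence(dataset_types):
    """Count, for every unordered pair of types, how many datasets use both."""
    counts = Counter()
    for types in dataset_types.values():
        # sorted(set(...)) gives each pair one canonical order and ignores repeats
        for pair in combinations(sorted(set(types)), 2):
            counts[pair] += 1
    return counts


# Hypothetical toy input: dataset name -> types observed in that dataset
datasets = {
    "dataset_a": ["po:Programme", "foaf:Person"],
    "dataset_b": ["sioc:Item", "foaf:Person"],
    "dataset_c": ["po:Programme", "foaf:Person", "sioc:Item"],
}

cooc = type_cooccurrence(datasets)
for (left, right), n in sorted(cooc.items()):
    print(f"{left} + {right}: {n} dataset(s)")
```

Mapping frequent types onto shared classes before counting (as on the slide) would shrink the number of distinct pairs; that mapping step is not shown here.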
  • 8. LinkedUp Data Catalog in a nutshell
      http://datahub.io/group/linked-education
      http://data.linkededucation.org/linkedup/catalog/
      - VoID dataset catalog: browse, explore and query for datasets/types
      - Federated queries using type mappings
  • 9. Dataset topic profiling: data heterogeneity?
      Example resources:
      - Video: <yo:Video 8748720> <dc:title>Pluto & the Dwarf Planets</dc:title> … </yo:Video 8748720>
      - Slideset: <sioc:Item 2139393292> <title>Planetary motion & gravity</title> … </sioc:Item 2139393292>
      - Programme: <po:Programme519215> <po:Series>Wonders of the Solar System</po:Series> <po:Episode>Emp. of the Sun</po:Episode> <po:Actor>Brian Cox</po:Actor> </po:Programme519215 >
      Open questions: Topics/categories addressed? Relatedness of resources/entities (types, semantics)?
      Combining a co-occurrence-based and a semantic measure for entity linking, B. P. Nunes, S. Dietze, M.A. Casanova, R. Kawase, B. Fetahu, and W. Nejdl, ESWC 2013 - 10th Extended Semantic Web Conference (May 2013).
      Generating structured Profiles of Linked Data Graphs, Fetahu, B., Adamou, A., Dietze, S., d’Aquin, M., Nunes, B.P., ISWC2013 – 12th International Semantic Web Conference; under review.
  • 10. Data disambiguation, linking & profiling
      Example resources:
      - Video: <yo:Video 8748720> <dc:title>Pluto & the Dwarf Planets</dc:title> … </yo:Video 8748720>
      - Programme: <po:Programme519215> <po:Series>Wonders of the Solar System</po:Series> <po:Episode>Emp. of the Sun</po:Episode> <po:Actor>Brian Cox</po:Actor> </po:Programme519215 >
      Entities to disambiguate: Brian Cox? Sun? Pluto?
  • 11. Data disambiguation, linking & profiling
      Linked entities and categories: db:Pluto (Dwarf Planet), db:Sun, db:Astronomical Objects, db:Astronomy
      Example resources:
      - Video: <yo:Video 8748720> <dc:title>Pluto & the Dwarf Planets</dc:title> … </yo:Video 8748720>
      - Programme: <po:Programme519215> <po:Series>Wonders of the Solar System</po:Series> <po:Episode>Emp. of the Sun</po:Episode> <po:Actor>Brian Cox</po:Actor> </po:Programme519215 >
      - Slideset: <sioc:Item 2139393292> <title>Planetary motion & gravity</title> … </sioc:Item 2139393292>
  • 12. Data disambiguation, linking & profiling
      Linked entities and categories: db:Pluto (Dwarf Planet), db:Sun, db:Astronomical Objects, db:Astronomy
      Data linking:
      - Computation of connectivity scores between resources/entities
      - Method: combination of
          (i) a semantic (graph-based) connectivity score (SCS) with
          (ii) a Web co-occurrence-based measure (CBM) (similar to NGD)
      - For (i): adaptation of the Katz index from SNA for (linked) data graphs (considering path number and path lengths of transversal properties)
      - Example scores shown: SCS = 0.32, CBM = 0.24
      Dataset categorisation: computation of normalised (DBpedia) category relevance scores for datasets
      http://purl.org/vol/doc/
      http://purl.org/vol/ns/
      Example resources:
      - Online Lecture: <yov:Lecture8748720> <title>Pluto & the Dwarf Planets</title> … </yov:Lecture8748720>
      - Slideset: <sioc:Item 2139393292> <title>Planetary motion & gravity</title> … </sioc:Item 2139393292>
      - Programme: <po:Programme519215> <po:Series>Wonders of the Solar System</po:Series> <po:Episode>Emp. of the Sun</po:Episode> <po:Actor>Brian Cox</po:Actor> </po:Programme519215 >
      Combining a co-occurrence-based and a semantic measure for entity linking, B. P. Nunes, S. Dietze, M.A. Casanova, R. Kawase, B. Fetahu, and W. Nejdl, ESWC 2013 - 10th Extended Semantic Web Conference (May 2013).
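A Katz-style graph connectivity score of the kind mentioned for (i) can be sketched as below. This is not the formula from the cited paper: it assumes the plain Katz idea of summing a damping factor `beta` raised to the path length over all simple paths up to `max_len` edges. The graph edges, `beta` and `max_len` are hypothetical choices for illustration, and the Web co-occurrence measure (CBM) is left out because it needs external hit counts.

```python
def katz_connectivity(graph, source, target, max_len=3, beta=0.5):
    """Sum beta**length over all simple paths (<= max_len edges) from source to target."""
    score = 0.0
    stack = [(source, [source])]
    while stack:
        node, path = stack.pop()
        for nxt in graph.get(node, ()):
            if nxt in path:
                continue  # keep paths simple: never revisit a node
            length = len(path)  # number of edges once nxt is appended
            if nxt == target:
                score += beta ** length
            elif length < max_len:
                stack.append((nxt, path + [nxt]))
    return score


# Hypothetical undirected toy graph using entity names from the slide
graph = {
    "db:Pluto": ["db:Astronomical_Objects"],
    "db:Sun": ["db:Astronomical_Objects", "db:Astronomy"],
    "db:Astronomical_Objects": ["db:Pluto", "db:Sun", "db:Astronomy"],
    "db:Astronomy": ["db:Astronomical_Objects", "db:Sun"],
}

# Two paths: length 2 (0.5**2) and length 3 (0.5**3), so the sum is 0.375
score = katz_connectivity(graph, "db:Pluto", "db:Sun")
print(score)
```

More paths and shorter paths both raise the score, which matches the slide's remark that path number and path lengths are considered.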
  • 13. Dataset profiling
      - Goal: extracting representative metadata ("topic profile") for each dataset
      - Approach: computation of normalised (DBpedia) category relevance scores
      - Using representative sample resource sets per resource type & dataset
      [Figure: DBpedia category graph with db:Sun, db:Astronomical Objects, db:Astronomy]
      Example resource:
      - Programme: <po:Programme519215> <po:Series>Wonders of the Solar System</po:Series> <po:Episode>Emp. of the Sun</po:Episode> <po:Actor>Brian Cox</po:Actor> </po:Programme519215 >
      Generating structured Profiles of Linked Data Graphs, Fetahu, B., Adamou, A., Dietze, S., d’Aquin, M., Nunes, B.P., ISWC2013 – 12th International Semantic Web Conference; under review.
  • 14. Dataset profiling: topic extraction process (1/2)
      Pipeline stages: Endpoint Retrieval & Graph Extraction; Schema Extraction and Mapping; Sample Graph Extraction (per dataset/type); NER & NED (per resource); Interlinking & Co-Resolution (cross-dataset); Category Mapping, Normalisation, Filtering
      Outputs: Dataset Catalog/Index; Links/Cross-references
      Disambiguation example: rdfs:label "…ECB…"? Candidates: dbpedia:European_Central_Bank (dbpedia:Finance), dbpedia:England-Wales-Cricket-Board (dbpedia:Sports)
      Dataset metadata (RDF/VoID):
      - Schema mappings (types, properties)
      - Entities & categories
      - Topic relevance scores
      - Availability, currentness data (tbc)
      Step 1 – NER:
      - Online NER & NED vs. incremental similarity-based "NER":
          - Online NER: DBpedia Spotlight
          - Incremental & similarity-based NER: compare [via Jaccard Index] textual descriptions of already extracted entities with literal values of a resource instance (assumption: recurring entities likely within datasets)
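The incremental, similarity-based matching in Step 1 can be sketched as follows. This is a minimal illustration, assuming word-level token sets and a fixed threshold; the tokenisation, the `0.5` threshold and the short entity descriptions are hypothetical and not taken from the slides.

```python
import re


def tokens(text):
    """Lower-cased word tokens of a text, as a set."""
    return set(re.findall(r"\w+", text.lower()))


def jaccard(a, b):
    """Jaccard index |A & B| / |A | B| of the token sets of two texts."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def match_known_entities(literal, known_entities, threshold=0.5):
    """Return already extracted entities whose description resembles the literal."""
    return [
        uri
        for uri, description in known_entities.items()
        if jaccard(literal, description) >= threshold
    ]


# Hypothetical descriptions for two entities named on the slide
known = {
    "dbpedia:European_Central_Bank": "European Central Bank",
    "dbpedia:England-Wales-Cricket-Board": "England and Wales Cricket Board",
}

matches = match_known_entities("the European Central Bank", known)
print(matches)
```

A literal that matches no known entity would fall back to the online NER service; that call is not shown here.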
  • 15. Dataset profiling: topic extraction process (1/2)
      Pipeline stages: Endpoint Retrieval & Graph Extraction; Schema Extraction and Mapping; Sample Graph Extraction (per dataset/type); NER & NED (per resource); Interlinking & Co-Resolution (cross-dataset); Category Mapping, Normalisation, Filtering
      Outputs: Dataset Catalog/Index; Links/Cross-references
      Disambiguation example: rdfs:label "…ECB…"? Candidates: dbpedia:European_Central_Bank (dbpedia:Finance), dbpedia:England-Wales-Cricket-Board (dbpedia:Sports)
      Dataset metadata (RDF/VoID):
      - Schema mappings (types, properties)
      - Entities & categories
      - Topic relevance scores
      - Availability, currentness data (tbc)
      Step 1 – NER:
      - Online NER & NED vs. incremental similarity-based "NER":
          - Online NER: DBpedia Spotlight
          - Incremental & similarity-based NER: compare [via Jaccard Index] textual descriptions of already extracted entities with literal values of a resource instance (assumption: recurring entities likely within datasets)
      Step 2 – Computation of profile (ranked categories):
      - Entities => DBpedia categories = "Topics": extraction of topics from DBpedia entities via dcterms:subject
      - Expand the set of topics by leveraging hierarchical category organization (skos:broader)
      - Normalised topic score for a topic t (among all topics) and a dataset D (among all datasets), based on two ratios:
          (# of entities for t in dataset D) / (# of entities for t for all datasets)
          (# of entities for dataset D) / (# of entities for all datasets)
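Step 2 can be sketched as below. The slide names four entity counts that form two ratios (topic-in-dataset over topic-overall, and dataset over all datasets), but the transcript does not preserve how they are combined. The sum used here is therefore an assumption, as are the unbounded `skos:broader` expansion, the toy category hierarchy and the counts.

```python
def expand_topics(topics, broader):
    """Add every ancestor category reachable through a broader-category map."""
    expanded = set(topics)
    frontier = list(topics)
    while frontier:
        topic = frontier.pop()
        for parent in broader.get(topic, ()):
            if parent not in expanded:
                expanded.add(parent)
                frontier.append(parent)
    return expanded


def normalised_topic_score(n_t_d, n_t_all, n_d, n_all):
    """Combine the two ratios named on the slide.

    n_t_d:   # of entities for topic t in dataset D
    n_t_all: # of entities for topic t for all datasets
    n_d:     # of entities for dataset D
    n_all:   # of entities for all datasets
    """
    topic_ratio = n_t_d / n_t_all
    dataset_ratio = n_d / n_all
    # ASSUMPTION: the ratios are summed; the slide text does not confirm this.
    return topic_ratio + dataset_ratio


# Hypothetical stand-in for skos:broader links between DBpedia categories
broader = {
    "Category:Dwarf_planets": ["Category:Astronomical_objects"],
    "Category:Astronomical_objects": ["Category:Astronomy"],
}

topics = expand_topics({"Category:Dwarf_planets"}, broader)
score = normalised_topic_score(n_t_d=20, n_t_all=40, n_d=100, n_all=400)
print(sorted(topics), score)
```

Ranking the topics of a dataset by this score would then give the profile of "ranked categories" that the step describes.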
  • 17. LinkedUp Data Catalog – hands-on in a nutshell
      http://data.linkededucation.org
      http://data.linkededucation.org/linkedup/catalog/sparql
      http://data.linkededucation.org/request/pipeline/sparql
      Callouts: type mappings! topic profiles/scores! query federation!
      Querying FOR datasets
      • Retrieving datasets for categories:
          SELECT ?datasetname ?link ?score WHERE {
            ?linkset a void:Linkset.
            ?linkset vol:hasLink ?link.
            ?link vol:linksResource <http://dbpedia.org/resource/Category:Technology>.
            ?link vol:hasScore ?score.
            ?dataset a void:Dataset.
            ?linkset void:target ?dataset.
            ?dataset dcterms:title ?datasetname.
            FILTER (?score > 0.5)
          }
      • Retrieve datasets describing schools:
          select distinct ?endpoint ?cl where {
            ?ds void:sparqlEndpoint ?endpoint.
            {{?ds void:classPartition [ void:class ?cl]}
             UNION
             {?ds void:subset [ void:classPartition [ void:class ?cl] ]}}
            {{?cl owl:equivalentClass aiiso:School}
             UNION
             {?cl rdfs:subClassOf aiiso:School}
             UNION
             {FILTER ( str(?cl) = str(aiiso:School) ) }}
          }
      Querying THE datasets
      • Federated queries using mappings between aiiso:School and other "school" types:
          prefix void: <http://rdfs.org/ns/void#>
          prefix aiiso: <http://purl.org/vocab/aiiso/schema#>
          prefix owl: <http://www.w3.org/2002/07/owl#>
          prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
          select distinct ?endpoint ?school ?cl where {
            … as above …. }
            service silent ?endpoint { ?school a ?cl }
          }
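The category query above can be parameterised and turned into a request URL as sketched below. The slide's query omits prefix declarations, so the `void` and `dcterms` prefixes are the standard namespaces, while the `vol` namespace is assumed from the http://purl.org/vol/ns/ URL shown earlier in the deck. Which of the two listed endpoints serves this query is also an assumption, and no request is actually sent.

```python
from urllib.parse import urlencode

# Assumption: category annotations are served by this endpoint from the slide
ENDPOINT = "http://data.linkededucation.org/request/pipeline/sparql"

PREFIXES = """PREFIX void: <http://rdfs.org/ns/void#>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX vol: <http://purl.org/vol/ns/>
"""


def datasets_for_category(category_uri, min_score=0.5):
    """Build the 'datasets for a category' query for a category and score cut-off."""
    body = f"""SELECT ?datasetname ?link ?score WHERE {{
  ?linkset a void:Linkset .
  ?linkset vol:hasLink ?link .
  ?link vol:linksResource <{category_uri}> .
  ?link vol:hasScore ?score .
  ?dataset a void:Dataset .
  ?linkset void:target ?dataset .
  ?dataset dcterms:title ?datasetname .
  FILTER (?score > {min_score})
}}"""
    return PREFIXES + body


query = datasets_for_category("http://dbpedia.org/resource/Category:Technology")
# SPARQL protocol: the query travels URL-encoded in the 'query' parameter
request_url = ENDPOINT + "?" + urlencode({"query": query})
print(query)
```

Fetching `request_url` with any HTTP client (and an `Accept` header for the desired result format) would run the query, provided the endpoint is still available.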
  • 18. Outlook in a nutshell
      - Merging the two VoID datasets
          - Datasets and type mappings (LinkedUp Catalog)
          - Category annotations (data.linkededucation.org)
      - Extracting statistical observations (RDF Data Cube)
      - Feeding data back into the DataHub
      - Application to entire LOD cloud group on DataHub
      - Consideration of additional profiling features
          - Quality aspects
          - Dataset and link dynamics
          - Temporal and spatial coverage (=> http://www.duraark.eu)
      [Figure labelled "fake example"]
  • 19. LinkedUp Vidi Competition
      Tools and demos that analyse or integrate open web data for educational purposes
      • Wanted: applications and tools that address real educational needs
      • Anyone can participate - researchers, students, developers, industry
      • Challenging focused tracks with clear goals
      • More data, more challenging, more support, more prizes
      More info: http://linkedup-challenge.org/
      Launch on 4 November 2013
      Submission deadline is 14 February 2014
      20,000 Euro prize money
  • 20. Thank you!
      Contact
      - http://purl.org/dietze | @stefandietze
      See also (data)
      - http://datahub.io/group/linked-education
      - http://data.linkededucation.org
      - http://data.linkededucation.org/linkedup/catalog/
      - http://lak.linkededucation.org
      See also (general)
      - http://linkedup-project.eu
      - http://linkedup-challenge.org
      - http://linkededucation.org
      - http://linkeduniversities.org