BUILDING BETTER
KNOWLEDGE
GRAPHS
THROUGH SOCIAL
COMPUTING
Elena Simperl
University of Southampton, UK
@esimperl
Erasmus University Rotterdam
November 16th 2018
OVERVIEW
Knowledge graphs have
become a critical AI resource
We study them as socio-
technical constructs
Our research
 Explores the links between social and
technical qualities of knowledge graphs
 Proposes methods and tools to make
knowledge graphs better
Picture from https://medium.com/@sderymail/challenges-of-knowledge-graph-part-1-d9ffe9e35214
IN THIS TALK
Effects of editing behaviour and
community make-up on the quality
of knowledge graphs
Crowdsourcing methods to enhance
knowledge graphs
EXAMPLE: DBPEDIA
Community project, extracts structured data from Wikipedia
Consistent, centrally defined ontology; support for 125
languages; represents 4.5M items
Open licence
RDF exports, connected to Linked Open Data Cloud
EXAMPLE: WIKIDATA
Wikipedia project creating a knowledge graph
collaboratively
20k active users
52M items, no ‘explicit’ ontology
Open licence
RDF exports, connected to Linked Open Data Cloud
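Both graphs expose standard RDF tooling. As a quick illustration (my sketch, not from the talk), the snippet below pulls a few 'instance of' (P31) statements from the public Wikidata SPARQL endpoint; the User-Agent string is a placeholder.

```python
# Minimal sketch: fetch a few P31 ('instance of') statements from the
# public Wikidata SPARQL endpoint. The endpoint URL and JSON format
# parameter are standard; the User-Agent value is a placeholder.
import requests

QUERY = "SELECT ?item ?class WHERE { ?item wdt:P31 ?class . } LIMIT 5"

resp = requests.get(
    "https://query.wikidata.org/sparql",
    params={"query": QUERY, "format": "json"},
    headers={"User-Agent": "kg-demo/0.1 (example)"},
)
resp.raise_for_status()
for b in resp.json()["results"]["bindings"]:
    print(b["item"]["value"], "->", b["class"]["value"])
```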
‘ONTOLOGIES ARE US’
Piscopo, A., Phethean, C., & Simperl, E. (2017). What Makes a
Good Collaborative Knowledge Graph: Group Composition and
Quality in Wikidata. International Conference on Social
Informatics, 305-322, Springer.
Piscopo, A., & Simperl, E. (2018). Who Models the World?:
Collaborative Ontology Creation and User Roles in
Wikidata. Proceedings of the ACM on Human-Computer
Interaction, 2(CSCW), 141.
BACKGROUND
Wikidata editors have varied tenure and
interests
Editors and editing behaviour impact
outcomes
 Group composition can have multiple effects
 Tenure and interest diversity can increase outcome
quality and group productivity
 Different editor groups focus on different types of
activities
Chen, J., Ren, Y., Riedl, J.: The effects of diversity on group productivity and member withdrawal in online volunteer groups. In: Proceedings of the 28th International Conference on Human Factors in Computing Systems - CHI ’10, p. 821. ACM Press, New York, USA (2010)
FIRST STUDY: ITEM QUALITY
Analysed the edit history of items
Corpus of 5k items, whose quality has been
manually assessed (5 levels)*
Edit history focused on community make-up
Community is defined as set of editors of item
Considered features from group diversity
literature and Wikidata-specific aspects
*https://www.wikidata.org/wiki/Wikidata:Item_quality
RESEARCH HYPOTHESES
Activity → Outcome
H1 Bot edits → Item quality
H2 Bot-human interaction → Item quality
H3 Anonymous edits → Item quality
H4 Tenure diversity → Item quality
H5 Interest diversity → Item quality
DATA AND METHODS
Ordinal regression analysis, trained four models
Dependent variable: quality label (5 levels) of the 5k manually assessed Wikidata items
Independent variables
 Proportion of bot edits
 Bot-human edit proportion
 Proportion of anonymous edits
 Tenure diversity: Coefficient of variation
 Interest diversity: User editing matrix
Control variables: group size, item age
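To make the setup concrete, here is a minimal sketch of an ordinal regression over per-item features of this kind, using statsmodels on synthetic data; the feature names are illustrative, not the study's actual variables.

```python
# Sketch of the study design: ordinal regression of item quality
# (5 ordered levels) on group-composition features. Data is synthetic.
import numpy as np
import pandas as pd
from statsmodels.miscmodels.ordinal_model import OrderedModel

rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame({
    "prop_bot_edits": rng.uniform(0, 1, n),      # H1
    "prop_anon_edits": rng.uniform(0, 0.3, n),   # H3
    "tenure_cv": rng.uniform(0, 2, n),           # H4: coefficient of variation
                                                 # (std/mean) of editors' tenure
    "group_size": rng.integers(1, 50, n),        # control
    "item_age_days": rng.integers(30, 2000, n),  # control
})
# Quality label: ordered categorical with 5 levels, as in the corpus.
df["quality"] = pd.Categorical(rng.integers(1, 6, n), ordered=True)

model = OrderedModel(
    df["quality"],
    df[["prop_bot_edits", "prop_anon_edits", "tenure_cv",
        "group_size", "item_age_days"]],
    distr="logit",
)
res = model.fit(method="bfgs", disp=False)
print(res.summary())
```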
RESULTS
ALL HYPOTHESES SUPPORTED (H1–H5)
SUMMARY AND IMPLICATIONS
Findings
01 The more is not always the merrier
02 Bot edits are key for quality, but bots and humans together are better
03 Registered editors have a positive impact
04 Diversity matters
Implications
01 Encourage registration
02 Identify further areas for bot editing
03 Design effective human-bot workflows
04 Suggest items to edit based on tenure and interests
SECOND STUDY: ONTOLOGY QUALITY
Analysed the Wikidata ontology and its
edit context
Defined as the graph of all items linked through
P31 (instance of) & P279 (subclass of)
Calculated evolution of quality metrics and
editing activity over time and the links between
them
Based on features from literature on ontology
evaluation and community-driven ontology
engineering
DATA AND METHODS
Wikidata dumps from March 2013 (creation of P279)
to September 2017
 Analysed data in 55 monthly time frames
Literature survey to define a Wikidata ontology
quality framework
Clustering to identify ontology editor roles
Lagged multiple regression to link roles and ontology
features
 Dependent variable: Changes in ontology quality across time
 Independent variables: number of edits by different roles
 Control variables: Bot and anonymous edits
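As an illustration of this last step, a sketch of a one-month-lagged regression under assumed variable names (synthetic data, not the study's code): monthly changes in a quality metric are regressed on the previous month's edit counts per role.

```python
# Sketch: regress monthly change in inheritance richness (ir) on the
# previous month's edits by role, with bot edits as a control.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(1)
months = pd.period_range("2013-03", periods=55, freq="M")  # 55 time frames
df = pd.DataFrame({
    "leader_edits": rng.poisson(200, 55),
    "contributor_edits": rng.poisson(2000, 55),
    "bot_edits": rng.poisson(5000, 55),  # control
    "ir": 2.0 + np.cumsum(rng.normal(0.01, 0.05, 55)),
}, index=months)

df["d_ir"] = df["ir"].diff()                       # dependent: change in ir
lagged = df[["leader_edits", "contributor_edits", "bot_edits"]].shift(1)
data = pd.concat([df["d_ir"], lagged], axis=1).dropna()

res = sm.OLS(data["d_ir"], sm.add_constant(data.drop(columns="d_ir"))).fit()
print(res.params)
```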
ONTOLOGY QUALITY: METRICS
Based on 7 ontology evaluation frameworks
Compiled structural metrics that can be determined from the dumps
noi: number of instances
noc: number of classes
norc: number of root classes
nolc: number of leaf classes
nop: number of properties
ap, mp: average and median population
rr: relationship richness
ir, mr: inheritance and median richness
cr: class richness
ad, md, maxd: average, median, and max explicit depth
Sicilia, M. A., Rodríguez, D., García-Barriocanal, E., & Sánchez-Alonso, S. (2012). Empirical findings on ontology metrics. Expert Systems with Applications, 39(8), 6706-6711.
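Most of these indicators are simple structural counts over the P31/P279 graph. A toy sketch with made-up data (metric definitions follow common OntoQA-style usage, not necessarily the study's exact implementation):

```python
# Toy structural metrics over P31/P279 edges. Data is illustrative.
subclass_edges = [("dog", "mammal"), ("cat", "mammal"), ("mammal", "animal")]  # P279
instance_edges = [("rex", "dog"), ("felix", "cat")]                            # P31

classes = {c for edge in subclass_edges for c in edge}
children = {child for child, _ in subclass_edges}
parents = {parent for _, parent in subclass_edges}
populated = {cls for _, cls in instance_edges}

noc = len(classes)                               # number of classes
norc = len(classes - children)                   # root classes: never a subclass
nolc = len(classes - parents)                    # leaf classes: never a superclass
noi = len({inst for inst, _ in instance_edges})  # number of instances
ap = noi / noc                                   # average population
cr = len(populated) / noc                        # class richness: share of classes with instances
ir = len(subclass_edges) / noc                   # inheritance richness: subclasses per class

print(noc, norc, nolc, round(ap, 2), round(cr, 2), round(ir, 2))
```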
ONTOLOGY QUALITY: RESULTS
LARGE ONTOLOGY, UNEVEN QUALITY
>1.5M classes, ~4000 properties
Number of classes increases at the same rate as the overall number of items, likely due to users incorrectly using P31 & P279
ap and cr decrease over time (several classes are without instances, without subclasses, or both)
ir & maxd increase over time (part of the Wikidata ontology is distributed vertically)
EDITOR ROLES: METHODS
K-means, features based on previous studies
Analysis by yearly cohort
# edits: total number of edits per month
# ontology edits: number of edits on classes
# discussion edits: number of edits on talk pages
# modifying edits: number of revisions of previously existing statements
# property edits: total number of edits on properties in a month
# taxonomy edits: number of edits on P31 and P279 statements
p batch edits: number of edits done through automated tools
item diversity: proportion between number of edits and number of items edited
admin: true if the user is in an admin user group, false otherwise
lower admin: true if the user is in a user group with enhanced user rights, false otherwise
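A minimal sketch of the clustering step (synthetic editor features named after the table above; scikit-learn has no built-in gap statistic, so k is fixed here at the study's chosen value):

```python
# Sketch: k-means over standardised editor activity features.
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

rng = np.random.default_rng(2)
# Rows: editors. Columns: # edits, # ontology edits, # discussion edits,
# # modifying edits, # property edits, # taxonomy edits (synthetic counts).
X = rng.poisson(lam=[20, 5, 1, 3, 2, 4], size=(1000, 6)).astype(float)

X_std = StandardScaler().fit_transform(X)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X_std)
print(np.bincount(labels))  # cluster sizes (on real data, a small 'leaders' cluster emerges)
```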
EDITOR ROLES: RESULTS
190,765 unique editors over 55 months (783k
total)
18k editors active for 10+ months
2 clusters, obtained using the gap statistic (tested 2 ≤ k ≤ 8)
Leaders: more active minority (~1%), higher number of contributions to the ontology, engaged within the community
Contributors: less active, lower number of contributions to the ontology and lower proportion of batch edits
EDITOR ROLES: RESULTS
People who joined the project early tend to be
more active & are more likely to become leaders
Levels of activity of leaders decrease over time
(alternatively, people move on to different tasks)
RESEARCH HYPOTHESES
H1 Higher levels of leader activity are negatively correlated to
number of classes (noc), number of root classes (norc), and
number of leaf classes (nolc)
H2 Higher levels of leader activity are positively correlated to
inheritance richness (ir), average population (ap), and average
depth (ad)
ROLES & ONTOLOGY: RESULTS
H1 not supported
H2 partially supported
Only inheritance richness (ir) and average depth (ad)
related significantly with leader edits (p<0.01)
Bot edits significantly and positively affect the number of
subclasses and instances per class (ir & ap) (p<0.05)
SUMMARY AND IMPLICATIONS
Creating ontologies still a challenging task
Size of the ontology renders existing automatic quality
assessment methods unfeasible
Broader curation efforts are needed: large number of
empty classes
Editor roles less well articulated than in other ontology
engineering projects
Possible decline in motivation after several months
NOBODY KNOWS
EVERYTHING, BUT
EVERYBODY KNOWS
SOMETHING
Acosta, M., Zaveri, A., Simperl, E., Kontokostas, D.,
Flöck, F., & Lehmann, J. (2016). Detecting Linked Data
quality issues via crowdsourcing: A DBpedia
study. Semantic Web Journal, 1-34.
BACKGROUND
Varying quality of Linked Data sources
Detecting and correcting errors may require manual
inspection
Different crowds are more or less motivated (or
skilled) to undertake specific aspects of this work
We propose a scalable way to carry out this work
Example of an erroneous triple:
dbpedia:Dave_Dobbyn dbprop:dateOfBirth “3” .
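The triple above is the motivating case: “3” is not a plausible date of birth. A toy heuristic (hypothetical helper, not the paper's pipeline) shows how such candidates could be flagged for crowd verification:

```python
# Toy check: flag date-of-birth literals that do not parse as a date
# in a plausible range. Purely illustrative, not the paper's method.
from datetime import datetime

def plausible_birth_date(literal: str) -> bool:
    for fmt in ("%Y-%m-%d", "%Y"):
        try:
            year = datetime.strptime(literal, fmt).year
            return 1000 <= year <= 2018
        except ValueError:
            continue
    return False

print(plausible_birth_date("3"))           # False -> send to the crowd
print(plausible_birth_date("1957-01-03"))  # True
```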
Approach: a two-stage Find-Verify workflow
Find: contest among Linked Data experts (difficult task, final prize, TripleCheckMate tool)
Verify: microtasks for crowd workers (easy task, micropayments, MTurk)
Error types: incorrect object values, incorrect data types, incorrect outlinks (interlinks)

Results: Precision
Approach | Object values | Data types | Interlinks
Linked Data experts | 0.7151 | 0.8270 | 0.1525
MTurk (majority voting) | 0.8977 | 0.4752 | 0.9412

Findings
Use the right crowd for the right task
Experts detect a range of issues, but will not invest additional effort
Turkers can carry out the three tasks and are exceptionally good at data comparisons
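The MTurk row above aggregates redundant worker answers by majority vote; a minimal sketch of that aggregation step (synthetic votes):

```python
# Sketch: majority voting over per-triple worker judgements.
from collections import Counter

votes = {
    "triple-42": ["incorrect", "incorrect", "correct"],
    "triple-43": ["correct", "correct", "correct"],
}
for triple, answers in votes.items():
    label, count = Counter(answers).most_common(1)[0]  # ties break by first seen
    print(triple, "->", label, f"({count}/{len(answers)})")
```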
ALL ROADS LEAD TO
ROME
Bu, Q., Simperl, E., Zerr, S., & Li, Y. (2016). Using
microtasks to crowdsource DBpedia entity classification:
A study in workflow design. Semantic Web Journal, 1-18.
THREE WORKFLOWS
TO ADD MISSING
ITEM TYPES
Free associations
Validating the machine
Exploring the DBpedia ontology
Findings
 Shortlists are easy & fast
 Popular classes are not enough
 Alternative ways to explore the taxonomy
 Freedom comes with a price
 Unclassified entities might be unclassifiable
 Different human data interfaces
 Working at the basic level of abstraction achieves
greatest precision
 But when given the freedom to choose, users suggest more specific
classes
4.58M
things
SUMMARY OF FINDINGS
Social computing offers a useful lens through which to study knowledge graphs
The social fabric of a graph affects its quality
Crowdsourcing methods can be used to curate and enhance knowledge graphs
BUILDING
BETTER
KNOWLEDGE
GRAPHS
THROUGH
SOCIAL
COMPUTING
• Bu, Q., Simperl, E., Zerr, S., & Li, Y. (2016). Using
microtasks to crowdsource DBpedia entity classification: A
study in workflow design. Semantic Web Journal, 1-18
• Acosta, M., Zaveri, A., Simperl, E., Kontokostas, D., Flöck,
F., & Lehmann, J. (2016). Detecting Linked Data quality
issues via crowdsourcing: A DBpedia study. Semantic Web
Journal, 1-34.
• Piscopo, A., Phethean, C., & Simperl, E. (2017). What
Makes a Good Collaborative Knowledge Graph: Group
Composition and Quality in Wikidata. International
Conference on Social Informatics, 305-322, Springer.
• Piscopo, A., & Simperl, E. (2018). Who Models the
World?: Collaborative Ontology Creation and User Roles
in Wikidata. Proceedings of the ACM on Human-Computer
Interaction, 2(CSCW), 141.