Mechanisms for Data Quality and Validation in Citizen Science
A. Wiggins, G. Newman, R. Stevenson & K. Crowston
Presented by Nathan Prestopnik
Motivation

• Data quality and validation are a primary concern for most citizen science projects
  - More contributors = more opportunities for error

• There has been no review of appropriate data quality and validation mechanisms
  - Diverse projects face similar challenges

• Contributors’ skills and scale of participation are important considerations in ensuring quality
Methods

• Survey
  - Questionnaire with 70 items, all optional
  - 63 completed questionnaires representing 62 projects
  - Mostly small-to-medium sized projects in the US, Canada, and UK; most focus on monitoring and observation

• Inductive development of framework
  - Based on survey results and authors’ direct experience with citizen science projects
Survey: Resources

• FTEs: 0 – 50+
  - Average: 2.4; Median: 1
  - Often small fractions of several individuals’ time

• Annual budgets: $125 – $1,000,000
  - Average: $105,000; Median: $35,000; Mode: $20,000
  - Up to 5 different funding sources, usually grants, in-kind contributions (staff time), & private donations

• Age/duration: -1 to 100 years
  - Average age: 13 years; Median: 9 years; Mode: 2 years
Survey: Methods Used
Method                                                n    Percentage
Expert review                                         46      77%
Photo submissions                                     24      40%
Paper data sheets submitted along with online entry   20      33%
Replication/rating by multiple participants           14      23%
QA/QC training program                                13      22%
Automatic filtering of unusual reports                11      18%
Uniform equipment                                     9       15%
Validation planned but not yet implemented            5       8%
Replication/rating, by the same participant           2       3%
Rating of established control items                   2       3%
None                                                  2       3%
Not sure/don’t know                                   2       3%
Survey: Combining Methods
Methods                                      n    Percentage
Single method                                10      17%
Multiple methods, up to 5 (average 2.5)      45      75%
Expert review + Automatic filtering          11      18%
Expert review + Paper data sheets            10      17%
Expert review + Photos                       14      23%
Expert review + Photos + Paper data sheets   6       10%
Expert review + Replication, multiple        10      17%
Survey: Resources & Methods

• Number of validation methods and staff are positively correlated (r² = 0.11); an illustrative correlation sketch follows this slide
  - More staffing = more supervisory capacity

• Number of validation methods and budget are negatively correlated (r² = -0.15)
  - If larger budgets mean more contributors, this constrains the scalability of multiple methods
  - Larger projects may use fewer but more sophisticated mechanisms
  - Suggests that human-supervised methods don’t scale
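A minimal sketch of how such correlations can be computed, assuming the underlying statistic is a Pearson correlation between each project's count of validation methods and its staffing (or budget). The function, the tiny dataset, and the variable names are illustrative only and do not reproduce the survey data:

    from math import sqrt

    def pearson_r(xs, ys):
        """Pearson correlation coefficient between two equal-length sequences."""
        n = len(xs)
        mx, my = sum(xs) / n, sum(ys) / n
        cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        sx = sqrt(sum((x - mx) ** 2 for x in xs))
        sy = sqrt(sum((y - my) ** 2 for y in ys))
        return cov / (sx * sy)

    # Illustrative per-project counts (not the survey's data):
    n_methods = [1, 2, 3, 2, 5, 1]       # validation methods used by each project
    staff_fte = [0.5, 1, 2, 1, 4, 0.2]   # full-time equivalents for the same projects

    r = pearson_r(n_methods, staff_fte)
    print(f"r = {r:.2f}, r^2 = {r * r:.2f}")  # the sign lives in r; r squared itself is never negative

Running the same calculation against budget figures would give the second relationship reported above.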
Survey: Other Validation Options

• “Please describe any additional validation methods used in your project”
  - Several projects rely on personal knowledge of contributing individuals for data quality
    - Not scientifically robust, but understandably relevant
  - Most comments referred to details of expert review
    - Reinforces the perceived value of expertise
  - The reporting interface and its associated error-checking are often overlooked, but provide important initial data verification
Choosing Mechanisms

• Data characteristics to consider when choosing mechanisms to ensure quality
  - Accuracy and precision: taxonomic, spatial, temporal, etc.
  - Error prevention: malfeasance (gaming the system), inexperience, data entry errors, etc. (see the entry-check sketch below)

• Evaluate assumptions about error and accuracy
  - Where does error originate? How do mechanisms address this? At what step in the research process? How transparent are data review and its outcomes? How much data will be reviewed? In how much detail?
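As a concrete illustration of entry-level error prevention, here is a minimal sketch of a plausibility check applied to a single contributed record. The field names, species list, and bounding box are hypothetical placeholders, not part of the framework:

    from datetime import date

    # Hypothetical reference data a project might maintain (illustrative values only).
    KNOWN_SPECIES = {"Danaus plexippus", "Cardinalis cardinalis"}
    STUDY_AREA = {"lat": (24.0, 50.0), "lon": (-125.0, -66.0)}  # rough bounding box

    def check_observation(record):
        """Return a list of problems found in one record; an empty list means it passed.

        Assumes the record has species, lat, lon, observed_on, and count fields.
        """
        problems = []
        if record["species"] not in KNOWN_SPECIES:
            problems.append("unrecognized species name (taxonomic accuracy)")
        if not (STUDY_AREA["lat"][0] <= record["lat"] <= STUDY_AREA["lat"][1]
                and STUDY_AREA["lon"][0] <= record["lon"] <= STUDY_AREA["lon"][1]):
            problems.append("coordinates outside the study area (spatial accuracy)")
        if record["observed_on"] > date.today():
            problems.append("observation date is in the future (temporal accuracy)")
        if record["count"] <= 0:
            problems.append("count must be a positive number (likely data entry error)")
        return problems

    obs = {"species": "Danaus plexippus", "lat": 43.0, "lon": -76.1,
           "observed_on": date(2011, 6, 15), "count": 3}
    print(check_observation(obs))  # -> []

A reporting interface could run checks like these at submission time and prompt the contributor to confirm or correct a record before it is accepted, which ties back to the often-overlooked interface-level verification noted in the survey responses.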
Mechanisms: Protocols

Mechanism                                    Process    Type/Detail
QA project plans                             Before     SOP in some areas
Repeated samples/tasks                       During     By multiple participants, single participant, or experts (calibration)
Tasks involving control items                During     Contributions compared to known states
Uniform/calibrated equipment                 During     Used for measurements; cost/scale tradeoff; who pays?
Paper data sheets + online entry*            During     Extended details, verifying data entry accuracy
Digital vouchers*                            During     Photos, audio, specimens/archives
Data triangulation, normalization, mining*   After      Corroboration from other data sources; statistical & computer science methods
Data documentation*                          After      Provide metadata about processes

* Starred mechanisms address errors arising from both protocols and participants.
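As one hedged reading of the "Data triangulation, normalization, mining" row above, the sketch below corroborates contributed records against an independent reference dataset. The species-plus-rounded-coordinates matching rule is an illustrative assumption, not a method prescribed by the framework:

    def triangulate(contributed, reference, precision=1):
        """Split contributed records into corroborated vs. needs-review sets.

        A record is 'corroborated' when an independent reference record exists
        for the same species at roughly the same location (coordinates rounded
        to `precision` decimal places); everything else is queued for review.
        """
        ref_keys = {(r["species"], round(r["lat"], precision), round(r["lon"], precision))
                    for r in reference}
        corroborated, needs_review = [], []
        for rec in contributed:
            key = (rec["species"], round(rec["lat"], precision), round(rec["lon"], precision))
            (corroborated if key in ref_keys else needs_review).append(rec)
        return corroborated, needs_review

Records that fail corroboration are not necessarily wrong; in practice they would feed into expert review rather than being discarded.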
Mechanisms: Participants

Mechanism                                       Process          Types/Details
Participant training                            Before, During   Initial; Ongoing; Formal QA/QC
Participant testing                             Before, During   Following training; Pre/test-retest
Rating participant performance                  During, After    Unknown to participant; Known to participant
Filtering of unusual reports                    During, After    Automatically; Manually
Contacting participants about unusual reports   After            May alienate/educate contributors
Automatic recognition                           After            Techniques for image/text processing
Expert review                                   After            By professionals, experienced contributors, or multiple parties
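For the "Filtering of unusual reports" row above, one simple automatic filter flags counts that sit far from what other participants report for the same species and routes them to manual or expert review. A minimal sketch, with an arbitrary illustrative threshold:

    from statistics import mean, stdev

    def flag_unusual_counts(reports, threshold=3.0):
        """Flag reports whose count is more than `threshold` standard deviations
        away from the mean count reported for that species."""
        by_species = {}
        for r in reports:
            by_species.setdefault(r["species"], []).append(r["count"])
        flagged = []
        for r in reports:
            counts = by_species[r["species"]]
            if len(counts) < 3:
                continue  # too few reports to say what counts as "unusual"
            mu, sigma = mean(counts), stdev(counts)
            if sigma > 0 and abs(r["count"] - mu) / sigma > threshold:
                flagged.append(r)
        return flagged

As the table notes, contacting participants about flagged reports can either educate or alienate them, so the filter's output is better treated as a review queue than as an automatic rejection.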
Discussion

• Need to pay more attention to the way that data are created: not just protocols, but also qualities of the data such as accuracy and precision

• Clear need for quality/validation mechanisms for analysis, not only for data collection/processing
  - Data mining techniques
  - Spatio-temporal modeling

• Scalability of validation may be limited
  - May need to plan different quality management techniques based on expected/actual project growth
Future Work

• Most projects worry more about contributor expertise than appropriate analysis methods
  - Resources are needed to support suitable analysis approaches and tools

• Comparative evaluation of the efficacy of the data quality and validation mechanisms identified
  - Develop a QA/QC planning and evaluation tool

• Develop examples of appropriate data documentation for citizen science projects
  - Necessary for peer review and data re-use (a rough illustrative sketch follows)
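To give a rough sense of what such data documentation might contain, here is a minimal, hypothetical machine-readable metadata record; the field names and values are illustrative, not a proposed standard:

    import json

    # Illustrative metadata describing how a dataset was produced and validated.
    dataset_metadata = {
        "project": "Example Monitoring Project",        # placeholder name
        "protocol": "point counts, 10-minute duration",
        "participant_training": "initial online module plus annual refresher",
        "validation_methods": ["expert review", "photo vouchers",
                               "automatic filtering of unusual reports"],
        "review_coverage": "all flagged records; 10% random sample of the rest",
        "known_limitations": "uneven spatial coverage; volunteer-reported effort",
    }
    print(json.dumps(dataset_metadata, indent=2))

Documentation along these lines is part of what makes contributed datasets usable for peer review and re-use.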
Thanks!

• Nate Prestopnik
• DataONE working group on Public Participation in Scientific Research
• US NSF grants 09-43049 & 11-11107

Editor's Notes

  • #6: Rating = classification or judgment tasks; admittedly not the clearest wording, but no one corrected this in text responses. Percentage = percentage of responding projects that use each method.
  • #7: Percentage = percentage of responding projects that use this combination of methods. There were a few other combinations that a handful of projects used; these were the dominant ones. Surprised to see so many projects using photos, as they are hard to use and store, and surprised by the frequency of using paper data sheets.
  • #8: Note that we did ask about numbers of contributions, but the units of contribution for each project (and even the way they count volunteers) were so different that they couldn’t be used for analysis.
  • #11: Split the framework of mechanisms in two for ease of viewing; these are methods that address the protocol as the presumed source of error. Starred items address errors arising from both protocols and participants.
  • #12: These methods all address expected errors from participants, focusing primarily on skill evaluation and filtering or review for unusual reports.