Whythawk
Data Curation
Data probity in a time of COVID
SIKM, June 2021
www.vperemen.com, CC BY-SA 4.0, via Wikimedia Commons
The screaming need for data
Who is affected?
How are they affected?
What can we do about it?
What might happen in response?
How do we recover afterwards?
Will things ever be the same?
Badics, CC BY-SA 3.0, via Wikimedia Commons
The intersection of Policy & Politics
Data, analysis & the evidence illusion
Post-hoc support & plausible deniability
Competing self-interest
Changing circumstance, changing evidence
Harvesting longitudinal data is not joyful
Instant answers don’t happen instantly
Longitudinal source data are incoherent
Data probity takes method, practice & time
Esayas Ayele, CC BY-SA 4.0, via Wikimedia Commons
CDC Global, CC BY-SA 2.0, via Flickr
What we talk about when we talk about probity
Identifiable source
Transparent methods
Publication before analysis
Point data before aggregation
Repeatable, auditable trail
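An identifiable source and a repeatable, auditable trail can begin with a fingerprint of each raw download. The minimal Python sketch below is one way to record that fingerprint. It assumes a SHA-256 checksum of the untouched bytes is an acceptable identifier. The `SourceRecord` schema and the `register_source` and `verify_source` helpers are hypothetical, and neither is presented as Whythawk's or Sqwyre's actual method. The sketch records where a file came from and when it was retrieved, so that an auditor can confirm they hold exactly the same bytes:

```python
import hashlib
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class SourceRecord:
    """Provenance for one raw source file (hypothetical schema)."""
    publisher: str
    url: str
    retrieved: str  # ISO 8601 date the file was downloaded
    sha256: str     # checksum of the raw bytes exactly as received


def register_source(publisher: str, url: str, raw: bytes, retrieved: date) -> SourceRecord:
    # Hash the bytes before any parsing or cleaning, so the record identifies the untouched source.
    return SourceRecord(publisher, url, retrieved.isoformat(), hashlib.sha256(raw).hexdigest())


def verify_source(record: SourceRecord, raw: bytes) -> bool:
    # An auditor re-hashes their own copy of the file and compares it with the published checksum.
    return hashlib.sha256(raw).hexdigest() == record.sha256


# Usage with a made-up two-line CSV and a placeholder URL.
raw = b"unit_id,rateable_value\nA1,12500\n"
record = register_source("Example Council", "https://example.org/rates.csv", raw, date(2021, 6, 1))
print(record.retrieved)                     # 2021-06-01
print(verify_source(record, raw))           # True
print(verify_source(record, raw + b"\n"))   # False
```

Publishing the checksum alongside the analysis means anyone who repeats the work can show that they started from the same source.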
Transparency in practice
Pre-publication of research protocol, methods & data
Systematic review
Open licences
No trust without support for peer review & validation
Yakuzakorat, CC BY 4.0, via Wikimedia Commons
Photo by Clay Banks on Unsplash
Protocols & ambiguity
Maintain your source
Pick sensible defaults
Make no destructive changes
Document every action
Expect to be audited
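"Maintain your source", "Make no destructive changes" and "Document every action" can be combined in a single pattern. The source rows are read-only. Every cleaning step adds a new derived column rather than overwriting a field, and each step is appended to a plain-language log. The Python sketch below shows this. `CuratedTable`, its methods and the column names are hypothetical illustrations, not a published library or the author's own code:

```python
from dataclasses import dataclass, field
from types import MappingProxyType
from typing import Any, Callable, Dict, List, Mapping


@dataclass
class CuratedTable:
    """Read-only source rows, derived columns kept separately, and an append-only action log."""
    source: tuple
    derived: Dict[str, List[Any]] = field(default_factory=dict)
    log: List[Dict[str, Any]] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Wrap every source row in a read-only view: the source can be read but never edited.
        self.source = tuple(MappingProxyType(dict(row)) for row in self.source)

    def view(self) -> List[Dict[str, Any]]:
        # Rebuild the working table on demand from the source plus every derived column.
        return [
            {**row, **{col: values[i] for col, values in self.derived.items()}}
            for i, row in enumerate(self.source)
        ]

    def derive(self, column: str, description: str, fn: Callable[[Mapping[str, Any]], Any]) -> None:
        # Refuse to overwrite anything: a new result always gets a new column name.
        if column in self.derived or any(column in row for row in self.source):
            raise ValueError(f"column {column!r} already exists")
        self.derived[column] = [fn(row) for row in self.view()]
        # Record what was done, in plain language, so the step can be explained and repeated.
        self.log.append({"step": len(self.log) + 1, "column": column, "description": description})


# Usage with a made-up row.
table = CuratedTable([{"unit_id": "A1", "address": " 1 high st "}])
table.derive("address_clean", "Trim whitespace and upper-case the address",
             lambda row: row["address"].strip().upper())
print(table.view()[0]["address_clean"])   # 1 HIGH ST
print(repr(table.source[0]["address"]))   # ' 1 high st '
```

Because the working table is rebuilt from the source each time, an auditor can replay the log from scratch. A mistaken step can also be dropped without losing anything.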
Photo by Lubo Minar on Unsplash
Uncertainty & the distant future
Data harvested today must answer unknown questions about unknown problems in an unknown, but different, future environment
Poverty is expensive
A legacy of futility risks becoming self-perpetuating
Olga Ernst, CC BY-SA 4.0, via Wikimedia Commons
A history in 35 million rows
Where are businesses compared to where we think they are?
Does a change in tax rates cause business closure?
How should we measure energy consumption?
Who wins & loses from COVID commute changes?
Who wants to be a millionaire?
Photo by Sylvie Tittel on Unsplash
Protocol with sensible defaults
1. All units are occupied & pay full rates.
2. When data are ambiguous, refer to 1.
3. Ask for data, even when you know they’ll say no.
4. Never delete anything.
5. Document everything.
6. When in doubt, ask the data source.
7. Accept the weird but keep looking for answers.
8. Ensure the process is public.
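Rules 1 and 2 amount to a fallback: when a field cannot be interpreted, assume the unit is occupied and pays full rates. Rule 5 adds that every such assumption is recorded. The Python sketch below illustrates this under several assumptions. The raw occupancy and relief fields, the spellings in `OCCUPANCY`, and the reading of "full rates" as "no relief amount recorded" are all hypothetical, and real council exports will differ. The raw values are kept exactly as received, and the sketch lists which fields fell back to the default:

```python
import math
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical spellings; each real export needs its own documented lookup.
OCCUPANCY = {"occupied": True, "yes": True, "vacant": False, "empty": False, "no": False}


@dataclass(frozen=True)
class RatesStatus:
    raw_occupancy: Optional[str]  # kept exactly as received
    raw_relief: Optional[str]     # kept exactly as received
    occupied: bool
    pays_full_rates: bool         # in this sketch: True when no relief amount applies
    defaults: Tuple[str, ...]     # which fields fell back to the protocol default


def parse_relief(raw: Optional[str]) -> Optional[float]:
    # Return a relief amount, or None when the value is missing or cannot be trusted.
    try:
        amount = float(raw)
    except (TypeError, ValueError):
        return None
    return amount if math.isfinite(amount) and amount >= 0 else None


def interpret(raw_occupancy: Optional[str], raw_relief: Optional[str]) -> RatesStatus:
    defaults = []
    key = (raw_occupancy or "").strip().lower()
    if key in OCCUPANCY:
        occupied = OCCUPANCY[key]
    else:
        occupied = True  # ambiguous: refer to rule 1 and assume occupied
        defaults.append("occupancy")
    relief = parse_relief(raw_relief)
    if relief is None:
        pays_full = True  # ambiguous: refer to rule 1 and assume full rates
        defaults.append("relief")
    else:
        pays_full = relief == 0
    return RatesStatus(raw_occupancy, raw_relief, occupied, pays_full, tuple(defaults))


print(interpret("??", None).defaults)       # ('occupancy', 'relief')
print(interpret(" Vacant ", "0").occupied)  # False
```

Counting the units where `defaults` is non-empty also gives a simple public measure of how ambiguous each source is.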
1. Track every step
2. Disclose every request
3. Non-destructive auditable transformation
4. Always ready to explain
5. Make the data useful
Because …
Photo by Sylvie Tittel on Unsplash
Sqwyre data probity protocol
1. Instant answers don’t happen instantly
2. Data probity takes method, practice & patience
3. Maintain all source data
4. Pick sensible & transparent defaults
5. Transformations must be documented
6. Make no destructive changes
7. Point data before aggregation or analysis
8. Open licences to encourage use & reuse
9. Collaborate to make the data wanted & useful
10. Be ready to explain & be audited
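"Point data before aggregation or analysis" means keeping one record per property and computing summaries from those records on demand, rather than storing only the totals. Totals can then be recomputed for groupings nobody has asked for yet. The Python sketch below shows the idea. The `Unit` fields, the area codes and the values are all made up for illustration:

```python
from dataclasses import dataclass
from typing import Callable, Dict, Hashable, Iterable, Tuple


@dataclass(frozen=True)
class Unit:
    """One point record: a single rated property (hypothetical fields)."""
    unit_id: str
    area: str
    use: str
    rateable_value: int


def aggregate(points: Iterable[Unit], key: Callable[[Unit], Hashable]) -> Dict[Hashable, Tuple[int, int]]:
    # Summarise on demand as (count, total rateable value) per group;
    # the point records themselves are never replaced by the totals.
    totals: Dict[Hashable, Tuple[int, int]] = {}
    for unit in points:
        group = key(unit)
        count, value = totals.get(group, (0, 0))
        totals[group] = (count + 1, value + unit.rateable_value)
    return totals


points = [
    Unit("A1", "E01", "retail", 10000),
    Unit("A2", "E01", "office", 5000),
    Unit("B1", "E02", "retail", 2500),
]
print(aggregate(points, key=lambda u: u.area))  # {'E01': (2, 15000), 'E02': (1, 2500)}
print(aggregate(points, key=lambda u: u.use))   # {'retail': (2, 12500), 'office': (1, 5000)}
```

If boundaries or categories change later, only the `key` function changes. Totals published only as aggregates could not be regrouped in this way.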
Hansueli Krapf, CC BY-SA 3.0, via Wikimedia Commons
Know your business
Whythawk
Gavin Chait
gchait@whythawk.com
https://whythawk.com/
