13 July - Webinar Slides

Uploaded by

ANKITA
@chochrphd
Topics Of Discussion

• Overview of Research Data Management


• Data Lifecycle
• Data Management Plan (DMP)
• Websites-Resources
Research Data Management
• Definition
• Importance
Research Data Management Defined

What is research data?

• Statistical Records
• Video & Audio recordings
• Images
• Measurements
• Software & Code
• Algorithms
• Lab notebooks
• Biospecimens
Research Data Management Defined

Research Data Management

• The organization, storage, preservation, and sharing of data
collected and used in a research project.

✓ Everyday management of research data during the lifetime of a
project

✓ Decisions about how data will be preserved and shared after
the project is completed
Research Data Management Defined

Importance of research data management

• Verify the integrity of your data
• Make your data findable and reusable
• Help others understand your data
• Encourage other researchers to reuse and cite your data
• It is required by some funding agencies
Components of Data Management
• Data Life Cycle
• Best Practices
Research Data Management Defined

[Figure: Data Lifecycle diagram with the stages Plan, Discover, Collect & Organize, Quality, Describe, Store, Share]
Stage 1: Plan

Things to consider:
• Policies
• Type of data
• Versions
• Backup
• Describing and labeling
• Access and Sharing
• Rights and Permissions
• Roles and Responsibilities
• Budget
Stage 1: Plan

Data Management Plan (DMP)


• A document that describes how you will treat your data throughout a project and
what happens with the data after the project ends.

• Some funding agencies require a Data Management Plan


Stage 1: Plan

• Data Management Plans address:


1. Data Type
2. Data Format
3. Data Sharing Plan
4. Data Archiving/Preserving Plan

https://old.dataone.org/data-management-planning

https://dmptool.org/
EXAMPLE: National Science Foundation (NSF) DMP

Length
• The data management plan is a supplementary document.
• Plans should be no longer than two pages.
Components
• Types of data produced
• Data and metadata standards
• Data access and sharing
• Data re-use and redistribution
• Archiving and preservation
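
As a rough aid, the component list above can be turned into a small self-check for a draft plan. This is only a sketch: the `missing_components` helper and its keyword lists are illustrative assumptions, not an NSF tool, and a keyword match does not mean a section is adequate.

```python
# Hypothetical checklist: flags DMP components that a draft never mentions.
# The keywords are illustrative guesses, not official NSF wording.
DMP_COMPONENTS = {
    "Types of data produced": ["types of data"],
    "Data and metadata standards": ["metadata"],
    "Data access and sharing": ["access", "sharing"],
    "Data re-use and redistribution": ["re-use", "reuse", "redistribution"],
    "Archiving and preservation": ["archiv", "preserv"],
}


def missing_components(draft_text):
    """Return the component names whose keywords do not appear in the draft."""
    text = draft_text.lower()
    return [
        name
        for name, keywords in DMP_COMPONENTS.items()
        if not any(keyword in text for keyword in keywords)
    ]


draft = "Types of data: survey responses. Metadata will follow a README template."
print(missing_components(draft))
```

For the short draft above, the script lists the three components that still need a section.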
EXAMPLE: National Science Foundation (NSF) DMP

Types of data produced:

The types of data, samples, physical collections, software, curriculum
materials, and other materials to be produced in the course of the
project.
EXAMPLE: National Science Foundation (NSF) DMP

Types of data produced:

Questions To Ask
1. What types of data will be produced for your project?
2. How will the data be created or captured?
3. What software programs will be used to generate your data?
4. How much data will be produced?
5. How big will your digital files be and how many will there be?
6. Will you be using existing data? If so, what is the source of that data?
EXAMPLE: National Science Foundation (NSF) DMP

Data and Metadata Standards

The standards to be used for data and metadata format and content
(where existing standards are absent or deemed inadequate, this
should be documented along with any proposed solutions or
remedies).
EXAMPLE: National Science Foundation (NSF) DMP

Data and Metadata Standards

Questions to Ask
1. How will you document your data and project?
2. What file formats will you be using in your project and why?
3. How will you organize your files into directories, and what naming
conventions will you use?
4. How often will your data change or be updated, and will versions need to be
tracked?
5. What types of metadata do you need to collect in order for someone else to
fully understand your data?
EXAMPLE: National Science Foundation (NSF) DMP

Data Access, Sharing, Reuse, and Redistribution

Policies for access and sharing, including provisions for the appropriate protection
of privacy, confidentiality, security, intellectual property, or other rights or
requirements. Policies and provisions for reuse, redistribution, and the production
of derivatives.
EXAMPLE: National Science Foundation (NSF) DMP

Data Access, Sharing, Reuse, and Redistribution


Questions to Ask
1. Who is responsible for managing and controlling your data?
2. Who is likely to be interested in your data and what are the foreseeable future uses of the
data?
3. When and where do you intend to publish or distribute your data?
4. How will the data be made available?
5. Will there be an embargo period before the data is made available for wider distribution? If
so, explain why.
6. Are there issues regarding privacy or restricted, confidential, or sensitive data?
7. How have you addressed any institutional review board (IRB) protocols that may apply to
your research?
8. Are there intellectual property issues or agreements with industry or government agencies
that affect sharing?
9. If you are using data from other sources, do you have the right to share that data?
EXAMPLE: National Science Foundation (NSF) DMP

Storage and Preservation

Plans for archiving data, samples, and other research products, and for
preservation of access to them.
EXAMPLE: National Science Foundation (NSF) DMP

Storage and Preservation

Questions to Ask
1. What is your strategy for data storage and backup?
2. What data will be preserved for the long term?
3. Are extra steps required to prepare the data for preservation?
4. What related information or metadata will be preserved along with the data?
5. Where and how will the data be preserved?
6. What procedures does the archive have in place to ensure preservation and
backup?
7. How long will the data be kept after the project is completed?
Stage 2: Discover

Things to consider:
• Locate existing data
• Cite data
Stage 2: Discover

Locate existing data


• Data Directories (e.g. re3data, OpenAccessDirectory)
• General Repositories (e.g. figshare)
• Discipline-related repositories (e.g. DRYAD for life sciences)
• Data Journals (e.g. https://www.nature.com/sdata/)

DataCite is an international organization that helps researchers to find,
access, and use data.
Stage 2: Discover

Citing Data

• Provide proper recognition
• Cite datasets
• Follow standards
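
To illustrate what a consistent dataset citation might look like, the sketch below assembles one from its parts. It assumes a commonly used pattern (creator, publication year, title, publisher, identifier); the `format_data_citation` helper and every value in the example are made up, so check the style your repository or journal actually requires.

```python
def format_data_citation(creators, year, title, publisher, doi):
    """Build a citation string: Creator (Year). Title. Publisher. Identifier."""
    names = "; ".join(creators)
    return f"{names} ({year}). {title}. {publisher}. https://doi.org/{doi}"


# Entirely made-up example values; 10.1234/example is a placeholder DOI.
citation = format_data_citation(
    creators=["Doe, J.", "Roe, R."],
    year=2020,
    title="Example survey dataset",
    publisher="Example Repository",
    doi="10.1234/example",
)
print(citation)
```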
Stage 3: Collect & Organize

Things to consider:
• Finding and reusing data
• Choosing a file format
• Naming data files
• Data versioning
Stage 3: Collect and Organize

Organize your data

• Name

• Format

• Version
Stage 3: Collect and Organize

File name

Before starting your project, decide on a naming convention for your files.

1. Meaningful
2. Length
3. Underscores & Hyphens
4. YYYYMMDD
5. Zeros
6. No special characters
7. Versions

• Stanford University Libraries - Data Management Services


• University of Wisconsin Research Data Services
• Purdue University Libraries - Data Management for Graduate Researchers
• Cornell University Research Data Management Service Group
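
The naming points above (meaningful names, underscores and hyphens, YYYYMMDD dates, leading zeros, no special characters, versions) can be sketched as one helper. The `build_file_name` function and the order of the name parts are assumptions chosen for illustration; any convention works as long as it is applied consistently.

```python
import re
from datetime import date


def build_file_name(description, day, sequence, version, extension):
    """Compose a name such as 20150721_soil-moisture_003_v02.csv."""
    # Replace every run of characters that is not a letter, digit, or hyphen
    # with a single hyphen, so no spaces or special characters remain.
    slug = re.sub(r"[^A-Za-z0-9-]+", "-", description.strip().lower()).strip("-")
    stamp = day.strftime("%Y%m%d")  # YYYYMMDD sorts chronologically
    # Leading zeros keep 003 ahead of 010 in an alphabetical listing.
    return f"{stamp}_{slug}_{sequence:03d}_v{version:02d}.{extension}"


print(build_file_name("Soil moisture (plot #4)", date(2015, 7, 21), 3, 2, "csv"))
# 20150721_soil-moisture-plot-4_003_v02.csv
```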
Stage 3: Collect and Organize

File Format
• Choose one and stick to it
• Consider the software that will be used to access data
• Repository requirements
• Lost features during conversion

• Stanford University Libraries - Data Management Services
• Cornell University Research Data Management Service Group
• Cambridge University Libraries - Data Management
Stage 3: Collect and Organize

Data Versioning

Saving new copies of your files when you make changes so that you can go
back and retrieve specific versions of your files later.

DataFileName_1.0 = original document


DataFileName_1.1 = original document with minor revisions
DataFileName_2.0 = document with substantial revisions
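
The numbering scheme above can be sketched as a small function. The name `next_version` and the `substantial` flag are illustrative assumptions; deciding what counts as a substantial revision remains a judgment call for the project team.

```python
def next_version(version, substantial=False):
    """Return the next version label, e.g. '1.0' -> '1.1' or '2.0'."""
    major, minor = (int(part) for part in version.split("."))
    if substantial:
        return f"{major + 1}.0"    # substantial revision: new major number
    return f"{major}.{minor + 1}"  # minor revision: bump the minor number


print(next_version("1.0"))                    # 1.1
print(next_version("1.1", substantial=True))  # 2.0
```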
Data Versioning

Style 1: version number at the end of the file name

image1_v1.jpg
image1_v2.jpg
image2_v1.jpg
image2_v2.jpg
Data Versioning

Style 2: Dated

image1_20151021
image1_20151214
image1_20160123
Data Versioning

Style 3: incorporate names or initials of collaborators

dataset1_20160402_KES
dataset1_20160301_WTC
dataset1_20160814_GSC
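
One practical benefit of dated names is that a script can find the most recent version. The sketch below assumes names of the form base_YYYYMMDD_initials, as in the examples above; `parse_name` is a hypothetical helper and would need adjusting for other patterns.

```python
from datetime import datetime


def parse_name(file_name):
    """Split 'dataset1_20160402_KES' into (base, date, initials)."""
    base, stamp, initials = file_name.split("_")
    return base, datetime.strptime(stamp, "%Y%m%d").date(), initials


names = ["dataset1_20160402_KES", "dataset1_20160301_WTC", "dataset1_20160814_GSC"]
latest = max(names, key=lambda name: parse_name(name)[1])
print(latest)  # dataset1_20160814_GSC
```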
Stage 4: Data Quality

Things to consider:
• Assurance (QA)
• Control (QC)
• QA/QC Plan
Stage 4: Data Quality

Quality Assurance vs. Quality Control

• Assurance: Process oriented and focuses on defect prevention.
• Control: Product oriented and focuses on defect identification.


Stage 4: Data Quality

Importance of QA/QC plan

• Help others understand how to use data
• Avoid mistakes due to poor data quality
• Track errors and conflicts
Stage 4: Data Quality

QA/QC plans should include

• Methods to deal with erroneous data (Assurance)


• Methods to identify erroneous data (Control)
• Methods to mark erroneous data (Control)
Stage 4: Data Quality

Methods

• Consistent techniques, processes, and environments


• Mechanisms to compare data sets
• Scripts or macros
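
As an example of a QC script, the sketch below identifies and marks suspicious values instead of deleting them. The field name, the valid range, and the `flag_out_of_range` helper are assumptions for illustration; real limits should come from the project's QA/QC plan.

```python
def flag_out_of_range(rows, field, low, high):
    """Return copies of rows with a 'qc_flag' of 'ok', 'suspect', or 'missing'."""
    flagged = []
    for row in rows:
        value = row.get(field)
        if value is None:
            flag = "missing"
        elif low <= value <= high:
            flag = "ok"
        else:
            flag = "suspect"
        # Keep the original value and mark it; nothing is deleted.
        flagged.append({**row, "qc_flag": flag})
    return flagged


# Made-up readings for illustration.
readings = [
    {"site": "A", "temp_c": 21.5},
    {"site": "B", "temp_c": 215.0},  # possibly a misplaced decimal point
    {"site": "C", "temp_c": None},
]
for row in flag_out_of_range(readings, "temp_c", low=-40.0, high=60.0):
    print(row)
```

Marking rather than removing values keeps a record of what was questioned and why, which supports tracking errors later.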
Stage 5: Data Description

Things to consider:
• Metadata
• Data Dictionary
Stage 5: Data Description

Components of data description
• Describe scientific context

• Include critical information

• Identifiers within datasets

• Create a data dictionary


Stage 5: Data Description

Metadata

• “Data about data” (context)

• Description of your research data


Stage 5: Data Description

What does metadata do?

• Makes your data easier to find.
• Increases understanding and reusability of data.
• Makes your data and associated research verifiable.
Stage 5: Data Description

What to include in metadata

• General Information
• Data and File Overview
• Methodological Information
• Data-specific Information

Questions to answer:
• who created the data
• what the data file contains
• when the data were generated
• where the data were generated
• why the data were generated
• how the data were generated
Stage 5: Data Description

Where can metadata be collected?

• Lab notebooks
• Plain text README files
• Within data file
• Web forms
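
A plain text README can be generated from the who/what/when/where/why/how questions so that none of them is forgotten. This is a minimal sketch: the `build_readme` helper, the field labels, and the example answers are all assumptions, not a standard template.

```python
README_FIELDS = ["who", "what", "when", "where", "why", "how"]


def build_readme(answers):
    """Render the six questions as plain text; fail if any answer is missing."""
    missing = [field for field in README_FIELDS if not answers.get(field)]
    if missing:
        raise ValueError(f"Missing metadata: {', '.join(missing)}")
    lines = [f"{field.upper()}: {answers[field]}" for field in README_FIELDS]
    return "\n".join(lines) + "\n"


# Made-up answers for illustration.
example = {
    "who": "J. Doe, Example Lab",
    "what": "Hourly air temperature readings, one CSV file per site",
    "when": "2015-07-01 to 2015-07-31",
    "where": "Three field sites (A, B, C)",
    "why": "Baseline measurements for a pilot study",
    "how": "Automated loggers, exported with the vendor software",
}
print(build_readme(example))
```

Saving the returned text as README.txt next to the data files keeps the description and the data together.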
Stage 5: Data Description

Data Dictionary

• Describes all the data stored in a data set or used by a database

• Describes the data, does not contain the data

Stage 5: Data Description

Components of data dictionary

• List of all files
• Type of data included
• List of field and variable names
• Description of information contained in each field

Examples:
• Ag Data Commons
• National Renewable Energy Laboratory (NREL)
• Protein Data Bank Exchange Data Dictionary (PDBx/mmCIF V4.0)
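
A first draft of a data dictionary can be produced from the data file itself and then completed by hand. The sketch below assumes a CSV file with a header row; the `draft_data_dictionary` helper and its simple number/text type guess are illustrative only, and the descriptions still have to be written by someone who knows the data.

```python
import csv
import io


def is_number(text):
    """Return True if the text can be read as a number."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def draft_data_dictionary(csv_text):
    """List each field with a guessed type and an empty description slot."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    entries = []
    for field in reader.fieldnames or []:
        values = [row[field] for row in rows if row[field]]  # skip empty cells
        kind = "number" if values and all(is_number(v) for v in values) else "text"
        entries.append({"field": field, "type": kind, "description": ""})
    return entries


# Made-up data for illustration.
sample = "site_id,temp_c,notes\nA1,21.5,clear\nB2,19.0,\n"
for entry in draft_data_dictionary(sample):
    print(entry)
```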
Stage 6: Data Storage

Things to consider:
• Size of dataset
• Computational requirements
• Backup
• Security
Stage 6: Data Storage

Backup Rule of Three

• Keep an original copy
• Second local copy
• Remote copy
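
The rule of three can be sketched as a copy-and-verify script. Everything here is an assumption for illustration: the `back_up` helper, the folder names, and the use of an ordinary directory as a stand-in for the remote location, which in practice would be a mounted network drive or a cloud or institutional storage service.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256(path):
    """Return the SHA-256 checksum of a file as a hex string."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def back_up(original, destinations):
    """Copy the file into each destination folder and verify every copy."""
    original = Path(original)
    expected = sha256(original)
    copies = []
    for folder in destinations:
        folder = Path(folder)
        folder.mkdir(parents=True, exist_ok=True)
        copy = Path(shutil.copy2(original, folder / original.name))
        if sha256(copy) != expected:
            raise IOError(f"Checksum mismatch for {copy}")
        copies.append(copy)
    return copies


# Demo in a temporary folder; "remote" is only a stand-in directory here.
with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "results.csv"
    source.write_text("site,temp_c\nA,21.5\n")
    made = back_up(source, [Path(tmp) / "second_local", Path(tmp) / "remote"])
    print([copy.parent.name for copy in made])  # ['second_local', 'remote']
```

Comparing checksums after copying confirms that each backup matches the original, not merely that a file with the same name exists.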
Stage 7: Data Share

Things to consider:
• Benefits
• Location
• Preparation
Stage 7: Data Sharing

Benefits of Data Sharing


• Promote new discoveries

• Enhance Impact

• Support Validation

• Encourage Collaboration

• Increase Public investment

• Reduce redundancy
Stage 7: Data Sharing

Locations

• Disciplinary repository
• Data journal
• Supplementary File
• Web-based tools
Stage 7: Data Sharing

Preparation for sharing


• Use consistent and meaningful file names
• Use self-explanatory variable names and abbreviations
• Remove redundant variables and labels
• Apply anonymization as needed
• Check copyright and privacy permissions
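
Two of the preparation steps above, removing unneeded variables and replacing direct identifiers, can be sketched as follows. Note the assumptions: the `pseudonymize` helper and the field names are hypothetical, and a salted hash is pseudonymization only, not full anonymization, since other fields may still identify a person. Follow your IRB and institutional guidance before sharing.

```python
import hashlib


def pseudonymize(rows, id_field, salt, drop_fields=()):
    """Replace the identifier with a salted hash and drop unneeded fields."""
    cleaned = []
    for row in rows:
        raw = (salt + str(row[id_field])).encode("utf-8")
        code = hashlib.sha256(raw).hexdigest()[:12]
        kept = {
            key: value
            for key, value in row.items()
            if key != id_field and key not in drop_fields
        }
        cleaned.append({"participant": code, **kept})
    return cleaned


# Made-up records for illustration; keep the real salt private.
records = [
    {"id": "P001", "name": "Ann Example", "score": 4},
    {"id": "P001", "name": "Ann Example", "score": 5},
    {"id": "P002", "name": "Bob Example", "score": 3},
]
for row in pseudonymize(records, "id", salt="project-secret", drop_fields=("name",)):
    print(row)
```

The same identifier always maps to the same code, so repeated measurements from one participant can still be linked after sharing.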
Summary

• Data management is the organization, storage, preservation,
and sharing of data collected and used in a research project.

• Data management is critical in every stage of the data lifecycle


• Things to always remember:
• RECORD and TRACK
• NAME FILES
• STORE and BACKUP
• GUIDELINES and REQUIREMENTS
Free Data Management software

• Adobe Bridge: Adobe Bridge is free software for locally organizing images.

• Figshare: Figshare is a multidisciplinary repository where users can make all of their research outputs available in a citable, shareable and discoverable manner. Figshare allows users to upload any file format to be made visualisable in the browser so that figures, datasets, media, papers, posters, presentations and filesets can be disseminated. Figshare uses DataCite DOIs for persistent data citation.

• Open Science Framework: The Open Science Framework (OSF) is a free, open source web application that connects and supports the research workflow, enabling scientists to increase the efficiency and effectiveness of their research. Researchers use the OSF to collaborate, document, archive, share, and register research projects, materials, and data.

• XSEDE Bridges computing and storage: XSEDE national infrastructure facility hosted at the Pittsburgh (PA) Supercomputer Center. Campus XSEDE champion is Aaron Culich (as of 2016). XSEDE offers free computing and storage to qualified researchers through a competitive application process.

• XSEDE Storage Services: XSEDE is a set of national facilities that scientists can use to interactively share computing resources, data and expertise. People around the world use these resources and services (things like supercomputers, collections of data and new tools) to improve our planet. XSEDE resources include several services for storing research data.
References

https://guides.library.yale.edu/rdm_healthsci/home
https://pitt.libguides.com/managedata/understanding#s-lg-box-4890536
https://data.research.cornell.edu/content/readme
https://ukdataservice.ac.uk/deposit-data/preparing-data.aspx
https://dmptool.org/
https://www.dataone.org/
https://datadryad.org/stash
Thank you
Questions?
Pre-submitted questions
Online resources
Qualitative Data Management

• Create a data dictionary that contains:


• Dates
• Locations
• Individual or group characteristics
• Interview characteristics
• Other defining features
• Ensure fidelity of analyzed data
• Ethics requirements
• Version control

Mack N, Woodsong C, MacQueen KM, Guest G, Namey E. Qualitative research methods: a
data collector's field guide.
Managing Geospatial Data
Review Matrix (for literature/references)

• Review Matrix: Using a spreadsheet/table to organize key elements

https://guides.library.vcu.edu/health-sciences-lit-review/organize
