icf.com ©Copyright 2018 ICF 11
Executive Summary
Overall, the demand for data science talent is outpacing the current supply,
and many who are being trained in data science methods are pursuing careers
in sectors other than public service or biomedical/behavioral research. On
June 6, 2018, ICF hosted a workshop with participants from academia, industry
organizations and the federal government. The following themes emerged when
discussing data science training and workforce development.
• Organizations interested in incorporating data science need to increase
staff knowledge of and/or expertise in data science methods and how
they can be implemented to solve problems.
• Identifying engaged mentors is critical for the success of data science training
programs.
• Coordination of data science training opportunities across departments of a
large organization may be challenging. However, coordination offers several
benefits for the organization and the workforce.
• Organizations may have a variety of data sources amenable to analysis using
data science methods, beyond “research datasets”. Importantly, organizations
need to generate questions or problems they wish to address using data
science methods with the various datasets available.
• Diverse training opportunities are needed to attract data science talent to
projects in public service and/or biomedical/behavioral research. Incentives to
encourage data science talent to pursue work in the public sector should be
explored. Novel partnerships between the public sector, academia, and industry
should also be explored to meet the increasing demand for data science in all
three sectors.
As interest in employing data science methods expands in federal government
agencies and other organizations, how to structure data science training
opportunities and improve data science skills in the workforce of the organization
will become more salient.
Data Science Training and
Workforce Development
By Audie A. Atienza, PhD, and Lewis Berman, PhD, MS, ICF
Acknowledgments:
We wish to thank the workshop participants for their thoughtful
discussions during the meeting. We also express much gratitude to
Philip Bourne, Brian Wright, Jessica Mazerik, Debra Tomchek, and
Jeffrey Neal¹ for their insightful comments on earlier drafts of this paper.
¹ Debra Tomchek and Jeffrey Neal are at ICF and did not attend the workshop.
icf.com
White Paper
Public Health
Introduction
ICF held a workshop - Biomedical and Behavioral Data Sciences Regional
Consortium Planning Workshop (June 6, 2018; Bethesda, MD) - with participants
from academia, industry organizations and the federal government (i.e., NIH)
located in the Maryland, Virginia, and Washington, DC area to discuss data
science training and workforce development in biomedical and behavioral
research (see Appendix A for participants). The purpose of this White Paper is
two-fold: 1) to discuss several of the key workshop themes focused on biomedical
and behavioral research, and 2) to discuss the implications of these themes for
the broader data science community.
The Challenge
The National Institutes of Health and other federal organizations are experiencing
an increasing demand for data scientists. However, multiple factors
have led to a dearth of data scientists within the federal government workforce.
Underlying factors include:
The demand for data science research is growing—Several influential research
organizations have produced reports focused on data science. In December
2016, the National Science Foundation published an advisory committee
report, Realizing the Potential of Data Science. In 2018, the National Institutes
of Health (NIH) released its Strategic Plan for Data Science, one of the key
recommendations of the 2012 Data and Informatics Working Group Report to
the Advisory Committee to the NIH Director. Also in 2018, the National Academies
of Sciences, Engineering, and Medicine (NASEM) published a consensus study
report, Data Science for Undergraduates: Opportunities and Options.
This growing demand is notably illustrated by interest in data science in
biomedical research. An analysis conducted by Dr. Audie Atienza, of ICF, of the
various NIH Strategic Plans (as of January 2017) found that 22 of the 25 NIH
Institutes and Centers (ICs) with publicly available strategic plans highlighted
at least one data science term (terms: “data science”, “big data”, “common
data”, “bioinformatics”, “computational biology”, and “data integration”) within
their respective strategic plans. Of all the data science terms examined,
“bioinformatics” and “big data” had the most mentions, at 75 and 44,
respectively. [Data available from the author, Dr. Audie Atienza, upon
request.] Thus, data science is strategically important for nearly all
of the NIH ICs. Of particular note in recent NIH documents focused on data science
(e.g., NIH Strategic Plan for Data Science; A Platform for Biomedical Discovery and
Data-Powered Health: National Library of Medicine Strategic Plan 2017-2027) is
the emphasis on expanding data science skills within NIH’s internal workforce,
which will require novel strategies to train existing staff and attract new data
science talent.
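A term-count analysis of this kind can be sketched in a few lines of Python. This is a minimal illustration and not the procedure actually used for the analysis reported above: the function names `count_terms` and `summarize` are hypothetical, the sample texts are invented placeholders rather than real strategic plan content, and only the six search terms come from this paper.

```python
import re
from collections import Counter

# The six search terms listed in the paper's analysis.
TERMS = [
    "data science",
    "big data",
    "common data",
    "bioinformatics",
    "computational biology",
    "data integration",
]


def count_terms(text, terms=TERMS):
    """Count case-insensitive, whole-phrase occurrences of each term."""
    lowered = text.lower()
    counts = Counter()
    for term in terms:
        pattern = r"\b" + re.escape(term) + r"\b"
        counts[term] = len(re.findall(pattern, lowered))
    return counts


def summarize(plans, terms=TERMS):
    """Return (number of plans with at least one term, total mentions per term)."""
    totals = Counter()
    plans_with_any = 0
    for text in plans.values():
        counts = count_terms(text, terms)
        totals.update(counts)
        if sum(counts.values()) > 0:
            plans_with_any += 1
    return plans_with_any, totals


# Invented placeholder snippets standing in for strategic plan text.
example_plans = {
    "IC-A": "We will invest in bioinformatics and big data. Bioinformatics training is a priority.",
    "IC-B": "Our plan emphasizes clinical trials.",
    "IC-C": "Data science and data integration underpin our goals.",
}

n_plans, totals = summarize(example_plans)
print(n_plans, "of", len(example_plans), "plans mention at least one term")
print("bioinformatics mentions:", totals["bioinformatics"])
```

Simple phrase matching like this counts mentions only; it does not capture how prominently a plan treats data science, so the counts are best read as a rough indicator of interest.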
There is an insufficient supply of data scientists to meet demand—This need
for more data science skills in biomedical/behavioral research reflects a broader
view that, in all sectors, the supply of data science talent is significantly lagging
behind demand, a gap that is only getting wider (Markow et al., 2017 - Burning
Glass Technologies; McKinsey Global Institute, 2011, 2016; NASEM, 2018). It has
been reported that 59% of data science jobs are in the finance and insurance,
professional services, and information technology sectors (Burning Glass
Technologies, 2017), with healthcare representing only a small percentage (5-8%)
of data science jobs. Biomedical and behavioral research was not discussed in
this report, but competition for data science talent from sectors other than
health care and biomedical/behavioral research is clearly fierce: data science
salaries in the top three sectors average $80,000+ and exceed six figures for
PhD-level Data Scientists and Data Engineers.
How do we solve this problem? Getting more people trained and interested
in biomedical/behavioral data science will be critical to meet the increasing
need for data science expertise in the health research workforce. It was with
this perspective in mind that we organized a workshop to discuss
biomedical/behavioral data science training and workforce development, with a particular
focus on establishing opportunities and training experiences for data science
students.
Workshop Structure (see Appendix B for Workshop Agenda)
On June 6, 2018, ICF conducted a workshop to convene a select group of
biomedical leaders, data scientists, industry participants, and academics to
examine the problem and recommend solutions. We structured the workshop
in two main parts. During the first part of the workshop, academic institutions,
industry organizations, and the federal government (i.e., NIH) highlighted the data
science talent, resources, programs, and/or opportunities at their respective
institutions. During the second part of the workshop, participants were divided
into three breakout groups to brainstorm ideas, concepts and issues in three main
areas: 1) behavioral data science, 2) biomedical data science, and 3) processes
for assigning students to data science training opportunities.
Each breakout group included individuals from academia, industry, and
government (i.e., NIH) to provide different perspectives. For the behavioral and
biomedical data science breakout groups, facilitators were instructed to “have
the group generate ideas for behavioral or biomedical exemplar projects where
the inclusion of data science students could significantly enhance or accelerate
the research. The projects can be existing data sets/projects, data sets/projects
currently in the process of being collected, and/or research areas of high impact
that could benefit from data science.” For the “processes for assigning students
to data science training opportunities” workgroup, facilitators were instructed to
“have the group discuss models for matching students with opportunities, as well
as, potential challenges in the process.”
A scientific writer took notes during the workshop. In addition, breakout groups
used flip chart paper and/or dry erase boards to generate and organize their
ideas. Following the concurrent breakout group sessions, breakout group ideas
were shared with all participants and a general group discussion generated
additional ideas and comments. Themes were derived from reviewing all
information gathered, and workshop participants had opportunities to review,
modify, challenge, and/or add to the themes.
Key Themes
From the discussions among data science and biomedical/behavioral research
experts at this workshop, several key themes emerged.
Theme #1: Data Science Recognition and Awareness
While NIH has recently emphasized “Data Science” as a priority area for
development (i.e., NIH Strategic Plan for Data Science), NIH research scientists
and program administrators with existing or developing datasets may not have
the same level of understanding of the potential benefits that data science
methods can afford, and there is variability in the level of expertise to employ
those methods. Without knowledge of and expertise in the various types of data
science methods that can be applied, scientists and staff could miss some key
opportunities to further explore their datasets.
Recommendation: Provide concrete examples of data science projects to
relevant scientists and staff to increase their knowledge of ways to incorporate
data science methods in various programs and projects.
Theme #2: Finding Engaged Mentors
The importance of identifying and periodically assessing research mentors
was a theme repeated throughout the workshop. Mentors need to understand
the needs of their respective students and have data readily available for
analysis. In addition, a process for assessing how things are proceeding in the
project was noted as critical. Others noted that it would be important to identify
mentors (or points of contact) at the organizations offering data science training
opportunities (e.g., internships, capstone projects, etc.), as well as at the
academic institutions that provide students.
Recommendation: For training programs to work well, mentors should: 1) be
engaged, 2) have data readily available at the start of the project for analysis,
and 3) be matched appropriately based on student characteristics/profiles.
Developing standardized processes or criteria for periodically assessing the
readiness of mentors, as well as their fit with available mentees, will be critical.
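One way to picture the three conditions in this recommendation is as a simple matching routine. The sketch below is purely illustrative and was not discussed at the workshop: the function `match_students`, the field names `topics`, `data_ready`, and `engaged`, and all sample names are hypothetical, and a greedy overlap score is only one of many possible matching rules.

```python
def match_students(students, mentors):
    """Greedily pair each student with the eligible mentor sharing the most interests.

    students: dict of student name -> set of interests
    mentors:  dict of mentor name -> {"topics": set, "data_ready": bool, "engaged": bool}
    Returns a dict of student name -> mentor name (None if no eligible mentor fits).
    """
    # Only mentors who are engaged and have data ready are eligible.
    available = {
        name for name, info in mentors.items()
        if info["data_ready"] and info["engaged"]
    }
    matches = {}
    for student, interests in students.items():
        best, best_overlap = None, 0
        for name in sorted(available):  # sorted for a deterministic tie-break
            overlap = len(interests & mentors[name]["topics"])
            if overlap > best_overlap:
                best, best_overlap = name, overlap
        matches[student] = best
        if best is not None:
            available.discard(best)  # each mentor takes one student in this sketch
    return matches


# Hypothetical example profiles.
students = {
    "student_1": {"genomics", "machine learning"},
    "student_2": {"survey methods"},
}
mentors = {
    "mentor_a": {"topics": {"genomics"}, "data_ready": True, "engaged": True},
    "mentor_b": {"topics": {"survey methods", "machine learning"}, "data_ready": True, "engaged": True},
    "mentor_c": {"topics": {"survey methods"}, "data_ready": False, "engaged": True},
}

matches = match_students(students, mentors)
print(matches)
```

Rerunning such a routine as profiles and data readiness change is one way to make the periodic reassessment of fit concrete, although a real program would likely weigh many factors that a score of this kind cannot capture.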
Theme #3: Coordinating Opportunities Across Federal Government
Departments / Institutes
NIH has existing infrastructure for certain research training programs and
internships, particularly in the intramural research program. However,
coordination of the various programs and internships across the 27 NIH ICs could
be enhanced. Because each IC has its own mission and separate budget,
coordinating activities across ICs can be a challenge. Yet coordination
among opportunities could prevent duplication of efforts, provide consistency
across programs, and help the various programs learn from each other. It was
also suggested that, ideally, students should belong to a cohort linked by
common experiences.
Recommendation: Agency leaders should systematically evaluate the benefits
and challenges of coordinating data science training opportunities across various
departments/institutes. Based on this evaluation, the agency can then identify
potential current and future training activities where the coordination may
maximize the benefits to and synergistically enhance the respective missions of
various institutes, centers, or departments.
Theme #4: Alternative Sources of Data
Existing intramural and extramural biomedical and behavioral research datasets
have been the primary focus of data science efforts thus far at NIH (e.g., NIH Data
Sharing Repositories, NIH Data Commons). However, other sources of data could
also be made available for analysis by data science mentees. These alternative
data sources include program administrative data (e.g., grants administration
data), research products data (e.g., PubMed), survey data (e.g., NCI HINTS), and
public information programs (e.g., NCATS GARD program). Applying data science
methods to these alternative data sets could not only offer additional insights
into health research topics but also improve internal processes within the
organization.
Recommendation: Create a catalog of data sources beyond research data sets
that exist in various institutes, centers, and divisions, along with key
questions/problems that could be addressed or explored with data science
methods. Such a catalog could advance the mission of the agency.
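As a rough illustration of what such a catalog might hold, the sketch below models one catalog entry and groups entries by type. The class `DataSourceEntry`, its fields, the helper `by_category`, the "example office" owner, and the sample question are all hypothetical; the four data sources are the examples named under this theme.

```python
from dataclasses import dataclass, field


@dataclass
class DataSourceEntry:
    """One row in a hypothetical catalog of non-research data sources."""
    name: str
    owner: str      # institute, center, or division holding the data
    category: str   # e.g., "administrative", "research products", "survey"
    questions: list = field(default_factory=list)  # problems staff hope to address


def by_category(catalog):
    """Group entry names by category so gaps and overlaps are easy to spot."""
    grouped = {}
    for entry in catalog:
        grouped.setdefault(entry.category, []).append(entry.name)
    return grouped


catalog = [
    DataSourceEntry(
        "Grants administration data",
        "example office",  # placeholder owner, not a real office name
        "administrative",
        ["Illustrative question: where do application reviews slow down?"],
    ),
    DataSourceEntry("PubMed", "NLM", "research products"),
    DataSourceEntry("HINTS", "NCI", "survey"),
    DataSourceEntry("GARD", "NCATS", "public information"),
]

print(by_category(catalog))
```

Pairing each source with the questions staff want answered keeps the catalog tied to problems rather than to data for its own sake.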
Theme #5: Need for Diverse Data Science Training Opportunities
Summer internships or projects are relatively short-term in duration and give
students an introduction to biomedical and behavioral research. However, they
may not be optimal for getting students the in-depth knowledge and experience
for making biomedical/behavioral data science a career choice. Longer-term
training projects (e.g., capstone projects) offer more in-depth experiences, but
these projects and opportunities tend to be less centralized at NIH, and are
typically made available within each NIH IC and/or NIH research project. A
multi-module approach is needed, one that engages universities and partners
with as much flexibility as possible in order to set up data science training
programs for success. In addition, the incentives for students to participate in
training opportunities may be diverse. Monetary stipends offer one incentive for
students to apply for training opportunities. Another incentive entails offering
course credit as part of the academic program for data science projects.
Recommendation: Build on current university and partner relationships to
establish and offer multi-module training and intern opportunities, aligned with
specific learner incentives.
Implications
The themes derived from this workshop were generated with the NIH environment
in mind. However, the concepts and themes have relevance to other federal
agencies, organizations offering data science training opportunities, and the
overall data science community.
Data Science Recognition and Awareness: Staff within federal agencies
and other organizations may have neither the knowledge nor the expertise to
apply data science methods to highly valuable information. Establishing a common
understanding of data science across the community would be helpful. It may
be more useful to focus on specifying methods that are relevant and pertinent
to data science, rather than establishing a consensus definition of data science
(see Atienza and Berman article 2018). Highlighting existing data science projects
that utilize these methods could help inform interested organizations and staff on
what data science methods are best suited to new or future projects. Moreover,
elucidating data science methods that can be applied generally regardless of
content area, versus methods that are specific to a particular content area (e.g.,
GWAS analyses in genomics research) would help match analytic skill sets with
project needs.
Finding Engaged Mentors: Researchers, program administrative staff, professors,
and data science professionals often have multiple work demands. For data
science training opportunities to be successful, and for students to benefit from
the experiences, identified mentors (in all organizations involved) need to be
actively engaged in various aspects of the training. This brings up the question
about incentives for mentors to choose to participate in data science training
opportunities. For some, training may be an essential part of their job description.
For others, training may be in addition to their primary work roles. Still others
may wish to engage in mentoring (and be excellent mentors), but work demands
preclude them from carving out time to participate in training. While finding
mentors is critical to the success of training programs, the organizational and
individual incentives (and resources) for staff to pursue mentorship activities
need to be examined and delineated.
Coordinating Opportunities: As noted with the NIH environment, various institutes,
centers, and departments have specific missions and separate budgets, and can
operate in silos. This is not unique to NIH, as other federal agencies and industry
organizations also have specific operating budgets. However, coordinating
data science opportunities across large multi-department organizations can
afford benefits with respect to administrative efficiency, synergistic activities,
and enhanced sharing of ideas. However, such coordination likely requires dedicated
resources and senior leadership commitment to make inter-departmental data
science training a priority.
For federal government agencies, decision makers must not only examine
the cost-benefit aspects of coordinating training opportunities, but also the
regulatory and administrative rules governing data and/or the operation of the
agency. For agencies with a strong research mission (e.g., AHRQ, CDC, NIH, NSF),
human subjects research protection policies (e.g., IRB, OMB) must be considered
when addressing data availability. For agencies with a strong clinical focus (e.g.,
CMS, HRSA, VA), policies and regulations around patient privacy (e.g., HIPAA) need
to be addressed with respect to health information access. Other agencies (e.g.,
DoD, DoJ, DHS) have various datasets that require security clearances to access
and analyze the information. An in-depth discussion of privacy, security, rules
and regulations around data access is well beyond the scope of this paper.
However, these issues must all be considered as agencies consider the various
datasets available to trainees and how to best coordinate the sharing and
analyses of this data.
Alternative Sources of Data: As noted above, not all government agencies have
a primary research mission. Thus, exploring other sources of data to which data
science methods can be applied may be even more important for other agencies
that wish to develop training programs. Application of data science methods
to administrative data could provide insight on how to make administrative
processes more efficient. Information curated for the public by various agencies
may also be of interest to data scientists, particularly if it is “big data”. Even an
organization’s website and/or social media data may be useful for analysis using
data science methods. The types of data that could be made available for data
scientists will depend on the mission of each organization and the information
each systematically collects. However, datasets that could be examined or explored
with data science methods are not simply limited to “research data” collected by
“research organizations”.
Diverse Training Opportunities: Data science jobs are proliferating in the finance
and insurance, professional services, and information technology sectors, with
other fields lagging behind. To attract student talent to the government and/or
biomedical/behavioral research, diverse training opportunities will be important
to offer. In this highly competitive landscape, the demand for data scientists
currently exceeds the supply. While some students may be intrinsically
motivated to pursue careers in public service and/or biomedical/behavioral
research, a significant number of students will be lured to the lucrative offerings
in other sectors.
Having training opportunities that introduce data science students to important
work in public service and/or biomedical/behavioral research and offering a
variety of career tracks into these sectors will be important.
The inter-relationship of data sharing and data science training: One additional
note regarding data sharing and data science training opportunities is warranted.
It is not beyond the realm of possibility for public-private partnerships to develop
with data science training as a key component of the partnership. The sharing of data
among the respective organizations could offer exciting possibilities for students
to analyze merged datasets. Data sharing agreements would be required in such
collaborations.
The State of Virginia passed the Government Data Collection & Dissemination
Practices Act in 2017, which provides guidelines for data sharing and rights of data
subjects and can be used as a general model for future data sharing activities. In
addition, the National Academies 2018 publication puts forth a “data science oath”
and refers to another set of (evolving) principles on the ethics of data sharing,
published by datapractices.org. Delving deeper into these documents, guidelines
and principles could be instructive for establishing data sharing guidelines and
procedures for organizations that wish to collaborate on data science training
partnerships.
Discussion
ICF’s workshop on training opportunities and workforce development for data
science and biomedical/behavioral research generated a number of key themes.
This White Paper outlines these themes and discusses the implications of these
themes as they apply to the broader data science community. The workshop
highlighted student training opportunities at NIH. Other aspects of data science
training and workforce development that deserve further exploration include:
seminar/webinar development, course offerings in various sectors of the
ecosystem, opportunities for data science professionals who wish to transition
to the public sector and/or biomedical/behavioral research, and ethical and
business issues to address with data science partnerships that involve data
science training. With respect to the development of future data science training
programs and opportunities, addressing each of the themes discussed in this
paper would enable the public sector, the private sector, and academia to craft
programs that simultaneously meet the needs of students, mentors, and organizations
while driving federal agency mission success via sound data science research methods.
References
National Science Foundation (2016). Realizing the potential of data science.
Available at: https://www.nsf.gov/cise/ac-data-science-report/CISEACDataScienceReport1.19.17.pdf
National Institutes of Health (2018). NIH strategic plan for data science.
Available at: https://datascience.nih.gov/sites/default/files/NIH_Strategic_Plan_for_Data_Science_Final_508.pdf
National Institutes of Health (2012). Data and informatics working group report
to The Advisory Committee to the Director.
Available at: https://acd.od.nih.gov/documents/reports/DataandInformaticsWorkingGroupReport.pdf
National Academies of Sciences, Engineering, and Medicine (2018). Data
science for undergraduates: Opportunities and options.
Available at: http://sites.nationalacademies.org/cstb/currentprojects/cstb_175246
National Institutes of Health (2017). A platform for biomedical discovery and
data-powered health: National Library of Medicine strategic plan 2017-2027.
Available at: https://www.nlm.nih.gov/pubs/plan/lrp17/NLM_StrategicReport2017_2027.pdf
Markow, W., Braganza, S., Taska, B., Miller, S. M., and Hughes, D. (2017). The quant
crunch: How the demand for data science skills is disrupting the job market.
Available at: https://www.burning-glass.com/wp-content/uploads/The_Quant_Crunch.pdf
McKinsey Global Institute (2011). Big data: The next frontier for innovation,
competition and productivity.
Available at: https://www.mckinsey.com/business-functions/digital-mckinsey/our-insights/big-data-the-next-frontier-for-innovation
McKinsey Global Institute (2016). The age of analytics: Competing in a data-driven
world.
Available at: https://www.mckinsey.com/business-functions/mckinsey-analytics/our-insights/the-age-of-analytics-competing-in-a-data-driven-world
The State of Virginia (2017). Government Data Collection & Dissemination Practices
Act, Chapter 38 of Title 2.2 of the Code of Virginia (§ 2.2-3800 et seq.).
Available at: http://dls.virginia.gov/commission/Materials/GDCDPA.pdf
Bloomberg’s Data for Good Exchange (2017). Community principles on ethical data
sharing.
Available at: https://datapractices.org/community-principles-on-ethical-data-sharing/
Appendix A:
Workshop Attendees
(Name; title; affiliation)
Audie A. Atienza, PhD; Senior Fellow; ICF
Richard Ballew, PhD; Senior Business Analyst; ICF
Christopher Barrett, PhD; Executive Director; Biocomplexity Institute of Virginia Tech
Andy Baxevanis, PhD; Director of Computational Biology; George Mason University
Diane Ben-Senia; Science Writer; ICF
Brett Berlin, PhD; Adjunct Professor, Computer Science; George Mason University
Lew Berman, PhD, MS; Vice President; ICF
Philip E. Bourne, PhD; Professor/Director, Data Science Institute; University of Virginia
Regina Bures, PhD; Health Scientist Administrator; NIH/NICHD
Joanne Campbell; Program Coordinator; George Washington University
Elaine Collier, MD; Senior Advisor; NIH/NCATS
Richard Conroy, PhD, MBA; Program Leader, NIH Common Fund; NIH OD
Leslie Derr, PhD; Program Leader, NIH Common Fund; NIH OD
Lisa Federer, MLIS; Informationist; NIH/NLM
Erin Fitzgerald, PhD; Director of National Research Initiatives; University of Maryland
Elizabeth Ginexi, PhD; Health Scientist Administrator; NIH OD/OBSSR
William Hazel, MD; Senior Advisor for Strategic Initiatives; George Mason University
Michael Huerta, PhD; Associate Director; NIH/NLM
Jane Lockmuller, PhD; Special Advisor; NIH/NIAID
Jessica Mazerik, PhD; Health Scientist Administrator/Special Asst; NIH OD
Michele McGuirl, PhD; Health Scientist Administrator; NIH/NCI
Richard Moser, PhD; Health Scientist Administrator; NIH/NCI
Andrea Norris, PhD; Director and CIO; NIH OD/CIT
Nick Weber, MBA; Program Manager, Cloud Services; NIH OD/CIT
Brian Wright, PhD; Co-Director, Data Science Institute; George Washington University
Appendix B:
Workshop Agenda
Biomedical and Behavioral Data Sciences
The purpose of this workshop is to discuss a proposed data science consortium
aimed at data science training and workforce development in biomedical
and behavioral research. Attendees will discuss needs, opportunities, and
challenges in data science training, as well as the development of a data science
consortium to enhance and accelerate data science research in the biomedical
and behavioral areas.
Agenda
7:45 a.m. Registration
8:30 a.m. Welcome and Introductions
    Dr. Audie Atienza, ICF
9:00 a.m. Workshop Purpose
    Dr. Philip Bourne, University of Virginia
9:15 a.m. Lightning talks (5 minutes each)
    Moderator: Dr. Lew Berman, ICF
    Topic: Data Science Talent and Opportunities
    • Dr. Brett Berlin, George Mason University
    • Dr. Brian Wright, George Washington University
    • Dr. Erin Fitzgerald, University of Maryland
    • Dr. Phil Bourne, University of Virginia
    • Dr. Chris Barrett, Virginia Polytechnic Institute and State University
    • Dr. Greg Eley, INOVA
    • Dr. Jessica Mazerik, National Institutes of Health
9:50 a.m. Discussion of the breakout groups' goals and deliverables
    Dr. Philip Bourne, University of Virginia
10:00 a.m. Break
10:15 a.m. Breakout groups
    • Behavioral data science
      Facilitator: Dr. Brian Wright, GWU
    • Biomedical data science
      Facilitator: Dr. Philip Bourne, UVA
    • Process for assigning students to specific projects
      Facilitator: Dr. Audie Atienza, ICF
11:45 a.m. Break
12:00 noon Report Out from Breakout Groups
    Dr. Brett Berlin, GMU
12:30 p.m. Discussion and Next Steps
    Drs. Audie Atienza & Philip Bourne
1:00 p.m. Meeting Adjourned
About the Authors
Dr. Audie Atienza is a Senior Fellow at ICF currently working in
support of ICF’s growing clinical and biomedical informatics
areas, developing strategies and applying broad ICF subject
matter expertise to clients’ largest data analytics challenges. He
provides expertise in the areas of health information technology
innovation, data science, real-time data capture, research
methodology, disease prevention, behavioral science, and health policy.
Dr. Atienza has more than 15 years of experience leading initiatives across the
U.S. Department of Health and Human Services (HHS) addressing major federal data
challenges. Prior to joining ICF, Dr. Atienza served as Senior Advisor to the Associate
Director for Data Science at the National Institutes of Health, and Behavioral
Scientist at the National Cancer Institute. He also previously served as Senior
Advisor to the Chief Technology Officer at HHS, where he led health technology
initiatives in collaboration with the White House, the Office of the U.S. Surgeon
General, and the Office of the National Coordinator for Health Information Technology.
Dr. Atienza holds a B.A. in psychology from the University of California at San
Diego and a Ph.D. in clinical psychology from Kent State University. He completed
his clinical psychology internship with a specialty in behavioral medicine at the
Palo Alto VA, and was selected as a post-doctoral research fellow at the Stanford
University School of Medicine.
Dr. Lewis Berman is a Vice President with ICF, providing
professional services to National Institutes of Health (NIH) and
Centers for Disease Control and Prevention (CDC) clients in
health data collection and survey research. Dr. Berman also
serves as the ICF Institutional Official for Human Subjects
Research Protection. He leads research and development (R&D)
efforts in data science, health data brokering, nonprobability designs, improving
the efficiency and quality of health surveys, and voice recognition cognitive
computing. Previously, Dr. Berman served as the CDC National Health and Nutrition
Examination Survey (NHANES) Deputy Director and held positions at the National
Library of Medicine and the Naval Research Laboratory. During his career, he has
contributed to prominent national public health and defense programs including
NHANES, National Health Interview Survey (NHIS), National Children’s Study (NCS),
New York City HANES I, Survey of the Health of Wisconsin, and the Oregon Health
Insurance Experiment. He has conducted research and development in public
health, informatics, imaging, and survey research.
Dr. Berman holds a PhD in Computer Science from George Washington University,
an MS in Computer Science / Artificial Intelligence from George Washington
University, and a BS in Computer Science from the University of Maryland.
Visit us at icf.com/health
Any views or opinions expressed in this white
paper are solely those of the author(s) and do
not necessarily represent those of ICF. This white
paper is provided for informational purposes only
and the contents are subject to change without
notice. No contractual obligations are formed
directly or indirectly by this document. ICF MAKES
NO WARRANTIES, EXPRESS, IMPLIED, OR STATUTORY,
AS TO THE INFORMATION IN THIS DOCUMENT.
No part of this document may be reproduced
or transmitted in any form, or by any means
(electronic, mechanical, or otherwise), for any
purpose without prior written permission.
ICF and ICF INTERNATIONAL are registered
trademarks of ICF and/or its affiliates. Other
names may be trademarks of their respective
owners.
For more information, contact:
Dr. Audie Atienza
Audie.Atienza@icf.com +1.301.5720.5361
Dr. Lewis Berman
Lewis.Berman@icf.com +1.301.407.6833
twitter.com/icfi
linkedin.com/company/icf-international
facebook.com/ICFInternational/
About ICF
ICF (NASDAQ:ICFI) is a global consulting services company with over 5,000 specialized
experts, but we are not your typical consultants. At ICF, business analysts and policy
specialists work together with digital strategists, data scientists and creatives. We
combine unmatched industry expertise with cutting-edge engagement capabilities to
help organizations solve their most complex challenges. Since 1969, public and private
sector clients have worked with ICF to navigate change and shape the future. Learn
more at icf.com.

Data Science Training and Workforce Development

  • 1. icf.com ©Copyright 2018 ICF 11 Executive Summary Overall, the demand for data science talent is outpacing the current supply, and many who are being trained in data science methods are pursuing careers in sectors other than public service or biomedical/behavioral research. On June 6, 2018 ICF hosted a workshop with participants from academia, industry organizations and the federal government. The following themes emerged when discussing data science training and workforce development.  Organizations interested in incorporating data science need to increase knowledge and/or expertise among staff of data science methods and how they can be implemented to solve problems.  Identifying engaged mentors is critical for the success of data science training programs.  Coordination of data science training opportunities across departments of a large organization may be challenging. But, coordination offers several benefits for the organization and the workforce.  Organizations may have a variety of data sources amenable to analysis using data science methods, beyond “research datasets”. Importantly, organizations need to generate questions or problems they wish to address using data science methods with the various datasets available.  Diverse training opportunities are needed to attract data science talent to projects in public service and/or biomedical/behavioral research. Incentives to encourage data science talent to pursue work in the public sector should be explored. Novel partnerships between the public sector, academia and industry should also be explored to meet the increasing demand for data science in all three sectors. As interest in employing data science methods expands in federal government agencies and other organizations, how to structure data science training opportunities and improve data science skills in the workforce of the organization will become more salient. Data Science Training and Workforce Development By Audie A. 
Atienza, PhD, and Lewis Berman, PhD, MS, ICF Acknowledgments: We wish to thank the workshop participants for their thoughtful discussions during the meeting. We also express much gratitude to Philip Bourne, Brian Wright, Jessica Mazerik, Debra Tomchek, and Jeffrey Neal1 for their insightful comments on earlier drafts of this paper. 1 Debra Tomchek and Jeffrey Neal are at ICF and did not attend the workshop. icf.com White Paper Public Health
  • 2. icf.com ©Copyright 2018 ICF 2 Data Science Training and Workforce Development Introduction ICF held a workshop - Biomedical and Behavioral Data Sciences Regional Consortium Planning Workshop (June 6, 2018; Bethesda, MD) - with participants from academia, industry organizations and the federal government (i.e., NIH) located in the Maryland, Virginia, and Washington, DC area to discuss data science training and workforce development in biomedical and behavioral research (see Appendix A for participants). The purpose of this White Paper is two-fold: 1) Discuss several of the key workshop themes focused on biomedical and behavioral research. 2) Discuss the implications of these themes for the broader data science community. The Challenge National Institutes of Health, and other federal organizations, are experiencing an increasing demand for data scientists. However, there are multiple factors which have led to a dearth of data scientists within federal government workforce. Underlying factors include: The demand for data science research is growing—Several influential research organizations have produced reports focused on Data Science. In December 2016, The National Science Foundation published an advisory committee report – Realizing the Potential of Data Science. In 2018, the National Institutes of Health (NIH) released their Strategic Plan for Data Science, one of the key recommendations of the 2012 Data and Informatics Working Group Report to the Advisory Committee to the NIH Director. In 2018, the National Academies of Sciences, Engineering, and Medicine (NASEM) published a consensus study report - Data Science for Undergraduates: Opportunities and Options. This growing demand is notably illustrated by interest in data science in biomedical research. An analysis conducted by Dr. 
Audie Atienza, of ICF, on the various NIH Strategic Plans (as of January 2017) found that 22 out of the 25 NIH Institutes and Center (ICs) with publicly available strategic plans highlighted at least one data science term (terms: “data science”, “big data”, “common data”, “bioinformatics”, “computational biology”, and “data integration”) within their respective strategic plans. Of all the data science terms examined, “bioinformatics” and “big data” had the highest number of instances (most mentions) at 75 and 44, respectively. [Data available from the author, Dr. Audie Atienza, upon request]. Thus, Data Science is strategically important for most all of the NIH ICs. Of particular note in recent NIH documents focused on data science (e.g., NIH Strategic Plan for Data Science; A Platform for Biomedical Discovery and Data-Powered Health: National Library of Medicine Strategic Plan 2017-2027) is the emphasis on expanding data science skills within NIH’s internal workforce, which will require novel strategies to train existing staff and attract new data science talent. There is an insufficient supply of data scientists to meet demand—This need for more data science skills in biomedical/behavioral research reflects a broader view that, in all sectors, supply for data science talent is significantly lagging
  • 3. icf.com ©Copyright 2018 ICF 3 Data Science Training and Workforce Development behind demand, a gap that is only getting wider (Markow et al., 2017 - Burning Glass Technologies; McKinsey Global Institute, 2011, 2016; NASEM, 2018). It has been reported that 59% of data science jobs are in the finance and insurance, professional services, and information technology sectors (Burning Glass Technologies, 2017), with healthcare representing only a small percentage (5-8%) of data science jobs. Biomedical and behavioral research was not discussed in this report, but it is safe to say that the competition for data science talent in other sectors besides health care and biomedical/behavioral research is fierce, with data science salaries in the top three sectors averaging $80,000+, and exceeding six figures for PhD level Data Scientists and Data Engineers. How do we solve this problem? Getting more people trained and interested in biomedical/behavioral data science will be critical to meet the increasing need for data science expertise in the health research workforce. It was with this perspective in mind that we organized a workshop to discuss biomedical / behavioral data science training and workforce development, with a particular focus on establishing opportunities and training experiences for data science students. Workshop Structure (see Appendix B for Workshop Agenda) On June 6, 2018, ICF conducted a workshop to convene a select group of biomedical leaders, data scientists, industry participants, and academics to examine the problem and recommend solutions. We structured the workshop in two main parts. During the first part of the workshop, academic institutions, industry organizations, and the federal government (i.e., NIH) highlighted the data science talent, resources, programs, and/or opportunities at their respective institutions. 
During the second part of the workshop, participants were divided into three breakout groups to brainstorm ideas, concepts and issues in three main areas: 1) behavioral data science, 2) biomedical data science, and 3) processes for assigning students to data science training opportunities. Each breakout group included individuals from academia, industry, and government (i.e., NIH) to provide different perspectives. For the behavioral and biomedical data science breakout groups, facilitators were instructed to “have the group generate ideas for behavioral or biomedical exemplar projects where the inclusion of data science students could significantly enhance or accelerate the research. The projects can be existing data sets/projects, data sets/projects currently in the process of being collected, and/or research areas of high impact that could benefit from data science.” For the “processes for assigning students to data science training opportunities” workgroup, facilitators were instructed to “have the group discuss models for matching students with opportunities, as well as, potential challenges in the process.” A scientific writer took notes during the workshop. In addition, breakout groups used flip chart paper and/or dry erase boards to generate and organize their ideas. Following the concurrent breakout group sessions, breakout group ideas were shared with all participants and a general group discussion generated
  • 4. icf.com ©Copyright 2018 ICF 4 Data Science Training and Workforce Development additional ideas and comments. Themes were derived from reviewing all information gathered and workshop participants had opportunities to review, modify, challenge and/or add to the themes. Key Themes From the discussions among data science and biomedical/behavioral research experts at this workshop, several key themes emerged. Theme #1: Data Science Recognition and Awareness While NIH has recently emphasized “Data Science” as a priority area for development (i.e., NIH Strategic Plan for Data Science), NIH research scientists and program administrators with existing or developing datasets may not have the same level of understanding of the potential benefits that data science methods can afford, and there is variability in the level of expertise to employ those methods. Without knowledge and expertise of the various types of data science methods that can be applied, scientists and staff could miss some key opportunities to further explore their datasets. Recommendation: Provide concrete examples of data science projects to relevant scientists and staff to increase their knowledge of ways to incorporate data science methods in various programs and projects. Theme #2: Finding Engaged Mentors The importance of identifying and periodically assessing research mentors was a theme repeated throughout the workshop. Mentors need to understand the needs of their respective students and have data readily available for analysis. In addition, a process for assessing how things are proceeding in the project was noted as critical. Others noted that it would be important to identify mentors (or points of contact) at the organizations offering data science training opportunities (e.g., internships, capstone projects, etc.), as well as, the academic institutions who provide students. 
Recommendation: For training programs to work well, mentors should: 1) be engaged, 2) have data readily available at the start of the project for analysis, and 3) be matched appropriately based on student characteristics/profiles. Developing standardized processes or criteria for assessing the readiness of mentors, as well as, fit with available mentees on a periodic basis will be critical. Theme #3: Coordinating Opportunities Across Federal Government Departments / Institutes NIH has existing infrastructure for certain research training programs and internships, particularly in the intramural research program. However, coordination of the various programs and internships across the 27 NIH ICs could be enhanced. With each IC having their own respective missions and separate budgets, coordinating activities across ICs can be a challenge. Yet, coordination
  • 5. icf.com ©Copyright 2018 ICF 5 Data Science Training and Workforce Development among opportunities could prevent duplication of efforts, provide consistency across programs, and help the various programs learn from each other. It was also suggested that, ideally, students should belong to a cohort of students that are linked by common experiences. Recommendation: Agency leaders should systematically evaluate the benefits and challenges of coordinating data science training opportunities across various departments/institutes. Based on this evaluation, the agency can then identify potential current and future training activities where the coordination may maximize the benefits to and synergistically enhance the respective missions of various institutes, centers, or departments. Theme #4: Alternative Sources of Data. Existing intramural and extramural biomedical and behavioral research datasets have been the primary focus of data science efforts thus far at NIH (e.g., NIH Data Sharing Repositories, NIH Data Commons). However, other sources of data could also be made available for analysis by data science mentees. These alternative data sources include: Program administrative data (e.g., grants administration data), research products data (e.g., PubMed), survey data (e.g., NCI HINTS) and public information programs (e.g., NCATS GARD program). Applying data science methods to these alternative data sets could offer additional insights not only with health research topics, but also improve internal processes within the organization. Recommendation: Creating a catalog of data sources, outside of research data sets, that exist in various institutes, centers and divisions, as well as, key questions/problems that could be addressed or explored with data science methods could advance the mission of the agency. Theme #5: Need for Diverse Data Science Training Opportunities. 
Summer internships or projects are relatively short-term in duration and give students an introduction to biomedical and behavioral research. However, they may not be optimal for getting students the in-depth knowledge and experience for making biomedical/behavioral data science a career choice. Longer-term training projects (e.g., capstone projects) offer more in-depth experiences, but these projects and opportunities tend to be less centralized at NIH, and are typically made available within each NIH IC and/or NIH research project. A multi- module approach is needed with the perspective that engaging universities and partners with as much flexibility as possible to set up data science training programs for success. In addition, the incentive for students to participate in training opportunities may be diverse. Monetary stipends offer one incentive for students to apply for training opportunities. Another incentive entails offering course credit as part of the academic program for data science projects. Recommendation: Build on current university and partner relationships to establish and offer multi-module training and intern opportunities, aligned with specific learner incentives.
  • 6. icf.com ©Copyright 2018 ICF 6 Data Science Training and Workforce Development Implications The themes derived from this workshop were generated with the NIH environment in mind. However, the concepts and themes have relevance to other federal agencies, organizations offering data science training opportunities, and the overall data science community. Data Science Recognition and Awareness: Staff within Federal agencies and other organizations may not have the knowledge nor expertise to apply data science methods to highly valuable information. Establishing a common understanding of data science across the community would be helpful. It may be more useful to focus on specifying methods that are relevant and pertinent to data science, rather than establishing a consensus definition of data science (see Atienza and Berman article 2018). Highlighting existing data science projects that utilize these methods could help inform interested organizations and staff on what data science methods are best suited to new or future projects. Moreover, elucidating data science methods that can be applied generally regardless of content area, versus methods that are specific to a particular content area (e.g., GWAS analyses in genomics research) would help match analytic skill sets with project needs. Finding Engaged Mentors: Researchers, program administrative staff, professors, and data science professionals often have multiple work demands. For data science training opportunities to be successful, and for students to benefit from the experiences, identified mentors (in all organizations involved) need to be actively engaged in various aspects of the training. This brings up the question about incentives for mentors to choose to participate in data science training opportunities. For some, training may be an essential part of their job description. For others, training may be in addition to their primary work roles. 
Still others may wish to engage in mentoring (and be excellent mentors), but work demands preclude them from carving out time to participate in training. While finding mentors is critical to the success of training programs, the organizational and individual incentives (and resources) for staff to pursue mentorship activities need to be examined and delineated. Coordinating Opportunities: As noted with the NIH environment, various institutes, centers, and departments have specific missions and separate budgets, and can operate in silos. This is not unique to NIH, as other federal agencies and industry organizations also have specific operating budgets. However, coordinating data science opportunities across large multi-department organizations can afford benefits with respect to administrative efficiency, synergistic activities, and enhanced sharing of ideas. But, such coordination likely requires dedicated resources and senior leadership commitment to make inter-departmental data science training a priority. For federal government agencies, decision makers must not only examine the cost-benefit aspects of coordinating training opportunities, but also the regulatory and administrative rules governing data and/or the operation of the agency. For agencies with a strong research mission (e.g., AHRQ, CDC, NIH, NSF),
  • 7. icf.com ©Copyright 2018 ICF 7 Data Science Training and Workforce Development human subjects research protection policies (e.g., IRB, OMB) must be considered when addressing data availability. For agencies with a strong clinical focus (e.g., CMS, HRSA, VA), policies and regulations around patient privacy (e.g., HIPAA) need to be addressed with respect to health information access. Other agencies (e.g., DoD, DoJ, DHS) have various datasets that require security clearances to access and analyze the information. An in-depth discussion of privacy, security, rules and regulations around data access is well beyond the scope of this paper. However, these issues must all be considered as agencies consider the various datasets available to trainees and how to best coordinate the sharing and analyses of this data. Alternative Sources of Data: As noted above, not all government agencies have a primary research mission. Thus, exploring other sources of data to which data science methods can be utilized may be even more important for other agencies who wish to develop training programs. Application of data science methods to administrative data could provide insight on how to make administrative processes more efficient. Information curated for the public by various agencies may also be of interest to data scientists, particularly if it is “big data”. Even an organization’s website and/or social media data may be useful for analysis using data science methods. The types of data that could be made available for data scientists will depend on the mission of each organization and the information each systematically collects. But, datasets that could be examined or explored with data science methods are not simply limited to “research data” collected by “research organizations”. Diverse Training Opportunities: Data science jobs are proliferating in the finance and insurance, professional services, and information technology sectors, with other fields lagging behind. 
To attract student talent to the government and/or biomedical/behavioral research, diverse training opportunities will be important to offer. In this highly competitive landscape, the demand for data scientists is currently greater than the supply side. While some students may be intrinsically motivated to pursue careers in public service and/or biomedical/behavioral research, a significant number of students will be lured to the lucrative offerings in other sectors. Having training opportunities that introduce data science students to important work in public service and/or biomedical/behavioral research and offering a variety of career tracks into these sectors will be important. The inter-relationship of data sharing and data science training: One additional note regarding data sharing and data science training opportunities. It is not beyond the realm of possibilities for public-private partnerships to develop with data science training as a key component to the partnership. The sharing of data among the respective organizations could offer exciting possibilities for students to analyze merged datasets. Data sharing agreements would be required in such collaborations.
  • 8. The State of Virginia passed the Government Data Collection & Dissemination Practices Act in 2017, which provides guidelines for data sharing and the rights of data subjects and can serve as a general model for future data sharing activities. In addition, the National Academies’ 2018 publication puts forth a “data science oath” and refers to another set of (evolving) principles on the ethics of data sharing, published by datapractices.org. Delving deeper into these documents, guidelines and principles could be instructive for organizations that wish to establish data sharing guidelines and procedures for data science training partnerships.
Discussion
ICF’s workshop on training opportunities and workforce development for data science and biomedical/behavioral research generated a number of key themes. This White Paper outlines these themes and discusses their implications for the broader data science community. The workshop highlighted student training opportunities at NIH. Other aspects of data science training and workforce development that deserve further exploration include: seminar/webinar development, course offerings in various sectors of the ecosystem, opportunities for data science professionals who wish to transition to the public sector and/or biomedical/behavioral research, and ethical and business issues that arise in data science partnerships involving data science training. With respect to future data science training programs and opportunities, addressing each of the themes discussed in this paper would enable the public sector, the private sector, and academia to craft programs that meet the needs of students, mentors and organizations while driving federal agency mission success via sound data science research methods.
  • 9. References
National Science Foundation (2016). Realizing the potential of data science. Available at: https://www.nsf.gov/cise/ac-data-science-report/CISEACDataScienceReport1.19.17.pdf
National Institutes of Health (2018). NIH strategic plan for data science. Available at: https://datascience.nih.gov/sites/default/files/NIH_Strategic_Plan_for_Data_Science_Final_508.pdf
National Institutes of Health (2012). Data and informatics working group report to The Advisory Committee to the Director. Available at: https://acd.od.nih.gov/documents/reports/DataandInformaticsWorkingGroupReport.pdf
National Academies of Sciences, Engineering, and Medicine (2018). Data science for undergraduates: Opportunities and options. Available at: http://sites.nationalacademies.org/cstb/currentprojects/cstb_175246
National Institutes of Health (2017). A platform for biomedical discovery and data-powered health: National Library of Medicine strategic plan 2017-2027. Available at: https://www.nlm.nih.gov/pubs/plan/lrp17/NLM_StrategicReport2017_2027.pdf
Markow, W., Braganza, S., Taska, B., Miller, S. M., and Hughes, D. (2017). The quant crunch: How the demand for data science skills is disrupting the job market. Available at: https://www.burning-glass.com/wp-content/uploads/The_Quant_Crunch.pdf
McKinsey Global Institute (2011). Big data: The next frontier for innovation, competition and productivity. Available at: https://www.mckinsey.com/business-functions/digital-mckinsey/our-insights/big-data-the-next-frontier-for-innovation
McKinsey Global Institute (2016). The age of analytics: Competing in a data-driven world. Available at: https://www.mckinsey.com/business-functions/mckinsey-analytics/our-insights/the-age-of-analytics-competing-in-a-data-driven-world
The State of Virginia (2017). Government Data Collection & Dissemination Practices Act, Chapter 38 of Title 2.2 of the Code of Virginia (§ 2.2-3800 et seq.). Available at: http://dls.virginia.gov/commission/Materials/GDCDPA.pdf
Bloomberg’s Data for Good Exchange (2017). Community principles on ethical data sharing. Available at: https://datapractices.org/community-principles-on-ethical-data-sharing/
  • 10. Appendix A: Workshop Attendees
Audie A. Atienza, PhD; Senior Fellow; ICF
Richard Ballew, PhD; Senior Business Analyst; ICF
Christopher Barrett, PhD; Executive Director; Biocomplexity Institute of Virginia Tech
Andy Baxevanis, PhD; Director of Computational Biology; George Mason University
Diane Ben-Senia; Science Writer; ICF
Brett Berlin, PhD; Adjunct Professor, Computer Science; George Mason University
Lew Berman, PhD, MS; Vice President; ICF
Philip E. Bourne, PhD; Professor/Director, Data Science Institute; University of Virginia
Regina Bures, PhD; Health Scientist Administrator; NIH/NICHD
Joanne Campbell; Program Coordinator; George Washington University
Elaine Collier, MD; Senior Advisor; NIH/NCATS
Richard Conroy, PhD, MBA; Program Leader, NIH Common Fund; NIH OD
Leslie Derr, PhD; Program Leader, NIH Common Fund; NIH OD
Lisa Federer, MLIS; Informationist; NIH/NLM
Erin Fitzgerald, PhD; Director of National Research Initiatives; University of Maryland
Elizabeth Ginexi, PhD; Health Scientist Administrator; NIH OD/OBSSR
William Hazel, MD; Senior Advisor for Strategic Initiatives; George Mason University
Michael Huerta, PhD; Associate Director; NIH/NLM
Jane Lockmuller, PhD; Special Advisor; NIH/NIAID
Jessica Mazerik, PhD; Health Scientist Administrator/Special Asst; NIH OD
Michele McGuirl, PhD; Health Scientist Administrator; NIH/NCI
Richard Moser, PhD; Health Scientist Administrator; NIH/NCI
Andrea Norris, PhD; Director and CIO; NIH OD/CIT
Nick Weber, MBA; Program Manager, Cloud Services; NIH OD/CIT
Brian Wright, PhD; Co-Director, Data Science Institute; George Washington University
  • 11. Appendix B: Workshop Agenda
Biomedical and Behavioral Data Sciences
The purpose of this workshop is to discuss a proposed data science consortium aimed at data science training and workforce development in biomedical and behavioral research. Attendees will discuss needs, opportunities, and challenges in data science training, as well as the development of a data science consortium to enhance and accelerate data science research in the biomedical and behavioral areas.
Agenda
7:45 a.m. Registration
8:30 a.m. Welcome and Introductions. Dr. Audie Atienza, ICF
9:00 a.m. Workshop Purpose. Dr. Philip Bourne, University of Virginia
9:15 a.m. Lightning talks (5 minutes each). Moderator: Dr. Lew Berman, ICF. Topic: Data Science Talent and Opportunities
 Dr. Brett Berlin, George Mason University
 Dr. Brian Wright, George Washington University
 Dr. Erin Fitzgerald, University of Maryland
 Dr. Phil Bourne, University of Virginia
 Dr. Chris Barrett, Virginia Polytechnic Institute and State University
 Dr. Greg Eley, INOVA
 Dr. Jessica Mazerik, National Institutes of Health
9:50 a.m. Discussion of the breakout groups’ goals and deliverables. Dr. Philip Bourne, University of Virginia
10:00 a.m. Break
10:15 a.m. Breakout groups
 Behavioral data science. Facilitator: Dr. Brian Wright, GWU
 Biomedical data science. Facilitator: Dr. Philip Bourne, UVA
 Process for assigning students to specific projects. Facilitator: Dr. Audie Atienza, ICF
11:45 a.m. Break
12:00 noon Report Out from Breakout Groups. Dr. Brett Berlin, GMU
12:30 p.m. Discussion and Next Steps. Drs. Audie Atienza & Philip Bourne
1:00 p.m. Meeting Adjourned
  • 12. About the Authors
Dr. Audie Atienza is a Senior Fellow at ICF currently working in support of ICF’s growing clinical and biomedical informatics areas, developing strategies and applying broad ICF subject matter expertise to clients’ largest data analytics challenges. He provides expertise in the areas of health information technology innovation, data science, real-time data capture, research methodology, disease prevention, behavioral science, and health policy. Dr. Atienza has more than 15 years of experience leading initiatives across the U.S. Department of Health and Human Services (HHS) addressing major federal data challenges. Prior to joining ICF, Dr. Atienza served as Senior Advisor to the Associate Director for Data Science at the National Institutes of Health, and Behavioral Scientist at the National Cancer Institute. He also previously served as Senior Advisor to the Chief Technology Officer at HHS, where he led health technology initiatives in collaboration with the White House, the Office of the U.S. Surgeon General, and the Office of the National Coordinator for Health Information Technology. Dr. Atienza holds a B.A. in psychology from the University of California at San Diego and a Ph.D. in clinical psychology from Kent State University. He completed his clinical psychology internship with a specialty in behavioral medicine at the Palo Alto VA, and was selected as a post-doctoral research fellow at the Stanford University School of Medicine.
Dr. Lewis Berman is a Vice President with ICF, providing professional services to National Institutes of Health (NIH) and Centers for Disease Control and Prevention (CDC) clients in health data collection and survey research. Dr. Berman also serves as the ICF Institutional Official for Human Subjects Research Protection.
He leads research and development (R&D) efforts in data science, health data brokering, nonprobability designs, improving the efficiency and quality of health surveys, and voice recognition cognitive computing. Previously, Dr. Berman served as the CDC National Health and Nutrition Examination Survey (NHANES) Deputy Director and held positions at the National Library of Medicine and the Naval Research Laboratory. During his career, he has contributed to prominent national public health and defense programs including NHANES, the National Health Interview Survey (NHIS), the National Children’s Study (NCS), New York City HANES I, the Survey of the Health of Wisconsin, and the Oregon Health Insurance Experiment. He has conducted research and development in public health, informatics, imaging, and survey research. Dr. Berman holds a PhD in Computer Science from George Washington University, an MS in Computer Science / Artificial Intelligence from George Washington University, and a BS in Computer Science from the University of Maryland.
  • 13. Visit us at icf.com/health
Any views or opinions expressed in this white paper are solely those of the author(s) and do not necessarily represent those of ICF. This white paper is provided for informational purposes only and the contents are subject to change without notice. No contractual obligations are formed directly or indirectly by this document. ICF MAKES NO WARRANTIES, EXPRESS, IMPLIED, OR STATUTORY, AS TO THE INFORMATION IN THIS DOCUMENT. No part of this document may be reproduced or transmitted in any form, or by any means (electronic, mechanical, or otherwise), for any purpose without prior written permission. ICF and ICF INTERNATIONAL are registered trademarks of ICF and/or its affiliates. Other names may be trademarks of their respective owners.
For more information, contact:
Dr. Audie Atienza, [email protected], +1.301.5720.5361
Dr. Lewis Berman, [email protected], +1.301.407.6833
twitter.com/icfi
linkedin.com/company/icf-international
facebook.com/ICFInternational/
About ICF
ICF (NASDAQ:ICFI) is a global consulting services company with over 5,000 specialized experts, but we are not your typical consultants. At ICF, business analysts and policy specialists work together with digital strategists, data scientists and creatives. We combine unmatched industry expertise with cutting-edge engagement capabilities to help organizations solve their most complex challenges. Since 1969, public and private sector clients have worked with ICF to navigate change and shape the future. Learn more at icf.com.