Big Data in Digital Healthcare Lessons Learnt and Recommendations
https://doi.org/10.1038/s41437-020-0303-2
REVIEW ARTICLE
Received: 28 June 2019 / Revised: 25 February 2020 / Accepted: 25 February 2020 / Published online: 5 March 2020
© The Author(s) 2020. This article is published with open access
Abstract
Big Data will be an integral part of the next generation of technological developments—allowing us to gain new insights
from the vast quantities of data being produced by modern life. There is significant potential for the application of Big Data
to healthcare, but there are still some impediments to overcome, such as fragmentation, high costs, and questions around data
ownership. Envisioning a future role for Big Data within the digital healthcare context means balancing the benefits of
improving patient outcomes with the potential pitfalls of increasing physician burnout due to poor implementation leading to
added complexity. Oncology, the field where Big Data collection and utilization got a head start with programs like TCGA
and the Cancer Moon Shot, provides an instructive example as we see different perspectives provided by the United States
(US), the United Kingdom (UK) and other nations in the implementation of Big Data in patient care with regards to their
centralization and regulatory approach to data. By drawing upon global approaches, we propose recommendations for
guidelines and regulations of data use in healthcare centering on the creation of a unique global patient ID that can integrate
data from a variety of healthcare providers. In addition, we expand upon the topic by discussing potential pitfalls to Big Data
such as the lack of diversity in Big Data research, and the security and transparency risks posed by machine learning
algorithms.
demonstrated in Biology, as initiatives such as the Genotype-Expression Project are producing enormous quantities of data to better understand genetic regulation (Aguet et al. 2017). Yet, despite these advances, we see few examples of Big Data being leveraged in healthcare, even though it presents opportunities for creating personalized and effective treatments.

Effective use of Big Data in healthcare is enabled by the development and deployment of machine learning (ML) approaches. The terms ML and artificial intelligence (AI) are often used interchangeably. ML and AI only now make it possible to unravel the patterns, associations, correlations and causations in complex, unstructured, non-normalized, and unscaled datasets that the Big Data era brings (Camacho et al. 2018). This allows ML to provide actionable analysis on datasets as varied as sequences of images (applicable in Radiology) or narratives (patient records) using Natural Language Processing (Deng et al. 2018; Esteva et al. 2019), and to bring all these datasets together to generate prediction models, such as the response of a patient to a treatment regimen. Application of ML tools is also supplemented by the now widespread adoption of Electronic Health Records (EHRs) after the passage of the Affordable Care Act (2010) and Health Information Technology for Economic and Clinical Health Act (2009) in the US, and recent limited adoption in the National Health Service (NHS) (Garber et al. 2014). EHRs make patient data more accessible to patients, to a variety of physicians, and to researchers by allowing for remote electronic access and easy data manipulation. Oncology care specifically is instructive as to how Big Data can make a direct impact on patient care. Integrating EHRs and diagnostic tests such as MRIs, genomic sequencing, and other technologies is the big opportunity for Big Data, as it will allow physicians to better understand the genetic causes behind cancers, and therefore design more effective treatment regimens while also improving prevention and screening measures (Raghupathi and Raghupathi 2014; Norgeot et al. 2019). Here, we survey the current challenges in Big Data in healthcare and use oncology as an instructive vignette, highlighting issues of data ownership, sharing, and privacy. Our review builds on findings from the US, UK, and other global healthcare systems to propose a fundamental reorganization of EHRs around unique patient identifiers and ML.

Current successes of Big Data in healthcare

The UK and the US are both global leaders in healthcare that will play important roles in the adoption of Big Data. We see this global leadership already in oncology (The Cancer Genome Atlas (TCGA), Pan-Cancer Analysis of Whole Genomes (PCAWG)) and neuropsychiatric diseases (PsychENCODE) (Tomczak et al. 2015; Akbarian et al. 2015; Campbell et al. 2020). These Big Data generation and open-access models have resulted in hundreds of applications and scientific publications. The success of these initiatives in convincing the scientific and healthcare communities of the advantages of sharing clinical and molecular data has led to major Big Data generation initiatives in a variety of fields across the world, such as the “All of Us” project in the US (Denny et al. 2019). The UK has now established a clear national strategy that has resulted in the likes of the UK Biobank and 100,000 Genomes projects (Topol 2019b). These projects dovetail with a national strategy for the implementation of genomic medicine, with the opening of multiple genome-sequencing sites and the introduction of genome sequencing as a standard part of care for the NHS (Marx 2015). The US has no such national strategy, and while it has started its own large genomic study, “All of Us”, it does not have any plans for implementation in its own healthcare system (Topol 2019b). In this review, we have focussed our discussion on developments in Big Data in Oncology as a method to understand this complex and fast-moving field, and to develop general guidelines for healthcare at large.

Big Data initiatives in the United Kingdom

The UK Biobank is a prospective cohort initiative that is composed of individuals between the ages of 40 and 69 before disease onset (Allen et al. 2012; Elliott et al. 2018). The project has collected rich data on 500,000 individuals, collating biological samples, physical measures of patient health, and sociological information such as lifestyle and demographics (Allen et al. 2012). In addition to its size, the UK Biobank offers an unparalleled link to outcomes through integration with the NHS. This unified healthcare system allows researchers to link initial baseline measures with disease outcomes, and with multiple sources of medical information from hospital admission to clinical visits. This leaves researchers better positioned to minimize error in disease classification and diagnosis. The UK Biobank will also be conducting routine follow-up trials to continue to provide information regarding activity and further expanded biological testing to improve disease and risk factor association.

Beyond the UK Biobank, Public Health England launched the 100,000 Genomes project with the intent to understand the genetic origins behind common cancers (Turnbull et al. 2018). The massive effort consists of NHS patients consenting to have their genome sequenced and linked to their health records. Without the significant phenotypic information collected in the UK Biobank, the
Big data in digital healthcare: lessons learnt and recommendations for general practice 527
project holds limited use as a prospective epidemiological study, but it is a great tool for researchers interested in identifying disease-causing single-nucleotide polymorphisms (SNPs). The size of the dataset itself is its main advance, as it provides the statistical power to discover the associated SNPs even for rare diseases. Furthermore, the 100,000 Genomes Project’s ancillary aim is to stimulate private sector growth in the genomics industry within England.

Big Data initiatives in the United States and abroad

In the United States, the “All of Us” project is expanding upon the UK Biobank model by creating a direct link between patient genome data and their phenotypes by integrating EHRs, behavioral, and family data into a unique patient profile (Denny et al. 2019). By creating a standardized and linked database for all patients, “All of Us” will allow researchers greater scope than the UK Biobank to understand cancers and discover the associated genetic causes. In addition, “All of Us” succeeds in focusing on minority populations and health, an area of focus that sets it apart and gives it greater clinical significance. The UK should learn from this effort by expanding the UK Biobank project to further include minority populations and integrate it with ancillary patient data such as from wearables. The current UK Biobank has ~500,000 patients who identify as white versus ~12,000 (just under 2.5%) who identify as non-white (Cohn et al. 2017). Meanwhile, individuals of Asian ethnicities made up over 7.5% of the UK population as per the 2011 UK Census, with the proportion of minorities projected to rise in the coming years (O’Brien and Potter-Collins 2015; Cohn et al. 2017).

Sweden too provides an informative example of the power of investment in rich electronic research registries (Webster 2014). The Swedish government has committed over $70 million in funding per annum to expand a variety of cancer registries that would allow researchers insight into risk factors for oncogenesis. In addition, its data sources are particularly valuable for scientists, as each patient’s entries are linked to unique identity numbers that can be cross-referenced with over 90 other registries to give a more complete understanding of a patient’s health and social circumstances. These registries are not limited to disease states and treatments, but also encompass extensive public administrative records that can provide researchers considerable insight into social indicators of health such as income, occupation, and marital status (Connelly et al. 2016). These data sources become even more valuable to Swedish researchers as they have been in place for decades with commendable consistency, increasing the power of long-term analysis (Connelly et al. 2016). Other nations can learn from the Swedish example by paying particular attention to the use of unique patient identifiers that can map onto a number of datasets collected by government and academia, an idea that was first mentioned in the US Health Insurance Portability and Accountability Act of 1996 (HIPAA) but has not yet been implemented (Davis 2019).

China has recently become a leader in implementation and development of new digital technologies, and it has begun to approach healthcare with an emphasis on data standardization and volume. Already, the central government in China has initiated several funding initiatives aimed at pushing Big Data into healthcare use cases, with a particular eye on linking together administrative data, regional claims data from the national health insurance program, and electronic medical records (Zhang et al. 2018). China hopes to do this by leveraging its existing personal identification system that covers all Chinese nationals, similar to the Swedish model of maintaining a variety of regional and national registries linked by personal identification numbers. This is particularly relevant to cancer research, as China has established a new cancer registry (National Central Cancer Registry of China) that will take advantage of the nation’s population size to give unique insight into otherwise rare oncogenesis. Major concerns regarding this initiative are data quality and time. China has only relatively recently adopted the International Classification of Diseases (ICD) revision ten coding system, a standardized method for recording disease states alongside prescribed treatments. China is also still implementing standardized record-keeping terminologies at the regional level. This creates considerable heterogeneity in data quality, as well as a lack of interoperability between regions, a major obstacle in any national registry effort (Zhang et al. 2018). The recency of these efforts also means that some time is required until researchers will be able to take advantage of longitudinal analysis, which is vital for oncology research that aims to spot recurrences or track patient survival. In the future we can expect significant findings to come out of China’s efforts to make hundreds of millions of patient files available to researchers, but significant advances in standards of care and interoperability must first be achieved.

The large variety of “Big Data” research projects being undertaken around the world are proposing different approaches to the future of patient records. The UK is broadly leveraging the centralization of the NHS to link genomic data with clinical care records, and opening up the disease endpoints to researchers through a patient ID. Sweden and China are also adopting this model, leveraging unique identity numbers issued to citizens to link otherwise disconnected datasets from administrative and healthcare records (Connelly et al. 2016; Cnudde et al. 2016; Zhang et al. 2018). In this way, tests, technologies
528 R. Agrawal and S. Prabakaran
and methods will be integrated in a way that is specific to the patient but not necessarily to the hospital or clinic. This allows for significant flexibility in the seamless transfer of information between sites and for physicians to take full advantage of all the data generated. The US’ “All of Us” program is similar in integrating a variety of patient records into a single-patient file that is stored in the cloud (Denny et al. 2019). However, it does not significantly link to public administrative data sources, and thus is limited in its usefulness for long-term analysis of the effects of social contributors to cancer progression and risk. This foretells greater problems with the current ecosystem of clinical data, where lack of integration, misguided design, and ambiguous data ownership make research and clinical care more difficult rather than easier.

Survey of problems in clinical data use

Fragmentation

Fragmentation is the primary problem that needs to be addressed if EHRs have any hope of being used in any serious clinical capacity. Fragmentation arises when EHRs are unable to communicate effectively with each other, locking patient information into a proprietary system. While there are major players in the US EHR space such as Epic and General Electric, there are also dozens of minor and niche companies that produce their own products, many of which are not able to communicate effectively or easily with one another (DeMartino and Larsen 2013). The Clinical Oncology Requirements for the EHR and the National Community Cancer Centers Program have both spoken out about the need for interoperability requirements for EHRs and even published guidelines (Miller 2011). In addition, the Certification Commission for Health Information Technology was created to issue guidelines and standards for interoperability of EHRs (Miller 2011). Fast Healthcare Interoperability Resources (FHIR) is the current standard for healthcare data exchange published by Health Level 7 (HL7). It builds upon past standards from both HL7 and a variety of other standards such as the Reference Information Model. FHIR offers new principles on which data sharing can take place through RESTful APIs, and projects such as Argonaut are working to expand adoption to EHRs (Chambers et al. 2019). Even with the introduction of the HL7 Ambulatory Oncology EHR Functional Profile, EHRs have not improved and have actually become pain points for clinicians as they struggle to integrate the diagnostics from separate labs or hospitals, and can even leave physicians in the dark about clinical history if the patient has moved providers (Reisman 2017; Blobel 2018). Even in integrated care providers such as Kaiser Permanente there are interoperability issues that make EHRs unpopular among clinicians as they struggle to receive outside test results or the narratives of patients who have recently moved (Leonard and Tozzi 2012).

The UK provides an informative contrast in its NHS, a single government-run enterprise that provides free healthcare at the point of service. Currently, the NHS is able to successfully integrate a variety of health records, a step ahead of the US, but relies on outdated technology with security vulnerabilities such as fax machines (Macaulay 2016). The NHS has recently also begun the process of digitizing its health service, with separate NHS Trusts adopting American EHR solutions, such as the Cambridgeshire NHS trust’s recent agreement with Epic (Honeyman et al. 2016). However, the NHS still lags behind the US in broad use and uptake across all of its services (Wallace 2016). Furthermore, it will need to force the variety of EHRs being adopted to conform to centralized standards and interoperability requirements that allow services as far afield as genome sequencing to be added to a patient record.

Misguided EHR design

Another issue often identified with the modern incarnation of EHRs is that they are often not helpful for doctors in diagnosis, and have been identified by leading clinicians as a hindrance to patient care (Lenzer 2017; Gawande 2018). A common denominator among the current generation of EHRs is their focus on billing codes, a set of numbers assigned to every task, service, and drug dispensed by a healthcare professional that is used to determine the level of reimbursement the provider will receive. This focus on billing codes is a necessity of the insurance system in the US, which reimburses providers on a service-rendered basis (Essin 2012; Lenzer 2017). Due to the need for every part of the care process to be billed to insurers (of which there are many) and sometimes to multiple insurers simultaneously, EHRs in the US are designed foremost with insurance needs in mind. As a result, EHRs must first satisfy government regulations around billing codes and the requirements of insurance companies, and only then are able to consider the needs of providers or researchers (Bang and Baik 2019). And because purchasing decisions for EHRs are not made by physicians, the priority given to patient care outcomes falls behind other needs. The American Medical Association has cited the difficulty of EHRs as a contributing factor in physician burnout and as a waste of valuable time (Lenzer 2017; Gardner et al. 2019). The NHS, due to its reliance on American manufacturers of EHRs, must suffer through the same problems despite its fundamentally different structure.
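To make the RESTful exchange that FHIR enables (discussed under Fragmentation above) more concrete, the following is a minimal sketch in Python. It only builds the URLs for FHIR's standard "read" and "search" interactions and a minimal Patient resource; it makes no network calls. The server base URL, the identifier system, and all IDs are hypothetical placeholders rather than anything described in this review.

```python
import json
from urllib.parse import urlencode

# Hypothetical FHIR server base URL (an assumption for illustration only).
FHIR_BASE = "https://fhir.example.org/r4"


def read_url(resource_type: str, resource_id: str) -> str:
    """URL for a FHIR 'read' interaction: GET [base]/[type]/[id]."""
    return f"{FHIR_BASE}/{resource_type}/{resource_id}"


def search_url(resource_type: str, **params: str) -> str:
    """URL for a FHIR 'search' interaction: GET [base]/[type]?name=value."""
    return f"{FHIR_BASE}/{resource_type}?{urlencode(params)}"


def minimal_patient(patient_id: str, id_system: str, id_value: str) -> dict:
    """A minimal FHIR Patient resource carrying one business identifier."""
    return {
        "resourceType": "Patient",
        "id": patient_id,
        "identifier": [{"system": id_system, "value": id_value}],
    }


# Hypothetical patient; the identifier system and value are made up.
patient = minimal_patient("example-1", "urn:example:national-id", "0000000001")

print(read_url("Patient", patient["id"]))                 # fetch the patient
print(search_url("Observation", patient=patient["id"]))   # that patient's observations
print(json.dumps(patient, indent=2))
```

Because every conforming server exposes the same resource types and URL patterns, a lab system and a hospital EHR could in principle exchange records without a bespoke interface, which is the interoperability gain the text attributes to FHIR.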
Related to the problem of EHRs being optimized for billing, not patient care, is their lack of development beyond repositories of patient information into diagnostic aids. A study of modern-day EHR use in the clinic notes many pain points for physicians and healthcare teams (Assis-Hassid et al. 2019). Foremost was the variance in EHR use within the clinic, in part because these programs are often not designed with provider workflows in mind (Assis-Hassid et al. 2019). In addition, EHRs were found to distract from interpersonal communication and did not integrate the many different types of data being created by nurses, physician assistants, laboratories, and other providers into usable information for physicians (Assis-Hassid et al. 2019).

One of the major challenges of current implementations of Big Data is the lack of regulations, incentives, and systems to manage ownership and responsibilities for data. In the clinical space, in the US, this takes the form of compliance with HIPAA, a now decades-old law that aimed to set rules for patient privacy and control of data (Adibuzzaman et al. 2018). As more types of data are generated for patients and uploaded to electronic platforms, HIPAA becomes a major roadblock to data sharing, as it raises significant privacy concerns that hamper research. Today, a researcher who searches on even simple demographics and disease states can rapidly identify an otherwise de-identified patient (Adibuzzaman et al. 2018). Concerns around breaking HIPAA prevent complete and open data sharing agreements, blocking a path to the specificity needed for the next generation of research. They also throw a wrench into clinical application of these technologies, as data sharing becomes bogged down by nebulousness surrounding old regulations on patient privacy. Furthermore, compliance with the General Data Protection Regulation (GDPR) in the EU has hampered international collaborations, as compliance with both HIPAA and GDPR is not yet standardized (Rabesandratana 2019).

Data sharing is further complicated by the need to develop new technologies to integrate across a variety of providers. Taking from the example of the Informatics for Integrating Biology and the Bedside (i2b2) program funded

Nebula Genomics, a firm founded by George Church, that is aimed at securing genomic data in blockchain in a way that scales commercially, and can be used for research purposes with permission only from data owners, the patients themselves. Other firms, such as Doc.Ai, are exploring using a variety of data types stored in blockchain to create predictive models of disease, but all are centrally based on the idea of a blockchain to secure patient data and ensure private, accurate transfer between sites (Agbo et al. 2019). Advantages of blockchain for healthcare data transfer and storage lie in its security and privacy, but the approach has yet to gain widespread use.

Design a new generation of EHRs

It is conceivable that physicians in the near future will be faced with terabytes of data, with patients coming to their clinics with years of continuous data monitoring their heart rate, blood sugar, and a variety of other factors (Topol 2019a). Gaining clinical insight from such a large quantity of data is an impossible expectation to place upon physicians. In order to solve this problem of the exploding numbers of tests, assays, and results, EHRs will need to be extended from simply being records of patient–physician interactions and digital folders, to being diagnostic aids (Fig. 1). Companies such as Roche–Flatiron are already moving towards this model by building predictive and analytical tools into their EHRs when they provide them to providers. However, broader adoption across a variety of providers, and the transparency and portability of the

(Figure labels: Low Activity, SNP Array, Proteomics, Histology, Patient History, Drug Use, EHR Model.)
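The re-identification concern raised above, that a search on even simple demographics and disease states can single out an otherwise de-identified patient, can be illustrated with a small check of how many records share each combination of attributes. This is a minimal sketch using the k-anonymity idea, a technique chosen here for illustration and not one named in this review. The records are entirely synthetic, and the field names (`age_band`, `sex`, `postcode_area`, `diagnosis`) are hypothetical.

```python
from collections import Counter

# Synthetic, made-up records with direct identifiers already removed.
RECORDS = [
    {"age_band": "40-49", "sex": "F", "postcode_area": "CB1", "diagnosis": "breast cancer"},
    {"age_band": "40-49", "sex": "F", "postcode_area": "CB1", "diagnosis": "breast cancer"},
    {"age_band": "50-59", "sex": "M", "postcode_area": "CB2", "diagnosis": "prostate cancer"},
    {"age_band": "50-59", "sex": "M", "postcode_area": "CB2", "diagnosis": "prostate cancer"},
    {"age_band": "60-69", "sex": "F", "postcode_area": "CB1", "diagnosis": "glioblastoma"},
]

# Attributes an outside party might plausibly know (an assumption).
QUASI_IDENTIFIERS = ("age_band", "sex", "postcode_area", "diagnosis")


def group_sizes(rows, keys):
    """Count how many records share each combination of the given attributes."""
    return Counter(tuple(row[k] for k in keys) for row in rows)


def k_anonymity(rows, keys):
    """Smallest group size; k == 1 means at least one record is unique."""
    return min(group_sizes(rows, keys).values())


def singled_out(rows, keys):
    """Records whose attribute combination appears exactly once."""
    sizes = group_sizes(rows, keys)
    return [row for row in rows if sizes[tuple(row[k] for k in keys)] == 1]


print(k_anonymity(RECORDS, QUASI_IDENTIFIERS))
print(singled_out(RECORDS, QUASI_IDENTIFIERS))
```

In this toy dataset the single glioblastoma record is unique on the chosen attributes, so anyone who knows those four facts about a person can locate that person's row despite the absence of a name. Coarsening or suppressing attributes raises k, at the cost of the specificity that research needs, which is the tension the text describes.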
analytics championed by P4 medicine will come to define the patient experience (Flores et al. 2013). However, in this piece we have demonstrated a series of hurdles that the field must overcome to avoid imposing additional burdens on physicians and to deliver significant value. We recommend a set of proposals, built upon an examination of the NHS and other publicly administered healthcare models and of the US multi-payer system, to bridge the gap between the market competition needed to develop these new technologies and effective patient care.

Access to patient data must be a paramount guiding principle as regulators begin to approach the problem of wrangling the many streams of data that are already being generated. Data must be accessible to both physicians and patients, but must also be secured and de-identified for the benefit of research. A pathway taken by the UK Biobank to guarantee data integration and universal access has been through the creation of a single database and protocol for accessing its contents (Allen et al. 2012). It is then feasible to suggest a similar system for the NHS, which is already centralized with a single funding source. However, this system will necessarily also be a security concern due to its centralized nature, even if patient data is encrypted (Fig. 3). Another approach is to follow in the footsteps of the US’ HIPAA, which suggested the creation of unique patient IDs over 20 years ago. With a single patient identifier, EHRs would then be allowed to communicate with heterogeneous systems especially designed for labs or imaging centers or counseling services and more (Fig. 4). However, this design presupposes a standardized format and protocol for communication across a variety of databases, similar to the HL7 standards that already exist (Bender and Sartipi 2013).

Fig. 4 Our recommended healthcare system model. Hypothetical healthcare system design based on unique patient identifiers that function across a variety of systems and providers, linking together disparate datasets into a complete patient profile. (Figure labels: Patient, Patient ID, Lab, Hospital, Imaging, EHR at your GP.)

In place of a centralized authority building out a digital infrastructure to house and communicate patient data, mandating protocols and security standards will allow for the development of specialized EHR solutions for an ever-diversifying set of healthcare providers and encourage the market needed for continual development and support of these systems. Avoiding data fragmentation as seen already in the US then becomes an exercise in mandating data sharing in law.

The next problem then becomes the inevitable application of AI to healthcare. Any such tool created will have to stand up to the scrutiny not just of being asked to outclass
human diagnoses, but also of having to reveal its methods. Because of the opacity of ML models, the “black box” effect means that diagnoses cannot be scrutinized or understood by outside observers (Fig. 5). This makes clinical use extremely limited, unless further techniques are developed to deconvolute the decision-making process of these models. Until then, we expect that AI models will only provide support for diagnoses.

Furthermore, many times AI models simply replicate biases in existing datasets. Cohn et al. 2017 demonstrated clear areas of deficiency in the minority representation of patients in the UK Biobank. Any research conducted on these datasets will necessarily only be able to create models that generalize to the population in them (a largely homogeneous white-British group) (Fig. 6). In order to protect against algorithmic bias and the black box of current models hiding their decision-making, regulators must enforce rules that expose the decision-making of future predictive healthcare models to public and physician scrutiny. Similar to the existing FDA regulatory framework for medical devices, algorithms too must be put up to regulatory scrutiny to prevent discrimination, while also ensuring transparency of care.

The future of healthcare will increasingly live on server racks and be built in glass office buildings by teams of programmers. The US must take seriously the benefits of centralized regulations and protocols that have allowed the NHS to be enormously successful in preventing the problem of data fragmentation, while the NHS must approach the possibility of freer markets for healthcare devices and technologies as a necessary condition for entering the next generation of healthcare delivery, which will require constant reinvention and improvement to deliver accurate care.

Overall, we are entering a transition in how we think about caring for patients and the role of a physician. Rather than creating a reactive healthcare system that finds cancers once they have advanced to a serious stage, Big Data offers us the opportunity to fine-tune screening and prevention protocols to significantly reduce the burden of diseases such as advanced stage cancers and metastasis. This development allows physicians to think more about a patient individually in their treatment plan, as they leverage information beyond rough demographic indicators, such as genomic sequencing of their tumor. Healthcare is not yet prepared for this shift, so it is the job of governments around the world to pay attention to how others have implemented Big Data in healthcare to write the regulatory structure of the future. Ensuring competition, data security, and algorithmic transparency will be the hallmarks of how we think about guaranteeing better patient care.
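
The central recommendation of this review, a unique patient identifier that links otherwise disconnected datasets into one patient profile, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a proposed implementation: the provider names, record fields, IDs, and the secret key are all hypothetical, and the keyed hash shown for the research view is one simple pseudonymization approach chosen for illustration, which by itself does not amount to full de-identification.

```python
import hashlib
import hmac
from collections import defaultdict

# Synthetic records held by three hypothetical providers. Each row carries
# the same unique patient identifier; everything else differs by source.
SOURCES = {
    "lab": [
        {"patient_id": "P001", "test": "HbA1c", "value": 41},
        {"patient_id": "P002", "test": "HbA1c", "value": 48},
    ],
    "imaging": [{"patient_id": "P001", "modality": "MRI", "finding": "no lesion"}],
    "gp": [{"patient_id": "P001", "note": "annual review"}],
}


def build_profiles(sources):
    """Group every provider's rows under the patient identifier they share."""
    profiles = defaultdict(lambda: defaultdict(list))
    for source_name, rows in sources.items():
        for row in rows:
            payload = {k: v for k, v in row.items() if k != "patient_id"}
            profiles[row["patient_id"]][source_name].append(payload)
    return {pid: dict(by_source) for pid, by_source in profiles.items()}


def research_pseudonym(patient_id, secret):
    """Keyed hash of the ID, so researchers never see the real identifier."""
    return hmac.new(secret, patient_id.encode("utf-8"), hashlib.sha256).hexdigest()


profiles = build_profiles(SOURCES)

# Hypothetical key; a real deployment would manage this secret securely.
SECRET = b"demo-only-secret"
research_view = {
    research_pseudonym(pid, SECRET): data for pid, data in profiles.items()
}

print(sorted(profiles["P001"]))  # sources that hold data on patient P001
print(len(research_view))        # number of pseudonymized profiles
```

The clinical view keeps the real identifier so that physicians and patients can reach the full record, while the research view is keyed by a pseudonym, mirroring the dual requirement stated above that data be accessible for care yet secured for research.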
Agbo CC, Mahmoud QH, Eklund JM (2019) Blockchain technology in healthcare: a systematic review. Healthcare 7:56
Aguet F, Brown AA, Castel SE, Davis JR, He Y, Jo B et al. (2017) Genetic effects on gene expression across human tissues. Nature 550:204–213
Akbarian S, Liu C, Knowles JA, Vaccarino FM, Farnham PJ, Crawford GE et al. (2015) The PsychENCODE project. Nat Neurosci 18:1707–1712
Allen N, Sudlow C, Downey P, Peakman T, Danesh J, Elliott P et al. (2012) UK Biobank: current status and what it means for epidemiology. Health Policy Technol 1:123–126
Assis-Hassid S, Grosz BJ, Zimlichman E, Rozenblum R, Bates DW (2019) Assessing EHR use during hospital morning rounds: a multi-faceted study. PLoS ONE 14:e0212816
Bang CS, Baik GH (2019) Using big data to see the forest and the trees: endoscopic submucosal dissection of early gastric cancer in Korea. Korean J Intern Med 34:772–774
Bender D, Sartipi K (2013) HL7 FHIR: an agile and RESTful approach to healthcare information exchange. In Proceedings of the 26th IEEE International Symposium on Computer-Based Medical Systems, IEEE. pp 326–331
Bibault J-E, Giraud P, Burgun A (2016) Big Data and machine learning in radiation oncology: state of the art and future prospects. Cancer Lett 382:110–117
Blobel B (2018) Interoperable EHR systems—challenges, standards and solutions. Eur J Biomed Inf 14:10–19
Camacho DM, Collins KM, Powers RK, Costello JC, Collins JJ (2018) Next-generation machine learning for biological networks. Cell 173:1581–1592
Campbell PJ, Getz G, Stuart JM, Korbel JO, Stein LD (2020) Pan-cancer analysis of whole genomes. Nature https://www.nature.com/articles/s41586-020-1969-6
Chambers DA, Amir E, Saleh RR, Rodin D, Keating NL, Osterman TJ, Chen JL (2019) The impact of Big Data research on practice, policy, and cancer care. Am Soc Clin Oncol Educ Book Am Soc Clin Oncol Annu Meet 39:e167–e175
Char DS, Shah NH, Magnus D (2018) Implementing machine learning in health care—addressing ethical challenges. N Engl J Med 378:981–983
Cho WC (2015) Big Data for cancer research. Clin Med Insights Oncol 9:135–136
Cnudde P, Rolfson O, Nemes S, Kärrholm J, Rehnberg C, Rogmark C, Timperley J, Garellick G (2016) Linking Swedish health data registers to establish a research database and a shared decision-making tool in hip replacement. BMC Musculoskelet Disord 17:414
Cohn EG, Hamilton N, Larson EL, Williams JK (2017) Self-reported race and ethnicity of US biobank participants compared to the US Census. J Community Genet 8:229–238
Connelly R, Playford CJ, Gayle V, Dibben C (2016) The role of administrative data in the big data revolution in social science research. Soc Sci Res 59:1–12
Essin D (2012) Improve EHR systems by rethinking medical billing. Physicians Pract. https://www.physicianspractice.com/ehr/improve-ehr-systems-rethinking-medical-billing
Esteva A, Robicquet A, Ramsundar B, Kuleshov V, DePristo M, Chou K et al. (2019) A guide to deep learning in healthcare. Nat Med 25:24–29
Fessele KL (2018) The rise of Big Data in oncology. Semin Oncol Nurs 34:168–176
Flores M, Glusman G, Brogaard K, Price ND, Hood L (2013) P4 medicine: how systems medicine will transform the healthcare sector and society. Pers Med 10:565–576
Garber S, Gates SM, Keeler EB, Vaiana ME, Mulcahy AW, Lau C et al. (2014) Redirecting innovation in U.S. Health Care: options to decrease spending and increase value: Case Studies 133
Gardner RL, Cooper E, Haskell J, Harris DA, Poplau S, Kroth PJ et al. (2019) Physician stress and burnout: the impact of health information technology. J Am Med Inf Assoc 26:106–114
Gawande A (2018) Why doctors hate their computers. The New Yorker, 12 https://www.newyorker.com/magazine/2018/11/12/why-doctors-hate-their-computers
Gordon WJ, Catalini C (2018) Blockchain technology for healthcare: facilitating the transition to patient-driven interoperability. Comput Struct Biotechnol J 16:224–230
Hasin Y, Seldin M, Lusis A (2017) Multi-omics approaches to disease. Genome Biol 18:83
Honeyman M, Dunn P, McKenna H (2016) A Digital NHS. An introduction to the digital agenda and plans for implementation https://www.kingsfund.org.uk/sites/default/files/field/field_publication_file/A_digital_NHS_Kings_Fund_Sep_2016.pdf
Kierkegaard P (2013) eHealth in Denmark: A Case Study. J Med Syst 37
Krumholz HM (2014) Big Data and new knowledge in medicine: the thinking, training, and tools needed for a learning health system. Health Aff 33:1163–1170
Lenzer J (2017) Commentary: the real problem is that electronic health records focus too much on billing. BMJ 356:j326
Leonard D, Tozzi J (2012) Why don’t more hospitals use electronic health records. Bloom Bus Week
Macaulay T (2016) Progress towards a paperless NHS. BMJ 355:i4448
Madhavan S, Subramaniam S, Brown TD, Chen JL (2018) Art and challenges of precision medicine: interpreting and integrating genomic data into clinical practice. Am Soc Clin Oncol Educ Book Am Soc Clin Oncol Annu Meet 38:546–553
Marx V (2015) The DNA of a nation. Nature 524:503–505
Miller RS (2011) Electronic health record certification in oncology: role of the certification commission for health information technology. J Oncol Pr 7:209–213
Norgeot B, Glicksberg BS, Butte AJ (2019) A call for deep-learning healthcare. Nat Med 25:14–15
O’Brien R, Potter-Collins A (2015) 2011 Census analysis: ethnicity and religion of the non-UK born population in England and
Davis J (2019) National patient identifier HIPAA provision removed Wales: 2011. Office for National Statistics. https://www.ons.gov.
in proposed bill. HealthITSecurity https://healthitsecurity.com/ uk/peoplepopulationandcommunity/culturalidentity/ethnicity/a
news/national-patient-identifier-hipaa-provision-removed-in- rticles/2011censusanalysisethnicityandreligionofthenonukbornpo
proposed-bill pulationinenglandandwales/2015-06-18
DeMartino JK, Larsen JK (2013) Data needs in oncology: “Making Osong AB, Dekker A, van Soest J (2019) Big data for better cancer
Sense of The Big Data Soup”. J Natl Compr Canc Netw 11: care. Br J Hosp Med Lond Engl 2005 80:304–305
S1–S12 Rabesandratana T (2019) European data law is impeding studies on
Deng J, El Naqa I, Xing L (2018) Editorial: machine learning with diabetes and Alzheimer’s, researchers warn. Sci AAAS. https://
radiation oncology big data. Front Oncol 8:416 doi.org/10.1126/science.aba2926
Denny JC, Rutter JL, Goldstein DB, Philippakis Anthony, Smoller Raghupathi W, Raghupathi V (2014) Big data analytics in healthcare:
JW, Jenkins G et al. (2019) The “All of Us” research program. N promise and potential. Health Inf Sci Syst 2:3
Engl J Med 381:668–676 Reisman M (2017) EHRs: the challenge of making electronic data
Elliott LT, Sharp K, Alfaro-Almagro F, Shi S, Miller KL, Douaud G usable and interoperable. Pharm Ther 42:572–575
et al. (2018) Genome-wide association studies of brain imaging Shendure J, Ji H (2008) Next-generation DNA sequencing. Nature
phenotypes in UK Biobank. Nature 562:210–216 Biotechnology 26:1135–1145
Stephens ZD, Lee SY, Faghri F, Campbell RH, Zhai C, Efron MJ et al. (2015) Big Data: astronomical or genomical? PLOS Biol 13:e1002195
Tomczak K, Czerwińska P, Wiznerowicz M (2015) The Cancer Genome Atlas (TCGA): an immeasurable source of knowledge. Contemp Oncol 19:A68–A77
Topol E (2019a) High-performance medicine: the convergence of human and artificial intelligence. Nat Med 25:44
Topol E (2019b) The topol review: preparing the healthcare workforce to deliver the digital future. Health Education England https://topol.hee.nhs.uk/
Turnbull C, Scott RH, Thomas E, Jones L, Murugaesu N, Pretty FB, Halai D, Baple E, Craig C, Hamblin A, et al. (2018) The 100 000 Genomes Project: bringing whole genome sequencing to the NHS. BMJ 361
Wallace WA (2016) Why the US has overtaken the NHS with its EMR. National Health Executive Magazine, pp 32–34 http://www.nationalhealthexecutive.com/Comment/why-the-us-has-overtaken-the-nhs-with-its-emr
Webster PC (2014) Sweden’s health data goldmine. CMAJ Can Med Assoc J 186:E310
Wetterstrand KA (2019) DNA sequencing costs: data from the NHGRI Genome Sequencing Program (GSP). Natl Hum Genome Res Inst. www.genome.gov/sequencingcostsdata, Accessed 2019
Zhang L, Wang H, Li Q, Zhao M-H, Zhan Q-M (2018) Big data and medical research in China. BMJ 360:j5910