Raghupathi-Raghupathi2014 Article BigDataAnalyticsInHealthcarePr PDF
http://www.hissjournal.com/content/2/1/3
Abstract
Objective: To describe the promise and potential of big data analytics in healthcare.
Methods: The paper describes the nascent field of big data analytics in healthcare, discusses the benefits, outlines
an architectural framework and methodology, describes examples reported in the literature, briefly discusses the
challenges, and offers conclusions.
Results: The paper provides a broad overview of big data analytics for healthcare researchers and practitioners.
Conclusions: Big data analytics in healthcare is evolving into a promising field for providing insight from very large
data sets and improving outcomes while reducing costs. Its potential is great; however, there remain challenges to
overcome.
Keywords: Big data, Analytics, Hadoop, Healthcare, Framework, Methodology
© 2014 Raghupathi and Raghupathi; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of
the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use,
distribution, and reproduction in any medium, provided the original work is properly credited.
Raghupathi and Raghupathi Health Information Science and Systems 2014, 2:3 Page 2 of 10
http://www.hissjournal.com/content/2/1/3
are referred to as, no surprise here, big data analytics in healthcare [13-15]. When big data is synthesized and analyzed, and those aforementioned associations, patterns and trends revealed, healthcare providers and other stakeholders in the healthcare delivery system can develop more thorough and insightful diagnoses and treatments, resulting, one would expect, in higher quality care at lower costs and in better outcomes overall [12]. The potential for big data analytics in healthcare to lead to better outcomes exists across many scenarios, for example: analyzing patient characteristics and the cost and outcomes of care to identify the most clinically and cost effective treatments and offer analysis and tools, thereby influencing provider behavior; applying advanced analytics to patient profiles (e.g., segmentation and predictive modeling) to proactively identify individuals who would benefit from preventative care or lifestyle changes; broad scale disease profiling to identify predictive events and support prevention initiatives; collecting and publishing data on medical procedures, thus assisting patients in determining the care protocols or regimens that offer the best value; identifying, predicting and minimizing fraud by implementing advanced analytic systems for fraud detection and checking the accuracy and consistency of claims; implementing claim authorization much nearer to real time; and creating new revenue streams by aggregating and synthesizing patient clinical records and claims data sets to provide data and services to third parties, for example, licensing data to assist pharmaceutical companies in identifying patients for inclusion in clinical trials. Many payers are developing and deploying mobile apps that help patients manage their care, locate providers and improve their health. Via analytics, payers are able to monitor adherence to drug and treatment regimens and detect trends that lead to individual and population wellness benefits [12,16-18].
This article provides an overview of big data analytics in healthcare as it is emerging as a discipline. First, we define and discuss the various advantages and characteristics of big data analytics in healthcare. Then we describe the architectural framework of big data analytics in healthcare. Third, the big data analytics application development methodology is described. Fourth, we provide examples of big data analytics in healthcare reported in the literature. Fifth, the challenges are identified. Lastly, we offer conclusions and future directions.

Big data analytics in healthcare
Health data volume is expected to grow dramatically in the years ahead [6]. In addition, healthcare reimbursement models are changing; meaningful use and pay for performance are emerging as critical new factors in today’s healthcare environment. Although profit is not and should not be a primary motivator, it is vitally important for healthcare organizations to acquire the available tools, infrastructure, and techniques to leverage big data effectively or else risk losing potentially millions of dollars in revenue and profits [19].
What exactly is big data? A report delivered to the U.S. Congress in August 2012 defines big data as “large volumes of high velocity, complex, and variable data that require advanced techniques and technologies to enable the capture, storage, distribution, management and analysis of the information” [6]. Big data encompasses such characteristics as variety, velocity and, with respect specifically to healthcare, veracity [20-23]. Existing analytical techniques can be applied to the vast amount of existing (but currently unanalyzed) patient-related health and medical data to reach a deeper understanding of outcomes, which then can be applied at the point of care. Ideally, individual and population data would inform each physician and her patient during the decision-making process and help determine the most appropriate treatment option for that particular patient.

Advantages to healthcare
By digitizing, combining and effectively using big data, healthcare organizations ranging from single-physician offices and multi-provider groups to large hospital networks and accountable care organizations stand to realize significant benefits [2]. Potential benefits include detecting diseases at earlier stages when they can be treated more easily and effectively; managing specific individual and population health; and detecting health care fraud more quickly and efficiently. Numerous questions can be addressed with big data analytics. Certain developments or outcomes may be predicted and/or estimated based on vast amounts of historical data, such as length of stay (LOS); patients who will choose elective surgery; patients who likely will not benefit from surgery; complications; patients at risk for medical complications; patients at risk for sepsis, MRSA, C. difficile, or other hospital-acquired illness; illness/disease progression; patients at risk for advancement in disease states; causal factors of illness/disease progression; and possible comorbid conditions (EMC Consulting). McKinsey estimates that big data analytics can enable more than $300 billion in savings per year in U.S. healthcare, two thirds of that through reductions of approximately 8% in national healthcare expenditures. Clinical operations and R & D are two of the largest areas for potential savings with $165 billion and $108 billion in waste respectively [24]. McKinsey believes big data could help reduce waste and inefficiency in the following three areas:

Clinical operations: Comparative effectiveness research to determine more clinically relevant and cost-effective ways to diagnose and treat patients.
Research & development: 1) predictive modeling to lower attrition and produce a leaner, faster, more targeted R & D pipeline in drugs and devices; 2) statistical tools and algorithms to improve clinical trial design and patient recruitment to better match treatments to individual patients, thus reducing trial failures and speeding new treatments to market; and 3) analyzing clinical trials and patient records to identify follow-on indications and discover adverse effects before products reach the market.
Public health: 1) analyzing disease patterns and tracking disease outbreaks and transmission to improve public health surveillance and speed response; 2) faster development of more accurately targeted vaccines, e.g., choosing the annual influenza strains; and 3) turning large amounts of data into actionable information that can be used to identify needs, provide services, and predict and prevent crises, especially for the benefit of populations [24].

In addition, [14] suggests big data analytics in healthcare can contribute to:
Evidence-based medicine: Combine and analyze a variety of structured and unstructured data (EMRs, financial and operational data, clinical data, and genomic data) to match treatments with outcomes, predict patients at risk for disease or readmission and provide more efficient care;
Genomic analytics: Execute gene sequencing more efficiently and cost effectively and make genomic analysis a part of the regular medical care decision process and the growing patient medical record [25];
Pre-adjudication fraud analysis: Rapidly analyze large numbers of claim requests to reduce fraud, waste and abuse;
Device/remote monitoring: Capture and analyze in real-time large volumes of fast-moving data from in-hospital and in-home devices, for safety monitoring and adverse event prediction;
Patient profile analytics: Apply advanced analytics to patient profiles (e.g., segmentation and predictive modeling) to identify individuals who would benefit from proactive care or lifestyle changes, for example, those patients at risk of developing a specific disease (e.g., diabetes) who would benefit from preventive care [14].

According to [16], areas in which enhanced data and analytics yield the greatest results include: pinpointing patients who are the greatest consumers of health resources or at the greatest risk for adverse outcomes; providing individuals with the information they need to make informed decisions and more effectively manage their own health as well as more easily adopt and track healthier behaviors; identifying treatments, programs and processes that do not deliver demonstrable benefits or cost too much; reducing readmissions by identifying environmental or lifestyle factors that increase risk or trigger adverse events [26] and adjusting treatment plans accordingly; improving outcomes by examining vitals from at-home health monitors; managing population health by detecting vulnerabilities within patient populations during disease outbreaks or disasters; and bringing clinical, financial and operational data together to analyze resource utilization productively and in real time [16].

The 4 “Vs” of big data analytics in healthcare
Like big data in healthcare, the analytics associated with big data is described by three primary characteristics: volume, velocity and variety (http://www-01.ibm.com/software/data/bigdata/). Over time, health-related data will be created and accumulated continuously, resulting in an incredible volume of data. The already daunting volume of existing healthcare data includes personal medical records, radiology images, clinical trial data, FDA submissions, human genetics and population data, genomic sequences, etc. Newer forms of big data, such as 3D imaging, genomics and biometric sensor readings, are also fueling this exponential growth.
Fortunately, advances in data management, particularly virtualization and cloud computing, are facilitating the development of platforms for more effective capture, storage and manipulation of large volumes of data [4]. Data is accumulated in real-time and at a rapid pace, or velocity. The constant flow of new data accumulating at unprecedented rates presents new challenges. Just as the volume and variety of data that is collected and stored has changed, so too has the velocity at which it is generated and that is necessary for retrieving, analyzing, comparing and making decisions based on the output.
Most healthcare data has been traditionally static: paper files, x-ray films, and scripts. Velocity of mounting data increases with data that represents regular monitoring, such as multiple daily diabetic glucose measurements (or more continuous control by insulin pumps), blood pressure readings, and EKGs. Meanwhile, in many medical situations, constant real-time data (trauma monitoring for blood pressure, operating room monitors for anesthesia, bedside heart monitors, etc.) can mean the difference between life and death.
Future applications of real-time data, such as detecting infections as early as possible, identifying them swiftly and applying the right treatments (not just broad-spectrum antibiotics) could reduce patient morbidity and mortality and even prevent hospital outbreaks. Already, real-time streaming data monitors neonates in the ICU, catching life-threatening infections sooner [6]. The ability to perform real-time analytics against such high-volume data in
motion and across all specialties would revolutionize healthcare [4]. Therein lies variety.
As the nature of health data has evolved, so too have analytics techniques scaled up to the complex and sophisticated analytics necessary to accommodate volume, velocity and variety. Gone are the days of data collected exclusively in electronic health records and other structured formats. Increasingly, the data is in multimedia format and unstructured. The enormous variety of data (structured, unstructured and semi-structured) is a dimension that makes healthcare data both interesting and challenging.
Structured data is data that can be easily stored, queried, recalled, analyzed and manipulated by machine. Historically, in healthcare, structured and semi-structured data includes instrument readings and data generated by the ongoing conversion of paper records to electronic health and medical records. Historically, the point of care generated unstructured data: office medical records, handwritten nurse and doctor notes, hospital admission and discharge records, paper prescriptions, radiograph films, MRI, CT and other images.
Already, new data streams, structured and unstructured, are cascading into the healthcare realm from fitness devices, genetics and genomics, social media research and other sources. But relatively little of this data can presently be captured, stored and organized so that it can be manipulated by computers and analyzed for useful information. Healthcare applications in particular need more efficient ways to combine and convert varieties of data, including automating conversion from unstructured to structured data.
The structured data in EMRs and EHRs include familiar input record fields such as patient name, date of birth, address, physician’s name, hospital name and address, treatment reimbursement codes, and other information easily coded into and handled by automated databases. The need to field-code data at the point of care for electronic handling is a major barrier to acceptance of EMRs by physicians and nurses, who lose the natural language ease of entry and understanding that handwritten notes provide. On the other hand, most providers agree that an easy way to reduce prescription errors is to use digital entries rather than handwritten scripts.
The potential of big data in healthcare lies in combining traditional data with new forms of data, both individually and on a population level. We are already seeing data sets from a multitude of sources support faster and more reliable research and discovery. If, for example, pharmaceutical developers could integrate population clinical data sets with genomics data, this development could facilitate those developers gaining approvals on more and better drug therapies more quickly than in the past and, more importantly, expedite distribution to the right patients [4]. The prospects for all areas of healthcare are infinite.
Some practitioners and researchers have introduced a fourth characteristic, veracity, or ‘data assurance’. That is, the big data, analytics and outcomes are error-free and credible. Of course, veracity is the goal, not (yet) the reality. Data quality issues are of acute concern in healthcare for two reasons: life or death decisions depend on having accurate information, and the quality of healthcare data, especially unstructured data, is highly variable and all too often incorrect. (Inaccurate “translations” of poor handwriting on prescriptions are perhaps the most infamous example).
Veracity assumes the simultaneous scaling up in granularity and performance of the architectures and platforms, algorithms, methodologies and tools to match the demands of big data. The analytics architectures and tools for structured and unstructured big data are very different from traditional business intelligence (BI) tools. They are necessarily of industrial strength. For example, big data analytics in healthcare would be executed in distributed processing across several servers (“nodes”), utilizing the parallel computing paradigm and a ‘divide and process’ approach. Likewise, models and techniques (such as data mining and statistical approaches, algorithms, visualization techniques) need to take into account the characteristics of big data analytics. Traditional data management assumes that the warehoused data is certain, clean, and precise.
Veracity in healthcare data faces many of the same issues as in financial data, especially on the payer side: Is this the correct patient/hospital/payer/reimbursement code/dollar amount? Other veracity issues are unique to healthcare: Are diagnoses/treatments/prescriptions/procedures/outcomes captured correctly?
Improving coordination of care, avoiding errors and reducing costs depend on high-quality data, as do advances in drug safety and efficacy, diagnostic accuracy and more precise targeting of disease processes by treatments. But increased variety and high velocity hinder the ability to cleanse data before analyzing it and making decisions, magnifying the issue of data “trust” [4].
The ‘4Vs’ are an appropriate starting point for a discussion about big data analytics in healthcare. But there are other issues to consider, such as the number of architectures and platforms, and the dominance of the open source paradigm in the availability of tools. Consider, too, the challenge of developing methodologies and the need for user-friendly interfaces. While the overall cost of hardware and software is declining, these issues have to be addressed to harness and maximize the potential of big data analytics in healthcare.
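
The ‘divide and process’ idea mentioned above, together with the payer-side veracity questions (correct patient, reimbursement code, dollar amount), can be illustrated with a small sketch. This is not the authors' implementation: the record fields, the three checks, and the function names are all hypothetical, and a thread pool on one machine merely stands in for the several servers (“nodes”) of a real cluster.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical claim records; the field names are illustrative assumptions.
CLAIMS = [
    {"patient_id": "P1", "code": "A10", "amount": 120.0},
    {"patient_id": "",   "code": "A10", "amount": 80.0},   # missing patient
    {"patient_id": "P2", "code": "B20", "amount": -5.0},   # invalid amount
    {"patient_id": "P3", "code": "B20", "amount": 300.0},
    {"patient_id": "P4", "code": "",    "amount": 50.0},   # missing code
    {"patient_id": "P5", "code": "C30", "amount": 75.0},
]


def partition(records, n_parts):
    """Divide: split the records into n_parts roughly equal slices."""
    return [records[i::n_parts] for i in range(n_parts)]


def check_partition(records):
    """Process: tally basic veracity problems found in one slice."""
    tally = Counter()
    for record in records:
        tally["records"] += 1
        if not record["patient_id"]:
            tally["missing_patient"] += 1
        if not record["code"]:
            tally["missing_code"] += 1
        if record["amount"] <= 0:
            tally["invalid_amount"] += 1
    return tally


def divide_and_process(records, n_nodes=3):
    """Check each slice in parallel, then merge the partial tallies."""
    parts = partition(records, n_nodes)
    # Threads simulate separate nodes; a real deployment would ship each
    # slice to a different server.
    with ThreadPoolExecutor(max_workers=n_nodes) as pool:
        partials = list(pool.map(check_partition, parts))
    total = Counter()
    for partial in partials:
        total.update(partial)  # Counter.update adds counts together
    return total


if __name__ == "__main__":
    print(dict(divide_and_process(CLAIMS)))
```

Because each slice is checked independently and the tallies are simply added, the merged result does not depend on how many slices are used; that independence is what lets this kind of work spread across many nodes.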
data from diverse sources is cleansed and readied. Depending on whether the data is structured or unstructured, several data formats can be input to the big data analytics platform.
In this next component in the conceptual framework, several decisions are made regarding the data input approach, distributed design, tool selection and analytics models. Finally, on the far right, the four typical applications of big data analytics in healthcare are shown. These include queries, reports, OLAP, and data mining. Visualization is an overarching theme across the four applications. Drawing from such fields as statistics, computer science, applied mathematics and economics, a wide variety of techniques and technologies has been developed and adapted to aggregate, manipulate, analyze, and visualize big data in healthcare.
The most significant platform for big data analytics is the open-source distributed data processing platform Hadoop (Apache platform), initially developed for such routine functions as aggregating web search indexes. It belongs to the class of “NoSQL” technologies (others include CouchDB and MongoDB) that evolved to aggregate data in unique ways. Hadoop has the potential to process extremely large amounts of data mainly by allocating partitioned data sets to numerous servers (nodes), each of which solves different parts of the larger problem and then integrates them for the final result [28-31]. Hadoop can serve the twin roles of data organizer and analytics tool. It offers a great deal of potential in enabling enterprises to harness the data that has been, until now, difficult to manage and analyze. Specifically, Hadoop makes it possible to process extremely large volumes of data with various structures or no structure at all. But Hadoop can be challenging to install, configure and administer, and individuals with Hadoop skills are not easily found. Furthermore, for these reasons, it appears organizations are not quite ready to embrace Hadoop completely. The surrounding ecosystem of additional platforms and tools supports the Hadoop distributed platform [30,31]. These are summarized in Table 1.

Table 1 Platforms & tools for big data analytics in healthcare
Platform/Tool: Description
The Hadoop Distributed File System (HDFS): HDFS enables the underlying storage for the Hadoop cluster. It divides the data into smaller parts and distributes it across the various servers/nodes.
MapReduce: MapReduce provides the interface for the distribution of sub-tasks and the gathering of outputs. When tasks are executed, MapReduce tracks the processing of each server/node.
PIG and PIG Latin (Pig and PigLatin): The Pig programming language is configured to assimilate all types of data (structured/unstructured, etc.). It is comprised of two key modules: the language itself, called PigLatin, and the runtime version in which the PigLatin code is executed.
Hive: Hive is a runtime Hadoop support architecture that leverages Structured Query Language (SQL) with the Hadoop platform. It permits SQL programmers to develop Hive Query Language (HQL) statements akin to typical SQL statements.
Jaql: Jaql is a functional, declarative query language designed to process large data sets. To facilitate parallel processing, Jaql converts “‘high-level’ queries into ‘low-level’ queries” consisting of MapReduce tasks.
Zookeeper: Zookeeper allows a centralized infrastructure with various services, providing synchronization across a cluster of servers. Big data analytics applications utilize these services to coordinate parallel processing across big clusters.
HBase: HBase is a column-oriented database management system that sits on top of HDFS. It uses a non-SQL approach.
Cassandra: Cassandra is also a distributed database system. It is designated as a top-level project modeled to handle big data distributed across many utility servers. It also provides reliable service with no particular point of failure (http://en.wikipedia.org/wiki/Apache_Cassandra) and it is a NoSQL system.
Oozie: Oozie, an open source project, streamlines the workflow and coordination among the tasks.
Lucene: The Lucene project is used widely for text analytics/searches and has been incorporated into several open source projects. Its scope includes full text indexing and library search for use within a Java application.
Avro: Avro facilitates data serialization services. Versioning and version control are additional useful features.
Mahout: Mahout is yet another Apache project whose goal is to generate free applications of distributed and scalable machine learning algorithms that support big data analytics on the Hadoop platform.

Numerous vendors, including AWS, Cloudera, Hortonworks, and MapR Technologies, distribute open-source Hadoop platforms [29]. Many proprietary options are also available, such as IBM’s BigInsights. Further, many of these platforms are cloud versions, making them widely available. Cassandra, HBase, and MongoDB, described above, are used widely for the database component. While the available frameworks and tools are mostly open source and wrapped around Hadoop and related platforms, there are numerous trade-offs that developers and users of big data analytics in healthcare must consider. While the development costs may be lower since these tools are open source and free of charge, the downsides are the lack of technical support and minimal
security. In the healthcare industry, these are, of course, significant drawbacks, and therefore the trade-offs must be addressed. Additionally, these platforms/tools require a great deal of programming, skills the typical end-user in healthcare may not possess. Furthermore, considering the only recent emergence of big data analytics in healthcare, governance issues including ownership, privacy, security, and standards have yet to be addressed. In the next section we offer an applied big data analytics in healthcare methodology to develop and implement a big data project for healthcare providers.

Methodology
While several different methodologies are being developed in this rapidly emerging discipline, here we outline one that is practical and hands-on. Table 2 shows the main stages of the methodology. In Step 1, the interdisciplinary big data analytics in healthcare team develops a ‘concept statement’. This is a first cut at establishing the need for such a project. The concept statement is followed by a description of the project’s significance. The healthcare organization will note that there are trade-offs in terms of alternative options, cost, scalability, etc. Once the concept statement is approved, the team can proceed to Step 2, the proposal development stage. Here, more details are filled in. Based on the concept statement, several questions are addressed: What problem is being addressed? Why is it important and interesting to the healthcare provider? What is the case for a ‘big data’ analytics approach? (Because the complexity and cost of big data analytics are significantly higher compared to traditional analytics approaches, it is important to justify their use). The project team also should provide background information on the problem domain as well as prior projects and research done in this domain.
Next, in Step 3, the steps in the methodology are fleshed out and implemented. The concept statement is broken down into a series of propositions. (Note these are not rigorous as they would be in the case of statistical approaches. Rather, they are developed to help guide the big data analytics process). Simultaneously, the independent and dependent variables or indicators are identified. The data sources, as outlined in Figure 1, are also identified; the data is collected, described, and transformed in preparation for analytics. A very important step at this point is platform/tool evaluation and selection. There are several options available, as indicated previously, including AWS Hadoop, Cloudera, and IBM BigInsights. The next step is to apply the various big data analytics techniques to the data. This process differs from routine analytics only in that the techniques are scaled up to large data sets. Through a series of iterations and what-if analyses, insight is gained from the big data analytics. From the insight, informed decisions can be made. In Step 4, the models and their findings are tested and validated and presented to stakeholders for action. Implementation is a staged approach with feedback loops built in at each stage to minimize risk of failure.

Table 2 Outline of big data analytics in healthcare methodology
Step 1 Concept statement
• Establish need for big data analytics project in healthcare based on the “4Vs”.
Step 2 Proposal
• What is the problem being addressed?
• Why is it important and interesting?
• Why big data analytics approach?
• Background material
Step 3 Methodology
• Propositions
• Variable selection
• Data collection
• ETL and data transformation
• Platform/tool selection
• Conceptual model
• Analytic techniques
  - Association, clustering, classification, etc.
• Results & insight
Step 4 Deployment
• Evaluation & validation
• Testing
Source: Adapted from [Raghupathi & Raghupathi, [9]].

The next section describes several reported big data analytics applications in healthcare. We draw on publicly available material from numerous sources, including vendor sites. In this emerging discipline, there is little independent research to cite. These examples are from secondary sources. Nevertheless, they are illustrative of the potential of big data analytics in healthcare.

Examples
Premier, the U.S. healthcare alliance network, has more than 2,700 members, hospitals and health systems, 90,000 non-acute facilities and 400,000 physicians and is reported to have data on approximately one in four patients discharged from hospitals. Naturally, the network has assembled a large database of clinical, financial, patient, and supply chain data, with which the network has generated comprehensive and comparable clinical outcome measures, resource utilization reports and transaction level cost data. These outputs have informed decision-making and improved the healthcare processes at approximately 330 hospitals, saving an estimated 29,000 lives and reducing healthcare spending by nearly
$7 billion [16]. North York General Hospital, a 450-bed community teaching hospital in Toronto, Canada, reports using real-time analytics to improve patient outcomes and gain greater insight into the operations of healthcare delivery. North York is reported to have implemented a scalable real-time analytics application to provide multiple perspectives, including clinical, administrative, and financial [16]. Another example, reported by IBM, is that of the large, unnamed healthcare provider that is analyzing data in the electronic medical record (EMR) system with the goal of reducing costs and improving patient care. (Data in the EMR include the unstructured data from physician notes, pathology reports and other sources). Big data analytics is used to develop care protocols and case pathways and to assist caregivers in performing customized queries [16]. Another example of big data analytics in healthcare is Columbia University Medical Center’s analysis of “complex correlations” of streams of physiological data related to patients with brain injuries. The goal is to provide medical professionals with critical and timely information to aggressively treat complications. The advanced analytics is reported to diagnose serious complications as much as 48 hours sooner than previously in patients who have suffered a bleeding stroke from a ruptured brain aneurysm [16]. The Rizzoli Orthopedic Institute in Bologna, Italy, is reportedly using advanced analytics to gain a more “granular understanding” of the clinical variations within families whereby individual patients display extreme differences in the severity of their symptoms. This insight is reported to have reduced annual hospitalizations by 30% and the number of imaging tests by 60%. In the long term, the Institute expects to gain insight into the role of genetic factors to develop treatments [16]. The Hospital for Sick Children (Sick Kids) in Toronto is using analytics to improve the outcomes for infants prone to life-threatening “nosocomial infections”. It is reported that Sick Kids applies advanced analytics to vital-sign data gathered from bedside monitoring devices to identify po-

(NICE) of the U.K.’s National Health Service. NICE is reportedly a leader in the analytics of large clinical datasets for exploring the effectiveness of clinical and cost factors in the use of new drugs and/or clinical treatments. The Italian Medicines Agency is also reported to collect and analyze clinical data on the use of expensive new drugs as one goal in a country-level cost-effectiveness program [6]. Another leading example of big data analytics in healthcare is the Department of Veterans Affairs’ (VA) use of applications on its very large data set in an effort to comply with “performance-based accountability framework and disease management practice” [6]. In one very famous example, California-based Kaiser Permanente associated clinical data with cost data to generate a key data set, the analytics of which led to the discovery of adverse drug effects and subsequent withdrawal of Vioxx from the market [6]. Researchers at the Johns Hopkins School of Medicine discovered they could use data from Google Flu Trends to predict sudden increases in flu-related emergency room visits at least a week before warnings from the CDC. Likewise, the analysis of Twitter updates was as accurate as (and two weeks ahead of) official reports at tracking the spread of cholera in Haiti after the January 2010 earthquake [6]. Also reported is an application developed by IBM that predicts the likely outcomes of diabetes patients using patients’ panel data linked to physicians, management protocols, and the overall relationship to population health management averages [6]. In another diabetes application, physicians at Harvard Medical School and Harvard Pilgrim Health Care recently demonstrated the potential of analytics applications to EHR data to identify and group patients with diabetes for public health surveillance. Four years’ worth of data based on numerous indicators from multiple sources was utilized. The analytics application also differentiated between Type 1 and Type 2 diabetes [6,26]. Finally, at Blue Cross Blue Shield of Massachusetts (BCBSMA) there was a “need to embed analytics into business processes to help decision-makers
tential signs infection as early as 24 hours prior to previous across the business gain insight into financial and medical
methods [6,16]. Additional examples are reported below. data and become more proactive”. Several benefits were
A recent New Yorker magazine article by Atul Gawande, reported. First, the analytics enabled medical directors to
MD described how orthopedic surgeons at Brigham and identify high-risk disease groups and act to minimize risk
Women’s Hospital in Boston relied on personal experi- and improve patient outcomes. For example, new pre-
ence along with insight extracted from research on data ventive treatment protocols could be introduced among
based on a host of factors critical to the success of joint- patient groups with high cholesterol, thereby fending off
replacement surgery to systematically standardize knee heart problems. Also, complex health informatics re-
joint-replacement surgery. The result: improved outcomes ports were generated 300% faster than previously, help-
at lower costs. The University of Michigan Health System ing BCBSMA service clients more effectively [6].
standardized the administration of blood transfusions The next section briefly identifies some of the key
using analytics in a similar fashion, combining experience challenges in big data analytics in healthcare.
with big data analytics research. This resulted in a 31% re-
duction in transfusions and $200,000 reduction in ex- Challenges
penses per month (reported in [6]). Another example is At minimum, a big data analytics platform in healthcare
The National Institute for Health and Clinical Excellence must support the key functions necessary for processing
the data. The criteria for platform evaluation may include availability, continuity, ease of use,
scalability, ability to manipulate at different levels of granularity, privacy and security enablement,
and quality assurance [6,29,32]. In addition, while most platforms currently available are open source,
the typical advantages and limitations of open source platforms apply. To succeed, big data analytics
in healthcare needs to be packaged so it is menu-driven, user-friendly and transparent. Real-time big
data analytics is a key requirement in healthcare. The lag between data collection and processing has
to be addressed. The dynamic availability of numerous analytics algorithms, models and methods in a
pull-down type of menu is also necessary for large-scale adoption. The important managerial issues of
ownership, governance and standards have to be considered. And woven through these issues are those of
continuous data acquisition and data cleansing. Health care data is rarely standardized, often
fragmented, or generated in legacy IT systems with incompatible formats [6]. This great challenge needs
to be addressed as well.

Conclusions
Big data analytics has the potential to transform the way healthcare providers use sophisticated
technologies to gain insight from their clinical and other data repositories and make informed
decisions. In the future we’ll see the rapid, widespread implementation and use of big data analytics
across the healthcare organization and the healthcare industry. To that end, the several challenges
highlighted above must be addressed. As big data analytics becomes more mainstream, issues such as
guaranteeing privacy, safeguarding security, establishing standards and governance, and continually
improving the tools and technologies will garner attention. Big data analytics and applications in
healthcare are at a nascent stage of development, but rapid advances in platforms and tools can
accelerate their maturing process.

Competing interests
We, the authors, declare we have no competing interests.

Authors’ contributions
Both WR and VR contributed equally. Both authors read and approved the final manuscript.

Author details
1 Graduate School of Business, Fordham University, 113 W. 60th Street, 10023 New York, NY, USA.
2 Brooklyn College, City University of New York, Brooklyn, NY, USA.

Received: 27 August 2013 Accepted: 5 January 2014
Published: 7 February 2014

References
1. Raghupathi W: Data Mining in Health Care. In Healthcare Informatics: Improving Efficiency and Productivity. Edited by Kudyba S. Taylor & Francis; 2010:211–223.
2. Burghard C: Big Data and Analytics Key to Accountable Care Success. IDC Health Insights; 2012.
3. Dembosky A: “Data Prescription for Better Healthcare.” Financial Times, December 12, 2012, p. 19; 2012. Available from: http://www.ft.com/intl/cms/s/2/55cbca5a-4333-11e2-aa8f-00144feabdc0.html#axzz2W9cuwajK.
4. Feldman B, Martin EM, Skotnes T: “Big Data in Healthcare Hype and Hope.” October 2012. Dr. Bonnie 360; 2012. http://www.west-info.eu/files/big-data-in-healthcare.pdf.
5. Fernandes L, O’Connor M, Weaver V: Big data, bigger outcomes. J AHIMA 2012:38–42.
6. IHTT: Transforming Health Care through Big Data Strategies for leveraging big data in the health care industry; 2013. http://ihealthtran.com/wordpress/2013/03/iht%C2%B2-releases-big-data-research-report-download-today/.
7. Frost & Sullivan: Drowning in Big Data? Reducing Information Technology Complexities and Costs for Healthcare Organizations. http://www.emc.com/collateral/analyst-reports/frost-sullivan-reducing-information-technology-complexities-ar.pdf.
8. Bian J, Topaloglu U, Yu F, Yu F: Towards Large-scale Twitter Mining for Drug-related Adverse Events. Maui, Hawaii: SHB; 2012.
9. Raghupathi W, Raghupathi V: An Overview of Health Analytics. Working paper; 2013.
10. Ikanow: Data Analytics for Healthcare: Creating Understanding from Big Data. http://info.ikanow.com/Portals/163225/docs/data-analytics-for-healthcare.pdf.
11. jStart: “How Big Data Analytics Reduced Medicaid Re-admissions.” A jStart Case Study; 2012. http://www-01.ibm.com/software/ebusiness/jstart/portfolio/uncMedicaidCaseStudy.pdf.
12. Knowledgent: Big Data and Healthcare Payers; 2013. http://knowledgent.com/mediapage/insights/whitepaper/482.
13. Explorys: Unlocking the Power of Big Data to Improve Healthcare for Everyone. https://www.explorys.com/docs/data-sheets/explorys-overview.pdf.
14. IBM: “IBM big data platform for healthcare.” Solutions Brief; 2012. http://public.dhe.ibm.com/common/ssi/ecm/en/ims14398usen/IMS14398USEN.PDF.
15. Intel: Leveraging Big Data and Analytics in Healthcare and Life Sciences: Enabling Personalized Medicine for High-Quality Care, Better Outcomes; 2012. http://www.intel.com/content/dam/www/public/us/en/documents/white-papers/healthcare-leveraging-big-data-paper.pdf.
16. IBM: Data Driven Healthcare Organizations Use Big Data Analytics for Big Gains; 2013. http://www03.ibm.com/industries/ca/en/healthcare/documents/Data_driven_healthcare_organizations_use_big_data_analytics_for_big_gains.pdf.
17. Savage N: Digging for drug facts. Commun ACM 2012, 55(10):11–13.
18. Zenger B: “Can Big Data Solve Healthcare’s Big Problems?” HealthByte, February 2012; 2012. http://www.equityhealthcare.com/docstor/EH%20Blog%20on%20Analytics.pdf.
19. LaValle S, Lesser E, Shockley R, Hopkins MS, Kruschwitz N: Big data, analytics and the path from insights to value. MIT Sloan Manag Rev 2011, 52:20–32.
20. Capgemini: The Deciding Factor: Big Data & Decision Making; 2013. http://www.capgemini.com/thought-leadership/the-deciding-factor-big-data-decision-making.
21. Connolly S, Wooledge S: Harnessing the Value of Big Data Analytics. Teradata; 2013.
22. Courtney M: Puzzling out big data. Engineering & Technology 2013:56–60.
23. Intel: Big Data Analytics; 2012. http://www.intel.com/content/dam/www/public/us/en/documents/reports/data-insights-peer-research-report.pdf.
24. Manyika J, Chui M, Brown B, Bughin J, Dobbs R, Roxburgh C, Byers AH: Big Data: The Next Frontier for Innovation, Competition, and Productivity. USA: McKinsey Global Institute; 2011.
25. IBM: Large Gene interaction Analytics at University at Buffalo, SUNY; 2012. http://public.dhe.ibm.com/common/ssi/ecm/en/imc14675usen/IMC14675USEN.PDF.
26. IBM: Harvard Medical School; 2011. http://public.dhe.ibm.com/common/ssi/ecm/en/imc14685usen/IMC14685USEN.PDF.
27. Raghupathi W, Kesh S: Interoperable electronic health records design: towards a service-oriented architecture. e-Service Journal 2007, 5:39–57.
28. Borkar VR, Carey MJ, Chen L: Big data platforms: what's next? ACM Crossroads 2012, 19(1):44–49.
29. Ohlhorst F: Big Data Analytics: Turning Big Data into Big Money. USA: John Wiley & Sons; 2012.
doi:10.1186/2047-2501-2-3
Cite this article as: Raghupathi and Raghupathi: Big data analytics in
healthcare: promise and potential. Health Information Science and Systems
2014 2:3.