Potential Collaboration on Big Data and Artificial Intelligence
Kamarul Imran Musa
MD, M Community Med, PhD
Associate Professor (Epidemiology and Statistics)
Department of Community Medicine
drkamarul@usm.my
• Big data
• IR 4.0
• Artificial intelligence
• Machine learning
• Examples of projects
What is Big Data
• Big data is a term for data sets that are
• large or
• complex
• Main feature:
• Traditional data processing application software is inadequate to deal with them
• Data size – won't fit in a local machine's memory
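A minimal sketch of one common workaround when data will not fit in memory: read the file as a stream and keep only running totals. The column name `sbp` and the tiny in-memory sample are hypothetical stand-ins for a real, much larger file.

```python
import csv
import io


def streaming_mean(lines, column):
    """Mean of one numeric column, holding only one row in memory at a time."""
    reader = csv.DictReader(lines)
    total = 0.0
    count = 0
    for row in reader:
        total += float(row[column])
        count += 1
    # An empty data set has no mean; return NaN instead of dividing by zero
    return total / count if count else float("nan")


# Hypothetical sample standing in for a file too large to load at once
sample = io.StringIO("id,sbp\n1,120\n2,135\n3,150\n")
mean_sbp = streaming_mean(sample, "sbp")
print(mean_sbp)
```

The same function accepts an open file handle, so a file of any size can be processed with constant memory use.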
https://en.wikipedia.org/wiki/Big_data
https://upload.wikimedia.org/wikipedia/commons/d/dd/Big_%26_Small_Pu
mkins.JPG
The rise of big data
• Huge volumes of data -> Big Data
• Big Data -> direct and indirect effects on public health (Khoury and Ioannidis 2014).
Challenges in Big Data
• Data capture
• Sources, format, connection
• Data storage
• Transport storage, size, scalability
• Data analysis, search, sharing, transfer
• Massive size, time, memory, processing power
• Visualization, querying, updating
• Massive observations, languages, real-time data
• Information privacy
• Personal identifier, standards, cloud storage
https://3.bp.blogspot.com/-u_Yi8z1_yJ8/WBtQazL681I/AAAAAAAAEHU/-K2q1HexnO0My0UcqOIEQjynBpexkB-
DgCLcB/s400/8233546_orig.gif http://exhibitsalive.com/modular/IpoANI.gif
Big Data
http://blog.infodiagram.com/wp-content/uploads/2016/02/infoDiagram_blog_bigdata_4V.png
Industrial Revolution 4.0
• The Fourth Industrial Revolution (IR 4.0) builds on the digital revolution that has been occurring since the middle of the last century
• It is characterized by a fusion of technologies across the:
• physical,
• digital, and
• biological spheres
• (National Academies of Sciences, Engineering, and Medicine 2017).
The 4th Industrial Revolution
• Cyber-physical systems
• Internet of Things
• Internet of Systems
Artificial intelligence
• What is artificial intelligence (AI)?
• Artificial intelligence (AI) = simulated intelligence in machines.
• These machines are programmed to "think" like a human
• and to mimic the way a person acts.
• The ideal characteristic of artificial intelligence
• is the ability to rationalize and take actions that have the best chance of achieving a specific goal
• Note: the term can also be applied to any machine that exhibits traits associated with a human mind, such as learning and solving problems.
https://www.investopedia.com/terms/a/artificial-intelligence-ai.asp
https://www.g2crowd.com/categories/artificial-intelligence
Machine learning
http://www.nersc.gov/users/data-analytics/data-analytics-2/deep-learning/
https://www.uruit.com/blog/2018/02/16/soccer-and-machine-learning-tutorial/
Deep learning
• Deep learning = deep structured learning = hierarchical learning
• part of a broader family of machine learning methods
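A rough illustration of what "hierarchical" means here: each layer transforms the output of the layer before it. This is only a sketch of a forward pass through a tiny network; the layer sizes and every weight below are made-up values, not a trained model.

```python
import math


def dense(inputs, weights, biases):
    """One fully connected layer: a weighted sum of the inputs per output unit."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def relu(values):
    """Non-linearity applied between layers."""
    return [max(0.0, v) for v in values]


def sigmoid(x):
    """Squash a real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


# Hypothetical hand-picked weights for a 2 -> 2 -> 2 -> 1 network
W1, B1 = [[0.5, -0.25], [0.1, 0.3]], [0.0, 0.1]
W2, B2 = [[1.0, 1.0], [-1.0, 0.5]], [0.0, 0.0]
W3, B3 = [[1.0, -2.0]], [0.0]


def forward(x):
    h1 = relu(dense(x, W1, B1))    # first level of features
    h2 = relu(dense(h1, W2, B2))   # features built from the first level
    return sigmoid(dense(h2, W3, B3)[0])  # final score between 0 and 1


score = forward([1.0, 2.0])
print(score)
```

In practice the weights are learned from data and the layers are far wider and deeper; only the stacking idea carries over.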
https://en.wikipedia.org/wiki/Deep_learning
https://medium.com/swlh/ill-tell-you-why-deep-learning-is-so-popular-and-in-demand-
5aca72628780
http://msb.embopress.org/content/12/7/878
• Deep-learning AI:
• image recognition (e.g., Facebook’s facial recognition system [Menlo Park, California]),
• self-driving cars,
• speech recognition (e.g., Apple’s Siri [Cupertino, California], Google Brain [Mountain View, California], and Amazon’s Alexa Voice [Seattle, Washington]),
• Google DeepMind AlphaGo,
• mobile apps (e.g., Cardiogram app [San Francisco, California]),
• machine vision software in cameras,
• IBM Watson (North Castle, New York), and
• robots.
AI, Machine learning, Deep learning
https://www.datasciencecentral.com/profiles/blogs/artificial-intelligence-vs-machine-learning-vs-deep-learning
Statistics vs the rest
https://www.educba.com/machine-learning-vs-statistics/
Examples of projects
• Big Data
• Big data projects:
• Fields:
• “omics” data,
• human gut microbiome sequencing,
• social media, and
• cardiac imaging
• Too large and heterogeneous, and change too quickly, to be stored, analyzed, and used with traditional tools.
https://www.genome.gov/images/content/dna_sequencing.jpg
(a) Heat map with dendrogram plot, and (b) MDS plot, showing the transcriptional correlation. (c-f) Scatter and bar plots showing the genes up-regulated (blue color and labeled "1"), down-regulated (red color and labeled "-1") or non-differentially expressed
• At the University of Pittsburgh
• link the molecular signatures of people with breast cancer to a host of
clinical data, including demographic information associated with risk such
as age, ethnicity and body weight.
• 5 petabytes, or 5 million gigabytes, which is enough data to overload
around 40,000 new iPhone 6 devices.
• Public databases
• TCGA and METABRIC (Molecular Taxonomy of Breast Cancer International
Consortium)
• contain data on the entire set of genes, RNA transcripts and proteins of
thousands of breast-cancer tumours
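The "5 petabytes ≈ 40,000 iPhones" comparison above can be checked with simple arithmetic. This sketch assumes the decimal convention (1 PB = 1,000,000 GB), which is the one the slide uses.

```python
# Assumption: decimal (SI) units, 1 PB = 10^6 GB, as on the slide
GB_PER_PETABYTE = 1_000_000

total_gb = 5 * GB_PER_PETABYTE   # 5 PB expressed in GB
devices = 40_000                 # number of phones quoted on the slide
gb_per_device = total_gb / devices

print(total_gb)        # 5000000
print(gb_per_device)   # 125.0
```

The result of 125 GB per device is close to the 128 GB capacity of the largest iPhone 6 model, so the comparison holds.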
Big data, showing correlation between a CDC study on cardiovascular disease and a study based on hostility in Twitter tweets. This demonstrates how big data from social media might be used in new ways to evaluate population health.
https://www.dicardiology.com/article/understanding-how-big-data-will-change-healthcare
• Satellite imagery has the power to capture high-resolution, real-time data
• providing more frequent and higher resolution information about girls’ and women’s lives
• reveal pockets of gender inequalities that are typically masked by averages on the country or district level.
Examples of projects
• AI
https://www.sciencedirect.com/science/article/pii/S0735109717368456
• AI techniques in cardiovascular medicine:
• to explore novel genotypes and phenotypes in
existing diseases
• improve the quality of patient care
• enable cost-effectiveness,
• reduce readmission and mortality rates.
• Deep learning
• “ … CAD system based on one of the most successful object
detection frameworks, Faster R-CNN. The system detects and
classifies malignant or benign lesions on a mammogram without any
human intervention. The proposed method sets the state of the art
classification performance on the public INbreast database, AUC =
0.95.”
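For readers unfamiliar with AUC: it is the probability that a randomly chosen malignant case receives a higher score than a randomly chosen benign case. A minimal sketch of that definition, using made-up scores and labels (not data from the INbreast study):

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (ties count as half)."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical model scores; label 1 = malignant, 0 = benign
scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
labels = [1, 1, 1, 0, 0, 0]
print(auc(scores, labels))
```

An AUC of 0.5 means the scores are no better than chance, and 1.0 means every malignant case is ranked above every benign one.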
• Deep learning
https://deepmind.com/blog/moorfields-major-milestone/#image-31930
• Deep learning architecture applied to a clinically heterogeneous set of three-dimensional optical coherence tomography scans
• Demonstrates performance in making a referral recommendation that reaches or exceeds that of experts on a range of sight-threatening retinal diseases after training on only 14,884 scans.
https://www.nature.com/articles/s41591-018-0107-6/figures/1
• Detecting lung nodules using deep learning. Tim Salimans, OpenAI.
• “Lung cancer is the leading cause of cancer-related death worldwide.
• However, large-scale implementation of such screening programs requires
radiologists to evaluate a huge number of scans, which is costly and error-prone.
• Aidence is an Amsterdam start-up developing an AI assistant for helping radiologists
with detecting, reporting and tracking of lung nodules. This talk covers the deep
learning techniques that we use to obtain state of the art”
https://ulasbagci.wordpress.com/2015/08/06/new-publication-in-deep-learning-in-medical-image-
analysis-dlmia-miccai-2015/
https://healthmanagement.org/c/imaging/news/deep-learning-technique-identifies-and-
segments-tumours
• “Streams, which was developed in partnership with technology company
DeepMind, uses a range of test result data to identify which patients could be in
danger of developing AKI and means doctors and nurses can respond in minutes
rather than hours or days - potentially saving lives. More than 26 doctors and
nurses at the Royal Free Hospital are now using Streams and each day it is
alerting them to an average of 11 patients at risk of acute kidney injury.”
https://www.royalfree.nhs.uk/news-media/news/new-app-helping-to-improve-patient-care/
• Machine learning
https://www.slideshare.net/0xdata/machine-learning-in-modern-medicine-with-erin-ledell-at-stanford-med
• AI
https://www.cedars-sinai.edu/Patients/Programs-and-Services/Medicine-Department/Artificial-Intelligence-in-
Medicine-AIM/
• Algorithms have been developed to:
• Take raw digital data output by the gamma camera
• identify where the heart is, reconstruct it into tomographic images and re-
orient those images
• Take tomographic images of the heart
• evaluate the signals from several hundred portions of the myocardium,
comparing the strength of the signals with those expected in a normal
heart and generate an exact quantitative measurement of the location,
extent and severity of perfusion abnormalities of the heart.
• Analyze the dynamic functioning of the heart (i.e., the way it contracts and
thickens during its cycle).
• A dynamic measurement of the heart cavity volume is performed from
electrocardiographically gated three-dimensional nuclear cardiology
images by automatically identifying the endocardial and epicardial
surfaces and following their motion throughout the cardiac cycle.
http://depend.csl.illinois.edu/our-research/health-analytics-and-
systems/omics/#sthash.ZMOPrIxM.dpbs
• Despite their differences, EBM and ML can assist one another.
• Algorithms can facilitate more precise estimates of individual risk, with implications
for choice between diagnostic tests or therapies that can then be compared in
prospective, adaptive, randomized controlled trials.
• Mendelian randomization and statistical analyses based on directed acyclic graphs and different matching techniques can validate causal inferences based on ML associations.
• Clinical trials can compare ML-based interventions with usual care
• to assess their feasibility and validity in routine care.
• ML needs to develop common nomenclatures, evaluation and reporting
standards, comparative analyses of different algorithms, and training
programs for clinicians
http://annals.org/aim/article-abstract/2680060/machine-learning-evidence-based-medicine
Product?
Summary
• Big Data = large volume and variety of data
• Artificial intelligence, machine learning, deep learning
• Who?
• Expert in the area of interest – YOU
• Expert in computer science – data engineers, data scientists
• Expert in disease modelling – statisticians, epidemiologists
• Data source, algorithm, prediction, deployment
• Resources – HPC, GPU-based analysis
• Opportunity, new field
