Issues
• Adaptive learning / adversarial attacks
NCCN Guidelines Version 4.2014
Non-Small Cell Lung Cancer
NCCN Guidelines Index
NSCLC Table of Contents
Discussion
Version 4.2014, 06/05/14 © National Comprehensive Cancer Network, Inc. 2014, All rights reserved. The NCCN Guidelines®
and this illustration may not be reproduced in any form without the express written permission of NCCN®
.
Note: All recommendations are category 2A unless otherwise indicated.
Clinical Trials: NCCN believes that the best management of any cancer patient is in a clinical trial. Participation in clinical trials is especially encouraged.
NSCL-2
dT3, N0 related to size or satellite nodules.
fTesting is not listed in order of priority and is dependent upon clinical
circumstances, institutional processes, and judicious use of resources.
gMethods for evaluation include mediastinoscopy, mediastinotomy, EBUS, EUS,
and CT-guided biopsy.
hPositive PET/CT scan findings for distant disease need pathologic or other
radiologic confirmation. If PET/CT scan is positive in the mediastinum, lymph
node status needs pathologic confirmation.
iSee Principles of Surgical Therapy (NSCL-B).
jSee Principles of Radiation Therapy (NSCL-C).
kSee Chemotherapy Regimens for Neoadjuvant and Adjuvant Therapy (NSCL-D).
lExamples of high-risk factors may include poorly differentiated tumors (including
lung neuroendocrine tumors [excluding well-differentiated neuroendocrine tumors]),
vascular invasion, wedge resection, tumors >4 cm, visceral pleural involvement,
and incomplete lymph node sampling (Nx). These factors independently may not
be an indication and may be considered when determining treatment with adjuvant
chemotherapy.
mSee Chemotherapy Regimens Used with Radiation Therapy (NSCL-E).
CLINICAL ASSESSMENT PRETREATMENT EVALUATIONf INITIAL TREATMENT
Stage IA
(peripheral T1ab, N0)
Stage IB
(peripheral T2a, N0)
Stage I
(central T1ab–T2a, N0)
Stage II
(T1ab–2ab, N1; T2b, N0)
Stage IIB
(T3, N0)d
• PFTs (if not previously
done)
• Bronchoscopy
(intraoperative
preferred)
• Pathologic mediastinal
lymph node evaluationg
(category 2B)
• PET/CT scanh (if not
previously done)
• PFTs (if not previously
done)
• Bronchoscopy
• Pathologic mediastinal
lymph node evaluationg
• PET/CT scanh (if not
previously done)
• Brain MRI (Stage II,
Stage IB [category 2B])
Negative
mediastinal
nodes
Positive
mediastinal
nodes
Operable
Medically
inoperable
Negative
mediastinal
nodes
Positive
mediastinal
nodes
Operable
Medically
inoperable
Surgical exploration and
resectioni + mediastinal lymph
node dissection or systematic
lymph node sampling
Definitive RT including stereotactic
ablative radiotherapyj (SABR)
See Stage IIIA (NSCL-8) or Stage IIIB (NSCL-11)
Surgical exploration and
resectioni + mediastinal lymph
node dissection or systematic
lymph node sampling
N0
N1
See Stage IIIA (NSCL-8) or Stage IIIB (NSCL-11)
Definitive RT
including SABRj
Definitive chemoradiationj,m
See Adjuvant
Treatment (NSCL-3)
See Adjuvant
Treatment (NSCL-3)
Consider adjuvant
chemotherapyk
(category 2B) for
high-risk stages IB-IIl
NCCN Guidelines Version 4.2014
Non-Small Cell Lung Cancer
NCCN Guidelines Index
NSCLC Table of Contents
Discussion
Version 4.2014, 06/05/14 © National Comprehensive Cancer Network, Inc. 2014, All rights reserved. The NCCN Guidelines®
and this illustration may not be reproduced in any form without the express written permission of NCCN®
.
Note: All recommendations are category 2A unless otherwise indicated.
Clinical Trials: NCCN believes that the best management of any cancer patient is in a clinical trial. Participation in clinical trials is especially encouraged.
NSCL-8
hPositive PET/CT scan findings for distant disease need pathologic or other
radiologic confirmation. If PET/CT scan is positive in the mediastinum, lymph
node status needs pathologic confirmation.
iSee Principles of Surgical Therapy (NSCL-B).
jSee Principles of Radiation Therapy (NSCL-C).
kSee Chemotherapy Regimens for Neoadjuvant and Adjuvant Therapy (NSCL-D).
mSee Chemotherapy Regimens Used with Radiation Therapy (NSCL-E).
nR0 = no residual tumor, R1 = microscopic residual tumor, R2 = macroscopic
residual tumor.
sPatients likely to receive adjuvant chemotherapy may be treated with induction
chemotherapy as an alternative.
MEDIASTINAL BIOPSY
FINDINGS
INITIAL TREATMENT ADJUVANT TREATMENT
T1-3, N0-1
(including T3
with multiple
nodules in
same lobe)
Surgeryi,s
Resectable
Medically
inoperable
Surgical resectioni
+ mediastinal lymph
node dissection or
systematic lymph
node sampling
See Treatment
according to clinical
stage (NSCL-2)
N0–1
N2
See NSCL-3
Margins
negative (R0)n
Sequential chemotherapyk
(category 1) + RTj
Margins
positiven
Surveillance
(NSCL-14)
R1n
R2n
Chemoradiationj
(sequentialk or concurrentm)
Surveillance
(NSCL-14)
Concurrent
chemoradiationj,m
Surveillance
(NSCL-14)
T1-2,
T3 (≥7 cm),
N2 nodes
positivei
• Brain MRI
• PET/CT
scan,h
if not
previously
done
Negative for
M1 disease
Positive
Definitive concurrent
chemoradiationj,m
(category 1)
or
Induction
chemotherapyk ± RTj
See Treatment for Metastasis
solitary site (NSCL-13) or
distant disease (NSCL-15)
No apparent
progression
Progression
Surgeryi ± chemotherapyk (category 2B)
± RTj (if not given)
RTj (if not given)
± chemotherapyk
Local
Systemic
See Treatment for Metastasis
solitary site (NSCL-13) or
distant disease (NSCL-15)
T3
(invasion),
N2 nodes
positive
• Brain MRI
• PET/CT
scan,h
if not
previously
done
Negative for
M1 disease
Positive
Definitive concurrent
chemoradiationj,m
See Treatment for Metastasis
solitary site (NSCL-13) or
distant disease (NSCL-15)
Over the course of a career, an oncologist may impart bad news an average of 20,000 times,
but most practicing oncologists have never received any formal training to help them
prepare for such conversations.
High levels of empathy in primary care physicians correlate with better clinical outcomes for their patients with diabetes.
We identified 384 empathic opportunities and found that physicians had responded empathically to 39 (10%) of them.
In 398 conversations, there were 292 empathic opportunities in total. When they occurred, oncologists responded with continuers 22% of the time.
• "There is a problem with a physician-training system in which students enter medical school simply by scoring well on exams and obtain specialist certification simply by passing exams."
• "To practice medicine as a profession, one must excel at service orientation, empathy, and communication; students whose temperament does not fit these demands inevitably struggle once admitted."
• "In fact, during residency, when contact with patients is greatest, education on topics beyond medicine itself, such as patient safety and doctor-patient communication, is sorely needed, but residents are too busy caring for the patients in front of them to receive it."
Hojat M et al. Acad Med. 2009
differences in empathy scores between the two groups varied from a low of 0.05 (in year 0) to a maximum of 0.75 (in year 3). The effect size of the decline in empathy from year 0 to year 3 was more than double for those who chose technology-oriented specialties (d = 1.01) compared with their counterparts in people-oriented specialties (d = 0.44).
Discussion
The results of this study showed a
significant decline in mean empathy
scores in the third year of medical school.
Figure 2 Changes in mean Jefferson Scale of Physician Empathy (JSPE) scores in different years of
medical school for 56 men and 65 women who identified themselves at all five administrations of
the JSPE (“matched cohort”) at Jefferson Medical College, Philadelphia, Pennsylvania, 2002–2008.
matriculants entering in 2002. This is also
reflected in the total matched cohort.
Figure 1 shows a graphical presentation
of the changes in mean empathy scores
for the matched and unmatched cohorts.
As shown in the figure, the patterns of
changes are very similar in the matched
and unmatched cohorts.
Gender differences
We compared changes in empathy scores during medical school for men (n = 56) and women (n = 65) in the matched cohort. Results are depicted in Figure 2. As shown in the figure, women consistently outscored men in every year of medical school. Gender differences in all of the test administrations were statistically significant (P < .05, by t test). As shown in Figure 2, although the pattern of change in empathy scores for women paralleled that of men, the effect size estimates of these changes varied from a low of 0.37 (in year 2) to a high of 0.79 (in year 3). The effect size of the decline in empathy between year 0 and year 3 was much larger for men (d = 0.79) than for women (d = 0.56).
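The effect sizes quoted above are Cohen's d values. As a quick illustration only (the means, SDs, and group sizes below are made-up placeholders, not the study's data), d can be computed from two groups' summary statistics using the pooled standard deviation:

import math

def cohens_d(mean1, sd1, n1, mean2, sd2, n2):
    # Cohen's d with the pooled standard deviation
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2)
    return (mean1 - mean2) / math.sqrt(pooled_var)

# Hypothetical year-0 vs. year-3 JSPE scores for one subgroup
print(round(cohens_d(118.5, 10.2, 56, 110.4, 10.3, 56), 2))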
Differences across specialties
Changes in empathy scores were
compared for 85 graduates in the
matched cohort who pursued their
residency training in “people-oriented”
specialties (e.g., family medicine,
internal medicine, pediatrics,
emergency medicine, psychiatry,
obstetrics–gynecology) and 36
who pursued their training in
“technology-oriented” specialties (e.g.,
anesthesiology, pathology, radiology,
surgery, orthopedic surgery, etc.).
Results appear in Figure 3. As shown in
the figure, those who pursued
people-oriented specialties consistently
scored higher in all years of medical
school than did their counterparts who
pursued technology-oriented
specialties. However, the difference in
empathy scores between the two groups
became statistically significant starting
from year 2 of medical school (P < .05,
by t test). The effect size estimates of
Figure 1 Changes in mean Jefferson Scale of Physician Empathy (JSPE) scores in different years of
medical school for the matched cohort (n = 121), who identified themselves at all five
administrations of the JSPE, and the unmatched cohort (n = 335) at Jefferson Medical College,
Philadelphia, Pennsylvania, 2002–2008.
medical school.
† F(4,296) = 14.4; P < .001.
‡ F(4,179) = 11.7; P < .001.
¶ F(4,479) = 25.5; P < .001.
Academic Medicine, Vol. 84, No. 9 / September 2009 1187
(Figure: reported accuracies of 69.5%, 63%, 49.5%, 72.5%, and 57.5%; see the citation below.)
AJR Am J Roentgenol. 2017 Dec;209(6):1374-1380.
Digital Healthcare Institute
Director, Yoon Sup Choi, PhD
yoonsup.choi@gmail.com
(Figure: reading time fell from 188 minutes to 154 minutes (saving 18% of time) and from 180 minutes to 108 minutes (saving 40% of time) with AI assistance.)
AJR Am J Roentgenol. 2017 Dec;209(6):1374-1380.
https://blogs.nvidia.com/blog/2016/09/19/deep-learning-breast-cancer-diagnosis/
Yun Liu et al. Detecting Cancer Metastases on Gigapixel Pathology Images (2017)
For micrometastases, sensitivity was significantly higher with assistance than without (p = 0.02).
FIGURE 3. Improved metastasis detection with algorithm assistance. A, Data represents performance across all images by image
category and assistance modality. Error bars indicate SE. The performance metric corresponds to specificity for
negative cases and sensitivity for micrometastases (micromet) and macrometastases (macromet). B, Operating point of individual
pathologists with and without assistance for micrometastases and negative cases, overlayed on the receiver operating characteristic
curve of the algorithm. AUC indicates area under the curve.
• Sensitivity
• With AI assistance, sensitivity for micrometastases increased significantly
• Changes for negative cases and macrometastases were not significant
• AUC
• The pathologist + AI combination was slightly higher
• than either the pathologist alone or the AI alone
Underlying these exciting advances, however, is the important notion that these algorithms do not replace the breadth and contextual knowledge of human pathologists. With assistance, sensitivity for micrometastases rose from 83% to 91%, and overall diagnostic accuracy was higher than that of either unassisted pathologist interpretation or the computer algorithm alone.
FIGURE 5. Average review time per image decreases with assistance. A, Average review time per image across all pathologists
analyzed by category. Black circles are average times with assistance, gray triangles represent average times without assistance.
Error bars indicate 95% confidence interval. B, Micrometastasis time of review decreases for nearly all images with assistance.
Circles represent average review time for each individual micrometastasis image, averaged across the 6 pathologists by assistance
modality. The dashed lines connect the points corresponding to the same image with and without assistance. The 2 images that
were not reviewed faster on average with assistance are represented with red dot-dash lines. Vertical lines of the box represent
quartiles, and the diamond indicates the average of review time for micrometastases in that modality. Micromet indicates mi-
crometastasis; macromet, macrometastasis.
• Review time (per image)
• With AI assistance, review time decreased significantly for Negative and Micromet images
• In particular, Micromet review time fell roughly by half, from about 2 minutes to about 1 minute
• Changes for ITC (isolated tumor cells) and Macromet were not significant
Deep Learning Automatic Detection Algorithm for Malignant Pulmonary Nodules
Table 3: Patient Classification and Nodule Detection at the Observer Performance Test
Observer | Test 1: Radiograph Classification (AUROC), Nodule Detection (JAFROC FOM) | DLAD versus Test 1 (P value): Classification, Detection | Test 2: Radiograph Classification (AUROC), Nodule Detection (JAFROC FOM) | Test 1 versus Test 2 (P value): Classification, Detection
Nonradiology physicians
Observer 1 0.77 0.716 <.001 <.001 0.91 0.853 <.001 <.001
Observer 2 0.78 0.657 <.001 <.001 0.90 0.846 <.001 <.001
Observer 3 0.80 0.700 <.001 <.001 0.88 0.783 <.001 <.001
Group 0.691 <.001* 0.828 <.001*
Radiology residents
Observer 4 0.78 0.767 <.001 <.001 0.80 0.785 .02 .03
Observer 5 0.86 0.772 .001 <.001 0.91 0.837 .02 <.001
Observer 6 0.86 0.789 .05 .002 0.86 0.799 .08 .54
Observer 7 0.84 0.807 .01 .003 0.91 0.843 .003 .02
Observer 8 0.87 0.797 .10 .003 0.90 0.845 .03 .001
Observer 9 0.90 0.847 .52 .12 0.92 0.867 .04 .03
Group 0.790 <.001* 0.867 <.001*
Board-certified radiologists
Observer 10 0.87 0.836 .05 .01 0.90 0.865 .004 .002
Observer 11 0.83 0.804 <.001 <.001 0.84 0.817 .03 .04
Observer 12 0.88 0.817 .18 .005 0.91 0.841 .01 .01
Observer 13 0.91 0.824 >.99 .02 0.92 0.836 .51 .24
Observer 14 0.88 0.834 .14 .03 0.88 0.840 .87 .23
Group 0.821 .02* 0.840 .01*
Thoracic radiologists
Observer 15 0.94 0.856 .15 .21 0.96 0.878 .08 .03
Observer 16 0.92 0.854 .60 .17 0.93 0.872 .34 .02
Observer 17 0.86 0.820 .02 .01 0.88 0.838 .14 .12
Observer 18 0.84 0.800 <.001 <.001 0.87 0.827 .02 .02
Group 0.833 .08* 0.854 <.001*
Note.—Observer 4 had 1 year of experience; observers 5 and 6 had 2 years of experience; observers 7–9 had 3 years of experience; observers
10–12 had 7 years of experience; observers 13 and 14 had 8 years of experience; observer 15 had 26 years of experience; observer 16 had 13
years of experience; and observers 17 and 18 had 9 years of experience. Observers 1–3 were 4th-year residents from obstetrics and gynecology, orthopedic surgery, and internal medicine, respectively.
Physician | AI vs. physician alone (p value) | Physician + AI | Physician vs. physician + AI (p value)
Radiology residents: year 1, year 2, year 3
Non-radiology physicians: 4th-year residents in obstetrics-gynecology, orthopedic surgery, and internal medicine
Board-certified radiologists: 7 and 8 years of experience
Board-certified thoracic radiologists: 26, 13, and 9 years of experience
• Using AI as a second reader improved accuracy:
• classification: 17 of 18 readers improved (15 of 18 with P < 0.05)
• nodule detection: 18 of 18 readers improved (14 of 18 with P < 0.05) — see the quick check below
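As a quick sanity check of the "18 of 18" claim for nodule detection, the JAFROC figures of merit can be compared pairwise. The values are transcribed from Table 3 above; this is only a reading aid, not part of the original study's code.

# JAFROC FOM per observer (Test 1 = without AI, Test 2 = with AI), from Table 3
test1 = [0.716, 0.657, 0.700, 0.767, 0.772, 0.789, 0.807, 0.797, 0.847,
         0.836, 0.804, 0.817, 0.824, 0.834, 0.856, 0.854, 0.820, 0.800]
test2 = [0.853, 0.846, 0.783, 0.785, 0.837, 0.799, 0.843, 0.845, 0.867,
         0.865, 0.817, 0.841, 0.836, 0.840, 0.878, 0.872, 0.838, 0.827]

improved = sum(b > a for a, b in zip(test1, test2))
print(f"{improved} of {len(test1)} observers improved")  # -> 18 of 18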
• AI alone: 0.91, 0.885
• "AI alone" was more accurate than "board-certified radiologist + AI" in most cases:
• classification: better than 6 of the 9 radiologists
• nodule detection: better than all 9 radiologists
(Aviation analogy: cockpit automation and the downsizing of the crew; quotation attributed to James Albaugh, Boeing, 2011.)
http://dimg.donga.com/wps/NEWS/IMAGE/2017/06/19/84945511.1.edit.jpg
(Source: Dong-A Ilbo)
Issues
• Adaptive learning / adversarial attacks
http://www.mobihealthnews.com/content/fda-issues-three-guidances-including-long-awaited-cds-guidelines
• FDA CDS (clinical decision support) guidance: degree of human involvement
• FDA De Novo premarket review pathway
Modes of using AI in image interpretation:
• pre-screening: the AI reads first, then the human doctor
• double reading: the AI and the human doctor read in parallel
• double check (second opinion): the human doctor reads first, then the AI
Assisting Pathologists in Detecting Cancer
with Deep Learning
Findings are marked with contours, colors, heat maps, etc.; the presence of disease, disease severity, and so on are presented.
The Black Box of AI (Nature, Vol. 538, 6 October 2016, p. 20)
Animal Intelligence: Clever Hans
• Clever Hans was a horse that was claimed to have been able to perform arithmetic.
• After a formal investigation in 1907, psychologist Oskar Pfungst demonstrated that the horse was not actually performing these mental tasks, but was watching the reactions of his human observers.
• The trainer was entirely unaware that he was providing such cues.
https://namkugkim.wordpress.com/2017/05/15/
"In fact, when we trained on chest X-rays from Asan Medical Center, cardiomegaly was unusual: the training results looked good, yet this showed us that the model had learned something entirely different. Cardiomegaly is diagnosed on an X-ray by the heart being enlarged, but the deep learning model was not actually looking at the size of the heart; CAM confirmed that it was looking at the surgical scars that appear in the X-rays of patients with this condition."
- From the blog of Professor Namkug Kim, Asan Medical Center
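The "CAM" in the quote refers to class activation mapping. Below is a minimal Grad-CAM-style sketch in PyTorch (a generic illustration with a placeholder ResNet and random input, not the code behind the blog post) showing how a heat map over the last convolutional features is obtained:

import torch
import torchvision

model = torchvision.models.resnet18(weights=None).eval()  # placeholder classifier
activations, gradients = {}, {}

def fwd_hook(module, inp, out):
    activations["feat"] = out.detach()

def bwd_hook(module, grad_in, grad_out):
    gradients["feat"] = grad_out[0].detach()

layer = model.layer4                         # last convolutional block
layer.register_forward_hook(fwd_hook)
layer.register_full_backward_hook(bwd_hook)

x = torch.randn(1, 3, 224, 224)              # placeholder for a chest X-ray tensor
score = model(x)[0].max()                    # score of the predicted class
score.backward()

weights = gradients["feat"].mean(dim=(2, 3), keepdim=True)    # channel importance
cam = torch.relu((weights * activations["feat"]).sum(dim=1))  # class activation map
cam = cam / (cam.max() + 1e-8)               # normalize to [0, 1] for overlay on the image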
Issues
• Adaptive learning / adversarial attacks
The new engl and jour nal of medicine
original article
Single Reading with Computer-Aided
Detection for Screening Mammography
Fiona J. Gilbert, F.R.C.R., Susan M. Astley, Ph.D., Maureen G.C. Gillan, Ph.D.,
Olorunsola F. Agbaje, Ph.D., Matthew G. Wallis, F.R.C.R.,
Jonathan James, F.R.C.R., Caroline R.M. Boggis, F.R.C.R.,
and Stephen W. Duffy, M.Sc., for the CADET II Group*
From the Aberdeen Biomedical Imaging
Centre, University of Aberdeen, Aberdeen
(F.J.G., M.G.C.G.); the Department of Im-
aging Science and Biomedical Engineer-
ing, University of Manchester, Manchester
(S.M.A.); the Department of Epidemiolo-
gy, Mathematics, and Statistics, Wolfson
Institute of Preventive Medicine, London
(O.F.A., S.W.D.); the Cambridge Breast
Unit, Addenbrookes Hospital, Cambridge
(M.G.W.); the Nottingham Breast Insti-
tute, Nottingham City Hospital, Notting-
ham (J.J.); and the Nightingale Breast
Screening Unit, Wythenshawe Hospital,
Manchester (C.R.M.B.) — all in the Unit-
ed Kingdom. Address reprint requests to
Dr. Gilbert at the Aberdeen Biomedical
Imaging Centre, University of Aberdeen,
Lilian Sutton Bldg., Foresterhill, Aberdeen
AB25 2ZD, Scotland, United Kingdom, or
at f.j.gilbert@abdn.ac.uk.
*The members of the Computer-Aided
Detection Evaluation Trial II (CADET II)
group are listed in the Appendix.
This article (10.1056/NEJMoa0803545)
was published at www.nejm.org on Oc-
tober 1, 2008.
N Engl J Med 2008;359:1675-84.
Copyright © 2008 Massachusetts Medical Society.
ABSTRACT
Background
The sensitivity of screening mammography for the detection of small breast can-
cers is higher when the mammogram is read by two readers rather than by a single
reader. We conducted a trial to determine whether the performance of a single reader
using a computer-aided detection system would match the performance achieved by
two readers.
Methods
The trial was designed as an equivalence trial, with matched-pair comparisons be-
tween the cancer-detection rates achieved by single reading with computer-aided de-
tection and those achieved by double reading. We randomly assigned 31,057 women
undergoing routine screening by film mammography at three centers in England to
double reading, single reading with computer-aided detection, or both double read-
ing and single reading with computer-aided detection, at a ratio of 1:1:28. The pri-
mary outcome measures were the proportion of cancers detected according to regi-
men and the recall rates within the group receiving both reading regimens.
Results
The proportion of cancers detected was 199 of 227 (87.7%) for double reading and
198 of 227 (87.2%) for single reading with computer-aided detection (P=0.89). The
overall recall rates were 3.4% for double reading and 3.9% for single reading with
computer-aided detection; the difference between the rates was small but significant
(P<0.001). The estimated sensitivity, specificity, and positive predictive value for single
reading with computer-aided detection were 87.2%, 96.9%, and 18.0%, respectively.
The corresponding values for double reading were 87.7%, 97.4%, and 21.1%. There
were no significant differences between the pathological attributes of tumors de-
tected by single reading with computer-aided detection alone and those of tumors
detected by double reading alone.
Conclusions
Single reading with computer-aided detection could be an alternative to double read-
ing and could improve the rate of detection of cancer from screening mammograms
read by a single reader. (ClinicalTrials.gov number, NCT00450359.)
Mammography
• single reading+CAD vs. double reading
• Outcome: Cancer detection rate / Recall rate
Table 1
                                 double reading    single reading + CAD
proportion of cancers detected   87.7% (199/227)   87.2% (198/227)
overall recall rate              3.4%              3.9%
sensitivity                      87.7%             87.2%
specificity                      97.4%             96.9%
positive predictive value        21.1%             18.0%
Conclusion: Single reading with computer-aided detection could be an
alternative to double reading and could improve the rate of
detection of cancer from screening mammograms read by a single
reader.
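As a quick worked example of the headline proportions (not the trial's matched-pair analysis), the detection rates and simple Wilson confidence intervals can be computed from the published counts:

from statsmodels.stats.proportion import proportion_confint

detected = {"double reading": 199, "single reading + CAD": 198}
total_cancers = 227

for arm, hits in detected.items():
    rate = hits / total_cancers
    lo, hi = proportion_confint(hits, total_cancers, method="wilson")
    print(f"{arm}: {rate:.1%} (95% CI {lo:.1%}-{hi:.1%})")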
Diagnostic Accuracy of Digital Screening Mammography
With and Without Computer-Aided Detection
Constance D. Lehman, MD, PhD; Robert D. Wellman, MS; Diana S. M. Buist, PhD; Karla Kerlikowske, MD;
Anna N. A. Tosteson, ScD; Diana L. Miglioretti, PhD; for the Breast Cancer Surveillance Consortium
IMPORTANCE After the US Food and Drug Administration (FDA) approved computer-aided
detection (CAD) for mammography in 1998, and the Centers for Medicare and Medicaid
Services (CMS) provided increased payment in 2002, CAD technology disseminated rapidly.
Despite sparse evidence that CAD improves accuracy of mammographic interpretations and
costs over $400 million a year, CAD is currently used for most screening mammograms in the
United States.
OBJECTIVE To measure performance of digital screening mammography with and without
CAD in US community practice.
DESIGN, SETTING, AND PARTICIPANTS We compared the accuracy of digital screening
mammography interpreted with (n = 495 818) vs without (n = 129 807) CAD from 2003
through 2009 in 323 973 women. Mammograms were interpreted by 271 radiologists from
66 facilities in the Breast Cancer Surveillance Consortium. Linkage with tumor registries
identified 3159 breast cancers in 323 973 women within 1 year of the screening.
MAIN OUTCOMES AND MEASURES Mammography performance (sensitivity, specificity, and
screen-detected and interval cancers per 1000 women) was modeled using logistic
regression with radiologist-specific random effects to account for correlation among
examinations interpreted by the same radiologist, adjusting for patient age, race/ethnicity,
time since prior mammogram, examination year, and registry. Conditional logistic regression
was used to compare performance among 107 radiologists who interpreted mammograms
both with and without CAD.
RESULTS Screening performance was not improved with CAD on any metric assessed.
Mammography sensitivity was 85.3% (95% CI, 83.6%-86.9%) with and 87.3% (95% CI,
84.5%-89.7%) without CAD. Specificity was 91.6% (95% CI, 91.0%-92.2%) with and 91.4%
(95% CI, 90.6%-92.0%) without CAD. There was no difference in cancer detection rate (4.1 in
1000 women screened with and without CAD). Computer-aided detection did not improve
intraradiologist performance. Sensitivity was significantly decreased for mammograms
interpreted with vs without CAD in the subset of radiologists who interpreted both with and
without CAD (odds ratio, 0.53; 95% CI, 0.29-0.97).
CONCLUSIONS AND RELEVANCE Computer-aided detection does not improve diagnostic
accuracy of mammography. These results suggest that insurers pay more for CAD with no
established benefit to women.
JAMA Intern Med. 2015;175(11):1828-1837. doi:10.1001/jamainternmed.2015.5231
Published online September 28, 2015.
Invited Commentary
page 1837
Author Affiliations: Department of
Radiology, Massachusetts General
Hospital, Boston (Lehman); Group
Health Research Institute, Seattle,
Washington (Wellman, Buist,
Miglioretti); Departments of
Medicine and Epidemiology and
Biostatistics, University of California,
San Francisco, San Francisco
(Kerlikowske); Norris Cotton Cancer
Center, Geisel School of Medicine at
Dartmouth, Dartmouth College,
Lebanon, New Hampshire (Tosteson);
Department of Public Health
Sciences, School of Medicine,
University of California, Davis
(Miglioretti).
Corresponding Author: Constance
D. Lehman, MD, PhD, Department of
Radiology, Massachusetts General
Hospital, Avon Comprehensive Breast
Evaluation Center, 55 Fruit St, WAC
240, Boston, MA 02114 (clehman
@mgh.harvard.edu).
CAD for screening mammography in the US
• 1998: FDA approval of CAD for mammography
• 2002: Centers for Medicare and Medicaid Services (CMS) provided increased payment
• CAD costs over $400 million a year
• By 2012, CAD was used for roughly 83% of screening mammograms (see figure below)
True-positive examination results were defined as those with a positive examination assessment and a breast cancer diagnosis. False-positive examination results were examinations with a positive assessment and no cancer diagnosis within the follow-up period. Mammography performance was modeled using logistic regression, including radiologist-specific random effects to account for correlation among examinations read by the same radiologist.
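For readers who want to see the shape of such an analysis, here is a minimal sketch. It is not the study's code: the data are synthetic, the column names are assumptions, and cluster-robust standard errors by radiologist are used as a simpler stand-in for the paper's radiologist-specific random effects.

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic stand-in for examination-level data (one row per screening exam).
rng = np.random.default_rng(0)
n = 5000
df = pd.DataFrame({
    "cad": rng.integers(0, 2, n),            # 1 = interpreted with CAD
    "radiologist": rng.integers(0, 50, n),   # reader identifier
    "age_group": rng.integers(0, 4, n),
})
reader_effect = rng.normal(0, 0.3, 50)[df["radiologist"]]
p = 1 / (1 + np.exp(-(-2.3 + 0.05 * df["cad"] + reader_effect)))
df["recalled"] = rng.binomial(1, p)          # outcome: positive assessment

# Logistic regression with cluster-robust SEs by radiologist
fit = smf.logit("recalled ~ cad + C(age_group)", data=df).fit(
    cov_type="cluster", cov_kwds={"groups": df["radiologist"]}
)
print(fit.summary())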
Figure 1. Screening Mammography Patterns From 2000 to 2012
in US Community Practices in the Breast Cancer Surveillance Consortium
(BCSC)
(Axes: Type of Mammography, %, by year 2000-2012; series: Film, Digital with CAD, Digital without CAD.)
Data are provided from the larger BCSC population including all screening
mammograms (5.2 million mammograms) for the indicated time period.
CMS reimbursement begins (figure annotations: 5%, 83%, 74%).
Diagnostic accuracy was not improved with CAD
on any performance metric assessed
                                 w/ CAD         w/o CAD
sensitivity                      85.3%          87.3%
sensitivity for invasive cancer  82.1%          85.0%
sensitivity for DCIS             93.2%          94.3%
specificity                      91.6%          91.4%
detection rate (overall)         4.1 per 1000   4.1 per 1000
detection rate for DCIS          1.2 per 1000   0.9 per 1000
From the ROC analysis, the accuracy of mammographic
interpretations with CAD was significantly lower than for
those without CAD (P = .002). The normalized partial area
under the summary ROC curve was 0.84 for interpretations
with CAD and 0.88 for interpretations without CAD
(Figure 2).
Figure 2. Receiver Operating Characteristic Curves for Digital Screening
Mammography With and Without the Use of CAD, Estimated
From 135 Radiologists Who Interpreted at Least 1 Examination
Associated With Cancer
(Axes: True-Positive Rate, % vs. False-Positive Rate, %; No CAD use: PAUC 0.88; CAD use: PAUC 0.84.)
Each circle represents the true-positive or false-positive rate for a single
radiologist, for examinations interpreted with (orange) or without (blue)
computer-aided detection (CAD). Circle size is proportional to the number of
mammograms associated with cancer interpreted by that radiologist with or
without CAD. PAUC indicates partial area under the curve.
DCIS, ductal carcinoma in situ; exam, examination.
a Odds ratio for CAD vs no CAD adjusted for site, age group, race/ethnicity, time since prior mammogram, and calendar year of the examination.
The accuracy of mammographic interpretations with CAD
was significantly lower than for those without CAD (P = .002)
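The "normalized partial area under the ROC curve" reported here is the standardized partial AUC of McClish, which scikit-learn exposes through roc_auc_score's max_fpr argument. The sketch below uses random placeholder data, not BCSC data, purely to show the call:

import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=1000)            # placeholder labels
y_score = y_true * 0.3 + rng.random(1000) * 0.7   # placeholder recall scores

# Standardized partial AUC over the low false-positive range (e.g., FPR <= 0.4)
pauc = roc_auc_score(y_true, y_score, max_fpr=0.4)
print(f"normalized partial AUC: {pauc:.2f}")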
SPECIAL CONTRIBUTION
J Korean Med Assoc 2018 December; 61(12):765-775
pISSN 1975-8456 / eISSN 2093-5951
https://doi.org/10.5124/jkma.2018.61.12.765
Principles for evaluating the adoption of novel digital healthcare devices into clinical practice (Korean Society of Radiology position paper)
Principles for evaluating the clinical
implementation of novel digital healthcare devices
Seong Ho Park, MD; Kyung-Hyun Do, MD; Joon-Il Choi, MD; Jung Suk Sim, MD; Dal Mo Yang, MD; Hong Eo, MD; Hyunsik Woo, MD; Jeong Min Lee, MD; Seung Eun Jung, MD; Joo Hyeong Oh, MD
Department of Radiology and Research Institute of Radiology, Asan Medical Center, University of Ulsan College of Medicine, Seoul; Department of Radiology, Seoul St. Mary's Hospital, The Catholic University of Korea College of Medicine, Seoul; Withsim Clinic, Seongnam; Department of Radiology, Kyung Hee University Hospital at Gangdong, Seoul; Department of Radiology and Center for Imaging Science, Samsung Medical Center, Seoul; Department of Radiology, SMG-SNU Boramae Medical Center, Seoul National University College of Medicine, Seoul; Department of Radiology, Seoul National University Hospital, Seoul National University College of Medicine, Seoul; Department of Radiology, Kyung Hee University Hospital, Kyung Hee University College of Medicine, Seoul, Korea
With growing interest in novel digital healthcare devices, such as artificial intelligence (AI) software for medical
diagnosis and prediction, and their potential impacts on healthcare, discussions have taken place regarding
the regulatory approval, coverage, and clinical implementation of these devices. Despite their potential, ‘digital
exceptionalism’ (i.e., skipping the rigorous clinical validation of such digital tools) is creating significant concerns
for patients and healthcare stakeholders. This white paper presents the positions of the Korean Society of
Radiology, a leader in medical imaging and digital medicine, on the clinical validation, regulatory approval,
coverage decisions, and clinical implementation of novel digital healthcare devices, especially AI software
for medical diagnosis and prediction, and explains the scientific principles underlying those positions. Mere
regulatory approval by the Food and Drug Administration of Korea, the United States, or other countries should
be distinguished from coverage decisions and widespread clinical implementation, as regulatory approval only
indicates that a digital tool is allowed for use in patients, not that the device is beneficial or recommended
for patient care. Coverage or widespread clinical adoption of AI software tools should require a thorough
clinical validation of safety, high accuracy proven by robust external validation, documented benefits for patient
outcomes, and cost-effectiveness. The Korean Society of Radiology puts patients first when considering novel
digital healthcare tools, and as an impartial professional organization that follows scientific principles and
evidence, strives to provide correct information to the public, make reasonable policy suggestions, and build
collaborative partnerships with industry and government for the good of our patients.
Key Words: Software validation; Device approval; Insurance coverage; Artificial intelligence
From the Department of Radiology and Research Institute
of Radiology, University of Ulsan College of Medicine, Asan
Medical Center, 88 Olympic-ro 43-gil, Songpa-gu, Seoul
05505, South Korea (S.H.P.); and Department of Radiology,
Research Institute of Radiological Science, Yonsei
University College of Medicine, Seoul, South Korea (K.H.).
Received August 11, 2017; revision requested October 2;
revision received October 12; accepted October 24; final
version accepted November 3. Address correspondence
to S.H.P. (e-mail: parksh.radiology@gmail.com).
S.H.P. supported by the Industrial Strategic Technology
Development Program (grant 10072064) funded by the
Ministry of Trade, Industry and Energy.
© RSNA, 2018
The use of artificial intelligence in medicine is currently an
issue of great interest, especially with regard to the diag-
nostic or predictive analysis of medical images. Adoption
of an artificial intelligence tool in clinical practice requires
careful confirmation of its clinical utility. Herein, the au-
thors explain key methodology points involved in a clinical
evaluation of artificial intelligence technology for use in
medicine, especially high-dimensional or overparameter-
ized diagnostic or predictive models in which artificial
deep neural networks are used, mainly from the stand-
points of clinical epidemiology and biostatistics. First, sta-
tistical methods for assessing the discrimination and cali-
bration performances of a diagnostic or predictive model
are summarized. Next, the effects of disease manifesta-
tion spectrum and disease prevalence on the performance
results are explained, followed by a discussion of the dif-
ference between evaluating the performance with use of
internal and external datasets, the importance of using
an adequate external dataset obtained from a well-de-
fined clinical cohort to avoid overestimating the clinical
performance as a result of overfitting in high-dimensional
or overparameterized classification model and spectrum
bias, and the essentials for achieving a more robust clini-
cal evaluation. Finally, the authors review the role of clin-
ical trials and observational outcome studies for ultimate
clinical verification of diagnostic or predictive artificial in-
telligence tools through patient outcomes, beyond perfor-
mance metrics, and how to design such studies.
© RSNA, 2018
Seong Ho Park, MD, PhD
Kyunghwa Han, PhD
Methodologic Guide for
Evaluating Clinical Performance
and Effect of Artificial Intelligence
Technology for Medical Diagnosis
and Prediction
• IBM Watson for Oncology / Genomics
Study to develop guidelines for evaluating reimbursement of AI-based medical technology (radiology)
Health Insurance Review and Assessment Service (HIRA)
Korean Society of Radiology
(Guideline excerpt: evaluation framework for AI-based medical technology, organized by levels — Level 0, Level 1, ….)
• Add-on fee per examination
  • Pay an add-on on top of the existing fee schedule
  • Analogous to 'read by a board-certified radiologist vs. read by another physician'
• Indirect compensation
  • Indirect rewards to the institution as a whole (hospital accreditation, quality-of-care assessment of reimbursement adequacy, ...)
  • For cases where some patients benefit and others are harmed (e.g., prioritization of brain CT reads)
• Creation of a new billable service
  • When the AI provides entirely new diagnostic information not available from the existing examination
  • Whether or not a corresponding covered/non-covered examination already exists
• A new fee for 'part of' the physician's work
  • The existing 'reading fee' covers only part of the radiologist's overall reading process
  • When the AI performs only part of that work, set compensation proportional to that part
SaMD: Software as a Medical Device
• FDA Pre-Cert program
Medical devices: X-ray machines, blood pressure monitors
Digital healthcare: Fitbit, Runkeeper, Sleep Cycle
Digital therapeutics: Pear, Akili, Empatica, AliveCor, Proteus
Artificial intelligence: SaMD* handling complex data — medical images and signals (radiology, pathology, ophthalmology, dermatology)
*SaMD: Software as a Medical Device
Adaptive learning: 'locked' SaMD algorithms vs. AI that continues to learn after deployment
• At the time of initial premarket review, submit a plan for managing future modifications
• SaMD Pre-Specifications (SPS)
  • The changes in performance, inputs, or intended use that the manufacturer anticipates or plans after release
  • Defines the region of potential changes relative to the initially cleared device
• Algorithm Change Protocol (ACP)
  • The specific methods for controlling the risks associated with the changes defined in the SPS
  • Step-by-step description of data and procedures so that safety and effectiveness are maintained after a change
Predetermined change control plan
ACP (Algorithm Change Protocol): general overview of the components of an ACP. This is "how" the algorithm will learn and change while remaining safe and effective.
Scope and limitations for establishing SPS and ACP: The FDA acknowledges that the types of changes that could be pre-specified in a SPS and managed through an ACP may necessitate individual
Figure 4: Algorithm Change Protocol components
What if a modification falls outside the bounds of the approved SPS and ACP?
modifications guidance results in either 1) submission of a new 510(k) for premarket review or 2)
documentation of the modification and the analysis in the risk management and 510(k) files. If, for
AI/ML SaMD with an approved SPS and ACP, modifications are within the bounds of the SPS and the
ACP, this proposed framework suggests that manufacturers would document the change in their change
history and other appropriate records, and file for reference, similar to the “document” approach
outlined in the software modifications guidance.
Figure 5: Approach to modifications to previously approved SaMD with SPS and ACP. This flowchart should only be
considered in conjunction with the accompanying text in this white paper.
Adversarial attacks
• Robustness vs. performance
Science 2019
INSIGHTS | POLICY FORUM
case of structured data such as billing codes,
adversarial techniques could be used to au-
tomate the discovery of code combinations
that maximize reimbursement or minimize
the probability of claims rejection.
Because adversarial attacks have been
demonstrated for virtually every class of ma-
chine-learning algorithms ever studied, from
simple and readily interpretable methods
such as logistic regression to more compli-
cated methods such as deep neural networks
(1), this is not a problem specific to medicine,
and every domain of machine-learning ap-
plication will need to contend with it. Re-
searchers have sought to develop algorithms
that are resilient to adversarial attacks, such
as by training algorithms with exposure to
adversarial examples or using clever data
processing to mitigate potential tampering
(1). Early efforts in this area are promising,
and we hope that the pursuit of fully robust
machine-learning models will catalyze the
development of algorithms that learn to
make decisions for consistently explainable
and appropriate reasons. Nevertheless, cur-
rent general-use defensive techniques come
at a material degeneration of accuracy, even
if sometimes at improved explainability (10).
Thus, the models that are both highly accu-
rate and robust to adversarial examples re-
main an open problem in computer science.
These challenges are compounded in the
medical context. Medical information tech-
nology (IT) systems are notoriously difficult
to update, so any new defenses could be diffi-
cult to roll out. In addition, the ground truth
in medical diagnoses is often ambiguous,
meaning that for many cases no individual
human can definitively assign the true label.
The anatomy of an adversarial attack: demonstration of how adversarial attacks against various medical AI systems might be executed without requiring any overtly fraudulent misrepresentation of the data. In the image example, a dermatoscopic image of a benign melanocytic nevus is diagnosed as benign with high model confidence by a deep neural network; adding a small perturbation computed by a common adversarial attack technique (weighted by 0.04; see (7) for details), or applying an adversarial rotation (8), causes the same network to diagnose the combined image as malignant with high confidence. In the text example, an adversarial text substitution (9) (e.g., "back pain and chronic alcohol abuse" rewritten as "lumbago and chronic alcohol dependence") flips a predicted opioid abuse risk from high to low. In the billing example, adversarial coding (13) of the submitted diagnosis codes changes a reimbursement decision from denied to approved.
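To make the "perturbation computed by a common adversarial attack technique" concrete, here is a minimal sketch of the fast gradient sign method (FGSM) in PyTorch. The model and image are placeholders, and the 0.04 scaling mirrors the weighting shown in the figure, not the paper's exact setup.

import torch

def fgsm_attack(model, image, label, epsilon=0.04):
    # Perturb the input in the direction that increases the classification loss.
    image = image.clone().detach().requires_grad_(True)
    loss = torch.nn.functional.cross_entropy(model(image), label)
    loss.backward()
    adversarial = image + epsilon * image.grad.sign()
    return adversarial.clamp(0.0, 1.0).detach()

# Hypothetical usage with any image classifier returning logits:
# adv = fgsm_attack(classifier, nevus_batch, torch.tensor([0]))  # 0 = "benign"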
"The AI Flywheel"
Physician Burnout
Sep 2018, Health 2.0 @Santa Clara
March 2019, the Future of Individual Medicine @San Diego
Wang P, et al. Gut 2019;0:1–7. doi:10.1136/gutjnl-2018-317500
Endoscopy
ORIGINAL ARTICLE
Real-time automatic detection system increases
colonoscopic polyp and adenoma detection rates: a
prospective randomised controlled study
Pu Wang, Tyler M Berzin, Jeremy Romek Glissen Brown, Shishira Bharadwaj, Aymeric Becq, Xun Xiao, Peixi Liu, Liangping Li, Yan Song, Di Zhang, Yi Li, Guangre Xu, Mengtian Tu, Xiaogang Liu
To cite: Wang P, Berzin TM,
Glissen Brown JR, et al. Gut
Epub ahead of print: [please
include Day Month Year].
doi:10.1136/
gutjnl-2018-317500
► Additional material is
published online only.To view
please visit the journal online
(http://dx.doi.org/10.1136/
gutjnl-2018-317500).
1
Department of
Gastroenterology, Sichuan
Academy of Medical Sciences
 Sichuan Provincial People’s
Hospital, Chengdu, China
2
Center for Advanced
Endoscopy, Beth Israel
Deaconess Medical Center and
Harvard Medical School, Boston,
Massachusetts, USA
Correspondence to
Xiaogang Liu, Department
of Gastroenterology Sichuan
Academy of Medical Sciences
and Sichuan Provincial People’s
Hospital, Chengdu, China;
Gary.samsph@gmail.com
Received 30 August 2018
Revised 4 February 2019
Accepted 13 February 2019
© Author(s) (or their
employer(s)) 2019. Re-use
permitted under CC BY-NC. No
commercial re-use. See rights
and permissions. Published
by BMJ.
ABSTRACT
Objective The effect of colonoscopy on colorectal
cancer mortality is limited by several factors, among them
a certain miss rate, leading to limited adenoma detection
rates (ADRs).We investigated the effect of an automatic
polyp detection system based on deep learning on polyp
detection rate and ADR.
Design In an open, non-blinded trial, consecutive
patients were prospectively randomised to undergo
diagnostic colonoscopy with or without assistance of a
real-time automatic polyp detection system providing
a simultaneous visual notice and sound alarm on polyp
detection.The primary outcome was ADR.
Results Of 1058 patients included, 536 were
randomised to standard colonoscopy, and 522 were
randomised to colonoscopy with computer-aided
diagnosis. The artificial intelligence (AI) system
significantly increased ADR (29.1% vs 20.3%, p<0.001)
and the mean number of adenomas per patient
(0.53 vs 0.31, p<0.001). This was due to a higher number
of diminutive adenomas found (185 vs 102; p<0.001),
while there was no statistical difference in larger
adenomas (77 vs 58, p=0.075). In addition, the number
of hyperplastic polyps was also significantly increased
(114 vs 52, p<0.001).
Conclusions In a low prevalent ADR population, an
automatic polyp detection system during colonoscopy
resulted in a significant increase in the number of
diminutive adenomas detected, as well as an increase in
the rate of hyperplastic polyps.The cost–benefit ratio of
such effects has to be determined further.
Trial registration number ChiCTR-DDD-17012221;
Results.
INTRODUCTION
Colorectal cancer (CRC) is the second and third-leading cause of cancer-related deaths in men and women, respectively.1 Colonoscopy is the gold standard for screening CRC.2 3 Screening colonoscopy has allowed for a reduction in the incidence and mortality of CRC via the detection and removal of adenomatous polyps.4–8 Additionally, there is evidence that with each 1.0% increase in adenoma detection rate (ADR), there is an associated 3.0% decrease in the risk of interval CRC.9 10 However, polyps can be missed, with reported miss rates of up to 27% due to both polyp and operator characteristics.11 12
Unrecognised polyps within the visual field are an important problem to address.11 Several studies have shown that assistance by a second observer increases the polyp detection rate (PDR), but such a strategy remains controversial in terms of increasing the ADR.13–15
Ideally, a real-time automatic polyp detection system, with performance close to that of expert endoscopists, could assist the endoscopist in detecting lesions that might correspond to adenomas in a more consistent and reliable way
Significance of this study
What is already known on this subject?
► Colorectal adenoma detection rate (ADR)
is regarded as a main quality indicator of
(screening) colonoscopy and has been shown
to correlate with interval cancers. Reducing
adenoma miss rates by increasing ADR has
been a goal of many studies focused on
imaging techniques and mechanical methods.
► Artificial intelligence has been recently
introduced for polyp and adenoma detection
as well as differentiation and has shown
promising results in preliminary studies.
What are the new findings?
► This represents the first prospective randomised controlled trial examining an automatic polyp detection system during colonoscopy and shows an increase of ADR by 50%, from 20% to 30%.
► This effect was mainly due to a higher rate of
small adenomas found.
► The detection rate of hyperplastic polyps was
also significantly increased.
How might it impact on clinical practice in the
foreseeable future?
► Automatic polyp and adenoma detection could
be the future of diagnostic colonoscopy in order
to achieve stable high adenoma detection rates.
► However, the effect on ultimate outcome is
still unclear, and further improvements such as
polyp differentiation have to be implemented.
0.5 cm and polyps in all segments of the colon with the exception of the caecum and the ascending colon (table 3).
Outcomes in excellent bowel preparation (BBPS ≥7)
In the situation of excellent bowel preparation, ADR in the CADe group showed a trend of 6% increase superior to that of the routine group. However, due to the inadequate sample size of the subgroup analysis, it failed to show a statistically significant difference. Other outcomes, including the mean number of detected adenomas, mean number of detected polyps and PDR, were all significantly increased in the CADe group (table 4).
can still be missed. Studies have also reported that some polyps are missed by the endoscopist despite being within the visual field.31 32 Several hypotheses have been proposed to explain the mechanism by which polyps may be missed. These include differences in endoscopist skill level, differences in endoscopist tracking patterns, ‘inattentional blindness’, wherein an observer fails to process an image on the screen due to distraction, and ‘change blindness’, wherein changes are missed during interruptions in visual scanning or during eye movements.13 33–37 Distraction caused by fatigue or emotional factors may also contribute. A second party such as a nurse or a trainee observing may improve PDR. While several studies have shown that this increases PDR, controversy remains regarding ADR.13–15
Table 3 Polyp and adenoma detection

                                    Routine colonoscopy (n=536)   CADe colonoscopy (n=522)   P value*   FC/OR    95% CI
PDR                                 0.291                         0.4502                     <0.001     1.995†   1.532 to 2.544
ADR                                 0.2034                        0.2912                     <0.001     1.61†    1.213 to 2.135
Mean number of detected polyps      0.5019                        0.954                      <0.001     1.89‡    1.63 to 2.192
Mean number of detected adenomas    0.3097                        0.5326                     <0.001     1.72‡    1.419 to 2.084

*P value from χ2 test (or Fisher’s exact test, as appropriate) or t-test.
†OR.
‡FC.
ADR, adenoma detection rate; FC, fold change; PDR, polyp detection rate.
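As a quick sanity check on how the odds ratios in Table 3 relate to the reported rates, the short sketch below recomputes the ADR odds ratio and its Wald 95% CI. The per-group counts are reconstructed from the reported proportions and group sizes, not taken from the trial's raw data.

```python
# Recompute the ADR odds ratio from Table 3 (counts reconstructed from the
# reported rates and group sizes, not from the trial's raw data).
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """OR for the 2x2 table [[a, b], [c, d]] with a Wald 95% CI on the log scale."""
    or_ = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(or_) - z * se_log)
    upper = math.exp(math.log(or_) + z * se_log)
    return or_, lower, upper

cade_pos = round(0.2912 * 522)      # ~152 CADe patients with at least one adenoma
routine_pos = round(0.2034 * 536)   # ~109 routine patients with at least one adenoma

print(odds_ratio_ci(cade_pos, 522 - cade_pos, routine_pos, 536 - routine_pos))
# ~(1.61, 1.21, 2.13), consistent with the 1.61 (1.213 to 2.135) reported for ADR.
```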
1Wu L, et al. Gut 2019;0:1–9. doi:10.1136/gutjnl-2018-317366
Endoscopy
ORIGINAL ARTICLE
Randomised controlled trial of WISENSE, a real-time
quality improving system for monitoring blind spots
during esophagogastroduodenoscopy
Lianlian Wu,1,2,3 Jun Zhang,1,2,3 Wei Zhou,1,2,3 Ping An,1,2,3 Lei Shen,1,2,3 Jun Liu,1,3 Xiaoda Jiang,1,2,3 Xu Huang,1,2,3 Ganggang Mu,1,2,3 Xinyue Wan,1,2,3 Xiaoguang Lv,1,2,3 Juan Gao,1,3 Ning Cui,1,2,3 Shan Hu,4 Yiyun Chen,4 Xiao Hu,4 Jiangjie Li,4 Di Chen,1,2,3 Dexin Gong,1,2,3 Xinqi He,1,2,3 Qianshan Ding,1,2,3 Xiaoyun Zhu,1,2,3 Suqin Li,1,2,3 Xiao Wei,1,2,3 Xia Li,1,2,3 Xuemei Wang,1,2,3 Jie Zhou,1,2,3 Mengjiao Zhang,1,2,3 Hong Gang Yu1,2,3
To cite: Wu L, Zhang J,
Zhou W, et al. Gut Epub
ahead of print: [please
include Day Month Year].
doi:10.1136/
gutjnl-2018-317366
► Additional material is
published online only.To view
please visit the journal online
(http://dx.doi.org/10.1136/
gutjnl-2018-317366).
For numbered affiliations see
end of article.
Correspondence to
Professor Hong
Gang Yu, Department of
Gastroenterology, Renmin
Hospital of Wuhan University,
Wuhan 430060, China;
yuhonggang1968@163.com
LW, JZ and WZ contributed
equally.
Received 10 August 2018
Revised 28 January 2019
Accepted 17 February 2019
© Author(s) (or their
employer(s)) 2019. Re-use
permitted under CC BY-NC. No
commercial re-use. See rights
and permissions. Published
by BMJ.
ABSTRACT
Objective Esophagogastroduodenoscopy (EGD) is the pivotal procedure in the diagnosis of upper gastrointestinal lesions. However, there are significant variations in EGD performance among endoscopists, impairing the discovery rate of gastric cancers and precursor lesions. The aim of this study was to construct a real-time quality improving system, WISENSE, to monitor blind spots, time the procedure and automatically generate photodocumentation during EGD and thus raise the quality of everyday endoscopy.
Design WISENSE system was developed using the methods of deep convolutional neural networks and deep reinforcement learning. Patients referred because of health examination, symptoms, surveillance were recruited from Renmin hospital of Wuhan University. Enrolled patients were randomly assigned to groups that underwent EGD with or without the assistance of WISENSE. The primary end point was to ascertain if there was a difference in the rate of blind spots between WISENSE-assisted group and the control group.
Results WISENSE monitored blind spots with an accuracy of 90.40% in real EGD videos. A total of 324 patients were recruited and randomised. 153 and 150 patients were analysed in the WISENSE and control group, respectively. Blind spot rate was lower in WISENSE group compared with the control (5.86% vs 22.46%, p<0.001), and the mean difference was −15.39% (95% CI −19.23 to −11.54). There was no significant adverse event.
Conclusions WISENSE significantly reduced blind spot rate of EGD procedure and could be used to improve the quality of everyday endoscopy.
Trial registration number ChiCTR1800014809; Results.
INTRODUCTION
Esophagogastroduodenoscopy (EGD) is the pivotal
procedure in the diagnosis of upper gastrointestinal
lesions.1
High-quality endoscopy delivers better health
outcomes.2
However, there are significant variations in
EGD performance among endoscopists, impairing the
discovery rate of gastric cancers (GC) and precursor
lesions.3
The diagnosis rate of early GC in China is still
Significance of this study
What is already known on this subject?
► The past decades have seen remarkable progress
of deep convolutional neural network (DCNN)
in the field of endoscopy. Recent studies have
successfully used DCNN to achieve accurate
prediction of early gastric cancer in endoscopic
images and real-time histological classification
of colon polyps in unprocessed videos. However,
it has yet not been investigated whether DCNN
could be used in monitoring quality of everyday
endoscopy.
What are the new findings?
► In the present study, WISENSE, a real-time quality improving system based on the DCNN and deep reinforcement learning (DRL) for monitoring blind spots, timing the procedure and generating photodocumentation during esophagogastroduodenoscopy (EGD) was developed. The performance of WISENSE was verified in EGD videos. A single-centre randomised controlled trial was conducted to evaluate the hypothesis that WISENSE would reduce the rate of blind spots during EGD. To the best of our knowledge, this is the first study using deep learning in the field of assuring endoscopy completeness and using DRL in making medical decisions in human body environment and also the first study validating the efficiency of a deep learning system in a randomised controlled trial.
How might it impact on clinical practice in the
foreseeable future?
► WISENSE greatly reduced blind spot rate,
increased inspection time and improved the
completeness of photodocumentation of EGD
in the randomised controlled trial. It could be
a powerful assistant tool for mitigating skill
variations among endoscopists and improving the
quality of everyday endoscopy.
In the validation set, the AUC obtained for TREWScore was 0.83 (0.81 to 0.85) (Fig. 2). At a specificity of 0.67 [false-positive rate (FPR) of 0.33], TREWScore achieved a sensitivity of 0.85 and identified patients a median of 28.2 hours before onset, with the score recomputed as new data became available and a patient flagged when his or her score crossed the detection threshold.
Identification of patients before sepsis-related organ dysfunction
A critical event in the development of septic shock is sepsis-related organ dysfunction (severe sepsis), after which risk has been shown to increase. More than two-thirds (68.8%) of patients were identified before any sepsis-related organ dysfunction (Fig. 3B).
Comparison of TREWScore with existing screening methods
The performance of TREWScore was evaluated against other methods to provide context for its use. It was first compared with MEWS, a general metric used for identifying patients at risk of catastrophic deterioration (17) that was not developed for tracking sepsis.
Fig. 2. ROC for detection of septic shock before onset in the validation
set. The ROC curve for TREWScore is shown in blue, with the ROC curve for
MEWS in red. The sensitivity and specificity performance of the routine
screening criteria is indicated by the purple dot. Normal 95% CIs are shown
for TREWScore and MEWS. TPR, true-positive rate; FPR, false-positive rate.
RESEARCH ARTICLE
A targeted real-time early warning score (TREWScore) for septic shock
AUC = 0.83
At a specificity of 0.67, TREWScore achieved a sensitivity of 0.85 and identified patients a median of 28.2 hours before onset.
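The operating point quoted above (sensitivity 0.85 at specificity 0.67) is read off the score's ROC curve. The sketch below shows how such a point can be located with scikit-learn on synthetic scores; it is illustrative only and has nothing to do with the actual sepsis dataset.

```python
# Locate the sensitivity at a chosen specificity on a risk score's ROC curve
# (synthetic scores; illustrative only).
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=5000)                                   # 1 = later septic shock
scores = np.where(y == 1, rng.normal(1.3, 1.0, 5000), rng.normal(0.0, 1.0, 5000))

fpr, tpr, thresholds = roc_curve(y, scores)
print("AUC:", round(roc_auc_score(y, scores), 3))

target_specificity = 0.67                                           # i.e. FPR = 0.33
idx = int(np.argmin(np.abs((1 - fpr) - target_specificity)))
print("threshold:", thresholds[idx], "sensitivity:", round(tpr[idx], 3))
```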
(source: VUNO)
APPH (Alarms Per Patient Per Hour)
(source: VUNO)
Less False Alarm
ARTICLE OPEN
Scalable and accurate deep learning with electronic health
records
Alvin Rajkomar1,2, Eyal Oren1, Kai Chen1, Andrew M. Dai1, Nissan Hajaj1, Michaela Hardt1, Peter J. Liu1, Xiaobing Liu1, Jake Marcus1, Mimi Sun1, Patrik Sundberg1, Hector Yee1, Kun Zhang1, Yi Zhang1, Gerardo Flores1, Gavin E. Duggan1, Jamie Irvine1, Quoc Le1, Kurt Litsch1, Alexander Mossin1, Justin Tansuwan1, De Wang1, James Wexler1, Jimbo Wilson1, Dana Ludwig2, Samuel L. Volchenboum3, Katherine Chou1, Michael Pearson1, Srinivasan Madabushi1, Nigam H. Shah4, Atul J. Butte2, Michael D. Howell1, Claire Cui1, Greg S. Corrado1 and Jeffrey Dean1
Predictive modeling with electronic health record (EHR) data is anticipated to drive personalized medicine and improve healthcare
quality. Constructing predictive statistical models typically requires extraction of curated predictor variables from normalized EHR
data, a labor-intensive process that discards the vast majority of information in each patient’s record. We propose a representation
of patients’ entire raw EHR records based on the Fast Healthcare Interoperability Resources (FHIR) format. We demonstrate that
deep learning methods using this representation are capable of accurately predicting multiple medical events from multiple
centers without site-specific data harmonization. We validated our approach using de-identified EHR data from two US academic
medical centers with 216,221 adult patients hospitalized for at least 24 h. In the sequential format we propose, this volume of EHR
data unrolled into a total of 46,864,534,945 data points, including clinical notes. Deep learning models achieved high accuracy for
tasks such as predicting: in-hospital mortality (area under the receiver operator curve [AUROC] across sites 0.93–0.94), 30-day
unplanned readmission (AUROC 0.75–0.76), prolonged length of stay (AUROC 0.85–0.86), and all of a patient’s final discharge
diagnoses (frequency-weighted AUROC 0.90). These models outperformed traditional, clinically-used predictive models in all cases.
We believe that this approach can be used to create accurate and scalable predictions for a variety of clinical scenarios. In a case
study of a particular prediction, we demonstrate that neural networks can be used to identify relevant information from the
patient’s chart.
npj Digital Medicine (2018)1:18 ; doi:10.1038/s41746-018-0029-1
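The paper's core idea is to feed each patient's entire sequence of raw, timestamped EHR events to deep models rather than hand-curating predictor variables. The snippet below is a deliberately simplified stand-in for that idea: admissions are represented as strings of event codes (all invented here) and a bag-of-codes logistic regression predicts in-hospital mortality. It is a toy baseline, not the paper's FHIR-based deep learning pipeline.

```python
# Toy stand-in: admissions as ordered event-code strings, bag-of-codes features,
# logistic regression predicting in-hospital mortality. Codes are invented.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

admissions = [
    "ED_TRIAGE LACTATE_HIGH NOREPINEPHRINE ICU_ADMIT",
    "ED_TRIAGE CBC_NORMAL DISCHARGE_PLANNING",
    "ELECTIVE_ADMIT HIP_REPLACEMENT PT_CONSULT",
    "ED_TRIAGE LACTATE_HIGH INTUBATION ICU_ADMIT",
]
died_in_hospital = [1, 0, 0, 1]

model = make_pipeline(
    CountVectorizer(token_pattern=r"\S+", lowercase=False),  # one feature per event code
    LogisticRegression(max_iter=1000),
)
model.fit(admissions, died_in_hospital)
print(model.predict_proba(["ED_TRIAGE LACTATE_HIGH ICU_ADMIT"])[:, 1])
```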
INTRODUCTION
The promise of digital medicine stems in part from the hope that,
by digitizing health data, we might more easily leverage computer
information systems to understand and improve care. In fact,
routinely collected patient healthcare data are now approaching
the genomic scale in volume and complexity.1
Unfortunately,
most of this information is not yet used in the sorts of predictive
statistical models clinicians might use to improve care delivery. It
is widely suspected that use of such efforts, if successful, could
provide major benefits not only for patient safety and quality but
also in reducing healthcare costs.2–6
In spite of the richness and potential of available data, scaling
the development of predictive models is difficult because, for
traditional predictive modeling techniques, each outcome to be
predicted requires the creation of a custom dataset with specific
variables.7
It is widely held that 80% of the effort in an analytic
model is preprocessing, merging, customizing, and cleaning
nurses, and other providers are included. Traditional modeling
approaches have dealt with this complexity simply by choosing a
very limited number of commonly collected variables to consider.7
This is problematic because the resulting models may produce
imprecise predictions: false-positive predictions can overwhelm
physicians, nurses, and other providers with false alarms and
concomitant alert fatigue,10
which the Joint Commission identified
as a national patient safety priority in 2014.11
False-negative
predictions can miss significant numbers of clinically important
events, leading to poor clinical outcomes.11,12
Incorporating the
entire EHR, including clinicians’ free-text notes, offers some hope
of overcoming these shortcomings but is unwieldy for most
predictive modeling techniques.
Recent developments in deep learning and artificial neural
networks may allow us to address many of these challenges and
unlock the information in the EHR. Deep learning emerged as the
preferred machine learning approach in machine perception
Table 3 List of variants identified as actionable by 3 different platforms
Columns: Gene | Variant | Identified variant (NYGC, WGA, FO) | Identified associated drugs (NYGC, WGA, FO)
CDKN2A Deletion Yes Yes Yes Palbociclib, LY2835219
LEE001
Palbociclib LY2835219 Clinical trial
CDKN2B Deletion Yes Yes Yes Palbociclib, LY2835219
LEE002
Palbociclib LY2835219 Clinical trial
EGFR Gain (whole arm) Yes — — Cetuximab — —
ERG Missense P114Q Yes Yes — RI-EIP RI-EIP —
FGFR3 Missense L49V Yes VUS — TK-1258 — —
MET Amplification Yes Yes Yes INC280 Crizotinib, cabozantinib Crizotinib, cabozantinib
MET Frame shift R755fs Yes — — INC280 — —
MET Exon skipping Yes — — INC280 — —
NF1 Deletion Yes — — MEK162 — —
NF1 Nonsense R461* Yes Yes Yes MEK162 MEK162, cobimetinib,
trametinib, GDC-0994
Everolimus, temsirolimus,
trametinib
PIK3R1 Insertion
R562_M563insI
Yes Yes — BKM120 BKM120, LY3023414 —
PTEN Loss (whole arm) Yes — — Everolimus, AZD2014 — —
STAG2 Frame shift R1012 fs Yes Yes Yes Veliparib, clinical trial Olaparib —
DNMT3A Splice site 2083-1G>C — — Yes — — —
TERT Promoter -146C>T Yes — Yes — — —
ABL2 Missense D716N Germline NA VUS
mTOR Missense H1687R Germline NA VUS
NPM1 Missense E169D Germline NA VUS
NTRK1 Missense G18E Germline NA VUS
PTCH1 Missense P1250R Germline NA VUS
TSC1 Missense G1035S Germline NA VUS
Abbreviations: FO = FoundationOne; NYGC = New York Genome Center; RNA-seq = RNA sequencing; WGA = Watson Genomic Analytics; WGS = whole-genome sequencing.
Genes, variant description, and, where appropriate, candidate clinically relevant drugs are listed. Variants identified by the FO as variants of uncertain
significance (VUS) were identified by the NYGC as germline variants.
[Bar charts (w/o AI vs w/ AI): 188m vs 154m, saving 18% of time; 180m vs 108m, saving 40% of time.]
AJR Am J Roentgenol. 2017 Dec;209(6):1374-1380.
Digital Healthcare Institute
Director, Yoon Sup Choi, PhD
yoonsup.choi@gmail.com
 
isolated diagnostic tasks. Underlying these exciting advances,
however, is the important notion that these algorithms do not
replace the breadth and contextual knowledge of human
pathologists and that even the best algorithms would need to
from 83% to 91% and resulted in higher overall diagnostic
accuracy than that of either unassisted pathologist inter-
pretation or the computer algorithm alone. Although deep
learning algorithms have been credited with comparable
[Figure 5, panels A and B: time of review per image (seconds), unassisted vs assisted, across negative, ITC, micromet and macromet categories (p=0.002; p=0.02).]
FIGURE 5. Average review time per image decreases with assistance. A, Average review time per image across all pathologists
analyzed by category. Black circles are average times with assistance, gray triangles represent average times without assistance.
Error bars indicate 95% confidence interval. B, Micrometastasis time of review decreases for nearly all images with assistance.
Circles represent average review time for each individual micrometastasis image, averaged across the 6 pathologists by assistance
modality. The dashed lines connect the points corresponding to the same image with and without assistance. The 2 images that
were not reviewed faster on average with assistance are represented with red dot-dash lines. Vertical lines of the box represent
quartiles, and the diamond indicates the average of review time for micrometastases in that modality. Micromet indicates micrometastasis; macromet, macrometastasis.
referred from Dr. Eric Topol on Twitter
Feedback/Questions
• Email: yoonsup.choi@gmail.com
• Blog: http://www.yoonsupchoi.com
• Facebook: Yoon Sup Choi
Some more backups…
Supervised autonomous robotic soft tissue surgery
Children’s National Health System
Smart Tissue Autonomous Robot (STAR)
Azad Shademan et al. Sci Transl Med 2016
Azad Shademan et al. Sci Transl Med 2016
Supervised autonomous robotic soft tissue surgery
Azad Shademan et al. Sci Transl Med 2016
Ex vivo and in vivo end-to-end anastomosis with STAR, compared against:
• open surgery (OPEN)
• laparoscopy (LAP)
• robot-assisted surgery (RAS)
Outcome measures:
• suture spacing
• leak pressure
• number of mistakes
• completion time
• lumen reduction
suture placement compared to other techniques (table S1). Moreover,
leak pressure reflects the functional quality of suturing. The linear
closure from STAR was able to withstand a higher average leak pressure
than all other techniques (Fig. 2B).
suturing tool maneuvers before piercing. Using the NIRF markers as reference points, the plan interpolated intermediate suture placements on the bowel and adjusted placement of each suture, knot, and corner slide to accommodate deformations and induced scene rotations (Fig. 1F).
Fig. 2. Ex vivo linear suturing under deformations. The experiment con-
sisted of closing a longitudinal cut along pig intestine, whereas the tissue
was deformed by pulling on stay sutures. Five samples were tested per tech-
nique (OPEN, LAP, RAS, and STAR). (A) Suture spacing. Central mark is the
median; box edges are the 25th and 75th percentiles, error bars are the range
excluding outliers, and red dots are outliers. The whiskers represent the range
not including outliers. There is a different N number for each boxplot because each surgeon used a different number of sutures [OPEN (n = 174), LAP (n = 128), RAS (n = 176), and STAR (n = 206)]. These data are presented numerically in
table S2, including the SDs. P values determined by ANOVA with post hoc
Games-Howell. (B and C) Leak pressures and number of mistakes (reposi-
tioned stitches or robot reboot). Data are from individual tissue samples
(n = 5) with averages marked by a horizontal line. P values determined by
independent samples t test. (D) Completion times separated into knot-tying
and suturing, and other time was spent restaging or changing sutures. Data
are averages (n = 5). P values determined by independent samples t test.
Azad Shademan et al. Sci Transl Med 2016
indicating that different levels of autonomy can be used effectively for different surgical tasks. Overall, 57.8% of the procedure was completed fully autonomously with no adjustments. Alternatively, in the current setup, an autonomous mode without any interaction would have required suture adjustment in 42.2% of sutures placed, mostly at the corners. The completion time also included supervisory actions by the surgeon, which accounted for part of the total time (7% for suture adjustment, 3.3% for confirmation of placement location, and 2.6% for mistakes).
In vivo end-to-end anastomosis
Finally, we performed in vivo supervised autonomous surgery in pig intestine accessed through a laparotomy (n = 4) and compared these against an OPEN control (n = 1), using the same suture algorithm as in the ex vivo trials (Fig. 1). For the OPEN control, the surgeon used standard surgical hand tools to open the abdomen, exposed the intestine, and sutured a transverse incision. The average STAR procedure time was 50.0 min, where 77.4% was anastomosis time and 22.6% was restaging time between the back and front walls, which included 2.16 min for marking the tissue (Fig. 3, B and E, and Table 1). Although the OPEN time was only 8 min, the STAR time was comparable to the average times reported for laparoscopic anastomoses, which range from 30 min for vesicourethral anastomoses (25), through aortic anastomoses (26), to 90 min for other constructions (27).
No complications were observed
Fig. 3. End-to-end anastomosis ex vivo. The experiment
consisted of closing a transverse cut in pig intestine. Five
samples were tested per technique (OPEN, LAP, RAS, and STAR).
(A) Suture spacing. Central mark is the median; box edges are
the 25th and 75th percentiles; and red dots are outliers. The
whiskers represent the range not including outliers. There is a
different N number for each boxplot because each surgeon
used a different number of sutures [OPEN (n = 138), LAP (n =
98), RAS (n = 132), and STAR (n = 180)]. The average spacing between consecutive sutures was calculated and compared between STAR and other modalities. The variance of suture
spacing is presented numerically in table S2, including the SD.
P values determined by ANOVA with post hoc Games-Howell. (B) Ex vivo end-to-end anastomosis leak pressures. Data are individual tissue samples, with means displayed as horizontal lines (n = 4 to 5). One sample was sutured closed and thus could not be tested for leak pressure. P values determined by independent samples t test. (C) The leak pressure as a function of maximum suture spacing. Data are individual tissue samples that were fit to a rational function (y = 0.854/x) (n = 4 to 5). (D) Number of mistakes (repositioned stitches or robot reboot). Data are individual tissue samples with means displayed as horizontal lines (n = 5). P values determined by independent samples t test. (E) Ex vivo end-to-end anastomosis completion times. Average times for n = 5 tissue samples per procedure are divided into subtasks of knots and running sutures. “Other” time was spent restaging and changing sutures. P values determined by independent samples t test. (F) Percent reduction in
Digital Psychiatrist?
BeyondVerbal: Reading emotions from voices
http://www.wsj.com/articles/SB10001424052702303824204579421242295627138
BeyondVerbal
• linguistic
• identification and extraction of
word instances (unigrams) and
word-pair instances (bi-grams)
from the transcriptions
• acoustic
• vocal dynamics
• voice quality
• vocal tract resonance frequencies
• pause lengths
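The linguistic half of this feature set (word unigrams and bigrams from interview transcriptions) maps directly onto standard n-gram extraction. The sketch below illustrates that step with scikit-learn on invented transcripts and labels; the acoustic features (vocal dynamics, pause lengths, etc.) would be appended as additional numeric columns. This is not the study's actual code or data.

```python
# Unigram + bigram features from interview transcripts (invented examples),
# fed to a linear classifier. Acoustic features would be added as extra columns.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.svm import LinearSVC

transcripts = [
    "i do not have any hope anymore",
    "i feel hopeful about the future",
    "everything hurts emotionally",
    "i am doing okay these days",
]
labels = ["suicidal", "control", "suicidal", "control"]

vectorizer = CountVectorizer(ngram_range=(1, 2))   # word unigrams and bigrams
X = vectorizer.fit_transform(transcripts)
clf = LinearSVC().fit(X, labels)
print(clf.predict(vectorizer.transform(["do you have any hope"])))
```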
A Machine Learning Approach to Identifying the
Thought Markers of Suicidal Subjects:A
Prospective Multicenter Trial
• “Do you have hope?”
• “Do you have any fear?”
• “Do you have any secrets?”
• “Are you angry?”
• “Does it hurt emotionally?”
Pestian, Suicide and Life-Threatening Behavior, 2016
A Machine Learning Approach to Identifying the
Thought Markers of Suicidal Subjects:A
Prospective Multicenter Trial
[Figure 1. Receiver operating characteristic (ROC) curves: suicide versus control (top), suicide versus mentally ill (middle), and suicide versus mentally ill with control (bottom), for adolescents (blue) and adults (red), using linguistic and acoustic features. The gray line is the ROC curve for a baseline (random) classifier.]
TABLE 2
The AROC for the Machine Learning Algorithm. The Nonsuicidal Group Comprises Either Mentally Ill or Control Subjects. Classification Performances are Shown for Adolescents, Adults, and the Combined Adolescent and Adult Cohorts. Values are AROC (SD).

                          Suicidal vs Controls                         Suicidal vs Mentally Ill                     Suicidal vs Mentally Ill and Controls
                          Adolescents  Adults       Adol.+Adults      Adolescents  Adults       Adol.+Adults       Adolescents  Adults       Adol.+Adults
Linguistics               0.87 (0.04)  0.91 (0.02)  0.93 (0.02)       0.82 (0.05)  0.77 (0.04)  0.79 (0.03)        0.82 (0.04)  0.84 (0.03)  0.87 (0.02)
Acoustics                 0.74 (0.05)  0.82 (0.03)  0.79 (0.03)       0.69 (0.06)  0.74 (0.04)  0.76 (0.03)        0.74 (0.05)  0.80 (0.03)  0.76 (0.03)
Linguistics + Acoustics   0.83 (0.05)  0.93 (0.02)  0.92 (0.02)       0.80 (0.05)  0.77 (0.04)  0.82 (0.03)        0.81 (0.04)  0.84 (0.03)  0.87 (0.02)
Pestian, Suicide and Life-Threatening Behavior, 2016
Detection of the Prodromal Phase of Bipolar Disorder from
Psychological and Phonological Aspects in Social Media
Yen-Hao Huang
National Tsing Hua University
Hsinchu, Taiwan
yenhao0218@gmail.com
Lin-Hung Wei
National Tsing Hua University
Hsinchu, Taiwan
adeline80916@gmail.com
Yi-Shin Chen
National Tsing Hua University
Hsinchu, Taiwan
yishin@gmail.com
ABSTRACT
Seven out of ten people with bipolar disorder are initially
misdiagnosed and thirty percent of individuals with bipolar
disorder will commit suicide. Identifying the early phases of
the disorder is one of the key components for reducing the
full development of the disorder. In this study, we aim at
leveraging the data from social media to design predictive
models, which utilize the psychological and phonological fea-
tures, to determine the onset period of bipolar disorder and
provide insights on its prodrome. This study makes these dis-
coveries possible by employing a novel data collection process,
coined as Time-specific Subconscious Crowdsourcing, which
helps collect a reliable dataset that supplements diagnosis
information from people suffering from bipolar disorder. Our
experimental results demonstrate that the proposed models
could greatly contribute to the regular assessments of people
with bipolar disorder, which is important in the primary care
setting.
KEYWORDS
Bipolar Disorder Detection, Mental Disorder, Prodromal
Phrase, Emotion Analysis, Sentiment Analysis, Phonology,
Social Media
1 INTRODUCTION
Bipolar disorder (BD) is a common mental illness charac-
terized by recurrent episodes of mania/hypomania and de-
pression, which is found among all ages, races, ethnic groups
and social classes. The regular assessment of people with
BD is an important part of its treatment, though it may be
very time-consuming [21]. There are many beneficial treat-
ments for the patients, particularly for delaying relapses. The
identification of early symptoms is significant for allowing
early intervention and reducing the multiple adverse conse-
quences of a full-blown episode. Despite the importance of
the detection of prodromal symptoms, there are very few
studies that have actually examined the ability of relatives to
detect these symptoms in BD patients. [20] For the purpose
of early treatment, the challenge leads to: how to identify
the prodrome period of BD. Current studies are thus
aimed at detecting prodromes and analyzing the prodromal
symptoms of manic recurrence in clinics.
With regards to the symptom of social isolation, people
are increasingly turning to popular social media, such as
Facebook and Twitter, to share their illness experiences or
seek advice from others with similar mental health conditions.
As the information is being shared in public, people are
subconsciously providing rich contents about their states
of mind. In this paper, we refer to this sharing and data
collection as time-specific subconscious crowdsourcing.
In this study, we carefully look at patients who have been
diagnosed with BD and who explicitly indicate the diagnosis
and time of diagnosis on Twitter. Our goal is to both predict
whether BD rises on a given period of time, and to discover
the prodromal period for BD. It’s important to clarify that
our goal doesn’t seek to offer a diagnosis but rather to make
a prediction of which users are likely to be suffering from the
BD. The main contributions of our work are:
• Introducing the concept of time-specific subconscious
crowdsourcing, which can aid in locating the social
network behavior data of BD patients with the corre-
sponding time of diagnosis.
• A BD assessment mechanism that differentiates be-
tween prodromal symptoms and acute symptoms.
• Introducing the phonological features into the assess-
ment mechanism, which allows for the possibility to
assess patients through text only.
• An automatic recognition approach that detects the
possible prodromal period for BD.
2 RELATED WORK
Social media resources have been widely utilized by researchers
to study mental health issues. The following literature em-
phasizes on data collection and feature engineering, including
subject recruitment, manual data collection, data collection
applications, keyword matching, and combined approaches.
The clinical approach for mental disorders and prodrome
studies are also discussed in this section.
Subject recruitment: Based on customized question-
naires and contact with subjects, Park et al. [15] recruited
participants for the Center for Epidemiologic Studies Depres-
sion scale(CES-D) [17] and provided their Twitter data. By
analyzing the information contained in tweets, participants
were divided into normal and depressive groups based on
their scores on CES-D. An approach like this one requires ex-
pensive costs to acquire data and conduct the questionnaire.
Manual and automatic data collecting: Moreno et
al. [14] collected data via the Facebook profiles of college stu-
dents reviewed by two investigators. They aimed at revealing
the relationship between demographic factors and depression.
Similarly, in our work, we invest on manual efforts to collect
and properly annotate our dataset. In addition, there are
many applications built on top of social networks that provide
free services where users may need to input their credentials
arXiv:1712.09183v1[cs.IR]26Dec2017
Wordcloud
Table 2: Average Precision of Single Feature Performance

Features (#DIM)   2 mths   3 mths   6 mths   9 mths   12 mths
AG (2)            0.475    0.503    0.445    0.434    0.383
Pol (5)           0.911    0.893    0.843    0.836    0.803
Emot (8)          0.893    0.895    0.908    0.917    0.896
Soc (4)           0.941    0.913    0.845    0.834    0.786
LT (1)            0.645    0.589    0.554    0.504    0.513
TRD (1)           0.570    0.638    0.626    0.615    0.654
Phon (8)          0.889    0.880    0.802    0.838    0.821

AG = Age and Gender; Pol = Mood Polarity Features; Emot = Emotional Scores; Soc = Social Features; LT = Late Tweet Frequency; TRD = Tweet Rate Difference; Phon = Phonological Features.
[Figure 1: Illness Period Modeling — the τ-month window before the diagnosed time (τ = 2 months).]
features are introduced: (1) Word-level features and (2) BD Pattern of Life features.
3.4.1 Word-level Features. With respect to the linguistic features for BD, Character n-gram Language Features (CLF) and LIWC metrics are designed to capture them. The CLF uses n-grams to measure the common words or phrases used by users. The tf-idf is used in our score-calculating method: the tf is the frequency of an n-gram, and the document d of the df is defined as each particular Twitter user k. The formula for the tf-idf is thus given as:

$$\mathrm{tfidf}^{(k,\tau,\alpha)}_{v_n} = \mathrm{freq}^{(k,\tau,\alpha)}_{v_n} \times \log\frac{K}{1 + \mathrm{freq}^{(K,\tau,\alpha)}_{v_n}} \qquad (1)$$

where $\mathrm{freq}^{(k,\tau,\alpha)}_{v_n}$ is the frequency of n-gram $v_n$ for user $k$ in the window, $\mathrm{freq}^{(K,\tau,\alpha)}_{v_n}$ is the corresponding frequency pooled over all $K$ users, and $n \in \{1, 2\}$.
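A minimal sketch of formula (1) in plain Python, assuming word unigrams and bigrams and a made-up three-user corpus. With so few users the idf term is close to zero, so the example only demonstrates the mechanics of the per-user computation.

```python
# Formula (1) in plain Python: per-user n-gram frequency times an idf term that
# uses the frequency pooled over all K users. Toy tweets, n in {1, 2}.
import math
from collections import Counter

def ngrams(tokens, n):
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def count_ngrams(tweets, n_values=(1, 2)):
    counts = Counter()
    for tweet in tweets:
        tokens = tweet.lower().split()
        for n in n_values:
            counts.update(ngrams(tokens, n))
    return counts

users = {
    "user_a": ["cant sleep again", "so much energy tonight"],
    "user_b": ["slow morning", "quiet day at work"],
    "user_c": ["long walk then early sleep"],
}
per_user = {user: count_ngrams(tweets) for user, tweets in users.items()}
K = len(users)
pooled = Counter()
for counts in per_user.values():
    pooled.update(counts)

def tfidf(user, gram):
    return per_user[user][gram] * math.log(K / (1 + pooled[gram]))

print(tfidf("user_a", "cant sleep"))   # gram unique to user_a -> log(3/2) ≈ 0.41
```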
In addition to word-level features, the BD Pattern of Life features (BDPLF) are used to represent psychological features, such as behavioural patterns and tendencies of users in terms of polarity, emotion, and social interaction. The full BDPLF set contains five categories, including:
• Age and Gender: Sit et al. report gender effects on BD, indicating that women are more likely to have bipolar disorder than men; the age and gender predictor proposed by Sap et al. [19] is applied to the social media data.
• Mood Polarity Features: BD patients experience rapid mood changes, so sentiment analysis is first adapted to obtain the polarity portrayed by each user's tweets. Based on Go et al.'s work, the contents of tweets are classified into three categories (positive, negative, and neutral), which are then converted into five dimensions, including positive ratio, negative ratio, combos, and flips ratio.
• Emotional Scores: beyond polarity, an emotion detection tool is employed to classify tweets into emotion categories such as joy, surprise, anticipation, anger, and fear; the emotion classes are further transformed into emotion scores, with each emotion's score computed as its count divided by the total emotion count for that user and window.
“A lot of Syrian refugees have trauma and maybe
this can help them overcome that.” However, he
points out that there is a stigma around
psychotherapy, saying people feel shame about
seeking out psychologists.
As a result he thinks people might feel more
comfortable knowing they are talking to a “robot”
than to a human.
http://www.newyorker.com/tech/elements/the-chatbot-will-see-you-now
Original Paper
Delivering Cognitive Behavior Therapy to Young Adults With
Symptoms of Depression and Anxiety Using a Fully Automated
Conversational Agent (Woebot): A Randomized Controlled Trial
Kathleen Kara Fitzpatrick1*, PhD; Alison Darcy2*, PhD; Molly Vierhile1, BA
1 Stanford School of Medicine, Department of Psychiatry and Behavioral Sciences, Stanford, CA, United States
2 Woebot Labs Inc., San Francisco, CA, United States
* these authors contributed equally
Corresponding Author:
Alison Darcy, PhD
Woebot Labs Inc.
55 Fair Avenue
San Francisco, CA, 94110
United States
Email: alison@woebot.io
Abstract
Background: Web-based cognitive-behavioral therapeutic (CBT) apps have demonstrated efficacy but are characterized by
poor adherence. Conversational agents may offer a convenient, engaging way of getting support at any time.
Objective: The objective of the study was to determine the feasibility, acceptability, and preliminary efficacy of a fully automated
conversational agent to deliver a self-help program for college students who self-identify as having symptoms of anxiety and
depression.
Methods: In an unblinded trial, 70 individuals age 18-28 years were recruited online from a university community social media
site and were randomized to receive either 2 weeks (up to 20 sessions) of self-help content derived from CBT principles in a
conversational format with a text-based conversational agent (Woebot) (n=34) or were directed to the National Institute of Mental
Health ebook, “Depression in College Students,” as an information-only control group (n=36). All participants completed
Web-based versions of the 9-item Patient Health Questionnaire (PHQ-9), the 7-item Generalized Anxiety Disorder scale (GAD-7),
and the Positive and Negative Affect Scale at baseline and 2-3 weeks later (T2).
Results: Participants were on average 22.2 years old (SD 2.33), 67% female (47/70), mostly non-Hispanic (93%, 54/58), and
Caucasian (79%, 46/58). Participants in the Woebot group engaged with the conversational agent an average of 12.14 (SD 2.23)
times over the study period. No significant differences existed between the groups at baseline, and 83% (58/70) of participants
provided data at T2 (17% attrition). Intent-to-treat univariate analysis of covariance revealed a significant group difference on
depression such that those in the Woebot group significantly reduced their symptoms of depression over the study period as
measured by the PHQ-9 (F=6.47; P=.01) while those in the information control group did not. In an analysis of completers,
participants in both groups significantly reduced anxiety as measured by the GAD-7 (F1,54= 9.24; P=.004). Participants’ comments
suggest that process factors were more influential on their acceptability of the program than content factors mirroring traditional
therapy.
Conclusions: Conversational agents appear to be a feasible, engaging, and effective way to deliver CBT.
(JMIR Ment Health 2017;4(2):e19) doi:10.2196/mental.7785
KEYWORDS
conversational agents; mobile mental health; mental health; chatbots; depression; anxiety; college students; digital health
Introduction
Up to 74% of mental health diagnoses have their first onset
particularly common among college students, with more than
half reporting symptoms of anxiety and depression in the
previous year that were so severe they had difficulty functioning
depression at baseline as measured by the PHQ-9, while
three-quarters (74%, 52/70) were in the severe range for anxiety
as measured by the GAD-7.
Figure 1. Participant recruitment flow.
Table 1. Demographic and clinical variables of participants at baseline.

                              Woebot          Information control
Scale, mean (SD)
  Depression (PHQ-9)          14.30 (6.65)    13.25 (5.17)
  Anxiety (GAD-7)             18.05 (5.89)    19.02 (4.27)
  Positive affect             25.54 (9.58)    26.19 (8.37)
  Negative affect             24.87 (8.13)    28.74 (8.92)
Age, mean (SD)                22.58 (2.38)    21.83 (2.24)
Gender, n (%)
  Male                        7 (21)          4 (7)
  Female                      27 (79)         20 (55)
Ethnicity, n (%)
  Latino/Hispanic             2 (6)           2 (8)
  Non-Latino/Hispanic         32 (94)         22 (92)
  Caucasian                   28 (82)         18 (75)
Delivering Cognitive Behavior Therapy to Young Adults With Symptoms of Depression and Anxiety Using a Fully Automated Conversational Agent (Woebot): A Randomized Controlled Trial
• Feasibility, acceptability, and preliminary efficacy of a fully automated conversational agent (Woebot) delivering self-help CBT
• 70 participants randomised: Woebot (n=34) vs information-only control (n=36)
• Outcomes: PHQ-9, GAD-7
Table 2. Intention-to-treat analysis: outcomes at T2 by group.

Outcome                 Woebot T2a (SE)   Woebot 95% CIb    Information-only control T2a (SE)   Information-only control 95% CIb   F      P      Cohen dc
PHQ-9                   11.14 (0.71)      9.74 to 12.32     13.67 (0.81)                        12.07 to 15.27                     6.03   .017   0.44
GAD-7                   17.35 (0.60)      16.16 to 18.13    16.84 (0.67)                        15.52 to 18.56                     0.38   .581   0.14
PANAS positive affect   26.88 (1.29)      24.35 to 29.41    26.02 (1.45)                        23.17 to 28.86                     0.17   .707   0.02
PANAS negative affect   25.98 (1.24)      23.54 to 28.42    27.53 (1.42)                        24.73 to 30.32                     0.91   .912   0.34

a Baseline = pooled mean (standard error).
b 95% confidence interval.
c Cohen d shown for between-subjects effects using means and standard errors at Time 2.
Figure 2. Change in mean depression (PHQ-9) score by group over the study period. Error bars represent standard error.
Preliminary Efficacy
Table 2 shows the results of the primary ITT analyses conducted
on the entire sample. Univariate ANCOVA revealed a significant
treatment effect on depression revealing that those in the Woebot
group significantly reduced PHQ-9 score while those in the
information control group did not (F1,48=6.03; P=.017) (see
Figure 2). This represented a moderate between-groups effect
size (d=0.44). This effect is robust after Bonferroni correction
for multiple comparisons (P=.04). No other significant
between-group differences were observed on anxiety or affect.
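The treatment effect above comes from an ANCOVA of the T2 score on group with the baseline score as a covariate. A minimal sketch of that kind of model with statsmodels is shown below; the column names and the tiny data frame are hypothetical, not the study data.

```python
# ANCOVA sketch: T2 depression score modelled on group plus baseline score.
# Column names and values are hypothetical, not the trial data.
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

df = pd.DataFrame({
    "group":   ["woebot"] * 5 + ["control"] * 5,
    "phq9_t1": [14, 15, 13, 16, 14, 13, 14, 12, 13, 14],
    "phq9_t2": [11, 12, 10, 12, 11, 13, 14, 13, 12, 14],
})

model = smf.ols("phq9_t2 ~ C(group) + phq9_t1", data=df).fit()
print(sm.stats.anova_lm(model, typ=2))   # F and p for the group effect, baseline-adjusted
```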
Completer Analysis
As a secondary analysis, to explore whether any main effects
existed, 2x2 repeated measures ANOVAs were conducted on
the primary outcome variables (with the exception of PHQ-9)
among completers only. A significant main effect was observed
on GAD-7 (F1,54=9.24; P=.004) suggesting that completers
experienced a significant reduction in symptoms of anxiety
between baseline and T2, regardless of the group to which they
were assigned with a within-subjects effect size of d=0.37. No
main effects were observed for positive (F1,50=.001; P=.951;
d=0.21) or negative affect (F1,50=.06; P=.80; d=0.003) as
measured by the PANAS.
To further elucidate the source and magnitude of change in
depression, repeated measures dependent t tests were conducted
and Cohen d effect sizes were calculated on individual items of
the PHQ-9 among those in the Woebot condition. The analysis
revealed that baseline-T2 changes were observed on the
following items in order of decreasing magnitude: motoric symptoms (d=2.09), appetite (d=0.65), little interest or pleasure in things (d=0.44), feeling bad about self (d=0.40), concentration (d=0.39), suicidal thoughts (d=0.30), feeling down (d=0.14), sleep (d=0.12), and energy (d=0.06).
Change in mean depression (PHQ-9) score
by group over the study period
G.M. Lucas et al. / Computers in Human Behavior 37 (2014) 94–100
SimSensei/MultiSense (2014)
SimSensei/MultiSense (2014)
It’s only a computer:
Virtual humans increase willingness to disclose
G.M. Lucas et al. / Computers in Human Behavior 37 (2014) 94–100
Frame: computer vs human
Operation: automated (AI) vs tele-operated
Method
Frame
It’s only a computer:
Virtual humans increase willingness to disclose
G.M. Lucas et al. / Computers in Human Behavior 37 (2014) 94–100
‘‘How close are you to your family?’’
‘‘Tell me about a situation that you wish you had handled differently.’’
‘‘Tell me about an event, or something that you wish you could erase from your memory.’’
‘‘Tell me about the hardest decision you’ve ever had to make.’’
‘‘Tell me about the last time you felt really happy.’’
‘‘What are you most proud of in your life?’’
‘‘What’s something you feel guilty about?’’
‘‘When was the last time you argued with someone and what was it about?’’
It’s only a computer:
Virtual humans increase willingness to disclose
G.M. Lucas et al. / Computers in Human Behavior 37 (2014) 94–100
[Bar charts comparing the computer frame with the human frame on four measures: fear of self-disclosure, impression management, sadness displays, and willingness to disclose.]
It’s only a computer:
Virtual humans increase willingness to disclose
G.M. Lucas et al. / Computers in Human Behavior 37 (2014) 94–100
‘‘This is way better than talking to a person. I don’t
really feel comfortable talking about personal stuff
to other people.’’
‘‘A human being would be judgmental. I shared a lot
of personal things, and it was because of that.’’
• Can machine learning improve prediction of cardiovascular events over an established risk score?
• 378,256 patients from UK primary care (training 295,267; validation 82,989)
• Baseline: ACC/AHA risk algorithm
• Machine-learning algorithms: random forest, logistic regression, gradient boosting machines, neural networks
Can machine-learning improve cardiovascular
risk prediction using routine clinical data?
Stephen F.Weng et al PLoS One 2017
in a sensitivity of 62.7% and PPV of 17.1%. The random forest algorithm resulted in a net
increase of 191 CVD cases from the baseline model, increasing the sensitivity to 65.3% and
PPV to 17.8% while logistic regression resulted in a net increase of 324 CVD cases (sensitivity
67.1%; PPV 18.3%). Gradient boosting machines and neural networks performed best, result-
ing in a net increase of 354 (sensitivity 67.5%; PPV 18.4%) and 355 CVD (sensitivity 67.5%;
PPV 18.4%) cases correctly predicted, respectively.
The ACC/AHA baseline model correctly predicted 53,106 non-cases from 75,585 total non-
cases, resulting in a specificity of 70.3% and NPV of 95.1%. The net increase in non-cases
Table 3. Top 10 risk factor variables for CVD algorithms listed in descending order of coefficient effect size (ACC/AHA; logistic regression), weighting (neural networks), or selection frequency (random forest, gradient boosting machines). Algorithms were derived from training cohort of 295,267 patients.

Rank | ACC/AHA (Men) | ACC/AHA (Women) | ML: Logistic Regression | ML: Random Forest | ML: Gradient Boosting Machines | ML: Neural Networks
1 | Age | Age | Ethnicity | Age | Age | Atrial Fibrillation
2 | Total Cholesterol | HDL Cholesterol | Age | Gender | Gender | Ethnicity
3 | HDL Cholesterol | Total Cholesterol | SES: Townsend Deprivation Index | Ethnicity | Ethnicity | Oral Corticosteroid Prescribed
4 | Smoking | Smoking | Gender | Smoking | Smoking | Age
5 | Age x Total Cholesterol | Age x HDL Cholesterol | Smoking | HDL Cholesterol | HDL Cholesterol | Severe Mental Illness
6 | Treated Systolic Blood Pressure | Age x Total Cholesterol | Atrial Fibrillation | HbA1c | Triglycerides | SES: Townsend Deprivation Index
7 | Age x Smoking | Treated Systolic Blood Pressure | Chronic Kidney Disease | Triglycerides | Total Cholesterol | Chronic Kidney Disease
8 | Age x HDL Cholesterol | Untreated Systolic Blood Pressure | Rheumatoid Arthritis | SES: Townsend Deprivation Index | HbA1c | BMI missing
9 | Untreated Systolic Blood Pressure | Age x Smoking | Family history of premature CHD | BMI | Systolic Blood Pressure | Smoking
10 | Diabetes | Diabetes | COPD | Total Cholesterol | SES: Townsend Deprivation Index | Gender

Italics in the original indicate protective factors.
https://doi.org/10.1371/journal.pone.0174944.t003
• Compared with the ACC/AHA algorithm, the machine-learning models drew on additional predictors
• e.g., COPD, severe mental illness, prescribing of oral corticosteroids, triglyceride level
Can machine-learning improve cardiovascular
risk prediction using routine clinical data?
Stephen F.Weng et al PLoS One 2017
correctly predicted compared to the baseline ACC/AHA model ranged from 191 non-cases for
the random forest algorithm to 355 non-cases for the neural networks. Full details on classifi-
cation analysis can be found in S2 Table.
Discussion
Compared to an established AHA/ACC risk prediction algorithm, we found all machine-
learning algorithms tested were better at identifying individuals who will develop CVD and
those that will not. Unlike established approaches to risk prediction, the machine-learning
methods used were not limited to a small set of risk factors, and incorporated more pre-exist-
Table 4. Performance of the machine-learning (ML) algorithms predicting 10-year cardiovascular disease (CVD) risk derived from applying training algorithms on the validation cohort of 82,989 patients. Higher c-statistics result in better algorithm discrimination. The baseline (BL) ACC/AHA 10-year risk prediction algorithm is provided for comparative purposes.

Algorithm                        AUC c-statistic   Standard error*   95% CI (LCL to UCL)   Absolute change from baseline
BL: ACC/AHA                      0.728             0.002             0.723 to 0.735        —
ML: Random Forest                0.745             0.003             0.739 to 0.750        +1.7%
ML: Logistic Regression          0.760             0.003             0.755 to 0.766        +3.2%
ML: Gradient Boosting Machines   0.761             0.002             0.755 to 0.766        +3.3%
ML: Neural Networks              0.764             0.002             0.759 to 0.769        +3.6%

*Standard error estimated by jack-knife procedure [30]
https://doi.org/10.1371/journal.pone.0174944.t004
Can machine-learning improve cardiovascular risk prediction using routine clinical data?
• Compared with the baseline ACC/AHA model, all machine-learning algorithms discriminated better
• Neural networks performed best (AUC 0.764), correctly predicting 355 additional cardiovascular events
• Possible extensions: deep learning; genetic information as additional risk factors
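The comparison design here is a baseline risk model restricted to a handful of classical predictors versus machine-learning models given a wider feature set, scored by AUC on held-out patients. The sketch below reproduces that design on synthetic data with scikit-learn; it is not the study's code, cohort, or feature list.

```python
# Baseline model on a few "classical" predictors vs an ML model on the full
# feature set, compared by AUC on held-out data. Synthetic data only.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=20000, n_features=30, n_informative=12,
                           weights=[0.9, 0.1], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

baseline = LogisticRegression(max_iter=1000).fit(X_tr[:, :8], y_tr)    # restricted feature set
ml_model = GradientBoostingClassifier(random_state=0).fit(X_tr, y_tr)  # all 30 features

print("baseline AUC:", roc_auc_score(y_te, baseline.predict_proba(X_te[:, :8])[:, 1]))
print("ML AUC:", roc_auc_score(y_te, ml_model.predict_proba(X_te)[:, 1]))
```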
[Figure 1 (C-Path pipeline). (A) Basic image processing and feature construction: the H&E image is broken into superpixels and nuclei are identified within each superpixel. (B) Building an epithelial/stromal classifier from characteristics of epithelial nuclei and epithelial cytoplasm versus stromal nuclei and stromal matrix. (C) Constructing higher-level contextual/relational features: relationships between epithelial nuclear neighbors, between morphologically regular and irregular nuclei, between epithelial and stromal objects, between epithelial nuclei and cytoplasm, and of contiguous epithelial regions with underlying nuclear objects. (D) Learning an image-based model to predict survival from processed patient images.]
TMAs contain 0.6-mm-diameter cores (median of two cores per case) that represent only a small sample of the full tumor. We acquired data from two separate and independent cohorts: Netherlands Cancer Institute (NKI; 248 patients) and Vancouver General Hospital (VGH; 328 patients).
Unlike previous work in cancer morphometry (18–21), our image analysis pipeline was not limited to a predefined set of morphometric features selected by pathologists. Rather, C-Path measures an extensive, quantitative feature set from the breast cancer epithelium and the stroma (Fig. 1). Our image processing system first performed an automated, hierarchical scene segmentation that generated thousands of measurements, including both standard morphometric descriptors of image objects and higher-level contextual, relational, and global image features. The pipeline consisted of three stages (Fig. 1, A to C, and tables S8 and S9). First, we used a set of processing steps to separate the tissue from the background, partition the image into small regions of coherent appearance known as superpixels, find nuclei within the superpixels, and construct
[Figure 1 panel text repeated. Panel D additionally shows: processed images from patients alive at 5 years and from patients deceased at 5 years feed L1-regularized logistic regression model building, yielding the 5YS predictive model, which is applied to unlabeled images to estimate P(survival) over time and to identify novel prognostically important morphologic features.]
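The first pipeline stage (background separation, superpixels, nucleus candidates) can be sketched with off-the-shelf scikit-image routines. This is only an illustration under simple assumptions (Otsu thresholds, and a sample stained image from skimage.data standing in for an H&E image); it is not the C-Path implementation.

```python
# Rough sketch of stage 1 with scikit-image; thresholds and the sample image
# are stand-ins, not the C-Path system.
import numpy as np
from skimage import color, filters, measure
from skimage.data import immunohistochemistry
from skimage.segmentation import slic

he_image = immunohistochemistry()                  # stand-in stained RGB image

# Separate tissue from the bright glass background.
gray = color.rgb2gray(he_image)
tissue_mask = gray < filters.threshold_otsu(gray)

# Partition the image into small regions of coherent appearance (superpixels).
superpixels = slic(he_image, n_segments=400, compactness=10, start_label=1)

# Find nucleus candidates: dark, hematoxylin-rich blobs inside tissue.
hematoxylin = color.rgb2hed(he_image)[:, :, 0]
nuclei_mask = (hematoxylin > filters.threshold_otsu(hematoxylin)) & tissue_mask
nuclei = measure.label(nuclei_mask)

# Simple per-superpixel measurements (nucleus count, mean intensity).
features = []
for sp in np.unique(superpixels):
    in_sp = superpixels == sp
    nucleus_labels = np.unique(nuclei[in_sp])
    features.append((int(sp),
                     int(np.count_nonzero(nucleus_labels)),  # label 0 = background
                     float(gray[in_sp].mean())))
```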
[Fig. 1 caption, continued] ...basic cellular morphologic properties (epithelial regular nuclei = red; epithelial atypical nuclei = pale blue; epithelial cytoplasm = purple; stromal matrix = green; stromal round nuclei = dark green; stromal spindled nuclei = teal blue; unclassified regions = dark gray; spindled nuclei in unclassified regions = yellow; round nuclei in unclassified regions = gray; background = white). (Left panel) After the classification of each image object, a rich feature set is constructed. (D) Learning an image-based model to predict survival. Processed images from patients alive at 5 years after surgery and from patients deceased at 5 years after surgery were used to construct an image-based prognostic model. After construction of the model, it was applied to a test set of breast cancer images (not used in model building) to classify patients as high or low risk of death by 5 years.
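Panel D of the caption describes an L1-regularized logistic regression fit to per-image feature vectors labeled by 5-year survival status. A minimal sketch of that modeling step, with synthetic features standing in for the morphologic measurements:

```python
# Sketch of the prognostic-model step; feature matrix and labels are synthetic.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 500))        # 200 images x 500 morphologic features
y = rng.integers(0, 2, size=200)       # 1 = alive at 5 years, 0 = deceased

model = make_pipeline(
    StandardScaler(),
    LogisticRegression(penalty="l1", solver="liblinear", C=0.1),  # sparse weights
)
model.fit(X, y)

coef = model.named_steps["logisticregression"].coef_.ravel()
print("features retained by the L1 penalty:", int(np.sum(coef != 0)))

# Applied to held-out images, the model yields P(alive at 5 years), which can be
# thresholded into high- vs. low-risk groups.
risk = model.predict_proba(X[:10])[:, 1]
```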
Digital Pathologist
Sci Transl Med. 2011 Nov 9;3(108):108ra113
Top stromal features associated with survival.
primarily characterizing epithelial nuclear characteristics, such as
size, color, and texture (21, 36). In contrast, after initial filtering of im-
ages to ensure high-quality TMA images and training of the C-Path
models using expert-derived image annotations (epithelium and
stroma labels to build the epithelial-stromal classifier and survival
time and survival status to build the prognostic model), our image
analysis system is automated with no manual steps, which greatly in-
creases its scalability. Additionally, in contrast to previous approaches,
our system measures thousands of morphologic descriptors of diverse
identification of prognostic features whose significance was not pre-
viously recognized.
Using our system, we built an image-based prognostic model on
the NKI data set and showed that in this patient cohort the model
was a strong predictor of survival and provided significant additional
prognostic information to clinical, molecular, and pathological prog-
nostic factors in a multivariate model. We also demonstrated that the
image-based prognostic model, built using the NKI data set, is a strong
prognostic factor on another, independent data set with very different
Fig. 5. Top epithelial features. The eight panels in the figure (A to H) each
shows one of the top-ranking epithelial features from the bootstrap anal-
ysis. Left panels, improved prognosis; right panels, worse prognosis. (A) SD
of the (SD of intensity/mean intensity) for pixels within a ring of the center
of epithelial nuclei. Left, relatively consistent nuclear intensity pattern (low
score); right, great nuclear intensity diversity (high score). (B) Sum of the
number of unclassified objects. Red, epithelial regions; green, stromal re-
gions; no overlaid color, unclassified region. Left, few unclassified objects
(low score); right, higher number of unclassified objects (high score). (C) SD
of the maximum blue pixel value for atypical epithelial nuclei. Left, high
score; right, low score. (D) Maximum distance between atypical epithe-
lial nuclei. Left, high score; right, low score. (Insets) Red, atypical epithelial
nuclei; black, typical epithelial nuclei. (E) Minimum elliptic fit of epithelial
contiguous regions. Left, high score; right, low score. (F) SD of distance
between epithelial cytoplasmic and nuclear objects. Left, high score; right,
low score. (G) Average border between epithelial cytoplasmic objects. Left,
high score; right, low score. (H) Maximum value of the minimum green
pixel intensity value in epithelial contiguous regions. Left, low score indi-
cating black pixels within epithelial region; right, higher score indicating
presence of epithelial regions lacking black pixels.
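Feature (A) above is concrete enough to illustrate: for each epithelial nucleus, take the pixels in a ring around its center, compute SD(intensity)/mean(intensity), and then take the SD of that ratio across nuclei. The sketch below assumes a grayscale image and a list of nucleus centers (both synthetic here); the ring radii are arbitrary choices, not values from the paper.

```python
# Illustrative computation of feature (A); image, centers, and radii are stand-ins.
import numpy as np

rng = np.random.default_rng(1)
image = rng.random((512, 512))                  # stand-in intensity image
centers = rng.integers(20, 492, size=(50, 2))   # stand-in nucleus centers (row, col)

rows, cols = np.mgrid[0:image.shape[0], 0:image.shape[1]]

def ring_ratio(center, r_in=3, r_out=8):
    """SD/mean of pixel intensities in a ring around one nucleus center."""
    dist = np.hypot(rows - center[0], cols - center[1])
    ring = image[(dist >= r_in) & (dist < r_out)]
    return ring.std() / ring.mean()

ratios = np.array([ring_ratio(c) for c in centers])
feature_A = ratios.std()   # high value = heterogeneous nuclear intensity patterns
print(feature_A)
```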
and stromal matrix throughout the image, with thin cords of epithe-
lial cells infiltrating through stroma across the image, so that each
stromal matrix region borders a relatively constant proportion of ep-
ithelial and stromal regions. The stromal feature with the second
largest coefficient (Fig. 4B) was the sum of the minimum green in-
tensity value of stromal-contiguous regions. This feature received a
value of zero when stromal regions contained dark pixels (such as
inflammatory nuclei). The feature received a positive value when
stromal objects were devoid of dark pixels. This feature provided in-
formation about the relationship between stromal cellular composi-
tion and prognosis and suggested that the presence of inflammatory
cells in the stroma is associated with poor prognosis, a finding con-
sistent with previous observations (32). The third most significant
stromal feature (Fig. 4C) was a measure of the relative border between
spindled stromal nuclei to round stromal nuclei, with an increased rel-
ative border of spindled stromal nuclei to round stromal nuclei asso-
ciated with worse overall survival. Although the biological underpinning
of this morphologic feature is currently not known, this analysis sug-
gested that spatial relationships between different populations of stro-
mal cell types are associated with breast cancer progression.
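The second-ranked stromal feature has a similarly direct reading: label the stromal-contiguous regions, take each region's minimum green-channel value, and sum them, so the feature is pulled toward zero whenever stromal regions contain dark (for example inflammatory-nucleus) pixels. A sketch under synthetic inputs (the RGB image and stromal mask are stand-ins):

```python
# Illustrative computation of the "sum of minimum green intensity of
# stromal-contiguous regions" feature; inputs are synthetic stand-ins.
import numpy as np
from scipy import ndimage
from skimage import measure

rng = np.random.default_rng(2)
rgb = rng.integers(0, 256, size=(256, 256, 3), dtype=np.uint8)  # stand-in image
stroma_mask = rng.random((256, 256)) > 0.6                      # stand-in mask

green = rgb[:, :, 1]
regions = measure.label(stroma_mask)            # contiguous stromal regions

region_min_green = ndimage.minimum(green, labels=regions,
                                   index=np.arange(1, regions.max() + 1))
feature = float(np.sum(region_min_green))       # small when dark pixels are present
print(feature)
```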
Reproducibility of C-Path 5YS model predictions on
samples with multiple TMA cores
For the C-Path 5YS model (which was trained on the full NKI data
set), we assessed the intrapatient agreement of model predictions when
predictions were made separately on each image contributed by pa-
tients in the VGH data set. For the 190 VGH patients who contributed
two images with complete image data, the binary predictions (high
or low risk) on the individual images agreed with each other for 69%
(131 of 190) of the cases and agreed with the prediction on the aver-
aged data for 84% (319 of 380) of the images. Using the continuous
prediction score (which ranged from 0 to 100), the median of the ab-
solute difference in prediction score among the patients with replicate
images was 5%, and the Spearman correlation among replicates was
0.27 (P = 0.0002) (fig. S3). This degree of intrapatient agreement is
only moderate, and these findings suggest significant intrapatient tumor
heterogeneity, which is a cardinal feature of breast carcinomas (33–35).
Qualitative visual inspection of images receiving discordant scores
suggested that intrapatient variability in both the epithelial and the
stromal components is likely to contribute to discordant scores for
the individual images. These differences appeared to relate both to
the proportions of the epithelium and stroma and to the appearance
of the epithelium and stroma. Last, we sought to analyze whether sur-
vival predictions were more accurate on the VGH cases that contributed
multiple cores compared to the cases that contributed only a single
core. This analysis showed that the C-Path 5YS model showed signif-
icantly improved prognostic prediction accuracy on the VGH cases
for which we had multiple images compared to the cases that con-
tributed only a single image (Fig. 7). Together, these findings show
a significant degree of intrapatient variability and indicate that increased
tumor sampling is associated with improved model performance.
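The reproducibility analysis described above reduces to a few standard statistics; here is a sketch with synthetic per-image prediction scores standing in for the C-Path outputs (the 50-point risk threshold is an arbitrary illustration, not the model's cutoff).

```python
# Sketch of the intrapatient-agreement analysis; scores are synthetic stand-ins.
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(3)
scores_img1 = rng.uniform(0, 100, size=190)                          # replicate 1
scores_img2 = np.clip(scores_img1 + rng.normal(0, 25, 190), 0, 100)  # replicate 2

high1, high2 = scores_img1 >= 50, scores_img2 >= 50   # illustrative risk threshold

binary_agreement = np.mean(high1 == high2)
median_abs_diff = np.median(np.abs(scores_img1 - scores_img2))
rho, p_value = spearmanr(scores_img1, scores_img2)

print(f"binary high/low-risk agreement: {binary_agreement:.0%}")
print(f"median |score difference|: {median_abs_diff:.1f}")
print(f"Spearman rho = {rho:.2f} (P = {p_value:.2g})")
```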
DISCUSSION
[Figure 4 panel text: H&E image separated into epithelial and stromal objects; heat map of stromal matrix objects' mean absolute difference in intensity to neighbors; panels A to C each contrast improved versus worse prognosis.]
Fig. 4. Top stromal features associated with survival. (A) Variability in absolute difference in intensity between stromal matrix regions and neighbors. Top panel, high score (24.1); bottom panel, low score (10.5). (Insets) Top panel, high score; bottom panel, low score. Right panels, stromal matrix objects colored blue (low), green (medium), or white (high) according to each object's absolute difference in intensity to neighbors. (B) Presence
Top epithelial features. The eight panels in the figure (A to H) each show one of the top-ranking epithelial features from the bootstrap analysis. Left panels, improved prognosis; right panels, worse prognosis.
PRECISION MEDICINE
Identification of type 2 diabetes subgroups through
topological analysis of patient similarity
Li Li,1 Wei-Yi Cheng,1 Benjamin S. Glicksberg,1 Omri Gottesman,2 Ronald Tamler,3 Rong Chen,1 Erwin P. Bottinger,2 Joel T. Dudley1,4*
Type 2 diabetes (T2D) is a heterogeneous complex disease affecting more than 29 million Americans alone with a
rising prevalence trending toward steady increases in the coming decades. Thus, there is a pressing clinical need to
improve early prevention and clinical management of T2D and its complications. Clinicians have understood that
patients who carry the T2D diagnosis have a variety of phenotypes and susceptibilities to diabetes-related compli-
cations. We used a precision medicine approach to characterize the complexity of T2D patient populations based
on high-dimensional electronic medical records (EMRs) and genotype data from 11,210 individuals. We successfully
identified three distinct subgroups of T2D from topology-based patient-patient networks. Subtype 1 was character-
ized by T2D complications diabetic nephropathy and diabetic retinopathy; subtype 2 was enriched for cancer ma-
lignancy and cardiovascular diseases; and subtype 3 was associated most strongly with cardiovascular diseases,
neurological diseases, allergies, and HIV infections. We performed a genetic association analysis of the emergent
T2D subtypes to identify subtype-specific genetic markers and identified 1279, 1227, and 1338 single-nucleotide
polymorphisms (SNPs) that mapped to 425, 322, and 437 unique genes specific to subtypes 1, 2, and 3, respec-
tively. By assessing the human disease–SNP association for each subtype, the enriched phenotypes and
biological functions at the gene level for each subtype matched with the disease comorbidities and clinical dif-
ferences that we identified through EMRs. Our approach demonstrates the utility of applying the precision
medicine paradigm in T2D and the promise of extending the approach to the study of other complex, multi-
factorial diseases.
INTRODUCTION
Type 2 diabetes (T2D) is a complex, multifactorial disease that has emerged as an increasingly prevalent worldwide health concern associated with high economic and physiological burdens. An estimated 29.1 million Americans (9.3% of the population) had some form of diabetes in 2012—up 13% from 2010—with T2D representing up to 95% of all diagnosed cases (1, 2). Risk factors for
T2D include obesity, family history of diabetes, physical inactivity, eth-
nicity, and advanced age (1, 2). Diabetes and its complications now
rank among the leading causes of death in the United States (2). In fact,
diabetes is the leading cause of nontraumatic foot amputation, adult
blindness, and need for kidney dialysis, and multiplies risk for myo-
cardial infarction, peripheral artery disease, and cerebrovascular disease
(3–6). The total estimated direct medical cost attributable to diabetes in
the United States in 2012 was $176 billion, with an estimated $76 billion
attributable to hospital inpatient care alone. There is a great need to im-
prove understanding of T2D and its complex factors to facilitate pre-
vention, early detection, and improvements in clinical management.
A more precise characterization of T2D patient populations can en-
hance our understanding of T2D pathophysiology (7, 8). Current
clinical definitions classify diabetes into three major subtypes: type 1 dia-
betes (T1D), T2D, and maturity-onset diabetes of the young. Other sub-
types based on phenotype bridge the gap between T1D and T2D, for
example, latent autoimmune diabetes in adults (LADA) (7) and ketosis-
prone T2D. The current categories indicate that the traditional definition of
diabetes, especially T2D, might comprise additional subtypes with dis-
tinct clinical characteristics. A recent analysis of the longitudinal Whitehall
II cohort study demonstrated improved assessment of cardiovascular
risks when subgrouping T2D patients according to glucose concentration
criteria (9). Genetic association studies reveal that the genetic architec-
ture of T2D is profoundly complex (10–12). Identified T2D-associated
risk variants exhibit allelic heterogeneity and directional differentiation
among populations (13, 14). The apparent clinical and genetic com-
plexity and heterogeneity of T2D patient populations suggest that there
are opportunities to refine the current, predominantly symptom-based,
definition of T2D into additional subtypes (7).
Because etiological and pathophysiological differences exist among
T2D patients, we hypothesize that a data-driven analysis of a clinical
population could identify new T2D subtypes and factors. Here, we de-
velop a data-driven, topology-based approach to (i) map the complexity
of patient populations using clinical data from electronic medical re-
cords (EMRs) and (ii) identify new, emergent T2D patient subgroups
with subtype-specific clinical and genetic characteristics. We apply this
approach to a data set comprising matched EMRs and genotype data from
more than 11,000 individuals. Topological analysis of these data revealed
three distinct T2D subtypes that exhibited distinct patterns of clinical
characteristics and disease comorbidities. Further, we identified genetic
markers associated with each T2D subtype and performed gene- and
pathway-level analysis of subtype genetic associations. Biological and
phenotypic features enriched in the genetic analysis corroborated clinical
disparities observed among subgroups. Our findings suggest that data-
driven, topological analysis of patient cohorts has utility in precision medicine efforts to refine our understanding of T2D toward improving patient care.
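The study's patient-patient networks come from topological data analysis (Mapper) of EMR features; as a much simpler stand-in for that idea, the sketch below builds a k-nearest-neighbour similarity graph over synthetic EMR feature vectors and extracts graph communities, which play the role of the emergent patient subgroups. None of the parameter choices here come from the paper.

```python
# Simplified stand-in for the topology-based patient network (not Mapper itself).
import numpy as np
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities
from sklearn.metrics.pairwise import cosine_similarity

rng = np.random.default_rng(4)
emr_features = rng.random((500, 100))   # 500 patients x 100 normalized EMR features

sim = cosine_similarity(emr_features)
np.fill_diagonal(sim, -np.inf)          # ignore self-similarity

G = nx.Graph()
G.add_nodes_from(range(emr_features.shape[0]))
k = 5
for i, row in enumerate(sim):
    for j in np.argsort(row)[-k:]:      # connect each patient to its k most similar
        G.add_edge(i, int(j))

communities = greedy_modularity_communities(G)   # stand-in for emergent subgroups
print("patients:", G.number_of_nodes(), "edges:", G.number_of_edges(),
      "communities:", len(communities))
```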
1Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, 700 Lexington Ave., New York, NY 10065, USA. 2Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, One Gustave L. Levy Place, New York, NY 10029, USA. 3Division of Endocrinology, Diabetes, and Bone Diseases, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA. 4Department of Health Policy and Research, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA.
*Corresponding author. E-mail: joel.dudley@mssm.edu
and vision defects (RR, 1.32; range, 1.04
to 1.67), than were the other two subtypes
(Table 2A). Patients in subtype 2 (n = 617)
were more likely to associate with diseases
of cancer of bronchus: lung (RR, 3.76; range,
1.14 to 12.39); malignant neoplasm with-
out specification of site (RR, 3.46; range,
1.23 to 9.70); tuberculosis (RR, 2.93; range,
1.30 to 6.64); coronary atherosclerosis and
other heart disease (RR, 1.28; range, 1.01 to
1.61); and other circulatory disease (RR, 1.27;
range, 1.02 to 1.58), than were the other two
subtypes (Table 2B). Patients in subtype 3
(n = 1096) were more often diagnosed with
HIV infection (RR, 1.92; range, 1.30 to 2.85)
and were associated with E codes (that is,
external causes of injury care) (RR, 1.84;
range, 1.41 to 2.39); aortic and peripheral
arterial embolism or thrombosis (RR, 1.79;
range, 1.18 to 2.71); hypertension with com-
plications and secondary hypertension (RR,
1.66; range, 1.29 to 2.15); coronary athero-
sclerosis and other heart disease (RR, 1.41;
range, 1.15 to 1.72); allergic reactions (RR,
1.42; range, 1.19 to 1.70); deficiency and other
anemia (RR, 1.39; range, 1.14 to 1.68); and
screening and history of mental health and
substance abuse code (RR, 1.30; range, 1.07
to 1.58) (Table 2C).
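For reference, the relative risks quoted above compare the frequency of a diagnosis inside one subtype with its frequency in the remaining T2D patients; a minimal sketch follows, with invented 2x2 counts (only the subtype sizes echo the text).

```python
# Relative risk with a 95% CI from the log-RR normal approximation.
# The counts a and c below are invented for illustration.
import numpy as np

def relative_risk(a, n1, c, n0, z=1.96):
    """a of n1 subtype patients vs. c of n0 other patients carry the diagnosis."""
    rr = (a / n1) / (c / n0)
    se_log = np.sqrt(1 / a - 1 / n1 + 1 / c - 1 / n0)
    lo = np.exp(np.log(rr) - z * se_log)
    hi = np.exp(np.log(rr) + z * se_log)
    return rr, lo, hi

# Subtype 3 had n = 1096 patients out of the 2551 T2D patients in the text.
rr, lo, hi = relative_risk(a=120, n1=1096, c=180, n0=2551 - 1096)
print(f"RR = {rr:.2f} (95% CI {lo:.2f} to {hi:.2f})")
```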
Significant disease–genetic variant
enrichments specific to T2D subtypes
We next evaluated the genetic variants sig-
nificantly associated with each of the three
subtypes. Observed genetic associations and
gene-level [that is, single-nucleotide poly-
morphisms (SNPs) mapped to gene-level
annotations] enrichments by hypergeometric
analysis are considered independent of the
Fig. 1. Patient and genotype networks. (A) Patient-patient network for topology patterns on 11,210 Biobank patients. Each node represents a single patient or a group of patients with significant similarity based on their clinical features. An edge between nodes indicates that the nodes share patients. Red represents enrichment for patients with a T2D diagnosis, and blue represents non-enrichment for patients with a T2D diagnosis. (B) Patient-patient network for topology patterns on 2551 T2D patients. Each node represents a single patient or a group of patients with significant similarity based on their clinical features. An edge between nodes indicates that the nodes share patients. Red represents enrichment for females, and blue represents enrichment for males.
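The gene-level enrichments mentioned before the figure caption are hypergeometric tests; a minimal sketch follows. Only the 425 subtype-1 genes come from the text; the background size, gene-set size, and overlap are placeholders.

```python
# Hypergeometric enrichment of a subtype-specific gene list in a phenotype gene set.
from scipy.stats import hypergeom

M = 20000   # background: total annotated genes (placeholder)
n = 425     # genes mapped from subtype-1-specific SNPs (from the text)
N = 300     # genes in the phenotype/function set being tested (placeholder)
k = 18      # observed overlap between the two lists (placeholder)

p_enrichment = hypergeom.sf(k - 1, M, n, N)   # P(X >= k) under the null
print(f"enrichment P-value: {p_enrichment:.3g}")
```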
Machine Learning for Healthcare (MLFHC) 2018 at Stanford
How Artificial Intelligence Is Revolutionizing Healthcare (July 2019) (Part 2)
  • 1.
  • 2.
  • 3.
  • 4.
  • 5.
  • 6.
  • 7.
  • 9.
  • 11.
  • 12.
  • 13.
  • 15.
  • 16.
    NCCN Guidelines Version4.2014 Non-Small Cell Lung Cancer NCCN Guidelines Index NSCLC Table of Contents Discussion Version 4.2014, 06/05/14 © National Comprehensive Cancer Network, Inc. 2014, All rights reserved. The NCCN Guidelines® and this illustration may not be reproduced in any form without the express written permission of NCCN® . Note: All recommendations are category 2A unless otherwise indicated. Clinical Trials: NCCN believes that the best management of any cancer patient is in a clinical trial. Participation in clinical trials is especially encouraged. NSCL-8 hPositive PET/CT scan findings for distant disease need pathologic or other radiologic confirmation. If PET/CT scan is positive in the mediastinum, lymph node status needs pathologic confirmation. iSee Principles of Surgical Therapy (NSCL-B). jSee Principles of Radiation Therapy (NSCL-C). kSee Chemotherapy Regimens for Neoadjuvant and Adjuvant Therapy (NSCL-D). mSee Chemotherapy Regimens Used with Radiation Therapy (NSCL-E). nR0 = no residual tumor, R1 = microscopic residual tumor, R2 = macroscopic residual tumor. sPatients likely to receive adjuvant chemotherapy may be treated with induction chemotherapy as an alternative. MEDIASTINAL BIOPSY FINDINGS INITIAL TREATMENT ADJUVANT TREATMENT T1-3, N0-1 (including T3 with multiple nodules in same lobe) Surgeryi,s Resectable Medically inoperable Surgical resectioni + mediastinal lymph node dissection or systematic lymph node sampling See Treatment according to clinical stage (NSCL-2) N0–1 N2 See NSCL-3 Margins negative (R0)n Sequential chemotherapyk (category 1) + RTj Margins positiven Surveillance (NSCL-14) R1n R2n Chemoradiationj (sequentialk or concurrentm) Surveillance (NSCL-14) Concurrent chemoradiationj,m Surveillance (NSCL-14) T1-2, T3 (≥7 cm), N2 nodes positivei • Brain MRI • PET/CT scan,h if not previously done Negative for M1 disease Positive Definitive concurrent chemoradiationj,m (category 1) or Induction chemotherapyk ± RTj See Treatment for Metastasis solitary site (NSCL-13) or distant disease (NSCL-15) No apparent progression Progression Surgeryi ± chemotherapyk (category 2B) ± RTj (if not given) RTj (if not given) ± chemotherapykLocal Systemic See Treatment for Metastasis solitary site (NSCL-13) or distant disease (NSCL-15) T3 (invasion), N2 nodes positive • Brain MRI • PET/CT scan,h if not previously done Negative for M1 disease Positive Definitive concurrent chemoradiationj,m See Treatment for Metastasis solitary site (NSCL-13) or distant disease (NSCL-15) Printed by yoon sup choi on 6/19/2014 8:23:15 PM. For personal use only. Not approved for distribution. Copyright © 2014 National Comprehensive Cancer Network, Inc., All Rights Reserved.
  • 17.
  • 18.
    Over the course of a career, an oncologist may impart bad news an average of 20,000 times, but most practicing oncologists have never received any formal training to help them prepare for such conversations.
  • 19.
    High levels of empathy in primary care physicians correlate with better clinical outcomes for their patients with diabetes
  • 20.
  • 21.
    We identified 384 empathic opportunities and found that physicians had responded empathically to 39 (10%) of them
  • 22.
    In 398 conversations, the total number of empathic opportunities was 292. When they occurred, oncologists responded with continuers 22% of the time.
  • 23.
  • 24.
    • "There is a problem with a physician-training system in which a student can enter medical school simply by having excellent exam scores and obtain board certification simply by passing exams." • "To make medicine one's profession, one needs to excel at a service mindset, empathy, and communication; students whose temperament does not fit these demands are bound to struggle once admitted." • "In fact, during residency, when contact with patients is greatest, education in areas beyond medicine itself, such as patient safety and communication with patients, is sorely needed, but residents are too pressed with caring for the patients in front of them to receive it."
  • 25.
    Hojat M et al. Acad Med. 2009. The differences in empathy scores between the two groups varied from a low of 0.05 (in year 0) to a maximum of 0.75 (in year 3). The effect size of the decline in empathy from year 0 to year 3 was more than double for those who chose technology-oriented specialties (d = 1.01) compared with their counterparts in people-oriented specialties (d = 0.44). Women consistently outscored men in every year of medical school, and gender differences in all test administrations were statistically significant (P < .05, by t test); the effect size of the decline in empathy between year 0 and year 3 was much larger for men (d = 0.79) than for women (d = 0.56). The 85 graduates who pursued people-oriented specialties (e.g., family medicine, internal medicine, pediatrics, emergency medicine, psychiatry, obstetrics-gynecology) consistently scored higher in all years of medical school than the 36 who pursued technology-oriented specialties (e.g., anesthesiology, pathology, radiology, surgery, orthopedic surgery), with the difference becoming statistically significant from year 2 of medical school (P < .05, by t test). Figure 1. Changes in mean Jefferson Scale of Physician Empathy (JSPE) scores in different years of medical school for the matched cohort (n = 121), who identified themselves at all five administrations of the JSPE, and the unmatched cohort (n = 335) at Jefferson Medical College, Philadelphia, Pennsylvania, 2002-2008. Figure 2. Changes in mean JSPE scores in different years of medical school for 56 men and 65 women who identified themselves at all five administrations of the JSPE ("matched cohort") at Jefferson Medical College, Philadelphia, Pennsylvania, 2002-2008. (Academic Medicine, Vol. 84, No. 9, September 2009)
  • 26.
  • 27.
  • 28.
  • 29.
  • 30.
  • 31.
  • 32.
  • 33.
    [Figure: reported performance values of 69.5%, 63%, 49.5%, 72.5%, and 57.5%] AJR Am J Roentgenol. 2017 Dec;209(6):1374-1380. Digital Healthcare Institute, Director, Yoon Sup Choi, PhD, [email protected]
  • 34.
    [Figure: reading time 188 min without AI vs. 154 min with AI (saving 18% of time), and 180 min without AI vs. 108 min with AI (saving 40% of time)] AJR Am J Roentgenol. 2017 Dec;209(6):1374-1380. Digital Healthcare Institute, Director, Yoon Sup Choi, PhD, [email protected]
  • 35.
  • 36.
    Yun Liu et al. Detecting Cancer Metastases on Gigapixel Pathology Images (2017)
  • 37.
    FIGURE 3. Improved metastasis detection with algorithm assistance. (A) Performance across all images by image category and assistance modality (unassisted vs. assisted); error bars indicate SE. The performance metric corresponds to specificity for negative cases and sensitivity for micrometastases (micromet) and macrometastases (macromet); for micrometastases, sensitivity was significantly higher with assistance (p = 0.02). (B) Operating point of individual pathologists with and without assistance for micrometastases and negative cases, overlaid on the receiver operating characteristic curve of the algorithm. AUC indicates area under the curve. • Sensitivity: significantly higher for micrometastases when the AI was used; differences for negative cases and macrometastases were not significant. • AUC: pathologist plus AI was slightly higher than either the pathologist alone or the AI alone.
  • 38.
    Underlying these exciting advances, however, is the important notion that these algorithms do not replace the breadth and contextual knowledge of human pathologists. Algorithm assistance increased sensitivity for micrometastases from 83% to 91% and resulted in higher overall diagnostic accuracy than that of either unassisted pathologist interpretation or the computer algorithm alone. FIGURE 5. Average review time per image decreases with assistance. (A) Average review time per image across all pathologists analyzed by category (negative, ITC, micromet, macromet); black circles are average times with assistance, gray triangles average times without assistance; error bars indicate 95% confidence intervals (p = 0.002 and p = 0.02 for the significant categories). (B) Micrometastasis time of review decreases for nearly all images with assistance. Circles represent average review time for each individual micrometastasis image, averaged across the 6 pathologists by assistance modality; dashed lines connect the same image with and without assistance; the 2 images that were not reviewed faster on average with assistance are shown with red dot-dash lines. Micromet indicates micrometastasis; macromet, macrometastasis. • Review time (per image): with AI assistance, significantly shorter for negative and micromet images; in particular, micromet review time fell by about half, from roughly 2 minutes to 1 minute. • Differences for ITC (isolated tumor cells) and macromet were not significant.
  • 39.
    Deep Learning AutomaticDetection Algorithm for Malignant Pulmonary Nodules Table 3: Patient Classification and Nodule Detection at the Observer Performance Test Observer Test 1 DLAD versus Test 1 (P Value) Test 2 Test 1 versus Test 2 (P Value) Radiograph Classification (AUROC) Nodule Detection (JAFROC FOM) Radiograph Classification Nodule Detection Radiograph Classification (AUROC) Nodule Detection (JAFROC FOM) Radiograph Classification Nodule Detection Nonradiology physicians Observer 1 0.77 0.716 ,.001 ,.001 0.91 0.853 ,.001 ,.001 Observer 2 0.78 0.657 ,.001 ,.001 0.90 0.846 ,.001 ,.001 Observer 3 0.80 0.700 ,.001 ,.001 0.88 0.783 ,.001 ,.001 Group 0.691 ,.001* 0.828 ,.001* Radiology residents Observer 4 0.78 0.767 ,.001 ,.001 0.80 0.785 .02 .03 Observer 5 0.86 0.772 .001 ,.001 0.91 0.837 .02 ,.001 Observer 6 0.86 0.789 .05 .002 0.86 0.799 .08 .54 Observer 7 0.84 0.807 .01 .003 0.91 0.843 .003 .02 Observer 8 0.87 0.797 .10 .003 0.90 0.845 .03 .001 Observer 9 0.90 0.847 .52 .12 0.92 0.867 .04 .03 Group 0.790 ,.001* 0.867 ,.001* Board-certified radiologists Observer 10 0.87 0.836 .05 .01 0.90 0.865 .004 .002 Observer 11 0.83 0.804 ,.001 ,.001 0.84 0.817 .03 .04 Observer 12 0.88 0.817 .18 .005 0.91 0.841 .01 .01 Observer 13 0.91 0.824 ..99 .02 0.92 0.836 .51 .24 Observer 14 0.88 0.834 .14 .03 0.88 0.840 .87 .23 Group 0.821 .02* 0.840 .01* Thoracic radiologists Observer 15 0.94 0.856 .15 .21 0.96 0.878 .08 .03 Observer 16 0.92 0.854 .60 .17 0.93 0.872 .34 .02 Observer 17 0.86 0.820 .02 .01 0.88 0.838 .14 .12 Observer 18 0.84 0.800 ,.001 ,.001 0.87 0.827 .02 .02 Group 0.833 .08* 0.854 ,.001* Note.—Observer 4 had 1 year of experience; observers 5 and 6 had 2 years of experience; observers 7–9 had 3 years of experience; observers 10–12 had 7 years of experience; observers 13 and 14 had 8 years of experience; observer 15 had 26 years of experience; observer 16 had 13 years of experience; and observers 17 and 18 had 9 years of experience. Observers 1–3 were 4th-year residents from obstetrics and gynecolo- 의사 인공지능 vs. 의사만 (p value) 의사+인공지능 의사 vs. 의사+인공지능 (p value) 영상의학과 1년차 전공의 영상의학과 2년차 전공의 영상의학과 3년차 전공의 산부인과 4년차 전공의 정형외과 4년차 전공의 내과 4년차 전공의 영상의학과 전문의 7년 경력 8년 경력 영상의학과 전문의 (흉부) 26년 경력 13년 경력 9년 경력 영상의학과 전공의 비영상의학과 의사 •인공지능을 second reader로 활용하면 정확도가 개선 •classification: 17 of 18 명이 개선 (15 of 18, P<0.05) •nodule detection: 18 of 18 명이 개선 (14 of 18, P<0.05)
  • 41.
  • 42.
  • 43.
  • 44.
  • 45.
  • 46.
  • 47.
  • 48.
  • 50.
  • 51.
    Deep Learning AutomaticDetection Algorithm for Malignant Pulmonary Nodules Table 3: Patient Classification and Nodule Detection at the Observer Performance Test Observer Test 1 DLAD versus Test 1 (P Value) Test 2 Test 1 versus Test 2 (P Value) Radiograph Classification (AUROC) Nodule Detection (JAFROC FOM) Radiograph Classification Nodule Detection Radiograph Classification (AUROC) Nodule Detection (JAFROC FOM) Radiograph Classification Nodule Detection Nonradiology physicians Observer 1 0.77 0.716 ,.001 ,.001 0.91 0.853 ,.001 ,.001 Observer 2 0.78 0.657 ,.001 ,.001 0.90 0.846 ,.001 ,.001 Observer 3 0.80 0.700 ,.001 ,.001 0.88 0.783 ,.001 ,.001 Group 0.691 ,.001* 0.828 ,.001* Radiology residents Observer 4 0.78 0.767 ,.001 ,.001 0.80 0.785 .02 .03 Observer 5 0.86 0.772 .001 ,.001 0.91 0.837 .02 ,.001 Observer 6 0.86 0.789 .05 .002 0.86 0.799 .08 .54 Observer 7 0.84 0.807 .01 .003 0.91 0.843 .003 .02 Observer 8 0.87 0.797 .10 .003 0.90 0.845 .03 .001 Observer 9 0.90 0.847 .52 .12 0.92 0.867 .04 .03 Group 0.790 ,.001* 0.867 ,.001* Board-certified radiologists Observer 10 0.87 0.836 .05 .01 0.90 0.865 .004 .002 Observer 11 0.83 0.804 ,.001 ,.001 0.84 0.817 .03 .04 Observer 12 0.88 0.817 .18 .005 0.91 0.841 .01 .01 Observer 13 0.91 0.824 ..99 .02 0.92 0.836 .51 .24 Observer 14 0.88 0.834 .14 .03 0.88 0.840 .87 .23 Group 0.821 .02* 0.840 .01* Thoracic radiologists Observer 15 0.94 0.856 .15 .21 0.96 0.878 .08 .03 Observer 16 0.92 0.854 .60 .17 0.93 0.872 .34 .02 Observer 17 0.86 0.820 .02 .01 0.88 0.838 .14 .12 Observer 18 0.84 0.800 ,.001 ,.001 0.87 0.827 .02 .02 Group 0.833 .08* 0.854 ,.001* Note.—Observer 4 had 1 year of experience; observers 5 and 6 had 2 years of experience; observers 7–9 had 3 years of experience; observers 10–12 had 7 years of experience; observers 13 and 14 had 8 years of experience; observer 15 had 26 years of experience; observer 16 had 13 years of experience; and observers 17 and 18 had 9 years of experience. Observers 1–3 were 4th-year residents from obstetrics and gynecolo- 의사 인공지능 vs. 의사만 (p value) 의사+인공지능 의사 vs. 의사+인공지능 (p value) 영상의학과 1년차 전공의 영상의학과 2년차 전공의 영상의학과 3년차 전공의 산부인과 4년차 전공의 정형외과 4년차 전공의 내과 4년차 전공의 영상의학과 전문의 7년 경력 8년 경력 영상의학과 전문의 (흉부) 26년 경력 13년 경력 9년 경력 영상의학과 전공의 비영상의학과 의사 인공지능 0.91 0.885 •“인공지능 혼자” 한 것이 “영상의학과 전문의+인공지능”보다 대부분 더 정확 •classification: 9명 중 6명보다 나음 •nodule detection: 9명 전원보다 나음
  • 52.
  • 56.
  • 57.
    James Albaugh, Boeing, 2011
  • 59.
  • 60.
  • 61.
  • 62.
  • 63.
  • 64.
  • 65.
  • 66.
  • 67.
  • 68.
  • 69.
  • 70.
  • 71.
    • pre-screening: AI → the human doctor • double reading: AI + the human doctor • double check (second opinion): the human doctor → the AI
  • 74.
    Assisting Pathologists in Detecting Cancer with Deep Learning: findings are displayed with outlines, colors, heat maps, etc., and the presence of disease and its severity are presented.
  • 75.
    • pre-screening: AI → the human doctor • double reading: AI + the human doctor • double check (second opinion): the human doctor → the AI
  • 77.
  • 78.
    THE BLACK BOX OF AI (Nature, Vol. 538, 6 October 2016)
  • 79.
  • 80.
    Animal Intelligence: Clever Hans • Clever Hans was a horse that was claimed to have been able to perform arithmetic. • After a formal investigation in 1907, psychologist Oskar Pfungst demonstrated that the horse was not actually performing these mental tasks, but was watching the reactions of his human observers. • The trainer was entirely unaware that he was providing such cues.
  • 81.
    https://namkugkim.wordpress.com/2017/05/15/ "In fact, when we trained a model on chest X-rays from Asan Medical Center, we found that for cardiomegaly, unlike the other conditions, the training results looked good but the model had learned something entirely different. Cardiomegaly is diagnosed on X-ray by the heart being enlarged, yet CAM showed that the deep learning model was not actually looking at the size of the heart; it was looking at the surgical scars that are characteristic of X-rays from patients with this condition." - from the blog of Professor Namkug Kim, Asan Medical Center
  • 82.
    • pre-screening: AI → the human doctor • double reading: AI + the human doctor • double check (second opinion): the human doctor → the AI
  • 83.
  • 87.
    The new england jour nal of medicine original article Single Reading with Computer-Aided Detection for Screening Mammography Fiona J. Gilbert, F.R.C.R., Susan M. Astley, Ph.D., Maureen G.C. Gillan, Ph.D., Olorunsola F. Agbaje, Ph.D., Matthew G. Wallis, F.R.C.R., Jonathan James, F.R.C.R., Caroline R.M. Boggis, F.R.C.R., and Stephen W. Duffy, M.Sc., for the CADET II Group* From the Aberdeen Biomedical Imaging Centre, University of Aberdeen, Aberdeen (F.J.G., M.G.C.G.); the Department of Im- aging Science and Biomedical Engineer- ing,UniversityofManchester,Manchester (S.M.A.); the Department of Epidemiolo- gy, Mathematics, and Statistics, Wolfson Institute of Preventive Medicine, London (O.F.A., S.W.D.); the Cambridge Breast Unit, Addenbrookes Hospital, Cambridge (M.G.W.); the Nottingham Breast Insti- tute, Nottingham City Hospital, Notting- ham (J.J.); and the Nightingale Breast Screening Unit, Wythenshawe Hospital, Manchester (C.R.M.B.) — all in the Unit- ed Kingdom. Address reprint requests to Dr. Gilbert at the Aberdeen Biomedical Imaging Centre, University of Aberdeen, Lilian Sutton Bldg., Foresterhill, Aberdeen AB25 2ZD, Scotland, United Kingdom, or at [email protected]. *The members of the Computer-Aided Detection Evaluation Trial II (CADET II) group are listed in the Appendix. This article (10.1056/NEJMoa0803545) was published at www.nejm.org on Oc- tober 1, 2008. N Engl J Med 2008;359:1675-84. Copyright © 2008 Massachusetts Medical Society. ABSTR ACT Background The sensitivity of screening mammography for the detection of small breast can- cers is higher when the mammogram is read by two readers rather than by a single reader. We conducted a trial to determine whether the performance of a single reader using a computer-aided detection system would match the performance achieved by two readers. Methods The trial was designed as an equivalence trial, with matched-pair comparisons be- tween the cancer-detection rates achieved by single reading with computer-aided de- tection and those achieved by double reading. We randomly assigned 31,057 women undergoing routine screening by film mammography at three centers in England to double reading, single reading with computer-aided detection, or both double read- ing and single reading with computer-aided detection, at a ratio of 1:1:28. The pri- mary outcome measures were the proportion of cancers detected according to regi- men and the recall rates within the group receiving both reading regimens. Results The proportion of cancers detected was 199 of 227 (87.7%) for double reading and 198 of 227 (87.2%) for single reading with computer-aided detection (P=0.89). The overall recall rates were 3.4% for double reading and 3.9% for single reading with computer-aided detection; the difference between the rates was small but significant (P<0.001). The estimated sensitivity, specificity, and positive predictive value for single reading with computer-aided detection were 87.2%, 96.9%, and 18.0%, respectively. The corresponding values for double reading were 87.7%, 97.4%, and 21.1%. There were no significant differences between the pathological attributes of tumors de- tected by single reading with computer-aided detection alone and those of tumors detected by double reading alone. Conclusions Single reading with computer-aided detection could be an alternative to double read- ing and could improve the rate of detection of cancer from screening mammograms read by a single reader. (ClinicalTrials.gov number, NCT00450359.) 
Mammography • single reading+CAD vs. double reading • Outcome: Cancer detection rate / Recall rate
  • 88.
    Table 1 | Double reading | Single reading + CAD
    Proportion of cancers detected | 87.7% | 87.2%
    Overall recall rate | 3.4% | 3.9%
    Sensitivity | 87.7% | 87.2%
    Specificity | 97.4% | 96.9%
    Positive predictive value | 21.1% | 18.0%
    Conclusion: Single reading with computer-aided detection could be an alternative to double reading and could improve the rate of detection of cancer from screening mammograms read by a single reader.
  • 89.
    Copyright 2015 AmericanMedical Association. All rights reserved. Diagnostic Accuracy of Digital Screening Mammography With and Without Computer-Aided Detection Constance D. Lehman, MD, PhD; Robert D. Wellman, MS; Diana S. M. Buist, PhD; Karla Kerlikowske, MD; Anna N. A. Tosteson, ScD; Diana L. Miglioretti, PhD; for the Breast Cancer Surveillance Consortium IMPORTANCE After the US Food and Drug Administration (FDA) approved computer-aided detection (CAD) for mammography in 1998, and the Centers for Medicare and Medicaid Services (CMS) provided increased payment in 2002, CAD technology disseminated rapidly. Despite sparse evidence that CAD improves accuracy of mammographic interpretations and costs over $400 million a year, CAD is currently used for most screening mammograms in the United States. OBJECTIVE To measure performance of digital screening mammography with and without CAD in US community practice. DESIGN, SETTING, AND PARTICIPANTS We compared the accuracy of digital screening mammography interpreted with (n = 495 818) vs without (n = 129 807) CAD from 2003 through 2009 in 323 973 women. Mammograms were interpreted by 271 radiologists from 66 facilities in the Breast Cancer Surveillance Consortium. Linkage with tumor registries identified 3159 breast cancers in 323 973 women within 1 year of the screening. MAIN OUTCOMES AND MEASURES Mammography performance (sensitivity, specificity, and screen-detected and interval cancers per 1000 women) was modeled using logistic regression with radiologist-specific random effects to account for correlation among examinations interpreted by the same radiologist, adjusting for patient age, race/ethnicity, time since prior mammogram, examination year, and registry. Conditional logistic regression was used to compare performance among 107 radiologists who interpreted mammograms both with and without CAD. RESULTS Screening performance was not improved with CAD on any metric assessed. Mammography sensitivity was 85.3% (95% CI, 83.6%-86.9%) with and 87.3% (95% CI, 84.5%-89.7%) without CAD. Specificity was 91.6% (95% CI, 91.0%-92.2%) with and 91.4% (95% CI, 90.6%-92.0%) without CAD. There was no difference in cancer detection rate (4.1 in 1000 women screened with and without CAD). Computer-aided detection did not improve intraradiologist performance. Sensitivity was significantly decreased for mammograms interpreted with vs without CAD in the subset of radiologists who interpreted both with and without CAD (odds ratio, 0.53; 95% CI, 0.29-0.97). CONCLUSIONS AND RELEVANCE Computer-aided detection does not improve diagnostic accuracy of mammography. These results suggest that insurers pay more for CAD with no established benefit to women. JAMA Intern Med. 2015;175(11):1828-1837. doi:10.1001/jamainternmed.2015.5231 Published online September 28, 2015. Invited Commentary page 1837 Author Affiliations: Department of Radiology, Massachusetts General Hospital, Boston (Lehman); Group Health Research Institute, Seattle, Washington (Wellman, Buist, Miglioretti); Departments of Medicine and Epidemiology and Biostatistics, University of California, San Francisco, San Francisco (Kerlikowske); Norris Cotton Cancer Center, Geisel School of Medicine at Dartmouth, Dartmouth College, Lebanon, New Hampshire (Tosteson); Department of Public Health Sciences, School of Medicine, University of California, Davis (Miglioretti). Corresponding Author: Constance D. 
Lehman, MD, PhD, Department of Radiology, Massachusetts General Hospital, Avon Comprehensive Breast Evaluation Center, 55 Fruit St, WAC 240, Boston, MA 02114 (clehman @mgh.harvard.edu). Research Original Investigation | LESS IS MORE 1828 (Reprinted) jamainternalmedicine.com Copyright 2015 American Medical Association. All rights reserved. 647 R MYY S M Te UZ HF q )110 ><9 q (( ;V eVcd W c EVU TRcV R U EVU TR U KVcg TVd !;EK" q v ,((
  • 90.
    q (( ;VeVcd W c EVU TRcV R U EVU TR U KVcg TVd !;EK" q () R XcR 0+ t ;9<cer diagnosis within the follow-up period. True-positive examination results were defined as those with a positive examination assessment and breast cancer diagnosis. False- positive examination results were examinations with a posi- Mammography performan using logistic regression, includ diologist-specific random effect tion among examinations read b dom effects were allowed to vary the reading. Performance measu dian of the random effects distrib specific relative performance wa (OR) with 95% CIs comparing C for patient age at diagnosis, time year of examination, and the BC Receiver operating characte mated from 135 radiologists wh mogram associated with a cance cal logistic regression model tha accuracy parameters to depend o ing examination interpretation. racy among radiologists for exa the same condition (with or wi threshold for recall to vary acro mally distributed, radiologist-spe ied by whether the radiologist us We estimated the normalized mary ROC curves across the obs rates from this model.26 We plo the false-positive rate and supe curves. Two separate main sensitiv in subsets of total examinations Figure 1. Screening Mammography Patterns From 2000 to 2012 in US Community Practices in the Breast Cancer Surveillance Consortium (BCSC) 100 80 60 40 20 0 TypeofMammography,% Year 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 Film Digital with CAD Type Digital without CAD Data are provided from the larger BCSC population including all screening mammograms (5.2 million mammograms) for the indicated time period. Research Original Investigation Digital Screening Mammog CMS 보험 혜택 5% 83% 74%
  • 91.
    Diagnostic accuracy wasnot improved with CAD on any performance metric assessed w/ CAD w/o CAD sensitivity 85.3% 87.3% sensitivity for invasive cancer 82.1% 85.0% sensitivity for DCIS 93.2% 94.3% specificity 91.6% 91.4% Detection Rate (Overall) 4.1 per 1000 4.1 per 1000 Detection Rate in DCIS 1.2 per 1000 0.9 per 1000 < < < >
  • 92.
    From the ROCanalysis, the accuracy of mammographic interpretations with CAD was significantly lower than for those without CAD (P = .002). The normalized partial area under the summary ROC curve was 0.84 for interpretations with CAD and 0.88 for interpretations without CAD (Figure 2). In this subset of 135 radi at least 1 mammogram associated sensitivity of mammography was 86.9%) with and 89.3%% (95% CI CAD. Specificity of mammograp 90.4%-91.8%) with and 91.3% (95% out CAD. Differences by Age, Breast Density, Men and Time Since Last Mammogram We found no differences in diagnos graphic interpretations with and w subgroups assessed, including pat menopausal status, and time si (Table 3). Intraradiologist Performance Measures f With and Without CAD Among 107 radiologists who interpr with and without CAD, intraradiolog improved with CAD, and CAD was a sensitivity. Sensitivity of mammogr 81.0%-85.6%) with and 89.6% (95% out CAD. Specificity of mammogra 89.8%-91.7%) with and 89.6% (95% out CAD. The OR for specificity b interpreted with CAD and those inte the same radiologist was 1.02 (95% C was significantly decreased for ma Figure 2. Receiver Operating Characteristic Curves for Digital Screening Mammography With and Without the Use of CAD, Estimated From 135 Radiologists Who Interpreted at Least 1 Examination Associated With Cancer 100 80 60 40 20 0 0 403020 True-PositiveRate,% False-Positive Rate, % 10 No CAD use (PAUC, 0.88) CAD use (PAUC, 0.84) Each circle represents the true-positive or false-positive rate for a single radiologist, for examinations interpreted with (orange) or without (blue) computer-aided detection (CAD). Circle size is proportional to the number of mammograms associated with cancer interpreted by that radiologist with or without CAD. PAUC indicates partial area under the curve. DCIS, ductal carcinoma in situ; exam, examination. a Odds ratio for CAD vs No CAD adjusted for site, age group, race/ethnicity, time since prior mammogram, and calendar year of the examination using with CAD use. b The 95% CIs for sensitivity and specificity are The accuracy of mammographic interpretations with CAD was significantly lower than for those without CAD (P = .002)
  • 93.
    SPECIAL CONTRIBUTION J KoreanMed Assoc 2018 December; 61(12):765-775 pISSN 1975-8456 / eISSN 2093-5951 https://doi.org/10.5124/jkma.2018.61.12.765 첨단 디지털 의료기기 765 첨단 디지털 헬스케어 의료기기를 진료에 도입할 때 평가원칙 박 성 호1 ・도 경 현1 ・최 준 일2 ・심 정 석3 ・양 달 모4 ・어 홍5 ・우 현 식6 ・이 정 민7 ・정 승 은2 ・오 주 형8 | 1 울산대학교 의과대학 서 울아산병원 영상의학과, 2 가톨릭대학교 의과대학 서울성모병원 영상의학과, 3 위드심의원, 4 강동경희대학교병원 영상의학과, 5 삼성서울병원 영상의학과, 6 서울대학교 의과대학 서울특별시보라매병원 영상의학과, 7 서울대학교 의과대학 서울대학교병원 영상의학과, 8 경희대학교 의 과대학 경희의료원 영상의학과 Principles for evaluating the clinical implementation of novel digital healthcare devices Seong Ho Park, MD1 · Kyung-Hyun Do, MD1 · Joon-Il Choi, MD2 · Jung Suk Sim, MD3 · Dal Mo Yang, MD4 · Hong Eo, MD5 · Hyunsik Woo, MD6 · Jeong Min Lee, MD7 · Seung Eun Jung, MD2 · Joo Hyeong Oh, MD8 1 Department of Radiology and Research Institute of Radiology, Asan Medical Center, University of Ulsan College of Medicine, Seoul; 2 Department of Radiology, Seoul St. Mary's Hospital, The Catholic University of Korea College of Medicine, Seoul; 3 Withsim Clinic, Seongnam; 4 Department of Radiology, Kyung Hee University Hospital at Gangdong, Seoul; 5 Department of Radiology and Center for Imaging Science, Samsung Medical Center, Seoul; 6 Department of Radiology, SMG-SNU Boramae Medical Center, Seoul National University College of Medicine, Seoul; 7 Department of Radiology, Seoul National University Hospital, Seoul National University College of Medicine, Seoul; 8 Department of Radiology, Kyung Hee University Hospital, Kyung Hee University College of Medicine, Seoul, Korea With growing interest in novel digital healthcare devices, such as artificial intelligence (AI) software for medical diagnosis and prediction, and their potential impacts on healthcare, discussions have taken place regarding the regulatory approval, coverage, and clinical implementation of these devices. Despite their potential, ‘digital exceptionalism’ (i.e., skipping the rigorous clinical validation of such digital tools) is creating significant concerns for patients and healthcare stakeholders. This white paper presents the positions of the Korean Society of Radiology, a leader in medical imaging and digital medicine, on the clinical validation, regulatory approval, coverage decisions, and clinical implementation of novel digital healthcare devices, especially AI software for medical diagnosis and prediction, and explains the scientific principles underlying those positions. Mere regulatory approval by the Food and Drug Administration of Korea, the United States, or other countries should be distinguished from coverage decisions and widespread clinical implementation, as regulatory approval only indicates that a digital tool is allowed for use in patients, not that the device is beneficial or recommended for patient care. Coverage or widespread clinical adoption of AI software tools should require a thorough clinical validation of safety, high accuracy proven by robust external validation, documented benefits for patient outcomes, and cost-effectiveness. The Korean Society of Radiology puts patients first when considering novel digital healthcare tools, and as an impartial professional organization that follows scientific principles and evidence, strives to provide correct information to the public, make reasonable policy suggestions, and build collaborative partnerships with industry and government for the good of our patients. 
Key Words:Software validation; Device approval; Insurance coverage; Artificial intelligence REVIEWSANDCOMMENTARYnREVIEW Radiology: Volume 000: Number 0—᭿ ᭿ ᭿ 2018 n radiology.rsna.org 1 1 From the Department of Radiology and Research Institute of Radiology, University of Ulsan College of Medicine, Asan Medical Center, 88 Olympic-ro 43-gil, Songpa-gu, Seoul 05505, South Korea (S.H.P.); and Department of Radiology, Research Institute of Radiological Science, Yonsei University College of Medicine, Seoul, South Korea (K.H.). Received August 11, 2017; revision requested October 2; revision received October 12; accepted October 24; final version accepted November 3. Address correspondence to S.H.P. (e-mail: [email protected]). S.H.P. supported by the Industrial Strategic Technology Development Program (grant 10072064) funded by the Ministry of Trade, Industry and Energy. q RSNA, 2018 The use of artificial intelligence in medicine is currently an issue of great interest, especially with regard to the diag- nostic or predictive analysis of medical images. Adoption of an artificial intelligence tool in clinical practice requires careful confirmation of its clinical utility. Herein, the au- thors explain key methodology points involved in a clinical evaluation of artificial intelligence technology for use in medicine, especially high-dimensional or overparameter- ized diagnostic or predictive models in which artificial deep neural networks are used, mainly from the stand- points of clinical epidemiology and biostatistics. First, sta- tistical methods for assessing the discrimination and cali- bration performances of a diagnostic or predictive model are summarized. Next, the effects of disease manifesta- tion spectrum and disease prevalence on the performance results are explained, followed by a discussion of the dif- ference between evaluating the performance with use of internal and external datasets, the importance of using an adequate external dataset obtained from a well-de- fined clinical cohort to avoid overestimating the clinical performance as a result of overfitting in high-dimensional or overparameterized classification model and spectrum bias, and the essentials for achieving a more robust clini- cal evaluation. Finally, the authors review the role of clin- ical trials and observational outcome studies for ultimate clinical verification of diagnostic or predictive artificial in- telligence tools through patient outcomes, beyond perfor- mance metrics, and how to design such studies. q RSNA, 2018 Seong Ho Park, MD, PhD Kyunghwa Han, PhD Methodologic Guide for Evaluating Clinical Performance and Effect of Artificial Intelligence Technology for Medical Diagnosis and Prediction1 This copy is for personal use only. To order printed copies, contact [email protected]
  • 94.
• IBM Watson for Oncology & Genomics (2017)
  • 95.
Study to develop guidelines for assessing the reimbursement eligibility of AI-based medical technologies (radiology): Health Insurance Review and Assessment Service (HIRA) and the Korean Society of Radiology
  • 96.
    - 20 - ·· · · · l % v x • l 4< v m • l u h mi t l uq m
  • 97.
    l?QbQ l m %o p l v l   l =? l m m m - 22 -
  • 98.
    l?QbQ ( l 5UM l4< • l 4< % v t l m w ?QbQ ) - 22 -
  • 99.
    l?QbQ ) l 5UM l4< “ m ?QbQ ) l ! m ?QbQ - 22 -
  • 100.
AI • Per-test add-on fee • Pay an add-on on top of the existing fee for the test • Analogous to "interpretation by a radiologist vs. interpretation by another physician"
• Indirect compensation • Indirect rewards to the institution as a whole (hospital accreditation, adequacy assessment of reimbursed care, ...) • Cases where some patients benefit while others are disadvantaged (e.g., prioritization of brain CT interpretation)
• Creation of a separate new billable service • When the AI provides completely new diagnostic information that the existing test did not offer • Whether or not a corresponding existing covered/non-covered test exists
• New fee corresponding to a "portion" of the physician's workload • The existing "interpretation fee" covers only part of the radiologist's overall reading process • When AI performs only part of that work, size the compensation accordingly
  • 101.
    FM 7 FR cM Q M M QPUOM 7QbUOQ l t h i q ! " q $ JO< $ $ q $ KRE< lC Q 6Q 0 m
  • 102.
SaMD: Software as a Medical Device
• Medical devices: X-ray machines, blood pressure monitors
• Digital healthcare: Fitbit, RunKeeper, Sleep Cycle
• Digital therapeutics: Pear, Akili, Empatica, AliveCor, Proteus
• Artificial intelligence
*SaMD: Software as a Medical Device. SaMD handles complex data in medicine, such as images and signals, across radiology, pathology, ophthalmology, and dermatology.
  • 103.
    lh i0 4PMUbQ ?QM ZUZS q KRE< m ! T VU"n t q 9A !Vi& v " q ORed W c G T Xj2 ' q 2 ecfV R Rc ' WR dV R Rc v q 2 ' ' eV UVU W c fdV x w m
  • 104.
•At the time of initial premarket review, submit a plan for managing future modifications
•SaMD Pre-Specification (SPS)
 • The changes in performance / input / intended use that the manufacturer anticipates or plans after market release
 • Defines the region of potential changes relative to the initially cleared device
•Algorithm Change Protocol (ACP)
 • The specific methods for controlling the risks of the changes defined in the SPS
 • Step-by-step description of the data/procedures that keep the device safe and effective after a change
Predetermined change control plan
  • 105.
ACP (Algorithm Change Protocol): "general overview of the components of an ACP. This is "how" the algorithm will learn and change while remaining safe and effective." Scope and limitations for establishing SPS and ACP: The FDA acknowledges that the types of changes that could be pre-specified in a SPS and managed through an ACP may necessitate individual… Figure 4: Algorithm Change Protocol components (how the algorithm will learn and change while remaining safe and effective).
  • 106.
SPS / ACP: "…modifications guidance results in either 1) submission of a new 510(k) for premarket review or 2) documentation of the modification and the analysis in the risk management and 510(k) files. If, for AI/ML SaMD with an approved SPS and ACP, modifications are within the bounds of the SPS and the ACP, this proposed framework suggests that manufacturers would document the change in their change history and other appropriate records, and file for reference, similar to the "document" approach outlined in the software modifications guidance." Figure 5: Approach to modifications to previously approved SaMD with SPS and ACP. This flowchart should only be considered in conjunction with the accompanying text in this white paper.
  • 107.
• Adversarial attacks on medical AI • Robustness vs. Performance (Science, 2019)
  • 108.
• Adversarial attacks on medical AI • Robustness vs. Performance (Science, 2019)
INSIGHTS | POLICY FORUM: In the case of structured data such as billing codes, adversarial techniques could be used to automate the discovery of code combinations that maximize reimbursement or minimize the probability of claims rejection. Because adversarial attacks have been demonstrated for virtually every class of machine-learning algorithms ever studied, from simple and readily interpretable methods such as logistic regression to more complicated methods such as deep neural networks (1), this is not a problem specific to medicine, and every domain of machine-learning application will need to contend with it. Researchers have sought to develop algorithms that are resilient to adversarial attacks, such as by training algorithms with exposure to adversarial examples or using clever data processing to mitigate potential tampering (1). Early efforts in this area are promising, and we hope that the pursuit of fully robust machine-learning models will catalyze the development of algorithms that learn to make decisions for consistently explainable and appropriate reasons. Nevertheless, current general-use defensive techniques come at a material degeneration of accuracy, even if sometimes at improved explainability (10). Thus, models that are both highly accurate and robust to adversarial examples remain an open problem in computer science. These challenges are compounded in the medical context. Medical information technology (IT) systems are notoriously difficult to update, so any new defenses could be difficult to roll out. In addition, the ground truth in medical diagnoses is often ambiguous, meaning that for many cases no individual human can definitively assign the true label.
[Figure: "The anatomy of an adversarial attack". Demonstration of how adversarial attacks against various medical AI systems might be executed without requiring any overtly fraudulent misrepresentation of the data. Panels: a dermatoscopic image of a benign melanocytic nevus whose deep-neural-network diagnosis flips from benign to malignant after adding a small adversarial perturbation (original image + 0.04 x adversarial noise = adversarial example), or after an adversarial rotation (8); adversarial text substitution in a clinical note (e.g., "back pain and chronic alcohol abuse" rewritten as "lumbago and chronic alcohol dependence") that changes an opioid-abuse risk assessment from high to low (9); and adversarial coding of billing diagnoses (e.g., metabolic syndrome, benign essential hypertension, hypercholesterolemia, hyperglyceridemia vs. heart disease unspecified, obesity unspecified) that turns a denied reimbursement into an approved one (13).]
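The figure's "original image + 0.04 x perturbation = adversarial example" panel is the classic gradient-based attack. A minimal sketch of the fast gradient sign method (FGSM) in PyTorch is shown below; `model`, `image`, and `label` are assumed to exist (a pretrained classifier and a correctly classified input), and the 0.04 step size simply mirrors the scale quoted in the figure, not a recommended setting.

```python
import torch
import torch.nn.functional as F

def fgsm_attack(model, image, true_label, epsilon=0.04):
    """Fast gradient sign method: one gradient step that maximizes the loss.

    image: tensor of shape (1, C, H, W), values in [0, 1]
    true_label: tensor of shape (1,) with the correct class index
    """
    image = image.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(image), true_label)
    loss.backward()
    # Perturb each pixel by +/- epsilon in the direction that increases the loss.
    adversarial = image + epsilon * image.grad.sign()
    return adversarial.clamp(0.0, 1.0).detach()

# Usage sketch (model, image, and label are assumed to exist):
# adv = fgsm_attack(model, image, label)
# print(model(image).argmax(1), model(adv).argmax(1))  # predictions often disagree
```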
  • 109.
    l 0 ( l(0 t l) 0 x w m
  • 110.
  • 111.
  • 112.
  • 113.
    Sep 2018, Health2.0 @Santa Clara
  • 114.
March 2019, the Future of Individual Medicine @San Diego
  • 115.
  • 116.
[Figure: reader accuracy by group: 69.5%, 63%, 49.5%, 72.5%, 57.5%]
AJR Am J Roentgenol. 2017 Dec;209(6):1374-1380 (source: VUNO)
Digital Healthcare Institute Director, Yoon Sup Choi, PhD | [email protected]
  • 117.
    Deep Learning AutomaticDetection Algorithm for Malignant Pulmonary Nodules Table 3: Patient Classification and Nodule Detection at the Observer Performance Test Observer Test 1 DLAD versus Test 1 (P Value) Test 2 Test 1 versus Test 2 (P Value) Radiograph Classification (AUROC) Nodule Detection (JAFROC FOM) Radiograph Classification Nodule Detection Radiograph Classification (AUROC) Nodule Detection (JAFROC FOM) Radiograph Classification Nodule Detection Nonradiology physicians Observer 1 0.77 0.716 ,.001 ,.001 0.91 0.853 ,.001 ,.001 Observer 2 0.78 0.657 ,.001 ,.001 0.90 0.846 ,.001 ,.001 Observer 3 0.80 0.700 ,.001 ,.001 0.88 0.783 ,.001 ,.001 Group 0.691 ,.001* 0.828 ,.001* Radiology residents Observer 4 0.78 0.767 ,.001 ,.001 0.80 0.785 .02 .03 Observer 5 0.86 0.772 .001 ,.001 0.91 0.837 .02 ,.001 Observer 6 0.86 0.789 .05 .002 0.86 0.799 .08 .54 Observer 7 0.84 0.807 .01 .003 0.91 0.843 .003 .02 Observer 8 0.87 0.797 .10 .003 0.90 0.845 .03 .001 Observer 9 0.90 0.847 .52 .12 0.92 0.867 .04 .03 Group 0.790 ,.001* 0.867 ,.001* Board-certified radiologists Observer 10 0.87 0.836 .05 .01 0.90 0.865 .004 .002 Observer 11 0.83 0.804 ,.001 ,.001 0.84 0.817 .03 .04 Observer 12 0.88 0.817 .18 .005 0.91 0.841 .01 .01 Observer 13 0.91 0.824 ..99 .02 0.92 0.836 .51 .24 Observer 14 0.88 0.834 .14 .03 0.88 0.840 .87 .23 Group 0.821 .02* 0.840 .01* Thoracic radiologists Observer 15 0.94 0.856 .15 .21 0.96 0.878 .08 .03 Observer 16 0.92 0.854 .60 .17 0.93 0.872 .34 .02 Observer 17 0.86 0.820 .02 .01 0.88 0.838 .14 .12 Observer 18 0.84 0.800 ,.001 ,.001 0.87 0.827 .02 .02 Group 0.833 .08* 0.854 ,.001* Note.—Observer 4 had 1 year of experience; observers 5 and 6 had 2 years of experience; observers 7–9 had 3 years of experience; observers 10–12 had 7 years of experience; observers 13 and 14 had 8 years of experience; observer 15 had 26 years of experience; observer 16 had 13 years of experience; and observers 17 and 18 had 9 years of experience. Observers 1–3 were 4th-year residents from obstetrics and gynecolo- b $ bM aQ ! b $ ! bM aQ ) + , , , / 0 . )+ 1 s
  • 118.
  • 119.
    modeled separately. Formicrometastases, sensitivity was significantly higher with Negative (Specificity) Micromet (Sensitivity) Macromet (Sensitivity) 0.7 0.5 0.6 0.8 0.9 1.0 p=0.02 A B Performance Unassisted Assisted FIGURE 3. Improved metastasis detection with algorithm assistance. A, Data represents performance across all images by image category and assistance modality. Error bars indicate SE. The performance metric corresponds to corresponds to specificity for negative cases and sensitivity for micrometastases (micromet) and macrometastases (macromet). B, Operating point of individual pathologists with and without assistance for micrometastases and negative cases, overlayed on the receiver operating characteristic curve of the algorithm. AUC indicates area under the curve. Copyright © 2018 Wolters Kluwer Health, Inc. All rights reserved. www.ajsp.com | 5 l p l UO YQ x l AQSM UbQ MO YQ • l 4H6 l l ! s
  • 120.
    1Wang P, et al. Gut2019;0:1–7. doi:10.1136/gutjnl-2018-317500 Endoscopy ORIGINAL ARTICLE Real-time automatic detection system increases colonoscopic polyp and adenoma detection rates: a prospective randomised controlled study Pu Wang,  1 Tyler M Berzin,  2 Jeremy Romek Glissen Brown,  2 Shishira Bharadwaj,2 Aymeric Becq,2 Xun Xiao,1 Peixi Liu,1 Liangping Li,1 Yan Song,1 Di Zhang,1 Yi Li,1 Guangre Xu,1 Mengtian Tu,1 Xiaogang Liu  1 To cite: Wang P, Berzin TM, Glissen Brown JR, et al. Gut Epub ahead of print: [please include Day Month Year]. doi:10.1136/ gutjnl-2018-317500 ► Additional material is published online only.To view please visit the journal online (http://dx.doi.org/10.1136/ gutjnl-2018-317500). 1 Department of Gastroenterology, Sichuan Academy of Medical Sciences Sichuan Provincial People’s Hospital, Chengdu, China 2 Center for Advanced Endoscopy, Beth Israel Deaconess Medical Center and Harvard Medical School, Boston, Massachusetts, USA Correspondence to Xiaogang Liu, Department of Gastroenterology Sichuan Academy of Medical Sciences and Sichuan Provincial People’s Hospital, Chengdu, China; [email protected] Received 30 August 2018 Revised 4 February 2019 Accepted 13 February 2019 © Author(s) (or their employer(s)) 2019. Re-use permitted under CC BY-NC. No commercial re-use. See rights and permissions. Published by BMJ. ABSTRACT Objective The effect of colonoscopy on colorectal cancer mortality is limited by several factors, among them a certain miss rate, leading to limited adenoma detection rates (ADRs).We investigated the effect of an automatic polyp detection system based on deep learning on polyp detection rate and ADR. Design In an open, non-blinded trial, consecutive patients were prospectively randomised to undergo diagnostic colonoscopy with or without assistance of a real-time automatic polyp detection system providing a simultaneous visual notice and sound alarm on polyp detection.The primary outcome was ADR. Results Of 1058 patients included, 536 were randomised to standard colonoscopy, and 522 were randomised to colonoscopy with computer-aided diagnosis.The artificial intelligence (AI) system significantly increased ADR (29.1%vs20.3%, p0.001) and the mean number of adenomas per patient (0.53vs0.31, p0.001).This was due to a higher number of diminutive adenomas found (185vs102; p0.001), while there was no statistical difference in larger adenomas (77vs58, p=0.075). In addition, the number of hyperplastic polyps was also significantly increased (114vs52, p0.001). Conclusions In a low prevalent ADR population, an automatic polyp detection system during colonoscopy resulted in a significant increase in the number of diminutive adenomas detected, as well as an increase in the rate of hyperplastic polyps.The cost–benefit ratio of such effects has to be determined further. Trial registration number ChiCTR-DDD-17012221; Results. 
INTRODUCTION Colorectal cancer (CRC) is the second and third- leading causes of cancer-related deaths in men and women respectively.1 Colonoscopy is the gold stan- dard for screening CRC.2 3 Screening colonoscopy has allowed for a reduction in the incidence and mortality of CRC via the detection and removal of adenomatous polyps.4–8 Additionally, there is evidence that with each 1.0% increase in adenoma detection rate (ADR), there is an associated 3.0% decrease in the risk of interval CRC.9 10 However, polyps can be missed, with reported miss rates of up to 27% due to both polyp and operator charac- teristics.11 12 Unrecognised polyps within the visual field is an important problem to address.11 Several studies have shown that assistance by a second observer increases the polyp detection rate (PDR), but such a strategy remains controversial in terms of increasing the ADR.13–15 Ideally, a real-time automatic polyp detec- tion system, with performance close to that of expert endoscopists, could assist the endosco- pist in detecting lesions that might correspond to adenomas in a more consistent and reliable way Significance of this study What is already known on this subject? ► Colorectal adenoma detection rate (ADR) is regarded as a main quality indicator of (screening) colonoscopy and has been shown to correlate with interval cancers. Reducing adenoma miss rates by increasing ADR has been a goal of many studies focused on imaging techniques and mechanical methods. ► Artificial intelligence has been recently introduced for polyp and adenoma detection as well as differentiation and has shown promising results in preliminary studies. What are the new findings? ► This represents the first prospective randomised controlled trial examining an automatic polyp detection during colonoscopy and shows an increase of ADR by 50%, from 20% to 30%. ► This effect was mainly due to a higher rate of small adenomas found. ► The detection rate of hyperplastic polyps was also significantly increased. How might it impact on clinical practice in the foreseeable future? ► Automatic polyp and adenoma detection could be the future of diagnostic colonoscopy in order to achieve stable high adenoma detection rates. ► However, the effect on ultimate outcome is still unclear, and further improvements such as polyp differentiation have to be implemented. on17March2019byguest.Protectedbycopyright.http://gut.bmj.com/Gut:firstpublishedas10.1136/gutjnl-2018-317500on27February2019.Downloadedfrom l e PQ QO U Z MPQZ YM PQ QO U Z ‘ q Hc daVTe gV J;L ! 5)(-03 deR URcU5-+.$ ;95- q q MPQZ YM PQ QO U Z M Q m2 1) gd (+ !a4((() q PQ QO U Z MPQZ YM m2 (-+ gd (+) !a4((() q PUYUZa UbQ MPQZ YM 2 )0- gd )( !a4((() q Te Q M UO e m2 )), gd - !a4((() Endoscopy 0.5cm and polyps in all segments of the colon with the excep- tion of the caecum and the ascending colon (table 3). Outcomes in excellent bowel preparation (BBPS ≥7) In the situation of excellent bowel preparation, ADR in the CADe group showed a trend of 6% increase superior to that of the routine group. However, due to the inadequate sample size of the subgroup analysis, it failed to show a statistically signifi- cant difference. Other outcomes, including the mean number of detected adenomas, mean number of detected polyps and PDR were all significantly increased in the CADe group (table 4). can still be missed. 
Studies have also reported that some polyps are missed by the endoscopist despite being within the visual field.31 32 Several hypotheses have been proposed to explain the mechanism by which polyps may be missed. These include differences in endoscopist skill level, differences in endoscopist tracking patterns, 'inattentional blindness', wherein an observer fails to process an image on the screen due to distraction, and 'change blindness', wherein changes are missed during interruptions in visual scanning or during eye movements.13 33–37 Distraction caused by fatigue or emotional factors may also contribute. A second party such as a nurse or a trainee observing may improve PDR. While several studies have shown that this increases PDR, controversy remains regarding ADR.13–15 It is…
Table 3  Polyp and adenoma detection
                                   Routine colonoscopy (n=536)   CADe colonoscopy (n=522)   P value*   FC/OR    95% CI
PDR                                0.291                         0.4502                     <0.001     1.995†   1.532 to 2.544
ADR                                0.2034                        0.2912                     <0.001     1.61†    1.213 to 2.135
Mean number of detected polyps     0.5019                        0.954                      <0.001     1.89‡    1.63 to 2.192
Mean number of detected adenomas   0.3097                        0.5326                     <0.001     1.72‡    1.419 to 2.084
*P value from χ2 test (or Fisher's exact test, as appropriate) or t-test. †OR. ‡FC. ADR, adenoma detection rate; FC, fold change; PDR, polyp detection rate.
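The ADR comparison in Table 3 is a 2x2 chi-square test. A sketch that reproduces the headline comparison is below; the per-arm adenoma-positive counts are back-calculated from the reported rates (20.34% of 536 and 29.12% of 522), so they are approximate:

```python
from scipy.stats import chi2_contingency

# Approximate counts reconstructed from the reported ADRs:
# routine arm: ~109 of 536 patients with >=1 adenoma; CADe arm: ~152 of 522.
routine_pos, routine_n = 109, 536
cade_pos, cade_n = 152, 522

table = [
    [cade_pos, cade_n - cade_pos],
    [routine_pos, routine_n - routine_pos],
]
chi2, p, dof, expected = chi2_contingency(table)

adr_cade = cade_pos / cade_n
adr_routine = routine_pos / routine_n
odds_ratio = (cade_pos * (routine_n - routine_pos)) / (routine_pos * (cade_n - cade_pos))
print(f"ADR {adr_cade:.1%} vs {adr_routine:.1%}, OR={odds_ratio:.2f}, p={p:.2g}")
```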
  • 121.
    1Wu L, et al. Gut2019;0:1–9. doi:10.1136/gutjnl-2018-317366 Endoscopy ORIGINAL ARTICLE Randomised controlled trial of WISENSE, a real-time quality improving system for monitoring blind spots during esophagogastroduodenoscopy Lianlian Wu,1,2,3 Jun Zhang,1,2,3 Wei Zhou,1,2,3 Ping An,1,2,3 Lei Shen,1,2,3 Jun Liu,1,3 Xiaoda Jiang,1,2,3 Xu Huang,1,2,3 Ganggang Mu,1,2,3 Xinyue Wan,  1,2,3 Xiaoguang Lv,1,2,3 Juan Gao,1,3 Ning Cui,1,2,3 Shan Hu,4 Yiyun Chen,4 Xiao Hu,4 Jiangjie Li,4 Di Chen,1,2,3 Dexin Gong,1,2,3 Xinqi He,1,2,3 Qianshan Ding,1,2,3 Xiaoyun Zhu,1,2,3 Suqin Li,1,2,3 Xiao Wei,1,2,3 Xia Li,1,2,3 Xuemei Wang,1,2,3 Jie Zhou,1,2,3 Mengjiao Zhang,1,2,3 Hong Gang Yu  1,2,3 To cite: Wu L, Zhang J, Zhou W, et al. Gut Epub ahead of print: [please include Day Month Year]. doi:10.1136/ gutjnl-2018-317366 ► Additional material is published online only.To view please visit the journal online (http://dx.doi.org/10.1136/ gutjnl-2018-317366). For numbered affiliations see end of article. Correspondence to Professor Hong Gang Yu, Department of Gastroenterology, Renmin Hospital of Wuhan University, Wuhan 430060, China; [email protected] LW, JZ and WZ contributed equally. Received 10 August 2018 Revised 28 January 2019 Accepted 17 February 2019 © Author(s) (or their employer(s)) 2019. Re-use permitted under CC BY-NC. No commercial re-use. See rights and permissions. Published by BMJ. ABSTRACT Objective Esophagogastroduodenoscopy (EGD) is the pivotal procedure in the diagnosis of upper gastrointestinal lesions. However, there are significant variations in EGD performance among endoscopists, impairing the discovery rate of gastric cancers and precursor lesions.The aim of this study was to construct a real-time quality improving system,WISENSE, to monitor blind spots, time the procedure and automatically generate photodocumentation during EGD and thus raise the quality of everyday endoscopy. Design WISENSE system was developed using the methods of deep convolutional neural networks and deep reinforcement learning. Patients referred because of health examination, symptoms, surveillance were recruited from Renmin hospital of Wuhan University. Enrolled patients were randomly assigned to groups that underwent EGD with or without the assistance of WISENSE.The primary end point was to ascertain if there was a difference in the rate of blind spots between WISENSE-assisted group and the control group. Results WISENSE monitored blind spots with an accuracy of 90.40% in real EGD videos.A total of 324 patients were recruited and randomised. 153 and 150 patients were analysed in the WISENSE and control group, respectively. Blind spot rate was lower in WISENSE group compared with the control (5.86% vs 22.46%, p0.001), and the mean difference was −15.39% (95% CI −19.23 to −11.54).There was no significant adverse event. Conclusions WISENSE significantly reduced blind spot rate of EGD procedure and could be used to improve the quality of everyday endoscopy. Trial registration number ChiCTR1800014809; Results. INTRODUCTION Esophagogastroduodenoscopy (EGD) is the pivotal procedure in the diagnosis of upper gastrointestinal lesions.1 High-quality endoscopy delivers better health outcomes.2 However, there are significant variations in EGD performance among endoscopists, impairing the discovery rate of gastric cancers (GC) and precursor lesions.3 The diagnosis rate of early GC in China is still Significance of this study What is already known on this subject? 
► The past decades have seen remarkable progress of deep convolutional neural network (DCNN) in the field of endoscopy. Recent studies have successfully used DCNN to achieve accurate prediction of early gastric cancer in endoscopic images and real-time histological classification of colon polyps in unprocessed videos. However, it has yet not been investigated whether DCNN could be used in monitoring quality of everyday endoscopy. What are the new findings? ► In the present study,WISENSE, a real-time quality improving system based on the DCNN and deep reinforcement learning (DRL) for monitoring blind spots, timing the procedure and generating photodocumentation during esophagogastroduodenoscopy (EGD) was developed.The performance ofWISENSE was verified in EGD videos.A single-centre randomised controlled trial was conducted to evaluate the hypothesis thatWISENSE would reduce the rate of blind spots during EGD.To the best of our knowledge, this is the first study using deep learning in the field of assuring endoscopy completeness and using DRL in making medical decisions in human body environment and also the first study validating the efficiency of a deep learning system in a randomised controlled trial. How might it impact on clinical practice in the foreseeable future? ► WISENSE greatly reduced blind spot rate, increased inspection time and improved the completeness of photodocumentation of EGD in the randomised controlled trial. It could be a powerful assistant tool for mitigating skill variations among endoscopists and improving the quality of everyday endoscopy. on26March2019byguest.Protectedbycopyright.http://gut.bmj.com/Gut:firstpublishedas10.1136/gutjnl-2018-317366on11March2019.Downloadedfrom 8:7 s
  • 122.
  • 123.
A targeted real-time early warning score (TREWScore) for septic shock: AUC = 0.83 (0.81 to 0.85) on the validation set. At a specificity of 0.67 [false-positive rate of 0.33], TREWScore achieved a sensitivity of 0.85 and identified patients a median of 28.2 hours before septic shock onset; more than two-thirds (68.8%) of patients were identified before any sepsis-related organ dysfunction. TREWScore was also compared with MEWS, a general metric used to identify patients at risk of catastrophic deterioration that was not developed for tracking sepsis.
Figure 2. ROC for detection of septic shock before onset in the validation set. The ROC curve for TREWScore is shown in blue, with the ROC curve for MEWS in red. The sensitivity and specificity performance of the routine screening criteria is indicated by the purple dot. Normal 95% CIs are shown for TREWScore and MEWS. TPR, true-positive rate; FPR, false-positive rate.
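An operating point such as "sensitivity 0.85 at specificity 0.67" is obtained by thresholding the continuous risk score on a validation set. A small sketch of that threshold selection (hypothetical labels and scores, not the TREWScore model):

```python
import numpy as np
from sklearn.metrics import roc_curve

def threshold_at_specificity(y_true, y_score, target_specificity=0.67):
    """Return the threshold meeting the target specificity with the best sensitivity."""
    fpr, tpr, thresholds = roc_curve(y_true, y_score)
    ok = (1.0 - fpr) >= target_specificity          # specificity = 1 - FPR
    idx = np.argmax(tpr[ok])                        # best sensitivity among valid points
    return thresholds[ok][idx], tpr[ok][idx], 1.0 - fpr[ok][idx]

# Hypothetical validation data for illustration only.
rng = np.random.default_rng(1)
y = rng.integers(0, 2, 5000)
score = y * 1.2 + rng.normal(0, 1, 5000)
thr, sens, spec = threshold_at_specificity(y, score)
print(f"threshold={thr:.2f}, sensitivity={sens:.2f}, specificity={spec:.2f}")
```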
  • 124.
• APPH (Alarms Per Patient Per Hour) (source: VUNO): fewer false alarms
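APPH is a simple normalization of alarm burden by monitored time. One way to compute it (hypothetical alarm logs, not VUNO's definition or data) is sketched here:

```python
def alarms_per_patient_per_hour(alarm_counts, monitored_hours):
    """APPH = total alarms / total patient-hours of monitoring.

    alarm_counts: list of alarm counts, one entry per patient
    monitored_hours: list of monitored hours, one entry per patient
    """
    total_alarms = sum(alarm_counts)
    total_hours = sum(monitored_hours)
    return total_alarms / total_hours

# Hypothetical example: 3 patients monitored for 48, 72 and 24 hours.
print(alarms_per_patient_per_hour([5, 2, 1], [48, 72, 24]))  # ~0.056 alarms per hour
```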
  • 125.
    ARTICLE OPEN Scalable andaccurate deep learning with electronic health records Alvin Rajkomar 1,2 , Eyal Oren1 , Kai Chen1 , Andrew M. Dai1 , Nissan Hajaj1 , Michaela Hardt1 , Peter J. Liu1 , Xiaobing Liu1 , Jake Marcus1 , Mimi Sun1 , Patrik Sundberg1 , Hector Yee1 , Kun Zhang1 , Yi Zhang1 , Gerardo Flores1 , Gavin E. Duggan1 , Jamie Irvine1 , Quoc Le1 , Kurt Litsch1 , Alexander Mossin1 , Justin Tansuwan1 , De Wang1 , James Wexler1 , Jimbo Wilson1 , Dana Ludwig2 , Samuel L. Volchenboum3 , Katherine Chou1 , Michael Pearson1 , Srinivasan Madabushi1 , Nigam H. Shah4 , Atul J. Butte2 , Michael D. Howell1 , Claire Cui1 , Greg S. Corrado1 and Jeffrey Dean1 Predictive modeling with electronic health record (EHR) data is anticipated to drive personalized medicine and improve healthcare quality. Constructing predictive statistical models typically requires extraction of curated predictor variables from normalized EHR data, a labor-intensive process that discards the vast majority of information in each patient’s record. We propose a representation of patients’ entire raw EHR records based on the Fast Healthcare Interoperability Resources (FHIR) format. We demonstrate that deep learning methods using this representation are capable of accurately predicting multiple medical events from multiple centers without site-specific data harmonization. We validated our approach using de-identified EHR data from two US academic medical centers with 216,221 adult patients hospitalized for at least 24 h. In the sequential format we propose, this volume of EHR data unrolled into a total of 46,864,534,945 data points, including clinical notes. Deep learning models achieved high accuracy for tasks such as predicting: in-hospital mortality (area under the receiver operator curve [AUROC] across sites 0.93–0.94), 30-day unplanned readmission (AUROC 0.75–0.76), prolonged length of stay (AUROC 0.85–0.86), and all of a patient’s final discharge diagnoses (frequency-weighted AUROC 0.90). These models outperformed traditional, clinically-used predictive models in all cases. We believe that this approach can be used to create accurate and scalable predictions for a variety of clinical scenarios. In a case study of a particular prediction, we demonstrate that neural networks can be used to identify relevant information from the patient’s chart. npj Digital Medicine (2018)1:18 ; doi:10.1038/s41746-018-0029-1 INTRODUCTION The promise of digital medicine stems in part from the hope that, by digitizing health data, we might more easily leverage computer information systems to understand and improve care. In fact, routinely collected patient healthcare data are now approaching the genomic scale in volume and complexity.1 Unfortunately, most of this information is not yet used in the sorts of predictive statistical models clinicians might use to improve care delivery. It is widely suspected that use of such efforts, if successful, could provide major benefits not only for patient safety and quality but also in reducing healthcare costs.2–6 In spite of the richness and potential of available data, scaling the development of predictive models is difficult because, for traditional predictive modeling techniques, each outcome to be predicted requires the creation of a custom dataset with specific variables.7 It is widely held that 80% of the effort in an analytic model is preprocessing, merging, customizing, and cleaning nurses, and other providers are included. 
Traditional modeling approaches have dealt with this complexity simply by choosing a very limited number of commonly collected variables to consider.7 This is problematic because the resulting models may produce imprecise predictions: false-positive predictions can overwhelm physicians, nurses, and other providers with false alarms and concomitant alert fatigue,10 which the Joint Commission identified as a national patient safety priority in 2014.11 False-negative predictions can miss significant numbers of clinically important events, leading to poor clinical outcomes.11,12 Incorporating the entire EHR, including clinicians’ free-text notes, offers some hope of overcoming these shortcomings but is unwieldy for most predictive modeling techniques. Recent developments in deep learning and artificial neural networks may allow us to address many of these challenges and unlock the information in the EHR. Deep learning emerged as the preferred machine learning approach in machine perception www.nature.com/npjdigitalmed
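The paper's central design choice is feeding the model the entire raw record as one time-ordered event sequence (based on FHIR resources) rather than hand-picked variables. A toy sketch of that sequencing step follows, with made-up event tuples and a trivial vocabulary; nothing here reflects the paper's actual pipeline or data schema.

```python
from collections import defaultdict

# Hypothetical raw events: (patient_id, hours_since_admission, event_token)
raw_events = [
    ("p1", 0.5, "DX:I50.9"),             # diagnosis code
    ("p1", 2.0, "LAB:CREATININE_HIGH"),  # lab result
    ("p1", 1.0, "MED:FUROSEMIDE"),       # medication order
    ("p2", 0.2, "NOTE:chest pain"),      # free-text note token
]

# 1) Group events by patient and sort by time -> one sequence per admission.
sequences = defaultdict(list)
for patient, t, token in raw_events:
    sequences[patient].append((t, token))
for patient in sequences:
    sequences[patient].sort()

# 2) Map tokens to integer ids so a sequence model (e.g., an LSTM) can consume them.
vocab = {tok: i + 1 for i, tok in enumerate(sorted({e[2] for e in raw_events}))}  # 0 = padding
encoded = {p: [(t, vocab[tok]) for t, tok in seq] for p, seq in sequences.items()}
print(encoded["p1"])  # time-ordered (hours, token_id) pairs for patient p1
```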
  • 126.
  • 128.
    Table 3 Listof variants identified as actionable by 3 different platforms Gene Variant Identified variant Identified associated drugs NYGC WGA FO NYGC WGA FO CDKN2A Deletion Yes Yes Yes Palbociclib, LY2835219 LEE001 Palbociclib LY2835219 Clinical trial CDKN2B Deletion Yes Yes Yes Palbociclib, LY2835219 LEE002 Palbociclib LY2835219 Clinical trial EGFR Gain (whole arm) Yes — — Cetuximab — — ERG Missense P114Q Yes Yes — RI-EIP RI-EIP — FGFR3 Missense L49V Yes VUS — TK-1258 — — MET Amplification Yes Yes Yes INC280 Crizotinib, cabozantinib Crizotinib, cabozantinib MET Frame shift R755fs Yes — — INC280 — — MET Exon skipping Yes — — INC280 — — NF1 Deletion Yes — — MEK162 — — NF1 Nonsense R461* Yes Yes Yes MEK162 MEK162, cobimetinib, trametinib, GDC-0994 Everolimus, temsirolimus, trametinib PIK3R1 Insertion R562_M563insI Yes Yes — BKM120 BKM120, LY3023414 — PTEN Loss (whole arm) Yes — — Everolimus, AZD2014 — — STAG2 Frame shift R1012 fs Yes Yes Yes Veliparib, clinical trial Olaparib — DNMT3A Splice site 2083-1G.C — — Yes — — — TERT Promoter-146C.T Yes — Yes — — — ABL2 Missense D716N Germline NA VUS mTOR Missense H1687R Germline NA VUS NPM1 Missense E169D Germline NA VUS NTRK1 Missense G18E Germline NA VUS PTCH1 Missense P1250R Germline NA VUS TSC1 Missense G1035S Germline NA VUS Abbreviations: FO 5 FoundationOne; NYGC 5 New York Genome Center; RNA-seq 5 RNA sequencing; WGA 5 Watson Genomic Analytics; WGS 5 whole- genome sequencing. Genes, variant description, and, where appropriate, candidate clinically relevant drugs are listed. Variants identified by the FO as variants of uncertain significance (VUS) were identified by the NYGC as germline variants.
  • 129.
[Figure: total reading time without AI vs with AI: 188 min vs 154 min (saving 18% of time) and 180 min vs 108 min (saving 40% of time)]
AJR Am J Roentgenol. 2017 Dec;209(6):1374-1380
Digital Healthcare Institute Director, Yoon Sup Choi, PhD | [email protected]
  • 130.
    isolated diagnostic tasks.Underlying these exciting advances, however, is the important notion that these algorithms do not replace the breadth and contextual knowledge of human pathologists and that even the best algorithms would need to from 83% to 91% and resulted in higher overall diagnostic accuracy than that of either unassisted pathologist inter- pretation or the computer algorithm alone. Although deep learning algorithms have been credited with comparable Unassisted Assisted TimeofReview(seconds) Timeofreviewperimage(seconds) Negative ITC Micromet Macromet A B p=0.002 p=0.02 Unassisted Assisted Micrometastases FIGURE 5. Average review time per image decreases with assistance. A, Average review time per image across all pathologists analyzed by category. Black circles are average times with assistance, gray triangles represent average times without assistance. Error bars indicate 95% confidence interval. B, Micrometastasis time of review decreases for nearly all images with assistance. Circles represent average review time for each individual micrometastasis image, averaged across the 6 pathologists by assistance modality. The dashed lines connect the points corresponding to the same image with and without assistance. The 2 images that were not reviewed faster on average with assistance are represented with red dot-dash lines. Vertical lines of the box represent quartiles, and the diamond indicates the average of review time for micrometastases in that modality. Micromet indicates mi- crometastasis; macromet, macrometastasis. 8 | www.ajsp.com Copyright © 2018 Wolters Kluwer Health, Inc. All rights reserved. l o Q UYMSQ l AQSM UbQ UO YQ x p l UO YQ ( m p l G6 M QP GaY 6Q AQSM UbQ •
  • 132.
referred from Dr. Eric Topol on Twitter
  • 134.
Feedback/Questions • Email: [email protected] • Blog: http://www.yoonsupchoi.com • Facebook: Yoon Sup Choi
  • 135.
  • 136.
  • 137.
Supervised autonomous robotic soft tissue surgery: Children's National Health System, Smart Tissue Autonomous Robot (STAR). Azad Shademan et al. Sci Transl Med 2016
  • 138.
Azad Shademan et al. Sci Transl Med 2016: Supervised autonomous robotic soft tissue surgery
  • 139.
Azad Shademan et al. Sci Transl Med 2016
• Ex vivo and in vivo end-to-end anastomosis with STAR
• Compared against open surgery (OPEN), laparoscopy (LAP), and robot-assisted surgery (RAS)
• Outcome measures: suture spacing, leak pressure, number of mistakes, completion time, lumen reduction
"…suture placement compared to other techniques (table S1). Moreover, leak pressure reflects the functional quality of suturing. The linear closure from STAR was able to withstand a higher average leak pressure than all other techniques (Fig. 2B)." "…suturing tool maneuvers before piercing. Using the NIRF markers as reference points, the plan interpolated intermediate suture placements on the bowel and adjusted placement of each suture, knot, and corner slide to accommodate deformations and induced scene rotations (Fig. 1F)."
Fig. 2. Ex vivo linear suturing under deformations. The experiment consisted of closing a longitudinal cut along pig intestine, whereas the tissue was deformed by pulling on stay sutures. Five samples were tested per technique (OPEN, LAP, RAS, and STAR). (A) Suture spacing. Central mark is the median; box edges are the 25th and 75th percentiles, error bars are the range excluding outliers, and red dots are outliers. The whiskers represent the range not including outliers. There is a different N number for each boxplot because each surgeon used a different number of sutures [OPEN (n = 174), LAP (n = 128), RAS (n = 176), and STAR (n = 206)]. These data are presented numerically in table S2, including the SDs. P values determined by ANOVA with post hoc Games-Howell. (B and C) Leak pressures and number of mistakes (repositioned stitches or robot reboot). Data are from individual tissue samples (n = 5) with averages marked by a horizontal line. P values determined by independent samples t test. (D) Completion times separated into knot-tying and suturing, and other time was spent restaging or changing sutures. Data are averages (n = 5). P values determined by independent samples t test.
  • 140.
    Azad Shademan etal. Sci Transl Med 2016 maining circumference (Fig. 3 ing that different levels of auto be used effectively for differ Overall, 57.8% of the procedu fully autonomously with no a Alternatively, in the current s autonomous mode without an teraction would require suture in 42.2% of sutures placed, m corners. The completion tim also included supervisory ac surgeon, which accounted f the total time (7% for suture justment, 3.3% for confirmati location, and 2.6% for mistake In vivo end-to-end anast Finally, we performed in vivo autonomous surgery in pig in cessed through a laparotomy (n = 4) and compared these a an OPEN control (n = 1) (fig used the same suture algorith ex vivo trials (Fig. 1, G and OPEN control, the surgeon us surgical hand tools to open th exposed the intestine, and sutu a transverse incision. The av STAR procedure time was 50.0 where 77.4% was anastomos 22.6% was restaging time be and front walls, which inclu 2.16 min for marking the tis B and E, and Table 1). Al OPEN timewasonly 8min,th was comparable to the averag laparoscopic anastomoses that 30 min for vesicourethral (25 for aortic (26), to 90 min for constructions (27). No complications were obs Fig. 3. End-to-end anastomosis ex vivo. The experiment consisted of closing a transverse cut in pig intestine. Five samples were tested per technique (OPEN, LAP, RAS, and STAR). (A) Suture spacing. Central mark is the median; box edges are the 25th and 75th percentiles; and red dots are outliers. The whiskers represent the range not including outliers. There is a different N number for each boxplot because each surgeon used a different number of sutures [OPEN (n = 138), LAP (n = 98), RAS (n = 132), and STAR (n = 180)]. The average spacing betweenconsecutive sutures was calculated and compared be- tween STAR and other modalities. The variance of suture spacing is presented numerically in table S2, including the SD. P values determined by ANOVA with post hoc Games-Howell. (B) Exvivo end-to-end anastomosis leak pressures. Dataareindividualtissuesamples,withmeansdisplayedashorizontallines(n=4to5).Onesample was sutured closed and thus could not be tested for leak pressure. P values determined by independent samples t test. (C) The leak pressure as a function of maximum suture spacing. Data are individual tissue samples that were fit to a rational function (y = 0.854/x) (n = 4 to 5). (D) Number of mistakes (repositioned stitches or robot reboot). Data are individual tissue samples with means displayed as horizontal lines (n = 5). P values determined by independent samples t test. (E) Ex vivo end-to-end anastomosis completion times. Average times for n = 5 tissue samples per procedure are divided into subtasks of knots and running sutures. “Other” time was spent restaging and changing sutures. Pvalues determined byindependent samplesttest.(F) Percentreductionin l8d bUb UZ bUb QZP QZP FG4E q t !GH=F q…x ! RaRc dT aj$ D9H q J9K l q v !daRT X q t …x t ! VR acVddfcV q ! f SVc W deR Vd q v !T a Ve e V q x ! f V cVUfTe
  • 141.
  • 142.
  • 143.
  • 145.
    BeyondVerbal q t w7 q 2 ' ' w q t q 9Ve R () q . q
  • 146.
• linguistic • identification and extraction of word instances (unigrams) and word-pair instances (bi-grams) from the transcriptions • acoustic • vocal dynamics • voice quality • vocal tract resonance frequencies • pause lengths
A Machine Learning Approach to Identifying the Thought Markers of Suicidal Subjects: A Prospective Multicenter Trial
• "Do you have hope?" • "Do you have any fear?" • "Do you have any secrets?" • "Are you angry?" • "Does it hurt emotionally?"
Pestian, Suicide and Life-Threatening Behavior, 2016
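The linguistic features described above (unigram and bigram counts from interview transcripts) are standard bag-of-n-grams features; the acoustic features would come from an audio toolkit and are not sketched here. A minimal example of the n-gram step with scikit-learn, on toy transcripts rather than the study's data:

```python
from sklearn.feature_extraction.text import CountVectorizer

# Toy interview transcripts (one string per subject).
transcripts = [
    "i do not have hope anymore",
    "i feel hopeful about the future",
]

# Unigrams and bigrams, as in the study's linguistic feature set.
vectorizer = CountVectorizer(ngram_range=(1, 2), lowercase=True)
X = vectorizer.fit_transform(transcripts)       # (n_subjects, n_features) sparse matrix

print(X.shape)
print(vectorizer.get_feature_names_out()[:10])  # first few unigram/bigram features
```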
  • 147.
A Machine Learning Approach to Identifying the Thought Markers of Suicidal Subjects: A Prospective Multicenter Trial
[Figure 1. Receiver operator curves (ROC): suicide versus control (upper), suicide versus mentally ill (middle), and suicide versus mentally ill with control (lower). ROC curves are shown for adolescents (blue) and adults (red), using linguistic and acoustic features; the gray line is the AROC curve for a baseline (random) classifier.]
Table 2. The AROC for the machine learning algorithm. The nonsuicidal group comprises either mentally ill or control subjects. Classification performances, AROC (SD), are shown for adolescents / adults / adolescents + adults.
Suicidal versus Controls: Linguistics 0.87 (0.04) / 0.91 (0.02) / 0.93 (0.02); Acoustics 0.74 (0.05) / 0.82 (0.03) / 0.79 (0.03); Linguistics + Acoustics 0.83 (0.05) / 0.93 (0.02) / 0.92 (0.02)
Suicidal versus Mentally Ill: Linguistics 0.82 (0.05) / 0.77 (0.04) / 0.79 (0.03); Acoustics 0.69 (0.06) / 0.74 (0.04) / 0.76 (0.03); Linguistics + Acoustics 0.80 (0.05) / 0.77 (0.04) / 0.82 (0.03)
Suicidal versus Mentally Ill and Controls: Linguistics 0.82 (0.04) / 0.84 (0.03) / 0.87 (0.02); Acoustics 0.74 (0.05) / 0.80 (0.03) / 0.76 (0.03); Linguistics + Acoustics 0.81 (0.04) / 0.84 (0.03) / 0.87 (0.02)
Pestian, Suicide and Life-Threatening Behavior, 2016
  • 148.
    l q $ $w q aY X TR WVRefcV qY XY V VcXj h cU2 x ' t
 q ,(. q ) 
 l/ QOU U Z m
  • 149.
    Detection of theProdromal Phase of Bipolar Disorder from Psychological and Phonological Aspects in Social Media Yen-Hao Huang National Tsing Hua University Hsinchu, Taiwan [email protected] Lin-Hung Wei National Tsing Hua University Hsinchu, Taiwan [email protected] Yi-Shin Chen National Tsing Hua University Hsinchu, Taiwan [email protected] ABSTRACT Seven out of ten people with bipolar disorder are initially misdiagnosed and thirty percent of individuals with bipolar disorder will commit suicide. Identifying the early phases of the disorder is one of the key components for reducing the full development of the disorder. In this study, we aim at leveraging the data from social media to design predictive models, which utilize the psychological and phonological fea- tures, to determine the onset period of bipolar disorder and provide insights on its prodrome. This study makes these dis- coveries possible by employing a novel data collection process, coined as Time-specific Subconscious Crowdsourcing, which helps collect a reliable dataset that supplements diagnosis information from people suffering from bipolar disorder. Our experimental results demonstrate that the proposed models could greatly contribute to the regular assessments of people with bipolar disorder, which is important in the primary care setting. KEYWORDS Bipolar Disorder Detection, Mental Disorder, Prodromal Phrase, Emotion Analysis, Sentiment Analysis, Phonology, Social Media 1 INTRODUCTION Bipolar disorder (BD) is a common mental illness charac- terized by recurrent episodes of mania/hypomania and de- pression, which is found among all ages, races, ethnic groups and social classes. The regular assessment of people with BD is an important part of its treatment, though it may be very time-consuming [21]. There are many beneficial treat- ments for the patients, particularly for delaying relapses. The identification of early symptoms is significant for allowing early intervention and reducing the multiple adverse conse- quences of a full-blown episode. Despite the importance of the detection of prodromal symptoms, there are very few studies that have actually examined the ability of relatives to detect these symptoms in BD patients. [20] For the purpose of early treatment, the challenge leads to: how to identify the prodrome period of BD. Current studies are thus aimed at detecting prodromes and analyzing the prodromal symptoms of manic recurrence in clinics. With regards to the symptom of social isolation, people are increasingly turning to popular social media, such as Facebook and Twitter, to share their illness experiences or seek advice from others with similar mental health conditions. As the information is being shared in public, people are subconsciously providing rich contents about their states of mind. In this paper, we refer to this sharing and data collection as time-specific subconscious crowdsourcing. In this study, we carefully look at patients who have been diagnosed with BD and who explicitly indicate the diagnosis and time of diagnosis on Twitter. Our goal is to both predict whether BD rises on a given period of time, and to discover the prodromal period for BD. It’s important to clarify that our goal doesn’t seek to offer a diagnosis but rather to make a prediction of which users are likely to be suffering from the BD. 
The main contributions of our work are: • Introducing the concept of time-specific subconscious crowdsourcing, which can aid in locating the social network behavior data of BD patients with the corre- sponding time of diagnosis. • A BD assessment mechanism that differentiates be- tween prodromal symptoms and acute symptoms. • Introducing the phonological features into the assess- ment mechanism, which allows for the possibility to assess patients through text only. • An automatic recognition approach that detects the possible prodromal period for BD. 2 RELATED WORK Social media resources have been widely utilized by researchers to study mental health issues. The following literature em- phasizes on data collection and feature engineering, including subject recruitment, manual data collection, data collection applications, keyword matching, and combined approaches. The clinical approach for mental disorders and prodrome studies are also discussed in this section. Subject recruitment: Based on customized question- naires and contact with subjects, Park et al. [15] recruited participants for the Center for Epidemiologic Studies Depres- sion scale(CES-D) [17] and provided their Twitter data. By analyzing the information contained in tweets, participants were divided into normal and depressive groups based on their scores on CES-D. An approach like this one requires ex- pensive costs to acquire data and conduct the questionnaire. Manual and automatic data collecting: Moreno et al. [14] collected data via the Facebook profiles of college stu- dents reviewed by two investigators. They aimed at revealing the relationship between demographic factors and depression. Similarly, in our work, we invest on manual efforts to collect and properly annotate our dataset. In addition, there are many applications built on top of social networks that provide free services where users may need to input their credentials arXiv:1712.09183v1[cs.IR]26Dec2017
  • 150.
    Detection of theProdromal Phase of Bipolar Disorder from Psychological and Phonological Aspects in Social Media Yen-Hao Huang National Tsing Hua University Hsinchu, Taiwan [email protected] Lin-Hung Wei National Tsing Hua University Hsinchu, Taiwan [email protected] Yi-Shin Chen National Tsing Hua University Hsinchu, Taiwan [email protected] ABSTRACT Seven out of ten people with bipolar disorder are initially misdiagnosed and thirty percent of individuals with bipolar disorder will commit suicide. Identifying the early phases of the disorder is one of the key components for reducing the full development of the disorder. In this study, we aim at leveraging the data from social media to design predictive models, which utilize the psychological and phonological fea- tures, to determine the onset period of bipolar disorder and provide insights on its prodrome. This study makes these dis- coveries possible by employing a novel data collection process, coined as Time-specific Subconscious Crowdsourcing, which helps collect a reliable dataset that supplements diagnosis information from people suffering from bipolar disorder. Our experimental results demonstrate that the proposed models could greatly contribute to the regular assessments of people with bipolar disorder, which is important in the primary care setting. KEYWORDS Bipolar Disorder Detection, Mental Disorder, Prodromal Phrase, Emotion Analysis, Sentiment Analysis, Phonology, Social Media 1 INTRODUCTION Bipolar disorder (BD) is a common mental illness charac- terized by recurrent episodes of mania/hypomania and de- pression, which is found among all ages, races, ethnic groups and social classes. The regular assessment of people with BD is an important part of its treatment, though it may be very time-consuming [21]. There are many beneficial treat- ments for the patients, particularly for delaying relapses. The identification of early symptoms is significant for allowing early intervention and reducing the multiple adverse conse- quences of a full-blown episode. Despite the importance of the detection of prodromal symptoms, there are very few studies that have actually examined the ability of relatives to detect these symptoms in BD patients. [20] For the purpose of early treatment, the challenge leads to: how to identify the prodrome period of BD. Current studies are thus aimed at detecting prodromes and analyzing the prodromal symptoms of manic recurrence in clinics. With regards to the symptom of social isolation, people are increasingly turning to popular social media, such as Facebook and Twitter, to share their illness experiences or seek advice from others with similar mental health conditions. As the information is being shared in public, people are subconsciously providing rich contents about their states of mind. In this paper, we refer to this sharing and data collection as time-specific subconscious crowdsourcing. In this study, we carefully look at patients who have been diagnosed with BD and who explicitly indicate the diagnosis and time of diagnosis on Twitter. Our goal is to both predict whether BD rises on a given period of time, and to discover the prodromal period for BD. It’s important to clarify that our goal doesn’t seek to offer a diagnosis but rather to make a prediction of which users are likely to be suffering from the BD. 
The main contributions of our work are: • Introducing the concept of time-specific subconscious crowdsourcing, which can aid in locating the social network behavior data of BD patients with the corre- sponding time of diagnosis. • A BD assessment mechanism that differentiates be- tween prodromal symptoms and acute symptoms. • Introducing the phonological features into the assess- ment mechanism, which allows for the possibility to assess patients through text only. • An automatic recognition approach that detects the possible prodromal period for BD. 2 RELATED WORK Social media resources have been widely utilized by researchers to study mental health issues. The following literature em- phasizes on data collection and feature engineering, including subject recruitment, manual data collection, data collection applications, keyword matching, and combined approaches. The clinical approach for mental disorders and prodrome studies are also discussed in this section. Subject recruitment: Based on customized question- naires and contact with subjects, Park et al. [15] recruited participants for the Center for Epidemiologic Studies Depres- sion scale(CES-D) [17] and provided their Twitter data. By analyzing the information contained in tweets, participants were divided into normal and depressive groups based on their scores on CES-D. An approach like this one requires ex- pensive costs to acquire data and conduct the questionnaire. Manual and automatic data collecting: Moreno et al. [14] collected data via the Facebook profiles of college stu- dents reviewed by two investigators. They aimed at revealing the relationship between demographic factors and depression. Similarly, in our work, we invest on manual efforts to collect and properly annotate our dataset. In addition, there are many applications built on top of social networks that provide free services where users may need to input their credentials arXiv:1712.09183v1[cs.IR]26Dec2017 Wordcloud Features(#DIM) 2 mths 3 mths 6 mths 9 mths 12 mths AG(2) 0.475 0.503 0.445 0.434 0.383 Pol(5) 0.911 0.893 0.843 0.836 0.803 Emot(8) 0.893 0.895 0.908 0.917 0.896 Soc(4) 0.941 0.913 0.845 0.834 0.786 LT(1) 0.645 0.589 0.554 0.504 0.513 TRD(1) 0.570 0.638 0.626 0.615 0.654 Phon(8) 0.889 0.880 0.802 0.838 0.821 Table 2: Average Precision of Single Feature Perfor- mance Age and Gender Mood Polarity Features Emotional Score Social Feature Late Tweet Frequency Tweet Rate Difference Phonological Feature Diagnosed time ! months = 2 months Figure 1: Illness Period Modeling features are introduced: (1) Word-level features and BD Pattern of Life features. 3.4.1 Word-level Features. With respect to the linguis- tic features for BD, the Character n-gram language fea- tures(CLF) and LIWC metrics are designed to capture it. The CLF utilizes n-grams to measure the comment words or phrases used by users. The tf-idf is utilized in our score- calculating method, the tf is the frequency of an n-gram and the document d of df is defined as each particular twitter user k. The formula for the tf-idf is thus given as: tfidf (k,⌧,↵) vn = freq (k,⌧,↵) vn ⇥ log K 1 + freq (K,⌧,↵) vn (1) The freq (k) vn is the frequency of n-gram vn , which is n 2 {1, 2} to represent psychological features, su terns and the behavioral tendency o polarity, emotion, and social interacti full BDPLF, there are five categories: • Age and Gender: Sit et al. [2 effects on BD, indicating that wom likely to have Bipolar Disorder than men. We make use of the ag proposed by Sap et al. 
[19], which … social media.
• Mood Polarity Features: … BD patients experience rapid mood …; sentiment analysis is first adapted to obtain the … polarity portrayed by each user's tweets. … the sentiment of tweets, the online … is used, based on Go et al.'s work […]; … the contents of tweets into three …: positive, negative, and neutral. … those three categories into five d…: positive ratio, negative ratio, pos… combo, and flips ratio.
• Emotional Scores: Beyond the …, an emotion detection tool proposed by … is employed to classify the tweets into … categories: joy, surprise, anticipation, …, anger, and fear. The emotion cla… are further transformed into emotion scores:

$$es_{i,(k)}^{\tau,\alpha} = e_{i,(k)}^{\tau,\alpha} \,/\, ecount\,\dots$$
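As a rough, non-authoritative illustration of the character n-gram tf-idf weighting in Eq. (1), the sketch below is a minimal Python stand-in, not the authors' code. The helper names and toy tweets are hypothetical, and reading the idf denominator as the n-gram's total frequency across all K users is my interpretation of the formula.

```python
from collections import Counter
import math

def char_ngrams(text, n_values=(1, 2)):
    """Character n-grams (n = 1, 2), mirroring the CLF features."""
    text = text.lower()
    return [text[i:i + n] for n in n_values for i in range(len(text) - n + 1)]

def tfidf_per_user(user_tweets):
    """user_tweets: dict of user id -> list of tweet strings.
    Returns user id -> {ngram: tf-idf}, following Eq. (1): tf is the
    n-gram frequency for user k; the idf term is log(K / (1 + corpus freq))."""
    K = len(user_tweets)
    per_user = {u: Counter(g for t in tweets for g in char_ngrams(t))
                for u, tweets in user_tweets.items()}
    corpus = Counter()
    for counts in per_user.values():
        corpus.update(counts)
    return {u: {g: tf * math.log(K / (1 + corpus[g])) for g, tf in counts.items()}
            for u, counts in per_user.items()}

if __name__ == "__main__":
    demo = {"user_a": ["feeling great today!!", "so much energy"],
            "user_b": ["can't sleep again", "everything feels heavy"]}
    top = sorted(tfidf_per_user(demo)["user_b"].items(), key=lambda kv: -kv[1])[:5]
    print(top)
```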
  • 151.
“A lot of Syrian refugees have trauma and maybe this can help them overcome that.” However, he points out that there is a stigma around psychotherapy, saying people feel shame about seeking out psychologists. As a result, he thinks people might feel more comfortable knowing they are talking to a “robot” than to a human.
  • 152.
http://www.newyorker.com/tech/elements/the-chatbot-will-see-you-now
  • 153.
  • 154.
  • 155.
  • 156.
  • 157.
Original Paper. Delivering Cognitive Behavior Therapy to Young Adults With Symptoms of Depression and Anxiety Using a Fully Automated Conversational Agent (Woebot): A Randomized Controlled Trial
Kathleen Kara Fitzpatrick, PhD (1*); Alison Darcy, PhD (2*); Molly Vierhile, BA (1)
1 Stanford School of Medicine, Department of Psychiatry and Behavioral Sciences, Stanford, CA, United States
2 Woebot Labs Inc., San Francisco, CA, United States
* These authors contributed equally.
Corresponding Author: Alison Darcy, PhD, Woebot Labs Inc., 55 Fair Avenue, San Francisco, CA 94110, United States. Email: [email protected]

Abstract
Background: Web-based cognitive-behavioral therapeutic (CBT) apps have demonstrated efficacy but are characterized by poor adherence. Conversational agents may offer a convenient, engaging way of getting support at any time.
Objective: The objective of the study was to determine the feasibility, acceptability, and preliminary efficacy of a fully automated conversational agent to deliver a self-help program for college students who self-identify as having symptoms of anxiety and depression.
Methods: In an unblinded trial, 70 individuals aged 18-28 years were recruited online from a university community social media site and were randomized to receive either 2 weeks (up to 20 sessions) of self-help content derived from CBT principles in a conversational format with a text-based conversational agent (Woebot) (n=34) or were directed to the National Institute of Mental Health ebook, "Depression in College Students," as an information-only control group (n=36). All participants completed Web-based versions of the 9-item Patient Health Questionnaire (PHQ-9), the 7-item Generalized Anxiety Disorder scale (GAD-7), and the Positive and Negative Affect Scale at baseline and 2-3 weeks later (T2).
Results: Participants were on average 22.2 years old (SD 2.33), 67% female (47/70), mostly non-Hispanic (93%, 54/58), and Caucasian (79%, 46/58). Participants in the Woebot group engaged with the conversational agent an average of 12.14 (SD 2.23) times over the study period. No significant differences existed between the groups at baseline, and 83% (58/70) of participants provided data at T2 (17% attrition). Intent-to-treat univariate analysis of covariance revealed a significant group difference on depression, such that those in the Woebot group significantly reduced their symptoms of depression over the study period as measured by the PHQ-9 (F=6.47; P=.01) while those in the information control group did not. In an analysis of completers, participants in both groups significantly reduced anxiety as measured by the GAD-7 (F1,54=9.24; P=.004). Participants' comments suggest that process factors were more influential on their acceptability of the program than content factors mirroring traditional therapy.
Conclusions: Conversational agents appear to be a feasible, engaging, and effective way to deliver CBT. (JMIR Ment Health 2017;4(2):e19) doi:10.2196/mental.7785
KEYWORDS: conversational agents; mobile mental health; mental health; chatbots; depression; anxiety; college students; digital health
Introduction: Up to 74% of mental health diagnoses have their first onset … particularly common among college students, with more than half reporting symptoms of anxiety and depression in the previous year that were so severe they had difficulty functioning.
  • 158.
Delivering Cognitive Behavior Therapy to Young Adults With Symptoms of Depression and Anxiety Using a Fully Automated Conversational Agent (Woebot): A Randomized Controlled Trial
… depression at baseline as measured by the PHQ-9, while three-quarters (74%, 52/70) were in the severe range for anxiety as measured by the GAD-7.
Figure 1. Participant recruitment flow.

Table 1. Demographic and clinical variables of participants at baseline.

                                Woebot          Information control
Scale, mean (SD)
  Depression (PHQ-9)            14.30 (6.65)    13.25 (5.17)
  Anxiety (GAD-7)               18.05 (5.89)    19.02 (4.27)
  Positive affect               25.54 (9.58)    26.19 (8.37)
  Negative affect               24.87 (8.13)    28.74 (8.92)
Age, mean (SD)                  22.58 (2.38)    21.83 (2.24)
Gender, n (%)
  Male                          7 (21)          4 (7)
  Female                        27 (79)         20 (55)
Ethnicity, n (%)
  Latino/Hispanic               2 (6)           2 (8)
  Non-Latino/Hispanic           32 (94)         22 (92)
  Caucasian                     28 (82)         18 (75)
  • 159.
Table 2. Intent-to-treat analysis: estimated values at T2 by group (mean (SE) and 95% CI), with F, P value, and between-group Cohen d.
• PHQ-9: Woebot 11.14 (0.71), 95% CI 9.74-12.32, vs. information-only control 13.67 (0.81), 95% CI 12.07-15.27; F=6.03, P=.017, d=0.44
• GAD-7: Woebot 17.35 (0.60), 95% CI 16.16-18.13, vs. control 16.84 (0.67), 95% CI 15.52-18.56; F=0.38, P=.581, d=0.14
• PANAS positive affect: Woebot 26.88 (1.29), 95% CI 24.35-29.41, vs. control 26.02 (1.45), 95% CI 23.17-28.86; F=0.17, P=.707, d=0.02
• PANAS negative affect: Woebot 25.98 (1.24), 95% CI 23.54-28.42, vs. control 27.53 (1.42), 95% CI 24.73-30.32
Notes: a Baseline = pooled mean (standard error). b 95% confidence interval. c Cohen d shown for between-subjects effects using means and standard errors at Time 2.

Figure 2. Change in mean depression (PHQ-9) score by group over the study period. Error bars represent standard error.

Preliminary Efficacy. Table 2 shows the results of the primary ITT analyses conducted on the entire sample. Univariate ANCOVA revealed a significant treatment effect on depression, showing that those in the Woebot group significantly reduced their PHQ-9 score while those in the information control group did not (F1,48=6.03; P=.017) (see Figure 2). This represented a moderate between-groups effect size (d=0.44). This effect is robust after Bonferroni correction for multiple comparisons (P=.04). No other significant between-group differences were observed on anxiety or affect.

Completer Analysis. As a secondary analysis, to explore whether any main effects existed, 2x2 repeated measures ANOVAs were conducted on the primary outcome variables (with the exception of PHQ-9) among completers only. A significant main effect was observed on GAD-7 (F1,54=9.24; P=.004), suggesting that completers experienced a significant reduction in symptoms of anxiety between baseline and T2, regardless of the group to which they were assigned, with a within-subjects effect size of d=0.37. No main effects were observed for positive (F1,50=.001; P=.951; d=0.21) or negative affect (F1,50=.06; P=.80; d=0.003) as measured by the PANAS.

To further elucidate the source and magnitude of change in depression, repeated measures dependent t tests were conducted and Cohen d effect sizes were calculated on individual items of the PHQ-9 among those in the Woebot condition. The analysis revealed that baseline-T2 changes were observed on the following items, in order of decreasing magnitude: motoric symptoms (d=2.09), appetite (d=0.65), little interest or pleasure in things (d=0.44), feeling bad about self (d=0.40), concentration (d=0.39), suicidal thoughts (d=0.30), feeling down (d=0.14), sleep (d=0.12), and energy (d=0.06).
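The primary analysis above is an intent-to-treat univariate ANCOVA on T2 depression scores plus a between-group Cohen d. Below is a hedged sketch of that kind of analysis with statsmodels and pandas; the data frame is simulated (not the trial data), the column names are made up, and using the baseline PHQ-9 score as the covariate is an assumption about the model specification.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 70  # same order of magnitude as the trial; all values below are simulated

df = pd.DataFrame({
    "group": rng.choice(["woebot", "control"], size=n),
    "phq9_baseline": rng.normal(13.8, 6.0, size=n).clip(0, 27),
})
assumed_effect = np.where(df["group"] == "woebot", -2.5, 0.0)  # assumed, not the trial's
df["phq9_t2"] = (df["phq9_baseline"] + assumed_effect
                 + rng.normal(0, 4.0, size=n)).clip(0, 27)

# One-way ANCOVA: T2 PHQ-9 by group, adjusting for baseline severity.
fit = smf.ols("phq9_t2 ~ C(group) + phq9_baseline", data=df).fit()
print(fit.summary().tables[1])

# Between-group Cohen's d at T2: mean difference over the pooled SD.
g1 = df.loc[df.group == "woebot", "phq9_t2"]
g0 = df.loc[df.group == "control", "phq9_t2"]
pooled_sd = np.sqrt(((len(g1) - 1) * g1.var(ddof=1) + (len(g0) - 1) * g0.var(ddof=1))
                    / (len(g1) + len(g0) - 2))
print("Cohen's d:", (g0.mean() - g1.mean()) / pooled_sd)
```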
  • 160.
  • 161.
G.M. Lucas et al. / Computers in Human Behavior 37 (2014) 94–100
  • 162.
  • 163.
  • 164.
It's only a computer: Virtual humans increase willingness to disclose. G.M. Lucas et al. / Computers in Human Behavior 37 (2014) 94–100
Method: framing manipulation (computer frame vs. human frame) crossed with how the virtual human was actually operated (automated AI vs. Wizard-of-Oz).
  • 165.
It's only a computer: Virtual humans increase willingness to disclose. G.M. Lucas et al. / Computers in Human Behavior 37 (2014) 94–100
Sample interview questions:
"How close are you to your family?"
"Tell me about a situation that you wish you had handled differently."
"Tell me about an event, or something that you wish you could erase from your memory."
"Tell me about the hardest decision you've ever had to make."
"Tell me about the last time you felt really happy."
"What are you most proud of in your life?"
"What's something you feel guilty about?"
"When was the last time you argued with someone and what was it about?"
  • 166.
It's only a computer: Virtual humans increase willingness to disclose. G.M. Lucas et al. / Computers in Human Behavior 37 (2014) 94–100
[Bar charts comparing the computer-frame and human-frame conditions on fear of self-disclosure, impression management, sadness displays, and willingness to disclose.]
  • 167.
It's only a computer: Virtual humans increase willingness to disclose. G.M. Lucas et al. / Computers in Human Behavior 37 (2014) 94–100
"This is way better than talking to a person. I don't really feel comfortable talking about personal stuff to other people."
"A human being would be judgmental. I shared a lot of personal things, and it was because of that."
  • 168.
  • 170.
Can machine learning improve cardiovascular risk prediction? Cohort of 378,256 patients; baseline ACC/AHA risk model compared against machine-learning models: random forest, logistic regression, gradient boosting, neural networks.
  • 171.
Can machine-learning improve cardiovascular risk prediction using routine clinical data? Stephen F. Weng et al., PLoS One 2017
… in a sensitivity of 62.7% and PPV of 17.1%. The random forest algorithm resulted in a net increase of 191 CVD cases from the baseline model, increasing the sensitivity to 65.3% and PPV to 17.8%, while logistic regression resulted in a net increase of 324 CVD cases (sensitivity 67.1%; PPV 18.3%). Gradient boosting machines and neural networks performed best, resulting in a net increase of 354 (sensitivity 67.5%; PPV 18.4%) and 355 (sensitivity 67.5%; PPV 18.4%) CVD cases correctly predicted, respectively. The ACC/AHA baseline model correctly predicted 53,106 non-cases from 75,585 total non-cases, resulting in a specificity of 70.3% and NPV of 95.1%. The net increase in non-cases

Table 3. Top 10 risk factor variables for CVD algorithms, listed in descending order of coefficient effect size (ACC/AHA; logistic regression), weighting (neural networks), or selection frequency (random forest, gradient boosting machines). Algorithms were derived from a training cohort of 295,267 patients. Italics (in the original): protective factors.
• ACC/AHA algorithm (men): Age; Total Cholesterol; HDL Cholesterol; Smoking; Age x Total Cholesterol; Treated Systolic Blood Pressure; Age x Smoking; Age x HDL Cholesterol; Untreated Systolic Blood Pressure; Diabetes
• ACC/AHA algorithm (women): Age; HDL Cholesterol; Total Cholesterol; Smoking; Age x HDL Cholesterol; Age x Total Cholesterol; Treated Systolic Blood Pressure; Untreated Systolic Blood Pressure; Age x Smoking; Diabetes
• ML: Logistic Regression: Ethnicity; Age; SES: Townsend Deprivation Index; Gender; Smoking; Atrial Fibrillation; Chronic Kidney Disease; Rheumatoid Arthritis; Family history of premature CHD; COPD
• ML: Random Forest: Age; Gender; Ethnicity; Smoking; HDL cholesterol; HbA1c; Triglycerides; SES: Townsend Deprivation Index; BMI; Total Cholesterol
• ML: Gradient Boosting Machines: Age; Gender; Ethnicity; Smoking; HDL cholesterol; Triglycerides; Total Cholesterol; HbA1c; Systolic Blood Pressure; SES: Townsend Deprivation Index
• ML: Neural Networks: Atrial Fibrillation; Ethnicity; Oral Corticosteroid Prescribed; Age; Severe Mental Illness; SES: Townsend Deprivation Index; Chronic Kidney Disease; BMI missing; Smoking; Gender
https://doi.org/10.1371/journal.pone.0174944.t003
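For reference, the sensitivity, specificity, PPV, and NPV figures quoted above all come from a 2x2 confusion matrix of predicted versus observed 10-year CVD status. A minimal sketch of that computation is below; the labels are randomly generated stand-ins, not the study's data.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

def classification_summary(y_true, y_pred):
    """Sensitivity, specificity, PPV, and NPV from binary outcome labels."""
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    return {
        "sensitivity": tp / (tp + fn),   # true positives / all actual cases
        "specificity": tn / (tn + fp),   # true negatives / all actual non-cases
        "ppv": tp / (tp + fp),           # precision among predicted cases
        "npv": tn / (tn + fn),           # precision among predicted non-cases
    }

# Illustrative labels only (not the Weng et al. cohort).
rng = np.random.default_rng(1)
y_true = rng.integers(0, 2, size=1000)
y_pred = np.where(rng.random(1000) < 0.8, y_true, 1 - y_true)  # noisy predictions
print(classification_summary(y_true, y_pred))
```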
  • 172.
Can machine-learning improve cardiovascular risk prediction using routine clinical data? Stephen F. Weng et al., PLoS One 2017
… correctly predicted compared to the baseline ACC/AHA model ranged from 191 non-cases for the random forest algorithm to 355 non-cases for the neural networks. Full details of the classification analysis can be found in S2 Table.

Discussion. Compared to an established AHA/ACC risk prediction algorithm, we found all machine-learning algorithms tested were better at identifying individuals who will develop CVD and those who will not. Unlike established approaches to risk prediction, the machine-learning methods used were not limited to a small set of risk factors, and incorporated more pre-existing …

Table 4. Performance of the machine-learning (ML) algorithms predicting 10-year cardiovascular disease (CVD) risk, derived from applying the training algorithms to the validation cohort of 82,989 patients. Higher c-statistics indicate better algorithm discrimination. The baseline (BL) ACC/AHA 10-year risk prediction algorithm is provided for comparative purposes.

Algorithm                        AUC c-statistic   SE*     95% CI          Absolute change from baseline
BL: ACC/AHA                      0.728             0.002   0.723-0.735     (baseline)
ML: Random Forest                0.745             0.003   0.739-0.750     +1.7%
ML: Logistic Regression          0.760             0.003   0.755-0.766     +3.2%
ML: Gradient Boosting Machines   0.761             0.002   0.755-0.766     +3.3%
ML: Neural Networks              0.764             0.002   0.759-0.769     +3.6%

*Standard error estimated by the jack-knife procedure [30].
https://doi.org/10.1371/journal.pone.0174944.t004
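As a rough illustration of the comparison in Table 4, the sketch below fits the same four model families with scikit-learn and reports validation AUCs. The synthetic feature matrix, the sample sizes, and the feature-restricted "baseline" model are stand-ins; they are not the study's risk-factor variables or the ACC/AHA equations.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for a risk-factor matrix with a rare outcome.
X, y = make_classification(n_samples=10_000, n_features=30, n_informative=10,
                           weights=[0.9, 0.1], random_state=0)
X_tr, X_va, y_tr, y_va = train_test_split(X, y, test_size=0.25,
                                          stratify=y, random_state=0)

models = {
    "baseline (few features)": make_pipeline(StandardScaler(),
                                             LogisticRegression(max_iter=1000)),
    "ML: random forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "ML: logistic regression": make_pipeline(StandardScaler(),
                                             LogisticRegression(max_iter=1000)),
    "ML: gradient boosting": GradientBoostingClassifier(random_state=0),
    "ML: neural network": make_pipeline(
        StandardScaler(),
        MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0)),
}

for name, model in models.items():
    cols = slice(0, 5) if "baseline" in name else slice(None)  # baseline sees fewer features
    model.fit(X_tr[:, cols], y_tr)
    auc = roc_auc_score(y_va, model.predict_proba(X_va[:, cols])[:, 1])
    print(f"{name:26s} AUC = {auc:.3f}")
```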
  • 173.
Figure 1 (overview of the C-Path pipeline):
(A) Basic image processing and feature construction: H&E image; image broken into superpixels; nuclei identified within each superpixel.
(B) Building an epithelial/stromal classifier: epithelial vs. stroma classifier.
(C) Constructing higher-level contextual/relational features: relationships between epithelial nuclear neighbors; relationships between morphologically regular and irregular nuclei; relationships between epithelial and stromal objects; relationships between epithelial nuclei and cytoplasm; characteristics of stromal nuclei and stromal matrix; characteristics of epithelial nuclei and epithelial cytoplasm; relationships of contiguous epithelial regions with underlying nuclear objects.
(D) Learning an image-based model to predict survival: processed images from patients alive at 5 years and from patients deceased at 5 years; L1-regularized logistic regression model building; 5YS predictive model applied to unlabeled images to give P(survival) over time; identification of novel prognostically important morphologic features.

TMAs contain 0.6-mm-diameter cores (median of two cores per case) that represent only a small sample of the full tumor. We acquired data from two separate and independent cohorts: Netherlands Cancer Institute (NKI; 248 patients) and Vancouver General Hospital (VGH; 328 patients). Unlike previous work in cancer morphometry (18–21), our image analysis pipeline was not limited to a predefined set of morphometric features selected by pathologists. Rather, C-Path measures an extensive, quantitative feature set from the breast cancer epithelium and the stroma (Fig. 1). Our image processing system first performed an automated, hierarchical scene segmentation that generated thousands of measurements, including both standard morphometric descriptors of image objects and higher-level contextual, relational, and global image features. The pipeline consisted of three stages (Fig. 1, A to C, and tables S8 and S9). First, we used a set of processing steps to separate the tissue from the background, partition the image into small regions of coherent appearance known as superpixels, find nuclei within the superpixels, and construct …

Figure 1 legend (basic cellular morphologic properties): epithelial regular nuclei = red; epithelial atypical nuclei = pale blue; epithelial cytoplasm = purple; stromal matrix = green; stromal round nuclei = dark green; stromal spindled nuclei = teal blue; unclassified regions = dark gray; spindled nuclei in unclassified regions = yellow; round nuclei in unclassified regions = gray; background = white. (Left panel) After the classification of each image object, a rich feature set is constructed. (D) Learning an image-based model to predict survival.
Processed images from patients alive at 5 years after surgery and from patients deceased at 5 years after surgery were used to construct an image-based prognostic model. After construction of the model, it was applied to a test set of breast cancer images (not used in model building) to classify patients as being at high or low risk of death by 5 years.
Digital Pathologist. Sci Transl Med. 2011 Nov 9;3(108):108ra113
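Panel D describes building the prognostic model by L1-regularized logistic regression over a very large set of image-derived features. Below is a minimal, hedged sketch of that modelling step in scikit-learn; the synthetic features stand in for the C-Path morphometric measurements, and the labels (deceased vs. alive at 5 years) are simulated, not the NKI/VGH data.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegressionCV
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in: thousands of morphometric features per patient;
# y = 1 if deceased at 5 years after surgery, 0 if alive (simulated labels).
X, y = make_classification(n_samples=500, n_features=2000, n_informative=40,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# The L1 penalty drives most coefficients to exactly zero, giving a sparse
# 5-year-survival (5YS) model; the regularization strength is cross-validated.
model = make_pipeline(
    StandardScaler(),
    LogisticRegressionCV(Cs=10, penalty="l1", solver="liblinear", cv=5,
                         scoring="roc_auc", max_iter=5000),
)
model.fit(X_tr, y_tr)

coefs = model.named_steps["logisticregressioncv"].coef_.ravel()
print("non-zero features:", np.count_nonzero(coefs), "of", coefs.size)
print("held-out AUC:",
      round(roc_auc_score(y_te, model.predict_proba(X_te)[:, 1]), 3))
```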
  • 174.
Digital Pathologist. Sci Transl Med. 2011 Nov 9;3(108):108ra113
Top stromal features associated with survival.
… primarily characterizing epithelial nuclear characteristics, such as size, color, and texture (21, 36). In contrast, after initial filtering of images to ensure high-quality TMA images and training of the C-Path models using expert-derived image annotations (epithelium and stroma labels to build the epithelial-stromal classifier, and survival time and survival status to build the prognostic model), our image analysis system is automated with no manual steps, which greatly increases its scalability. Additionally, in contrast to previous approaches, our system measures thousands of morphologic descriptors of diverse … identification of prognostic features whose significance was not previously recognized. Using our system, we built an image-based prognostic model on the NKI data set and showed that in this patient cohort the model was a strong predictor of survival and provided significant additional prognostic information to clinical, molecular, and pathological prognostic factors in a multivariate model. We also demonstrated that the image-based prognostic model, built using the NKI data set, is a strong prognostic factor on another, independent data set with very different …

Fig. 5. Top epithelial features. The eight panels in the figure (A to H) each show one of the top-ranking epithelial features from the bootstrap analysis. Left panels, improved prognosis; right panels, worse prognosis. (A) SD of the (SD of intensity/mean intensity) for pixels within a ring of the center of epithelial nuclei. Left, relatively consistent nuclear intensity pattern (low score); right, great nuclear intensity diversity (high score). (B) Sum of the number of unclassified objects. Red, epithelial regions; green, stromal regions; no overlaid color, unclassified region. Left, few unclassified objects (low score); right, higher number of unclassified objects (high score). (C) SD of the maximum blue pixel value for atypical epithelial nuclei. Left, high score; right, low score. (D) Maximum distance between atypical epithelial nuclei. Left, high score; right, low score. (Insets) Red, atypical epithelial nuclei; black, typical epithelial nuclei. (E) Minimum elliptic fit of epithelial contiguous regions. Left, high score; right, low score. (F) SD of distance between epithelial cytoplasmic and nuclear objects. Left, high score; right, low score. (G) Average border between epithelial cytoplasmic objects. Left, high score; right, low score. (H) Maximum value of the minimum green pixel intensity value in epithelial contiguous regions. Left, low score indicating black pixels within epithelial region; right, higher score indicating presence of epithelial regions lacking black pixels.
… and stromal matrix throughout the image, with thin cords of epithelial cells infiltrating through stroma across the image, so that each stromal matrix region borders a relatively constant proportion of epithelial and stromal regions. The stromal feature with the second largest coefficient (Fig. 4B) was the sum of the minimum green intensity value of stromal-contiguous regions. This feature received a value of zero when stromal regions contained dark pixels (such as inflammatory nuclei). The feature received a positive value when stromal objects were devoid of dark pixels. This feature provided information about the relationship between stromal cellular composition and prognosis and suggested that the presence of inflammatory cells in the stroma is associated with poor prognosis, a finding consistent with previous observations (32). The third most significant stromal feature (Fig. 4C) was a measure of the relative border between spindled stromal nuclei and round stromal nuclei, with an increased relative border of spindled stromal nuclei to round stromal nuclei associated with worse overall survival. Although the biological underpinning of this morphologic feature is currently not known, this analysis suggested that spatial relationships between different populations of stromal cell types are associated with breast cancer progression.

Reproducibility of C-Path 5YS model predictions on samples with multiple TMA cores. For the C-Path 5YS model (which was trained on the full NKI data set), we assessed the intrapatient agreement of model predictions when predictions were made separately on each image contributed by patients in the VGH data set. For the 190 VGH patients who contributed two images with complete image data, the binary predictions (high or low risk) on the individual images agreed with each other for 69% (131 of 190) of the cases and agreed with the prediction on the averaged data for 84% (319 of 380) of the images. Using the continuous prediction score (which ranged from 0 to 100), the median of the absolute difference in prediction score among the patients with replicate images was 5%, and the Spearman correlation among replicates was 0.27 (P = 0.0002) (fig. S3). This degree of intrapatient agreement is only moderate, and these findings suggest significant intrapatient tumor heterogeneity, which is a cardinal feature of breast carcinomas (33–35). Qualitative visual inspection of images receiving discordant scores suggested that intrapatient variability in both the epithelial and the stromal components is likely to contribute to discordant scores for the individual images. These differences appeared to relate both to the proportions of the epithelium and stroma and to the appearance of the epithelium and stroma. Last, we sought to analyze whether survival predictions were more accurate on the VGH cases that contributed multiple cores compared to the cases that contributed only a single core. This analysis showed that the C-Path 5YS model had significantly improved prognostic prediction accuracy on the VGH cases for which we had multiple images compared to the cases that contributed only a single image (Fig. 7). Together, these findings show a significant degree of intrapatient variability and indicate that increased tumor sampling is associated with improved model performance.
[Fig. 4 panels (A to C): heat map of stromal matrix objects' mean absolute difference to neighbors; H&E image separated into epithelial and stromal objects; each panel contrasting worse vs. improved prognosis.]
Fig. 4. Top stromal features associated with survival. (A) Variability in absolute difference in intensity between stromal matrix regions and neighbors. Top panel, high score (24.1); bottom panel, low score (10.5). (Insets) Top panel, high score; bottom panel, low score. Right panels, stromal matrix objects colored blue (low), green (medium), or white (high) according to each object's absolute difference in intensity to neighbors. (B) Presence …
  • 175.
PRECISION MEDICINE
Identification of type 2 diabetes subgroups through topological analysis of patient similarity
Li Li,1 Wei-Yi Cheng,1 Benjamin S. Glicksberg,1 Omri Gottesman,2 Ronald Tamler,3 Rong Chen,1 Erwin P. Bottinger,2 Joel T. Dudley1,4*

Type 2 diabetes (T2D) is a heterogeneous complex disease affecting more than 29 million Americans alone, with a rising prevalence trending toward steady increases in the coming decades. Thus, there is a pressing clinical need to improve early prevention and clinical management of T2D and its complications. Clinicians have understood that patients who carry the T2D diagnosis have a variety of phenotypes and susceptibilities to diabetes-related complications. We used a precision medicine approach to characterize the complexity of T2D patient populations based on high-dimensional electronic medical records (EMRs) and genotype data from 11,210 individuals. We successfully identified three distinct subgroups of T2D from topology-based patient-patient networks. Subtype 1 was characterized by the T2D complications diabetic nephropathy and diabetic retinopathy; subtype 2 was enriched for cancer malignancy and cardiovascular diseases; and subtype 3 was associated most strongly with cardiovascular diseases, neurological diseases, allergies, and HIV infections. We performed a genetic association analysis of the emergent T2D subtypes to identify subtype-specific genetic markers and identified 1279, 1227, and 1338 single-nucleotide polymorphisms (SNPs) that mapped to 425, 322, and 437 unique genes specific to subtypes 1, 2, and 3, respectively. By assessing the human disease–SNP association for each subtype, the enriched phenotypes and biological functions at the gene level for each subtype matched the disease comorbidities and clinical differences that we identified through EMRs. Our approach demonstrates the utility of applying the precision medicine paradigm to T2D and the promise of extending the approach to the study of other complex, multifactorial diseases.

INTRODUCTION. Type 2 diabetes (T2D) is a complex, multifactorial disease that has emerged as an increasingly prevalent worldwide health concern associated with high economic and physiological burdens. An estimated 29.1 million Americans (9.3% of the population) were estimated to have some form of diabetes in 2012 (up 13% from 2010), with T2D representing up to 95% of all diagnosed cases (1, 2). Risk factors for T2D include obesity, family history of diabetes, physical inactivity, ethnicity, and advanced age (1, 2). Diabetes and its complications now rank among the leading causes of death in the United States (2). In fact, diabetes is the leading cause of nontraumatic foot amputation, adult blindness, and need for kidney dialysis, and it multiplies the risk for myocardial infarction, peripheral artery disease, and cerebrovascular disease (3–6). The total estimated direct medical cost attributable to diabetes in the United States in 2012 was $176 billion, with an estimated $76 billion attributable to hospital inpatient care alone. There is a great need to improve understanding of T2D and its complex factors to facilitate prevention, early detection, and improvements in clinical management. A more precise characterization of T2D patient populations can enhance our understanding of T2D pathophysiology (7, 8). Current clinical definitions classify diabetes into three major subtypes: type 1 diabetes (T1D), T2D, and maturity-onset diabetes of the young.
Other subtypes based on phenotype bridge the gap between T1D and T2D, for example, latent autoimmune diabetes in adults (LADA) (7) and ketosis-prone T2D. The current categories indicate that the traditional definition of diabetes, especially T2D, might comprise additional subtypes with distinct clinical characteristics. A recent analysis of the longitudinal Whitehall II cohort study demonstrated improved assessment of cardiovascular risks when subgrouping T2D patients according to glucose concentration criteria (9). Genetic association studies reveal that the genetic architecture of T2D is profoundly complex (10–12). Identified T2D-associated risk variants exhibit allelic heterogeneity and directional differentiation among populations (13, 14). The apparent clinical and genetic complexity and heterogeneity of T2D patient populations suggest that there are opportunities to refine the current, predominantly symptom-based, definition of T2D into additional subtypes (7). Because etiological and pathophysiological differences exist among T2D patients, we hypothesize that a data-driven analysis of a clinical population could identify new T2D subtypes and factors. Here, we develop a data-driven, topology-based approach to (i) map the complexity of patient populations using clinical data from electronic medical records (EMRs) and (ii) identify new, emergent T2D patient subgroups with subtype-specific clinical and genetic characteristics. We apply this approach to a data set comprising matched EMRs and genotype data from more than 11,000 individuals. Topological analysis of these data revealed three distinct T2D subtypes that exhibited distinct patterns of clinical characteristics and disease comorbidities. Further, we identified genetic markers associated with each T2D subtype and performed gene- and pathway-level analysis of subtype genetic associations. Biological and phenotypic features enriched in the genetic analysis corroborated clinical disparities observed among subgroups. Our findings suggest that data-driven, topological analysis of patient cohorts has utility in precision medicine efforts to refine our understanding of T2D toward improving patient care.

1 Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, 700 Lexington Ave., New York, NY 10065, USA. 2 Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, One Gustave L. Levy Place, New York, NY 10029, USA. 3 Division of Endocrinology, Diabetes, and Bone Diseases, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA. 4 Department of Health Policy and Research, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA. *Corresponding author. E-mail: [email protected]

… and vision defects (RR, 1.32; range, 1.04 to 1.67), than were the other two subtypes (Table 2A). Patients in subtype 2 (n = 617) were more likely to associate with diseases of cancer of bronchus: lung (RR, 3.76; range, 1.14 to 12.39); malignant neoplasm without specification of site (RR, 3.46; range, 1.23 to 9.70); tuberculosis (RR, 2.93; range, 1.30 to 6.64); coronary atherosclerosis and other heart disease (RR, 1.28; range, 1.01 to 1.61); and other circulatory disease (RR, 1.27; range, 1.02 to 1.58), than were the other two subtypes (Table 2B).
Patients in subtype 3 (n = 1096) were more often diagnosed with HIV infection (RR, 1.92; range, 1.30 to 2.85) and were associated with E codes (that is, external causes of injury care) (RR, 1.84; range, 1.41 to 2.39); aortic and peripheral arterial embolism or thrombosis (RR, 1.79; range, 1.18 to 2.71); hypertension with complications and secondary hypertension (RR, 1.66; range, 1.29 to 2.15); coronary atherosclerosis and other heart disease (RR, 1.41; range, 1.15 to 1.72); allergic reactions (RR, 1.42; range, 1.19 to 1.70); deficiency and other anemia (RR, 1.39; range, 1.14 to 1.68); and screening and history of mental health and substance abuse codes (RR, 1.30; range, 1.07 to 1.58) (Table 2C).

Significant disease–genetic variant enrichments specific to T2D subtypes. We next evaluated the genetic variants significantly associated with each of the three subtypes. Observed genetic associations and gene-level [that is, single-nucleotide polymorphisms (SNPs) mapped to gene-level annotations] enrichments by hypergeometric analysis are considered independent of the …

Fig. 1. Patient and genotype networks. (A) Patient-patient network for topology patterns on 11,210 Biobank patients. Each node represents a single patient or a group of patients with significant similarity based on their clinical features. An edge between nodes indicates that the nodes share patients. Red represents enrichment for patients with a T2D diagnosis, and blue represents non-enrichment for patients with a T2D diagnosis. (B) Patient-patient network for topology patterns on 2551 T2D patients. Each node represents a single patient or a group of patients with significant similarity based on their clinical features. An edge between nodes indicates that the nodes share patients. Red represents enrichment for female patients, and blue represents enrichment for male patients.
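The subgroups above come from a topology-based analysis of a patient-patient similarity network built from EMR features. The sketch below is a much simpler stand-in for that general idea (a k-nearest-neighbour patient similarity graph followed by community detection on synthetic data); it is not the Mapper-style topological data analysis used in the paper, and every feature, parameter, and threshold here is assumed.

```python
import networkx as nx
from sklearn.datasets import make_blobs
from sklearn.neighbors import kneighbors_graph

# Synthetic stand-in for normalized EMR-derived clinical features
# (labs, diagnoses, medications) across a patient cohort.
X, _ = make_blobs(n_samples=600, n_features=20, centers=3, random_state=0)

# k-nearest-neighbour patient-patient similarity graph (cosine distance).
A = kneighbors_graph(X, n_neighbors=10, metric="cosine", mode="connectivity")
G = nx.from_scipy_sparse_array(A)

# Community detection as a crude stand-in for the emergent patient subgroups.
communities = nx.algorithms.community.greedy_modularity_communities(G)
for i, members in enumerate(sorted(communities, key=len, reverse=True)[:3], start=1):
    print(f"subgroup {i}: {len(members)} patients")
```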
  • 176.
Machine Learning for Healthcare (MLFHC) 2018 at Stanford
  • 177.
Machine Learning for Healthcare (MLFHC) 2018 at Stanford
  • 178.
Machine Learning for Healthcare (MLFHC) 2018 at Stanford
  • 179.
Machine Learning for Healthcare (MLFHC) 2018 at Stanford