
Falcão et al. Ann. Intensive Care (2019) 9:18
https://doi.org/10.1186/s13613-019-0488-9

RESEARCH Open Access

The prognostic accuracy evaluation of SAPS 3, SOFA and APACHE II scores for mortality prediction in the surgical ICU: an external validation study and decision-making analysis
Antônio Luis Eiras Falcão1*, Alexandre Guimarães de Almeida Barros1, Angela Alcântara Magnani Bezerra1,
Natália Lopes Ferreira1, Claudinéia Muterle Logato1, Filipa Pais Silva2, Ana Beatriz Francioso Oliveira do Monte1,
Rodrigo Marques Tonella1, Luciana Castilho de Figueiredo1, Rui Moreno2, Desanka Dragosavac1
and Nelson Adami Andreollo1

Abstract
Background: The early postoperative period is critical for surgical patients. SOFA, SAPS 3 and APACHE II are prognostic scores widely used to predict mortality in ICU patients. This study aimed to evaluate the prognostic accuracy of these index tests for intra-ICU and in-hospital mortality as target conditions in patients admitted to the ICU after urgent or elective surgery, and to test whether they aid decision-making. The process comprised the assessment of discrimination, through analysis of the areas under the receiver operating characteristic curves (AUROCs), and of calibration of the prognostic models for the target conditions. Afterwards, the clinical relevance of applying them was evaluated by measuring the net benefit of their use in clinical decisions.
Results: The index tests showed fair to good discrimination for both target conditions, with poor calibration (C statistics, intra-ICU mortality AUROCs: APACHE II 0.808, SAPS 3 0.821 and SOFA 0.797; in-hospital mortality AUROCs: APACHE II 0.772, SAPS 3 0.790 and SOFA 0.742). Calibration assessment revealed weak agreement between the observed and expected numbers of cases at several thresholds of risk calculated by each model, for both tested outcomes. The net benefit analysis showed that all scores added value to the clinical decision when the calculated probabilities of death ranged between 10 and 40%.
Conclusions: In this study, we observed that the tested ICU prognostic scores are fair tools for intra-ICU and in-hospital mortality prediction in a cohort of postoperative surgical patients. They may also have some potential to be used as ancillary data to support decision-making by physicians and families regarding the level of therapeutic investment and palliative care.
Keywords: Prognostic scores, Critical care, Surgical intensive care unit

*Correspondence: [email protected]
1 Intensive Care Unit, Discipline of Physiology and Surgical Metabology, Department of Surgery, Faculty of Medical Sciences, State University of Campinas (Unicamp), Tessália Viera de Camargo St. 126, University Town Zeferino Vaz, Campinas, São Paulo 13083-887, Brazil
Full list of author information is available at the end of the article

© The Author(s) 2019. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

Background

Surgical procedures continue to evolve, and patients with advanced age, frailty, and comorbidities are exposed to interventions with different levels of invasiveness, complexity, morbidity, and mortality. Proposed classification systems grade complications of those procedures from simple symptomatic situations to conditions requiring surgical, endoscopic or radiological reintervention and life-threatening organ failure [1, 2]. Therefore, admission to the ICU for postoperative recovery is common for surgical patients [1, 2]. Nevertheless, admission to the ICU is associated with potentially harmful situations such as invasive monitoring and painful procedures [3]. Thus, a precise evaluation of the initial clinical condition, the type of procedure, and the final operative status is necessary to inform patients and physicians about the risk of complications and poor outcomes and to help tailor proportional therapeutic efforts.

Among many proposed prediction scores, Sequential Organ Failure Assessment (SOFA), Simplified Acute Physiology Score 3 (SAPS 3) and Acute Physiology and Chronic Health Evaluation II (APACHE II) are prognostic models that use clinical and laboratory variables to predict in-hospital mortality [4–8]. APACHE II and SAPS 3 were derived from cohorts of general ICU patients, while a consensus panel proposed SOFA as an organ dysfunction measurement score. Their performance has been extensively assessed in several population subgroups, including mixed surgical–medical patients, post-cardiovascular surgical patients, and oncologic patients, with heterogeneous results [9–12]. Therefore, external validation remains essential to evaluate their accuracy in new population subgroups and in different settings of care over time.

Moreover, traditional statistical methods use metrics based on sensitivity and specificity to assess a prediction model's accuracy. However, the relationship between measured accuracy and clinical usefulness is a gray zone [13, 14]. The decision analysis approach is an alternative for evaluating the clinical significance of applying those models and provides insight into the clinical consequences of using them [13, 14]. This strategy has been used successfully to test the net benefit of using SAPS II in end-of-life care decisions and to evaluate the net benefit of a new model based on CURB-65 and C-reactive protein to guide decision-making in ICU-admitted patients [15, 16].

This study aimed to validate and compare the performance of SOFA, SAPS 3 and APACHE II for intra-ICU and in-hospital mortality as the target conditions in a cohort of mixed surgical patients admitted to the ICU for postoperative recovery, and to test whether they aid clinical decision-making.

Methods

This study was a prospectively defined analysis of a registry-based validation cohort, gathered from patients consecutively admitted to a surgical ICU of a tertiary university hospital in Brazil, from January 1, 2013, to December 31, 2016. Our electronic database is continuously fed with predefined clinical and laboratory information from every patient admitted to our surgical ICU. Patients were followed daily during their ICU stay and then tracked for their final hospital status as discharged or deceased. The target condition of interest was death from any cause in the ICU or hospital. Variables, coefficients, and equations used to calculate the index tests (SOFA, APACHE II, and SAPS 3) were based on the original publications without any adjustment or updating and are available upon request [4–6, 8]. APACHE II, SAPS 3 and SOFA scores were calculated after the first day of ICU admission using data collected in the prespecified time frame. This study was a registry-based data analysis with outcomes and predictors available before the beginning of any form of statistical analysis. Therefore, blinding of outcomes or predictors was not employed. We followed the recommendations for validation studies of the Standards for Reporting Diagnostic Accuracy (STARD) statement and the Transparent Reporting of a multivariable prediction model for Individual Prognosis or Diagnosis (TRIPOD) statement (Additional file 1: Figure S1) [17, 18].

We did not perform any formal sample size calculation and evaluated all patients available in our database for enrollment. However, considering that more than 100 events were observed for intra-ICU mortality and more than 250 events for in-hospital mortality, we believe that our sample size is satisfactory.

Eligibility criteria for study enrollment were age 18 or above and admission to the surgical ICU for postoperative recovery after an elective or urgent surgical procedure. Patient data were excluded only if the target condition information was missing. Notably, no patients were excluded after application of the eligibility criteria. Our eligibility criteria were restrictive, allowing enrollment of surgical patients only. These criteria contrast with those of the original development cohorts of SAPS 3 and APACHE II. The SAPS 3 cohort included the first ICU admission of patients aged 16 or more and excluded data from patients lacking information about any admission or discharge variables. The APACHE II cohort consecutively included patients admitted to the ICU for a medical or surgical reason and excluded patients who were missing any admission variable information or who underwent coronary artery bypass graft surgery. These inclusion criteria contrast with our sample, which enrolled patients who underwent any surgical procedure, including those with missing admission data. We handled missing values in predictor variables with multiple imputation. This procedure was performed with SPSS version 22 using a linear regression model. The variables included in the multiple imputation model were intra-ICU and in-hospital mortality, age, sex, type of surgery, and SAPS 3, APACHE II, and SOFA scores. Ten imputed datasets were created, and the sensitivities and specificities of the resulting receiver operating characteristic curves were averaged to generate the final curve used in our results.

Fig. 1 Participant flow diagram

Our ICU provides a mixed model of care with full-time intensivists, nurses, assistants, respiratory therapists, dietitians, and attending physicians. A minimum standardized level of care was provided, consisting of a daily checklist called ABCD-preV (Additional file 2: Table S1) [19], in order to minimize therapeutic variations within the population that could change the probability of the outcome and bias the results.

We evaluated the predictive performance of the index tests in a cohort of general surgical patients by estimating their discrimination and calibration. Discrimination reflects the capacity of a prediction model to differentiate between those who do and do not develop the defined target condition during the study period. To measure discrimination, we used the concordance index (C-index) statistic, calculated as the area under the receiver operating characteristic curve (AUROC), with intra-ICU or in-hospital mortality as the binary endpoints. An AUROC of 0.5 signifies chance, meaning that the predictor under analysis cannot distinguish between a positive and an adverse outcome, while a value of 1 represents perfect discrimination. Discrimination was classified according to AUROC values as follows: 0.90–1 excellent, 0.80–0.90 good, 0.70–0.80 fair, 0.60–0.70 poor and 0.50–0.60 fail [20]. The DeLong method was used to test whether differences between the AUROCs of different models were statistically significant [21]. Calibration reflects how well the intra-ICU and in-hospital mortality predicted by each model agree with the observed outcomes. This relation was shown graphically by clustering patients in tenths of predicted risk according to each model and plotting the expected against the observed number of cases. A smoothed line was drawn over the entire predicted probability range to highlight the observed relationship. A well-calibrated model yields predictions lying along a line with a slope of around 45°. The calibration plot also indicates the magnitude and direction of the model's miscalibration. For statistical analysis of the model's predictive performance, we employed the Hosmer–Lemeshow goodness-of-fit test [22]. With an adequate sample size, p values higher than 0.05 indicate good agreement between the model's predicted probabilities and the observed outcome rates.

Median follow-up was calculated for the intra-ICU and in-hospital periods according to the reverse Kaplan–Meier survival function, in which the event indicator is reversed so that censoring becomes the outcome of interest.

A decision curve analysis was developed to describe and compare the clinical utility of the tested models. Logistic regression was used to convert each model's calculated values into predicted probabilities of death. Patients were defined as high risk if their intra-ICU or in-hospital mortality probabilities were higher than the probability threshold set for the prognostic model. Net benefit for different threshold values of each model was calculated according to Vickers et al. and compared with two default clinical strategies: considering all patients positive for the outcome and treating them all, and considering all patients negative for the outcome and treating none [13, 14].

Statistical analyses were performed using MedCalc version 18 and SPSS version 22. Continuous variables were reported as mean and standard deviation or as median and interquartile range, depending on whether they followed a normal distribution. Categorical variables were presented as count and proportion. Univariate analysis was performed using appropriate tests for continuous and categorical variables to assess association with mortality. Relative risks for mortality were calculated after adjustment for illness severity. This procedure was performed using a case-control matching strategy with severity scores (SOFA, SAPS 3, and APACHE II) as the matching criteria. A two-tailed p value of less than 0.05 was considered statistically significant.
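The net benefit comparison described for the decision curve analysis can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the study's actual analysis: it applies the net benefit formula given in the Fig. 4 caption (true positives per patient minus false positives per patient weighted by pt/(1 − pt)), and the function names and the ten-patient dataset are hypothetical.

```python
def _check_threshold(threshold):
    """The weight pt / (1 - pt) is only defined for 0 < pt < 1."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")


def net_benefit(y_true, y_prob, threshold):
    """Net benefit of acting on patients whose predicted risk exceeds the threshold.

    y_true: 1 if the patient died, 0 otherwise.
    y_prob: predicted probability of death for each patient.
    """
    _check_threshold(threshold)
    n = len(y_true)
    # High risk is defined as a predicted probability higher than the threshold.
    tp = sum(1 for y, p in zip(y_true, y_prob) if p > threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p > threshold and y == 0)
    return tp / n - (fp / n) * (threshold / (1.0 - threshold))


def net_benefit_treat_all(y_true, threshold):
    """Net benefit of the default strategy that treats every patient as positive."""
    _check_threshold(threshold)
    n = len(y_true)
    events = sum(y_true)
    return events / n - ((n - events) / n) * (threshold / (1.0 - threshold))


# Hypothetical toy data, for illustration only (not taken from the study).
y_true = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]
y_prob = [0.80, 0.10, 0.30, 0.60, 0.05, 0.20, 0.40, 0.35, 0.15, 0.02]

for pt in (0.10, 0.25, 0.40):
    model_nb = net_benefit(y_true, y_prob, pt)
    all_nb = net_benefit_treat_all(y_true, pt)
    # The "treat none" strategy has a net benefit of zero at every threshold.
    print(f"pt={pt:.2f}  model={model_nb:.3f}  treat all={all_nb:.3f}  treat none=0.000")
```

Plotting these three quantities over a grid of thresholds gives a decision curve; a model is useful at a given threshold only where its net benefit exceeds both default strategies.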

Results

We assessed an initial population of 3568 patients and selected 3008 patients for further analysis according to our eligibility criteria (Fig. 1). The main reason for exclusion was ICU admission motivated by a medical reason not related to a surgical procedure. All patients assessed had their outcomes available, and no further exclusion was necessary. APACHE II, SAPS 3, and SOFA were calculated at the appropriate time points, and patients were followed until death or hospital discharge. APACHE II data were missing in 206 patients and were estimated using multiple imputation. The demographic and clinical features of the analyzed population are summarized in Tables 1 and 2 and Additional file 3: Figure S2. In-hospital and intra-ICU mortality rates were 8.91% and 5.42%, respectively, during the evaluated period. Median follow-up was 12 days for in-hospital length of stay and three days for intra-ICU length of stay. Mechanical ventilation was associated with the highest relative risk for ICU mortality [RR 3.97 (95% CI 1.59–9.95)].

C-index statistics were calculated for each prognostic model with intra-ICU and in-hospital mortality as dependent target conditions (Table 3). The following AUROCs were obtained with intra-ICU mortality as the outcome: APACHE II 0.808 (95% CI 0.794–0.822), SAPS 3 0.821 (95% CI 0.807–0.835), and SOFA 0.797 (95% CI 0.783–0.812). For in-hospital mortality, the following AUROCs were observed: APACHE II 0.772 (95% CI 0.757–0.787), SAPS 3 0.790 (95% CI 0.775–0.804), and SOFA 0.742 (95% CI 0.726–0.758). Pairwise comparison among the prognostic models showed no significant difference between them, except for the difference between the SAPS 3 and SOFA AUROCs, which could not be explained by chance when in-hospital mortality was the target condition (Table 4; Fig. 2).
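The AUROCs reported above can be read as concordance probabilities: the chance that a randomly chosen non-survivor has a higher score than a randomly chosen survivor. The sketch below illustrates that reading with a simple pairwise count, assuming ties count as one half; it is not the MedCalc or SPSS routine used in the study, and the function names and toy scores are hypothetical. The classification bands are those listed in the Methods; treating each band's lower bound as inclusive is an assumption, since the published bands share their boundaries.

```python
def auroc(died, scores):
    """Concordance probability: P(score of a non-survivor > score of a survivor).

    died:   1 if the patient died, 0 otherwise.
    scores: severity score (or predicted probability) for each patient.
    Ties between a non-survivor and a survivor count as half a concordant pair.
    """
    pos = [s for d, s in zip(died, scores) if d == 1]
    neg = [s for d, s in zip(died, scores) if d == 0]
    if not pos or not neg:
        raise ValueError("both outcome classes are required")
    concordant = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                concordant += 1.0
            elif p == q:
                concordant += 0.5
    return concordant / (len(pos) * len(neg))


def classify_discrimination(auc):
    """Bands from the Methods; lower bounds treated as inclusive (assumption)."""
    if auc >= 0.90:
        return "excellent"
    if auc >= 0.80:
        return "good"
    if auc >= 0.70:
        return "fair"
    if auc >= 0.60:
        return "poor"
    return "fail"


# Hypothetical toy data: six patients, two of whom died.
died = [0, 0, 0, 0, 1, 1]
score = [2, 3, 3, 6, 5, 9]
print(auroc(died, score), classify_discrimination(auroc(died, score)))

# Applying the bands to AUROCs reported in the text.
for name, value in (("SAPS 3, intra-ICU", 0.821), ("SOFA, in-hospital", 0.742)):
    print(name, classify_discrimination(value))
```

The pairwise loop is quadratic in the number of patients, which is acceptable for a sketch; a rank-based implementation would be preferable for a cohort of several thousand.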

Table 1 Patients' baseline characteristics

Characteristic | Total | Intra-ICU: alive | Intra-ICU: deaths | Intra-ICU: p value | Intra-ICU: relative risk (95% CI) | In-hospital: alive | In-hospital: deaths | In-hospital: p value | In-hospital: relative risk (95% CI)
Age, median (IQR) | 58 (47–67) | 58 (47–67) | 63 (53–70) | < 0.001* | | 57 (46–67) | 63 (54.5–71) | < 0.001* |
Male sex, count (%) | 1798 (59.8) | 1693 | 105 | 0.21** | | 1631 | 167 | 0.37** |
Urgency of the surgical procedure, count (%)
Urgent | 220 (7.3) | 170 | 50 | | | 152 | 68 | |
Elective | 2788 (92.7) | 2675 | 113 | | | 2588 | 200 | |
Preexistent conditions, count (%)
Arterial hypertension | 1537 (51.1) | 1452 | 85 | 0.72** | | 1394 | 143 | 0.40** |
Diabetes mellitus | 634 (21.1) | 604 | 30 | 0.39** | | 570 | 64 | 0.24** |
Alcohol use | 371 (12.3) | 347 | 24 | 0.34** | | 335 | 36 | 0.57** |
Tobacco use | 1085 (36.1) | 1029 | 56 | 0.64** | | 1001 | 84 | 0.09** |
Intra-ICU length of stay, days, median (IQR) | 3 (2–5) | 3 (2–5) | 7 (3–15) | < 0.001* | | | | |
In-hospital length of stay, days, median (IQR) | 12 (8–20) | | | | | 11 (7–19) | 17 (9–34.5) | < 0.001* |
Severity scores, median (IQR)
SOFA | 3 (2–6) | 3 (2–6) | 7 (5–9) | < 0.001* | | 3 (2–6) | 6 (4–9) | < 0.001* |
APACHE II | 12 (9–15) | 11 (8–14) | 17 (13–22) | < 0.001* | | 11 (8–14) | 16 (13–20) | < 0.001* |
SAPS 3 | 36 (28–44) | 36 (28–43) | 52 (43–60) | < 0.001* | | 35 (28–43) | 48 (41–58) | < 0.001* |
Life support therapies
Mechanical ventilation, count (%) | 1491 (49.6) | 1333 | 158 | < 0.01** | 3.97 (1.59–9.95) | 1269 | 222 | < 0.01** | 1.44 (1.07–1.93)
Length of mechanical ventilation, days, median (IQR) | 1 (1–2) | 1 (1–1) | 7 (2–12) | < 0.01* | | 1 (1–1) | 5 (2–11) | < 0.01* |
Renal replacement therapy, count (%) | 143 (4.8) | 93 | 50 | < 0.01** | 1.9 (1.42–2.53) | 78 | 65 | < 0.01** | 1.78 (1.43–2.22)

*Mann–Whitney
**Chi-squared
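The overall mortality rates quoted in the Results can be reproduced from the urgent and elective rows of Table 1. The sketch below is plain arithmetic on those counts; the per-category percentages it prints are crude, unadjusted figures derived here for illustration and are not reported by the authors. Variable and function names are hypothetical.

```python
# Counts copied from the "Urgent" and "Elective" rows of Table 1.
table1 = {
    "urgent": {"n": 220, "icu_deaths": 50, "hospital_deaths": 68},
    "elective": {"n": 2788, "icu_deaths": 113, "hospital_deaths": 200},
}


def pct(events, total):
    """Percentage rounded to two decimals, as in the text."""
    return round(100.0 * events / total, 2)


total_n = sum(row["n"] for row in table1.values())
icu_deaths = sum(row["icu_deaths"] for row in table1.values())
hospital_deaths = sum(row["hospital_deaths"] for row in table1.values())

print("patients analyzed:", total_n)
print("intra-ICU mortality (%):", pct(icu_deaths, total_n))
print("in-hospital mortality (%):", pct(hospital_deaths, total_n))

# Crude (unadjusted) mortality by urgency, derived from the same counts.
for urgency, row in table1.items():
    print(
        urgency,
        "intra-ICU (%):", pct(row["icu_deaths"], row["n"]),
        "in-hospital (%):", pct(row["hospital_deaths"], row["n"]),
    )
```

The event counts obtained this way (163 intra-ICU deaths and 268 in-hospital deaths) also match the sample size statement in the Methods of more than 100 and more than 250 events, respectively.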

Table 2 Type of surgery distribution across patients

Surgical specialties | Number of cases, count (n) | Percent
Head and neck surgery
Tumor | 38 | 1.26
Others | 14 | 0.47
Cardiac surgery
Coronary artery bypass graft | 339 | 11.27
Thoracic aortic aneurysm | 89 | 2.96
Cardiac transplant | 24 | 0.80
Valve replacement | 189 | 6.28
Others | 50 | 1.66
Surgery of esophagus and abdomen
Liver | 67 | 2.23
Liver transplant | 141 | 4.69
Biliary tract | 133 | 4.42
Esophagus and stomach | 177 | 5.88
Colon, rectum, and anus | 195 | 6.48
Others | 4 | 0.13
Neurosurgery
Aneurysm | 105 | 3.49
Epilepsy | 84 | 2.79
Tumor | 317 | 10.54
Spine | 109 | 3.62
Decompressive craniectomy | 23 | 0.76
Ventriculostomy | 23 | 0.76
Others | 60 | 1.99
Thoracic surgery
Tumor | 70 | 2.33
Other | 57 | 1.89
Urology
Kidney transplant | 123 | 4.09
Tumor | 167 | 5.55
Others | 48 | 1.60
Vascular surgery
Abdominal aortic aneurysm | 164 | 5.45
Endarterectomy | 88 | 2.93
Others | 95 | 3.16
Trauma, orthopedic, and ophthalmic surgeries | 15 | 0.50
Total | 3008 | 100

Table 3 Severity scores' area under the receiver operating characteristic (AUROC) curves for hospital and ICU mortalities as outcomes

Severity score | AUROC, in-hospital mortality (95% CI) | AUROC, intra-ICU mortality (95% CI)
APACHE II | 0.772 (0.757–0.787) | 0.808 (0.794–0.822)
SAPS 3 | 0.790 (0.775–0.804) | 0.821 (0.807–0.835)
SOFA | 0.742 (0.726–0.758) | 0.797 (0.783–0.812)

Next, patients were divided into approximately ten similar risk groups, ordered by increasing estimated risk according to each prognostic model, and expected and observed deaths were calculated in each group. Calibration graphs were built by plotting the expected and observed values for each group, and goodness of fit was tested with the Hosmer–Lemeshow statistic (Fig. 3; Table 5). The ratios of observed to expected numbers of deaths in each risk group were also plotted to show the overall fit of the tested models (Fig. 3). In summary, the models were poorly calibrated at the extremes of risk, overestimating intra-ICU mortality and underestimating in-hospital mortality. Based on the Hosmer–Lemeshow goodness-of-fit test, APACHE II and SAPS 3 had p values above 0.05, while the SOFA score showed a p value lower than 0.05, which indicates miscalibration for both outcomes.

Then, we calculated the intra-ICU and in-hospital probabilities of death given by each prognostic model at ICU admission and plotted decision curves to determine how they aid decision-making (Fig. 4). For both target conditions, the net benefit curves of the tested prognostic models were similar regardless of the selected threshold. Although SOFA, SAPS 3, and APACHE II showed diverse discrimination and calibration features, they all showed a positive net benefit in the 10–40% range of death probability. Above this range, the net benefit of using them was no better than treating no patients; below it, no better than treating all patients.

Discussion

In this external validation study, we sought to evaluate the performance of prognostic models in predicting intra-ICU and in-hospital mortality in a cohort of surgical patients admitted to the ICU for postoperative recovery, and tested how they could help in decision-making. The multivariable prognostic models analyzed were applied exactly as originally described, without any adjustment in variable selection or weighting. SAPS 3 and APACHE II were initially developed to predict hospital mortality, while SOFA was initially proposed as a measurement of organ dysfunction and later validated for mortality prediction in different subgroups of patients [4, 5, 8, 23]. In development studies, SAPS 3 and APACHE II scores showed AUROCs of 0.825 and 0.863, respectively. In a recent review of prognostic score performance in low- and middle-income countries, the discrimination of SAPS 3 and APACHE II, evaluated through AUROCs, ranged between 0.7 and 0.9 for intra-ICU and in-hospital mortality as outcomes [24]. It is important to stress that our sample was enrolled in a tertiary university hospital from a high-income region of Brazil and may have features different from low- and middle-income settings that

may preclude extrapolation. To the best of our knowledge, none of the assessed prognostic models had their performance tested in a cohort composed exclusively of surgical patients from different specialties. Our data suggest fair to good discrimination of the tested models, with the best results observed using SAPS 3 for prediction of both target conditions. The APACHE II score was better calibrated for in-hospital mortality prediction than SAPS 3 and SOFA, which tended to underestimate the probability of death in low-risk patients and overestimate it in high-risk patients. For intra-ICU mortality, the scores were poorly calibrated, with SAPS 3 fitting best among them.

Table 4 Pairwise comparison of prediction scores AUROC curves

Severity score | Difference between AUROCs, in-hospital mortality (95% CI) | p value | Difference between AUROCs, intra-ICU mortality (95% CI) | p value
APACHE II versus SOFA | 0.0296 (−0.004 to 0.063) | 0.0840 | 0.0109 (−0.027 to 0.049) | 0.5748
APACHE II versus SAPS 3 | 0.0177 (−0.014 to 0.049) | 0.2686 | 0.0130 (−0.024 to 0.05) | 0.4973
SAPS 3 versus SOFA | 0.0474 (0.013 to 0.082) | 0.0068 | 0.0263 (−0.013 to 0.061) | 0.2050

Fig. 2 Pairwise comparison of the prediction models' receiver operating characteristic (ROC) curves. ROC curves of different severity scores with intra-ICU (a) and in-hospital (b) mortality as the outcome. Green line: APACHE II; blue line: SAPS 3; orange line: SOFA

In contrast to APACHE II and SAPS 3, which use features reflecting chronic conditions, such as the patient's age, to estimate risk, SOFA measures six organ-related variables reflecting mostly acute conditions. In this study, our sample was composed mainly of patients admitted after elective surgical procedures with their baseline conditions optimized. Perhaps SOFA performed poorly because of the lack of correlation between its variables and the target conditions in our setting. It is possible that recalibration of SOFA's variables may improve its accuracy. Moreover, the performance of prognostic scores, especially calibration, deteriorates over time and across different ICUs [25, 26]. Therefore, it is critical to externally validate prognostic scores over time and before their use in new ICUs.

Fig. 3 Prediction models calibration plots. a–f Groups covering the entire predicted intra-ICU (a–c) or in-hospital (d–f) mortality probabilities calculated by each severity score (on the x-axis) plotted against observed frequencies (on the y-axis) (dots linked by the black line). A LOWESS line (red), spanning 75% of local values, was created for each dataset to clarify the relationship between assessed variables and to shed light on the direction and magnitude of model miscalibration across the probability range. g, h The ratios of observed over expected intra-ICU (g) or in-hospital (h) mortality probabilities, calculated by each prediction model (on the y-axis), were plotted against sequential clusters of risk (on the x-axis) to allow direct comparison between severity scores. Linear trend lines were created to aid in comparison. Orange line: APACHE II; black line: SAPS 3; blue line: SOFA

Table 5 Prognostic models' calibration values for hospital and intra-ICU mortalities as outcomes

Severity score | Hospital mortality: Hosmer and Lemeshow test, Chi-squared (DF) | p value | Intra-ICU mortality: Hosmer and Lemeshow test, Chi-squared (DF) | p value
SOFA admission | 18.04 (7) | 0.0118 | 14.98 (7) | 0.0362
SAPS 3 | 10.71 (8) | 0.2189 | 2.02 (8) | 0.9804
APACHE II | 7.89 (8) | 0.4441 | 13.35 (8) | 0.1003

The traditional evaluation of prognostic scores using discrimination and calibration measurements is long established and conventional, but it cannot determine whether it is worth using a particular model as an ancillary tool for decision-making or which model is superior in practice [13, 20]. We calculated the net benefit of the tested models using different thresholds of the risk of death. Although death is a severe final event and false-negative and false-positive results limit the individual applicability of prognostic scores, the benefit of full therapeutic investment in certain patients admitted to the ICU is unclear and may bring additional suffering and unnecessary resource utilization [13, 14]. Our data suggest that APACHE II, SAPS 3, and SOFA calculated at admission may add information to help physicians and patients in decision-making about therapeutic management and palliative care when the calculated predicted risk of death is between 10 and 40%, with no score superior to the others. Although prognostic models may be redundant at the extremes of illness severity, the mortality of patients with low and intermediate levels of risk is difficult to predict, and gathering data from prognostic models may improve decisions about therapeutic management [13, 14, 27]. It is important to stress that no net benefit was observed for patients with high levels of risk for either target condition. The small sample size in this subgroup of patients may have been insufficient for the tested prognostic models to produce a detectable signal.

Fig. 4 Prediction models decision curves. a, b The net benefits of using each prediction model (on the y-axis) plotted for different thresholds of the probability of intra-ICU (a) or in-hospital (b) deaths (on the x-axis). The net benefit was calculated according to the following formula: net benefit = [(true-positive count)/n] − [(false-positive count)/n] × [pt/(1 − pt)], where n is the total number of patients and pt the threshold probability. Two lines representing the net benefit associated with the strategy of assuming all patients survived (no false positives) (black line) and that all patients died (yellow line) were drawn for comparison. Orange line: APACHE II; blue line: SOFA; gray line: SAPS 3

This study has several limitations that must be stressed. Our cohort was derived from a single-center population with inclusion and exclusion criteria that yielded significant differences in demographic and clinical features compared with the original multicentric cohorts used for SAPS 3 and APACHE II development [4, 5, 8]. The SAPS 3 and APACHE II cohorts were composed of mixed clinical and surgical cases, with almost half of the patients having unplanned ICU admissions, which contrasts with our sample, composed exclusively of surgical patients admitted to the ICU for postoperative recovery, mainly after elective surgeries. Patients were also sicker in the original SAPS 3 and APACHE II development cohorts, as illustrated by the number of organ dysfunctions, which was higher than in our cohort. For instance, the median SOFA in the original SAPS 3 development cohort was 9 with an interquartile range of 6–11, while our patients had a median SOFA of 3 with an interquartile range of 2–6 [5, 8]. Although the length of ICU and hospital stay, age, and

comorbidity profiles were similar between our patients and the original SAPS 3 and APACHE II cohorts, comparison of intra-ICU and in-hospital mortality reveals differences in outcome rates [4, 5, 8]. The original SAPS 3 and APACHE II cohorts exhibited a broad spectrum of intra-ICU and in-hospital mortality, with rates ranging between 10 and 30%, while the mortality rates observed in this study were both below 10%. This difference may be explained in part by the differences in cohort composition described above, but also by selection and information bias, which are intrinsic to observational studies [18]. It must also be pointed out that the time elapsed between the assembly of each cohort creates variation in features, such as the therapeutic options available at the time, that have a direct impact on the analyzed outcomes. The SAPS 3 database was built from data of patients admitted to ICUs in multiple countries from October to December 2002, while the APACHE II database recruited patients between 1979 and 1982 in multiple ICUs in the USA [4, 5, 8]. This contrasts with our database, which collected data from patients admitted to one hospital ICU from 2013 to 2016. Differences in the frequency of the tested outcomes are an important feature that may affect the generalizability of the results and conclusions of external validation studies. Comparison of the in-hospital mortality rate observed in this study with those found in comparable cohorts showed similar frequencies [28–30]. The datasets from these studies were derived from elective and non-elective surgical patients in the postoperative period admitted to ICUs of European hospitals with features similar to the tertiary setting from which our data were derived [28–30]. Comparison of our mortality frequencies with data from other Brazilian ICUs revealed similar in-hospital mortality, although the compositions of the cohorts were different [24, 31]. Another limitation was the small size of our cohort, especially in the high-risk subgroup of patients. This fact may account for part of the reasonable accuracy and poor calibration observed for the tested scores and for the absence of net benefit to this subgroup of patients in decision-making.

Conclusions

In conclusion, this study assessed the performance of widely used prognostic scores for predicting death in surgical patients admitted to the ICU for postoperative recovery. The observed results suggest that APACHE II, SAPS 3, and SOFA have fair to good discrimination and poor calibration. Other studies showed similar results in different

population subgroups, none using a cohort with the characteristics of ours. Currently, prognostic scores are used for benchmarking, comparison of performance between ICUs, and standardization of excellence. As previously suggested by others, our data support the view that adopting those prognostic scores without further local external validation and adjustment may be misleading [25, 26]. Another point to be stressed is that, although the tested prognostic scores have a net benefit in predicting death in surgical ICU patients at low and intermediate levels of risk, their performance was deficient when applied to high-risk patients, the subgroup most susceptible to futile care. Therefore, before these scores are adopted as ancillary tools to aid decision-making, improvements in the net benefit obtained with the tested prognostic models, especially at the extremes of illness severity, must be sought. Notably, no prognostic model should be used in isolation to guide decision-making or to replace clinical judgment. Further studies are needed to define the exact role the tested prognostic models may have as part of the decision-making process in the ICU.

Acknowledgements
We are thankful to all members of the intensive care unit of Unicamp's Teaching Hospital and Central Lisbon Hospital Center who contributed to this study.

Competing interests
The authors declare that they have no competing interests.

Availability of data and materials
The datasets used and analyzed during the current study are available from the corresponding author on reasonable request.

Consent for publication
Not applicable.

Ethics approval and consent to participate
The local ethics committee approved this study; Process No. CAAE 75821717.1.0000.5404. This study was observational, and every clinical decision was at the discretion of the attending physician. Therefore, informed consent was waived. The electronic database encrypted patients' identification, and investigators had access only to data relevant to the study.

Funding
This study has not received any financial support from any source.

Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Received: 28 August 2018 Accepted: 12 January 2019

Additional files References


1. Ghaffar S, Pearse RM, Gillies MA. ICU admission after surgery. Curr Opin
Crit Care [Internet]. 2017;1. http://insig​hts.ovid.com/cross​ref?an = 00075​
Additional file 1: Figure S1. STARD 2015 Checklist: Prediction Model
198-90000​0000-99242​.
Validation.
2. Guarracino F, Bertini P. To ICU or not to ICU: tailoring postoperative care in
Additional file 2: Table S1. ABCD-preV checklist. the face of reduced resources and increased morbidity. Minerva Anest-
Additional file 3: Figure S2. Prediction scores distribution frequency. esiol. 2017;83:134–5.
A–F—Patients distribution across severity scores values with intra-ICU 3. Niederman MS, Berger JT. The delivery of futile care is harmful to other
(A, C and E) and in-hospital (B, D and F) mortality as outcomes. Blue bars patients. Crit Care Med. 2010;38:S518–22.
represent survivors and green bars non-survivors. 4. Knaus WA, Draper EA, Wagner DP, Zimmerman JE. APACHE II: a severity of
disease classification system. Crit Care Med [Internet]. 1985;13:818–29.
5. Moreno RP, Metnitz PGH, Almeida E, Jordan B, Bauer P, Campos RA, et al.
Abbreviations SAPS 3—from evaluation of the patient to evaluation of the intensive
SOFA: Sequential Organ Failure Assessment; SAPS 3: Simplified Acute Physiol- care unit. Part 2: development of a prognostic model for hospital mortal-
ogy Score 3; APACHE II: Acute Physiology and Chronic Health Disease Clas- ity at ICU admission. Intensive Care Med [Internet]. 2005;31:1345–55.
sification System II; AUROC: area under the receiver operating characteristic 6. Vincent JL, Moreno R, Takala J, Willatts S, De Mendonça A, Bruining
curve; ICU: intensive care unit. H, et al. The SOFA (Sepsis-related Organ Failure Assessment) score to
describe organ dysfunction/failure. Intensive Care Med. 1996;22:707–10.
Authors’ contributions 7. Minne L, Abu-Hanna A, de Jonge E. Evaluation of SOFA-based models for
ALEF and AGAB conceived and designed the study, analyzed the data, and predicting mortality in the ICU: a systematic review. Crit Care [Internet].
wrote the first and revised version of the manuscript. AAMB, MRB, and FPS 2009;12:R161. https​://doi.org/10.1186/cc716​0.
contributed substantially reviewing data analysis and manuscript. ABFOM, 8. Metnitz PG, Moreno RP, Almeida E, Jordan B, Bauer P, Campos RA, et al.
RMT, and LCF contributed substantially with manuscript writing and revision. SAPS 3—from evaluation of the patient to evaluation of the intensive
RM, DD, and NRA contributed with study design, manuscript writing, and revi- care unit. Part 1: objectives, methods and cohort description. Intensive
sion. NLF contributed substantially with the writing of the revised version of Care Med [Internet]. 2005;31:1336–44.
this manuscript. All authors read and approved the final manuscript. 9. Sakr Y, Krauss C, Amaral ACKB, Réa-Neto A, Specht M, Reinhart K, et al.
Comparison of the performance of SAPS II, SAPS 3, APACHE II, and their
Author details customized prognostic models in a surgical intensive care unit. Br J
1
Intensive Care Unit, Discipline of Physiology and Surgical Metabology, Anaesth. 2008;101:798–803.
Department of Surgery, Faculty of Medical Sciences, State University of Campi- 10. Soares M, Salluh JIF. Validation of the SAPS 3 admission prognostic model
nas (Unicamp), Tessália Viera de Camargo St. 126, University Town Zeferino in patients with cancer in need of intensive care. Intensive Care Med.
Vaz, Campinas, São Paulo 13083‑887, Brazil. 2 Unidade de Cuidados Intensivos 2006;32:1839–44.
Polivalente, Unidade de Cuidados Neurocríticos, Hospital de São José, Centro 11. den Boer S, de Keizer NF, de Jonge E. Performance of prognostic
Hospitalar de Lisboa Central, Lisbon, Portugal. models in critically ill cancer patients—a review. Crit Care [Internet].
2005;9:458–63.
12. Stephens RS, Whitman GJR. Postoperative critical care of the adult cardiac surgical patient. Part I: routine postoperative care. Crit Care Med. 2015;43:1477–97.
13. Vickers AJ, Elkin EB. Decision curve analysis: a novel method for evaluating prediction models. Med Decis Mak. 2006;26:565–74.
14. Vickers AJ. Decision analysis for the evaluation of diagnostic tests, prediction models, and molecular markers. Am Stat. 2008;62:314–20.
15. Allyn J, Ferdynus C, Bohrer M, Dalban C, Valance D, Allou N. Simplified acute physiology score II as predictor of mortality in intensive care units: a decision curve analysis. PLoS ONE. 2016;11:e0164828.
16. Yamamoto S, Yamazaki S, Shimizu T, Takeshima T, Fukuma S, Yamamoto Y, et al. Prognostic utility of serum CRP levels in combination with CURB-65 in patients with clinically suspected sepsis: a decision curve analysis. BMJ Open. 2015;5:e007049.
17. Cohen JF, Korevaar DA, Altman DG, Bruns DE, Gatsonis CA, Hooft L, et al. STARD 2015 guidelines for reporting diagnostic accuracy studies: explanation and elaboration. BMJ Open. 2016;6:e012799.
18. Moons KGM, Altman DG, Reitsma JB, Ioannidis JPA, Macaskill P, Steyerberg EW, et al. Transparent Reporting of a multivariable prediction model for Individual Prognosis or Diagnosis (TRIPOD): explanation and elaboration. Ann Intern Med [Internet]. 2015;162:W1–74.
19. Vincent JL. Give your patient a fast hug (at least) once a day. Crit Care Med. 2005;33:1225–9.
20. Alba AC, Agoritsas T, Walsh M, Hanna S, Iorio A, Devereaux PJ, et al. Discrimination and calibration of clinical prediction models. JAMA [Internet]. 2017;318:1377. https://doi.org/10.1001/jama.2017.12126.
21. DeLong E, DeLong D, Clarke-Pearson D. Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach. JSTOR Biom. 1988;44(3):837–45.
22. Lemeshow S, Hosmer DWJ. A review of goodness of fit statistics for use in the development of logistic regression models. Am J Epidemiol [Internet]. 1982;115:92–106.
23. Vincent JL, Moreno R, Takala J, Willatts S, De Mendonca A, Bruining H, et al. The SOFA (Sepsis-related Organ Failure Assessment) score to describe organ dysfunction/failure. On behalf of the Working Group on Sepsis-Related Problems of the European Society of Intensive Care Medicine. Intensive Care Med [Internet]. 1996;22:707–10.
24. Haniffa R, Isaam I, De Silva AP, Dondorp AM, De Keizer NF. Performance of critical care prognostic scoring systems in low and middle-income countries: a systematic review. Crit Care. 2018;22:18.
25. Salluh JIF, Soares M. ICU severity of illness scores. Curr Opin Crit Care [Internet]. 2014;20:557–65.
26. Vincent J-L, Moreno R, Jordan B, Metnitz P, et al. Clinical review: scoring systems in the critically ill. Crit Care [Internet]. 2010;14:207. https://doi.org/10.1186/cc8204.
27. Schenker Y, White DB, Crowley-Matoka M, Dohan D, Tiver GA, Arnold RM. “It hurts to know… and it helps”: exploring how surrogates in the ICU cope with prognostic information. J Palliat Med. 2013;16:243–9.
28. Pearse RM, Rhodes A, Moreno R, Pelosi P, Spies C, Vallet B, et al. EuSOS: European surgical outcomes study. Eur J Anaesthesiol. 2011;28:454–6.
29. Pearse R, Moreno RP, Bauer P, Pelosi P, Metnitz P, Spies C, et al. Mortality after surgery in Europe: a 7 day cohort study. Lancet. 2012;380:1059–65.
30. Kahan BC, Koulenti D, Arvaniti K, Beavis V, Campbell D, Chan M, et al. Critical care admission following elective surgery was not associated with survival benefit: prospective analysis of data from 27 countries. Intensive Care Med. 2017;43:971–9.
31. Silva Junior JM, Malbouisson LMS, Nuevo HL, Barbosa LGT, Marubayashi LY, Teixeira IC, et al. Applicability of the simplified acute physiology score (SAPS 3) in Brazilian hospitals. Rev Bras Anestesiol. 2010;60:20–31.
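
Illustrative note on net benefit
The Conclusions refer to the net benefit of the tested scores across risk strata. As a reading aid only, the sketch below shows how net benefit at a given threshold probability is commonly computed in decision curve analysis [13, 14]: true positives per patient minus false positives per patient, weighted by the odds of the threshold. It is not the authors' analysis code. The function names and the ten-patient toy data are hypothetical and chosen purely for illustration.

```python
def net_benefit(outcomes, risks, threshold):
    """Net benefit of acting on patients whose predicted risk >= threshold.

    outcomes: list of 0/1 observed outcomes (1 = death)
    risks: list of predicted probabilities, same length as outcomes
    threshold: threshold probability, strictly between 0 and 1
    """
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    if len(outcomes) != len(risks) or not outcomes:
        raise ValueError("outcomes and risks must be non-empty and equal in length")
    n = len(outcomes)
    # Patients flagged as high risk at this threshold, split by observed outcome.
    tp = sum(1 for y, r in zip(outcomes, risks) if r >= threshold and y == 1)
    fp = sum(1 for y, r in zip(outcomes, risks) if r >= threshold and y == 0)
    # False positives are penalized by the odds of the threshold probability.
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


def net_benefit_treat_all(outcomes, threshold):
    """Reference strategy: every patient is treated as high risk."""
    prevalence = sum(outcomes) / len(outcomes)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


# Hypothetical toy cohort (NOT data from this study).
outcomes = [0, 0, 0, 0, 0, 0, 1, 0, 1, 1]
risks = [0.05, 0.10, 0.10, 0.20, 0.30, 0.35, 0.40, 0.60, 0.70, 0.90]

if __name__ == "__main__":
    for pt in (0.1, 0.3, 0.5):
        model = net_benefit(outcomes, risks, pt)
        treat_all = net_benefit_treat_all(outcomes, pt)
        # "Treat none" always has a net benefit of 0 by definition.
        print(f"threshold={pt:.1f} model={model:.3f} treat_all={treat_all:.3f}")
```

A score adds value at a given threshold only when its net benefit exceeds both reference strategies (treat all and treat none), which is the comparison a decision curve displays across the range of thresholds.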
