Predicting Hospital Readmission
using TreeNet™
Robert Aronoff MD
The vision …..

Streamlined sequence of processes


EMR / Clinical Workflow  →  Predictive Model Creation  →  Decision Support at Point of Care

EMR / Clinical Workflow
 Capture data entered as part of routine clinical workflow

Predictive Model Creation
 Automated E-T-L processes
 Machine learning algorithms for target class prediction

Decision Support at Point of Care
 Vendor-‘neutral’ scoring tools:
 - Intranet based
 - JSON serialization
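The vendor-neutral scoring step can be sketched as a simple JSON exchange. This is a hypothetical sketch only: the model name, feature names, and coefficients below are invented, and the placeholder linear score stands in for the fitted TreeNet model.

```python
import json

# Hypothetical shape of a vendor-neutral scoring exchange: the EMR posts
# the model's input features as JSON over the intranet, and the scoring
# service answers with a risk estimate. All names/values are invented.
request = {
    "model": "hf_readmit_30d",
    "features": {"prior_admits": 2, "final_bnp": 820, "icu_days": 1},
}
payload = json.dumps(request)  # serialized form that travels on the wire

def score(raw):
    """Toy scoring endpoint: any EMR that can emit JSON can call it."""
    req = json.loads(raw)
    f = req["features"]
    # Placeholder linear score standing in for the fitted model.
    risk = min(1.0, 0.1 + 0.05 * f["prior_admits"] + 0.0002 * f["final_bnp"])
    return json.dumps({"model": req["model"], "risk_30d": round(risk, 3)})

response = json.loads(score(payload))
```

Because both sides speak plain JSON, neither the EMR vendor nor the modeling tool needs to know the other's internals.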
Agenda / Table of contents

1   Readmission after Heart Failure


2   Data Structure of an Electronic Medical Record


3   TreeNet™ Modeling with our Dataset


4   Lessons Learned and Next Step(s)
PREFACE:
Data Modeling Paradigm




© Salford Systems, 2011
Model Accuracy

[Diagram: model accuracy as a function of Completeness of Set, Feature Selection, Feature Fit, and the Target Class]

Kattan MW. Eur Urol. 2011;59:566-567
Model Error: The Bias – Variance Decomposition




              Prediction Error = Irreducible Error + Bias² + Variance

Hastie T, Tibshirani R, Friedman JH. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. New York, NY: Springer; 2009.
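The decomposition above can be checked numerically. A minimal simulation, assuming a toy setup not from the source (true curve sin(x), Gaussian noise, and a deliberately biased model that always predicts its training sample's mean):

```python
import math
import random

random.seed(0)

# Toy setup (invented for illustration): true curve is sin(x) with
# irreducible Gaussian noise of s.d. 0.3; the model is a deliberately
# biased one that always predicts its training sample's mean.
sigma = 0.3          # s.d. of the irreducible error
x0 = 1.0             # point at which the error is decomposed

# Train many models, each on a fresh noisy 25-point sample, recording
# each model's (constant) prediction at x0.
preds = []
for _ in range(2000):
    ys = [math.sin(random.uniform(0, math.pi)) + random.gauss(0, sigma)
          for _ in range(25)]
    preds.append(sum(ys) / len(ys))

mean_pred = sum(preds) / len(preds)
bias_sq = (mean_pred - math.sin(x0)) ** 2
variance = sum((p - mean_pred) ** 2 for p in preds) / len(preds)
irreducible = sigma ** 2
decomposed = irreducible + bias_sq + variance

# Estimate the expected squared prediction error at x0 directly, pairing
# each trained model with a fresh observation of the target at x0.
direct = sum((math.sin(x0) + random.gauss(0, sigma) - p) ** 2
             for p in preds) / len(preds)
```

The directly estimated prediction error and the sum of the three components agree to within sampling noise, which is the content of the identity on the slide.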
Model caveats



 Association does not prove causality


 Models are retrospective (observational)
 and therefore hypothesis generating (i.e.
 not hypothesis proving)
READMISSION AFTER HEART FAILURE
Congestive Heart Failure


 Common cause for admission.
 Readmission in excess of 23% (Bueno H, et al. JAMA. 2010;303:2141-2147).
 Risk factors for readmission extensively studied.
 Published reviews cite over 120 studies.
 - Methods: logistic regression; Cox proportional hazards
 - C-statistic in the 0.6-0.7 range

 Reduction of readmission has been declared a national goal.
 Improved risk models have the potential to more effectively deploy
  targeted disease management.
EMR data structure

 Data collected for clinical workflow.
 Large volume
 - Multiple observations; repeated measures
 - Many interactions and interdependencies

 Complex dataset
 - Continuous, Ordinal, Nominal (low and high order), Binary
 - ‘High-order variable-dimension nominal variables’

 Missing data:
 - May represent error or practice patterns

 Unbalanced classes
 Outliers and Entry errors
Preliminary Dataset
- 1612 consecutive heart failure discharges abstracted
- 1280 candidate predictors screened
- Target class: Readmission at 30 days (binary)

Administrative candidate predictors
• Admission source, status, service
• Age, gender, race
• Primary/secondary payers
• Primary/secondary diagnoses (names and condition categories)
• Total length of stay, ICU length of stay
• Hospital costs and charges
• Discharge status and disposition
• All-cause same-center admission in preceding year

Clinical candidate predictors
• Specialty medical services consulted
• Specialty ancillary services consulted
• Blood laboratory values
• Medication names / therapeutic classes
• Dosages of medications
• Patient weights during hospitalization
• Transfusions during hospitalization
• Nursing assessments
• Education topics
• Diagnostic tests ordered
• Ordersets utilized

Preliminary Unpublished Data
Benefits of Stochastic Gradient Boosting
             Friedman JH. Stochastic gradient boosting. Computational Statistics and Data Analysis 2002; 38(4):367- 378.



Input and processing
 Does not require data transformation
 Handles large numbers of categorical and continuous variables
 Has mechanisms for:
 - Feature selection
 - Managing missing values
 - Assessing the relationship of predictors to target
 Robust to:
 - Data entry errors, outliers, target misclassification

Output
 High model accuracy
 Classification and regression
 Non-parametric application of logistic, L1, L2, or Huber-M loss functions
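Since TreeNet implements Friedman's stochastic gradient boosting, a bare-bones sketch of the algorithm may help: fit a small tree (here, a one-split stump) to the current residuals of a random subsample, shrink it, and add it to the ensemble. This toy pure-Python version on synthetic 1-D data is illustrative only and is not TreeNet's implementation:

```python
import random

random.seed(0)

# Toy 1-D regression task standing in for the readmission data
# (entirely synthetic, for illustration): y = x^2 + noise.
X = [random.uniform(-1, 1) for _ in range(200)]
Y = [x * x + random.gauss(0, 0.05) for x in X]

def fit_stump(xs, resid):
    """Best single-split regression stump on the residuals (squared error)."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    best = None
    for k in range(1, len(xs)):
        thr = xs[order[k]]
        left = [resid[i] for i in order[:k]]
        right = [resid[i] for i in order[k:]]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((v - lm) ** 2 for v in left) + sum((v - rm) ** 2 for v in right)
        if best is None or sse < best[0]:
            best = (sse, thr, lm, rm)
    return best[1], best[2], best[3]

def boost(xs, ys, n_trees=60, lr=0.1, subsample=0.5):
    """Stochastic gradient boosting: each stump is fit to the current
    residuals of a fresh random subsample, then added with shrinkage."""
    base = sum(ys) / len(ys)
    pred = [base] * len(xs)
    stumps = []
    m = int(subsample * len(xs))
    for _ in range(n_trees):
        idx = random.sample(range(len(xs)), m)   # the "stochastic" part
        thr, lv, rv = fit_stump([xs[i] for i in idx],
                                [ys[i] - pred[i] for i in idx])
        stumps.append((thr, lr * lv, lr * rv))
        for i, x in enumerate(xs):               # update running predictions
            pred[i] += lr * lv if x < thr else lr * rv
    return base, stumps

def predict(model, x):
    base, stumps = model
    return base + sum(lv if x < thr else rv for thr, lv, rv in stumps)

model = boost(X, Y)
mse = sum((predict(model, x) - y) ** 2 for x, y in zip(X, Y)) / len(X)
base_mse = sum((sum(Y) / len(Y) - y) ** 2 for y in Y) / len(Y)
```

Even this crude version cuts the mean squared error well below that of the constant baseline, and the subsampling plus shrinkage are what give the method its robustness to outliers and misclassified targets.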
TreeNet™ Modeling with our Dataset


1   Parameters of ‘feature fit’

2   Parameters of ‘feature selection’

3   Elements of insight

4   Putting it all together
TreeNet – parameters of ‘feature fit’

     Do not forget the manual …..
Feature selection – variable importance

Variable Importance Calculation

 Squared relative importance is the sum of squared improvements (in squared-error risk) over all internal nodes for which the variable was chosen as the splitting variable.
 Measurements are relative; it is customary to set the largest to 100 and scale the other variables accordingly.
Insight into the model
Illuminating the ‘black box’ with partial dependence




   Preliminary Unpublished Data
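Partial dependence itself is simple to compute: pin the feature of interest to each grid value and average the model's predictions over the observed values of the other features. A sketch using an invented stand-in scoring function (the coefficients and data rows are not from the model):

```python
# Stand-in for the fitted model (coefficients invented for illustration):
# risk rises with prior-year admissions and falls as systolic BP rises.
def model_score(prior_admits, systolic_bp):
    return 0.5 + 0.1 * prior_admits - 0.002 * systolic_bp

# Toy covariate rows: (prior_admits, systolic_bp).
rows = [(0, 145), (2, 110), (1, 130), (4, 95), (0, 150), (3, 105)]

def partial_dependence(feature_index, grid):
    """Pin one feature to each grid value and average the model's output
    over the observed values of all the other features."""
    curve = []
    for g in grid:
        total = 0.0
        for row in rows:
            args = list(row)
            args[feature_index] = g      # pin the feature of interest
            total += model_score(*args)
        curve.append(total / len(rows))
    return curve

pd_admits = partial_dependence(0, [0, 1, 2, 3, 4])
```

Plotting the resulting curve is what "illuminates the black box": it shows the marginal shape of each predictor's effect without assuming a parametric form.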
Approach to feature selection

Domain ‘Neutral’ vs. Domain ‘Centric’


Domain Neutral
 Start with a subset based on univariate significance (i.e., P-value below a given level) or variance above a given threshold

Both
 Know your data
 Univariate stats
 Application of variable importance
 Screening with batteries
 Forward and backward stepwise progression

Domain Centric
 Use all potential predictors
 Use knowledge of target and predictors to make decisions on inclusion (or rejection) of predictors
Model Variability
Establishing AUC precision and accuracy

Variation
 The model is fit via a sampling (i.e., stochastic) process.

Accuracy / Precision
 S.E.M. = S.D. / sqrt(N)
 Precision (95%) ≈ 4 × S.E.M.

Example with S.D. = 0.03:

N trials          10      30      300
S.E.M.            .0095   .0055   .0017
Precision (95%)   .038    .022    .007
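The arithmetic behind the table is a one-liner; the sketch below reproduces the slide's rows for S.D. = 0.03:

```python
import math

def auc_precision(sd, n):
    """SEM of a mean AUC estimated from n repeated runs, plus the
    approximate full width of its 95% interval (4 x SEM, i.e., +/- 2 SEM)."""
    sem = sd / math.sqrt(n)
    return sem, 4 * sem

# Reproduce the slide's table for S.D. = 0.03.
table = {n: auc_precision(0.03, n) for n in (10, 30, 300)}
```

The practical point: multiplying the number of trials by 30 shrinks the interval from about .038 to about .007, which changes which AUC differences are distinguishable.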
Precision and predictor selection
[Chart: test ROC (AUC) across forward-selection steps, from STEP_1 (197) (0.531) to STEP_66 (737) (0.703). Min = 0.5057, Median = 0.6738, Mean = 0.6500, Max = 0.7034]
 AUC estimated using CV-10 (N = 10 trials) → SEM .0095 and precision (95%) of .038
 Repeating CV-10 (using CVR battery) 30 times (N = 300 trials) → SEM .0017 and precision (95%) of .007
 Profound implication on dimensionality of model achievable without
  domain knowledge input.
How much of a change in AUC is clinically relevant?

Gain Curve complements ROC curve




 Preliminary Unpublished
           Data
Useful batteries for feature selection
Methods of forward and backward selection




STEPWISE
 Testing set to CV-10
 Select predictors 1-2 at a time
 Confirm with CVR battery

SHAVING
BUILDING A MODEL

This is a multi-step process.

Step 1: Run the model with all candidate predictors. Select the N highest-importance predictors (N = 2-3 × final size).
Step 2: Run batteries to assess parameters of feature ‘fit’. Assess model (AUC) variability. Repeat as needed throughout the process.
Step 3: Use backward and forward selection to reduce the preliminary model to a core of 5-15 predictors.**
Step 4: Re-examine discarded predictors in smaller groups, using backward and forward selection.**
Step 5: Review predictors and use domain knowledge to eliminate redundant (dependent) predictors and consider predictors of known value.**

** Each change confirmed with CVR (30 reps). Review partial dependence plots.
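The forward-selection loop can be sketched as follows. Everything here is invented for illustration: the candidate predictors, their AUC contributions, and the stand-in cv_auc function; in practice the STEPWISE and CVR batteries do this work against real models.

```python
import random
import statistics

random.seed(1)

# Hypothetical "signal" each candidate predictor adds to AUC; the names
# and values are invented so the selection loop has something to find.
SIGNAL = {"prior_admits": 0.06, "bnp": 0.05, "bun_cr": 0.03,
          "age": 0.005, "gender": 0.002}

def cv_auc(features, reps=30):
    """Stand-in for a repeated cross-validated AUC estimate: base AUC plus
    each included feature's signal, with run-to-run sampling noise."""
    true_mean = 0.55 + sum(SIGNAL[f] for f in features)
    runs = [true_mean + random.gauss(0, 0.003) for _ in range(reps)]
    return statistics.mean(runs), statistics.stdev(runs) / reps ** 0.5

def forward_select(candidates, min_gain=0.01):
    """Greedy forward selection: keep adding the predictor with the best
    repeated-CV AUC until the gain falls below what the precision of the
    estimate can resolve."""
    chosen = []
    best_auc = cv_auc(chosen)[0]
    while candidates:
        auc, pick = max((cv_auc(chosen + [c])[0], c) for c in candidates)
        if auc - best_auc < min_gain:
            break
        chosen.append(pick)
        candidates.remove(pick)
        best_auc = auc
    return chosen, best_auc

selected, auc = forward_select(list(SIGNAL))
```

The stopping rule is where the precision analysis matters: with only 10 trials the ~.038 interval would swallow the smaller gains, while 300 trials resolve them.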
Initial runs
Information content and irreducible error
[Charts: cross entropy (Train and Test) vs. number of trees (0-1500).
Top, 6-node model: #287 (0.519) (0.436). Bottom, 2-node model: #880 (0.518) (0.457).]

Preliminary Unpublished Data
Sample Model
                    GAIN CURVE    ROC CURVE

                      FEATURE SELECTION SET




                       MODEL TRAINING SET




 Preliminary Unpublished Data
Sample partial dependence plots
The value of non-parametric regression

                    Admissions within prior year        ICU Days




                            Anion Gap                Initial Systolic BP




                             Final BNP             BUN-Creatinine Ratio




   Preliminary Unpublished Data
Prospective application
Additional heart failure discharges can be scored against the model

                GAIN curve                          ROC curve




Preliminary Unpublished Data

Causes for performance shift
 Overfitting in the original model
 Concomitant intervention programs are altering patient risk of readmission
Non-influential candidate predictors

Models favor continuous over binary ‘dummy’ variables

 Diagnoses and QualNet Condition Categories
 Medications and Therapeutic Categories
 Diagnostic Tests
 Ordersets Submitted




  Preliminary Unpublished Data
Lessons learned
 TreeNet (stochastic gradient boosting) is extremely well suited to the data structure of EMR data.
 Insight into the dataset is a rich feature (over and above prediction performance).
 Model performance variance is important in feature selection.
 - A consequence of the limited information content in our dataset.

 Batteries are useful.
   - PARTITION – Variability assessment
   - CVR – Model assessment
   - STEPWISE – Forward selection
   - SHAVING – Backward selection

 There is great value in learning on a non-trivial dataset within a
  familiar domain.
Next steps ……


Explore options to manage model variability
and increase dimensionality of predictor set.

Extend analysis of predictor interactions.

Develop mechanism of ‘point-of-care’ patient
scoring.

Apply techniques to new problems and datasets.
Any Questions?




raronoff@hmc.psu.edu


Editor's Notes

  • #14: Stochastic gradient boosting is the algorithm that underlies the TreeNet application. Discussing this at a Salford conference is like bringing coal to Newcastle, so I won't embarrass myself. The slide extols several characteristics that are attractive for EMR (and most other) datasets.