BioSense 2.0 Analytics

Change Point Analysis

June 22, 2012



By Taha Kass-Hout and Zhiheng Xu



BioSense is a national public health surveillance system for early detection and rapid assessment of potential bioterrorism-related illness. It integrates current health data shared by health departments from a variety of sources to provide insight into the health of communities and the country. Using statistical aberration detection methods, public health officials can identify and investigate both temporal and spatial anomalies. In the first iteration of statistical tools for inclusion in the BioSense 2.0 redesign, we introduced the Early Aberration Reporting System (EARS), which has been used extensively in BioSense for disease anomaly detection. As a complementary tool to EARS, Change Point Analysis (CPA) has been implemented in BioSense to address the limitations of EARS in detecting subtle changes and characterizing disease trends. In this paper, we describe how to implement Taylor's cumulative sum (CUSUM) CPA method.


CUSUM CPA

Taylor [1] developed a change point analysis method that iteratively applies cumulative sum (CUSUM) charts and bootstrapping to detect changes in a time series and to draw inferences about them. This approach is based on the mean-shift model and assumes that the residuals are independent and identically distributed (iid) with a mean of zero. For time-series data $Y_i$ with $i = 1, \dots, N$, the mean-shift model is written as $Y_i = \mu + \varepsilon_i$, where $\mu$ is the sample average, $\mu = \frac{1}{N}\sum_{i=1}^{N} Y_i$, and $\varepsilon_i$ is the residual term for the $i$th observation, defined as $\varepsilon_i = Y_i - \mu$. The cumulative sums of residuals are calculated as $S_i = S_{i-1} + \varepsilon_i$ for $i = 1, \dots, N$, where $S_0 = 0$. The change point at location $m$ is detected by searching for the maximum absolute CUSUM of residuals, where $|S_m| = \max_{i=1,\dots,N} |S_i|$; observation $m$ is the last observation before the change. The time series is split into two segments on either side of the change point, and the analysis is repeated for each segment. A total of 1000 bootstrap samples are generated to calculate the significance level and 95% confidence interval (CI) of the change points. The following steps summarize how to implement Taylor's CUSUM CPA to detect change points:


   1. Prepare the initial time-series data.

   2. Calculate the cumulative sums of residuals $S_0, S_1, \dots, S_N$.

   3. Find the location $m$ with the maximum absolute CUSUM of residuals, $|S_m|$; this location is the candidate change point.

   4. Calculate the difference between the maximum and minimum CUSUM of residuals as
       $S_{diff} = S_{max} - S_{min}$, where $S_{max} = \max_{i=0,\dots,N} S_i$ and $S_{min} = \min_{i=0,\dots,N} S_i$.

   5. Determine whether this change point is significant via bootstrapping:

           a. Generate a bootstrap sample of size N, denoted as $Y_1^0, \dots, Y_N^0$, by reshuffling
                 (reordering) the original N values.

           b. Calculate the CUSUM of residuals from the bootstrap sample, denoted as $S_0^0, S_1^0, \dots, S_N^0$.

           c. Calculate the maximum, minimum, and difference of the bootstrap CUSUM of residuals,
                 denoted as $S_{max}^0$, $S_{min}^0$, and $S_{diff}^0$, where $S_{diff}^0 = S_{max}^0 - S_{min}^0$.

           d. Determine whether the bootstrap difference $S_{diff}^0$ is less than the original
                 difference $S_{diff}$.

           e. Repeat steps a-d 1000 times and record the number of bootstrap samples with
                 $S_{diff}^0 < S_{diff}$, denoted as X.

           f. The significance level is defined as X/1000.

   6. If the significance level is ≥95%, the detected change point is statistically significant, and the
       dataset is split into two subsets at this change point; if the significance level is <95%, the
       detected change point is not statistically significant, and splitting stops.

   7. Repeat steps 2-6 on each of the two subsets until no more significant change points are
       detected.
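The steps above can be sketched in Python as follows. This is a minimal illustration under our own assumptions, not the BioSense implementation: the function names `cusum`, `detect_once`, and `change_points` are hypothetical, the bootstrap reshuffles the values without replacement as described in step 5a, segments with fewer than two observations are not split, and the 95% CI of the change point location is not computed.

```python
import random


def cusum(values):
    """Return S_0, ..., S_N: cumulative sums of residuals from the segment mean (S_0 = 0)."""
    mu = sum(values) / len(values)
    sums = [0.0]
    for y in values:
        sums.append(sums[-1] + (y - mu))
    return sums


def detect_once(values, n_boot=1000, rng=None):
    """Steps 2-5: return the candidate change point m (1-based) and its significance level."""
    rng = rng if rng is not None else random.Random()
    s = cusum(values)
    # Step 3: m maximizes |S_i| for i = 1..N; observation m is the last one before the change.
    m = max(range(1, len(s)), key=lambda i: abs(s[i]))
    # Step 4: S_diff = S_max - S_min over S_0..S_N.
    s_diff = max(s) - min(s)
    # Step 5: reshuffle the values and count bootstrap differences strictly below S_diff.
    shuffled = list(values)
    below = 0
    for _ in range(n_boot):
        rng.shuffle(shuffled)
        s0 = cusum(shuffled)
        if max(s0) - min(s0) < s_diff:
            below += 1
    return m, below / n_boot


def change_points(values, level=0.95, n_boot=1000, rng=None, offset=0):
    """Steps 6-7: split at significant change points and recurse on each segment.

    Returns 1-based positions (in the full series) of the last observation before each change.
    """
    rng = rng if rng is not None else random.Random()
    if len(values) < 2:  # assumption: a single observation cannot be split further
        return []
    m, significance = detect_once(values, n_boot, rng)
    if significance < level or m == len(values):
        return []  # Step 6: not significant, so stop splitting this segment
    left = change_points(values[:m], level, n_boot, rng, offset)
    right = change_points(values[m:], level, n_boot, rng, offset + m)
    return left + [offset + m] + right


# Usage: a series that jumps from 0 to 1 after the 10th observation.
print(change_points([0.0] * 10 + [1.0] * 10, rng=random.Random(1)))  # [10]
```

Passing a seeded `random.Random` makes the bootstrap reproducible; in the usage example, both constant segments produce all-zero CUSUMs, so their significance level is 0 and the recursion stops after one split.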


Data Example

The following data were created to illustrate the detection of change points using the CUSUM CPA method. In the table, Yi is the percent of visits in MMWR week i, µ is the sample average over all 52 weeks (0.035827, displayed rounded to 0.036), εi = Yi − µ is the residual, and Si is the cumulative sum of residuals.


MMWR week (i)      Yi       µ        εi        Si      |Si|
            1   0.001   0.036  -0.03483  -0.03483  0.034827
            2   0.002   0.036  -0.03383  -0.06865  0.068654
            3   0.003   0.036  -0.03283  -0.10148  0.101481
            4   0.002   0.036  -0.03383  -0.13531  0.135308
            5   0.008   0.036  -0.02783  -0.16313  0.163135
            6   0.009   0.036  -0.02683  -0.18996  0.189962
            7   0.012   0.036  -0.02383  -0.21379  0.213788
            8   0.011   0.036  -0.02483  -0.23862  0.238615
            9   0.009   0.036  -0.02683  -0.26544  0.265442
           10   0.011   0.036  -0.02483  -0.29027  0.290269
           11   0.021   0.036  -0.01483   -0.3051  0.305096
           12   0.012   0.036  -0.02383  -0.32892  0.328923
           13   0.010   0.036  -0.02583  -0.35475   0.35475
           14   0.008   0.036  -0.02783  -0.38258  0.382577
           15   0.010   0.036  -0.02583   -0.4084  0.408404
           16   0.028   0.036  -0.00783  -0.41623  0.416231
           17   0.023   0.036  -0.01283  -0.42906  0.429058
           18   0.015   0.036  -0.02083  -0.44988  0.449885
           19   0.014   0.036  -0.02183  -0.47171  0.471712
           20   0.052   0.036  0.016173  -0.45554  0.455538
           21   0.079   0.036  0.043173  -0.41237  0.412365
           22   0.064   0.036  0.028173  -0.38419  0.384192
           23   0.079   0.036  0.043173  -0.34102  0.341019
           24   0.085   0.036  0.049173  -0.29185  0.291846
           25   0.072   0.036  0.036173  -0.25567  0.255673
           26   0.099   0.036  0.063173   -0.1925    0.1925
           27   0.036   0.036  0.000173  -0.19233  0.192327
           28   0.070   0.036  0.034173  -0.15815  0.158154
           29   0.077   0.036  0.041173  -0.11698  0.116981
           30   0.092   0.036  0.056173  -0.06081  0.060808
           31   0.111   0.036  0.075173  0.014365  0.014365
           32   0.083   0.036  0.047173  0.061538  0.061538
           33   0.095   0.036  0.059173  0.120712  0.120712
           34   0.072   0.036  0.036173  0.156885  0.156885
           35   0.092   0.036  0.056173  0.213058  0.213058
           36   0.019   0.036  -0.01683  0.196231  0.196231
           37   0.012   0.036  -0.02383  0.172404  0.172404
           38   0.023   0.036  -0.01283  0.159577  0.159577
           39   0.022   0.036  -0.01383   0.14575   0.14575
           40   0.024   0.036  -0.01183  0.133923  0.133923
           41   0.012   0.036  -0.02383  0.110096  0.110096
           42   0.030   0.036  -0.00583  0.104269  0.104269
           43   0.021   0.036  -0.01483  0.089442  0.089442
           44   0.026   0.036  -0.00983  0.079615  0.079615
           45   0.025   0.036  -0.01083  0.068788  0.068788
           46   0.020   0.036  -0.01583  0.052962  0.052962
           47   0.026   0.036  -0.00983  0.043135  0.043135
           48   0.020   0.036  -0.01583  0.027308  0.027308
           49   0.036   0.036  0.000173  0.027481  0.027481
           50   0.030   0.036  -0.00583  0.021654  0.021654
           51   0.030   0.036  -0.00583  0.015827  0.015827
           52   0.020   0.036  -0.01583  6.94E-17  6.94E-17
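For readers checking the table by hand, the first rows and step 4 follow directly from the definitions of µ, εi, and Si. The µ column displays the mean rounded to 0.036, while the residuals use the unrounded value; the $S_{max}$ and $S_{min}$ entries are read off the table (weeks 35 and 19).

```latex
\begin{aligned}
\mu &= \frac{1}{52}\sum_{i=1}^{52} Y_i = \frac{1.863}{52} \approx 0.035827 \\
S_1 &= S_0 + \varepsilon_1 = 0 + (0.001 - 0.035827) = -0.034827 \\
S_2 &= S_1 + \varepsilon_2 = -0.034827 + (0.002 - 0.035827) = -0.068654 \\
S_{max} &= S_{35} \approx 0.213058, \qquad S_{min} = S_{19} \approx -0.471712 \\
S_{diff} &= S_{max} - S_{min} \approx 0.213058 + 0.471712 = 0.684770
\end{aligned}
```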
[Figure: Percent of visits (Yi) by MMWR week]


The significance level and 95% CI of each change point are calculated from 1000 bootstrap samples. After the first significant change point is detected at MMWR week 19, where |Si| reaches its maximum (0.471712) in the table above, the time series for MMWR weeks 1-52 is split into two segments: weeks 1-19 and weeks 20-52. The analysis is then repeated on each of the two segments to determine their change points.
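The first pass on the example data can be reproduced with a few lines of Python. This sketch only recomputes µ, εi, and Si from the Yi column of the table and locates the largest |Si| (steps 2-3); it does not run the bootstrap significance test, and the variable names are our own.

```python
# Percent of visits (Yi) for MMWR weeks 1-52, copied from the table above.
Y = [
    0.001, 0.002, 0.003, 0.002, 0.008, 0.009, 0.012, 0.011, 0.009, 0.011,
    0.021, 0.012, 0.010, 0.008, 0.010, 0.028, 0.023, 0.015, 0.014, 0.052,
    0.079, 0.064, 0.079, 0.085, 0.072, 0.099, 0.036, 0.070, 0.077, 0.092,
    0.111, 0.083, 0.095, 0.072, 0.092, 0.019, 0.012, 0.023, 0.022, 0.024,
    0.012, 0.030, 0.021, 0.026, 0.025, 0.020, 0.026, 0.020, 0.036, 0.030,
    0.030, 0.020,
]
N = len(Y)
mu = sum(Y) / N                  # sample average (shown rounded to 0.036 in the table)
eps = [y - mu for y in Y]        # residuals: eps_i = Y_i - mu
S = [0.0]                        # S_0 = 0
for e in eps:
    S.append(S[-1] + e)          # S_i = S_{i-1} + eps_i

# The week with the largest |S_i| is the last week before the change.
m = max(range(1, N + 1), key=lambda i: abs(S[i]))
before, after = Y[:m], Y[m:]     # weeks 1..m and weeks m+1..N
print(round(mu, 6), m, round(S[m], 6), len(before), len(after))
# 0.035827 19 -0.471712 19 33
```

Running the same computation separately on `before` and `after` corresponds to repeating the analysis on each segment.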


References
   1. Taylor, W. Change-Point Analysis: A Powerful New Tool for Detecting Changes. 2010;
       Available from: http://www.variation.com/anonftp/pub/changepoint.pdf.
   2. Barker, N. A Practical Introduction to the Bootstrap Using the SAS System. 2010; Available
       from: http://www.lexjansen.com/phuse/2005/pk/pk02.pdf.
   3. Efron, B. and Tibshirani, R. An Introduction to the Bootstrap. 1993, New York: Chapman & Hall.
   4. Kass-Hout TA, Xu Z, McMurray P, Park S, Buckeridge DL, Brownstein JS, Finelli L, Groseclose
       SL. Application of change point analysis to daily influenza-like illness (ILI) emergency
       department visits. Journal of the American Medical Informatics Association (2012), in press.
