Change Point Analysis

Zhiheng (Roy) Xu, MS (PhD Candidate)
Senior Research Scientist

Taha A. Kass-Hout, MD, MS
Deputy Director for Information Science (Acting) and BioSense Program Manager
Division of Healthcare Information (DHI)
Public Health Surveillance Program Office (PHSPO)
Office of Surveillance, Epidemiology, and Laboratory Services (OSELS)
Centers for Disease Control & Prevention (CDC)

Any views or opinions expressed here do not necessarily represent the views of the CDC, HHS, or any other entity of the United States government. Furthermore, the use of any product names, trade names, images, or commercial sources is for identification purposes only, and does not imply endorsement or government sanction by the U.S. Department of Health and Human Services.

Description

Prospective anomaly detection methods such as the Modified EARS C2 are commonly adapted for use in public health syndromic surveillance systems. However, these methods can produce an excessive rate of false alerts. We present a combined use of retrospective (e.g., change point analysis, or CPA) and prospective (e.g., C2) anomaly detection methods. This combined approach will help detect sudden aberrations as well as subtle changes in local trends, help rule out unnecessary alarm investigations, and assist with retrospective follow-ups. Examples of the utility of this approach, developed in collaboration with the scientific community, are drawn from BioSense emergency department (ED) visits due to influenza-like illness (ILI). The talk will discuss methods, limitations, and future work, and will invite the scientific community to collaborate with us.

Transcript of Change Point Analysis

Page 1: Change Point Analysis


Page 2: Change Point Analysis

Change point

• HIV/AIDS mortality rate
• Breast cancer screening
• Quality control
  – e.g., cereal packaging
• Social network change detection (SNCD)
  – e.g., an open-source social network of the Al-Qaeda terrorist organization*

* McCulloh, I., Webb, M., Carley, K.M. (2007). Social Network Monitoring of Al-Qaeda. Network Science Report, Vol 1, pp 25–30.

Page 3: Change Point Analysis

Change point analysis (CPA)

• Purpose
  – CPA aims to detect changes in the mean of a process (e.g., a time series)
• Use CPA to answer:
  – Did a change occur?
  – Did more than one change occur?
  – When did the changes occur?
  – With what confidence did the changes occur?

Page 4: Change Point Analysis

Time-series data

• A sequence of data points, typically measured at successive times spaced at uniform intervals, e.g., stock prices, mortgage rates, and interest rates.

Source: Google, Inc.

Page 5: Change Point Analysis

Control Chart

• Invented by Walter A. Shewhart in the 1920s at Bell Labs to improve the reliability of telephony transmission systems.

Walter A. Shewhart, Ph.D. (1891-1967)

Image source at http://en.wikipedia.org/wiki/Walter_A._Shewhart

Page 6: Change Point Analysis

Control Chart

Upper Control Limit: $\mathrm{UCL} = \mu + 3\sigma$

Lower Control Limit: $\mathrm{LCL} = \mu - 3\sigma$

where $\mu$ is the sample mean (central line) and $\sigma$ is the sample standard deviation.
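The control limits can be computed and applied in a few lines. In this minimal sketch the function names are illustrative, and the limits are computed from a separate baseline sample that is assumed to be in control:

```python
import statistics

def control_limits(values, k=3.0):
    """Return (LCL, center, UCL) using the sample mean and sample standard deviation."""
    mu = statistics.mean(values)
    sigma = statistics.stdev(values)  # sample standard deviation (n - 1 denominator)
    return mu - k * sigma, mu, mu + k * sigma

def out_of_control(values, baseline, k=3.0):
    """Return indices of values that fall outside limits computed from the baseline."""
    lcl, _, ucl = control_limits(baseline, k)
    return [i for i, v in enumerate(values) if v < lcl or v > ucl]

baseline = [10, 11, 9, 10, 12, 10, 11, 9, 10, 11]  # illustrative in-control data
new_points = [10, 12, 25, 9]
print(control_limits(baseline))
print(out_of_control(new_points, baseline))  # → [2]: the value 25 lies above the UCL
```

Computing the limits from a baseline, rather than from data that may already contain the aberration, prevents an outbreak from inflating σ. This is a design choice and is not specified on the slide.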

Page 7: Change Point Analysis

SIX SIGMA

A six-sigma process is one in which 99.99966% of the products manufactured are free of defects.
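The 99.99966% figure follows from the usual Six Sigma convention that the process mean drifts by about 1.5σ over the long run. Under that convention the nearest specification limit sits 4.5σ from the mean. The 1.5σ shift is an assumption taken from that convention and does not appear on the slide. The one-sided normal tail can be checked with the standard library:

```python
import math

def upper_tail(z):
    """P(Z > z) for a standard normal variable."""
    return 0.5 * math.erfc(z / math.sqrt(2))

shift = 1.5  # conventional long-term drift of the mean, in sigma units (assumption)
defect_rate = upper_tail(6.0 - shift)  # one-sided tail beyond 4.5 sigma
print(f"defects per million: {defect_rate * 1e6:.1f}")  # → 3.4
print(f"defect-free: {100 * (1 - defect_rate):.5f}%")   # → 99.99966%
```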

Page 8: Change Point Analysis

CPA vs. Control Charts

| | CPA | Control charts |
|---|---|---|
| Data type | Any | Normally distributed data |
| Type of changes | Major and subtle changes | Major changes only |
| Mean | Mean shift | Stable mean |
| Computation | Depends on the algorithm | Simple and fast |

Page 9: Change Point Analysis

CPA Benefits

• Detect changes in historic data
• Investigate what caused the changes
• Real-time trend analysis
• When was the last change in % ED visits due to ILI?
• Forecasting
• Since the last change, is influenza activity going up, down, or stable?

Page 10: Change Point Analysis

CPA method 1

• Cumulative sum (CUSUM)
  – Based on a mean-shift model
  – Maximizes the absolute cumulative sum of residuals
  – Data assumption: independent and identically distributed (iid)
  – Statistical inference through bootstrapping
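The bootstrapping step can be sketched as follows. The idea follows Taylor's paper in the References: compare the CUSUM range, S_diff = max S − min S, of the data with the same statistic computed on randomly reordered copies. The function names, the 1000-reordering default, and the fixed seed are illustrative assumptions.

```python
import random

def cusum(values):
    """Cumulative sums of residuals from the sample mean (S_1..S_n; S_0 = 0 is implicit)."""
    mean = sum(values) / len(values)
    s, out = 0.0, []
    for v in values:
        s += v - mean
        out.append(s)
    return out

def s_diff(values):
    """Range of the CUSUM, including S_0 = 0."""
    s = [0.0] + cusum(values)
    return max(s) - min(s)

def bootstrap_confidence(values, n_boot=1000, seed=0):
    """Fraction of random reorderings whose CUSUM range is smaller than the original's.

    Reordering (sampling without replacement) destroys any time ordering, so a
    fraction near 1 suggests the original ordering contains a mean shift.
    """
    rng = random.Random(seed)
    observed = s_diff(values)
    shuffled = list(values)
    below = 0
    for _ in range(n_boot):
        rng.shuffle(shuffled)
        if s_diff(shuffled) < observed:
            below += 1
    return below / n_boot

series = [10, 11, 9, 10, 11, 10, 15, 16, 14, 15, 16, 15]  # illustrative data with a shift
print(bootstrap_confidence(series))  # close to 1 for this clear shift
```

Multiplying the returned fraction by 100 gives a confidence level in percent.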

Page 11: Change Point Analysis

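CUSUM*

Step 1: sample mean, $\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i$

Step 2: residuals, $\varepsilon_i = X_i - \bar{X}$, $i = 1, \ldots, n$

Step 3: CUSUM of residuals, $S_0 = 0$ and $S_i = S_{i-1} + \varepsilon_i$, which gives

$$S_0, S_1, S_2, S_3, \ldots, S_n = 0,\ \varepsilon_1,\ \varepsilon_1+\varepsilon_2,\ \varepsilon_1+\varepsilon_2+\varepsilon_3,\ \ldots,\ \varepsilon_1+\varepsilon_2+\cdots+\varepsilon_n$$

* Kass-Hout, et al, The Joint Statistical Meeting, Vancouver, CA. August, 2010.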
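A small worked example of Steps 1–3 with made-up values $X = (2, 4, 3, 7, 8, 6)$, chosen only for illustration:

$$
\begin{aligned}
\bar{X} &= \tfrac{1}{6}(2+4+3+7+8+6) = 5,\\
(\varepsilon_1, \ldots, \varepsilon_6) &= (-3,\,-1,\,-2,\,2,\,3,\,1),\\
(S_0, S_1, \ldots, S_6) &= (0,\,-3,\,-4,\,-6,\,-4,\,-1,\,0).
\end{aligned}
$$

Because the residuals sum to zero, $S_n$ always returns to 0. The largest $|S_i|$, here $|S_3| = 6$, falls at the last point before the values jump.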

Page 12: Change Point Analysis

CUSUM* Level 1: Find a change point maximizing |S|

Step 4: Plot the CUSUM and find where the absolute CUSUM, $|S_i|$, reaches its maximum.

* Kass-Hout, et al, The Joint Statistical Meeting, Vancouver, CA. August, 2010.
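Steps 1–4 fit in a few lines of Python. In this sketch the function name is illustrative. The returned m is 1-based to match the $S_i$ notation, so the change is estimated to occur after $X_m$. The usage line uses made-up values.

```python
from itertools import accumulate

def cusum_change_point(values):
    """Steps 1-4: return (m, S), where m in 1..n-1 maximizes |S_m| (needs n >= 2)."""
    mean = sum(values) / len(values)                         # Step 1: sample mean
    residuals = [v - mean for v in values]                   # Step 2: residuals
    s = [0.0] + list(accumulate(residuals))                  # Step 3: S_0..S_n
    m = max(range(1, len(values)), key=lambda i: abs(s[i]))  # Step 4: argmax |S_m|
    return m, s

m, s = cusum_change_point([2, 4, 3, 7, 8, 6])
print(m)  # → 3: the shift happens between X_3 and X_4
```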

Page 13: Change Point Analysis

CUSUM* Level 2: Find a change point in each sub-series

Step 5: Break the time series into two segments at the change point and repeat Steps 1–5 on each segment.

* Kass-Hout, et al, The Joint Statistical Meeting, Vancouver, CA. August, 2010.
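Repeating Steps 1–5 on each segment is a form of binary segmentation. The minimal sketch below stops splitting when a segment's bootstrap confidence falls below a threshold or the segment becomes too short. The function names, the 95% threshold, the minimum segment size of 4, and the 500 reorderings are illustrative assumptions, not values taken from the slides.

```python
import random
from itertools import accumulate

def cusum(values):
    """S_0..S_n: cumulative sums of residuals from the segment mean."""
    mean = sum(values) / len(values)
    return [0.0] + list(accumulate(v - mean for v in values))

def confidence(values, n_boot, rng):
    """Share of random reorderings whose CUSUM range is below the observed range."""
    s = cusum(values)
    observed = max(s) - min(s)
    shuffled = list(values)
    below = 0
    for _ in range(n_boot):
        rng.shuffle(shuffled)
        t = cusum(shuffled)
        if max(t) - min(t) < observed:
            below += 1
    return below / n_boot

def find_change_points(values, start=0, min_conf=0.95, min_size=4, n_boot=500, rng=None):
    """Recursively split at argmax |S_m| while the bootstrap confidence is high.

    Returns 0-based indices at which a new segment begins.
    """
    if rng is None:
        rng = random.Random(0)  # fixed seed so results are reproducible
    if len(values) < 2 * min_size or confidence(values, n_boot, rng) < min_conf:
        return []
    s = cusum(values)
    m = max(range(1, len(values)), key=lambda i: abs(s[i]))  # Step 4
    left = find_change_points(values[:m], start, min_conf, min_size, n_boot, rng)  # Step 5
    right = find_change_points(values[m:], start + m, min_conf, min_size, n_boot, rng)
    return left + [start + m] + right

data = [10, 11, 9, 10, 11, 10, 15, 16, 14, 15, 16, 15, 8, 9, 8, 7, 9, 8]
print(find_change_points(data))  # → [6, 12]
```

The sketch shares one seeded generator across the recursion so that repeated runs agree. It does not refine the detected points after splitting.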

Page 14: Change Point Analysis

CUSUM* Level n: Final result

* Kass-Hout, et al, The Joint Statistical Meeting, Vancouver, CA. August, 2010.

Page 15: Change Point Analysis

CPA method 2

• Structural Change Model (SCM)
  – Based on a mean-shift model
  – Minimizes the sum of squared residuals
  – Pros:
    • Allows for autoregressive data
    • Incorporates independent covariates
    • Asymptotic distribution for change points
  – Cons:
    • Assumes a stationary process
    • Mathematical complexity

Page 16: Change Point Analysis

Structural Change Model

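• A time series $X_i$, $i = 1, \ldots, N$
• Break the series into two segments at any location $k$
• The sum of squared residuals (SSR) is computed as

$$\mathrm{SSR}(k) = \sum_{i=1}^{k} \left(X_i - \bar{X}_{1:k}\right)^2 + \sum_{i=k+1}^{N} \left(X_i - \bar{X}_{k+1:N}\right)^2$$

where $\bar{X}_{1:k}$ and $\bar{X}_{k+1:N}$ are the means of the two segments.

• The change point is located at

$$\hat{k} = \arg\min_{1 \le k < N} \mathrm{SSR}(k)$$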
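A brute-force version of the single-break search described above evaluates SSR(k) at every admissible split. This sketch covers one mean shift only. It is not the Bai–Perron dynamic-programming algorithm for multiple breaks. The function names and the minimum segment length `min_size` are illustrative assumptions.

```python
def ssr(segment):
    """Sum of squared deviations from the segment mean."""
    mean = sum(segment) / len(segment)
    return sum((x - mean) ** 2 for x in segment)

def scm_break(values, min_size=1):
    """Return (k_hat, ssr_min), where k_hat minimizes SSR(X_1..X_k) + SSR(X_{k+1}..X_N)."""
    n = len(values)
    best_k, best = None, float("inf")
    for k in range(min_size, n - min_size + 1):  # both segments keep >= min_size points
        total = ssr(values[:k]) + ssr(values[k:])
        if total < best:
            best_k, best = k, total
    return best_k, best

print(scm_break([2, 4, 3, 7, 8, 6]))  # → (3, 4.0)
```

This loop is O(N²). Running sums of $X_i$ and $X_i^2$ would make each SSR(k) evaluation O(1) for long series.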

Page 17: Change Point Analysis

Change points and statistical inference

| Level | CUSUM change point | 95% CI* lb | 95% CI* ub | P value | SCM change point | 95% CI* lb | 95% CI* ub | P value |
|---|---|---|---|---|---|---|---|---|
| 1 | 11/27/2008 | 2/4/2006 | 11/1/2009 | 0 | 12/13/2008 | 10/14/2008 | 12/17/2008 | 0 |
| 2 | 6/23/2009 | 1/20/2009 | 4/4/2010 | 0 | 6/22/2009 | 6/21/2009 | 7/31/2009 | 0 |
| 3 | 10/4/2009 | 7/28/2009 | 4/28/2010 | 0 | 9/18/2009 | 9/6/2009 | 9/21/2009 | 0 |
| 4 | 1/20/2010 | 11/3/2009 | 5/7/2010 | 0 | 1/5/2010 | 1/4/2010 | 1/10/2010 | 0 |
| 5 | 3/1/2010 | 2/3/2010 | 5/21/2010 | 0 | 2/16/2010 | 2/14/2010 | 2/18/2010 | 0 |
| 6 | 4/6/2010 | 3/12/2010 | 5/24/2010 | 0 | 4/5/2010 | 4/1/2010 | 4/14/2010 | 0 |

* 95% CI = 95% confidence interval; lb = lower bound; ub = upper bound.

Page 18: Change Point Analysis

CUSUM vs. SCM

“I have long given up on CUSUM type procedures (and any of the variants). The tests are plagued with problems of non-monotonic power and to get a date and confidence interval for the break date is not trivial and most methods don't work well.”
– Pierre Perron, Ph.D.

“The main difference is that I do not use asymptotic results, but instead employ the computer intensive bootstrapping approach to determine confidence levels and intervals so as to make the procedure nonparametric.”
– Wayne Taylor, Ph.D.

Page 19: Change Point Analysis

CPA Method 3

• Bayesian CPA
  – Weak prior
  – Posterior distributions of the change points

Thomas Bayes (1702-1761)

Image source at http://en.wikipedia.org/wiki/Thomas_Bayes

Page 20: Change Point Analysis

Bayesian CPA

Q: What is the probability that a change occurred?

| Order | Time | Posterior probability |
|---|---|---|
| 1 | 4/25/2009 | 1 |
| 2 | 6/14/2009 | 1 |
| 3 | 5/18/2009 | 0.99 |
| 4 | 5/22/2009 | 0.982 |
| 5 | 5/25/2009 | 0.982 |
| 6 | 5/14/2009 | 0.98 |
| 7 | 6/2/2009 | 0.936 |
| 8 | 1/25/2008 | 0.92 |
| 9 | 2/24/2008 | 0.868 |
| 10 | 5/15/2009 | 0.85 |
| 11 | 12/24/2008 | 0.846 |
| 12 | 5/3/2009 | 0.818 |
| 13 | 2/22/2009 | 0.806 |
| 14 | 11/25/2009 | 0.764 |
| 15 | 7/5/2009 | 0.748 |
| 16 | 6/21/2009 | 0.714 |
| 17 | 11/6/2009 | 0.64 |
| 18 | 6/7/2009 | 0.624 |
| 19 | 1/3/2010 | 0.61 |
| 20 | 11/30/2009 | 0.608 |
| 21 | 10/16/2009 | 0.538 |
| 22 | 12/24/2009 | 0.532 |
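The table reports posterior probabilities from a multiple-change-point Bayesian analysis. A much simpler model shows where such probabilities come from, because its posterior over the location of a *single* mean shift has a closed form. This is not the model implemented by the bcp package. The sketch assumes exactly one change, a uniform prior over its location, flat priors on the two segment means, and a known σ. The function name is illustrative.

```python
import math

def single_change_posterior(values, sigma):
    """Posterior P(change after position k | data) for k = 1..n-1.

    Integrating out the two segment means under flat priors gives
        p(k | x) ∝ (k (n - k))^(-1/2) * exp(-SSR(k) / (2 sigma^2)).
    """
    n = len(values)

    def ssr(seg):
        m = sum(seg) / len(seg)
        return sum((x - m) ** 2 for x in seg)

    log_w = []
    for k in range(1, n):
        s = ssr(values[:k]) + ssr(values[k:])
        log_w.append(-0.5 * math.log(k * (n - k)) - s / (2 * sigma ** 2))
    top = max(log_w)  # subtract the maximum before exponentiating (log-sum-exp)
    w = [math.exp(lw - top) for lw in log_w]
    total = sum(w)
    return {k: wk / total for k, wk in zip(range(1, n), w)}

post = single_change_posterior([2, 4, 3, 7, 8, 6], sigma=1.0)
k_hat = max(post, key=post.get)
print(k_hat, round(post[k_hat], 3))  # k_hat is 3, with posterior probability above 0.99
```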

Page 21: Change Point Analysis

Autocorrelation Simulation

• Autocorrelation in biosurveillance data
• CUSUM assumption
  – Identically distributed
  – Independent
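The sample lag-1 autocorrelation gives a quick measure of how far a series departs from the independence assumption. This sketch uses one common estimator, which normalizes deviations from the overall mean by the total sum of squares. The function name is illustrative.

```python
def lag1_autocorrelation(values):
    """r_1 = sum_t (x_t - m)(x_{t+1} - m) / sum_t (x_t - m)^2."""
    n = len(values)
    m = sum(values) / n
    num = sum((values[t] - m) * (values[t + 1] - m) for t in range(n - 1))
    den = sum((x - m) ** 2 for x in values)
    return num / den

print(lag1_autocorrelation([1, 2, 3, 4, 5, 6]))     # → 0.5 (smooth trend: positive)
print(lag1_autocorrelation([1, -1, 1, -1, 1, -1]))  # ≈ -0.833 (alternating: negative)
```

Values near 0 are consistent with the iid assumption. Values well away from 0 correspond to a nonzero ρ in the AR(1) model used in the simulation.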

Page 22: Change Point Analysis

Simulation (cont’d)

• Purpose:
  – Check CPA robustness and accuracy
  – Based on a first-order autoregressive model:

$X_1 = \mu$

$X_i = \rho X_{i-1} + \varepsilon_i, \quad \varepsilon_i \sim N(0, \sigma^2)$

where $i = 2, \ldots, 100$ and $\rho$ is the autocorrelation coefficient, with $\rho = -1, -0.8, -0.5, -0.2, 0, 0.2, 0.5, 0.8, 1$.

Page 23: Change Point Analysis

Simulation (cont’d)

• $CP_\rho$: change point at autocorrelation level ρ
  – $CP_0$: change point at ρ = 0 (treated as an iid sample)
  – For ρ ≠ 0, if $CP_\rho = CP_0$, it is a match; otherwise, it is a mismatch
• Run 1000 simulations
• Report the % of matches across the 1000 simulations
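The matching experiment can be sketched as below. The slides do not fully specify the design, so several assumptions apply. The same innovations ε are reused for the ρ and ρ = 0 series. The defaults are μ = 0 and σ = 1. Only the first-level CUSUM change point (argmax |S|) is compared. A tolerance `tol` covers the ±3 band. All function names are hypothetical, and the numbers this sketch prints are not the results reported on the slides.

```python
import random
from itertools import accumulate

def cusum_cp(values):
    """First-level CUSUM change point: 1-based m in 1..n-1 maximizing |S_m|."""
    mean = sum(values) / len(values)
    s = [0.0] + list(accumulate(v - mean for v in values))
    return max(range(1, len(values)), key=lambda i: abs(s[i]))

def ar1(eps, rho, mu):
    """X_1 = mu and X_i = rho * X_{i-1} + eps_i for i = 2..n (eps[0] is unused)."""
    x = [mu]
    for e in eps[1:]:
        x.append(rho * x[-1] + e)
    return x

def match_rate(rho, n=100, n_sim=1000, mu=0.0, sigma=1.0, tol=0, seed=1):
    """Share of simulations with |CP_rho - CP_0| <= tol, using shared innovations."""
    rng = random.Random(seed)
    matches = 0
    for _ in range(n_sim):
        eps = [rng.gauss(0.0, sigma) for _ in range(n)]
        if abs(cusum_cp(ar1(eps, rho, mu)) - cusum_cp(ar1(eps, 0.0, mu))) <= tol:
            matches += 1
    return matches / n_sim

for rho in (-0.8, -0.2, 0.2, 0.8):
    print(rho, match_rate(rho), match_rate(rho, tol=3))  # exact match, within ±3
```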

Page 24: Change Point Analysis

Simulation (cont’d)

Conclusion: Taylor’s CUSUM method is robust to mild autocorrelation, with a matching probability of at least 80% when |ρ| ≤ 0.2.

[Figure: matching probability versus ρ, for exact matches ($CP_\rho = CP_0$) and matches within ±3 ($CP_\rho = CP_0 \pm 3$).]

Page 25: Change Point Analysis

Real Time Trend Analysis

[Figure: real-time trend chart with segments labeled Moderately Up, Slightly Up, Slightly Down, and Moderately Down, followed by a Forecast segment.]
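The slides do not say how the trend labels are assigned. One plausible sketch fits a least-squares slope to the data since the last change point and maps it to a label. The cutoffs `slight` and `moderate`, the "Stable" band, and the function names are all hypothetical and are given in units of the series per time step.

```python
def slope(values):
    """Ordinary least-squares slope of values against the time index 0..n-1."""
    n = len(values)
    t_mean = (n - 1) / 2
    y_mean = sum(values) / n
    num = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(values))
    den = sum((t - t_mean) ** 2 for t in range(n))
    return num / den

def trend_label(values, slight=0.01, moderate=0.05):
    """Map the slope since the last change point to a label (cutoffs are hypothetical)."""
    b = slope(values)
    if abs(b) < slight:
        return "Stable"
    direction = "Up" if b > 0 else "Down"
    size = "Moderately" if abs(b) >= moderate else "Slightly"
    return f"{size} {direction}"

since_last_change = [2.1, 2.2, 2.2, 2.4, 2.5, 2.7]  # illustrative % ED visits due to ILI
print(trend_label(since_last_change))  # → Moderately Up (slope ≈ 0.117 per step)
```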

Page 26: Change Point Analysis

Forecasting

• Historic data since the last change point
• Forecasting model:
  – First-order autoregressive (AR) model: $X_i = \rho X_{i-1} + \varepsilon_i$, $\varepsilon_i \sim N(0, \sigma^2)$
  – Forecast influenza activity two weeks ahead
• Change point analysis:
  – Have there been any changes since the last change point?
  – Is influenza activity going up, down, or stable since the last change point?
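An AR(1) forecast from the data since the last change point can be sketched as follows. The slide's model has no intercept. This sketch therefore assumes the AR(1) applies to deviations from the mean of the post-change segment. It also assumes daily data, so two weeks is a 14-step horizon. Function names and example values are illustrative.

```python
def fit_ar1(values):
    """Return (mean, rho): least-squares AR(1) fit to deviations from the segment mean."""
    m = sum(values) / len(values)
    d = [x - m for x in values]
    num = sum(d[i] * d[i - 1] for i in range(1, len(d)))
    den = sum(d[i - 1] ** 2 for i in range(1, len(d)))
    rho = num / den if den > 0 else 0.0  # constant series: no dependence to estimate
    return m, rho

def forecast_ar1(values, horizon=14):
    """h-step-ahead forecasts m + rho**h * (x_last - m) for h = 1..horizon."""
    m, rho = fit_ar1(values)
    last = values[-1] - m
    return [m + rho ** h * last for h in range(1, horizon + 1)]

since_last_change = [2.3, 2.1, 2.4, 2.2, 2.3, 2.1, 2.4, 2.3]  # illustrative daily values
print([round(f, 3) for f in forecast_ar1(since_last_change)])
```

When |ρ| < 1 the forecasts decay toward the post-change mean. This decay matches the "stable" reading only if no new change point has been detected since the last one.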

Page 27: Change Point Analysis

Forecasting (cont’d)

Since the last detected change, no additional significant changes have been detected; influenza activity is stable.

Page 28: Change Point Analysis

Conclusions

• CPA is a very useful tool for analyzing surveillance data
• CPA and control chart/aberration detection methods work in a complementary fashion
• Real-time trend analysis
• Integrate CPA into the forecasting model

Page 29: Change Point Analysis

Open-Access Scientific Collaboration

https://sites.google.com/site/changepointanalysis

58 collaborators; more than 100 users from 46 cities

Page 30: Change Point Analysis

Future work

• CPA on counts of ED visits due to ILI
• CPA and forecasting
• Open-source programs in R
• Manuscripts

Page 31: Change Point Analysis

References

• Kass-Hout, T., Park, S., Xu, R., McMurray, P. BioSense Program: Scientific Collaboration. The Joint Statistical Meeting, Vancouver, CA, August 2010.

• Bai, J. Estimation of a change point in multiple regression models. Review of Economics and Statistics, 79: 551-563, 1997.

• Bai, J. and Perron, P. Computation and analysis of multiple structural change models. Journal of Applied Econometrics, 18: 1-22, 2003.

• Erdman, C. and Emerson, J. W. bcp: An R package for performing a Bayesian analysis of change point problems. Journal of Statistical Software, 23 (3): 1-13, 2007.

• Taylor, W. A. Change-Point Analysis: A Powerful New Tool for Detecting Changes. Retrieved from http://www.variation.com/anonftp/pub/changepoint.pdf

Page 32: Change Point Analysis

Acknowledgement

CDC
• Sam Groseclose, DVM, MPH
• Paul McMurray, MDS
• Soyoun Park, MS

Others
• Rafal Raciborski, Ph.D., Econometrician, StataCorp, College Station, TX
• Wayne Taylor, Ph.D., President of Taylor Enterprise, Inc.
• Pierre Perron, Ph.D., Professor of Economics, Boston University
• Yajun Mei, Ph.D., Assistant Professor of Statistics, Georgia Tech
• Elena Pesavento, Ph.D., Associate Professor of Economics, Emory University