Change Point Analysis
Zhiheng (Roy) Xu, MS (PhD Candidate)
Senior Research Scientist
Taha A. Kass-Hout, MD, MS
Deputy Director for Information Science (Acting) and BioSense Program Manager
Division of Healthcare Information (DHI)
Public Health Surveillance Program Office (PHSPO)
Office of Surveillance, Epidemiology, and Laboratory Services (OSELS)
Centers for Disease Control & Prevention (CDC)
Any views or opinions expressed here do not necessarily represent the views of the CDC, HHS, or any other entity of the United States government. Furthermore, the use of any product names, trade names, images, or commercial sources is for identification purposes only, and does not imply endorsement or government sanction by the U.S. Department of Health and Human Services.
Change point
• HIV/AIDS Mortality Rate;
• Breast Cancer Screening;
• Quality Control;
  – e.g., Cereal Packaging
• Social Network Change Detection (SNCD)
  – e.g., An open source social network of the Al-Qaeda terrorist organization*
* McCulloh, I., Webb, M., Carley, K.M. (2007). Social Network Monitoring of Al-Qaeda. Network Science Report, Vol 1, pp 25–30.
Change point analysis (CPA)
• Purpose
  – CPA aims at detecting any change in the mean of a process (e.g., time series)
• Use CPA to answer:
  – Did a change occur?
  – Did more than one change occur?
  – When did the changes occur?
  – With what confidence did the changes occur?
Time-series data
• A sequence of data points, measured typically at successive times spaced at uniform time intervals, e.g. stock price, mortgage rate, interest rate, etc.
Source Google, Inc.
Control Chart
• Invented by Walter A. Shewhart in the 1920s at Bell Labs to improve the reliability of telephony transmission systems.
Walter A. Shewhart, Ph.D. (1891-1967)
Image source at http://en.wikipedia.org/wiki/Walter_A._Shewhart
Control Chart
Upper Control Limit (UCL)= µ + 3σ Lower Control Limit (LCL) = µ - 3σ where µ is the sample mean (central line) and σ is the sample standard deviation.
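A minimal sketch of the limits above, assuming they are estimated from a baseline period and then applied to new observations. The function names and the sample numbers are illustrative only and do not come from the slides.

```python
import statistics

def control_limits(samples):
    """Return (LCL, center line, UCL) as mean -/+ 3 sample standard deviations."""
    mu = statistics.mean(samples)
    sigma = statistics.stdev(samples)  # sample standard deviation (n - 1 denominator)
    return mu - 3 * sigma, mu, mu + 3 * sigma

def out_of_control(samples, baseline):
    """Indices of points in `samples` outside the limits estimated from `baseline`."""
    lcl, _, ucl = control_limits(baseline)
    return [i for i, x in enumerate(samples) if x < lcl or x > ucl]

# Made-up measurements, for illustration only.
baseline = [10.1, 9.8, 10.0, 10.2, 9.9, 10.0, 10.1, 9.9]
new_points = [10.0, 10.3, 11.5, 9.9]
print(control_limits(baseline))
print(out_of_control(new_points, baseline))
```

A chart like this flags only points that leave the 3σ band, which is why it tends to catch major changes and miss small, sustained shifts in the mean.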
SIX SIGMA
A six-sigma process is one in which 99.99966% of the products manufactured are free of defects.
CPA vs. Control Charts
                  CPA                         Control Charts
Data type         Any                         Normally distributed data
Type of changes   Major and subtle changes    Major changes only
Mean              Mean-shift                  Stable mean
Computation       Depends on the algorithms   Simple and fast
CPA Benefits
• Detect changes in historic data;
• Investigate what caused the changes;
• Real-time trend analysis;
  – When was the last change in % ED visits due to ILI?
• Forecasting;
  – Since last change, is influenza activity going up, down or stable?
CPA method 1
• Cumulative Sum (CUSUM)
  – Based on mean-shift model;
  – Maximizing the absolute cumulative sum of residuals;
  – Data assumption: independent and identically distributed (iid);
  – Statistical inferences through bootstrapping.
CUSUM*
Step 1: sample mean
Step 2: residuals
Step 3: cusum of residuals
S0 = 0, S1 = ε1, S2 = ε1+ε2, S3 = ε1+ε2+ε3, …, Sn = ε1+ε2+…+εn
* Kass-Hout, et al, The Joint Statistical Meeting, Vancouver, CA. August, 2010.
CUSUM*
Level 1: Find a change point maximizing |S|
Step 4: Plot the cusum and find where its absolute value is largest.
CUSUM*
Level 2: Find a change point on each sub-series
Step 5: Break the time series into two segments at the change point and repeat steps 1-5 on each segment.
CUSUM*
Level n: Final result
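The CUSUM steps can be sketched as follows. This is a hedged illustration, not the presenters' program: it assumes the bootstrap is done by reshuffling the series and comparing the maximum |S|, and the names `min_len`, `level`, and `n_boot` are hypothetical tuning parameters that do not appear in the slides.

```python
import random

def cusum(xs):
    """Cumulative sums S_0..S_n of residuals from the sample mean (S_0 = 0)."""
    mean = sum(xs) / len(xs)
    s, total = [0.0], 0.0
    for x in xs:
        total += x - mean
        s.append(total)
    return s

def candidate(xs):
    """Return (k, |S_k|) for the k in 1..n-1 with the largest |S_k|."""
    s = cusum(xs)
    k = max(range(1, len(xs)), key=lambda i: abs(s[i]))
    return k, abs(s[k])

def confidence(xs, n_boot=1000, rng=None):
    """Share of reshuffled series whose max |S| is below the observed one."""
    rng = rng or random.Random(0)
    observed = candidate(xs)[1]
    shuffled = list(xs)
    below = 0
    for _ in range(n_boot):
        rng.shuffle(shuffled)  # assumption: bootstrap by reordering the data
        if candidate(shuffled)[1] < observed:
            below += 1
    return below / n_boot

def change_points(xs, offset=0, min_len=5, level=0.95, rng=None):
    """Split recursively at each change point that reaches the confidence level."""
    rng = rng or random.Random(0)
    if len(xs) < 2 * min_len:
        return []
    k, _ = candidate(xs)
    if confidence(xs, rng=rng) < level:
        return []
    left = change_points(xs[:k], offset, min_len, level, rng)
    right = change_points(xs[k:], offset + k, min_len, level, rng)
    return left + [offset + k] + right

# Synthetic series with one mean shift after the 20th point.
series = [0.0, 1.0] * 10 + [10.0, 11.0] * 10
print(change_points(series))
```

Each returned value is the number of points before the change, so the new segment starts at that 0-based index. Reshuffling destroys any time ordering, which is what makes the confidence estimate depend on the iid assumption.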
CPA method 2
• Structural Change Model (SCM)
  – Based on mean-shift model;
  – Minimizing the sum of squared residuals;
  – Pros:
    • Allows for autoregressive data;
    • Incorporates independent covariates;
    • Asymptotic distribution for change points;
  – Cons:
    • Assumes a stationary process;
    • Mathematical complexity.
Structural Change Model
• A time-series data Xi, i = 1, …, N
• Break series into two segments at any location k
• Sum of squared residuals (SSR) is computed for each k
• The change point is located at the k that minimizes the SSR
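The formulas on this slide did not survive in the transcript. Under the mean-shift model with no covariates, a standard least-squares form consistent with the bullets above would be the following, where the segment means and the estimate \hat{k} are notation introduced here for illustration:

```latex
\mathrm{SSR}(k) = \sum_{i=1}^{k} \left( X_i - \bar{X}_{1:k} \right)^2
                + \sum_{i=k+1}^{N} \left( X_i - \bar{X}_{k+1:N} \right)^2,
\qquad
\hat{k} = \arg\min_{1 \le k < N} \mathrm{SSR}(k)
```

Here \bar{X}_{1:k} and \bar{X}_{k+1:N} are the sample means of the two segments. The original slide may have used a more general regression form with covariates.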
Change points and statistical inference
        CUSUM                                           Structural Change Model
                      95% CI*                                         95% CI*
Level   Change Point  lb*         ub*         P value   Change Point  lb*         ub*         P value
1       11/27/2008    2/4/2006    11/1/2009   0         12/13/2008    10/14/2008  12/17/2008  0
2       6/23/2009     1/20/2009   4/4/2010    0         6/22/2009     6/21/2009   7/31/2009   0
3       10/4/2009     7/28/2009   4/28/2010   0         9/18/2009     9/6/2009    9/21/2009   0
4       1/20/2010     11/3/2009   5/7/2010    0         1/5/2010      1/4/2010    1/10/2010   0
5       3/1/2010      2/3/2010    5/21/2010   0         2/16/2010     2/14/2010   2/18/2010   0
6       4/6/2010      3/12/2010   5/24/2010   0         4/5/2010      4/1/2010    4/14/2010   0
*: 95% CI= 95% confidence interval; lb=lower bound; ub=upper bound.
CUSUM vs. SCM
“I have long given up on CUSUM type procedures (and any of the variants). The tests are plagued with problems of non-monotonic power and to get a date and confidence interval for the break date is not trivial and most methods don't work well.”
“The main difference is that I do not use asymptotic results, but instead employ the computer intensive bootstrapping approach to determine confidence levels and intervals so as to make the procedure nonparametric.”
Wayne Taylor, Ph.D. Pierre Perron, Ph.D.
CPA Method 3
• Bayesian CPA
  – Weak Prior
  – Posterior distributions of the change points
Thomas Bayes (1702-1761)
Image source at http://en.wikipedia.org/wiki/Thomas_Bayes
Order   Time         Posterior probability
1       4/25/2009    1
2       6/14/2009    1
3       5/18/2009    0.99
4       5/22/2009    0.982
5       5/25/2009    0.982
6       5/14/2009    0.98
7       6/2/2009     0.936
8       1/25/2008    0.92
9       2/24/2008    0.868
10      5/15/2009    0.85
11      12/24/2008   0.846
12      5/3/2009     0.818
13      2/22/2009    0.806
14      11/25/2009   0.764
15      7/5/2009     0.748
16      6/21/2009    0.714
17      11/6/2009    0.64
18      6/7/2009     0.624
19      1/3/2010     0.61
20      11/30/2009   0.608
21      10/16/2009   0.538
22      12/24/2009   0.532
Q: What is the probability that a change occurred?
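To illustrate how a posterior probability can be attached to each candidate time, here is a deliberately simplified sketch. It is not the algorithm of the bcp package cited in the references: it assumes exactly one mean shift, a known noise standard deviation `sigma`, flat (weak) priors on the two segment means, and a uniform prior over the change location. All names are hypothetical.

```python
import math

def changepoint_posterior(xs, sigma):
    """Posterior probability that the single change falls after point k, k = 1..n-1.

    Entry k - 1 of the returned list corresponds to a change after the first k points.
    """
    n = len(xs)
    log_w = []
    for k in range(1, n):
        ssr = 0.0
        for seg in (xs[:k], xs[k:]):
            m = sum(seg) / len(seg)
            ssr += sum((x - m) ** 2 for x in seg)
        # Integrating each segment mean out under a flat prior leaves a weight
        # proportional to exp(-SSR / (2 sigma^2)) / sqrt(k * (n - k)).
        log_w.append(-ssr / (2 * sigma ** 2) - 0.5 * math.log(k * (n - k)))
    top = max(log_w)  # subtract the maximum before exponentiating, for stability
    w = [math.exp(v - top) for v in log_w]
    total = sum(w)
    return [v / total for v in w]

# Made-up series with a shift after the 4th point.
data = [0.1, -0.2, 0.0, 0.2, 5.1, 4.9, 5.0, 5.2]
posterior = changepoint_posterior(data, sigma=1.0)
print(posterior.index(max(posterior)) + 1)
```

In this single-change sketch the probabilities sum to one across locations, whereas a multiple-change model such as the one behind the table reports a separate probability for each time point.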
Bayesian CPA
Autocorrelation Simulation
• Autocorrelation in Biosurveillance data
• CUSUM Assumption
  – Identical
  – Independent
Simulation (cont’d)
• Purpose:
  – Check CPA robustness and accuracy.
  – Based on first-order autoregressive model
      X_1 = µ
      X_i = ρ X_{i-1} + ε_i , ε_i ~ N(0, σ²)
    where i = 2,…,100 and ρ is the autocorrelation coefficient with ρ = -1, -.8, -.5, -.2, 0, .2, .5, .8, 1
Simulation (cont’d)
• CPρ : change point at ρ level;
  – CP0 : change point at ρ=0 (treated as iid sample);
  – For ρ≠0, if CPρ = CP0, it is a match;
  – otherwise, it is a mismatch.
• Run 1000 simulations;
• Report the % of matches in 1000 simulations.
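One way to set up this simulation is sketched below. The slides do not say how the ρ = 0 and ρ ≠ 0 series were paired, so this sketch assumes both are built from the same random innovations and that the change point is simply the location of the largest |S|. The `tol` and `seed` parameters and all function names are hypothetical.

```python
import random

def ar1_series(rho, eps, mu=0.0):
    """X_1 = mu, then X_i = rho * X_{i-1} + eps_i for each innovation in eps."""
    xs = [mu]
    for e in eps:
        xs.append(rho * xs[-1] + e)
    return xs

def cusum_change_point(xs):
    """Number of points k (1..n-1) before the change, chosen to maximize |S_k|."""
    mean = sum(xs) / len(xs)
    best_k, best_abs, total = 1, -1.0, 0.0
    for k in range(1, len(xs)):
        total += xs[k - 1] - mean
        if abs(total) > best_abs:
            best_k, best_abs = k, abs(total)
    return best_k

def match_rate(rho, n=100, runs=1000, tol=0, sigma=1.0, seed=0):
    """Share of runs in which CP_rho lies within `tol` positions of CP_0."""
    rng = random.Random(seed)
    matches = 0
    for _ in range(runs):
        # Assumption: the same innovations drive both series in a run.
        eps = [rng.gauss(0.0, sigma) for _ in range(n - 1)]
        cp0 = cusum_change_point(ar1_series(0.0, eps))
        cp_rho = cusum_change_point(ar1_series(rho, eps))
        if abs(cp_rho - cp0) <= tol:
            matches += 1
    return matches / runs

for rho in (-0.5, -0.2, 0.0, 0.2, 0.5):
    print(rho, match_rate(rho, runs=200))
```

Setting `tol=0` counts exact matches only, while a larger `tol` counts near matches. The rates this sketch produces depend on its pairing assumption and should not be expected to reproduce the figures reported in the slides.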
Simulation (cont’d)
Conclusion: Taylor’s CUSUM method is robust in detecting change points in autocorrelated data with ≥80% matching probability at |ρ|≤0.2.
[Figure: % of matches plotted against ρ, with curves labeled CPρ = CP0 and CPρ = CP0±3]
Real Time Trend Analysis
[Figure legend: Moderately Up, Slightly Up, Slightly Down, Moderately Down, Forecast]
Forecasting
• Historic data since last change point;
• Forecasting model:
  – First-order Autoregressive (AR) model;
      X_i = ρ X_{i-1} + ε_i , ε_i ~ N(0, σ²)
  – Forecast two weeks of influenza activity;
• Change point analysis:
  – Are there any changes since the last change point?
  – Is influenza activity going up, down, or stable since the last change point?
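A minimal sketch of a first-order AR forecast, assuming ρ is estimated by least squares from the observations since the last change point and that the model has no intercept, as written above. The function names and the sample values are illustrative, and `steps` should be set to however many observations make up the forecast horizon.

```python
def fit_ar1(xs):
    """Least-squares estimate of rho in X_i = rho * X_{i-1} + eps_i (no intercept)."""
    num = sum(cur * prev for cur, prev in zip(xs[1:], xs[:-1]))
    den = sum(prev * prev for prev in xs[:-1])
    return num / den

def forecast_ar1(xs, steps):
    """Point forecasts rho^h * X_n for h = 1..steps."""
    rho = fit_ar1(xs)
    out, last = [], xs[-1]
    for _ in range(steps):
        last = rho * last
        out.append(last)
    return out

# Made-up values since the last change point.
recent = [8.0, 4.0, 2.0, 1.0]
print(fit_ar1(recent))
print(forecast_ar1(recent, steps=2))
```

Because the model has no intercept, its forecasts move toward zero when |ρ| < 1, so in practice the series would usually be centered on the current segment mean before fitting.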
Forecasting (cont’d)
Since the last detected change, no additional significant changes have been detected; influenza activity is stable.
Conclusions
• CPA is a very useful tool in analyzing surveillance data;
• CPA and control chart/aberration detection methods can be used in complementary fashion;
• Real-time trend analysis;
• Integrate CPA in forecasting model.
Open-Access Scientific Collaboration
https://sites.google.com/site/changepointanalysis
58 Collaborators, > 100 users from 46 cities
Future work
• CPA on counts of ED visits due to ILI;
• CPA and forecasting;
• Open source programs in R;
• Manuscripts
References
• Kass-Hout, T., Park, S., Xu, R., McMurray, P. BioSense Program: Scientific Collaboration. The Joint Statistical Meeting, Vancouver, CA. August, 2010.
• Bai, J. Estimation of a change point in multiple regression models. Review of Economics and Statistics, 79: 551-563, 1997.
• Bai, J. and Perron, P. Computation and analysis of multiple structural change models. Journal of Applied Econometrics, 18: 1-22, 2003.
• Erdman, C. and Emerson, J.W. bcp: An R package for performing a Bayesian analysis of change point problems. Journal of Statistical Software, 23 (3): 1-13, 2007.
• Taylor, W.A. Change-Point Analysis: A Powerful New Tool for Detecting Changes. Retrieved from http://www.variation.com/anonftp/pub/changepoint.pdf
Acknowledgement
CDC
Sam Groseclose, DVM, MPH
Paul McMurray, MDS
Soyoun Park, MS
Others
Rafal Raciborshi, Ph.D, Econometrician, STATA Corp, College Station, TX.
Wayne Taylor, Ph.D, President of Taylor Enterprise, Inc.
Pierre Perron, Ph.D, Professor of Economics, Boston University
Yajun Mei, Ph.D, Asst. Professor of Statistics, Georgia Tech
Elena Pesavento, Ph.D, Assoc. Professor of Economics, Emory University