Outliers and Influential Data Points in Regression Analysis
James P. Stevens
Sujin Jang
November 10, 2008
Beware of Outliers
• Regression is sensitive to outliers
  – Important to detect outliers and influential points
• Summary stats can be misleading…
  – Important to explore the data, rather than relying on just 1-2 summary stats
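To make the sensitivity claim concrete, here is a minimal sketch using made-up data (the numbers are hypothetical, not from the talk): ten points that lie exactly on a line, refit after a single outcome value is corrupted.

```python
import numpy as np

# Hypothetical data: ten points lying exactly on y = 2x + 1.
x = np.arange(10, dtype=float)
y = 2.0 * x + 1.0

# Least-squares slope and intercept for the clean data.
slope_clean, intercept_clean = np.polyfit(x, y, deg=1)

# Corrupt a single observation (the last one) and refit.
y_out = y.copy()
y_out[-1] += 30.0
slope_out, intercept_out = np.polyfit(x, y_out, deg=1)

print(f"clean:        slope={slope_clean:.3f}, intercept={intercept_clean:.3f}")
print(f"with outlier: slope={slope_out:.3f}, intercept={intercept_out:.3f}")
```

One corrupted point out of ten is enough to pull the fitted slope well away from 2, which is why looking at the individual cases matters.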
So what should we do?
• Ways of Detecting Outliers:
  – Studentized residuals for outliers on y
  – Mahalanobis distance & hat matrix for outliers in the space of predictors
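The three diagnostics named above can be sketched with NumPy as follows. This is an illustration under assumptions, not the presenter's code: the function name `outlier_diagnostics` is hypothetical, an intercept is added to the model, and the residuals are "internally" studentized (the slides do not say which variant is meant).

```python
import numpy as np

def outlier_diagnostics(X, y):
    """Return (leverages, studentized residuals, squared Mahalanobis distances).

    X is an (n, k) array of predictors without an intercept column;
    y is an (n,) array of outcomes.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, k = X.shape
    A = np.column_stack([np.ones(n), X])  # design matrix with intercept
    p = k + 1

    # Hat matrix H = A (A'A)^{-1} A'; its diagonal holds the leverages h_ii.
    H = A @ np.linalg.inv(A.T @ A) @ A.T
    h = np.diag(H)

    # Ordinary residuals and the residual variance estimate.
    resid = y - H @ y
    s2 = resid @ resid / (n - p)

    # Internally studentized residuals (an assumption): e_i / (s * sqrt(1 - h_ii)).
    r = resid / np.sqrt(s2 * (1.0 - h))

    # Squared Mahalanobis distance of each case from the predictor means.
    centered = X - X.mean(axis=0)
    cov = np.atleast_2d(np.cov(X, rowvar=False))
    d2 = np.einsum("ij,jk,ik->i", centered, np.linalg.inv(cov), centered)
    return h, r, d2

# Usage on simulated (hypothetical) data.
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 2))
y = 1.0 + X @ np.array([2.0, -1.0]) + rng.normal(scale=0.5, size=30)
h, r, d2 = outlier_diagnostics(X, y)
```

With an intercept in the model, leverage and Mahalanobis distance carry the same information: h_ii = 1/n + D_i^2/(n - 1), so either one can be used to flag cases that are unusual in the space of predictors.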
Types of Outliers
• Classifying Outliers:
  – Outliers in the space of outcomes (outliers on y)
  – Outliers in the space of predictors (outliers on x)
BUT… The points these detection methods identify will not necessarily be influential in affecting the regression coefficients…
Cook’s Distance: Identifying Influential Points
• A measure of the change in the regression coefficients that would occur if the case were omitted.
  – Affected by the case being an outlier both on y and in the set of predictors
  – Measures the joint (combined) influence of the case being an outlier on y and on x
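A minimal sketch of Cook's distance, computed two ways so the "change if the case were omitted" reading can be checked directly. The function names are hypothetical and an intercept is assumed; the closed form used is D_i = e_i^2 h_ii / (p s^2 (1 - h_ii)^2), where e_i is the residual, h_ii the leverage, p the number of coefficients, and s^2 the residual variance.

```python
import numpy as np

def cooks_distance(X, y):
    """Cook's distance from residuals and leverages (closed form)."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    A = np.column_stack([np.ones(n), np.asarray(X, dtype=float)])
    p = A.shape[1]
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    s2 = resid @ resid / (n - p)
    # Leverages h_ii = a_i' (A'A)^{-1} a_i for each row a_i of A.
    h = np.einsum("ij,jk,ik->i", A, np.linalg.inv(A.T @ A), A)
    return resid**2 * h / (p * s2 * (1.0 - h) ** 2)

def cooks_distance_by_deletion(X, y):
    """Same quantity, obtained by actually refitting without each case."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    A = np.column_stack([np.ones(n), np.asarray(X, dtype=float)])
    p = A.shape[1]
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    s2 = resid @ resid / (n - p)
    D = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        beta_i, *_ = np.linalg.lstsq(A[keep], y[keep], rcond=None)
        diff = beta_i - beta  # shift in the coefficients when case i is dropped
        D[i] = diff @ (A.T @ A) @ diff / (p * s2)
    return D

# Usage on simulated (hypothetical) data.
rng = np.random.default_rng(1)
X = rng.normal(size=(25, 2))
y = 0.5 + X @ np.array([1.5, -2.0]) + rng.normal(size=25)
D_formula = cooks_distance(X, y)
D_deletion = cooks_distance_by_deletion(X, y)
```

The closed form shows why the measure is "joint": it is the product of a residual term (outlier on y) and a leverage term (outlier on x), so a case needs some of both to score high.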
Now what?
Step 1. Detect
Step 2. Isolate
Step 3. Examine
  – Are they qualitatively different?
  – Are they influential?
Another thing to consider: influential “clusters”?
Step 4. Delete or retain as you see fit … Or try both
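"Try both" can be as simple as fitting the model with and without the flagged cases and comparing the coefficients side by side. The sketch below assumes the cases have already been flagged; the helper names, the data, and the choice of which case to flag are all hypothetical.

```python
import numpy as np

def fit_ols(X, y):
    """Return OLS coefficients [intercept, slopes...] via least squares."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta

def fit_with_and_without(X, y, flagged):
    """Fit once on all cases and once with the flagged cases removed."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    keep = np.ones(len(y), dtype=bool)
    keep[list(flagged)] = False
    return fit_ols(X, y), fit_ols(X[keep], y[keep])

# Hypothetical data: an exact linear relationship with one corrupted case.
rng = np.random.default_rng(2)
X = rng.normal(size=(20, 2))
y = 1.0 + X @ np.array([2.0, -1.0])
y[7] += 15.0  # case 7 plays the role of the flagged outlier

beta_all, beta_trimmed = fit_with_and_without(X, y, flagged=[7])
print("all cases:      ", np.round(beta_all, 3))
print("flagged removed:", np.round(beta_trimmed, 3))
```

If the two sets of coefficients lead to the same conclusions, the choice between deleting and retaining matters little; if they differ, reporting both is one way to stay transparent about it.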