How to Reject Outliers in Data: 4 Steps (with Pictures) - wikiHow
3/23/2014 How to Reject Outliers in Data: 4 Steps (with Pictures) - wikiHow
http://www.wikihow.com/index.php?title=Reject-Outliers-in-Data&printable=yes 1/3
How to Reject Outliers in Data

Experimental data must be scrutinized for outliers before meaningful
conclusions can be drawn from them. In the simplest case, this is done by
computing the mean and the standard deviation from all the data points and
rejecting any point that lies more than 3 standard deviations from the mean.
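
The simple 3-standard-deviation rule can be sketched in Python. This is only a minimal illustration and not part of the original article: the function name `three_sigma_outliers`, the choice of the sample standard deviation, and the made-up readings are all assumptions.

```python
import statistics


def three_sigma_outliers(data, k=3.0):
    """Return the points lying more than k standard deviations from the mean."""
    mean = statistics.mean(data)
    sd = statistics.stdev(data)  # sample standard deviation (an assumed choice)
    return [x for x in data if abs(x - mean) > k * sd]


# Made-up readings: 30 values close to 10 plus one gross error.
readings = [10.0 + 0.1 * (i % 5 - 2) for i in range(30)] + [25.0]
flagged = three_sigma_outliers(readings)
print(flagged)
```

Note that the suspect value itself inflates both the mean and the standard deviation, so with very small datasets this rule may never flag anything.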
However, as the number of samples in the dataset increases, the probability of
seeing extreme samples also increases. To account for the increased
likelihood of coming across extreme values, the following modifications are
suggested.
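
Before turning to the modified procedure, the effect of sample size can be made concrete. Assuming the data really are normally distributed and the points are independent, a single point falls more than 3 standard deviations from the mean with probability p of about 0.0027, and the chance that at least one of N points does so is 1 - (1 - p)^N. The sketch below (the function name `prob_any_beyond` is hypothetical) evaluates this for a few sample sizes.

```python
import math


def prob_any_beyond(n, k=3.0):
    """P(at least one of n independent normal points has |z| > k)."""
    p_single = math.erfc(k / math.sqrt(2))  # two-sided tail area beyond k sigma
    return 1.0 - (1.0 - p_single) ** n


for n in (10, 80, 1000):
    print(n, round(prob_any_beyond(n), 3))
```

For N = 80 the probability is already about 0.19, so a fixed 3-standard-deviation cutoff would flag a perfectly ordinary point in roughly one dataset out of five.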
Steps

1. Compute the mean using all the data points, including suspected outliers.
2. Compute the standard deviation, also using all the data points.
3. For each data point, xi, compute, in a separate column, z, the number of
   standard deviations that the point lies from the mean:
   $z = (x_i - \bar{x}) / s$, where $\bar{x}$ is the mean and $s$ is the
   standard deviation. Then compute Nα, where N is the number of data points
   and α is the probability of a value at least as extreme as z:
   - For each z > 0, compute Nα, N times the area under the standard normal
     curve between z and ∞, in a separate column. In Excel, use
     N*(1-NORMSDIST(z)), or use the formula
     $N\alpha = N \int_{z}^{\infty} \frac{1}{\sqrt{2\pi}} e^{-t^{2}/2}\,dt$
   - For each z < 0, compute Nα, N times the area under the standard normal
     curve between -∞ and z, in a separate column. In Excel, use
     N*NORMSDIST(z), or use the formula
     $N\alpha = N \int_{-\infty}^{z} \frac{1}{\sqrt{2\pi}} e^{-t^{2}/2}\,dt$
4. If Nα < 0.05, reject the data point as an outlier.
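
The steps above can be sketched in Python as follows. This is a hedged illustration rather than the article's own implementation: the function name `reject_outliers` and the sample data are hypothetical, and the use of the sample standard deviation is an assumption, since the article's formula for it is not preserved. Because the normal curve is symmetric, both the z > 0 and z < 0 cases reduce to the cumulative probability at -|z|.

```python
import statistics
from statistics import NormalDist


def reject_outliers(data, threshold=0.05):
    """Split data into (kept, rejected) using the N*alpha < threshold rule."""
    n = len(data)
    mean = statistics.mean(data)        # step 1: mean of all points
    sd = statistics.stdev(data)         # step 2: sample SD (assumed choice)
    std_normal = NormalDist()           # standard normal: mean 0, SD 1
    kept, rejected = [], []
    for x in data:
        z = (x - mean) / sd             # step 3: signed distance in SDs
        tail = std_normal.cdf(-abs(z))  # one-sided tail area beyond z
        n_alpha = n * tail
        if n_alpha < threshold:         # step 4: reject if N*alpha < 0.05
            rejected.append(x)
        else:
            kept.append(x)
    return kept, rejected


# Made-up readings: 30 values close to 10 plus one gross error.
readings = [10.0 + 0.1 * (i % 5 - 2) for i in range(30)] + [25.0]
kept, rejected = reject_outliers(readings)
print(rejected)
```

The sketch needs at least two data points and a nonzero standard deviation; it does not guard against either case.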
The figure below shows a series of data points with the first two intentionally
set to be visibly different from the others. There were 80 data points, with a mean
of 1122.6 and a standard deviation of 1.430.
The low outlier was 1117, with a computed z = -3.899. The Nα value was 0.004,
which is less than 0.05, so this point may be rejected as an outlier.
The high outlier was 1128, with a computed z = 3.794. The Nα value was 0.006,
which is less than 0.05, so this point may also be rejected as an outlier.
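
The Nα values in this example can be checked with a few lines of Python. The snippet is a sketch added for illustration: it takes N = 80 and the two z magnitudes from the text, and uses the standard library's `NormalDist` in place of Excel's NORMSDIST.

```python
from statistics import NormalDist

N = 80                     # number of data points in the example
std_normal = NormalDist()  # standard normal distribution

# z magnitudes from the text; the low outlier lies below the mean.
z_values = {"low": -3.899, "high": 3.794}

# N times the one-sided tail area beyond each z.
results = {label: N * std_normal.cdf(-abs(z)) for label, z in z_values.items()}

for label, n_alpha in results.items():
    print(label, round(n_alpha, 3), n_alpha < 0.05)
```

Rounded to three decimals, this gives 0.004 for the low point and 0.006 for the high point, matching the values quoted above.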
Tips

If an outlier occurs, identify its cause before discarding it. If the value is
a data entry error or comes from another process, correct it if possible
rather than deleting it. If the value comes from the process or population you
are studying and is not a data entry error, do not delete it: it is part of
the natural variability in the data and should be included when quantifying
that variability.

Warnings

This procedure assumes that the values generated by the process or population
follow a normal distribution. Although measurement errors often follow a
normal distribution, many populations and processes do not. As a result, the
procedure described in this article may incorrectly delete values from the
data. Also, even with normally distributed data, some values beyond 3 standard
deviations will occur when the number of observations is large.

It is not considered good statistical practice to discard outliers without
strong cause. Discarding outliers without cause typically leads to
underestimating the actual variability of the process that generates the data.
Outliers typically arise from three possible causes:
- Data entry error.
- Values from another population or process.
- Actual unusual values in the data.
Article Info
Thanks to all authors for creating a page that has been read 37,901 times.
Categories: Probability and Statistics
Recent edits by: Teresa, Luv_sarah, Lucky7