26716080 Skewness Kurtosis


Skewness/Kurtosis

Skewness is the degree of departure from symmetry of a distribution. A positively skewed distribution has a "tail" which is pulled in the positive direction. A negatively skewed distribution has a "tail" which is pulled in the negative direction.

Kurtosis is the degree of peakedness of a distribution. A normal distribution is a mesokurtic distribution. A pure leptokurtic distribution has a higher peak than the normal distribution and has heavier tails. A pure platykurtic distribution has a lower peak than a normal distribution and lighter tails.

Most departures from normality involve both a skewness and a kurtosis that differ from those of a normal distribution.

Calculating Skewness and Kurtosis

There are many methods for calculating skewness and kurtosis indices, and not all computer programs calculate them the same way. If you use a computer program to obtain skewness and kurtosis indices, be sure you know how it calculates them!


Some measures of skewness are based on the mean and median. For example, Pearson's second coefficient of skewness is three times the difference between the mean and the median, divided by the standard deviation: $3(\bar{x} - \text{median})/s$. There are also skewness indices based on the quartiles, and many others. The most important group of measures of skewness and kurtosis uses the third and fourth moments about the mean.
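As an illustration only, Pearson's second coefficient can be computed with Python's standard `statistics` module. The function name is hypothetical, and using the sample standard deviation (divisor n − 1) for s is an assumption, since some texts use divisor n instead.

```python
import statistics

def pearson_second_skewness(data):
    """Pearson's second coefficient of skewness: 3 * (mean - median) / s.

    Assumption: s is the sample standard deviation (n - 1 divisor).
    """
    mean = statistics.mean(data)
    median = statistics.median(data)
    s = statistics.stdev(data)
    return 3 * (mean - median) / s

# A small right-skewed sample: the long upper tail pulls the mean above the median,
# so the coefficient comes out positive.
sample = [1, 2, 2, 3, 3, 3, 4, 10]
print(round(pearson_second_skewness(sample), 3))
```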

The moments about the mean are averages of powered deviations. Each observed value minus the mean is raised to some power, and these terms are summed and divided by the sample size. In algebraic form, the rth moment about the mean is:

$$m_r = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^r$$

The second moment about the mean is the variance computed with the sample size n (rather than n − 1) as the divisor.

The Moment Coefficient of Skewness, denoted by statisticians as g3, is defined in dimensionless form as:

$$g_3 = \frac{m_3}{m_2^{3/2}}$$

where $m_r$ denotes the rth moment about the mean.

The expected value of this statistic will be zero for symmetrical distributions.

Similarly, the Moment Coefficient of Kurtosis, denoted by statisticians as g4, is defined in dimensionless form as:

$$g_4 = \frac{m_4}{m_2^{2}} - 3$$

where $m_r$ denotes the rth moment about the mean.

The expected value of this statistic is approximately zero for normal distributions. For a normal sample of size n it is exactly −6/(n + 1), which approaches zero as n grows.
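A minimal Python sketch of the moment definitions above. The function names are hypothetical. All moments use divisor n, as in the definition of $m_r$.

```python
def central_moment(data, r):
    """r-th moment about the mean: (1/n) * sum((x - mean)**r)."""
    n = len(data)
    mean = sum(data) / n
    return sum((x - mean) ** r for x in data) / n

def moment_skewness(data):
    """g3 = m3 / m2**1.5 (dimensionless)."""
    return central_moment(data, 3) / central_moment(data, 2) ** 1.5

def moment_kurtosis(data):
    """g4 = m4 / m2**2 - 3, so that normal data give values near 0."""
    return central_moment(data, 4) / central_moment(data, 2) ** 2 - 3

symmetric = [1, 2, 3, 4, 5]
print(moment_skewness(symmetric))            # → 0.0
print(round(moment_kurtosis(symmetric), 4))  # → -1.3
```

The symmetric example has zero skewness. Its negative g4 reflects the flat, tail-free shape of five equally spaced points.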

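These are the skewness and kurtosis formulas used by MVPstats. Programs such as SPSS and Excel report closely related statistics that include a small-sample adjustment, so their values can differ noticeably from $g_3$ and $g_4$ when the sample is small.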
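Because programs differ in how they compute these statistics, the sketch below contrasts the moment coefficients with a small-sample adjusted form. The adjusted formulas follow those documented for Excel's SKEW and KURT functions. Other packages may use these or yet other variants, so check your program's documentation. All function names are hypothetical.

```python
import math

def _mean(data):
    return sum(data) / len(data)

def moment_skewness(data):
    """g3 = m3 / m2**1.5, moments computed with divisor n."""
    n, mean = len(data), _mean(data)
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    return m3 / m2 ** 1.5

def moment_excess_kurtosis(data):
    """g4 = m4 / m2**2 - 3, moments computed with divisor n."""
    n, mean = len(data), _mean(data)
    m2 = sum((x - mean) ** 2 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    return m4 / m2 ** 2 - 3

def adjusted_skewness(data):
    """Adjusted skewness in the form Excel documents for SKEW (needs n >= 3):
    n / ((n-1)(n-2)) * sum(((x - mean) / s)**3), with s the n - 1 standard deviation."""
    n, mean = len(data), _mean(data)
    s = math.sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))
    return n / ((n - 1) * (n - 2)) * sum(((x - mean) / s) ** 3 for x in data)

def adjusted_excess_kurtosis(data):
    """Adjusted excess kurtosis in the form Excel documents for KURT (needs n >= 4)."""
    n, mean = len(data), _mean(data)
    s = math.sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))
    fourth = sum(((x - mean) / s) ** 4 for x in data)
    return (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * fourth
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))

data = [2, 4, 4, 5, 7, 9, 15]
print(f"moment form:   skewness {moment_skewness(data):.3f}, "
      f"excess kurtosis {moment_excess_kurtosis(data):.3f}")
print(f"adjusted form: skewness {adjusted_skewness(data):.3f}, "
      f"excess kurtosis {adjusted_excess_kurtosis(data):.3f}")
```

The two families are related algebraically. The adjusted skewness equals $g_3\sqrt{n(n-1)}/(n-2)$, and the adjusted excess kurtosis equals $\frac{n-1}{(n-2)(n-3)}\left[(n+1)g_4 + 6\right]$. The gap between the two forms shrinks as n grows.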

Critical Values

Critical value tables for skewness and kurtosis may be found on this website (see Skewness Critical Values and Kurtosis Critical Values). These tables were generated to match the formulas above. Other published tables do not match these formulas, and using them would be misleading.

Using the Critical Value Tables and p-values

The tests for skewness and kurtosis are two-sided tests. The null hypothesis is that the skewness and kurtosis values are zero. The alternative hypothesis is generally that they are not equal to zero.

The critical value tables, found on this Website, provide the critical values for different selections of alpha, for various sample sizes.


For skewness, if the absolute value equals or exceeds the critical value for your chosen significance level (alpha), reject the assumption of normality.

For kurtosis, if the kurtosis value is greater than or equal to the high critical value, or is less than or equal to the low critical value, reject the assumption of normality.
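The two decision rules above can be written as small functions. The function names are hypothetical. The critical values in the example are placeholders, not values taken from the tables. Look up the actual values for your sample size and alpha.

```python
def skewness_rejects_normality(skew, critical_value):
    """Two-sided rule: reject if |skewness| equals or exceeds the tabled critical value."""
    return abs(skew) >= critical_value

def kurtosis_rejects_normality(kurt, low_critical, high_critical):
    """Reject if kurtosis is at or below the low value, or at or above the high value."""
    return kurt <= low_critical or kurt >= high_critical

# Placeholder critical values for illustration only (NOT from the published tables).
print(skewness_rejects_normality(-0.9, critical_value=0.8))               # True
print(kurtosis_rejects_normality(0.4, low_critical=-1.0, high_critical=1.5))  # False
```

The kurtosis rule needs two bounds because the sampling distribution of kurtosis is not symmetric, so the low and high critical values are not mirror images.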

MVPstats displays p-values for skewness based on a t-distribution where (D'Agostino and Tietjen, 1971):

Compared with the published values, the 't' approximation has a p-value error of no more than 1/2 percent for sample sizes below 20. In this range it is more accurate (1/10 percent error) for small alpha (0.01 to 0.02). For sample sizes from 20 to 35, the error is not greater than 1/10 percent. Above a sample size of 40, the p-value is essentially exact.

The sampling distribution of the kurtosis statistic is difficult to model, so p-values are not calculated directly. Instead, the program looks up the statistic in the table and displays the range within which its significance falls. For sample sizes greater than 5,000, the program uses the normal approximation to the sampling distribution of kurtosis to generate p-values.
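The text does not say which normal approximation MVPstats uses for large samples. A common textbook choice, assumed here, treats g4 as approximately normal with mean 0 and variance 24/n under normality. The sketch below converts that into a two-sided p-value. The function name is hypothetical.

```python
import math

def kurtosis_normal_approx_p(excess_kurtosis, n):
    """Two-sided p-value for H0: excess kurtosis = 0.

    Assumption: under normality, g4 is approximately Normal(0, 24 / n) for large n.
    """
    se = math.sqrt(24.0 / n)
    z = excess_kurtosis / se
    # Two-sided p = 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2.0))

print(kurtosis_normal_approx_p(0.15, 6000))
```

This approximation is only reasonable for large n, consistent with applying it above 5,000 observations. For small samples, the tabled critical values are the safer route.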

A fundamental task in many statistical analyses is to characterize the location and variability of a data set. A further characterization of the data includes skewness and kurtosis.

Skewness is a measure of symmetry, or more precisely, the lack of symmetry. A distribution, or data set, is symmetric if it looks the same to the left and right of the center point.

Kurtosis is a measure of whether the data are peaked or flat relative to a normal distribution. That is, data sets with high kurtosis tend to have a distinct peak near the mean, decline rather rapidly, and have heavy tails. Data sets with low kurtosis tend to have a flat top near the mean rather than a sharp peak. A uniform distribution would be the extreme case.

The histogram is an effective graphical technique for showing both the skewness and kurtosis of a data set.

Definition of Skewness

For univariate data $Y_1, Y_2, \ldots, Y_N$, the formula for skewness is:

$$\text{skewness} = \frac{\sum_{i=1}^{N}(Y_i - \bar{Y})^3 / N}{s^3}$$

where $\bar{Y}$ is the mean, $s$ is the standard deviation (computed with $N$ in the denominator), and $N$ is the number of data points. The skewness for a normal distribution is zero, and any symmetric data should have a skewness near zero. Negative values for the skewness indicate data that are skewed left, and positive values indicate data that are skewed right. By skewed left, we mean that the left tail is long relative to the right tail. Similarly, skewed right means that the right tail is long relative to the left tail. Some measurements have a lower bound and are skewed right. For example, in reliability studies, failure times cannot be negative.

Definition of Kurtosis

For univariate data $Y_1, Y_2, \ldots, Y_N$, the formula for kurtosis is:

$$\text{kurtosis} = \frac{\sum_{i=1}^{N}(Y_i - \bar{Y})^4 / N}{s^4}$$

where $\bar{Y}$ is the mean, $s$ is the standard deviation (computed with $N$ in the denominator), and $N$ is the number of data points.

Alternative Definition of Kurtosis

The kurtosis for a standard normal distribution is three. For this reason, some sources use the following definition of kurtosis (often referred to as "excess kurtosis"):

$$\text{excess kurtosis} = \frac{\sum_{i=1}^{N}(Y_i - \bar{Y})^4 / N}{s^4} - 3$$

This definition is used so that the standard normal distribution has a kurtosis of zero. In addition, with this definition, positive kurtosis indicates a "peaked" distribution and negative kurtosis indicates a "flat" distribution.

Which definition of kurtosis is used is a matter of convention (this handbook uses the original definition). When using software to compute the sample kurtosis, you need to be aware of which convention is being followed. Many sources use the term kurtosis when they are actually computing "excess kurtosis", so it may not always be clear.
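To make the two conventions concrete, here is a minimal Python sketch of the handbook's definitions. It assumes Python 3.8+ for `statistics.fmean`. It uses `statistics.pstdev`, which divides by N, so that s matches the definitions. Substituting `statistics.stdev` (divisor N − 1) would give different values, which is one reason programs disagree. The function names and the `excess` flag are hypothetical.

```python
import statistics

def nist_skewness(y):
    """[sum((Y_i - mean)**3) / N] / s**3, with s computed using N."""
    n = len(y)
    mean = statistics.fmean(y)
    s = statistics.pstdev(y)
    return sum((v - mean) ** 3 for v in y) / n / s ** 3

def nist_kurtosis(y, excess=False):
    """[sum((Y_i - mean)**4) / N] / s**4; subtract 3 if excess=True."""
    n = len(y)
    mean = statistics.fmean(y)
    s = statistics.pstdev(y)
    k = sum((v - mean) ** 4 for v in y) / n / s ** 4
    return k - 3 if excess else k

data = [2.1, 2.5, 2.9, 3.0, 3.4, 4.8, 7.5]
print(nist_skewness(data), nist_kurtosis(data), nist_kurtosis(data, excess=True))
```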

Examples

The following example shows histograms for 10,000 random numbers generated from a normal, a double exponential, a Cauchy, and a Weibull distribution.

Normal Distribution

The first histogram is a sample from a normal distribution. The normal distribution is a symmetric distribution with well-behaved tails. This is indicated by the skewness of 0.03. The kurtosis of 2.96 is near the expected value of 3. The histogram verifies the symmetry.

Double Exponential Distribution

The second histogram is a sample from a double exponential distribution. The double exponential is a symmetric distribution. Compared to the normal, it has a stronger peak, more rapid decay, and heavier tails. That is, we would expect a skewness near zero and a kurtosis higher than 3. The skewness is 0.06 and the kurtosis is 5.9.

Cauchy Distribution

The third histogram is a sample from a Cauchy distribution.

For better visual comparison with the other data sets, we restricted the histogram of the Cauchy distribution to values between -10 and 10. The full data set for the Cauchy data in fact has a minimum of approximately -29,000 and a maximum of approximately 89,000.

The Cauchy distribution is a symmetric distribution with heavy tails and a single peak at the center of the distribution. Since it is symmetric, we would expect a skewness near zero. Due to the heavier tails, we might expect the kurtosis to be larger than for a normal distribution. In fact the skewness is 69.99 and the kurtosis is 6,693. These extremely high values can be explained by the heavy tails. Just as the mean and standard deviation can be distorted by extreme values in the tails, so too can the skewness and kurtosis measures.

Weibull Distribution

The fourth histogram is a sample from a Weibull distribution with shape parameter 1.5. The Weibull distribution is a skewed distribution with the amount of skewness depending on the value of the shape parameter. The degree of decay as we move away from the center also depends on the value of the shape parameter. For this data set, the skewness is 1.08 and the kurtosis is 4.46, which indicates moderate skewness and kurtosis.
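The handbook does not give the code behind its histograms. The sketch below is one way to run a similar experiment with Python's standard `random` module. Several choices here are assumptions: the seed, the samplers (Laplace as the difference of two unit exponentials, Cauchy via its inverse CDF), and the function name. The printed values will differ from those quoted above because the random numbers differ. Kurtosis is reported in the non-excess convention used in the examples.

```python
import math
import random

def skewness_kurtosis(data):
    """Moment skewness and (non-excess) kurtosis, N in the denominator."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2

rng = random.Random(12345)  # arbitrary seed
N = 10_000
samples = {
    "normal": [rng.gauss(0.0, 1.0) for _ in range(N)],
    # Double exponential (Laplace) as the difference of two unit exponentials.
    "double exponential": [rng.expovariate(1.0) - rng.expovariate(1.0) for _ in range(N)],
    # Standard Cauchy via the inverse CDF: tan(pi * (u - 1/2)).
    "cauchy": [math.tan(math.pi * (rng.random() - 0.5)) for _ in range(N)],
    # random.weibullvariate(alpha, beta): alpha is the scale, beta is the shape.
    "weibull (shape 1.5)": [rng.weibullvariate(1.0, 1.5) for _ in range(N)],
}

for name, data in samples.items():
    skew, kurt = skewness_kurtosis(data)
    print(f"{name:20s} skewness = {skew:8.2f}  kurtosis = {kurt:10.2f}")
```

Rerunning with different seeds shows the point made about the Cauchy data. The normal, double exponential, and Weibull results barely move, while the Cauchy skewness and kurtosis swing wildly because a few extreme values dominate the third and fourth moments.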

Dealing with Skewness and Kurtosis

Many classical statistical tests and intervals depend on normality assumptions. Significant skewness and kurtosis clearly indicate that data are not normal. If a data set exhibits significant skewness or kurtosis (as indicated by a histogram or the numerical measures), what can we do about it?

One approach is to apply some type of transformation to try to make the data normal, or more nearly normal. The Box-Cox transformation is a useful technique for trying to normalize a data set. In particular, taking the log or square root of a data set is often useful for data that exhibit moderate right skewness.
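As an illustration of the transformation approach, the sketch below generates right-skewed lognormal data and compares skewness before and after square-root and log transforms. This is an assumption-laden example, not a Box-Cox fit. The two transforms correspond to Box-Cox with λ = 1/2 (up to a linear rescaling that does not change skewness) and λ = 0. Estimating λ itself is not shown. The data, seed, and function name are made up for illustration.

```python
import math
import random

def skewness(data):
    """Moment skewness m3 / m2**1.5, N in the denominator."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    return m3 / m2 ** 1.5

rng = random.Random(2024)
# Right-skewed, strictly positive data (lognormal), chosen for illustration.
raw = [rng.lognormvariate(0.0, 0.5) for _ in range(5000)]
rooted = [math.sqrt(x) for x in raw]  # Box-Cox lambda = 1/2, up to rescaling
logged = [math.log(x) for x in raw]   # Box-Cox lambda = 0

print(f"raw: {skewness(raw):.2f}  sqrt: {skewness(rooted):.2f}  log: {skewness(logged):.2f}")
```

For lognormal data the log transform is exactly normalizing. With real data, the best λ is usually somewhere in between and has to be estimated.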

Another approach is to use techniques based on distributions other than the normal. For example, in reliability studies, the exponential, Weibull, and lognormal distributions are typically used as a basis for modeling rather than using the normal distribution. The probability plot correlation coefficient plot and the probability plot are useful tools for determining a good distributional model for the data.

Software

The skewness and kurtosis coefficients are available in most general-purpose statistical software programs, including Dataplot.

A high-kurtosis distribution has a sharper peak and longer, fatter tails, while a low-kurtosis distribution has a more rounded peak and shorter, thinner tails.

Distributions with zero excess kurtosis are called mesokurtic, or mesokurtotic. The most prominent example of a mesokurtic distribution is the normal distribution family, regardless of the values of its parameters. A few other well-known distributions can be mesokurtic, depending on parameter values. For example, the binomial distribution is mesokurtic for $p = 1/2 \pm \sqrt{1/12}$.

A distribution with positive excess kurtosis is called leptokurtic, or leptokurtotic. In terms of shape, a leptokurtic distribution has a more acute peak around the mean (that is, a higher probability than a normally distributed variable of values near the mean) and fatter tails (that is, a higher probability than a normally distributed variable of extreme values). Examples of leptokurtic distributions include the Laplace distribution and the logistic distribution. Such distributions are sometimes termed super Gaussian.


A distribution with negative excess kurtosis is called platykurtic, or platykurtotic. In terms of shape, a platykurtic distribution has a lower, wider peak around the mean (that is, a lower probability than a normally distributed variable of values near the mean) and thinner tails (if viewed as the height of the probability density; that is, a lower probability than a normally distributed variable of extreme values). Examples of platykurtic distributions include the continuous or discrete uniform distributions and the raised cosine distribution. The most platykurtic distribution of all is the Bernoulli distribution with p = ½ (for example, the number of times one obtains "heads" when flipping a coin once, a coin toss), for which the excess kurtosis is −2. Such distributions are sometimes termed sub Gaussian.
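The coin-toss and binomial claims can be checked numerically. This sketch computes the excess kurtosis of a discrete distribution directly from its probabilities. It then compares the result with the closed form (1 − 6pq)/(npq) for the binomial distribution, where q = 1 − p. The function names are hypothetical.

```python
import math

def excess_kurtosis_population(values, probs):
    """Excess kurtosis of a discrete distribution: mu4 / mu2**2 - 3."""
    mean = sum(v * p for v, p in zip(values, probs))
    mu2 = sum((v - mean) ** 2 * p for v, p in zip(values, probs))
    mu4 = sum((v - mean) ** 4 * p for v, p in zip(values, probs))
    return mu4 / mu2 ** 2 - 3

def binomial_excess_kurtosis(n, p):
    """Closed form for Binomial(n, p): (1 - 6 p q) / (n p q), with q = 1 - p."""
    q = 1 - p
    return (1 - 6 * p * q) / (n * p * q)

# Fair coin toss (Bernoulli, p = 1/2): tails = 0, heads = 1.
print(excess_kurtosis_population([0, 1], [0.5, 0.5]))  # → -2.0

# At p = 1/2 - sqrt(1/12), p * q = 1/6, so the numerator 1 - 6pq vanishes.
p_meso = 0.5 - math.sqrt(1 / 12)
print(binomial_excess_kurtosis(10, p_meso))  # zero up to rounding error
```

Because the numerator does not involve n, the binomial is mesokurtic at these two values of p for every number of trials.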

In probability theory and statistics, kurtosis (from the Greek word κυρτός, kyrtos or kurtos, meaning bulging) is a measure of the "peakedness" of the probability distribution of a real-valued random variable. Higher kurtosis means more of the variance is the result of infrequent extreme deviations, as opposed to frequent modestly sized deviations.

Kurtosis:

Kurtosis measures the "heaviness of the tails" of a distribution (compared with a normal distribution).

Kurtosis is positive if the tails are "heavier" than for a normal distribution, and negative if the tails are "lighter" than for a normal distribution. The normal distribution has kurtosis of zero.

Kurtosis characterizes the shape of a distribution; that is, its value does not depend on an arbitrary change of the scale and location of the distribution. For example, the kurtosis of a sample (or population) of temperature values in Fahrenheit will not change if you transform the values to Celsius (the mean and the variance will, however, change).

The kurtosis of a distribution or sample is equal to the 4th central moment divided by the 4th power of the standard deviation, minus 3.

To calculate the kurtosis of a sample:

i) subtract the mean from each value to get a set of deviations from the mean;

ii) divide each deviation by the standard deviation of all the deviations (computed with N, not N − 1, in the denominator);

iii) average the 4th power of the deviations and subtract 3 from the result.
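The three steps above translate directly into code. The function name and the temperature readings in this sketch are hypothetical. The sketch also checks the location and scale invariance described earlier by converting the same readings from Fahrenheit to Celsius.

```python
import math

def excess_kurtosis_steps(values):
    n = len(values)
    mean = sum(values) / n
    # i) deviations from the mean
    deviations = [v - mean for v in values]
    # ii) standard deviation of the deviations (N in the denominator), then standardize
    sd = math.sqrt(sum(d * d for d in deviations) / n)
    standardized = [d / sd for d in deviations]
    # iii) average the 4th powers and subtract 3
    return sum(z ** 4 for z in standardized) / n - 3

fahrenheit = [50.0, 55.4, 61.2, 59.0, 72.5, 68.3, 90.1, 47.8]
celsius = [(f - 32) * 5 / 9 for f in fahrenheit]

# The two values agree up to floating-point rounding.
print(excess_kurtosis_steps(fahrenheit), excess_kurtosis_steps(celsius))
```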


Kurtosis, of Greek origin meaning "bulging" or "swelling", is a measurement used to determine the peakedness of a data distribution. In other words, kurtosis measures whether the data are sharply peaked or flat relative to a normal distribution. Because kurtosis measures the shape of the distribution (the fatness of the tails), in finance it describes how returns are spread around the mean. A kurtosis coefficient of three indicates a normal distribution. Kurtosis of less than three indicates a low peak with a fat midrange on either side; this is referred to as platykurtic. Conversely, kurtosis greater than three indicates a sharp, high peak with a thin midrange and fat tails; this is called leptokurtic. Put simply, kurtosis describes how bunched around the center, or how spread toward the extremes, a frequency distribution is. Investors can use kurtosis to describe trends found in charts and to assess volatility, and kurtosis is sometimes called "the volatility of volatility." Kurtosis is related to skewness. Skewness measures the imbalance between the two tails, whereas kurtosis measures their combined heaviness.