Statistical functions


Page 1: Statistical functions

Correlation Function  

A correlation function is a statistical correlation between random variables at two different points in space or time, usually as a function of the spatial or temporal distance between the points. If one considers the correlation function between random variables representing the same quantity measured at two different points, then this is often referred to as an autocorrelation function, since it is made up of autocorrelations. Correlation functions of different random variables are sometimes called cross correlation functions, to emphasise that different variables are being considered and because they are made up of cross correlations.

Correlation functions are a useful indicator of dependencies as a function of distance in time or space, and they can be used to assess the distance required between sample points for the values to be effectively uncorrelated. In addition, they can form the basis of rules for interpolating values at points for which there are no observations.

Page 2: Statistical functions

Autocorrelation

Autocorrelation, also known as serial correlation or cross-autocorrelation,[1] is the cross-correlation of a signal with itself at different points in time (that is what the cross stands for). Informally, it is the similarity between observations as a function of the time lag between them. It is a mathematical tool for finding repeating patterns, such as the presence of a periodic signal obscured by noise, or identifying the missing fundamental frequency in a signal implied by its harmonic frequencies. It is often used in signal processing for analyzing functions or series of values, such as time domain signals.

Statistics

In statistics, the autocorrelation of a random process describes the correlation between values of the process at different times, as a function of the two times or of the time lag.

Let X be some repeatable process, and i be some point in time after the start of that process, so that X_i is the value (or realization) produced by a given run of the process at time i. Suppose that the process is further known to have defined values for mean μ_i and variance σ_i² for all times i. Then the definition of the autocorrelation between times s and t is

R(s, t) = E[(X_t − μ_t)(X_s − μ_s)] / (σ_t σ_s)

where E is the expected value operator.

If Xt is a wide-sense stationary process then the mean μ and the variance σ2 are time-independent, and further the autocorrelation depends only on the lag between t and s: the correlation depends only on the time-distance between the pair of values but not on their position in time.

This further implies that the autocorrelation can be expressed as a function of the time-lag, and that this would be an even function of the lag τ = s − t. This gives the more familiar form

R(τ) = E[(X_t − μ)(X_{t+τ} − μ)] / σ²

Signal processing

In signal processing, the above definition is often used without the normalization, that is, without subtracting the mean and dividing by the variance. When the autocorrelation function is normalized by mean and variance, it is sometimes referred to as the autocorrelation coefficient.[2]

Page 3: Statistical functions

For wide-sense-stationary random processes, the autocorrelations are defined as

R(τ) = E[X(t) X(t + τ)]

For a real function, R(−τ) = R(τ).

For processes that are also ergodic, the expectation can be replaced by the limit of a time average, and the autocorrelation is defined as or equated to

R(τ) = lim_{T→∞} (1/T) ∫_0^T x(t) x(t + τ) dt

- It also represents the average power as a function of the time delay.
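As a rough illustration of the statistical definition above, the following sketch estimates the normalized autocorrelation coefficient of a discrete signal over a range of lags; the noisy sinusoid, the helper name autocorrelation, and the chosen lags are illustrative assumptions, not part of the source text.

    import numpy as np

    def autocorrelation(x, max_lag):
        # Normalized sample autocorrelation R(lag) for lags 0..max_lag:
        # subtract the mean, then divide by the (biased) sample variance.
        x = np.asarray(x, dtype=float)
        x = x - x.mean()
        var = np.dot(x, x) / len(x)
        return np.array([np.dot(x[:len(x) - lag], x[lag:]) / (len(x) * var)
                         for lag in range(max_lag + 1)])

    rng = np.random.default_rng(0)
    t = np.arange(500)
    signal = np.sin(2 * np.pi * t / 50) + 0.5 * rng.normal(size=t.size)

    r = autocorrelation(signal, max_lag=100)
    print(r[0])    # 1.0 by construction (zero lag)
    print(r[50])   # large again at the hidden period of 50 samples

Peaks of r at multiples of the period reveal the periodic component even though it is partly obscured by noise.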

Expected Value

In probability theory, the expected value of a random variable is intuitively the long-run average value of repetitions of the experiment it represents. For example, the expected value of a roll of a fair six-sided die is 3.5 because, roughly speaking, the average of an extremely large number of die rolls is practically always nearly equal to 3.5. Less roughly, the law of large numbers guarantees that the arithmetic mean of the values almost surely converges to the expected value as the number of repetitions goes to infinity. The expected value is also known as the expectation, mathematical expectation, EV, mean, or first moment.

More practically, the expected value of a discrete random variable is the probability-weighted average of all possible values. In other words, each possible value the random variable can assume is multiplied by its probability of occurring, and the resulting products are summed to produce the expected value. The same works for continuous random variables, except the sum is replaced by an integral and the probabilities by probability densities. The formal definition subsumes both of these and also works for distributions which are neither discrete nor continuous: the expected value of a random variable is the integral of the random variable with respect to its probability measure.[1][2]

Univariate discrete random variable, countable case

Let X be a discrete random variable taking values x1, x2, ... with probabilities p1, p2, ... respectively. Then the expected value of this random variable is the infinite sum

E[X] = x1 p1 + x2 p2 + ... = Σ_i x_i p_i
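The die-roll example above is just this probability-weighted sum; a minimal sketch in Python (the variable names are my own):

    # Expected value of a fair six-sided die: each face value weighted by 1/6.
    values = [1, 2, 3, 4, 5, 6]
    probs = [1 / 6] * 6

    expected_value = sum(x * p for x, p in zip(values, probs))
    print(expected_value)   # 3.5, matching the die-roll example earlier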

Page 4: Statistical functions

Univariate continuous random variable

If the probability distribution of X admits a probability density function f(x), then the expected value can be computed as

E[X] = ∫ x f(x) dx

with the integral taken over the range of X.
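A quick numerical check of this integral, using a standard exponential density f(x) = e^(−x) as an illustrative choice (its expected value is 1); the truncation of the support and the grid spacing are arbitrary:

    import numpy as np

    # Approximate E[X] = integral of x * f(x) dx for f(x) = exp(-x) on [0, inf),
    # truncating the support at 50 and using a simple Riemann sum.
    x = np.linspace(0.0, 50.0, 200_001)
    f = np.exp(-x)
    dx = x[1] - x[0]

    expected = np.sum(x * f) * dx
    print(expected)   # ~= 1.0, the known mean of this density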

Uncorrelated Random Variables:

In probability theory and statistics, two real-valued random variables, X,Y, are said to be uncorrelated if their covariance, E(XY) − E(X)E(Y), is zero. A set of two or more random variables is called uncorrelated if each pair of them are uncorrelated. If two variables are uncorrelated, there is no linear relationship between them.

Uncorrelated random variables have a Pearson correlation coefficient of zero, except in the trivial case when either variable has zero variance (is a constant). In this case the correlation is undefined.

In general, uncorrelatedness is not the same as orthogonality, except in the special case where either X or Y has an expected value of 0. In this case, the covariance is the expectation of the product, and X and Y are uncorrelated if and only if E(XY) = 0.

If X and Y are independent, with finite second moments, then they are uncorrelated. However, not all uncorrelated variables are independent. For example, if X is a continuous random variable uniformly distributed on [−1, 1] and Y = X2, then X and Y are uncorrelated even though X determines Y and a particular value of Y can be produced by only one or two values of X.

Uncorrelated random variables are not necessarily independent

Let X be a random variable that takes the value 0 with probability 1/2, and takes the value 1 with probability 1/2.

Page 5: Statistical functions

Let Z be a random variable, independent of X, that takes the value −1 with probability 1/2, and takes the value 1 with probability 1/2.

Let U be a random variable constructed as U = XZ.

The claim is that U and X have zero covariance (and thus are uncorrelated), but are not independent.
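A small simulation of this construction can make the claim concrete; the sample size and seed below are arbitrary choices:

    import numpy as np

    rng = np.random.default_rng(1)
    n = 200_000
    x = rng.integers(0, 2, size=n)       # X in {0, 1}, each with probability 1/2
    z = rng.choice([-1, 1], size=n)      # Z in {-1, 1}, independent of X
    u = x * z                            # U = XZ

    print(np.cov(u, x)[0, 1])            # ~= 0: U and X are uncorrelated
    print(u[x == 0].max())               # 0: whenever X = 0, U is forced to 0,
                                         # so U and X are clearly not independent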

Variance  

In probability theory and statistics, variance measures how far a set of numbers is spread out. A variance of zero indicates that all the values are identical. Variance is always non-negative: a small variance indicates that the data points tend to be very close to the mean (expected value) and hence to each other, while a high variance indicates that the data points are very spread out around the mean and from each other.

An equivalent measure is the square root of the variance, called the standard deviation. The standard deviation has the same dimension as the data, and hence is comparable to deviations from the mean.

Definition

The variance of a set of samples represented by a random variable X is its second central moment, the expected value of the squared deviation from the mean μ = E[X]:

Var(X) = E[(X − μ)²]

Page 6: Statistical functions

Continuous random variable

If the random variable X represents samples generated by a continuous distribution with probability density function f(x), then the population variance is given by

Var(X) = ∫ (x − μ)² f(x) dx

where μ is the expected value,

μ = ∫ x f(x) dx

and where the integrals are definite integrals taken for x ranging over the range of X.

Discrete random variable

If the generator of random variable X is discrete with probability mass function x1 ↦ p1, ..., xn ↦ pn, then

Var(X) = Σ_{i=1}^{n} p_i (x_i − μ)²

where μ is the expected value, i.e.

μ = Σ_{i=1}^{n} p_i x_i
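As a small sketch of these two sums, the fair die's probability mass function (my own choice of example) gives a mean of 3.5 and a variance of 35/12:

    # Mean and variance of a fair six-sided die from its probability mass function.
    values = [1, 2, 3, 4, 5, 6]
    probs = [1 / 6] * 6

    mu = sum(x * p for x, p in zip(values, probs))                 # expected value
    var = sum(p * (x - mu) ** 2 for x, p in zip(values, probs))    # second central moment
    print(mu, var)   # 3.5 and ~2.9167 (= 35/12)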

Standard Deviation  

In statistics, the standard deviation (SD, also represented by the Greek letter sigma σ or s) is a measure that is used to quantify the amount of variation or dispersion of a set of data values.[1] A standard deviation close to 0 indicates that the data points tend to be very close to the mean (also called the expected value) of the set, while a high standard deviation indicates that the data points are spread out over a wider range of values.

Basic examples

For a finite set of numbers, the standard deviation is found by taking the square root of the average of the squared deviations of the values from their average value. For example, the marks of a class of eight students (that is, a population) are the following eight values:

2, 4, 4, 4, 5, 5, 7, 9

Page 7: Statistical functions

These eight data points have the mean (average) of 5:

μ = (2 + 4 + 4 + 4 + 5 + 5 + 7 + 9) / 8 = 40 / 8 = 5

First, calculate the deviations of each data point from the mean, and square the result of each:

(2 − 5)² = 9, (4 − 5)² = 1, (4 − 5)² = 1, (4 − 5)² = 1, (5 − 5)² = 0, (5 − 5)² = 0, (7 − 5)² = 4, (9 − 5)² = 16

The variance is the mean of these values:

σ² = (9 + 1 + 1 + 1 + 0 + 0 + 4 + 16) / 8 = 32 / 8 = 4

and the population standard deviation is equal to the square root of the variance:

σ = √4 = 2
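A quick numerical check of this worked example (assuming the eight marks shown above), comparing the by-hand result with numpy's population variance and standard deviation:

    import numpy as np

    marks = np.array([2, 4, 4, 4, 5, 5, 7, 9])   # the eight marks from the example (assumed)
    print(marks.mean())        # 5.0
    print(marks.var(ddof=0))   # 4.0, the population variance (dividing by N)
    print(marks.std(ddof=0))   # 2.0, the population standard deviation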

Definition of population

Let X be a random variable with mean value μ:

μ = E[X]

Here the operator E denotes the average or expected value of X. Then the standard deviation of X is the quantity

σ = √( E[(X − μ)²] ) = √( E[X²] − (E[X])² )

Discrete random variable

If X is discrete with probability mass function x1 ↦ p1, ..., xn ↦ pn, the standard deviation is the square root of the variance defined above:

σ = √( Σ_{i=1}^{n} p_i (x_i − μ)² )

Continuous random variable

If X is continuous with probability density function f(x), then

σ = √( ∫ (x − μ)² f(x) dx )

Page 8: Statistical functions
Page 9: Statistical functions

Normal distribution

The probability density of the normal distribution is:

f(x) = (1 / (σ √(2π))) e^(−(x − μ)² / (2σ²))

Here:
- μ is the mean or expectation of the distribution (and also its median and mode).
- The parameter σ is its standard deviation, and its variance is then σ².

A random variable with a Gaussian distribution is said to be normally distributed and is called a normal deviate.

If μ = 0 and σ = 1, the distribution is called the standard normal distribution or the unit normal distribution, denoted by N(0, 1), and a random variable with that distribution is a standard normal deviate.
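A brief sketch that writes this density out explicitly and checks it against simulated standard normal deviates; the function name, parameter defaults, and sample size are illustrative choices:

    import numpy as np

    def normal_pdf(x, mu=0.0, sigma=1.0):
        # Density of N(mu, sigma^2) evaluated at x.
        return np.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * np.sqrt(2 * np.pi))

    rng = np.random.default_rng(2)
    samples = rng.normal(loc=0.0, scale=1.0, size=100_000)   # standard normal deviates

    print(normal_pdf(0.0))                  # ~0.3989, the peak of the standard normal density
    print(samples.mean(), samples.std())    # close to mu = 0 and sigma = 1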

Exponential distribution:

- The exponential distribution with parameter λ, often called the rate parameter, is a continuous distribution whose support is the semi-infinite interval [0,∞).

- It is also known as the negative exponential distribution.
- It is the probability distribution that describes the time between events in a Poisson process, i.e. a process in which events occur continuously and independently at a constant average rate.
- It is a particular case of the gamma distribution.

Page 10: Statistical functions

- It is the continuous analogue of the geometric distribution, and it has the key property of being memoryless.
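A rough simulation of the memoryless property: for exponential waiting times, P(T > s + t | T > s) equals P(T > t). The rate, the values of s and t, and the sample size below are illustrative choices:

    import numpy as np

    rng = np.random.default_rng(3)
    lam = 0.5                                                # rate parameter lambda
    t_wait = rng.exponential(scale=1 / lam, size=1_000_000)  # simulated waiting times

    s, t = 2.0, 3.0
    p_unconditional = np.mean(t_wait > t)                    # P(T > t)
    p_conditional = np.mean(t_wait[t_wait > s] > s + t)      # P(T > s + t | T > s)
    print(p_unconditional, p_conditional)   # both ~= exp(-lam * t) ~= 0.223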

Poisson distribution

A discrete random variable X is said to have a Poisson distribution with parameter λ > 0, if, for k = 0, 1, 2, ..., the probability mass function of X is given by:[9]

P(X = k) = λ^k e^(−λ) / k!

where:
- e is Euler's number (e = 2.71828...)
- k! is the factorial of k.

The positive real number λ is equal to the expected value of X and also to its variance.[10]
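A minimal check of these facts from the probability mass function itself; λ = 4 and the truncation at k = 60 are arbitrary choices:

    from math import exp, factorial

    lam = 4.0
    # Poisson pmf P(X = k) = lam**k * exp(-lam) / k!, truncated at k = 59.
    pmf = [lam ** k * exp(-lam) / factorial(k) for k in range(60)]

    mean = sum(k * p for k, p in enumerate(pmf))
    var = sum((k - mean) ** 2 * p for k, p in enumerate(pmf))
    print(sum(pmf), mean, var)   # ~1.0, ~4.0, ~4.0: total mass 1, mean = variance = lambda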

Page 11: Statistical functions

Covariance

In probability theory and statistics, covariance is a measure of how much two random variables change together. If the greater values of one variable mainly correspond with the greater values of the other variable, and the same holds for the smaller values, i.e., the variables tend to show similar behavior, the covariance is positive.[1] In the opposite case, when the greater values of one variable mainly correspond to the smaller values of the other, i.e., the variables tend to show opposite behavior, the covariance is negative. The sign of the covariance therefore shows the tendency in the linear relationship between the variables. The magnitude of the covariance is not easy to interpret. The normalized version of the covariance, the correlation coefficient, however, shows by its magnitude the strength of the linear relation.

A distinction must be made between (1) the covariance of two random variables, which is a population parameter that can be seen as a property of the joint probability distribution, and (2) the sample covariance, which serves as an estimated value of the parameter.

Definition

The covariance between two jointly distributed real-valued random variables X and Y with finite second moments is defined as[2]

cov(X, Y) = E[(X − E[X])(Y − E[Y])] = E[XY] − E[X] E[Y]

Variance is a special case of the covariance when the two variables are identical:

cov(X, X) = Var(X) = σ²(X)
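A small sketch estimating the covariance from simulated data in both of its equivalent forms; the linear model that generates Y, the seed, and the sample size are illustrative assumptions:

    import numpy as np

    rng = np.random.default_rng(4)
    x = rng.normal(size=100_000)
    y = 2.0 * x + rng.normal(size=100_000)      # Y tends to increase with X

    cov_direct = np.mean((x - x.mean()) * (y - y.mean()))   # E[(X - EX)(Y - EY)]
    cov_identity = np.mean(x * y) - x.mean() * y.mean()     # E[XY] - E[X]E[Y]
    print(cov_direct, cov_identity)      # both ~= 2.0, and positive as expected

    print(np.mean((x - x.mean()) ** 2))  # cov(X, X) ~= Var(X) ~= 1.0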

Page 12: Statistical functions

Pearson Product-Moment Correlation Coefficient

In statistics, the Pearson product-moment correlation coefficient (sometimes referred to as the PPMCC or PCC or Pearson's r) is a measure of the linear correlation between two variables X and Y, giving a value between +1 and −1 inclusive, where 1 is total positive correlation, 0 is no correlation, and −1 is total negative correlation.

For a population

- It is commonly represented by the Greek letter ρ.
- The formula for ρ is:

ρ(X, Y) = cov(X, Y) / (σ_X σ_Y)

where:
- cov(X, Y) is the covariance
- σ_X is the standard deviation of X (and analogously σ_Y for Y)

For a sample

- It is represented by the letter r.
- The formula for r is:

r = Σ_i (x_i − x̄)(y_i − ȳ) / ( √(Σ_i (x_i − x̄)²) √(Σ_i (y_i − ȳ)²) )

where:
- x_i and y_i are the individual sample points, indexed by i = 1, ..., n
- x̄ = (1/n) Σ_i x_i (the sample mean); and analogously for ȳ
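A minimal sketch computing r directly from this formula and comparing it with numpy's built-in estimate; the five data points are illustrative:

    import numpy as np

    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

    x_bar, y_bar = x.mean(), y.mean()
    r = np.sum((x - x_bar) * (y - y_bar)) / np.sqrt(
        np.sum((x - x_bar) ** 2) * np.sum((y - y_bar) ** 2))

    print(r)                         # close to +1: a strong positive linear relation
    print(np.corrcoef(x, y)[0, 1])   # the same value from numpy's estimator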

Page 13: Statistical functions

Regression analysis

In statistics, regression analysis is a statistical process for estimating the relationships among variables. It includes many techniques for modeling and analyzing several variables, when the focus is on the relationship between a dependent variable and one or more independent variables (or 'predictors'). More specifically, regression analysis helps one understand how the typical value of the dependent variable (or 'criterion variable') changes when any one of the independent variables is varied, while the other independent variables are held fixed.

Most commonly, regression analysis estimates the conditional expectation of the dependent variable given the independent variables – that is, the average value of the dependent variable when the independent variables are fixed. Less commonly, the focus is on a quantile, or other location parameter of the conditional distribution of the dependent variable given the independent variables. In all cases, the estimation target is a function of the independent variables called the regression function. In regression analysis, it is also of interest to characterize the variation of the dependent variable around the regression function which can be described by a probability distribution.
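As a brief sketch of the most common case, the following fits an ordinary least-squares line to simulated data, estimating the conditional mean of the dependent variable given a single predictor; the true coefficients, noise level, and sample size are illustrative assumptions:

    import numpy as np

    rng = np.random.default_rng(5)
    x = rng.uniform(0, 10, size=500)
    y = 1.5 + 0.8 * x + rng.normal(scale=0.5, size=500)   # true intercept 1.5, slope 0.8

    # Least-squares fit of a degree-1 polynomial: y ~ intercept + slope * x.
    slope, intercept = np.polyfit(x, y, deg=1)
    print(intercept, slope)   # ~= 1.5 and ~= 0.8, recovering the regression function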