Statistical functions


Page 1: Statistical functions

Correlation Function  

A correlation function is a statistical correlation between random variables at two different points in space or time, usually as a function of the spatial or temporal distance between the points. If one considers the correlation function between random variables representing the same quantity measured at two different points, then this is often referred to as an autocorrelation function, since it is made up of autocorrelations. Correlation functions of different random variables are sometimes called cross correlation functions, to emphasise that different variables are being considered and because they are made up of cross correlations.

Correlation functions are a useful indicator of dependencies as a function of distance in time or space, and they can be used to assess the distance required between sample points for the values to be effectively uncorrelated. In addition, they can form the basis of rules for interpolating values at points for which there are no observations.

Page 2: Statistical functions

Autocorrelation

Autocorrelation, also known as serial correlation or cross-autocorrelation,[1] is the cross-correlation of a signal with itself at different points in time (that is what the cross stands for). Informally, it is the similarity between observations as a function of the time lag between them. It is a mathematical tool for finding repeating patterns, such as the presence of a periodic signal obscured by noise, or identifying the missing fundamental frequency in a signal implied by its harmonic frequencies. It is often used in signal processing for analyzing functions or series of values, such as time domain signals.

Statistics

In statistics, the autocorrelation of a random process describes the correlation between values of the process at different times, as a function of the two times or of the time lag.

Let X be some repeatable process, and i be some point in time after the start of that process, so that X_i is the value (or realization) produced by a given run of the process at time i. Suppose that the process is further known to have defined values for mean μ_i and variance σ_i² for all times i. Then the definition of the autocorrelation between times s and t is

R(s, t) = E[(X_t − μ_t)(X_s − μ_s)] / (σ_t σ_s)

where E is the expected value operator.

If Xt is a wide-sense stationary process then the mean μ and the variance σ2 are time-independent, and further the autocorrelation depends only on the lag between t and s: the correlation depends only on the time-distance between the pair of values but not on their position in time.

This further implies that the autocorrelation can be expressed as a function of the time-lag, and that this would be an even function of the lag τ = s − t. This gives the more familiar form

R(τ) = E[(X_t − μ)(X_{t+τ} − μ)] / σ²

Signal processing

In signal processing, the above definition is often used without the normalization, that is, without subtracting the mean and dividing by the variance. When the autocorrelation function is normalized by mean and variance, it is sometimes referred to as the autocorrelation coefficient.[2]

Page 3: Statistical functions

For wide-sense-stationary random processes, the autocorrelations are defined as

R(τ) = E[X(t) X(t + τ)]

For a real function, R(−τ) = R(τ).

For processes that are also ergodic, the expectation can be replaced by the limit of a time average, and the autocorrelation is defined as or equated to

R(τ) = lim_{T→∞} (1/T) ∫_0^T x(t) x(t + τ) dt

- It also represents the average power as a function of the time delay.
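As a rough illustration of the statistical definition above, the following sketch estimates the normalized autocorrelation coefficient of a discrete signal over a range of lags; the noisy sinusoid, the helper name autocorrelation, and the chosen lags are illustrative assumptions, not part of the source text.

    import numpy as np

    def autocorrelation(x, max_lag):
        # Normalized sample autocorrelation R(lag) for lags 0..max_lag:
        # subtract the mean, then divide by the (biased) sample variance.
        x = np.asarray(x, dtype=float)
        x = x - x.mean()
        var = np.dot(x, x) / len(x)
        return np.array([np.dot(x[:len(x) - lag], x[lag:]) / (len(x) * var)
                         for lag in range(max_lag + 1)])

    rng = np.random.default_rng(0)
    t = np.arange(500)
    signal = np.sin(2 * np.pi * t / 50) + 0.5 * rng.normal(size=t.size)

    r = autocorrelation(signal, max_lag=100)
    print(r[0])    # 1.0 by construction (zero lag)
    print(r[50])   # large again at the hidden period of 50 samples

Peaks of r at multiples of the period reveal the periodic component even though it is partly obscured by noise.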

Expected Value

In probability theory, the expected value of a random variable is intuitively the long-run average value of repetitions of the experiment it represents. For example, the expected value of a roll of a fair six-sided die is 3.5 because, roughly speaking, the average of an extremely large number of die rolls is practically always nearly equal to 3.5. Less roughly, the law of large numbers guarantees that the arithmetic mean of the values almost surely converges to the expected value as the number of repetitions goes to infinity. The expected value is also known as the expectation, mathematical expectation, EV, mean, or first moment.

More practically, the expected value of a discrete random variable is the probability-weighted average of all possible values. In other words, each possible value the random variable can assume is multiplied by its probability of occurring, and the resulting products are summed to produce the expected value. The same works for continuous random variables, except the sum is replaced by an integral and the probabilities by probability densities. The formal definition subsumes both of these and also works for distributions which are neither discrete nor continuous: the expected value of a random variable is the integral of the random variable with respect to its probability measure.[1][2]

Univariate discrete random variable, countable case

Let X be a discrete random variable taking values x1, x2, ... with probabilities p1, p2, ... respectively. Then the expected value of this random variable is the infinite sum

E[X] = x1 p1 + x2 p2 + ... = Σ_i x_i p_i
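The die-roll example above is just this probability-weighted sum; a minimal sketch in Python (the variable names are my own):

    # Expected value of a fair six-sided die: each face value weighted by 1/6.
    values = [1, 2, 3, 4, 5, 6]
    probs = [1 / 6] * 6

    expected_value = sum(x * p for x, p in zip(values, probs))
    print(expected_value)   # 3.5, matching the die-roll example earlier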

Page 4: Statistical functions

Univariate continuous random variable

If the probability distribution of X admits a probability density function f(x), then the expected value can be computed as

E[X] = ∫ x f(x) dx

with the integral taken over the range of X.
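A quick numerical check of this integral, using a standard exponential density f(x) = e^(−x) as an illustrative choice (its expected value is 1); the truncation of the support and the grid spacing are arbitrary:

    import numpy as np

    # Approximate E[X] = integral of x * f(x) dx for f(x) = exp(-x) on [0, inf),
    # truncating the support at 50 and using a simple Riemann sum.
    x = np.linspace(0.0, 50.0, 200_001)
    f = np.exp(-x)
    dx = x[1] - x[0]

    expected = np.sum(x * f) * dx
    print(expected)   # ~= 1.0, the known mean of this density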

Uncorrelated Random Variables:

In probability theory and statistics, two real-valued random variables, X,Y, are said to be uncorrelated if their covariance, E(XY) − E(X)E(Y), is zero. A set of two or more random variables is called uncorrelated if each pair of them are uncorrelated. If two variables are uncorrelated, there is no linear relationship between them.

Uncorrelated random variables have a Pearson correlation coefficient of zero, except in the trivial case when either variable has zero variance (is a constant). In this case the correlation is undefined.

In general, uncorrelatedness is not the same as orthogonality, except in the special case where either X or Y has an expected value of 0. In this case, the covariance is the expectation of the product, and X and Y are uncorrelated if and only if E(XY) = 0.

If X and Y are independent, with finite second moments, then they are uncorrelated. However, not all uncorrelated variables are independent. For example, if X is a continuous random variable uniformly distributed on [−1, 1] and Y = X2, then X and Y are uncorrelated even though X determines Y and a particular value of Y can be produced by only one or two values of X.

Uncorrelated random variables are not necessarily independent

Let X be a random variable that takes the value 0 with probability 1/2, and takes the value 1 with probability 1/2.

Page 5: Statistical functions

Let Z be a random variable, independent of X, that takes the value −1 with probability 1/2, and takes the value 1 with probability 1/2.

Let U be a random variable constructed as U = XZ.

The claim is that U and X have zero covariance (and thus are uncorrelated), but are not independent.
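A small simulation of this construction can make the claim concrete; the sample size and seed below are arbitrary choices:

    import numpy as np

    rng = np.random.default_rng(1)
    n = 200_000
    x = rng.integers(0, 2, size=n)       # X in {0, 1}, each with probability 1/2
    z = rng.choice([-1, 1], size=n)      # Z in {-1, 1}, independent of X
    u = x * z                            # U = XZ

    print(np.cov(u, x)[0, 1])            # ~= 0: U and X are uncorrelated
    print(u[x == 0].max())               # 0: whenever X = 0, U is forced to 0,
                                         # so U and X are clearly not independent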

Variance  

In probability theory and statistics, variance measures how far a set of numbers is spread out. A variance of zero indicates that all the values are identical. Variance is always non-negative: a small variance indicates that the data points tend to be very close to the mean (expected value) and hence to each other, while a high variance indicates that the data points are very spread out around the mean and from each other.

An equivalent measure is the square root of the variance, called the standard deviation. The standard deviation has the same dimension as the data, and hence is comparable to deviations from the mean.

Definition

The variance of a set of samples represented by a random variable X is its second central moment, the expected value of the squared deviation from the mean μ = E[X]:

Var(X) = E[(X − μ)²]

Page 6: Statistical functions

Continuous random variable

If the random variable X represents samples generated by a continuous distribution with probability density function f(x), then the population variance is given by

Var(X) = ∫ (x − μ)² f(x) dx

where μ is the expected value,

μ = ∫ x f(x) dx

and where the integrals are definite integrals taken for x ranging over the range of X.

Discrete random variable

If the generator of random variable X is discrete with probability mass function x1 ↦ p1, ..., xn ↦ pn, then

Var(X) = Σ_{i=1}^{n} p_i (x_i − μ)²

where μ is the expected value, i.e.

μ = Σ_{i=1}^{n} p_i x_i
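As a small sketch of these two sums, the fair die's probability mass function (my own choice of example) gives a mean of 3.5 and a variance of 35/12:

    # Mean and variance of a fair six-sided die from its probability mass function.
    values = [1, 2, 3, 4, 5, 6]
    probs = [1 / 6] * 6

    mu = sum(x * p for x, p in zip(values, probs))                 # expected value
    var = sum(p * (x - mu) ** 2 for x, p in zip(values, probs))    # second central moment
    print(mu, var)   # 3.5 and ~2.9167 (= 35/12)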

Standard Deviation  

In statistics, the standard deviation (SD, also represented by the Greek letter sigma σ or s) is a measure that is used to quantify the amount of variation or dispersion of a set of data values.[1] A standard deviation close to 0 indicates that the data points tend to be very close to the mean (also called the expected value) of the set, while a high standard deviation indicates that the data points are spread out over a wider range of values.

Basic examples

For a finite set of numbers, the standard deviation is found by taking the square root of the average of the squared deviations of the values from their average value. For example, the marks of a class of eight students (that is, a population) are the following eight values:

2, 4, 4, 4, 5, 5, 7, 9

Page 7: Statistical functions

These eight data points have the mean (average) of 5:

μ = (2 + 4 + 4 + 4 + 5 + 5 + 7 + 9) / 8 = 40 / 8 = 5

First, calculate the deviations of each data point from the mean, and square the result of each:

(2 − 5)² = 9, (4 − 5)² = 1, (4 − 5)² = 1, (4 − 5)² = 1, (5 − 5)² = 0, (5 − 5)² = 0, (7 − 5)² = 4, (9 − 5)² = 16

The variance is the mean of these values:

σ² = (9 + 1 + 1 + 1 + 0 + 0 + 4 + 16) / 8 = 32 / 8 = 4

and the population standard deviation is equal to the square root of the variance:

σ = √4 = 2
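A quick numerical check of this worked example (assuming the eight marks shown above), comparing the by-hand result with numpy's population variance and standard deviation:

    import numpy as np

    marks = np.array([2, 4, 4, 4, 5, 5, 7, 9])   # the eight marks from the example (assumed)
    print(marks.mean())        # 5.0
    print(marks.var(ddof=0))   # 4.0, the population variance (dividing by N)
    print(marks.std(ddof=0))   # 2.0, the population standard deviation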

Definition of population

Let X be a random variable with mean value μ:

μ = E[X]

Here the operator E denotes the average or expected value of X. Then the standard deviation of X is the quantity

σ = √( E[(X − μ)²] ) = √( E[X²] − (E[X])² )

Discrete random variable

If X is discrete with probability mass function x1 ↦ p1, ..., xn ↦ pn, the standard deviation is the square root of the variance defined above:

σ = √( Σ_{i=1}^{n} p_i (x_i − μ)² )

Continuous random variable

If X is continuous with probability density function f(x), then

σ = √( ∫ (x − μ)² f(x) dx )

Page 8: Statistical functions
Page 9: Statistical functions

Normal distribution

The probability density of the normal distribution is:

f(x) = (1 / (σ √(2π))) e^(−(x − μ)² / (2σ²))

Here:
- μ is the mean or expectation of the distribution (and also its median and mode).
- The parameter σ is its standard deviation, and its variance is then σ².

A random variable with a Gaussian distribution is said to be normally distributed and is called a normal deviate.

If μ = 0 and σ = 1, the distribution is called the standard normal distribution or the unit normal distribution, denoted by N(0, 1), and a random variable with that distribution is a standard normal deviate.
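A brief sketch that writes this density out explicitly and checks it against simulated standard normal deviates; the function name, parameter defaults, and sample size are illustrative choices:

    import numpy as np

    def normal_pdf(x, mu=0.0, sigma=1.0):
        # Density of N(mu, sigma^2) evaluated at x.
        return np.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * np.sqrt(2 * np.pi))

    rng = np.random.default_rng(2)
    samples = rng.normal(loc=0.0, scale=1.0, size=100_000)   # standard normal deviates

    print(normal_pdf(0.0))                  # ~0.3989, the peak of the standard normal density
    print(samples.mean(), samples.std())    # close to mu = 0 and sigma = 1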

Exponential distribution:

- The exponential distribution with parameter λ, often called the rate parameter, is a continuous distribution whose support is the semi-infinite interval [0,∞).

- It is also known as the negative exponential distribution.
- It is the probability distribution that describes the time between events in a Poisson process, i.e. a process in which events occur continuously and independently at a constant average rate.
- It is a particular case of the gamma distribution.

Page 10: Statistical functions

- It is the continuous analogue of the geometric distribution, and it has the key property of being memoryless.
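A rough simulation of the memoryless property: for exponential waiting times, P(T > s + t | T > s) equals P(T > t). The rate, the values of s and t, and the sample size below are illustrative choices:

    import numpy as np

    rng = np.random.default_rng(3)
    lam = 0.5                                                # rate parameter lambda
    t_wait = rng.exponential(scale=1 / lam, size=1_000_000)  # simulated waiting times

    s, t = 2.0, 3.0
    p_unconditional = np.mean(t_wait > t)                    # P(T > t)
    p_conditional = np.mean(t_wait[t_wait > s] > s + t)      # P(T > s + t | T > s)
    print(p_unconditional, p_conditional)   # both ~= exp(-lam * t) ~= 0.223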

Poisson distribution

A discrete random variable X is said to have a Poisson distribution with parameter λ > 0, if, for k = 0, 1, 2, ..., the probability mass function of X is given by:[9]

P(X = k) = λ^k e^(−λ) / k!

where:
- e is Euler's number (e = 2.71828...)
- k! is the factorial of k.

The positive real number λ is equal to the expected value of X and also to its variance.[10]
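A minimal check of these facts from the probability mass function itself; λ = 4 and the truncation at k = 60 are arbitrary choices:

    from math import exp, factorial

    lam = 4.0
    # Poisson pmf P(X = k) = lam**k * exp(-lam) / k!, truncated at k = 59.
    pmf = [lam ** k * exp(-lam) / factorial(k) for k in range(60)]

    mean = sum(k * p for k, p in enumerate(pmf))
    var = sum((k - mean) ** 2 * p for k, p in enumerate(pmf))
    print(sum(pmf), mean, var)   # ~1.0, ~4.0, ~4.0: total mass 1, mean = variance = lambda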

Page 11: Statistical functions

Covariance

In probability theory and statistics, covariance is a measure of how much two random variables change together. If the greater values of one variable mainly correspond with the greater values of the other variable, and the same holds for the smaller values, i.e., the variables tend to show similar behavior, the covariance is positive.[1] In the opposite case, when the greater values of one variable mainly correspond to the smaller values of the other, i.e., the variables tend to show opposite behavior, the covariance is negative. The sign of the covariance therefore shows the tendency in the linear relationship between the variables. The magnitude of the covariance is not easy to interpret. The normalized version of the covariance, the correlation coefficient, however, shows by its magnitude the strength of the linear relation.

A distinction must be made between (1) the covariance of two random variables, which is a population parameter that can be seen as a property of the joint probability distribution, and (2) the sample covariance, which serves as an estimated value of the parameter.

Definition

The covariance between two jointly distributed real-valued random variables X and Y with finite second moments is defined as[2]

cov(X, Y) = E[(X − E[X])(Y − E[Y])] = E[XY] − E[X] E[Y]

Variance is a special case of the covariance when the two variables are identical:

cov(X, X) = Var(X) = σ²(X)
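A small sketch estimating the covariance from simulated data in both of its equivalent forms; the linear model that generates Y, the seed, and the sample size are illustrative assumptions:

    import numpy as np

    rng = np.random.default_rng(4)
    x = rng.normal(size=100_000)
    y = 2.0 * x + rng.normal(size=100_000)      # Y tends to increase with X

    cov_direct = np.mean((x - x.mean()) * (y - y.mean()))   # E[(X - EX)(Y - EY)]
    cov_identity = np.mean(x * y) - x.mean() * y.mean()     # E[XY] - E[X]E[Y]
    print(cov_direct, cov_identity)      # both ~= 2.0, and positive as expected

    print(np.mean((x - x.mean()) ** 2))  # cov(X, X) ~= Var(X) ~= 1.0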

Page 12: Statistical functions

Pearson Product-Moment Correlation Coefficient

In statistics, the Pearson product-moment correlation coefficient (sometimes referred to as the PPMCC or PCC or Pearson's r) is a measure of the linear correlation between two variables X and Y, giving a value between +1 and −1 inclusive, where 1 is total positive correlation, 0 is no correlation, and −1 is total negative correlation.

For a population

- It is commonly represented by the Greek letter ρ.
- The formula for ρ is:

ρ(X, Y) = cov(X, Y) / (σ_X σ_Y)

where:
- cov(X, Y) is the covariance
- σ_X is the standard deviation of X (and analogously σ_Y for Y)

For a sample

- It is represented by the letter r.
- The formula for r is:

r = Σ_i (x_i − x̄)(y_i − ȳ) / ( √(Σ_i (x_i − x̄)²) √(Σ_i (y_i − ȳ)²) )

where:
- x_i and y_i are the individual sample points, indexed by i = 1, ..., n
- x̄ = (1/n) Σ_i x_i (the sample mean); and analogously for ȳ
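A minimal sketch computing r directly from this formula and comparing it with numpy's built-in estimate; the five data points are illustrative:

    import numpy as np

    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

    x_bar, y_bar = x.mean(), y.mean()
    r = np.sum((x - x_bar) * (y - y_bar)) / np.sqrt(
        np.sum((x - x_bar) ** 2) * np.sum((y - y_bar) ** 2))

    print(r)                         # close to +1: a strong positive linear relation
    print(np.corrcoef(x, y)[0, 1])   # the same value from numpy's estimator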

Page 13: Statistical functions

Regression analysis

In statistics, regression analysis is a statistical process for estimating the relationships among variables. It includes many techniques for modeling and analyzing several variables, when the focus is on the relationship between a dependent variable and one or more independent variables (or 'predictors'). More specifically, regression analysis helps one understand how the typical value of the dependent variable (or 'criterion variable') changes when any one of the independent variables is varied, while the other independent variables are held fixed.

Most commonly, regression analysis estimates the conditional expectation of the dependent variable given the independent variables – that is, the average value of the dependent variable when the independent variables are fixed. Less commonly, the focus is on a quantile, or other location parameter of the conditional distribution of the dependent variable given the independent variables. In all cases, the estimation target is a function of the independent variables called the regression function. In regression analysis, it is also of interest to characterize the variation of the dependent variable around the regression function which can be described by a probability distribution.
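As a brief sketch of the most common case, the following fits an ordinary least-squares line to simulated data, estimating the conditional mean of the dependent variable given a single predictor; the true coefficients, noise level, and sample size are illustrative assumptions:

    import numpy as np

    rng = np.random.default_rng(5)
    x = rng.uniform(0, 10, size=500)
    y = 1.5 + 0.8 * x + rng.normal(scale=0.5, size=500)   # true intercept 1.5, slope 0.8

    # Least-squares fit of a degree-1 polynomial: y ~ intercept + slope * x.
    slope, intercept = np.polyfit(x, y, deg=1)
    print(intercept, slope)   # ~= 1.5 and ~= 0.8, recovering the regression function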