Beyond Gaussian Processes: On the Distributions of Infinite Networks

Ricky Der, Department of Mathematics
Daniel Lee, Department of Electrical Engineering

Source: …lisa/seminaires/07-03-2006-2.pdf


Prior on neural networks

Extension of Radford Neal’s Ph.D. thesis:

$$f_n(x) = \frac{1}{s_n} \sum_{j=1}^{n} v_j\, h(x; u_j) \equiv \frac{1}{s_n} \sum_{j=1}^{n} v_j\, h_j(x).$$

This can be viewed as a multi-layer perceptron with input $x$, hidden functions $h$, hidden weights $u_j$, output weights $v_j$, and a sequence of normalizing constants $s_n$.
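A minimal sketch of drawing one random function from this prior. The slides do not fix the nonlinearity or the weight distributions, so the tanh hidden unit, the standard normal draws, the default $s_n = \sqrt{n}$, and the name `sample_network` are all assumptions made for illustration.

```python
import numpy as np

def sample_network(x, n, rng, s_n=None):
    """Draw one random function f_n evaluated on the grid x."""
    x = np.asarray(x, dtype=float)
    # Hidden parameters u_j = (w_j, b_j): i.i.d. standard normal (assumption).
    w = rng.standard_normal(n)
    b = rng.standard_normal(n)
    # Output weights v_j: i.i.d. standard normal, i.e. finite variance (assumption).
    v = rng.standard_normal(n)
    # h_j(x) = tanh(w_j * x + b_j), a bounded nonlinearity (assumption).
    h = np.tanh(np.outer(x, w) + b)      # shape (len(x), n)
    if s_n is None:
        s_n = np.sqrt(n)                 # usual normalization for finite-variance weights
    return h @ v / s_n

rng = np.random.default_rng(0)
x = np.linspace(-3.0, 3.0, 50)
f = sample_network(x, n=1000, rng=rng)
print(f.shape)
```

Replacing the draw of `v` and the value of `s_n` is all that is needed to experiment with other output priors.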


Gaussian process limit

When the $v_j$ are i.i.d. with finite variance, Neal has shown that the limiting distribution ($n \to \infty$) is a Gaussian process.

The authors investigate:
1. the case when $v_j$ has infinite variance
2. the case when the $v_j$ are not i.i.d.
3. the possibility of doing regression with stable processes


Assumptions

• $h_j(x) \equiv h(x, u_j)$ are uniformly bounded in $x$, e.g. $h$ is a fixed nonlinearity
• $\{u_j\}$ is an i.i.d. sequence
• this entails that the $h_j(x)$ are i.i.d. for fixed $x$ and independent of $\{v_j\}$

The choice of output priors $v_j$ will dictate the large-network behavior.


Stable distributions

If $X_1, X_2$ are independent copies of a Gaussian variable $X$, then for any $a, b \in \mathbb{R}$

$$aX_1 + bX_2 \overset{d}{=} cX + d,$$

for some $c, d \in \mathbb{R}$.

This stability property is satisfied by all the stable distributions.
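As a worked instance of the identity for the Gaussian case, assume $X \sim \mathcal{N}(\mu, s^2)$; the symbols $\mu$ and $s$ are introduced here only for illustration. Matching means and variances on both sides gives the constants explicitly:

```latex
\begin{align*}
aX_1 + bX_2 &\sim \mathcal{N}\big((a+b)\mu,\ (a^2+b^2)\,s^2\big), \\
cX + d &\sim \mathcal{N}\big(c\mu + d,\ c^2 s^2\big), \\
\text{so}\quad c &= \sqrt{a^2+b^2}, \qquad d = (a+b-c)\,\mu .
\end{align*}
```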


Definition of stable variable

A symmetric stable distribution has the following characteristic function:

$$\Phi(t) = e^{-\sigma^\alpha |t|^\alpha},$$

where $\sigma > 0$ is the spread parameter and $0 < \alpha \le 2$ is the stability index.

We write $X \sim S_\alpha(\sigma)$ for a symmetric α-stable variable of spread $\sigma > 0$.

$\alpha = 2$ is the Gaussian case, and for $\alpha < 2$, $E[X^2] = \infty$.
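The slides do not say how to simulate such variables. One standard option, swapped in here as an assumption, is the Chambers-Mallows-Stuck representation. The sketch below (the helper name `sample_sas` is hypothetical) draws symmetric α-stable samples and compares the empirical characteristic function with $e^{-\sigma^\alpha |t|^\alpha}$ at a single point.

```python
import numpy as np

def sample_sas(alpha, sigma, size, rng):
    """Symmetric alpha-stable draws with characteristic function exp(-|sigma*t|**alpha)."""
    u = rng.uniform(-np.pi / 2, np.pi / 2, size)   # uniform angle
    w = rng.exponential(1.0, size)                 # unit exponential
    x = (np.sin(alpha * u) / np.cos(u) ** (1.0 / alpha)
         * (np.cos((1.0 - alpha) * u) / w) ** ((1.0 - alpha) / alpha))
    return sigma * x

rng = np.random.default_rng(0)
alpha, sigma, t = 1.5, 2.0, 0.4
x = sample_sas(alpha, sigma, 200_000, rng)

# The imaginary part of the characteristic function vanishes by symmetry.
empirical_cf = np.mean(np.cos(t * x))
theoretical_cf = np.exp(-(sigma * abs(t)) ** alpha)
print(empirical_cf, theoretical_cf)
```

With this normalization, $\alpha = 2$ corresponds to a Gaussian of variance $2\sigma^2$, and $\alpha = 1$ to a Cauchy variable of scale $\sigma$.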


Domain of attraction

Let $Y_1, \ldots, Y_n$ be independent copies of a random variable $Y$; we say that $Y$ belongs to the domain of attraction of the variable $X$ if:

$$a_n + \frac{1}{s_n} \sum_{j=1}^{n} Y_j \xrightarrow{d} X,$$

for appropriate sequences $a_n, s_n \in \mathbb{R}$. Then $X$ must be stable.
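A small numerical illustration of the definition in the most familiar case, $\alpha = 2$: a uniform variable on $[-1, 1]$ lies in the domain of attraction of the Gaussian, with $a_n = 0$ and $s_n = \sqrt{n/3}$. The choice of $Y$, the sample sizes, and the kurtosis check are assumptions of this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
n, replicates = 100, 20_000

# Y ~ Uniform(-1, 1), so Var(Y) = 1/3.
y = rng.uniform(-1.0, 1.0, size=(replicates, n))
a_n = 0.0
s_n = np.sqrt(n / 3.0)
z = a_n + y.sum(axis=1) / s_n          # one normalized sum per replicate

# A standard Gaussian limit has unit standard deviation and zero excess kurtosis.
excess_kurtosis = np.mean((z - z.mean()) ** 4) / z.var() ** 2 - 3.0
print(z.std(), excess_kurtosis)
```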


Multivariate stable distributions

Let $X_1, X_2$ be independent copies of the random vector $X$; then $X$ is stable if for every $a, b \in \mathbb{R}$ there exists $c \in \mathbb{R}$ such that:

$$aX_1 + bX_2 \overset{d}{=} cX.$$

A process is said to be stable if all its finite-dimensional distributions are multivariate stable.


Characteristic function

Theorem. $X$ is a symmetric α-stable vector if and only if it has characteristic function

$$\Phi(t) = \exp\left\{ -\int_{S^{d-1}} |\langle t, s \rangle|^\alpha \, d\Gamma(s) \right\},$$

where $\Gamma$ is a finite measure on the unit $(d-1)$-sphere $S^{d-1}$, and $0 < \alpha \le 2$.
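When $\Gamma$ is discrete, with masses $\gamma_k$ at points $s_k$, the integral becomes the sum $\sum_k \gamma_k |\langle t, s_k \rangle|^\alpha$, and a vector with that characteristic function can be built as $X = \sum_k \gamma_k^{1/\alpha} Z_k s_k$ with i.i.d. $Z_k \sim S_\alpha(1)$. The sketch below checks this numerically in $d = 2$; the atoms, masses, and the helper `sample_sas` (a Chambers-Mallows-Stuck sampler) are hypothetical choices for illustration.

```python
import numpy as np

def sample_sas(alpha, size, rng):
    """Standard symmetric alpha-stable draws: E exp(i t Z) = exp(-|t|**alpha)."""
    u = rng.uniform(-np.pi / 2, np.pi / 2, size)
    w = rng.exponential(1.0, size)
    return (np.sin(alpha * u) / np.cos(u) ** (1.0 / alpha)
            * (np.cos((1.0 - alpha) * u) / w) ** ((1.0 - alpha) / alpha))

rng = np.random.default_rng(0)
alpha = 1.2

# Hypothetical discrete spectral measure on the unit circle (d = 2).
angles = np.array([0.0, np.pi / 3, 2.0])
s = np.stack([np.cos(angles), np.sin(angles)], axis=1)   # atoms s_k, shape (3, 2)
gamma = np.array([0.5, 1.0, 0.25])                       # masses Gamma({s_k})

n = 200_000
z = sample_sas(alpha, (n, len(gamma)), rng)
x = (z * gamma ** (1.0 / alpha)) @ s                     # X = sum_k gamma_k^(1/alpha) Z_k s_k

t = np.array([0.7, -0.4])
empirical_cf = np.mean(np.cos(x @ t))
theoretical_cf = np.exp(-np.sum(gamma * np.abs(s @ t) ** alpha))
print(empirical_cf, theoretical_cf)
```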


Preliminary result

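Lemma. Let $v \sim S_\alpha(\sigma)$ and let $h$ be independent of $v$ with $E|h|^\alpha < \infty$. If $y = hv$, and $\{y_i\}$ are independent copies of $y$, then

$$\frac{1}{n^{1/\alpha}} \sum_{i=1}^{n} y_i \xrightarrow{d} X,$$

where $X$ is α-stable with characteristic function:

$$\Phi(t) = \exp\left\{ -|\sigma t|^\alpha\, E|h|^\alpha \right\}.$$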
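The limit can be traced through characteristic functions. What follows is a sketch of the standard argument, not necessarily the authors' proof: condition on $h$, use the characteristic function $e^{-\sigma^\alpha |t|^\alpha}$ of $v$, and pass to the limit under the expectation, which dominated convergence permits because $E|h|^\alpha < \infty$.

```latex
\begin{align*}
\mathbb{E}\, e^{i t y}
  &= \mathbb{E}_h\!\left[ e^{-\sigma^\alpha |t|^\alpha |h|^\alpha} \right], \\
\mathbb{E} \exp\!\Big( \frac{i t}{n^{1/\alpha}} \sum_{i=1}^{n} y_i \Big)
  &= \Big( \mathbb{E}_h\!\left[ e^{-\sigma^\alpha |t|^\alpha |h|^\alpha / n} \right] \Big)^{n} \\
  &= \Big( 1 - \frac{\sigma^\alpha |t|^\alpha\, \mathbb{E}|h|^\alpha}{n} + o(1/n) \Big)^{n}
  \;\longrightarrow\; \exp\left\{ -|\sigma t|^\alpha\, \mathbb{E}|h|^\alpha \right\}.
\end{align*}
```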


Stable prior

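Proposition. Let the output weights of the neural network be i.i.d. $v_j \sim S_\alpha(\sigma)$. Then

$$f_n(x) = \frac{1}{n^{1/\alpha}} \sum_{j=1}^{n} v_j\, h_j(x) \xrightarrow{d} f(x),$$

where $f(x)$ is a symmetric α-stable process.

The finite-dimensional distribution of $(f(x_1), \ldots, f(x_d))$ has characteristic function:

$$\Psi(t) = \exp\left( -\sigma^\alpha\, E_h |\langle t, h \rangle|^\alpha \right),$$

where $h = (h(x_1), \ldots, h(x_d))$.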
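A Monte Carlo sanity check of this characteristic function for $d = 2$, offered as a sketch under stated assumptions: tanh hidden units with standard normal parameters, stable weights drawn with a Chambers-Mallows-Stuck sampler (the helper `sample_sas` is hypothetical), and arbitrary test values for the inputs and for $t$.

```python
import numpy as np

def sample_sas(alpha, sigma, size, rng):
    """Symmetric alpha-stable draws with characteristic function exp(-|sigma*t|**alpha)."""
    u = rng.uniform(-np.pi / 2, np.pi / 2, size)
    w = rng.exponential(1.0, size)
    x = (np.sin(alpha * u) / np.cos(u) ** (1.0 / alpha)
         * (np.cos((1.0 - alpha) * u) / w) ** ((1.0 - alpha) / alpha))
    return sigma * x

rng = np.random.default_rng(0)
alpha, sigma = 1.5, 1.0
n_hidden, replicates = 100, 20_000
x_pts = np.array([-0.5, 1.0])          # inputs x_1, x_2
t = np.array([0.6, -0.3])

# Hidden units h_j(x) = tanh(w_j x + b_j), (w_j, b_j) standard normal (assumption).
w = rng.standard_normal((replicates, n_hidden, 1))
b = rng.standard_normal((replicates, n_hidden, 1))
h = np.tanh(w * x_pts + b)                                # (replicates, n_hidden, 2)
v = sample_sas(alpha, sigma, (replicates, n_hidden, 1), rng)
f = (v * h).sum(axis=1) / n_hidden ** (1.0 / alpha)       # (replicates, 2)

empirical_cf = np.mean(np.cos(f @ t))

# Monte Carlo estimate of E_h |<t, h>|^alpha from fresh hidden-unit draws.
w2 = rng.standard_normal((500_000, 1))
b2 = rng.standard_normal((500_000, 1))
h2 = np.tanh(w2 * x_pts + b2)                             # (500000, 2)
theoretical_cf = np.exp(-sigma ** alpha * np.mean(np.abs(h2 @ t) ** alpha))
print(empirical_cf, theoretical_cf)
```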


Normal domain of attraction

The normal domain of attraction of index α encompasses distributions whose densities have tails asymptotically equivalent to $|x|^{-(\alpha+1)}$, for $0 < \alpha < 2$.

The previous proposition holds if the output weights $v_j$ are i.i.d. random variables in the normal domain of attraction of index α.
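To illustrate the $n^{1/\alpha}$ normalization for weights that are not themselves stable, the sketch below uses a symmetric Pareto variable, whose density is proportional to $|x|^{-(\alpha+1)}$ for $|x| \ge 1$. If the scaling is right, the interquartile range of the normalized sum should be roughly the same for different $n$. The Pareto choice, the sample sizes, and the function names are assumptions, and the check concerns the weights alone, not the full network.

```python
import numpy as np

def symmetric_pareto(alpha, size, rng):
    """Symmetric draws with density proportional to |x|**-(alpha + 1) for |x| >= 1."""
    magnitude = (1.0 - rng.random(size)) ** (-1.0 / alpha)   # inverse-CDF sampling
    sign = rng.choice([-1.0, 1.0], size=size)
    return sign * magnitude

def iqr_of_scaled_sum(alpha, n, replicates, rng):
    """Interquartile range of n**(-1/alpha) * (Y_1 + ... + Y_n) over many replicates."""
    y = symmetric_pareto(alpha, (replicates, n), rng)
    z = y.sum(axis=1) / n ** (1.0 / alpha)
    q25, q75 = np.percentile(z, [25, 75])
    return q75 - q25

rng = np.random.default_rng(0)
alpha = 1.5
iqr_small = iqr_of_scaled_sum(alpha, 100, 10_000, rng)
iqr_large = iqr_of_scaled_sum(alpha, 400, 10_000, rng)
ratio = iqr_large / iqr_small
print(iqr_small, iqr_large, ratio)
```

Under the Gaussian normalization $\sqrt{n}$ instead, the same ratio would grow with $n$, since $n^{1/\alpha} > n^{1/2}$ for $\alpha < 2$.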


Local Brownian motion

Let $h(x) = \operatorname{sgn}(a + ux)$, the step function, where $a$ and $u$ are independent Gaussians with zero mean. Far from the origin the process flattens out:

$$\lim_{|x| \to \infty} f(x) = \text{constant}.$$

In the “central region”, Neal has shown that this gives rise to a local Brownian motion.
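One way to see the Brownian behaviour, sketched under the assumption that $a$ and $u$ have unit variance: a hidden unit changes sign between $x$ and $y$ exactly when its root $-a/u$, a standard Cauchy variable, falls between them, which happens with probability $(\arctan y - \arctan x)/\pi$. For zero-mean output weights with finite variance and $s_n = \sqrt{n}$, the variance of $f_n(y) - f_n(x)$ is $4\,E[v^2]$ times this probability, hence approximately linear in $|y - x|$ near the origin. The code checks the sign-change probability by simulation.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000
a = rng.standard_normal(n)   # bias, unit variance (assumption)
u = rng.standard_normal(n)   # input weight, unit variance (assumption)

x, y = 0.0, 0.1
# A unit "flips" when sgn(a + u*x) and sgn(a + u*y) differ.
flip = np.sign(a + u * x) != np.sign(a + u * y)
p_empirical = flip.mean()
p_theory = (np.arctan(y) - np.arctan(x)) / np.pi
print(p_empirical, p_theory)
```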


Symmetric α-stable Lévy motion

Symmetric α-stable priors give rise to a symmetric α-stable Lévy motion, i.e. a process $\{w_t;\ t \in \mathbb{R}\}$ satisfying:
• $w_0 = 0$ almost surely
• independent increments $w_t - w_s$, with $s < t$
• $w_t - w_s \sim S_\alpha(|t - s|^{1/\alpha})$
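The three properties translate directly into a simulation recipe on a grid: start at zero and accumulate independent increments of spread $|t - s|^{1/\alpha}$. The sketch below does this with a Chambers-Mallows-Stuck sampler; the function names, the grid, and the sampler itself are assumptions for illustration, not something the slides specify.

```python
import numpy as np

def sample_sas(alpha, sigma, size, rng):
    """Symmetric alpha-stable draws with characteristic function exp(-|sigma*t|**alpha)."""
    u = rng.uniform(-np.pi / 2, np.pi / 2, size)
    w = rng.exponential(1.0, size)
    x = (np.sin(alpha * u) / np.cos(u) ** (1.0 / alpha)
         * (np.cos((1.0 - alpha) * u) / w) ** ((1.0 - alpha) / alpha))
    return sigma * x

def levy_motion(alpha, t_grid, n_paths, rng):
    """Paths of symmetric alpha-stable Levy motion on an increasing grid starting at 0."""
    dt = np.diff(t_grid)
    # Independent increments with spread dt**(1/alpha).
    increments = sample_sas(alpha, 1.0, (n_paths, len(dt)), rng) * dt ** (1.0 / alpha)
    start = np.zeros((n_paths, 1))                       # w_0 = 0
    return np.concatenate([start, np.cumsum(increments, axis=1)], axis=1)

rng = np.random.default_rng(0)
alpha = 1.5
t_grid = np.linspace(0.0, 2.0, 41)
paths = levy_motion(alpha, t_grid, 50_000, rng)

# w_T should be S_alpha(T**(1/alpha)), i.e. E cos(theta * w_T) = exp(-T * |theta|**alpha).
theta = 0.5
empirical_cf = np.mean(np.cos(theta * paths[:, -1]))
theoretical_cf = np.exp(-t_grid[-1] * abs(theta) ** alpha)
print(empirical_cf, theoretical_cf)
```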


Brownian vs Lévy processes

[Figure: four panels of sample paths plotted over 0 to 1000 on the horizontal axis, comparing Brownian and Lévy processes; the vertical ranges are roughly −3 to 3, −1500 to 1000, −50 to 10, and −1000 to 1500.]


Limits with non-i.i.d. priors

What conditions on independent priors $v_j$, not necessarily identically distributed, guarantee convergence to a Gaussian process?

An easy condition to verify is given by the following corollary, which is based on the Lindeberg-Feller theorem.

Corollary. If the output weights $\{v_j\}$ are a uniformly bounded sequence of independent variables, and $\lim_{n \to \infty} s_n = \infty$, then $f_n(x)$ converges to a Gaussian process.
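A numerical illustration of the corollary at a single input. The slides do not define $s_n$ on this slide, so the sketch assumes $s_n^2 = \sum_j \operatorname{Var}(v_j)$; the uniform weights with varying bounds and the tanh hidden units are likewise assumptions. It checks only that the one-dimensional marginal looks Gaussian, through its excess kurtosis.

```python
import numpy as np

rng = np.random.default_rng(0)
n, replicates = 200, 20_000

# Independent, non-identically distributed, uniformly bounded output weights:
# v_j ~ Uniform(-c_j, c_j) with c_j in [0.5, 1.5].
j = np.arange(1, n + 1)
c = 1.0 + 0.5 * np.sin(j)
v = rng.uniform(-1.0, 1.0, (replicates, n)) * c

# Assumed normalization: s_n^2 = sum_j Var(v_j) = sum_j c_j^2 / 3, which diverges with n.
s_n = np.sqrt(np.sum(c ** 2 / 3.0))

# Hidden units evaluated at one input x0: h_j = tanh(w_j * x0 + b_j) (assumption).
x0 = 0.3
w = rng.standard_normal((replicates, n))
b = rng.standard_normal((replicates, n))
h = np.tanh(w * x0 + b)

f = (v * h).sum(axis=1) / s_n            # one draw of f_n(x0) per replicate
excess_kurtosis = np.mean((f - f.mean()) ** 4) / f.var() ** 2 - 3.0
print(f.std(), excess_kurtosis)
```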


Learning with stable processes

Instead of using a neural network, perform Gaussian process regression.

Limitation: Gaussian processes are not as rich as finite neural networks.

Regression problem: $y(x) = u(x) + \varepsilon$; estimate $u(x)$ from the observations $y(x_i)$, with $\varepsilon \perp u(x)$.

Generalization of the Gaussian process: place an α-stable process prior on $y(x)$ and take $\varepsilon$ to be i.i.d. α-stable noise.


Classes of stable processes

Symmetric α-stable processes with discrete spectral measure:
• $\{v_j\}_j$: an i.i.d. α-stable process
• $\mu(x)$: a mean function
• $h(x, v)$: a bivariate filter function that introduces dependency

Sub-Gaussian processes are α-stable processes of the form $u(x) = A^{1/2} G(x)$ where:
• $A$ is a totally right-skewed α/2-stable variable
• $G$ is a Gaussian process of mean zero and covariance $K$
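A minimal sketch of sampling a sub-Gaussian process on a grid. Several ingredients are assumptions not given in the slides: $A$ is drawn with Kanter's representation of a positive stable variable, normalized so that $E\,e^{-sA} = e^{-s^{\alpha/2}}$; $K$ is a squared-exponential kernel; and a small jitter is added before the Cholesky factorization. With that normalization, the marginal at a point with variance $k$ has characteristic function $\exp\{-(k t^2/2)^{\alpha/2}\}$, which the code checks.

```python
import numpy as np

def positive_stable(a, size, rng):
    """Totally right-skewed a-stable draws (0 < a < 1), Laplace transform exp(-s**a)."""
    u = np.pi * (1.0 - rng.random(size))          # uniform on (0, pi]
    w = rng.exponential(1.0, size)
    return (np.sin(a * u) / np.sin(u) ** (1.0 / a)
            * (np.sin((1.0 - a) * u) / w) ** ((1.0 - a) / a))

def rbf_kernel(x, length_scale=1.0):
    """Squared-exponential covariance matrix on the grid x (hypothetical choice of K)."""
    d = x[:, None] - x[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)

rng = np.random.default_rng(0)
alpha = 1.5
x = np.linspace(-2.0, 2.0, 20)
jitter = 1e-6
K = rbf_kernel(x) + jitter * np.eye(len(x))
L = np.linalg.cholesky(K)

n_paths = 100_000
G = rng.standard_normal((n_paths, len(x))) @ L.T      # Gaussian process draws, covariance K
A = positive_stable(alpha / 2.0, n_paths, rng)        # one mixing variable per path
u_paths = np.sqrt(A)[:, None] * G                     # u(x) = A^(1/2) G(x)

t = 0.8
empirical_cf = np.mean(np.cos(t * u_paths[:, 0]))
theoretical_cf = np.exp(-(0.5 * K[0, 0] * t ** 2) ** (alpha / 2.0))
print(empirical_cf, theoretical_cf)
```

Because a single $A$ multiplies the whole path, each draw looks like a Gaussian process sample with a random overall scale, which is what gives the marginals their heavy tails.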
