Transcript of a talk by Leonardo Rojas-Nandayapa
Approximation of Heavy-tailed Distributions via Infinite Dimensional Phase–type Distributions
Leonardo Rojas-Nandayapa
The University of Queensland
ANZAPW, March 2015. Barossa Valley, SA, Australia.
Summary of Mogens’ talk...
Phase–type distributions
Distributions of first passage times associated to a Markov Jump Process with a finite number of states.
Facts
1. Dense class in the nonnegative distributions.
2. Mathematically tractable.
3. However, it is a subclass of light–tailed distributions.
Key problem
For most applications, a phase–type approximation will suffice. However, for certain important problems this fails.

A partial solution
Consider Markov Jump processes with an infinite number of states. This class contains heavy–tailed distributions, but also many other degenerate cases.
A more refined approach: Infinite mixtures of PH
Consider X a random variable with phase–type distribution, and let S be a nonnegative discrete random variable. Then the distribution of the random variable

Y := S · X

is an infinite mixture of phase–type distributions. Such a class is mathematically tractable and obviously dense in the nonnegative distributions.
Let F, G and H be the cdfs of Y, X and S respectively. Then

F(y) = ∫_0^∞ G(y/s) dH(s),  y ≥ 0.

The right hand side is called the Mellin–Stieltjes transform.

If G is phase–type, then F is absolutely continuous with density

f(y) = ∫_0^∞ g(y/s) (1/s) dH(s),  y ≥ 0.
Some interesting questions related to this class of distributions?
1. What are the possible tail behaviors in this class?
2. How to approximate a “theoretic” heavy–tailed distribution?
3. Given a sample y1, . . . , yn, how to estimate the parameters of a distribution in this class?
Heavy Tails
Recall that...
Heavy–tailed distributions
A nonnegative random variable X has a heavy–tailed distribution iff

E[e^{θX}] = ∞, ∀θ > 0.

Equivalently,

lim sup_{x→∞} P(X > x) / e^{−θx} = ∞, ∀θ > 0.
Otherwise we say X is light–tailed.
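The equivalent criterion can be checked numerically. A minimal sketch (not from the slides; the Pareto(2) tail x^{−2}, the Exp(1) tail e^{−x}, and θ = 0.5 are illustrative choices):

```python
import math

# Ratio P(X > x) / exp(-theta*x) for two explicit tails:
# Pareto(2) on [1, inf): P(X > x) = x**-2   (heavy-tailed)
# Exp(1):                P(X > x) = e**-x   (light-tailed)
def ratio_pareto(x, theta, alpha=2.0):
    return x ** (-alpha) / math.exp(-theta * x)

def ratio_exp(x, theta):
    return math.exp(-x) / math.exp(-theta * x)

theta = 0.5
ratios_pareto = [ratio_pareto(x, theta) for x in (10.0, 50.0, 100.0)]
ratios_exp = [ratio_exp(x, theta) for x in (10.0, 50.0, 100.0)]
# For the Pareto tail the ratio blows up (and does so for every theta > 0),
# while for the Exp tail it vanishes as soon as theta < 1.
```

The same blow-up occurs for the Pareto tail no matter how small θ is chosen, which is exactly the lim sup condition above.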
Characterization via Convolutions
Let X1, X2 be nonnegative iid rvs with unbounded support.
1. It holds that

lim inf_{x→∞} P(X1 + X2 > x) / P(X1 > x) ≥ 2.

2. Moreover, X1 is heavy–tailed iff

lim inf_{x→∞} P(X1 + X2 > x) / P(X1 > x) = 2.

3. Furthermore, X1 is subexponential iff

lim_{x→∞} P(X1 + X2 > x) / P(X1 > x) = 2.
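The dichotomy can be illustrated numerically (a sketch not in the slides; Exp(1) and Pareto(2) are illustrative choices). For Exp(1) the convolution tail is available in closed form, P(X1 + X2 > x) = (1 + x)e^{−x}, so the ratio is 1 + x and diverges; for a Pareto tail the ratio is estimated by Monte Carlo and sits near 2:

```python
import random

# Light-tailed side: X1, X2 iid Exp(1) gives the Erlang(2,1) tail
# P(X1 + X2 > x) = (1 + x) * exp(-x), so the convolution ratio is exactly
# 1 + x -> infinity: the liminf is NOT 2, consistent with Exp being light-tailed.
def exp_conv_ratio(x):
    return 1.0 + x

# Heavy-tailed side: Monte Carlo for X1, X2 iid Pareto(2) on [1, inf);
# the ratio should be close to 2 for large x.
random.seed(1)
n, x, alpha = 1_000_000, 30.0, 2.0
hits_sum = hits_one = 0
for _ in range(n):
    x1, x2 = random.paretovariate(alpha), random.paretovariate(alpha)
    hits_sum += x1 + x2 > x
    hits_one += x1 > x
pareto_ratio = hits_sum / hits_one
```

With x = 30 the true ratio is roughly 2.3 (it converges to 2 from above as x → ∞), so the Monte Carlo estimate lands well inside (1.8, 2.8).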
Heavy–tailed random variables arise as
I Exponential transformations of some light–tailed random variables (normal → lognormal, exponential → Pareto).
I Weak limits of normalized maxima (also of normalized sums).
I Reciprocals of certain random variables.
I Random sums of light–tailed distributions.
I Products of light–tailed distributions.
Problem 1: What are the possible tail behaviours in this class?
Theorem
Let X be a phase–type random variable and S be a nonnegative random variable. Define

Y := S · X.

Then Y is heavy–tailed iff S has unbounded support.
Example
Let X ∼ exp(λ) and let S have any nonnegative discrete distribution supported on a countable unbounded set {s1, s2, . . . }. Then

lim_{y→∞} P(S · X > y) / e^{−θy} = lim_{y→∞} [ Σ_{k=1}^∞ e^{−λy/s_k} P(S = s_k) ] / e^{−θy} = ∞

(choose s_k large enough that λ/s_k < θ).
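A concrete instance of this example can be evaluated directly (an illustrative sketch: λ = 1, θ = 0.1 and the hypothetical unbounded law P(S = 2^k) = 2^{−k}):

```python
import math

# X ~ Exp(lam); S takes the value 2**k with probability 2**-k, k = 1, 2, ...
# (a hypothetical unbounded discrete law, chosen only for illustration)
lam, theta = 1.0, 0.1

def mixture_tail(y, kmax=60):
    # P(S*X > y) = sum_k exp(-lam*y / s_k) * P(S = s_k), truncated at kmax
    return sum(math.exp(-lam * y / 2 ** k) * 2.0 ** -k for k in range(1, kmax + 1))

ratios = [mixture_tail(y) / math.exp(-theta * y) for y in (10.0, 50.0, 100.0, 200.0)]
# the ratio grows without bound, so S*X is heavy-tailed
```

For each y the dominant term is the k with s_k of order y, mirroring the "choose s_k large enough" step in the proof.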
Theorem (generalization)
Let S ∼ H1 and X ∼ H2, and let Y := S · X.
1. If there exist θ > 0 and a nonnegative function ξ(x) such that

lim sup_{x→∞} e^{θx} ( H̄1(x/ξ(x)) + H̄2(ξ(x)) ) = 0, (1)

then Y is light–tailed.
2. If there exists a nonnegative function ξ(x) such that for all θ > 0 it holds that

lim sup_{x→∞} e^{θx} H̄1(x/ξ(x)) · H̄2(ξ(x)) = ∞, (2)

then Y is heavy–tailed.
Example
Let S ∼ Weibull(λ, p) and X ∼ Weibull(β, q). Then Y := S · X is light–tailed iff

1/p + 1/q ≤ 1.

Otherwise it is heavy–tailed.
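This boundary can be illustrated numerically (a sketch not in the slides). With S, X iid Exp(1) = Weibull(1) we get 1/p + 1/q = 2 > 1, so the product should be heavy-tailed; with S, X iid standard Weibull(2) we get 1/p + 1/q = 1, so it should be light-tailed. Both product tails are computed by a crude midpoint rule:

```python
import math

theta = 0.1  # reference exponential rate for the heavy/light test

def tail_exp_product(x):
    # P(S*X > x) for S, X iid Exp(1): int_0^inf e^{-t} e^{-x/t} dt
    h, acc, t = 0.05, 0.0, 0.025
    while t < 500.0:
        acc += math.exp(-t - x / t) * h
        t += h
    return acc

def tail_weib2_product(x):
    # P(S*X > x) for S, X iid Weibull(2): int_0^inf 2t e^{-t^2} e^{-(x/t)^2} dt
    h, acc, t = 0.01, 0.0, 0.005
    while t < 50.0:
        acc += 2.0 * t * math.exp(-t * t - (x / t) ** 2) * h
        t += h
    return acc

heavy_ratios = [tail_exp_product(x) / math.exp(-theta * x) for x in (400.0, 900.0, 1600.0)]
light_ratios = [tail_weib2_product(x) / math.exp(-theta * x) for x in (5.0, 10.0, 20.0)]
```

The Exp·Exp tail decays like e^{−2√x}, so its ratio against e^{−θx} eventually explodes; the Weibull(2)·Weibull(2) tail decays like e^{−2x}, so its ratio vanishes.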
What are the possible maximum domains of attraction?
Fréchet domain of attraction

Breiman’s Lemma
If S is an α–regularly varying random variable and there exists δ > 0 such that E[X^{α+δ}] < ∞, then

Y := S · X

has an α–regularly varying distribution. Moreover,

P(Y > x) ∼ E[X^α] P(S > x).
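Breiman’s lemma can be sanity-checked by quadrature (a sketch with the illustrative choices S ∼ Pareto(α) on [1, ∞) and X ∼ Exp(1), so that E[X^α] = Γ(α + 1) and P(S > x) = x^{−α}):

```python
import math

alpha = 1.5  # regular-variation index of S

def product_tail(x, steps=200_000):
    # Conditioning on X = t:  P(S*X > x) = int_0^x e^{-t} (t/x)^alpha dt + e^{-x}
    upper = min(x, 50.0)                 # integrand negligible beyond t = 50
    h = upper / steps
    acc = 0.0
    for j in range(steps):
        t = (j + 0.5) * h                # midpoint rule
        acc += math.exp(-t) * (t / x) ** alpha
    return acc * h + math.exp(-x)

x = 1000.0
predicted = math.gamma(alpha + 1) * x ** -alpha   # E[X^alpha] * P(S > x)
ratio = product_tail(x) / predicted
```

At x = 1000 the computed tail agrees with the Breiman prediction to well under a percent.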
Breiman’s Lemma
If L_{1/S} is an α–regularly varying function, then

Y := S · X

has an α–regularly varying distribution. Moreover,

P(Y > x) ∼ E[X^α] P(S > x).
Gumbel domain of attraction and Subexponentiality

Let V(x) = (−1)^{η−1} L^{(η−1)}_{1/S}(x).¹

Theorem
If V(·) is a von Mises function, then F ∈ MDA(Λ). Moreover, if

lim inf_{x→∞} V(tx) V′(x) / ( V′(tx) V(x) ) > 1, ∀t > 1,

then F is subexponential.

¹ η is the largest dimension among the Jordan blocks associated to the largest eigenvalue of the sub–intensity matrix.
Problem 2: How to approximate theoretical heavy–tailed distributions?
An approach for approximating theoretic distributions

I Take a sequence of Erlang rvs

Xn ∼ Erlang(n, n).

I Consider increasing sequences

Sn := {s_{n,0}, s_{n,1}, s_{n,2}, . . . }, n ∈ N,

such that s_{n,0} = 0, s_{n,i} → ∞ as i → ∞, and

Δn := sup{ s_{n,i+1} − s_{n,i} : i ∈ N } → 0

as n → ∞.
I For each n, define the cdfs

Fn(x) = F(s_{n,i+1}), x ∈ [s_{n,i}, s_{n,i+1}).
Construct a sequence of random variables

Sn ∼ Fn, Xn ∼ Erlang(n, n).

Since Xn → 1 in probability, Slutsky’s theorem implies that

Sn · Xn → X ∼ F.

Note: the construction can be modified so that it also works for light–tailed distributions.
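The construction above can be sketched in code (illustrative assumptions: the target F is a Pareto(2) cdf, the grid s_{n,i} is equidistant, and the leftover tail mass is lumped at the top of the grid). The cdf of Sn · Xn is computed exactly from the Erlang(n, n) cdf and compared with F:

```python
import math

def erlang_cdf(z, n):
    # Erlang(n, n) cdf: shape n, rate n, mean 1
    if z <= 0.0:
        return 0.0
    if n * z > 500.0:                    # tail numerically zero here
        return 1.0
    acc = sum((n * z) ** m / math.factorial(m) for m in range(n))
    return 1.0 - math.exp(-n * z) * acc

def F(x):                                # illustrative heavy-tailed target
    return 0.0 if x < 1.0 else 1.0 - x ** -2

# Grid s_i = i*delta playing the role of s_{n,i}; mass F(s_{i+1}) - F(s_i)
# sits at s_{i+1}, and the leftover tail mass is lumped at the top point.
delta, s_max = 0.05, 100.0
grid = [i * delta for i in range(int(s_max / delta) + 1)]
support, probs = [], []
for lo, hi in zip(grid, grid[1:]):
    p = F(hi) - F(lo)
    if p > 0.0:
        support.append(hi)
        probs.append(p)
probs[-1] += 1.0 - F(grid[-1])

def approx_cdf(y, n):
    # cdf of S_n * X_n with X_n ~ Erlang(n, n)
    return sum(p * erlang_cdf(y / s, n) for s, p in zip(support, probs))

def max_err(n):
    return max(abs(approx_cdf(y, n) - F(y)) for y in (2.0, 5.0, 10.0))
# the error is already small for moderate n, in line with S_n * X_n => F
```

The Erlang(n, n) factor concentrates at 1 with standard deviation n^{−1/2}, so the mixture cdf tracks F up to a smoothing error of that order plus the grid error Δn.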
[Figure: numerical illustration of the approximation; axis scales 0–5 (horizontal) and 0–3.5 (vertical).]
Ruin problem

Definition
Let F and G be two distribution functions with non-negative support. The relative error from F to G from 0 to s is defined by

sup_{x∈[0,s]} |F(x) − G(x)| / |F(x)|.
Theorem
Fix u > 0. Let FI be the integrated tail of F and let G be an approximation of F such that

sup_{x∈[0,u]} |FI(x) − GI(x)| / F̄I(x) ≤ ε(u),

where ε : R+ → R+ is some non-decreasing function. Further assume that F dominates G stochastically. Then

|ψF(u) − ψ̂G(u)| ≤ ε(u) E[Number of down-crossings; A],

where A := {ruin happens at the last down-crossing}.
Problem 3: Statistical inference
Key Idea
View the first passage times as an incomplete data set from the evolution of the Markov Jump Process. Employ the EM algorithm to estimate the parameters.
Density of an infinite mixture of PH

fA(y) = Σ_{i=1}^∞ πi(θ) α e^{T y/s_i} t / s_i,  y ≥ 0.

Complete Data Likelihood

ℓc(θ, α, T) = Σ_{i=1}^∞ L_i log πi(θ)
 + Σ_{i=1}^∞ Σ_{k=1}^p B^i_k log αk
 + Σ_{i=1}^∞ Σ_{k≠ℓ} N^i_{kℓ} log( T_{kℓ} / s_i )
 − Σ_{i=1}^∞ Σ_{k≠ℓ} ( T_{kℓ} / s_i ) Z^i_k
 + Σ_{i=1}^∞ Σ_{k=1}^p A^i_k log( t_k / s_i )
 − Σ_{i=1}^∞ Σ_{k=1}^p ( t_k / s_i ) Z^i_k.
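The mixture density can be evaluated directly whenever α e^{Ty} t has a closed form. A minimal sketch using the Erlang(2, λ) phase–type representation (so α e^{Ty} t = λ² y e^{−λy}), with truncated geometric weights πi and scaling points s_i = i; the weights and scaling points are illustrative assumptions, not the slides’ parametrization:

```python
import math

lam = 1.0
si = list(range(1, 41))                  # scaling support s_1, ..., s_40
pi = [2.0 ** -i for i in si]             # hypothetical geometric weights
Z = sum(pi)
pi = [p / Z for p in pi]                 # renormalise the truncation

def f_A(y):
    # f_A(y) = sum_i pi_i * alpha exp(T y / s_i) t / s_i, with the Erlang(2, lam)
    # closed form alpha exp(T z) t = lam^2 * z * exp(-lam * z)
    return sum(p * lam ** 2 * (y / s) * math.exp(-lam * y / s) / s
               for p, s in zip(pi, si))

# crude midpoint check that f_A integrates to ~1
h, total, t = 0.05, 0.0, 0.025
while t < 2000.0:
    total += f_A(t) * h
    t += h
```

Each mixture component is the density of s_i · X scaled by 1/s_i, so the total mass is 1 up to the truncation of the weights; the tail of f_A is also far heavier than any exponential, as Problem 1 predicts.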
EM estimators
Assume (θ, α, T) is the current set of parameters in the EM algorithm. The one-step EM estimators are then given by

θ̂ = argmax_Θ Σ_{i=1}^∞ (α Ui t) · log πi(Θ),

α̂ = diag(α) U t / N,

T̂kℓ = ( diag(R)^{−1} T ∘ R )_{kℓ}, k ≠ ℓ,

t̂ = α U diag(t) diag(R)^{−1},

where

R = R(θ, α, T) = Σ_{i=1}^∞ Σ_{j=1}^N ( πi(θ)/s_i ) J(yj/s_i; α, T, t) / fA(yj; θ, α, T),

Ui = Ui(θ, α, T) = Σ_{j=1}^N ( πi(θ)/s_i ) exp( T yj/s_i ) / fA(yj; θ, α, T),

U = U(θ, α, T).
In the above,

Ji(y) := ∫_0^y e^{Ti(y−u)} ti α e^{Ti u} du.

Proposition
Let c = max{ −Tkk : k = 1, . . . , p } and set K := c^{−1} T + I. Define

D(s) := D(s; α, T) = (1/c) Σ_{r=0}^s K^r t α K^{s−r}.

Then

J(y; θ, α, T) = Σ_{s=0}^∞ [ (cy)^{s+1} e^{−cy} / (s + 1)! ] D(s).
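The Proposition can be verified numerically on a small example (a hypothetical Erlang(2, λ) sub-intensity matrix; c is taken as max_k(−Tkk), which makes K = c^{−1}T + I nonnegative, as uniformization requires). The series is compared against direct numerical integration of the defining integral:

```python
import math

lam, y = 1.0, 1.3
T = [[-lam, lam], [0.0, -lam]]           # sub-intensity matrix (test case)
t = [0.0, lam]                           # exit rate vector
alpha = [1.0, 0.0]
I2 = [[1.0, 0.0], [0.0, 1.0]]

def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def mat_add(A, B):
    return [[A[i][j] + B[i][j] for j in range(2)] for i in range(2)]

def mat_scale(c, A):
    return [[c * A[i][j] for j in range(2)] for i in range(2)]

ta = [[t[i] * alpha[j] for j in range(2)] for i in range(2)]   # outer product t alpha

# uniformisation: c = max_k(-T_kk), K = T/c + I is nonnegative
c = max(-T[k][k] for k in range(2))
K = mat_add(mat_scale(1.0 / c, T), I2)

def D(s):
    Kp = [I2]
    for _ in range(s):
        Kp.append(mat_mul(Kp[-1], K))    # powers K^0, ..., K^s
    acc = [[0.0, 0.0], [0.0, 0.0]]
    for r in range(s + 1):
        acc = mat_add(acc, mat_mul(Kp[r], mat_mul(ta, Kp[s - r])))
    return mat_scale(1.0 / c, acc)

J_series = [[0.0, 0.0], [0.0, 0.0]]
for s in range(60):
    w = (c * y) ** (s + 1) * math.exp(-c * y) / math.factorial(s + 1)
    J_series = mat_add(J_series, mat_scale(w, D(s)))

# reference: midpoint integration of e^{T(y-u)} t alpha e^{Tu} on [0, y]
def expT(u):                             # closed form for this bidiagonal T
    e = math.exp(-lam * u)
    return [[e, lam * u * e], [0.0, e]]

steps = 20_000
h = y / steps
J_num = [[0.0, 0.0], [0.0, 0.0]]
for j in range(steps):
    u = (j + 0.5) * h
    J_num = mat_add(J_num, mat_scale(h, mat_mul(expT(y - u), mat_mul(ta, expT(u)))))
```

For this T the entries of J are available analytically (e.g. J[1][0] = λy e^{−λy}), and both computations reproduce them to integration accuracy.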
An alternative model
Let r ∈ (0, 1) and define

fB(y) = r · α e^{T y} t + (1 − r) Σ_{i=1}^∞ πi(θ) λ_i^q y^{q−1} e^{−λi y} / Γ(q).
Proposition
The EM estimators take the form

r̂ = r α U t / N,

α̂ = diag(α) U t / (α U t),

T̂kℓ = ( diag(R)^{−1} T ∘ R′ )_{kℓ},

t̂ = α U diag(t) · diag(R)^{−1},

θ̂ = argmax_Θ Σ_{i=1}^∞ wi log πi(Θ),

λ̂ = Σ_{i=1}^∞ wi / Σ_{i=1}^∞ ( vi / s_i ).
where

R = R(α, T, λ, θ) = Σ_{j=1}^N J(yj; α, T, t) / fB(yj; r, α, T, θ, λ),

U = U(α, T, λ, θ) = Σ_{j=1}^N exp(T yj) / fB(yj; r, α, T, θ, λ),

wi = wi(α, T, λ, θ) = πi(θ) Σ_{j=1}^N g(yj; q, λ/s_i) / fB(yj; r, α, T, θ, λ),

vi = vi(α, T, λ, θ) = πi(θ) Σ_{j=1}^N yj · g(yj; q, λ/s_i) / fB(yj; r, α, T, θ, λ).
Advantages
I Fast.
I Recovers the parameters of simulated data.
I Works reasonably well with real data.
I Converges to the MLE estimators most of the time.
I Provides a better approximation for the tails.

Disadvantages
I There are some (minor) numerical issues yet to be resolved.
I MLE estimators are not appropriate for estimating “tail parameters”; EVT methods could be implemented.
I Sensitive with respect to the tail parameters.
I Kullback–Leibler does not work properly, though we now have a first, simpler approach.