Simulation of spatially correlated discrete random variables

Dan Dalthorp and Lisa Madsen
Department of Statistics
Oregon State University
dalthorp@science.oregonstate.edu

Outline

I. Generating one pair of correlated discrete random variables. (a) Lognormal-Poisson hierarchy (b) Overlapping sums

II. Generating a vector of correlated discrete random variables by overlapping sums

III. Examples

Introduction

Generate $Y_1$, $Y_2$ where

• $Y_1$, $Y_2$ have specified means $\mu_{Y_1}, \mu_{Y_2}$, variances $\sigma^2_{Y_1}, \sigma^2_{Y_2}$, and correlation $\rho_Y \ge 0$

• $Y_1$, $Y_2$ are count r.v.'s, i.e., $y = 0, 1, 2, \ldots$

• Distributions of $Y_1$, $Y_2$ are unimodal, Poisson-like

• If $\sigma^2 < \mu$, then both $\sigma^2$ and $\mu$ are small

Lognormal-Poisson Method for Generating $Y_1$ and $Y_2$

• Generate correlated normal RVs $Z_1$, $Z_2$

• Transform to lognormals $X_i = \exp(Z_i)$

• Generate conditionally independent $Y_i \sim \text{Poisson}(X_i)$

$Y_1$ and $Y_2$ resemble negative binomial RVs.

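Obtaining the Right Moments

To get $Y_i \sim (\mu_{Y_i}, \sigma^2_{Y_i})$ with $\text{corr}(Y_1, Y_2) = \rho_Y$, generate lognormals $X_1$, $X_2$ with

$$\mu_{X_i} = \mu_{Y_i}, \qquad \sigma^2_{X_i} = \sigma^2_{Y_i} - \mu_{Y_i}, \qquad \rho_X = \rho_Y \frac{\sigma_{Y_1}\sigma_{Y_2}}{\sqrt{(\sigma^2_{Y_1} - \mu_{Y_1})(\sigma^2_{Y_2} - \mu_{Y_2})}}$$

This requires normals $Z_1$, $Z_2$ with

$$\mu_{Z_i} = \log\frac{\mu^2_{X_i}}{\sqrt{\sigma^2_{X_i} + \mu^2_{X_i}}}, \qquad \sigma^2_{Z_i} = \log\left(1 + \frac{\sigma^2_{X_i}}{\mu^2_{X_i}}\right)$$

and

$$\rho_Z = \frac{\log\left(1 + \rho_X \dfrac{\sigma_{X_1}\sigma_{X_2}}{\mu_{X_1}\mu_{X_2}}\right)}{\sqrt{\log\left(1 + \dfrac{\sigma^2_{X_1}}{\mu^2_{X_1}}\right)\log\left(1 + \dfrac{\sigma^2_{X_2}}{\mu^2_{X_2}}\right)}}$$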
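The moment matching for the lognormal-Poisson hierarchy can be sketched in code. This is a minimal illustration using only Python's standard library. The function names (`lognormal_poisson_params`, `poisson`, `draw_pair`) are hypothetical, and the Poisson sampler is Knuth's simple multiplication method, chosen here for brevity rather than speed.

```python
import math
import random


def lognormal_poisson_params(mu_y, var_y, rho_y):
    """Map target moments of (Y1, Y2) to (mu_Z, sigma^2_Z, rho_Z) of the normals."""
    mu_z, var_z = [], []
    for m, v in zip(mu_y, var_y):
        if v <= m:
            raise ValueError("lognormal-Poisson needs variance > mean")
        var_x = v - m                                  # sigma^2_X = sigma^2_Y - mu_Y
        var_z.append(math.log(1.0 + var_x / m ** 2))
        mu_z.append(math.log(m ** 2 / math.sqrt(var_x + m ** 2)))
    # cov(X1, X2) = cov(Y1, Y2), so rho_X * sd_X1 * sd_X2 = rho_Y * sd_Y1 * sd_Y2
    cov = rho_y * math.sqrt(var_y[0] * var_y[1])
    arg = 1.0 + cov / (mu_y[0] * mu_y[1])
    if arg <= 0.0:
        raise ValueError("target correlation not attainable")
    rho_z = math.log(arg) / math.sqrt(var_z[0] * var_z[1])
    if not -1.0 <= rho_z <= 1.0:
        raise ValueError("target correlation not attainable")
    return mu_z, var_z, rho_z


def poisson(lam, rng):
    """Knuth's method; adequate for small to moderate lam only."""
    limit, k, prod = math.exp(-lam), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def draw_pair(mu_z, var_z, rho_z, rng):
    """One draw of (Y1, Y2): correlated normals -> lognormals -> Poissons."""
    z1 = rng.gauss(0.0, 1.0)
    z2 = rho_z * z1 + math.sqrt(1.0 - rho_z ** 2) * rng.gauss(0.0, 1.0)
    x1 = math.exp(mu_z[0] + math.sqrt(var_z[0]) * z1)
    x2 = math.exp(mu_z[1] + math.sqrt(var_z[1]) * z2)
    return poisson(x1, rng), poisson(x2, rng)
```

A call such as `draw_pair(*lognormal_poisson_params([2.0, 2.0], [5.0, 5.0], 0.3), random.Random(1))` returns one simulated pair; the target values in that call are illustrative only.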

Constraints on Moments of $Y_1$, $Y_2$ with Lognormal-Poisson Method

• $\sigma^2_{Y_i} > \mu_{Y_i}$

• $$\rho_Y \le \frac{\mu_{Y_1}\mu_{Y_2}}{\sigma_{Y_1}\sigma_{Y_2}}\left[\exp\left(\sqrt{\log\left(1 + \frac{\sigma^2_{Y_1} - \mu_{Y_1}}{\mu^2_{Y_1}}\right)\log\left(1 + \frac{\sigma^2_{Y_2} - \mu_{Y_2}}{\mu^2_{Y_2}}\right)}\right) - 1\right]$$
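The upper bound corresponds to setting $\rho_Z = 1$ in the lognormal-Poisson hierarchy. A small sketch of that bound follows; the function name `lp_max_correlation` is hypothetical.

```python
import math


def lp_max_correlation(mu1, var1, mu2, var2):
    """Largest corr(Y1, Y2) reachable by the lognormal-Poisson method (rho_Z = 1)."""
    if var1 <= mu1 or var2 <= mu2:
        raise ValueError("lognormal-Poisson needs variance > mean")
    s1 = math.log(1.0 + (var1 - mu1) / mu1 ** 2)   # sigma^2_Z1
    s2 = math.log(1.0 + (var2 - mu2) / mu2 ** 2)   # sigma^2_Z2
    scale = mu1 * mu2 / math.sqrt(var1 * var2)
    return scale * (math.exp(math.sqrt(s1 * s2)) - 1.0)
```

When the two variables share the same mean and variance the bound simplifies to $(\sigma^2 - \mu)/\sigma^2$, which stays below 1 whenever $\mu > 0$.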

Upper Bound for Correlation–Lognormal Poisson

[Figure slides; graphics not preserved in this transcript.]

Overlapping Sums Method for Generating $Y_1$ and $Y_2$

• Generate independent, discrete RVs $X_1$, $X_2$, $X$

• Let $Y_1 = X + X_1$ and $Y_2 = X + X_2$

Holgate (1964): Correlated Poissons

We are not concerned with the exact distribution of $Y_1$ and $Y_2$, but we require them to be ecologically plausible.
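For the all-Poisson case, the overlapping-sums construction can be sketched as follows. The shared component $X$ contributes its variance as the covariance, so $\text{corr}(Y_1, Y_2) = \lambda_0 / \sqrt{(\lambda_0 + \lambda_1)(\lambda_0 + \lambda_2)}$. The function names and the rate values in the usage note are illustrative assumptions, and the Poisson sampler is Knuth's simple method.

```python
import math
import random


def poisson(lam, rng):
    """Knuth's multiplication method; adequate for small lam."""
    limit, k, prod = math.exp(-lam), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def correlated_poissons(lam_common, lam1, lam2, rng):
    """Y1 = X + X1, Y2 = X + X2 with independent Poisson X, X1, X2."""
    x = poisson(lam_common, rng)
    return x + poisson(lam1, rng), x + poisson(lam2, rng)


def sample_correlation(pairs):
    """Pearson correlation of a list of (y1, y2) pairs."""
    n = len(pairs)
    m1 = sum(a for a, _ in pairs) / n
    m2 = sum(b for _, b in pairs) / n
    s11 = sum((a - m1) ** 2 for a, _ in pairs)
    s22 = sum((b - m2) ** 2 for _, b in pairs)
    s12 = sum((a - m1) * (b - m2) for a, b in pairs)
    return s12 / math.sqrt(s11 * s22)
```

With `lam_common = 0.5`, `lam1 = 1.0`, `lam2 = 1.5`, the theoretical correlation is $0.5/\sqrt{1.5 \times 2.0} \approx 0.289$, and the sample correlation of many draws should settle near that value.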

Obtaining the Right Moments

To get $Y_i \sim (\mu_{Y_i}, \sigma^2_{Y_i})$ with $\text{corr}(Y_1, Y_2) = \rho_Y$, generate independent $X_1$, $X_2$, $X$ with

$$\sigma^2_X = \rho_Y \sigma_{Y_1}\sigma_{Y_2} = \text{cov}(Y_1, Y_2), \qquad \sigma^2_{X_i} = \sigma^2_{Y_i} - \rho_Y \sigma_{Y_1}\sigma_{Y_2}$$

and

$$\mu_X + \mu_{X_1} = \mu_{Y_1}, \qquad \mu_X + \mu_{X_2} = \mu_{Y_2}$$

Choose distributions for Xs based on relationship between variance and mean:

• If $\sigma^2_X > \mu_X$, use $X \sim \text{Negative binomial}(\mu_X, \sigma^2_X)$

• If $\sigma^2_X = \mu_X$, use $X \sim \text{Poisson}(\mu_X)$

• If $\sigma^2_X = \mu_X(1 - \mu_X)$, use $X \sim \text{Bernoulli}(\mu_X)$

• If $\sigma^2_X < \mu_X$ and $\sigma^2_X > \mu_X(1 - \mu_X)$, use $X = P + B$, where $B \sim \text{Bernoulli}(p)$ and $P \sim \text{Poisson}(\lambda)$, with $p = \sqrt{\mu_X - \sigma^2_X}$ and $\lambda = \mu_X - \sqrt{\mu_X - \sigma^2_X}$

• If $\sigma^2_X < \mu_X(1 - \mu_X)$, then X cannot be simulated by any method.
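The case analysis above can be written as a small selector. This is a sketch under stated assumptions: the function name, the returned labels, the numerical tolerance `tol`, and the size/prob parameterisation of the negative binomial (mean = size(1 - prob)/prob) are all choices made for illustration. The extra check that $p \le 1$ is also an addition; it only matters for means above 1.

```python
import math


def component_distribution(mu, var, tol=1e-9):
    """Pick a count distribution with mean mu and variance var."""
    if mu <= 0.0 or var <= 0.0:
        raise ValueError("mean and variance must be positive")
    bernoulli_var = mu * (1.0 - mu)
    if abs(var - mu) <= tol:
        return "poisson", {"lam": mu}
    if var > mu:
        # size/prob form: mean = size * (1 - prob) / prob, variance = mean / prob
        return "negative_binomial", {"size": mu ** 2 / (var - mu), "prob": mu / var}
    if abs(var - bernoulli_var) <= tol:
        return "bernoulli", {"p": mu}
    if var < bernoulli_var:
        return "impossible", {}
    # mu * (1 - mu) < var < mu: Bernoulli(p) + Poisson(lam)
    p = math.sqrt(mu - var)
    if p > 1.0:
        raise ValueError("Bernoulli + Poisson construction needs mu - var <= 1")
    return "bernoulli_plus_poisson", {"p": p, "lam": mu - p}
```

For the Bernoulli + Poisson case the mean is $\lambda + p = \mu_X$ and the variance is $\lambda + p(1 - p) = \mu_X - p^2 = \sigma^2_X$, which is where $p = \sqrt{\mu_X - \sigma^2_X}$ comes from.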

Constraints on Moments of $Y_1$, $Y_2$ with Overlapping Sums Method

• $$\rho_Y \le \min\left(\frac{\sigma_{Y_1}}{\sigma_{Y_2}}, \frac{\sigma_{Y_2}}{\sigma_{Y_1}}\right)$$

• No constraints on means of $Y_i$, but we require

▪ $\mu_{Y_i} > 0$

▪ Relationship between $\mu_{Y_i}$ and $\sigma^2_{Y_i}$ ecologically plausible
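The correlation bound follows from requiring both residual variances $\sigma^2_{Y_i} - \rho_Y \sigma_{Y_1}\sigma_{Y_2}$ to be non-negative. A minimal sketch, with hypothetical function names:

```python
import math


def os_max_correlation(var1, var2):
    """Largest corr(Y1, Y2) reachable by overlapping sums."""
    s1, s2 = math.sqrt(var1), math.sqrt(var2)
    return min(s1 / s2, s2 / s1)


def os_residual_variances(var1, var2, rho):
    """Variances left for X1 and X2 once the common component is removed."""
    common = rho * math.sqrt(var1 * var2)   # sigma^2_X = cov(Y1, Y2)
    return var1 - common, var2 - common
```

At the bound, the variable with the smaller variance consists of the common component alone: its residual variance is zero.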

Upper Bound for Correlation–Overlapping Sums

[Figure slide; graphic not preserved in this transcript.]

Comparing Methods

[Figure: maximum possible correlation (0 to 1) plotted against $\sigma_1^2$ (0.5 to 3) for the overlapping sums (OS) and lognormal-Poisson (LP) methods, with $\mu_1 = 0.8$, $\mu_2 = 0.8$ and $\sigma_2^2 = 0.8, 1.9, 3$.]

[Figure: the same comparison with $\mu_1 = 0.8$, $\mu_2 = 1.4$ and $\sigma_2^2 = 1.4, 3.2, 5$.]

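A quick example: Simulate $Y_1$ and $Y_2$ with $\mu_{Y_1} = 1.02$, $\sigma^2_{Y_1} = 1.256$, $\mu_{Y_2} = 0.15$, $\sigma^2_{Y_2} = 0.139$, and $\rho = 0.2$

Step 1: Find variances and means of X's

$Y_1 = X + X_1$ and $Y_2 = X + X_2$, where $X$, $X_1$, and $X_2$ are independent count random variables with ...

Variances:

$$\sigma^2_X = \rho\,\sigma_{Y_1}\sigma_{Y_2} = 0.0836, \qquad \sigma^2_{X_1} = \sigma^2_{Y_1} - \sigma^2_X = 1.172, \qquad \sigma^2_{X_2} = \sigma^2_{Y_2} - \sigma^2_X = 0.0554$$

Means:

$$\mu_{Y_1} = \mu_X + \mu_{X_1}, \qquad \mu_{Y_2} = \mu_X + \mu_{X_2}$$

Two equations, three unknowns ...

Try $\mu_X = 0.5 - \sqrt{0.25 - \sigma^2_X} = 0.0921$, so X would be Bernoulli. Then

$$\mu_X = 0.0921, \qquad \mu_{X_1} = 0.928, \qquad \mu_{X_2} = 0.0579$$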
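The arithmetic of Step 1 can be checked directly. The sketch below recomputes the component variances and means from the five target values; the variable names are illustrative. Because it carries full precision rather than the rounded $\sigma^2_X = 0.0836$, the results agree with the quoted figures only up to rounding in the last digit.

```python
import math

# Targets from the example
mu_y1, var_y1 = 1.02, 1.256
mu_y2, var_y2 = 0.15, 0.139
rho = 0.2

# Variances: the common component carries the covariance
var_x = rho * math.sqrt(var_y1 * var_y2)
var_x1 = var_y1 - var_x
var_x2 = var_y2 - var_x

# Means: choose mu_x so that var_x = mu_x * (1 - mu_x), i.e. X is Bernoulli
mu_x = 0.5 - math.sqrt(0.25 - var_x)
mu_x1 = mu_y1 - mu_x
mu_x2 = mu_y2 - mu_x

print(f"variances: X={var_x:.4f}, X1={var_x1:.4f}, X2={var_x2:.4f}")
print(f"means:     X={mu_x:.4f}, X1={mu_x1:.4f}, X2={mu_x2:.4f}")
```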

Step 2: Define distributions for X's

$X \sim \text{Bernoulli}(0.0921)$, since $\sigma^2_X = \mu_X(1 - \mu_X)$ by design

$X_1 \sim$ Negative binomial with $\mu = 0.928$ and $\sigma^2 = 1.172$

$X_2 = \text{Bernoulli}(p) + \text{Poisson}(\lambda)$ with $p = 0.05$ and $\lambda = 0.0079$

Step 3: Simulate

$Y_1 = X + X_1$

$Y_2 = X + X_2$

In matrix form, $Y = TX$:

$$\begin{pmatrix} Y_1 \\ Y_2 \end{pmatrix} = \begin{pmatrix} 1 & 1 & 0 \\ 1 & 0 & 1 \end{pmatrix}\begin{pmatrix} X \\ X_1 \\ X_2 \end{pmatrix}$$
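Steps 2 and 3 can be sketched with the standard library alone. The assumptions here are the sampler choices: the negative binomial is drawn as a gamma-Poisson mixture matched to the stated mean and variance, and the Poisson sampler is Knuth's simple method. All function names are hypothetical.

```python
import math
import random


def poisson(lam, rng):
    """Knuth's multiplication method; adequate for small lam."""
    limit, k, prod = math.exp(-lam), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def bernoulli(p, rng):
    return 1 if rng.random() < p else 0


def negative_binomial(mean, var, rng):
    """Gamma-Poisson mixture with the requested mean and variance (var > mean)."""
    shape = mean ** 2 / (var - mean)
    scale = (var - mean) / mean
    return poisson(rng.gammavariate(shape, scale), rng)


def simulate_pair(rng):
    """One draw of (Y1, Y2) = (X + X1, X + X2) for the quick example."""
    x = bernoulli(0.0921, rng)
    x1 = negative_binomial(0.928, 1.172, rng)
    x2 = bernoulli(0.05, rng) + poisson(0.0079, rng)
    return x + x1, x + x2


def moments(pairs):
    """Sample means, variances, and correlation of a list of pairs."""
    n = len(pairs)
    m1 = sum(a for a, _ in pairs) / n
    m2 = sum(b for _, b in pairs) / n
    v1 = sum((a - m1) ** 2 for a, _ in pairs) / (n - 1)
    v2 = sum((b - m2) ** 2 for _, b in pairs) / (n - 1)
    c = sum((a - m1) * (b - m2) for a, b in pairs) / (n - 1)
    return (m1, m2), (v1, v2), c / math.sqrt(v1 * v2)
```

Calling `moments([simulate_pair(rng) for _ in range(100000)])` with `rng = random.Random(0)` gives sample moments that can be compared against the targets (means 1.02 and 0.15, variances 1.256 and 0.139, correlation 0.2).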

Generalizing to n > 2:

$Y_{n \times 1} = T_{n \times m} X_{m \times 1}$, where $X$ is a vector of independent r.v.'s, and $T$ is a matrix of 0's and 1's

1. Park & Shin (1998) algorithm gives variances for X's:

Find $n \times m$ matrix $T$ consisting of 0's and 1's and m-vector $\sigma^2_X = (\sigma^2_{X_1}, \ldots, \sigma^2_{X_m})^T$ such that $T\sigma^2_X = (\sigma^2_{Y_1}, \ldots, \sigma^2_{Y_n})^T$ and $\text{cov}(Y) = T\,\text{cov}(X)\,T^T$

2. Linear programming gives reasonable means for X's:

Find m-vector $\mu_X = (\mu_{X_1}, \ldots, \mu_{X_m})^T$ that solves $T\mu_X = (\mu_{Y_1}, \ldots, \mu_{Y_n})^T$ subject to constraints: (i) $\mu_i > 0$ for all i; and (ii) when $\sigma_i^2 \le 0.25$, $\mu_i \le 0.5 - \sqrt{0.25 - \sigma_i^2}$

3. Generate independent X's with the appropriate distributions and multiply by T:

• $\sigma^2_X > \mu_X$: X ~ Negative binomial

• $\sigma^2_X = \mu_X$: X ~ Poisson

• $\sigma^2_X = \mu_X(1 - \mu_X)$: X ~ Bernoulli

• $\mu_X(1 - \mu_X) < \sigma^2_X < \mu_X$: X ~ Bernoulli + Poisson

Park & Shin (1998) algorithm gives variances of X's

E.g., Suppose

cov(Y) =
    1     0.3   0.2   0.1
    0.3   2.3   0.5   0.12
    0.2   0.5   2.6   0.09
    0.1   0.12  0.09  0.45

$\text{cov}(Y_i, Y_j) \ge \sigma^2_X$, with $\sigma^2_X = 0.09$ for the common component of $Y_3$ and $Y_4$

T =
    1
    1
    1
    1
$\sigma^2_X$ = (0.09)

Components so far:
    Y1: X_{0.09}
    Y2: X_{0.09}
    Y3: X_{0.09}
    Y4: X_{0.09}

Remaining covariance:
    0.91  0.21  0.11  0.01
    *     2.21  0.41  0.03
    *     *     2.51  0
    *     *     *     0.36

T =
    1 1
    1 1
    1 0
    1 1
$\sigma^2_X$ = (0.09, 0.01)

Components so far:
    Y1: X_{0.09} + X_{0.01}
    Y2: X_{0.09} + X_{0.01}
    Y3: X_{0.09}
    Y4: X_{0.09} + X_{0.01}

Remaining covariance:
    0.90  0.20  0.11  0
    *     2.20  0.41  0.02
    *     *     2.51  0
    *     *     *     0.35

Remaining covariance:
    0.90  0.20  0.11  0
    *     2.20  0.41  0.02
    *     *     2.51  0
    *     *     *     0.35

T =
    1 1 0
    1 1 1
    1 0 0
    1 1 1
$\sigma^2_X$ = (0.09, 0.01, 0.02)

Remaining covariance after subtraction:
    0.90  0.20  0.11  0
    *     2.18  0.39  0
    *     *     2.51  0
    *     *     *     0.33

Remaining covariance:
    0.90  0.20  0.11  0
    *     2.18  0.38  0
    *     *     2.51  0
    *     *     *     0.33

T =
    1 1 0 1
    1 1 1 1
    1 0 0 1
    1 1 1 0
$\sigma^2_X$ = (0.09, 0.01, 0.02, 0.11)

Remaining covariance after subtraction:
    0.79  0.09  0     0
    *     2.08  0.27  0
    *     *     2.40  0
    *     *     *     0.33

Remaining covariance:
    0.79  0.09  0     0
    *     2.08  0.27  0
    *     *     2.40  0
    *     *     *     0.33

T =
    1 1 0 1 1
    1 1 1 1 1
    1 0 0 1 0
    1 1 1 0 0
$\sigma^2_X$ = (0.09, 0.01, 0.02, 0.11, 0.09)

Remaining covariance after subtraction:
    0.70  0     0     0
    *     1.99  0.27  0
    *     *     2.41  0
    *     *     *     0.33

Remaining covariance:
    0.70  0     0     0
    *     1.98  0.27  0
    *     *     2.41  0
    *     *     *     0.33

T =
    1 1 0 1 1 0
    1 1 1 1 1 1
    1 0 0 1 0 1
    1 1 1 0 0 0
$\sigma^2_X$ = (0.09, 0.01, 0.02, 0.11, 0.09, 0.27)

Remaining covariance after subtraction:
    0.70  0     0     0
    *     1.71  0     0
    *     *     2.14  0
    *     *     *     0.33

Remaining covariance:
    0.70  0     0     0
    *     1.71  0     0
    *     *     2.14  0
    *     *     *     0.33

T =
    1 1 0 1 1 0 0
    1 1 1 1 1 1 0
    1 0 0 1 0 1 0
    1 1 1 0 0 0 1
$\sigma^2_X$ = (0.09, 0.01, 0.02, 0.11, 0.09, 0.27, 0.33)

Remaining covariance after subtraction:
    0.70  0     0     0
    *     1.71  0     0
    *     *     2.14  0
    *     *     *     0

Remaining covariance:
    0.70  0     0     0
    *     1.71  0     0
    *     *     2.14  0
    *     *     *     0

T =
    1 1 0 1 1 0 0 1
    1 1 1 1 1 1 0 0
    1 0 0 1 0 1 0 0
    1 1 1 0 0 0 1 0
$\sigma^2_X$ = (0.09, 0.01, 0.02, 0.11, 0.09, 0.27, 0.33, 0.70)

Remaining covariance after subtraction:
    0     0     0     0
    *     1.71  0     0
    *     *     2.14  0
    *     *     *     0

Remaining covariance:
    0     0     0     0
    *     1.71  0     0
    *     *     2.14  0
    *     *     *     0

T =
    1 1 0 1 1 0 0 1 0
    1 1 1 1 1 1 0 0 1
    1 0 0 1 0 1 0 0 0
    1 1 1 0 0 0 1 0 0
$\sigma^2_X$ = (0.09, 0.01, 0.02, 0.11, 0.09, 0.27, 0.33, 0.70, 1.71)

Remaining covariance after subtraction:
    0     0     0     0
    *     0     0     0
    *     *     2.14  0
    *     *     *     0

Remaining covariance:
    0     0     0     0
    *     0     0     0
    *     *     2.14  0
    *     *     *     0

T =
    1 1 0 1 1 0 0 1 0 0
    1 1 1 1 1 1 0 0 1 0
    1 0 0 1 0 1 0 0 0 1
    1 1 1 0 0 0 1 0 0 0
$\sigma^2_X$ = (0.09, 0.01, 0.02, 0.11, 0.09, 0.27, 0.33, 0.70, 1.71, 2.14)

Remaining covariance after subtraction:
    0     0     0     0
    *     0     0     0
    *     *     0     0
    *     *     *     0
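The walkthrough above suggests a greedy decomposition, sketched below. This is a reading of the worked example, not Park & Shin's published pseudocode. The function name, the tolerance, and the rule for growing each index set (scan indices in order and admit one if its remaining covariance with every current member is positive) are assumptions.

```python
def overlapping_sum_decomposition(cov, tol=1e-9):
    """Return (T, variances) with cov[i][j] = sum_k T[i][k] * T[j][k] * variances[k]."""
    n = len(cov)
    if any(cov[i][j] < -tol for i in range(n) for j in range(n) if i != j):
        raise ValueError("negative covariances are not attainable")
    resid = [list(row) for row in cov]
    columns, variances = [], []
    while True:
        # Smallest remaining positive covariance and the pair that attains it
        positive = [(resid[i][j], i, j)
                    for i in range(n) for j in range(i + 1, n)
                    if resid[i][j] > tol]
        if not positive:
            break
        v, i, j = min(positive)
        members = [i, j]
        for k in range(n):
            if k not in members and all(resid[k][m] > tol for m in members):
                members.append(k)
        # Peel off a shared component with variance v from the whole index set
        for a in members:
            for b in members:
                resid[a][b] -= v
        if any(resid[a][a] < -tol for a in members):
            raise ValueError("covariance matrix not attainable by overlapping sums")
        columns.append([1 if k in members else 0 for k in range(n)])
        variances.append(v)
    # What is left on the diagonal becomes one private component per variable
    for i in range(n):
        if resid[i][i] > tol:
            columns.append([1 if k == i else 0 for k in range(n)])
            variances.append(resid[i][i])
    t_matrix = [[col[i] for col in columns] for i in range(n)]
    return t_matrix, variances
```

By construction, summing `T[i][k] * T[j][k] * variances[k]` over components reproduces the input covariance matrix, which is the property $\text{cov}(Y) = T\,\text{cov}(X)\,T^T$ required of the decomposition.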

Grub population density as a function of several covariates

[Figure panels: Grubs; Adult Activity; Distance to Nearest Tree; Organic Matter.]

Name     Description
clayuk   clay content of soil
dml      distance to nearest tree
dnx      distance to nearest patch of soil with high organic matter content
fair     fairway/rough indicator
heat     intensity of adult activity
om.e     organic matter flexure
tap      total adult population
tw45     number of trees within 45 meters
vix      vegetation index
wbuk     soil organic matter

[Figure panels: histogram of grub count (frequency); variance of residuals by quartile of fitted values (1st to 4th); correlation of residuals versus lag distance (feet).]

Are the conditions for multiple regression met?

1. Non-normal response variable

2. Variance not constant

3. Observations not independent

Generalized linear model (Fisher 1935; Dempster 1971; Berk 1972; Nelder and Wedderburn 1972)

A. Accommodates response variables with distribution in exponential family (including normal, binomial, Poisson, gamma, exponential, chi-squared, etc.)
B. Allows for non-constant variance

with quasi-likelihood estimation (Wedderburn, 1974)

A. Accommodates response variables that are not in an exponential family (including negative binomial, unspecified distributions)
B. Requires only that the variance of the response variable be expressed as a function of the mean

adapted for spatially dependent observations (Liang and Zeger 1986; McCullagh and Nelder 1989; Albert and McShane 1995; Gotway and Stroup 1997; Dalthorp 2004)

A. Accounts for spatial autocorrelation in the residuals
B. The statistical theory for the model is not well-developed

Example: Japanese beetle grub population density vs. soil organic matter

[Figure panels: Means (grubs per soil sample versus organic matter content, %); Variances ($s^2$ versus $\bar{x}$); Correlations (correlation versus lag distance, feet).]

Means (via GLM):

6.762 3( 33.3 13.2 2.03 0.0965 )( 0.0148)i iOM OM OM

Variances (via TPL): $\sigma^2 = 1.23\,\mu^{1.148}$

Correlations (via spherical model):

$$\rho(Y_1, Y_2) = \begin{cases} 1 & \text{if } \|Y_1 - Y_2\| = 0 \\ 0.25\left(1 - 0.015\,\|Y_1 - Y_2\| + 0.5\left(\|Y_1 - Y_2\|/100\right)^3\right) & \text{if } 0 < \|Y_1 - Y_2\| \le 100 \\ 0 & \text{if } \|Y_1 - Y_2\| > 100 \end{cases}$$
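The variance and correlation targets for this example can be written as two small functions. This is a sketch under stated assumptions: the function and parameter names are hypothetical, the Taylor's power law (TPL) form is $\sigma^2 = 1.23\,\mu^{1.148}$, and the spherical model has sill 0.25 and range 100 feet.

```python
def tpl_variance(mean, a=1.23, b=1.148):
    """Taylor's power law: variance as a power function of the mean."""
    return a * mean ** b


def spherical_correlation(distance, sill=0.25, range_=100.0):
    """Spherical correlation model with a nugget jump at distance zero."""
    if distance < 0:
        raise ValueError("distance must be non-negative")
    if distance == 0:
        return 1.0
    if distance > range_:
        return 0.0
    h = distance / range_
    return sill * (1.0 - 1.5 * h + 0.5 * h ** 3)
```

With range 100, the term $1.5\,d/100$ is the $0.015\,d$ that appears in the formula. Applying `tpl_variance` to the means 1.02 and 0.15 gives variances close to the 1.256 and 0.139 used in the earlier two-variable example.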

The simulation

1000 reps with n = 143: $Y_{143 \times 1000} = T_{143 \times 3720} X_{3720 \times 1000}$

• X's are independent, count-valued random variables
-- variances from Park & Shin's algorithm
-- means from linear programming

### PROBLEM ### No solution found!

Choice between one of the following:

i. One Y mean off-target but no impossible X r.v.'s

Need: Y with $\mu = 0.141$

Can only do: $\mu = 0.151$

ii. One impossible X r.v. ($\mu > 0.5 - \sqrt{0.25 - \sigma^2}$)

We need: r.v. with $\mu = 0.0385$, $\sigma^2 = 0.0272$

Can do Bernoulli: $\mu = 0.0385$, $\sigma^2 = 0.0370$

Consequences? Var($Y_{16}$) = 0.139 vs. target of 0.129

Results for 1000 simulation runs:

• 3720 X's consisting of:
-- Negative binomial: 1580
-- Bernoulli: 2099
-- Bernoulli + Poisson: 40
-- Impossible: 1 (simulated $\sigma^2$ slightly larger than target)

[Figure panels: Means (simulated mean versus target mean); Variances (simulated variance versus target variance).]

[Figure panels: Correlations (correlation versus lag distance; simulated correlation versus target correlation).]

Example: Diamondback moth dispersal

[Figure panels: trap positions around the release point; Means (DBM count versus distance from release point); Variances (variance versus mean); Correlation (correlation versus lag distance).]

The simulation

1000 reps with n = 114: $Y_{114 \times 1000} = T_{114 \times 1880} X_{1880 \times 1000}$

• X's are independent negative binomials
-- variances from Park & Shin's algorithm
-- means from linear programming

• T is a matrix of zeros and ones that defines the common components of the Y's

Results

[Figure panels: Means; Variances.]

Correlation: Simulated vs. target

[Figure: correlation versus lag distance.]

* Circles are averages for 1000 sims

Example: Weed counts (Chenopodium polyspermum) vs. soil magnesium

Weed counts and soil [Mg] in random quadrats in a field ...

[Figure panels: Means (weeds per sample versus soil magnesium); Variances ($s^2$ versus $\bar{x}$).]

[Figure: Correlation (correlation versus lag distance).]

### Infeasible correlations ###

Highest possible correlation between $Y_i$, $Y_j$ is:

$$\rho_{ij} = \text{Corr}(Y_i, Y_j) \le \min\left(\frac{\sigma_{Y_i}}{\sigma_{Y_j}}, \frac{\sigma_{Y_j}}{\sigma_{Y_i}}\right)$$

With 49 pairs of points in the weed data, target $\rho_{i,j}$ is too high.


Summary

• Correlated count r.v.'s can be simulated by overlapping sums of independent negative binomials, Bernoullis, and Poissons

• The simulated r.v.'s are very close to negative binomial where $\mu < \sigma^2$ and very close to Bernoulli + Poisson where $\mu > \sigma^2$

• Negative correlations and strong positive correlations between r.v.’s with very different variances are not attainable, but ...

• The method can accommodate a wide variety of ecologically important scenarios that the hierarchical lognormal-Poisson model balks at, including:

-- underdispersed count r.v.'s

-- moderately strong correlations where $\mu_1 \approx \mu_2$ and $\sigma_1^2 \approx \sigma_2^2$