Simulation for Examining Margin of Error and Sample Size: Binomial Proportions Acknowledgements to...


Page 1

Simulation for Examining Margin of Error and Sample Size: Binomial Proportions

Acknowledgements to Mandy Kauffman (WEST, Inc.) for photos and ‘background’ slides… the simulation exercise adapts the pedagogy of Trumbo, Suess, and Okumura (2005)

Page 2

Background – Bovine brucellosis

• Bacterial disease
  – History in US
  – Elk, bison, cattle (humans)
  – Cattle → wildlife
  – Causes abortions
  – Environmental contamination
  – Potential transmission to cattle
• $$$$
• Management implications

Page 3

Background – Elk Feedgrounds

• Harsh winters + development → elk starving, commingling with cattle
• 23 supplemental winter elk feedgrounds created
  – 22 WGFD
  – 1 USFWS
• Up to 84% of elk use feedgrounds
• Low winter mortality
• Costly
• 22% seroprevalence on feedgrounds, 3.7% elsewhere

Preble 1911

Page 4

Background – Management

• Management strategies
1. Maintain cattle/elk separation
   - hazing elk
   - fencing haystacks
   - elk feedgrounds
2. ↓ likelihood of exposed cattle experiencing abortions (RB51 vaccine)
3. ↓ seroprevalence in elk
   - T&S (test and slaughter)
   - low-density feeding
   - elk vaccination

Page 5

Background – Management

• Despite ongoing management:
  – Recent cases in cattle/bison traced back to elk
  – Affected area expanding
• Limited $$ available for management
  – No clear scientifically sound method
  – Need for economic evaluation of available management strategies
• Groups 1 & 2 already evaluated/underway
• Evaluation of Group 3 strategies still needed
  – How to assess seroprevalence of brucellosis in elk on feedgrounds… how many elk to sample?

Page 6

A Beginning

R Code:

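# 1 = seropositive, 0 = seronegative; assumed prevalence 0.22
samp <- sample(0:1, 25, replace = TRUE, prob = c(0.78, 0.22))
samp

Let’s start by simulating brucellosis diagnoses from 25 randomly sampled elk… assume the prevalence is 0.22… the goal is to estimate prevalence to within 5 percentage points (±0.05)… how many elk are needed?

Is there an issue with the assumption of random sampling here?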

> samp

[1] 0 1 1 0 0 0 0 0 0 1 0 1 0 1 1 1 1 0 0 0 0 0 0 0 0
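As a quick check on the draw printed above, the sample proportion can be computed directly. A minimal sketch, assuming the 25 printed values are typed back in by hand (the vector name `draw` is hypothetical; `1` codes a seropositive elk):

```r
# The 25 diagnoses printed above (1 = seropositive, 0 = seronegative)
draw <- c(0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1,
          1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0)

p_hat <- mean(draw)   # sample proportion: positives / 25
p_hat

# Is this estimate within the +/- 0.05 target around the assumed 0.22?
abs(p_hat - 0.22) <= 0.05
```

This draw contains 8 positives, so the estimate is 8/25 = 0.32, ten percentage points above the assumed prevalence: this particular sample of 25 misses the ±0.05 target, and the rest of the exercise asks how often that happens.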

Page 7

Generate a Profile

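Let’s observe the proportion of positive elk for a variety of sample sizes… keep in mind that this profile doesn’t display independent samples: each point adds one more elk to the same sample.

R Code:

n <- 25
NumElk <- 1:n
p.bruc <- c(0.78, 0.22)                # P(negative), P(positive)
x <- sample(0:1, n, replace = TRUE, prob = p.bruc)
run.tot.pos <- cumsum(x)               # running count of positives
Proportion <- run.tot.pos / NumElk     # running proportion positive
tabresults <- round(cbind(NumElk, x, run.tot.pos, Proportion), 3)
tabresults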
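To make the "not independent samples" caveat concrete, the sketch below contrasts the running profile with a version in which every sample size gets its own fresh sample. The seed and the names `running` and `indep` are illustrative choices, not part of the original exercise.

```r
set.seed(11)                      # arbitrary seed, for reproducibility only
n <- 25
p.bruc <- c(0.78, 0.22)           # P(0 = negative), P(1 = positive)

# Running profile: ONE sample of 25 elk; point k uses elk 1..k,
# so consecutive points share all but one observation
x <- sample(0:1, n, replace = TRUE, prob = p.bruc)
running <- cumsum(x) / seq_len(n)

# Contrast: a FRESH sample of size k for each k = 1..n,
# so the points are independent of one another
indep <- sapply(seq_len(n), function(k)
  mean(sample(0:1, k, replace = TRUE, prob = p.bruc)))

round(cbind(NumElk = seq_len(n), running, indep), 3)
```

The running profile changes only one of its k observations at each step, so it tends to move smoothly; the independent version tends to jump around more, which is what one would see if a new survey were run at each sample size.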

Page 8

Generating a Profile

R Code:

plot(NumElk, Proportion, type = "l", ylim = c(0, 1))

abline(h = 0.22, col = "green", lwd = 2)           # assumed true prevalence
abline(h = 0.17, col = "blue", lwd = 2, lty = 3)   # lower edge: 0.22 - 0.05
abline(h = 0.27, col = "blue", lwd = 2, lty = 3)   # upper edge: 0.22 + 0.05

Running the simulation and the corresponding plot several times will produce differing versions of the profile on the next page… the variation among profiles displays the instability of our statistic.

Page 9

Page 10

Generating Multiple Profiles

R Code:

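set.seed(11); n <- 25; numsamps <- 20
p.bruc <- c(0.78, 0.22)   # from the earlier slide; repeated so this code runs on its own

plot(0, pch = " ", xlim = c(0, n), ylim = c(0.1, 1.0), xlab = "NumElk", ylab = "Proportion")

# The loop below draws a different profile for each of the numsamps samples
for (i in 1:numsamps) {
  x <- cumsum(sample(0:1, n, replace = TRUE, prob = p.bruc)) / (1:n)
  lines(1:n, x)
}

abline(h = 0.22, col = "green", lwd = 2)           # assumed true prevalence
abline(h = 0.17, col = "blue", lwd = 2, lty = 3)   # 0.22 - 0.05
abline(h = 0.27, col = "blue", lwd = 2, lty = 3)   # 0.22 + 0.05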
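The plotted profiles can also be summarised numerically. A sketch, assuming every profile is kept as a column of a matrix (the names `profiles` and `frac_inside` are hypothetical), that asks, at each sample size, what fraction of the profiles lie inside the 0.17 to 0.27 band drawn by the `abline()` calls:

```r
set.seed(11)                      # arbitrary seed
n <- 25; numsamps <- 20
p.bruc <- c(0.78, 0.22)

# Each column is one running-proportion profile of length n
profiles <- replicate(numsamps,
  cumsum(sample(0:1, n, replace = TRUE, prob = p.bruc)) / (1:n))

# Fraction of the numsamps profiles within 0.22 +/- 0.05 at each sample size
frac_inside <- rowMeans(abs(profiles - 0.22) <= 0.05)
round(frac_inside, 2)
```

With only 20 profiles these fractions are rough; increasing `numsamps` steadies them.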

Page 11

Calculated margin of error:

$1.96\sqrt{\frac{0.22 \times 0.78}{25}} \approx 0.1624$
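The margin-of-error arithmetic can be checked in R and inverted to solve for the sample size that gives a ±0.05 margin, using $n = z^2 p(1-p)/E^2$ with the same $p = 0.22$ and $z = 1.96$. A minimal sketch (the helper name `moe` is hypothetical):

```r
p <- 0.22    # assumed prevalence
z <- 1.96    # normal critical value for 95% confidence
E <- 0.05    # desired margin of error

# Margin of error for a sample proportion based on n elk
moe <- function(n) z * sqrt(p * (1 - p) / n)

moe(25)                            # about 0.1624

n_req <- z^2 * p * (1 - p) / E^2   # about 263.7
n_req
ceiling(n_req)                     # smallest whole n with moe(n) <= E
moe(c(263, 264))
```

The formula gives about 263.7; with 263 elk the margin is about 0.0501, and 264 is the smallest sample that keeps it at or below 0.05.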

Page 12

How about 263 elk?
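Simulation can confirm what the formula promises. A sketch, assuming 5,000 simulated surveys per sample size (the function name `within_target` and the seed are hypothetical), that estimates how often the sample proportion lands within ±0.05 of the assumed 0.22 for 25 versus 263 elk:

```r
set.seed(2005)                 # arbitrary seed, for reproducibility only
p.bruc <- c(0.78, 0.22)        # P(negative), P(positive), as on earlier slides

# Fraction of `reps` independent samples of size n whose proportion of
# positives falls within E of the assumed prevalence
within_target <- function(n, reps = 5000, target = 0.22, E = 0.05) {
  p_hat <- replicate(reps, mean(sample(0:1, n, replace = TRUE, prob = p.bruc)))
  mean(abs(p_hat - target) <= E)
}

coverage <- c(n25 = within_target(25), n263 = within_target(263))
coverage
```

With 263 elk the hit rate should be close to 0.95, the confidence level built into $z = 1.96$, whereas with 25 elk it is well under one half.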

Page 13

Simulation for Determining Power

• Power is a concept that is often difficult to teach, regardless of the level of the course… probably because hypothesis testing is such a strange beast!

• Many practical applications involve evaluating sampling protocols in terms of their ability to detect change over a period of time

• Simulation is often quite effective for determining the power associated with a particular design

Page 14

Monitoring Woody Vegetation in Cobble Bars

• Cobble bars are an important feature of many streams… they are home to native and non-native plants, and many species of birds nest in these areas

What is the proportion of ‘woody vegetation’ cover on cobble bars in the Great Smoky Mountains?

Page 15

- 22 cobble bars exist in the BISO area… can afford to sample 9 of them
- GRTS selection of sampling units
- rotating panel monitoring schedule

Page 16

Figure: transect; point-intercept counts on the transect

Page 17

While there are built-in functions in the ‘pwr’ package for computing power, and websites such as Russ Lenth’s Power And Sample Size (http://homepage.stat.uiowa.edu/~rlenth/Power/), your study design will often be more complicated than ‘canned’ routines allow for when computing power.

Ex 1. Suppose that we are investigating a single cobble bar and that this year we observe 34% coverage in woody vegetation. We will go out once a year to measure the % coverage. If there is a linear trend, then the % coverage can be modeled as a function of year using:

$\text{cover\%}_i = \beta_0 + \beta_1 \, \text{year}_i + \varepsilon_i$

Page 18

Let’s scale ‘year’ to be 0, 1, 2, 3, and 4 (so a five-year study), with $\beta_0$ denoting the % coverage in year 0, the slope $\beta_1$ denoting the per-annum change in % woody coverage, and the model error term $\varepsilon_i$ denoting the uncertainty in the measured woody percentage within a given year (due to measurement error, weather events, etc.).

$H_0: \beta_1 = 0$ vs. $H_1: \beta_1 \neq 0$

We will observe this cobble bar over time, run a linear regression, compute the estimated slope along with its estimated standard error, and see whether the p-value is less than our specified $\alpha$.
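To see why a 3.4% trend is hard to detect in five years, it helps to write out the test statistic. A sketch, assuming independent normal errors with constant standard deviation $\sigma$ (the simulation code that follows uses $\sigma = 7.4$ and $\beta_1 = 3.4$):

$$
t = \frac{\hat\beta_1}{\widehat{\mathrm{SE}}(\hat\beta_1)}, \qquad
\mathrm{SE}(\hat\beta_1) = \frac{\sigma}{\sqrt{\sum_{i}(\text{year}_i - \overline{\text{year}})^2}}
$$

With year $= 0, 1, 2, 3, 4$ we have $\overline{\text{year}} = 2$ and $\sum_i(\text{year}_i - 2)^2 = 4 + 1 + 0 + 1 + 4 = 10$, so $\mathrm{SE}(\hat\beta_1) = 7.4/\sqrt{10} \approx 2.34$. Under $H_0$, $t$ has a $t$ distribution with $5 - 2 = 3$ degrees of freedom, whose two-sided 5% critical value is about 3.18, while a true slope of 3.4 sits only about $3.4/2.34 \approx 1.45$ standard errors from zero. Failing to reject will therefore be common.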

Page 19

We can estimate power by simulating many data sets with a given slope, intercept, and standard deviation, and then keeping track of how many times we reject the null hypothesis that the slope is zero. Let’s suppose we want to detect a per-annum change of 3.4% in the % woody coverage.

R Code for p-value extraction:

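bzero <- 34      # intercept: % woody cover in year 0
b1 <- 3.4        # slope: per-annum change we want to detect
yr <- 0:4        # five years of monitoring
N <- 5
sd <- 7.4        # standard deviation of the error term
pct_mean <- bzero + b1 * yr
pct <- rnorm(N, mean = pct_mean, sd = sd)
m <- lm(pct ~ yr)
coef(summary(m))["yr", "Pr(>|t|)"]   # p-value for the slope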
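The `Pr(>|t|)` entry being extracted is the two-sided p-value of the t test for the slope. As a sketch (the seed and the names `est`, `t_stat`, `p_manual` are illustrative), it can be reproduced from the estimate, its standard error, and the residual degrees of freedom (N − 2 = 3 here):

```r
set.seed(42)                       # arbitrary seed
bzero <- 34; b1 <- 3.4; yr <- 0:4; N <- 5; sd <- 7.4
pct <- rnorm(N, mean = bzero + b1 * yr, sd = sd)
m <- lm(pct ~ yr)

est <- coef(summary(m))["yr", ]    # Estimate, Std. Error, t value, Pr(>|t|)
t_stat <- est[["Estimate"]] / est[["Std. Error"]]
p_manual <- 2 * pt(-abs(t_stat), df = df.residual(m))

c(lm = est[["Pr(>|t|)"]], by_hand = p_manual)
```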

Page 20

Below, we have the true trend (i.e., 3.4% per year) plotted vs. the estimated trend from one set of simulated data:

Page 21

For this particular simulation, the p-value extracted was:

> coef(summary(m))["yr", "Pr(>|t|)"]
[1] 0.1726658

Thus, although we know there is a 3.4% per annum change, our data did not yield a p-value small enough for us to detect the trend.

To compute the power, we need to repeat what we just did many times and record the percentage of times the null hypothesis would have been rejected.

Page 22

We can use the code below to do this:

R Code:

# Uses bzero, b1, yr, N, and sd from the previous code
numsim <- 500
pvals <- numeric(numsim)
for (i in 1:numsim) {
  pct_mean <- bzero + b1 * yr
  pct <- rnorm(N, mean = pct_mean, sd = sd)
  m <- lm(pct ~ yr)
  pvals[i] <- coef(summary(m))["yr", "Pr(>|t|)"]
}
sum(pvals < 0.05) / numsim   # estimated power at alpha = 0.05
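Because each simulated data set either rejects the null or does not, the estimated power is itself a binomial proportion, so the margin-of-error reasoning from the brucellosis example applies to it. A sketch (the helper name `mc_se` is hypothetical) of the Monte Carlo standard error of a power estimate based on `numsim` simulations:

```r
# Standard error of an estimated power (a proportion of rejections)
mc_se <- function(power_hat, numsim) sqrt(power_hat * (1 - power_hat) / numsim)

mc_se(0.5, 500)          # worst case for 500 simulations: about 0.022
1.96 * mc_se(0.5, 500)   # so a 500-run estimate is good to roughly +/- 0.044
```

The worst case is a true power of 0.5; estimates near 0 or 1 are more precise for the same `numsim`.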

Page 23

As a Single Function

lin_reg_sim <- function(bzero, b1, numyear, sd, numsim) {
  yr <- 0:(numyear - 1)
  pvals <- numeric(numsim)
  for (i in 1:numsim) {
    pct_mean <- bzero + b1 * yr
    pct <- rnorm(numyear, mean = pct_mean, sd = sd)
    m <- lm(pct ~ yr)
    pvals[i] <- coef(summary(m))["yr", "Pr(>|t|)"]
  } # end of i loop
  power <- sum(pvals < 0.05) / numsim
  return(power)
} # end of function lin_reg_sim

lin_reg_sim(bzero = 34, b1 = 3.4, numyear = 5, sd = 7.4, numsim = 500)

Page 24

The code to the left computes power as we change the effect size (i.e., the per-annum change).

R Code:

bzero <- 34; yr <- 0:4; N <- 5; sd <- 7.4
bvec <- seq(0.85, 8.5, by = 0.85)   # slopes to evaluate
power.b <- numeric(length(bvec))
pvals <- numeric(500)                # storage for the inner loop's p-values
for (j in seq_along(bvec)) {
  b1 <- bvec[j]
  for (i in 1:500) {
    pct_mean <- bzero + b1 * yr
    pct <- rnorm(N, mean = pct_mean, sd = sd)
    m <- lm(pct ~ yr)
    pvals[i] <- coef(summary(m))["yr", "Pr(>|t|)"]
  } # end of i loop
  power.b[j] <- sum(pvals < 0.05) / 500
} # end of j loop
plot(bvec, power.b, xlab = "slopes", ylab = "power")

Page 25

Just as we would expect, the power increases with larger trends
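Assuming, as the simulation does, that the errors are independent and normal with constant standard deviation, the simulated curve can be checked against an exact calculation: the slope's t statistic then follows a noncentral t distribution with N − 2 degrees of freedom and noncentrality $\beta_1\sqrt{\sum_i(\text{year}_i-\overline{\text{year}})^2}/\sigma$. This is a swapped-in analytic check, not part of the original exercise; the function name `power_exact` is hypothetical.

```r
# Exact power of the two-sided t test of H0: beta1 = 0 in simple linear
# regression, assuming normal errors with standard deviation sd
power_exact <- function(b1, numyear = 5, sd = 7.4, alpha = 0.05) {
  yr  <- 0:(numyear - 1)
  sxx <- sum((yr - mean(yr))^2)          # 10 when numyear = 5
  df  <- numyear - 2
  ncp <- b1 * sqrt(sxx) / sd             # true slope in standard-error units
  tc  <- qt(1 - alpha / 2, df)           # two-sided critical value
  pt(-tc, df, ncp = ncp) + pt(tc, df, ncp = ncp, lower.tail = FALSE)
}

bvec <- seq(0.85, 8.5, by = 0.85)        # same slopes as the simulation
power_exact_b <- sapply(bvec, power_exact)
round(power_exact_b, 3)
# points(bvec, power_exact_b, pch = 19, col = "red")  # overlay on the plot
```

Changing `numyear` shows how much a longer study helps, e.g. `power_exact(3.4, numyear = 10)`; the simulation remains the more flexible tool once the design departs from this simple model.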