Statistical Sampling. Sampling Simple Random Sampling Every possible combination of sample units has...

Post on 31-Mar-2015


Statistical Sampling

Sampling

Simple Random Sampling

• Every possible combination of sample units has an equal and independent chance of being selected.

• However…

Systematic Sampling

• Beware coincidental bias of sample interval and natural area.

• Ridges
• River bends
• Etc.
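The two selection schemes above can be sketched in a few lines of Python. This is only an illustration: the function names, the list of 100 numbered plots, and the interval of 5 are assumptions, not part of the original slides.

```python
import random

def simple_random_sample(units, n, rng=random):
    """Draw n distinct units; every combination of n units is equally likely."""
    return rng.sample(units, n)

def systematic_sample(units, interval, rng=random):
    """Take every `interval`-th unit after a random start in [0, interval)."""
    start = rng.randrange(interval)
    return units[start::interval]

# Hypothetical population: 100 plots numbered 1..100.
plots = list(range(1, 101))
rng = random.Random(42)  # seeded only so the sketch is repeatable

srs = simple_random_sample(plots, 20, rng)
sys_sample = systematic_sample(plots, 5, rng)
```

If the sample interval happens to match a repeating feature of the area (for example, a ridge every fifth plot), every unit in the systematic sample lands on the same kind of ground, which is the coincidental bias the slide warns about.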

Stratified Random Sampling

• The point is to reduce variability within strata.

• Example: if you were measuring average estrogen levels in humans, you would stratify male versus female.

• Can you think of some forest examples?
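A minimal sketch of stratified random sampling, assuming each unit can be assigned to a stratum by a simple function. The stratum labels ("young", "mature") and the stand data are hypothetical and only serve to show the mechanics.

```python
import random
from collections import defaultdict

def stratified_sample(units, stratum_of, n_per_stratum, rng=random):
    """Group units by stratum, then draw a simple random sample within each."""
    groups = defaultdict(list)
    for unit in units:
        groups[stratum_of(unit)].append(unit)
    return {
        name: rng.sample(members, min(n_per_stratum, len(members)))
        for name, members in groups.items()
    }

# Hypothetical units: (plot id, stand type).
units = [(i, "young" if i % 3 == 0 else "mature") for i in range(60)]
rng = random.Random(7)  # seeded only so the sketch is repeatable
picked = stratified_sample(units, lambda u: u[1], 5, rng)
```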

Stratified Random Sampling

Sampling

Mean: in Excel, =AVERAGE(A1:An)

Variance: mean of the squared deviations

Standard deviation: square root of the variance

Standard Deviation

$$\sigma = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2}$$

Use Excel function

=STDEV(A1:An) or =STDEV.S(A1:An)
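For readers working outside Excel, the same quantities can be computed with Python's standard library. The height values below are made up for illustration. `statistics.variance` and `statistics.stdev` use the n − 1 divisor, which matches Excel's STDEV / STDEV.S.

```python
import math
import statistics

# Hypothetical measurements (for example, heights in inches).
heights = [60, 62, 65, 67, 70]

mean = statistics.mean(heights)
variance = statistics.variance(heights)  # sum of squared deviations / (n - 1)
sd = statistics.stdev(heights)           # square root of the variance

# The same standard deviation written out term by term.
n = len(heights)
sd_by_hand = math.sqrt(sum((x - mean) ** 2 for x in heights) / (n - 1))
```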

Exercise in Random Sampling

• Student heights are the population.
• Calculate the population mean, etc.
• Take a systematic 20% sample and compare its estimates with the population values.
• Take a 50% sample (systematic or random) and compare results.

• Calculate mean, variance, SD and CV of both population and samples.
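One way to run this exercise in Python is sketched below. The heights are randomly generated stand-ins, not real class data, and the coefficient of variation is taken as the standard deviation divided by the mean, expressed as a percent.

```python
import random
import statistics

def summarize(values):
    """Return mean, variance, SD and CV (%) using the n - 1 divisor."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return {
        "n": len(values),
        "mean": mean,
        "variance": statistics.variance(values),
        "sd": sd,
        "cv_percent": 100 * sd / mean,
    }

rng = random.Random(1)  # seeded only so the sketch is repeatable
# Hypothetical population: 40 student heights in inches.
population = [rng.gauss(67, 4) for _ in range(40)]

# Systematic 20% sample: every 5th student after a random start.
start = rng.randrange(5)
sample_20 = population[start::5]

pop_stats = summarize(population)
sample_stats = summarize(sample_20)
```

Because the whole population is known here, `statistics.pstdev` (divisor n) would give the exact population standard deviation; the n − 1 form is kept for both so the two summaries are computed the same way.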

Variability

The differences between individuals or units in a population.

Standard Error of the mean

• Equals the standard deviation of all possible sample means around the true population mean.

Finite Population Correction Factor

The finite population correction factor serves to reduce the standard error when relatively large samples are drawn from finite populations.
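A small sketch of the standard error with an optional finite population correction. The slides do not show the formula, so the form used here, SD / √n multiplied by √(1 − n/N), is an assumption; some texts use √((N − n)/(N − 1)) instead.

```python
import math

def standard_error(sd, n, population_size=None):
    """Standard error of the mean, optionally with a finite population correction.

    Assumes the correction factor sqrt(1 - n/N); other references use
    sqrt((N - n) / (N - 1)).
    """
    se = sd / math.sqrt(n)
    if population_size is not None:
        se *= math.sqrt(1 - n / population_size)
    return se

# Illustrative numbers only: SD of 10 from a sample of 25 units.
se_infinite = standard_error(10, 25)
se_finite = standard_error(10, 25, population_size=100)
```

With a quarter of the population sampled, the corrected standard error is noticeably smaller than the uncorrected one.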

Confidence Interval

• Specifies the precision of the sample mean in relation to the population mean.

Student’s t distribution

Confidence Interval
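A confidence interval is usually written as the sample mean plus or minus t times the standard error, where t comes from Student's t distribution for the chosen confidence level and sample size. The sketch below takes t as an input rather than looking it up; the standard error of 1.3 is a hypothetical value chosen for illustration.

```python
def confidence_interval(mean, standard_error, t=2.0):
    """Return (lower, upper) limits as mean -/+ t * standard error."""
    half_width = t * standard_error
    return mean - half_width, mean + half_width

# Hypothetical inputs: mean 46.4, standard error 1.3, t of about 2.
lower, upper = confidence_interval(46.4, 1.3)
```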

Effect of Standard Deviation

The red distribution has a mean of 40 and a standard deviation of 5; the blue distribution has a mean of 60 and a standard deviation of 10. For the red distribution, 68% of the distribution is between 35 and 45; for the blue distribution, 68% is between 50 and 70.

Sampling Error

Rather than work with absolute confidence limits, convert them to a percent of the sample mean, which is called the sampling error. The notation in the handbook is an upper-case E. Take the confidence interval half-width (the plus-or-minus quantity), divide it by the sample mean, and multiply by 100 to express it as a percent. The estimate can then be reported as the mean plus or minus that percentage.

For example, at 95% confidence, an estimate of the mean has a confidence interval of 46.4 plus or minus 2.6. Expressed as a sampling error percent, that is plus or minus 5.6% (2.6 / 46.4 × 100), which says that, with 95% confidence, the true population mean falls within 5.6% of the estimate.
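The conversion from a confidence interval to a sampling error percent is a one-line calculation. The function name is an assumption; the numbers are the ones from the example above.

```python
def sampling_error_percent(half_width, mean):
    """Express the confidence interval half-width as a percent of the mean."""
    return 100 * half_width / mean

# Values from the example: 46.4 plus or minus 2.6.
e = sampling_error_percent(2.6, 46.4)
print(round(e, 1))  # 5.6
```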

Determining Sample Size

For a 95% confidence level, the t value approaches 2 as the sample size gets large, so a t value of 2 is commonly used when estimating sample size. The CV is the relative variability in the population being sampled. Use the population CV if known or use an estimate if it is not known. The E represents the desired sampling error, for example, 10%.
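The formula itself is not reproduced in the transcript. The usual form relating these three quantities is n = (t × CV / E)², and the sketch below assumes that form. The CV values are hypothetical, and rounding up to a whole unit is also an assumption.

```python
import math

def sample_size(cv_percent, error_percent, t=2.0):
    """Estimate required sample size as (t * CV / E)^2, rounded up."""
    return math.ceil((t * cv_percent / error_percent) ** 2)

# Hypothetical CVs with a 10% target sampling error and t = 2.
n_low_cv = sample_size(30, 10)   # 36
n_high_cv = sample_size(60, 10)  # 144
```

Doubling the CV quadruples the required sample size, because the CV enters the formula squared.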

Items with Possible Impacts on Sampling Intensity

Effect of CV Change

As the coefficient of variation increases, so does the required sample size.

Using CV for Comparison

Because CVs have no associated unit of measure, they can be useful in comparing sampling methods to determine which is most efficient.

So which method of sampling would require fewer samples?

• The USFS Way

Sampling Intensity Revisited

1. Determine the sampling error for the sale as a whole (set to 10%).

2. Subdivide (or stratify) the sale population into sampling components as needed to reduce the variability within the sampling strata.

3. Calculate the coefficient of variation (CV) by stratum and a weighted CV over all strata. (this will be covered more later in the statistics lectures)

4. Calculate number of plots for the sale as a whole and then distribute by stratum.

Sample Selection – from Precruise data

Number of Plots

• Value of t is assumed to be 2.
• Error is set at 10%.

• For each stratum, the calculation would look like this:

• n1 = (17.6 * 185) / 67.9 = 48 plots

• n2 = (7.7 * 185) / 67.9 = 21 plots

• n3 = (7.2 * 185) / 67.9 = 20 plots

• n4 = (35.4 * 185) / 67.9 = 96 plots

• Which totals to the 185 plots for the sale.
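The allocation above follows one pattern: each stratum gets a share of the 185 plots proportional to its numerator (17.6, 7.7, 7.2, 35.4, which sum to 67.9). The sketch below reproduces that arithmetic; calling the numerators "weights" is an assumption, since the transcript does not say how they were derived.

```python
def allocate_plots(weights, total_plots):
    """Distribute total_plots across strata in proportion to each weight."""
    total_weight = sum(weights)
    return [round(w * total_plots / total_weight) for w in weights]

# Per-stratum numerators and total plot count from the example.
weights = [17.6, 7.7, 7.2, 35.4]
plots = allocate_plots(weights, 185)
print(plots)       # [48, 21, 20, 96]
print(sum(plots))  # 185
```

Independent rounding happens to add up to 185 here; with other inputs the rounded counts can be off by a plot or two and would need a manual adjustment.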

Distribute Plots by Stratum

Tree Expansion Factor

• 1 divided by the product of the fixed plot size and the number of plots
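As a quick sketch of this definition, assuming plot size is expressed in acres (the transcript does not state the unit) and using hypothetical values of twenty 1/10-acre plots:

```python
def tree_expansion_factor(plot_size_acres, n_plots):
    """Expansion factor = 1 / (fixed plot size * number of plots)."""
    return 1.0 / (plot_size_acres * n_plots)

# Hypothetical cruise: twenty 1/10-acre fixed plots.
factor = tree_expansion_factor(0.1, 20)
```

Under these assumptions each tallied tree stands for `factor` trees per acre, so the factor shrinks as more plots are measured.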

Sample Error – Step 1 (Calculate Standard Error)

Sample Error – Step 2

36.2% is a bit larger than the level we set to begin with (10%) – Implications?