How long do we need to run an experiment? Ignacio Colonna & Don Bullock.


•Grain yield maps show a considerable variability across years

Yield map

Algorithm: NF = 21.4 kg N / MT grain * YG – Rotation Credit – Incidental N

But is it the variability in yield that matters?

It is the variability in response to inputs that matters.

Assuming a profit-maximizing farmer…

Yield response function

Yield response and profit functions

(only due to N)

ProfitN= YieldN * $ Corn – kg N * $ N
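The profit expression above can be sketched in a few lines. This is a minimal illustration only: the slides do not state the functional form of the yield response, so a quadratic response is assumed here, and the coefficients and prices are hypothetical.

```python
def profit_from_n(n_rate, b, c, price_corn, price_n):
    """Profit ($/ha) due to N only: yield gain * corn price - N rate * N price.

    Assumes a quadratic yield gain b*N + c*N**2 (t/ha) with c < 0.
    """
    yield_gain = b * n_rate + c * n_rate ** 2
    return yield_gain * price_corn - n_rate * price_n


def optimum_n(b, c, price_corn, price_n):
    """N rate where the marginal value of yield equals the N price.

    Solves (b + 2*c*N) * price_corn = price_n for N.
    """
    return (price_n / price_corn - b) / (2.0 * c)


# Hypothetical values: response coefficients, corn price ($/t), N price ($/kg)
b, c = 0.040, -1.0e-4
price_corn, price_n = 100.0, 0.50
n_opt = optimum_n(b, c, price_corn, price_n)
max_profit = profit_from_n(n_opt, b, c, price_corn, price_n)
```

Under the quadratic assumption the optimum depends on prices only through the ratio of N price to corn price.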

Statistics vs. management:

•Responses may be significantly different statistically, yet yield similar management decisions.

Different curves, same optimum

Similar curves, different optima

But these are ‘after the fact’ optimum N rates…

Farmers’ decisions usually based on the best a priori guess

Concept of ex-post vs ex-ante optima

• Ex-post optimum: computed after collecting the data

• Ex-ante optimum: best guess given the information available before the fact. (“long-run” optimum)

e.g. we have 15 years of N response data for a site:

For this study

• Question 1:

How does the uncertainty about the ‘true’ N rate at a given site change with years of experimentation?

No published estimates on uncertainty in ex-ante N rates as a function of experiment length in the US Midwest.

• Question 2:

What is the cost of not knowing the true N rate at a given site?

No published estimates on practical consequences of different lengths of experimentation on fertilizer application decisions.

Data : source

• N fertilizer trial at Monmouth, IL

• (conducted by Nafziger, Adee, Hoeft, Mainz)

Data : experimental design

•21 years : 1983-2003

•Split plot in RCBD, 3 reps.

•2 rotations: C/C and C/S

•5 fertilizer rates: 0, 67, 134, 201, 269 kg/ha

(pre-plant)

•Individual plots: 6.1 m x 18 m

• 21 years x 2 Rotations: Raw Yield Means

[Figure: raw yield means; panels C/C and C/S]

• 21 years x 2 Rotations: Model fits

[Figure: fitted yield response (ton/ha); panels C/C and C/S]

• 21 years x 2 Rotations: Variability in ex-post N opt

85.6 kg/ha (N̂opt)

63.6 kg/ha (N̂opt)

‘True’ ex-ante Nopt = 173 kg/ha

‘True’ ex-ante Nopt = 110 kg/ha

Pick two years at random (#1)

Compute ex-ante optimum N rate (#1)

Pick two years at random (#2)

Compute ex-ante optimum N rate (#2)

Pick two years at random (#3)

1000 samples = 1000 estimates of ex-ante N rate

Repeat for groups of 3 years, 4 years, … etc.
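The resampling steps above can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: each year's response is assumed quadratic, years within a sample are drawn without replacement, the ex-ante optimum for a group of years is taken as the N rate maximizing mean profit over those years, and the yearly coefficients are synthetic placeholders rather than the Monmouth data.

```python
import numpy as np


def ex_ante_nopt(b, c, price_ratio):
    """N rate maximizing mean profit over years.

    With yield = a + b*N + c*N**2 in each year, mean profit is quadratic in N
    with the year-averaged coefficients, so the optimum has a closed form.
    price_ratio is N price divided by grain price.
    """
    return (price_ratio - np.mean(b)) / (2.0 * np.mean(c))


def resample_nopt(b, c, k, price_ratio, n_samples=1000, seed=0):
    """Draw n_samples groups of k years and return the ex-ante Nopt of each."""
    rng = np.random.default_rng(seed)
    estimates = np.empty(n_samples)
    for i in range(n_samples):
        years = rng.choice(len(b), size=k, replace=False)
        estimates[i] = ex_ante_nopt(b[years], c[years], price_ratio)
    return estimates


# Synthetic (hypothetical) coefficients for 21 years of trials
gen = np.random.default_rng(1)
b = gen.normal(0.040, 0.008, size=21)     # t/ha per kg N
c = gen.normal(-1.0e-4, 1.0e-5, size=21)  # curvature, negative
price_ratio = 0.005                       # hypothetical N price / grain price

# Standard deviation of the ex-ante Nopt estimate for each experiment length
sd_by_k = {k: resample_nopt(b, c, k, price_ratio).std() for k in range(2, 21)}
```

Dividing each standard deviation by the mean of the same set of estimates would give the corresponding CV.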

A look at uncertainty in ex-ante N optima

Results from resampling approach: distributions

Ex-ante optimum N (kg/ha)

Results from resampling approach: SD and CV

[Figure: ex-ante Nopt standard deviation and ex-ante Nopt CV vs. years of experimentation; series C/C and C/S; annotations: 85.6 kg/ha and 63.6 kg/ha (N̂opt)]

Results from resampling approach: Practical implications

C/C

Error (+ or -)

Results from resampling approach: Practical implications

C/C

Profit at ‘true’ ex-ante Nopt = 249 $/ha

Loss relative to maximum

Conclusions so far :

•Relatively small effect in monetary terms (very small for >4 years; e.g. at 4 years, <10% probability of a loss >10 $/ha)

•But, how do these errors compare with within-field spatial variability in Nopt?

•Is this of use to conventional systems?

Regression of Crop Yield with Soil and Landscape Attributes: An Assessment of Some Common Methods for Dealing with Spatially Correlated Residuals

Spatial correlation of residuals in regressions is often overlooked in agronomic and engineering research, especially in analyses related to precision agriculture, with a few exceptions.

We argue that this oversight is not trivial, and neither is the choice of a solution.

Field Experiment

Soybean yield monitor data (2 years: 1999, 2001)

Soil sample data (P and K)

Elevation data and derivatives (Slope, Aspect, etc.)

20 m grid

$Yield_t = \beta_0 + \beta_1 P_t + \beta_2 K_t + \beta_3 PrCurv_t + \beta_4 PlanCurv_t + \beta_5 Asp_t + \beta_6 SPI_t + \beta_7 CTI_t + \varepsilon_t$

OLS (Ordinary least squares)

$\hat{\beta}_{OLS} = (X'X)^{-1} X'Y$

$\varepsilon \sim N(0, \sigma^2 I)$

Valid if errors are as assumed, but residuals often do show spatial correlation due to variables not included in the model.

Semivariograms of OLS residuals

Spatial Mixed

$\varepsilon \sim N(0, \Sigma)$

Errors not assumed independent. Σ estimated with geostatistical models. Parameters for Σ estimated by ML or REML.

$\hat{\beta}_{GLS} = (X'\Sigma^{-1}X)^{-1} X'\Sigma^{-1}Y$

GLS: Generalized Least Squares estimator

Y = Xβ+ε
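The GLS estimator, (X'Σ⁻¹X)⁻¹X'Σ⁻¹Y, can be sketched with NumPy as below. This is a minimal sketch in which Σ is treated as known. In the approach described above its parameters would be estimated by ML or REML, which is not shown. The exponential covariance, its sill and range, the grid and the data are all hypothetical.

```python
import numpy as np


def gls(X, y, sigma):
    """GLS estimate (X' S^-1 X)^-1 X' S^-1 y, using linear solves, not inverses."""
    sigma_inv_X = np.linalg.solve(sigma, X)
    sigma_inv_y = np.linalg.solve(sigma, y)
    return np.linalg.solve(X.T @ sigma_inv_X, X.T @ sigma_inv_y)


# Hypothetical 6 x 6 grid of points at 20 m spacing
side = np.arange(6) * 20.0
gx, gy = np.meshgrid(side, side)
coords = np.column_stack([gx.ravel(), gy.ravel()])
dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)

# Assumed error covariance: exponential model, sill 5, range parameter 70 m
sigma = 5.0 * np.exp(-dist / 70.0)

# Simulated data with spatially correlated errors
rng = np.random.default_rng(0)
n = len(coords)
X = np.column_stack([np.ones(n), rng.normal(5.0, 1.5, size=n)])
errors = np.linalg.cholesky(sigma) @ rng.standard_normal(n)
y = X @ np.array([10.0, 0.6]) + errors

beta_gls = gls(X, y, sigma)
beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]
```

With Σ proportional to the identity matrix, `gls` returns the OLS estimate.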

Nearest Neighbors

(non-iterative version – computations are simple)

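$Y = X\beta + \gamma W\varepsilon + \xi$

$W\varepsilon$: average of neighboring OLS residuals

$\hat{\beta}_{OLS} = (X'X)^{-1} X'Y$

Computation:

•Compute OLS regression $Y = X\beta + \varepsilon$ and save residuals ($\hat{\varepsilon}$).

•Compute average of neighboring residuals for each point ($W\hat{\varepsilon}$).

•Compute new OLS regression, using $W\hat{\varepsilon}$ from step 2 as a covariate: $Y = X\beta + \gamma W\hat{\varepsilon} + \xi$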
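The three computation steps above can be sketched as follows. This is an illustrative, non-iterative version under assumptions: neighbors are defined here as points within 1.5 grid spacings (which gives queen-type neighbors on a regular grid), and the data are simulated placeholders.

```python
import numpy as np


def ols(X, y):
    """Ordinary least squares coefficients."""
    return np.linalg.lstsq(X, y, rcond=None)[0]


def nearest_neighbor_regression(X, y, W):
    """Non-iterative nearest-neighbors adjustment; returns (beta, gamma)."""
    # Step 1: OLS fit and residuals
    resid = y - X @ ols(X, y)
    # Step 2: average of neighboring residuals (row-standardized weights)
    row_sums = W.sum(axis=1, keepdims=True)
    W_avg = np.divide(W, row_sums, out=np.zeros_like(W, dtype=float),
                      where=row_sums > 0)
    w_resid = W_avg @ resid
    # Step 3: OLS again with the neighbor-averaged residual as a covariate
    coef = ols(np.column_stack([X, w_resid]), y)
    return coef[:-1], coef[-1]


# Hypothetical 8 x 8 grid at 20 m spacing
side = np.arange(8) * 20.0
gx, gy = np.meshgrid(side, side)
coords = np.column_stack([gx.ravel(), gy.ravel()])
dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
W = ((dist > 0) & (dist <= 30.0)).astype(float)

rng = np.random.default_rng(0)
n = len(coords)
x1 = rng.normal(5.0, 1.5, size=n)
# Smooth trend standing in for an omitted, spatially structured variable
omitted = 2.0 * np.sin(coords[:, 0] / 50.0)
y = 10.0 + 0.6 * x1 + omitted + rng.normal(0.0, 0.5, size=n)
X = np.column_stack([np.ones(n), x1])

beta_nn, gamma = nearest_neighbor_regression(X, y, W)
```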

Spatially autoregressive approaches

SAR error - the spatial structure observed in the OLS residuals is attributed to the omission of spatially structured explanatory variables from the X matrix.

SAR lag - the value of the response variable is in part due to contagion or diffusion from the same variable at nearby locations, or there is a mismatch between the scale at which a variable is measured and the true scale of the process.

Decide upon model based on substantive interpretation and Lagrange Multiplier specification tests (Anselin).

“Queen Structure” for W

Yellow: neighbors = 1

Blue: not neighbors = 0

Red: Point i
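A queen-structure weights matrix for a regular grid can be built as in the sketch below. It assumes points lie on a complete rectangular grid and are numbered row by row. The function name is hypothetical.

```python
import numpy as np


def queen_weights(n_rows, n_cols):
    """Binary queen-contiguity matrix: W[i, j] = 1 if cells i and j touch
    by an edge or a corner, 0 otherwise (cells numbered row by row)."""
    n = n_rows * n_cols
    W = np.zeros((n, n))
    for r in range(n_rows):
        for c in range(n_cols):
            i = r * n_cols + c
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    inside = 0 <= rr < n_rows and 0 <= cc < n_cols
                    if (dr, dc) != (0, 0) and inside:
                        W[i, rr * n_cols + cc] = 1.0
    return W


W = queen_weights(3, 3)
neighbors_of_center = int(W[4].sum())  # point i in the middle of a 3 x 3 grid
```

Dividing each row of `W` by its row sum turns `W @ v` into the average of the neighboring values of `v`.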

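SAR-Error (Spatial Autoregressive – Error)

$Y = X\beta + \lambda W\varepsilon + \xi$

$W\varepsilon$: average of neighboring OLS residuals

$\lambda$: autoregressive coefficient of the residuals (estimated by ML)

$W$: weights matrix defining covariance structure. $W_{ij} = 1$ if $i$ and $j$ are neighbors, 0 otherwise.

$\varepsilon \sim N(0, \Sigma)$, with $\Sigma = \sigma^2 [(I - \lambda W)'(I - \lambda W)]^{-1}$

$\hat{\beta}_{GLS} = (X'\Sigma^{-1}X)^{-1} X'\Sigma^{-1}Y$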

SAR-Lag (Spatial Autoregressive – Lag)

$Y = \rho WY + X\beta + \varepsilon$

$WY$: average of neighboring values for Y (“direct effect of neighbors on point i”)

$\rho$: autoregressive coefficient of the response variable (estimated by ML)

$W$: weights matrix defining covariance structure. $W_{ij} = 1$ if $i$ and $j$ are neighbors, 0 otherwise.

$Var(Y) = \sigma^2 [(I - \rho W)'(I - \rho W)]^{-1}$

$\hat{\beta}_{ML} = (X'X)^{-1} X'(I - \rho W)Y$
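For a given autoregressive coefficient, the SAR-lag coefficient estimate is an OLS regression on the spatially filtered response (I - rho W)Y. The sketch below illustrates only that filtering step. The symbol `rho` is an assumed name for the autoregressive coefficient, whose value is simply supplied here rather than estimated by ML, and the chain of six points and its data are hypothetical.

```python
import numpy as np


def sar_lag_beta(X, y, W, rho):
    """beta = (X'X)^-1 X'(I - rho*W) y: OLS on the filtered response."""
    y_filtered = y - rho * (W @ y)
    return np.linalg.lstsq(X, y_filtered, rcond=None)[0]


# Hypothetical example: 6 points in a line, each adjacent to its neighbors
n = 6
W = np.zeros((n, n))
for i in range(n - 1):
    W[i, i + 1] = W[i + 1, i] = 1.0
W = W / W.sum(axis=1, keepdims=True)  # rows average the neighboring values

X = np.column_stack([np.ones(n), np.arange(n, dtype=float)])
y = np.array([10.2, 11.1, 11.9, 13.2, 13.8, 15.1])

beta_rho0 = sar_lag_beta(X, y, W, rho=0.0)  # no filtering: plain OLS
beta_rho5 = sar_lag_beta(X, y, W, rho=0.5)  # part of y attributed to neighbors
```

Because part of the response is attributed to the neighboring values of Y before X enters, coefficients on X tend to shrink as `rho` grows, which is the "filtering" effect on the response variable.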

Flat line→spatially uncorrelated residuals. All methods seem to achieve similar results in terms of residual spatial structure.

Points shifted vertically to aid visualization.

Effect (year) OLS               SAR-error Queen   SAR-lag Queen     DCR (REML)        Nearest Neighbors
              Est.     p-val.   Est.     p-val.   Est.     p-val.   Est.     p-val.   Est.     p-val.
Intercept 99  528.6    <.001    569.7    <.001    373.0    0.00     531.6    <.001    553.9    0.00
P 99          0.60     0.02     0.64     0.11     0.6      0.02     0.59     0.1      0.7      0.00
K 99          -0.28    <.001    -0.32    0.00     -0.2     0.00     -0.29    0.0      -0.3     0.00
Prof 99       -283.7   0.00     -265.8   0.00     -249.3   0.00     -276.6   0.0      -275.4   0.00
Plan 99       -337.0   <.001    -363.0   0.00     -322.6   0.00     -344.2   <.001    -359.4   0.00
Spi 99        1.10     0.04     1.93     0.00     1.0      0.04     1.67     0.0      1.6      0.00
Cti 99        -20.46   <.001    -25.02   0.00     -19.1    0.00     -20.79   <.001    -23.1    0.00
Ang 99        5.81     0.00     6.77     0.00     4.8      0.00     5.97     0.0      6.4      0.00

Intercept 01  179.7    <.001    330.1    0.00     41.3     0.06     339.9    <.001    237.3    0.00
P 01          1.08     <.001    0.14     0.72     0.20     0.13     0.31     0.5      1.0      0.00
K 01          -0.12    0.04     -0.24    0.06     -0.05    0.14     -0.29    0.1      -0.2     0.00
Prof 01       36.4     0.57     -86.2    0.07     -58.5    0.16     -96.0    0.0      -51.7    0.22
Plan 01       3.95     0.92     -93.4    0.00     -72.3    0.00     -87.8    0.0      -62.2    0.01
Spi 01        0.98     0.03     -0.25    0.50     -0.03    0.91     -0.01    1.0      -0.1     0.84
Cti 01        10.35    0.00     -0.02    0.99     1.20     0.55     -0.38    0.9      4.8      0.02
Ang 01        1.17     0.34     1.86     0.11     0.91     0.25     1.74     0.1      1.6      0.05

Shaded values are significantly different from OLS estimates

Regression example - Conclusions

Spatial Mixed, SAR-error and SAR-lag parameter estimates differed significantly from the OLS estimates only for the year with the largest spatial structure. Parameter estimates from NN were not significantly different from the OLS ones, despite the apparent difference in magnitude.

Estimates from SAR-lag were in general smaller in magnitude relative to all other methods. This is due to the “filtering” performed by this method on the response variable.

Is this reasonable for this type of analysis? We believe it is not.

So, which method should we choose to account for the spatial correlation of residuals in regression?

This question motivates the second part of our analysis.

Simulation Experiment

•3 independent variables: x1, x2 and e, with short- and long-range error structures:

"Short range" error structure

          mean   partial sill   nugget   range (m)
x1        5      2              0.5      80
x2        5      1.4            0        50
e short   0      5              0        70

"Long range" error structure

          mean   partial sill   nugget   range (m)
x1        5      2              0.5      80
x2        5      1.4            0        50
e long    0      5              0        210

•Random values for each variable generated at 4 densities in a 400 m x 400 m field.

• Values generated using Sim2d in SAS®, based on LU decomposition of the covariance matrix.

• Spatial structure based on a spherical model.

• 1000 realizations for each variable-density-error structure combination (e.g. e-440-short range)

• Generate dependent variable Y:

Yshort = 10 + 0.6 x1 + 1.2 x2 + eshort

Ylong = 10 + 0.6 x1 + 1.2 x2 + elong

• Adjusted theoretical R2 = 0.37
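One realization of the simulated variables can be sketched as below. The slides used Sim2d in SAS. This NumPy version is only an analogous sketch that draws each field through a Cholesky factor of a spherical covariance matrix (Cholesky being the symmetric form of an LU decomposition). The 11 x 11 grid and the small diagonal jitter added for numerical stability are assumptions, not settings from the study.

```python
import numpy as np


def spherical_cov(dist, partial_sill, range_m, nugget=0.0):
    """Spherical covariance: sill*(1 - 1.5*h + 0.5*h**3) for h = d/range < 1,
    zero beyond the range; the nugget is added at zero distance only."""
    h = np.minimum(dist / range_m, 1.0)
    return partial_sill * (1.0 - 1.5 * h + 0.5 * h ** 3) + nugget * (dist == 0)


def simulate_field(dist, mean, partial_sill, nugget, range_m, rng):
    """One realization of a Gaussian field with spherical covariance."""
    cov = spherical_cov(dist, partial_sill, range_m, nugget)
    factor = np.linalg.cholesky(cov + 1e-8 * np.eye(len(cov)))  # assumed jitter
    return mean + factor @ rng.standard_normal(len(cov))


# Hypothetical point layout: 11 x 11 grid over a 400 m x 400 m field
side = np.linspace(0.0, 400.0, 11)
gx, gy = np.meshgrid(side, side)
coords = np.column_stack([gx.ravel(), gy.ravel()])
dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)

rng = np.random.default_rng(0)
x1 = simulate_field(dist, 5.0, 2.0, 0.5, 80.0, rng)
x2 = simulate_field(dist, 5.0, 1.4, 0.0, 50.0, rng)
e_short = simulate_field(dist, 0.0, 5.0, 0.0, 70.0, rng)
e_long = simulate_field(dist, 0.0, 5.0, 0.0, 210.0, rng)

y_short = 10.0 + 0.6 * x1 + 1.2 * x2 + e_short
y_long = 10.0 + 0.6 * x1 + 1.2 * x2 + e_long

# Theoretical R2 from the variances in the parameter tables
signal_var = 0.6 ** 2 * (2.0 + 0.5) + 1.2 ** 2 * 1.4
r2_theoretical = signal_var / (signal_var + 5.0)
```

The last two lines reproduce the stated R2 of about 0.37 from the sills and nuggets in the parameter tables.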

[Figure: simulated fields x1 + x2 + e = y, x 1000 realizations]

• Regression model: Y = b0 + b1 x1 + b2 x2

• Parameters estimated by:

• OLS

• Spatial Mixed

• SAR-error

• SAR-lag

• Nearest neighbors

Methodology: Analysis of simulated data

Higher point densities:

• Dispersion: OLS and NN show a considerably higher dispersion than Spatial Mixed and SAR methods.

• Bias: SAR-lag shows a marked downward bias at high densities, resulting in an underestimation of the true effect of x1.

Lower point densities:

• Neither dispersion nor bias differ among methods for a short correlation range.

• For a larger correlation range in the residuals, dispersion for OLS and bias for SAR-lag are still important at lower densities.

• Results are similar for β̂2 (not shown)

[Figures: estimates of the x1 coefficient (β̂1); panels: point density 440, range 70 m; point density 440, range 210 m; point density 110, range 70 m; point density 110, range 210 m; annotation: SAR-lag bias]

Spatial structure effect

Conclusions from simulations (partial)

The inadequate use of a SAR-lag model can generate a considerable downward bias in parameter estimates. The meaningfulness of such a model for regression analysis of agronomic data as in the example above may be questionable (i.e. there is no direct “influence of neighbors’ yield on yield at point i”).

Spatial Mixed and SAR-error resulted in similar outcomes when the latter was based on a “Queen” neighbors matrix. The use of other matrices proved inefficient (not shown), while the results for Spatial Mixed were consistent even when the covariance model used was incorrect (e.g. exponential instead of spherical).

While all “spatial” methods showed a markedly lower dispersion than OLS, NN was clearly less efficient than Spatial Mixed and SAR-Error. An iterative version of NN was not evaluated and might prove more efficient than the simple version used here.