Modelling in Precision Agri
context, and of candidates for variable-rate management. These requirements must be addressed
with model structure.
Given that precision agriculture involves explicitly managing within-field variation, it would appear
that almost all relevant research objectives involve estimating spatial crop growth and yield. For
some objectives, the relevant area for which yield is simulated could be a management zone or a
soil map unit of reasonably homogeneous soil characteristics. Simulating yield for these conditions
is a natural extension of prior modeling research, and the goal may be considered to be the map
unit or management zone mean. For many other objectives, however, the requirement would
appear to be the simulation of yield at all points in the field. Examples of such studies include
spatial recommendations for on-the-go management, or feasibility studies to examine whether there
would be economic benefit to precision agricultural management. In any case, if the interpretation
depends on zone or point-wise accuracy in simulating yield, then the conclusions of the paper are
only as good as the accuracy of the model.
How good is good enough?
General accuracy issues regarding modeling for precision agriculture were discussed by Sadler et
al. (2000), who pointed out that accuracy requirements are as varied as model research objectives.
Thus, there can be no definitive statement of required accuracy. Ideally, the model result would
exactly match the corresponding measurement at all points in the field. However, sub-ideal results
can provide sufficient information to meet some research objectives. For instance, qualitative
accuracy, in which the direction of the effect of some management change is simulated correctly,
can indicate what management might be recommended in some cases. If the simulated high and
low yields properly indicate the areas of the field where the high and low yields occur, management
zones could be delineated from the information. Target yields for zones or map units may require
only the accuracy of the mean.
However, any research objective that depends on the extremes or range of yields expected would
suffer if these were not quantitatively accurate. Any objective depending on the sensitivity of the
model, such as optimizing variable-rate management, would need to have accuracy of both the mean
and of the derivative with respect to the managed input (Sadler et al., 2000). Risk analysis probably
puts even more emphasis on the model's ability to simulate the tails of the distributions well. These
latter objectives imply that the variation must also be simulated accurately. The need for unbiased yield estimates, or
accuracy of simulating the mean, is generally recognized. The need for accurate simulation of the
variation is not.
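The distinction between accuracy of the mean and accuracy of the variation can be made concrete with a small sketch. All values below are hypothetical and chosen only for illustration; the low-yield threshold and the factor of 0.5 applied to the deviations are assumptions, not values from any study cited here.

```python
import numpy as np

# Hypothetical observed yields at nine sites (illustrative values only).
observed = np.linspace(2000.0, 6000.0, 9)

# Hypothetical simulation: correct mean, but only half of the spatial deviation.
simulated = observed.mean() + 0.5 * (observed - observed.mean())

bias = float(np.mean(simulated - observed))          # mean error
sd_ratio = float(simulated.std() / observed.std())   # simulated / observed spread

# Share of sites falling below an assumed low-yield threshold.
threshold = 3000.0
frac_obs_low = float(np.mean(observed < threshold))
frac_sim_low = float(np.mean(simulated < threshold))

print(f"bias = {bias:.1f}, sd ratio = {sd_ratio:.2f}")
print(f"fraction below threshold: observed {frac_obs_low:.2f}, simulated {frac_sim_low:.2f}")
```

In this constructed case the simulation is unbiased, yet it reports no sites below the threshold although two of the nine observed sites fall below it, which is the kind of failure that would matter for risk analysis.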
How can we tell?
Bearing these considerations in mind, how can models be tested and their performance confirmed?
Model tests from the pre-precision agriculture literature typically included regression or correlation of
simulated against observed values (or observed against simulated; more on that later), calculations
of root mean square error, mean error (or bias), and in some cases, model efficiency as defined by
Nash and Sutcliffe (1970). Most model tests in precision agriculture have used regression as the
primary test. Further, there has been essentially no discussion of measurement error in published
tests. This issue must eventually be considered, but it is beyond the scope of this work.
Simple linear regression
Simple linear regression of simulated yield as a function of observed yield is pre-programmed in
most application software and is therefore probably the most widely used test. The interpretation of the
coefficient of determination (r²) as the fraction of variation in the measurements explained
by the model is intuitive as a performance measure. There is some difference of opinion whether to
regress simulated against observed or the reverse, but r² from both is numerically identical, and
perfect agreement converges to the same coefficients, with intercept of zero and slope of one.
Researchers using the regression approach generally conclude that a model produces useful
results if the simulated output represents approximately 70-80% or more of the variation in the observed result.
Although there has been less discussion of slope and intercept, it is not recommended to rely solely,
or even primarily, on r² without due consideration being given to slope and intercept (Krause et
al., 2005).
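The point about the direction of regression can be checked numerically. The following is a minimal sketch using hypothetical yield values (not data from any cited study); `linear_fit` is an illustrative helper name. It shows that r² is the same whichever variable is treated as dependent, while slope and intercept are not.

```python
import numpy as np

def linear_fit(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    slope, intercept = np.polyfit(x, y, 1)  # highest-degree coefficient first
    r2 = np.corrcoef(x, y)[0, 1] ** 2
    return float(intercept), float(slope), float(r2)

# Hypothetical observed (O) and simulated (S) yields, for illustration only.
observed = np.array([2100.0, 2600.0, 3000.0, 3400.0, 3900.0, 4500.0])
simulated = np.array([2500.0, 2700.0, 3100.0, 3300.0, 3600.0, 4000.0])

a_so, b_so, r2_so = linear_fit(observed, simulated)  # S regressed on O
a_os, b_os, r2_os = linear_fit(simulated, observed)  # O regressed on S

print(f"S on O: intercept = {a_so:.0f}, slope = {b_so:.2f}, r2 = {r2_so:.3f}")
print(f"O on S: intercept = {a_os:.0f}, slope = {b_os:.2f}, r2 = {r2_os:.3f}")
```

The two slopes differ unless agreement is perfect, and their product equals r², which is one reason slope and intercept deserve attention alongside r².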
Root mean square error
Many researchers have reported the root mean square error (RMSE) as a performance measure. It
has useful characteristics in that it approaches zero with perfect performance and penalizes large
error with the commonly used square function.
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(S_i - O_i\right)^2} \qquad (1)$$
where O = observed value, S = simulated value (formally, 'predicted' is not rigorous because it
does not exist concurrently with observed values), and n is the number of values.
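As a rough sketch, RMSE can be computed together with the mean error (bias) and the Nash and Sutcliffe (1970) model efficiency mentioned earlier. The function names are illustrative, the sign convention for bias (simulated minus observed) is an assumption, and the three-value input is a toy example rather than real data.

```python
import numpy as np

def rmse(observed, simulated):
    """Root mean square error of simulated against observed values."""
    o = np.asarray(observed, dtype=float)
    s = np.asarray(simulated, dtype=float)
    return float(np.sqrt(np.mean((s - o) ** 2)))

def mean_error(observed, simulated):
    """Mean error (bias); positive when the model overestimates on average.

    The S - O sign convention is an assumption made for this sketch.
    """
    o = np.asarray(observed, dtype=float)
    s = np.asarray(simulated, dtype=float)
    return float(np.mean(s - o))

def nash_sutcliffe(observed, simulated):
    """Model efficiency: 1 - (sum of squared errors) / (sum of squares about the observed mean)."""
    o = np.asarray(observed, dtype=float)
    s = np.asarray(simulated, dtype=float)
    return float(1.0 - np.sum((o - s) ** 2) / np.sum((o - o.mean()) ** 2))

# Toy values for illustration only.
obs = [1.0, 2.0, 3.0]
sim = [2.0, 2.0, 5.0]
print(rmse(obs, sim), mean_error(obs, sim), nash_sutcliffe(obs, sim))
```

Simulating the observed mean everywhere gives an efficiency of exactly 0, and a perfect simulation gives 1, so a negative value means the model performs worse than the mean.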
Model efficiency
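The hydrologic disciplines often employ a model efficiency (ENS) developed for river forecasting by
Nash and Sutcliffe (1970):
$$\mathrm{ENS} = 1 - \frac{\sum_{i=1}^{n}\left(O_i - S_i\right)^2}{\sum_{i=1}^{n}\left(O_i - \bar{O}\right)^2}$$
The ENS statistic approaches 1 for perfect model performance, and a value of 0 indicates that the
mean value is as good a predictor as the model (Krause et al., 2005). In the hydrologic interpretation,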
ENS of 0.5 or more is generally considered sufficient to begin interpreting the model results as
representative. However, that threshold is more than likely specific to the hydrologic discipline and
yield simulation was perfect or nearly so. For the constant case, we set the 'simulated' yield to equal
the observed annual mean (Figure 1). For the random case, we generated a pseudo-random number
using the random normal function in SAS (SAS Institute, 2006) with mean equal to the observed
annual mean and coefficient of variation (CV) of 10% (Figure 2). Thus, in both cases, the temporal
variation was simulated very well by design, but there was no spatial simulation accuracy at all. As
seen in the two figures, the r² values obtained were 0.81 and 0.75. However, by definition, these
two cases have no value at all except in estimating the mean yield for the year. It is very difficult
to argue that either case would contribute information useful to precision agriculture.

Figure 1. Synthesized data to illustrate zero spatial performance with perfect temporal performance. Simulated output was the annual mean of the measured values (measured data from Sadler et al., 2000).

Figure 2. Synthesized data to illustrate zero spatial performance with perfect temporal performance. Simulated output was random values about the annual mean of the measured values with coefficient of variation (CV) of 10% (Sadler et al., 2000).

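The two synthetic cases can be imitated with a short sketch. The yields below are hypothetical (three years by five sites) and are not the Sadler et al. (2000) measurements; NumPy's random generator stands in for the SAS random normal function, and the seed is arbitrary.

```python
import numpy as np

def r_squared(x, y):
    """Squared Pearson correlation between x and y."""
    return float(np.corrcoef(x, y)[0, 1] ** 2)

# Hypothetical data: 3 years x 5 sites (illustrative values only).
annual_means = np.array([2000.0, 4000.0, 6000.0])
site_offsets = np.array([-300.0, -100.0, 0.0, 100.0, 300.0])
observed = annual_means[:, None] + site_offsets[None, :]   # shape (3, 5)
obs_year_mean = observed.mean(axis=1, keepdims=True)

# Constant case: every site is assigned the observed annual mean.
sim_constant = np.broadcast_to(obs_year_mean, observed.shape).copy()

# Random case: normal deviates about the annual mean with CV = 10%.
rng = np.random.default_rng(0)
sim_random = rng.normal(loc=obs_year_mean, scale=0.10 * obs_year_mean,
                        size=observed.shape)

# Pooled r2 over all years and sites, as in a conventional test.
pooled_r2_constant = r_squared(observed.ravel(), sim_constant.ravel())
pooled_r2_random = r_squared(observed.ravel(), sim_random.ravel())

# Within-year residuals of the constant case carry no spatial signal at all.
sim_resid_constant = sim_constant - sim_constant.mean(axis=1, keepdims=True)

print(f"pooled r2, constant case: {pooled_r2_constant:.3f}")
print(f"pooled r2, random case:   {pooled_r2_random:.3f}")
print("constant-case residuals all zero:", bool(np.allclose(sim_resid_constant, 0.0)))
```

With these illustrative numbers the pooled r² of the constant case is about 0.99, even though the simulated values are identical at every site within a year.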
While it is immediately apparent that neither of the test cases just discussed was capable of
simulating spatial yield, such is not usually the case with real data. In some cases, the model appears
capable of simulating both temporal and spatial yield variation relatively well. For this situation,
a method is needed to objectively analyze the data. We propose a method to separate the temporal
and spatial components of variation, somewhat analogous to Kobayashi and Salam (2000), who
separated mean squared deviation into its components. Our method uses linear regression to test
temporal performance by comparing annual means of simulated and observed values, and then uses
linear regression to test spatial performance by comparing the residuals from those means for the
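entire dataset. The residuals are computed using the following equations for each data value:
$$R_O = O - \bar{O}_Y$$
$$R_S = S - \bar{S}_Y$$
where the subscript Y indicates the annual mean for observed and simulated values, and R_O and R_S are the observed and simulated residuals.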
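A minimal sketch of this two-step procedure follows. The function name `split_temporal_spatial`, the argument names, and the example data are hypothetical; the example is constructed so that the annual means are simulated exactly while only half of the within-year deviation is reproduced.

```python
import numpy as np

def split_temporal_spatial(observed, simulated, years):
    """Regress simulated on observed for annual means and for residuals.

    Returns two (intercept, slope, r2) tuples: temporal first, then spatial.
    """
    o = np.asarray(observed, dtype=float)
    s = np.asarray(simulated, dtype=float)
    labels, idx = np.unique(np.asarray(years), return_inverse=True)

    # Annual means of observed and simulated values.
    counts = np.bincount(idx)
    o_means = np.bincount(idx, weights=o) / counts
    s_means = np.bincount(idx, weights=s) / counts

    # Residual of each value from its own annual mean.
    r_o = o - o_means[idx]
    r_s = s - s_means[idx]

    def fit(x, z):
        slope, intercept = np.polyfit(x, z, 1)
        r2 = np.corrcoef(x, z)[0, 1] ** 2
        return float(intercept), float(slope), float(r2)

    return fit(o_means, s_means), fit(r_o, r_s)

# Hypothetical example: 3 years x 4 sites, illustrative values only.
year_means = np.repeat([1500.0, 2500.0, 3200.0], 4)
offsets = np.tile([-200.0, -50.0, 50.0, 200.0], 3)
years = np.repeat([1, 2, 3], 4)
obs = year_means + offsets
sim = year_means + 0.5 * offsets   # correct means, half the spatial deviation

temporal, spatial = split_temporal_spatial(obs, sim, years)
print("temporal (intercept, slope, r2):", temporal)
print("spatial  (intercept, slope, r2):", spatial)
```

In this constructed case the temporal regression has a slope of 1 and the spatial regression a slope of 0.5, a distinction that a single pooled regression would blur.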
This procedure is illustrated using soybean yield data from Wang et al. (2003). For the purposes
of this illustration only, their calibration and validation datasets were combined (and one apparent
outlier that they identified was deleted), providing 13 sites in 3 years overall. The simple linear
regression of S on O (Figure 3) indicates remarkably good fit to the 1:1 line, with S = -228 + 1.09*O,
r² = 0.98, ENS = 0.96, RMSE = 116, and bias = 3.69. These measures all compare quite favorably with
the best results these authors have seen. However, as shown in Figure 4, regression of the annual
means indicates S = -449 + 1.19*O, r² = 1.00, suggesting that there is a slight underestimation of low
yields in one year. When the regression was performed on the residuals from the means (Figure
5), the relationship between simulated and observed residuals was R_S = 0.898*R_O, with r² = 0.96 (the intercept is zero by definition, but the
regression was not constrained). This result, unanticipated from prior analyses of the combined data,
illustrates additional interpretation that may be possible once the temporal and spatial performances
are considered separately. Here, the slope less than unity seen in Figure 5 indicates a slight but
systematic underestimation of the measured variation in yield. This was not apparent from the
commonly used regression shown in Figure 3.

Figure 3. Simulated and observed soybean yields for three years from Wang et al. (2003). Their data point B3, which they identified as an outlier, was deleted. The data shown are the calibration and validation data combined.

Figure 4. Simulated and observed annual mean soybean yields for three years from Wang et al. (2003).

Figure 5. Simulated and observed residuals from annual mean soybean yields for three years from Wang et al. (2003). Legend: 1997, 1998, 1999.

Conclusions
Tests of models should be chosen to match research objectives, in particular considering multiple
sources of variation in the test data set. In precision agriculture, one would expect the primary
goal to be the ability of a model to simulate spatial variation. A test combining year-to-year and
Precision agriculture '07