Sources of Measurement Error, Misclassification Error, and Bias in Auditory Avian Point Count Data

Theodore R. Simons, Kenneth H. Pollock, John M. Wettroth, Mathew W. Alldredge, Krishna Pacifici, and Jerome Brewster

Abstract Avian point counts vary over space and time due to actual differences in abundance, differences in detection probabilities among counts, and differences associated with measurement and misclassification errors. However, despite the substantial time, effort, and money expended counting birds in ecological research and monitoring, the validity of common survey methods remains largely untested, and there is still considerable disagreement over the importance of estimating detection probabilities associated with individual counts. Most practitioners assume that current methods for estimating detection probability are accurate, and that observer training obviates the need to account for measurement and misclassification errors in point count data. Our approach combines empirical data from field studies with field experiments using a system for simulating avian census conditions when most birds are identified by sound. Our objectives are to: identify the factors that influence detection probability on auditory point counts, quantify the bias and precision of current sampling methods, and find new applications of sampling theory and methodologies that produce practical improvements in the quality of bird census data.

We have found that factors affecting detection probabilities on auditory counts, such as ambient noise, can cause substantial biases in count data. Distance sampling data are subject to substantial measurement error due to the difficulty of estimating the distance to a sound source when visual cues are lacking. Misclassification errors are also inherent in time of detection methods due to the difficulty of accurately identifying and localizing sounds during a count. Factors affecting detection probability, measurement errors, and misclassification errors are important but often ignored components of the uncertainty associated with point-count-based abundance estimates.

T.R. Simons, USGS, NC Cooperative Fish and Wildlife Research Unit, Department of Zoology, Campus Box 7617, North Carolina State University, Raleigh, NC 27695, USA

D.L. Thomson et al. (eds.), Modeling Demographic Processes in Marked Populations, Environmental and Ecological Statistics 3, DOI 10.1007/978-0-387-78151-8_10, © Springer Science+Business Media, LLC 2009


1 Introduction

The most common method of estimating avian abundance is the point count (Ralph et al. 1995), where a single observer records all birds seen or heard at a point during a prescribed interval (usually 3–10 min) (Fig. 1). Surveys of breeding birds rely heavily on auditory detections, which can comprise 70% of observations in suburban landscapes (Sauer et al. 1994), 81% in tropical forests (Scott et al. 1981), and up to 97% of observations in closed-canopy deciduous forest (DeJong and Emlen 1985; Brewster 2007).

Avian abundance estimates can vary over space and time due to actual differences in abundance, differences in detection probabilities among counts, and differences associated with measurement and misclassification errors (Nichols et al. 2008; Nichols et al. 2000; Farnsworth et al. 2002; Pollock et al. 2002; Rosenstock et al. 2002; Thompson 2002).

Fig. 1 Common point count protocols. (a) A single observer standing at the center of a circular plot maps the location and estimated distance to all birds seen or heard during a prescribed time interval, usually 3–10 min. Time of detection information can be incorporated by color-coding or annotating observations to indicate the time interval of initial and subsequent detections. (b) Multiple observer methods employ two or more observers who collect data simultaneously. Observers can assume either dependent (Nichols et al. 2000) or independent (Alldredge et al. 2006) roles

A general conceptual model (Marsh and Sinclair 1989; Pollock et al. 2004, 2006) for auditory count-based abundance estimates can be represented as:

\[ N = \frac{C}{p_{area}\, p_{a}\, p_{da}} \]

where:

N = the population estimate

C = the count statistic

p_{area} = the fraction of the area sampled

p_{a} = the probability that a bird is available to be counted

p_{da} = the probability that a bird is detected given that it is available.
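The conceptual model can be applied directly as arithmetic. The following Python sketch is illustrative only: the function name `estimate_abundance` and all numeric inputs are hypothetical, and it assumes the three probabilities have already been estimated by some other means.

```python
def estimate_abundance(count, p_area, p_avail, p_detect):
    """Return N = C / (p_area * p_a * p_da) for a single count statistic.

    count    -- C, the count statistic
    p_area   -- fraction of the area sampled
    p_avail  -- probability that a bird is available to be counted
    p_detect -- probability that a bird is detected given that it is available
    """
    for name, p in (("p_area", p_area), ("p_avail", p_avail), ("p_detect", p_detect)):
        if not 0.0 < p <= 1.0:
            raise ValueError(f"{name} must be in (0, 1], got {p}")
    return count / (p_area * p_avail * p_detect)


# Hypothetical inputs: 12 birds counted, the whole plot sampled,
# 80% of birds sing during the count, 60% of singing birds are detected.
n_hat = estimate_abundance(count=12, p_area=1.0, p_avail=0.8, p_detect=0.6)
print(f"Estimated abundance: {n_hat:.1f}")
```

With these made-up values the raw count of 12 corresponds to an estimate of about 25 birds, which shows how strongly the estimate depends on the product of the three probabilities.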

Many factors influence the probability of detecting birds during auditory point counts. These factors include both “measurement error” factors associated with observer skill and ability, and “signal to noise ratio” factors that influence how much information about bird diversity and abundance is available to observers. Measurement error factors relate to observer skill in identifying and localizing individual birds (Kepler and Scott 1981), and hearing ability (Emlen and DeJong 1981, 1992; Sauer et al. 1994; Kendall et al. 1996; Downes 2004). Signal to noise ratio factors include the spectral qualities of songs (Schieck 1997), song volume, singing rate (Best 1981; Ralph 1981; Skirvin 1981), time of day (Shields 1977; Robbins 1981a; Skirvin 1981), the orientation of singing birds (toward or away from observers) (Alldredge et al. 2007c), presence of an observer (McShea and Rappole 1997), the number of species and number of individuals singing during a count (Simons et al. 2007), pairing status (Krebs et al. 1980; Johnson 1983; Cuthill and Hindmarsh 1985; Gibbs and Wenny 1993), stage of nesting cycle (Wilson and Bart 1985), vegetation structure (Diehl 1981; McShea and Rappole 1997; Simons et al. 2006; Pacifici et al. in press), topography, weather (Mayfield 1981; Robbins 1981b), temperature, humidity, and ambient noise (Simons et al. 2007). Systematic variation in any of these factors will impart a systematic bias in count data.

At least five methods of estimating detection probabilities on avian point counts are currently available (Nichols et al. 2008): distance sampling (Buckland et al. 2001), multiple-observer methods (Nichols et al. 2000; Alldredge et al. 2007a), time-of-detection methods (Farnsworth et al. 2002; Alldredge et al. 2007b), double sampling (Bart and Earnst 2002), and repeated count methods (Royle and Nichols 2003; Kery et al. 2005). Applications of combined methods are also possible (Kissling and Garton 2006; Alldredge et al. 2007a, b). Different methods estimate different components of the detection process. For example, distance sampling and multiple observer approaches assume that all birds on a given sample plot are available (sing during the count interval) and they estimate the probability of detection given availability. Time of detection methods provide estimates of the product of availability and detection given availability (Alldredge et al. 2007a) but they cannot separate the two components. Repeated count methods (Royle and Nichols 2003; Nichols et al. this volume) estimate the product of availability, detection given availability, and a third component of the detection process, the probability that an individual is present in the sample area.

Fig. 2 Comparisons of abundance measures for 18 species of breeding birds recorded at 247 paired survey locations in primary and secondary forests in Great Smoky Mountains National Park (Simons et al. 2006). (Top) Mean relative abundance; counts are not adjusted for differences in detectability. (Middle) Effective detection radii (EDR) estimates (calculated using Program DISTANCE, Thomas et al. 1998). (Bottom) Estimated densities based on effective detection radii. Error bars represent standard errors. Significant differences are denoted by asterisks (*p < 0.05, **p < 0.01, ***p < 0.001)

A brief example will illustrate how correcting avian point count data for variations in detection probability among species and habitats can dramatically alter abundance estimates. Figure 2 summarizes count data for 18 species of forest songbirds from 247 paired point count locations in primary and secondary southern Appalachian forests (Simons et al. 2006). Of interest is the similarity of unadjusted counts (Fig. 2a) for two very dissimilar species, the Scarlet Tanager (SCTA, Piranga olivacea) and the Golden-crowned Kinglet (GCKI, Regulus satrapa). The Scarlet Tanager is a brightly colored, very active and vocal species that flies constantly about the forest canopy giving a loud, high-energy call. The Golden-crowned Kinglet is a small, drab, and generally inconspicuous species that forages along branches and gives a high, thin, low-energy call. The unadjusted counts for these two species suggest that their abundance is similar in primary and secondary forests. However, when we examine the effective detection radii (Buckland et al. 2001) of the two species (Fig. 2b) and use this information to convert the raw counts into density estimates (Fig. 2c), dramatic differences in abundance become apparent. Accounting for differences in detection probability related to differences in the conspicuousness of these two species results in a nearly four-fold difference in our abundance estimate. Similarly, differences in abundance between primary and secondary forest habitats become apparent once counts are adjusted for differences in detection probability between habitats. Failure to account for such differences in detection probabilities among species and habitats weakens inferences from comparative studies of avian abundance (Yoccoz et al. 2001; Pollock et al. 2002; Williams et al. 2002).
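The conversion from raw counts to density can be sketched as follows, assuming the standard point-transect relationship in which density equals the number of detections divided by the total effective area surveyed (number of points × π × EDR²). The counts and radii below are hypothetical placeholders, not values read from Fig. 2; they only show how a two-fold difference in EDR turns identical counts into a four-fold difference in density.

```python
import math


def density_per_ha(total_detections, n_points, edr_m):
    """Birds per hectare = detections / (n_points * pi * EDR^2), EDR in metres."""
    if n_points <= 0 or edr_m <= 0:
        raise ValueError("n_points and edr_m must be positive")
    effective_area_ha = n_points * math.pi * edr_m ** 2 / 10_000.0
    return total_detections / effective_area_ha


# Hypothetical: two species with the same raw count over 247 points,
# one conspicuous (EDR 80 m) and one inconspicuous (EDR 40 m).
conspicuous = density_per_ha(total_detections=150, n_points=247, edr_m=80.0)
inconspicuous = density_per_ha(total_detections=150, n_points=247, edr_m=40.0)
ratio = inconspicuous / conspicuous
print(f"Density ratio (inconspicuous / conspicuous): {ratio:.1f}")
```

Because the effective area scales with the square of the EDR, modest differences in detectability translate into large differences in estimated density.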

Nevertheless, there is still considerable disagreement over the importance of estimating detection probabilities associated with individual counts (Rosenstock et al. 2002). Although common survey methods are largely unvalidated, most practitioners assume that current methods for estimating detection probability are accurate, and that observer training obviates the need to account for measurement and misclassification errors in point count data. Given the substantial time, effort, and money expended conducting avian point counts to address ecological research and monitoring objectives (Bart 2005; Simons et al. 2007), validating current avian sampling methods has enormous practical importance. In this paper we first review factors that influence detection probabilities on auditory counts, and we then summarize key findings of recent field experiments aimed at understanding the factors affecting detection probabilities, and the sources and magnitude of measurement and misclassification errors inherent in several common sampling methods.

2 Approach

Our approach uses empirical data from field studies of southern Appalachian songbirds (Shriner 2001; Lichstein et al. 2002; Brewster 2007) to inform the development of new avian sampling methods (Farnsworth et al. 2002; Alldredge et al. 2007a), and to design field experiments (Simons et al. 2007) that assess the factors affecting detection probabilities on auditory counts (Alldredge et al. 2007b; Pacifici et al. in press), and the precision and accuracy of auditory avian point count methods (Alldredge et al. 2007c, d; Alldredge et al. in press).

Fig. 3 System diagram of song playback system. (A) laptop computer and playback software, (B) transmitter, (C) portable receiver/player. See Simons et al. (2007) for specifications

Field experiments are conducted using a system for simulating avian census conditions when most birds are identified by sound (Fig. 3). The system uses a laptop computer to control up to 50 amplified MP3 players placed at known locations up to 200 m around a survey point (see Simons et al. (2007) for details). To date we have simulated over 5,000 unlimited radius point counts with 50 observers (Fig. 4). The system can realistically simulate a known population of songbirds under a range of factors that affect detection probabilities. Validation experiments evaluate traditional methods for estimating detection probabilities, such as distance sampling, and new approaches that incorporate information from multiple observers, the time sequence of observations, and combined methods.

Fig. 4 Volunteers participating in a playback experiment. Observers standing at the center of the experimental plot conduct point counts simulated using up to 45 different playback devices. Players are placed at known locations (up to 200 m) and heights (up to 15 m) on the surrounding plot

2.1 Factors Affecting Detection Probabilities

Figure 5 illustrates how many of six observers were able to hear (Heard), correctly identified (Correct), or misidentified (Wrong) calls of Black-throated Blue Warblers (BTBW, Dendroica caerulescens) at 25 distances between 40 and 160 m. Calls were played randomly at each distance for approximately 20 s. Experiments were replicated under four ambient noise conditions: (a) quiet (mean ambient noise 40.6 dB, S.D. 4.47 dB), (b) breezy (10–20 km/h gusty winds, 55.4 dB, S.D. 3.87 dB), (c) quiet conditions with three background birds, Winter Wren (WIWR, Troglodytes troglodytes), Yellow-throated Warbler (YTWA, Dendroica dominica), and Ovenbird (OVEN, Seiurus aurocapillus), singing 40 m behind or to either side of the observers, and (d) quiet conditions with white noise added (10 dB above ambient). White noise (uniform power, spectral frequency = 1.0) was played from a speaker facing the observers at a distance of 10 m. Results illustrate how detection distances decline and identification errors increase with increasing levels of ambient noise. Overall, the proportion of birds heard by observers decreased by 28 ± 4.7% under breezy conditions, by 41 ± 5.2% in the presence of additional background birds, and by 42 ± 3.4% with the addition of 10 dB of white noise.

Fig. 5 Number of six observers able to hear (Heard), correctly identify (Correct), and number of observers who misidentified (Wrong) calls of Black-throated Blue Warblers at 25 distances between 40 and 160 m. Calls played randomly at each distance for approximately 20 s. Experiments were replicated under four ambient noise conditions: (a) quiet (mean ambient noise 40.6 dB, S.D. 4.47 dB), (b) breezy (10–20 km/h gusty winds, 55.4 dB, S.D. 3.87 dB), (c) quiet conditions with 1–3 background birds (Winter Wren, Troglodytes troglodytes, Yellow-throated Warbler, Dendroica dominica, and Ovenbird) singing 20 m behind or to either side of the observers, and (d) quiet conditions with white noise added (10 dB above ambient). White noise (uniform power, spectral frequency = 1.0) was played from a speaker facing the observers at a distance of 10 m

Temporal trends in environmental factors such as ambient noise can impart trends in count data unrelated to the true abundance of birds. To provide some context for our ambient noise experiment we asked observers to record ambient noise levels on 21 Breeding Bird Survey (Sauer et al. 2005) routes across North Carolina in 2006. Note the proportion of North Carolina BBS counts in which ambient noise levels exceed 40 dB (Fig. 6). Ambient noise experiments (Fig. 5) indicate that an increase in ambient noise from 40 to 50 dB produces a 42% average reduction in the counts of six common species. Thus, if ambient noise levels along these North Carolina routes increased by 10 dB over the past 20 years, we would expect BBS counts of species detected by ear to decline over that interval by about 40%, even if populations were stable. Because BBS counts are not adjusted for differences in detection probability, in this example there is no way of knowing if declines represent actual population declines or simply declines in detection probability due to increasing ambient noise.
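The arithmetic behind this example can be made explicit. The sketch below assumes, as the text does, a stable population and a 42% proportional reduction in counts when ambient noise rises from 40 to 50 dB; the population size and baseline detection probability are hypothetical and cancel out of the result.

```python
def expected_count(abundance, detection_probability):
    """Expected unadjusted count for a given true abundance."""
    return abundance * detection_probability


TRUE_ABUNDANCE = 500          # hypothetical, held constant over time
BASELINE_DETECTION = 0.6      # hypothetical detection probability near 40 dB
NOISE_REDUCTION = 0.42        # reported average reduction for 40 -> 50 dB

count_quiet = expected_count(TRUE_ABUNDANCE, BASELINE_DETECTION)
count_noisy = expected_count(TRUE_ABUNDANCE, BASELINE_DETECTION * (1 - NOISE_REDUCTION))

# Apparent "population decline" seen in the raw counts, despite stable abundance.
apparent_decline = 1 - count_noisy / count_quiet
print(f"Apparent decline in counts: {apparent_decline:.0%}")
```

The apparent decline equals the reduction in detection probability regardless of the true abundance, which is why unadjusted counts cannot separate the two explanations.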

Fig. 6 Measured levels of ambient noise on 20 North Carolina Breeding Bird Survey routes in 2006. Observers conduct 50 3-min unlimited radius point counts along a 40 km route. Symbols represent the mean of three sound pressure readings measured along each route using a Martel Electronics model 325 sound level meter (accuracy ± 1.5 dB)

Fig. 7 Declining singing rates of Black-throated Blue Warblers (Dendroica caerulescens) over the breeding season in North Carolina. Sample sizes (individual birds sampled for a minimum of 30 min) and standard errors indicated for each sample period (Brewster 2007). Vertical axis: songs/min (0 to 2.5). Horizontal axis: sampling period (5/2–5/7, 5/16–5/21, 5/31–6/5, 6/13–6/18, 6/28–7/1). Plotted sample-size labels: 21, 20, 10, 26, 23

Trends in other factors affecting detection probabilities, such as observers or habitat conditions (Sauer et al. 1994; Norvell et al. 2003), can impart similar biases. For example, a recent analysis indicated that 76% of observers conducting Canadian Breeding Bird Survey routes are over 45 years old (Downes 2004; Simons et al. 2007). Forty-five percent of observers cited “hearing loss” as their primary reason for retiring from the survey. As with ambient noise, trends in age-related hearing loss can impart trends in count data that are unrelated to true abundance.

Singing rates of most breeding songbirds decline steadily during the breeding season. Brewster (2007) found that singing rates of southern Appalachian Black-throated Blue Warblers declined by 50% during the first month of the breeding season (Fig. 7). Temporal trends in factors such as average singing rates, that influence availability during a count, can also impart trends in count data unrelated to the true abundance. For example, there is increasing evidence that birds are breeding earlier now than in the past, presumably due to global warming (Butler 2003). Climatic trends that impart trends in the average singing rates of birds will bias abundance estimates over time unless analyses account for the temporal trends in detection probabilities.

We assessed several factors thought to influence overall detection probabilities (p_a p_da) on 40 experimental 3-min point counts composed of 10 birds per count and five primary species (Black-and-white Warbler (BAWW, Mniotilta varia), Black-throated Blue Warbler, Black-throated Green Warbler (BTNW, Dendroica virens), Hooded Warbler (HOWA, Wilsonia citrina), and Ovenbird) over a range of 15 distances (34–143 m). Songs were played at low (two songs per count) and high (13–21 songs per count) singing rates (Alldredge et al. 2007b). Detection probabilities at 100 m ranged from 0.60 (Black-and-white Warbler) to 0.83 (Hooded Warbler) at the high singing rate and 0.41 (Black-and-white Warbler) to 0.67 (Hooded Warbler) at the low singing rate (Fig. 8). Logistic regression analyses indicated that species, singing rate, distance, and observer were all significant factors affecting detection probabilities. Simulations of expected counts based on the best logistic model (Table 1) indicated that observers detected between 19% (190/1000 birds for the worst observer, lowest singing rate, and least detectable species) and 65% (653/1000 birds for the best observer, highest singing rate, and most detectable species) of the true population.

Fig. 8 Logistic regression models for a single observer illustrating the relationship between detection probability and distance for counts of five species (Black-and-white Warbler (BAWW, Mniotilta varia), Black-throated Blue Warbler (BTBW, Dendroica caerulescens), Black-throated Green Warbler (BTNW, Dendroica virens), Hooded Warbler (HOWA, Wilsonia citrina), and Ovenbird (OVEN, Seiurus aurocapillus)) singing at high and low singing rates (Alldredge et al. 2007b). Note the consistent effect of singing rate on detection probability

2.2 Evaluation of Distance Measurement Error

Detection distance is one of the most important and common auxiliary variables measured during point count surveys of avian abundance. The distance to individual birds is used to determine the effective area sampled, to determine if birds are within a fixed radius plot, and to model the detection process. In densely vegetated habitats, visual detections of birds are rare, and most estimates of detection distance are based solely on auditory cues. Distance sampling theory assumes that detection distances are measured accurately, but empirical validation of this assumption for auditory detections is lacking. We simulated avian point counts in a forested habitat to determine the error structure of distance estimates based on auditory detections (Alldredge et al. 2007c). Experiments were conducted with six experienced observers both before and after distance estimation training. Experiments were also conducted to determine the effect of the height and orientation (toward or away from observers) of the song source on distance estimation error. Distance estimation errors for all experiments were substantial, although training did reduce errors and bias in distance estimates. Distance estimates for all observers increased for all species played between 23 and 65 m. Distance estimates did not increase for songs played at distances between 65 and 86 m, indicating observers were not able to differentiate distances among songs played within this range. The height from which songs were played had no effect on distance estimation errors. The orientation of the song source did have a large effect on distance estimation errors; observers generally doubled their estimates for songs played away from them compared to songs played directly toward them (Fig. 9).

Table 1 Detection probabilities at distances from 30 to 150 m, and expected counts for a simulated population of 1,000 uniformly distributed birds, based on the logistic models for BAWW (least detectable species) and HOWA (most detectable species) using the best and worst observers and both high and low singing rates

                           Worst observer      Best observer
Species   Distance (m)     Low      High       Low      High
BAWW      30 (40)          0.87     0.99       0.94     1.00
          60 (120)         0.61     0.92       0.80     0.97
          90 (200)         0.26     0.55       0.48     0.75
          120 (280)        0.08     0.11       0.17     0.23
          150 (360)        0.02     0.01       0.05     0.03
          Expected Count   190      294        295      382
HOWA      30               0.97     1.00       0.99     1.00
          60               0.88     0.99       0.95     1.00
          90               0.64     0.93       0.82     0.97
          120              0.29     0.55       0.51     0.76
          150              0.08     0.11       0.19     0.24
          Expected Count   382      538        529      653
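The expected counts in Table 1 can be reproduced from the tabulated detection probabilities. The sketch below assumes that the numbers in parentheses beside each BAWW distance (40, 120, 200, 280, 360) are the birds falling in successive 30 m rings when 1,000 birds are spread uniformly over a circle of 150 m radius; that reading is an assumption, and the helper names are hypothetical. Only two of the eight columns are shown.

```python
RING_OUTER_M = [30, 60, 90, 120, 150]   # outer radius of each ring, metres
POPULATION = 1000                       # simulated birds, uniform with respect to area


def birds_per_ring(outer_radii, population):
    """Split a uniformly distributed population among concentric rings by area."""
    max_r2 = outer_radii[-1] ** 2
    inner = 0
    birds = []
    for outer in outer_radii:
        # Ring area is proportional to outer^2 - inner^2.
        birds.append(population * (outer ** 2 - inner ** 2) / max_r2)
        inner = outer
    return birds


def expected_count(birds, detection_probs):
    """Sum over rings of (birds in ring) * (detection probability in ring)."""
    return sum(n * p for n, p in zip(birds, detection_probs))


rings = birds_per_ring(RING_OUTER_M, POPULATION)

# Detection probabilities copied from Table 1.
baww_worst_low = [0.87, 0.61, 0.26, 0.08, 0.02]   # least detectable case
howa_best_high = [1.00, 1.00, 0.97, 0.76, 0.24]   # most detectable case

low = expected_count(rings, baww_worst_low)
high = expected_count(rings, howa_best_high)
print(rings)
print(round(low), round(high))
```

Under this reading the ring totals come out as 40, 120, 200, 280, and 360 birds, and the two expected counts round to 190 and 653, matching the corresponding Table 1 entries.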

2.3 Double-Observer Methods

Comparing simultaneous observations by two or more observers provides another measure of point count detection probabilities (Alldredge et al. 2006). The method requires that observers accurately map bird locations and match birds detected by all observers. We evaluated the accuracy and sources of measurement and classification error in double-observer counts by conducting 60 experimental 3-min point counts with six experienced observers (Alldredge et al. in press). Thirty-five players were uniformly distributed with respect to area around a single point. All players were set 1 m above ground at radial distances between 0 and 120 m. Songs for all species were played at a sound intensity of approximately 90 dB at a distance of 1 m.

Fig. 9 Differences in distance estimation errors for songs oriented toward observers compared to those oriented away from observers. Errors for six observers averaged across three distance categories. WOTH (Wood Thrush, Hylocichla mustelina), RBNU (Red-breasted Nuthatch, Sitta canadensis), BTNW (Black-throated Green Warbler, Dendroica virens), BAWW (Black-and-white Warbler, Mniotilta varia), ACFL (Acadian Flycatcher, Empidonax virescens)

Each count had exactly 12 birds of up to eight species. Six primary species, Scarlet Tanager (SCTA, Piranga olivacea), Acadian Flycatcher (ACFL, Empidonax virescens), BAWW, BTBW, BTNW, and HOWA, were simulated across a range of distances to approximate a population of 100 birds uniformly distributed with respect to area. Table 2 provides typical results from a pair of observers who conducted 60 3-min double-observer point counts. A quadrant was a 90° segment of the circular plot, centered on the true location of the song (for individual observers) or the mapped location of one observer (for combined data). Overall, observers undercounted the total number of birds available, recording on average 75.5% (S.E. 1.7%) of simulated birds, and correctly matching 75% (S.E. 1.7%) of birds recorded. In contrast, counts were inflated by an average of 8% (S.E. 1.6%) due to a combination of double-counting and misidentification (imagined birds) errors.

Table 2 Identification and matching errors for two observers conducting a double-observer point count

                                             Observer A    Combined      Observer B
# Birds simulated                            730           –             730
# Birds recorded (% correct)                 581 (72.1)    –             598 (75.3)
Birds mapped within true quadrant (%)        448 (77.1)    –             424 (70.9)
Birds double-counted (%)                     52 (9.0)      –             42 (7.0)
Imagined birds (%)                           3 (0.5)       –             6 (1.0)
180 degree birds (%)                         4 (0.7)       –             5 (0.8)
Total observations for both                  –             679           –
Observations match (%)                       –             495 (72.9)    –
Observations matched in same quadrant (%)    –             432 (63.6)    –
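To see why matching errors matter, consider how a two-observer estimate responds to the number of matched detections. The sketch below uses the simple Lincoln-Petersen two-sample estimator as a stand-in; this is an assumption made for illustration and not necessarily the model fitted in the cited double-observer studies. The inputs are the Table 2 totals, and the function name is hypothetical.

```python
def lincoln_petersen(n_a, n_b, matched):
    """Two-sample estimate from counts by observers A and B and their matches.

    Returns (p_a, p_b, n_hat), where p_a is the share of B's birds that A also
    recorded, p_b is the share of A's birds that B also recorded, and
    n_hat = n_a * n_b / matched.
    """
    if matched <= 0 or matched > min(n_a, n_b):
        raise ValueError("matched must be between 1 and min(n_a, n_b)")
    p_a = matched / n_b
    p_b = matched / n_a
    n_hat = n_a * n_b / matched
    return p_a, p_b, n_hat


RECORDED_A, RECORDED_B = 581, 598      # birds recorded by each observer (Table 2)

# Same counts, two matching rules: any match vs. match within the same quadrant.
_, _, n_all_matches = lincoln_petersen(RECORDED_A, RECORDED_B, matched=495)
_, _, n_quadrant_matches = lincoln_petersen(RECORDED_A, RECORDED_B, matched=432)
print(f"{n_all_matches:.0f} vs {n_quadrant_matches:.0f}")
```

With these inputs the estimate moves from roughly 702 to roughly 804 birds when only same-quadrant matches are accepted, against 730 simulated birds, so the choice of matching rule alone shifts the result by about 100 birds.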

2.4 Time of Detection Methods

We evaluated the time of detection method (Farnsworth et al. 2002; Alldredge et al. 2007a) by conducting 60 8-min point counts with four experienced observers (Alldredge et al. 2007d). Counts were divided into four 2-min intervals, and observers recorded birds using multi-colored pens to distinguish time intervals. Detections of birds recorded in a previous interval were noted by underlining the initial notation in the color of the current time interval. Thirty-five players were uniformly distributed with respect to area around a single point. All players were set 1 m above ground at radial distances between 0 and 1 m. Songs for all species were played at a sound intensity of approximately 90 dB at a distance of 1 m.

Eighteen birds of 12 species were simulated on each point. Analyses were focused on eight species: ACFL, BAWW, BTBW, BTNW, HOWA, OVEN, SCTA, and YTWA. Songs of 100 total individual birds of each of these eight species were played on the 60 counts. The total simulated population size was 800 birds because not all birds in the simulated population were available on some counts. Availability of ACFL, BAWW, BTNW, SCTA, and YTWA was simulated under a Markovian process with availability during the count interval varying between 0.60 and 1.0, and singing rates varying from 2 to 8 songs per minute. Singing rates and availability for BTBW and OVEN were based on empirical field data (Brewster 2007).


Double counting, counting a single bird as more than one bird, was a significant source of error among the four experienced observers. Double counting rates ranged from 0.9 to 3.4% (S.E. 0.6%) of total observations among observers. Double counting occurred in a variety of forms. In most cases single birds were recorded as two birds throughout the count interval. This created two or more capture histories that clearly indicated a single individual tracked as two birds throughout the count. Occasionally observers mapped a bird in one location at the beginning of the count, then mapped the same bird in a new location and continued to track it at the new location for the remainder of the count. These cases produced two or more capture histories of the form xx00 for the original bird and 00xx for the double count, where x could be either a one or a zero.
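A simple screen for the xx00 / 00xx pattern can be written down directly. The following sketch is a hypothetical helper, not part of the published analysis: it treats each capture history as a string of 0s and 1s over the four 2-min intervals and flags pairs of same-species histories whose detections never overlap in time, with one history ending before the other begins. Such pairs are only candidates for a relocated double count, since two genuinely different birds can produce the same pattern.

```python
def detected_intervals(history):
    """Return the indices of intervals with a detection, e.g. '1100' -> [0, 1]."""
    if not history or set(history) - {"0", "1"}:
        raise ValueError(f"capture history must contain only 0 and 1: {history!r}")
    return [i for i, flag in enumerate(history) if flag == "1"]


def is_candidate_relocation(history_a, history_b):
    """True if one history's detections all precede the other's (xx00 vs 00xx)."""
    if len(history_a) != len(history_b):
        raise ValueError("capture histories must cover the same intervals")
    a = detected_intervals(history_a)
    b = detected_intervals(history_b)
    if not a or not b:
        return False            # an all-zero history carries no timing information
    return a[-1] < b[0] or b[-1] < a[0]


# Hypothetical histories for two records of the same species on one count.
print(is_candidate_relocation("1100", "0011"))   # detections never overlap
print(is_candidate_relocation("1100", "0110"))   # both detected in interval 2
```

The first pair is flagged and the second is not, because both records in the second pair include a detection in the same interval and therefore cannot be one bird mapped at two successive locations.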

Overall 2.0–4.1% (S.E. 0.43%) of observations were recorded in the wrong time interval among the four observers. This can occur when two or more individuals of the same species sing during a count, and observers attribute a song to the wrong individual. Finally, as we found in the multiple-observer experiment, misidentification errors were rare with experienced observers. Misidentification errors ranged from 0.1 to 0.6% (S.E. 0.09%) of total observations among the four experienced observers.

3 Discussion

Many known and unknown factors influence detection probabilities on auditory point counts. These include factors such as ambient noise, habitat structure, and the singing rates of individual birds that can impart trends in abundance estimates unrelated to true abundance. Fortunately, multiple methods of estimating detection probabilities are now available, and we believe direct estimates of detection probability should accompany all analyses of avian point count data.

Unfortunately, methods that require the localization of auditory detections are subject to large measurement and misclassification errors. Our experiments illustrate the source and magnitude of those errors, although we believe errors on actual point counts are probably larger than our results suggest, because our simulations involved a relatively small number of species, our observers were highly experienced, and many sources of variability were carefully controlled in our experiments.

As the example for the Golden-crowned Kinglet and Scarlet Tanager in the introduction illustrates, distance sampling can reduce sampling bias for species with large differences in behavior, plumage, and song characteristics. Nevertheless, we found that measurement error on auditory point counts is substantial, presumably because the complexities of sound attenuation and reverberation in natural environments make the localization of auditory cues very difficult. We recommend that practitioners distinguish between visual and auditory detections in their analyses of point count data, and recognize the limitations of distance sampling methods on avian point counts when detections are auditory. If distance sampling based solely on auditory detections is unavoidable, analyses should incorporate estimates of measurement error.

Classification and matching errors are important and generally ignored components of overall error rates on multiple-observer and time of detection point counts. Errors result primarily from the difficulty of localizing auditory cues and accurately mapping bird locations (Buckland 2006). We would expect substantially larger errors on actual point counts that employ these methods due to the movement of birds during the count interval.

These findings imply that the uncertainty surrounding estimates of avian diversity and abundance based on distance, double-observer, and time of detection auditory sampling methods is much higher than is currently assumed by practitioners. Therefore, conclusions based on those estimates are, in most cases, weaker than those currently reported in studies based on auditory detections. Approaches to account for this uncertainty in abundance models are clearly needed if abundance estimates are to withstand critical scrutiny.

Alternatively, it is possible that modern avian auditory sampling methods that require observers to simultaneously track the location and singing rates of multiple individuals and multiple species often exceed the ability of human observers. Simplified protocols, such as single species surveys, or occupancy approaches based on presence/absence data (Royle and Nichols 2003) that reduce the demands on human observers, may ultimately yield better results.
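The reason presence/absence data can still carry information about abundance is the relationship, as we understand it from Royle and Nichols (2003), between the probability r of detecting a single individual and the probability p of detecting at least one of the N individuals at a site: p = 1 − (1 − r)^N. The sketch below simply evaluates that relationship; the function name and the value of r are hypothetical, and individuals are assumed to be detected independently.

```python
def site_detection_probability(r, n_birds):
    """Probability of detecting at least one of `n_birds` individuals
    when each is detected independently with probability `r`."""
    return 1.0 - (1.0 - r) ** n_birds


# Hypothetical per-individual detection probability.
r = 0.3
for n_birds in (1, 2, 5, 10):
    p = site_detection_probability(r, n_birds)
    print(f"N = {n_birds:2d}: P(species detected at site) = {p:.3f}")
```

Under this relationship the observer only needs to record whether a species was detected on each visit, yet variation in detection frequency among sites still reflects variation in local abundance.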

Acknowledgments We are very grateful to the many volunteers who assisted with this research: David Allen, Brady Beck, Jenna Begier, Scott Bosworth, Amy Bleckinger, Dan Boone, Marshall Brooks, Gordon Brown, Becky Browning, Sue Cameron, Susan Campbell, John Connors, Deanna Dawson, Jimmy Dodson, Barbara Dowell, Curtis Dykestra, Adam Efird, Patrick Farrell, John Finnegan, Lena Gallitano, John Gerwin, Stephanie Horton, Becky Hylton, Mark Johns, Chris Kelly, Salina Kovach, Ed Laurent, Harry Legrand, Merrill Lynch, Sarah Mabey, Jeff Marcus, Kevin Miller, Melissa Miller, Ryan Myers, Keith Pardieck, Bruce Peterjohn, Andrei Podolsky, Chan Robbins, James Sasser, Shiloh Schulte, Clyde Sorenson, Ed Swab, Chris Szell, Nathan Tarr, Kendrick Weeks, Dan Williams, and Diana Wray. C.M. Downes generously allowed us to cite her survey of Canadian Breeding Bird Survey volunteers. Electrical engineering students at NCSU: John Marsh, Marc Williams, and Michael Foster and Wendy Moore provided valuable technical assistance. Funding for this research was provided by the USGS Status and Trends Program, the US Forest Service, the US National Park Service, and the North Carolina Wildlife Resources Commission.

References

Alldredge MW, Pollock KH, Simons TR (2006) Estimating detection probabilities from multiple observer point counts. Auk 123:1172–1182.

Alldredge MW, Pollock KH, Simons TR, Collazo J, Shriner SA (2007a) Time of detection method for estimating abundance from point count surveys. Auk 124:653–664.

Alldredge MW, Simons TR, Pollock KH (2007b) Factors affecting aural detections of songbirds. Ecological Applications 17:948–955.

Alldredge MW, Simons TR, Pollock KH (2007c) An experimental evaluation of distance measurement error in avian point count surveys. Journal of Wildlife Management 71:2759–2766.

Alldredge MW, Simons TR, Pollock KH, Pacifici K (2007d) A field evaluation of the time-of-detection method to estimate population size and density for aural avian point counts. Avian Conservation and Ecology – Ecologie et conservation des oiseaux 2(2):13. http://www.ace-eco.org/vol2/iss2/art13/

Alldredge MW, Pacifici K, Simons TR, Pollock KH (In press) A novel field evaluation of the effectiveness of distance sampling and double independent observer methods to estimate aural avian detection probabilities. Journal of Applied Ecology.

Bart J (2005) Monitoring the abundance of bird populations. Auk 122:15–25.

Bart J, Earnst S (2002) Double sampling to estimate density and population trends in birds. Auk 119:36–45.

Best LB (1981) Seasonal changes in detection of individual bird species. Studies in Avian Biology 6:252–261.

Brewster JP (2007) Spatial and temporal variation in the singing rates of two forest songbirds, the Ovenbird and Black-throated Blue Warbler; Implications for aural counts of songbirds. MS Thesis. Department of Zoology, North Carolina State University, Raleigh, NC.

Buckland ST (2006) Point-transect surveys for songbirds: Robust methodologies. Auk 123:345–357.

Buckland ST, Anderson DR, Burnham KP, Laake JL, Borchers DL, Thomas L (2001) Introduction to distance sampling: Estimating abundance of biological populations. Oxford University Press, New York.

Butler C (2003) The disproportionate effect of global warming on the arrival dates of short-distance migratory birds in North America. Ibis 145:484–495.

Cuthill I, Hindmarsh A (1985) Increase in Starling song activity with removal of mate. Animal Behavior 33:326–335.

DeJong MJ, Emlen JT (1985) The shape of the auditory detection function and its implication for songbird censusing. Journal of Field Ornithology 56:213–223.

Diehl B (1981) Bird populations consist of individuals differing in many respects. Studies in Avian Biology 6:225–229.

Downes CM (2004) Canadian Wildlife Service, Environment Canada. Results of the 2004 Questionnaire for Canadian Participants in the Breeding Bird Survey. http://www.cws-scf.ec.gc.ca/nwrc-cnrf/default.asp?lang=en&n=929AA800-1

Emlen JT, DeJong MJ (1981) The application of song detection threshold distance to census operations. Studies in Avian Biology 6:346–352.

Emlen JT, DeJong MJ (1992) Counting birds: The problem of variable hearing abilities. Journal of Field Ornithology 63:26–31.

Farnsworth GL, Pollock KH, Nichols JD, Simons TR, Hines JE, Sauer JR (2002) A removal model for estimating detection probabilities from point-count surveys. Auk 119:414–425.

Gibbs JP, Wenny DG (1993) Song output as a population estimator: Effect of male pairing status. Journal of Field Ornithology 64:316–322.

Johnson LS (1983) Effect of mate loss on song performance in the Plain Titmouse. Condor 85:378–380.

Kendall WL, Peterjohn BG, Sauer JR (1996) First-time observer effects in the North American Breeding Bird Survey. Auk 113:823–829.

Kepler CB, Scott JM (1981) Reducing bird count variability by training observers. Studies in Avian Biology 6:366–371.

Kery M, Royle JA, Schmid H (2005) Modeling avian abundance from replicated counts using binomial mixture models. Ecological Applications 15:1450–1461.

Kissling ML, Garton EO (2006) Estimating detection probability and density from point-count surveys: a combination of distance and double-observer sampling. Auk 123:735–752.

Krebs JR, Avery M, Cowie RJ (1980) Effect of removal of mate on the singing behavior of Great Tits. Animal Behavior 29:635–637.

Lichstein JW, Simons TR, Franzreb KE (2002) Landscape effects on breeding songbird abundance in managed forests. Ecological Applications 12:836–857.

Marsh H, Sinclair DF (1989) Correcting for visibility bias in strip transect aerial surveys of aquatic fauna. Journal of Wildlife Management 53:1017–1024.

Mayfield HF (1981) Problems in estimating population size through counts of singing males. Studies in Avian Biology 6:220–224.

McShea WJ, Rappole JH (1997) Variable song rates in three species of passerines and implications for estimating bird population. Journal of Field Ornithology 68:367–375.

Nichols JD, Hines JE, Sauer JR, Fallon FW, Fallon JE, Heglund PJ (2000) A double-observer approach for estimating detection probability and abundance from point counts. Auk 117:393–408.

Nichols JD, Thomas L, Conn PB (2008) Inferences about landbird abundance from count data: Recent advances and future directions. Journal of Ecological and Environmental Statistics 3:201–236.

Pacifici K, Simons TR, Pollock KH (In press) Effects of vegetation and background noise on the detection process in auditory avian point count surveys. The Auk 125.

Norvell RE, Howe FP, Parrish JR (2003) A seven-year comparison of relative abundance and distance sampling methods. Auk 120:1013–1028.

Pollock KH, Marsh H, Bailey LL, Farnsworth GL, Simons TR, Alldredge MW (2004) Separating components of detection probability in abundance estimation: An overview with diverse examples. Pages 43–58 in Sampling Rare and Elusive Species: Concepts, Designs and Techniques for Estimating Population Parameters (W. L. Thompson ed.). Island Press, Washington, DC.

Pollock KH, Marsh HD, Lawler IR, Alldredge MW (2006) Estimating animal abundance in heterogeneous environments: An application to aerial surveys for dugongs. Journal of Wildlife Management 70:255–262.

Pollock KH, Nichols JD, Simons TR, Farnsworth GL, Bailey LL, Sauer JR (2002) Large scale wildlife monitoring studies: Statistical methods for design and analysis. Environmetrics 13:105–119.

Ralph CJ (1981) An investigation of the effects of seasonal activity levels on avian censusing. Studies in Avian Biology 6:265–270.

Ralph JC, Droege S, Sauer JR (1995) Managing and monitoring birds using point counts: standards and applications. Pages 161–168 in Monitoring bird populations by point counts (Ralph JC, Sauer JR, Droege S Eds.). U.S. Forest Service General Technical Report PSW-GTR-149.

Robbins CS (1981a) Effect of time of day on bird activity. Studies in Avian Biology 6:275–286.

Robbins CS (1981b) Bird activity levels related to weather. Studies in Avian Biology 6:301–310.

Rosenstock SS, Anderson DR, Giesen KM, Leukering T, Carter MF (2002) Landbird counting techniques: Current practices and an alternative. Auk 119:46–53.

Royle JA, Nichols JD (2003) Estimating abundance from repeated presence-absence data or point counts. Ecology 84:777–790.

Sauer JR, Barker RJ, Geissler PH (1994) Statistical aspects of modeling population change from population size data. Pages 451–466 in Wildlife Toxicology and Population Modeling: Integrated Studies of Agroecosystems (R. J. Kendall and Lacher TE, Jr., eds.). CRC Press, Boca Raton, FL.

Sauer JR, Hines JE, Fallon J (2005) The North American Breeding Bird Survey, Results and Analysis 1966–2004. Version 2005.2, USGS Patuxent Wildlife Research Center, Laurel, MD. http://www.mbr-wrc.usgs.gov/bbs/bbs.html

Schieck J (1997) Biased detection of bird vocalizations affects comparisons of bird abundance among forested habitats. The Condor 99:179–190.

Scott JM, Ramsey FL, Kepler CB (1981) Distance estimation as a variable in estimating bird numbers from vocalizations. Studies in Avian Biology 6:334–340.

Shields WM (1977) The effect of time of day on avian census results. Auk 94:380–383.

Shriner SA (2001) Distribution of breeding birds in Great Smoky Mountains National Park. Ph.D. Dissertation, Department of Zoology, North Carolina State University, Raleigh, NC.

Simons TR, Shriner SA, Farnsworth GL (2006) Comparison of breeding bird and vegetation communities in primary and secondary forests of Great Smoky Mountains National Park. Biological Conservation 129:302–311.

Simons TR, Alldredge MW, Pollock KH, Wettroth JM (2007) Experimental analysis of the auditory detection process on avian point counts. The Auk 124:986–999.

Skirvin AA (1981) Effects of time of day and time of season on the number of observations and density estimates of breeding birds. Studies in Avian Biology 6:271–274.

Thomas L, Laake JL, Derry JF, Buckland ST, Borchers DL, Anderson DR, Burnham KP, Strindberg S, Hedley SL, Burt ML, Marques FFC, Pollard JH, Fewster RM (1998) Distance 3.5. Release 6. Research Unit for Wildlife Population Assessment, University of St. Andrews, UK. Available: http://www.ruwpa.st-and.ac.uk/distance/

Thompson WL (2002) Towards reliable bird surveys: Accounting for individuals present but not detected. Auk 119:18–25.

Williams BK, Nichols JD, Conroy MJ (2002) Analysis and management of animal populations. Academic Press, New York.

Wilson DM, Bart J (1985) Reliability of singing bird surveys: Effects of song phenology during the breeding season. Condor 87:69–73.

Yoccoz NG, Nichols JD, Boulinier T (2001) Monitoring of biological diversity in space and time. Trends in Ecology and Evolution 16:446–453.