Characteristics of Reprocessed Hydrometeorological Automated Data System (HADS) Hourly Precipitation Data

DONGSOO KIM AND BRIAN NELSON

NOAA/NESDIS/NCDC, Asheville, North Carolina

DONG-JUN SEO

NOAA/NWS/Office of Hydrologic Development, Silver Spring, Maryland, and University Corporation for

Atmospheric Research, Boulder, Colorado

(Manuscript received 28 October 2008, in final form 29 April 2009)

ABSTRACT

The Hydrometeorological Automated Data System (HADS) is a real-time data acquisition, processing,

and distribution system operated by the Office of Hydrologic Development (OHD) of NOAA’s National

Weather Service (NWS). The initial reprocessing of HADS data from its original format since its inception in

July 1996 has been completed at NOAA’s National Climatic Data Center (NCDC). The quality of the

reprocessed HADS hourly precipitation data from rain gauges is assessed by two objective metrics: the average

fraction of missing values and the percentage of top-of-the-hour observations for a 3-yr period (2003–05).

Pairwise comparisons between the reprocessed product and the real-time product are made using repre-

sentative samples (about 13%) from the 48 contiguous United States. The monthly average of missing values

varies from 0.5% to 2% in the reprocessed product and from 1.7% to 10.1% in the real-time product. Except

for January 2003, the reprocessed product consistently reduced missing values, by as much as 9.4% in October

2004. The availability of top-of-the-hour observations is about 85% in the reprocessed product, while the real-

time product has top-of-the-hour observations only about 50% of the time. This paper discusses real-time

product quality issues, additional quality assurance algorithms used in the reprocessing environment, and the

design of system-wide performance comparisons. Thus, the benefits to users of reprocessing the HADS data

are the correction of 4-h observation time errors during 1 July–11 August 2005 and the demonstration of

the diurnal pattern of precipitation frequencies in regional domains. A Web-based interactive quality assessment

tool for reprocessed HADS hourly precipitation data and access to the data are also presented.

1. Introduction

The Hydrometeorological Automated Data System

(HADS) provides a collection of hydrometeorological

observations from diverse networks that use Geosta-

tionary Operational Environmental Satellite (GOES)

data collection platforms (DCPs) for real-time data trans-

mission. The diverse networks that compose HADS in-

clude the U.S. Geological Survey (USGS), the U.S. Army

Corps of Engineers (USACE) districts, and participants

in the Remote Automated Weather Stations (RAWS)

program hosted by the U.S. Department of Agricul-

ture’s (USDA’s) Forest Service. Data are transmitted

to the HADS program office at the National Weather

Service’s Office of Hydrologic Development (NWS/OHD)

for processing and archiving. In this paper we focus on

one particular class of observations (hourly precipita-

tion) from the HADS dataset and undertake an effort to

enhance and improve it, both spatially and temporally.

This reprocessing effort is driven by the fact that hourly

rain gauge data are needed in order to describe precipi-

tation for finer-scale events, such as diurnal variations of

convective storms, heavy rains that trigger debris flow,

and verifications of model forecasts, to name a few. For

any scientific study, high quality data are necessary. Of-

ten, however, missing values render the record incom-

plete, and therefore the users have to estimate the missing

values. The reprocessing effort allows for the recovery of

certain missing data points and for the rigorous quality control of the raw data to provide an improved dataset for use in research and climatic applications.

Corresponding author address: Dongsoo Kim, NOAA/NESDIS/National Climatic Data Center, 151 Patton Ave., Asheville, NC 28801-5001.

E-mail: [email protected]

OCTOBER 2009 KIM ET AL. 1287

DOI: 10.1175/2009WAF2222227.1

The purposes of reprocessing the HADS data are

threefold: 1) to enlarge the hourly hydroclimate data-

base for use in various applications such as multisensor

precipitation reanalysis (Nelson et al. 2008); 2) to pro-

vide real-time data to users, such as forecasters at NWS

Weather Forecast Offices (WFOs) and River Forecast

Centers (RFCs), with data quality information for spe-

cific gauge stations; and 3) to provide improved-quality

hourly precipitation data to the user community. Rain

gauge data often come with some measure of ambiguity.

Missing values are a source of much of this ambiguity in

rain gauge datasets. Precipitation data are encoded as missing when 1) the gauge was not functioning at the time of a scheduled measurement, 2) there was a disruption of data transfer at the time of transmission, or 3) there was a temporary failure in the data storage or product generation processes. In addition, the production system may encode a value as missing when the value fails a quality threshold, for example, when the hourly precipitation amount is negative. The issues of missing values in real-time precipitation data

used by the NWS WFOs and RFCs were revisited and

corrective measures applied by reprocessing the original-

format precipitation data and comparing the results with

hourly precipitation products generated in real time.

Near-real-time HADS data are available online for

1 week at the NWS/Office of Hydrologic Development

(OHD) Web site (http://www.nws.noaa.gov/oh/hads/).

Currently, the original-format precipitation data are trans-

ferred to the National Climatic Data Center (NCDC) at

the end of the day. Most of the historical data, collected

since June 1996, are then stored and available for use at

NCDC. Because of the diverse ownership of the net-

works included in HADS, it is difficult to expect uni-

form quality in precipitation measurements and sensor

maintenance. In addition, the locations of the surface

stations are determined by the network owner’s mis-

sion requirements. As a result, the spatial density of the

gauges is highly inhomogeneous and the number of

stations changes over time. On average, about 6200 rain

gauges were available in 2007, while only about 2800

were available in 1996.

The HADS program produces hourly precipitation

data in real time to support operational hydrologic

forecasting at the NWS. For example, the HADS precip-

itation data are used in quantitative precipitation esti-

mation (QPE) such as multisensor precipitation analysis

(Seo and Breidenbach 2002). At least 70% of the hourly

precipitation data used by RFC forecasters are com-

posed of HADS precipitation data. As such, improve-

ments in quality, including reduction of missing values,

contribute directly to the overall improvement of the

QPE product at each RFC.

There are two precipitation-related variables in the

HADS data: cumulative and incremental precipitation

amounts. More than 95% of the gauges have been

reporting cumulative precipitation amounts since the

reset of the value (coded as PC). Less than 5% of the

gauges are reporting incremental precipitation at pre-

specified time intervals (coded as PP). It is simple to

convert PC to PP by subtracting the previous PC value

from the current PC value. When the increment is

60 min, the output measures hourly precipitation and

is usually measured at the top of the hour. If the gauge

reports subhourly PP, the running total of subhourly PP

for 1 h also measures hourly precipitation.
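The PC-to-PP conversion described above can be illustrated with a minimal Python sketch. This is not the HADS or NCDC production code; the function names `pc_to_pp` and `hourly_from_subhourly`, the use of `None` for missing reports, and the rounding to 0.01 are assumptions made for illustration only.

```python
from typing import List, Optional, Sequence


def pc_to_pp(pc: Sequence[Optional[float]]) -> List[Optional[float]]:
    """Convert cumulative precipitation (PC) into increments (PP).

    Each PP value is the current PC minus the previous PC. The first
    report has no predecessor, so its increment is left missing (None).
    """
    pp: List[Optional[float]] = [None]
    for prev, curr in zip(pc, pc[1:]):
        if prev is None or curr is None:
            pp.append(None)  # cannot difference across a missing report
        else:
            pp.append(round(curr - prev, 2))
    return pp


def hourly_from_subhourly(pp: Sequence[Optional[float]],
                          steps_per_hour: int) -> List[Optional[float]]:
    """Running 1-h total of subhourly PP, one value per report time."""
    out: List[Optional[float]] = []
    for i in range(len(pp)):
        window = pp[max(0, i - steps_per_hour + 1): i + 1]
        if len(window) < steps_per_hour or any(v is None for v in window):
            out.append(None)  # incomplete hour: leave as missing
        else:
            out.append(round(sum(window), 2))
    return out
```

For a gauge reporting PC every 15 min, `steps_per_hour` would be 4, and the running total at a top-of-the-hour report equals the hourly PC difference.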

The HADS program produces hourly precipitation

data and makes them available to users. This product is

defined as ‘‘real-time PP,’’ as it is produced in real time.

In the retrospective environment, data are recovered

that would have been dropped in the real-time envi-

ronment. This reprocessed PP output is defined as ‘‘re-

pro PP.’’ In the remainder of the paper, we present the

HADS precipitation data flow to help understand the

staging places of the data and quality control practices.

We discuss the reprocessing steps at NCDC and the

analysis approaches with metrics of the fraction of missing

values and the percentage of top-of-the-hour observa-

tions. We demonstrate the importance of the repro-

cessing by analyzing the diurnal cycle of the precipitation

frequency in a regional domain. Finally, we conclude

with recommendations for future study.

2. Data flow, quality assurance, and control practices

a. Data flow

Figure 1 shows a schematic of the HADS precipita-

tion data and product flow in real time and from the

archive. The HADS program office at OHD collects

data from the DCP owners, produces PP values, and

disseminates the data. In the real-time environment

(solid lines in Fig. 1), both PC and PP are delivered to

users at RFCs, WFOs, and the National Centers for

Environmental Prediction (NCEP). NCEP collects PP

values from both HADS and non-HADS data [e.g.,

Automated Surface Observation System (ASOS) hourly

precipitation] for assimilation and verification purposes

(Lin and Mitchell 2005). Here, a ‘‘real-time PP’’ value is

defined as the product generated in an hourly cycle even

if the station reports subhourly measurements. A his-

torical archive of these values is available from the Na-

tional Center for Atmospheric Research’s (NCAR)

Earth Observing Laboratory (EOL) Web site (http://data.eol.ucar.edu/codiac/dss/id521.004). In the archival

1288 WEATHER AND FORECASTING VOLUME 24

environment, most NWS operational products (texts,

grids, and graphics) are archived at NCDC through the

Service Records Retention System (SRRS). Manually

edited precipitation data created by RFCs and WFOs

are embedded into this data flow. However, not all

RFCs and WFOs report manually edited precipitation

data; hence, the knowledge of the QC process in oper-

ational QPE is not preserved and the product is not

reproducible (e.g., Kursinski and Mullen 2008). Another

archival flow that began in May 2005 is for original-

format PC values to be sent to NCDC. This archival flow

is a part of the reprocessing of HADS precipitation data.

b. Quality control and assurance practices

The quality control and assurance (QC/QA) of HADS

precipitation data were originally designed for real-time

use in order to meet the operational mission of the NWS.

The HADS program staff monitors incoming HADS

data, updates metadata, and isolates obviously prob-

lematic stations. However, the QC/QA of the observed

values is left to the end users.

The operational QC process for the hourly precipi-

tation data at the RFCs follows four levels of QC pro-

cedures as described in Kondragunta and Shrestha

(2006). The first level of the QC process deals with gross

errors caused by instrument malfunction and transmis-

sion and coding–decoding errors due to format and

configuration changes. The second level of the QC pro-

cedure checks for outliers outside of threshold values for

each season and location. The third level uses neigh-

boring gauge data and independent observations for

spatial consistency checks, temporal consistency checks,

and multisensor checks. The last level is left to the ex-

pert judgment of the forecaster. Screening of prob-

lematic data is the most important and time-consuming

duty of the forecasters at the RFCs (J. Bradberry 2005, personal communication).

Gauge precipitation data are often used for the veri-

fication of quantitative precipitation forecasts (QPFs)

from numerical weather prediction (NWP) models.

Tollerud et al. (2005) developed a QC system for HADS

precipitation data to verify model-based precipitation

forecasts. In their work, the QC system was used to

screen out questionable gauge stations. Questionable

gauge measurements that violate internal threshold val-

ues in the QC system are considered to be gross errors.

If the gross errors continue to be present, the gauge is

labeled a ‘‘repeat offender.’’ These repeat offenders

are entered into the list of rejected stations. Data from

rejected stations were not used in the rest of the QC

process. Improved verification scores resulted from the

use of the quality-controlled data. The above system was

developed based on real-time PP data served by NCEP,

half of which are not from the top of the hour.

3. Reprocessing

The reprocessing of HADS hourly precipitation data

begins with the decoding of original-format HADS data

at full resolution as soon as OHD pushes them to NCDC

at the close of the day. The decoded cumulative pre-

cipitation data are checked for temporal inconsistencies

to recover missing values. Then, the detection and cor-

rection of spikes and noise in the hourly data complete

the reprocessing step. At the beginning of a new month,

we repeat the procedure by double-checking the data

inventory and the metadata of the previous month.

a. Data preparation

Each month’s HADS data were parsed for two

precipitation-related variables, PC and PP, using the

NWS’s Standardized Hydrometeorological Exchange For-

mat (SHEF) decoding package (NWS 2002). In this pro-

cess, illegal characters embedded in the SHEF-encoded

HADS data were removed. Occasionally, a misplaced

digit in the SHEF text caused a decoding failure. In these

cases, the misplaced location of the digit was manually

corrected and the decoding step rerun. All decoded PC

values were saved at reported intervals along with sim-

plified metadata that include the following fields: station

name, network owner, latitude, longitude, and measurement interval. In this way, a metadata list was created for each month that excludes stations that do not report precipitation.

FIG. 1. A schematic diagram of HADS real-time and archival product flows. A real-time precipitation product begins at NWS/OHD and is delivered to end users at an RFC or WFO. This product is also stored at NCEP and NCAR for other applications. The HADS program pushes original-format HADS data to NCDC once a day, where they are then reprocessed. Some RFCs report manually edited precipitation data, and they are also archived at NCDC through the SRRS.

The inhomogeneity of the network providers means that not all measurements are reported at the same temporal interval. For example, some networks report measurements at 5-, 15-, and 30-min, as well as hourly, intervals. The subhourly intervals provide an easy way to report hourly measurements at the top of the hour. However, some stations report only at hourly intervals that are offset from the top of the hour (e.g., 15 min past the hour, 30 min past the hour), so that their accumulations misrepresent the observations in hourly precipitation data. We urge caution when using these off-the-top-of-the-hour measurements, and in this paper we separate these off-the-top-of-the-hour measurements from the top-of-the-hour measurements in all analyses. A resulting indicator file shows if the hourly PP is from the top of the hour or off the top of the hour.
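The indicator described above could be derived along the following lines. This is a sketch under stated assumptions rather than the procedure used at NCDC: the function name `hour_flag`, the labels "top" and "off", and the optional `tolerance_min` parameter are hypothetical.

```python
from datetime import datetime


def hour_flag(obs_time: datetime, tolerance_min: float = 0.0) -> str:
    """Label an observation time as top of the hour ("top") or not ("off").

    tolerance_min is a hypothetical allowance, in minutes, on either side
    of the hour; the default of zero accepts only exact top-of-the-hour times.
    """
    minutes = obs_time.minute + obs_time.second / 60.0
    # Distance in minutes to the nearest top of the hour.
    offset = min(minutes, 60.0 - minutes)
    return "top" if offset <= tolerance_min else "off"
```

A station that reports hourly at 15 min past the hour would be flagged "off" for every report, which is the case the text urges caution about.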

The real-time PP process is set up to provide the latest

real-time data to its users. This process, however, does

not ensure that the hourly PP is the top-of-the-hour

measurement. The issue of off-the-top-of-the hour PP

data arises in retrospective hourly analyses and can be

detrimental to specific applications such as hydrologic

forecasting or multisensor quantitative precipitation es-

timation. We have found that some Remote Automated

Weather Station (RAWS) gauges were measuring at off

the top of the hour even though a majority of the RAWS

gauges were measuring on the top of the hour. This is not

a comprehensive picture, as other gauges from other

networks report off the top of the hour too.

b. Restoration of missing values

The most frequently observed quality problem was

that of missing values during nonprecipitating events.

Nonprecipitating events are easily recognized as con-

stant PC values before and after a period of missing

values. During the conversion of PC to PP, strings of

missing values were checked. If both PC values before

and after the missing period were identical, the missing

values were replaced with the same PC value, which

resulted in a zero PP value. Missing periods longer than 24 h were not filled, because an unchanged PC value over such a long gap may indicate a stuck gauge. If

PC values are different, precipitation is assumed, and

values are left as missing even if the difference is as small

as 0.25 mm (0.01 in.). Then, observation times are clas-

sified into 15-min bins to assure that the derived PP is on

the top of the hour. The output of this step is defined as

‘‘baseline PP’’ to distinguish it from real-time PP.
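The restoration rule for nonprecipitating periods can be sketched as follows. The sketch assumes an hourly PC series in which missing values are represented by `None`, so that a gap length in elements equals a gap length in hours; the function name `restore_dry_gaps` and the `max_gap` parameter are hypothetical, and the code is an illustration rather than the authors' implementation.

```python
from typing import List, Optional, Sequence


def restore_dry_gaps(pc: Sequence[Optional[float]],
                     max_gap: int = 24) -> List[Optional[float]]:
    """Fill runs of missing PC values bounded by identical PC values.

    A run is filled only if it is bounded on both sides, is no longer than
    max_gap elements, and the PC values before and after are identical
    (no precipitation). Any other run is left as missing.
    """
    filled = list(pc)
    n = len(filled)
    i = 0
    while i < n:
        if filled[i] is not None:
            i += 1
            continue
        start = i
        while i < n and filled[i] is None:
            i += 1
        end = i  # index of the first valid value after the gap, or n
        bounded = start > 0 and end < n
        if bounded and (end - start) <= max_gap and filled[start - 1] == filled[end]:
            for k in range(start, end):
                filled[k] = filled[start - 1]
    return filled
```

Differencing the filled series then yields zero PP over the restored period, while gaps with unequal bounding values remain missing.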

c. Spikes and noise control

Spikes and noise are nonphysical events. They are

caused by many situations, but the two most common

are a lack of system maintenance and exposure to a se-

vere environment. The DCP system includes gauge in-

struments as well as a datalogger and a transmitter. A

malfunction of any or all of these components can cause

errors of this kind. The HADS metadata do not include

gauge type and system information and, therefore,

controlling spikes and noise requires detection of such

errors in the time series of baseline PP. Such problems

were detected by analyzing baseline PP values for reg-

ular patterns of negative and positive values of equal

size at certain observation times. Then, nonnegativity

constraints were imposed on the PP time series. The

application of the spikes and noise control algorithm

outputs reprocessed hourly precipitation (repro PP).

Figure 2 exemplifies noise in PC values during 20–26

May 2006 at the gauge station in Hungry Horse, Mon-

tana. No rain during the period from 0000 UTC 21 May

through 0500 UTC 25 May 2006 should display a flat line

in its PC values, but there are wiggles in the PC values.

The straightforward derivation of PP values results in a sequence of many −0.01 and +0.01 values. Such noise

has existed since the beginning of our archival record

and covered the period October 1997–March 2008.

Clusters of stations of noisy PC values were found in the

northwestern and northeastern United States.
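A simplified illustration of this kind of noise control is given below. The text does not specify the exact detection rule, so the sketch makes two loudly stated assumptions: adjacent PP values of equal size and opposite sign are treated as noise and zeroed, and any remaining negative value is set to zero to satisfy the nonnegativity constraint. The function name `suppress_paired_noise` and the `tol` parameter are hypothetical.

```python
from typing import List, Optional, Sequence


def suppress_paired_noise(pp: Sequence[Optional[float]],
                          tol: float = 1e-6) -> List[Optional[float]]:
    """Zero out adjacent +x/-x pairs, then clamp negatives to zero.

    Assumption: a nonzero value immediately followed by its negative is
    noise in the PC record, not rain. Missing values (None) are preserved.
    """
    out = list(pp)
    i = 0
    while i < len(out) - 1:
        a, b = out[i], out[i + 1]
        if a is not None and b is not None and a != 0 and abs(a + b) <= tol:
            out[i] = 0.0
            out[i + 1] = 0.0
            i += 2  # skip past the cancelled pair
        else:
            i += 1
    # Nonnegativity constraint on whatever remains.
    return [None if v is None else max(v, 0.0) for v in out]
```

A rule this simple would also cancel a genuine increment that happens to be followed by an equal negative spike, so an operational version would need additional context such as the regularity of the pattern at certain observation times.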

In summary, daily reprocessing steps involve the following:

• decoding of the SHEF-format PC variable at full frequency, and the creation of metadata;

• generation of the top-of-the-hour baseline PP with recovery of some missing values; and

• generation of the repro PP by controlling some spikes and noise in the baseline PP.

On the first day of the month, the previous month’s HADS data are reprocessed to update the monthly metadata and compute each station’s monthly quality flag.

FIG. 2. The time series of accumulated precipitation at the

Hungry Horse, MT (HGHM8), gauge station during a 7-day period

from 0000 UTC 20 May through 2100 UTC 26 May 2006. Apparent

small perturbations make true rain events difficult to detect. Such

noise has existed since the beginning of the archive (October 1997).


4. Assessment of reprocessed HADS

a. Comparison with real-time PP values

Real-time PP data generated by the HADS program

were retrieved from the NCAR EOL site. The two

metrics used for the comparison were the percentage of

missing values and the percentage of top-of-the-hour

measurements during each month of 2003–05. To man-

age the high volume of data, every seventh station from

an alphabetical list of all stations in each of the 48 con-

tiguous United States was subsampled for this assess-

ment. Additionally, stations with more than 7 days of

missing values in either repro PP or real-time PP were

removed. The HADS program was unable to deliver

SHEF-encoded historical HADS data to NCDC for the

months of November 2003 and January 2004. December

2003 contained too many missing values in the real-time

PP to allow for a fair comparison. Figure 3 shows the

distribution of subsampled HADS stations during Sep-

tember 2005. The spatial inhomogeneity is not caused by

the subsampling process, but by the network design.
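The subsampling and screening steps can be sketched as below. The starting position of the every-seventh selection is not stated in the text, so starting from the first station in each state's alphabetical list is an assumption, as are the function names and the use of `None` for missing hourly values.

```python
from typing import Dict, List, Optional, Sequence


def subsample(stations_by_state: Dict[str, Sequence[str]],
              step: int = 7) -> List[str]:
    """Take every `step`-th station from each state's alphabetical list."""
    picked: List[str] = []
    for state in sorted(stations_by_state):
        ids = sorted(stations_by_state[state])
        # Slicing from index 0 keeps at least one station per nonempty state.
        picked.extend(ids[::step])
    return picked


def missing_hours(series: Sequence[Optional[float]]) -> int:
    """Number of missing hourly values in a series."""
    return sum(v is None for v in series)


def keep_station(repro: Sequence[Optional[float]],
                 realtime: Sequence[Optional[float]],
                 max_missing: int = 168) -> bool:
    """Drop a station with more than 7 days (168 h) missing in either product."""
    return (missing_hours(repro) <= max_missing
            and missing_hours(realtime) <= max_missing)


def percent_missing(series: Sequence[Optional[float]]) -> float:
    """Percentage of missing values in a nonempty hourly series."""
    return 100.0 * missing_hours(series) / len(series)
```

Applying `percent_missing` to the repro PP and real-time PP series of each retained station, and averaging over stations for each month, gives the first of the two comparison metrics.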

b. Comparison with COOP daily precipitation

For a detailed comparison of repro PP and real-time

PP, a regional domain (North and South Carolina), dur-

ing the warm season (April–September), was selected.

In this domain, both repro PP and real-time PP were

compared with Cooperative Observer Network (COOP)

daily precipitation data (NCDC 2003). Figure 4 shows

spatial distributions of HADS and COOP stations, and

the average nearest distance between HADS and COOP

stations is about 11 km. Each time series of HADS hourly

precipitation was summed up according to the COOP’s

reported observation time (at the top-of-the-hour) for

24 h. From this process, quality metrics were computed

for two daily time series, HADS (repro PP and real-time

PP) and COOP, for every month. Any HADS–COOP time series pair was removed if the ratio of the two was greater than 3 or less than 1/3, because such a ratio suggests a gross error in the COOP and/or HADS data. Out of 2408 pairs, 344 were removed by this gross error check. If a missing value

was present in the daily COOP data, then the next-

nearest COOP station data (within 50 km) were used.

The differences in the monthly totals between repro PP

and real-time PP were defined as the gain, and the

HADS-to-COOP ratio of the monthly totals was referred

to as the bias ratio, which is a commonly used measure in

QPE. As these statistics are based on the monthly totals,

we excluded the missing values from the calculation of

the monthly accumulation. The gain, bias ratio, and per-

centage of missing values are the three quality metrics

used in the detailed comparison.
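The gain, the bias ratio, and the gross error check defined above can be written compactly. The following Python sketch assumes that each input is one month of daily totals with `None` for missing days, and that the gross error check is applied to the ratio of monthly totals; the function names are hypothetical.

```python
from typing import Optional, Sequence


def monthly_total(values: Sequence[Optional[float]]) -> float:
    """Monthly accumulation, excluding missing values."""
    return sum(v for v in values if v is not None)


def gain(repro: Sequence[Optional[float]],
         realtime: Sequence[Optional[float]]) -> float:
    """Difference in monthly totals: repro PP minus real-time PP."""
    return monthly_total(repro) - monthly_total(realtime)


def bias_ratio(hads: Sequence[Optional[float]],
               coop: Sequence[Optional[float]]) -> Optional[float]:
    """HADS-to-COOP ratio of monthly totals (None if COOP total is zero)."""
    coop_total = monthly_total(coop)
    if coop_total == 0:
        return None
    return monthly_total(hads) / coop_total


def passes_gross_check(ratio: Optional[float]) -> bool:
    """Keep a pair only if its ratio lies between 1/3 and 3."""
    return ratio is not None and 1.0 / 3.0 <= ratio <= 3.0
```

A bias ratio of 1 indicates that the HADS monthly total matches the COOP monthly total, and a positive gain indicates precipitation recovered by reprocessing.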

c. Patterns of missing values and their implications

In general, malfunctions of the rain gauge or electronics at the time of measurement and/or during data transmission cause data to be unavailable at the specified observation time. On the other hand, the data provider deletes observed values that fail quality criteria at the processing level. The two causes must be differentiated so that the users are in control of correcting suspect data. We illustrate two examples: HADS station LLDN7 in July 2003 and MCKN7 in August 2003. The original data were reported at 15-min intervals, so that hourly data on the top of the hour are available. Table 1 shows 15-min decoded PC values, real-time PP values, and reprocessed PP values at LLDN7 on 1 July 2003. During the 5-h period, obvious measurement errors occurred. The reprocessed HADS data restored these values rather than encoding them as missing values. Oftentimes, such gross errors help in the diagnosis of the duration of a disturbance. Table 2 is an example from station ROKN7, where suspect data were incorrectly set to default zero values instead of being encoded as missing values.

FIG. 3. Distribution of subsampled HADS stations during Sep-

tember 2005. The subsampling was made from every seventh sta-

tion selected from an alphabetical list of all stations in the CONUS.

At least one station must be present in each state and stations with

more than seven days (168 h) of missing values are deleted.

FIG. 4. Distribution of all HADS stations available in NC and SC

(solid dots) and COOP daily rain gauge stations (open circles).


Even though the repro PP identified the pattern of spikes

and corrected them, false zero PC values had appeared

as early as October 2001. A history of station quality

should be helpful to users in determining observation

validity, and in the refinement of the QC algorithm.

The occurrence of missing values is hard to charac-

terize when a gauge instrument malfunctions, but we

have observed that a higher frequency of missing values

in real-time PP may be attributed to the latency of the

data ingestion process to the processing environment

at the HADS program office. The recovery of missing

values is possible by reprocessing data from the original

SHEF-formatted archive. We have analyzed the diurnal

cycle of the precipitation frequency (e.g., Dai 1999), one

of the hydroclimate variables, for three warm seasons

in North and South Carolina. A full-blown analysis of

the hydroclimate variables is beyond the scope of this

paper.

5. Results

Direct comparisons between repro PP and real-time

PP are shown in Fig. 5. The monthly average of the

fractional missing values varies from 0.5% to 2% in re-

pro PP, and from 1.7% to 10.1% in real-time PP. Except

for January 2003, repro PP consistently reduced the

missing values, by as much as 9.4% in October 2004.

Overall, the average missing value in repro PP is about

1.0%, which is equivalent to seven missed observations

in 1 month. The improvement in the fractional missing

values from real-time PP to repro PP is possible only

through reprocessing. The fractional percentage of miss-

ing values in repro PP reflects the rate of unrecoverable

missing values due to malfunctions by the gauge and in

data transmission. The top-of-the-hour observations are also important when comparing QPE data from other platforms such as radar. On average, the top-of-the-hour observations are available about 85% of the time in repro PP, while in real-time PP they are available about 50% of the time. The higher rate of off-the-top-of-the-hour observations in real-time PP arises because the HADS program processes the latest available observations to support real-time hydrologic forecasting. The real-time focus means that the HADS data processing produces the hourly estimates as they become available. Thus, many non-top-of-the-hour data in real-time PP are transmitted to the users. The RFC, as a user, applies a narrow time window around the data: ±2 min on PP values and ±10 min on PC values from the top of the hour (J. Bradberry 2008, personal communication). Practically, half of the real-time PP data will be

discarded in the retrospective production of MPE. An

advantage of reanalysis is that many more top-of-the-

hour values are available (Nelson et al. 2008).

TABLE 1. Decoded HADS data, real-time PP, and repro PP for station LLDN7 on 1 Jul 2003. The real-time PP withheld values of 11.92 and 21.60 for having failed the QC check. We denoted these values as NA.

Observation time    Decoded PC    Real-time PP    Repro PP
(UTC, 1 Jul)        (in.)         (in.)           (in.)
0500                17.92         0.00            0.00
0515                17.92
0530                17.92
0545                17.99
0600                18.13         0.21            0.21
0615                18.22
0630                19.85
0645                20.53
0700                20.88         2.75            2.75
0715                20.92
0730                23.06
0745                23.67
0800                32.80         NA              11.92
0815                36.99
0830                41.02
0845                46.92
0900                54.40         NA              21.60
0915                55.42
0930                56.56
0945                58.65
1000                58.66         4.26            4.26

TABLE 2. Decoded HADS data, real-time PP, and repro PP for station ROKN7 on 7 Jun 2004. The real-time PP withheld values of −0.71, but the 0.71 values survived as legitimate.

Observation time    Decoded PC    Real-time PP    Repro PP
(UTC, 7 Jun)        (in.)         (in.)           (in.)
0345                0.71
0400                0.00          NA              0.00
0415                0.71
0430                0.71
0445                0.71
0500                0.71          0.71            0.00
0515                0.71
0530                0.71
0545                0.71
0600                0.00          NA              0.00
0615                0.71
0630                0.71
0645                0.71
0700                0.71          0.71            0.00
0715                0.71
0730                0.71
0745                0.71
0800                0.00          NA              0.00
0815                0.71
0830                0.71
0845                0.71
0900                0.71          0.71            0.00


Figure 6 shows the bias ratio results for both repro PP

and real-time PP for the warm seasons (April–September)

of 2003–05 in the Carolinas. A bias ratio close to unity

indicates close agreement with the COOP data in the

monthly total. The median values of repro PP are closer

to unity than those of the real-time PP. Figure 7 is the

empirical probability function of the gain (repro PP − real-time PP) for all three warm seasons. The function is trimmed between −10 and +11 mm, with a 0.5-mm interval. The distribution shows a positive skewness; namely, repro PP recovered observation values that real-time PP missed. The mean value of the highest probability bin (0.0–0.5 mm) was 0.254 mm. The dashed line shows the fitted probability density function, whose peak is at 0.18 mm.
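The binning behind an empirical probability function of this kind can be sketched as follows. This is an illustration under stated assumptions: the gain is taken as repro PP minus real-time PP in millimeters, values outside the trimmed range are dropped, and the treatment of zero gains in the published figure is not specified in the text. The function name is hypothetical.

```python
import math


def empirical_probability(gains_mm, lo=-10.0, hi=11.0, width=0.5):
    """Bin gains into fixed-width classes on [lo, hi) and normalize to sum to 1.

    Returns a list of (left_edge, probability) pairs, one per class.
    """
    n_bins = int(round((hi - lo) / width))
    counts = [0] * n_bins
    for g in gains_mm:
        if lo <= g < hi:  # trim values outside the plotted range
            index = min(int(math.floor((g - lo) / width)), n_bins - 1)
            counts[index] += 1
    total = sum(counts)
    edges = [lo + i * width for i in range(n_bins)]
    probs = [c / total for c in counts] if total else [0.0] * n_bins
    return list(zip(edges, probs))
```

With the default arguments the function returns 42 classes, and the class with left edge 0.0 corresponds to the 0.0–0.5-mm bin discussed above.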

Figure 8a shows the frequencies of the missing values

in the daily cycle. The missing values in repro PP show a

uniform distribution throughout the day over the 3-yr

period, but those of the real-time PP display certain times

of increased missing values. A disturbing feature of the

real-time PP is the sharp increase in missing values dur-

ing 1800–2300 UTC (1300–1800 local time) during 2004

when warm-season convective rains were active. Figure 8b

shows a sharp drop in rain events in real-time PP against

repro PP at 2100 UTC during 2004. Note that the in-

creased number of missing values during 1800–2300

UTC causes a misinterpretation of the diurnal precipi-

tation pattern. The secondary maximum rain events

during 1200–1500 UTC during 2004 are attributed to the

remnants of Hurricanes Charley, Frances, Ivan, and

Jeanne, which passed through the region in the month

of September. The pattern of the shift during 2005 was a

result of the time reference error in real-time PP. The

4-h shift in real-time PP lasted from 1 July through

11 August 2005.
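The diurnal frequencies of missing values and rain events discussed above amount to counting hourly records by UTC hour. A minimal sketch follows, assuming each record is a (UTC hour, PP value) pair with None marking a missing value; the record layout and function names are hypothetical, while the rule that positive PP values count as rain events follows the Fig. 8 caption.

```python
from collections import Counter


def diurnal_counts(records, predicate):
    """Count records per UTC hour (0-23) whose value satisfies the predicate."""
    counts = Counter(hour for hour, value in records if predicate(value))
    return [counts.get(h, 0) for h in range(24)]


def is_missing(value):
    """A record is missing when no PP value is available."""
    return value is None


def is_rain(value):
    """Positive PP values are counted as rain events."""
    return value is not None and value > 0.0
```

Calling diurnal_counts once with is_missing and once with is_rain, for each product and year, gives the two sets of 24 hourly counts needed for a comparison like the one described.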

The results in this section have potentially large im-

plications for the various applications and analyses. For

example, the recovery of the missing values will provide a

better dataset for studies of finescale climate signals such

as the diurnal pattern of precipitation. Figure 8 shows

that the recovery of the missing values can provide a

dataset that shows a more representative diurnal pattern

of precipitation. In addition, the recovery of the no-rain

events from missing values has implications for direct

comparisons of the hourly rain gauge measurements to

other rainfall measurements such as those from radar

and satellite. Finally, the identification of both the top-of-

the-hour and off-the-top-of-the-hour values in the hourly

precipitation data can have a significant impact in specific

applications such as multisensor precipitation estimation

and the modeling of hydrologic processes at fine scales.

FIG. 5. Two quality metrics comparing repro PP (dark bars) and real-time PP (gray bars) during 2003–05 for the CONUS. (a) Fractional missing values (the smaller the better). (b) Percentage of top-of-the-hour observations (the larger the fraction is, the better the time representation).

FIG. 6. Box plots of the bias ratio (monthly total precipitation comparing HADS to COOP) for the warm seasons for 2003–05. Median values of repro PP (in the dark color box) are closer to unity than those of real-time PP.

6. Conclusions and future research recommendations

The retrospective reprocessing of HADS hourly precipitation data has reduced the average fraction of missing values from 5% in the real-time product down to 1% during the assessment period 2003–05 in the conterminous U.S. (CONUS) domain. This is equivalent

to a recovery of 29 h of missing values per month. The

missing values in the reprocessed product are uniformly

distributed across all hours of the day while the real-

time product displayed a diurnal pattern. In addition,

the reprocessed product improved the availability of the

top-of-the-hour observations from 50% in the real-time

product to 85%. The improved availability of the top-of-

the-hour observations significantly increases the value

of the hourly precipitation data in finescale applications,

for example, data fusion with other high-frequency QPE

methods from radars and satellites. The reprocessed

HADS data are expected to be used as an input source

to the Climate Prediction Center’s extended-period

gridded observations for the detection and diagnostics of

precipitation variations and long-term changes (Higgins

et al. 1996). Currently, reprocessed HADS hourly data

are available from NCDC in a 1-day-delayed mode (see

the appendix).

For future research, we offer the following recom-

mendations:

• Preservation of original data is absolutely required in order to diagnose quality problems. Original SHEF-formatted HADS data made it possible not only to improve the quality of the data, but also to determine the origins of quality problems in the hourly precipitation product.

• A single repository of gauge quality information is necessary in order to improve the quality of the precipitation data. Many RFCs save manual gauge QC results for their service area, but do not share them with other communities, and some network owners apply extra QC measures unknown to other users. The gauge quality Web page can serve as a common tool for both end users and network operators.

• Gauge metadata must be completed in order to assess quality issues. The metadata must include not only geospatial information, but also instrument type and maintenance records, in order to understand the history of the quality problems.

• Reprocessing must utilize product and algorithm version control to allow well-documented transitions to newer techniques.

FIG. 7. Empirical probability function of the gain (repro PP − real-time PP) for all three warm seasons. The function is trimmed between −10 and +11 mm with 0.5-mm class intervals. The distribution shows a skewness toward positive values; namely, repro PP recovered observation values that real-time PP missed. The mean value of the highest probability bin (0.0 to 0.5) was 0.254 mm. The dashed line shows the fitted probability density function with a peak value of 0.18 mm.

FIG. 8. (a) Diurnal patterns of frequencies in missing values

during warm seasons in the NC–SC domain. Solid circles connected

with dashed lines are taken from real-time PP; open circles with

solid lines are taken from repro PP. Real-time PP shows peaks of

missing values at certain hours of the day, while repro PP reflects

more of a uniform distribution in time. The peaks of missing values

in 2004 are from May 2004. (b) As in (a) but for precipitation

frequencies. Positive PP values are counted as rain events.


Acknowledgments. The authors thank Lawrence

Cedrone and the entire NWS/OHD HADS Program

staff who have always been responsive and corrected

problematic HADS gauge reports. The authors ac-

knowledge Arthur Fotos for programming support of

the reprocessed HADS Data Web site. The authors

thank Anne Markel, Tom Peterson, Ed Kearns, and

Xuangang Yin of NCDC for their careful review and

three anonymous reviewers for many suggestions.

APPENDIX

Reprocessed HADS Hourly Precipitation Web Site

For the first time since the inception of the HADS

program, the hourly precipitation data in HADS have

been reprocessed. Reprocessing HADS data has im-

proved the data quality by recovering many missing

values and by choosing top-of-the-hour observations

when subhourly data were available. Currently, version

1.0 HADS-reprocessed PP products are available for

further applications. There were extended periods of

missing values when the retrieval of original-format

HADS data from OHD’s storage system failed, for ex-

ample, December 1996, January 1997, August 1997,

January 1998, June 1998, May 1999, January–April

2000, July–September 2000, December 2000, January–

September 2001, November 2003, and January 2004.

As of January 2008, the initial version of the repro PP data has been posted on the Web so that users can assess their quality and download them (http://www.ncdc.noaa.gov/hads/). The first Web site page guides the user

to enter the month/year and click on the desired U.S.

state. On the next page, the user can choose the desired

HADS station from a map or enter the five-letter station

name, which leads to a time series page.

a. Time series page

The lower two panels on the Web page display rela-

tive locations of neighboring HADS stations (lower-left

panel) and the relative locations of neighboring daily

COOP stations within a 1° × 1° box around the target

HADS station. The user can view the neighboring sta-

tion’s time series by clicking on the HADS location,

where data can be viewed and/or downloaded.

Monthly statistics of HADS–COOP pair data are

viewable by clicking “View Data” below the panel of

neighboring COOP stations. The header displays the

HADS station name, year, month, latitude, longitude,

and number of collocated COOP stations. The 14 col-

umns of each pair are described in Table A1.

b. Mass analysis page

An extensive user interface page can be found by

clicking on the “Mass Analysis” link on the time series

page. This page overlays accumulated precipitation with

neighboring HADS stations using different colors for up

to four stations. The effects of missing values (marked

with black dots), variability of rain events as a function

of distance and direction, and gross errors can be easily

understood.

c. Storm period page

Users can examine storm periods by clicking the “Storm Period” link on the time series page and selecting the desired storm period by entering the start and end times. This page displays time series of target stations as well as storm totals for all available neighboring HADS stations within a 1° × 1° box.

TABLE A1. Description of columns used in the monthly statistics of HADS and COOP.

Column  Description
1       COOP station ID
2       Conversion factor from UTC to local standard time (LST) (add factor to UTC to convert to LST)
3       COOP observation time in LST
4       No. of missing values in daily COOP
5       Monthly sum of daily COOP precipitation data
6       No. of missing values in hourly HADS (−9 when COOP has a missing day)
7       Monthly sum of hourly HADS after shifting from UTC to COOP LST (−99 when COOP has a missing day)
8       No. of cases that were entered into the statistical computation (namely, days on which either COOP or aggregated HADS reported rain >0.01 in.)
9       Mean differences with degrees of freedom in column 8 (in.)
10      Root-mean-squared differences (in.)
11      Ratio of two monthly sums (columns 7 and 5), also called bias ratio
12      Correlation coefficient between daily COOP and aggregated HADS values with the degrees of freedom in column 8 (days on which both report no rain are not entered here)
13      Distance to COOP from HADS (°)
14      Relative angular direction to COOP from HADS (°)
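The pair statistics in columns 8–12 of Table A1 can be sketched as follows. This is an illustration rather than the code behind the Web site: it assumes the hourly HADS values have already been aggregated to the COOP observation day, takes differences as HADS minus COOP, forms the bias ratio as the HADS sum over the COOP sum, and does not model the missing-day flags of columns 6 and 7. The function name and the returned keys are hypothetical.

```python
import math


def pair_statistics(coop_daily, hads_daily, threshold=0.01):
    """Monthly HADS-COOP pair statistics in the spirit of Table A1, columns 8-12.

    Both inputs are equal-length lists of daily totals in inches.
    """
    # Column 8: days on which either gauge reported rain above the threshold.
    pairs = [(c, h) for c, h in zip(coop_daily, hads_daily)
             if c > threshold or h > threshold]
    n = len(pairs)
    if n == 0:
        return {"n": 0}
    diffs = [h - c for c, h in pairs]  # assumed sign: HADS minus COOP
    mean_diff = sum(diffs) / n                        # column 9
    rmsd = math.sqrt(sum(d * d for d in diffs) / n)   # column 10
    coop_sum, hads_sum = sum(coop_daily), sum(hads_daily)
    bias_ratio = hads_sum / coop_sum if coop_sum > 0 else float("nan")  # column 11
    # Column 12: Pearson correlation over the retained pairs.
    mean_c = sum(c for c, _ in pairs) / n
    mean_h = sum(h for _, h in pairs) / n
    cov = sum((c - mean_c) * (h - mean_h) for c, h in pairs)
    var_c = sum((c - mean_c) ** 2 for c, _ in pairs)
    var_h = sum((h - mean_h) ** 2 for _, h in pairs)
    denom = math.sqrt(var_c * var_h)
    corr = cov / denom if denom > 0 else float("nan")
    return {"n": n, "mean_diff": mean_diff, "rmsd": rmsd,
            "bias_ratio": bias_ratio, "corr": corr}
```

A bias ratio near unity from this function corresponds to the close agreement in monthly totals described in the discussion of Fig. 6.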


The Web page is considered experimental until the station quality history and the rescue of missing values are completed. Until that process has been completed, the initial versions of the reprocessed HADS hourly precipitation data remain available (and at higher quality than the real-time data).

REFERENCES

Dai, A., 1999: Recent changes in the diurnal cycle of precipitation over the United States. Geophys. Res. Lett., 26, 341–344.

Higgins, R. W., J. E. Janowiak, and Y.-P. Yao, 1996: A gridded hourly precipitation data base for the United States (1963–1993). NCEP/Climate Prediction Center ATLAS 1, 47 pp.

Kondragunta, C., and K. Shrestha, 2006: Automated real-time operational rain gauge quality controls in NWS hydrologic operations. Preprints, 20th Conf. on Hydrology, Atlanta, GA, Amer. Meteor. Soc., P2.4. [Available online at http://ams.confex.com/ams/pdfpapers/102834.pdf.]

Kursinski, A. L., and S. L. Mullen, 2008: Spatiotemporal variability of hourly precipitation over the eastern contiguous United States from stage IV multisensor analyses. J. Hydrometeor., 9, 3–21.

Lin, Y., and K. E. Mitchell, 2005: The NCEP stage II/IV hourly precipitation analyses: Development and applications. Preprints, 19th Conf. on Hydrology, San Diego, CA, Amer. Meteor. Soc., 1.2. [Available online at http://ams.confex.com/ams/pdfpapers/83847.pdf.]

NCDC, cited 2003: Data documentation for Data Set 3200 (DSI-3200). [Available online at http://www.ncdc.noaa.gov/oa/documentlibrary/.]

Nelson, B., D. J. Seo, and D. Kim, 2008: Multi-sensor precipitation reanalysis. Preprints, Int. Symp. on Weather Radar and Hydrology, Grenoble, France, Laboratoire d’etude des Transferts en Hydrologie et Environnement (LTHE), 02-004, 150 pp. [Available online at http://www.wrah-2008.com/PDF/O2-004.pdf.]

NWS, 2002: Standard hydrometeorological exchange format (SHEF) manual. National Weather Service Manual 10-944. [Available online at http://www.nws.noaa.gov/directives/.]

Seo, D.-J., and J. Breidenbach, 2002: Real-time correction of spatially nonuniform bias in radar rainfall data using gauge measurements. J. Hydrometeor., 3, 93–111.

Tollerud, E., R. Collander, Y. Lin, and A. Loughe, 2005: On the performance, impact, and liabilities of automated precipitation gage screening algorithms. Preprints, 21st Conf. on Weather Analysis and Forecasting, Washington, DC, Amer. Meteor. Soc., P1.42. [Available online at http://ams.confex.com/ams/pdfpapers/95173.pdf.]
