Post on 17-Dec-2015
Helper Variables
Why do we want them?
How do we create them?
What to avoid with them?
Tom Pagano, tom.pagano@por.usda.gov, 503-414-3010
Why helper variables?
The target (i.e. predictand) time series may have holes in important years
or a short period of record.
If the missing data are easily estimated, filling the gaps may lead to a
better, or at least more honest, forecast.
Sargents is missing data during a hydrologically interesting period. This is also the period covered by most of our predictors (i.e. SNOTEL). Gunnison could be used to fill in the gaps.
The strength of the correlation is very good.
Why?
Another example… seasonally operated gages
Correlation between Mar-Sep and Apr-Sep = 0.9996
No point in throwing away years where only March is missing.
Helper variable interface
Neat stuff here, but don’t touch if you don’t know what you’re doing.
Default is unchecked.
Main ways to use helper variables
Different station, same months (upstream vs downstream): estimating one gage from another
Same station, different months (May-Jul vs Apr-Jul): estimating a longer time period from a shorter one
Same station and months, different sources (USGS vs AWDB): estimating natural flow from observed
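All three uses come down to the same step: fit a relationship between helper and target over the years they share, then estimate the target for years where only the helper exists. The sketch below assumes a simple linear least-squares fit, which may not be what the forecasting software does internally. The function names (fit_line, fill_gaps) and the flow values are hypothetical and made up for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of ys = intercept + slope * xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def fill_gaps(target, helper):
    """Fill missing target years from the helper.

    Both inputs map year -> flow volume, with None marking a missing year.
    Returns the filled series and the list of years that were estimated.
    """
    # Fit only on years where both series have data.
    overlap = [yr for yr in target
               if target[yr] is not None and helper.get(yr) is not None]
    intercept, slope = fit_line([helper[yr] for yr in overlap],
                                [target[yr] for yr in overlap])
    filled, estimated = {}, []
    for yr, value in target.items():
        if value is None and helper.get(yr) is not None:
            filled[yr] = intercept + slope * helper[yr]
            estimated.append(yr)
        else:
            filled[yr] = value
    return filled, estimated


# Made-up illustrative volumes, not real gage data.
helper = {2001: 100.0, 2002: 150.0, 2003: 80.0, 2004: 120.0, 2005: 200.0}
target = {2001: 50.0, 2002: 75.0, 2003: None, 2004: 60.0, 2005: 100.0}
filled, estimated = fill_gaps(target, helper)
```

Returning the list of estimated years alongside the filled series makes it easy to see how much of the record is observed and how much is inferred.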
Helper vs target scatterplot
Helper used
Wider range of years…
More stable relationship
More consistent with nearby forecasts
Dangers of helper variables
Statistically, the uncertainty from the imperfect relationship between helper and original target is not included in the final forecast error bounds.
Estimated years are treated as if they were observed, so we increase our chances of overconfident forecasts.
Therefore, it is best to estimate only a few years, and only if the relationship is very good (e.g. r² > 0.9).
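This rule of thumb can be written as a quick check before any gap filling. The sketch below is only an illustration: the r² > 0.9 threshold comes from the slide, but the cap on the share of estimated years (20%) is an assumed stand-in for "a few years", and the function names are hypothetical.

```python
def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy ** 2 / (sxx * syy)


def helper_is_acceptable(helper_vals, target_vals, n_to_estimate,
                         min_r2=0.9, max_estimated_fraction=0.2):
    """Check the two conditions: strong relationship, few estimated years.

    helper_vals and target_vals are the overlapping (observed) years.
    max_estimated_fraction is an assumed cutoff, not a value from the slides.
    """
    r2 = r_squared(helper_vals, target_vals)
    total_years = len(target_vals) + n_to_estimate
    fraction = n_to_estimate / total_years
    return r2 > min_r2 and fraction <= max_estimated_fraction


# Made-up illustrative volumes, not real gage data.
helper_vals = [100.0, 150.0, 80.0, 120.0, 200.0]
good_target = [52.0, 74.0, 41.0, 61.0, 99.0]   # tracks the helper closely
poor_target = [52.0, 40.0, 90.0, 61.0, 60.0]   # weak relationship

ok_few = helper_is_acceptable(helper_vals, good_target, n_to_estimate=1)
too_many = helper_is_acceptable(helper_vals, good_target, n_to_estimate=5)
weak = helper_is_acceptable(helper_vals, poor_target, n_to_estimate=1)
```

Passing this check does not remove the overconfidence problem; it only limits how much unaccounted-for error enters the forecast.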
Consider too whether the relationship between the helper and the original target
is stable versus time…
For example… Use observed flow as helper
to estimate natural flow. Have the regulations changed over time?
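One rough way to look for a change over time is to fit the helper-target relationship separately for an early and a late period and compare the slopes. The sketch below assumes a linear fit and a hand-picked split year. The function names and the numbers are hypothetical, and each period needs at least two years with differing helper values.

```python
def fit_slope(xs, ys):
    """Least-squares slope of ys regressed on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def slopes_by_period(years, helper_vals, target_vals, split_year):
    """Return (early_slope, late_slope), splitting the record at split_year."""
    early = [(h, t) for y, h, t in zip(years, helper_vals, target_vals)
             if y < split_year]
    late = [(h, t) for y, h, t in zip(years, helper_vals, target_vals)
            if y >= split_year]
    early_slope = fit_slope([h for h, _ in early], [t for _, t in early])
    late_slope = fit_slope([h for h, _ in late], [t for _, t in late])
    return early_slope, late_slope


# Made-up volumes in which the relationship shifts after 1995,
# as might happen if regulation of the river changed.
years = [1990, 1991, 1992, 1993, 2000, 2001, 2002, 2003]
observed = [100.0, 150.0, 80.0, 120.0, 110.0, 160.0, 90.0, 130.0]
natural = [50.0, 75.0, 40.0, 60.0, 88.0, 128.0, 72.0, 104.0]

early_slope, late_slope = slopes_by_period(years, observed, natural, 1995)
```

A large difference between the two slopes suggests the helper should only be used for years from the same regime as the years being estimated, or not at all.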