Current Monthly Homogenization Approaches –
Benchmarking their Strengths and Weaknesses
Victor Venema
Content
Global Historical Climate Network (NOAA-GHCNv3)
– Trend (homogenized): 0.8°C per century since 1880
– Raw data: 0.6°C per century
Need independent lines of research
1. Statistical homogenization
2. Physical understanding (parallel measurements)
3. Modelling (UHI, radiation screens)
Homogenisation: WHY?
Example of PAU-UZEIN temperature
1912 PAU-LESCAR (EN) 2005 PAU-UZEIN (AERO)
Slide: Olivier Mestre
Pairwise homogenization
http://variable-variability.blogspot.de/2012/08/statistical-homogenisation-for-dummies.html
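A minimal sketch of the core idea behind pairwise homogenization, assuming one candidate, one neighbour and a single break. The difference series removes most of the shared regional climate signal, and a least-squares search picks the split that a two-segment step function explains best. The synthetic data, noise levels and function names are hypothetical. Real pairwise methods handle multiple breaks and still have to attribute each break found in a pair to one of the two stations.

```python
import random

def difference_series(candidate, neighbour):
    # Subtracting a well-correlated neighbour removes most of the shared
    # regional climate signal, leaving noise plus non-climatic changes.
    return [c - n for c, n in zip(candidate, neighbour)]

def best_single_break(diff):
    # Least-squares search for ONE break: choose the split that maximises the
    # variance explained by a two-segment step function.
    n = len(diff)
    mean_all = sum(diff) / n
    best_k, best_ss = None, -1.0
    for k in range(2, n - 1):  # keep at least two values in each segment
        m_left = sum(diff[:k]) / k
        m_right = sum(diff[k:]) / (n - k)
        ss = k * (m_left - mean_all) ** 2 + (n - k) * (m_right - mean_all) ** 2
        if ss > best_ss:
            best_k, best_ss = k, ss
    return best_k  # index of the first value after the break

# Hypothetical example: 100 annual values, a +1.0 °C jump at index 60
# in the candidate only (e.g. a relocation).
rng = random.Random(42)
regional, level = [], 0.0
for _ in range(100):
    level += rng.gauss(0.0, 0.3)  # shared regional climate signal
    regional.append(level)
candidate = [r + rng.gauss(0.0, 0.2) + (1.0 if i >= 60 else 0.0)
             for i, r in enumerate(regional)]
neighbour = [r + rng.gauss(0.0, 0.2) for r in regional]
break_index = best_single_break(difference_series(candidate, neighbour))
print("detected break at index", break_index)
```

The regional random walk dominates both station series but cancels in the difference, which is why the break becomes detectable.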
HOME validation study
Compare full homogenisation algorithms
Benchmark dataset
– Monthly temperature and precipitation networks
– Most realistic to date
Configuration
– Typical for Europe
– Number of stations: 5, 9, 15
Scatterplots of monthly CRMSE
[Figure: CRMSE of the homogenised data [°C] versus CRMSE of the inhomogeneous data [°C], one panel each for ACMANT, PRODIGE monthly, USHCN main, MASH main, C3SNHT and PMFred abs]
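The CRMSE used on these slides can be sketched as follows, assuming it is the root mean square of the difference "homogenized minus true" after removing the mean of that difference. Removing the mean makes the score insensitive to a constant offset of the whole series, which relative homogenization cannot determine anyway. Dividing by n rather than n − 1 is an assumption, and the function names are hypothetical.

```python
import math

def rmse(homogenized, truth):
    # Ordinary root mean square error.
    d = [h - t for h, t in zip(homogenized, truth)]
    return math.sqrt(sum(x * x for x in d) / len(d))

def crmse(homogenized, truth):
    # Centered RMSE: subtract the mean difference first, so a constant
    # offset of the whole series is not counted as an error.
    d = [h - t for h, t in zip(homogenized, truth)]
    mean_d = sum(d) / len(d)
    return math.sqrt(sum((x - mean_d) ** 2 for x in d) / len(d))

truth = [0.0, 0.0, 0.0, 0.0]
shifted = [1.0, 1.0, 1.0, 1.0]   # constant offset only
wobbly = [1.5, 1.5, 0.5, 0.5]    # offset of 1.0 plus a +/-0.5 step
print(crmse(shifted, truth), rmse(shifted, truth))  # 0.0 1.0
print(crmse(wobbly, truth))                         # 0.5
```

The two scores are linked by RMSE² = (mean difference)² + CRMSE², so the CRMSE is the part of the error that is left after the offset.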
Errors in trends
[Figure, temperature: trend difference [°C/100a] for ACMANT, SNHT DWD, C3SNHT, PMFred abs, PMTred rel, AnClim main, USHCN cx8, USHCN 52x, USHCN main, PRODIGE trendy, PRODIGE monthly, PRODIGE main, MASH main and Inhom. data]
[Figure, precipitation: trend difference [mm/100a] for Climatol, C3SNHT, PMFred abs, PMTred rel, AnClim main, PRODIGE trendy, PRODIGE monthly, PRODIGE main, MASH main and Inhom. data]
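A sketch of how such a trend difference can be computed, assuming an ordinary least-squares linear trend and the sign convention "homogenized minus true" (both assumptions; the slides do not state them). The "Inhom. data" bar corresponds to passing the inhomogeneous series instead of the homogenized one. The series below are hypothetical.

```python
def trend_per_century(years, values):
    # Ordinary least-squares slope, converted from per year to per 100 years.
    n = len(years)
    my = sum(years) / n
    mv = sum(values) / n
    cov = sum((y - my) * (v - mv) for y, v in zip(years, values))
    var = sum((y - my) ** 2 for y in years)
    return 100.0 * cov / var

def trend_error(years, homogenized, truth):
    # Positive: the homogenized series has a larger trend than the truth.
    return trend_per_century(years, homogenized) - trend_per_century(years, truth)

years = list(range(1900, 2000))
truth = [0.008 * (y - 1900) for y in years]  # 0.8 °C per century
# Hypothetical break of +0.3 °C from 1950 that homogenization missed.
homogenized = [t + (0.3 if y >= 1950 else 0.0) for y, t in zip(years, truth)]
print(round(trend_error(years, homogenized, truth), 3))
```

A single missed break in the middle of a 100-year series already shifts the trend by 0.45 °C per century here, more than the size of the break itself.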
Lessons
Modern methods a factor 2 more accurate
– Multiple breakpoint methods
– Methods that are designed to work with inhomogeneous reference series
Training is important
Automatic methods as good as manual methods
– No metadata in validation dataset
SNHT is not recommended
Absolute homogenization is method of last resort
Decomposition method on Benchmark
Domonkos, P., V. Venema, O. Mestre. Efficiencies of homogenisation methods: our present knowledge and its limitation. Proceedings of the Seventh seminar for homogenization and quality control in climatological databases, Budapest, Hungary, 24–28 October 2011, WMO report, Climate data and monitoring, WCDMP-No. 78, pp. 11-24, 2013.
[Figure panels: RMSE station trends; CRMSE; annual data]
Caveats HOME: ISTI
1. Missing homogenization methods
– Two- or multi-phase regression method
2. Break sizes (random walk or noise) – Ralf Lindau and Victor Venema. The joint influence of break and noise variance on the break detection capability in time series homogenization.
3. Signal to noise ratio varies regionally
4. Regional trends (absolute homogenization)
5. Length of the series – Ralf Lindau and Victor Venema. On the multiple breakpoint problem and the number of significant breaks in homogenisation of climate records. Idojaras, 117, no. 1, pp. 1-34, 2013.
6. Non-climatic trend bias
International Surface Temperature Initiative – Kate Willett et al. A framework for benchmarking of homogenisation algorithm performance on the global scale. Geosci. Instrum. Method. Data Syst., 3, pp. 187-200, 2014.
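Caveat 2 distinguishes two models for the sizes of breaks. A minimal sketch, assuming Gaussian offsets: in the "noise" model every homogeneous subperiod gets an independent offset around a common mean; in the "random walk" model every break adds an independent jump to the previous offset. The function names and parameter values are hypothetical.

```python
import random
import statistics

def segment_levels(n_breaks, sd, model, rng):
    # Offsets of the n_breaks + 1 homogeneous subperiods.
    if model == "noise":
        # Independent offset for every subperiod.
        return [rng.gauss(0.0, sd) for _ in range(n_breaks + 1)]
    if model == "random_walk":
        # Each break adds an independent jump to the previous offset.
        levels = [0.0]
        for _ in range(n_breaks):
            levels.append(levels[-1] + rng.gauss(0.0, sd))
        return levels
    raise ValueError("model must be 'noise' or 'random_walk'")

def inhomogeneity_signal(n, break_positions, levels):
    # Expand subperiod offsets into a step function of length n;
    # a break at position p means the new level starts at index p.
    signal, seg = [], 0
    positions = sorted(break_positions)
    for i in range(n):
        while seg < len(positions) and i >= positions[seg]:
            seg += 1
        signal.append(levels[seg])
    return signal

rng = random.Random(3)
last_noise = [segment_levels(10, 1.0, "noise", rng)[-1] for _ in range(2000)]
last_walk = [segment_levels(10, 1.0, "random_walk", rng)[-1] for _ in range(2000)]
# Noise model: variance stays near sd**2; random walk: grows with the number of breaks.
print(statistics.pvariance(last_noise), statistics.pvariance(last_walk))
```

With ten breaks the last subperiod of the random walk has roughly ten times the offset variance of the noise model, so the two models imply very different long-term trend errors.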
Radiation error
Climates with the largest radiation errors:
* Strong insolation
* Low wind
* Dry ground
* High specific humidity
Parallel measurements
Transition to Stevenson screens
North-West Europe: < 0.2°C (Various, Parker)
Basel, Switzerland: 0°C (Wild screen)
Kremsmünster, Austria: 0.2°C (North-wall)
Adelaide, South Australia: 0.2°C (Glaisher stand)
Spain: 0.35°C (French screen)
Sri Lanka: 0.37°C (Tropical screen)
India: 0.42°C (Tropical screen)
Sources of global temperature trend bias
Transition to Stevenson screens
Transition to Automatic Weather Stations
Urbanization
Siting
Irrigation
Relocations to airports
Research on parallel data
Large database with parallel measurements needed to study daily inhomogeneities
o Study statistical and physical properties of (daily) inhomogeneities
o Dependence on local weather and regional climate
o Most studies are currently about mid-latitudes
o Validate detected inhomogeneities
o Independent evidence for trend bias
Parallel Data Initiative
Produce an open database
Initially data is restricted to contributors
– Incentive to contribute
– Until first joint paper(s) by contributors are written
First action: Inventory of parallel datasets
– https://ourproject.org/moin/projects/parallel
– Dozens of datasets available
More information
– http://tinyurl.com/paralleldata
Conclusions & outlook
Statistical homogenization improves temperature trend estimates – Only the best methods improve precipitation trends
Modern homogenization methods more accurate
1. Statistical homogenization – Global validation study missing
– Better mathematical understanding of the methods
2. Better physical understanding of causes – http://tinyurl.com/paralleldata
3. More modelling to improve understanding
Q&A slides
Shorter length, less certainty
[Figure: exceedance probability (1/128 to 1/4), panels for n = 21 years and n = 101 years]
Ralf Lindau and Victor Venema. On the multiple breakpoint problem and the number of significant breaks in homogenisation of climate records. Idojaras, Quart. journal Hungarian Meteorol. Service, 117, no. 1, pp. 1-34, 2013.
Which SNR is sufficient?
RMS skill for random segmentation (o) and standard search (+) for different SNRs.
So far we considered SNR = ½.
Random segmentation and standard search have comparable skills.
Only for SNR > 1 is the standard search significantly better.
Surrogate temperature section
Generated homogeneous temperature networks
– Stochastic modelling
– Based on statistical properties of homogenized data
Configuration
– Typical for Europe
– 15 networks
– Length: 100 years
– Number of stations: 5, 9, 15
Added non-climatic changes
– Most realistic to date
– Missing data at the beginning and during WWII
– Outliers
– Breaks
– Simultaneous breaks
– Local trends
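A simplified sketch of this benchmark recipe: generate a homogeneous, correlated network and then add breaks, simultaneous breaks, local trends, outliers and missing data. HOME used a more elaborate stochastic model based on homogenized data; here the Gaussian white-noise model, all parameter values, the 30% share of stations hit by the simultaneous break and the 1940–1945 gap are assumptions for illustration, and the function names are hypothetical.

```python
import random

YEARS = list(range(1900, 2000))  # 100 years, as in the benchmark configuration

def homogeneous_network(n_stations, rng, sd_regional=0.5, sd_local=0.3):
    # Correlated stations: a shared regional signal plus independent local noise.
    regional = [rng.gauss(0.0, sd_regional) for _ in YEARS]
    return [[r + rng.gauss(0.0, sd_local) for r in regional]
            for _ in range(n_stations)]

def add_step(series, start, size):
    # Break: constant shift from index `start` to the end of the series.
    for i in range(start, len(series)):
        series[i] += size

def add_inhomogeneities(network, rng, n_breaks=5, sd_break=0.8,
                        p_outlier=0.01, p_local_trend=0.1, p_simultaneous=0.3):
    n = len(YEARS)
    # One network-wide change (e.g. a new screen type) with the same year and size.
    common_start = rng.randrange(10, n - 10)
    common_size = rng.gauss(0.0, sd_break)
    result = []
    for station in network:
        s = list(station)
        for _ in range(n_breaks):  # independent station breaks
            add_step(s, rng.randrange(1, n), rng.gauss(0.0, sd_break))
        if rng.random() < p_simultaneous:
            add_step(s, common_start, common_size)
        if rng.random() < p_local_trend:  # gradual change, e.g. urbanization
            start, slope = rng.randrange(0, n - 20), rng.uniform(0.01, 0.03)
            for i in range(start, n):
                s[i] += slope * (i - start)
        for i in range(n):  # rare large outliers
            if rng.random() < p_outlier:
                s[i] += rng.choice((-3.0, 3.0))
        # Missing data last, so no arithmetic touches None: a gap at the
        # start of the record and a gap during WWII.
        for i in range(rng.randrange(0, 20)):
            s[i] = None
        for i, year in enumerate(YEARS):
            if 1940 <= year <= 1945:
                s[i] = None
        result.append(s)
    return result

rng = random.Random(2024)
network = add_inhomogeneities(homogeneous_network(9, rng), rng)
print(sum(v is None for v in network[0]), "missing values in the first station")
```

Because the truth (the homogeneous network) is kept, any homogenization method run on `network` can afterwards be scored against it.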
Physical causes of inhomogeneities
Shelter type, exposure
– Radiation & wetting protection
– Natural or forced ventilation
– Snow cover
– Plastic screen: insolation on hot days
Relocation of station
– City-> airport, suburbs, lower heights
– Deurbanisation of network
Instrument
– Response, integration time
– Zero drift: glass shrinking in the initial years
– Calibration errors
– Temperature out of range
– Quicksilver thermometers: T < -39°C
Changes in surroundings
– Urbanization, growing vegetation, irrigation
Definitions
– Computation of daily means
Measurement procedures
– Reading times
Maintenance procedures
– AWS: Icing, damage detection
– Painting & cleaning schedule
Digitisation & database
– Minus sign forgotten
– Station names mixed up
– Pre-homogenised data
Correction methodology - inflation
Corrections have deterministic (explained variance) and stochastic (unexplained) components
Downscaling: problems with deterministic corrections
– Variance inflation (Von Storch, 1999)
– Quantile Matching (Maraun, 2013)
– Unintended change of the trend in the mean
Unexplained variance should be corrected with noise
Homogenization – Trend in the difference time series is small
– Gradual inhomogeneities (urbanization)
Maraun, D. Bias correction, quantile mapping, and downscaling: revisiting the inflation issue. J. Clim., 26, pp. 2137-2143, doi: 10.1175/JCLI-D-12-00821.1, 2013.
Von Storch, H. On the Use of ‘‘Inflation’’ in Statistical Downscaling. J. Clim., 12, pp. 3505-3506, 1999.
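A minimal sketch of the variance problem, assuming a linear regression between parallel old and new measurements as the deterministic correction (an assumption; the slides do not prescribe a regression model, and all numbers are hypothetical). The deterministic prediction has only the explained variance, so it is too smooth; adding noise with the residual spread restores the variance instead of inflating the deterministic part.

```python
import random
import statistics

def fit_linear(x, y):
    # Least-squares intercept and slope of y on x.
    mx, my = statistics.fmean(x), statistics.fmean(y)
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return my - slope * mx, slope

rng = random.Random(1)
# Hypothetical parallel measurements: the new instrument depends partly on the
# old one (deterministic part) and partly on unexplained noise (stochastic part).
old = [rng.gauss(10.0, 3.0) for _ in range(20000)]
new = [0.5 + 0.9 * o + rng.gauss(0.0, 1.0) for o in old]

intercept, slope = fit_linear(old, new)
deterministic = [intercept + slope * o for o in old]
residual_sd = statistics.pstdev([v - d for v, d in zip(new, deterministic)])
# Stochastic correction: deterministic part plus noise with the residual spread.
stochastic = [d + rng.gauss(0.0, residual_sd) for d in deterministic]

print(statistics.pstdev(new), statistics.pstdev(deterministic),
      statistics.pstdev(stochastic))
```

The deterministic correction keeps the mean but loses the unexplained variance, which matters for extremes in daily data.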
Correction – change in noise source
Change in cross-correlation
– Relocation, change in noise source
Simple example
– |N1| = |N2|
– No inhomogeneity in distribution
– Jump in difference time series
[Figure: Station 1 and Station 2 written as sums of R, W and N: R + W1 + N1, R + W1 + N2 and R + W2 + N1]
R: regional climate signal; N: instrument-specific error; W: station-specific weather
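A sketch of this simple example under one reading of the figure (an assumption): station 1 switches from R + W1 + N1 to R + W1 + N2, while station 2 stays R + W2 + N1, so before the change both stations share the instrument error N1. With |N1| = |N2| the distribution of station 1 does not change, but the difference series does. All terms are assumed Gaussian with hypothetical sizes; in this sketch the change shows up as a jump in the spread of the difference series.

```python
import random
import statistics

rng = random.Random(7)
n = 10000  # number of values before and after the change

def gauss_series(sd):
    return [rng.gauss(0.0, sd) for _ in range(2 * n)]

R = gauss_series(1.0)    # regional climate signal, shared by both stations
W1 = gauss_series(0.5)   # station-specific weather at station 1
W2 = gauss_series(0.5)   # station-specific weather at station 2
N1 = gauss_series(0.5)   # instrument error shared by both stations at first
N2 = gauss_series(0.5)   # new instrument error: same size, different source

# Station 1 changes its noise source at index n; station 2 keeps N1.
station1 = [R[i] + W1[i] + (N1[i] if i < n else N2[i]) for i in range(2 * n)]
station2 = [R[i] + W2[i] + N1[i] for i in range(2 * n)]
diff = [a - b for a, b in zip(station1, station2)]

sd1_before, sd1_after = statistics.pstdev(station1[:n]), statistics.pstdev(station1[n:])
sdd_before, sdd_after = statistics.pstdev(diff[:n]), statistics.pstdev(diff[n:])
# Theoretical spreads: station 1 about 1.22 in both periods;
# difference series sqrt(0.5) before and 1.0 after the change.
print(sd1_before, sd1_after)
print(sdd_before, sdd_after)
```

Homogenization methods that only look at the station series itself would see no change here; the change is only visible in the cross-correlation with the neighbour.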
Research on parallel data
Large database with parallel measurements needed to study daily inhomogeneities
o Generate benchmark data with realistic inhomogeneities
o For example, second cycle of ISTI
o Validate detected inhomogeneities
Exposure
Insolation
– Sun, hot ground, scattered radiation
Humidity and clouds
– Infrared radiative cooling
Wind
– Heat exchange
Design
– Size sensor
– Shielding
– Mechanical ventilation
Australia: Albany airport and town
Trewin (2012)
Parallel measurements – Kremsmünster
Böhm et al. (2010)
Kremsmünster – differences in percentiles
Böhm et al. (2010)
Spain: Montsouris screen, Stevenson observations, Stevenson automatic
Montsouris vs. Stevenson: difference as a function of Diurnal Temperature Range and Tmax
Murcia: South East Spain, Mediterranean.
La Coruña (Corunna): North West Spain, Atlantic.
July
April
Motivation: daily data
“[Inhomogeneous data] affects, in particular, the
understanding of extremes, because
changes in extremes are often more sensitive to
inhomogeneous climate monitoring practices
than changes in the mean.”
Trenberth, K.E., et al., 2007: Observations: Surface and Atmospheric Climate Change. In: Climate
Change 2007: The Physical Science Basis. Cambridge University Press, Cambridge, United Kingdom
and New York, NY, USA.
Extremes, mean and variability
Importance of changes in variability and mean
The relative sensitivity of an extreme to changes in the mean (dashed line) and in the standard deviation (solid line) for a given temperature threshold (x-axis). The relative sensitivity to the mean (standard deviation) is the change in probability of an extreme event per change in the mean (or standard deviation), divided by its probability. From Katz and Brown (1992).
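The caption's definition can be written out for a Gaussian distribution, which is an assumption of this sketch. With z = (x − μ)/σ and exceedance probability p = P(X > x), the derivatives are dp/dμ = φ(z)/σ and dp/dσ = z·φ(z)/σ; dividing by p gives the two relative sensitivities. Their ratio is z, so for thresholds more than one standard deviation above the mean, the sensitivity to the standard deviation exceeds that to the mean. The function names are hypothetical.

```python
import math

def exceedance_probability(threshold, mu=0.0, sigma=1.0):
    # P(X > threshold) for a Gaussian variable.
    z = (threshold - mu) / sigma
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def relative_sensitivities(threshold, mu=0.0, sigma=1.0):
    # Derivatives of the exceedance probability with respect to the mean and
    # the standard deviation, each divided by the probability itself.
    z = (threshold - mu) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    p = exceedance_probability(threshold, mu, sigma)
    to_mean = pdf / (sigma * p)
    to_sd = z * pdf / (sigma * p)
    return to_mean, to_sd

for x in (0.5, 1.0, 2.0, 3.0):
    m, s = relative_sensitivities(x)
    print(f"threshold {x}: mean {m:.2f}, sd {s:.2f}")
```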
A priori formula
The different reactions of breaks and noise to randomly inserted breaks make it possible to estimate the break variance and the number of breaks a priori.
If we insert many breaks, almost the entire break variance is explained, plus a known fraction of the noise.
At k = nk, half of the break variance is reached (22.8% in total).
No bias component.
[Figure annotations: 0.228, 3.1]
A blind test of
monthly homogenisation algorithms
Victor Venema, O. Mestre, E. Aguilar, I. Auer, J. A. Guijarro, P. Domonkos, G. Vertacnik,
T. Szentimrey, P. Stepanek, P. Zahradnicek, J. Viarre, G. Müller-Westermeier, M. Lakatos,
C. N. Williams, M. Menne, R. Lindau, D. Rasol, E. Rustemeier, K. Kolokythas, T. Marinova,
L. Andresen, F. Acquaotta, S. Fratianni, S. Cheval, M. Klancar, M. Brunetti, C. Gruber, M. Prohom Duran,
T. Likso, P. Esteban, T. Brandsma
Meteorological Institute, Bonn
Participants returned the data
25 blind contributions
Some algorithms multiple contributions
– Test versions
– Test influence operator (manual methods)
Algorithms/software
– USHCN
– PRODIGE
– MASH
– Craddock
– AnClim
– RhTestV2
– SNHT
– Climatol
– ACMANT
Monthly CRMSE complete contributions
[Figure, temperature: CRMSE [°C] for ACMANT, SNHT DWD, C3SNHT, PMFred abs, PMTred rel, AnClim main, USHCN cx8, USHCN 52x, USHCN main, PRODIGE trendy, PRODIGE monthly, PRODIGE main, MASH main and Inhom. data]
[Figure, precipitation: CRMSE [mm] for Climatol, C3SNHT, PMFred abs, PMTred rel, AnClim main, PRODIGE trendy, PRODIGE monthly, PRODIGE main, MASH main and Inhom. data]
Decadal CRMSE complete contributions
[Figure, temperature: CRMSE [°C] for ACMANT, SNHT DWD, C3SNHT, PMFred abs, PMTred rel, AnClim main, USHCN cx8, USHCN 52x, USHCN main, PRODIGE trendy, PRODIGE monthly, PRODIGE main, MASH main and Inhom. data]
[Figure, precipitation: CRMSE [mm] for Climatol, C3SNHT, PMFred abs, PMTred rel, AnClim main, PRODIGE trendy, PRODIGE monthly, PRODIGE main, MASH main and Inhom. data]
Contingency scores
Contribution         Stations  POD   POFD  Peirce SS  Heidke SS  Heidke special
MASH main            111       0.63  0.09  0.53       0.31       -0.20
PRODIGE main         111       0.35  0.02  0.33       0.35        0.41
PRODIGE monthly      111       0.39  0.02  0.37       0.40        0.44
PRODIGE trendy       111       0.35  0.02  0.32       0.35        0.41
USHCN main           111       0.34  0.00  0.33       0.46        0.61
USHCN 52x            111       0.40  0.01  0.39       0.51        0.62
USHCN cx8            111       0.35  0.01  0.35       0.47        0.61
AnClim main          111       0.18  0.03  0.15       0.16        0.20
iCraddock Vertacnik  55        0.60  0.03  0.57       0.54        0.49
PMTred rel           111       0.41  0.04  0.37       0.34        0.27
PMFred abs           111       0.21  0.01  0.20       0.27        0.46
C3SNHT               111       0.23  0.05  0.18       0.16        0.04
SNHT DWD             111       0.12  0.01  0.11       0.15        0.40
Climatol             111       0.38  0.01  0.37       0.45        0.55
ACMANT               111       0.50  0.03  0.47       0.44        0.41
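The first four scores can be computed from a standard 2×2 contingency table of hits, false alarms, misses and correct negatives, sketched below with the usual textbook formulas. How detected breaks were matched to inserted breaks (for example any tolerance window) is not given on the slide, so the counts here are hypothetical, and the "Heidke special" column is not reproduced. The table is consistent with Peirce SS = POD − POFD up to rounding (e.g. PRODIGE main: 0.35 − 0.02 = 0.33).

```python
def contingency_scores(hits, false_alarms, misses, correct_negatives):
    a, b, c, d = hits, false_alarms, misses, correct_negatives
    pod = a / (a + c)    # probability of detection
    pofd = b / (b + d)   # probability of false detection
    peirce = pod - pofd  # Peirce skill score
    # Heidke skill score: accuracy relative to random chance.
    heidke = 2.0 * (a * d - b * c) / ((a + c) * (c + d) + (a + b) * (b + d))
    return pod, pofd, peirce, heidke

# Hypothetical counts: 100 inserted breaks, half of them found, 10 false alarms.
print(contingency_scores(50, 10, 50, 890))
```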
Pairwise vs composite reference
Composite reference
– Compute a weighted average of neighbours
– Reduces the influence of IH in single stations
– Careful selection of stations needed: no large breaks for detection, no breaks for corrections
Pairwise
– Need to attribute breaks found in the pairs to a station
– Solution to this problem is still ad-hoc or manual
– Potential for optimal mathematical solution
– Joint detection: all stations simultaneously
– Solving combinatorial problem for large breaks
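A minimal sketch of a composite reference, assuming neighbours are weighted by their squared correlation with the candidate (one common choice, labelled here as an assumption) and that the weighted neighbour anomalies are put on the candidate's mean level. Correlations are computed on the raw series for brevity; in practice anomalies or first differences are often used, and the station selection the slide mentions is not done here. Function names are hypothetical.

```python
def mean(xs):
    return sum(xs) / len(xs)

def pearson(x, y):
    # Pearson correlation coefficient.
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def composite_reference(candidate, neighbours):
    # Weighted average of neighbour anomalies (weights: squared correlation),
    # shifted to the mean level of the candidate.
    weights = [pearson(candidate, nb) ** 2 for nb in neighbours]
    total = sum(weights)
    means = [mean(nb) for nb in neighbours]
    level = mean(candidate)
    return [level + sum(w * (nb[t] - m)
                        for w, nb, m in zip(weights, neighbours, means)) / total
            for t in range(len(candidate))]

candidate = [1.0, 2.0, 3.0, 4.0, 5.0]
reference = composite_reference(candidate, [[2.0, 3.0, 4.0, 5.0, 6.0]])
difference = [c - r for c, r in zip(candidate, reference)]
print(difference)  # a neighbour that differs only by an offset leaves no signal
```

Averaging several neighbours dilutes a break in any single one of them, which is the advantage listed above; the cost is that a break in the difference series can no longer be traced to one neighbour.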