Climate Data Quality Control
U. R. Joshi
National Data Centre
India Meteorological Department
Pune
Outline
Introduction
Data Quality Control
• General considerations
• Specific data: Surface, Upper-air, Autographic
QC Flags
Introduction
Information about the weather has been observed and recorded for many
centuries.
With the development of instrumentation to quantify meteorological parameters,
much more data were collected and needed to be kept. This need paved
the way for the organized management of climate data.
Since the 1940s, standardized forms and procedures have gradually become more
prevalent amongst recorders. These forms greatly assisted the computerized
data entry process and, consequently, the development of computer data
archives.
The latter part of the twentieth century saw the routine exchange of weather
data in digital form, and many meteorological and related data centers came
into existence.
Direct capture and storage of these data in databases has become a common
feature.
Introduction
Data produced from meteorological and climatological networks represent a valuable and often unique resource, acquired with substantial expenditure of time, money and effort.
Data observed today may have limited use for present-day requirements, but may become immensely important later as new applications emerge, long after the information was acquired.
The main interest in the use of observed climatological data is not to simply describe the data, but to make inferences from the data that are helpful to users of climatological information.
India Meteorological Department (IMD) has a large holding of meteorological data (about 400 million records) collected over the last 125 years. These data consist of surface, upper-air, radiation, rainfall, agro-meteorological and other observations. They are archived at the National Data Centre (NDC).
Data Reception at NDC
AMSS – CliSys (SYNOPs, METARs, AWS, etc.)
Scrutinized data through e-mail from Technical Sections of RMCs / MCs
Climate Data Reception at NDC
IMD has 6 Regional Meteorological Centres (RMC) and 19 Meteorological Centres (MC). Each centre has a group of observatories under its control.
Each observatory sends data to the respective RMC / MC.
These are fed to the Automatic Message Switching Service (AMSS) and are transmitted to a central location, i.e. Regional Telecom Hub (RTH) at Delhi.
The copy of the observed data available at the RMC / MC is scrutinised for errors and sent to NDC regularly for further processing and archival.
Data Flow - RMCs / MCs
Reception of monthly registers from observatories
Scrutiny
• Neatness
• Correctness
• Detect and rectify obvious errors
• Detect and rectify systematic errors
• Promptness
Data entry
Dispatch to NDC in electronic form
Data Quality
General QA considerations:
Quality of meteorological data begins well before the data are recorded and ingested into the database, e.g.
o Fundamental standards must be adhered to during the observation process:
• proper station site
• proper and routine site maintenance
• proper and routine calibration of sensors
o Other practices greatly aid the QA process
• Always archive the original observations
• Use UTC and standard units for observation reporting
• Use similar instruments and instrument configurations for sites that will be compared during the QA process
Even the best quality control tests cannot be expected to improve data quality of poorly observed data
Data Quality Control
The possibility of checking meteorological observations is based upon the redundancy of the information.
In most cases it is possible to determine the likelihood of a given observed value using statistical techniques.
Most algorithms for the rejection of erroneous observations will therefore be based on a compromise between the risk of accepting erroneous values and the risk of rejecting correct values.
Risks of failure are largest for the most valuable information, e.g. extreme values occurring in data-sparse areas.
Operational quality control algorithms will have an ad-hoc formulation with empirical constants which have to be tuned for operational applications.
Following the World Meteorological Organisation (WMO) guidelines the data are subjected to various checks viz., validity, duplicate, field, absurd value, climatological, time consistency, internal consistency, vertical & horizontal consistency, statistical checks etc.
Errors
Quality Control – General Considerations
Level - I
Syntax checks
Duplicate checks
Gross Limit outliers
Station Limit outliers
Internal Consistency
Level - II
Time consistency – Check for the variation in the parameter value with the time (hourly / daily)
• Deviation from climatological means
• Deviation from previous value
Spatial Consistency – Check for changes spatially
• Display maps to visualize the spatial distribution
• Apply interpolation technique to see variations
Quality control flags – Assign quality control flags to each parameter
• Use them to filter data when displaying on maps
• Pending flags to be cleared by observing stations
Preliminary Checks
All the records are first subjected to the following preliminary checks
Validity check
• Each field of the record is checked for valid character
• Checking for valid date, time and location (mandatory columns)
• Checking is done for valid identification code, location
Field check
Each field in the data record is checked whether data is fully available
Preliminary Checks
Duplicate check
Duplicates can be either fully identical records or records that match on the mandatory columns (date, time and location)
Absurd value check
Fields having absurd values, e.g.
• Wind Direction > 360°
• Relative Humidity > 100%
Gross Error Limit Checks
• Regional - e.g. T above 55°C or below -40°C
• Station specific - Exceeding climatological limits of the station
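The absurd value and regional gross limit checks above can be sketched as a simple range test. This is a minimal illustration only: the field names and the dictionary layout are hypothetical, and the limits are the regional ones quoted on the slide (station-specific limits would be tighter).

```python
# Regional gross limits quoted above; field names are hypothetical.
GROSS_LIMITS = {
    "wind_dir_deg": (0.0, 360.0),
    "rel_humidity_pct": (0.0, 100.0),
    "dry_bulb_c": (-40.0, 55.0),
}


def gross_limit_errors(record):
    """Return the names of fields whose values fall outside the gross limits."""
    bad = []
    for name, (low, high) in GROSS_LIMITS.items():
        value = record.get(name)
        if value is None:
            # Missing values are left to the field check, not this one.
            continue
        if value < low or value > high:
            bad.append(name)
    return bad


if __name__ == "__main__":
    obs = {"wind_dir_deg": 370, "rel_humidity_pct": 85, "dry_bulb_c": 31.2}
    print(gross_limit_errors(obs))
```

A station-specific variant would only need a per-station copy of the limits table.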
The checks are made to control the values of the parameters, which are closely related e.g. Dry bulb temperature, Wet bulb temperature and Dew point temperature.
Checking algorithms have been divided into areas where the physical parameters are closely related. As an example, the checking algorithm for surface data is described:
Surface Data
Consistency Checks
Gross Limit Outliers
Station Limit Outliers
Internal Consistency
Temperature T, Td & Tw
• Limit -40 to 55° C and Station wise minimum and maximum values (extremes, daily / monthly)
• T ≥ Tw ≥ Td
• Consecutive same values for hourly / 3 hourly values
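A rough sketch of the temperature checks listed above: the ordering T ≥ Tw ≥ Td and a search for runs of identical consecutive values. The function names are hypothetical, and the run length at which a series becomes suspect is not given in the source, so it is left to the caller.

```python
def temperature_order_ok(t, tw, td):
    """Internal consistency: dry bulb >= wet bulb >= dew point."""
    return t >= tw >= td


def longest_constant_run(values):
    """Length of the longest run of identical consecutive values."""
    longest = 0
    current = 0
    previous = object()  # sentinel that never equals a real reading
    for value in values:
        current = current + 1 if value == previous else 1
        longest = max(longest, current)
        previous = value
    return longest


if __name__ == "__main__":
    print(temperature_order_ok(30.2, 24.1, 21.0))
    print(longest_constant_run([25.0, 25.0, 25.0, 26.1, 26.1]))
```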
Example
Tx > Global limit
Tn > Tx
Invalid Wx code
Invalid Rf Dur.
Invalid Data
Duplicate Data
Surface Data
The wind information is considered to be erroneous in the following cases:
• dd = 00 and ff ≠ 00
• dd ≠ 00 and ff = 00
• dd = 99 and {ff = 00 or ff ≥ 5 m/s}
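The three wind rules above translate directly into a small predicate. This sketch assumes dd is the reported direction code (00 for calm, 99 for variable) and ff is the speed in m/s; the function name is hypothetical.

```python
def wind_report_erroneous(dd, ff):
    """Apply the dd/ff consistency rules; True means the report is erroneous."""
    if dd == 0 and ff != 0:
        return True   # calm direction with non-zero speed
    if dd != 0 and ff == 0:
        return True   # direction given (including dd = 99) but zero speed
    if dd == 99 and ff >= 5:
        return True   # variable direction reported with speed of 5 m/s or more
    return False


if __name__ == "__main__":
    for dd, ff in [(0, 3), (27, 0), (99, 6), (99, 2), (0, 0)]:
        print(dd, ff, wind_report_erroneous(dd, ff))
```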
Pressure
• MSLP > SLP
• Limit 500 hPa - 1050 hPa and station-wise maximum and minimum values (pentads / monthly )
• Compare MSLP reported and calculated using temperature value and SLP
Surface Data
Relative Humidity
• Range 0 – 100 %; compare with the value calculated using T, Tw and SLP
• Comparing with Pentad Normals
Cloud information - The values for cloud cover are considered erroneous when :
• N < Nh and N ≠ 8 and Nh = 9 and CL = 1
• Nh = 0 and {CL ≠ 0 or CM ≠ 0 or h ≠ 9};
• 1 ≤ Nh ≤8 and CL = 0 and CM = 0;
• Nh = 9 and {CL ≥ 0 or CM ≥ 0 or CH ≥ 0};
• Nh = 9 and h ≥ 0;
• N = 0 and { CH or CM or CL >0};
Cloud information and Weather ww - Clouds and weather are considered suspect when :
• N = 9 and {ww < 39 or ww = 40 or ww = 41 or ww = 42 or ww = 44 or ww = 46 or ww = 48 or ww = 50 or ww ≥ 79};
• N ≠ 9 and {ww = 43 or ww = 45 or ww = 47 or ww = 49};
Surface Data
Temperature T and Weather ww - both elements are considered suspect when :
• T < – 2°C and {50 ≤ ww ≤ 55 or 58 ≤ ww ≤ 65};
• T < – 2° C and {68 ≤ ww ≤ 69 or 80 ≤ ww ≤ 82};
• Both values are considered suspect when : T - Td > 5° C and 40 ≤ ww ≤49
Evaporation
• 0 – 50 mm
• Pan water temperature between 10 – 30 ° C
Visibility VV
The values for visibility and weather are considered suspect when :
– 41 ≤ ww ≤ 49 and {10 ≤ VV ≤ 89 or 94 ≤ VV ≤ 99}
– ww = 10 and {0 ≤ VV ≤ 9 or 90 ≤ VV ≤ 93}
Visibility VV and Cloud Information
The values for visibility and height of cloud base (h) are considered suspect when :
– 0 ≤ h ≤ 1 and {70 ≤ VV ≤ 89 or 98 ≤ VV ≤ 99}
Surface Data
Rainfall
• Comparing consecutive values and the totals at 03 and 12 UTC
• Comparing with day’s summary data
• Consecutive same daily rainfall values
Maximum and Minimum Temperature
• Tx – 10 to 55° C and Tn – 30 to 35° C
• Compare with Station Extremes (Monthly / Pentads)
• Tx > Tn
• Same values for consecutive 3 days
Sunshine Duration
• 0 to 14 hrs
• Compare with the calculated value
Compare reported weather phenomena with the days with weather (Wx) in the Day's Summary data
Quality Control Surface Data Level - II
• Temporal consistency
Suggested tolerances for the temperatures and the pressure tendency as a function of the time period (dt) between consecutive reports:

Parameter   dt = 1 hr   dt = 2 hrs   dt = 3 hrs   dt = 6 hrs   dt = 12 hrs
T TOL       4°C         7°C          9°C          15°C         25°C
Td TOL      4°C         6°C          8°C          12°C         20°C
pp TOL      3 hPa       6 hPa        9 hPa        18 hPa       36 hPa

dt : Time elapsed since the preceding observation
TOL : Tolerance value
T : Dry bulb temperature
Td : Dew point temperature
pp : Pressure tendency
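The tolerance table lends itself to a lookup-based step check. The sketch below is an illustration under two assumptions not stated in the source: a change strictly larger than the tolerance counts as suspect, and only the tabulated time intervals are supported. Names are hypothetical.

```python
# Tolerances from the table above, keyed by parameter and dt in hours.
TOLERANCES = {
    "T": {1: 4.0, 2: 7.0, 3: 9.0, 6: 15.0, 12: 25.0},    # deg C
    "Td": {1: 4.0, 2: 6.0, 3: 8.0, 6: 12.0, 12: 20.0},   # deg C
    "pp": {1: 3.0, 2: 6.0, 3: 9.0, 6: 18.0, 12: 36.0},   # hPa
}


def step_suspect(parameter, previous, current, dt_hours):
    """True if the change between consecutive reports exceeds the tolerance."""
    tolerance = TOLERANCES[parameter][dt_hours]  # KeyError for untabulated dt
    return abs(current - previous) > tolerance


if __name__ == "__main__":
    print(step_suspect("T", 30.0, 35.5, 1))
    print(step_suspect("pp", 1008.0, 1000.0, 3))
```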
Comparison with 5 day moving average (for T, Tx, Tn,Td, Tw, SLP, MSLP, RH, VP)
Spatial consistency – Check for changes spatially
• Display maps to visualize the spatial distribution
• Apply interpolation technique to see variations
Quality control flags – Assign quality control flags to each parameter
• Use them to filter data when displaying on maps
• Pending flags to be cleared by observing stations
Internal Consistency Checks
Rainfall Data
Rainfall is a highly variable parameter that depends upon orography, latitude, altitude and local conditions, which makes data processing complicated. The following methods are used to check the rainfall data.
Heaviest rainfall value is compared with the monthly total (heaviest value should not exceed the monthly total).
Extreme values are compared with the rainfall values of nearby station for spatial consistency.
Monthly rainfall and number of rainy days are compared with the station's monthly normal rainfall and normal number of rainy days. Monthly values are checked against the station's monthly normal and its standard deviation using the limits
mean ± 2σ and mean ± 3σ (where σ is the standard deviation)
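As a small illustration of the mean ± 2σ and mean ± 3σ limits, a monthly total can be sorted into bands. The band labels and the function name are hypothetical; the source does not say what action each band triggers.

```python
def classify_monthly_rainfall(total_mm, normal_mm, sigma_mm):
    """Place a monthly total relative to the station normal and its sigma."""
    deviation = abs(total_mm - normal_mm)
    if deviation > 3 * sigma_mm:
        return "beyond_3sigma"
    if deviation > 2 * sigma_mm:
        return "beyond_2sigma"
    return "within_2sigma"


if __name__ == "__main__":
    # Hypothetical station: normal 100 mm, standard deviation 40 mm.
    for total in (120.0, 190.0, 250.0):
        print(total, classify_monthly_rainfall(total, 100.0, 40.0))
```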
Time Consistency Check
Time Consistency Checks - Redundancy of information in consecutive reports
• E.g. Observed pressure tendencies and the time consistency checks are simple but efficient means of detecting minute observational errors.
• Time consistency check can be used for dry bulb temperatures recorded at different times of the day.
• Time consistency checks are also important in verifying the position in consecutive ship and buoy reports.
Autographic Data - Hour-to-hour variation
• Temperature > 6° C
• Pressure > 3 hPa
• Relative Humidity > 25%
Example
Time Consistency Check
Quality Control - Autographic Data
Data reception:
Presently from RMCs / MCs after tabulation and scrutiny
Pressure
Compare with station-wise, month-wise maximum and minimum values
Check hour to hour variation more than 4 hPa
Temperature
Compare with station-wise, month-wise maximum and minimum values
Check hour to hour variation more than 7°C
Relative Humidity
Compare with station-wise, month-wise maximum and minimum values
Check hour to hour variation more than 40%
Sunshine
Total sunshine duration for the day ≤ 14 hrs
Compare with calculated value
Check hour-to-hour value ≤ 1 hr
Rainfall
Comparison with daily total (possible for stations with DRF data)
Check hour to hour >90 mm ??? or Step Check ???
Quality Control - Autographic Data
Squall : Flag the data where
• Squall duration < 5 min
• Wind speed difference before and after ≤ 20 kmph
• Pressure difference ≤ 3hPa
• Temperature difference ≥ 3°C
• Tw before ≥ Tw after and difference ≥ 3°C
Level - II
Spatial Consistency – Check for changes spatially
• Display maps to visualize the spatial distribution hour-wise
Quality control flags – Assign quality control flags to each parameter
• Use them to filter data when displaying on maps
• Pending flags to be cleared by observing stations
Hour to Hour Variation and Limits
Hourly Temperature data
Consistency Checks
Space Consistency Check or Neighbour Test
• Redundant information of nearby observations can be used to compare the observations using the statistical methods of the objective analysis scheme. Each pair of neighbouring observations is compared; if they agree, both are likely to be either correct or erroneous.
• By inter-comparison of all neighbouring pairs in local areas, it is possible to filter out the erroneous ones.
Quality Control - Upper-air Data
Upper-air
Gross limit error checks (WMO recommended) are applied to:
• Height
• Temperature
• Wind
Quality Control - Upper-air Data
Upper-air
WMO suggested checks for inversions and Lapse rate check for vertical Temperature profile
Check for consistency between Standard and significant data
Check Hydrostatic balance between standard level height and temperature
Consider a layer between two consecutive standard pressure levels $p_i$ and $p_{i+1}$.
Assuming a linear variation of temperature with $\ln p$ in the layer, compute the thickness $D_i$ of the layer:
$$D_i = \frac{R_d}{2g}\,\left(T^*_i + T^*_{i+1}\right)\,\ln\frac{p_i}{p_{i+1}}$$
where $R_d$ = 0.287 J g⁻¹ K⁻¹, $g$ = 9.808 m/s² and $T^*$ denotes virtual temperature (the virtual temperature of a moist air parcel is the temperature at which a theoretical dry air parcel would have a total pressure and density equal to the moist parcel of air).
The computed thickness generally differs from the reported thickness $Z_{i+1} - Z_i$, so the difference is compared with a tolerance:
$$\left| Z_{i+1} - Z_i - D_i \right| \le \mathrm{TOL}$$
with a minimum value of TOL taken as 20 gpm and a maximum value taken as 50 gpm for layers below 400 hPa and 80 gpm for layers above 400 hPa.
Beyond this limit, one or more of the reported values $Z_i$, $T^*_i$, $Z_{i+1}$ or $T^*_{i+1}$ are erroneous.
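A minimal sketch of the hydrostatic check, assuming the usual hypsometric relation with the layer-mean virtual temperature (which is what a linear variation of temperature in ln p gives). The source states only the minimum and maximum of TOL, not how TOL itself is computed, so the tolerance is passed in by the caller. Function names and the sample sounding values are hypothetical.

```python
import math

R_D = 287.0   # J kg^-1 K^-1, i.e. 0.287 J g^-1 K^-1 as quoted above
G = 9.808     # m s^-2


def layer_thickness(p_lower_hpa, p_upper_hpa, tv_lower_k, tv_upper_k):
    """Thickness of the layer between two pressure levels (virtual temps in K)."""
    mean_tv = 0.5 * (tv_lower_k + tv_upper_k)
    return (R_D / G) * mean_tv * math.log(p_lower_hpa / p_upper_hpa)


def hydrostatic_residual(z_lower, z_upper, thickness):
    """E_i = Z_{i+1} - Z_i - D_i, the quantity compared against TOL."""
    return z_upper - z_lower - thickness


def fails_hydrostatic_check(z_lower, z_upper, thickness, tol_gpm):
    """True if the reported and computed thickness differ by more than TOL."""
    return abs(hydrostatic_residual(z_lower, z_upper, thickness)) > tol_gpm


if __name__ == "__main__":
    d = layer_thickness(1000.0, 850.0, 288.0, 280.0)
    print(round(d, 1), fails_hydrostatic_check(110.0, 1470.0, d, 20.0))
```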
If the deviation $E_i = Z_{i+1} - Z_i - D_i$ is computed for each layer and the absolute value of one of the deviations is larger than the corresponding tolerance, it is possible to compute the index
$$F_i = E_i / E_{i+1}$$
• If $0.5 < F_i < 2.0$: the temperature $T_{i+1}$ is probably erroneous, since $E_{i+1}$ has a large value of the same sign as $E_i$.
• If $-2.0 < F_i < -0.5$: the height $Z_{i+1}$ is probably erroneous, since $E_{i+1}$ has a large value of the opposite sign to $E_i$.
• If $F_i > 2.0$: all heights at and above level (i + 1) are probably erroneous.
• If $F_i \le 0.5$: with a value of $F_i$ in this interval it is more difficult to find a unique error. Probably several of the values involved are erroneous.
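The decision rules for the index can be sketched as a classifier. The intervals as listed overlap (values between -2.0 and -0.5 also satisfy Fi ≤ 0.5), so this sketch assumes the first three rules take priority and treats everything else, including boundary values and a zero denominator, as ambiguous. The return labels are hypothetical.

```python
def diagnose_layer_error(e_i, e_next):
    """Classify the likely error source from deviations E_i and E_{i+1}."""
    if e_next == 0:
        return "ambiguous"          # index undefined; assumption of this sketch
    f = e_i / e_next
    if 0.5 < f < 2.0:
        return "temperature"        # T at level i+1 probably erroneous
    if -2.0 < f < -0.5:
        return "height"             # Z at level i+1 probably erroneous
    if f > 2.0:
        return "heights_above"      # heights at and above level i+1
    return "ambiguous"              # no unique error can be identified


if __name__ == "__main__":
    for pair in [(60.0, 50.0), (60.0, -50.0), (150.0, 50.0), (10.0, 50.0)]:
        print(pair, diagnose_layer_error(*pair))
```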
Quality Control Upper-air Data
Upper-air – vertical wind shear check
Maximum wind speed shear permitted is
• $ff_i - ff_{i+1} = \alpha + \beta\,(ff_i + ff_{i+1})$ where α = 20.6 and β = 0.275
Maximum wind directional shear (D) permitted is
• $D = dd_i - dd_{i+1}$
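A hedged sketch of the speed shear limit, using the α and β quoted above. The source does not state the wind speed units, so the code is unit-agnostic. Taking the absolute speed difference, and wrapping the directional difference into the range 0 to 180 degrees, are assumptions of this sketch. Function names are hypothetical.

```python
ALPHA = 20.6
BETA = 0.275


def max_speed_shear(ff_lower, ff_upper):
    """Permitted speed difference between two adjacent levels."""
    return ALPHA + BETA * (ff_lower + ff_upper)


def speed_shear_suspect(ff_lower, ff_upper):
    """True if the speed difference exceeds the permitted shear."""
    return abs(ff_lower - ff_upper) > max_speed_shear(ff_lower, ff_upper)


def direction_shear(dd_lower, dd_upper):
    """Smallest angle between two wind directions, in degrees (0 to 180)."""
    diff = abs(dd_lower - dd_upper) % 360
    return min(diff, 360 - diff)


if __name__ == "__main__":
    print(max_speed_shear(10.0, 20.0), speed_shear_suspect(5.0, 60.0))
    print(direction_shear(350, 10))
```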
The permitted sum of speeds in relation to directional shear and to the different levels is given below
Main Tabs of the RSRW Quality Control Software
• To upload the data file into the database
• To view the uploaded data
• To process the data, i.e. to apply quality checks and display errors
• To set the limits for parameter(s)
• To view the rejected data
• To extract the desired data from the database
Checks Applied
Sample RS/RW Flight Report
Graphs: Pressure - Temperature, Temperature - Height, Pressure - Height
Errors
Setting Limits
Data Extraction
Quality Control Agro-parameters
Consistency Checks
The following data are flagged:
• Daily maximum surface air temperatures that are less than the lowest minimum surface air temperature for the respective station and calendar month;
• Daily minimum temperatures that are greater than the highest maximum temperature for the station and calendar month;
• Observation-time temperatures that are higher than the highest maximum temperature or lower than the lowest minimum temperature for the station and calendar month;
• Daily maximum evaporation pan temperatures that are less than the lowest minimum evaporation pan temperature for the respective station and calendar month, less than the lowest minimum surface air temperature for the respective station and calendar month, or more than 10°C above the highest surface air temperature for the respective station and calendar month;
Quality Control Agro-parameters
• Daily maximum evaporation pan temperatures that are less than the lowest minimum temperature for the respective station and calendar month;
• Daily minimum evaporation pan temperatures that are greater than the highest maximum evaporation pan temperature, greater than the highest maximum surface air temperature, or more than 10°C below the lowest minimum surface air temperature for the station and calendar month;
• Daily maximum soil temperatures that are less than the lowest minimum soil temperature for the station, calendar month, groundcover, and depth; and
• Daily minimum soil temperatures that are greater than the highest maximum soil temperature for the station, calendar month, groundcover, and depth.
Percentile-based climatological outlier check - Checks for daily precipitation totals that exceed the respective 29-day climatological 95th percentiles by at least a certain factor e.g. 9 when the day's mean temperature is above freezing.
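The percentile-based check can be illustrated as follows, using the example factor of 9 for days with mean temperature above freezing. The source gives no factor for freezing conditions, so this sketch returns None in that case rather than guessing. The function and parameter names are hypothetical, and the 95th percentile is assumed to be precomputed from the 29-day climatological window.

```python
def precip_outlier(total_mm, p95_mm, mean_temp_c, factor=9.0):
    """Flag a daily total that reaches `factor` times the climatological p95.

    Returns None when the day's mean temperature is not above freezing,
    because the source does not specify the factor for that case.
    """
    if mean_temp_c <= 0.0:
        return None
    return total_mm >= factor * p95_mm


if __name__ == "__main__":
    print(precip_outlier(200.0, 20.0, 25.0))
    print(precip_outlier(100.0, 20.0, 25.0))
```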
Flagging
In a checking system, it is necessary to take care of all the information from different quality control techniques.
Quality control indicators or flags can express the manual or automatic decisions made in the different checking procedures, which characterise the status of the individual system.
The construction depends very much on how and at what stage the different checking methods are applied.
Flagging (e.g. Ship Observations)
0 No QC has been performed
1 QC performed; element appears correct
2 QC performed; element appears inconsistent with other elements
3 QC performed; element appears doubtful
4 QC performed; element appears erroneous
5 QC performed; element changed (possibly to missing) as a result
6 QC flag amended; element flagged by CM as correct but according to MQCS still appears suspect
7 QC flag amended; element flagged by CM as changed (5) but according to MQCS still appears suspect
8 Reserved
9 Element missing
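One way to carry these flag values in software is an integer enumeration, which also makes the map filtering mentioned earlier straightforward. The member names, the record layout and the default choice of which flags are displayable are all hypothetical; only the numeric values and their meanings come from the list above.

```python
from enum import IntEnum


class QCFlag(IntEnum):
    NOT_CHECKED = 0
    CORRECT = 1
    INCONSISTENT = 2
    DOUBTFUL = 3
    ERRONEOUS = 4
    CHANGED = 5
    AMENDED_CORRECT_STILL_SUSPECT = 6
    AMENDED_CHANGED_STILL_SUSPECT = 7
    RESERVED = 8
    MISSING = 9


def displayable(observations, accepted=(QCFlag.CORRECT, QCFlag.CHANGED)):
    """Keep only observations whose flag is in the accepted set."""
    return [obs for obs in observations if obs["flag"] in accepted]


if __name__ == "__main__":
    sample = [
        {"station": "A", "value": 31.2, "flag": 1},
        {"station": "B", "value": 58.0, "flag": 4},
        {"station": "C", "value": 29.8, "flag": 5},
    ]
    print([obs["station"] for obs in displayable(sample)])
```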
Thank You