Data Sources: Sources, integration, quality, error, uncertainty
Uploaded by noah-gaines
Data acquisition
• Land surveying (geodesy), GPS
• Aerial photography
• Satellite images
• Laser altimetry
• Digitizing of paper maps
• Scanning of paper maps
• Statistical data (e.g. from census bureaus)
• Surface and soil measurements
Land measurement, GPS
• Measuring devices: theodolites, laser range finders, GPS (for angles, distances and location)
• Dutch reference systems:
– RD-net (Rijksdriehoeksmeting)
– GPS-kernnet (415 points)
• Field sketches: important for attributes (street names, which crops exactly, etc.) and for verification
GPS
• GPS: precision of a few meters (a few centimeters for differential GPS)
– Based on 30 satellites
– 3D coordinates
• Can also be used for tracking objects (cars, animals, criminals); this gives trajectory data
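As a rough illustration of what trajectory data looks like, the sketch below stores GPS fixes as (time, latitude, longitude) tuples and sums the great-circle distances between consecutive fixes. The tuple layout, the function names and the spherical Earth radius are assumptions made for this example, not part of the slides.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius; a spherical approximation

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def trajectory_length_m(fixes):
    """Total length of a trajectory given as a list of (t, lat, lon) fixes."""
    return sum(
        haversine_m(a[1], a[2], b[1], b[2])
        for a, b in zip(fixes, fixes[1:])
    )

# Hypothetical fixes: (seconds since start, latitude, longitude)
track = [(0, 52.0900, 5.1200), (60, 52.0910, 5.1215), (120, 52.0925, 5.1230)]
print(round(trajectory_length_m(track)), "m travelled")
```

With a positional error of a few meters per fix, the summed length of a densely sampled track tends to overestimate the true distance, which is one reason trajectory data is usually smoothed before analysis.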
Aerial photography
• Most important source for the Topographic Survey (TDN)
• Aerial photos are digitized by hand and interpreted by eye
• Precision (resolution) of ~15 cm
digital air photo, 15 cm resolution
Satellite images
• Measured electromagnetic radiation (reflection)
• For each type of surface cover, the reflected wavelengths are approximately known (for instance, vegetation reflects much infrared)
• Also called: remote sensing
• Resolution: Landsat 30x30 m; SPOT 20x20 m, or 10x10 m monochromatic
• EROS pan: 1.8 m, IKONOS: 0.82 m, QuickBird: 0.60 m, GeoEye: 0.41 m
• Spatial, temporal, spectral resolution
• E.g.: RapidEye has 5 equivalent satellites with 5 m spatial resolution that cover every point on Earth daily
Landsat Thematic Mapper, 30 m resolution, Cape Breton Island
IKONOS, 82 cm (Singapore)
QuickBird, 60 cm
Laser altimetry (LIDAR data)
• For elevation data, gives 3D point cloud
• Precision ~10 cm
• Correction is needed
Correction of laser altimetry data
Filtering elevation data to remove towns, trees, cars
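One very simple way to approximate this kind of filtering is to keep only the points that lie close to the lowest point in their neighbourhood. The sketch below does this per grid cell; the cell size, the height threshold and the function name are assumptions chosen for illustration, and production ground filters are considerably more sophisticated.

```python
def filter_ground(points, cell=10.0, max_above=0.5):
    """Keep (x, y, z) points within max_above meters of the lowest point in their grid cell."""
    def cell_of(x, y):
        return (int(x // cell), int(y // cell))

    # First pass: lowest elevation seen in each cell.
    lowest = {}
    for x, y, z in points:
        key = cell_of(x, y)
        lowest[key] = min(z, lowest.get(key, z))

    # Second pass: drop points that rise too far above that minimum
    # (roofs, tree crowns, cars).
    return [(x, y, z) for x, y, z in points if z - lowest[cell_of(x, y)] <= max_above]

cloud = [(1.0, 1.0, 2.0), (2.0, 2.0, 2.1), (3.0, 3.0, 9.5), (15.0, 1.0, 3.0)]
print(filter_ground(cloud))  # the 9.5 m point (e.g. a roof) is removed
```

A cell that is entirely covered by a building has no ground point at all, so this sketch would wrongly keep the roof there; larger cells or a comparison with neighbouring cells would be needed in that case.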
Digitizing of paper maps
• Redraw boundary lines, add attributes (same as for aerial photography)
• With digitizer tablet or table, or heads-up digitizing
• Line mode or stream mode
• After digitizing, the topology and the attributes must be added
Scanning of paper maps
• Convert a map by a scanner into a pixel image
• Automatic interpretation is difficult and error-prone: checking and correction are necessary
• Vector scanning exists too
Statistical data
• Surveys, by questionnaires or interviews
• Human-geographic or economic-geographic: number of dogs per 1000 households, income, political preference
• Usually collected by the CBS (NL), Census Bureau (USA), or by marketing bureaus
• Usually a sample (subset) of the population is interviewed
• Results are often mapped as a choropleth map (administrative regions shaded in a color, where the shade encodes the classified value)
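The classification step behind a choropleth map can be sketched as follows. Equal-interval classes are used here only because they are the simplest scheme; the slides do not say which scheme is meant, and the region names and values are made up.

```python
def classify_equal_interval(values_by_region, n_classes=4):
    """Assign each region a class index 0..n_classes-1 using equal-width intervals."""
    lo = min(values_by_region.values())
    hi = max(values_by_region.values())
    width = (hi - lo) / n_classes
    classes = {}
    for region, value in values_by_region.items():
        if width == 0:
            classes[region] = 0  # all regions have the same value
        else:
            # The maximum value would fall just outside the last interval; clamp it.
            classes[region] = min(int((value - lo) / width), n_classes - 1)
    return classes

# Hypothetical data: dogs per 1000 households
dogs = {"A": 100, "B": 150, "C": 200, "D": 300}
print(classify_equal_interval(dogs))
```

Each class index would then be mapped to one shade of the chosen color, darker for higher classes.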
Soil measurements
• For non-visible data like pollution, temperature, soil type
• Choose sampling strategy
– Ideal: random sampling
– Practice: sampling in easily accessible areas (cannot take a soil sample under a building)
– Additional samples in ‘interesting’ areas
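The gap between the ideal and the practice can be illustrated with a small sketch: draw random locations in the study area and reject those that are not accessible. The bounding box, the building footprint and the function names are hypothetical.

```python
import random

def sample_points(n, bounds, is_accessible, seed=0, max_tries=100_000):
    """Draw up to n uniformly random points inside bounds, skipping inaccessible ones."""
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    xmin, ymin, xmax, ymax = bounds
    points = []
    tries = 0
    while len(points) < n and tries < max_tries:
        tries += 1
        p = (rng.uniform(xmin, xmax), rng.uniform(ymin, ymax))
        if is_accessible(p):
            points.append(p)
    return points

def outside_building(p):
    """Hypothetical accessibility test: a building occupies the square 40..60 x 40..60."""
    x, y = p
    return not (40 <= x <= 60 and 40 <= y <= 60)

samples = sample_points(50, (0, 0, 100, 100), outside_building)
print(len(samples), "sample locations")
```

The result is still random within the accessible area, but nothing is learned about the soil under the building, which is exactly the bias the slide warns about.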
Sensors
• RFID tags: Radio frequency identification: for tracking objects; can give trajectories
• Wireless sensor networks
– Sensors that have limited computing power and can communicate
– Energy consumption problem
• Smartdust: hypothetical wireless sensor network system
Using existing data sets
• Data collection is expensive: if possible, buy existing data sets (provided they are available and the quality is sufficient; judging this requires metadata)
Data integration
• Convert data from two different sources in order to compare, and make analysis possible
• Same date of sources desirable
• Same level of aggregation desirable (highest level determines the level of comparison)
Integration of non-aggregated and aggregated data
population density
life expectancy
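To compare a finely recorded variable with one that is only available at a coarser level, the finer data has to be aggregated first. The sketch below, with made-up districts and regions, aggregates population and area to the region level and only then computes density, since densities themselves cannot simply be averaged.

```python
from collections import defaultdict

def aggregate_density(units, parent_of):
    """Aggregate (population, area_km2) per unit to the parent level; return density per parent."""
    population = defaultdict(float)
    area = defaultdict(float)
    for unit, (pop, area_km2) in units.items():
        parent = parent_of[unit]
        population[parent] += pop
        area[parent] += area_km2
    return {parent: population[parent] / area[parent] for parent in population}

# Hypothetical districts: (population, area in km2), and the region each belongs to
districts = {"d1": (1000, 10.0), "d2": (3000, 10.0), "d3": (500, 5.0)}
region_of = {"d1": "R1", "d2": "R1", "d3": "R2"}
print(aggregate_density(districts, region_of))
```

The region-level densities can then be set against a region-level variable such as life expectancy.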
Data integration and data consistency
105 m
111 m
Edge matching
• Integration of digitized data sets based on adjacent map sheets or aerial photos
• Idea: create seamless digital data set
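A minimal sketch of the matching step, assuming each sheet contributes the end points of the lines that reach the shared edge: every end point on one sheet is paired with the nearest unused end point on the other sheet, and pairs closer than a tolerance are snapped to their midpoint. The function name, the tolerance and the coordinates are illustrative only.

```python
import math

def match_edges(left_ends, right_ends, tol):
    """Pair line end points across a sheet edge and snap each pair to its midpoint."""
    unused = list(right_ends)
    snapped = []
    for p in left_ends:
        if not unused:
            break
        q = min(unused, key=lambda r: math.dist(p, r))  # nearest candidate
        if math.dist(p, q) <= tol:
            unused.remove(q)
            snapped.append(((p[0] + q[0]) / 2, (p[1] + q[1]) / 2))
    return snapped

left = [(100.0, 10.0), (100.0, 50.0)]
right = [(100.0, 10.4), (100.0, 80.0)]
print(match_edges(left, right, tol=1.0))  # only the first pair is close enough
```

End points that find no partner within the tolerance are left alone and would need manual inspection.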
Data quality, I
• Precision: number of known decimals, depending on measuring device
• Accuracy: absence of systematic bias (no faulty calibration)
precise
accurate
both
neither
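The four cases can be told apart numerically when a reference value is known. In the sketch below, the systematic bias is the mean error of repeated measurements and the spread is their standard deviation. Using spread as a stand-in for precision is an assumption that follows the four labelled cases rather than the "number of decimals" definition, and the numbers are invented.

```python
import statistics

def bias_and_spread(measurements, reference):
    """Return (systematic bias, spread) of repeated measurements of a known reference value."""
    bias = statistics.mean(measurements) - reference
    spread = statistics.pstdev(measurements)
    return bias, spread

reference = 10.0
print(bias_and_spread([10.2, 10.2, 10.2], reference))  # tightly grouped but offset: precise, not accurate
print(bias_and_spread([9.0, 11.0], reference))         # scattered around the reference: accurate, not precise
```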
Data quality, II
• Validity: degree to which data is relevant for an application (complex geographic variables). E.g.: income for well-being; temperature for good weather
• Reliability: up to date, not too old for the purpose. E.g.: data of last week is out of date for temperature, but not for land cover
Geometric and topological quality
• Absence of error
• Presence of consistency
Sources of geometrical and topological errors
• Digitizing
• Integration
• Generalization
• Raster-vector conversion
• Edge-matching
Mismatch of boundaries of different themes
Digitizing errors
Other sources and problems
• Wrong attachment of geometry and attributes
• Missing attribute data
• Uncertainty in classification of satellite images
• Clouds in satellite images, shadows in aerial photos
• Unknown quality (e.g. precision) of paper maps used for digitizing: missing metadata
• Deformation of paper maps
Dealing with error/uncertainty
• Errors in data have consequences for e.g. the cost of projects
– Provide metadata (when data collected, how, what equipment)
– Visualize uncertainty
E.g. classification of satellite images for land cover:
grass: 0.86, forest: 0.08, water: 0.03, …
grass: 0.34, forest: 0.31, water: 0.25, …
Confusion: 1 - (p_max - p_second)
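Applied to the two example pixels, the confusion measure can be computed as in the sketch below. Only the three listed class probabilities are used, since the remaining classes are elided in the slide; the function name is an assumption.

```python
def confusion(probabilities):
    """Confusion of a classified pixel: 1 - (p_max - p_second)."""
    ranked = sorted(probabilities.values(), reverse=True)
    return 1 - (ranked[0] - ranked[1])

pixel_a = {"grass": 0.86, "forest": 0.08, "water": 0.03}
pixel_b = {"grass": 0.34, "forest": 0.31, "water": 0.25}

print(round(confusion(pixel_a), 2))  # 0.22: grass is a clear winner
print(round(confusion(pixel_b), 2))  # 0.97: grass and forest are nearly tied
```

Mapping this value per pixel, for instance as a grey level, is one way to visualize where the classification is uncertain.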
Dealing with error/uncertainty
• Errors in data have consequences for e.g. the cost of projects
– Provide metadata (when data collected, how, what equipment)
– Visualize uncertainty
– Provide bounds on the range of outcomes (cost) of an analysis, based on an uncertainty model and Monte Carlo simulation
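A minimal sketch of the last point, under a deliberately simple uncertainty model: the measured area of a site is assumed to have a normally distributed error, the project cost is proportional to the area, and the 5th and 95th percentiles of the simulated costs serve as bounds. The error model, the numbers and the function name are all assumptions for illustration.

```python
import random

def cost_bounds(area_m2, area_sd_m2, cost_per_m2, n_runs=10_000, seed=42):
    """Monte Carlo estimate of the 5th and 95th percentile of project cost."""
    rng = random.Random(seed)  # fixed seed for a reproducible simulation
    costs = sorted(
        max(0.0, rng.gauss(area_m2, area_sd_m2)) * cost_per_m2  # area cannot be negative
        for _ in range(n_runs)
    )
    low = costs[int(0.05 * n_runs)]
    high = costs[int(0.95 * n_runs) - 1]
    return low, high

# Hypothetical project: 1000 m2 measured with a standard deviation of 50 m2, 20 per m2
low, high = cost_bounds(1000.0, 50.0, 20.0)
print(f"cost is between {low:.0f} and {high:.0f} in about 90% of the runs")
```

A more realistic model would perturb the geometry itself (for example the digitized boundary vertices) and rerun the whole analysis for each perturbed data set.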