Data Sources Sources, integration, quality, error, uncertainty.


Page 1: Data Sources Sources, integration, quality, error, uncertainty.

Data Sources: sources, integration, quality, error, uncertainty

Page 2: Data Sources Sources, integration, quality, error, uncertainty.

Data acquisition

• Land surveying (geodesy), GPS
• Aerial photography
• Satellite images
• Laser altimetry
• Digitizing of paper maps
• Scanning of paper maps
• Statistical data (e.g. from census bureaus)
• Surface and soil measurements

Page 3: Data Sources Sources, integration, quality, error, uncertainty.

Land measurement, GPS

• Measuring devices: theodolites, laser range finders, GPS (for angles, distances and location)
• Dutch reference systems:
– RD-net (Rijksdriehoeksmeting)
– GPS-kernnet (415 points)

• Field sketches: important for attributes (street names, which crops exactly, etc.) and for verification

Page 4: Data Sources Sources, integration, quality, error, uncertainty.

GPS

• GPS: precision of up to a few meters (a few centimeters for differential GPS)
– Based on 30 satellites
– 3D coordinates

• Can also be used for tracking objects (cars, animals, criminals), which gives trajectory data
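As a rough illustration of what trajectory data looks like, the sketch below stores a track as timestamped positions and derives its length and average speed. The sample fixes and the names `Fix`, `track_length_m` and `average_speed_m_per_s` are hypothetical, and the coordinates are assumed to be already projected to meters rather than raw latitude/longitude.

```python
from dataclasses import dataclass
from math import hypot


@dataclass(frozen=True)
class Fix:
    """One tracked position (hypothetical structure for illustration)."""
    t: float  # seconds since the start of tracking
    x: float  # easting in meters (assumed projected coordinates)
    y: float  # northing in meters


def track_length_m(track):
    """Sum of straight-line distances between consecutive fixes."""
    return sum(hypot(b.x - a.x, b.y - a.y) for a, b in zip(track, track[1:]))


def average_speed_m_per_s(track):
    """Track length divided by elapsed time between first and last fix."""
    duration = track[-1].t - track[0].t
    if duration <= 0:
        raise ValueError("track must span a positive amount of time")
    return track_length_m(track) / duration


# Made-up sample trajectory: three fixes, ten seconds apart.
track = [Fix(0.0, 0.0, 0.0), Fix(10.0, 30.0, 40.0), Fix(20.0, 30.0, 100.0)]
print(track_length_m(track), average_speed_m_per_s(track))
```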

Page 5: Data Sources Sources, integration, quality, error, uncertainty.

Aerial photography

• Most important source for the Topographic Survey (TDN)

• Aerial photos are digitized by hand, and interpreted by eye

• Precision (resolution) of ~15 cm

Page 6: Data Sources Sources, integration, quality, error, uncertainty.

digital air photo, 15 cm resolution

Page 7: Data Sources Sources, integration, quality, error, uncertainty.

Satellite images

• Measured electromagnetic radiation (reflection)
• For types of surface cover, the reflected wavelengths are known approximately (for instance, vegetation reflects much infrared)
• Also called: remote sensing
• Resolution Landsat: 30x30 m; SPOT: 20x20 m, or 10x10 m monochromatic
• EROS pan: 1.8 m, IKONOS: 0.82 m, QuickBird: 0.60 m, GeoEye: 0.41 m
• Spatial, temporal, spectral resolution
• E.g.: RapidEye has 5 equivalent satellites with 5 m spatial resolution that cover every point on earth daily
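To get a feel for what these spatial resolutions mean in data volume, the sketch below counts how many pixels cover one square kilometer at each quoted resolution. The resolutions come from the list above; the function name `pixels_per_km2` is hypothetical and square pixels are assumed.

```python
def pixels_per_km2(resolution_m):
    """Number of square pixels of the given side length that tile 1 km x 1 km."""
    if resolution_m <= 0:
        raise ValueError("resolution must be positive")
    return (1000.0 / resolution_m) ** 2


# Pixel side lengths in meters, as quoted in the slide.
resolutions_m = {
    "Landsat": 30.0,
    "SPOT": 20.0,
    "SPOT monochromatic": 10.0,
    "EROS pan": 1.8,
    "IKONOS": 0.82,
    "QuickBird": 0.60,
    "GeoEye": 0.41,
}

for sensor, res in resolutions_m.items():
    print(f"{sensor}: about {pixels_per_km2(res):,.0f} pixels per km2")
```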

Page 8: Data Sources Sources, integration, quality, error, uncertainty.

Landsat Thematic Mapper, 30 m resolution, Cape Breton Island

Page 9: Data Sources Sources, integration, quality, error, uncertainty.

IKONOS, 82 cm (Singapore)

Page 10: Data Sources Sources, integration, quality, error, uncertainty.

QuickBird, 60 cm

Page 11: Data Sources Sources, integration, quality, error, uncertainty.

Laser altimetry (LIDAR data)

• For elevation data, gives 3D point cloud
• Precision ~10 cm
• Correction is needed

Page 12: Data Sources Sources, integration, quality, error, uncertainty.

Correction of laser altimetry data

Filtering elevation data to remove towns, trees, cars
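One simple way to approximate such a filter (not necessarily the method behind the slide) is to treat the lowest point in each grid cell as ground and drop points that lie well above it. The sketch below assumes points are `(x, y, z)` tuples in meters; the function name, cell size and height threshold are hypothetical choices.

```python
def filter_ground(points, cell_size=10.0, max_height_above_min=0.5):
    """Keep points close to the lowest elevation found in their grid cell.

    Assumption: every cell contains at least one true ground return. A cell
    fully covered by a building would wrongly keep the roof points.
    """
    def cell_of(x, y):
        return (int(x // cell_size), int(y // cell_size))

    # First pass: lowest elevation per grid cell.
    lowest = {}
    for x, y, z in points:
        key = cell_of(x, y)
        if key not in lowest or z < lowest[key]:
            lowest[key] = z

    # Second pass: keep points within the threshold of their cell minimum.
    return [
        (x, y, z)
        for x, y, z in points
        if z - lowest[cell_of(x, y)] <= max_height_above_min
    ]


# Made-up sample cloud: two cells, each with one elevated (non-ground) return.
cloud = [(1.0, 1.0, 2.0), (2.0, 3.0, 2.1), (4.0, 4.0, 9.5),
         (12.0, 1.0, 2.3), (13.0, 2.0, 6.0)]
ground = filter_ground(cloud)
print(ground)
```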

Page 13: Data Sources Sources, integration, quality, error, uncertainty.

Digitizing of paper maps

Page 14: Data Sources Sources, integration, quality, error, uncertainty.

Digitizing of paper maps

• Redraw bounding lines, add attributes (same as for aerial photography)
• With digitizer tablet or table, or heads-up digitizing
• Line mode or stream mode
• After digitizing, the topology and the attributes must be added

Page 15: Data Sources Sources, integration, quality, error, uncertainty.

Scanning of paper maps

• Convert a map by a scanner into a pixel image
• Automatic interpretation is difficult and error-prone; checking and correction are necessary
• Vector scanning exists too

Page 16: Data Sources Sources, integration, quality, error, uncertainty.

Statistical data

• Surveys, by questionnaires or interviews
• Human-geographic or economic-geographic: number of dogs per 1000 households, income, political preference

• Usually collected by the CBS (NL), Census Bureau (USA), or by marketing bureaus

• Usually a sample (subset) of the population is interviewed

• Results are often mapped as a choropleth map (administrative regions with shades-of-a-color-coded meaning, classified)
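The "classified" step of a choropleth map can be sketched as assigning each region's value to one of a few classes, each of which would then get its own shade. The example below uses equal-interval classes, which is only one of several possible schemes; the region names, values and function name are made up for illustration.

```python
def equal_interval_class(value, vmin, vmax, n_classes):
    """Return the 0-based class index of value using equal-width classes."""
    if vmax <= vmin or n_classes < 1:
        raise ValueError("need vmax > vmin and at least one class")
    width = (vmax - vmin) / n_classes
    index = int((value - vmin) / width)
    # Clamp so that value == vmax falls in the top class, not one past it.
    return min(max(index, 0), n_classes - 1)


# Hypothetical statistic per administrative region (e.g. dogs per 1000 households).
values = {"region_a": 100, "region_b": 150, "region_c": 220,
          "region_d": 310, "region_e": 500}

vmin, vmax = min(values.values()), max(values.values())
classes = {name: equal_interval_class(v, vmin, vmax, 4)
           for name, v in values.items()}
print(classes)
```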

Page 17: Data Sources Sources, integration, quality, error, uncertainty.

Soil measurements

• For non-visible data like pollution, temperature, soil type

• Choose sampling strategy
– Ideal: random sampling
– Practice: sampling in easily accessible areas (cannot take a soil sample under a building)
– Additional samples in ‘interesting’ areas
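The gap between ideal and practical sampling can be sketched as random sampling that rejects inaccessible locations. The sketch below assumes a rectangular study area and models buildings as axis-aligned rectangles `(x0, y0, x1, y1)`; all names and numbers are hypothetical.

```python
import random


def sample_sites(n, width, height, blocked, seed=0, max_tries=100_000):
    """Draw n uniformly random sites in [0, width] x [0, height],
    rejecting any candidate that falls inside a blocked rectangle."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    sites = []
    tries = 0
    while len(sites) < n:
        tries += 1
        if tries > max_tries:
            raise RuntimeError("too few accessible locations")
        x, y = rng.uniform(0, width), rng.uniform(0, height)
        if any(x0 <= x <= x1 and y0 <= y <= y1 for x0, y0, x1, y1 in blocked):
            continue  # e.g. under a building: no soil sample possible
        sites.append((x, y))
    return sites


# Made-up study area of 100 m x 100 m with two buildings.
buildings = [(10, 10, 40, 30), (60, 50, 90, 95)]
sites = sample_sites(20, 100, 100, buildings)
print(len(sites))
```

Rejecting blocked locations keeps the sample random within the accessible area only, so the result is no longer a random sample of the whole study area.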


Page 18: Data Sources Sources, integration, quality, error, uncertainty.
Page 19: Data Sources Sources, integration, quality, error, uncertainty.

Sensors

• RFID tags: Radio frequency identification: for tracking objects; can give trajectories

• Wireless sensor networks
– Sensors that have limited computing power and can communicate
– Energy consumption problem

• Smartdust: hypothetical wireless sensor network system

Page 20: Data Sources Sources, integration, quality, error, uncertainty.

Using existing data sets

• Data collection is expensive: if possible, buy existing data sets (provided they are available and the quality is sufficient; judging this needs metadata)

Page 21: Data Sources Sources, integration, quality, error, uncertainty.

Data integration

• Convert data from two different sources in order to compare, and make analysis possible

• Same date of sources desirable
• Same level of aggregation desirable (highest level determines the level of comparison)
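To compare a finely recorded variable with one that is only available for larger regions, the finer data can be aggregated up to the coarser level first. The sketch below does this for population density; the unit names, numbers and the `parent_of` mapping are hypothetical.

```python
from collections import defaultdict


def aggregate_density(units, parent_of):
    """Aggregate (population, area_km2) per small unit to density per parent region."""
    totals = defaultdict(lambda: [0.0, 0.0])  # region -> [population, area]
    for name, (population, area_km2) in units.items():
        region_total = totals[parent_of[name]]
        region_total[0] += population
        region_total[1] += area_km2
    # Density is total population over total area, not the mean of unit densities.
    return {region: pop / area for region, (pop, area) in totals.items()}


# Made-up municipalities with (population, area in km2) and their provinces.
municipalities = {"m1": (10_000, 50.0), "m2": (30_000, 150.0), "m3": (5_000, 100.0)}
province_of = {"m1": "p1", "m2": "p1", "m3": "p2"}

density = aggregate_density(municipalities, province_of)
print(density)
```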

Page 22: Data Sources Sources, integration, quality, error, uncertainty.

Integration of non-aggregated and aggregated data

population density

life expectancy

Page 23: Data Sources Sources, integration, quality, error, uncertainty.

Data integration and data consistency

Page 24: Data Sources Sources, integration, quality, error, uncertainty.

105 m, 111 m

Page 25: Data Sources Sources, integration, quality, error, uncertainty.

Edge matching

• Integration of digitized data sets based on adjacent map sheets or aerial photos

• Idea: create seamless digital data set
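A minimal sketch of one edge-matching step: line endpoints on either side of a sheet boundary that lie within a tolerance of each other are moved to their common midpoint, so the lines connect across the seam. This greedy nearest-neighbor pairing is an assumption for illustration, not a description of any particular GIS package; all names and coordinates are made up.

```python
from math import hypot


def match_edges(ends_a, ends_b, tolerance):
    """Snap endpoint pairs from two adjacent sheets to their midpoint.

    Each endpoint of sheet A is paired with the nearest unused endpoint of
    sheet B, provided the two are at most `tolerance` apart.
    """
    matched_a, matched_b = list(ends_a), list(ends_b)
    used = set()
    for i, (ax, ay) in enumerate(ends_a):
        best, best_dist = None, tolerance
        for j, (bx, by) in enumerate(ends_b):
            if j in used:
                continue
            dist = hypot(ax - bx, ay - by)
            if dist <= best_dist:
                best, best_dist = j, dist
        if best is not None:
            bx, by = ends_b[best]
            midpoint = ((ax + bx) / 2, (ay + by) / 2)
            matched_a[i] = midpoint
            matched_b[best] = midpoint
            used.add(best)
    return matched_a, matched_b


# Made-up endpoints along the shared sheet edge x = 100.
sheet_a = [(100.0, 20.0), (100.0, 55.0)]
sheet_b = [(100.0, 21.0), (100.0, 80.0)]
new_a, new_b = match_edges(sheet_a, sheet_b, tolerance=2.0)
print(new_a, new_b)
```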

Page 26: Data Sources Sources, integration, quality, error, uncertainty.

Data quality, I

• Precision: number of known decimals, depending on measuring device

• Accuracy: absence of systematic bias (no faulty fine-tuning)

Figure panels: precise, accurate, both, neither
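The difference between the two notions can be made concrete with repeated measurements of a known value. The sketch below reads "accurate" as a small systematic offset (bias) and "precise" as a small spread of repeated readings; that reading of the figure, the sample readings and the function name are assumptions.

```python
from statistics import mean, pstdev


def bias_and_spread(measurements, true_value):
    """Return (systematic offset of the mean, standard deviation of the readings)."""
    return mean(measurements) - true_value, pstdev(measurements)


true_value = 100.0  # hypothetical known reference distance in meters

# Made-up readings: the first set is tightly grouped but offset,
# the second is centered on the true value but scattered.
precise_not_accurate = [102.0, 102.1, 101.9, 102.0]
accurate_not_precise = [98.0, 102.0, 99.0, 101.0]

bias_1, spread_1 = bias_and_spread(precise_not_accurate, true_value)
bias_2, spread_2 = bias_and_spread(accurate_not_precise, true_value)
print(bias_1, spread_1)
print(bias_2, spread_2)
```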

Page 27: Data Sources Sources, integration, quality, error, uncertainty.

Data quality, II

• Validity: degree to which data is relevant for an application (complex geographic variables). E.g.: income for well-being; temperature for good weather
• Reliability: up-to-date, not too old for the purpose. E.g.: data of last week is out-of-date for temperature, but not for land cover

Page 28: Data Sources Sources, integration, quality, error, uncertainty.

Geometric and topological quality

• Absence of error
• Presence of consistency

Page 29: Data Sources Sources, integration, quality, error, uncertainty.
Page 30: Data Sources Sources, integration, quality, error, uncertainty.
Page 31: Data Sources Sources, integration, quality, error, uncertainty.

Sources of geometrical and topological errors

• Digitizing
• Integration
• Generalization
• Raster-vector conversion
• Edge-matching

Page 32: Data Sources Sources, integration, quality, error, uncertainty.

Mismatch of boundaries of different themes

Page 33: Data Sources Sources, integration, quality, error, uncertainty.

Digitizing errors

Page 34: Data Sources Sources, integration, quality, error, uncertainty.

Other sources and problems

• Wrong attachment of geometry and attributes
• Missing attribute data
• Uncertainty at classification of satellite images
• Clouds in satellite images, shadows in aerial photos
• Unknown quality (e.g. precision) of paper maps used for digitizing: missing metadata
• Deforming of paper maps

Page 35: Data Sources Sources, integration, quality, error, uncertainty.

Dealing with error/uncertainty

• Errors in data have consequences for e.g. the cost of projects
– Provide metadata (when data collected, how, what equipment)
– Visualize uncertainty

E.g. classification of satellite images for land cover:
grass: 0.86, forest: 0.08, water: 0.03, …
grass: 0.34, forest: 0.31, water: 0.25, …

Confusion: 1 - (p_max - p_second)
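The confusion measure can be worked through for the two example pixels. The sketch below uses only the three class probabilities listed on the slide (the elided classes are left out, which does not change the two largest values here); the function and variable names are hypothetical.

```python
def confusion(probabilities):
    """Confusion = 1 - (p_max - p_second) over the class probabilities of one pixel."""
    if len(probabilities) < 2:
        raise ValueError("need at least two classes")
    ranked = sorted(probabilities.values(), reverse=True)
    return 1.0 - (ranked[0] - ranked[1])


# Class probabilities as listed on the slide.
pixel_a = {"grass": 0.86, "forest": 0.08, "water": 0.03}
pixel_b = {"grass": 0.34, "forest": 0.31, "water": 0.25}

print(round(confusion(pixel_a), 2))  # 1 - (0.86 - 0.08) = 0.22
print(round(confusion(pixel_b), 2))  # 1 - (0.34 - 0.31) = 0.97
```

Both pixels would be classified as grass, but the second has a confusion close to 1, which marks the classification as much less certain.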

Page 36: Data Sources Sources, integration, quality, error, uncertainty.

Dealing with error/uncertainty

• Errors in data have consequences for e.g. the cost of projects
– Provide metadata (when data collected, how, what equipment)
– Visualize uncertainty
– Provide bounds on the range of outcomes (cost) of an analysis, based on an uncertainty model and Monte-Carlo simulation
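A minimal sketch of the Monte-Carlo idea, under a deliberately simple and hypothetical uncertainty model: a rectangular parcel whose measured side lengths each carry an independent Gaussian error, and a project cost proportional to its area. All names and numbers are made up; the bounds are taken as the 5th and 95th percentiles of the simulated costs.

```python
import random


def simulate_cost_bounds(length_m, width_m, sigma_m, cost_per_m2,
                         runs=10_000, seed=42):
    """Return (low, high) cost bounds as the 5th and 95th percentiles
    of costs simulated with Gaussian measurement error on both sides."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    costs = sorted(
        rng.gauss(length_m, sigma_m) * rng.gauss(width_m, sigma_m) * cost_per_m2
        for _ in range(runs)
    )
    low = costs[int(0.05 * (runs - 1))]
    high = costs[int(0.95 * (runs - 1))]
    return low, high


# Hypothetical parcel: 200 m x 100 m, 2 m standard error per side, cost 12 per m2.
nominal_cost = 200.0 * 100.0 * 12.0
low, high = simulate_cost_bounds(200.0, 100.0, 2.0, 12.0)
print(f"nominal {nominal_cost:.0f}, 90% of simulated costs between {low:.0f} and {high:.0f}")
```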