A Data Fusion System for Spatial Data Mining, Analysis and Improvement Silvija Stankute, Hartmut...


Page 1: A Data Fusion System for Spatial Data Mining, Analysis and Improvement Silvija Stankute, Hartmut Asche - University of Potsdam

data|fusion

1/18

© stankute|asche·ifg·uni·potsdam 2012

A data fusion system for spatial data mining, analysis and improvement

Silvija Stankute, Hartmut Asche | Geoinformation Research Group | Dept of Geography | University of Potsdam | Germany

ICCSA 2012 | GEOG-AN-MOD 2012 | Salvador da Bahia, Brazil | 18-21/06/2012

Page 2

Summary: Data fusion system for spatial data mining

1. Motivation

2. Concept: Automated data fusion

3. System architecture: Generic components

4. Fusion pipeline: Operations and workflow

5. System operation: User interface

6. Conclusion

Page 3

1 Motivation: Improvement of geodata quality

Geodata are acquired by a range of actors, including state institutions (national mapping agencies, NMAs) and private enterprises, resulting in heterogeneous, frequently redundant geospatial databases

The geometric and semantic quality of geospatial data is heterogeneous and frequently insufficient or inaccurate: existing datasets covering the same real-world area vary in quality and cannot be relied upon

Effective management and use of geodata require harmonisation of heterogeneous geodata according to application-specific data quality specifications

To avoid fresh data acquisition, an automated process is required that fuses the imperfect geometric and/or semantic information of 2 or more datasets into optimal application-specific data

Page 4

2 Concept: Automated fusion of imperfect geodata

[Figure: source datasets 1, 2 and 3 combined into fused datasets 1+2 and 1+2+3]

Page 5

2 Concept: Automated fusion of imperfect geodata

Development and implementation of an automated fusion process (DataFusion) that produces a single geospatial dataset from existing datasets, superior in geometric and/or semantic quality to the imperfect source data

Objective: extract, filter and combine relevant features from diverse source data into a single best-fit dataset according to user and application specifications

The data harmonisation and fusion process allows unwanted source attribute features to be selected, eliminated and/or substituted by user-specified geometric and/or semantic attributes

DataFusion (DAFU) provides a user-defined data filter to generate optimal geodata in an automated filtering process
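The selection, elimination and substitution of attributes described above can be sketched as a small rule set applied feature by feature. This minimal Python sketch assumes that features are plain attribute dictionaries and that substitute values come from a matched feature in a second dataset; `FilterSpec` and `apply_filter` are hypothetical names, not part of DAFU.

```python
from dataclasses import dataclass, field


@dataclass
class FilterSpec:
    """Hypothetical user-defined filter: which attributes to keep, drop or substitute."""
    keep: set = field(default_factory=set)          # attributes to select (empty set = keep all)
    drop: set = field(default_factory=set)          # attributes to eliminate
    substitute: dict = field(default_factory=dict)  # target attribute -> attribute of the other feature


def apply_filter(feature: dict, other: dict, spec: FilterSpec) -> dict:
    """Return a filtered copy of `feature`, taking substitute values from `other`."""
    # Selection: restrict to the requested attributes, if any were given.
    result = {k: v for k, v in feature.items() if not spec.keep or k in spec.keep}
    # Elimination: remove unwanted attributes.
    for name in spec.drop:
        result.pop(name, None)
    # Substitution: overwrite (or add) attributes with values from the other source.
    for target, source in spec.substitute.items():
        if source in other:
            result[target] = other[source]
    return result


if __name__ == "__main__":
    road_a = {"id": 7, "name": "Hauptstr.", "speed": 50, "tmp_flag": 1}
    road_b = {"street_name": "Hauptstrasse", "lanes": 2}
    spec = FilterSpec(drop={"tmp_flag"}, substitute={"name": "street_name", "lanes": "lanes"})
    print(apply_filter(road_a, road_b, spec))
    # prints {'id': 7, 'name': 'Hauptstrasse', 'speed': 50, 'lanes': 2}
```

Keeping the rules in a declarative specification, rather than in code, is what would let a user change the filter without touching the fusion logic.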

Page 6

3 System architecture: Modular components

Page 7

3 System architecture: Modular component system

DataFusion is implemented on a generic, modular component architecture in Perl, an object-oriented and procedural cross-platform programming language

At present, DataFusion consists of 3 components, linked sequentially in the fusion pipeline

Preprocessing component: preprocessing modules, at present for Tele Atlas, Navteq and ATKIS input data

Filtering/fusion component: merges 2 or more different input datasets into a single optimal dataset

Validation component: assesses the quality of the merged dataset according to user or application specifications
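The sequential linking of the three components can be illustrated as a chain of functions, each consuming the previous component's output. The sketch is in Python rather than the Perl used by DataFusion; it assumes a dataset is a list of feature dictionaries, and the component bodies are placeholders (for example, `fuse` simply concatenates its inputs) rather than the real operations.

```python
from typing import Callable, Iterable, List

# Assumption: a dataset is a list of feature dictionaries.
Dataset = List[dict]
# Each component maps a list of datasets to a new list of datasets.
Component = Callable[[List[Dataset]], List[Dataset]]


def preprocess(datasets: List[Dataset]) -> List[Dataset]:
    # Placeholder: tag each feature with the index of its source dataset.
    return [[{**f, "src": i} for f in ds] for i, ds in enumerate(datasets)]


def fuse(datasets: List[Dataset]) -> List[Dataset]:
    # Placeholder: concatenate all inputs into one dataset (no real matching).
    return [[f for ds in datasets for f in ds]]


def validate(datasets: List[Dataset]) -> List[Dataset]:
    # Placeholder check: the fusion step must have produced a single dataset.
    if len(datasets) != 1:
        raise ValueError("validation expects a single merged dataset")
    return datasets


def run_pipeline(datasets: List[Dataset], components: Iterable[Component]) -> List[Dataset]:
    """Pass the output of each component to the next one, in order."""
    for component in components:
        datasets = component(datasets)
    return datasets


if __name__ == "__main__":
    print(run_pipeline([[{"id": 1}], [{"id": 2}]], [preprocess, fuse, validate]))
    # prints [[{'id': 1, 'src': 0}, {'id': 2, 'src': 1}]]
```

Because every component shares one signature, a further preprocessing module (for another data source, say) could be slotted in without changing the others.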

Page 8

4 Fusion pipeline: Preprocessing of source data

[Figure: preprocessing workflow from source data to preprocessed input data; steps shown: conversion to uniform coordinate system, conversion to uniform data format, analysis for uniqueness, analysis for completeness, analysis for topological completeness, analysis for topological errors, geometric correction, quality measures]
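As one example of conversion to a uniform coordinate system, the sketch below projects WGS 84 longitude/latitude onto spherical Web Mercator (EPSG:3857) using the standard closed-form formulas. The target system is an assumption for illustration, since the slides do not name one; production code would normally rely on a projection library such as PROJ to handle datum shifts and systems like Gauss-Krüger. The function names are hypothetical.

```python
import math

# Semi-major axis of the WGS 84 ellipsoid, used as the sphere radius by Web Mercator.
R = 6378137.0
# Latitude at which Web Mercator becomes square; beyond it the projection is cut off.
MAX_LAT = 85.051129


def lonlat_to_web_mercator(lon_deg: float, lat_deg: float) -> tuple:
    """Project a WGS 84 longitude/latitude pair (degrees) to Web Mercator metres."""
    if not -MAX_LAT <= lat_deg <= MAX_LAT:
        raise ValueError("latitude outside the Web Mercator range")
    x = R * math.radians(lon_deg)
    y = R * math.log(math.tan(math.pi / 4 + math.radians(lat_deg) / 2))
    return x, y


def to_uniform_crs(features: list) -> list:
    """Return copies of point features with `coords` converted to Web Mercator."""
    return [{**f, "coords": lonlat_to_web_mercator(*f["coords"])} for f in features]
```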

Page 9

4 Fusion pipeline: Preprocessing of source data

The preprocessing component executes the following operations on heterogeneous geospatial source data:

Objective: quality assessment of the vector data model underlying each source dataset

Operations: selection of source data; integration of source data by conversion to a unified coordinate system; transformation into a common data format; assessment of source data for uniqueness and completeness; quality assessment and adjustment of topological correctness and thematic completeness

Result: preprocessed input datasets, used as input data for the subsequent fusion/filtering component
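The uniqueness, completeness and topology checks listed above can be approximated with simple counting. This Python sketch assumes features are dictionaries with an `id` key and that line geometries are lists of coordinate tuples; the function names are hypothetical, coordinates are compared exactly (real data would need a snapping tolerance), and `find_dangles` returns only candidates, since legitimate dead-end roads look the same.

```python
from collections import Counter


def find_duplicates(features: list, key: str = "id") -> set:
    """Uniqueness check: return key values that occur more than once."""
    counts = Counter(f.get(key) for f in features)
    return {value for value, n in counts.items() if n > 1}


def find_incomplete(features: list, required: list) -> list:
    """Completeness check: return ids of features missing a required attribute."""
    return [
        f.get("id")
        for f in features
        if any(f.get(attr) in (None, "") for attr in required)
    ]


def find_dangles(lines: list) -> set:
    """Topology check for line networks: endpoints touched by only one line."""
    ends = Counter()
    for coords in lines:
        ends[coords[0]] += 1
        ends[coords[-1]] += 1
    return {pt for pt, n in ends.items() if n == 1}
```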

Page 10

4 Fusion pipeline: Fusion of preprocessed data

[Figure: fusion workflow from preprocessed input data to merged output data; steps shown: detection of relations among input data, assignment of related objects, transfer of thematic information, transfer of geometric information]

Page 11

4 Fusion pipeline: Fusion of preprocessed data

The data filtering/fusion component executes the following operations on preprocessed geospatial input data:

Objective: generation of a single optimal dataset by transfer and augmentation of attribute features from n input datasets

Operations: iterative comparison of geometric features (coordinates) of the vector input datasets; determination of relationships between data features and real-world objects; generation of non-redundant fusion data (each semantic feature is assigned to exactly one geometric feature, and vice versa); transfer (cross-referencing) and extension of specified attributes to the target dataset

Result: merged dataset, used as input data for the subsequent validation component
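The coordinate comparison and one-to-one assignment can be sketched for point features as greedy nearest-neighbour matching within a distance tolerance, followed by attribute transfer. This is a simplified stand-in chosen for illustration, not necessarily the method DAFU uses; lines and polygons would need shape-aware similarity measures. `match_one_to_one`, `transfer_attributes` and the `tolerance` parameter are hypothetical.

```python
import math


def match_one_to_one(source: dict, target: dict, tolerance: float) -> dict:
    """Greedy 1:1 matching of point features by distance.

    `source` and `target` map feature ids to (x, y) coordinates. All pairs
    within `tolerance` are sorted by distance and accepted closest-first,
    so each feature is assigned at most once.
    """
    candidates = sorted(
        (math.dist(p, q), s_id, t_id)
        for s_id, p in source.items()
        for t_id, q in target.items()
        if math.dist(p, q) <= tolerance
    )
    used_s, used_t, pairs = set(), set(), {}
    for _, s_id, t_id in candidates:
        if s_id not in used_s and t_id not in used_t:
            pairs[s_id] = t_id
            used_s.add(s_id)
            used_t.add(t_id)
    return pairs


def transfer_attributes(pairs: dict, source_attrs: dict, target_attrs: dict, names: list) -> dict:
    """Copy the listed attributes from matched source features onto target features."""
    merged = {t_id: dict(attrs) for t_id, attrs in target_attrs.items()}
    for s_id, t_id in pairs.items():
        for name in names:
            if name in source_attrs[s_id]:
                merged[t_id][name] = source_attrs[s_id][name]
    return merged
```

Accepting the globally closest pairs first keeps the result non-redundant, as the slide requires, but a greedy pass is not guaranteed to find the best overall assignment.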

Page 12

4 Fusion pipeline: Validation of merged data

[Figure: validation workflow from merged output data to specified DAFU data; steps shown: validation of fusion quality, interactive error correction, coordinate system transformation, data format conversion]

Page 13

4 Fusion pipeline: Validation of merged data

The validation component executes the following operations on the single merged geospatial dataset:

Objective: quality verification of the fusion process

Operations: calculation and evaluation of data fusion quality; if required and/or specified, interactive correction of errors in the source data (< 5 percent for linear objects, < 10-15 percent for polygonal objects); transformation of the merged geodata into specified coordinate systems; conversion of the merged dataset into specified data formats (SVG, CSV, SHP, etc.)

End result: optimal geospatial dataset as specified by the application and/or user
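A simple fusion-quality measure is the share of objects that found a partner during fusion; the remainder are candidates for interactive correction. This sketch assumes that reading and treats the percentages above (5 percent for linear objects, up to 15 percent for polygonal objects) as upper bounds on the expected error share; both the measure and the function names are hypothetical. CSV export uses Python's standard `csv` module.

```python
import csv
import io

# Interpretation of the slide's figures as upper bounds on the share of
# objects needing interactive correction (0.15 = top of the 10-15 % range).
EXPECTED_MAX_ERROR = {"line": 0.05, "polygon": 0.15}


def fusion_quality(n_total: int, n_matched: int) -> float:
    """Share of objects matched during fusion, from 0.0 to 1.0."""
    return n_matched / n_total if n_total else 1.0


def within_expected_error(geom_type: str, n_total: int, n_matched: int) -> bool:
    """True if the unmatched share does not exceed the bound for the geometry type."""
    error_share = (n_total - n_matched) / n_total if n_total else 0.0
    return error_share <= EXPECTED_MAX_ERROR[geom_type]


def to_csv(features: list, fields: list) -> str:
    """Serialise merged features to CSV text; attributes not in `fields` are ignored."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fields, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(features)
    return buf.getvalue()
```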

Page 14

5 System operation: User interface

The front end of the DataFusion system supports 2 operation modes: graphical user interface (GUI) or command-line interface

Command-line operation: for integration into remote systems, such as servers, clusters, etc., by GI experts

GUI operation: standard operation mode for application-oriented GI users

The GUI is composed of 8 widgets covering the core functions of DAFU; the widgets communicate via data exchange and signal exchange (bindings)

An additional, flexible support system provides the user with relevant information on operating and understanding DAFU
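For the command-line mode, a front end of roughly the following shape would let GI experts script fusion runs on servers or clusters. The program name `dafu` and every option below are hypothetical, not DataFusion's actual interface; the sketch uses Python's standard `argparse` and enforces the minimum of two input datasets stated for the fusion component.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical command-line front end for batch fusion runs."""
    parser = argparse.ArgumentParser(prog="dafu", description="Fuse geospatial datasets.")
    parser.add_argument("inputs", nargs="+", help="preprocessed input datasets")
    parser.add_argument("-o", "--output", required=True, help="path of the merged dataset")
    parser.add_argument("--format", choices=["svg", "csv", "shp"], default="shp",
                        help="output data format")
    parser.add_argument("--crs", default=None,
                        help="target coordinate system (default: keep the input system)")
    return parser


def parse(argv=None) -> argparse.Namespace:
    """Parse arguments; fusion needs at least two input datasets."""
    parser = build_parser()
    args = parser.parse_args(argv)
    if len(args.inputs) < 2:
        parser.error("at least two input datasets are required")  # exits with status 2
    return args
```

For example, `parse(["teleatlas.shp", "atkis.shp", "-o", "merged.csv", "--format", "csv"])` yields a namespace whose `inputs` lists both files and whose `crs` stays `None`.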

Page 15

5 System operation: User interface > GUI

[Figure 5-3 (dissertation): DataFusion GUI]

Page 16

6 Conclusion: Data fusion – what's the benefit?

Page 17

6 Conclusion: Data fusion – what's the benefit?

The DataFusion system presents an innovative approach to geospatial data mining by harmonising and improving the geometric and semantic quality of digital vector data

DAFU demonstrates that a single optimal geospatial dataset can be generated from existing suboptimal datasets, making repeated data acquisition unnecessary

DAFU facilitates cost-effective geospatial data management through multiple re-use of existing datasets, customised to individual user and/or application requirements

DAFU helps reduce the heterogeneity and redundancy of geospatial data in geodatabases while increasing the efficient, meaningful use of geographically related mass data

Page 18

Thank you for your attention

Questions? Comments? Feedback?

Contact Hartmut Asche | [email protected] | Dept of Geography | University of Potsdam | GER

Web www.geographie.uni-potsdam.de/geoinformatik
