Geographical Data Mining Stan Openshaw Centre for Computational Geography University of Leeds.

Page 1

Geographical Data Mining

Stan Openshaw

Centre for Computational Geography

University of Leeds

Page 2

BUT

Ian Turton, CCG, Leeds University

For the latest on Stan: http://www.geog.leeds.ac.uk/staff/s.openshaw/latest.html

Page 3

Why would we want to do this?

• Geographical Data Explosion

• Public imperative

• Lack of geographically aware tools

Page 4

Mountains of Data

Page 5

Swamps of Data

Page 6

We know what you spend...

Page 7

…where you spend it...

Page 8

…who you talk to...

Page 9

…where you live...

LS2 9JT

What your neighbours are like

Page 10

...Crime data and...

• crime type

• crime location

• insurance data

Page 11

...Health data

• environmental data

• socio-economic data

• admissions data

Page 12

Geographical Hyperspace

• Geography – x,y co-ordinates, postcodes

• Time – days, hours, months

• Attributes
  – place: pollution sources, soil type, distance to motorway
  – cases: type of disease, age, sex
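The three axes of this hyperspace can be made concrete as a single record type. This is a minimal sketch, assuming projected x,y coordinates in metres and free-form attribute dictionaries; the `Case` class, its field names and the example values are hypothetical and do not come from any tool or data set in these slides.

```python
from dataclasses import dataclass, field
from datetime import date

@dataclass
class Case:
    """One record in the geography x time x attribute hyperspace (hypothetical schema)."""
    x: float                                   # geography: easting (assumed metres)
    y: float                                   # geography: northing (assumed metres)
    when: date                                 # time of the event
    place: dict = field(default_factory=dict)  # attributes of the place, e.g. soil type
    case: dict = field(default_factory=dict)   # attributes of the case, e.g. age, sex

    def dimensions(self):
        """List the hyperspace dimensions this record actually carries."""
        dims = ["geography", "time"]
        if self.place or self.case:
            dims.append("attributes")
        return dims

# Illustrative values only; not taken from the study data.
example = Case(x=1000.0, y=2000.0, when=date(1998, 6, 1),
               place={"soil": "clay", "km_to_motorway": 3.5},
               case={"disease": "asthma", "age": 7, "sex": "F"})
```

Keeping place attributes and case attributes in separate fields mirrors the slide's split between the two kinds of attribute.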

Page 13

Data Mining

Page 14

Turning data into knowledge

• How do these data sets fit together?

• Is there anything important hidden in here?

• Does geography make a difference?

Page 15

Datatype   Nature of Data Interaction
---------  ---------------------------------------------
1          spatial data
2          time data
3          multiple attribute data
4          geography and time data
5          time and multiple attribute data
6          geography and multiple attribute data
7          geography, time, and multiple attribute data
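The seven datatypes are exactly the non-empty combinations of the three hyperspace dimensions (2^3 - 1 = 7). The short sketch below enumerates them; it yields the same seven types as the table, although `itertools` orders the two-way combinations slightly differently from the slide. The `DIMENSIONS` tuple and `data_interactions` helper are illustrative names, not part of the original work.

```python
from itertools import combinations

# The three axes of geographical hyperspace: geography, time and attributes.
DIMENSIONS = ("geography", "time", "multiple attribute")

def data_interactions(dims=DIMENSIONS):
    """Return every non-empty combination of the given dimensions."""
    result = []
    for k in range(1, len(dims) + 1):
        result.extend(combinations(dims, k))
    return result

for number, combo in enumerate(data_interactions(), start=1):
    print(number, " + ".join(combo))
```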

Page 16

HISTORICALLY these effects have been hidden by research design

BUT

Page 17

BUT

Page 18

The result is often data strangulation: the patterns are being destroyed or damaged by the research design

Page 19

What is needed is a geographic data mining technology that works

Page 20

How can we do this?

• Developing new smarter methods

• Testing them
  – HPC is vital to this process

• Disseminating them
  – Internet
  – Java

Page 21

Being SMART is not just a matter of methodology; it also depends on access, usability, relevance, and how results are communicated

Page 22

The complete novice should be able to perform some sophisticated geographical analysis and get useful, understandable results on the same day the work starts

Page 23

User Friendly Spatial Analysis

• provides analysis that users need

• simple to perform

• highly automated, making it fast and efficient

• readily understood

• results are self-evident and can be communicated to non-experts

• safe and trustworthy

Page 24

What we did in this study

• Comparison of techniques on the same data

• Multiple techniques
  – GAM/K
  – GAM/K-T
  – MAPEX
  – GDM1/2
  – FLOCK
  – Proprietary Data Mining Tools

Page 25

Study Area

Page 26

Stan’s Cases

Page 27

Chris’ cases

Page 28

How to search the geographic space

• Exhaustively
  – GAM, GEM

• Smartly
  – Genetic algorithm
    • MAPEX, GDM
  – Flocking
    • boids
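The exhaustive strategy can be sketched as a GAM-style scan: circles of several radii are placed on a regular lattice over the study area, and every circle whose observed case count is improbably high under a Poisson model is kept. This is a minimal sketch, not the GAM/K code used in the study; the lattice spacing of 0.2 × radius, the threshold `alpha`, the `min_cases` cut-off and the use of one population point per person at risk are all assumptions.

```python
import math

def poisson_upper_tail(observed, expected):
    """P(X >= observed) for X ~ Poisson(expected)."""
    if observed <= 0:
        return 1.0
    term = math.exp(-expected)        # P(X = 0)
    cdf = term
    for k in range(1, observed):      # accumulate P(X <= observed - 1)
        term *= expected / k
        cdf += term
    return max(0.0, 1.0 - cdf)

def count_within(points, x, y, r):
    """Number of (px, py) points within distance r of (x, y)."""
    r2 = r * r
    return sum(1 for px, py in points if (px - x) ** 2 + (py - y) ** 2 <= r2)

def gam_scan(cases, population, rate, bbox, radii, alpha=0.002, min_cases=3):
    """Test every circle on a lattice, for each radius; keep the 'significant' ones.

    cases, population: lists of (x, y) points (one population point per person at risk).
    rate: overall cases per person at risk, used for the expected count.
    Returns (x, y, r, observed, expected, p) tuples.
    """
    xmin, ymin, xmax, ymax = bbox
    hits = []
    for r in radii:
        step = 0.2 * r                # assumed lattice spacing, proportional to radius
        y = ymin
        while y <= ymax:
            x = xmin
            while x <= xmax:
                observed = count_within(cases, x, y, r)
                if observed >= min_cases:
                    expected = count_within(population, x, y, r) * rate
                    if expected > 0:
                        p = poisson_upper_tail(observed, expected)
                        if p < alpha:
                            hits.append((x, y, r, observed, expected, p))
                x += step
            y += step
    return hits
```

Because every circle is tested, a single real cluster typically produces many overlapping significant circles, so the output is easiest to interpret when the kept circles are drawn on a map.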

Page 29

GAM & GEM

Page 30

Mapex & GDM
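The "smart" alternative replaces the fixed lattice with a genetic algorithm that evolves candidate circles. The sketch below assumes each individual is a circle `(x, y, r)` and uses a Poisson-style z-score of excess cases as fitness, with tournament selection, blend crossover, Gaussian mutation, elitism and a few random immigrants per generation. These encodings, operators and parameter values are illustrative assumptions; the slides do not describe how MAPEX or GDM implement their search.

```python
import math
import random

def count_within(points, x, y, r):
    """Number of (px, py) points within distance r of (x, y)."""
    r2 = r * r
    return sum(1 for px, py in points if (px - x) ** 2 + (py - y) ** 2 <= r2)

def circle_fitness(circle, cases, population, rate, min_cases=3):
    """Assumed fitness: Poisson-style z-score of excess cases inside the circle."""
    x, y, r = circle
    observed = count_within(cases, x, y, r)
    if observed < min_cases:
        return 0.0
    expected = count_within(population, x, y, r) * rate
    return (observed - expected) / math.sqrt(max(expected, 0.5))

def ga_circle_search(cases, population, rate, bbox, r_range,
                     pop_size=60, generations=80, immigrants=6, seed=1):
    """Evolve circles (x, y, r); return (fitness, circle) of the best one evaluated."""
    rng = random.Random(seed)
    xmin, ymin, xmax, ymax = bbox
    rmin, rmax = r_range
    sigma = (0.05 * (xmax - xmin), 0.05 * (ymax - ymin), 0.1 * (rmax - rmin))

    def random_circle():
        return (rng.uniform(xmin, xmax), rng.uniform(ymin, ymax), rng.uniform(rmin, rmax))

    def clamp(c):
        x, y, r = c
        return (min(max(x, xmin), xmax), min(max(y, ymin), ymax), min(max(r, rmin), rmax))

    def tournament(scored):
        # Pick three (fitness, circle) pairs at random and keep the fittest circle.
        return max(rng.sample(scored, 3), key=lambda t: t[0])[1]

    pool = [random_circle() for _ in range(pop_size)]
    best = (float("-inf"), None)
    for _ in range(generations):
        scored = [(circle_fitness(c, cases, population, rate), c) for c in pool]
        scored.sort(key=lambda t: t[0], reverse=True)
        if scored[0][0] > best[0]:
            best = scored[0]
        nxt = [c for _, c in scored[:2]]                        # elitism
        nxt += [random_circle() for _ in range(immigrants)]     # random immigrants
        while len(nxt) < pop_size:
            a, b = tournament(scored), tournament(scored)
            w = rng.random()                                    # blend crossover
            child = [w * ai + (1 - w) * bi for ai, bi in zip(a, b)]
            child = [g + rng.gauss(0, s) for g, s in zip(child, sigma)]  # mutation
            nxt.append(clamp(child))
        pool = nxt
    return best
```

Unlike the lattice scan, this evaluates only a few thousand circles, which is what makes the approach attractive when the search space (or the number of attribute combinations) is large.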

Page 31

FLOCK
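FLOCK is built on flocking "boids". The sketch below shows one update of the three classic boid steering rules (cohesion towards nearby flockmates, alignment with their velocities, and separation from any that are too close), followed by a speed cap. All weights and distances are arbitrary assumptions, and because these slides do not describe how FLOCK couples the boids' movement to the case data, that part is left out.

```python
import math

def boids_step(positions, velocities, radius=5.0, w_coh=0.01, w_ali=0.05,
               w_sep=0.1, min_dist=1.0, max_speed=2.0):
    """Apply cohesion, alignment and separation once; return new positions and velocities."""
    new_v = []
    for i, (px, py) in enumerate(positions):
        vx, vy = velocities[i]
        cx = cy = ax = ay = sx = sy = 0.0
        k = 0
        for j, (qx, qy) in enumerate(positions):
            if i == j:
                continue
            d = math.hypot(qx - px, qy - py)
            if d < radius:                 # j is a neighbour of i
                k += 1
                cx += qx
                cy += qy
                ax += velocities[j][0]
                ay += velocities[j][1]
                if d < min_dist:           # too close: steer away from j
                    sx -= qx - px
                    sy -= qy - py
        if k:
            # Cohesion: towards the neighbours' centre; alignment: towards their mean velocity.
            vx += w_coh * (cx / k - px) + w_ali * (ax / k - vx)
            vy += w_coh * (cy / k - py) + w_ali * (ay / k - vy)
        vx += w_sep * sx
        vy += w_sep * sy
        speed = math.hypot(vx, vy)
        if speed > max_speed:              # cap the speed
            vx, vy = vx * max_speed / speed, vy * max_speed / speed
        new_v.append((vx, vy))
    new_p = [(x + vx, y + vy) for (x, y), (vx, vy) in zip(positions, new_v)]
    return new_p, new_v
```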

Page 32

And the Attributes...

• Exhaustively
  – GAM, GEM

• Smartly
  – Genetic algorithm
    • MAPEX, GDM, boids

Page 33

GAM & GEM with time
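Adding time turns each search circle into a space-time cylinder: a circle on the map combined with a time window. The sketch below shows only the counting step, assuming events as `(x, y, t)` tuples with `t` a numeric day index; the function names are hypothetical, and expected counts and any significance test are left out.

```python
def space_time_count(events, x, y, r, t_start, t_end):
    """Count events inside the cylinder: circle (x, y, r) and time window [t_start, t_end)."""
    return sum(1 for ex, ey, et in events
               if (ex - x) ** 2 + (ey - y) ** 2 <= r * r and t_start <= et < t_end)

def space_time_scan(events, centres, radii, window, t_min, t_max, step):
    """Yield (x, y, r, t0, count) for every cylinder on a space and time lattice."""
    for x, y in centres:
        for r in radii:
            t0 = t_min
            while t0 + window <= t_max:    # slide the time window along
                yield x, y, r, t0, space_time_count(events, x, y, r, t0, t0 + window)
                t0 += step
```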

Page 34

[Figure: Geology Map showing four rock types, Rock A to Rock D]

Page 35

[Figure: 2 km buffer polygon around a railway]

Page 36

[Figure: Combined Geology and Railway Buffer Map, with Rock A to Rock D overlaid by the 2 km railway buffer]
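The overlay in these maps amounts to classifying each case by the rock polygon it falls in and by whether it lies within 2 km of the railway. A minimal sketch, assuming projected coordinates in metres, geology held as simple polygons and the railway as a polyline; the function names are hypothetical, and a GIS would normally do this with its own buffer and overlay operations.

```python
import math
from collections import Counter

def dist_point_segment(px, py, ax, ay, bx, by):
    """Shortest distance from point P to segment AB."""
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))              # clamp the projection onto the segment
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def in_buffer(px, py, line, width):
    """True if the point is within `width` of any segment of the polyline."""
    return any(dist_point_segment(px, py, *line[i], *line[i + 1]) <= width
               for i in range(len(line) - 1))

def point_in_polygon(px, py, poly):
    """Ray-casting test: count edge crossings to the right of the point."""
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        if (y1 > py) != (y2 > py):
            x_cross = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
            if px < x_cross:
                inside = not inside
    return inside

def overlay_class(px, py, geology, railway, width=2000.0):
    """Return (rock type or None, within buffer?) for one point."""
    rock = next((name for name, poly in geology.items()
                 if point_in_polygon(px, py, poly)), None)
    return rock, in_buffer(px, py, railway, width)

def tally(cases, geology, railway, width=2000.0):
    """Count cases in each combined geology and buffer zone."""
    return Counter(overlay_class(x, y, geology, railway, width) for x, y in cases)
```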

Page 37

Combinations of Attributes

• If we have 8 attributes with 10 classes each

• There are 3,160 combinations of 2 classes from the 80, compared with 24,040,016 if any 5 are used

• Smart searches are essential
  – use a GA to generate possible combinations of interest
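The counts on this slide are binomial coefficients: choosing k of the 80 classes without regard to order gives C(80, k). A quick check with the standard library:

```python
from math import comb

classes = 8 * 10           # 8 attributes x 10 classes each = 80 classes
pairs = comb(classes, 2)   # unordered choices of 2 classes
fives = comb(classes, 5)   # unordered choices of 5 classes
print(pairs, fives)        # 3160 24040016
```

The jump from thousands to tens of millions as k goes from 2 to 5 is why exhaustive enumeration of attribute combinations quickly becomes impractical.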

Page 38

Proprietary Data Miners

Page 39

Results

How to visualise them?

Page 40

Results

• GAM/K
  – did very well
  – was not put off by time or attributes

• GAM/KT
  – worked well; time clusters found

• MAPEX / GDM/1
  – worked well

Page 41

Results continued

• FLOCK
  – worked very well

• Proprietary data mining tools
  – didn’t work at all well out of the box
  – could have built a GAM inside them

Page 42

What next?

• Build a harder data set for more tests

• Re-run the analysis

• Put it all on the web

Page 43

Thanks to

• European Research Office of the US Army

• ESRC grant R237260 for paying Ian’s salary.

• ESRC/JISC for the Census data purchase.

• Ordnance Survey (OS) for the bits of the maps they own.

Page 44

To find out more

• Web-based multi-engine spatial analysis tools, James Macgill, Openshaw and Turton
  – Session 1A, 14.00 Sunday

• Smart crime pattern analysis using GAM, Ian Turton, Openshaw and Macgill
  – Session 7A, 10.40 Tuesday

Page 45

Contacts

Email ian,stan,[email protected]

Check out smart pattern analysis on the web:

http://www.ccg.leeds.ac.uk/smart

http://www.ccg.leeds.ac.uk/smart/hyper.doc

Latest news on Stan: http://www.geog.leeds.ac.uk/staff/s.openshaw/latest.html