Exploratory Analysis of Forestry Data in NEFIS
![Page 1: Exploratory Analysis of Forestry Data in NEFIS](https://reader035.fdocuments.us/reader035/viewer/2022081519/568137ec550346895d9fa3cb/html5/thumbnails/1.jpg)
Exploratory Analysis of Forestry Data in NEFIS
Natalia Andrienko & Gennady Andrienko
FHG AIS (Fraunhofer Institute for Autonomous Intelligent Systems)
http://www.ais.fraunhofer.de/and
NEFIS Project Workshop, JRC Italy, 29th June 2005
![Page 2: Exploratory Analysis of Forestry Data in NEFIS](https://reader035.fdocuments.us/reader035/viewer/2022081519/568137ec550346895d9fa3cb/html5/thumbnails/2.jpg)
NEFIS and our research
• Our research focus is EDA – Exploratory Data Analysis (in particular, spatial and temporal data)
• In NEFIS, we strive to explain and promote the ideas and principles of EDA
• We have used the ICP Forests defoliation data as a non-trivial example to demonstrate systematic, comprehensive EDA
• We hope to receive valuable feedback from you for guiding our further work
![Page 3: Exploratory Analysis of Forestry Data in NEFIS](https://reader035.fdocuments.us/reader035/viewer/2022081519/568137ec550346895d9fa3cb/html5/thumbnails/3.jpg)
What Is EDA?
• Emerged in statistics in the 1970s; originator: John Tukey
• A philosophy and discipline of looking at data without bias: “What can data tell me?” rather than “Do they agree with my expectations?”
– Similar to the work of a detective (J. Tukey)
• Need to look at data → focus on visualisation and user interaction with data displays
![Page 4: Exploratory Analysis of Forestry Data in NEFIS](https://reader035.fdocuments.us/reader035/viewer/2022081519/568137ec550346895d9fa3cb/html5/thumbnails/4.jpg)
Purposes of EDA
• Uncover peculiarities of the data and, on this basis, understand how the data should be further processed (e.g. filtered, transformed, split into parts, fused, …)
• Generate hypotheses for further testing (e.g. using statistical methods)
• Choose proper methods for in-depth analysis (possibly, domain-specific)
• Especially important for previously unknown data, e.g. data found on the Web that are relevant to NEFIS
![Page 5: Exploratory Analysis of Forestry Data in NEFIS](https://reader035.fdocuments.us/reader035/viewer/2022081519/568137ec550346895d9fa3cb/html5/thumbnails/5.jpg)
EDA vs. other analyses
• EDA does not replace rigorous methods of numerical analysis, either general or domain-specific, but should give an understanding of which methods to apply and how
Original data
1. EDA → Understanding of the data (mental model)
2. Data processing → Processed data
3. In-depth analysis → Conclusions, theories, decisions, …
![Page 6: Exploratory Analysis of Forestry Data in NEFIS](https://reader035.fdocuments.us/reader035/viewer/2022081519/568137ec550346895d9fa3cb/html5/thumbnails/6.jpg)
EDA vs. information presentation
• EDA makes intensive use of graphics
• However, “nice” presentation and reporting are not EDA purposes
• Primary goal of presentation: convey a certain idea or set of ideas to others
– Understandably
– Convincingly
– Aesthetically attractively
• This requires different visual means than exploration
![Page 7: Exploratory Analysis of Forestry Data in NEFIS](https://reader035.fdocuments.us/reader035/viewer/2022081519/568137ec550346895d9fa3cb/html5/thumbnails/7.jpg)
The defoliation data
• Large volume: 6169 spatially-referenced time series
• Two dimensions: space and time (S&T)
• Many missing values
• No full compatibility across countries, species, time etc.
![Page 8: Exploratory Analysis of Forestry Data in NEFIS](https://reader035.fdocuments.us/reader035/viewer/2022081519/568137ec550346895d9fa3cb/html5/thumbnails/8.jpg)
EDA: data quality issues
Specialists’ opinion (after seeing the draft report of the data exploration): “The data were not meant for analysis!”
But:
1. There are no ideal data (especially on the Web and for free)
2. Even to understand the inadequacy of data, one first needs to explore them
3. Even imperfect data can be useful
4. The principles of EDA (demonstrated further) are applicable to perfect data as well
![Page 9: Exploratory Analysis of Forestry Data in NEFIS](https://reader035.fdocuments.us/reader035/viewer/2022081519/568137ec550346895d9fa3cb/html5/thumbnails/9.jpg)
General procedure of the EDA
1. See the whole
– Space + Time → 2 complementary views
1) Evolution of spatial patterns in time
2) Distribution of temporal behaviours in space
2. Divide and focus
– Data are complex → have to be explored by slices and subsets (species, age groups, countries, years, …)
3. Attend to particulars
– Detect outliers, strange behaviours, unexpected patterns, …
![Page 10: Exploratory Analysis of Forestry Data in NEFIS](https://reader035.fdocuments.us/reader035/viewer/2022081519/568137ec550346895d9fa3cb/html5/thumbnails/10.jpg)
See the whole: Handle large data volumes
• General approach: Data aggregation
• Task 1: Explore evolution of spatial patterns
• Appropriate data transformation: aggregate by small space compartments (regular grid with 4025 cells); separately for different species; various aggregates (mean, max)
Gain: no symbol overlapping
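The aggregation by grid cells can be sketched as follows. This is a minimal Python illustration, not the implementation used in the study: the function name `aggregate_by_grid`, the `(x, y, value)` record layout, and the cell size are assumptions made for the example.

```python
from collections import defaultdict
from math import floor
from statistics import mean

def aggregate_by_grid(records, cell_size):
    """Group (x, y, value) records into square grid cells and return
    {(col, row): {"mean": ..., "max": ..., "count": ...}}."""
    cells = defaultdict(list)
    for x, y, value in records:
        if value is None:  # skip missing values
            continue
        cells[(floor(x / cell_size), floor(y / cell_size))].append(value)
    return {
        cell: {"mean": mean(vals), "max": max(vals), "count": len(vals)}
        for cell, vals in cells.items()
    }

# Hypothetical plots: (x, y, defoliation value) for one species and one year
plots = [(0.2, 0.3, 10.0), (0.8, 0.1, 30.0), (1.5, 0.4, 20.0), (1.7, 0.9, None)]
grid = aggregate_by_grid(plots, cell_size=1.0)
```

Each cell then carries one symbol on the map, which is what removes the symbol overlapping. Running the function once per species and per year gives the separate aggregates mentioned above.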
![Page 11: Exploratory Analysis of Forestry Data in NEFIS](https://reader035.fdocuments.us/reader035/viewer/2022081519/568137ec550346895d9fa3cb/html5/thumbnails/11.jpg)
Explore evolution of spatial patterns
a) Animated map
b) Map sequence
Observations:
• Persistently high values in Poland
• Improvement in Belarus
• Mosaic distribution in most countries: great differences between close locations
• Outliers
![Page 12: Exploratory Analysis of Forestry Data in NEFIS](https://reader035.fdocuments.us/reader035/viewer/2022081519/568137ec550346895d9fa3cb/html5/thumbnails/12.jpg)
Divide and Focus: Exploration on country level
• Advisable due to inconsistencies between countries
• Observation: abrupt changes between locations → spatial smoothing methods are not appropriate
![Page 13: Exploratory Analysis of Forestry Data in NEFIS](https://reader035.fdocuments.us/reader035/viewer/2022081519/568137ec550346895d9fa3cb/html5/thumbnails/13.jpg)
Explore spatial distribution of temporal behaviours
• Are behaviours in neighbouring places similar?
• Step 1. Smoothing supports revealing general patterns and disregarding fluctuations and outliers (we shall look at outliers later)
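The slides do not say which smoothing method was applied. As one possible illustration, the sketch below uses a centred moving average over each time series; the function name `smooth` and the window width are assumptions.

```python
def smooth(series, window=3):
    """Centred moving average; None marks a missing value and is skipped."""
    half = window // 2
    result = []
    for i in range(len(series)):
        neighbours = [v for v in series[max(0, i - half): i + half + 1]
                      if v is not None]
        result.append(sum(neighbours) / len(neighbours) if neighbours else None)
    return result

# Hypothetical series with one peak and one missing year
values = [10.0, 40.0, 10.0, None, 10.0]
smoothed = smooth(values)
```

The peak in the second year is flattened, so the general course of the series becomes easier to compare across places; the peak itself is an outlier to revisit later.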
![Page 14: Exploratory Analysis of Forestry Data in NEFIS](https://reader035.fdocuments.us/reader035/viewer/2022081519/568137ec550346895d9fa3cb/html5/thumbnails/14.jpg)
Explore spatial distribution of temporal behaviours
• Are behaviours in neighbouring places similar?
• Step 2. Temporal comparison (e.g. with particular year, mean for a period) helps to disregard absolute differences in values and thus focus on behaviours
Observation: no strong similarity between neighbouring places
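A minimal sketch of the temporal comparison in Step 2, assuming each time series is a list of yearly values with `None` for missing years. The helper names `compare_with_year` and `compare_with_mean` are hypothetical.

```python
def compare_with_year(series, ref_index):
    """Differences from the value in a reference year."""
    ref = series[ref_index]
    return [None if v is None or ref is None else v - ref for v in series]

def compare_with_mean(series):
    """Differences from the mean over the available years."""
    present = [v for v in series if v is not None]
    if not present:
        return list(series)
    m = sum(present) / len(present)
    return [None if v is None else v - m for v in series]

# Two hypothetical plots with different levels but the same behaviour
a = [10.0, 15.0, 20.0]
b = [40.0, 45.0, 50.0]
rel_a = compare_with_year(a, 0)
rel_b = compare_with_year(b, 0)
```

After the transformation the two series coincide, which is how the comparison removes absolute differences and leaves only the behaviour.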
![Page 15: Exploratory Analysis of Forestry Data in NEFIS](https://reader035.fdocuments.us/reader035/viewer/2022081519/568137ec550346895d9fa3cb/html5/thumbnails/15.jpg)
Compare behaviours in plots with different main species
• Mosaic signs:
– 6 rows for species
– 14 columns for years 1990-2003
– Colours encode defoliation values
Observation: behaviours differ for different main species
![Page 16: Exploratory Analysis of Forestry Data in NEFIS](https://reader035.fdocuments.us/reader035/viewer/2022081519/568137ec550346895d9fa3cb/html5/thumbnails/16.jpg)
Explore overall temporal trends
Line overlapping obstructs data analysis → apply aggregation
![Page 17: Exploratory Analysis of Forestry Data in NEFIS](https://reader035.fdocuments.us/reader035/viewer/2022081519/568137ec550346895d9fa3cb/html5/thumbnails/17.jpg)
Aggregation method 1: by quantiles
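One way to read aggregation by quantiles: for every year, the values of all time series are split into equally populated classes, and only the class boundaries are drawn. The sketch below assumes this reading; `quantile_bands` is a hypothetical helper, and the interpolation method is a choice made for the example.

```python
from statistics import quantiles

def quantile_bands(series_list, n=4):
    """For every time step, return the n-1 cut points that split the
    values observed at that step into n equally populated classes."""
    bands = []
    for step_values in zip(*series_list):
        present = [v for v in step_values if v is not None]
        bands.append(quantiles(present, n=n, method="inclusive")
                     if len(present) >= 2 else None)
    return bands

# Five hypothetical series over two years (None = missing value)
series = [[0.0, 5.0], [10.0, 15.0], [20.0, 25.0], [30.0, 35.0], [40.0, None]]
bands = quantile_bands(series)
```

Connecting the cut points across years gives a small number of band outlines in place of thousands of overlapping lines.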
![Page 18: Exploratory Analysis of Forestry Data in NEFIS](https://reader035.fdocuments.us/reader035/viewer/2022081519/568137ec550346895d9fa3cb/html5/thumbnails/18.jpg)
Aggregation method 2: by intervals
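Aggregation by intervals can be sketched in a similar way: the value range is divided by fixed break points, and for every year the number of series falling into each interval is counted. The break values below are hypothetical, as is the helper name `interval_counts`.

```python
from bisect import bisect_right

def interval_counts(series_list, breaks):
    """For every time step, count the values falling into the classes
    defined by ascending break points (len(breaks) + 1 classes)."""
    counts = []
    for step_values in zip(*series_list):
        row = [0] * (len(breaks) + 1)
        for v in step_values:
            if v is not None:
                row[bisect_right(breaks, v)] += 1
        counts.append(row)
    return counts

# Three hypothetical series over two years (None = missing value)
series = [[5.0, 30.0], [12.0, 70.0], [26.0, None]]
counts = interval_counts(series, breaks=[10, 25, 60])
```

Unlike quantile classes, the intervals stay the same in every year, so the counts show how the share of high or low values changes over time.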
![Page 19: Exploratory Analysis of Forestry Data in NEFIS](https://reader035.fdocuments.us/reader035/viewer/2022081519/568137ec550346895d9fa3cb/html5/thumbnails/19.jpg)
Divide and Focus: Germany
![Page 20: Exploratory Analysis of Forestry Data in NEFIS](https://reader035.fdocuments.us/reader035/viewer/2022081519/568137ec550346895d9fa3cb/html5/thumbnails/20.jpg)
Divide and Focus: age groups 1,3
![Page 21: Exploratory Analysis of Forestry Data in NEFIS](https://reader035.fdocuments.us/reader035/viewer/2022081519/568137ec550346895d9fa3cb/html5/thumbnails/21.jpg)
Attend to particulars
Types of particulars (examples):
– Extreme values
– Extreme changes
– High variability
– …
Questions:
– When?
– Where?
– What is around?
– Why? (a question for further, in-depth analysis)
Domain knowledge is essential
![Page 22: Exploratory Analysis of Forestry Data in NEFIS](https://reader035.fdocuments.us/reader035/viewer/2022081519/568137ec550346895d9fa3cb/html5/thumbnails/22.jpg)
Attend to particulars: extreme values
1. Click on a segment corresponding to extreme values
2. The corresponding behaviours are highlighted on the time graph
3. The corresponding locations are highlighted on the map
![Page 23: Exploratory Analysis of Forestry Data in NEFIS](https://reader035.fdocuments.us/reader035/viewer/2022081519/568137ec550346895d9fa3cb/html5/thumbnails/23.jpg)
Attend to particulars: what is around?
• In some neighbouring places the behaviours during the period 2000 - 2003 are somewhat similar
![Page 24: Exploratory Analysis of Forestry Data in NEFIS](https://reader035.fdocuments.us/reader035/viewer/2022081519/568137ec550346895d9fa3cb/html5/thumbnails/24.jpg)
Attend to particulars: extreme changes
1. Transform the time graph to show changes
2. Select extreme changes in a specific year (here 2003)
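The two steps above can be sketched outside the interactive tool as follows. The data layout (a dictionary from plot id to a list of yearly values), the threshold, and the function names are assumptions for illustration.

```python
def changes(series):
    """Year-to-year differences; None where either value is missing."""
    return [None if a is None or b is None else b - a
            for a, b in zip(series, series[1:])]

def extreme_changes(data, years, year, threshold):
    """Ids of plots whose change from the previous year into `year`
    is at least `threshold` in absolute value."""
    idx = years.index(year) - 1  # position of the change (year-1 -> year)
    if idx < 0:
        raise ValueError("no previous year to compare with")
    selected = []
    for plot_id, series in data.items():
        delta = changes(series)[idx]
        if delta is not None and abs(delta) >= threshold:
            selected.append(plot_id)
    return selected

# Hypothetical plots; C has a missing value in 2002
years = [2001, 2002, 2003]
data = {"A": [10.0, 12.0, 45.0], "B": [20.0, 21.0, 19.0], "C": [30.0, None, 70.0]}
selected = extreme_changes(data, years, year=2003, threshold=20.0)
```

A plot with a missing value in the previous year cannot be judged and is left out of the selection.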
![Page 25: Exploratory Analysis of Forestry Data in NEFIS](https://reader035.fdocuments.us/reader035/viewer/2022081519/568137ec550346895d9fa3cb/html5/thumbnails/25.jpg)
Attend to particulars: high variation
1. Aggregate time graph by quantiles
2. Save counts
3. Visualise e.g. on a scatter plot
4. Select items with high variation
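A possible reading of these steps, sketched under stated assumptions: the saved counts record, for each time series, how many years it spent in each quantile class, and a series with high variation is one that appears in both the lowest and the highest class. The function names and this selection criterion are assumptions, not taken from the slides.

```python
from bisect import bisect_right
from statistics import quantiles

def class_counts(data, n=4):
    """For each series, count the time steps spent in each of n
    quantile classes (classes are computed separately per time step)."""
    ids = list(data)
    counts = {pid: [0] * n for pid in ids}
    for step_values in zip(*(data[pid] for pid in ids)):
        present = [v for v in step_values if v is not None]
        if len(present) < 2:
            continue
        cuts = quantiles(present, n=n, method="inclusive")
        for pid, v in zip(ids, step_values):
            if v is not None:
                counts[pid][bisect_right(cuts, v)] += 1
    return counts

def high_variation(counts):
    """Series that occur in both the lowest and the highest class."""
    return [pid for pid, c in counts.items() if c[0] > 0 and c[-1] > 0]

# Five hypothetical plots over three years
data = {
    "A": [0.0, 40.0, 0.0],
    "B": [10.0, 10.0, 10.0],
    "C": [20.0, 20.0, 20.0],
    "D": [30.0, 30.0, 30.0],
    "E": [40.0, 0.0, 40.0],
}
counts = class_counts(data)
varying = high_variation(counts)
```

Plotting the count in the lowest class against the count in the highest class on a scatter plot would place such series away from both axes.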
![Page 26: Exploratory Analysis of Forestry Data in NEFIS](https://reader035.fdocuments.us/reader035/viewer/2022081519/568137ec550346895d9fa3cb/html5/thumbnails/26.jpg)
Attend to particulars: high fluctuation
• Select items with maximal number of jumps between quantiles
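A minimal sketch of this selection, assuming the quantile class of every series in every year has already been computed (class indices as integers, `None` for missing years). The names `count_jumps` and `most_fluctuating` are hypothetical.

```python
def count_jumps(classes):
    """Number of times consecutive known classes differ."""
    known = [c for c in classes if c is not None]
    return sum(1 for a, b in zip(known, known[1:]) if a != b)

def most_fluctuating(class_sequences):
    """Ids of the series with the maximal number of jumps, plus all counts."""
    jumps = {pid: count_jumps(seq) for pid, seq in class_sequences.items()}
    top = max(jumps.values(), default=0)
    return [pid for pid, j in jumps.items() if j == top], jumps

# Hypothetical quantile classes (0 = lowest, 3 = highest) over four years
class_sequences = {"A": [0, 3, 0, 3], "B": [1, 1, 2, 2], "C": [3, 3, 3, 3]}
selected, jumps = most_fluctuating(class_sequences)
```

Series C is extreme but stable, while series A changes class every year; only the latter is selected here.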
![Page 27: Exploratory Analysis of Forestry Data in NEFIS](https://reader035.fdocuments.us/reader035/viewer/2022081519/568137ec550346895d9fa3cb/html5/thumbnails/27.jpg)
Attend to particulars: stable extremes
• Select items being always in the topmost 10%
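This selection can be sketched with a rank-based rule: in every year, take the given percentage of plots with the highest values and keep only the plots that are in that group in all years. The function name, the treatment of missing values (a plot with a missing year is dropped), and the tie-breaking by plot id are assumptions.

```python
def always_in_top(data, percent=10):
    """Ids of series that are among the top `percent` % of values
    in every time step (a missing value disqualifies the series)."""
    ids = list(data)
    selected = set(ids)
    for step_values in zip(*(data[pid] for pid in ids)):
        present = [(v, pid) for v, pid in zip(step_values, ids) if v is not None]
        k = -(-percent * len(present) // 100)  # ceiling in integer arithmetic
        top = {pid for _, pid in sorted(present, reverse=True)[:k]}
        selected &= top
    return sorted(selected)

# Ten hypothetical plots over two years; P0 is extreme in the first year only
data = {f"P{i}": [float(i), float(i) + 1.0] for i in range(10)}
data["P0"] = [99.0, 0.0]
stable = always_in_top(data, percent=20)
```

With ten plots and 20 %, the two highest plots are taken per year; P0 is in the top group only once and is therefore not a stable extreme.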
![Page 28: Exploratory Analysis of Forestry Data in NEFIS](https://reader035.fdocuments.us/reader035/viewer/2022081519/568137ec550346895d9fa3cb/html5/thumbnails/28.jpg)
Attend to particulars: stable increase
1. Switch the time graph to segmentation mode
2. Choose “increase” and set minimum difference
3. Select a sequence of years by clicking
4. Check sensitivity to the time period!
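The selection of stable increase, including the sensitivity check in step 4, can be sketched as follows. The criterion used here (every year-to-year step within the chosen period rises by at least the minimum difference) is an assumed interpretation of the segmentation mode, and the function name is hypothetical.

```python
def stable_increase(data, years, start, end, min_diff):
    """Ids of series that rise by at least `min_diff` in every
    year-to-year step between `start` and `end` (inclusive)."""
    i, j = years.index(start), years.index(end)
    if j <= i:
        raise ValueError("the period must span at least two years")
    selected = []
    for pid, series in data.items():
        window = series[i:j + 1]
        if None in window:  # a missing value breaks the sequence
            continue
        if all(b - a >= min_diff for a, b in zip(window, window[1:])):
            selected.append(pid)
    return selected

# Hypothetical plots over four years
years = [2000, 2001, 2002, 2003]
data = {
    "A": [10.0, 15.0, 20.0, 26.0],
    "B": [10.0, 15.0, 14.0, 30.0],
    "C": [30.0, 20.0, 25.0, 30.0],
}
whole_period = stable_increase(data, years, 2000, 2003, min_diff=5.0)
shorter_period = stable_increase(data, years, 2001, 2003, min_diff=5.0)
```

Shifting the start of the period by one year changes the selection from one plot to two, which is why the result should be checked against different time periods.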
![Page 29: Exploratory Analysis of Forestry Data in NEFIS](https://reader035.fdocuments.us/reader035/viewer/2022081519/568137ec550346895d9fa3cb/html5/thumbnails/29.jpg)
Conclusions: the Data
• This dataset is not suitable for application of major statistical analysis methods due to
– absence of spatial & temporal smoothness
– skewed distributions
– outliers
– missing values
• The data may be suitable for other purposes (e.g. in the context of a broader study of the ecological situation over Europe)
– EDA methods can promote insights
![Page 30: Exploratory Analysis of Forestry Data in NEFIS](https://reader035.fdocuments.us/reader035/viewer/2022081519/568137ec550346895d9fa3cb/html5/thumbnails/30.jpg)
Recap: Exploration procedure
• See the whole
– Evolution of spatial patterns in time
– Distribution of temporal behaviours in space
• Divide and focus
– Data were explored by slices and subsets (species, age groups, countries, years, …)
• Attend to particulars
– Extreme values, extreme changes, high variation, high fluctuations, stable growth …
![Page 31: Exploratory Analysis of Forestry Data in NEFIS](https://reader035.fdocuments.us/reader035/viewer/2022081519/568137ec550346895d9fa3cb/html5/thumbnails/31.jpg)
Recap: Tools
• Visualisation on thematic maps, time graphs, other aspatial displays
• Aggregation: reduce data volume & symbol overlapping
• Filtering: divide and focus (select subsets)
• Marking: see corresponding data on different displays
• Data transformation: smoothing, computing changes, normalisation etc.
It is important to use the tools in combination
![Page 32: Exploratory Analysis of Forestry Data in NEFIS](https://reader035.fdocuments.us/reader035/viewer/2022081519/568137ec550346895d9fa3cb/html5/thumbnails/32.jpg)
Further information
• Software: http://www.commongis.com
• Scientific issues (papers, tutorials, demos): http://www.ais.fraunhofer.de/and
• Book to appear: N. and G. Andrienko, “Exploratory Analysis of Spatial and Temporal Data. A Systematic Approach” (Springer-Verlag, end 2005)
A systematic approach to defining tasks, tools, and principles of EDA
![Page 33: Exploratory Analysis of Forestry Data in NEFIS](https://reader035.fdocuments.us/reader035/viewer/2022081519/568137ec550346895d9fa3cb/html5/thumbnails/33.jpg)
In press, to appear end 2005
http://www.ais.fraunhofer.de/and