This report is very disappointing. What kind of software are you using?
Space Age and Stone Age Syndrome
• Data: Space Age/Stone Age
• Analysis: Space Age/Stone Age
Data (columns): Space Age, Stone Age
Analysis (rows): Space Age: + +; Stone Age: +
Life and Death with Averages and Variability
Happy Hunter:
First shot: one inch to the left of the animal. Second shot: one inch to the right of the animal.
So, on the average, shot on the spot; a perfect average shot!
Senior Lawyer:
Early in my career I lost some cases I should have won. Lately in my career I have won some cases I should have lost. So, on the average, justice has been accomplished.
Life and Death with Averages and Variability
• Happy Hunter:
– First shot, one inch to the left of the animal; second shot, one inch to the right of the animal. So, on the average, shot on the spot; a perfect average shot.
• Tourist:
– I wish to cross the river. I cannot swim. Can you help?
– Native: Certainly! The average depth of this river around here is known to be well below three feet. You look to be six.
– Tourist: You are encouraging, and yet not quite helpful. Depth is usually uneven. Variability sure is a matter of life and death.
Life and Death with Averages and Variability
• Birds:
– Concerned about the typical direction in which disoriented birds of a certain species fly, someone goes out into an open field, stands facing north, and observes a bird vanish at the horizon at an angle of 10 degrees. A little later, he sees a second bird vanish at the horizon at an angle of 350 degrees. What can be said of the typical direction based on this evidence? After the data are submitted to a computer with a request for the average direction, the software returns (10 + 350)/2 = 180 degrees. The report concludes that, on average, the birds are flying south. Of course, the exact opposite is true: both birds flew within 10 degrees of north. Directional data call for software that averages angles correctly.
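One common way to average directions is to treat each angle as a unit vector, add the vectors, and take the direction of the sum. The sketch below contrasts that with the arithmetic mean for the two birds. The function names are illustrative assumptions, and the slides do not say which method the author's software uses.

```python
import math

def arithmetic_mean_deg(angles):
    """Plain average of the angle values; ignores that 0 and 360 coincide."""
    return sum(angles) / len(angles)

def circular_mean_deg(angles):
    """Direction (degrees in [0, 360]) of the sum of unit vectors."""
    s = sum(math.sin(math.radians(a)) for a in angles)
    c = sum(math.cos(math.radians(a)) for a in angles)
    if math.hypot(s, c) < 1e-12:
        # The vectors cancel out, so no mean direction exists.
        raise ValueError("mean direction is undefined for these angles")
    return math.degrees(math.atan2(s, c)) % 360.0

def angular_distance_deg(a, b):
    """Smallest angle between two directions, in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)

birds = [10.0, 350.0]
print("arithmetic mean:", arithmetic_mean_deg(birds))   # points south
print("distance of circular mean from north:",
      angular_distance_deg(circular_mean_deg(birds), 0.0))
```

For the two birds, the arithmetic mean is 180 degrees, while the vector-based mean is north to within rounding error.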
Blind Men and the Elephant
by J. G. Saxe (1816‑1887)
It was six men of Indostan
To learning much inclined,
Who went to see the Elephant
(Though all of them were blind),
That each by observation
Might satisfy his mind.

The First approached the Elephant,
And happening to fall
Against his broad and sturdy side,
At once began to bawl:
“God bless me! but the Elephant
Is very like a wall!”

The Second, feeling of the tusk,
Cried, “Ho! what have we here
So very round and smooth and sharp?
To me 'tis mighty clear
This wonder of an Elephant
Is very like a spear!”

The Third approached the animal,
And happening to take
The squirming trunk within his hands,
Thus boldly up and spake:
“I see,” quoth he, “the Elephant
Is very like a snake!”

The Fourth reached out an eager hand,
And felt about the knee.
“What most this wondrous beast is like
Is mighty plain,” quoth he;
“'Tis clear enough the Elephant
Is very like a tree!”

The Fifth, who chanced to touch the ear,
Said: “E'en the blindest man
Can tell what this resembles most;
Deny the fact who can,
This marvel of an Elephant
Is very like a fan!”

The Sixth no sooner had begun
About the beast to grope,
Than, seizing on the swinging tail
That fell within his scope,
“I see,” quoth he, “the Elephant
Is very like a rope!”

And so these men of Indostan
Disputed loud and long,
Each in his own opinion
Exceeding stiff and strong,
Though each was partly in the right,
And all were in the wrong!
Comprehensive vs. Comprehensible:
1. For lack of information, we do not quite comprehend the situation.
2. We therefore collect information, tending to collect comprehensive information.
3. Because the information is comprehensive, we do not quite comprehend it.
4. Therefore we summarize the information through a set of indices (statistics) so that it would be comprehensible.
5. Now, however, we do not comprehend quite what the indices exactly mean.
6. Therefore we do not quite comprehend the situation.
7. Thus, without (all) information, or with (partial) information, or with summarized information, we do not quite comprehend a situation!
This dilemma is not meant to suggest a bleak picture of one's ability to understand, predict, or manage a situation in the face of uncertainty. It suggests instead the need to state clearly the purpose, formulation, and solution for the study under consideration, in line with Data Quality Objectives.
How Many of Them are Out There?
This scenario takes place in a court of law.
The issue is the abundance of a species that appears to be endangered, threatened, or rare. The judge orders an investigation. A seasoned investigator conducts the survey. He reports having seen 75 individual members of the species under consideration.
The judge invites comments.
Industrial Lobby: The reported record of 75 members makes sense. The visibility factor is low in such surveys. The investigator has surely missed some of them that are out there. The exploitation should not cause alarm.
Environmental Lobby: The reported record of 75 members makes sense. The investigator is an expert in such surveys. He has observed and recorded most of them that are out there. And, therefore, only a few are out there. The species population needs to be protected.
The scenario is a typical one. It brings home the issues characteristic of field observations often lacking a sampling frame necessary for the classical sampling theory to apply. One needs to work with visibility analysis instead. Satisfactory estimation of biological population abundance depends largely, in such cases, on adequate measurement of visibility, variously termed catchability, audibility, etc. And, this is not a trivial problem!
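A minimal sketch of why visibility matters, assuming the simplest estimator in which the count is divided by the probability that an individual present is actually seen. The function name and the two visibility values are hypothetical, chosen only to mimic the two lobbies' readings of the same count of 75. The slides do not specify the estimator used in practice.

```python
def estimate_abundance(count_seen, visibility):
    """Estimate population size as count / visibility.

    visibility is the assumed probability (0 < visibility <= 1) that an
    individual present in the survey area is detected.
    """
    if not 0.0 < visibility <= 1.0:
        raise ValueError("visibility must be in (0, 1]")
    return count_seen / visibility

COUNT = 75  # individuals reported by the investigator

# Hypothetical visibilities reflecting the two arguments in court.
for label, visibility in [("low visibility (industrial lobby)", 0.3),
                          ("high visibility (environmental lobby)", 0.9)]:
    print(f"{label}: about {estimate_abundance(COUNT, visibility):.0f} individuals")
```

The same count supports very different abundance estimates, which is why the visibility has to be measured rather than argued.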
Am I a Specialist or a Generalist?
My wife: I am a specialist...because I do 'something'; not cooking, not washing, not shopping, etc.
My son: I am a generalist...because I read, play, swim, drive, draw, etc.
My Dean: I am a specialist...because I do statistics; not physics, not chemistry, not astronomy, etc.
My Head: I am a generalist...because I do statistical ecology, environmental statistics, risk assessment, journal editing, etc.
In other words, the degree of specialization/diversification has to be relative to the categories identified.
Diversity Measurement and Comparison
Basic Question
S(2) = 1 + 1 = ?
S(n) = 1 + 1 + ... + 1 (n times) = ?
WHAT IS A WATERSHED?
• A watershed is an area of land that drains water (and everything the water carries) to a common outlet.
• The critical thing to remember about watersheds is that the streams and rivers, the hills, and the bottomlands are all part of an interconnected system.
• Every activity on the land, in the water or even in the air has the potential to affect a watershed.
Figure 4. River basins, watersheds, and stream order. One watershed within the Patapsco River Basin is that of Herring Run. The numbers beside the streams indicate each stream’s order. The smallest permanently flowing stream is termed first order, and the union of two first-order streams creates a second-order stream. A third-order stream is formed where two second-order streams join.
Selected landscape metrics for the medium-delineated watersheds
Metric (Name): Definition
• PSCV (Patch Size Coefficient of Variation): Variability in patch size, or the size of homogeneous land cover areas, relative to the mean patch size
• DFLD (Double Log Fractal Dimension): 2 divided by the slope of the regression line calculated by regressing the log of the patch area against the log of patch perimeter
• IJI (Interspersion and Juxtaposition Index): Measures the unevenness in patch types across a watershed
• CONTAG (Contagion Index): Measures the unevenness in patch types across all pixels in a watershed

Selected landscape metrics for the large-delineated watersheds
Metric (Name): Definition
• LPI (Largest Patch Index): Percentage of watershed comprised by the largest continuous patch of homogeneous land cover type
• PSCV (Patch Size Coefficient of Variation): Variability in patch size, or the size of homogeneous land cover areas, relative to the mean patch size
• DFLD (Double Log Fractal Dimension): 2 divided by the slope of the regression line calculated by regressing the log of the patch area against the log of patch perimeter
• CONTAG (Contagion Index): Measures the unevenness in patch types across all pixels in a watershed
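Two of the definitions above are simple enough to sketch directly. The code below computes PSCV and DFLD from lists of patch areas and perimeters. It assumes that PSCV is reported as a percentage using the population standard deviation and that log area is the response in the regression; neither detail is stated on the slides. The patch data are hypothetical squares, for which the fractal dimension should come out as 1.

```python
import math

def pscv(areas):
    """Patch size coefficient of variation: 100 * sd / mean.

    Assumption: population standard deviation (divide by n), as a percentage.
    """
    n = len(areas)
    mean = sum(areas) / n
    sd = math.sqrt(sum((a - mean) ** 2 for a in areas) / n)
    return 100.0 * sd / mean

def dfld(areas, perimeters):
    """Double log fractal dimension: 2 / slope of log(area) on log(perimeter)."""
    xs = [math.log(p) for p in perimeters]
    ys = [math.log(a) for a in areas]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx  # ordinary least-squares slope
    return 2.0 / slope

# Hypothetical patches: squares with sides 1, 2, 3, 4.
sides = [1.0, 2.0, 3.0, 4.0]
areas = [s * s for s in sides]
perimeters = [4.0 * s for s in sides]
print("PSCV (%):", round(pscv(areas), 2))
print("DFLD:", round(dfld(areas, perimeters), 6))
```

For squares, area grows as the square of perimeter, so the regression slope is 2 and the dimension is 1; more convoluted patch outlines give a smaller slope and a larger dimension.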
[Figure: Comparison of Three Watershed Types Using Conditional Entropy Profiles. Horizontal axis: Resolution (1 to 8); vertical axis: Entropy (0 to 7); series: Mostly Forested, Transitional, Agricultural/Urban.]
[Figure: Comparison of Land Cover Proportions for Three Types of Watersheds. Horizontal axis: Land Cover Type; vertical axis: Proportion (0 to 0.7); series: Medium 25, Medium 76, Medium 126.]
Figure 1. Example of perfect positive and perfect negative correlation between two coordinates (variables).
[Figure: Hasse Diagram (all countries), with nodes labeled 1 through 141.]
[Figure: Hasse Diagram (W Europe). Country labels as extracted, row by row from the top: Iceland, Sweden, Finland, Norway; Austria; Switzerland, Spain; France, Germany; Portugal; Italy; Greece, Belgium, Netherlands, Denmark, UK; Ireland.]
Cumulative Rank Frequency Operator – 4
An Example of the Procedure
• We illustrate with the following poset containing 6 elements.
[Figure: Poset (Hasse diagram) drawn on three levels, from the top: a, b; d, c; e, f.]
Cumulative Rank Frequency Operator – 5
An Example of the Procedure
In the example from the preceding slide, there are a total of 16 linear extensions, giving the following cumulative frequency table.

           Rank
Element    1    2    3    4    5    6
a          9   14   16   16   16   16
b          7   12   15   16   16   16
c          0    4   10   16   16   16
d          0    2    6   12   16   16
e          0    0    1    4   10   16
f          0    0    0    0    6   16

Each entry gives the number of linear extensions in which the element (row label) receives a rank equal to or better than the column heading.
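The table can be reproduced by brute force: list every ordering of the six elements, keep those that respect the poset, and count how often each element lands at each rank. The edges of the Hasse diagram did not survive in this transcript, so the cover relations below are an assumption (a above c, b above d, c above e and f, d above f), chosen because they give 16 linear extensions and match the table. The function names are illustrative.

```python
from itertools import accumulate, permutations

ELEMENTS = "abcdef"
# Assumption: (x, y) means x is ranked better than y. These relations are
# inferred from the table; the diagram edges are not in the transcript.
COVERS = [("a", "c"), ("b", "d"), ("c", "e"), ("c", "f"), ("d", "f")]

def linear_extensions(elements, covers):
    """All total orderings of elements that respect every relation in covers."""
    extensions = []
    for perm in permutations(elements):
        position = {e: i for i, e in enumerate(perm)}
        if all(position[x] < position[y] for x, y in covers):
            extensions.append(perm)
    return extensions

def cumulative_rank_frequencies(elements, covers):
    """Map each element to its cumulative rank-frequency row."""
    extensions = linear_extensions(elements, covers)
    table = {}
    for e in elements:
        counts = [0] * len(elements)
        for ext in extensions:
            counts[ext.index(e)] += 1  # rank 1 is index 0
        table[e] = list(accumulate(counts))
    return extensions, table

extensions, table = cumulative_rank_frequencies(ELEMENTS, COVERS)
print("linear extensions:", len(extensions))
for element, row in table.items():
    print(element, row)
```

Enumerating all permutations is only practical for small posets like this one; the number of orderings grows factorially with the number of elements.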
Cumulative Rank Frequency Operator – 6
An Example of the Procedure
[Figure: cumulative frequency (vertical axis, 0 to 16) against rank (horizontal axis, 1 to 6), with one curve for each of the elements a, b, c, d, e, f.]
The curves are stacked one above the other and the result is a linear ordering of the elements: a > b > c > d > e > f
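A small sketch of the stacking check, using the cumulative frequencies from the table on the preceding slide. One curve lies on or above another when it is at least as large at every rank. The helper names are illustrative, and sorting by row total is just a convenient way to find the candidate order before checking it.

```python
# Cumulative rank frequencies copied from the table on the preceding slide.
TABLE = {
    "a": [9, 14, 16, 16, 16, 16],
    "b": [7, 12, 15, 16, 16, 16],
    "c": [0, 4, 10, 16, 16, 16],
    "d": [0, 2, 6, 12, 16, 16],
    "e": [0, 0, 1, 4, 10, 16],
    "f": [0, 0, 0, 0, 6, 16],
}

def dominates(upper, lower):
    """True if the upper curve is on or above the lower curve at every rank."""
    return all(u >= l for u, l in zip(upper, lower))

def stacked_order(table):
    """Return the elements from best to worst if the curves stack, else None."""
    # A dominating curve has a total at least as large, so sort by total first.
    order = sorted(table, key=lambda e: sum(table[e]), reverse=True)
    for better, worse in zip(order, order[1:]):
        if not dominates(table[better], table[worse]):
            return None  # curves cross, so no linear ordering from one pass
    return order

order = stacked_order(TABLE)
print(" > ".join(order) if order else "curves cross")
```

When the function returns None, the curves cross somewhere, which corresponds to the case where a single application of the operator does not yield a linear ordering.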
Cumulative Rank Frequency Operator – 7
An example where F must be iterated
[Figure: the original poset (Hasse diagram), drawn on levels a, f; e, b; c, g, d; h, followed by the diagrams obtained after applying F and F 2. Node labels as extracted from each result diagram, from the top: a, f, e, b, ad, c, h, g.]
Cumulative Rank Frequency Operator – 8
An example where F results in ties
[Figure: the original poset (Hasse diagram), drawn on levels a; c, b; d, and the result of applying F: a; b, c (tied); d.]
• Ties reflect symmetries among incomparable elements in the original Hasse diagram.
• Elements that are comparable in the original Hasse diagram will not become tied after applying the F operator.
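The tie can be seen by computing the cumulative rank frequencies for a four-element poset of this shape. The sketch assumes the diagram is a diamond (a above b and c, both above d), since the edges are not preserved in this transcript. The function names are illustrative.

```python
from itertools import accumulate, permutations

ELEMENTS = "abcd"
# Assumption: a diamond-shaped poset; (x, y) means x is ranked better than y.
COVERS = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")]

def cumulative_rows(elements, covers):
    """Cumulative rank-frequency row for each element over all linear extensions."""
    counts = {e: [0] * len(elements) for e in elements}
    for perm in permutations(elements):
        position = {e: i for i, e in enumerate(perm)}
        if all(position[x] < position[y] for x, y in covers):
            for e in elements:
                counts[e][position[e]] += 1
    return {e: tuple(accumulate(row)) for e, row in counts.items()}

def tied_groups(rows):
    """Group elements with identical rows, ordered from best to worst.

    Ordering by row total assumes the curves stack, as they do here.
    """
    groups = {}
    for element, row in rows.items():
        groups.setdefault(row, []).append(element)
    ranked = sorted(groups.items(), key=lambda item: sum(item[0]), reverse=True)
    return [members for _, members in ranked]

rows = cumulative_rows(ELEMENTS, COVERS)
print(rows)
print(tied_groups(rows))
```

Under this assumption b and c are interchangeable in the poset, so they receive identical rows and end up in the same group.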
[Figure: rank-interval plot. Horizontal axis: HEI Rank (0 to 110); vertical axis: Rank Intervals (0 to 110); legend: Upper endpoints, Lower endpoints, Midpoints.]
• Rank intervals for all 106 countries. The intervals (countries) are labeled by their midpoints as shown along the horizontal axis. For each interval, the lower endpoint and the upper endpoint are shown vertically. The length of each interval corresponds to the ambiguity inherent in attempting to rank the country among all 106 countries.
[Figure: rank-interval plot. Horizontal axis: Midpoint (0 to 110); vertical axis: Rank Intervals (0 to 110); legend: Upper endpoints, Lower endpoints, Midpoints.]
• Rank intervals for all 106 countries, plotted against their HEI rank. The HEI rank appears as the 45-degree line. The HEI tends to be optimistic (closer to lower endpoint) for better ranked countries and pessimistic (closer to upper endpoint) for poorer ranked countries.
SIR (maximum likelihood estimate)
• more than 100% above expected (28)
• 50% to 100% above expected (93)
• 15% to 49% above expected (279)
• within 15% of expected (471)
• 15% to 50% below expected (338)
• more than 50% below expected (100)
• very sparse data (104)
Breast Cancer by ZIP Code, New York State, 1993-1997
Simple SIRs as observed/expected
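As a sketch of the "observed/expected" calculation, the code below computes a simple SIR and assigns it to one of the map legend's categories. The function names and the handling of values that fall exactly on a boundary are assumptions. The "very sparse data" category is left out because the slides give no threshold for it.

```python
def sir(observed, expected):
    """Simple standardized incidence ratio: observed cases / expected cases."""
    if expected <= 0:
        raise ValueError("expected count must be positive")
    return observed / expected

def legend_category(observed, expected):
    """Map an SIR to a legend category (boundary handling is an assumption)."""
    percent = 100.0 * (sir(observed, expected) - 1.0)  # percent above expected
    if percent > 100:
        return "more than 100% above expected"
    if percent >= 50:
        return "50% to 100% above expected"
    if percent >= 15:
        return "15% to 49% above expected"
    if percent > -15:
        return "within 15% of expected"
    if percent >= -50:
        return "15% to 50% below expected"
    return "more than 50% below expected"

# Hypothetical ZIP codes: (observed cases, expected cases).
for observed, expected in [(30, 10.0), (12, 10.0), (10, 10.0), (4, 10.0)]:
    print(observed, expected, round(sir(observed, expected), 2),
          legend_category(observed, expected))
```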
Cluster*  SIR   LL     Young Cases  Multiple Cancers  Atypical Demographics  Late Stage of Diagnosis
LF2       2.09  10.36  2            1                 1                      2
LM14      1.5   36     2            0                 0                      2
LM4       2.04  19.21  2            0                 0                      2
LF7       1.51  15.43  1            1                 1                      1
B2        1.21  31.3   2            1                 0                      2
B4        1.25  28.4   1            0                 0                      0
LM1       2.32  21.91  0            1                 0                      2
LM3       2.13  21.26  1            1                 0                      1
LM7       2.12  13.33  1            0                 0                      2
* LF = lung, female; LM = lung, male; B = breast
Ranking Possible Disease Clusters in the State of New York
Data Matrix
Logo for Statistics, Ecology, Environment, and Society