This report is very disappointing. What kind of software are you using?

Space Age and Stone Age Syndrome

• Data: Space Age/Stone Age

• Analysis: Space Age/Stone Age

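[Table: Analysis (rows: Space Age, Stone Age) against Data (columns: Space Age, Stone Age). The Space Age analysis row is marked "+ +"; the Stone Age analysis row is marked "+".]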

Life and Death with Averages and Variability

Happy Hunter:

First shot: one inch to the left of the animal.
Second shot: one inch to the right of the animal.

So, on the average, a shot on the spot; a perfect average shot!

Senior Lawyer:

Early in my career, I lost some cases I should have won.
Lately in my career, I have won some cases I should have lost.

So, on the average, justice has been accomplished.

Life and Death with Averages and Variability

• Happy Hunter:
– First shot, one inch to the left of the animal; second shot, one inch to the right of the animal. So, on the average, a shot on the spot; a perfect average shot.

• Tourist:
– Tourist: I wish to cross the river. I cannot swim. Can you help?
– Native: Certainly! The average depth of this river around here is known to be well below three feet. You look to be six.
– Tourist: You are encouraging, and yet not quite helpful. Depth is usually uneven. Variability sure is a matter of life and death.

Life and Death with Averages and Variability

• Birds:
– Concerned about the typical direction in which disoriented birds of a certain species fly, someone goes out into an open field, stands facing north, and observes a bird vanish at the horizon at an angle of 10 degrees. A little later, he sees a second bird vanish at the horizon at an angle of 350 degrees. What can be said of the typical direction based on this evidence? After these data are submitted to a computer with a request for the average direction, the software returns (10 + 350)/2 = 180 degrees. The report concludes that, on average, the birds are flying south. Of course, the exact opposite is true: both birds flew almost due north. Directional data demand correct and appropriate software.
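The bird example can be checked with a few lines of Python. This is a minimal sketch, assuming that the "correct and appropriate" average for bearings is the usual circular mean (the direction of the sum of unit vectors); the function name `circular_mean_degrees` is illustrative and not from the original slides.

```python
import math


def circular_mean_degrees(angles_deg):
    """Mean direction of compass bearings, returned in degrees within [0, 360)."""
    # Add up the unit vectors that point along each bearing.
    sin_sum = sum(math.sin(math.radians(a)) for a in angles_deg)
    cos_sum = sum(math.cos(math.radians(a)) for a in angles_deg)
    # If the vectors cancel, the bearings have no meaningful mean direction.
    if math.hypot(sin_sum, cos_sum) < 1e-9:
        raise ValueError("mean direction is undefined for these bearings")
    mean = math.degrees(math.atan2(sin_sum, cos_sum))
    # Round away floating-point noise before wrapping into [0, 360).
    return round(mean, 9) % 360.0


bearings = [10, 350]
naive = sum(bearings) / len(bearings)       # arithmetic mean: points south
circular = circular_mean_degrees(bearings)  # circular mean: points north
print("arithmetic mean:", naive)
print("circular mean:", circular)
```

The arithmetic mean treats 350 as far from 10, although the two bearings are only 20 degrees apart on the compass; the vector sum has no such wrap-around problem.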

Blind Men and the Elephant

by J. G. Saxe (1816‑1887)

It was six men of Indostan
To learning much inclined,
Who went to see the Elephant
(Though all of them were blind),
That each by observation
Might satisfy his mind.

The First approached the Elephant,
And happening to fall
Against his broad and sturdy side,
At once began to bawl:
“God bless me! but the Elephant
Is very like a wall!”

The Second, feeling of the tusk,
Cried, “Ho! what have we here
So very round and smooth and sharp?
To me 'tis mighty clear
This wonder of an Elephant
Is very like a spear!”

The Third approached the animal,
And happening to take
The squirming trunk within his hands,
Thus boldly up and spake:
“I see,” quoth he, “the Elephant
Is very like a snake!”

The Fourth reached out an eager hand,
And felt about the knee,
“What most this wondrous beast is like
Is mighty plain,” quoth he:
“'Tis clear enough the Elephant
Is very like a tree!”

The Fifth, who chanced to touch the ear,
Said: “E'en the blindest man
Can tell what this resembles most;
Deny the fact who can,
This marvel of an Elephant
Is very like a fan!”

The Sixth no sooner had begun
About the beast to grope,
Than, seizing on the swinging tail
That fell within his scope,
“I see,” quoth he, “the Elephant
Is very like a rope!”

And so these men of Indostan
Disputed loud and long,
Each in his own opinion
Exceeding stiff and strong,
Though each was partly in the right
And all were in the wrong!

Comprehensive vs. Comprehensible:

1. For lack of information, we do not quite comprehend the situation.
2. We therefore collect information, tending to collect comprehensive information.
3. Because the information is comprehensive, we do not quite comprehend it.
4. Therefore we summarize the information through a set of indices (statistics) so that it becomes comprehensible.
5. Now, however, we do not quite comprehend what the indices exactly mean.
6. Therefore we do not quite comprehend the situation.
7. Thus, without (all) information, or with (partial) information, or with summarized information, we do not quite comprehend a situation!

This dilemma is not meant to paint a bleak picture of one's ability to understand, predict, or manage a situation in the face of uncertainty. Rather, it points to the need to state clearly the purpose, formulation, and solution for the study under consideration, in line with Data Quality Objectives.

How Many of Them are Out There?

This scenario takes place in a court of law.

The issue is the abundance of a species that appears to be endangered, threatened, or rare. The judge orders an investigation. A seasoned investigator conducts the survey. He reports having seen 75 individual members of the species under consideration.

The judge invites comments.

Industrial Lobby: The reported record of 75 members makes sense. The visibility factor is low in such surveys. The investigator has surely missed some of them that are out there. The exploitation should not cause alarm.

Environmental Lobby: The reported record of 75 members makes sense. The investigator is an expert in such surveys. He has observed and recorded most of them that are out there. And, therefore, only a few are out there. The species population needs to be protected.

The scenario is a typical one. It brings home the issues characteristic of field observations often lacking a sampling frame necessary for the classical sampling theory to apply. One needs to work with visibility analysis instead. Satisfactory estimation of biological population abundance depends largely, in such cases, on adequate measurement of visibility, variously termed catchability, audibility, etc. And, this is not a trivial problem!
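The two lobbies read the same count of 75 through different assumed visibilities. The sketch below is illustrative only: it assumes the simplest possible correction, abundance = count / visibility, where visibility is the probability that an individual present is actually seen. The function name and both visibility values are hypothetical; the original gives no such numbers.

```python
def abundance_estimate(count_seen, visibility):
    """Estimate population size as count_seen / visibility.

    visibility is the assumed probability that an individual present
    in the surveyed area is actually observed.
    """
    if not 0.0 < visibility <= 1.0:
        raise ValueError("visibility must lie in (0, 1]")
    return count_seen / visibility


count_seen = 75  # the investigator's count from the scenario

# Hypothetical visibility values chosen only to contrast the two readings.
readings = [("industrial lobby (low visibility)", 0.25),
            ("environmental lobby (high visibility)", 0.90)]
for label, visibility in readings:
    estimate = abundance_estimate(count_seen, visibility)
    print(f"{label}: about {estimate:.0f} individuals")
```

The same record supports very different conclusions until visibility itself is measured, which is why its estimation is the central problem in such surveys.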

Am I a Specialist or a Generalist?

My wife: I am a specialist...because I do "something"; not cooking, not washing, not shopping, etc.

My son: I am a generalist...because I read, play, swim, drive, draw, etc.

My Dean: I am a specialist...because I do statistics; not physics, not chemistry, not astronomy, etc.

My Head: I am a generalist...because I do statistical ecology, environmental statistics, risk assessment, journal editing, etc.

In other words, the degree of specialization/diversification has to be relative to the categories identified.

Diversity Measurement and Comparison

Basic Question

S(2) = 1 + 1 = ?

S(n) = 1 + 1 + ... + 1 (n times) = ?

WHAT IS A WATERSHED?

• A watershed is an area of land that drains water (and everything the water carries) to a common outlet.

• The critical thing to remember about watersheds is that the streams and rivers, the hills, and the bottom lands are all part of an inter-connected system.

• Every activity on the land, in the water or even in the air has the potential to affect a watershed.

Figure 4. River basins, watersheds, and stream order. One watershed within the Patapsco River Basin is that of Herring Run. The numbers beside the streams indicate each stream’s order. The smallest permanently flowing stream is termed first order, and the union of two first order streams creates a second order stream. A third order stream is formed where two second order streams join.

Selected landscape metrics for the medium-delineated watersheds

Metric | Name | Definition
PSCV | Patch Size Coefficient of Variation | Variability in patch size, or the size of homogeneous land cover areas, relative to the mean patch size
DFLD | Double Log Fractal Dimension | 2 divided by the slope of the regression line calculated by regressing the log of the patch area against the log of patch perimeter
IJI | Interspersion and Juxtaposition Index | Measures the unevenness in patch types across a watershed
CONTAG | Contagion Index | Measures the unevenness in patch types across all pixels in a watershed

Selected landscape metrics for the large-delineated watersheds

Metric | Name | Definition
LPI | Largest Patch Index | Percentage of watershed comprised by the largest continuous patch of homogeneous land cover type
PSCV | Patch Size Coefficient of Variation | Variability in patch size, or the size of homogeneous land cover areas, relative to the mean patch size
DFLD | Double Log Fractal Dimension | 2 divided by the slope of the regression line calculated by regressing the log of the patch area against the log of patch perimeter
CONTAG | Contagion Index | Measures the unevenness in patch types across all pixels in a watershed
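Two of the definitions above translate directly into code. The following is a rough sketch under stated assumptions: PSCV is taken as 100 times the population standard deviation of patch area divided by the mean patch area (the choice of population rather than sample standard deviation is an assumption), and DFLD as 2 divided by the ordinary least-squares slope of log area on log perimeter. The function names and the patch values are hypothetical.

```python
import math
import statistics


def pscv(patch_areas):
    """Patch size coefficient of variation, as a percentage of the mean patch size."""
    mean_area = statistics.fmean(patch_areas)
    # Assumption: population standard deviation (divide by n, not n - 1).
    return 100.0 * statistics.pstdev(patch_areas) / mean_area


def dfld(patch_areas, patch_perimeters):
    """Double log fractal dimension: 2 / slope of log(area) regressed on log(perimeter)."""
    xs = [math.log(p) for p in patch_perimeters]
    ys = [math.log(a) for a in patch_areas]
    x_bar = statistics.fmean(xs)
    y_bar = statistics.fmean(ys)
    # Ordinary least-squares slope of y on x.
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    return 2.0 / slope


# Hypothetical patches: perfect squares with sides 1, 2, 3, and 5.
areas = [1.0, 4.0, 9.0, 25.0]
perimeters = [4.0, 8.0, 12.0, 20.0]
print("PSCV:", pscv(areas))
print("DFLD:", dfld(areas, perimeters))
```

For geometrically similar shapes such as these squares, area grows as the square of perimeter, so the slope is 2 and DFLD is 1; more convoluted patch outlines push the value toward 2.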

Comparison of Three Watershed Types Using Conditional Entropy Profiles

[Figure: Entropy (vertical axis, 0 to 7) plotted against Resolution (horizontal axis, 1 to 8), with one profile each for Mostly Forested, Transitional, and Agricultural/Urban watersheds.]

Comparison of Land Cover Proportions for Three Types of Watersheds

[Figure: Proportion (vertical axis, 0 to 0.7) by Land Cover Type (horizontal axis), with one series each for watersheds Medium 25, Medium 76, and Medium 126.]

Figure 1. Example of perfect positive and perfect negative correlation between two coordinates (variables).

Hasse Diagram (all countries)

[Figure: Hasse diagram with nodes labeled 1 to 141.]

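Hasse Diagram (W Europe)

[Figure: Hasse diagram of Western European countries. Node labels, in extracted order: Iceland, Sweden, Finland, Norway; Austria; Switzerland, Spain; France, Germany; Portugal; Italy; Greece, Belgium, Netherlands, Denmark, UK; Ireland.]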

http://www.personal.psu.edu/users/i/x/ixf114/HEI_Report2_files/Triangle_eu.jpg

Cumulative Rank Frequency Operator – 4

An Example of the Procedure

• We illustrate with the following poset containing 6 elements

Poset (Hasse Diagram)

a b

d c

e f


An Example of the Procedure

In the example from the preceding slide, there are a total of 16 linear extensions, giving the following cumulative frequency table.

Element | Rank 1 | Rank 2 | Rank 3 | Rank 4 | Rank 5 | Rank 6
a | 9 | 14 | 16 | 16 | 16 | 16
b | 7 | 12 | 15 | 16 | 16 | 16
c | 0 | 4 | 10 | 16 | 16 | 16
d | 0 | 2 | 6 | 12 | 16 | 16
e | 0 | 0 | 1 | 4 | 10 | 16
f | 0 | 0 | 0 | 0 | 6 | 16

Each entry gives the number of linear extensions in which the element (row label) receives a rank equal to or better than the column heading.
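The table can be reproduced by brute-force enumeration. The sketch below is not the authors' implementation. It assumes a particular set of "better than" relations for the six-element poset (a over c, b over d, c over e, c over f, d over f), which were inferred from the counts in the table because the diagram's edges do not survive in the transcript. Under that assumption the enumeration yields 16 linear extensions and the same cumulative frequencies.

```python
from itertools import permutations

ELEMENTS = "abcdef"
# Assumed "better than" pairs (better, worse); inferred, not given in the text.
COVERS = [("a", "c"), ("b", "d"), ("c", "e"), ("c", "f"), ("d", "f")]


def linear_extensions(elements, covers):
    """Yield each total ranking (best first) that respects every pair in covers."""
    for order in permutations(elements):
        position = {element: index for index, element in enumerate(order)}
        if all(position[better] < position[worse] for better, worse in covers):
            yield order


def cumulative_rank_frequencies(elements, covers):
    """Return (number of linear extensions, {element: cumulative counts by rank})."""
    extensions = list(linear_extensions(elements, covers))
    n = len(elements)
    table = {element: [0] * n for element in elements}
    for order in extensions:
        for index, element in enumerate(order):
            # An element placed at rank index + 1 counts toward that rank
            # and toward every worse (larger) rank.
            for rank in range(index, n):
                table[element][rank] += 1
    return len(extensions), table


total, table = cumulative_rank_frequencies(ELEMENTS, COVERS)
print("linear extensions:", total)
for element in ELEMENTS:
    print(element, table[element])
```

Checking all permutations is only practical for small posets; with six elements there are just 720 candidates, so the direct approach keeps the logic easy to follow.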


An Example of the Procedure

[Figure: Cumulative Frequency (vertical axis, 0 to 16) plotted against Rank (horizontal axis, 1 to 6), with one curve for each of the elements a, b, c, d, e, f.]

The curves are stacked one above the other and the result is a linear ordering of the elements: a > b > c > d > e > f


An example where F must be iterated

[Figure: three diagrams on the elements a, b, c, d, e, f, g, h, labeled Original Poset (Hasse Diagram), F, and F^2.]


An example where F results in ties

Original Poset (Hasse Diagram): a at the top, b and c in the middle, d at the bottom.

After applying F: a, then b and c (tied), then d.

• Ties reflect symmetries among incomparable elements in the original Hasse diagram.

• Elements that are comparable in the original Hasse diagram will not become tied after applying the F operator.

[Figure: Rank Intervals (vertical axis, 0 to 110) plotted against HEI Rank (horizontal axis, 0 to 110); series: Upper endpoints, Lower endpoints, Midpoints.]

• Rank intervals for all 106 countries. The intervals (countries) are labeled by their midpoints as shown along the horizontal axis. For each interval, the lower endpoint and the upper endpoint are shown vertically. The length of each interval corresponds to the ambiguity inherent in attempting to rank the country among all 106 countries.
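A rank interval can be computed directly from the partial order. This sketch assumes that an element's rank interval is the range of ranks it can take across all linear extensions, in which case the best possible rank is 1 plus the number of elements above it and the worst is n minus the number of elements below it. The four-element poset and the function name are hypothetical and stand in for the 106-country data, which are not reproduced here.

```python
def rank_intervals(elements, covers):
    """Return {element: (best_rank, worst_rank)} for a poset given as (better, worse) pairs."""
    # below[e] collects every element strictly worse than e.
    below = {element: set() for element in elements}
    for better, worse in covers:
        below[better].add(worse)
    # Transitive closure: keep adding "worse than something worse" until stable.
    changed = True
    while changed:
        changed = False
        for element in elements:
            extra = set().union(*(below[x] for x in below[element])) - below[element]
            if extra:
                below[element] |= extra
                changed = True
    n = len(elements)
    intervals = {}
    for element in elements:
        n_above = sum(1 for other in elements if element in below[other])
        intervals[element] = (1 + n_above, n - len(below[element]))
    return intervals


# Hypothetical poset: a is better than b and c, which are both better than d.
covers = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")]
for element, (best, worst) in rank_intervals("abcd", covers).items():
    print(element, "rank interval:", best, "to", worst)
```

Here b and c are incomparable, so each can rank second or third; the width of that interval is the ambiguity the caption refers to.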

[Figure: Rank Intervals (vertical axis, 0 to 110) plotted against Midpoint (horizontal axis, 0 to 110); series: Upper endpoints, Lower endpoints, Midpoints.]

• Rank intervals for all 106 countries, plotted against their HEI rank. The HEI rank appears as the 45-degree line. The HEI tends to be optimistic (closer to lower endpoint) for better ranked countries and pessimistic (closer to upper endpoint) for poorer ranked countries.

SIR (maximum likelihood estimate)

• more than 100% above expected (28)
• 50% to 100% above expected (93)
• 15% to 49% above expected (279)
• within 15% of expected (471)
• 15% to 50% below expected (338)
• more than 50% below expected (100)
• very sparse data (104)

Breast Cancer by ZIP Code
New York State, 1993-1997

Simple SIRs as observed/expected
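As a small illustration of the observed/expected ratio and the map classes, the sketch below computes an SIR and assigns it to one of the legend categories. The treatment of values that fall exactly on a class boundary is an assumption, the "very sparse data" class is not modeled because its criterion is not stated, and the function names and case counts are hypothetical.

```python
def sir(observed, expected):
    """Standardized incidence ratio: observed cases divided by expected cases."""
    if expected <= 0:
        raise ValueError("expected count must be positive")
    return observed / expected


def sir_category(ratio):
    """Map an SIR to a legend class; boundary handling is an assumption."""
    percent = (ratio - 1.0) * 100.0  # percent above (+) or below (-) expected
    if percent > 100:
        return "more than 100% above expected"
    if percent >= 50:
        return "50% to 100% above expected"
    if percent > 15:
        return "15% to 49% above expected"
    if percent >= -15:
        return "within 15% of expected"
    if percent >= -50:
        return "15% to 50% below expected"
    return "more than 50% below expected"


# Hypothetical (observed, expected) counts for a few areas.
for observed, expected in [(30, 10), (16, 10), (12, 10), (10, 10), (7, 10), (4, 10)]:
    ratio = sir(observed, expected)
    print(f"observed={observed}, expected={expected}, SIR={ratio:.2f}: {sir_category(ratio)}")
```

A simple ratio of this kind is unstable when the expected count is small, which is presumably why the map sets aside a separate class for very sparse data.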

Cluster * | SIR | LL | Young Cases | Multiple Cancers | Atypical Demographics | Late Stage of Diagnosis
LF2 | 2.09 | 10.36 | 2 | 1 | 1 | 2
LM14 | 1.5 | 36 | 2 | 0 | 0 | 2
LM4 | 2.04 | 19.21 | 2 | 0 | 0 | 2
LF7 | 1.51 | 15.43 | 1 | 1 | 1 | 1
B2 | 1.21 | 31.3 | 2 | 1 | 0 | 2
B4 | 1.25 | 28.4 | 1 | 0 | 0 | 0
LM1 | 2.32 | 21.91 | 0 | 1 | 0 | 2
LM3 | 2.13 | 21.26 | 1 | 1 | 0 | 1
LM7 | 2.12 | 13.33 | 1 | 0 | 0 | 2

* LF = lung, female; LM = lung, male; B = breast

Ranking Possible Disease Clusters in the State of New York

Data Matrix

Logo for Statistics, Ecology, Environment, and Society
