Subjective Probability
Information Design
Scott Matthews
Courses: 12-706 / 19-702 / 73-359
12-706 and 73-359 2
Admin Issues
HW 5 (due next Wed.)
Next project schedule
Case studies coming
Subjective Probabilities
Main Idea: We all have to make personal judgments (and decisions) in the face of uncertainty (Granger Morgan’s career)
These personal judgments are subjective
Subjective judgments of uncertainty can be made in terms of probability
Examples:
“My house will not be destroyed by a hurricane.”
“The Pirates will have a winning record (ever).”
“Driving after I have 2 drinks is safe.”
Outcomes and Events
Event: something about which we are uncertain
Outcome: result of uncertain event
Subjectively: once event (e.g., coin flip) has occurred, what is our judgment on outcome?
Represents degree of belief of outcome
Long-run frequencies, etc. irrelevant - need one
Example: Steelers* play AFC championship game at home. I Tivo it instead of watching live. I assume before watching that they will lose.
*Insert Cubs, etc. as needed (Sox removed 2005)
Next Steps
Goal is capturing the uncertainty/biases/etc. in these judgments
Might need to quantify verbal expressions (e.g., remote, likely, non-negligible...)
What to do if question not answerable directly?
Example: if I say there is a “negligible” chance of anyone failing this class, what probability do you assume?
What if I say “non-negligible chance that someone will fail”?
Merging of Theories
Science has long known that “objective” and “subjective” factors exist
Only more recently did we realize we could represent subjective factors as probabilities
But inherently all of these subjective decisions can be ordered by decision tree
Where we have a gamble or bet between what we know and what we think we know
Clemen uses the basketball game gamble example
We would keep adjusting payoffs until optimal
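A minimal sketch of the betting comparison, assuming a two-sided bet: Bet A wins X if the event occurs and loses Y otherwise, and Bet B is the opposite side. If the assessor is indifferent between the two, equal expected values imply p = Y / (X + Y). The function name and the dollar amounts are illustrative assumptions, not taken from the slides.

```python
def implied_probability(win_amount: float, lose_amount: float) -> float:
    """Subjective P(event) implied by indifference between two sides of a bet.

    Bet A: win `win_amount` if the event occurs, lose `lose_amount` otherwise.
    Bet B: the reverse side of the same bet.
    Indifference means p*X - (1-p)*Y == -p*X + (1-p)*Y, so p = Y / (X + Y).
    """
    if win_amount <= 0 or lose_amount <= 0:
        raise ValueError("payoffs must be positive")
    return lose_amount / (win_amount + lose_amount)


# Hypothetical payoffs at which the assessor reports indifference.
p = implied_probability(win_amount=250, lose_amount=380)
print(round(p, 3))
```

Even payoffs (X = Y) imply p = 0.5; the more the assessor must be offered on the "lose" side to stay indifferent, the higher the implied belief in the event.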
Probability Wheel
Mechanism for formalizing our thoughts on probabilities of comparative lotteries
You select the area of the pie chart until you’re indifferent between the two lotteries
Quick 2-person exercise. Then we’ll discuss p-values.
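One way to picture the wheel exercise is as a bisection on the winning area of the pie: enlarge it when the assessor prefers the real-world lottery, shrink it when they prefer the wheel. The sketch below simulates this with a hypothetical assessor whose hidden belief is 0.30; `elicit_with_wheel`, `prefers_wheel`, and the tolerance are assumptions made for illustration, not a prescribed procedure.

```python
from typing import Callable


def elicit_with_wheel(prefers_wheel: Callable[[float], bool],
                      tolerance: float = 0.01) -> float:
    """Narrow the wheel's winning area by bisection until near indifference.

    `prefers_wheel(area)` returns True when the assessor would rather bet on
    the wheel (winning fraction `area`) than on the real-world event.
    """
    low, high = 0.0, 1.0
    while high - low > tolerance:
        area = (low + high) / 2
        if prefers_wheel(area):
            high = area  # wheel too attractive: shrink the winning area
        else:
            low = area   # event more attractive: enlarge the winning area
    return (low + high) / 2


# Simulated assessor whose (hidden) belief in the event is 0.30.
hidden_belief = 0.30
estimate = elicit_with_wheel(lambda area: area > hidden_belief)
print(f"elicited probability ~ {estimate:.2f}")
```

In a live exercise the lambda is replaced by a person answering "wheel or event?" at each step.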
Continuous Distributions
Similar to above, but we need to do it a few times.
E.g., try to get 5%, 50%, 95% points on distribution
Each point done with a “cdf-like” lottery comparison
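Once a few fractiles have been elicited, a rough cdf can be drawn through them. The sketch below assumes straight-line interpolation between the elicited points, which is only one possible choice, and the three (value, cumulative probability) pairs are hypothetical.

```python
from bisect import bisect_right


def make_cdf(fractiles):
    """Return a piecewise-linear CDF through elicited (value, cum_prob) points.

    Outside the elicited range the CDF is clamped to the first or last
    elicited probability; a fuller assessment would also elicit the extremes.
    """
    points = sorted(fractiles)
    xs = [x for x, _ in points]
    ps = [p for _, p in points]
    if any(b < a for a, b in zip(ps, ps[1:])):
        raise ValueError("cumulative probabilities must be non-decreasing")

    def cdf(x):
        if x <= xs[0]:
            return ps[0]
        if x >= xs[-1]:
            return ps[-1]
        i = bisect_right(xs, x)  # xs[i-1] <= x < xs[i]
        x0, x1, p0, p1 = xs[i - 1], xs[i], ps[i - 1], ps[i]
        return p0 + (p1 - p0) * (x - x0) / (x1 - x0)

    return cdf


# Hypothetical 5%, 50%, 95% points for some uncertain cost (in $1000s).
cdf = make_cdf([(40, 0.05), (65, 0.50), (110, 0.95)])
print(cdf(65), cdf(52.5))
```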
Danger: Heuristics and Biases
Heuristics are “rules of thumb”
Which do we use in life? Biased? How?
Representativeness (fit in a category)
Availability (seen it before, fits memory)
Anchoring/Adjusting (common base point)
Motivational Bias (perverse incentives)
Idea is to consider these in advance and make people aware of them
Asking Experts
In the end, often we do studies like this, but use experts for elicitation
Idea is we should “trust” their predictions more, and can better deal with biases
Lots of training and reinforcement steps
But in the end, get nice prob functions
Information Design
What is it? Idea of carefully linking what data you have with what you want to say
“God” of the field: Edward Tufte (.com)
Quotes from his books (mostly his first)
The eye can recognize 150 Mbits of information
And is connected to our brain, a great processor
Perhaps most important: don’t just blindly use built-in graph/graphic tools when you have a significant point to make
a.k.a. Excel and Powerpoint are not friends!
They create simplistic graphs that dumb us down
Your graphics say a lot about your perceived command
Some pre-thoughts
In statistics, plotting raw data is useful - because it can show outliers (easy to see)
Analytical results need same treatment
Strive for “Graphical Excellence”
"consists of complex ideas communicated with clarity, precision, and efficiency
is that which gives to the viewer the greatest number of ideas in the shortest time with the least “ink” in the smallest space
is nearly always multivariate
requires telling the truth about the data."
Graphics/Viz should:
"show the data
induce viewer to think about the substance rather than about methodology, graphic design, the technology, etc.
avoid distorting what the data have to say
present many numbers in a small space
make large data sets coherent
encourage the eye to compare different pieces of data
reveal the data at several levels of detail, from a broad overview to the fine structure
serve a reasonably clear purpose: description, exploration, tabulation, or decoration
be closely integrated with the statistical and verbal descriptions of a data set."
Visualization goals
content focus
comparison rather than mere description
integrity
high resolution
utilization of classic designs and concepts proven by time.
Content Focus
“Above all else show the data." The focus should be on the content of the data, not the visualization technique. This leads to design transparency.
The success of a visualization is based on deep knowledge and care about the substance, and the quality, relevance and integrity of the content
Assume that the viewer is just as smart as you and cares just as much
Never “dumb down” a visualization.
Comparison vs. Description
At the heart of quantitative reasoning is a single question: Compared to what?
Most visualizations today are descriptive rather than comparative. The xy-plot invites reasoning about causality in a way that even the most impressive isosurface does not.
We should strive for relational, rather than merely descriptive, visualizations.
Avoid relying on the viewer's memory to make visual comparisons; a weak facility in most of us.
Integrity - Misleading visualizations are common
To help limit unintentional visualization lies:
"The representation of numbers, as physically measured on the surface of the graphic itself, should be directly proportional to the numerical quantities represented
Clear, detailed, and thorough labeling should be used to defeat graphical distortion and ambiguity
Write out explanations of the data on the graphic itself. Label important events in the data
Show data variation, not design variation
The number of information-carrying (variable) dimensions depicted should not exceed the number of dimensions in the data
Graphics must not quote data out of context"
“Lie Factor”
Lie-factor = size-of-effect-shown-in-visualization / size-of-effect-in-data
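A small sketch of the calculation, assuming "size of effect" means the fractional change between two values. The numbers are the ones commonly quoted from Tufte's discussion of the NY Times fuel economy graphic (an 18 mpg standard drawn as a 0.6 inch line, a 27.5 mpg standard drawn as a 5.3 inch line); treat them as illustrative and check them against the book before relying on them. The function names are hypothetical.

```python
def relative_change(first: float, second: float) -> float:
    """Size of effect expressed as the fractional change from first to second."""
    return (second - first) / first


def lie_factor(shown_first: float, shown_second: float,
               data_first: float, data_second: float) -> float:
    """Ratio of the effect shown in the graphic to the effect in the data."""
    return (relative_change(shown_first, shown_second)
            / relative_change(data_first, data_second))


# Line lengths in inches (graphic) versus standards in mpg (data).
lf = lie_factor(shown_first=0.6, shown_second=5.3,
                data_first=18.0, data_second=27.5)
print(f"{lf:.1f}")  # prints 14.8
```

A lie factor near 1 means the graphic is proportional to the data; values well above 1 overstate the effect, and values below 1 understate it.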
Design Guidelines
Visualizations "are paragraphs about data and should be treated as such."
Words, pictures, and numbers are all part of the information to be visualized, not separate entities
"have a properly chosen format and design
use words, numbers, and drawing together
reflect balance, proportion, sense of relevant scale
display an accessible complexity of detail
often have a narrative quality, a story to tell about the data
avoid content-free decoration, including “chartjunk”" (miscellaneous graphics that have nothing to do with the data)
Examples, and what’s wrong?
Think of Tufte’s “rules” above. Specify.
Nice attempt gone bad..
Graphic was bad before scan made it worse ;-)
Source: NY Times, Aug 9, 1978, p. D-2
Caption says “Fuel Economy Standards for Autos, set by Congress and supplemented by DOT, in miles per gallon”
What’s wrong?
What could we do better?
Sorted by 5-yr
Formatted nicer (big small)
Source: http://edwardtufte.com
Consistent scale in this case
Causes lots of crossover and clutter.
Labels on both sides!
How far we’ve come!