Random Thoughts 2012 (COMP 066)
Jan-Michael Frahm, Jared Heinly
2
Values to Summarize Data
• Mean (EXCEL: AVERAGE(<range>))
• Can informally be seen as the middle of the data
• Be careful: summary values do not always tell the whole story, and outliers influence the mean (significantly)
3
Median
• Median (EXCEL: MEDIAN(<range>))
1) Order the data from smallest to largest
2) If the dataset has an odd number of values, the median is the one in the middle. If it has an even number of values, the median is the average of the middle two
• Which measure should be used, mean or median? Reporting both is never a problem
• Always ask for the other if given only one
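To see why it helps to ask for both values, here is a minimal sketch in Python. The salary figures are made up for illustration; the last one is a deliberate outlier.

```python
from statistics import mean, median

# Hypothetical salaries in $1000s (illustrative data, not from the slides).
# The last value is an outlier.
salaries = [40, 42, 45, 47, 50, 400]

salary_mean = mean(salaries)      # pulled upward by the outlier
salary_median = median(salaries)  # average of the two middle values, 45 and 47

print("mean:  ", salary_mean)
print("median:", salary_median)
```

The mean lands above every value except the outlier, while the median stays in the middle of the typical values.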
4
Measure of Variability
• Standard deviation (EXCEL: STDEV.S(<range>))
1. Find the average of the data
2. Subtract the average from each data value
3. Square the differences
4. Divide the sum of squares by the number of data values minus one (this is also called the variance)
5. Take the square root of the variance
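The five steps can be sketched directly in Python. The function name `sample_std` and the example data are illustrative choices; the result is compared against the standard library's `statistics.stdev`, which also divides by n minus one.

```python
import math
from statistics import stdev

def sample_std(data):
    """Sample standard deviation, following the five steps on the slide."""
    n = len(data)
    average = sum(data) / n                          # step 1
    differences = [x - average for x in data]        # step 2
    squares = [d ** 2 for d in differences]          # step 3
    variance = sum(squares) / (n - 1)                # step 4
    return math.sqrt(variance)                       # step 5

# Illustrative data (not from the slides).
data = [2, 4, 4, 4, 5, 5, 7, 9]
print(sample_std(data))
print(stdev(data))  # library result for comparison
```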
5
Standard Deviation Properties
• Can never be negative
• Smallest possible value is 0
• Affected by outliers
• Same unit as original data
6
Percentile
• k-th percentile
1. Order all numbers in the dataset
2. Multiply k percent by the number of data points n; round up if the result is not a whole number
3. Find the value at the position computed in step 2. If the result in step 2 had to be rounded up, that value is the k-th percentile. If it was already a whole number, the k-th percentile is the average of that value and the next one
• Median is the 50-th percentile
• A percentile is not a percent; it is a number that is a certain percentage of the way through the dataset
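A minimal sketch of the percentile procedure in Python. It assumes the usual reading of step 3: when k percent of n is a whole number, average the value at that position with the next one; otherwise round up and take the value at that position. The function name and the restriction to whole-number k between 1 and 99 are choices made for this sketch.

```python
def percentile(data, k):
    """k-th percentile for a whole number k with 0 < k < 100."""
    if not 0 < k < 100:
        raise ValueError("k must be between 1 and 99")
    ordered = sorted(data)                        # step 1
    position, remainder = divmod(k * len(ordered), 100)  # step 2, integer math
    if remainder:
        # Not a whole number: round up. Position is 1-based, list is 0-based,
        # so the rounded-up position (position + 1) is index `position`.
        return ordered[position]
    # Whole number: average the value at that position and the next one.
    return (ordered[position - 1] + ordered[position]) / 2

print(percentile([1, 2, 3, 4, 5], 50))  # odd-sized data: the middle value
print(percentile([1, 2, 3, 4], 50))     # even-sized data: average of middle two
```

With k = 50 this reproduces the median rule for both odd and even dataset sizes.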
7
Coincidences
• Recall the bet that two people in the room have the same birthday
• Was it a bad bet to make?
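Whether the bet is a good one depends on the size of the room. A short sketch, assuming 365 equally likely birthdays and ignoring leap years (both simplifications, and the function name is illustrative):

```python
def prob_shared_birthday(n, days=365):
    """Probability that at least two of n people share a birthday."""
    p_all_different = 1.0
    for i in range(n):
        # Person i must avoid the i birthdays already taken.
        p_all_different *= (days - i) / days
    return 1 - p_all_different

for n in (10, 23, 30, 50):
    print(n, round(prob_shared_birthday(n), 3))
```

Under these assumptions the probability passes 50% at 23 people.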
8
Coincidences
• Johnny Carson example from Paulos's book: in order to have a 50% probability of someone in the room having a particular birthday, you need 253 people.
• Does this make sense?
• Wouldn't you need only 50% of 366 people, which is 183?
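The 253 figure can be checked with a few lines of Python. The sketch assumes 365 equally likely birthdays and that each person's birthday is independent, so different people may repeat the same non-matching date. That repetition is why 183 people are not enough.

```python
def prob_particular_birthday(n, days=365):
    """Probability that at least one of n people has one specific birthday."""
    return 1 - ((days - 1) / days) ** n

# Smallest group size that reaches a 50% chance.
people_needed = 1
while prob_particular_birthday(people_needed) < 0.5:
    people_needed += 1

print(people_needed)  # 253
print(round(prob_particular_birthday(183), 3))  # chance with only 183 people
```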
9
Coincidences
• 1000 letters, 1000 mailboxes, random assignment
• Probability of at least 1 getting to correct destination
• Why is it 63%?
10
Coincidences
• 1000 letters, 1000 random addresses (allowing duplicates), 1000 mailboxes, random assignment
• Probability of at least 1 getting to correct destination
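When duplicates are allowed, one way to model this case is to treat each letter as landing in the correct mailbox independently with probability 1/1000. That independence is an assumption of this sketch, not something stated on the slide.

```python
n = 1000

# Each letter independently misses its correct mailbox with probability 1 - 1/n.
p_all_miss = (1 - 1 / n) ** n
p_at_least_one_correct = 1 - p_all_miss

print(round(p_at_least_one_correct, 4))
```

Under this model the answer is again close to 63%.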
11
Coincidences
• 1000 letters, 1000 mailboxes, random assignment
• Probability of at least 1 getting to correct destination
• Why is it 63%?
• Derangements: permutations such that no element appears in its original position
  Complex calculation, but as the number of elements increases, the probability approaches 1 - 1/e ≈ 63%
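The derangement calculation is short in code. The sketch below uses the standard result that the fraction of permutations of n items with no item in its original position is the alternating sum 1/0! - 1/1! + 1/2! - ... ± 1/n!. The function name is illustrative.

```python
import math

def prob_at_least_one_match(n):
    """Probability that a random permutation of n items has a fixed point."""
    p_no_match = 0.0
    term = 1.0  # (-1)^k / k!, starting at k = 0
    for k in range(n + 1):
        p_no_match += term
        term *= -1 / (k + 1)
    return 1 - p_no_match

for n in (2, 3, 5, 10, 1000):
    print(n, round(prob_at_least_one_match(n), 6))
print("limit:", round(1 - 1 / math.e, 6))
```

The value settles near the limit after only a handful of elements, so 1000 letters is already well past the point where the size matters.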
12
Pigeonhole Principle
• If n items are put into m pigeonholes with n > m, at least one pigeonhole must have more than 1 item
Source: http://en.wikipedia.org/wiki/File:TooManyPigeons.jpg
13
Pigeonhole Principle
• 1.54 million people in Philadelphia
• At most 500,000 hairs are on a person's head
• What is the minimum number of people that have the same number of hairs on their head?
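The generalized pigeonhole principle gives the answer: with n items in m holes, some hole holds at least n / m items, rounded up. In this sketch the pigeonholes are the possible hair counts 0 through 500,000, so there are 500,001 of them (counting a bald head is an assumption of the sketch).

```python
people = 1_540_000
hair_counts = 500_000 + 1  # possible values 0, 1, ..., 500,000

# Ceiling division using integers only.
guaranteed_same = -(-people // hair_counts)

print(guaranteed_same)  # 4
```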
14
Chance Encounters
• Probability that two people from USA know someone in common, i.e. they are "linked" via one person
• Assumption: there are 300 million people in the USA
• Assumption: each person knows 1500 other people
• Probability that two people from USA are linked via 2 individuals
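A rough estimate under the two assumptions on the slide, plus one more that is only a modeling convenience: acquaintances are spread uniformly at random across the population, so any given pair of people know each other with probability 1500 / 300 million, independently of other pairs. Real social networks are clustered, so treat the numbers as a sketch.

```python
population = 300_000_000
acquaintances = 1_500

# Chance that one specific person knows another specific person.
p_know = acquaintances / population

# Linked via one person: at least one of A's 1500 acquaintances knows B.
p_one_link = 1 - (1 - p_know) ** acquaintances

# Linked via two people: at least one of the 1500 * 1500 pairs
# (acquaintance of A, acquaintance of B) know each other.
p_two_links = 1 - (1 - p_know) ** (acquaintances * acquaintances)

print(round(p_one_link, 4))
print(round(p_two_links, 6))
```

Under this model a shared acquaintance is unlikely (under 1%), but a chain through two intermediaries is almost certain.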
15
Degrees of Separation
• Six degrees of separation
  There are on average 6 links between any 2 people on earth
• Six degrees of Kevin Bacon, Bacon number
  Determine the number of links (movies acted in) between a random actor and Kevin Bacon
• Assume 2 million actors
• Assume each actor has acted with 80 others
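A back-of-the-envelope sketch using the two assumptions on the slide. It also assumes, optimistically, that co-star lists never overlap, so each extra link multiplies the number of reachable actors by 80. The result is therefore the smallest number of links that could possibly cover everyone, not an average Bacon number.

```python
actors = 2_000_000
costars_per_actor = 80

links = 0
reachable = 1  # start from Kevin Bacon alone
while reachable < actors:
    links += 1
    reachable *= costars_per_actor  # no-overlap assumption
    print(links, reachable)

print("links needed:", links)
```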
16
Expected Value
• Expected value = Σ (probability of event * value of event)
• Ex: pay $1 to play a game, 10% chance of winning $5, 40% chance of winning $1
• Expected Value = -1 + 0.1 * 5 + 0.4 * 1 = -$0.10
• Ex: Dice game
  Keep earning points until you roll a 1
  When does your expected value of points stop increasing?
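Both examples can be worked through in a few lines. For the dice game the slide does not spell out the rules, so the sketch assumes a common version: each roll of 2 to 6 adds its face value to your points, and rolling a 1 wipes out the points accumulated so far. The function names are illustrative.

```python
def expected_value(outcomes):
    """Sum of probability * value over (probability, value) pairs."""
    return sum(p * v for p, v in outcomes)

# Game from the slide: pay $1; 10% win $5, 40% win $1, otherwise win nothing.
winnings = [(0.1, 5), (0.4, 1), (0.5, 0)]
net_value = expected_value(winnings) - 1
print(round(net_value, 2))

def expected_gain_from_next_roll(points):
    """Assumed rules: rolls of 2-6 add the face value; a 1 loses all points."""
    gain = sum(face / 6 for face in range(2, 7))
    loss = points / 6
    return gain - loss

for points in (0, 10, 19, 20, 21, 30):
    print(points, round(expected_gain_from_next_roll(points), 3))
```

Under these assumed rules, another roll stops paying off on average once you hold 20 points.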
17
Blood Testing
• 1% of people have disease
• Need to test 100 samples of blood
• Probability that all samples are healthy
• What if we pool the blood into 2 sets of 50 each and then test? What is the expected number of tests?
• Can we do better?
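A sketch of the pooling calculation. It assumes samples are independent, that a pooled test is positive exactly when at least one sample in the pool is diseased, and that every sample in a positive pool is then retested individually. The function name and the restriction to pool sizes that divide 100 evenly are choices made for this sketch.

```python
P_SICK = 0.01
SAMPLES = 100

# Probability that all 100 samples are healthy.
p_all_healthy = (1 - P_SICK) ** SAMPLES
print(round(p_all_healthy, 3))

def expected_tests(pool_size, samples=SAMPLES, p_sick=P_SICK):
    """Expected tests: one per pool, plus pool_size retests per positive pool."""
    pools = samples // pool_size
    p_pool_positive = 1 - (1 - p_sick) ** pool_size
    return pools + pools * pool_size * p_pool_positive

print(round(expected_tests(50), 1))  # two pools of 50

# Try every pool size that divides the samples evenly.
pool_sizes = [s for s in range(2, SAMPLES + 1) if SAMPLES % s == 0]
best_size = min(pool_sizes, key=expected_tests)
print(best_size, round(expected_tests(best_size), 1))
```

Under these assumptions, two pools of 50 already cut the expected work to roughly 41.5 tests, and smaller pools do better still.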