Statistics and Quantitative Analysis U4320 Segment 4: Statistics and Quantitative Analysis Prof. Sharyn O’Halloran

Probability Distributions

A. Distributions: How do simple probability tables relate to distributions?

1. What is the probability of getting a head? (1 coin toss)

[Figure: probability (vertical axis, marked at 1/2) against proportion of heads (horizontal axis, 0 and 1).]

Probability Distributions (cont.)

2. Now say we flip the coin twice. The picture now looks like:

[Figure: probability (vertical axis, marked at 1/4 and 1/2) against proportion of heads (0, 1/2, 1), corresponding to 0 heads, 1 head, and 2 heads.]

Probability Distributions (cont.)

As the number of coin tosses increases, the distribution looks like a bell-shaped curve:

[Figure: bell-shaped curve over proportion of heads; horizontal axis from 0 to 1, with 1/2 marked.]

Probability Distributions (cont.)

3. General: Normal Distribution

Probability distributions are idealized bar graphs or histograms. As we get more and more tosses, the probability of any one observation falls to zero. Thus, the final result is a bell-shaped curve.
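The claim that the probability of any single outcome shrinks as the number of tosses grows can be checked with exact binomial probabilities. The sketch below assumes a fair coin; the function name `heads_distribution` is chosen here for illustration and does not come from the slides.

```python
from fractions import Fraction
from math import comb


def heads_distribution(n_tosses):
    """Exact probability of 0, 1, ..., n heads in n tosses of a fair coin."""
    total_outcomes = 2 ** n_tosses
    return [Fraction(comb(n_tosses, k), total_outcomes) for k in range(n_tosses + 1)]


# The largest single bar of the histogram gets smaller as tosses are added.
for n in (1, 2, 10, 100):
    largest_bar = max(heads_distribution(n))
    print(n, float(largest_bar))
```

For two tosses this reproduces the bars 1/4, 1/2, 1/4 from the earlier picture.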

Probability Distributions (cont.)

B. Properties of a Normal Distribution

1. Formulas: Mean and Variance

            Population                                              Sample
Mean        $\mu = \frac{1}{N}\sum_{i=1}^{N} X_i$                   $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$
Variance    $\sigma^2 = \frac{1}{N}\sum_{i=1}^{N} (X_i - \mu)^2$    $s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar{X})^2$
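The only difference between the two variance formulas is the divisor: N for a population and n - 1 for a sample. A minimal sketch follows, using a made-up data set (`data` is hypothetical, not from the course).

```python
def population_variance(xs):
    """Average squared deviation from the mean, dividing by N."""
    mu = sum(xs) / len(xs)
    return sum((x - mu) ** 2 for x in xs) / len(xs)


def sample_variance(xs):
    """Squared deviations from the sample mean, dividing by n - 1."""
    x_bar = sum(xs) / len(xs)
    return sum((x - x_bar) ** 2 for x in xs) / (len(xs) - 1)


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical observations
print(population_variance(data), sample_variance(data))
```

The sample version is always a little larger, because the same sum of squares is divided by a smaller number.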

Probability Distributions (cont.)

2. Note Difference with the Book

The population mean is written as:

$\mu = \sum_x x \, p(x)$,

Variance as:

$\sigma^2 = \sum_x (x - \mu)^2 \, p(x)$.

Example: Two tosses of a coin

Number of Heads x    Probability p(x)
0                    1/4
1                    1/2
2                    1/4

Probability Distributions (cont.)

2. Note Difference with the Book (cont.)

So the average, or expected, number of heads in two tosses of a coin is:

0*1/4 + 1*1/2 + 2*1/4 = 1.
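The same probability-weighted sum can be written out in a few lines, and the variance follows from the same table. This is a sketch using exact fractions; the variable names are chosen here for illustration.

```python
from fractions import Fraction

# p(x) for the number of heads in two tosses of a fair coin
p = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}

# Mean: sum of x * p(x)
mean_heads = sum(x * px for x, px in p.items())

# Variance: sum of (x - mean)^2 * p(x)
var_heads = sum((x - mean_heads) ** 2 * px for x, px in p.items())

print(mean_heads, var_heads)
```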

Probability Distributions (cont.)

3. Expected Value

E(x) = Average or Mean

$E(x - \mu)^2$ = Expected Variance

Probability Distributions (cont.)

C. Standard Normal Distribution

1. Definition: a normal distribution with mean 0 and standard deviation 1.

Z-values are points on the x-axis that show how many standard deviations that point is away from the mean $\mu$.

[Figure: standard normal curve over Z-values; total area under the curve = 1.]

Probability Distributions (cont.)

C. Standard Normal Distribution

2. Characteristics

symmetric
unimodal
continuous distribution

3. Example: Heights of people are normally distributed with mean 5'7".

What is the proportion of people taller than 5'7"?

[Figure: normal curve centered at 5'7"; total area under the curve = 1; area to the right of 5'7" = 1/2.]

Probability Distributions (cont.)

D. How to Calculate Z-scores

Definition: A Z-value is the number of standard deviations away from the mean.

Definition: Z-tables give the probability (score) of observing a value beyond a particular z-value, that is, the area in the tail of the curve.

Probability Distributions (cont.)

1. What is the area under the curve that is greater than 1? Prob (Z > 1)

The entry in the table is 0.159, which is the total area to the right of 1.

[Figure: standard normal curve over Z-values, total area = 1; the area to the right of z = 1 is 0.159.]

Probability Distributions (cont.)

2. What is the area to the right of 1.64? Prob (Z > 1.64)

The table gives 0.051, or about 5%.

[Figure: standard normal curve over Z-values, total area = 1; the area to the right of 1.64 is 0.051.]

Probability Distributions (cont.)

3. What is the area to the left of -1.64? Prob (Z < -1.64)

Because the curve is symmetric, this equals the area to the right of 1.64, or 0.051.

[Figure: standard normal curve over Z-values, total area = 1; the area to the left of -1.64 is 0.051.]

Probability Distributions (cont.)

4. What is the probability that an observation lies between 0 and 1? Prob (0 < Z < 1)

Prob (0 < Z < 1) = 0.50 - 0.159 = 0.341

[Figure: standard normal curve; the area to the right of 0 is 0.50, the area to the right of 1.00 is 0.159, and the area between them is 0.34.]

Probability Distributions (cont.)

5. How would you figure out the area between 1 and 1.5 on the graph? Prob (1 < Z < 1.5)

Prob (1 < Z < 1.5) = 0.159 - 0.067 = 0.092

[Figure: standard normal curve; the area to the right of 1.00 is 0.159, the area to the right of 1.50 is 0.067, and the area between them is 0.092.]

Probability Distributions (cont.)

6. What is the area between -1 and 2? Prob (-1 < Z < 2)

P (-1 < Z < 0) = .341
P (0 < Z < 2) = .50 - .023 = .477
.477 + .341 = .818

[Figure: standard normal curve with -1.00 and 2.00 marked; tail areas 0.159 (left of -1.00) and 0.023 (right of 2.00); areas 0.341 and 0.477 on either side of 0.]

Probability Distributions (cont.)

7. What is the area between -2 and 2? Prob (-2 < Z < 2)

1 - Prob (Z < -2) - Prob (Z > 2) = 1 - .023 - .023 = .954

[Figure: standard normal curve with -2.00 and 2.00 marked; each tail has area 0.023.]
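The table lookups in examples 1 through 7 can be reproduced numerically. The sketch below uses the complementary error function from Python's standard library in place of a printed Z-table; the helper names `upper_tail` and `area_between` are chosen here for illustration.

```python
from math import erfc, sqrt


def upper_tail(z):
    """Area under the standard normal curve to the right of z: Prob(Z > z)."""
    return 0.5 * erfc(z / sqrt(2))


def area_between(a, b):
    """Area under the standard normal curve between a and b, with a < b."""
    return upper_tail(a) - upper_tail(b)


print(round(upper_tail(1), 3))          # example 1
print(round(upper_tail(1.64), 3))       # example 2
print(round(area_between(0, 1), 3))     # example 4
print(round(area_between(1, 1.5), 3))   # example 5
print(round(area_between(-1, 2), 3))    # example 6
print(round(area_between(-2, 2), 3))    # example 7
```

Results can differ from the slide values in the third decimal place, because the slides add and subtract table entries that were already rounded.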

Probability Distributions (cont.)

E. Standardization

1. Standard Normal Distribution: a very special case where the mean of the distribution equals 0 and the standard deviation equals 1.

[Figure: standard normal curve over Z-values.]

Probability Distributions (cont.)

2. Case 1: Standard deviation differs from 1

For a normal distribution with mean 0 and some standard deviation $\sigma$, you can convert any point x to the standard normal distribution by changing it to $x/\sigma$.

[Figure: two normal curves centered at 0, one with SD = 1 and one with SD = 3, with the points -2.00 and 2.00 marked on each.]

Probability Distributions (cont.)

3. Case 2: Mean differs from 0

So starting with any normal distribution with mean $\mu$ and standard deviation 1, you can convert to a standard normal by taking $x - \mu$ and using this as your Z-value.

Now, what would be the area under the graph between 50 and 51? Taking the mean to be 50, as the answer implies:

Prob (0 < Z < 1) = .341

[Figure: normal curve with SD = 1, with x = 51 marked on the Z-value axis.]

Probability Distributions (cont.)

4. General Case: Mean not equal to 0 and SD not equal to 1

Say you have a normal distribution with mean $\mu$ and standard deviation $\sigma$. You can convert any point x in that distribution to the same point in the standard normal by computing

$Z = \frac{x - \mu}{\sigma}$.

This is called standardization. The Z-value corresponds to x. The Z-table lets you look up the Z-score of any number.

Probability Distributions (cont.)

5. Trout Example:

a. The lengths of trout caught in a lake are normally distributed with mean 9.5" and standard deviation 1.4". There is a law that you can't keep any fish below 12". What percent of the trout can be kept? (You can keep those above 12".)

Step 1: Standardize. Find the Z-value of 12 for Prob (x > 12):

$Z = \frac{12 - 9.5}{1.4} = 1.79$.

Step 2: Find the z-score. Find Prob (Z > 1.79). Look up 1.79 in your table; only .037, or about 4% of the fish, could be kept.

Probability Distributions (cont.)

5. Trout Example (cont.):

b. Now they're thinking of changing the standard to 10" instead of 12". What proportion of fish could be kept under the new limit?

Step 1: Standardize. Find the Z-value of 10 for Prob (x > 10):

$Z = \frac{10 - 9.5}{1.4} = 0.36$.

Step 2: Find the z-score. Find Prob (Z > .36). In your tables, this gives .359, or almost 36% of the fish could be kept under the new law.
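Both parts of the trout example follow the same two steps, standardize and then find the tail area, so they can be sketched as one small routine. The constant and function names below are chosen for illustration, and the tail area is computed with the error function rather than read from a table.

```python
from math import erfc, sqrt

MEAN_LENGTH = 9.5  # inches, from the example
SD_LENGTH = 1.4    # inches, from the example


def standardize(x, mean, sd):
    """Step 1: convert x to a Z-value, Z = (x - mean) / sd."""
    return (x - mean) / sd


def upper_tail(z):
    """Step 2: Prob(Z > z) under the standard normal curve."""
    return 0.5 * erfc(z / sqrt(2))


for limit in (12, 10):
    z = standardize(limit, MEAN_LENGTH, SD_LENGTH)
    print(f"limit {limit} in: Z = {z:.2f}, share that can be kept = {upper_tail(z):.3f}")
```

Because the code does not round Z to two decimals before the lookup, the 10" case comes out slightly different from the table value of .359.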

Joint Distributions

A. Probability Tables

1. Example: Toss a coin 3 times. How many heads and how many runs do we observe?

Def: A run is a sequence of one or more of the same event in a row.

Possible outcomes

Toss    Probability    Heads x    Runs y
TTT     1/8            0          1
TTH     1/8            1          2
THT     1/8            1          3
THH     1/8            2          2
HTT     1/8            1          2
HTH     1/8            2          3
HHT     1/8            2          2
HHH     1/8            3          1
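The outcome table can be generated by enumerating all eight sequences and counting heads and runs in each. This is a minimal sketch; `count_runs` is a helper name introduced here.

```python
from itertools import groupby, product


def count_runs(sequence):
    """Number of runs: maximal blocks of identical consecutive outcomes."""
    return sum(1 for _ in groupby(sequence))


# All 2^3 equally likely sequences, in the same order as the table.
outcomes = ["".join(tosses) for tosses in product("TH", repeat=3)]
table = [(toss, toss.count("H"), count_runs(toss)) for toss in outcomes]

for toss, heads, runs in table:
    print(toss, "1/8", heads, runs)
```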

Joint Distributions (cont.)

2. Joint Distribution Table

             Runs y
Heads x      1      2            3      marg dist
0            1/8    0            0      1/8
1            0      1/4 (2/8)    1/8    3/8
2            0      1/4 (2/8)    1/8    3/8
3            1/8    0            0      1/8
marg dist    1/4    1/2          1/4    1

Joint Distributions (cont.)

3. Definition: The joint probability of x and y is the probability that both x and y occur.

p(x,y) = Pr(X and Y)

p(0, 1) = 1/8, p(1, 2) = 1/4, and p(3, 3) = 0.

Joint Distributions (cont.)

B. Marginal Probabilities

1. Def: A marginal probability is the sum of the joint probabilities across a row or down a column. It is the overall probability of an event occurring.

$p(x) = \sum_y p(x, y)$.

So the probability that there is exactly 1 head is the probability of 1 head and 1 run + 1 head and 2 runs + 1 head and 3 runs

= 0 + 1/4 + 1/8 = 3/8
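The joint table and its margins can be built directly from the eight equally likely sequences. The sketch below accumulates p(x, y) and then sums over the other variable to get each marginal; the variable names are chosen here for illustration.

```python
from collections import Counter
from fractions import Fraction
from itertools import groupby, product

# Joint distribution p(x, y): x = number of heads, y = number of runs.
joint = Counter()
for tosses in product("TH", repeat=3):
    heads = tosses.count("H")
    runs = sum(1 for _ in groupby(tosses))
    joint[(heads, runs)] += Fraction(1, 8)

# Marginals: sum the joint probabilities over the other variable.
p_heads = Counter()
p_runs = Counter()
for (x, y), prob in joint.items():
    p_heads[x] += prob
    p_runs[y] += prob

print(dict(p_heads))
print(dict(p_runs))
```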

Joint Distributions (cont.)

C. Independence

A and B are independent if P(A|B) = P(A).

$P(A|B) = \frac{P(A \,\&\, B)}{P(B)}$;

so if $P(A) = P(A|B)$, then

$P(A, B) = P(A)\,P(B)$.

Joint Distributions (cont.)

Are the # of heads and the # of runs independent?

             # Runs y
# heads x    1      2      3      marg dist
0                                 1/8
1                                 3/8
2                                 3/8
3                                 1/8
marg dist    1/4    1/2    1/4    1
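One way to answer the question is to fill each empty cell with the product of its two marginals and compare the result with the actual joint table. The sketch below does that with the joint probabilities from the earlier slide; cells not listed have probability 0.

```python
from fractions import Fraction as F

# Nonzero cells of the joint table p(heads, runs).
joint = {
    (0, 1): F(1, 8),
    (1, 2): F(1, 4), (1, 3): F(1, 8),
    (2, 2): F(1, 4), (2, 3): F(1, 8),
    (3, 1): F(1, 8),
}
heads_values = range(4)
runs_values = (1, 2, 3)

p_heads = {x: sum(joint.get((x, y), F(0)) for y in runs_values) for x in heads_values}
p_runs = {y: sum(joint.get((x, y), F(0)) for x in heads_values) for y in runs_values}

# Independence requires p(x, y) = p(x) * p(y) in every cell.
independent = all(
    joint.get((x, y), F(0)) == p_heads[x] * p_runs[y]
    for x in heads_values
    for y in runs_values
)
print(independent)
```

A single failing cell is enough: p(0, 1) = 1/8, while the product of the marginals is 1/8 * 1/4 = 1/32, so heads and runs are not independent.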

Correlation and Covariance

A. Covariance

1. Definition of Covariance

Covariance is defined as the expected value of the product of the differences from the means.

$\sigma_{x,y} = E[(X - \mu_x)(Y - \mu_y)] = \frac{1}{N}\sum_{i=1}^{N} (X_i - \mu_x)(Y_i - \mu_y) = \sum_x \sum_y (X - \mu_x)(Y - \mu_y)\, p(x, y).$
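As a worked instance of the last form of the definition, the covariance of heads and runs in the three-toss example can be computed from the joint table. This is a sketch with exact fractions; the names are chosen here for illustration.

```python
from fractions import Fraction as F

# Nonzero cells of the joint table p(heads, runs) from the three-toss example.
joint = {
    (0, 1): F(1, 8),
    (1, 2): F(1, 4), (1, 3): F(1, 8),
    (2, 2): F(1, 4), (2, 3): F(1, 8),
    (3, 1): F(1, 8),
}

mu_x = sum(x * p for (x, y), p in joint.items())  # mean number of heads
mu_y = sum(y * p for (x, y), p in joint.items())  # mean number of runs

# Covariance: probability-weighted product of deviations from the means.
cov_xy = sum((x - mu_x) * (y - mu_y) * p for (x, y), p in joint.items())
print(mu_x, mu_y, cov_xy)
```

The covariance here works out to 0 even though the two variables are not independent, so a covariance of zero on its own does not establish independence.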

Correlation and Covariance

2. Graph

Correlation and Covariance

B. Correlation

1. Definition of Correlation

$\rho_{x,y} = \frac{\sigma_{x,y}}{\sigma_x \sigma_y} = \frac{\text{Covariance}}{SD_x \cdot SD_y}$

Correlation and Covariance

2. Characteristics of Correlation

$-1 \le \rho \le 1$

if $\rho = 1$ then

[Figure: y plotted against x; with $\rho = 1$ the points lie exactly on an upward-sloping line.]

Correlation and Covariance

2. Characteristics of Correlation (cont.)

if $\rho = -1$

[Figure: y plotted against x; with $\rho = -1$ the points lie exactly on a downward-sloping line.]

Correlation and Covariance

2. Characteristics of Correlation (cont.)

if $\rho = 0$

[Figure: y plotted against x; with $\rho = 0$ there is no linear relationship between x and y.]

Correlation and Covariance

2. Characteristics of Correlation (cont.)

Why?

$\rho_{x,y} = \frac{\sigma_{x,y}}{\sigma_x \sigma_y} = \frac{\frac{1}{N}\sum_{i=1}^{N} (x_i - \mu_x)(y_i - \mu_y)}{\sqrt{\frac{1}{N}\sum_{i=1}^{N} (x_i - \mu_x)^2}\;\sqrt{\frac{1}{N}\sum_{i=1}^{N} (y_i - \mu_y)^2}}$
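The three characteristic cases can be checked with a direct implementation of the formula: covariance divided by the product of the two standard deviations, all with divisor N. The data below are made up for illustration and are not from the course.

```python
from math import sqrt


def correlation(xs, ys):
    """Population correlation: covariance / (SD of x * SD of y)."""
    n = len(xs)
    mu_x = sum(xs) / n
    mu_y = sum(ys) / n
    cov = sum((x - mu_x) * (y - mu_y) for x, y in zip(xs, ys)) / n
    sd_x = sqrt(sum((x - mu_x) ** 2 for x in xs) / n)
    sd_y = sqrt(sum((y - mu_y) ** 2 for y in ys) / n)
    return cov / (sd_x * sd_y)


xs = [1, 2, 3, 4]                       # hypothetical x values
print(correlation(xs, [3, 5, 7, 9]))    # y = 2x + 1, upward-sloping line
print(correlation(xs, [4, 3, 2, 1]))    # y = 5 - x, downward-sloping line
print(correlation(xs, [1, -1, -1, 1]))  # no linear relationship
```

Dividing by the standard deviations removes the units of x and y, which is why the result is confined to the range from -1 to 1.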

Sample Homework

GET /FILE 'gss91.sys'.

The SPSS/PC+ system file is read from file gss91.sys
The SPSS/PC+ system file contains 1517 cases, each consisting of 203 variables (including system variables).
203 variables will be used in this session.
-------------------------------
COMPUTE AFFAIRS = XMARSEX.
RECODE AFFAIRS (0,5,8,9 = SYSMIS) (1,2 = 0) (3,4 = 1).
VALUE LABELS AFFAIRS 0 'BAD' 1 'OK'.

The raw data or transformation pass is proceeding
1517 cases are written to the compressed active file.

***** Memory allows a total of 10345 Values, accumulated across all Variables.
There also may be up to 1293 Value Labels for each Variable.

Sample Homework

-------------------------------------------------------------------------------
AFFAIRS
                                                  Valid      Cum
Value Label     Value    Frequency    Percent    Percent    Percent
BAD               .00        870        57.4       90.2       90.2
OK               1.00         94         6.2        9.8      100.0
                  .          553        36.5     Missing
                          -------     -------    -------
                Total       1517       100.0      100.0
-------------------------------------------------------------------------------

Sample Homework

AFFAIRS

Mean          .098    Std err       .010    Median        .000
Mode          .000    Std dev       .297    Variance      .088
Kurtosis     5.398    S E Kurt      .157    Skewness     2.718
S E Skew      .079    Range        1.000    Minimum       .000
Maximum      1.000    Sum         94.000

Valid cases    964    Missing cases    553
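For a variable coded 0/1, the mean, variance, and standard error in this output follow from the two frequencies alone. The sketch below recomputes them from the AFFAIRS frequency table, assuming the printed variance uses the sample formula with divisor n - 1.

```python
from math import sqrt

n_ok = 94    # cases coded 1 ('OK') in the frequency table
n_bad = 870  # cases coded 0 ('BAD') in the frequency table
n = n_ok + n_bad  # valid cases

mean = n_ok / n  # the mean of a 0/1 variable is the proportion of 1s
# Sum of squared deviations is n * mean * (1 - mean); divide by n - 1.
variance = n * mean * (1 - mean) / (n - 1)
std_dev = sqrt(variance)
std_err = std_dev / sqrt(n)

print(f"{mean:.3f} {variance:.3f} {std_dev:.3f} {std_err:.3f}")
```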

Sample Homework

COMPUTE MONEY = INCOME91.
RECODE MONEY (0,22,98,99 = SYSMIS) (1 THRU 15 = 0) (16 THRU 21 = 1).
VALUE LABELS MONEY 0 'LOW' 1 'HIGH'.
FREQUENCIES /VARIABLES AFFAIRS MONEY /STATISTICS ALL.

MONEY
                                                  Valid      Cum
Value Label     Value    Frequency    Percent    Percent    Percent
LOW               .00        787        51.9       57.5       57.5
HIGH             1.00        581        38.3       42.5      100.0
                  .          149         9.8     Missing
                          -------     -------    -------
                Total       1517       100.0      100.0
-------------------------------------------------------------------------------

Sample Homework

MONEY

Mean          .425    Std err       .013    Median        .000
Mode          .000    Std dev       .494    Variance      .245
Kurtosis    -1.910    S E Kurt      .132    Skewness      .305
S E Skew      .066    Range        1.000    Minimum       .000
Maximum      1.000    Sum        581.000

Valid cases   1368    Missing cases    149

Sample Homework

CROSSTABS /TABLES=MONEY BY AFFAIRS
  /CELLS
  /STATISTICS=CORR.

Memory allows for 7,021 cells with 2 dimensions for general CROSSTABS.
-------------------------------------------------------------------------------
MONEY by AFFAIRS

                       AFFAIRS
MONEY                  BAD      OK       ROW TOTAL
LOW     COUNT          450      55       505
        ROW %          89.1     10.9
        COL. %         57.0     66.3
        TOTAL %        51.6     6.3      57.9
HIGH    COUNT          339      28       367
        ROW %          92.4     7.6
        COL. %         43.0     33.7
        TOTAL %        38.9     3.2      42.1
COLUMN TOTAL           789      83       872
                       90.5     9.5      100.0

Sample Homework

                                                      Approximate
Statistic               Value       ASE1       T-value     Significance
--------------------    ---------   --------   -------     ------------
Pearson's R             -.05487     .03267     -1.62089    .10540
Spearman Correlation    -.05487     .03267     -1.62089    .10540

Number of Missing Observations: 645
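The Pearson's R value can be recovered from the four cell counts of the MONEY by AFFAIRS crosstab, treating each variable as 0/1 and applying the covariance-over-standard-deviations formula. This is a sketch of that calculation, not the routine SPSS itself runs; the function name is chosen here for illustration.

```python
from math import sqrt

# Cell counts keyed by (MONEY, AFFAIRS): MONEY 0 = LOW, 1 = HIGH; AFFAIRS 0 = BAD, 1 = OK.
counts = {(0, 0): 450, (0, 1): 55, (1, 0): 339, (1, 1): 28}


def pearson_r_from_counts(cell_counts):
    """Correlation of two coded variables, weighting each cell by its count."""
    n = sum(cell_counts.values())
    mean_x = sum(x * c for (x, y), c in cell_counts.items()) / n
    mean_y = sum(y * c for (x, y), c in cell_counts.items()) / n
    cov = sum((x - mean_x) * (y - mean_y) * c for (x, y), c in cell_counts.items()) / n
    var_x = sum((x - mean_x) ** 2 * c for (x, y), c in cell_counts.items()) / n
    var_y = sum((y - mean_y) ** 2 * c for (x, y), c in cell_counts.items()) / n
    return cov / sqrt(var_x * var_y)


n_valid = sum(counts.values())
print(n_valid, 1517 - n_valid)  # cases in the table and cases missing on either variable
print(round(pearson_r_from_counts(counts), 5))
```

The small negative value reflects the slightly lower share of 'OK' answers in the HIGH income row (7.6%) than in the LOW row (10.9%).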