Dr. Engr. Sami ur Rahman Data Analysis Lecture 3: Data Distribution Normal Distribution.


Page 1: Dr. Engr. Sami ur Rahman Data Analysis Lecture 3: Data Distribution Normal Distribution.

Dr. Engr. Sami ur Rahman

Data Analysis

Lecture 3: Data Distribution

Normal Distribution

Page 2: Dr. Engr. Sami ur Rahman Data Analysis Lecture 3: Data Distribution Normal Distribution.

University Of Malakand | Department of Computer Science | UoMIPS | Dr. Engr. Sami ur Rahman | 2

Introductory Statistics

Dispersion

The Normal Distribution Curve

Variability

Calculating a Mean and a Standard Deviation

Interpreting Distributions

Page 3: Dr. Engr. Sami ur Rahman Data Analysis Lecture 3: Data Distribution Normal Distribution.

Dispersion

Dispersion – The distribution of values around some central value, such as an average.

The distribution of a variable tells us which values the variable takes and how often each value occurs (its frequency).

Page 4: Dr. Engr. Sami ur Rahman Data Analysis Lecture 3: Data Distribution Normal Distribution.

Distribution

St id  Age     St id  Age
    1   18        11   20
    2   20        12   19
    3   19        13   20
    4   19        14   22
    5   20        15   19
    6   20        16   21
    7   21        17   17
    8   21        18   18
    9   21        19   22
   10   23        20   20

Page 5: Dr. Engr. Sami ur Rahman Data Analysis Lecture 3: Data Distribution Normal Distribution.

Distribution (Frequency)

St id  Age     St id  Age
    1   18        11   20
    2   20        12   19
    3   19        13   20
    4   19        14   22
    5   20        15   19
    6   20        16   21
    7   21        17   17
    8   21        18   18
    9   21        19   22
   10   23        20   20

Age  Frequency
 17      1
 18      2
 19      4
 20      6
 21      4
 22      2
 23      1

Mean?

Median?

Mode?
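The three questions above can be answered directly from the 20 ages in the table. The following is a minimal sketch using only Python's standard library; the variable names are illustrative and not part of the lecture.

```python
from collections import Counter
from statistics import mean, median, mode

# Ages of the 20 students, listed in student-id order (1 to 20).
ages = [18, 20, 19, 19, 20, 20, 21, 21, 21, 23,
        20, 19, 20, 22, 19, 21, 17, 18, 22, 20]

# Frequency distribution: age -> number of students with that age.
frequency = Counter(ages)
for age in sorted(frequency):
    print(age, frequency[age])

print("mean  :", mean(ages))
print("median:", median(ages))
print("mode  :", mode(ages))
```

For this data set the mean, the median and the mode all come out to 20.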

Page 6: Dr. Engr. Sami ur Rahman Data Analysis Lecture 3: Data Distribution Normal Distribution.

Histogram

Age  Frequency
 17      1
 18      2
 19      4
 20      6
 21      4
 22      2
 23      1

[Figure: histogram of the ages. x-axis: Bin (17 to 24, More); y-axis: Frequency (0 to 7)]
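A histogram is simply the frequency table drawn as bars. As a rough sketch (assuming one bin per age and a plain-text rendering instead of a chart), it could be produced like this; `text_histogram` is a hypothetical helper name.

```python
from collections import Counter

# Ages of the 20 students from the distribution table.
ages = [18, 20, 19, 19, 20, 20, 21, 21, 21, 23,
        20, 19, 20, 22, 19, 21, 17, 18, 22, 20]
counts = Counter(ages)

def text_histogram(counts):
    """Return one line per bin: the value, then one '*' per observation."""
    return [f"{value:>3} | {'*' * counts[value]}" for value in sorted(counts)]

for line in text_histogram(counts):
    print(line)
```

The tallest bar is at age 20, and the bars fall away on both sides of it.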

Page 7: Dr. Engr. Sami ur Rahman Data Analysis Lecture 3: Data Distribution Normal Distribution.

The Normal Distribution Curve

[Figure: normal distribution curve. x-axis from 0 to 100; y-axis from 0 to 0.025]

It is bell-shaped and symmetrical about the mean

The mean, median and mode are equal

Its shape is completely determined by two parameters: the mean and the standard deviation
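To make the properties above concrete, here is a small sketch of the normal density written out by hand. The mean of 50 and standard deviation of 15 are illustrative values chosen for this example, and `normal_pdf` is a hypothetical helper name.

```python
import math

def normal_pdf(x, mean, sd):
    """Density of the normal distribution with the given mean and standard deviation."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2 * math.pi))

mu, sigma = 50, 15  # illustrative parameters (an assumption for this example)

# Symmetry: points equally far from the mean have the same density.
print(normal_pdf(35, mu, sigma), normal_pdf(65, mu, sigma))

# The curve peaks at the mean.
print(normal_pdf(mu, mu, sigma))
```

Changing `mu` shifts the curve left or right; changing `sigma` makes it wider and flatter or narrower and taller.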

Page 8: Dr. Engr. Sami ur Rahman Data Analysis Lecture 3: Data Distribution Normal Distribution.

Examples of Normal Distribution

Many quantities in everyday life are approximately normally distributed:

• Height

• Weight

• Shoe size

• Exam marks

Page 9: Dr. Engr. Sami ur Rahman Data Analysis Lecture 3: Data Distribution Normal Distribution.

Variation or Spread of Distributions

Measures that indicate the spread of scores:

Range

Standard Deviation

Page 10: Dr. Engr. Sami ur Rahman Data Analysis Lecture 3: Data Distribution Normal Distribution.

Variation or Spread of Distributions

Range

• It compares the minimum score with the maximum score

• Max score - Min score = Range

• It is a crude indication of the spread of the scores because it does not tell us much about the shape of the distribution and how much the scores vary from the mean
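To see why the range is only a crude indicator, consider two small data sets with the same minimum and maximum. Both lists below are made up for illustration, and `value_range` is a hypothetical helper name.

```python
from statistics import pstdev

def value_range(scores):
    """Range = maximum score - minimum score."""
    return max(scores) - min(scores)

# Two hypothetical sets of scores with identical extremes.
clustered = [10, 30, 30, 30, 50]   # most scores sit at the mean
spread_out = [10, 20, 30, 40, 50]  # scores are evenly spread

print(value_range(clustered), value_range(spread_out))  # same range
print(pstdev(clustered), pstdev(spread_out))            # different spread around the mean
```

The range is identical for both sets, yet the scores in the second set vary more around the mean, which only the standard deviation reveals.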

Page 11: Dr. Engr. Sami ur Rahman Data Analysis Lecture 3: Data Distribution Normal Distribution.

Variation or Spread of Distributions

Standard Deviation

• It tells us what is happening between the minimum and maximum scores

• It tells us how much the scores in the data set vary around the mean

Page 12: Dr. Engr. Sami ur Rahman Data Analysis Lecture 3: Data Distribution Normal Distribution.

Calculating Mean and Standard Deviation

          Data    Deviation    Absolute deviation    Squared deviation
          x       x - Mean     |x - Mean|            (x - Mean)²
          10      -20          20                    400
          20      -10          10                    100
          30        0           0                      0
          40       10          10                    100
          50       20          20                    400

Sums      150       0          60                    1000
Means     30        0          12                    200

Variance = 200 (the mean of the squared deviations)

Standard deviation = √Variance = 14.1421356
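The columns of this table can be reproduced step by step. The sketch below follows the slide in dividing by the number of observations n (the population form); it is an illustration using the standard library, not the lecturer's own code.

```python
import math
from statistics import pstdev, stdev

data = [10, 20, 30, 40, 50]

mean = sum(data) / len(data)                      # 30.0
deviations = [x - mean for x in data]             # x - Mean
abs_deviations = [abs(d) for d in deviations]     # |x - Mean|
squared_deviations = [d * d for d in deviations]  # (x - Mean)^2

variance = sum(squared_deviations) / len(data)    # mean of the squared deviations
std_dev = math.sqrt(variance)

print(mean, sum(deviations), sum(abs_deviations), sum(squared_deviations))
print(variance, std_dev)

# Library equivalents: pstdev divides by n, stdev divides by n - 1.
print(pstdev(data), stdev(data))
```

Note that `statistics.stdev` uses the sample formula (dividing by n - 1), so it returns a slightly larger value than the figure on the slide.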

Page 13: Dr. Engr. Sami ur Rahman Data Analysis Lecture 3: Data Distribution Normal Distribution.

Standard deviation (s)

Used as the measure of spread when the mean is used as the measure of center

The units of s are the same as the units of the data

s is never negative (s ≥ 0)

A higher s means more spread

s = 0 means no spread: all observations are equal

s is affected by outliers
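Each of these properties can be checked on small made-up data sets. The lists below are hypothetical examples chosen for this sketch.

```python
from statistics import pstdev

all_equal = [7, 7, 7, 7, 7]          # no spread at all
scores = [10, 20, 30, 40, 50]
with_outlier = [10, 20, 30, 40, 500]  # same data, one extreme value

print(pstdev(all_equal))     # 0: all observations are equal
print(pstdev(scores))
print(pstdev(with_outlier))  # a single outlier inflates s considerably

# s is in the units of the data: rescaling the data rescales s by the same factor.
in_other_units = [x * 10 for x in scores]
print(pstdev(in_other_units) / pstdev(scores))
```

The last line shows that multiplying every observation by 10 (for example, converting centimetres to millimetres) multiplies s by 10 as well.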

Page 14: Dr. Engr. Sami ur Rahman Data Analysis Lecture 3: Data Distribution Normal Distribution.

Standard Deviation

A measure of dispersion around the mean. For normally distributed data, approximately 68 percent of the cases lie within plus or minus one standard deviation of the mean, 95 percent within two, and 99.7 percent within three standard deviations.

This is often referred to as the 68-95-99.7 rule
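The percentages in the rule can be checked numerically. For a normal distribution, the fraction of cases within k standard deviations of the mean equals erf(k / √2); the sketch below assumes exactly normal data, and `fraction_within` is a hypothetical helper name.

```python
import math

def fraction_within(k):
    """Fraction of a normal distribution lying within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} s.d.: {100 * fraction_within(k):.1f}%")
```

Real data are only approximately normal, so observed percentages will usually differ somewhat from these values.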

When to Use Standard Deviation

When you need to determine how much the scores in a set vary from each other.

Page 15: Dr. Engr. Sami ur Rahman Data Analysis Lecture 3: Data Distribution Normal Distribution.

Interpreting Distributions

[Figure: normal curve with Mean = 50 and Std Dev = 15. x-axis from 0 to 100; y-axis from 0 to 0.03]

s.d.      -3    -2    -1     0    +1    +2    +3
scores     5    20    35    50    65    80    95
rank      0%    2%   16%   50%   84%   98%  100%

Area under the curve between adjacent s.d. marks, from left to right: 2%, 14%, 34%, 34%, 14%, 2%
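The rows of this figure relate a raw score to its distance from the mean in standard deviations and to its percentile rank. A minimal sketch of that conversion, assuming the scores are exactly normal with mean 50 and standard deviation 15, could look like this; `z_score` and `percentile_rank` are hypothetical helper names.

```python
from statistics import NormalDist

dist = NormalDist(mu=50, sigma=15)

def z_score(score):
    """Number of standard deviations the score lies above (+) or below (-) the mean."""
    return (score - dist.mean) / dist.stdev

def percentile_rank(score):
    """Percentage of scores expected to fall at or below the given score."""
    return 100 * dist.cdf(score)

for score in (5, 20, 35, 50, 65, 80, 95):
    print(score, z_score(score), round(percentile_rank(score)))
```

Rounded to whole percentages, the ranks agree with the figure; for example, a score of 65 is one standard deviation above the mean and sits at about the 84th percentile.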

Page 16: Dr. Engr. Sami ur Rahman Data Analysis Lecture 3: Data Distribution Normal Distribution.

Interpreting Distributions

        School A   School B   School C
Mean       50         60         70
S.d.       10         10         10

[Figure: normal curves for the three schools. x-axis from 0 to 120; y-axis from 0 to 0.045]

Page 17: Dr. Engr. Sami ur Rahman Data Analysis Lecture 3: Data Distribution Normal Distribution.

Interpreting Distributions

        School A   School B   School C
Mean       50         50         50
S.d.       10         13         16

[Figure: normal curves for the three schools. x-axis from 0 to 120; y-axis from 0 to 0.045]

Page 18: Dr. Engr. Sami ur Rahman Data Analysis Lecture 3: Data Distribution Normal Distribution.

Interpreting Distributions

        National   School A   School B
Mean       55         60         40
S.d.       10         15         15

[Figure: normal curves for the national distribution and the two schools. x-axis from -20 to 120; y-axis from 0 to 0.045]
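One way to interpret these curves is to ask what share of each group scores above the national mean of 55. The sketch below assumes that every distribution is exactly normal with the mean and standard deviation from the table; `share_above` is a hypothetical helper name.

```python
from statistics import NormalDist

# Parameters taken from the table; normality is an assumption of this sketch.
national = NormalDist(mu=55, sigma=10)
school_a = NormalDist(mu=60, sigma=15)
school_b = NormalDist(mu=40, sigma=15)

def share_above(dist, threshold):
    """Fraction of the distribution lying above the threshold."""
    return 1 - dist.cdf(threshold)

for name, dist in (("National", national), ("School A", school_a), ("School B", school_b)):
    print(f"{name}: {100 * share_above(dist, 55):.0f}% score above 55")
```

Under these assumptions roughly 63% of School A and 16% of School B score above the national mean, compared with 50% nationally.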

Page 19: Dr. Engr. Sami ur Rahman Data Analysis Lecture 3: Data Distribution Normal Distribution.

Example: Do women study more than men?

Variable: minutes studied on a typical weeknight of a first-year university class

Random samples of 30 women and 30 men:

Women: 180, 120, 150, 200, 120, 90, 120, 180, 120, 150, 60, 240, 180, 120, 180, 180, 120, 180, 360, 240, 180, 150, 180, 115, 240, 170, 150, 180, 180, 120

Men: 90, 90, 150, 240, 30, 0, 120, 45, 120, 60, 230, 200, 30, 30, 60, 120, 120, 120, 90, 120, 240, 60, 95, 120, 200, 75, 300, 30, 150, 180
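A first step toward answering the question is to summarize each sample by its center and spread. The sketch below uses the sample standard deviation (dividing by n - 1) because the data are random samples; this choice and the helper name `summarize` are assumptions of the example.

```python
from statistics import mean, median, stdev

women = [180, 120, 150, 200, 120, 90, 120, 180, 120, 150, 60, 240, 180, 120, 180,
         180, 120, 180, 360, 240, 180, 150, 180, 115, 240, 170, 150, 180, 180, 120]
men = [90, 90, 150, 240, 30, 0, 120, 45, 120, 60, 230, 200, 30, 30, 60,
       120, 120, 120, 90, 120, 240, 60, 95, 120, 200, 75, 300, 30, 150, 180]

def summarize(minutes):
    """Return (mean, median, sample standard deviation) for one group."""
    return mean(minutes), median(minutes), stdev(minutes)

for label, group in (("Women", women), ("Men", men)):
    m, med, s = summarize(group)
    print(f"{label}: mean = {m:.1f}, median = {med}, s = {s:.1f}")
```

In these samples the women's mean is about 165 minutes against about 117 for the men (medians 175 and 120), and the men's times are more spread out. This is a descriptive comparison only; it does not by itself show that the difference holds for all students.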

Page 20: Dr. Engr. Sami ur Rahman Data Analysis Lecture 3: Data Distribution Normal Distribution.

Thanks for your attention