Presentation on Statistics for Research Lecture 7.

Date posted: 20-Dec-2015

Transcript of Presentation on Statistics for Research Lecture 7.

Page 1: Presentation on Statistics for Research Lecture 7.

Presentation on Statistics for Research

Lecture 7

Page 2: Presentation on Statistics for Research Lecture 7.

Contents

• What is Statistics? Its scope
• Is Statistics Science or Arts? Debatable
• Types of Data
• Presentation of Data
• Measures of Central Tendency
• Measures of Variability
• Chi-square test
• T test for testing the difference between two means

Page 3: Presentation on Statistics for Research Lecture 7.

What is Statistics?

"Statistics is a body of methods or tools for obtaining knowledge"

That is, statistics is a tool for obtaining knowledge.

Example: the correlation coefficient between height and weight is +0.85 (a correlation coefficient always lies between -1 and +1).

Page 4: Presentation on Statistics for Research Lecture 7.

Functions of statistics:

• Presents facts in definite form

• Simplifies huge numbers of figures and facilitates analysis

• Helps in formulating and testing hypotheses

• Helps in prediction

Page 5: Presentation on Statistics for Research Lecture 7.

Scope of Statistics:

Vast, unlimited and ever increasing, e.g. biostatistics, industrial statistics, informatics, design of experiments in agricultural production, demography, queuing theory, stochastic processes, psychology, sociology, public administration, etc.

Page 6: Presentation on Statistics for Research Lecture 7.

Types of Data

There are three main types of data:

1. Cross-sectional data
2. Time series data
3. Panel data

Page 7: Presentation on Statistics for Research Lecture 7.

Cross-Sectional Data: Cross-sectional data refer to observations of many individuals (subjects, objects) at a given time.

Example: Gross annual income for each of 1000 randomly chosen households in Dhaka City for the year 2009.

Page 8: Presentation on Statistics for Research Lecture 7.

Example of cross-section data

Income data ('000 Tk) of 11 persons in the year 2000.

Person          A    B    C    D    E    F    G    H    I    J    K
Income ('000)  234  210  187  342  124  234  321  123  128  187  301

Page 9: Presentation on Statistics for Research Lecture 7.

Time Series Data:

Time series data, also called longitudinal data, refer to observations of a given unit made over time.

Page 10: Presentation on Statistics for Research Lecture 7.

Example of time series data: income of one person over 10 years (in '000)

Year    Person X
1991    129
1992    131
1993    150
1994    170
1995    187
1996    293
1997    209
1998    210
1999    213
2000    240

Page 11: Presentation on Statistics for Research Lecture 7.

Example of Time series data

Average gross annual income of, say, 1000 households randomly chosen from Dhaka City for 10 years 1991-2000.

Page 12: Presentation on Statistics for Research Lecture 7.

Panel Data:

A panel data set contains observations on a number of units (e.g. subjects, objects) over time. Thus, panel data have characteristics of both time series and cross-sectional data.

Page 13: Presentation on Statistics for Research Lecture 7.

Example of Panel data

Values of the gross annual income for each of 1000 randomly chosen households in Dhaka City collected for each of 10 years from 1991 to 2000. Such data can be represented as a set of double-indexed values {Vij;   i=1,...,10,   j=1,...,1000} .

Page 14: Presentation on Statistics for Research Lecture 7.

Example of panel data: income of 3 persons over 10 years (in '000), Vij (i = 1,...,10; j = 1, 2, 3)

Year    Person X income    Person Y income    Person Z income
1991    129                131                87
1992    131                150                93
1993    150                170                70
1994    170                187                34
1995    187                293                87
1996    293                170                93
1997    209                187                70
1998    210                293                87
1999    213                209                16
2000    240                234                54
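The double-indexed structure Vij can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the nested dictionary `panel` and the helper names `cross_section` and `time_series` are hypothetical, not part of the lecture, and the numbers are the three-person income values from the table above.

```python
# Panel data as a double-indexed structure: panel[year][person], values in '000.
# The names panel, cross_section and time_series are illustrative only.
panel = {
    1991: {"X": 129, "Y": 131, "Z": 87},
    1992: {"X": 131, "Y": 150, "Z": 93},
    1993: {"X": 150, "Y": 170, "Z": 70},
    1994: {"X": 170, "Y": 187, "Z": 34},
    1995: {"X": 187, "Y": 293, "Z": 87},
    1996: {"X": 293, "Y": 170, "Z": 93},
    1997: {"X": 209, "Y": 187, "Z": 70},
    1998: {"X": 210, "Y": 293, "Z": 87},
    1999: {"X": 213, "Y": 209, "Z": 16},
    2000: {"X": 240, "Y": 234, "Z": 54},
}


def cross_section(data, year):
    """Fix the year: many persons observed at one point in time."""
    return dict(data[year])


def time_series(data, person):
    """Fix the person: one unit observed over all years, in order."""
    return {year: row[person] for year, row in sorted(data.items())}


print(cross_section(panel, 2000))  # {'X': 240, 'Y': 234, 'Z': 54}
print(time_series(panel, "X"))     # Person X from 1991 to 2000
```

Fixing the year gives a cross-section and fixing the person gives a time series, which is why panel data are said to combine both.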

Page 15: Presentation on Statistics for Research Lecture 7.

Example of panel data: income, expenditure and loan data of one person (Person X) over 10 years (in '000), Vij (i = 1,...,10; j = income, expenditure, loan)

Year    Income    Expenditure    Loan
1991    129       131            87
1992    131       150            93
1993    150       170            70
1994    170       187            34
1995    187       293            87
1996    293       170            93
1997    209       187            70
1998    210       293            87
1999    213       209            16
2000    240       234            54


Page 17: Presentation on Statistics for Research Lecture 7.

Presentation of data

Pie chart, Bar chart and Column chart

Export quantity by product, 2010 (shown on the slide as a pie chart, a bar chart and a column chart):

Product    Export quantity    Share
Tea        125                8%
RMG        800                55%
Jute       325                22%
Others     225                15%

Page 18: Presentation on Statistics for Research Lecture 7.

Pie chart Example

[Pie chart: export value by product, 2010. Tea 8%, RMG 55%, Jute 22%, others 15%.]

Page 19: Presentation on Statistics for Research Lecture 7.

Bar chart example: projected export value in crore dollars

[Bar chart: tea 125, RMG 800, Jute 325, others 225; horizontal axis: export quantity, 0 to 1000.]

Page 20: Presentation on Statistics for Research Lecture 7.

Column chart Example

[Column chart: projected export by product (tea, RMG, Jute, others); vertical axis from 0 to 1000.]

Page 21: Presentation on Statistics for Research Lecture 7.

MEASURES OF CENTRAL TENDENCY

What are Measures of Central Tendency?

The measures covered here are the mean, median, mode, quartiles and percentiles.

Page 22: Presentation on Statistics for Research Lecture 7.

Measures of Central Tendency

Mean: For a population or a sample, the mean is the arithmetic average of all values.

The mean is a measure of central tendency.

e.g. the mean age of CSC students is, say, 38.

Page 23: Presentation on Statistics for Research Lecture 7.

The mean, symbolized by $\bar{X}$, is the sum of the weights of the students divided by the number of students whose weights have been taken.

The following formula both defines the mean and describes the procedure for finding it, here for three values:

$\bar{X} = \frac{X_1 + X_2 + X_3}{3}$

Page 24: Presentation on Statistics for Research Lecture 7.

For the values 32, 35, 36, 36, 37, 38, 38, 39, 39, 39, 40, 40, 42, 45

the mean is $\bar{X} = 536 / 14 \approx 38.3$
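As a quick check of the arithmetic, the mean of the fourteen values above can be computed in Python. This is a minimal sketch; the variable names are illustrative, and `statistics.mean` from the standard library is used only to cross-check the manual sum-divided-by-count calculation.

```python
from statistics import mean

# The fourteen values listed on the slide.
values = [32, 35, 36, 36, 37, 38, 38, 39, 39, 39, 40, 40, 42, 45]

total = sum(values)   # sum of all values: 536
n = len(values)       # number of values: 14
x_bar = total / n     # arithmetic mean, about 38.29

print(total, n, round(x_bar, 2))

# The standard-library helper gives the same result.
print(round(mean(values), 2))
```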

Page 25: Presentation on Statistics for Research Lecture 7.

Meaning of Measures of Central Tendency

• Observations are most frequent at the mean and gradually decline on both sides.

[Figure: distribution of height in cm centred on the mean; labels on the figure: 2.5%, 2.5%, 15%, 25%.]

Values tend to cluster around the central (mean) value.

Page 26: Presentation on Statistics for Research Lecture 7.

Median:

The median, symbolized by Md, is the value which lies in the middle point of the distribution so that half the values are above the median and half of the values are below the median.

Computation of the median is relatively straightforward

Page 27: Presentation on Statistics for Research Lecture 7.

The first step is to write the values in order from lowest to highest (called the rank order of the values).

The median is then simply the middle number. In the case below, the median is 38: there are 15 values altogether, so the 8th value is the middle one, with 7 values on each side of it.

32 32 35 36 36 37 38 [38] 39 39 39 40 40 45 46

Page 28: Presentation on Statistics for Research Lecture 7.

Median in case of even number of values

The median is calculated as the mid-point of the two middle numbers: (38 + 39) / 2 = 38.5

32 35 36 36 37 38 [38 39] 39 39 40 40 42 45
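The two cases (odd and even number of values) can be sketched as a small Python function. The function name `median_by_rank` is illustrative, not from the lecture; the standard-library `statistics.median` is used only as a cross-check.

```python
from statistics import median


def median_by_rank(values):
    """Rank-order the values, then take the middle one (odd count)
    or the mid-point of the two middle ones (even count)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


odd_case = [32, 32, 35, 36, 36, 37, 38, 38, 39, 39, 39, 40, 40, 45, 46]
even_case = [32, 35, 36, 36, 37, 38, 38, 39, 39, 39, 40, 40, 42, 45]

print(median_by_rank(odd_case))   # 38
print(median_by_rank(even_case))  # 38.5

# Cross-check against the standard library.
print(median(odd_case), median(even_case))
```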

Page 29: Presentation on Statistics for Research Lecture 7.

Mode: The mode is the value that occurs most often in a population or a sample. It can be considered the single value most typical of all the values.

Page 30: Presentation on Statistics for Research Lecture 7.

Here the mode is 39, which occurs three times:

32 35 36 36 37 38 38 39 39 39 40 40 42 45

Page 31: Presentation on Statistics for Research Lecture 7.

Shape of the distribution if the mode is higher than the mean and median.

[Figure: population's distribution of height in cm around the mean; labels on the figure: 2.5%, 2.5%, 15%.]

Page 32: Presentation on Statistics for Research Lecture 7.

Example: For the set of numbers 1, 2, 3, 7, 3, 8, 9, 5, 3, 8, 9 the mode is 3, which occurs most often (three times).

NB. Some populations have more than one mode; a population with two modes is called bi-modal.
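Counting occurrences makes both the single-mode and the bi-modal case explicit. The sketch below uses the standard-library `collections.Counter`; the function name `modes` and the small bi-modal list are illustrative assumptions, not data from the lecture.

```python
from collections import Counter


def modes(values):
    """Return every value that shares the highest frequency."""
    counts = Counter(values)
    highest = max(counts.values())
    return sorted(v for v, c in counts.items() if c == highest)


print(modes([1, 2, 3, 7, 3, 8, 9, 5, 3, 8, 9]))  # [3]

# Hypothetical bi-modal example: 1 and 2 both occur twice.
print(modes([1, 1, 2, 2, 3]))  # [1, 2]
```

Returning a list rather than a single number avoids silently picking one value when the data have several modes.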

Page 33: Presentation on Statistics for Research Lecture 7.

Percentiles and Quartiles

Percentiles are like quartiles except that percentiles divide the set of data into 100 equal parts and quartiles divide the set of data into 4 equal parts.

Page 34: Presentation on Statistics for Research Lecture 7.

Example

Research methodology exam marks    Frequency (no. of students)    Cumulative frequency (cum. no. of students)
76-80                              9                              9
81-85                              21                             30
86-90                              18                             48
91-95                              12                             60

Page 35: Presentation on Statistics for Research Lecture 7.

First Quartile = 25th percentile

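With 60 marks in total, the first quartile is located at (25% of 60) = 15, i.e. the 15th value from the bottom. The cumulative frequency first reaches 15 in the interval 81-85, so the first quartile lies in the interval 81-85.

Similarly, the third quartile is located at (75% of 60) = 45, so the third quartile lies in the interval 86-90.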

Page 36: Presentation on Statistics for Research Lecture 7.

Percentile rank of the student who got 90 marks

Percentile rank = (number of students who scored below 90 / total number of students) x 100

= (47 / 60) x 100 = 78.3, i.e. about the 78th percentile
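The quartile lookup and the percentile rank above can be sketched in Python using the frequency table of exam marks. The names `class_containing` and `percentile_rank` are illustrative; the figure of 47 students below 90 is taken from the slide as given.

```python
# Frequency table from the slide: (class interval, number of students).
classes = [("76-80", 9), ("81-85", 21), ("86-90", 18), ("91-95", 12)]


def class_containing(table, fraction):
    """Return the interval in which the cumulative frequency first
    reaches fraction * total (0.25 for Q1, 0.75 for Q3)."""
    total = sum(freq for _, freq in table)
    position = fraction * total
    cumulative = 0
    for label, freq in table:
        cumulative += freq
        if cumulative >= position:
            return label
    raise ValueError("fraction must be between 0 and 1")


def percentile_rank(number_below, total):
    """Percentage of observations lying below a given score."""
    return 100 * number_below / total


print(class_containing(classes, 0.25))  # 81-85
print(class_containing(classes, 0.75))  # 86-90
print(round(percentile_rank(47, 60)))   # 78
```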

Page 37: Presentation on Statistics for Research Lecture 7.

Measures of Variability

Variability refers to the spread or dispersion of values (scores).

A distribution of scores is said to be highly variable if the scores differ widely from one another.

There are three measures of dispersion:

• Range
• Variance
• Standard deviation

Lecture 8

Page 38: Presentation on Statistics for Research Lecture 7.

Importance of Variability

The following two data sets have the same mean, but do they reflect the same information? No: Data B has a larger number of underweight babies.

Weight of newborn babies (pounds)

           Data A    Data B
           4         3
           5         3
           6         9
Average    5         5

Page 39: Presentation on Statistics for Research Lecture 7.

Range

The range is the difference between the largest value and the smallest value.

Range = highest value - lowest value

Distribution 1: 32 35 36 36 37 38 40 42 42 43 43 45

Distribution 2: 32 32 33 33 33 34 34 34 34 34 35 45

The range is 45 - 32 = 13 for both distributions, but it does not give a true picture of the variability, because it depends only on the two extreme values.
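A short Python check illustrates the point: both distributions have the same range, yet a measure that uses every value, such as the standard deviation covered later in this lecture, tells them apart. The variable and function names are illustrative.

```python
from statistics import stdev

distribution_1 = [32, 35, 36, 36, 37, 38, 40, 42, 42, 43, 43, 45]
distribution_2 = [32, 32, 33, 33, 33, 34, 34, 34, 34, 34, 35, 45]


def value_range(values):
    """Range = highest value - lowest value."""
    return max(values) - min(values)


print(value_range(distribution_1))  # 13
print(value_range(distribution_2))  # 13

# The sample standard deviation uses all values, not only the extremes,
# so it differs between the two distributions.
print(round(stdev(distribution_1), 2), round(stdev(distribution_2), 2))
```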

Page 40: Presentation on Statistics for Research Lecture 7.

Measures of Variability (Variance and Standard Deviation)

The variance, symbolized by $S^2$, is a measure of variability. It is the average of the squared deviations of the values from the mean, with $N - 1$ as the divisor:

$S^2 = \frac{\sum_{i=1}^{N} (X_i - \bar{X})^2}{N - 1}$

Page 41: Presentation on Statistics for Research Lecture 7.

Formula of Standard Deviation

The standard deviation is the positive square root of the variance:

$S = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \bar{X})^2}{N - 1}}$

Page 42: Presentation on Statistics for Research Lecture 7.

Example of Variance and Standard Deviation

Series 1: 32 36 37 37 38 40 42 42 43 43 45 45

Mean $\bar{X}$ = 480 / 12 = 40

Student no.    Weight in kg ($X_i$)    $X_i - \bar{X}$    $(X_i - \bar{X})^2$
1              32                      -8                 64
2              36                      -4                 16
3              37                      -3                 9
4              37                      -3                 9
5              38                      -2                 4
6              40                      0                  0
7              42                      2                  4
8              42                      2                  4
9              43                      3                  9
10             43                      3                  9
11             45                      5                  25
12             45                      5                  25

Sum of squares = 178

Page 43: Presentation on Statistics for Research Lecture 7.

Therefore variance $S^2$ = 178 / (n - 1) = 178 / 11 = 16.18

Standard deviation = $\sqrt{16.18}$ = 4.02

A standard deviation of 4.02 means that the values in the series typically deviate from the mean by about 4.02 kg.
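The worked example can be reproduced step by step in Python. This is a minimal sketch with illustrative variable names. It follows the definition with n - 1 as the divisor, and for Series 1 the squared deviations add up to 178.

```python
import math

# Series 1: weights of 12 students in kg.
weights = [32, 36, 37, 37, 38, 40, 42, 42, 43, 43, 45, 45]

n = len(weights)
x_bar = sum(weights) / n                       # mean: 40.0
deviations = [x - x_bar for x in weights]      # Xi - mean
sum_of_squares = sum(d ** 2 for d in deviations)

variance = sum_of_squares / (n - 1)            # sample variance, divisor n - 1
standard_deviation = math.sqrt(variance)       # positive square root

print(x_bar, sum_of_squares)
print(round(variance, 2), round(standard_deviation, 2))
```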

Page 44: Presentation on Statistics for Research Lecture 7.

Chi Square Test

Tests differences in qualitative (categorical) values, for example whether people have a definite taste for colored cars compared to white cars.

Suppose 1000 cars are sold in Bangladesh in a month. If there were no preference for colored cars, then 500 white and 500 colored cars would be expected to be sold.

Page 45: Presentation on Statistics for Research Lecture 7.

Chi-square test: whether Bangladeshi people have a preference for colored cars.

Type of color    Observed no. (O)    Expected no. (E)    O-E     (O-E)^2    (O-E)^2/E
White            400                 500                 -100    10000      20
Colored          600                 500                 100     10000      20
Total                                                                       40

The calculated chi-square value is 40. From the chi-square table, find the tabulated value for n-1 = 2-1 = 1 degree of freedom.

Reject the null hypothesis (of no preference) if the calculated value is greater than the tabulated value at the 1% or 5% level of significance (99% or 95% confidence).
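The calculation in the table can be sketched in Python as follows. The variable names are illustrative, and the critical values 3.841 (5% level) and 6.635 (1% level) are the standard chi-square table entries for 1 degree of freedom, hard-coded here rather than computed.

```python
# Observed and expected car sales, as in the table above.
observed = {"White": 400, "Colored": 600}
expected = {"White": 500, "Colored": 500}

# Chi-square statistic: sum of (O - E)^2 / E over all categories.
chi_square = sum(
    (observed[k] - expected[k]) ** 2 / expected[k] for k in observed
)
degrees_of_freedom = len(observed) - 1

# Tabulated critical values for 1 degree of freedom (standard table entries).
critical_values = {0.05: 3.841, 0.01: 6.635}

print(chi_square, degrees_of_freedom)  # 40.0 1
for level, critical in critical_values.items():
    decision = "reject" if chi_square > critical else "do not reject"
    print(f"{level:.0%} level: {decision} the null hypothesis of no preference")
```

Since 40 is far above both tabulated values, the null hypothesis of no preference is rejected at both levels.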

Page 46: Presentation on Statistics for Research Lecture 7.

The End