Methods of Research and Enquiry Basic Statistics and Correlational Research by Dr. Daniel Churchill.

R&D in IT in Education

What is statistics?

Statistics is a body of mathematical techniques or processes for gathering, organizing, analyzing, and interpreting numerical data.

Basic Concepts

Measurement – assigning a number to an observation based on certain rules

Variable – a measured characteristic (e.g., age, grade level, test score, height, gender)

A constant – a measure that has only one value

Continuous variable – can take any value within a range, including fractional values (e.g., height)

Discrete variable – can take only a finite number of distinct values between any two given points (e.g., age in whole years between 30 and 50)

Basic Concepts

Independent variables -- purported causes

Dependent variables -- purported effects

Example: Two instructional strategies, co-operative groups and traditional lectures, were used during a three-week social studies unit. Students’ exam scores were analyzed for differences between the groups.

The independent variable is the instructional approach (of which there are two levels).

The dependent variable is the students’ achievement.

Obj. 2.3

Basic Concepts

A population – the entire group of elements that have at least one characteristic in common

A sample – a small group of observations selected from the total population

A parameter – a measure of a characteristic of an entire population

A statistic – a measure of a characteristic of a sample

Statistics – a method

Basic Concepts

Descriptive statistics – classify, organize, and summarize numerical data about a particular group of observations (e.g., the number of students in HK, the mean maths grade, the ethnic make-up of students)

Inferential statistics – involve selecting a sample from a defined population and studying it in order to draw conclusions about that population

These two types of statistics are not mutually exclusive

Probability and Level of Significance

Studies yield statistical results which are used to decide whether to retain or reject the null hypothesis

The decision is made in terms of probability, not certainty

Once we obtain a sample statistic, we compare the obtained value to the appropriate critical value (from tables)

Most commonly, a probability level of 5% (p = .05) is used as the cut-off for statistical significance

Data Collection: Measurement scales

Nominal – categories (gender, ethnicity, etc.)

Ordinal – ordered categories (rank in class, order of finish, etc.)

Interval – equal intervals (test scores, attitude scores, etc.)

Ratio – absolute zero (time, height, weight, etc.)

Obj. 2.1

Measurement Scales

Watch videos from Learner.org: http://learner.org/resources/series158.html

Watch Video 5. Variation About the Mean

Statistical measures: Measures of central tendency or averages

Mean

Median -- a point in an array, above & below which one-half of the scores fall

Mode -- the score that occurs most frequently in a distribution

Organizing Data

Source: http://www.learnactivity.com/lo/

Example

37, 58, 74, 54, 67, 78, 48,

42, 61, 42, 57, 61, 45, 63,

52, 65, 39, 59, 51, 63, 48,

58, 73, 56, 69, 56, 72, 54,

66, 72, 63

Here is a set of maths test scores (raw scores) for a class of 31 students

Organizing measurements

Stem | Leaf
3 | 7 9
4 | 2 2 5 8 8
5 | 1 2 4 4 6 6 7 8 8 9
6 | 1 1 3 3 3 5 6 7 9
7 | 2 2 3 4 8
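
A stem-and-leaf display like this one can be produced with a few lines of Python. The following is a minimal sketch, not part of the original slides; the function name `stem_and_leaf` is illustrative, and it assumes two-digit scores so that the tens digit is the stem and the units digit is the leaf.

```python
from collections import defaultdict

# Maths test scores from the example (31 students).
scores = [37, 58, 74, 54, 67, 78, 48, 42, 61, 42, 57, 61, 45, 63,
          52, 65, 39, 59, 51, 63, 48, 58, 73, 56, 69, 56, 72, 54,
          66, 72, 63]

def stem_and_leaf(values):
    """Group scores by tens digit (stem); the units digits are the leaves."""
    groups = defaultdict(list)
    for value in sorted(values):          # sorting keeps leaves in ascending order
        groups[value // 10].append(value % 10)
    return dict(groups)

display = stem_and_leaf(scores)
for stem, leaves in display.items():
    print(stem, "|", " ".join(str(leaf) for leaf in leaves))
```

Because the scores are sorted before grouping, both the stems and the leaves within each stem come out in ascending order.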

Organizing measurements – frequency tables

Test Score   Frequency   Midpoint   Cumulative Frequency   Percent Frequency   Cumulative Percentage
36-40        2           38         31                     6                   100
41-45        3           43         29                     10                  94
46-50        2           48         26                     6                   84
51-55        4           53         24                     13                  77
56-60        6           58         20                     19                  65
61-65        6           63         14                     19                  45
66-70        3           68         8                      10                  26
71-75        4           73         5                      13                  16
76-80        1           78         1                      3                   3

N = 31

The cumulative columns are accumulated from the highest interval (76-80) toward the lowest; percentages are rounded to whole numbers.
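
Tallying a grouped frequency table by hand is error-prone, so it can help to check the counts in code. The sketch below is an illustration only: the function name `frequency_table` and its parameters are assumptions, and it accumulates the cumulative columns from the highest interval downward, as the table's layout suggests.

```python
# Maths test scores from the example (31 students).
scores = [37, 58, 74, 54, 67, 78, 48, 42, 61, 42, 57, 61, 45, 63,
          52, 65, 39, 59, 51, 63, 48, 58, 73, 56, 69, 56, 72, 54,
          66, 72, 63]

def frequency_table(values, low=36, width=5, bins=9):
    """Build rows for class intervals low..low+width-1, and so on."""
    n = len(values)
    rows = []
    for i in range(bins):
        lower = low + i * width
        upper = lower + width - 1
        freq = sum(lower <= v <= upper for v in values)
        rows.append({
            "interval": f"{lower}-{upper}",
            "frequency": freq,
            "midpoint": (lower + upper) // 2,
            "percent": round(100 * freq / n),
        })
    # Accumulate from the highest interval toward the lowest.
    running = 0
    for row in reversed(rows):
        running += row["frequency"]
        row["cumulative"] = running
        row["cumulative_percent"] = round(100 * running / n)
    return rows

rows = frequency_table(scores)
for row in rows:
    print(row["interval"], row["frequency"], row["midpoint"],
          row["cumulative"], row["percent"], row["cumulative_percent"])
```

The frequencies sum to the number of raw scores, which is a quick consistency check on any hand-built table.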

Organizing measurements – Histogram

[Figure: histogram of the test scores; x-axis: Test Score (intervals 36-40 to 76-80), y-axis: Frequency (0 to 7)]

Organizing measurements – Mean

The mean, the median and the mode

The mean: $\bar{X} = \frac{\sum X}{N}$

$\bar{X}$ = mean, $\sum$ = sum of, $X$ = scores in a distribution, $N$ = number of scores

It is the base from which many important measures are computed.

$\bar{X} = \frac{37 + 58 + 74 + \dots + 72 + 63}{31} \approx 58$

Organizing measurements – Mode and Median

Mode -- the score that occurs most frequently in a distribution: 63 (it occurs three times)

Median -- a point in an array, above & below which one-half of the scores fall: 58 (the 16th of the 31 ordered scores)
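
The three measures of central tendency can be checked against the raw scores with Python's standard `statistics` module. This is a minimal sketch added for illustration, not part of the original slides.

```python
import statistics

# Maths test scores from the example (31 students).
scores = [37, 58, 74, 54, 67, 78, 48, 42, 61, 42, 57, 61, 45, 63,
          52, 65, 39, 59, 51, 63, 48, 58, 73, 56, 69, 56, 72, 54,
          66, 72, 63]

mean = statistics.mean(scores)      # sum of scores divided by N (1803 / 31)
median = statistics.median(scores)  # middle value of the ordered scores
mode = statistics.mode(scores)      # most frequently occurring score

print(f"mean = {mean:.2f}, median = {median}, mode = {mode}")
# prints: mean = 58.16, median = 58, mode = 63
```

With an odd number of scores (31), the median is simply the 16th score once the list is sorted.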

Organizing measurements – Histogram

[Figure: histogram of the test scores; x-axis: Test Score (intervals 36-40 to 76-80), y-axis: Frequency (0 to 7), with the mean, median, and mode marked]

Statistical measures: Measures of spread or dispersion

Range -- the difference between the highest and the lowest scores plus one

Standard deviation – the typical distance of scores from the mean, computed as the square root of the variance (also see calculator)

Variance – the squared standard deviation

Z-score -- the number of standard deviations a score lies from the mean

z = (score - mean) / SD

Basic Formulas for Sample

Variance: $S^2 = \frac{\sum (X - \bar{X})^2}{n}$

Standard deviation: $S = \sqrt{S^2}$
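
As a hedged illustration, the measures of spread can be computed for the 31 maths scores used earlier. The sketch follows the slide's formulas, which divide by n; note that many textbooks and Python's `statistics.variance` divide by n - 1 for a sample, while `statistics.pvariance` divides by n. The variable names are illustrative.

```python
import math
import statistics

# Maths test scores from the earlier example (31 students).
scores = [37, 58, 74, 54, 67, 78, 48, 42, 61, 42, 57, 61, 45, 63,
          52, 65, 39, 59, 51, 63, 48, 58, 73, 56, 69, 56, 72, 54,
          66, 72, 63]

n = len(scores)
mean = sum(scores) / n

# Range as defined on the slide: highest minus lowest, plus one.
score_range = max(scores) - min(scores) + 1

# Variance: mean of the squared deviations from the mean (divides by n).
variance = sum((x - mean) ** 2 for x in scores) / n

# Standard deviation: square root of the variance.
sd = math.sqrt(variance)

# Cross-check against the standard library's divide-by-n variance.
library_variance = statistics.pvariance(scores)

print(f"range = {score_range}, variance = {variance:.2f}, SD = {sd:.2f}")
```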

Normal Distribution

Source: http://en.wikipedia.org/wiki/Normal_distribution



Source: http://noppa5.pc.helsinki.fi/koe/flash/histo/histograme.html

Z-Score Example

z score: $z = \frac{X - \bar{X}}{S}$

Example: compare a student’s performance on Maths and English tests, given the student’s scores, the class means, and the class standard deviations.

Subject   Student's Score   Class Mean   Class S
English   50                45           5
Maths     68                56           6

$z_{English} = \frac{50 - 45}{5} = +1$

$z_{Maths} = \frac{68 - 56}{6} = +2$
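
The same comparison can be written as a short Python sketch. The helper name `z_score` is illustrative and not from the slides.

```python
def z_score(score, mean, sd):
    """Number of standard deviations that a score lies from the mean."""
    return (score - mean) / sd

z_english = z_score(50, 45, 5)   # (50 - 45) / 5 = 1.0
z_maths = z_score(68, 56, 6)     # (68 - 56) / 6 = 2.0

print(f"z English = {z_english:+.1f}, z Maths = {z_maths:+.1f}")
```

Because z-scores express each result in standard deviation units, the two tests become directly comparable: relative to the class, the student did better in Maths (two standard deviations above the mean) than in English (one standard deviation above).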

Z-Score Example

[Figure: $z_{English}$ and $z_{Maths}$]

Z score vs. T score, and Percentile Rank
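
The slide's figure is not reproduced here, so the following is only a hedged sketch of how the three scales relate. It assumes the common convention T = 50 + 10z and a normally distributed score for the percentile rank; neither formula is stated in the slides, and the function names are illustrative.

```python
from statistics import NormalDist

def t_score(z):
    """Convert a z-score to a T score (assumed convention: mean 50, SD 10)."""
    return 50 + 10 * z

def percentile_rank(z):
    """Percentage of scores below z, assuming a normal distribution."""
    return 100 * NormalDist().cdf(z)

for z in (-1, 0, 1, 2):
    print(f"z = {z:+d}  T = {t_score(z)}  percentile rank = {percentile_rank(z):.1f}")
```

For example, a z-score of +1 corresponds to a T score of 60 and, under the normality assumption, a percentile rank of about 84.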

Correlational Studies

Attempt to describe the predictive relationships between or among variables

The predictor variable is the variable from which the researcher is predicting

The criterion variable is the variable to which the researcher is predicting

Objectives 10.1 & 10.2

Relationship Studies

General purpose: gain insight into variables that are related to other variables relevant to educators (achievement, self-esteem, self-concept)

Two specific purposes:

Suggest subsequent interest in establishing cause and effect between variables found to be related

Control for variables related to the dependent variable in experimental studies


Correlational Data

Income/month ($)   Expenditure/month ($)
4000               4000
4000               5000
5000               6000
2000               2000
9000               6000
4000               2000
7000               5000
8000               6000
9000               9000
5000               3000

Scatter Diagram

[Figure: scatter diagram of the ten income/expenditure pairs; x-axis: Income (0 to 10000), y-axis: Expenditure (0 to 10000), with the point (2000, 2000) labelled]

Source: http://noppa5.pc.helsinki.fi/koe/corr/index.html

Correlation Coefficients

The general rule

+.95 is a strong positive correlation

+.50 is a moderate positive correlation

+.20 is a low positive correlation (small correlation)

-.26 is a low negative correlation

-.49 is a moderate negative correlation

-.95 is a strong negative correlation

Predictions

Between .60 and .70 are adequate for group predictions

Above .80 is adequate for individual predictions

Objective 3.3 & 3.5

Conducting a Prediction Study

Identify a set of variables: limit to those variables logically related to the criterion

Identify a population and select a sample

Identify appropriate instruments for measuring each variable: ensure appropriate levels of validity and reliability

Collect data for each instrument from each subject: typically data is collected at different points in time

Compute the results: regression coefficient, regression equation

Hypotheses for Correlation

$H_0: r = 0$

$H_A: r \neq 0$

Collecting Measurement

Instrument – a tool used to collect data

Test – a formal, systematic procedure for gathering information

Assessment – the general process of collecting, synthesizing, and interpreting information

Obj. 3.1 & 3.2

The Process

Participant and instrument selection: minimum of 30 subjects; instruments must be valid and reliable

Higher validity and reliability require smaller samples

Lower validity and reliability require larger samples

Design and procedures: collect data on two or more variables for each subject

Data analysis: compute the appropriate correlation coefficient


Selection of a Test

Sources of test information, e.g.:

Mental Measurements Yearbooks (MMY), Buros Institute

ETS Test Collection

Types of Correlation Coefficients

The type of correlation coefficient depends on the measurement level of the variables

Pearson r – continuous predictor and criterion variables (e.g., math attitude and math achievement)

Spearman rho – ranked or ordinal predictor and criterion variables (e.g., rank in class and rank on a final exam)

Phi coefficient – dichotomous predictor and criterion variables (e.g., gender and pass/fail status on a high-stakes test)

Objectives 7.1, 7.2, & 7.3

Calculating Pearson Correlation Coefficient

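Z-score formula: $r = \frac{\sum z_x z_y}{N}$

Raw score formula: $r = \frac{N\sum XY - (\sum X)(\sum Y)}{\sqrt{\left(N\sum X^2 - (\sum X)^2\right)\left(N\sum Y^2 - (\sum Y)^2\right)}}$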
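
Both formulas can be applied to the income and expenditure data shown earlier. The sketch below is illustrative only (the function names are not from the slides); the z-score version uses standard deviations that divide by N, which is what makes the two formulas agree.

```python
import math

# Monthly income and expenditure ($) for the ten cases in the Correlational Data table.
income = [4000, 4000, 5000, 2000, 9000, 4000, 7000, 8000, 9000, 5000]
expenditure = [4000, 5000, 6000, 2000, 6000, 2000, 5000, 6000, 9000, 3000]

def pearson_raw(xs, ys):
    """Raw score formula for Pearson r."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

def pearson_z(xs, ys):
    """Z-score formula: mean of the products of paired z-scores."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)  # divides by N
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    products = (((x - mean_x) / sd_x) * ((y - mean_y) / sd_y)
                for x, y in zip(xs, ys))
    return sum(products) / n

r_raw = pearson_raw(income, expenditure)
r_z = pearson_z(income, expenditure)
print(f"raw score formula: r = {r_raw:.2f}; z-score formula: r = {r_z:.2f}")
```

For these ten cases both formulas give r of about .80, a strong positive correlation between income and expenditure.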

Just for information: Critical Values of the Pearson Product-Moment Correlation Coefficient

First, determine the degrees of freedom (df). For a correlation study, the degrees of freedom are 2 less than the number of subjects. Use the critical value table to find the intersection of alpha .05 (see columns) and 25 degrees of freedom (see rows), which corresponds to 27 subjects. The value found at the intersection (.381) is the minimum correlation coefficient needed to state confidently, 95 times out of a hundred, that the relationship you found with your subjects exists in the population from which they were drawn.

If the absolute value of your correlation coefficient is above .381, you reject your null hypothesis (there is no relationship) and accept the alternative hypothesis: e.g., there is a statistically significant relationship between arm span and height, r (25) = .87, p < .05.

If the absolute value of your correlation coefficient were less than .381, you would fail to reject your null hypothesis: there is not a statistically significant relationship between arm span and height, r (25) = .12, p > .05.
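
The decision rule can be sketched in Python as follows. The critical value .381 (df = 25, alpha = .05) is taken from the text above; the function names are illustrative. The conversion t = r * sqrt(df / (1 - r^2)) is the standard link between r and the t distribution and is added here as background, not something the slides state.

```python
import math

CRITICAL_R = 0.381   # from the critical value table: df = 25, alpha = .05
DF = 25              # number of subjects (27) minus 2

def is_significant(r, critical_r=CRITICAL_R):
    """Reject the null hypothesis when |r| exceeds the critical value."""
    return abs(r) > critical_r

def r_to_t(r, df):
    """Convert a correlation coefficient to the equivalent t statistic."""
    return r * math.sqrt(df / (1 - r ** 2))

for r in (0.87, 0.12):
    decision = "reject H0" if is_significant(r) else "fail to reject H0"
    print(f"r({DF}) = {r:.2f}: {decision}")

# The critical r corresponds to a t value of roughly 2.06 at 25 degrees of freedom.
t_at_critical_r = r_to_t(CRITICAL_R, DF)
```

This shows why a table of critical r values can stand in for a t table: each critical r is just the critical t re-expressed on the correlation scale for the given degrees of freedom.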

Source: http://www.gifted.uconn.edu/siegle/research/Correlation/alphaleve.htm

Prediction and Regression

Source: http://noppa5.pc.helsinki.fi/koe/corr/index.html

The position of the line is determined by “b”, the slope (the angle of the line), and “a”, the intercept (the point where the line intersects the Y-axis).

$Y = bX + a$
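
As an illustration, the slope and intercept can be estimated for the income and expenditure data shown earlier. This is a minimal sketch that assumes the ordinary least-squares method, which the slides do not name explicitly; the function names are illustrative.

```python
# Monthly income (X) and expenditure (Y) in $ from the Correlational Data table.
income = [4000, 4000, 5000, 2000, 9000, 4000, 7000, 8000, 9000, 5000]
expenditure = [4000, 5000, 6000, 2000, 6000, 2000, 5000, 6000, 9000, 3000]

def fit_line(xs, ys):
    """Least-squares estimates of slope b and intercept a in Y = bX + a."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    b = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    a = mean_y - b * mean_x   # the fitted line passes through (mean_x, mean_y)
    return b, a

def predict(x, b, a):
    """Predicted criterion value for a given predictor value."""
    return b * x + a

b, a = fit_line(income, expenditure)
print(f"Y = {b:.3f}X + {a:.1f}")
print(f"predicted expenditure at income 6000: {predict(6000, b, a):.0f}")
```

On these ten cases the slope is about 0.72, meaning each additional dollar of monthly income is associated with roughly 72 cents of additional expenditure.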

Other Correlation Analyses

Multiple regression – two or more variables are used to predict one criterion variable

Canonical correlation – an extension of multiple regression in which more than one predictor variable and more than one criterion variable are used

Factor analysis – a correlational analysis used to take a large number of variables and group them into a smaller number of clusters of similar variables called factors

References

Gay, L. R., Mills, G. E., & Airasian, P. (2006). Educational Research: Competencies for Analysis and Applications. Upper Saddle River, NJ: Pearson/Merrill Prentice Hall.

Ravid, R. (2000). Practical statistics for educators (2nd ed.). New York, NY: University Press of America, Inc.
