Rep

Reliability and Validity Ma. Benilda L. Adcock


Transcript of Rep

Page 1: Rep

Reliability and Validity
Ma. Benilda L. Adcock

Page 2: Rep

Why do we need to determine the reliability and validity of a test?

Page 3: Rep

• Without validity and reliability, one cannot test a hypothesis.

• Without hypothesis testing, one cannot support a theory.

• Without a supported theory, one cannot explain why events occur.

• Without adequate explanation, one cannot develop effective material and non-material technologies, including programs designed for positive social change.

Page 4: Rep

Reliability: “consistency”

A measure is considered reliable if it gives us the same result over and over again.

Correlation: a measure of how things are related to one another

Correlation coefficient: the degree of that relationship

Page 5: Rep

The correlation coefficient ranges from -1.00 to +1.00.

+ (positive) correlation means that when one variable goes up, so does the other one.

- (negative) correlation means that when one variable goes up, the other one goes down.
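
The slides do not say which coefficient they have in mind. As a minimal sketch, assuming the common Pearson coefficient, the two directions of correlation can be illustrated as follows. The function name and the sample data are hypothetical and chosen only to show the two extremes.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two lists of the same length (at least 2 values)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of deviations from the means
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviations for each variable
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)

# Hypothetical data: one variable rises, the other rises or falls with it
hours = [1, 2, 3, 4, 5]
scores_up = [2, 4, 6, 8, 10]
scores_down = [10, 8, 6, 4, 2]

print(pearson_r(hours, scores_up))    # perfect positive correlation
print(pearson_r(hours, scores_down))  # perfect negative correlation
```

Real test data almost never reach either extreme; values closer to zero indicate a weaker relationship.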

Page 6: Rep

“No TEST, no matter how it is designed, is FREE from ERROR”

Page 7: Rep

We have to consider the Standard Error of Measurement.

Random error (e.g., mood)

Systematic error (e.g., traffic)

Page 8: Rep

But have you wondered what the real importance of the Standard Error of Measurement is?

Page 9: Rep

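The standard error of measurement (SEM) estimates how repeated measures of a person on the same instrument tend to be distributed around his or her “true” score.

X1 = T + e1
X2 = T + e2

Each observed score X is the true score T plus a measurement error e that differs from one administration to the next.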

Page 10: Rep
Page 11: Rep

Let’s say a child in a class took an individual intelligence test that yielded a standard score of 88. The mean of this test is 100 and the standard deviation is 15. At first glance, this score suggests that the child is in the average range of 85 to 115, where 68% of the normative population would score.

SEM = 10

Page 12: Rep

Most typical confidence intervals are 68%, 90%, or 95%. Respectively, these bands may be interpreted as the range within which a person’s “true” score can be found 68%, 90%, or 95% of the time.

The 68% confidence level is the one most typically reported in evaluation reports. This is often reported in the following manner:

“Given the student’s obtained score of _______, there are two out of three chances that the individual’s true score would fall between _______ (low score in range) and _______ (high score in range).”

By Denise Bishop (2006)

Page 13: Rep

So what?

The confidence band means that, 2 times out of 3, this child’s true score will be between 78 and 98 (the obtained score of 88 plus or minus the SEM of 10).

The smaller the SEM, the more reliable the test is.
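
The band for the child in the example can be reproduced with a short sketch. It assumes the band is computed as the obtained score plus or minus z times the SEM, with the usual rounded normal-curve multipliers (1.0 for 68%, 1.645 for 90%, 1.96 for 95%). The slides do not state this formula explicitly, so the function and table names here are illustrative.

```python
# Rounded z multipliers for the confidence levels named in the slides
# (assumption: scores are treated as normally distributed around the true score)
Z_FOR_LEVEL = {68: 1.0, 90: 1.645, 95: 1.96}

def confidence_band(obtained_score, sem, level=68):
    """Return (low, high) bounds for the true score at the given level."""
    if level not in Z_FOR_LEVEL:
        raise ValueError("level must be 68, 90, or 95")
    margin = Z_FOR_LEVEL[level] * sem
    return obtained_score - margin, obtained_score + margin

# The child from the example: obtained score 88, SEM 10
for level in (68, 90, 95):
    low, high = confidence_band(88, 10, level)
    print(f"{level}% band: {low:.1f} to {high:.1f}")
```

At the 68% level this gives 78 to 98, matching the slide. The wider 90% and 95% bands show how much uncertainty a SEM of 10 leaves around a single score.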

Page 14: Rep

Types of Reliability

• Test-Retest Reliability

Used to assess the consistency of a measure from one time to another.

Page 15: Rep

• Inter-Rater or Inter-Observer Reliability

Used to assess the degree to which different raters/observers give consistent estimates of the same phenomenon.
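
One common way to quantify agreement between two raters is Cohen's kappa, which corrects the raw agreement rate for agreement expected by chance. The slides do not name a specific statistic, so this is only one possible choice; the ratings below are hypothetical.

```python
from collections import Counter

def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters assigning categories to the same cases."""
    n = len(rater_a)
    if n == 0 or n != len(rater_b):
        raise ValueError("both raters must rate the same non-empty set of cases")
    # Proportion of cases on which the raters agree
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's category frequencies
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / n ** 2
    if expected == 1.0:
        raise ValueError("kappa is undefined when both raters use a single category")
    return (observed - expected) / (1 - expected)

# Hypothetical ratings of eight performances by two observers
rater_a = ["pass", "pass", "fail", "fail", "pass", "fail", "pass", "pass"]
rater_b = ["pass", "pass", "fail", "pass", "pass", "fail", "fail", "pass"]

print(round(cohen_kappa(rater_a, rater_b), 3))
```

Here the raters agree on 6 of 8 cases (75%), but kappa is lower than 0.75 because some of that agreement would occur by chance alone.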

Page 16: Rep

• Parallel-Forms Reliability

Used to assess the consistency of the results of two tests constructed in the same way from the same content domain.

Page 17: Rep

• Internal Consistency Reliability

Used to assess the consistency of results across items within a test.
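
Internal consistency is often summarized with Cronbach's alpha. The slides do not mention a particular index, so the following is a minimal sketch under that assumption, using sample variances and hypothetical responses from five examinees on a three-item test.

```python
from statistics import variance

def cronbach_alpha(responses):
    """Cronbach's alpha; `responses` has one row per examinee, one column per item."""
    k = len(responses[0])
    if k < 2 or len(responses) < 2:
        raise ValueError("need at least two items and two examinees")
    # Variance of each item across examinees
    item_variances = [variance(item) for item in zip(*responses)]
    # Variance of the examinees' total scores
    total_variance = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)

# Hypothetical item scores (rows: examinees, columns: items)
responses = [
    [4, 5, 4],
    [2, 3, 2],
    [5, 5, 4],
    [1, 2, 2],
    [3, 3, 4],
]

print(round(cronbach_alpha(responses), 3))
```

Alpha rises toward 1 when examinees who score high on one item also tend to score high on the others.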

Page 18: Rep

Validity

The term validity refers to whether or not a test measures what it intends to measure.

On a test with high validity, the items will be closely linked to the test’s intended focus. If a test has poor validity, then it does not measure the competencies it ought to.

As with reliability, there are several ways to estimate the validity of a test.

Page 19: Rep

1. CONTENT VALIDITY

Content validity refers to the connections between the test items and the subject-related tasks. The test should evaluate only the content related to the field of study in a manner sufficiently representative, relevant, and comprehensible.

Page 20: Rep

2. CONSTRUCT VALIDITY

Construct validity implies that the construct (a concept, idea, or notion) is used correctly. It seeks agreement between a theoretical concept and a specific measuring device or procedure. For example, a test of intelligence nowadays must include measures of multiple intelligences, rather than just logical-mathematical and linguistic ability measures.

Page 21: Rep

3. CRITERION-RELATED VALIDITY

Also referred to as instrumental validity, it states that the criteria should be clearly defined by the teacher in advance. To be standardized, it has to take other teachers' criteria into account. It also needs to demonstrate the accuracy of a measure or procedure by comparing it with another measure or procedure that has already been demonstrated to be valid.

Page 22: Rep

4. CONCURRENT VALIDITY

Concurrent validity is a statistical method using correlation, rather than a logical method.

Examinees who are known to be either masters or non-masters on the content measured by the test are identified before the test is administered. Once the tests have been scored, the relationship between the examinees’ status as either masters or non-masters and their performance (i.e., pass or fail) is estimated based on the test. This type of validity provides evidence that the test is classifying examinees correctly. The stronger the correlation is, the greater the concurrent validity of the test is.
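
Because both variables here are two-valued (master or non-master, pass or fail), one way to compute the correlation described above is the phi coefficient on a 2x2 table. The slide does not specify which coefficient to use, and the counts below are hypothetical.

```python
import math

def phi_coefficient(master_pass, master_fail, nonmaster_pass, nonmaster_fail):
    """Phi coefficient for a 2x2 table of known status against test outcome."""
    a, b, c, d = master_pass, master_fail, nonmaster_pass, nonmaster_fail
    # Product of the four marginal totals (rows, then columns)
    denominator = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    if denominator == 0:
        raise ValueError("every row and column of the table needs at least one case")
    return (a * d - b * c) / denominator

# Hypothetical counts: 20 known masters and 20 known non-masters
phi = phi_coefficient(master_pass=18, master_fail=2,
                      nonmaster_pass=3, nonmaster_fail=17)
print(round(phi, 3))
```

A phi near +1 means the test passes the masters and fails the non-masters, which is the evidence of correct classification the slide refers to.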

Page 23: Rep

5. PREDICTIVE VALIDITY

This is another statistical approach to validity that estimates the relationship of test scores to an examinee's future performance as a master or non-master. Predictive validity considers the question, "How well does the test predict examinees' future status as masters or non-masters?" For this type of validity, the correlation that is computed is based on the test results and the examinee’s later performance. This type of validity is especially useful for test purposes such as selection or admissions.

Page 24: Rep

6. FACE VALIDITY

Like content validity, face validity is determined by a review of the items and not through the use of statistical analyses. Unlike content validity, face validity is not investigated through formal procedures. Instead, anyone who looks over the test, including examinees, may develop an informal opinion as to whether or not the test is measuring what it is supposed to measure. While it is clearly of some value to have the test appear to be valid, face validity alone is insufficient for establishing that the test is measuring what it claims to measure.

Page 25: Rep

7. Convergent Validity

It connotes whether the information from the instrument is of a quality to be helpful in planning an intervention.

8. Treatment Validity

It indicates the degree to which the instrument provides information that can lead to the development of intervention strategies, including developing goals and objectives, determining methods, and detecting progress.

Page 26: Rep

9. Social Validity

It represents the value and use of the information obtained from the instrument.