Ensuring quality and validity of measurement with SAM tests Kardanova Elena National Research University Higher School of Economics


Ensuring quality and validity of measurement with SAM tests

Kardanova Elena
National Research University Higher School of Economics


Outline of presentation

› SAM test development process
› Analysis of psychometric quality of test items and tests
› SAM validity study
› International expertise of SAM
› Localization and adaptation of SAM tests for use in other countries and cultures


SAM test development process


Steps of test development process

› Test planning
› Content analysis
› Test specification development
› Item writing
› Pilot testing
› Test construction
› Test results scaling
› Test results reporting and interpretation
› Test documentation


SAM theoretical model realization

› Tests in mathematics and Russian language have been developed under the SAM model
› Tests have a similar structure
› Tests are designed for graduates from primary school
› Each block includes three test items assigned to levels 1, 2 and 3 that correspond to the same content area
› Dichotomous approach: students get 1 point for a correct answer and 0 for an incorrect (or absent) answer.


Stages of SAM tests piloting

Pre-piloting
› Purpose – face validity
› Time recording for each item
› Sample – 10-20 students

Clinical approbation
› Purpose – testing how the items function, detecting mistakes, determining item difficulty
› Sample – 50 students per item
› Data analysis – under classical test theory (CTT)

Full-scale approbation
› Purpose – to check the quality of test items and detect problems of item and test functioning
› Sample – not less than 400-500 students per test form
› Data analysis – under CTT and IRT


Analysis of psychometric quality of test items and tests


Characteristics of test items under classical test theory

• Item difficulty: proportion of students in the sample who have completed the item correctly
• Discrimination: ability of the item to differentiate between students with different ability
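A minimal sketch of how these two characteristics can be computed from dichotomous scores. It uses the point-biserial correlation with the raw total score as one common discrimination measure; the slides do not say how their discrimination index was computed, and the function names and response data below are hypothetical.

```python
from statistics import mean, pstdev

def item_difficulty(item_scores):
    """CTT difficulty (p-value): share of examinees scoring 1 on the item."""
    return mean(item_scores)

def point_biserial(item_scores, total_scores):
    """Correlation between a 0/1 item score and the raw total score.

    The total here still includes the item itself (uncorrected version).
    """
    p = mean(item_scores)
    sd_total = pstdev(total_scores)
    if p in (0, 1) or sd_total == 0:
        raise ValueError("undefined when the item or the totals do not vary")
    right = [t for x, t in zip(item_scores, total_scores) if x == 1]
    wrong = [t for x, t in zip(item_scores, total_scores) if x == 0]
    return (mean(right) - mean(wrong)) / sd_total * (p * (1 - p)) ** 0.5

# Hypothetical responses: rows are students, columns are items (1 = correct).
responses = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [1, 1, 0], [0, 0, 0], [1, 0, 1]]
totals = [sum(row) for row in responses]
second_item = [row[1] for row in responses]
print(item_difficulty(second_item))                   # 0.5
print(round(point_biserial(second_item, totals), 2))  # 0.71
```

An item with a p-value near 0 or 1 leaves little room for discrimination, which is why the two characteristics are usually inspected together.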


Reliability and validity

Reliability – characteristic of precision and stability of test results

Validity – characteristic of test information suitability for decision making


Psychometric quality (CTT)
(2012 pilot testing, Mathematics, P&P form, over 5000 fourth-graders)

All items have good psychometric quality. Item p-values are in the range (0.16, 0.98).

                                     Test form 1   Test form 2
Number of examinees                  3018          2941
Raw score average                    26            27
Standard deviation                   8.37          8.55
Average difficulty level             0.59          0.61
Average discrimination index         0.44          0.46
Average point-biserial coefficient   0.39          0.39
Reliability index (KR-20)            0.90          0.91
Standard error of measurement        2.61          2.61
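The reliability index and standard error of measurement in this table follow from standard CTT formulas. The sketch below shows them under the assumption of 0/1 item scores and population variances; the function names and the small response matrix are hypothetical, and only the values 8.37 and 0.90 come from the table.

```python
from statistics import pvariance

def kr20(responses):
    """KR-20 reliability for a matrix of 0/1 scores (rows = examinees)."""
    n_items = len(responses[0])
    n_people = len(responses)
    totals = [sum(row) for row in responses]
    var_total = pvariance(totals)
    # Sum of item variances p*(1-p) over all items.
    sum_pq = 0.0
    for j in range(n_items):
        p = sum(row[j] for row in responses) / n_people
        sum_pq += p * (1 - p)
    return n_items / (n_items - 1) * (1 - sum_pq / var_total)

def standard_error_of_measurement(sd, reliability):
    """SEM = SD * sqrt(1 - reliability)."""
    return sd * (1 - reliability) ** 0.5

# Hypothetical 6 x 3 response matrix, for illustration only.
responses = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [1, 1, 0], [0, 0, 0], [1, 0, 1]]
print(round(kr20(responses), 3))                            # 0.469
# Rounded values for test form 1 from the table above.
print(round(standard_error_of_measurement(8.37, 0.90), 2))  # 2.65
```

With the rounded table values the formula gives about 2.65 rather than the reported 2.61; the gap is consistent with the reliability having been rounded to two decimals before it was reported.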


Joint distribution of item difficulties and discrimination indices (math, test form 1)


IRT analysis: Variable map (math, test form 1)

PERSON - MAP - TASKS
                <more>|<rare>
    5               . +
                   .# |
                      |
                    . |
                    . |
    4              .# +
                    . |
                    . |
                   .# |
                   . T|T
                   .# |
    3            .### +
                    . |
                 .### |  M-C-01-1-3 M-D-08-1-3 M-R-05-1-3
                 .### |  M-G-01-1-3
                  .## |  M-M-11-1-3
             .###### S|
    2         .###### +  M-C-03-1-3
             .####### |  M-D-05-1-3
             .####### |S
            .######## |  M-D-03-1-3
             .####### |  M-M-02-1-3 M-M-08-1-3 M-R-02-1-3 M-R-03-1-3
        .############ |
    1       ######### +  M-M-03-1-3 M-M-06-1-3
           .######## M|  M-C-05-1-3 M-G-01-1-2 M-M-06-1-2 M-M-08-1-2
             .####### |  M-R-02-1-2
            .######## |  M-D-03-1-2 M-M-11-1-2
          .########## |  M-C-05-1-2
              .###### |
    0          .##### +M M-C-01-1-2 M-R-03-1-2
              .###### |  M-D-08-1-2 M-M-03-1-2
            .######## |  M-R-05-1-2
              .##### S|  M-C-03-1-2 M-R-02-1-1
                .#### |  M-M-02-1-2 M-M-06-1-1
                .#### |  M-D-05-1-2
   -1             .## +
                 .### |  M-D-03-1-1 M-D-05-1-1
                  .## |  M-C-05-1-1 M-G-01-1-1
                  .## |
                   .# |S M-M-03-1-1
                  .# T|  M-M-02-1-1 M-M-11-1-1 M-R-03-1-1
   -2               . +  M-M-08-1-1
                    . |
                    . |  M-C-01-1-1
                    . |
                      |  M-D-08-1-1
   -3               . +
                    . |
                      |T
                    . |
                    . |
                      |  M-C-03-1-1
   -4                 +
                      |  M-R-05-1-1
                      |
                      |
   -5               . +
                <less>|<frequ>

Labels on the map mark three groups of items: the 3rd level items (top), the 2nd level items (middle) and the 1st level items (bottom).


Analysis under IRT: conclusions

› Tests can be considered essentially unidimensional (principal component analysis of the standardized residuals (Linacre, 1998; Smith, 2002) was used to confirm the unidimensionality of the data)

› Tests have an optimal difficulty level and are well centered relative to the sample of examinees

› All items demonstrate satisfactory psychometric characteristics and fit the model

SAM tests can be acknowledged as a high-quality measurement tool
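The slides do not name the IRT model. The variable map and the Linacre reference suggest a Rasch-type analysis, so the sketch below assumes the dichotomous Rasch model, in which person ability and item difficulty share one logit scale. The function name and the example values are hypothetical.

```python
import math

def rasch_probability(ability, difficulty):
    """P(correct answer) under the dichotomous Rasch model (logit units)."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))

# A hypothetical student at +1 logit facing an easy, a medium and a hard item.
for difficulty in (-2.0, 0.0, 2.0):
    print(difficulty, round(rasch_probability(1.0, difficulty), 2))
# -2.0 0.95
# 0.0 0.73
# 2.0 0.27
```

When ability equals difficulty the probability is 0.5, which is why a test whose items are centered on the ability distribution of the sample is informative for most examinees.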


SAM validity study


Description of the SAM validity study

› Validity is the extent to which a test fulfils its purpose

› The Dutch rating system was chosen as a basis for conducting the SAM validity study (Evers, A., 2001)

› The SAM validity study was conducted during the 2011-2013 SAM pilot testing in different regions of the Russian Federation


Structure of the SAM validity study

› Content validity – external expertise

› Construct validity - “What does the test measure?” and “Does the test measure the intended concept or does it partly or mainly measure something else?”


Construct validity

› Construct validity is a matter of the accumulation of research evidence.

› Construct validation research is never completed.


Evidence for fair test use (DIF analysis on gender)

Test results for both genders (mathematics, test form 1)

                                    Females        Males
Sample size                         1471           1545
Observed raw score: average (SD)    26.7 (8.4)     26.2 (8.3)
Ability estimate: average (SD)      0.76 (1.15)    0.69 (1.11)

The method: t-test and Mantel-Haenszel statistics
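The slide names the Mantel-Haenszel statistic without showing the computation. A minimal sketch of the common odds ratio for one item is given below, assuming examinees are first matched on total score and that each stratum holds the counts of correct and incorrect answers in the reference and focal groups. The function names and counts are hypothetical; the delta transformation is the widely used ETS convention, not something stated in the slides.

```python
import math

def mantel_haenszel_odds_ratio(strata):
    """Common odds ratio across score strata.

    Each stratum is (ref_correct, ref_wrong, focal_correct, focal_wrong).
    """
    numerator = 0.0
    denominator = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        if n == 0:
            continue  # skip empty score levels
        numerator += a * d / n
        denominator += b * c / n
    if denominator == 0:
        raise ValueError("odds ratio is undefined for these counts")
    return numerator / denominator

def mh_delta(odds_ratio):
    """ETS delta metric: -2.35 * ln(odds ratio); 0 means no DIF."""
    return -2.35 * math.log(odds_ratio)

# Hypothetical counts for two total-score strata.
strata = [(20, 10, 18, 12), (30, 5, 28, 7)]
alpha = mantel_haenszel_odds_ratio(strata)
print(round(alpha, 2))            # 1.4
print(round(mh_delta(alpha), 2))  # -0.79
```

An odds ratio close to 1 (delta close to 0) indicates that, among examinees of equal total score, both groups have similar chances of answering the item correctly.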


Testing of hypotheses that follow from the theoretical foundation of the test construct

› The first hypothesis: The items of three levels related to the same block and meeting the theoretically-grounded criteria of three levels should be built into a difficulty-based hierarchy.

› The second hypothesis: Towards the end of the primary school the syllabus is expected to be acquired on the 2nd, reflexive, level. Acquiring this syllabus on the 3rd, functional, level is expected to happen towards the end of the middle school.


Distribution of difficulty levels (Math, test form 1)

The first hypothesis: The items of three levels related to the same block and meeting the theoretically-grounded criteria of three levels should be built into a difficulty-based hierarchy.
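One way to check the first hypothesis on estimated item difficulties is to group items by block and confirm that difficulty rises from level 1 to level 3. The sketch below assumes item identifiers of the form "block-level" and uses made-up difficulty values; both the identifier scheme and the function name are hypothetical, not taken from the SAM analysis.

```python
from collections import defaultdict

def blocks_violating_hierarchy(difficulties):
    """Return ids of blocks whose difficulties do not rise with item level."""
    by_block = defaultdict(dict)
    for item_id, difficulty in difficulties.items():
        block, level = item_id.rsplit("-", 1)  # "A-3" -> block "A", level 3
        by_block[block][int(level)] = difficulty
    violations = []
    for block, levels in sorted(by_block.items()):
        ordered = [levels[k] for k in sorted(levels)]
        # Every level must be strictly harder than the one below it.
        if any(low >= high for low, high in zip(ordered, ordered[1:])):
            violations.append(block)
    return violations

# Hypothetical difficulties in logits for two blocks of three items each.
example = {
    "A-1": -2.4, "A-2": 0.0, "A-3": 2.6,
    "B-1": -0.5, "B-2": -0.7, "B-3": 1.2,
}
print(blocks_violating_hierarchy(example))  # ['B']
```

A strict comparison ignores estimation error; in practice the difference between adjacent levels would be judged against the standard errors of the difficulty estimates.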


Verification of the second hypothesis

› A special study was conducted in 2011-2012.

› Research design: in 2011 the tests were administered to four age groups – students of the 4th, 6th, 8th and 10th grades. One year later the same tests were administered to the same students, who were by then in the 5th, 7th, 9th and 11th grades.

› Testing was done in spring, at the end of the academic year.

› The sample included about 100 examinees in each grade.


The second hypothesis: Towards the end of the primary school the syllabus is expected to be acquired on the 2nd, reflexive, level. Acquiring this syllabus on the 3rd, functional, level is expected to happen towards the end of the middle school.

Distribution of students of different grades by proficiency level in mathematics


Criterion validity

Concurrent validity

Predictive validity

Predictive validity shows how well a test can predict future criterion scores. Concurrent validity shows how test results relate to a criterion measured at the same time.


SAM predictive validity study: research design

› The study was based on SAM pilot testing in Krasnoyarsk region in spring 2011.

› The total sample was 941 primary schoolers from 12 schools.

› The same students’ marks were gathered one year later (by then they were in the 5th grade).

Student distribution into proficiency levels (mathematics)


SAM predictive validity study: Distribution of student marks depending on student proficiency level (mathematics)

The correlation between the students’ ability score and their school marks is 0.6 and the correlation between their proficiency level and the school mark is 0.56.


Convergent validity

› Convergent validity refers to the degree to which two measures of a construct that theoretically should be related are in fact related.

› To establish convergent validity we used the AT test, an instrument for monitoring the educational achievements in mathematics of primary school students (developed by the Russian Academy of Education).

› Among students who completed the AT test, students with high test scores were selected.

› The hypothesis tested: the results of these students on SAM tests should be high, and most of them should fall into the 2nd and 3rd proficiency levels.


International expertise of SAM 

› Autumn of 2013

› The reviewers:

1. Howard T. Everson (Center for Advanced Study in Education, Graduate School, City University of New York, Professor of Psychology and Senior Research Fellow)

2. Clancy Blair (New York University, Steinhardt School of Culture, Education, and Human Development, Professor of Applied Psychology)

3. Bas Hemker (Netherlands, Cito National Institute for Test Development, Senior Research Scientist)


The SAM test documents provided for expertise

› Basic: Technical manual, User's guide, Test specification, Math tests, Validity study
› Additional: SAM Framework, Technical report


International expertise of SAM: conclusions

› The SAM toolkit is generally appreciated by the experts

› All experts point out the scope of the research aimed at establishing the quality of the SAM toolkit and its validation

› Experts call for further research related to SAM application

› Experts stress the need for further research related to SAM validation, particularly longitudinal studies: for instance, the correlation between SAM results and other cognitive and non-cognitive measurements of students, the analysis of factors which impact the results, etc.


Localization and adaptation of SAM tests for use in other countries and cultures


Procedure of localization

› Dual translation
› Verification of translation by national experts
› Verification of translation by SAM developers
› Ensuring the quality of translated tests: piloting, analysis of psychometric characteristics of test items, comparison of items’ characteristics in different languages and cultures, reliability and validity study


AERA/APA/NCME standards

› Standard 6.2. When test developers introduce significant amendments into the test format, the time frame, the language or the contents it is essential to validate the test and confirm the validity of the localized test, or establish that the validation procedure is impossible or irrelevant.

› Standard 13.4. When translating a test from one language or dialect into another, it is required to establish the validity and reliability of the test, meant for a certain linguistic community.

› Standard 13.6. If two versions of the test in different languages are expected to feature equivalent, compatible forms, it is required to present confirmation of compatibility and equivalence of the forms.


Five main sources of potential non-comparability of cross-cultural results

› differences in construct
› tool differences
› procedure differences
› sample differences
› answering differences


Thank you!

Kardanova [email protected]

Center for Monitoring of the Quality in Education
Institute of Education
National Research University Higher School of Economics
http://ioe.hse.ru/monitoring/