
Cross-Cultural Comparability of SAM-math results

Irina Brun, Elena Kardanova

National Research University Higher School of Economics,

Institute of Education, Moscow, Russia

Higher School of Economics, Moscow, 2014

www.hse.ru

15th Annual AEA-Europe Conference, 6-8 November 2014

Tallinn, Estonia



The Instrument SAM

Student Achievement Monitoring

• Assessment of students at the end of primary school (10 years old).

• SAM tool: Russian language, Mathematics.

• SAM was created for use in Russia; it was not initially developed as a cross-cultural assessment tool.

• Framework: Vygotsky’s sociocultural theory of development. Cognitive growth can be described as a process of internalizing culturally transmitted knowledge, which involves the acquisition of generalized schemas of thinking and symbolic systems.


Nezhnov, Kardanova, Vasilyeva, Ludlow, 2014


The Instrument: SAM


Level of math knowledge and short description:

Procedural: The child knows specific algorithms and standard procedures that have been directly taught. In problem solving, the child is mostly oriented towards external (descriptive) features of the problem, which allow him to relate the problem to a specific category and identify the algorithm for that category.

Conceptual: The child understands how to solve a whole range of problems related to the same concept, regardless of whether they are formulated in a standard or novel way. The child needs to analyze the meaning of the problem, which may require transforming its description in order to understand how to approach its solution.

Functional: The child develops the depth of understanding and conceptual flexibility that allow him to see the full range of possible mental “moves” within the problem space and to identify the sequence of moves that leads to a solution. The child compares multiple ways of approaching the problem and chooses the best strategy to achieve the result.

Test Structure

It is hard to establish reliability with only 15 items.


SAM measurement design


• SAM was constructed in an Item Response Theory (IRT) framework.

• The one-parameter Rasch model was selected for modeling the test data and scaling students.

• The SAM-Math test can be considered essentially unidimensional.

• All items demonstrate satisfactory psychometric characteristics and model fit.

• SAM-math was translated into Tajik and Kazakh languages.
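
As an illustration of the one-parameter Rasch model mentioned above, the sketch below computes the probability of a correct response from a student's ability and an item's difficulty, both expressed in logits. The function name and the example values are hypothetical and are not taken from the SAM calibration.

```python
import math


def rasch_probability(theta: float, b: float) -> float:
    """P(correct) under the Rasch model for ability theta and difficulty b (logits)."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


# Hypothetical values: a student whose ability equals the item difficulty
# has a 50% chance of answering correctly.
p_equal = rasch_probability(theta=0.0, b=0.0)

# A more able student has a higher chance on the same item.
p_higher = rasch_probability(theta=1.0, b=0.0)
```

Because the model has a single item parameter (difficulty), all items share the same slope, which is why students and items can be placed on one common logit scale.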

[Figure labels: functional-level items, conceptual-level items, procedural-level items]

http://ciced.ru/activities/assessment/tools/index.php?sphrase_id=1875


Estimation of examinees



The problem


• Why do Tajik results differ from other countries’ so much?

• Does the instrument measure the same construct in the same way?

[Figure: Percentage of students (0% to 70%) at proficiency levels 0, 1, 2, and 3 in Russia, Tajikistan, and Kazakhstan]

• All three countries share a common Soviet past.

• Educational systems are similar even nowadays.

We focus on comparing Russian results with Tajik results.


Methodology

• One can only compare test results from different countries after establishing the equivalence between results from these countries.

• To establish equivalence means to prove that there is no bias in the results.

Modern psychometric research identifies 3 types of bias:

Type of bias: Construct
Description: Different versions of the test measure different constructs.
Methods for identifying: exploratory factor analysis, confirmatory factor analysis, construct maps.

Type of bias: Method
Description: Occurs when the administration process varies significantly across countries. This also includes familiarity with the stimuli, testing format, and sample bias.
Methods for identifying: randomized-block design, regression analysis, monotrait-multimethod study, collateral information study.

Type of bias: Item
Description: Items behave differently in different countries.
Methods for identifying: DIF analysis.

Has to be planned for in advance.

Vijver, Hambleton, 1996; Vijver, Tanzer, 2004; Ercikan, Gierl, McCreith, Puhan & Koh, 2010.

AERA, APA & NCME, 1999; ITC 2010; Hambleton, de Jong, 2003.


Methodology 2

What we did:

1. Created a subsample of the Russian data and established its properties.

2. Analyzed the psychometric properties of the test (both Russian and Tajik versions).

3. Tested for 3 types of bias:

   1) Compared construct maps, performed “reverse” operationalization.

   2) Analyzed Tajik translations, visual stimuli, and adaptation study results.

   3) Performed DIF analysis (Mantel-Haenszel, logistic regression, standardization, and t-statistic).

   4) Identified common items for constructing a single scale.

4. Constructed a single scale for the two countries (simultaneous calibration).
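
One of the DIF methods listed above, the Mantel-Haenszel procedure, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' actual analysis: the function name, the argument layout, and the toy data are hypothetical. Students are matched on a total score, a 2x2 table (group by correct/incorrect) is built for each score stratum, and the tables are pooled into a common odds ratio, which is then expressed on the ETS delta scale as -2.35 ln(alpha).

```python
import math
from collections import defaultdict


def mantel_haenszel_dif(item_scores, total_scores, groups):
    """Return (alpha_MH, delta_MH) for one dichotomous item.

    item_scores  -- 0/1 responses to the studied item
    total_scores -- matching variable (e.g. total test score) per student
    groups       -- "ref" or "focal" per student
    """
    # One 2x2 table per score stratum:
    # [ref correct, ref incorrect, focal correct, focal incorrect]
    tables = defaultdict(lambda: [0, 0, 0, 0])
    for score, total, group in zip(item_scores, total_scores, groups):
        cell = (0 if group == "ref" else 2) + (0 if score == 1 else 1)
        tables[total][cell] += 1

    numerator = denominator = 0.0
    for a, b, c, d in tables.values():
        n = a + b + c + d
        numerator += a * d / n
        denominator += b * c / n
    if numerator == 0.0 or denominator == 0.0:
        raise ValueError("odds ratio is undefined for these data")

    alpha = numerator / denominator   # common odds ratio
    delta = -2.35 * math.log(alpha)   # ETS delta scale
    return alpha, delta


# Toy data, a single score stratum: the reference group answers
# the item correctly more often than the focal group.
item = [1, 1, 1, 0, 1, 0, 0, 0]
total = [5] * 8
group = ["ref"] * 4 + ["focal"] * 4
alpha, delta = mantel_haenszel_dif(item, total, group)
```

An odds ratio above 1 (a negative delta) means that, among students with the same total score, the item favours the reference group.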


Vijver, Hambleton, 1996; Vijver, Tanzer, 2004; Ercikan, Gierl, McCreith, Puhan & Koh, 2010.

Sample

Russian Federation: Novgorod region, general population, 2215 4th graders.

Tajikistan: sample representative of Tajikistan (cluster method), 408 4th graders.
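
The first step of the analysis uses a random subsample of the Russian data equal in size to the Tajik sample (N = 408). A minimal sketch of such a draw is shown below; the student identifiers and the seed are hypothetical, and the slides do not say how the subsample was actually drawn.

```python
import random

N_RUSSIA = 2215      # Russian 4th graders in the sample
N_TAJIKISTAN = 408   # Tajik 4th graders in the sample

# Fixed, arbitrary seed so that the draw is reproducible.
rng = random.Random(2014)

# Hypothetical student identifiers 0..2214.
russian_ids = list(range(N_RUSSIA))

# Simple random sample without replacement, matched to the Tajik sample size.
subsample_ids = sorted(rng.sample(russian_ids, N_TAJIKISTAN))
```

Equal group sizes keep the two countries on the same footing in the later DIF analysis and in the simultaneous calibration.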


Analysis


[Figure: Difficulty (CCT) of items, functional level, Russia vs. Tajikistan; vertical axis from 0.00 to 1.00. Items shown: M-C-01-1-3, M-C-03-1-3, M-M-02-1-3, M-M-03-1-3, M-M-06-1-3, M-M-11-1-3, M-R-02-1-3, M-R-05-1-3, M-G-01-1-3, M-D-03-1-3, M-D-05-1-3, M-D-08-1-3, M-R-03-1-3, M-C-05-1-3, M-M-08-1-3]

1. Random subsample of Russian data, N = 408.

2. 3rd-level items excluded from further analysis. Items remaining for scale construction: 30.

3.1 Construct maps: 2 items in the Tajik version measure a different part of the construct “math competence” compared to the Russian version. 28 items left for common scale construction.

Content areas: Relations and Functions, Numbers and Operations, Measurement, Geometry, Patterns.


Analysis

3.2 Verifying translation. Significant changes in wording: 4 items. Changes in representation: 15 items. Still 28 items left.

3.3 DIF analysis on the 28 remaining items: 14 items showed DIF. 14 items left.

What is the source of bias?

1. Item wording

2. Item visual representation

[Figure: Russian version and Tajik version of an item, annotated "2 lines"]

Review of 3.2: 5 more items were excluded.

3.4 Conclusion: 9 items. Only partial equivalence can be established. Common scale construction might be done on these 9 items.
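
The item counts reported across the analysis steps can be tallied as a quick consistency check. The variable names are illustrative; the numbers are those stated on the slides.

```python
# Items remaining after the 3rd-level (functional) items were excluded.
after_level_exclusion = 30

# Construct maps: 2 items measure a different part of the construct.
after_construct_maps = after_level_exclusion - 2

# DIF analysis: 14 of the remaining items showed DIF.
after_dif = after_construct_maps - 14

# Review of the translation check: 5 more items were excluded.
common_items = after_dif - 5
```

The tally reproduces the 28, 14, and 9 items reported at the successive steps.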


Analysis. Common scale construction


4. Method: simultaneous calibration.

Russia (408 students): 19 unique items + 9 common items.

Tajikistan (408 students): 9 common items + 19 unique items.

Unique items: items that showed DIF by country or had changes in wording/representation.

We analyzed the psychometric properties of these 47 items. The items fit the Rasch model, and there is no second dimension in the data.

Then we scaled the results on a 1000-point scale and set thresholds.
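
The layout of the simultaneous calibration can be sketched as a design table: each country answers the 9 common items plus its own 19 unique items, so the combined data set has 19 + 9 + 19 = 47 item columns, and the other country's unique items are missing by design. The column names below are hypothetical placeholders, not SAM item codes, and the sketch only shows the data layout, not the Rasch estimation itself.

```python
N_COMMON = 9    # items treated as equivalent in both versions
N_UNIQUE = 19   # items calibrated separately for each country

common = [f"common_{i}" for i in range(1, N_COMMON + 1)]
ru_only = [f"ru_{i}" for i in range(1, N_UNIQUE + 1)]
tj_only = [f"tj_{i}" for i in range(1, N_UNIQUE + 1)]
columns = ru_only + common + tj_only


def administered(country: str, column: str) -> bool:
    """True if students from `country` have a response in `column`."""
    if column.startswith("common_"):
        return True
    prefix = "ru_" if country == "Russia" else "tj_"
    return column.startswith(prefix)


# For each country, a True/False flag per item column.
design = {
    country: [administered(country, col) for col in columns]
    for country in ("Russia", "Tajikistan")
}
```

Only the common items link the two countries: they are the columns with responses from both groups, which is what allows a single scale to be estimated.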


Analysis. Common scale construction

[Figure: Common scale. Percentage of students (0% to 100%) at proficiency levels 0, 1, and 2 in Russia and Tajikistan]

After constructing a common scale, the number of Tajik students who reached the 2nd (conceptual) level increased slightly.

Item/person map for combined data


The results


We showed how equivalence can be tested for assessment tools that were not created as tools for cross-cultural comparison.

Construct bias: EFA and CFA, construct maps.

Item bias: verifying translation and item representation, DIF analysis.

Partial equivalence was established.

3rd level of math proficiency is not present in Tajikistan.

DIF can occur not only because of changes in wording, but also because of visual representation of the item (scale of pictures, arrangement of response options, additional numbering).


Discussion


The math curriculum in Tajikistan is almost the same as in Russia. Content areas were checked during the adaptation process.

Does the curriculum represent what is actually taught in classrooms?

GDP per person, 2012 (World Bank):

Russian Federation: 23,549

Tajikistan: 2,247

Kazakhstan: 13,892

20, Myasnitskaya str., Moscow, Russia, 101000

Tel.: +7 (495) 628-8829, Fax: +7 (495) 628-7931

www.hse.ru

Items. Example of levels


Nezhnov, Kardanova, Vasilyeva, Ludlow, 2014
