Neural Test Theory: A nonparametric test theory using the mechanism of a self-organizing map SHOJIMA...

Post on 11-Jan-2016


Neural Test Theory: A nonparametric test theory using the mechanism of a self-organizing map

SHOJIMA Kojiro
The National Center for University Entrance Examinations, Japan
shojima@rd.dnc.ac.jp

Neural Test Theory (NTT)

• Shojima (2008) IMPS2007 CV, in press.
  – Test theory using the mechanism of a self-organizing map (SOM; Kohonen, 1995)
• Scaling
  – Latent scale is ordinal.
  – Latent rank
  – Number of latent ranks is about [3, 20]
  – Item Reference Profile
  – Test Reference Profile
  – Rank Membership Profile
• Equating
  – Concurrent calibration

Why an Ordinal Scale?

Two main reasons:
  – Methodological
  – Sociological

Methodological Reason

• Psychological variables are continuous
  – Reasoning, reading comprehension, ability…
  – Anxiety, depression, inferiority complex…
• Tools do not have high resolution for measuring them on a continuous scale
  – Tests
  – Psychological questionnaires
  – Social investigation

Weight and Weighing Machine

• Phenomenon (continuous)
• Measure (high reliability)

[Figure: points labeled 1 to 4 along the "Weight" axis]

Ability and Test

• Phenomenon (continuous?)
• Measure (low reliability)

[Figure: points labeled 1 to 4 along the "Ability" axis]

Resolution

• Power to detect difference(s)
• Weighing machines
  – can detect the difference between two persons of almost the same weight.
  – can order people almost correctly by weight on the kilogram scale.
• Tests
  – cannot discriminate between two persons of nearly equal ability.
  – cannot correctly order people by ability.
• The most that tests can do is to grade examinees into several ranks.

Sociological Reason

• Negative aspects of continuous scale
  – Students are motivated to get the highest possible scores.
  – They should not be pushed back and forth by unstable continuous scores.
• Positive aspects of ordinal scale
  – Ordinal evaluation is more robust than continuous scores.
  – Sustained endeavor is necessary to go up to the next rank.

NTT

Latent Rank Theory     SOM                 GTM
Binary                 Shojima (in press)  RN07-12
Polytomous (ordinal)   RN07-03             In preparation
Polytomous (nominal)   RN07-21             In preparation
Continuous             In preparation      In preparation

• ML (RN07-04)
• Fitness (RN07-05)
• Missing (RN07-06)
• Equating (RN07-9)
• Bayes (RN07-15)

Statistical Learning of the NTT

・ For (t = 1; t ≤ T; t = t + 1)
  ・ U(t) ← Randomly sort row vectors of U
  ・ For (h = 1; h ≤ N; h = h + 1)
    ・ Obtain zh(t) from uh(t)
    ・ Select winner rank for uh(t) (Point 1)
    ・ Obtain V(t,h) by updating V(t,h−1) (Point 2)
  ・ V(t+1,0) ← V(t,N)
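The learning loop can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the author's implementation: the names `train_ntt`, `least_squares_winner`, and `neighborhood_update` are hypothetical, the winner is chosen by least squares (which the slides mention as an available alternative), the update uses a Gaussian pull with fixed `alpha` and `sigma`, the initial reference values are an assumed rising ramp, and missing responses are not handled.

```python
import math
import random

def train_ntt(U, Q, T, select_winner, update, seed=0):
    """Online learning loop: T epochs over the N shuffled response vectors."""
    rng = random.Random(seed)
    n = len(U[0])
    # V[q][j]: reference value of item j at rank q (0-based).
    # Assumed initialization: a ramp, so higher ranks start with higher values.
    V = [[(q + 1) / (Q + 1)] * n for q in range(Q)]
    for t in range(1, T + 1):
        U_t = U[:]                  # U(t): randomly sorted copy of the rows of U
        rng.shuffle(U_t)
        for u in U_t:               # h = 1, ..., N
            w = select_winner(u, V)     # Point 1: winner rank selection
            update(V, u, w, t, T)       # Point 2: V(t,h) from V(t,h-1)
        # V(t+1,0) <- V(t,N): V is simply carried over to the next epoch
    return V

def least_squares_winner(u, V):
    """Rank whose reference vector is closest to u in squared distance."""
    dists = [sum((uj - vj) ** 2 for uj, vj in zip(u, v)) for v in V]
    return dists.index(min(dists))

def neighborhood_update(V, u, w, t, T, alpha=0.1, sigma=1.0):
    """Pull every rank toward u; ranks far from the winner w move less."""
    for q, v in enumerate(V):
        pull = alpha * math.exp(-((q - w) ** 2) / (2 * sigma ** 2))
        for j, uj in enumerate(u):
            v[j] += pull * (uj - v[j])

# Hypothetical toy data: 5 response patterns on 4 items, repeated 4 times.
U = [[0, 0, 0, 0], [1, 0, 0, 0], [1, 1, 0, 0], [1, 1, 1, 0], [1, 1, 1, 1]] * 4
V = train_ntt(U, Q=3, T=5, select_winner=least_squares_winner,
              update=neighborhood_update)
for q, v in enumerate(V, start=1):
    print(q, [round(x, 3) for x in v])
```

Passing the two steps in as functions keeps the loop itself identical whichever winner selection rule or update schedule is used.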

Mechanism of Neural Test Theory

[Figure: binary (0/1) response vectors, labeled "Response", are mapped onto the "Latent rank scale"; the other axis is "Number of items". "Point 1" and "Point 2" mark the two learning steps.]

Point 1: Winner Rank Selection

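Likelihood

\[
\ln p(\mathbf{u}_h^{(t)} \mid \mathbf{v}_q^{(t,h-1)}) = \sum_{j=1}^{n} z_{hj}^{(t)} \left\{ u_{hj}^{(t)} \ln v_{qj}^{(t,h-1)} + \left(1 - u_{hj}^{(t)}\right) \ln \left(1 - v_{qj}^{(t,h-1)}\right) \right\}
\]

where \mathbf{v}_q^{(t,h-1)} is the reference vector of rank q in \mathbf{V}^{(t,h-1)}.

ML

\[
R_w^{(ML)}:\; w = \arg\max_{q \in Q} \ln p(\mathbf{u}_h^{(t)} \mid \mathbf{v}_q^{(t,h-1)})
\]

Bayes

\[
R_w^{(MAP)}:\; w = \arg\max_{q \in Q} \left\{ \ln p(\mathbf{u}_h^{(t)} \mid \mathbf{v}_q^{(t,h-1)}) + \ln p(f_q) \right\}
\]

The least squares method is also available.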
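Winner rank selection by ML or MAP can be sketched as below. This is a minimal sketch under stated assumptions: the function names are hypothetical, `z` is read as an indicator that is 1 where the item response was observed and 0 where it is missing, `prior[q]` stands for the prior probability of rank q, and the reference values in `V` are made up for illustration.

```python
import math

def log_likelihood(u, z, v, eps=1e-12):
    """ln p(u | v_q) for one examinee and one rank, skipping missing items."""
    total = 0.0
    for uj, zj, vj in zip(u, z, v):
        if zj:  # assumed meaning: z_j = 1 when item j was observed
            vj = min(max(vj, eps), 1.0 - eps)  # keep both logarithms finite
            total += uj * math.log(vj) + (1 - uj) * math.log(1.0 - vj)
    return total

def winner_rank(u, z, V, prior=None):
    """0-based index of the winning rank: ML if prior is None, else MAP."""
    scores = []
    for q, v in enumerate(V):
        score = log_likelihood(u, z, v)
        if prior is not None:
            score += math.log(prior[q])
        scores.append(score)
    return scores.index(max(scores))

# Hypothetical reference vectors for 3 ranks and 3 items.
V = [[0.2, 0.1, 0.1],
     [0.6, 0.5, 0.3],
     [0.9, 0.8, 0.7]]
u = [1, 0, 0]
print(winner_rank(u, [1, 1, 1], V))                   # ML winner
print(winner_rank(u, [1, 1, 1], V, [0.6, 0.2, 0.2]))  # MAP winner
print(winner_rank(u, [0, 1, 1], V))                   # first item missing
```

Working with log-likelihoods rather than products of probabilities avoids underflow when the number of items is large.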

Point 2: Reference Matrix Update

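• The nodes of the ranks nearer to the winner are updated to become closer to the input data.
• h: tension
• α: size of tension
• σ: region size of learning propagation

\[
\mathbf{V}^{(t,h)} = \mathbf{V}^{(t,h-1)} + \left(\mathbf{1}_n \mathbf{h}^{(t)\prime}\right) \odot \left(\mathbf{z}_h^{(t)} \mathbf{1}_Q^{\prime}\right) \odot \left(\mathbf{u}_h^{(t)} \mathbf{1}_Q^{\prime} - \mathbf{V}^{(t,h-1)}\right)
\]

\[
\mathbf{h}^{(t)} = \{h_{qw}^{(t)}\} \; (Q \times 1), \qquad
h_{qw}^{(t)} = \frac{\alpha_t Q}{N} \exp\left\{ -\frac{(q-w)^2}{2 Q^2 \sigma_t^2} \right\}
\]

\[
\alpha_t = \frac{(T-t)\,\alpha_1 + (t-1)\,\alpha_T}{T-1}, \qquad
\sigma_t = \frac{(T-t)\,\sigma_1 + (t-1)\,\sigma_T}{T-1}
\]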
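One update step can be sketched as below. This is a minimal sketch under stated assumptions rather than the author's code: the tension is taken to be Gaussian in the distance between rank q and the winner w, scaled by αQ/N, with α and σ moving linearly from their first to their last values over the T epochs. The function names and the row layout `V[q][j]` are choices made for this sketch.

```python
import math

def linear_schedule(first, last, t, T):
    """Value moving linearly from `first` at t = 1 to `last` at t = T."""
    if T == 1:
        return first
    return ((T - t) * first + (t - 1) * last) / (T - 1)

def tension(Q, w, alpha_t, sigma_t, N):
    """h[q]: strength of the pull on rank q when rank w (0-based) wins."""
    return [alpha_t * Q / N
            * math.exp(-((q - w) ** 2) / (2 * Q ** 2 * sigma_t ** 2))
            for q in range(Q)]

def update_reference(V, u, z, w, alpha_t, sigma_t, N):
    """In-place update; V[q][j] is the reference value of item j at rank q."""
    h = tension(len(V), w, alpha_t, sigma_t, N)
    for q, row in enumerate(V):
        for j, (uj, zj) in enumerate(zip(u, z)):
            # Items with z_j = 0 (assumed to mean "missing") are left unchanged.
            row[j] += h[q] * zj * (uj - row[j])

# Hypothetical example: 3 ranks, 2 items, winner is the top rank.
V = [[0.5, 0.5], [0.5, 0.5], [0.5, 0.5]]
alpha_t = linear_schedule(1.0, 0.01, t=1, T=100)
sigma_t = linear_schedule(1.0, 0.12, t=1, T=100)
update_reference(V, u=[1, 0], z=[1, 1], w=2, alpha_t=alpha_t, sigma_t=sigma_t, N=30)
print(V)
```

The winner moves the most, and each neighboring rank moves by a smaller amount as its distance from the winner grows, which is what keeps adjacent ranks similar.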

Analysis Example

• Geography test

N       5000
n       35
Median  17
Max     35
Min     2
Range   33
Mean    16.911
Sd      4.976
Skew    0.313
Kurt    -0.074
Alpha   0.704

[Figure: histogram of test scores; horizontal axis: SCORE (0 to 35), vertical axis: FREQUENCY (0 to 500)]

Item Reference Profile (IRP)

[Figures: IRP of Item 14; IRP of Item 25]

IRPs of Items 1–15 (ML, Q=10)

The monotonic increasing constraint can be imposed on the IRPs in the learning process.

IRPs of Items 16–35 (ML, Q=10)

IRP index (1) Item Difficulty

• Beta
  – Rank stepping over 0.5
• B
  – Its value

Kumagai (2007)
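The difficulty index can be computed from an IRP as sketched below. This is a minimal sketch: the function name is hypothetical, "stepping over" is read as the first rank whose IRP value is at least 0.5, and the fallback to the last rank when the IRP never reaches 0.5 is an assumption (it matches items in the table of IRP indices whose values stay below 0.5 and have β = 10). The IRP values are made up.

```python
def item_difficulty(irp, threshold=0.5):
    """Return (beta, B): first 1-based rank whose IRP value reaches the
    threshold, and the IRP value at that rank."""
    for rank, value in enumerate(irp, start=1):
        if value >= threshold:
            return rank, value
    # Assumed fallback: the IRP never reaches the threshold, so use the last rank.
    return len(irp), irp[-1]

# Hypothetical IRP over 10 latent ranks.
irp = [0.21, 0.20, 0.25, 0.33, 0.41, 0.48, 0.56, 0.66, 0.74, 0.80]
beta, B = item_difficulty(irp)
print(beta, B)
```

A larger Beta means that an examinee must be in a higher rank before the chance of a correct response passes one half, so the item is harder.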

IRP index (2) Item Discriminancy

• Alpha
  – Smaller rank of the neighboring pair with the biggest change
• A
  – Its value
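The discriminancy index can be sketched in the same way. This is a minimal sketch: the function name is hypothetical, and "biggest change" is read as the largest increase between two neighboring ranks (an assumption; the slide does not say whether decreases count). The IRP values are made up.

```python
def item_discriminancy(irp):
    """Return (alpha, A): the lower 1-based rank of the neighboring pair
    with the largest increase, and the size of that increase."""
    changes = [b - a for a, b in zip(irp, irp[1:])]
    A = max(changes)
    return changes.index(A) + 1, A

# Hypothetical IRP over 4 latent ranks; the steepest rise is from rank 2 to 3.
alpha, A = item_discriminancy([0.2, 0.25, 0.5, 0.6])
print(alpha, A)
```

Alpha locates where on the latent rank scale the item separates examinees best, and A says how sharply it does so.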

IRP index (3) Item Monotonicity

• Gamma
  – Proportion of neighboring pairs with negative changes.
• C
  – Their sum
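The monotonicity index follows directly from the changes between neighboring ranks. This is a minimal sketch with a hypothetical function name and made-up IRP values.

```python
def item_monotonicity(irp):
    """Return (gamma, C): the proportion of neighboring rank pairs whose
    IRP value decreases, and the sum of those negative changes."""
    changes = [b - a for a, b in zip(irp, irp[1:])]
    negative = [c for c in changes if c < 0]
    return len(negative) / len(changes), sum(negative)

# Hypothetical IRP over 5 latent ranks: two of the four steps go down.
gamma, C = item_monotonicity([0.30, 0.28, 0.35, 0.33, 0.50])
print(gamma, C)
```

With Q latent ranks there are Q − 1 neighboring pairs, so for Q = 10 Gamma is a multiple of 1/9. An item with Gamma = 0 and C = 0 has a monotonically non-decreasing IRP.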

Item Reference Profile Estimate (R1 to R10) and IRP indices (A, α, B, β, C, γ)

ITEM  R1     R2     R3     ・・・  R8     R9     R10    A      α   B      β   C       γ
1     0.262  0.257  0.255  ・・・  0.416  0.460  0.497  0.044  8   0.497  10  -0.007  0.222
2     0.271  0.255  0.240  ・・・  0.319  0.320  0.317  0.025  5   0.317  10  -0.033  0.333
3     0.597  0.624  0.669  ・・・  0.856  0.867  0.880  0.057  4   0.597  1   0.000   0.000
4     0.210  0.204  0.202  ・・・  0.460  0.539  0.592  0.084  7   0.539  9   -0.009  0.222
5     0.227  0.219  0.214  ・・・  0.319  0.390  0.445  0.071  8   0.445  10  -0.013  0.222
6     0.747  0.784  0.836  ・・・  0.914  0.921  0.928  0.052  2   0.747  1   0.000   0.111
7     0.352  0.326  0.296  ・・・  0.439  0.440  0.436  0.051  5   0.436  10  -0.066  0.444
8     0.229  0.234  0.238  ・・・  0.490  0.593  0.667  0.104  8   0.593  9   0.000   0.000
9     0.444  0.491  0.562  ・・・  0.778  0.802  0.816  0.071  2   0.562  3   0.000   0.000
10    0.287  0.254  0.210  ・・・  0.548  0.648  0.719  0.112  6   0.548  8   -0.094  0.333
・・・
32    0.189  0.170  0.157  ・・・  0.302  0.332  0.360  0.042  5   0.360  10  -0.032  0.222
33    0.168  0.188  0.221  ・・・  0.333  0.376  0.414  0.044  8   0.414  10  0.000   0.000
34    0.407  0.413  0.424  ・・・  0.566  0.585  0.593  0.036  6   0.535  7   0.000   0.000
35    0.481  0.522  0.569  ・・・  0.719  0.765  0.794  0.051  7   0.522  2   0.000   0.000

Can-Do Table (example)

[Table: columns for IRP estimates, IRP indices, and ability category and item content]

Test Reference Profile (TRP)

• Weighted sum of the IRPs
• Expected score of each latent rank
• Weakly ordinal alignment condition
  – Satisfied when the TRP is monotonic, but not every IRP is monotonic.
• Strongly ordinal alignment condition
  – Satisfied when all the IRPs are monotonic; the TRP is then monotonic as well.
• The scale is not ordinal unless at least the weak condition is satisfied.

Model-Fit Indices

[Panels: ML, Q=10 and ML, Q=5]

• Fit indices are helpful in determining the number of latent ranks.

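Latent Rank Estimation

• Identical to the winner rank selection

Likelihood

\[
\ln p(\mathbf{u}_i \mid \mathbf{v}_q) = \sum_{j=1}^{n} z_{ij} \left\{ u_{ij} \ln v_{qj} + \left(1 - u_{ij}\right) \ln \left(1 - v_{qj}\right) \right\}
\]

ML

\[
R_i^{(ML)}:\; w = \arg\max_{q \in Q} \ln p(\mathbf{u}_i \mid \mathbf{v}_q)
\]

Bayes

\[
R_i^{(MAP)}:\; w = \arg\max_{q \in Q} \left\{ \ln p(\mathbf{u}_i \mid \mathbf{v}_q) + \ln p(f_q) \right\}
\]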

Latent Rank Distribution (LRD)

• LRD is not always flat
• Examinees are classified according to the similarity of their response patterns.
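Given the estimated rank of each examinee, the LRD is a frequency count per rank. This is a minimal sketch with a hypothetical function name and made-up rank estimates.

```python
from collections import Counter

def latent_rank_distribution(ranks, Q):
    """Number of examinees estimated to belong to each rank 1..Q."""
    counts = Counter(ranks)
    return [counts.get(q, 0) for q in range(1, Q + 1)]

# Hypothetical estimated ranks for six examinees on a 4-rank scale.
print(latent_rank_distribution([1, 3, 3, 2, 3, 1], Q=4))  # [2, 1, 3, 0]
```

Ranks that no examinee is assigned to still appear in the result with a count of 0, which makes an uneven distribution easy to see.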

Stratified Latent Rank Distribution

[Figure: LRD stratified by sex (Male, Female, Total); horizontal axis: latent ranks 1 to 5, vertical axis: 0.0 to 1.0]

[Figure: LRD stratified by establishment (National, Public, Private, Total); horizontal axis: latent ranks 1 to 5, vertical axis: 0.0 to 1.0]

Relationship between Latent Ranks and Scores

• R-S scatter plot
  – Spearman’s R=0.929
• R-Q scatter plot
  – Spearman’s R=0.925
• Validity of the NTT scale

[Figure: two scatter plots against LATENT RANK (1 to 10); vertical axes: SCORE (0 to 35) and QUANTILE (1 to 10)]
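Spearman's correlation between latent ranks and scores can be computed as sketched below. This is a minimal sketch with hypothetical function names and made-up data. Tied values share the average of their positions, which is an assumption about tie handling (the slides do not say how ties were treated); it matters here because many examinees share the same latent rank.

```python
def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def spearman(x, y):
    """Pearson correlation computed on the average ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5

# Hypothetical latent ranks and number-right scores for six examinees.
latent_rank = [1, 1, 2, 2, 3, 3]
score = [5, 9, 8, 14, 20, 17]
print(round(spearman(latent_rank, score), 3))
```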

Rank Membership Profile (RMP)

• Posterior distribution of latent rank to which each examinee belongs

\[
p_{iq}^{\mathrm{RMP}} = \frac{p(\mathbf{u}_i \mid \mathbf{v}_q)\, p(f_q)}{\sum_{q'=1}^{Q} p(\mathbf{u}_i \mid \mathbf{v}_{q'})\, p(f_{q'})}
\]
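The RMP of one examinee can be computed as sketched below. This is a minimal sketch under stated assumptions: the function name is hypothetical, `z` is read as an indicator of observed (1) versus missing (0) responses, a flat prior over ranks is used when none is given, and the reference values are made up. The posterior is normalized in log space so that long tests do not underflow.

```python
import math

def rank_membership_profile(u, z, V, prior=None, eps=1e-12):
    """Posterior probability of each latent rank given response vector u."""
    Q = len(V)
    if prior is None:
        prior = [1.0 / Q] * Q  # assumed default: flat prior over the Q ranks
    log_post = []
    for v, f in zip(V, prior):
        ll = 0.0
        for uj, zj, vj in zip(u, z, v):
            if zj:  # skip items treated as missing
                vj = min(max(vj, eps), 1.0 - eps)
                ll += uj * math.log(vj) + (1 - uj) * math.log(1.0 - vj)
        log_post.append(ll + math.log(f))
    top = max(log_post)  # subtract the maximum before exponentiating
    weights = [math.exp(lp - top) for lp in log_post]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical reference vectors for 3 ranks and 3 items.
V = [[0.2, 0.1, 0.1],
     [0.6, 0.5, 0.3],
     [0.9, 0.8, 0.7]]
rmp = rank_membership_profile([1, 1, 1], [1, 1, 1], V)
print([round(p, 3) for p in rmp])
```

The rank with the largest RMP value is the MAP estimate of the examinee's latent rank, while the spread of the profile shows how certain that assignment is.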

RMPs of Examinees 1–15 (Q=10)

[Figure: 15 panels, Examinee 1 to Examinee 15; horizontal axis: LATENT RANK (1 to 10), vertical axis: PROBABILITY (0 to 1)]

Extended Models

• Graded Neural Test Model (RN07-03)
  – NTT model for ordinal polytomous data
• Nominal Neural Test Model (RN07-21)
  – NTT model for nominal polytomous data
• Batch-type NTT Model (RN08-03)
• Continuous Neural Test Model
• Multidimensional Neural Test Model

Graded Neural Test Model

Boundary Category Reference Profiles of Items 1–9

Dashed lines are observation ratio profiles (ORP)

Nominal Neural Test Model

Item Category Reference Profiles of Items 1–16

* correct choice, x merged category of choices with selection ratios of less than 10%

Discussion

• Test standardization theory
  – Self-Organizing Map
  – Latent scale is ordinal
  – IRPs are flexible and nonlinear
• Test editing
• CBT and CAT
• Test equating
  – Concurrent calibration
• Application
  – Japan’s National Achievement Test for 6th and 9th graders

• Website
  – http://www.rd.dnc.ac.jp/~shojima/ntt/index.htm
• Software
  – Neutet
    • Developed by Professor Hashimoto (NCUEE)
    • Available in Japanese and English versions
  – EasyNTT
    • Developed by Professor Kumagai (Niigata Univ.)
    • Japanese version only