Language Assessment Design: Dordt College Placement Exam

Vicky Fang & Hala Sun


Table of Contents

BACKGROUND INFORMATION
    OVERVIEW
        History
        Target population
        Purpose of the placement test
        Test description
    TEST DEVELOPMENT PROCESS
    GENERAL ADMINISTRATION PROCEDURES
    CONSTRUCTS
        I. Listening comprehension
        II. Grammar
        III. Reading comprehension
        IV. Writing ability
        V. Oral skills
SCORING AND INTERPRETATION
ANALYSIS
    APPLYING WESCHE’S (1983) FOUR COMPONENTS
    APPLYING SWAIN’S (1983) FOUR PRINCIPLES
EXAMINING VALIDITY AND RELIABILITY
    VALIDITY OF M-C QUESTIONS
        Item facility
        Distractor analysis
        Item discrimination
        Response frequency distribution
    RELIABILITY
        Inter-rater reliability
SUBTEST RELATIONSHIPS
DISCUSSION
CONCLUSION
REFERENCES


Background Information

Overview

History. In 2012, as a language assessment project with Dr. Kathleen M. Bailey, we

(Hala Sun and Vicky Fang of the Monterey Institute of International Studies [MIIS]) re-designed

the Dordt College Placement Test (DCPT) (Hala Sun is a Dordt College alumna). The DCPT is a

specialized assessment tool to measure incoming international and exchange students’ academic

English language proficiency, specifically their listening, reading, writing, and speaking skills as

well as their grammatical knowledge. Unlike the previous DCPT, this newly designed test

includes a section called “Grammar.”

Concurrent with this language assessment project, we also designed an academic writing

course curriculum for the English for Academic Purposes (EAP) program at Dordt for our

curriculum design project. As part of the design process, we surveyed the current and the past

international students and interviewed the EAP and the English instructors at Dordt College.

Based on our needs analysis, we learned that the need to improve international students’

grammatical competence was crucial. We also found out that the DCPT, which

determines whether students need to take the EAP courses during their first semester, was

designed 16 years ago in 1996 by the current EAP course instructor, Sanneke Kok. We strongly

felt the need to “update” the stimulus materials presented in the previous DCPT because we

believe that the relevancy and the authenticity of stimulus materials affect the abilities that we

want to assess (Wesche, 1983). Based on our interview, Instructor Kok had attempted to update

the DCPT, but due to limited time and resources, she was only able to make minor changes to the

scoring rubrics 2 years ago; the content and the type of test methods were not revised. The

stimulus material for the listening comprehension subtest (a mock lecture) was slightly changed


over the years: the professor giving the lecture was changed. In addition, there had been no tests

conducted to examine the reliability and the validity of the DCPT (Sanneke Kok, personal

communication, September 24, 2012).

Applying all the concepts from our language assessment course, we envisioned this test

to be comprehensive and appropriate to the needs of the stakeholders. This newly designed

DCPT is still “organic” and may need follow-up revisions upon administering this test to

incoming international students at Dordt College. Nevertheless, we feel confident about the

foundations of this test because (1) we designed this test, following the decision-making format

presented by Alderson, Clapham, and Wall (1995); (2) we pre-piloted and piloted the new DCPT

with the current international students at Dordt; and (3) we ran various statistical tests to ensure

the validity and the reliability of this test.

Target population. All students admitted to Dordt College whose English is not their

native language (this includes exchange students and ESL students) are required to take the

DCPT. According to Instructor Kok, the number of international students admitted to Dordt

varies every year, but on average, 10 to 12 students take her EAP courses every semester. To

“pass” the old DCPT, students needed to score at least 80% on the essay writing and 70% on each of the remaining subtests (listening, reading comprehension, and speaking). Through our needs analysis, we found out that the English professors have high academic expectations of their students, especially in writing and grammar competency. Therefore, we decided to keep the

current 70%–80% standard except for one minor change. We included the newly added subtest,

grammar, into the 80% standard category. The students who do not score 80% or above on both

the essay writing and the grammar subtests are required to take the EAP reading and writing


course (Academic Writing from Sources). Similarly, if students do not score 70% or above, they

have to take the EAP speaking and listening course (Academic Interaction).
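To make the decision rule concrete, the sketch below expresses the placement logic described above as a small Python function. This is our illustration only, not part of the official DCPT materials; the subtest names and thresholds follow the descriptions in this section.

    # A minimal sketch (ours, not official DCPT code) of the placement rules
    # described above. `scores` maps each subtest name to a percentage (0-100).
    def placement(scores):
        """Return the list of EAP courses a student must take."""
        courses = []
        # 80% standard: essay writing AND grammar must both reach 80%.
        if not (scores["writing"] >= 80 and scores["grammar"] >= 80):
            courses.append("Academic Writing from Sources")
        # 70% standard: listening, reading, and oral must all reach 70%.
        if not all(scores[s] >= 70 for s in ("listening", "reading", "oral")):
            courses.append("Academic Interaction")
        return courses

    # Example: grammar falls below 80%, so only the writing course is required.
    print(placement({"writing": 85, "grammar": 75,
                     "listening": 90, "reading": 80, "oral": 72}))
    # -> ['Academic Writing from Sources']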

Purpose of the placement test. The revised DCPT is designed to provide an accurate

evaluation of international students’ academic English language skills, assessing their potential

to be successful in their college academic life. Specifically, this test helps determine whether

these international students have sufficient academic English skills and knowledge to take the

general courses at Dordt, especially English courses. International students who do not “pass”

the placement exam have to take either or both of the EAP courses offered in their first semester.

Once the international students complete these EAP courses, then they can register for general

English core courses.

As we re-designed the DCPT, we constantly made sure that the constructs assessed in our

DCPT matched the two EAP courses offered. We reviewed the curricula of these two courses

because we wanted to examine whether the areas or skills that students need further

improvement, based on the results of their DCPT, are covered in the current EAP courses.

Currently, the Academic Writing from Sources course helps students to improve their academic

reading and writing skills, especially focusing on how to integrate various sources and to make

appropriate citations using standard documentation styles. The Academic Interaction course

focuses on helping students to develop and strengthen their speaking and listening skills of

academic English.

Test description. To understand the new DCPT, we must first examine the original

placement test created by the EAP instructor, Sanneke Kok. The previous DCPT had the

following four subtests:


Subtest 1: Oral interview. This subtest required the test-takers to answer several

questions posed by a test administrator. This subtest assessed the test-takers’ oral fluency and

accuracy in speaking.

Subtest 2: Mini article. This subtest comprised 10 multiple-choice questions based on a

sample reading. This subtest assessed the test-takers’ vocabulary knowledge and reading

comprehension.

Subtest 3: Mini lecture. This subtest required the test-takers to watch a video clip of a

mock lecture, testing their listening comprehension. After watching the clip, students had to

answer the true or false questions presented orally by the lecturer in the video. In addition,

students had to fill in the missing blanks of the given table.

Subtest 4: Writing prompt. For this subtest, the test-takers had to compose an essay

according to the given prompt. This subtest assessed the test-takers’ academic writing ability,

which includes organization skills and grammar.

Subtests 1, 2, and 3 were used to determine whether the test-takers needed to take the

Academic Interaction course, and subtest 4 was used to decide whether the test-takers had to take the Academic Writing from Sources course.

For our newly designed test, we have an overarching theme of language learning. This test has

the following five subtests:

Subtest 1: Listening comprehension. This subtest consists of five short-answer questions

and measures the test-takers’ ability to comprehend a speech from a video clip (TED Talk). In this

video, the presenter discusses the concept of English manias and the implications of the spread or

the dominance of the English language. The test-takers have to identify various important


information from the video to be able to answer the short-answer questions. The maximum

allotted time for this subtest is 10 minutes.

Subtest 2: Grammar. This is a cloze-elide subtest, in which the test-takers have to

identify 15 “extra” words that make the sentence(s) within the given text ungrammatical. The

test-takers are required to cross out these extra words. The instructions indicate that there are

exactly 15 “extra” words to cross out. The given text, taken from The New York Times, relates to the topic of language learning and immersion. The maximum allotted time

for this subtest is 5 minutes.

Subtest 3: Reading comprehension. This third subtest consists of 10 multiple-choice (M-

C) questions and measures the test-takers’ vocabulary knowledge, reading comprehension, and

grammar. There are two articles within this subtest, each having five M-C questions. The first

article is a short narrative story about the world’s oldest learner. The second is an excerpt from an

article that discusses the influence of mother tongue and language learning experience. The

maximum allotted time for this subtest is 20 minutes.

Subtest 4: Mini-essay writing. This subtest measures the test-takers’ academic writing

ability. Specifically, the essay’s content and organization are assessed as well as the test-takers’

correct use of grammar. This subtest requires the test-takers to explain using specific examples

whether or not they think learning English is important. The maximum allotted time for this

subtest is 20 minutes. The test-takers are required to write at least 180 words.

Subtest 5: Oral interview. This subtest measures the test-takers’ speaking ability,

specifically their fluency and accuracy in speech (e.g., grammar, pronunciation, and coherence).

Furthermore, their content and comprehension are also assessed by examining the relevancy of

their answer to the given prompt including their examples to support their stance. The test-takers


are given 2 minutes to prepare and up to 3 minutes to answer the prompt. The prompt is written

as follows:

In the United States, many universities require students to learn an additional language

other than their native language. Do you support the idea that university students should

be required to learn an additional language (other than their native languages)? Why or

why not?

For this new DCPT, subtests 1, 3, and 5 will be used to determine whether the test-takers

need to take the Academic Interaction course, and subtests 2 and 4 will be used to decide whether the

test-takers have to take the Academic Writing from Sources.

Test Development Process

The following table shows the steps we took to design this new DCPT:

Table 1

Dordt College Placement Test Development Process

Step 1: Decision-making

- Examined the (old) DCPT

- Familiarized ourselves with the target population and the setting (including college goals and curricula)

- Conducted a needs analysis of the stakeholders

- Chose the constructs and the types of subtests

- Provided definition for each construct

- Determined the test methods for each construct

Step 2: Designing

- Gathered relevant, useful, and motivating stimulus materials

- Designed one subtest at a time

- Allocated specific time for each subtest

- Created the scoring criteria for each subtest

Step 3: Pre-piloting

- Pre-piloted the test with 3 TESOL MIIS classmates and the course instructor

- Revised the test based on the feedback and test results (e.g., reduced the time allotted for each subtest and revised M-C items that were misleading, confusing, or too obvious)

Step 4: Piloting

- Sent the revised DCPT to Instructor Kok to pilot/administer this test

- Instructor Kok returned 10 current international students’ DCPT, including the recorded

oral interview via DVD

- Scored the exams


Table 1 (Cont.)

Dordt College Placement Test Development Process

Step 5: Analysis

- Analyzed the validity of the M-C subtest, using item facility, item discriminability,

distractor analysis, and response frequency distribution

- Analyzed the reliability of the objectively scored subtests using the split-half method

- Analyzed the reliability of the scorers using the interrater reliability

Step 6: Reflections & Revisions

- Decided to make the oral interview prompt simpler (some students did not answer the

question asked)

- Made minor changes to the oral scoring criteria

- Created a model or an example for the cloze-elide test (instead of crossing out the words,

some students underlined or circled the words)

- Added more lines to the essay answer sheet; some students did not meet the minimum word requirement, and we assume that they concluded their writing when they ran out of lines

General Administration Procedures

For test administration, Dordt College has two teams—the logistics team and the oral

interview team. The logistics team members are student volunteers recruited by Instructor Kok in

advance. Instructor Kok provides a 1-hour training to the student volunteers. We adapted the

current logistics guide and made minor changes. See Appendix A for the new logistics guide we

created and Appendix B for the original guide.

In addition, Instructor Kok recruits the oral interview team, which she refers to as the Entrance Interview for International/ESL Students (EIIS) team. Every year, there are about five to six EIIS teams, each consisting of three faculty members (both male and female) from various disciplines. Like the logistics team, the EIIS team members receive an hour of training from Instructor Kok. During the training session, Instructor Kok briefly

discusses the topic of language acquisition, as well as the benefits of being an EIIS team

member, such as gaining a “snap shot” of the new international students’ abilities and needs

(personal communication, November 14, 2012). For a sample oral interview schedule (provided


by Instructor Kok), see Appendix C.

Constructs

With Alderson, Clapham, and Wall’s (1995) guidance on developing test specifications, we identified five constructs for the new placement test: listening comprehension, grammatical knowledge, reading comprehension, writing ability, and oral skills. In addressing

the issue of test methods, Bailey (1998) points out that indirect tests may fail to provide valid

assessment of a construct and may also have negative washback on test-takers. Wesche (1983)

also argues that integrative and direct tests better predict students’ use of the target

language in real life. Thus, when designing the placement test, we tried to incorporate direct and

integrative test methods to measure each construct.

I. Listening comprehension. In defining the listening construct, Buck (2001) argues that

listening tests need to be contextualized, “knowledge-independent”, “require fast, automatic, on-

line processing of texts” and “go beyond literal meaning” (p. 113). In line with this definition, we designed a listening task that requires test-takers to respond to five short-answer

questions after watching a four-minute video. By doing so, we simulated a mini lecture to test

students’ abilities to recall specific words as well as students’ comprehension of the overall

speech.

II. Grammar. From our interview with Instructor Kok as well as the four English

professors at Dordt College, we have learned that the institution’s educational philosophy

stresses an emphasis on students’ grammatical competence. Therefore, we added the grammar

section in designing the test. In defining the concept of grammar, the Longman dictionary (2010)

states that “it usually takes into account the meanings and functions these sentences have in the

overall system of the language” (p. 252). Citing Larsen-Freeman (1991, 1997), Brown (2010)


also argues that grammatical knowledge includes grammatical forms, grammatical meanings and

pragmatic meaning. To implement the idea that grammatical forms are intimately associated

with grammatical meanings as well as pragmatic meaning, we inserted grammar problems that English as a Second Language (ESL) learners may encounter into an article. The grammar problems we addressed in the test include the use of articles and prepositions, adjective usage, verb

tense and subject-verb agreement. These grammar problems were intentionally selected from the

grammar criteria addressed in the analytic rubric of writing (see Appendix D for the scoring

criteria). By doing this, we hope to raise the test-takers’ awareness of these grammar problems

when they compose their writings.

III. Reading comprehension. Hedgcock and Ferris (2009) mention that from a bottom-

up view of reading, the reader starts from small units such as words and works towards large

units such as written discourse; from the top-down view, the reader’s understanding of a text is

the product of the reader’s background knowledge of the text and the information given by the

text. Thus, we designed items that lead the test-takers to adopt both approaches to

comprehend the reading passage (Alderson, 2000). The bottom-up items include questions

asking for the interpretation of specific words. The top-down items include questions that require

the test-takers to paraphrase a sentence and recognize the implied message of a text. We included

two readings in the section, which consists of 10 multiple-choice questions.

IV. Writing ability. To study academic writing, students need to master the process of

structuring ideas into a piece of writing which shares the convention of a specific type of text

(Ferris & Hedgcock, 1998). To measure the students’ writing skills, we decided to assess the

students’ ability to write an expository essay, which is a common essay genre that college

students often encounter in academic life (Purdue Online Writing Lab, 2010). Thus, test-takers


need to write an essay of about 180-250 words to state, explain, and support their views on the

given prompt. Based on Purdue Online Writing Lab (2012), the structure of the expository essay

consists of the following main components:

- A clear, concise, and defined thesis statement that occurs in the first paragraph of the essay.
- Clear and logical transitions between the introduction, body, and conclusion.
- Body paragraphs that include evidential support (whether factual, logical, statistical, or anecdotal).

We used these descriptions to revise the analytic rubric developed by Instructor Kok to assess

students’ writing ability.

V. Oral skills. Luoma (2004) defines speaking tasks as “activities that involve speakers

in using language for the purpose of achieving a particular goal or objective in a particular

speaking situation” (p. 31). To effectively assess the construct, we created a prompt that requires

test-takers to expound on an argument based on a given topic. Test-takers will have two minutes

to prepare their speech and three minutes to perform their speech orally. During the preparation

time, test takers are also allowed to jot down some notes for their speech.

By having students relate the issue to a familiar environment, we hope the students will

gain confidence in discussing the topic. We also hope to maximize their opportunity to express

themselves in English by providing concrete personal examples.


Scoring and Interpretation

Different scoring criteria are used to evaluate each construct. Reading comprehension and

Grammar are both objectively scored subtests. The Reading comprehension subtest uses

multiple-choice questions to test students’ reading ability. There are 10 multiple-choice

questions, each worth 1 point. For the Grammar subtest, we created a cloze-elide test to measure students’ grammatical knowledge. The test-taker receives one point each time he/she crosses out an extra word in the article. If the test-taker crosses out a wrong word, no points are deducted from his/her score. The cloze-elide test contains 15 extra words, so the grammar subtest is worth 15 points.
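Because wrong crossings are not penalized, scoring the cloze-elide subtest reduces to counting how many of the 15 keyed extra words were crossed out. The sketch below is our illustration; the word positions are invented for the example.

    # Scoring sketch for the cloze-elide subtest (our illustration).
    # `crossed` and `key` are sets of token positions of crossed-out words.
    def score_cloze_elide(crossed, key):
        # One point per keyed extra word crossed out; wrong crossings are ignored.
        return len(crossed & key)

    key = {3, 9, 14, 21, 30, 37, 41, 48, 55, 60, 66, 72, 79, 85, 91}  # invented
    response = {3, 9, 14, 22, 30, 41, 55, 60, 79, 91}  # 22 is a wrong crossing
    print(score_cloze_elide(response, key))  # -> 9 points out of 15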

We used both the exact word method and the acceptable word method to evaluate the listening

comprehension construct. Bailey (1998) introduces two scoring methods to evaluate cloze tests.

Under the exact word scoring method, the test-taker gets credit only when he/she writes down the

exact word in the response. In contrast, with acceptable-word scoring, the test-taker can get

credit when his/her response is “grammatically correct” and “makes good sense in the context”

(p. 61). The two methods both have merits and demerits. We used the exact word method for

evaluating responses that require accurate information from the listening, and we used the

acceptable word method to assess the test-taker’s comprehension of the overall content of the

listening. For each item scored by acceptable word method, we made a list of acceptable

answers. The listening subtest is worth 10 points in total, and each question is worth 2 points. One point is deducted if the test-taker does not respond to an acceptable-word question in a complete sentence.
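These listening scoring rules can likewise be expressed as a short routine. The sketch below is ours; the answer keys are placeholders, not the real ones, and the partial-credit rule for Question 3 is omitted for brevity.

    # Sketch of the listening scoring rules described above (our illustration).
    def score_listening_item(answer, key_words, method, complete_sentence=True):
        """Each item is worth 2 points; `key_words` is the set containing the
        exact word or the acceptable answers, depending on `method`."""
        points = 2 if answer.strip().lower() in key_words else 0
        # Acceptable-word items lose 1 point if not answered in a full sentence.
        if method == "acceptable" and points > 0 and not complete_sentence:
            points -= 1
        return points

    # Placeholder keys, not the real ones:
    print(score_listening_item("two billion", {"two billion", "2 billion"},
                               "exact"))                                # -> 2
    print(score_listening_item("it spreads english", {"it spreads english"},
                               "acceptable", complete_sentence=False))  # -> 1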

The oral and writing subtests are both scored subjectively according to the respective

analytic rubrics. In setting up the rubrics for the oral presentation and essay writing, we revised


the analytic rubrics used in the original placement test. The analytic rubrics of writing include the

evaluation of three aspects: content, organization, and grammar (see Appendix D). Based on the

needs analysis we conducted for our curriculum design project, we knew that both the

international students and the English department at Dordt College value grammatical

competence in language learning. Therefore, we kept grammar weighted at 50% of the total possible writing score (100 points).

The rubric for the oral test was retained at first, but we found that this rubric was not

appropriate to score the oral test we designed. The old DCPT oral test was in the form of an

interview, so the rubrics included comprehension of the interview questions. However, the oral

test we designed is a presentation in a given scenario, so our rubric needs to assess whether the student provides an appropriate answer to the given prompt. In designing the new

rubrics for the oral test, we emphasized three main aspects of a speech: content, accuracy and

fluency (see Appendix D for the scoring criteria). The new rubric increased the total points of the oral test to 40.

Table 2 presents our descriptive statistics based on the results of the new DCPT:

Table 2

Dordt College Placement Test Descriptive Statistics (N = 10)

Test        Points Possible   Mean    Mode    Median   Range   Std. Dev.   Variance
Listening   10                8.4     10      10       8       2.76        7.6
Grammar     15                9       10, 13  10       14      4.64        21.56
Reading     10                6.7     8, 6    7        7       1.95        3.79
Writing     100               69.4    N/A     71.75    52.5    16.84       283.6
Oral        40                28.9    N/A     29       19.5    5.27        27.82
Total       175               122.4   N/A     127.75   101     31.46       344.37
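As a check on Table 2, the listening row can be reproduced from the ten listening scores reported in Table 3 with Python’s statistics module; note that the standard deviations and variances in Table 2 are sample statistics (dividing by N - 1).

    import statistics as st

    # Listening subtest scores for the 10 learners, taken from Table 3.
    listening = [10, 10, 9, 2, 5, 8, 10, 10, 10, 10]
    print(st.mean(listening))               # 8.4
    print(st.mode(listening))               # 10
    print(st.median(listening))             # 10
    print(max(listening) - min(listening))  # range: 8
    print(round(st.stdev(listening), 2))    # 2.76 (sample standard deviation)
    print(round(st.variance(listening), 2)) # 7.6  (sample variance)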


Except for the listening and reading subtests, all the other subtests are graded on different scales. These subtest scores are not aggregated; instead, the EAP/English Department uses each subtest score separately to decide whether an international student has to take either or both of the EAP courses. Tables 3 and 4 present the subtest scores, and the comments following Table 4 describe how the subtest scores are used to make placement decisions (Alderson et al., 1995).

Table 3

Placement Test Scores for Academic Interaction

Learner   Listening (10)   Reading (10)   Oral (40)
1         10               8              R1 (30), R2 (33) = 31.5
2         10               6              R1 (38), R2 (38) = 38
3         9                6              R1 (28), R2 (26) = 27.5
4         2                2              R1 (17), R2 (20) = 18.5
5         5                7              R1 (30), R2 (27) = 28.5
6         8                6              R1 (34), R2 (34) = 34
7         10               8              R1 (30), R2 (30) = 30
8         10               7              R1 (23), R2 (26) = 24.5
9         10               8              R1 (28), R2 (31) = 29.5
10        10               9              R1 (27), R2 (27) = 27
Mean      8.4              6.7            28.9


To be exempt from the Academic Interaction course, students must obtain a score of 70% or higher on each of three subtests: listening comprehension, reading comprehension, and the oral interview. To be exempt from the Academic Writing from Sources course, students must obtain a score of 80% or higher on each of two subtests: grammar and mini-essay writing.

To further analyze students’ scores on each subtest, we created frequency polygons for the listening subtest, where we used partially subjective scoring, and for the reading and grammar subtests, where we used objective scoring (see Figures 1 and 2 below):

Table 4

Placement Test Scores for Academic Writing

Learner   Grammar (15)   Writing (100)
1         14             R1 (77), R2 (75) = 76
2         13             R1 (90), R2 (90) = 90
3         10             R1 (85), R2 (85) = 85
4         0              R1 (36), R2 (38) = 37.5
5         9              R1 (62), R2 (61) = 61.5
6         11             R1 (79), R2 (77) = 78
7         13             R1 (87), R2 (87) = 87
8         8              R1 (53), R2 (57) = 55
9         2              R1 (57), R2 (56) = 56.5
10        10             R1 (68), R2 (67) = 67.5
Mean      9              69.4


Figure 1. Frequency Polygon for Listening and Reading Subtests

Figure 2. Frequency Polygon for Grammar Subtest

By looking at the frequency polygons and the descriptive results from Table 3 and Table 4, we wondered whether the listening comprehension subtest is too easy for the students. The mean of the listening subtest is 8.4, much higher than a score of 70% of the total listening points. On the reading subtest, although the mean is only 6.7 out of 10, 60% of the students obtained a score of 70% or higher. In contrast, on the grammar, writing, and oral subtests, only 20-30% of the students met the requirements. Based on these scores, it seems that these students need to improve their writing and oral skills, with an emphasis on grammatical competence.

Analysis

Applying Wesche’s (1983) Four Components

The following table shows the application of Wesche’s (1983) four components

framework to our test:

Table 5

Wesche’s (1983) Framework

Listening
- Stimulus materials: The test-taker watches a video clip of “English Mania,” presented by Jay Walker (2009). The test also contains five short-answer questions related to the content of the video.
- Task posed to the learner: The test-taker must watch and listen to the video and identify important information.
- Learner’s response: The test-taker must write down their responses to the questions.
- Scoring criteria*: Questions 2 and 3 (requiring a specific number and country names) are marked using the exact word method. The remaining questions are marked using the acceptable word method. Students are given either 2 points or 0 points. For Question 3, partial credit (1 pt) is given when at least two correct countries are mentioned.

Grammar
- Stimulus materials: The test-taker reads an article from the New York Times (Bahanoo, 2012).
- Task posed to the learner: The test-taker must identify 15 extra words inserted within sentences that make the sentences ungrammatical based on the structural rules of English; the test-taker must pay attention to the details of the reading to find multiple grammar errors, such as use of articles and tenses.
- Learner’s response: The test-taker must cross out the extra words.
- Scoring criteria*: The test-taker gets points when he/she crosses out the exact incorrect words.


Table 5 (Cont.)

Wesche’s (1983) Framework

Reading
- Stimulus materials: The test-taker reads 1 long passage and 1 short passage. The test contains 5 multiple-choice questions for each passage.
- Task posed to the learner: The test-taker must identify the main ideas of the readings and define the meaning of words within the given context.
- Learner’s response: The test-taker must circle the letter representing the answer to a question.
- Scoring criteria*: The test-taker gets points when they circle the correct letters of the multiple-choice questions, as determined by the established key.

Mini-essay Writing
- Stimulus materials: An essay prompt is presented to the test-taker.
- Task posed to the learner: The test-taker must read and answer the given prompt. He/she must compose an organized piece of writing with sufficient examples and correct use of vocabulary and grammar.
- Learner’s response: The test-taker must write an essay of about 180-250 words that states, explains, and supports his/her opinion on the given prompt.
- Scoring criteria*: The test-taker’s essay is subjectively scored based on an analytic rubric set by the test designers. The rubric consists of three sections: content, organization, and grammar.

Oral Interview
- Stimulus materials: A role-play scenario is given to the test-taker.
- Task posed to the learner: The test-taker must read the prompt, understand the context, and adopt the role given in the scenario.
- Learner’s response: The test-taker must take 2 minutes to prepare a persuasive speech that states, explains, and supports his/her opinion on the given topic and deliver it within 3 minutes.
- Scoring criteria*: The test-taker’s speech is subjectively scored based on an analytic rubric set by the test designers. The rubric evaluates two aspects of a speech: content, and fluency and accuracy.

*Note. The keys and rubrics of the scoring criteria were all pre-established by the test designers, although the rubric of the oral interview was modified based on the students’ responses from the pilot tests.

Wesche (1983) points out the importance of using authentic materials in language testing. Therefore, the stimulus materials we selected for the test were sourced from authoritative publications, such as the New York Times and National Geographic Learning.

We also decided to create a theme-based test to help scaffold students’ knowledge, as well as to

make the testing constructs more integrated with one another. Considering the background of our

test-takers, we chose “language learning” as the overarching theme, because all test-takers share

an experience of learning an additional language, English. In addition, we sequenced the test

from the receptive skills (listening, grammar and reading) to the productive skills (speaking and

writing) to enhance the production stage of the exam.

Applying Swain’s (1983) Four Principles of Communicative Language Development

The following table shows the application of Swain’s (1983) four principles to our test:

Table 6

Swain’s (1983) Framework

Listening
- Start from somewhere: Our choice of this procedure is motivated by our intention to simulate an academic situation in which students are given a lecture.
- Concentrate on content: Since the test-takers are international students, the topic of English learning is relevant to them, and the video also serves to activate the test-takers’ schemata.
- Bias for best: The test-takers get visual support in addition to the audio input. Also, they are allowed to take notes while watching the video. Spelling errors are not marked in the test-takers’ responses to the comprehension questions.
- Work for washback: The test-takers can experience the situation of attending a real academic lecture and practice note-taking skills.

Grammar
- Start from somewhere: Citing Larsen-Freeman (1991, 1997), Brown (2010) defines grammatical knowledge as grammatical forms, grammatical meanings, and pragmatic meanings.
- Concentrate on content: Students can relate the content to their own experience in language learning.
- Bias for best: The subtest assesses multiple grammar points, such as use of articles, adjectives, and verb tense.
- Work for washback: The test-takers can learn to pay attention to the details of the reading passages and learn that meanings are associated with grammatical forms.


Table 6 (Cont.)

Swain’s (1983) Framework

Reading
- Start from somewhere: The design of the subtest was driven by both the top-down processing and the bottom-up processing of reading comprehension (Longman dictionary, 2012).
- Concentrate on content: Consistent with the content of the previous subtests, the two articles are also about language learning.
- Bias for best: The definitions of some difficult vocabulary terms are given in the test. Key words and key sentences are either underlined or bolded for attention. Paragraphs are marked with alphabetic letters for the convenience of reference.
- Work for washback: The test-takers can expand their vocabulary knowledge, learn to use context to interpret the meanings of words, identify the main ideas of the readings, and paraphrase the reading.

Mini-essay Writing
- Start from somewhere: Through the essay writing task, we are able to identify students’ strengths and weaknesses, including grammar usage and vocabulary knowledge.
- Concentrate on content: The essay prompt, whether learning English is important or not, has been developed through the previous subtests.
- Bias for best: The test-takers can use the materials provided on the test to support their opinions.
- Work for washback: The test-takers can write in a simulated academic context, compose an argumentative essay, and incorporate sufficient sources into their writing.

Oral Interview
- Start from somewhere: Besides the concern of using a direct test to measure the test-takers’ oral competence, the construct of the oral test was also inspired by the frequent situations in which students are required to orally express their opinions, supported by examples, in academic settings.
- Concentrate on content: The content is related to the theme of the test, language learning.
- Bias for best: The test-takers can use the materials provided on the test to support their opinions. They have 2 minutes to prepare and jot down notes for their speech.
- Work for washback: The test-takers can experience a simulated academic presentation and give a persuasive speech.


Examining Validity and Reliability

The Dordt College Placement Test (DCPT) is an important test not only for the English

Department but also for the international students. Since the results of this test will be used to

decide whether the incoming students need to take the English for Academic Purposes (EAP)

classes in their first semester, we had to make sure that our newly revised test is valid and

reliable. Therefore, we piloted the DCPT with the current EAP students at Dordt and decided to

conduct several analyses on our subtests, including Item Facility (I.F.), Item Discrimination

(I.D.), Distractor Analyses, Response Frequency Distribution, Split Half Reliability, Inter-Rater

Reliability and Subtest Relationships. Specifically for validity, we analyzed one of the

objectively scored portions of our test, the reading comprehension subtest, using I.F., Distractor

Analyses, I.D., and Response Frequency Distribution. To test the reliability, we evaluated the

subjectively scored parts of our test using Inter-Rater Reliability and one of the objectively

scored parts, the multiple-choice (M-C) test, using Split Half Reliability. Finally, we assessed the

correlation between scores on each of our subtests and the total test.

As mentioned previously, for our reading comprehension subtest, we designed an M-C

test. Bailey (1998) notes that many teachers and test-makers use an M-C test as a method to

assess students’ ability because of the ease of test administration and scoring. Moreover, students

may perceive an M-C test to be much “fairer and/or more reliable” since this test can be scored

objectively (Bailey, 1998, p. 130). Despite its scoring practicality, the reality is that it is difficult

to design an M-C test. In fact, Bailey (1998) mentions that it is quite labor-intensive because test-

makers need to consider various factors. For instance, getting the question stems and the

options right takes time. To ensure that our M-C subtest is working well and that this subtest


provides valid information the college needs to make placement decisions, we conducted four

different types of validity analyses.

Validity of M-C Questions

Item facility. Item facility (I.F.) is “an index of how easy an individual item was for the

people who took it” (Bailey, 1998, p. 132). To calculate the I.F. of the reading comprehension

multiple-choice (M-C) subtest, we used the following formula, taken from Bailey (1998, p. 132):

I.F. = # of test-takers answering the item correctly ÷ # of test-takers

According to Bailey (1998), the I.F. number ranges from 0.0, which signifies that every

test-taker missed the item, to 1.0, which means everyone answered the item correctly. Table 7

represents the I.F. data for the DCPT reading comprehension M-C subtest.

Table 7

Reading Comprehension Subtest Item Facility (n=10)

Item   Students who answered the item correctly   Item Facility (I.F.)
1      8                                           0.80 (80%)
2      8                                           0.80
3      10                                          1.00
4      9                                           0.90
5      9                                           0.90
6      3                                           0.30
7      6                                           0.60
8      4                                           0.40
9      5                                           0.50
10     5                                           0.50

Average I.F. = 0.67
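In code, each I.F. value in Table 7 is a one-line proportion; the sketch below (our illustration) reproduces the item 1 value.

    def item_facility(num_correct, num_test_takers):
        # Bailey's (1998) item facility: proportion answering the item correctly.
        return num_correct / num_test_takers

    print(item_facility(8, 10))  # item 1 in Table 7 -> 0.8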

Oller (1979) states that “items falling somewhere between about 0.15 and 0.85 are

usually preferred” (p. 247). Based on our I.F. data, item 3 (1.00) and items 4 and 5 (0.90) need

serious revisions, because Oller claims that “in tests that are intended to reveal differences

among the students who are better and worse performers on whatever is being tested, there is


nothing gained by including test items that every student answers correctly or that every student

answers incorrectly” (p. 246). Excluding items 3, 4, and 5, the remaining items fall well within

Oller’s preferred range. Half of the items (items 6, 7, 8, 9, and 10) appear to be in the medium

difficulty range, from 0.30 to 0.60. Based solely on these results, we would not change the items

with medium difficulty (items 6 to 10), but would revisit items 3, 4, and 5.

Distractor analysis. We conducted a Distractor Analysis to improve the validity of our

M-C test and to make sure that each option for each test item was “distracting.” Also, since we

assume that students have variable skills and knowledge, we do not want to have a test item

option that is too obvious and serves no purpose in terms of distinguishing students who know

and who do not know the correct answer. Table 8 shows the number of students that selected

each option.

Table 8

Reading Comprehension Subtest Distractor Analysis (n=10)

Item A B C D

1 8* 2 0 0

2 0 8* 1 1

3 0 10* 0 0

4 0 0 9* 1

5 9* 0 0 1

6 2 2 3 3*

7 2 2 0 6*

8 5 0 4* 1

9 1 4 5* 0

10 5* 1 2 2

Note. (*) indicates the correct answer to the item.
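Counts like those in Table 8 can be tallied mechanically from the answer sheets. The responses below are invented for the example, but they reproduce the Table 8 row for item 6 (key D).

    from collections import Counter

    # Hypothetical answer-sheet responses to item 6 (key = 'D').
    responses_item6 = ["A", "B", "C", "D", "C", "A", "D", "B", "C", "D"]
    print(Counter(responses_item6))
    # -> Counter({'C': 3, 'D': 3, 'A': 2, 'B': 2}), matching Table 8, item 6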

Based on the results of the Distractor Analysis, we can see that items 1 through 5 need

attention. Previously, our I.F. Analysis indicated that items 3, 4, and 5 should be revised because

these items were too easy for the students. In the Distractor Analysis, we can verify that options

in items 3 to 5 should be changed considerably because the majority of the students chose one


option over the others. Options in items 1, 2, 7, 8, and 9 should also be reviewed to ensure that all of the options are distracting, as they are in items 6 and 10.

Item discrimination. The Item Discrimination method was used to find out how the top

scorers and low scorers performed on each item in the M-C subtest. Using Flanagan’s method of

computing item discriminability (Oller, 1979), the top scorers and the low scorers were ranked

based on the scores of the entire exam, including the other four subtests. We took the top 33% of

the exams and the bottom 33% of the exams and calculated the I.D. using the following formula,

taken from Bailey (1998, p. 136):

I.D. = [(# of high scorers who got the item right) − (# of low scorers who got the item right)] ÷ [33% × (total # of students [10])]
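As applied here, the denominator is 33% of the total number of students (3.3); a minimal sketch (ours) that reproduces the Table 9 values for items 2 and 10:

    def item_discrimination(high_correct, low_correct, total_students):
        # Flanagan's method as applied here: difference between the high- and
        # low-group correct counts over 33% of the total number of students.
        return (high_correct - low_correct) / (0.33 * total_students)

    print(round(item_discrimination(3, 1, 10), 2))  # item 2  ->  0.61
    print(round(item_discrimination(1, 2, 10), 2))  # item 10 -> -0.3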

Table 9 reflects the I.D. values for the M-C subtest, calculated using Flanagan’s formula. Investigating the I.D. values is helpful to us as test makers because we are able to know whether our low I.F. items were truly difficult and our high I.F. items were too easy for the students.

Table 9

Reading Comprehension Subtest Item Discrimination (n=10)

Item   High scorers (top 3) correct   Low scorers (bottom 3) correct   I.D.
1      2                              2                                0.00
2      3                              1                                0.61
3      3                              3                                0.00
4      3                              2                                0.30
5      3                              2                                0.30
6      1                              1                                0.00
7      2                              2                                0.00
8      2                              1                                0.30
9      2                              1                                0.30
10     1                              2                                -0.30

Average I.D. = 0.151

Table 10 shows the I.F. and I.D. values side by side; we will use it to better analyze the results of the I.D. values.

Table 10

Reading Comprehension Subtest Item Discrimination and Item Facility (n=10)

Item   High scorers (top 3) correct   Low scorers (bottom 3) correct   I.D.    I.F.
1      2                              2                                0.00    0.80 (80%)
2      3                              1                                0.61    0.80
3      3                              3                                0.00    1.00
4      3                              2                                0.30    0.90
5      3                              2                                0.30    0.90
6      1                              1                                0.00    0.30
7      2                              2                                0.00    0.60
8      2                              1                                0.30    0.40
9      2                              1                                0.30    0.50
10     1                              2                                -0.30   0.50

Based solely on the I.D. values, items 1, 3, 6, 7, and 10 should probably be revised since

Oller’s (1979) lowest acceptable value is 0.25. From our previous discussion on Item Facility,

we mentioned that item 3 should to be addressed because the item was too easy for the students

(IF=1.0). In addition, we also mentioned keeping the items 6, 7, and 10 because these items had

medium difficulty and were in Oller’s preferred I.F. range; but our I.D. values for items 6 and 7

show that they need to be revisited because equal numbers of top scoring students and low

scoring students answered these items correctly. In fact, in item 10 (I.D. = −.30), two high

scorers missed the item while only one low scorer incorrectly answered the item. Oller points out

that “we would be disturbed if we found an item that good readers (high scorers) tended to miss

more frequently than weak readers (low scorers)” (1979, p. 251). However, we need to consider

two important factors before making changes to any items. First, because our sample size is


small (n=10), it is difficult to decide whether to revise these items using Oller’s recommended

range, especially for items 4 and 5: the high I.F. value (0.90) signals “change,” but the I.D. value (0.30) indicates that the items are acceptable. If our sample size were larger, there might be more variance in our results; thus, we could analyze these items better.

Second, the top scorers and the low scorers were divided based on the results of the entire

test, which consists of five subtests, evaluating different language constructs. The DCPT that we

designed heavily emphasizes grammatical competence and writing skills; it was designed as such

based on our needs analysis. As a result, we may have considered students with high grammar

knowledge and writing skills as part of the top three high scorers, but we are analyzing the

performance of students’ reading comprehension ability. Students with high reading

comprehension skills may have scored lower in other sections, and thus, may not have been

included in the top three high scorer sample. Therefore, keeping these factors in mind, we will

revisit and mindfully revise the items that need attention.

Response frequency distribution. Prior to looking at the results of the Response

Frequency Distribution, Table 11 provides a brief overview of the items that need attention

and/or revision based on I.F., Distractor Analysis, and I.D. analysis:

Table 11

Overview of Items that Need Attention

Analysis Item(s)

Item Facility 3, 4, and 5

Distractor Analysis 3, 4, and 5 (maybe 1, 2, 7, 8, and 9)

Item Discriminability 1, 3, 6, 7, and particularly 10 (-0.30)

Based on the information from Table 11, items 3, 4, 5, and 10 (since item 10 shows a

negative discrimination) seem to need revisions, and items 7, 8, and 10 should be revisited. To

further assist us in making decisions and analyzing the validity of our M-C subtest, we conducted

the Response Frequency Distribution. Response Frequency Distribution analysis is a useful


method because it provides a detailed picture of what the top scorers and low scorers answered

for each item; this analysis reflects the combination of the Distractor Analysis and I.D. analysis

(see Table 12).

Table 12

Response Frequency Distribution on Reading Comprehension Subtest

Item   High/Low Scorers   A   B   C   D

1 High 2* 1 0 0

Low 2* 1 0 0

2 High 0 3* 0 0

Low 0 1* 1 1

3 High 0 3* 0 0

Low 0 3* 0 0

4 High 0 0 3* 0

Low 0 0 2* 1

5 High 3* 0 0 0

Low 2* 0 0 1

6 High 1 0 1 1*

Low 1 1 0 1*

7 High 0 1 0 2*

Low 1 0 0 2*

8 High 1 0 2* 0

Low 2 0 1* 0

9 High 0 1 2* 0

Low 1 1 1* 0

10 High 1* 1 0 1

Low 2* 0 1 0

Note. (*) indicates the correct answer to the item.

As shown in Table 12, items 1, 3, 7, and 10 need to be revisited because these items did

not discriminate between the high scorers and the low scorers. For items 1, 3, and 7, there were

equal numbers of high scorers and low scorers who answered the items correctly; for item 10, as

mentioned previously, there were more low scorers who had the right answer than high scorers.

Item 6 also had equal numbers of high and low scorers, but in terms of Distractor Analysis, this

item is ideal. In addition, it is important to note that since we only selected the top three and


bottom three scorers, and because there are only four options, some cells in our Response Frequency Distribution data will still be “0” even when the distribution is fairly even, as is the case in items 2, 6, and 9. The results for items 2, 8, and 9 are quite interesting because a majority (if not all

three) of the top scorers answered these items correctly, while the low scorers were distracted by

other options. Referring back to Table 11, the items that were easy were items 3, 4, and 5. Thus,

we consider items 2, 8, and 9 as useful items to separate the performance of high and low

scorers. Finally, based on Table 12, it is worth revisiting items 4 and 5 because all three high

scorers as well as 2 out of 3 low scorers answered these items correctly.

Through several validity analyses, we have learned that some of the items—either the

questions or the options—need to be reviewed. Table 13 shows the overall summary of items

that need attention.

Table 13

Overall Summary of Items that Need Attention

Analysis                        Item(s)
Item Facility                   3, 4, and 5
Distractor Analysis             3, 4, and 5
Item Discriminability           1, 3, 6, 7, and particularly 10 (-0.30)
Response Frequency Analysis     1, 3, 7, and 10

Overall items that need attention: 1, 3, 4, 5, 7, and 10

Reliability

Aside from ensuring validity, it is important to examine whether a test is reliable.

According to Brown (2005), “test reliability is defined as the extent to which the results can be

considered consistent or stable” (p. 175). As mentioned above, the DCPT consists of five

subtests, of which the reading and grammar subtests are objectively scored. We consider the

listening subtest as a partially subjectively scored test because the scoring criteria adopted the

acceptable word method. Internal-consistency measures can only be applied to the objectively


scored tests. We also are not able to calculate the internal-consistency of the grammar subtest

because it is a cloze-elide test, and its items are dependent on one another. Therefore, we only

measured the internal-consistency reliability of the reading subtest, which is composed of 10 M-

C items. We used the split-half method and the Spearman-Brown prophecy formula to estimate

the full subtest reliability (see Table 14).

Table 14

Internal Consistency Measures of the Reading M-C Subtest

Subtest   Split-Half Reliability   Reliability after Spearman-Brown Prophecy Formula   Standard Deviation   SEM    Points Possible
Reading   0.56                     0.72                                                1.85                 0.98   10.00
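The 0.72 figure is the standard Spearman-Brown step-up from a half-test correlation to a full-test estimate; a minimal sketch (ours) reproducing the Table 14 value:

    def spearman_brown(half_reliability):
        # Step a split-half correlation up to full-test reliability.
        return 2 * half_reliability / (1 + half_reliability)

    print(round(spearman_brown(0.56), 2))  # split-half 0.56 -> 0.72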

The reliability result of 0.72 using the Spearman-Brown formula means that the scores of the reading M-C subtest are 72% consistent, with a 28% measurement error (100% − 72% = 28%). The statistical results also suggest that our reading subtest has good reliability in terms of internal consistency. We make this claim in consideration of the following factors:

1). The sample size of the test is small (only 10 test-takers).

2). The test has just been launched and is fairly new.

3). The total number of items (10) on the M-C subtest is small.

In addition, Brown (2005) points out that all the methods used to estimate the internal-

consistency reliability underestimate the actual value. Therefore, we are confident enough to

conclude that the reading M-C subtest of the DCPT is fairly reliable.


With the reliability estimate, we calculated the standard error of measurement (SEM) using the following formula, in which S stands for the standard deviation and rxx′ stands for the reliability estimate for the test (Brown, 2005, p. 189):

SEM = S × √(1 − rxx′)
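Plugging in the reading subtest’s figures from Table 14 (S = 1.85, reliability = 0.72) gives the value discussed below; a minimal sketch:

    import math

    def standard_error_of_measurement(sd, reliability):
        # SEM = S * sqrt(1 - r), per Brown (2005).
        return sd * math.sqrt(1 - reliability)

    print(round(standard_error_of_measurement(1.85, 0.72), 2))  # -> 0.98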

As the formula suggests, the SEM value is related to the internal consistency of the test. It

refers to the possible score range a test-taker can get if he/she takes the test repeatedly. In other

words, it expresses the precision of test scores. The SEM value we obtained is 0.98, which can

be rounded up to 1 point. Thus, if a test-taker gets a score of 7 on the reading subtest, his/her true

ability score lies with a certain level of probability in between 6 and 8. Considering the fact that

the reading subtest is worth 10 points and each item is worth 1 point, the 0.98 value of the

SEM is quite good and reasonable. Thus, our SEM value further supports the reliability of our

reading comprehension section.

Inter-rater reliability. Since the reliability of the reading section is confirmed, we now

proceed to examine the reliability of our subjectively scored tests—the oral and the essay writing

subtests. Using analytic scoring rubrics, we each rated the tests; upon scoring, the inter-rater

reliability was measured. We calculated the final score of each of the subjectively scored

sections by averaging our ratings (Rater 1’s rating and Rater 2’s rating). According to Bailey

(1998), coefficient alpha is usually used to compare the scoring of the two raters. To calculate

the coefficient alpha, the variance for each rater and the total variance for both raters were

computed (see Table 15 and Table 16).
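With two raters, coefficient alpha can be computed directly from the variance rows of Tables 15 and 16: alpha = k/(k − 1) × (1 − Σ rater variances ÷ variance of the summed ratings), with k = 2. A minimal sketch (ours) reproducing the oral test’s value:

    def coefficient_alpha(rater_variances, total_variance):
        # Coefficient alpha from per-rater variances and the variance of
        # the summed ratings.
        k = len(rater_variances)
        return (k / (k - 1)) * (1 - sum(rater_variances) / total_variance)

    # Oral test (Table 15): rater variances 29.25 and 27.60; total variance 102.65.
    print(round(coefficient_alpha([29.25, 27.60], 102.65), 2))  # -> 0.89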



Table 15

Inter-rater Reliability for Oral Test

Learner Rater 1 Rater 2 Rater 1 + Rater 2

1 30 33 63

2 38 38 76

3 28 24 52

4 17 22 39

5 30 27 57

6 34 34 68

7 30 37 67

8 23 26 49

9 28 32 60

10 27 27 54

Mean 29 30 59

Standard Deviation 5.41 5.25 10.13

Variance 29.25 27.60 102.65

Coefficient Alpha = .89

As Table 15 suggests, the standard deviation for Rater 1 was slightly higher than the

standard deviation of Rater 2, but the mean for Rater 1’s scores was marginally lower than the

mean for Rater 2’s scores. Thus, we can infer that Rater 1 was slightly tougher and had a little

more variability in scoring the oral test. Moreover, as shown in Table 15, the calculated

coefficient alpha was 0.89. Bailey (1998) mentions, “the closer the value is to the whole number

1.00, the greater the inter-rater reliability” (p. 182). Therefore, based on the coefficient alpha

value, the ratings on the oral test were quite reliable.


Table 16

Inter-rater Reliability for Essay Writing Test

Learner               Rater 1    Rater 2    Rater 1 + Rater 2
1                     77         82         159
2                     90         90         180
3                     85         85         170
4                     33         41         74
5                     62         61         123
6                     82         75         157
7                     92         85         177
8                     53         62         115
9                     59         54         113
10                    76         67         143
Mean                  70.9       70.2       141.10
Standard Deviation    17.81      15.07      32.43
Variance              317.29     226.96     1051.49

Coefficient Alpha = .96

Similar to the results shown in Table 15, the means and standard deviations in Table 16 indicate that Rater 1's ratings of the writing subtest were slightly more lenient, and slightly more variable, than Rater 2's. The coefficient alpha of 0.96 indicates an extremely high correlation between the two raters; in other words, their ratings were highly reliable.

Since the ratings of the two raters on both subjectively scored subtests are quite reliable, we attribute these results in part to the use of analytic scoring scales. The analytic scoring rubrics spell out the components of writing, such as content, organization, and grammar, in detail, so raters can easily identify the measurable concepts when evaluating the tests. It is also worth noting that the inter-rater reliability of the writing subtest is higher than that of the oral subtest. This difference may arise because the analytic scoring scale for the writing subtest is more specific and detailed than the one for the oral subtest.


Although the inter-rater reliability appears to be substantial, we acknowledge a limitation: we served as both test developers and raters. In real-life settings, raters are often not involved in test development. Although training is given to ensure rater reliability, raters sometimes have different perspectives on, and interpretations of, the scoring criteria. In our case, because we created the scoring criteria, we knew exactly what we were looking for. To ensure the successful application of the DCPT, we recommend that raters attend a rater conference, where they can be trained on how to score each criterion of the analytic scoring rubrics before they begin evaluating the test.

Subtest Relationships

To further strengthen the validity of our test, we evaluated the relationship between each pair of subtests by conducting a statistical analysis of the correlations among the subtests of the DCPT. As mentioned before, the DCPT consists of five subtests: listening comprehension (10 points), grammar (15 points), reading comprehension (10 points), writing (100 points), and the oral subtest (40 points), for a total of 175 points. At first, we used the raw-score formula to calculate Pearson's r, the correlation coefficient, between each pair of subtests. However, since the subtests use different scoring scales and point totals, we also converted all subtest scores to a standardized scale (z scores) and recalculated Pearson's r. As expected, the results were identical: Pearson's r is unaffected by linear transformations of the scores, and converting raw scores to z scores is exactly such a transformation.
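This invariance is easy to verify numerically. The sketch below uses hypothetical score vectors purely for illustration:

    def pearson_r(x, y):
        """Pearson correlation from deviations about the means."""
        n = len(x)
        mx, my = sum(x) / n, sum(y) / n
        cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
        sx = sum((a - mx) ** 2 for a in x) ** 0.5
        sy = sum((b - my) ** 2 for b in y) ** 0.5
        return cov / (sx * sy)

    def z_scores(x):
        """Standardize a score list to mean 0 and (population) SD 1."""
        n = len(x)
        m = sum(x) / n
        sd = (sum((a - m) ** 2 for a in x) / n) ** 0.5
        return [(a - m) / sd for a in x]

    # Hypothetical listening and reading scores for ten test-takers.
    listening = [6, 8, 5, 9, 7, 4, 8, 6, 7, 5]
    reading = [5, 9, 4, 8, 8, 3, 7, 6, 6, 5]
    print(round(pearson_r(listening, reading), 2))
    print(round(pearson_r(z_scores(listening), z_scores(reading)), 2))  # identical

Table 17 reflects the statistical results of the subtest relationships.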


Table 17

Subtest Relationships

Correlation Coefficients (Pearson's r)

             Oral     Writing   Reading   Grammar   Listening
Listening    0.57     0.67      0.79      0.58      -
Grammar      0.69     0.89      0.52      -         0.58
Reading      0.44     0.47      -         0.52      0.79
Writing      0.78     -         0.47      0.89      0.67
Oral         -        0.78      0.44      0.69      0.57

As evident in Table 17, the correlation coefficients are all positive. This positive correlation suggests that as scores on one subtest increase, scores on the other subtests tend to increase as well. Thus, a test-taker who improves his/her performance on listening comprehension may also perform better on any of the other four subtests.

Brown (2005) notes that "relatively strong correlations would be those that range from +0.80 to +1.0, or −0.80 to −1.0" (p. 141). The greatest value in Table 17 is 0.89, the correlation between the grammar and writing subtests. The correlation between the oral and writing subtests is 0.78, and that between the listening and reading subtests is 0.79; both round to 0.80. These three values indicate relatively strong correlations between the paired subtests in comparison with the other pairs. For example, the reading and writing subtests and the reading and oral subtests show relatively low correlations (0.47 and 0.44, respectively).

The high correlation between grammar and writing is expected, because half of the points in the writing section depend on the test-taker's grammatical competence. However, we cannot claim that our grammar section measures the same construct as the writing section. Oller (1979) argues that a low correlation does not indicate that two tests are measuring different constructs, nor does a high correlation indicate that two tests are measuring the same construct. In fact, many factors may affect the correlation between two tests. Oller also points out that "high correlations have been observed between a wide variety of testing techniques with a wide range of tested populations" (p. 193). Since our test is fairly new and our sample of test-takers is small, it is not surprising that the subtest correlations are not uniformly high.

To examine the extent to which two different subtests measure the same thing, we squared the correlation coefficients to obtain the values of overlapping variance. Table 18 shows the overlapping variance for each pair of subtests.

Table 18

r-squared for Subtest Relationships

Overlapping Variance (r squared)

             Oral     Writing   Reading   Grammar   Listening
Listening    0.32     0.45      0.62      0.34      -
Grammar      0.48     0.79      0.27      -         0.34
Reading      0.19     0.22      -         0.27      0.62
Writing      0.61     -         0.22      0.79      0.45
Oral         -        0.61      0.19      0.48      0.32

The highest value in Table 18, 0.79, is the overlapping variance between the grammar and writing subtests: the two sections share almost 80% of their variance, reflecting the shared construct of grammatical competence. As we mentioned previously, 50% of the points in the writing section depend on grammar. Based on our needs analysis of Dordt College ESL students and the English Department, Dordt College strongly values students' grammatical competence and writing skills. We were therefore glad to see such a strong relationship between the grammar and writing sections of the DCPT.
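The overlapping-variance values follow mechanically from Table 17; a short sketch using four of the reported coefficients shows the transformation:

    # Squaring each Pearson's r from Table 17 reproduces the Table 18 values.
    pearson = {
        ("grammar", "writing"): 0.89,
        ("listening", "reading"): 0.79,
        ("oral", "writing"): 0.78,
        ("reading", "oral"): 0.44,
    }
    overlap = {pair: round(r ** 2, 2) for pair, r in pearson.items()}
    print(overlap)  # 0.79, 0.62, 0.61, and 0.19, matching Table 18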


Discussion

According to Oller (1979), there are four traditional criteria for evaluating tests: validity, reliability, practicality, and washback. First, the validity of a test refers to "how well the test does what it is supposed to do, namely, to inform us about the examinee's progress toward some goal in a curriculum [...] or to differentiate levels of ability among various examinees on some task" (p. 4). In terms of face validity, although a few items from the M-C subtest need improvement, we feel that the newly revised DCPT as a whole has face validity, because the test was designed with careful consideration of the language constructs to be tested. Moreover, we made sure that the overall difficulty level of the entire test was appropriate, the instructions were clear, and the tasks were uncomplicated. It is also worth mentioning that we pre-piloted and piloted this test with the current international students taking EAP classes at Dordt College.

According to Mousavi (1999), a test is considered valid when its content measures the language skills and structures it is intended to measure. To ensure that the content of our test is valid, especially for Dordt College, we reviewed and adapted all of the skills tested on the original DCPT. In addition, our needs analysis showed that the English Department at Dordt, as well as past and current international students, considered grammatical competence an important language skill. Therefore, with Alderson, Clapham, and Wall's (1995) guidance on developing test specifications, we identified five constructs for the new placement test: listening comprehension, grammatical knowledge, reading comprehension, writing ability, and oral skills. We also used Wesche's (1983) four-components framework (see Appendix G) and Swain's (1984) four principles of communicative language test development (see Appendix H) as guidance when developing and validating the content of our test. Finally, since the target population for this test is incoming international students who are language learners, we selected "language learning" as the overarching theme of the exam.

Oller's (1979) second criterion for evaluating tests is reliability. Oller states that "reliability of a test is a matter of how consistently it produces similar results on different occasions under similar circumstances" (p. 4). Similarly, Baker (1989) defines reliability as "stability in the measure" (p. 60). Based on the several reliability analyses we conducted, it is safe to claim that the DCPT is a reliable test for assessing incoming international students' overall English language abilities. Using the split-half method, we confirmed the reliability of the reading M-C subtest. To examine the reliability of the subjectively rated scores, we used coefficient alpha to calculate inter-rater reliability and found that the ratings of the two raters were highly reliable. Finally, we evaluated the subtest relationships to determine the strength of the correlation between each pair of subtests; the results showed positive correlation coefficients for every pair.

The third criterion we used to evaluate our test is practicality. According to Oller (1979), practicality includes the "preparation, administration, scoring, and interpretation of the test" (p. 4). To ensure the practicality of our test, we pre-piloted and piloted the exam to gauge and adjust its time limits. We also referred to the test administration specification sent by the EAP instructor, which we adapted and modified accordingly (see Appendix A for Test Administration Procedures). Finally, we made sure that the entire exam, especially the oral interview subtest, was not too lengthy, both for the benefit of the students and to ease scoring and interpretation, as well as to lessen the burden on the volunteer students and faculty members involved in test administration and scoring.


Finally, it is important to consider the washback of a test, "the effect a test has on teaching and learning" (Bailey, 1998, p. 249). Applying Swain's (1984) four principles of communicative language test development, the newly revised DCPT has the following washback for each subtest, as shown in Table 19.

Table 19

Washback of Subtests: Applying Swain's (1984) Framework

Listening Comprehension. The test-takers can:
- Experience a situation like a real academic lecture.
- Practice note-taking skills.

Grammar. The test-takers can:
- Learn to pay attention to the details of reading passages.
- Notice the meanings associated with grammatical forms.

Reading Comprehension. The test-takers can:
- Expand their vocabulary knowledge.
- Learn to use context to interpret the meanings of words.
- Identify the main ideas of the readings.

Mini-Essay Writing. The test-takers can:
- Write in a simulated academic context.
- Compose an argumentative essay.
- Incorporate sufficient sources into their writing.

Oral Interview. The test-takers can:
- Experience a simulated academic presentation.
- Give a persuasive speech.


Conclusion

In the process of designing the DCPT, we used Wesche's (1983) four-components framework (see Appendix G) and Swain's (1984) four principles of communicative language test development (see Appendix H) to ensure the quality of the test. Test specifications were also used to guide the development process and to establish good comparability of scores across test forms (Alderson, Clapham, & Wall, 1995). Furthermore, we pre-piloted the test with three native (or near-native) English speakers before piloting it with students from the target group, current Dordt College ESL students. The pre-piloting stage allowed us to revisit some of our test items and to readjust the time allotted to each subtest, increasing the practicality of the test. Thanks to the support of the ESL Department of Dordt College, our test was piloted in the environment where it would actually be adopted and administered. The results of the DCPT were therefore highly informative and indicative of its future performance.

The latter sections of this report discussed the validity and reliability results of the DCPT. In summary, the overall test is valid because its content measures the language skills and structures it is intended to measure (Mousavi, 1999). We conducted four statistical analyses (item facility, distractor analysis, item discrimination, and response frequency distribution) to test the quality of the M-C items on the reading subtest. Although the statistical results show that some items need to be revisited to strengthen the overall validity of the reading subtest (see Table 13), we can confirm its overall validity, especially considering that this is a new test. In addition, our reliability procedures yielded positive results, showing substantial inter-rater reliability between the raters. Therefore, based on all the statistical results, we can safely conclude that the DCPT is valid and reliable for its use as a placement test. We can also confirm the practicality of the DCPT because we designed the test carefully and conducted several rounds of piloting. Finally, we ensured the washback of the DCPT by applying Swain's (1984) four principles of communicative language test development.


References

Alderson, J. C., Clapham, C., & Wall, D. (1995). Language test construction and evaluation.

Cambridge: Cambridge University Press.

Angeli, E., Wagner, J., Lawrick, E., Moore, K., Anderson, M., Soderlund, L., & Brizee, A.

(2012, May 30). General format. Retrieved from

http://owl.english.purdue.edu/owl/resource/560/01/

Bahanoo, S. (2012, April 3). How immersion helps to learn a new language. The New York

Times. Retrieved from http://www.nytimes.com/2012/04/03/science/how-immersion-

helps-to-learn-a-new-language.html

Bailey, K. M. (1998). Learning about language assessment: Dilemmas, decisions, and

directions. Boston, MA: Heinle & Heinle Publishers.

Baker, D. (1989). Language testing: A critical survey and practical guide. London: Edward Arnold.

Brown, H. D. (2010). Language assessment: Principles and classroom practices. New York, NY: Pearson Education.

Brown, J. D. (2005). Testing in language programs: A comprehensive guide to English language assessment. New York, NY: McGraw-Hill.

Deutscher, G. (2010, August 26). Does your language shape how you think? The New York

Times. Retrieved from http://www.nytimes.com/2010/08/29/magazine/29language-

t.html?pagewanted=all

Ferris, D., & Hedgcock, J. (Forthcoming). Teaching L2 composition: Purpose, process, and

practice (3rd ed.). New York, NY: Routledge.

Mousavi, S. A. (1999). A dictionary of language testing. Tehran: Rahnama Publications.

Oller, J. W. (1979). Language tests at school. London: Longman Group.

Richards, J. C., & Schmidt, R. (2010). Longman dictionary of language teaching and applied linguistics. London, UK: Pearson Education Limited.

Swain, M. (1984). Large-scale communicative language testing: A case study. In S. J. Savignon

& M. Berns (Eds.), Initiatives in communicative language teaching (pp. 185–201).

Reading, MA: Addison-Wesley.

Walker, J. (2009, February). Jay Walker: World’s English mania [Video file]. Retrieved from

http://www.ted.com/talks/lang/en/jay_walker_on_the_world_s_english_mania.html

Vargo, M., & Blass, L. (2013). Pathways 1: Reading, writing, and critical thinking. Boston, MA: National Geographic Learning.

Wesche, M. B. (1983). Communicative testing in a second language. Modern Language Journal, 67, 41–55.


Appendices

Appendix A: New Logistics Guide

Instructions for Logistics Team for Test Administration

DATE

NAMES OF ADMINISTRATORS

9:00 AM – 12:00 noon

General Information

1. There will be a reception desk in front of the circulation desk in the library. All students

taking the interview will be told to report to the reception desk. Your team members will

be at that desk to welcome students as they arrive. There will also be a few chairs for

students who may have to wait a minute for your attention.

2. You will have a schedule with a list of the interview rooms, interview teams, interview

times, and student names. As each student comes to the reception desk, you will find the

interview station where they are expected and bring them to the door/entrance.

3. After each part of the test, students will be directed to return to the reception desk, where one of you will take the student to the appropriate place for the next part.

4. Here follows a list of locations for the various parts of the interview:

Part I: Listening: watching the TED Talk video "World's English Mania" and answering questions; computer bank in the TRC. Each student will need a set of headphones, available from a librarian at the circulation desk.

Part II, III, IV: Grammar, Reading, & Essay Writing: Reading of article and answering

objective questions: large tables or individual chairs with writing surface on the right

side of the Teaching Resource Center (TRC). Place one student at each table or chair.

Be sure each student has plenty of elbow space and privacy.

Part V: Oral Interview: assigned station; see schedule.

Turn to the next page


Specific Procedure

1. Part I: Listening: Give students pages 1, 2, and 3. Take students to one of the

computers in the TRC and give them a set of headphones. Point out and remind students

to carefully read the instructions on pages 2 and 3. Also point out that page 2 should be

used for taking notes on the video presentation. Tell students that they have 10 minutes to

complete this section. Note the starting and stopping times. Collect the pages and place

in the student folder, from which I will retrieve them. Guide them to return to the

reception desk.

2. Part II & III: Grammar & Reading: When students return to the reception desk, give

them a copy of pages 4, 5, 6, 7, and 8. Bring students to a table or chair in the Teaching

Resources Center (TRC). Instruct the student to read the instructions on each page carefully. Tell the students they have 25 minutes to complete Parts II and III. Write down

the student starting and stopping times on the sheet provided. Collect the pages and place

in the student folder, from which I will retrieve them.

3. Part IV: Essay Writing: At the same location (TRC), give students a copy of pages 9

and 10. Remind students to read all of the instructions before they begin to write.

Remind them to write at least 180 words. Students have 20 minutes to complete this

essay. Record starting and stopping times. If the student does not come to you, you

should go to the student and inform him or her that time is up. When the student has

completed the essay, collect the pages and place in the student folder, from which I will

retrieve them. Ask them to sign up for an oral interview.

4. Photocopies and distribution—As soon as you have each student’s essay/paragraph,

make two copies (the reference librarian will give you money) and bring the three copies

of the paragraph to the team that interviewed the student. (Be sure the team is not in the

middle of an interview with another student.) If the team is waiting till all paragraphs are

done, store the three copies in the student folder and make sure the copies get to the right

place.

5. Part V: Oral Interview: As students come in, they will be helped right away, or asked to

take a seat until a team member is available. Greet the student and ask for the student’s

name. Check your schedule to see where the student will be interviewed and accompany

the student to the entrance of the interview station. The interview team will take over

from there.

THANK YOU!


Appendix B: Original Logistics Guide

Instructions for Logistics Team for EIIS Administration August 24, 2012

Kerrie Best ,Fanny Gonzales Garcia, Giovi Romero, Yushin Tsai

9:00 AM – 12:00 noon

General Information

1. There will be a reception desk in front of the circulation desk in the library. All students

taking the interview will be told to report to the reception desk. You team members will

be at that desk to welcome students as they arrive. There will also be a few chairs for

students who may have to wait a minute for your attention.

2. You will have a schedule with a list of the interview rooms, interview teams, interview

times, and student names. As each student comes to the reception desk, you will find the

interview station where they are expected and bring them to the door/entrance.

3. After the first part of the EIIS, the oral interview, students will be directed to return to the

reception desk, where one of you will take the student to the appropriate place for the

next part of the interview.

4. Here follows a list of locations for the various parts of the interview:

a. Part I: Oral Interview: assigned station; see schedule

b. Part II: Reading of article and answering objective questions: large tables or

individual chairs with writing surface on the right side of the Teaching Resource

Center (TRC). Place one student at each table or chair. Be sure each student has

plenty of elbow space and privacy.

c. Part III: watching video lecture and answering questions: computer bank in

TRC. Each student will need a set of head phones. These are available from a

librarian at the circulation desk.

d. Part IV: writing prompt: large tables or individual chairs with writing surfaces on

right hand side of TRC

Specific Procedure

1. Part I--As students come in, they will be helped right away, or asked to take a seat

until a team member is available. Greet the student and ask for the student’s name.

Check your schedule to see where the student will be interviewed and accompany the

student to the entrance of the interview station. The interview team will take over

from there.

2. Part II—When students return to the reception desk, give them a copy of pages 9 and

10. Bring students to a table or chair in the Teaching Resources Center (TRC).

Instruct the student to read the instructions on both pages carefully. Tell the students

they have 20 minutes to complete this part of the interview. Write down the student

starting and stopping times on the sheet provided. If students do not return to you 20

minutes after they have started, go to the CRC and politely inform them that time is

up. Collect the pages and place in the student folder, from which I will retrieve them.

3. Part III—Give students pages 16, 17, and 18. Take students to one of the computers

in the TRC and give them a set of headphones. Point out and remind students to

carefully read the instructions on page 18. Also point out that page 17 should be used


for taking notes on the video-taped lecture and that the chart on page 16 is a copy of a

chart shown briefly during the lecture. Tell students that they have 20 minutes to

complete this section. Note the starting and stopping times. Again, if the student

does not come to you after 20 minutes, you should go to the student. You should

collect the answer sheet, page 18, but instruct the student to keep the chart and the

notes for use with the final part of the interview.

4. Part IV—Take students, with their pages 16 and 17, back to a table or chair in the

TRC. Give them copies of pages 20 and 21 (this is one sheet that has the instructions

and a lined area for writing the essay). Remind students to read all of the instructions

before they begin to write. Remind them also that they can refer to their notes.

Students have 30 minutes to complete this final part of the interview. Record starting

and stopping times. Ask students to bring their completed “mini-essay” to one of the

team at the reception desk. As always, if the student does not come to you, you

should go to the student and inform him or her that time is up. When the student has

completed this final part of the interview, please direct him or her to the Commons

for lunch.

5. Photocopies and distribution—As soon as you have each student’s essay/paragraph,

make two copies (the reference librarian will give you money) and bring the three

copies of the paragraph to the team that interviewed the student. (Be sure the team is

not in the middle of an interview with another student.) If the team is waiting till all

paragraphs are done, store the three copies in the student folder and make sure the

copies get to the right place.

THANK YOU! THANK YOU! THANK YOU!


Appendix C: Oral Interview Schedule Sample

INTERVIEW SCHEDULE

ENTRANCE INTERVIEW FOR INTERNATIONAL/ESL STUDENTS

Friday August 24, 2012, John and Louise Hulst Library, Upper Level

Oral interview stations and teams:
Room 262: L. Van Beek, J. Versluis, C. Hentges
Room 263: H. Schaap, D. Roth, S. Groneck
Room 264: L. Zuidema, B. Kuiper, K. Sandouka
Alcove: M. Dengler, N. Van Gaalen, A. Foreman
Reference Corner: S. Taylor, I. Mulder, M. Drissel

Oral interview begins:

9:00 AM (10:30)
Room 262: Ivy Mang'eli (Kenya, ex.)
Room 263: Winnie Obiero (Kenya, fr.)
Room 264: Yonatan Ashenafi (Ethiopia, fr.)
Alcove: Henry Murray (Panama, tr.)
Reference Corner: Juan Benitez Gonzalez (Paraguay, fr.)

9:20 AM (10:50)
Room 262: Alba Garcia Macias (Mexico, fr.)
Room 263: Eun Hye Jee (South Korea, ex.)
Room 264: Young In Kim (South Korea, ex.)
Alcove: Eui Shin Kim (South Korea, ex.)
Reference Corner: Ju Eun Park (South Korea, ex.)

9:40 AM (11:10)
Room 262: Bit Null Ryu (South Korea, ex.)
Room 263: Jung Eun Sun (South Korea, ex.)
Room 264: Fortunate Magara (Uganda, ex.)
Alcove: David Baldusi Alves (Brazil, fr.)
Reference Corner: Ji Eun Kim (South Korea, ex.)

10:00 AM (11:30)
Room 262: Carolyne Muthoni (Kenya, fr.)
Room 263: Dong Hyun Park (South Korea, fr.)

There will be a reception desk in front of the main circulation desk of the library and a

logistics team to welcome and move our students to and from various parts of the interview.

Team members are: Kerrie, Giovi, Yuhsin, Fanny, and Sanneke Kok, Coordinator of

Academic Services for International Students.


Appendix D: Answer Key with Scoring Criteria

I. Listening Comprehension: Watch the video “English Mania” presented by Jay Walker from

Ted Talk (about 4 minutes) and answer the following questions. Please answer in less than 50

words (ALL ANSWERS MUST BE IN COMPLETE SENTENCES EXCEPT FOR QUESTIONS 2

& 3)

Transcript1

Let's talk about manias. Let's start with Beatle mania: hysterical teenagers, crying, screaming,

pandemonium. Sports mania: deafening crowds, all for one idea -- get the ball in the net. Okay,

religious mania: there's rapture, there's weeping, there's visions. Manias can be good. Manias can

be alarming. Or manias can be deadly.

The world has a new mania. A mania for learning English. Listen as Chinese students practice

their English by screaming it.

Teacher: ... change my life!

Students: I will change my life.

T: I don't want to let my parents down.

S: I don't want to let my parents down.

T: I don't ever want to let my country down.

S: I don't ever want to let my country down.

T: Most importantly ... S: Most importantly ...

T: I don't want to let myself down.

S: I don't want to let myself down.

Jay Walker: How many people are trying to learn English worldwide? Two billion of them.

Students: A t-shirt. A dress.

JW: In Latin America, in India, in Southeast Asia, and most of all in China. If you are a Chinese

student you start learning English in the third grade, by law. That's why this year China will

become the world's largest English-speaking country. (Laughter) Why English? In a single word:

Opportunity. Opportunity for a better life, a job, to be able to pay for school, or put better food

on the table. Imagine a student taking a giant test for three full days. Her score on this one

test literally determines her future. She studies 12 hours a day for three years to prepare. 25

percent of her grade is based on English. It's called the Gaokao, and 80 million high school

Chinese students have already taken this grueling test. The intensity to learn English is almost

unimaginable, unless you witness it.

Teacher: Perfect! Students: Perfect!

T: Perfect! S: Perfect!

T: I want to speak perfect English.

S: I want to speak perfect English.

T: I want to speak -- S: I want to speak --

T: perfect English. S: perfect English.

T: I want to change my life!

S: I want to change my life!

1 Walker, J. (2009, February). Jay Walker: World’s English mania [Video file]. Retrieved from

http://www.ted.com/talks/lang/en/jay_walker_on_the_world_s_english_mania.html


JW: So is English mania good or bad? Is English a tsunami, washing away other languages? Not

likely. English is the world's second language. Your native language is your life. But with

English you can become part of a wider conversation: a global conversation about global

problems, like climate change or poverty, or hunger or disease. The world has other universal

languages. Mathematics is the language of science. Music is the language of emotions. And now

English is becoming the language of problem-solving. Not because America is pushing it, but

because the world is pulling it. So English mania is a turning point. Like the harnessing of

electricity in our cities or the fall of the Berlin Wall, English represents hope for a better future --

a future where the world has a common language to solve its common problems.

Short-answer questions: Spelling errors are allowed. Deduct 1 pt when a sentence is not complete, except for questions 2 & 3 (2 pts per question = 10 pts total)

1. In your own words, define the word “mania.”

*Acceptable words: enthusiasm, passion, craze, popular trend generating wide

enthusiasms, hysteria, craziness, alarming

2. How many people are trying to learn English worldwide?

*Answer: 2 (two) billion

3. Name at least 3 countries/regions that the speaker mentioned as having manias for English.

*Answer: Latin America, India, Southeast Asia, and China

4. According to the speaker, why are so many people trying to learn English?

*Answer: 2pt: opportunity, for better life, hope, language of problem solving, world’s

second language (full credit); 1pt: “acceptable words”: job, pay for school, put better

food on the table, academic achievement; 0 pt: no mention of any of the words

5. What is the speaker’s opinion on English mania?

*Answer: English mania is more “positive” than negative; English mania is positive; it is

a “turning point”=2 pts; no mention of “good”=0 pts.


II. Grammar: The passage was taken from the New York Times newspaper, published on April

3, 2012. Read the following passage and cross out 15 “extra” words that make the sentences

grammatically incorrect (15 pts total).

Example: The boys is are singing the national anthem.

“How Immersion Helps to Learn a Language”2

Answer key: the crossed-out words are shown in [brackets].

Learning (1) a [the] foreign language is never easy, but contrary to common wisdom, it is possible for adults to process a language (2) [a] the same way (3) a [the] native speaker does. And over time, processing improves even when the skill goes unused, researchers are reporting.

For (4) [there] their study, (5) in [on] the journal PloS One, the scientists used an artificial language of 13 words, completely different from English. "It's totally (6) [unpractical] impractical to follow someone to high proficiency because it takes years and years," said the lead author, Michael Ullman, a neuroscientist at Georgetown University Medical Center.

The language dealt with pieces and moves in (7) a [the] computer game, and the researchers tested proficiency by asking test subjects to play (8) [a] the game.

The subjects (9) [are] were split into two groups. One group studied the language in a formal classroom setting, while the other (10) was [were] trained through immersion.

After five months, both groups retained the language (11) even though [because] they had not used it at all, and both displayed brain processing similar to that of a native speaker. But the immersion group displayed the full brain patterns (12) [for] of a native speaker, Dr. Ullman said.

The research has several applications, Dr. Ullman said.

"This should help us understand how foreign-language learners can achieve native like processing with (13) [increase] increased practice," he said. "It makes sense that you'd want to have your brain process like (14) a [the] native speaker."

And though it may (15) take [takes] time, and more research, the work "also could or should help in rehabilitation of people with traumatic brain injury," he added.

2 Bahanoo, S. (2012, April 3). How immersion helps to learn a new language. The New York Times. Retrieved from

http://www.nytimes.com/2012/04/03/science/how-immersion-helps-to-learn-a-new-language.html


III. Reading Comprehension (10 pts; 1 pt each)

Passage 1: “The World’s Oldest First Grader”

1. Based on the passage, we can infer that before 2003, primary education in Kenya was:

a. Not cheap

b. Not available

c. Prohibited

d. Free

2. Why was Maruge motivated to study?

a. To be in one of the top five students in his class.

b. To use his education to read the Bible.

c. To become the school’s student leader.

d. To study Swahili, English, and math.

3. Who did NOT want Maruge to be in school?

a. Kenyan government

b. First grade parents

c. Jane Obinchu

d. None of the above

4. The main idea in paragraph (E) is:

a. People were fighting and burning houses in the village.

b. It was too difficult to live in a tent at a refugee camp.

c. Maruge did not stop studying, even during those difficult times.

d. Maruge taught other residents of the home to read and write.

5. The main idea in paragraph (G) is:

a. Maruge was an inspiration to other adult Kenyans.

b. Kenyans enjoyed the movie The First Grader.

c. Thoma Litei decided to go to school to learn.

d. The First Grader was created after Maruge’s death.

Passage 2:

1. The author’s attitude to Whorf’s theory is

a. Ambivalent

b. Neutral

c. Supportive

d. Contemptuous


2. The word trauma in the passage is closest in meaning to

a. Physical injury

b. Torture

c. Emergency

d. Agony

3. All of the following can be inferred from the text EXCEPT

a. Learning our mother tongue can lead to positive experiences.

b. The influence of mother tongue on our thoughts is significant.

c. Whorf’s theory was based on hard facts and solid common sense.

d. Whorf failed to provide any evidence to support his theory.

4. The author uses the word crash-landed to imply that Whorf’s theory was _________ hard facts

and solid common sense.

a. in favor of

b. based on

c. inconsistent with

d. critical of

5. Which of the sentences below best expresses the essential information in the boldfaced

sentence in the passage?

a. Exploring the relationship between the mother tongue and our thoughts was

frowned upon for decades.

b. People reacted severely and they explored the relationship between the mother tongue

and our thought.

c. Whorf’s theory succeeded in exploring the relationship between the mother tongue and

our thoughts.

d. Whorf’s claims were so credible that no researcher made an attempt to dishonor Whorf

for decades.


IV. Mini-essay writing: Write a mini-essay of about 180-250 words according to the following prompt. You will be scored on the following criteria: content, organization, and grammar.

Do you think learning English is important? Why or why not? Please provide personal examples to support your stance (Total 100 pts).

Content (circle the appropriate score)
- Clearly relates or answers to the given topic or question: Clear 5—4—3—2—1—0 Missing
- Gives sufficient examples/references: Sufficient 5—4—3—2—1—0 Lacking
- Clear connection between examples/references and main ideas: Clear 5—4—3—2—1—0 Missing
- Correct use of vocabulary words: Correct 5—4—3—2—1—0 Incorrect
- Sufficient number of words: 180-250 words: 5; 160-179 words: 4; 140-159 words: 3; 120-139 words: 2; 100-119 words: 1; fewer than 100 words: 0

Subtotal: points for content ________/25

Organization (circle the appropriate score)
- Topic or introductory sentence: Clear 5—4; Not Clear 3—2—1; Missing 0
- Concluding sentence: Clear 5—4; Not Clear 3—2—1; Missing 0
- Coherence (logical progression and development of ideas, good flow): Always 5—4; Sometimes 3—2—1; Never 0
- Cohesion (good connections between sentences): Always 5—4; Sometimes 3—2—1; Never 0
- Sentence variety (both simple and compound and/or complex): Good Variety 5—4; Some Variety 3—2—1; None 0

Subtotal: points for organization ________/25

Grammar (take off one point for each error in the categories indicated; circle the number of remaining points)
- Correct spelling (subtract 1 pt per new error): 5 4 3 2 1 0
- Correct use of articles and prepositions: 5 4 3 2 1 0
- Standard capitalization: 5 4 3 2 1 0
- Standard punctuation (periods, commas, semicolons): 5 4 3 2 1 0
- Standard sentence word order: 5 4 3 2 1 0
- Agreement between subjects & verbs, nouns and pronouns/antecedents: 5 4 3 2 1 0
- Correct verb tense and usage: 5 4 3 2 1 0
- Correct adverb and adjective usage: 5 4 3 2 1 0
- Appropriately placed phrasal modifiers: 5 4 3 2 1 0
- Standard academic diction (avoidance of slang and informal language): 5 4 3 2 1 0

Subtotal: points for grammar ________/50

TOTAL POINTS ________/100


V. Oral Interview: 40 pts total

In the United States, many universities require students to learn an additional language other than their native language.

Do you think universities in your home country should require students to learn an additional language (other than your native language)? Why or why not?

You have 2 minutes to prepare. Use the space below to write down an outline or important points that you want to discuss. You will be given a maximum of 3 minutes to answer the question. You may use your notes as you talk, but do not read aloud what you have written. Please relate the issue to your personal experience and cultural background.

*For this subjectively scored portion, the following criteria will be assessed: Content, Accuracy, and Fluency. We will use an analytic scale:

Oral Interview Criteria

Content (circle the appropriate score)
- Clearly relates or answers to the given topic or question: Clear 5—4—3—2—1—0 Missing
- Gives adequate and meaningful examples/references: Sufficient 5—4—3—2—1—0 Lacking
- Clear connection between examples/references and main ideas: Clear 5—4—3—2—1—0 Missing
- Correct use of vocabulary words: Correct 5—4—3—2—1—0 Incorrect

Accuracy
- Correct use of grammar: Correct 5—4—3—2—1—0 Incorrect
- Clear pronunciation of words: Clear 5—4; Not Clear 3—2—1; Missing 0

Fluency
- Coherence (logical progression and development of ideas, good flow): Always 5—4; Sometimes 3—2—1; Never 0
- Fluency in speech (little circumlocution and hesitation): Fluent 5—4; Somewhat Fluent 3—2—1; Not Fluent 0

TOTAL POINTS ________/40


Appendix E: Dordt College Placement Test (DCPT)

Instructions: The placement test consists of 5 sections (about 1 hour total).

For the listening comprehension section, the test instructor will play a short video.

I. Listening Comprehension (10 minutes)

II. Grammar (5 minutes)

III. Reading Comprehension (20 minutes)

IV. Mini-Essay Writing (20 minutes)

After you complete the first four sections, submit your test to the test instructor and schedule a time for the oral interview section. The test instructor will provide you with the oral interview section of the test. You will be interviewed individually, and the interview will be audio-recorded.

V. Oral Interview (about 5 minutes)

Name: ___________________


I. Listening Comprehension: Short-Answer Questions

Watch the video “World’s English Mania” presented by Jay Walker from Ted Talk (about 4

minutes).3 Use the space below to take notes. After watching the video, answer the following five

questions. Please answer in less than 50 words (ALL ANSWERS MUST BE IN COMPLETE

SENTENCES EXCEPT FOR QUESTIONS 2 & 3)

3 Walker, J. (2009, February). Jay Walker: World’s English mania [Video file]. Retrieved from

http://www.ted.com/talks/lang/en/jay_walker_on_the_world_s_english_mania.html

Use this space to take notes

TURN TO NEXT PAGE FOR QUESTIONS


Please answer in less than 50 words (ALL ANSWERS MUST BE IN COMPLETE

SENTENCES EXCEPT FOR QUESTIONS 2 & 3)

1. In your own words, define the word “mania.”

2. How many people are trying to learn English worldwide?

3. Name at least 3 places (countries or regions) that the speaker mentioned that HAVE

manias for English?

4. According to the speaker, why are so many people trying to learn English?

5. What is the speaker’s opinion of English manias?

II. Grammar: The passage was taken from the New York Times newspaper, published on April


3, 2012.4 Read the following passage and cross out 15 “extra” words that make the sentences

grammatically incorrect.

Example: The boys is are singing the national anthem.

“How Immersion Helps to Learn a Language”

Learning a the foreign language is never easy, but contrary to common wisdom, it is possible for

adults to process a language a the same way a the native speaker does. And over time, processing

improves even when the skill goes unused, researchers are reporting.

For there their study, in on the journal PloS One, the scientists used an artificial language of 13

words, completely different from English. “It’s totally unpractical impractical to follow someone

to high proficiency because it takes years and years,” said the lead author, Michael Ullman, a

neuroscientist at Georgetown University Medical Center.

The language dealt with pieces and moves in a the computer game, and the researchers tested

proficiency by asking test subjects to play a the game.

The subjects are were split into two groups. One group studied the language in a formal

classroom setting, while the other was were trained through immersion.

After five months, both groups retained the language even though because they had not used it at

all, and both displayed brain processing similar to that of a native speaker. But the immersion

group displayed the full brain patterns for of a native speaker, Dr. Ullman said.

The research has several applications, Dr. Ullman said.

“This should help us understand how foreign-language learners can achieve native like

processing with increase increased practice,” he said. “It makes sense that you’d want to have

your brain process like a the native speaker.”

And though it may take takes time, and more research, the work “also could or should help in

rehabilitation of people with traumatic brain injury,” he added.

4 Bahanoo, S. (2012, April 3). How immersion helps to learn a new language. The New York Times. Retrieved from

http://www.nytimes.com/2012/04/03/science/how-immersion-helps-to-learn-a-new-language.html


III. Reading Comprehension:

Passage 1: The passage was taken from National Geographic Learning.5 Read the passage

below and answer the multiple-choice questions following the passage. Circle the letter of the

best answer.

“The World’s Oldest First Grader”

(A) On January 12, 2004, Kimani Maruge knocked on the door of the primary school in his

village in Kenya. It was the first day of school, and he was ready to start learning. The

teacher let him in and gave him a desk. The new student sat down with the rest of the first

graders—six- and seven-year-old boys and girls. However, Kimani Maruge was not an

ordinary first grader. He was 84 years old—the world’s oldest first grader.

(B) Kimani Maruge was born in Kenya in 1920. At that time, primary education in Kenya

was not free, and Maruge’s family didn’t have enough money to pay for school. When

Maruge grew up, he worked hard as a farmer. In the 1950s, he fought with other Kenyans

against the British colonists. After years of fighting, Kenya became independent in 1963.

(C) In 2003, the Kenyan government began offering free primary education to everyone, and

Maruge wanted an education, too. However, it wasn’t always easy for Maruge to attend

school. Many of the first graders’ parents didn’t want an old man in their children’s class.

School officials said that a primary education was only for children. But the school

principal, Jane Obinchu, believed Maruge was right. With her help, he was able to stay in

school.

(D) Maruge was a motivated and successful student. In fact, he was one of the top five

students in his first grade class. In second grade, Maruge became the school’s student

leader. He went as far as seventh grade, the final year of primary school. Over the years,

Maruge studied Swahili, English, and math. He wanted to use his education to read the

Bible and to study veterinary medicine.

(E) In 2008, there were problems in Kenya after an election. People were fighting and

burning houses in Maruge’s village. Maruge moved to a refugee camp for safety and

lived in a tent. However, even during those difficult times he continued to go to school.

Later that year, he moved to a home for the elderly. He continued going to school, and

even taught other residents of the home to read and write.

(F) In 2005, Maruge flew in a plane for the first time in his life. He traveled to New York

City, where he gave a speech at the United Nations. He spoke about the importance of

education and asked for help to educate the people of Kenya. Maruge also wanted to

improve primary education for children in Africa.

5 The passage was printed in Vargo, M. & Blass, L. (2013). Pathways 1: Reading, writing, and critical thinking.

Boston: National Geographic Learning.



(G) Maruge died in 2009, at age 89. However, his story lives on. The 2010 movie The First

Grader showed Maruge’s amazing fight to get an education. Many older Kenyans

decided to start school after seeing The First Grader. One of those people was 19-year-

old Thoma Litei. Litei said, “I knew it was not too late. I wanted to read, and to know

more language, so I came [to school] to learn. That is why it is important for his story to

be known.”

1. Based on the passage, we can infer that before 2003, primary education in Kenya was:

a. Not cheap

b. Not available

c. Prohibited

d. Free

2. Why was Maruge motivated to study?

a. To be in one of the top five students in his class.

b. To use his education to read the Bible.

c. To become the school’s student leader.

d. To study Swahili, English, and math.

3. Who did NOT want Maruge to be in school?

a. Kenyan government

b. First grade parents

c. Jane Obinchu

d. None of the above

4. The main idea in paragraph (E) is:

a. People were fighting and burning houses in the village.

b. It was too difficult to live in a tent at a refugee camp.

c. Maruge did not stop studying, even during those difficult times.

d. Maruge taught other residents of the home to read and write.

5. The main idea in paragraph (G) is:

a. Maruge was an inspiration to other adult Kenyans.

b. Kenyans enjoyed the movie The First Grader.

c. Thoma Litei decided to go to school to learn.

d. The First Grader was created after Maruge’s death.


Passage 2: The following extract was taken from the article “Does Your Language Shape How

You Think?" published in The New York Times Magazine.6 Read the passage below and answer

the multiple-choice questions following the passage. Circle the letter of the best answer.

Benjamin Lee Whorf's theory crash-landed on hard facts and solid common sense, when it transpired(1) that there had never actually been any evidence to support his fantastic claims. [Boldfaced in the original:] The reaction was so severe that for decades, any attempts to explore the influence of the mother tongue on our thoughts were relegated(2) to the loony(3) fringes(4) of disrepute(5). But 70 years on, it is surely time to put the trauma of Whorf behind us. And in the last few years, new research has revealed that when we learn our mother tongue, we do after all acquire certain habits of thought that shape our experience in significant and often surprising ways.

1. The author’s attitude to Whorf’s theory is

a. Ambivalent

b. Neutral

c. Supportive

d. Contemptuous

2. The word trauma in the passage is closest in meaning to

a. Physical injury

b. Torture

c. Emergency

d. Agony

3. All of the following can be inferred from the text EXCEPT

a. Learning our mother tongue can lead to positive experiences.

b. The influence of mother tongue on our thoughts is significant.

c. Whorf’s theory was based on hard facts and solid common sense.

d. Whorf failed to provide any evidence to support his theory.

Turn to the next page

6 Deutscher, G. (2010, August 26). Does your language share how you think? The New York Times. Retrieved from

http://www.nytimes.com/2010/08/29/magazine/29language-t.html?pagewanted=all

Vocabulary word-bank:

1. transpire: occur, happen 2. relegate: assign, transfer 3. loony: crazy

4. fringe: border, trimming 5. disrepute: dishonor


4. The author uses the word crash-landed to imply that Whorf’s theory was _________ hard facts

and solid common sense.

a. in favor of

b. based on

c. inconsistent with

d. critical of

5. Which of the sentences below best expresses the essential information in the boldfaced

sentence in the passage?

a. Exploring the relationship between the mother tongue and our thoughts was frowned upon for decades.

b. People reacted severely and they explored the relationship between the mother tongue and our thoughts.

c. Whorf's theory succeeded in exploring the relationship between the mother tongue and our thoughts.

d. Whorf's claims were so credible that no researcher made an attempt to dishonor Whorf for decades.


IV. Mini-essay writing

Write a mini-essay about 180-250 words according to the following prompt. You will be tested

on the following criteria: content, organization, and grammar. Feel free to use the back page for

more space.

Do you think learning English is important? Why or why not? Please provide personal

examples to support your stance (in addition, you may refer to what you have learned from the

video “English Mania”).

END OF SECTION IV

SUBMIT YOUR TEST AND SCHEDULE AN ORAL INTERVIEW


NAME:_________________

V. Oral Interview:

In the United States, many universities require students to learn an additional language other

than their native language.

Do you think universities in your home country should require students to learn an additional

language (other than your native language)? Why or why not?

You have 2 minutes to prepare. Use the space below to write down an outline or important points

that you want to discuss. You will be given maximum 3 minutes to answer the question. You

can use your notes to talk but do not read aloud what you have written out. Please relate the

issue to your personal experience and cultural background.


Appendix F: Getting Started Worksheet (Alderson, Clapham, & Wall, 1995)

Worksheet for Getting Started on Your Original Test

1. What is the purpose of the test? (How will the information you gather be used? Are you

measuring achievement or progress? Are you placing students in a program?)

This test serves as an entrance (placement) test for incoming international students (normally 10-12 students per semester). These international students either attend the college for all four years (we'll call them "regular" students) or for just one year as "exchange" students. All international students have to take this test, which determines whether they can take the general English core requirement (e.g., ENG 101). Students who do not pass the test are required to take one or both of the ESL courses, Reading & Writing and Speaking & Listening; there is only one ESL level.

2. What sort of learners will be taking the test? (Describe the 2LLs’ age, first language[s],

purpose for learning the target language, etc.)

All international students admitted to Dordt College are required to take this test. As mentioned previously, these international students may be either regular or exchange students. They come from many different countries, with diverse L1s. However, based on our interview with the ESL professor, most of the students are from South Korea; there are also some students from Turkey, Mexico, and various African countries. Their ages range from 18 to 25.

3. What language skills should be tested (reading, writing, speaking and/or listening)?

We will be testing all four skills. Since there are not very many international students (around

10-12 per semester) and only one ESL level, we decided to revise the current placement/entrance

exam. Dordt College is Hala Sun’s alma mater.

4. What language elements should be tested (grammar, vocabulary, pronunciation, speech acts,

etc.)?

I. Listening comprehension: Content, Comprehension

II. Grammar: Grammar

III. Reading comprehension: Vocabulary, Comprehension, Grammar

IV. Mini-essay writing: Content, Organization, Grammar

V. Oral interview: Content, Fluency and Accuracy (Grammar, Pronunciation, Coherence, and

Fluency)

5. What target language situation is envisaged for the test, and is this to be simulated in some way

in the test content and method? (For instance, is this a test of academic French? Of English for

international TAs? Of Japanese for hotel workers in California?)

English for Academic Purposes in college


6. What text types should be chosen as stimulus material—written and/or spoken?

I. Listening comprehension: 1 approximately four-minute video/audio clip (a speech about

“World’s English mania” taken from Ted Talk)

II. Grammar: 1 written text with grammatical errors

III. Reading comprehension: 2 written texts (academic in nature)

IV. Mini-essay writing: 1 written essay question

V. Oral interview: The administrator will give students a role-play scenario, in which students

will have two minutes to prepare their speech and three minutes to perform their speech orally.

7. What sort of tasks are required -- discrete point, integrative, simulated ‘authentic’,

objectively assessable? (That is, what will the test-takers actually do?)

I. Listening comprehension: Students will watch and listen to a video clip (a speech from a TED Talk). They can take notes while watching/listening to the clip. Students will then answer 5 short-answer questions (they can answer the questions while they are watching/listening to the video).

II. Grammar: This is a cloze elide test. Students have to read a text and cross out 15 "extra" words that make the sentences grammatically incorrect. This requires students' editing skills as well as their knowledge of grammar. It is also objectively assessable, since there are exact answers (the words that need to be crossed out). Scorers will count only the correctly crossed-out words; students do not lose points for incorrect crossings (students will not be told this aspect of the scoring method, to discourage them from crossing out as many words as they can).
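The scoring rule above is easy to state in code. A minimal sketch in Python, assuming the answer key is stored as the positions of the inserted words (the positions shown are invented for illustration, not the actual key):

# A minimal sketch of the cloze-elide scoring rule.
# Only correctly crossed-out words earn points; incorrect crossings
# are not penalized. ANSWER_KEY positions are hypothetical.

ANSWER_KEY = {3, 11, 17, 24, 30}  # word positions of the "extra" words (5 of 15 shown)

def score_cloze_elide(crossed_positions, key=ANSWER_KEY):
    """One point per correctly crossed word; no deduction for errors."""
    return len(set(crossed_positions) & key)

# Two correct crossings plus one incorrect crossing still earn 2 points.
print(score_cloze_elide([3, 17, 99]))  # -> 2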

III. Reading comprehension: This test is objectively assessable (multiple-choice questions). After reading two passages, students answer the MC questions by choosing the best answer (the questions cover comprehension, vocabulary, and grammar).

IV. Mini-essay writing: For this test, students have to read the prompt (the subject/question/topic of the essay) and write a hand-written essay of minimum 180 and maximum 250 words.

V. Oral interview: It is an integrative test examining the use of language elements (Grammar,

Vocabulary, Fluency, Comprehension, and Pronunciation). The test takers will have two minutes

to prepare their speech and three minutes to perform their speech orally. There will be an

analytic scale to assess these language elements.

8. What test methods (what item formats) are to be used? (One multiple-choice subtest is

required.)

I. Listening comprehension: 5 short answer questions

II. Grammar: 1 cloze elide test; crossing out “extra” (grammatically incorrect) words from 1

written text (15 crossed-out words in total)


III. Reading comprehension: two sets of 5 multiple-choice questions (10 questions total)

IV. Mini-essay writing: 1 written essay question

V. Oral interview: Responding to 1 question (given orally)

9. How many sections should the test have, how long should they be and how will they

be differentiated? (There will be at least three sections – more if you are working with another

student.)

I. Listening comprehension: about 10 minutes

II. Grammar: 5 minutes to read and cross out “extra”/grammatically incorrect words

III. Reading comprehension: 20 minutes

IV. Mini-essay writing: 20 minutes to answer 1 essay question

V. Oral interview: about 5 minutes

10. How many items are required for each section? What is the relative weight for each

item?

I. Listening comprehension: 5 short answer questions (each is worth 2 points; 10 pts max)

II. Grammar: 1 written text with 15 “extra”/grammatically incorrect words (15 items; 1 pt each;

15 pts max)

III. Reading comprehension: 10 questions (two sections of 5 questions; 1 pt each; 10 pts max)

IV. Mini-essay writing: 1 essay question (scored analytically on multiple criteria; 100 pts max)

V. Oral interview: One 3-minute speech (40 pts max)

TOTAL Maximum Points: 175 pts
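As a quick sanity check on the weighting, the section maxima can be summed and expressed as shares of the 175-point total. A minimal sketch in Python; the section labels simply mirror the subtests above:

# Verify the stated 175-pt maximum and show each section's relative weight.
section_max = {
    "Listening comprehension": 10,
    "Grammar (cloze elide)": 15,
    "Reading comprehension": 10,
    "Mini-essay writing": 100,
    "Oral interview": 40,
}

total = sum(section_max.values())
assert total == 175  # matches the stated maximum

for name, pts in section_max.items():
    # e.g., "Mini-essay writing: 100 pts (57% of the total)"
    print(f"{name}: {pts} pts ({pts / total:.0%} of the total)")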

11. What rubrics are to be used as instructions for candidates? (That is, what instructions

and guidance are printed in the test and/or announced by the test administrator?)

Instructions and guidance are printed in the test in English. For the oral interview, the test administrator will read the instructions to the student. The student is then given 2 minutes to prepare his/her speech in response to the prompt. Once the time is up, the test administrator notifies the student and gives him/her 3 minutes to respond. For listening comprehension, the test administrator will play the audio/video file. Students can take notes and answer the short-answer questions as they listen to/watch the clip. The audio/video file will be played only once.

12. Which criteria will be used for assessment by markers? (In other words, describe how

the answer key will be developed for the objectively scored portion, and explain the

rating system for the subjectively scored portion.)

I. Listening comprehension: For this subjectively scored portion, the following criteria will be

assessed: Content and Comprehension (understanding the main points).

Listening Comprehension Criteria

Spelling errors are not penalized. Deduct 1 pt when an answer is not a complete sentence, except for questions 2 and 3 (numbered 11 and 12 below, which require an exact number and country names). Total possible points: 10 pts.


10. In your own words, define the word “mania.”

*Acceptable words: enthusiasm, passion, desire, craze, popular trend generating wide

enthusiasms, hysteria, craziness, alarming, deeply fascinated

11. How many people are trying to learn English worldwide?

*Answer: 2 (two) billion

12. Name at least 3 countries/regions that the speaker mentions as having English mania.

*Answer: Latin America, India, Southeast Asia, and China. 2 pts when three correct countries are mentioned; 1 pt when only two are correct (one is incorrect); 0 pts for no answer or when none of these countries are mentioned.

13. According to the speaker, why are so many people trying to learn English?

*Answer: 2 pts (full credit): opportunity, a better life, hope, language of problem solving, world's second language; 1 pt (acceptable words): job, pay for school, put better food on the table, academic achievement; 0 pts: no mention of any of these.

14. What is the speaker's opinion on English mania?

*Answer: 2 pts: English mania is positive (more positive than negative) or a "turning point"; describing the speaker's opinion as neutral is also acceptable. 0 pts: no mention of this positive view.
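The "acceptable words" marking described above can be approximated mechanically. A minimal sketch in Python, using the keyword list for question 10; the simple word-overlap matching is our illustration (multi-word phrases from the key are reduced to single keywords here), not the raters' actual procedure:

import string

# Keywords drawn from the question-10 key above (phrases shortened to one word).
ACCEPTABLE_Q10 = {"enthusiasm", "passion", "desire", "craze",
                  "hysteria", "craziness", "alarming", "fascinated"}

def score_acceptable(response, keywords=ACCEPTABLE_Q10, points=2):
    """Full credit if any acceptable keyword appears in the response."""
    cleaned = response.lower().translate(str.maketrans("", "", string.punctuation))
    return points if set(cleaned.split()) & keywords else 0

print(score_acceptable("A mania is a kind of craze."))  # -> 2
print(score_acceptable("It means something big."))      # -> 0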

II. Grammar: Objectively scored. For analysis, each "extra" word (choice a) together with the corresponding correct form (choice b) is treated as a single item (e.g., Question 1). Students get one mark for each correct answer (15 total).

III. Reading comprehension: MC questions. Students get one mark for each correct answer (10 total). After piloting this test, we will run the following analyses: item facility, item discrimination, distractor analysis, and response frequency distribution. These analyses will tell us more about the questions and the answer choices we wrote (a sketch of the first two analyses follows).
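A minimal sketch of item facility and item discrimination in Python; the 0/1 response data and the size of the upper/lower groups are invented for illustration:

def item_facility(scores):
    """Proportion of test-takers answering the item correctly (0/1 data)."""
    return sum(scores) / len(scores)

def item_discrimination(item_scores, total_scores, group_n=3):
    """Facility in the top-scoring group minus facility in the bottom group."""
    ranked = sorted(range(len(total_scores)),
                    key=lambda i: total_scores[i], reverse=True)
    upper = [item_scores[i] for i in ranked[:group_n]]
    lower = [item_scores[i] for i in ranked[-group_n:]]
    return item_facility(upper) - item_facility(lower)

item1  = [1, 1, 0, 1, 0, 0, 1, 0]   # one MC item, 8 hypothetical test-takers
totals = [9, 8, 4, 10, 3, 5, 7, 2]  # their total reading scores

print(item_facility(item1))               # -> 0.5
print(item_discrimination(item1, totals)) # -> 1.0 (upper group all correct)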

IV. Mini-essay writing: Subjectively scored. The following criteria will be assessed: Content, Organization, and Grammar. We will calculate scores using the analytic rubric below, and we will estimate interrater reliability to check how consistently this section is scored.

Essay Criteria

Content (scoring: circle the appropriate score)

- Clearly relates or answers the given topic or question: Clear 5—4—3—2—1—0 Missing
- Gives sufficient examples/references: Sufficient 5—4—3—2—1—0 Lacking
- Clear connection between examples/references and main ideas: Clear 5—4—3—2—1—0 Missing
- Correct use of vocabulary words: Correct 5—4—3—2—1—0 Incorrect
- Sufficient number of words (target 180-250; see the sketch after this rubric): 180-250 words: 5; 160-179 words: 4; 140-159 words: 3; 120-139 words: 2; 100-119 words: 1; fewer than 100 words: 0

Subtotal: points for content _________/25


Organization (scoring: circle the appropriate score)

- Topic or introductory sentence: Clear 5—4—Not Clear 3—2—1—Missing 0
- Concluding sentence: Clear 5—4—Not Clear 3—2—1—Missing 0
- Coherence (logical progression and development of ideas, good flow): Always 5—4—Sometimes 3—2—1—Never 0
- Cohesion (good connections between sentences): Always 5—4—Sometimes 3—2—1—Never 0
- Sentence variety (both simple and compound and/or complex): Good Variety 5—4—Some Variety 3—2—1—Never 0

Subtotal: points for organization ________/25

Grammar (scoring: take off one point for each error in the categories indicated; circle the number of remaining points)

- Correct spelling (subtract 1 pt for each new error): 5 4 3 2 1 0
- Correct use of articles and prepositions: 5 4 3 2 1 0
- Standard capitalization: 5 4 3 2 1 0
- Standard punctuation (periods, commas, semicolons): 5 4 3 2 1 0
- Standard sentence word order: 5 4 3 2 1 0
- Agreement between subjects & verbs, nouns and pronouns/antecedents: 5 4 3 2 1 0
- Correct verb tense and usage: 5 4 3 2 1 0
- Correct adverb and adjective usage: 5 4 3 2 1 0
- Appropriately placed phrasal modifiers: 5 4 3 2 1 0
- Standard academic diction (avoidance of slang and informal language): 5 4 3 2 1 0

Subtotal: points for grammar _______/50

TOTAL POINTS _______/100
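The word-count row in the content rubric above is a simple step function. A minimal sketch in Python; the bands come straight from the rubric, but treating essays over 250 words as on-target is our assumption, since the rubric leaves that case unspecified:

def word_count_points(n_words):
    """Map essay length to the 0-5 word-count band in the content rubric."""
    if n_words >= 180:   # 180-250 is the target; >250 is unspecified in the rubric
        return 5
    if n_words >= 160:
        return 4
    if n_words >= 140:
        return 3
    if n_words >= 120:
        return 2
    if n_words >= 100:
        return 1
    return 0

print(word_count_points(195))  # -> 5
print(word_count_points(150))  # -> 3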

V. Oral interview: For this subjectively scored portion, the following criteria will be assessed: Grammar, Vocabulary, Fluency, Comprehension, and Pronunciation. We will use an analytic scale, and we will estimate interrater reliability to check how consistently this section is scored (one common estimate is sketched after the rubric below).

Oral Interview Criteria

Content (scoring: circle the appropriate score)

- Clearly relates or answers the given topic or question: Clear 5—4—3—2—1—0 Missing
- Gives adequate and meaningful examples/references: Sufficient 5—4—3—2—1—0 Lacking
- Clear connection between examples/references and main ideas: Clear 5—4—3—2—1—0 Missing
- Correct use of vocabulary words: Correct 5—4—3—2—1—0 Incorrect

Accuracy

- Correct use of grammar: Correct 5—4—3—2—1—0 Incorrect
- Clear pronunciation of words: Clear 5—4—Not Clear 3—2—1—0 Missing

Fluency

- Coherence (logical progression and development of ideas, good flow): Always 5—4—Sometimes 3—2—1—0 Never
- Fluency in speech (little circumlocution, few hesitations): Fluent 5—4—Somewhat Fluent 3—2—1—0 Not Fluent

TOTAL POINTS ______/40
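Interrater reliability for the essay and the oral interview can be estimated several ways; the document does not fix a statistic, so Pearson's r between two raters' totals is our assumption here, and the rating data are invented. A minimal sketch in Python:

from statistics import mean

def pearson_r(xs, ys):
    """Pearson correlation between two raters' score lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

rater_a = [78, 85, 62, 90, 71]  # hypothetical essay totals out of 100
rater_b = [75, 88, 60, 93, 69]

print(round(pearson_r(rater_a, rater_b), 2))  # -> 0.99 (high agreement)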


Appendix G

Wesche’s (1983) Four Components Framework

Subtest: Listening

- Stimulus Materials: The test-taker watches a video clip of “English Mania” presented by Jay Walker (2009). The test also contains five short-answer questions related to the content of the video.
- Task Posed to the Learner: The test-taker must watch and listen to the video and identify important information.
- Learner’s Response: The test-taker must write down their responses to the questions.
- Scoring Criteria*: Questions 2 and 3 (requiring a specific number and country names) are marked using the exact word method. The remaining questions are marked using the acceptable word method. Students are given either 2 points or 0 points; for Question 3, partial credit (1 pt) is given when at least two correct countries are mentioned.

Subtest: Grammar

- Stimulus Materials: The test-taker reads an article from the New York Times (Bahanoo, 2012).
- Task Posed to the Learner: The test-taker must identify 15 extra words inserted into sentences that make them ungrammatical according to the structural rules of English; the test-taker must pay attention to the details of the reading to find multiple grammar errors, such as use of articles and tenses.
- Learner’s Response: The test-taker must cross out the extra words.
- Scoring Criteria*: The test-taker gets points when he/she crosses out the exact incorrect words.


*Note. The keys and rubrics of the scoring criteria were all pre-established by the test designers, although the rubric of the oral interview was modified in light of students’ responses during the piloting tests.

Appendix G (cont’d)

Wesche’s (1983) Four Components Framework

Subtest: Reading

- Stimulus Materials: The test-taker reads 1 long passage and 1 short passage. The test contains 5 multiple-choice questions for each passage.
- Task Posed to the Learner: The test-taker must identify the main ideas of the readings and define the meaning of words within the given context.
- Learner’s Response: The test-taker must circle the letter representing the answer to each question.
- Scoring Criteria*: The test-taker gets points when they circle the correct letters of the multiple-choice questions, as determined by the established key.

Subtest: Mini-essay Writing

- Stimulus Materials: An essay prompt is presented to the test-taker.
- Task Posed to the Learner: The test-taker must read and respond to the given prompt. He/she must compose an organized piece of writing with sufficient examples and correct use of vocabulary and grammar.
- Learner’s Response: The test-taker must write an essay of about 180-250 words that states, explains, and supports his/her opinion on the given prompt.
- Scoring Criteria*: The test-taker’s essay is subjectively scored based on an analytic rubric set by the test designers. The rubric consists of three sections: content, organization, and grammar.

Subtest: Oral Interview

- Stimulus Materials: A role-play scenario is given to the test-taker.
- Task Posed to the Learner: The test-taker must read the prompt, understand the context, and adopt the role given in the scenario.
- Learner’s Response: The test-taker must take 2 minutes to prepare a persuasive speech that states, explains, and supports his/her opinion on the given topic, and deliver it within 3 minutes.
- Scoring Criteria*: The test-taker’s speech is subjectively scored based on an analytic rubric set by the test designers. The rubric evaluates two aspects of a speech: content, and accuracy and fluency.


Appendix H

Swain’s (1983) Four Principles of Communicative Language Test Development

Subtest: Listening

- Start from somewhere: Our choice of this procedure is motivated by our intention to simulate an academic situation in which students are given a lecture.
- Concentrate on content: Since the test-takers are international students, the topic of English learning is relevant to them, and the video also serves to activate the test-takers’ schemata.
- Bias for best: The test-takers get visual support in addition to the audio input. They are also allowed to take notes while watching the video, and spelling errors are not marked in their responses to the comprehension questions.
- Work for washback: The test-takers can experience a situation like a real academic lecture and practice note-taking skills.

Subtest: Grammar

- Start from somewhere: Citing Larsen-Freeman (1991, 1997), Brown (2010) defines grammatical knowledge in terms of grammatical forms, grammatical meanings, and pragmatic meanings.
- Concentrate on content: Students can relate the content to their own experience in language learning.
- Bias for best: The subtest assesses multiple grammar points, such as use of articles, adjectives, and verb tense.
- Work for washback: The test-takers can learn to pay attention to the details of the reading passages and learn the meanings associated with the grammatical forms.


Appendix H (cont’d)

Swain’s (1983) Four Principles of Communicative Language Test Development

Subtest: Reading

- Start from somewhere: The design of the subtest was driven by both the top-down and bottom-up processing models of reading comprehension (Richards & Schmidt, 2010).
- Concentrate on content: Consistent with the content of the previous subtests, the two articles are also about language learning.
- Bias for best: Definitions of some difficult vocabulary terms are given in the test. Key words and key sentences are underlined or bolded for attention, and paragraphs are marked with alphabetic letters for ease of reference.
- Work for washback: The test-takers can expand their vocabulary knowledge, learn to use context to interpret the meanings of words, identify the main ideas of the readings, and paraphrase the reading.

Subtest: Mini-essay Writing

- Start from somewhere: Through the essay-writing task, we are able to identify students’ strengths and weaknesses, including grammar usage and vocabulary knowledge.
- Concentrate on content: The essay prompt, whether learning English is important or not, has been developed through the previous subtests.
- Bias for best: The test-takers can use the materials provided on the test to support their opinions.
- Work for washback: The test-takers can write in a simulated academic context, compose an argumentative essay, and incorporate sufficient sources into their writing.

Subtest: Oral Interview

- Start from somewhere: Besides the aim of using a direct test to measure the test-takers’ oral competence, the construct of the oral test was also inspired by the frequent situations in which students are required to orally express their opinions, supported by examples, in academic settings.
- Concentrate on content: The content is related to the theme of the test, language learning.
- Bias for best: The test-takers can use the materials provided on the test to support their opinions, and they have 2 minutes to prepare and jot down notes for their speech.
- Work for washback: The test-takers can experience a simulated academic presentation and give a persuasive speech.