Thesis Proposal

1. TOPIC

AN ANALYSIS OF ENGLISH FINAL TEST OF THE FIRST SEMESTER

STUDENTS GRADE V MADE BY MGMP OF ENGLISH OF NATIONAL

EDUCATION MINISTRY OF SEMARANG AND MGMP OF ENGLISH

OF RELIGION MINISTRY OF SEMARANG

2. BACKGROUND OF THE STUDY

Evaluation is a common term that we often hear in daily life. It is used

when we want to know the progress and result of what we have done. When we

work on a task, evaluation helps us know whether the work has been done well

or not. It can also be used to check whether any obstacle is blocking our

plan.

One way of carrying out an evaluation is to make an assessment. Assessment

is the more specific and narrower term, because it is one part of evaluation.

Evaluation can be done by making an assessment, but it can also be done in

other ways, such as observation or performance judgment during the process.

The term assessment is most familiar in education, since teachers, trainers,

and education practitioners use it to measure and analyze how far students

understand the material they have been taught. It is sometimes also done by

employers to learn about the progress of their employees’ work. For example,

a director makes an assessment and appraisal to analyze his or her employees’

work; a headmaster does it to assess teachers’ work; and a teacher uses it to

assess and evaluate students’ understanding and achievement.

To assess students’ understanding and achievement on the material that has

been taught, teachers usually give their students some questions in the form

of a test. Students have to give the correct answers related to the subject’s

material. The questions can take the form of an essay test, in which students

write several sentences about the theories and their understanding of them.

Besides that, teachers can give the questions in multiple-choice form to

simply check students’ understanding and acceptance of the material. However,

every form of assessment has advantages and disadvantages that teachers

should know. It is the teachers’ authority to choose what form of questions

they want to give to the students, and the choice can be based on what

aspects of students’ intelligence the teachers want to know.

Assessing a language subject, in this case English as a foreign language, is

a little different from assessing non-language subjects. A non-language

subject is presented through questions and test items in the native language,

so what teachers want to know is the students’ knowledge of the subject

matter. Testing a language subject, in contrast, does not only examine

knowledge about the subject; in practice it should also cover the skills

needed to master a foreign language, so that students can be called

successful learners. Thus, in language testing the questions have to be able

to measure learners’ mastery of listening, speaking, reading and writing in

the foreign language. Of course, the skills they have to master are in line

with the students’ level of education. For example, at senior high school

level, students should master at least two or three skills to a minimum

standard. This means that even though they are not able to speak

communicatively in English or write well-organized texts, at least they

understand a conversation when they listen to it and understand statements

when they read them. It can be assumed that at elementary school level, what

students should master is not the same as at senior high school. At this

level, they can be said to have mastered English when they can memorize some

vocabulary related to given themes and use it in context. This means that

students in elementary school can only understand and make statements in the

foreign language in very simple sentences or paragraphs, both orally and in

writing. Usually, however, teachers at this level focus only on students’

vocabulary mastery.

Because language testing in elementary school focuses more on vocabulary

mastery and its use in context, the test usually comes to the students in

multiple-choice format. Teachers prefer this format because it can simply

measure students’ vocabulary mastery, even though multiple-choice items can

measure not only vocabulary mastery but also other aspects such as grammar

and language context. In addition, to evaluate other aspects, teachers

usually combine multiple-choice questions with essay test items. It is

therefore the teachers’ authority to use whatever format they want based on

the purpose. Sometimes, however, teachers cannot write the test items given

to their students themselves, because the test pack has already been prepared

by an institution that has the right to make the test items. In this case,

all teachers can do is prepare the students to face the test. In Indonesia,

examples are the National Examination for graduation at each education level

and the final test of every semester. The Ministry of National Education has

the authority to make the test items, because in Indonesia every school

should follow its official regulations. Besides that, the Ministry of

Religion has the same authority to make test packs for some schools. The

schools that follow the education rules of the Ministry of Religion are those

that have more religion subjects in their curriculum, in this case Islamic

ones. Therefore, schools fall under two education regulations: those of the

Ministry of National Education and those of the Ministry of Religion.

A problem arises when there are two different test packs for the same grade

at each education level. The question comes up whether or not the test pack

organized by the Ministry of National Education has the same quality and

characteristics as the one arranged by the Ministry of Religion. If there are

any differences, for example if the test pack made by the Ministry of

National Education is easier than the one made by the Ministry of Religion or

vice versa, it is unfair to one side. Another possibility is that one of the

test packs may not be appropriate to the instructional material, in this case

the Standard Competence and Basic Competence.

Based on the problem explained above, the writer wants to analyze the English

final semester test pack made by MGMP English of the Ministry of National

Education and compare it with the one made by MGMP English of the Ministry of

Religion. At the end of the study, the writer hopes that there is no

difference between the two test packs. If differences exist, this study can

help to fix them so that the education systems of both ministries can go in

line with one another.

3. IDENTIFICATION OF THE PROBLEMS

Analysis of the items of the two test packs, with the aim of knowing whether

or not they are good tests, can be done using several quantitative

measurements, such as validity, reliability, level of difficulty,

discrimination power, and item distractors. Yet if we want to study test

items further, we should not rely on quantitative analysis alone. We can use

qualitative analysis to evaluate several non-statistical aspects of test

items. Qualitative analysis can cover the appropriateness of the test items

to the materials in the teaching and learning process (school-based

curriculum), test construction, and the language used in the test items.
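
As an illustration of the quantitative side, the short Python sketch below computes two of the measures named above, the level of difficulty and the discrimination power. It assumes that every answer has already been coded 1 (correct) or 0 (incorrect). The formulas are the ones commonly used in classical item analysis (the proportion of correct answers, and the difference between the proportions correct in an upper and a lower scoring group); they are an assumption here, not formulas fixed by this proposal. The function name item_analysis, the parameter group_fraction, and the sample data are hypothetical.

```python
def item_analysis(matrix, group_fraction=0.27):
    """Return difficulty and discrimination for every item.

    matrix: one row per student, one column per item, values 1 or 0.
    group_fraction: share of students placed in the upper and in the
    lower group (0.27 is a common convention, assumed here).
    """
    n_students = len(matrix)
    n_items = len(matrix[0])

    # Rank students from highest to lowest total score.
    ranked = sorted(matrix, key=sum, reverse=True)
    group_size = max(1, round(n_students * group_fraction))
    upper = ranked[:group_size]
    lower = ranked[-group_size:]

    results = []
    for i in range(n_items):
        # Difficulty: proportion of all students answering correctly.
        difficulty = sum(row[i] for row in matrix) / n_students
        # Discrimination: upper-group proportion minus lower-group proportion.
        p_upper = sum(row[i] for row in upper) / group_size
        p_lower = sum(row[i] for row in lower) / group_size
        results.append({
            "item": i + 1,
            "difficulty": difficulty,
            "discrimination": p_upper - p_lower,
        })
    return results


# Hypothetical data: six students, three items.
sample = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 1],
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 0],
]
results = item_analysis(sample, group_fraction=0.5)
for r in results:
    print(r["item"], round(r["difficulty"], 2), round(r["discrimination"], 2))
```

A difficulty value close to 1 means that most students answered the item correctly, and a discrimination value close to 0 (or below 0) suggests that the item does not separate stronger from weaker students.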

4. FORMULATION OF THE PROBLEMS

In order not to discuss something irrelevant, the writer limits the

discussion by focusing her attention on the following problems:

4.1 What is the quality of the English final tests for first semester

students made by MGMP English of the Ministry of National Education and

the Ministry of Religion of Semarang in terms of validity, reliability,

difficulty level, discrimination power, and item distractors?

4.2 How appropriate are those test items in terms of instructional

materials (Standard Competence and Basic Competence), test

construction, and language use?

4.3 What are the differences between the test items made by MGMP English of

the Ministry of National Education and those made by MGMP English of the

Ministry of Religion of Semarang?

5. OBJECTIVES OF THE STUDY

Based on the problems formulated above, this study has several objectives.

They are as follows:

5.1 To describe the quality of the English final tests for first semester

students made by MGMP English of the Ministry of National Education

and the Ministry of Religion of Semarang in terms of validity,

reliability, difficulty level, discrimination power, and item distractors.

5.2 To describe the appropriateness of those test items in terms of

instructional materials (Standard Competence and Basic Competence), test

construction, and language use.

5.3 To explain the differences between the test items made by MGMP English

of the Ministry of National Education and those made by MGMP English of the

Ministry of Religion of Semarang.

6. SIGNIFICANCES OF THE STUDY

In line with the objectives of the study, this analysis is intended to offer

the benefits elaborated in the paragraphs below. The study aims at three

major kinds of significance.

The first is theoretical significance. This study may give teachers,

educators, trainers, and others a basic understanding that assessment and

evaluation cannot be based only on students’ outward performance or, in some

cases, on guessing. They should know that test items should be made to

evaluate students’ understanding and ability. Tests are also useful for

developing their professionalism as educators.

The second is practical significance. This study is beneficial for test

makers as an additional reference in constructing and analyzing test items

and their procedures.

The last is pedagogical significance. This study provides English teachers,

especially elementary school teachers, with meaningful and useful

information for efficient class discussion of the test results, general

improvement of classroom instruction, evaluation of the teaching and learning

process, and improvement in test making.

7. LIMITATION OF THE STUDY

The limitation of the study is stated to bound the research so that it does

not go beyond what the researcher wants to discuss. This study is both

quantitative and qualitative research. It studies test items in the form of

multiple-choice questions. The test will be analyzed using a quantitative

approach, in which its statistical features will be measured. In addition, a

qualitative approach will be used to check the tests against the Standard and

Basic Competence, test construction, and language use.

The test items used here are the English test packs of the final test for

first semester students of Grade V of elementary school. The study analyzes

only Grade V of elementary school because of the limited time available for

the research.

8. REVIEW OF THE RELATED LITERATURE

8.1 EVALUATION IN EDUCATION

Evaluation is the notion that the value or worth of someone or something is

to be judged. It may be based on tests, measurements, or other objective

information (Nitko, 1983:7). More specifically, Tuckman says that “evaluation is a process

wherein the parts, processes, or outcomes of a program are examined to see

whether they are satisfactory, particularly with reference to the program’s stated

objectives, our own expectation, or our own standard of excellence” (1975:12).

What is meant by educational evaluation here is a way of examining,

investigating, and appraising any aspect of the education field. It is a

process which involves the production, application and analysis of

instruments of educational measurement (Nurulia, 2011:12). In a narrower

sense, this technique is used by a headmaster to evaluate teachers, and it is

also commonly used by teachers to evaluate their students’ understanding of

the materials. This is close to what Gronlund (in Nurulia, 2011:12) states,

namely that evaluation is a systematic process of determining the extent to

which instructional objectives are achieved by pupils. Cronbach states that

evaluation is the collection and use of information to make decisions about

an educational program (1984:60 in Nurulia, 2011:13).

8.2 LANGUAGE TESTING AND ASSESSMENT

A test is a method of measuring a person’s ability, knowledge or performance

in a given domain (Brown, 2004:3). In this statement, Brown highlights

testing as a method by which people’s intelligence and achievement are

explored. Testing has become an important method for checking requirements or

competency in fields such as medicine, law, sport, and government, because it

tests many aspects that test takers must fulfil before going deeper into such

fields. Yet in the teaching and learning process, testing is a little

different from those kinds of test. People commonly think that assessment is

the same as testing; they confuse the two and treat testing and assessment as

synonyms. Tests are prepared administrative procedures that occur at

identifiable times in a curriculum when learners muster all their faculties

to offer peak performance, knowing that their responses are being measured

and evaluated, whereas assessment is an ongoing process that encompasses a

much wider domain (Brown, 2004:4). Bachman (2004:7) states that assessment is

the process of collecting information about a given object of interest

according to procedures that are systematic and substantively grounded.

Assessment takes place whenever a student responds to a question, offers a

comment, or tries out a new word or structure (Brown, 2004:4). In this

situation, the teacher subconsciously makes an assessment of the student’s

performance (Brown, 2004:4). In educational programs, “the result of assessment

are most commonly used to describe both the processes and outcomes of learning

for the purposes of diagnosis or evaluating achievement, or make decisions that

will improve the quality of teaching and learning and of the program itself”

(Bachman, 2004:6).

Language tests offer us many choices in test administration, test

format, materials, scoring method, and test items.

http://www.cal.org/flad/tutorial/resources/7keyterms.html






(http://www.cal.org/flad/tutorial/practicality/2methodoftesting.html). Language

tests have the potential to help us collect information that will benefit a

wide variety of individuals (Bachman, 2004:3).

Alderson and others have argued that “testers have long been concerned

with matters of fairness and that striving for fairness is an aspect of ethical

behavior, others have separated the issue of ethics from validity, as an essential

part of the professionalizing of language testing as a discipline” (Davies, 1997).

Tests, then, are a subset of assessment. They are certainly not the only form

of assessment that a teacher can make. Tests can be useful devices, but they

are only one among many procedures and tasks that teachers can ultimately use

to assess students (Brown, 2004:4).

In short, a test is a part of assessment, so assessment is wider than

testing. Assessment can be understood as a part of the teaching and learning

process. Testing and assessment are two methods that must be used and applied

in teaching. Language assessment takes place in a variety of situations,

including educational programs and real-world settings (Bachman, 2004:6).

8.3 TYPES OF ASSESSMENT AND TESTING

To give a fuller picture of assessment, this sub-chapter explains the types

and forms of assessment. There are two types of assessment,

informal and formal assessment (Brown, 2004:5). Informal assessment can take a

number of forms starting with incidental, unplanned comments and responses,

along with coaching and other impromptu feedback to the student (Brown,

2004:5). In this type of assessment, teachers record students’ achievement

using techniques that are not systematically designed. Teachers can simply

remember what students do in the classroom during their learning activities.

Formal assessments, by contrast, are exercises or procedures specifically

designed to tap into a storehouse of skills and knowledge (Brown, 2004:5).

Unlike informal assessment, this type of assessment is made intentionally by

the teacher to obtain students’ scores and so to know their achievement. It is

carried out according to official standards and rules. It is conducted

systematically and periodically (Brown, 2004:6). We can say that all tests

are formal assessments, but not all formal assessment is testing (Brown,

2004:6).

There are two functions of assessment that usually occur in the classroom:

formative and summative assessment (Brown, 2004:6). Formative assessment aims

to evaluate students in the process of forming their competencies and skills,

with the goal of helping them to continue that growth process (Brown,

2004:6). It usually takes place during the teaching and learning process in

the classroom and is done by the teacher to know students’ achievement

directly. It is conducted to build students’ understanding and skills during

the process. Assessment is formative when teachers use it to check on the

progress of their students, to see how far they have mastered what they

should have learned, and then use this information to modify their future

teaching plans (Hughes, 2005:5). Summative assessment, in turn, aims to

measure, or summarize, what students have grasped, and typically occurs at

the end of a course or unit of instruction (Brown, 2004:6). It is used at the

end of the term, semester, or year in order to measure what has been achieved

both by groups and by individuals (Hughes, 2005:5). Teachers use this type of

assessment to measure and evaluate what students have achieved in the

teaching and learning process in the classroom. Final exams are an example.

In short, formative assessment is done during the semester as part of the

teaching and learning process, whereas summative assessment is done at the

end of the semester. The object of this study is the final test of the first

semester, so this test is a formal assessment with a summative function.

In Indonesia, a final semester test pack usually consists of three parts:

first multiple-choice items, then short-answer questions, and finally essay

items. Each item type has a different definition and different

characteristics, and different formulas and measurements can be used for

each. The following sub-chapters explain the characteristics of each item

type.

8.4 MULTIPLE-CHOICE TEST

Multiple-choice items, which may appear to be the simplest kind of item to

construct, are extremely difficult to design correctly (Brown, 2004:55).

Multiple-choice items take many forms, but their basic structure is a stem,

or the question itself, and a number of options, one of which is correct, the

others being distractors (Hughes, 2005:75). The most obvious advantage of

multiple-choice is that scoring can be perfectly reliable (Hughes, 2005:75).

Scoring in multiple-choice techniques is rapid and economical. It is possible

to include more items than would otherwise be possible in a given period of

time. These test items are designed to elicit specific responses from the

student (Valette, 1967:6). The technique allows the testing of receptive

skills without requiring the test taker to produce written or spoken

language, and it makes for greater reliability (Hughes, 2005:76).

The principles that stand out in multiple-choice test items are practicality

and reliability (Brown, 2004:55). In his book Language Assessment: Principles

and Classroom Practices, Brown introduces the main terminology of

multiple-choice items:

a. Multiple-choice items are all receptive, or selective, response items in that the test-takers choose from a set of responses rather than creating a response.

b. Every multiple-choice item has a stem, which presents a stimulus, and several options or alternatives to choose from.

c. One of those options, the key, is the correct answer, while the others serve as distractors.

The main advantage of multiple choice is that scoring can be perfectly

reliable. It can also be rapid and economical. Another advantage is that it

is possible to include more items than would otherwise be possible in a given

period of time (Hughes, 1989:76).

On the other hand, Hughes states a number of weaknesses of multiple-choice

items (Hughes, 2005:76-78): the technique tests only recognition knowledge,

test takers can guess the correct answer, cheating may be facilitated, the

technique severely restricts what can be tested, it is very difficult to

write successful items, and the answer is restricted to the options given. As

a result, test-takers cannot elaborate their answers and their understanding

of the material, because the answers are limited to the options given.

Multiple-choice items form the first part of the test packs faced by

test-takers. To analyze these items we can use statistical analysis, as

described in the Research Method chapter. Since there is only one right

answer, the scorer can very rapidly mark an item as correct or incorrect

(Valette, 1967:6). Thus, we can use simple codes to record the test-takers’

answers: 1 represents a correct answer and 0 represents an incorrect answer.

If a student chooses the correct answer, we record it as 1; conversely, if

the student chooses an incorrect answer, we record it as 0. A fuller

explanation is given later in the quantitative analysis.
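
The 1/0 coding described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the answer key, the student responses, and the function name score_items are hypothetical and are not taken from the test packs studied here.

```python
def score_items(responses, key):
    """Code every answer as 1 (matches the key) or 0 (does not match).

    responses: one list of chosen options per student.
    key: the list of correct options, one per item.
    """
    return [
        [1 if answer == correct else 0 for answer, correct in zip(row, key)]
        for row in responses
    ]


# Hypothetical answer key for four items and answers from three students.
key = ["b", "a", "d", "c"]
responses = [
    ["b", "a", "d", "c"],
    ["b", "c", "d", "a"],
    ["a", "a", "b", "c"],
]

matrix = score_items(responses, key)
totals = [sum(row) for row in matrix]  # number of correct answers per student

print(matrix)  # [[1, 1, 1, 1], [1, 0, 1, 0], [0, 1, 0, 1]]
print(totals)  # [4, 2, 2]
```

Each row of the resulting matrix belongs to one student and each column to one item, which is the form usually needed for the later statistical analysis.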

8.5 SHORT-ANSWER ITEMS

After test-takers have answered the multiple-choice items in the first part

of the test pack, in the next part they have to answer short-answer items.

The questions are much the same, but in these items students are not given

options to choose from. The answers are usually only one or two words, and

they should be exactly correct. This item type usually occurs in listening

and reading tests (Hughes, 1989:79).

Short-answer items deal with the measurement of students’ knowledge

acquisition and comprehension. They come in two kinds of format, free choice

and fixed choice. There are two basic free-choice formats: the unstructured

format and the fill-in or completion format. Fixed-choice formats include

true-false, other two-choice, multiple-choice and matching (Tuckman,

1975:77). The short-answer items in the English final semester test packs

used in this study are items that students answer by writing a short, brief

response. How do they differ from essay test items? In essay test items,

students should explore and elaborate their answers. For example, if the

question is about structure and grammar, students usually have to fill in the

blank with a complete sentence. In short-answer items, however, the answer is

usually no more than two or three words. That is why they are called

short-answer items. Such an item may require a one-word answer, a brief

response to a question, or the filling in of missing elements (Valette,

1967:8).

In short-answer items, the correct answer has been determined by the teacher,

so students cannot elaborate their answers. Both free-choice and fixed-choice

items have previously determined correct responses. In the free-choice type,

the student is not given choices from which to select the correct response,

as he or she is in the fixed-choice type (Tuckman, 1975:77). In these

formats, measurement basically involves asking students a question that

requires them to state or name specific information or knowledge (Tuckman,

1975:77).

In this part of the test packs, short-answer items are usually in

unstructured and completion (fill-in) format. In the unstructured format,

students can answer with a word, phrase or number. In the completion or

fill-in format, students must likewise construct their own response rather

than choose an option. It differs from the unstructured item by requiring

that they fill in or complete a sentence from which a word or phrase has been

omitted (Tuckman, 1975:79).

In order to assure the objective nature of short-answer items, the teacher

must prepare a scoring system in advance (Valette, 1967:8). The teacher

should decide in advance what credit to give when the expected word is

misspelled. But since the answer in short-answer items is usually only one

word, we can use the same credit points as for multiple choice: 1 represents

a correct answer and 0 represents an incorrect answer. We only have to mark 1

or 0 because the answer has been determined by the test-makers, and there is

no alternative answer for test-takers.

8.6 ESSAY TEST ITEMS

In the English final test of elementary school, besides multiple-choice and

short-answer items, there is one more test technique presented to test-takers

in the final semester test packs: the essay test. Unlike short-answer items,

an essay test needs longer answers and deeper analysis. While short-answer

items are a continuation of multiple-choice items, an essay test involves

deep thinking about the test-takers’ knowledge and understanding of the

material. In language testing, it may cover students’ understanding of

language structure and culture.

Essay items provide test-takers with the opportunity to structure and

compose their own responses within relatively broad limits (Tuckman, 1975:111).

Essay tests enable them to demonstrate their ability to apply knowledge and to

analyze, to synthesize, and to evaluate new information in the light of their

knowledge (Tuckman, 1975:111). This test is more reliable for measuring

students’ understanding. Tuckman says in his book Measuring Educational

Outcomes: Fundamentals of Testing that several aspects of students’ learning

can be measured using essay tests: application, analysis, synthesis,

evaluation, and a combination of those four aspects (1975:111-123).

Essay questions intended to measure students’ application must require the

students to use knowledge that has been acquired to describe a way of dealing

with a concrete situation. Thus, to measure application, the item must

present a concrete situation, one that can somehow be included in the reality

of the students being tested and one to which they can relate (Tuckman,

1975:112).

In analysis items, the question does not pose a particular problem to be

solved. The situation is one with which the students are presumably familiar

and that contains elements, relationships, or organizational principles that

can be analyzed (Tuckman, 1975:115). Unlike measuring application, in which

students are given a familiar problem, in measuring how well students can

analyze something, teachers should give them a situation that requires an

understanding of the organization of, and relationships between, several

variables.

When we want to measure students’ synthesis, the item should present a

problem to be solved. The problem should be outside the range of the familiar

or the practical and require the production of a new and unique solution.

Moreover, the particular problem itself must also be new to the students

(Tuckman, 1975:117-118). Synthesis can thus also be interpreted as how

creative students are in devising a solution to a problem and creating a new

model different from what the teacher has taught them.

In an evaluation item, the question contains two parts: that which is to be

evaluated, and the response instructions. The response instructions also

include information about the criteria to be used in the evaluation. In

addition, an essay item that measures evaluation provides a general criterion

for evaluating the object and a general response instruction to provide

detailed support for one’s evaluative position (Tuckman, 1975:122). Students

then have to show their skill in evaluating a problem related to the material

taught by their teachers.

All in all, analysis, synthesis, and evaluation can all be combined in a

single question. Giving students an object, an organization, or an occurrence

and asking them to analyze its parts or workings is the first step.

Evaluating the parts or workings is the second step, and redesigning or

improving upon it through synthesis is the third step (Tuckman, 1975:123). In

this way, a single essay question can be used to test students’ intelligence

and understanding.

Several keywords can be used to prepare essay questions, including analyze,

compare or contrast, describe, define, evaluate, explain, summarize, justify,

outline, identify, and so on. These words are commonly used in writing essay

items.

The scoring system for essay items is very different from the scoring of

objective items such as multiple-choice. In objective items, the score for

each item is exact and the same from item to item. In essay items, what we

should do first is determine the ideal answer, even though there is no

strictly correct or wrong answer. The ideal answer should receive the highest

score, and the further a student’s answer departs from it, the lower the

score. Teachers should then create an interval scale running from the lowest

to the highest score for each item. The interval scale looks like the figure

below:

[Figure: interval scale numbered 1 to 10, with its end points labeled “Ideal answer” and “Not ideal answer”]

The interval scale can then be used to measure how far students understand

the material. The higher the score students get, the better they understand.

Teachers have the authority to determine the range of the interval scale

between the ideal and the not-ideal answer. It can be a scale from 0 to 10

like the scale above, or from 0 to 3 or 5, based on their preferences. It may

be decided by taking into account the scores of all the items, from the

multiple-choice and short-answer items to the essay items.

8.7 QUALITATIVE ANALYSIS

8.7.1 School-Based Curriculum (KTSP)

A curriculum is a document of an official nature, published by a leading or

central education authority in order to serve as a framework or a set of

guidelines for the teaching of a subject area in a broad, varied context

(Celce-Murcia, 2000). A curriculum in a school context refers to the whole

body of knowledge that children acquire in school (Richards, 2001:39). More

specifically, BSNP defines it as a set of plans and arrangements concerning

the objectives, content, and lesson materials, as well as the methods used to

guide learning activities toward the aims of education (2006:1751). In short,

we can say that a curriculum provides the fundamental

guidelines for teachers to reach the aims of education in school. It is the

foundation that teachers should know in conducting the teaching and learning

process.

The School-Based Curriculum is a curriculum in the sense described above. It

is a revised edition of the 2004 curriculum, known in Bahasa Indonesia as

Kurikulum Berbasis Kompetensi (Competence-Based Curriculum). It has been used

in educational institutions since 2006. Under this curriculum, each school

can make its own policies and rules for its educational programs. Teachers

can create their own syllabus, teaching and learning process, and learning

goals that are appropriate for the students in their school.

KTSP is an operational curriculum that is arranged and applied in every

educational unit (Jumadi, 2). This is because KTSP is created based on each

school’s needs and conditions. Thus, schools in a big city may have a

different curriculum from schools in a small city. The arrangement of the

content takes into account the cultural and social conditions of the school’s

students, so that students in different places and areas have learning

achievements appropriate to their environment. Nevertheless, based on

Government Rule 19, 2005 on the Education National Standard, every school is

mandated to develop KTSP based on the Passing Competence Standard (SKL) and

the Content Standard (SI), and on the guidance arranged by the Education

National Standard Board (BSNP). The government published General Guidance for

arranging KTSP so that an educational unit or school that has the ability

could develop KTSP starting in academic year 2006/2007 (Jumadi, 1). A school

is regarded as able to arrange and develop KTSP if it has tried to apply the

2004 Curriculum in its institution.

Based on the Regulation of the Minister of National Education No. 24 of 2006, the arrangement of KTSP involves teachers, employees, and the School Committee, in the hope that KTSP will reflect the aspirations of the people, the situation and condition of the environment, and the people’s needs. That is why this curriculum is more democratic than the curricula used in schools before. It gives room for democratization in determining an education curriculum appropriate to the community context where the school is located, its financial context, its human resources, and other conditions of the school, so that the potential of each school can be optimized and there is competition among schools (Handayani, 2010). KTSP consists of the educational goals at the educational unit level, the curriculum structure and content at the educational unit level, the educational calendar, and the syllabus (Jumadi, 2).

Sutrisno in Handayani (2010:22) states that, as a concept and also a program, KTSP has the following characteristics:

a. KTSP emphasizes the students’ competence achievement. In KTSP, students are guided to develop knowledge, understanding, ability, values, attitudes, and interests so that they become skilled and independent persons.
b. KTSP is oriented to the learning process and to variety.
c. The learning process uses various approaches and methods.
d. Teachers are not the only source; other educative sources are included.
e. Assessment emphasizes the process and the result of study in achieving a competence.


KTSP consists of two basic documents: the school document and the content document. The school document contains information about the school in which KTSP is arranged, for example the introduction to KTSP; the vision, mission, and goals of the school; the curriculum structure and content; and the education calendar, which is made by the school independently. The KTSP structure and content at the elementary education level, as stated in the Content Standard, involve five groups of subjects:

1) Group of religion and morality subjects
2) Group of citizenship and personality subjects
3) Group of science and technology subjects
4) Group of aesthetics subjects
5) Group of athletics and health subjects

(BSNP, 2006)

The education goal at the elementary level is to lay the foundation of intelligence, knowledge, personality, noble character, and the skills to live independently and to continue to a higher level of education (Jumadi, 3). The education calendar is made by the school autonomously, following the guidance on the education calendar established by the national education department.

The other basic document relates to the particular subjects taught in the school. For every subject, the material consists of:

a. the syllabus and lesson plans for the Competence Standards and Basic Competences developed by the central government;
b. the syllabus and lesson plans for the Competence Standards and Basic Competences developed by the school (local-content subjects).

(Handayani, 2010)


English in elementary education is still an extra lesson, since it is regarded as less important than other local-content lessons. Yet English is becoming an important subject in the era of globalization. The goal of the English subject at the elementary level is for students to develop a limited oral communicative competence, in the form of language accompanying action in the school context, and to become aware of the nature and importance of English for increasing national competitiveness in the era of globalization.

The competence standards and basic competences of the English lesson for grade V, semester I, that relate to this study are presented below, grouped by skill.

Listening
Competence Standard:
1. Students are able to understand very simple instructions with actions in the school context.
Basic Competence:
1.1 Students are able to respond to very simple instructions with logical actions in the class and school context.
1.2 Students are able to respond to very simple instructions verbally.

Speaking
Competence Standard:
2. Students are able to express very simple instructions and information in the school context.
Basic Competence:
2.1 Students are able to hold a very simple conversation that accompanies action logically, involving the speech acts of giving an example of how to do something, giving a command, and giving an instruction.
2.2 Students are able to hold a very simple conversation to ask for and/or give something logically, involving the speech acts of asking for and giving help, and asking for and giving something.
2.3 Students are able to ask for and give information, involving the speech acts of introducing, inviting, asking for and giving permission, agreeing and disagreeing, and prohibiting.
2.4 Students are able to express politeness using the expressions "Do you mind ..." and "Shall we ...".

Reading
Competence Standard:
3. Students are able to understand English written texts and descriptive texts with pictures in the school context.
Basic Competence:
3.1 Students are able to read aloud words, phrases, and simple sentences with correct stress and intonation.
3.2 Students are able to understand simple sentences, written messages, and descriptive texts with pictures accurately.

Writing
Competence Standard:
4. Students are able to spell and rewrite simple sentences in the school context.
Basic Competence:
4.1 Students are able to spell simple sentences accurately and correctly.
4.2 Students are able to copy and write very simple sentences accurately and correctly, such as compliments, congratulations, invitations, and expressions of gratitude.

8.7.2 Syllabus

A syllabus is a learning plan for a subject, a group of subjects, or a certain theme that includes the competence standard, basic competence, learning material, learning activity, indicator, scoring, time allocation, and sources (Jumadi, 2). The syllabus is a part of the curriculum. It can be defined as the systematic and specific content of the curriculum that teachers can apply in their teaching activity. Teachers can see the goals, process, and objectives of their teaching and learning in it. Richards states that a syllabus is a specification of the content of a course of instruction and lists what will be taught and tested (2001:2). The syllabus has to be in line with the curriculum, because it is made based on the curriculum. BSNP defines it as a “learning plan on one or group of lesson/ certain theme which covers Competence Standard, Basic Competence, main material of learning, learning activities, indicator, assessment, time allocation, and source/ material/ tools of learning” (2006:1751).

The syllabus can be developed by a teacher autonomously, by a group of teachers from several schools, by the subject teachers’ deliberation forum (MGMP), or by the education office (Jumadi, 7). In elementary schools, the teachers of grades I to VI usually arrange the syllabus together. A school that cannot arrange and develop it autonomously should join other schools through the MGMP forum to develop it.

We can see from this definition that a syllabus contains specific guidelines for teachers about what they have to do in their job as educators. It is not only about what they have to do, but also how, when, and by what means they have to do it as professional educators. If teachers teach without relying on it, the educational objectives may deviate from the national education goal. To clarify the syllabus that will be used in the qualitative analysis in this study, the writer presents here the syllabus of the English lesson for grade V, semester I. It is important as the basic guidance for the teaching and learning process and for assessing it.


8.8 ITEM ANALYSIS DATA (QUANTITATIVE ANALYSIS)

8.8.1 Validity

Test validity refers to whether a test measures what we intend it to measure (Tuckman, 1975:229). Validity is an integrated evaluative judgment of the degree to which empirical evidence and theoretical rationales support the adequacy and appropriateness of inferences and actions based on test scores or other modes of assessment (Messick in Bachman, 2004:259).

The objective of many tests is to measure the effect of certain experiences that have occurred prior to the test (Tuckman, 1975:229). A test, then, is used to monitor or assess an experience that has already occurred or to determine students’ learning based on the experience (Tuckman, 1975:229).

When selecting a test, it is important to make sure that the information provided by the test is sufficient for the decisions we are going to make based on the test scores (http://www.cal.org/flad/tutorial/validity/4testuse.html).

Brown says that the most important principle of a test is validity (2004:22). It is the extent to which inferences made from assessment results are appropriate, meaningful, and useful in terms of the purpose of the assessment (Gronlund, 1998:226). In some cases, it may be appropriate to examine the extent to which a test calls for performance that matches that of the course or unit of study being tested (Brown, 2004:22).

There are two types of validity that are most relevant to classroom tests, namely face validity and content validity (Brown, 2002:26). Face validity refers to the appearance of a test: whether it looks like it is measuring what it is supposed to measure. Mousavi (in Brown, 2002:26) states that face validity refers to the degree to which a test looks right and appears to measure the knowledge or abilities it claims to measure, based on the subjective judgment of the examinees who take it, the administrative personnel who decide on its use, and other psychometrically unsophisticated observers.

We can say that face validity concerns how the test appears to the test-takers: how good or bad it looks to them, and how they feel when the test pack is given to them. Hughes states that a test is said to have face validity if it looks as if it measures what it is supposed to measure (1989:33). Brown (2004:27) states that face validity will likely be high if learners encounter:

a. a well-constructed, expected format with familiar tasks,
b. a test that is clearly doable within the allotted time limit,
c. items that are clear and uncomplicated,
d. directions that are crystal clear,
e. tasks that relate to their course work, and
f. a difficulty level that presents a reasonable challenge.

Several parts of the test packs related to this study concern the appearance of the question sheets. One is the font used in the test pack, whether it is easy or difficult to read. If the test packs contain pictures, it should also be analyzed whether the pictures are clear enough. An important aspect that should be examined more deeply is the arrangement of the test items, including the vocabulary, phrases, and sentence arrangement of the test. If all of the features mentioned above are well organized, the test-takers will feel confident in facing and answering the test packs. A test which does not have face validity may not be accepted by candidates, teachers, education authorities, and employers (Hughes, 1989:33). This is because such a test does not look standardized and does not appear to measure what should be measured.

In contrast to face validity, a claim of content validity requires affirmation from an expert. The expert should look into whether the test content is representative of the skills that are supposed to be measured. This involves looking into the consistency between the syllabus content, the test objectives, and the test content. If the test content covers the test objectives, which in turn are representative of the syllabus, it can be said that the test possesses content validity (Brown, 2002:23-24). A test is said to have content validity if its content constitutes a representative sample of the language skills, structures, etc. with which it is meant to be concerned (Hughes, 1989:26). It means that a test will have content validity if the test items are appropriate to what teachers want to measure. If teachers want to test students’ understanding of grammar and structures, the test should stay close to them and not stray from the topic. Content validity is important because the greater a test’s content validity, the more likely it is to be an accurate measure of what it is supposed to measure (Hughes, 1989:27). It is also important because areas that are not tested are likely to become areas ignored in teaching and learning. It means that teachers should give test items based on what they have taught to students.

8.8.2 Reliability

Reliability refers to the consistency of test results. A reliable test is consistent and dependable (Brown, 2004:20). This means that a test must be dependable in several aspects of how it is conducted, including for the students as test-takers. Bachman (2004:153) states that reliability is the consistency of measures across different conditions in the measurement procedures.

The most common learner-related issues in reliability are caused by temporary illness, fatigue, or anxiety in facing the test (Brown, 2004:21). Besides, a test must have rater reliability, the principle that the scoring process is applied consistently to the test and the assessment. This scoring process must be standardized. Unreliability may also result from the conditions in which the test is administered (Brown, 2004:21). In every test, then, no measurement instrument or procedure is perfect (Tuckman, 1975:253). Neither a mechanical device such as a voltmeter nor a human device such as a test gives a result that is a perfect reflection of the property being measured (Tuckman, 1975:253). Test administration must also be reliable so that the test runs well and in an organized way. Bad administration and unplanned arrangements can undermine even good preparation.

8.8.3 Level of Difficulty (Item Facility)

A good test is neither too easy nor too difficult for students. It should offer answer options that students may plausibly choose. Very easy items build affective feelings of “success” among lower-ability students and serve as warm-up items, and very difficult items can provide a challenge to the highest-ability students (Brown, 2004:59). A test that is too easy will not stimulate students to work on it, and a test that is too difficult will discourage students from trying to find the answer (Arikunto, 2006:207).

Level of difficulty, which Brown (2004:58) calls item facility, is the extent to which an item is easy or difficult for the proposed group of test-takers. Students come to know and remember the characteristics of a teacher’s tests if the tests are always too easy or too difficult. Thus, the test should be standard and fulfill the characteristics of a good test. The number that shows the level of difficulty of a test item is called the difficulty index (Arikunto, 2006:207). This index has a minimum and a maximum value. The lower the index of an item, the more difficult the item is; conversely, the higher the index, the easier the item is.

There are some factors that every test constructor must consider in setting the difficulty level of test items. Mehrens and Lehmann point out that the concept of difficulty, or the decision of how difficult the test should be, depends on a variety of factors, notably 1) the purpose of the test, 2) the ability level of the students, and 3) the age or grade of the students.

8.8.4 Discrimination Power (Item Discrimination)

Discrimination power explains how well the items perform in separating the better students from the poorer ones (Nurulia, 2010:53). It is the extent to which an item differentiates between high- and low-ability test-takers. Discrimination is important because the more discriminating the items are, the more reliable the test will be (Hughes, 1989:226).

It is defined as the ability of a test item to separate master students from non-master students (Arikunto, 2006:211). A master student is a student with a higher score on the test, and a non-master student is a student with a lower score on the test. Like the difficulty level, discrimination has an index, the discrimination index. It is an indicator of how well an item discriminates between weak candidates and strong candidates (Hughes, 1989:226). This index measures the ability of an item to discriminate between the upper and lower groups of students. The upper group consists of the students with higher total scores, and the lower group of those with lower total scores. Unlike the difficulty index, the discrimination index can be negative. A negative value shows that the question presents master students as weak students and non-master students as strong students, that is, more students in the lower group than in the upper group answer it correctly. A good question is one that can be answered correctly by the upper group and cannot be answered correctly by the lower group. If a question is answered correctly by both the upper and lower groups, or conversely cannot be answered correctly by either group, it is a poor item because its discrimination index is 0.

“The higher its discrimination index, the better the item discriminates in this way. The theoretical maximum discrimination index is 1. An item that does not discriminate at all (weak and strong test-takers perform equally well on it) has a discrimination index of zero.” (Hughes, 1989:226)

An item on which high-ability students who did well in the test (master students) and low-ability students who did not (non-master students) score equally well would have poor item discrimination (ID), because it does not discriminate between the two groups. Conversely, an item that garners correct responses from most of the high-ability group and incorrect responses from most of the low-ability group has good discrimination power (Brown, 2004:59).

8.8.5 Answer of Questions Form (Item Distractors)

In addition to calculating discrimination indices and facility values, it is necessary to analyze the performance of distractors (Hughes, 1989:228). Distractor analysis is defined as the distribution of test-takers’ choices among the answer options (distractors) in multiple-choice questions (Arikunto, 2006:219). This analysis is as important as the others, considering that nearly 50 years of research shows a relationship between the distractors students choose and their total test score (Nurulia, 2010:57).

It can be obtained by counting the number of test-takers who choose each distractor, which we can do by looking at the students’ answer sheets. A distractor is good if it is chosen by at least 5% of the test-takers. One way to study responses to distractors is with a frequency table that tells us the proportion of students who selected a given distractor. Distractors selected by few or no students should be removed or replaced, because students find them implausible (Nurulia, 2010:57). Distractors that do not work, for example those chosen by very few test-takers, should be replaced by better ones, or the item should be otherwise modified or dropped (Hughes, 1989:228). They do not contribute to the test’s ability to discriminate the good students from the poor students (Nurulia, 2010:57).
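The frequency table described above can be sketched in Python as an illustration only. The function name `distractor_report`, the status labels, and the sample answers are hypothetical and are not data from this study; the 5% threshold is the one stated in the text.

```python
from collections import Counter


def distractor_report(responses, options, key, threshold=0.05):
    """Return {option: (proportion, status)} for one multiple-choice item."""
    if not responses:
        raise ValueError("no responses to analyze")
    counts = Counter(responses)
    total = len(responses)
    report = {}
    for option in options:
        proportion = counts[option] / total  # Counter gives 0 for unchosen options
        if option == key:
            status = "key"
        elif proportion >= threshold:
            status = "functioning"
        else:
            status = "replace or revise"
        report[option] = (proportion, status)
    return report


# Hypothetical answers of 20 test-takers to one item whose key is "b".
answers = list("bbbbbbbbbbbb" + "aaaa" + "cccc")
report = distractor_report(answers, options="abcd", key="b")
for option, (proportion, status) in report.items():
    print(f"{option}: {proportion:.2f} {status}")
```

In this made-up item, option d is chosen by nobody, so it would be flagged for replacement, while a and c each attract 20% of the test-takers and count as functioning distractors.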

9. RESEARCH METHOD

9.1 THE RESEARCH HYPOTHESIS

Hatch (1982:3) states that a hypothesis is a tentative statement about the outcome of the research. In line with Hatch, Best says that a hypothesis is a tentative answer to a question (1977:26). In general terms, it is the researcher’s prior assumption about the result of the study. Furthermore, he states that the statistical hypothesis should be stated in negative or null form.

In this research, the hypothesis is that students of both SDIT Al Kamila Semarang and MI Darus Sa’adah Semarang will get the same scores on each of the test packs used by the two schools. This rests on the assumption that the two test packs are of the same quality in their quantitative and qualitative aspects.

9.2 OBJECT AND SUBJECT OF THE STUDY

The object of this study is the multiple-choice test items of the English subject for Grade V of elementary school. The study compares the test pack made under the Ministry of National Education with the one made under the Ministry of Religion. The two test packs are compared because in Indonesia two ministries deal with formal education and deliver formal tests from elementary to senior high school. For this reason, the writer wants to compare the quality of the two test packs in terms of their statistical and non-statistical features.

The two test packs actually consist not only of multiple-choice questions but also of short-answer and essay questions. However, to keep the discussion from becoming too broad, the writer focuses only on analyzing the multiple-choice items.

The two test packs of multiple-choice questions are then given to two different classes: one from a non-Islamic state elementary school, in this study SDIT Al Kamila Semarang, and the other from an Islamic private elementary school, in this case MI Darus Sa’adah Semarang.

9.3 POPULATION AND SAMPLE

9.3.1 Population

The population of the study consists of the multiple-choice test items taken from the English final test for Grade V of elementary school and the Grade V students who will be given the test.

9.3.2 Sample

From the population above, the samples are taken: the multiple-choice items of the English final test of the first semester of the academic year 2011/2012 for Grade V, and the Grade V students of SDIT Al Kamila Semarang and MI Darus Sa’adah Semarang in the same academic year.

9.4 RESEARCH DESIGN AND INSTRUMENT

9.4.1 Research Design


Bachman (2004:3) states that much of the data obtained from language assessment is quantitative, and that statistics is a set of logical and mathematical procedures for analyzing quantitative data. Thus, the main method used in this study is quantitative. However, the writer needs more than mathematical measurement to analyze multiple-choice tests, so she also uses a qualitative approach. The quantitative approach is used to measure the statistical features of the test items, such as their validity, reliability, difficulty level, and discrimination power. Several formulas for measuring these features are presented in a later subchapter. In addition, the qualitative approach is used to check whether or not the test items are in line with the Competence Standards and Basic Competences on which the teaching and learning process is based. In the qualitative approach, the language used in the test items is also analyzed to judge whether it is good enough.

9.4.2 Instruments/ Unit analysis

In this study, the instruments are two test packs. Each consists of multiple-choice, short-answer, and essay items. The two test packs are taken from the English final tests used by SDIT Al Kamila Semarang and MI Darus Sa’adah Semarang. Each test pack will be given to one class of grade V in SDIT Al Kamila Semarang and in MI Darus Sa’adah. The two test packs come from different institutions. The one used in SDIT Al Kamila is made by the MGMP of English of the Ministry of National Education of Semarang. The other, used in MI Darus Sa’adah, is made by the MGMP of English of the Ministry of Religion of Semarang. Thus, the grade V students will receive and answer two different test packs. This is done to see whether there are any differences between the two test packs.

The instruments and units of analysis are:

1. Test items made by the MGMP of English of the National Education Ministry of Semarang and the MGMP of English of the Religion Ministry of Semarang, in the form of both multiple-choice and essay tests.
2. Students’ scores on these tests.
3. Item analysis cards, which map the appropriateness of each test item to the material in the syllabus, the test construction, and the effectiveness of the language used.

9.5 METHOD OF COLLECTING DATA

9.5.1 Collecting method

In the collecting method, the writer collects two different test packs, from which the multiple-choice items are taken as the object of the study. The two test packs are taken from two different schools. The first is taken from SDIT Al Kamila Semarang, which falls under the Ministry of National Education and Culture. The other is taken from MI Darus Sa’adah, which falls under the Ministry of Religion, since its curriculum gives deep coverage to religious subjects, especially Islamic ones. The two test packs are obtained from the English teachers who teach in those schools.

9.5.2 Testing method

After the test items have been collected by the previous method, the tests are given to the test-takers, in this case the grade V students in the two schools. Every class is given both tests, the one made by the MGMP of English of the Ministry of National Education and the one made by the MGMP of English of the Ministry of Religion of Semarang, so that comparable results and data are obtained for each test pack. The diagram below illustrates the testing process:

Note:

Y1 : final semester test pack made by the MGMP of English of the Ministry of National Education of Semarang

Y2 : final semester test pack made by the MGMP of English of the Ministry of Religion of Semarang

X1 : SDIT Al Kamila Semarang

X2 : MI Darus Sa’adah Semarang

Y1, the test pack made by the MGMP of English of the Ministry of National Education, will be given to the grade V students of both SDIT Al Kamila and MI Darus Sa’adah. Likewise, Y2, the test pack made by the MGMP of English of the Ministry of Religion of Semarang, will be given to the students of both schools.

[Diagram: test packs Y1 and Y2 are each given to both schools, X1 and X2.]

9.6 METHOD OF ANALYZING DATA

9.6.1 Quantitative Analysis

Quantitative analysis deals with the measurement of the statistical features of the test items: their validity, reliability, level of difficulty, discrimination power, and item distractors.

9.6.1.1. Validity

To know the validity of each test item, we can use the product moment formula below:

\[ r_{xy} = \frac{N\sum XY - (\sum X)(\sum Y)}{\sqrt{\left[N\sum X^2 - (\sum X)^2\right]\left[N\sum Y^2 - (\sum Y)^2\right]}} \]

(Arikunto, 2006:72; Bachman, 2004:86; Tuckman, 1978:163)

Note:

r_xy = correlation coefficient between variable X and Y

N = number of test-takers

ΣX = sum of the scores on the item

ΣY = sum of the total scores

ΣXY = sum of the products of item score and total score

ΣX² = sum of the squared item scores

ΣY² = sum of the squared total scores

At the 5% significance level, if r_measured ≥ r_table, the test item is significant, or valid. If r_measured < r_table, the test item is not significant, or not valid.
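As a minimal sketch only, the item validity check can be written in Python using the raw-score form of the product moment correlation between an item's scores (X) and the test-takers' total scores (Y). The function name, the sample scores, and the constant `R_TABLE` are hypothetical; 0.707 is the value commonly tabulated for N = 8 at the 5% level and should be replaced by the value read from the r table actually used in the study. An item that every test-taker answers identically has no variance, and the function would then fail with a division by zero.

```python
from math import sqrt


def product_moment(x, y):
    """Pearson product moment correlation between item scores x and total scores y."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


# Hypothetical data: one item (1 = correct, 0 = wrong) and total scores of 8 test-takers.
item_scores = [1, 1, 1, 0, 1, 0, 0, 0]
total_scores = [9, 8, 7, 6, 5, 4, 3, 2]

R_TABLE = 0.707  # assumed critical value; look it up for the actual N
r_measured = product_moment(item_scores, total_scores)
is_valid = r_measured >= R_TABLE
print(f"r_measured = {r_measured:.3f}, valid = {is_valid}")
```

With these invented scores the item correlates at about 0.76 with the total score, so it would count as valid under the assumed critical value.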

9.6.1.2. Reliability

Reliability is constancy. A test is reliable if it gives the same result whoever the test-takers are and whenever it is given. To measure reliability, we can use the K-R 20 (Kuder-Richardson) formula:

\[ r_{11} = \left(\frac{k}{k-1}\right)\left(\frac{S^2 - \sum pq}{S^2}\right) \]

(Arikunto, 2006:100; Bachman, 2004:164)

Note:

r_11 = reliability

p = proportion of subjects who answer the item correctly

q = proportion of subjects who answer the item incorrectly (q = 1 - p)

k = number of items

Σpq = sum of the products of p and q

S = standard deviation of the test

Variance formula:

\[ S^2 = \frac{\sum X^2 - \frac{(\sum X)^2}{N}}{N} \]

where X is the total score of a test-taker and N is the number of test-takers.

The reliability of essay test items can be measured using the Alpha formula below:

\[ r_{11} = \left(\frac{n}{n-1}\right)\left(1 - \frac{\sum \sigma_i^2}{\sigma_t^2}\right) \]

Note:

r_11 : reliability of the test

Σσ_i² : sum of the variances of each test item

σ_t² : total variance of the test

n : number of test items

(Arikunto, 2006:178)

The classification of reliability is:

0.00 < r_11 ≤ 0.20 : very low

0.20 < r_11 ≤ 0.40 : low

0.40 < r_11 ≤ 0.60 : medium

0.60 < r_11 ≤ 0.70 : high

0.70 < r_11 ≤ 1.00 : very high

At the 5% significance level, if r_11 ≥ r_table, the test instrument is significant, or reliable. If r_11 < r_table, the test instrument is not significant, or not reliable.
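The two reliability coefficients and the classification above can be sketched in Python as follows. This is an illustration under stated assumptions: the function names and the score matrix are hypothetical, and the variance is computed with N (not N - 1) in the denominator. The bands in `classify_reliability` follow the classification given in this subchapter. For items scored 1 or 0, the Alpha formula gives the same value as K-R 20, which the example uses as a consistency check.

```python
def population_variance(values):
    """Variance with N in the denominator (an assumption of this sketch)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def kr20(score_matrix):
    """K-R 20 for rows = test-takers and columns = items scored 1 or 0."""
    n_takers = len(score_matrix)
    k = len(score_matrix[0])
    total_variance = population_variance([sum(row) for row in score_matrix])
    sum_pq = 0.0
    for j in range(k):
        p = sum(row[j] for row in score_matrix) / n_takers
        sum_pq += p * (1 - p)
    return (k / (k - 1)) * ((total_variance - sum_pq) / total_variance)


def alpha(score_matrix):
    """Alpha coefficient; item scores may be any numbers (for example essay scores)."""
    n_items = len(score_matrix[0])
    item_variances = [
        population_variance([row[j] for row in score_matrix]) for j in range(n_items)
    ]
    total_variance = population_variance([sum(row) for row in score_matrix])
    return (n_items / (n_items - 1)) * (1 - sum(item_variances) / total_variance)


def classify_reliability(r11):
    """Map r11 to the bands listed in this subchapter."""
    bands = [(0.20, "very low"), (0.40, "low"), (0.60, "medium"),
             (0.70, "high"), (1.00, "very high")]
    for upper, label in bands:
        if r11 <= upper:
            return label
    raise ValueError("r11 cannot exceed 1")


# Hypothetical answers of 5 test-takers to 4 multiple-choice items.
scores = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
r11 = kr20(scores)
print(f"K-R 20 = {r11:.2f} ({classify_reliability(r11)}), Alpha = {alpha(scores):.2f}")
```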

9.6.1.3. Level of difficulty

The number that shows the difficulty or easiness of a test item is known as the difficulty index. The formula used to measure it is:

\[ P = \frac{B}{\mathit{JS}} \]

(Arikunto, 2006:208; Brown, 2004:59)

Note:

P = level of difficulty

B = number of test-takers answering the item correctly

JS = number of test-takers responding to that item

The classification of the level of difficulty is:

P = 0.00 : the test item is too difficult

0.00 < P ≤ 0.30 : the test item is difficult

0.30 < P ≤ 0.70 : the test item is medium

0.70 < P < 1.00 : the test item is easy

P = 1.00 : the test item is too easy

There is no absolute P value that must be met to determine if an item should be included in the test as is, modified, or thrown out, but appropriate test items will generally have P values that range between 0.15 and 0.85 (Brown, 2004:59).
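For illustration, the difficulty index (the number of correct answers divided by the number of test-takers responding) and its classification can be sketched in Python. The function names and the figures in the example are hypothetical; the bands are the ones listed above.

```python
def difficulty_index(correct, responding):
    """P = B / JS: share of responding test-takers who answered the item correctly."""
    if responding <= 0:
        raise ValueError("at least one test-taker must respond to the item")
    return correct / responding


def classify_difficulty(p):
    """Map P to the bands listed in this subchapter."""
    if p == 0:
        return "too difficult"
    if p <= 0.30:
        return "difficult"
    if p <= 0.70:
        return "medium"
    if p < 1:
        return "easy"
    return "too easy"


# Hypothetical item: 12 of 30 test-takers answered correctly.
p = difficulty_index(12, 30)
print(f"P = {p:.2f} ({classify_difficulty(p)})")
```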

9.6.1.4. Discrimination Power

Test discrimination power is the ability of a test item to discriminate between smart test-takers (high intelligence) and less smart test-takers (low intelligence) (Arikunto, 2006:211). The number that shows the degree of discrimination power is known as the discrimination index. In this study, to find the discrimination power, the test-takers are split into two halves: the smart group as the top group and the less smart group as the bottom group.

The formula used to measure the discrimination power of multiple-choice test items is:

\[ D = \frac{B_A}{J_A} - \frac{B_B}{J_B} \]

(Arikunto, 2006:213)

Note:

D = discrimination power

B_A = number of top-group test-takers who answer the item correctly

B_B = number of bottom-group test-takers who answer the item correctly

J_A = number of test-takers in the top group

J_B = number of test-takers in the bottom group

The classification of discrimination power is:

0.00 ≤ D ≤ 0.20 : poor discrimination power

0.20 < D ≤ 0.40 : sufficient discrimination power

0.40 < D ≤ 0.70 : good discrimination power

0.70 < D ≤ 1.00 : very good discrimination power

D negative : the item is not good; items with a negative D value should be discarded.
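The discrimination index (the proportion of correct answers in the top group minus the proportion in the bottom group) can be sketched in Python as below. This is only an illustration: the function names and the scores are hypothetical, and the sketch assumes that the groups are formed by ranking test-takers on total score and taking the top and bottom halves, dropping the middle test-taker when the number is odd.

```python
def discrimination_index(item_scores, total_scores):
    """D = BA/JA - BB/JB using top and bottom halves ranked by total score."""
    ranked = sorted(zip(total_scores, item_scores), key=lambda pair: pair[0], reverse=True)
    half = len(ranked) // 2  # the middle test-taker is left out when the count is odd
    top, bottom = ranked[:half], ranked[-half:]
    correct_top = sum(item for _, item in top)
    correct_bottom = sum(item for _, item in bottom)
    return correct_top / len(top) - correct_bottom / len(bottom)


def classify_discrimination(d):
    """Map D to the bands listed in this subchapter."""
    if d < 0:
        return "not good (discard)"
    if d <= 0.20:
        return "poor"
    if d <= 0.40:
        return "sufficient"
    if d <= 0.70:
        return "good"
    return "very good"


# Hypothetical data: one item (1 = correct, 0 = wrong) and total scores of 8 test-takers.
item_scores = [1, 1, 1, 0, 1, 0, 0, 0]
total_scores = [9, 8, 7, 6, 5, 4, 3, 2]
d = discrimination_index(item_scores, total_scores)
print(f"D = {d:.2f} ({classify_discrimination(d)})")
```

Here three of the four top-group test-takers and one of the four bottom-group test-takers answer correctly, so D = 0.75 - 0.25 = 0.50.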

The discrimination power of essay test items can be measured using the t-test as stated in Arifin (2009:278):

\[ t = \frac{M_H - M_L}{\sqrt{\dfrac{\sum x_1^2 + \sum x_2^2}{n_i(n_i - 1)}}} \]

Note:

M_H = mean of the high group

M_L = mean of the low group

Σx_1² = sum of the squared individual deviations of the high group

Σx_2² = sum of the squared individual deviations of the low group

n_i = number of test-takers in each of the high and low groups, n_i = 27% x N

N = total number of test-takers

Next, t_measured is compared with t_table, with dk = (n_1 - 1) + (n_2 - 1) and α = 5%. If t_measured > t_table, the discrimination power is significant.

A practical use of discrimination indices is to select items from a test bank that includes more items than we need (Brown, 2004:60).
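A minimal Python sketch of this t-test for one essay item is given below. It assumes that the high and low groups are the top and bottom 27% of test-takers ranked by total score, with the group size rounded to the nearest whole number; the function name and all scores are hypothetical. If every score within both groups is identical, the denominator is zero and the calculation is undefined.

```python
from math import sqrt


def essay_discrimination_t(item_scores, total_scores, fraction=0.27):
    """Return (t, dk) comparing an essay item's mean in the high and low groups."""
    ranked = [
        item
        for _, item in sorted(
            zip(total_scores, item_scores), key=lambda pair: pair[0], reverse=True
        )
    ]
    n_i = max(2, round(fraction * len(ranked)))  # group size, at least 2 per group
    high, low = ranked[:n_i], ranked[-n_i:]
    mean_high = sum(high) / n_i
    mean_low = sum(low) / n_i
    ss_high = sum((x - mean_high) ** 2 for x in high)  # squared deviations, high group
    ss_low = sum((x - mean_low) ** 2 for x in low)  # squared deviations, low group
    t = (mean_high - mean_low) / sqrt((ss_high + ss_low) / (n_i * (n_i - 1)))
    dk = (n_i - 1) + (n_i - 1)
    return t, dk


# Hypothetical data: scores of 10 test-takers on one essay item and on the whole test.
totals = [50, 48, 45, 40, 38, 35, 30, 28, 25, 20]
essay_item = [9, 8, 10, 7, 6, 6, 5, 4, 3, 5]
t_measured, dk = essay_discrimination_t(essay_item, totals)
print(f"t_measured = {t_measured:.3f}, dk = {dk}")
```

The resulting t_measured is then compared with the t_table value for the given dk at α = 5%.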

9.6.2 Qualitative analysis

Qualitative analysis, in turn, deals with the analysis and study of the non-statistical features of the test items. There are three aspects that the writer is going to study in this subchapter: analysis of instructional materials, analysis of test construction, and analysis of language use.

9.6.2.1 Analysis of Instructional Materials

Analysis of instructional materials deals with the appropriateness of the test
items to the instructional materials of the teaching and learning process, which
are stated in the curriculum as Standard and Basic Competence. In this
sub-chapter, the test items will be reviewed to see whether or not they match the
Standard and Basic Competence, especially for elementary school. For that
purpose, the writer will present the Standard and Basic Competence of Elementary
School for Grade V so that the test items can be matched against it.

9.6.2.2 Analysis of Test Construction

Test construction analysis deals with how well the test items constructed by the
test makers follow the principles of good multiple-choice questions. In this
analysis, the test items will be analyzed to see whether or not they fulfill the
characteristics of a good test, as stated in the principles of a good test in the
previous chapter.

The analysis covers several aspects, such as whether or not a question and its
answer options are effective; that is, whether the answer is too easy or,
conversely, too difficult to find. Another point examined in this sub-chapter is
whether the questions are easy or difficult to understand. A further point is
whether a picture included in a question is easy to read, and so on.

9.6.2.3 Analysis of Language Use

Analysis of language use examines the language used in constructing the
questions and answer options of the test items. Test makers may sometimes use
difficult words or grammatical features that are hard for students to understand
at their level of knowledge.

10. ORGANIZATION OF THE STUDY

To make this study report easier for readers to understand, the writer is going
to organise this research paper as follows:

Chapter I is the Introduction. It explains the background of the study, reasons
for choosing the topic, statements of the problem, objectives of the study,
significance of the study, and the outline of the study report.

Chapter II presents a review of related literature, covering theoretical sources
on language tests and assessment; measurement, assessment, and evaluation; types
of assessment; forms of assessment; and theories on how to design, make, and
analyze a good test.

Chapter III deals with the method of investigation. It presents the methodology
of investigation, including the object of the study, population and sample,
method and instrument, method of collecting the data, method of analyzing the
data, and technique of reporting the result.

Chapter IV presents the findings and interpretation. It consists of the analysis
and discussion of the research findings.

Chapter V, as the end of the discussion, includes the conclusions and
suggestions.

11. BIBLIOGRAPHY

Arikunto, S. 2006. Dasar-Dasar Evaluasi Pendidikan. Jakarta: Bumi Aksara.

Bachman, L.F. 2004. Statistical Analyses for Language Assessment. London:

Cambridge University Press.

Best, J. W. 1977. Research in Education. New Zealand: Prentice Hall, Inc.

Brown, H. D. 2002. Principles of Language Learning and Teaching (4th Ed).
New York: Addison Wesley Longman, Inc.

Brown, H. D. 2004. Language Assessment: Principles and Classroom Practices.
San Francisco: Longman, Inc.

BSNP. 2006. Standar Isi dan Standar Kompetensi Lulusan Tingkat Sekolah

Menengah Pertama dan Madrasah Tsanawiyah. Jakarta: PT. Binatama

Raya

Celce-Murcia et.al. 2000. Discourse and Context in Language Teaching. London:


Davies, A. 1997. Demands of being professional in language testing. Language
Testing, 14(3), 328-39 in Alderson, J.C. and Banerjee, J. (Ed) 2008.

Gronlund, N. E. 1998. Assessment of Student Achievement. 6th Edition. Boston:
Allyn and Bacon in Brown, H.D. (Ed) 2004.

Hatch, E. and Farhady, H. 1982. Research Design and Statistics for Applied
Linguistics. London: Newbury House Publishers, Inc.

Hughes, A. 2005. Testing for Language Teachers. 2nd Ed. London: Cambridge

University Press.

Jumadi. ___. Pengertian KTSP dan Pengembangan Silabus dalam KTSP. A paper
presented at the Training and Implementation of KTSP in SD Wedomartini.

Mehrens, W. and Lehmann, I.J. 1984. Measurement and Evaluation in Education
and Psychology. New York: Holt, Rinehart and Winston.

Meizaliana. 2009. Teaching Structure Through Games to the Students of
Madrasah Aliyah Negeri I Kapahiang Bengkulu. A Thesis. Semarang:
Diponegoro University.

Nitko, A. J. 1983. Educational Tests and Measurement: An Introduction.
New York: Harcourt Brace Jovanovich, Inc.

Nurulia, L. 2011. An Analysis of Multiple-choice English Formative Test for
Grade VIII of MTsN 1 and MTsN 2 Semarang. A Thesis. Semarang:
Semarang State University.

Richards. 2001. Curriculum Development in Language Teaching. London:


Tuckman, B. W. 1975. Measuring Educational Outcomes: Fundamentals of
Testing. New York: Harcourt Brace Jovanovich, Inc.

Valette, R.M. 1967. Modern Language Testing. 2nd Ed. New York: Harcourt
Brace Jovanovich Publishers.

__________. 2011. Understanding Assessment: A Guide for Foreign Language
Educators. Accessed at http://www.cal.org/flad/tutorial/
