Post on 03-Dec-2014
1. TOPIC
AN ANALYSIS OF ENGLISH FINAL TEST OF THE FIRST SEMESTER
STUDENTS GRADE V MADE BY MGMP OF ENGLISH OF NATIONAL
EDUCATION MINISTRY OF SEMARANG AND MGMP OF ENGLISH
OF RELIGION MINISTRY OF SEMARANG
2. BACKGROUND OF THE STUDY
Evaluation is a term we often hear in daily life. Evaluation is used when we
want to know the progress and the result of what we have done. When we work on
a task, evaluation tells us whether the work has been done well. It can also be
used to check whether any obstacle is blocking our plan.
One way of carrying out an evaluation is to make an assessment. Assessment is
the narrower and more specific term, because assessment is part of evaluation.
Evaluation can be done by making an assessment, but it can also be done in other
ways, such as observation or performance judgment during the process.
Assessment is more familiar in education than in other domains, since teachers,
trainers, and education practitioners use the term for measuring and analyzing
how far students understand the material they have been taught. It is also used
by employers to assess their employees and learn the progress of their work. For
example, a director makes an assessment and appraisal to analyze his or her
employees’ work; a headmaster does it to assess teachers’ work; and a teacher
uses it to assess and evaluate students’ understanding and achievement.
To assess students’ understanding and achievement on the material that has been
taught, teachers usually give their students questions in the form of a test.
Students have to give the correct answers according to the subject material. The
questions can take the form of an essay test, in which students write several
sentences on the theory and their understanding of it. Besides, teachers can
give multiple-choice questions to check students’ understanding of the material
in a simple way. However, every form of assessment has strengths and weaknesses
that teachers should know. Teachers have the authority to choose the form of
questions they give to students, and the choice can be based on which aspects of
students’ intelligence they want to know.
Assessing a language subject, in this case English as a foreign language, is a
little different from assessing non-language subjects. A non-language subject is
tested with questions and items in the native language, so what teachers want to
know is the students’ knowledge of the subject matter. Testing a language
subject, in contrast, does not only examine knowledge of the subject; in
practice it should cover the skills needed to master a foreign language, so that
students can be called successful learners. Thus, in language testing the
questions have to measure learners’ mastery of listening, speaking, reading, and
writing in the foreign language. Of course, the skills they have to master are
in line with their level of education. For example, at senior high school level,
students should master at least two or three skills to a minimum standard. This
means that even though they are not able to speak English communicatively or
write in a well-organized way, at least they understand a conversation when they
listen to it and understand statements when they read them. It can be assumed
that at elementary school level, what students should master is not the same as
at senior high school. At this level, they can be said to have mastered English
when they can memorize vocabulary related to given themes and use it in the
context of situations. This means that elementary school students can only
understand and make statements in the foreign language in very simple sentences
or paragraphs, both orally and in writing. Usually, however, teachers at this
level focus only on students’ mastery of vocabulary.
Because language testing in elementary school focuses on vocabulary mastery and
its use in context, it usually reaches students in multiple-choice format.
Teachers prefer this format because it measures students’ vocabulary mastery in
a simple way. Multiple-choice items, however, can measure not only vocabulary
mastery but also other aspects such as grammar and language context. In
addition, to evaluate other aspects, teachers usually combine multiple-choice
questions with essay test items. Teachers therefore have the authority to use
whatever format suits their purpose. In some cases, however, teachers cannot
write the test items given to their students themselves, because the test pack
has already been prepared by an institution that has the right to make the test
items. In this case, all teachers can do is prepare the material so that
students are ready for the test. In Indonesia, examples are the National
Examination for graduation at each education level and the final test of every
semester. The Ministry of National Education has the authority to make the test
items, because in Indonesia every school has to follow its official regulations.
Besides that, the Ministry of Religion has the same authority to make test packs
for some schools. The schools that follow the education rules of the Ministry of
Religion are those that have more religious subjects in their curriculum, in
this case Islamic ones. These schools are therefore subject to two sets of
education regulations: those of the Ministry of National Education and those of
the Ministry of Religion.
A problem arises when there are two different test packs for the same grade of
an education level. The question is whether the test pack organized by the
Ministry of National Education has the same quality and characteristics as the
one arranged by the Ministry of Religion. If there are differences, for example
if the test pack made by the Ministry of National Education is easier than the
one made by the Ministry of Religion or vice versa, it is unfair to one side.
Another possibility is that one of the test packs may not be appropriate to the
instructional material, in this case the Standard Competence and Basic
Competence.
Given the problem explained above, the writer wants to analyze the English final
semester test pack made by MGMP English of the Ministry of National Education
and compare it with the one made by MGMP English of the Ministry of Religion.
The writer hopes that the study will find no difference between the two test
packs. If differences exist, this study can help them be fixed so that the
education systems of both ministries are in line with one another.
3. IDENTIFICATION OF THE PROBLEMS
The items of the two test packs can be analyzed, with the aim of knowing whether
or not they form a good test, by using quantitative measurement, such as the
measurement of validity, reliability, level of difficulty, discrimination power,
and item distractors. Yet if we want to study the analysis of test items
further, we should not rely on quantitative analysis alone. We can use
qualitative analysis to evaluate several non-statistical aspects of test items.
Qualitative analysis can cover the appropriateness of test items to the
materials in the teaching and learning process (school-based curriculum), test
construction, and the language used in the test items.
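As an illustration of two of the quantitative measures named above, the following is a minimal sketch of how level of difficulty and discrimination power might be computed from answers coded 1 (correct) and 0 (incorrect). It assumes the common classical definitions: difficulty as the proportion of test-takers answering an item correctly, and discrimination as the difference between that proportion in the highest-scoring and lowest-scoring groups. The function names, the 27% group size, and the sample data are illustrative assumptions, not the formulas fixed by this study.

```python
def item_difficulty(scores):
    """Proportion of test-takers who answered each item correctly.

    `scores` holds one row per student; each row is a list of 0/1 marks.
    """
    n_students = len(scores)
    n_items = len(scores[0])
    return [sum(row[i] for row in scores) / n_students for i in range(n_items)]


def item_discrimination(scores, fraction=0.27):
    """Upper-group minus lower-group proportion correct for each item.

    Students are ranked by total score; `fraction` (an assumed default)
    sets the share of students placed in each of the two groups.
    """
    ranked = sorted(scores, key=sum, reverse=True)
    k = max(1, round(len(ranked) * fraction))
    upper, lower = ranked[:k], ranked[-k:]
    n_items = len(scores[0])
    return [
        (sum(r[i] for r in upper) - sum(r[i] for r in lower)) / k
        for i in range(n_items)
    ]


# Hypothetical data: four students, three items.
scores = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
difficulty = item_difficulty(scores)
discrimination = item_discrimination(scores, fraction=0.5)
print(difficulty)      # [0.75, 0.5, 0.25]
print(discrimination)  # [0.5, 1.0, 0.5]
```

In this sketch an item with a difficulty value near 1 is easy, and an item with a discrimination value near 0 does not separate stronger from weaker test-takers.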
4. FORMULATION OF THE PROBLEMS
In order not to discuss something irrelevant, the writer has limited the
discussion by focusing her attention on the following problems:
4.1 What is the quality of the English final tests for first semester students
made by MGMP English of the Ministry of National Education and the Ministry
of Religion of Semarang in terms of validity, reliability, difficulty level,
discrimination power, and item distractors?
4.2 How appropriate are those test items in terms of instructional materials
(Standard Competence and Basic Competence), test construction, and
language use?
4.3 What are the differences between the test items made by MGMP English of the
Ministry of National Education and those made by MGMP English of the
Ministry of Religion of Semarang?
5. OBJECTIVES OF THE STUDY
Based on the problems formulated above, this study has several objectives. They
are as follows:
5.1 To describe the quality of the English final tests for first semester
students made by MGMP English of the Ministry of National Education and the
Ministry of Religion of Semarang in terms of difficulty level,
discrimination power, validity, and reliability.
5.2 To describe the appropriateness of those test items in terms of
instructional materials (Standard Competence and Basic Competence), test
construction, and language use.
5.3 To explain the differences between the test items made by MGMP English of
the Ministry of National Education and those made by MGMP English of the
Ministry of Religion of Semarang.
6. SIGNIFICANCES OF THE STUDY
In relation to the objectives of the study, this analysis is intended to offer
three major kinds of significance, elaborated in the paragraphs below.
The first is theoretical significance. This study may give teachers, educators,
trainers, and others a basic understanding that assessment and evaluation cannot
be based only on students’ outward performance or, in some cases, on guessing.
They should know that test items should be made to evaluate students’
understanding and ability. Tests are also useful for developing their
professionalism as educators.
The second is practical significance. This study is beneficial for test makers
as an additional reference in constructing and analyzing test items and their
procedures.
The last is pedagogical significance. This study provides English teachers,
especially elementary school teachers, with meaningful and useful information
for efficient class discussion of the test results, the general improvement of
classroom instruction, evaluation of the teaching and learning process, and
improvement in test making.
7. LIMITATION OF THE STUDY
The limitation of the study is stated to bound the research so that it does not
go beyond what the researcher wants to discuss. This study is both quantitative
and qualitative research. It studies test items in the form of multiple-choice
questions. The test will be analyzed using a quantitative approach, in which its
statistical features are measured. A qualitative approach will also be used to
check the tests against the Standard Competence and Basic Competence, test
construction, and language use.
The test items used here are the English test packs of the first semester final
test for Grade V of elementary school. The study analyzes only Grade V of
elementary school because of the limited time available for the research.
8. REVIEW OF THE RELATED LITERATURE
8.1 EVALUATION IN EDUCATION
Evaluation is the notion that the value or worth of someone or something is to
be judged. It may be based on tests, measurements, or other objective
information (Nitko, 1983:7). More specifically, Tuckman says that “evaluation is
a process wherein the parts, processes, or outcomes of a program are examined to
see whether they are satisfactory, particularly with reference to the program’s
stated objectives, our own expectation, or our own standard of excellence”
(1975:12).
Educational evaluation here means a way of examining, investigating, and
appraising any aspect of the education field. It is a process which involves the
production, application and analysis of instruments of educational measurement
(Nurulia, 2011:12). In a narrower sense, this technique is used by a headmaster
to evaluate teachers, and it is also commonly used by teachers to evaluate their
students’ understanding of the material. This is close to what Gronlund (in
Nurulia, 2011:12) states, that evaluation is a systematic process of determining
the extent to which instructional objectives are achieved by pupils. Cronbach
states that evaluation is the collection and use of information to make
decisions about an educational program (1984:60 in Nurulia, 2011:13).
8.2 LANGUAGE TESTING AND ASSESSMENT
A test is a method of measuring a person’s ability, knowledge or performance in
a given domain (Brown, 2004:3). In this statement, Brown highlights testing as a
method by which people’s intelligence and achievement are explored. Testing has
become an important method for checking requirements or competencies in fields
such as medicine, law, sport, and government, because it covers many aspects
that test takers must fulfill before going deeper into those fields. Yet in the
teaching and learning process, testing is a little different from those kinds of
tests. People commonly think that assessment is the same as testing; they are
still confused and treat testing and assessment as synonyms. While tests are
prepared administrative procedures that occur at identifiable times in a
curriculum, when learners muster all their faculties to offer peak performance,
knowing that their responses are being measured and evaluated, assessment is an
ongoing process that encompasses a much wider domain (Brown, 2004:4). Bachman
(2004:7) states that assessment is the process of collecting information about a
given object of interest according to procedures that are systematic and
substantively grounded. Assessment takes place whenever a student responds to a
question, offers a comment, or tries out a new word or structure (Brown,
2004:4). In this situation, the teacher subconsciously makes an assessment of
the student’s performance (Brown, 2004:4). In educational programs, “the result
of assessment are most commonly used to describe both the processes and outcomes
of learning for the purposes of diagnosis or evaluating achievement, or make
decisions that will improve the quality of teaching and learning and of the
program itself” (Bachman, 2004:6).
Language tests offer us many choices in test administration, test format,
materials, scoring method, and test items
(http://www.cal.org/flad/tutorial/practicality/2methodoftesting.html). Language
tests have the potential to help us collect information that will benefit a wide
variety of individuals (Bachman, 2004:3).
Alderson and others have argued that “testers have long been concerned
with matters of fairness and that striving for fairness is an aspect of ethical
behavior, others have separated the issue of ethics from validity, as an essential
part of the professionalizing of language testing as a discipline” (Davies, 1997).
Tests, then, are a subset of assessment. They are certainly not the only form of
assessment that a teacher can make. Tests can be useful devices, but they are
only one among many procedures and tasks that teachers can ultimately use to
assess students (Brown, 2004:4).
In short, a test is a part of assessment, so assessment is wider than testing
itself. Assessment can be understood as a part of the teaching and learning
process. Testing and assessment are two methods that must be applied in
teaching. Language assessment takes place in a variety of situations, including
educational programs and real-world settings (Bachman, 2004:6).
8.3 TYPES OF ASSESSMENT AND TESTING
To understand assessment better, in this subchapter the writer explains the
types and forms of assessment. There are two types of assessment, informal and
formal (Brown, 2004:5). Informal assessment can take a number of forms, starting
with incidental, unplanned comments and responses, along with coaching and other
impromptu feedback to the student (Brown, 2004:5). In this type of assessment,
teachers record students’ achievement using techniques that are not
systematically designed. Teachers can, for example, remember what students do in
the classroom during their learning activities. Formal assessments, in contrast,
are exercises or procedures specifically designed to tap into a storehouse of
skills and knowledge (Brown, 2004:5). Unlike informal assessment, this type of
assessment is made intentionally by the teacher to obtain students’ scores and
learn their achievement. It is carried out by the teacher according to official
standards and rules, and it is conducted systematically and periodically (Brown,
2004:6). We can say that all tests are formal assessments, but not all formal
assessment is testing (Brown, 2004:6).
Assessment usually has two functions in the classroom: formative and summative
(Brown, 2004:6). Formative assessment evaluates students in the process of
forming their competencies and skills, with the goal of helping them to continue
that growth process (Brown, 2004:6). It usually takes place during the teaching
and learning process in the classroom and is carried out by the teacher to learn
students’ achievement directly. It is conducted to build students’ understanding
and skills during the process. Assessment is formative when teachers use it to
check on the progress of their students, to see how far they have mastered what
they should have learned, and then use this information to modify their future
teaching plans (Hughes, 2005:5). Summative assessment, then, aims to measure, or
summarize, what a student has grasped, and typically occurs at the end of a
course or unit of instruction (Brown, 2004:6). It is used at the end of the
term, semester, or year in order to measure what has been achieved both by
groups and by individuals (Hughes, 2005:5). Teachers use this type of assessment
to measure and evaluate what students have achieved in the teaching and learning
process in the classroom. Final exams are an example. In short, formative
assessment is done during the semester, in the course of teaching and learning,
whereas summative assessment is done at the end of the semester. The object of
this study is the final test of the first semester, so this test is a formal
assessment with a summative function.
In Indonesia, a final semester test pack usually consists of three parts: first
multiple-choice items, then short-answer questions, and finally essay items.
Each type of item has a different definition and different characteristics, and
different formulas and measurements can be used for each. The following
subchapters explain the characteristics of each type of item.
8.4 MULTIPLE-CHOICE TEST
Multiple-choice items, which may appear to be the simplest kind of item to
construct, are extremely difficult to design correctly (Brown, 2004:55).
Multiple-choice items take many forms, but their basic structure is a stem, or
the question itself, and a number of options, one of which is correct, the
others being distractors (Hughes, 2005:75). The most obvious advantage of
multiple choice is that scoring can be perfectly reliable (Hughes, 2005:75).
Scoring in the multiple-choice technique is rapid and economical. It is possible
to include more items than would otherwise be possible in a given period of
time. These test items are designed to elicit specific responses from the
student (Valette, 1967:6). The technique allows the testing of receptive skills
without requiring the test taker to produce written or spoken language, and it
makes for greater reliability (Hughes, 2005:76).
The principles that stand out in multiple-choice test items are practicality and
reliability (Brown, 2004:55). In his book Language Assessment: Principles and
Classroom Practices, Brown introduces the main terminology of multiple-choice
items:
a. Multiple-choice items are all receptive, or selective, response items in that the test-takers choose from a set of responses rather than creating a response.
b. Every multiple-choice item has a stem, which presents a stimulus, and several options or alternatives to choose from.
c. One of those options, the key, is the correct answer, while the others serve as distractors.
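The terminology above (options, key, distractors) can be made concrete with a small sketch that tallies how often each option of one item was chosen and lists the distractors that nobody selected. The function names and the sample responses are hypothetical, and treating an unchosen distractor as one that needs review is an assumption of this sketch rather than a rule taken from Brown.

```python
from collections import Counter


def option_counts(responses, options):
    """Count how many test-takers chose each option of one item."""
    counts = Counter(responses)
    return {option: counts.get(option, 0) for option in options}


def unused_distractors(responses, options, key):
    """Return the distractors (non-key options) that no test-taker chose."""
    counts = option_counts(responses, options)
    return [o for o in options if o != key and counts[o] == 0]


# Hypothetical item with four options; "b" is the key.
options = ["a", "b", "c", "d"]
key = "b"
responses = ["b", "b", "a", "b", "c", "a"]

counts = option_counts(responses, options)
idle = unused_distractors(responses, options, key)
print(counts)  # {'a': 2, 'b': 3, 'c': 1, 'd': 0}
print(idle)    # ['d']
```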
The greatest advantage of multiple choice is that scoring can be perfectly
reliable. It can also be rapid and economical. Another advantage is that it is
possible to include more items than would otherwise be possible in a given
period of time (Hughes, 1989:76).
On the other hand, Hughes lists a number of weaknesses of multiple-choice items
(Hughes, 2005:76-78): multiple-choice questions test only recognition knowledge,
test takers may guess to arrive at the correct answer, test takers can cheat
easily, the technique severely restricts what can be tested, it is very
difficult to write successful items, and the answer is restricted to the options
given. Test-takers therefore cannot elaborate their answers or their
understanding of the material, because the answer is limited to the options.
Multiple-choice items form the first part of the test packs faced by
test-takers. To analyze these items we can use statistical analysis, as
described in the next chapter, Research Method. Since there is only one right
answer, the scorer can very rapidly mark an item as correct or incorrect
(Valette, 1967:6). Thus, we can use simple codes to represent the answers of
test-takers: 1 represents a correct answer chosen by a student, and 0 represents
an incorrect answer. If a student chooses the correct answer, we record 1;
conversely, if the student chooses an incorrect answer, we record 0. Further
explanation is given in the quantitative analysis later.
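The 1/0 coding described above can be sketched as follows. The function name `code_answers`, the answer key, and the student's answers are hypothetical and serve only to illustrate the coding step.

```python
def code_answers(chosen, key):
    """Return 1 for each answer that matches the key and 0 otherwise."""
    if len(chosen) != len(key):
        raise ValueError("answer sheet and key must have the same length")
    return [1 if c == k else 0 for c, k in zip(chosen, key)]


# Hypothetical four-item answer key and one student's answer sheet.
answer_key = ["a", "c", "b", "d"]
student_answers = ["a", "b", "b", "d"]

coded = code_answers(student_answers, answer_key)
total = sum(coded)
print(coded)  # [1, 0, 1, 1]
print(total)  # 3
```

Coding every answer sheet this way yields one row of 1s and 0s per student, which is the form of data that statistical item analysis works on.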
8.5 SHORT-ANSWER ITEMS
After test-takers have answered the multiple-choice items in the first part of
the test pack, in the next part they have to answer short-answer items. The
questions are of the same kind, but in these items students are not given
distractors. The answers are usually only one or two words and should be exactly
correct. This technique usually occurs in listening and reading tests (Hughes,
1989:79).
Short-answer items deal with the measurement of students’ knowledge acquisition
and comprehension. They come in two kinds of format, free choice and fixed
choice. There are two basic free choice formats: the unstructured format and the
fill-in or completion format. Fixed choice formats include true-false, other
two-choice, multiple choice, and matching (Tuckman, 1975:77). The short-answer
items in the English final semester test packs used in this study are items that
students answer by writing a short, brief answer. How do they differ from essay
test items? In essay test items, students should explore and elaborate their
answers. For example, if the question is about structure and grammar, students
usually have to fill in the blank with a complete sentence. In short-answer
items, by contrast, students’ answers are usually no more than two or three
words. That is why they are called short-answer items. This kind of item may
require a one-word answer, such as a brief response to a question or the filling
in of a missing element (Valette, 1967:8).
In short-answer items, the correct answer has been determined by the teacher, so
students cannot elaborate their answers. Both free choice and fixed choice items
have a previously determined correct response. In the free choice type, the
student is not given choices from which to select the correct response, as he or
she is in the fixed choice type (Tuckman, 1975:77). In these formats,
measurement basically involves asking students a question that requires them to
state or name specific information or knowledge (Tuckman, 1975:77).
In this part of the test packs, short-answer items usually take the unstructured
or the completion (fill-in) format. In the unstructured format, students can
answer with a word, phrase, or number. In the completion or fill-in format,
students must likewise construct their own response rather than choose from
options. It differs from the unstructured item by requiring that they fill in or
complete a sentence from which a word or phrase has been omitted (Tuckman,
1975:79).
In order to assure the objective nature of short-answer items, the teacher must
prepare a scoring system in advance (Valette, 1967:8). The scoring system should
state what credit is given to an answer that misspells the expected word. But
since the answer in short-answer items is usually only one word, we can use the
same credit points as for multiple choice: 1 represents a correct answer and 0
represents an incorrect answer. We only have to mark 1 or 0 because the answer
has been determined by the test makers, and test-takers are given no options to
choose from.
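A scoring rule prepared in advance for short-answer items could look like the sketch below. It assumes that differences in letter case and spacing are ignored and that a misspelled word receives no credit; both choices, along with the function name and the sample answers, are assumptions made for illustration, and a teacher's own scoring system may treat misspellings differently.

```python
def score_short_answer(response, accepted):
    """Return 1 if the response matches an accepted answer, else 0.

    Case and extra spaces are ignored (an assumed rule); any other
    difference, including a misspelling, is marked 0.
    """
    normalized = " ".join(response.split()).casefold()
    targets = {" ".join(a.split()).casefold() for a in accepted}
    return 1 if normalized in targets else 0


# Hypothetical responses checked against the predetermined answers.
marks = [
    score_short_answer("  Kitchen ", ["kitchen"]),
    score_short_answer("kichen", ["kitchen"]),
    score_short_answer("living  room", ["living room"]),
]
print(marks)  # [1, 0, 1]
```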
8.6 ESSAY TEST ITEMS
In the English final test of elementary school, besides multiple-choice and
short-answer items, there is one more test technique presented to test-takers in
the final semester test packs: the essay test. Unlike short-answer items, essay
tests need longer answers and deeper analysis. While short-answer items are a
continuation of multiple-choice items, essay tests involve deep thinking about
test-takers’ knowledge and understanding of the material. In language testing,
this may include students’ understanding of language structure and culture.
Essay items provide test-takers with the opportunity to structure and compose
their own responses within relatively broad limits (Tuckman, 1975:111). Essay
tests enable them to demonstrate their ability to apply knowledge and to
analyze, to synthesize, and to evaluate new information in the light of their
knowledge (Tuckman, 1975:111). This kind of test is therefore well suited to
measuring students’ understanding. Tuckman says in his book Measuring
Educational Outcomes: Fundamentals of Testing that several aspects of students’
ability can be measured using essay tests: application, analysis, synthesis,
evaluation, and a combination of these four aspects (1975:111-123).
Essay questions intended to measure students’ application must require that the
students use knowledge that has been acquired to describe a way of dealing with
a concrete situation. Thus, to measure application, the item must present a
concrete situation, one that can somehow be included in the reality of the
students being tested and one to which they can relate (Tuckman, 1975:112).
In analysis items, the question does not pose a particular problem. The
situation is one with which the student presumably is familiar and that contains
elements, relationships, or organizational principles which can be analyzed
(Tuckman, 1975:115). Unlike measuring application, in which students are given a
familiar problem, when measuring how well students can analyze something,
teachers should give them a problem that requires understanding of the
organization of, and relationships between, several variables.
When we want to measure students’ synthesis, the item should present a problem
to be solved. The problem should be outside the range of the familiar or the
practical and should require the production of a new and unique solution.
Moreover, the particular problem itself must also be new to the students
(Tuckman, 1975:117-118). Synthesis can thus also be interpreted as how creative
students are in finding a solution to a problem and creating a new model
different from what the teacher has taught them.
In evaluation items, the question contains two parts: that which is to be
evaluated and the response instructions. The response instructions also include
information about the criteria to be used in the evaluation. In addition, an
essay item that measures evaluation provides a general criterion for evaluating
the object and a general response instruction to provide detailed support for
one’s evaluative position (Tuckman, 1975:122). Students then have to show their
skill in evaluating a problem related to the material taught by the teacher.
All in all, analysis, synthesis, and evaluation can all be combined in a single
question. Giving students an object, an organization, or an occurrence and
asking them to analyze its parts or workings is the first step. Evaluating the
parts or workings is the second step, and redesigning or improving upon it
through synthesis is the third step (Tuckman, 1975:123). In this way, one essay
question can be used to test students’ intelligence and understanding.
Several keywords can be used in preparing essay questions, including analyze,
compare or contrast, describe, define, evaluate, explain, summarize, justify,
outline, identify, and so on. These words are commonly used to phrase essay
items.
The scoring system for essay items is very different from the scoring of
objective items such as multiple choice. In objective items, the score for each
item is exact and is the same from item to item. In essay items, by contrast,
the first thing to do is to determine the ideal answer, even though there is no
strictly correct or wrong answer. The ideal answer should be given the highest
score, and the further a student’s answer departs from it, the lower the score.
Teachers should then create an interval scale running from the lowest to the
highest score for each item. The interval scale looks like the picture below:
1 2 3 4 5 6 7 8 9 10
(Interval scale; its end points are labelled “Ideal answer” and “Not ideal answer”.)
The interval scale can then be used to measure how far students understand the
material: the higher the score a student gets, the better he or she understands
it. Teachers have the authority to determine the range of the interval scale
between the ideal and the not-ideal answer. It can be a scale from 0 to 10 like
the scale above, or from 0 to 3 or 5, according to their preference. The range
can be decided by taking into account the scores of all the items, from the
multiple-choice items and the short-answer items to the essay items.
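The way the three parts of a test pack might be scored together can be sketched as follows. The sketch assumes that multiple-choice and short-answer items are already coded 1 or 0, that each essay item is rated on a teacher-chosen interval scale starting at 0, and that the parts are simply added without weighting. The unweighted sum, the function names, and the sample marks are illustrative assumptions, not a procedure prescribed by the sources cited above.

```python
def essay_score(rating, lowest=0, highest=10):
    """Check that a rating lies on the chosen interval scale and return it."""
    if not lowest <= rating <= highest:
        raise ValueError(f"rating {rating} is outside {lowest}..{highest}")
    return rating


def total_score(multiple_choice, short_answer, essay_ratings, highest=10):
    """Add the 1/0 marks of the objective parts to the essay ratings."""
    objective = sum(multiple_choice) + sum(short_answer)
    essay = sum(essay_score(r, 0, highest) for r in essay_ratings)
    return objective + essay


# Hypothetical marks for one student: 4 multiple-choice items,
# 3 short-answer items, and 2 essay items rated on a 0-10 scale.
total = total_score([1, 0, 1, 1], [1, 1, 0], [7, 10], highest=10)
print(total)  # 22
```

With a shorter essay scale, for example `highest=3`, the essay part contributes less to the total, which is one way the choice of scale interacts with the scores of the other parts.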
8.7 QUALITATIVE ANALYSIS
8.7.1 School-Based Curriculum (KTSP)
A curriculum is a document of an official nature, published by a leading or
central education authority in order to serve as a framework or a set of
guidelines for the teaching of a subject area in a broad varied context
(Celce-Murcia, 2000). A curriculum in a school context refers to the whole body
of knowledge that children acquire in school (Richards, 2001:39). More
specifically, BSNP defines it as a set of plans and arrangements concerning
objectives, content, and lesson material, and also the methods used as guidance
for learning activities to achieve the aims of education (2006:1751). In short,
we can say that the curriculum is the fundamental guideline for teachers to
reach the aims of education in school. It is the foundation teachers should know
in conducting the teaching and learning process.
The school-based curriculum is a curriculum in the sense stated above. It is a
revised edition of the 2004 curriculum, which in Bahasa Indonesia is called
Kurikulum Berbasis Kompetensi (Competence-Based Curriculum). The school-based
curriculum has been used in educational institutions since 2006. It allows any
school to make its own policies and rules about its educational programs.
Teachers can create their own syllabus, teaching and learning process, and
learning goals that are appropriate for the students in their school.
KTSP is an operational curriculum that is arranged and applied in every
educational unit (Jumadi, 2), because KTSP is created based on the school’s
needs and conditions. Schools in a big city may thus have a different curriculum
from schools in a small city. The arrangement of the content takes into account
the cultural and social conditions of the students of a school, so that students
in different places and areas have learning achievements appropriate to their
everyday lives. Even so, under Government Rule 19, 2005 on the Education
National Standard, every school is mandated to develop its KTSP based on the
Passing Competence Standard (SKL) and the Content Standard (SI), following the
guidance arranged by the Education National Standard Board (BSNP). The
government publishes General Guidance for arranging KTSP so that an educational
unit or school that has the ability can develop KTSP starting in academic year
2006/2007 (Jumadi, 1). A school is considered able to arrange and develop KTSP
if it has tried to apply the 2004 curriculum in its institution.
Based on Regulation of the Minister of National Education No. 24 of 2006, the
arrangement of KTSP involves teachers, employees, and the School Committee,
with the hope that KTSP will reflect the aspirations of the people, the
environmental situation and conditions, and the people’s needs. That is why
this curriculum is more democratic than the curricula schools used before. It
gives room for democratization in determining an education curriculum that is
appropriate to the community context where the school is located, its financial
context, its human resources, and other aspects of the school, so that the
potential of each school can be optimized and there is competition among
schools (Handayani, 2010). KTSP consists of the educational goals at the
educational unit level, the curriculum structure and content at the educational
unit level, the educational calendar, and the syllabus (Jumadi, 2).
Sutrisno in Handayani (2010:22) states that, as a concept and also a program,
KTSP has the following characteristics:
a. KTSP emphasizes students’ competence achievement. In KTSP, students are guided to develop knowledge, understanding, ability, values, attitudes, and interests so that they become skilled and independent persons.
b. KTSP is oriented toward the learning process and variety.
c. The learning process uses various approaches and methods.
d. Teachers are not the only source; other educative sources are included.
e. Assessment emphasizes both the process and the result of study in achieving a competence.
KTSP consists of two basic documents: the school documents and the contents.
School documents here means any information about the school in which KTSP is
arranged, for example the introduction of KTSP; the vision, mission, and goals
of the school; the curriculum structure and content; and the education calendar
that is made by the school independently. The KTSP structure and content at the
elementary education level, as stated in the Content Standard, involve five
groups of subjects:
1) Group of religion and morality subjects
2) Group of citizenship and personality subjects
3) Group of science and technology subjects
4) Group of aesthetics subjects
5) Group of athletics and health subjects
(BSNP, 2006)
The education goal at the elementary level is to lay the foundation of
intelligence, knowledge, personality, noble character, and the skills to live
independently and to continue to a higher level of education (Jumadi, 3). The
education calendar is made by the school autonomously, following the education
calendar guidance established by the national education department.
The other basic document relates to the particular subjects taught in the
school. For every subject, the material consists of:
a. Syllabus and Lesson Plan of the Competence Standard and the Basic
Competence that are developed by the Central Government;
b. Syllabus and Lesson Plan of the Competence Standard and the Basic
Competence that are developed by the school (local content subjects).
(Handayani, 2010)
English at the elementary education level is still an extra lesson, since this
subject is regarded as less important than other local lessons. Yet English is
now becoming an important subject in the globalization era. The goal of the
English subject at the elementary level is to produce students who are able to
develop limited oral communicative competence, in the form of language
accompanying action in the school context, and who are aware of the essence and
importance of English for increasing national competitiveness in the
globalization era.
In the table below, the writer presents the competence standards and basic
competences of the English lesson for grade V, semester I, that relate to this
study. Each numbered competence standard is followed by its basic competences:

Competence Standard / Basic Competence

Listening
1. Students are able to understand very simple instructions with an action in the school context.
   1.1 Students are able to respond to very simple instructions with logical actions in the class and school context
   1.2 Students are able to respond to very simple instructions verbally

Speaking
2. Students are able to express very simple instructions and information in the school context
   2.1 Students are able to make a very simple conversation accompanying actions, involving the speech acts of giving an example of an action, giving a command, and giving an instruction
   2.2 Students are able to make a very simple conversation to ask for and/or give something, involving the speech acts of asking for and giving help, and asking for and giving something
   2.3 Students are able to ask for and give information, involving the speech acts of introducing, inviting, asking for and giving permission, agreeing and disagreeing, and prohibiting
   2.4 Students are able to express politeness using the expressions: Do you mind and Shall we…

Reading
3. Students are able to understand written English texts and picture-based descriptive texts in the school context
   3.1 Students are able to read aloud words, phrases, and simple sentences with correct stress and intonation
   3.2 Students are able to understand simple sentences, written messages, and picture-based descriptive texts accurately

Writing
4. Students are able to spell and copy simple sentences in the school context
   4.1 Students are able to spell simple sentences accurately and correctly
   4.2 Students are able to copy and write very simple sentences accurately and acceptably, such as compliments, congratulations, invitations, and expressions of gratitude
8.7.2 Syllabus
A syllabus is a lesson plan for a subject, a group of subjects, or a certain
theme that includes the competence standard, basic competence, learning
material, learning activities, indicators, scoring, time allocation, and
sources (Jumadi, 2). The syllabus is a part of the curriculum. It can be
defined as the systematic and specific content of the curriculum that teachers
can apply in their teaching activity. Teachers can see their teaching-learning
goals, process, and objectives in it. Richards states that a syllabus is a
specification of the contents of a course of instruction and lists what will be
taught and tested (2001:2). A syllabus has to be in line with the curriculum,
because it is made based on the curriculum. BSNP defines it as “learning plan
on one or group of lesson/ certain theme which covers Competence Standard,
Basic Competence, main material of learning, learning activities, indicator,
assessment, time allocation, and source/ material/ tools of learning”
(2006:1751).
The syllabus can be developed by a teacher autonomously, by a group of teachers
from several schools, by the subject teachers’ deliberation forum (MGMP), or by
the education office (Jumadi, 7). In elementary school, teachers of grades I to
VI can usually arrange the syllabus together. A school that cannot arrange and
develop it autonomously should join other schools through the MGMP forum to
develop it.
We can see from this definition that a syllabus contains specific guidelines
for teachers about what they have to do in their job as educators. It covers
not only what they have to do, but also how, when, and by what means they have
to do it as professional educators. If teachers teach without following it, the
education objectives may drift away from the national education goal. To
clarify the syllabus that will be used in the qualitative analysis in this
study, the writer presents here the syllabus of the English lesson for grade V,
semester I. It is important as the basic guidance for the teaching-learning
process and for assessing it.
8.8 ITEM ANALYSIS DATA (QUANTITATIVE ANALYSIS)
8.8.1 Validity
Test validity refers to whether a test measures what we intend it to measure
(Tuckman, 1975:229). Validity is an integrated evaluative judgment of the degree
to which empirical evidence and theoretical rationales support the adequacy and
appropriateness of inferences and actions based on test scores or other modes of
assessment (Messick in Bachman, 2004:259).
The objective of many tests is to measure the effect of certain experiences
that have occurred prior to the test (Tuckman, 1975:229). A test, then, is used
to monitor or assess an experience that has already occurred or to determine
students’ learning based on the experience (Tuckman, 1975:229).
When selecting a test, it is important to make sure that the information
provided by the test is sufficient for the decisions we are going to make based
on the test scores (http://www.cal.org/flad/tutorial/validity/4testuse.html).
Brown says that the most important principle of a test is validity (2004:22).
It is the extent to which inferences made from assessment results are
appropriate, meaningful, and useful in terms of the purpose of the assessment
(Gronlund, 1998:226). In some cases, it may be appropriate to examine the extent
to which a test calls for performance that matches that of the course or unit
of study being tested (Brown, 2004:22).
There are two types of validity that are most relevant to classroom tests,
namely face validity and content validity (Brown, 2002:26). Face validity refers
to the appearance of a test: whether it looks like it is measuring what it is
supposed to measure. Mousavi (in Brown, 2002:26) states that face validity
refers to the degree to which a test looks right and appears to measure the
knowledge or abilities it claims to measure, based on the subjective judgment
of the examinees who take it, the administrative personnel who decide on its
use, and other psychometrically unsophisticated observers.
In other words, face validity concerns how the test presents itself to
test-takers: whether it looks good or bad to them and how they feel when the
test pack is given to them. Hughes states that a test is said to have face
validity if it looks as if it measures what it is supposed to measure
(1989:33). Brown (2004:27) states that face validity will likely be high if
learners encounter:
a. A well-constructed, expected format with familiar tasks
b. A test that is clearly doable within the allotted time limit
c. Items that are clear and uncomplicated
d. Directions that are crystal clear
e. Tasks that relate to their course work, and
f. A difficulty level that presents a reasonable challenge
The parts of the test packs relevant to this study concern the appearance of
the question sheets. These include the font used in the test pack and whether
it is easy or difficult to read. If the test packs contain pictures, it should
also be analyzed whether the pictures are clear enough. An important aspect
that should be examined more closely is the arrangement of the test items,
including the vocabulary, phrases, and sentence arrangement of the test. If all
of the features mentioned above are well organized, the test-takers will feel
confident in facing and answering the test packs. A test which does not have
face validity may not be accepted by candidates, teachers, education
authorities, and employers (Hughes, 1989:33). This is because such a test does
not appear standardized and does not appear to measure what it should measure.
In contrast to face validity, a claim of content validity requires affirmation
from an expert. The expert should look into whether the test content is
representative of the skills that are supposed to be measured. This involves
looking into the consistency between the syllabus content, the test objectives,
and the test contents. If the test contents cover the test objectives, which in
turn are representative of the syllabus, it can be said that the test possesses
content validity (Brown, 2002:23-24). A test is said to have content validity
if its content constitutes a representative sample of the language skills,
structures, etc. with which it is meant to be concerned (Hughes, 1989:26). This
means that a test has content validity if the test items match what teachers
want to measure. If teachers want to test students’ understanding of grammar
and structures, the test should stay close to that topic and not stray from it.
The importance of content validity is that the greater a test’s content
validity, the more likely it is to be an accurate measure of what it is
supposed to measure (Hughes, 1989:27). Another reason it matters is that areas
that are not tested are likely to become areas ignored in teaching and
learning. This means that teachers should write test items based on what they
have taught their students.
8.8.2 Reliability
Reliability refers to the consistency of test results. A reliable test is
consistent and dependable (Brown, 2004:20). Reliable here means that a test
must be dependable in several aspects of how it is conducted. First, a test
should be reliable for students as test-takers. Bachman (2004:153) states that
reliability is consistency of measures across different conditions in the
measurement procedures.
The most common learner-related issues in reliability are caused by temporary
illness, fatigue, or anxiety in facing the test (Brown, 2004:21). Besides, a
test must have rater reliability, the principle that the scoring process should
be consistent and appropriate to the testing and assessment. This scoring
process must be standardized. Unreliability may also result from the conditions
in which the test is administered (Brown, 2004:21). No measurement instrument
or procedure is perfect (Tuckman, 1975:253). Neither a mechanical device such
as a voltmeter nor a human device such as a test gives a result that is a
perfect reflection of the property being measured (Tuckman, 1975:253). Test
administration must also be reliable so that a test runs successfully and in a
well-organized way. Bad administration and unplanned arrangements can undermine
even good preparation.
8.8.3 Level of Difficulty (Item Facility)
A good test is one that is neither too easy nor too difficult for students. It
should offer answer options that students could plausibly choose. Very easy
items are to build in some affective feelings of “success” among lower-ability
students and to serve as warm-up items, and very difficult items can provide a
challenge to the highest-ability students (Brown, 2004:59). A test that is too
easy will not stimulate students to work at solving it, and a test that is too
difficult will discourage students from trying to find the answer (Arikunto,
2006:207).
The level of difficulty, which Brown (2004:58) calls item facility, is the
extent to which an item is easy or difficult for the proposed group of
test-takers. If the tests a teacher gives are always too easy or too difficult,
students will come to recognize this characteristic of the teacher’s tests.
Thus, the test should be standard and fulfill the characteristics of a good
test. The number that shows the level of difficulty of a test is called the
difficulty index (Arikunto, 2006:207). This index has a minimum and a maximum
value. The lower the index, the more difficult the item is; conversely, the
higher the index, the easier the item is.
There are some factors that every test constructor must consider in
determining the difficulty level of test items. Mehrens and Lehmann point out
that the concept of difficulty, or the decision of how difficult the test
should be, depends on a variety of factors, notably 1) the purpose of the test,
2) the ability level of the students, and 3) the age or grade of the students.
8.8.4 Discrimination Power (Item Discrimination)
Discrimination power explains how well the items perform in separating the
better students from the poorer ones (Nurulia, 2010:53). It is the extent to
which an item differentiates between high- and low-ability test-takers.
Discrimination is important because the more discriminating the items are, the
more reliable the test will be (Hughes, 1989:226).
It is defined as the ability of a test to separate master students from
non-master students (Arikunto, 2006:211). A master student is a student with
higher scores on the test, and a non-master student is a student with lower
scores on the test given. Like the difficulty level, discrimination has its own
index. The discrimination index is an indicator of how well an item
discriminates between weak candidates and strong candidates (Hughes, 1989:226).
This index is used to measure the ability of an item to discriminate between
the upper and lower groups of students. The upper group consists of the
students with the higher total scores, and the lower group consists of those
with the lower total scores. Unlike the difficulty index, this index can be
negative. A negative value shows that the question makes master students look
like weak students and non-master students look like strong students, because
more of the lower group than the upper group answered it correctly. A good
question is one that can be answered correctly by the upper group but not by
the lower group. If a question is answered correctly by both the upper and
lower groups or, conversely, cannot be answered correctly by either group, it
is a poor item because its discrimination index is 0.
“The higher its discrimination index, the better the item discriminates in this way. The theoretical maximum discrimination index is 1. An item that does not discriminate at all (weak and strong test-takers perform equally well on it) has a discrimination index of zero.” (Hughes, 1989:226)
An item on which high-ability students (master students, who did well on the
test) and low-ability students (non-master students, who did not) score equally
well would have poor item discrimination (ID) because it does not discriminate
between the two groups. Conversely, an item that garners correct responses from
most of the high-ability group and incorrect responses from most of the
low-ability group has good discrimination power (Brown, 2004:59).
8.8.5 Answer of Questions Form (Item Distractors)
In addition to calculating discrimination indices and facility values, it is
necessary to analyze the performance of distractors (Hughes, 1989:228).
Distractor analysis is defined as the distribution of testees across the
optional answers (distractors) in multiple-choice questions (Arikunto,
2006:219). This analysis is as important as the other item analyses,
considering that nearly 50 years of research shows that there is a relationship
between the distractors students choose and their total test score (Nurulia,
2010:57).
It can be obtained by counting the number of testees who choose each
distractor, which we can do by examining the students’ answer sheets. A
distractor is good if it is chosen by at least 5% of the test-takers. One way
to study responses to distractors is with a frequency table that tells us the
proportion of students who selected a given distractor. Distractors selected by
few or no students should be removed or replaced, because students find them
implausible (Nurulia, 2010:57). Distractors that do not work, for example those
chosen by very few test-takers, should be replaced by better ones, or the item
should be otherwise modified or dropped (Hughes, 1989:228). Such distractors do
not contribute to the test’s ability to discriminate the good students from the
poor students (Nurulia, 2010:57).
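The frequency-table check described above can be sketched in Python. This is only an illustration: the function name, the four options A to D, and the sample answers are assumptions made for the example, not part of the test packs; the 5% threshold is the one stated above.

```python
from collections import Counter


def distractor_report(answers, key, options=("A", "B", "C", "D"), threshold=0.05):
    """Return {distractor: (proportion, works)} for one multiple-choice item.

    answers   -- the option chosen by each test-taker (hypothetical data)
    key       -- the correct option, which is not a distractor
    threshold -- minimum share of test-takers a working distractor attracts
    """
    counts = Counter(answers)
    total = len(answers)
    report = {}
    for option in options:
        if option == key:
            continue  # skip the correct answer
        proportion = counts[option] / total
        report[option] = (proportion, proportion >= threshold)
    return report


# Hypothetical item answered by 20 students; the key is "A".
answers = ["A"] * 12 + ["B"] * 5 + ["C"] * 3
for option, (proportion, works) in distractor_report(answers, key="A").items():
    print(option, f"{proportion:.2f}", "keep" if works else "replace or remove")
```

In this made-up item, option D is chosen by nobody, so it would be flagged for replacement or removal.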
9. RESEARCH METHOD
9.1 THE RESEARCH HYPOTHESIS
Hatch (1982:3) states that a hypothesis is a tentative statement
about the outcome of the research. In line with Hatch, Best says
that a hypothesis is a tentative answer to a question (1977:26). In
general terms, it can be described as the researcher’s prior
assumption about the result of the study. Furthermore, Best states
that the statistical hypothesis should be stated in negative or null
form.
In this research, the hypothesis is that students of both SDIT Al
Kamila Semarang and MI Darus Sa’adah Semarang will get the same
score on each of the test packs used by the two schools. This
follows from the assumption that the two test packs are of the same
quality in their quantitative and qualitative aspects.
9.2 OBJECT AND SUBJECT OF THE STUDY
The object of this study is the multiple-choice test items in the English
subject for Grade V of elementary school. The test items are compared between
the test pack made by the Ministry of National Education and the one made by
the Ministry of Religion. The two test packs are compared because in Indonesia
there are two ministries that deal with formal education and deliver formal
tests from elementary to senior high school. For that reason, the writer wants
to compare the quality of the two test packs in terms of their statistical and
non-statistical features.
The two test packs actually consist not only of multiple-choice questions but
also of brief-response and essay questions. However, to keep the discussion
focused, the writer analyzes only the multiple-choice items.
The two test packs of multiple-choice questions are then given to two
different classes. One class is from an elementary school under the Ministry of
National Education, in this study SDIT Al Kamila Semarang, and the other is
from an Islamic private elementary school, in this case MI Darus Sa’adah
Semarang.
9.3 POPULATION AND SAMPLE
9.3.1 Population
The population of the study consists of the multiple-choice test items taken
from the English final test for Grade V of elementary school and the Grade V
students who will be given the test.
9.3.2 Sample
From the population above, we take the sample of the study: the
multiple-choice items of the English final test of the first semester of
academic year 2011/2012 for Grade V, and the Grade V students of SDIT Al Kamila
Semarang and MI Darus Sa’adah Semarang in the same academic year.
9.4 RESEARCH DESIGN AND INSTRUMENT
9.4.1 Research Design
Bachman (2004:3) states that much of the data obtained from language
assessment is quantitative, and that statistics is a set of logical and
mathematical procedures for analyzing quantitative data. Thus, this study uses
quantitative methods. However, mathematical measurement alone is not enough to
analyze multiple-choice tests, so the writer also uses a qualitative approach.
The quantitative approach is used to measure the test items’ statistical
features, such as their validity, reliability, difficulty level, and
discrimination power. The formulas used to measure them are presented in a
later subchapter. In addition, the qualitative approach is used to check
whether or not the test items are consistent with the Standard and Basic
Competences that serve as the fundamental guidance for the teaching-learning
process. In the qualitative approach, the language used in the test items will
also be analyzed to judge whether it is good enough.
9.4.2 Instruments/ Unit analysis
In this study, the instruments used are two test packs. Each consists of
multiple-choice, short-answer, and essay items. The two test packs are taken
from the English Final Test used by SDIT Al Kamila Semarang and MI Darus
Sa’adah Semarang. Each of the test packs will be given to one grade V class at
SDIT Al Kamila Semarang and one at MI Darus Sa’adah. The two test packs come
from different institutions. The one used in SDIT Al Kamila is made by MGMP
English of the Ministry of National Education Semarang. The other, used in MI
Darus Sa’adah, is made by MGMP English of the Ministry of Religion Semarang.
Thus, grade V students will receive and answer two different test packs. This
is done to see whether there are any differences between the two test packs.
The instruments and units of analysis are:
1. Test items made by MGMP of English of National Education Ministry of
Semarang and MGMP of English of Religion Ministry of Semarang in the
form of both multiple-choice and essay tests.
2. Students’ scores on these formative tests.
3. Item analysis cards which map the appropriateness of the test items to
the material in the syllabus, test construction, and effectiveness of
the language used.
9.5 METHOD OF COLLECTING DATA
9.5.1 Collecting method
In the collecting method, the writer collects two different test packs, from
which the multiple-choice items are taken as the object of the study. The two
test packs are taken from two different schools. The first is taken from SDIT
Al Kamila Semarang, which is under the authority of the Ministry of National
Education and Culture. The other test pack is taken from MI Darus Sa’adah,
which is under the Ministry of Religion since its curriculum covers religious
subjects, especially Islamic ones, in depth. The two test packs are obtained
from the English teachers who teach in those schools.
9.5.2 Testing method
After the test items have been collected, the tests are given to the
test-takers, in this case grade V students in two different schools. Every
class is given both tests, the one made by MGMP English of the Ministry of
National Education and the one made by MGMP English of the Ministry of Religion
of Semarang, in order to obtain comparable results and data for each test pack.
The diagram below explains the testing process:
Y1 -> X1, X2
Y2 -> X1, X2
Note:
Y1 : Final Semester Test-Pack made by MGMP English of Ministry of
National Education Semarang
Y2 : Final Semester Test-Pack made by MGMP English of Ministry of
Religion Semarang
X1 : SDIT Al Kamila Semarang
X2 : MI Darus Sa’adah Semarang
Y1, the test pack made by MGMP English of the Ministry of National Education,
will be given to the grade V students of both SDIT Al Kamila and MI Darus
Sa’adah. Likewise Y2, the test pack made by MGMP English of the Ministry of
Religion of Semarang, will be given to the students of both schools.
9.6 METHOD OF ANALYZING DATA
9.6.1 Quantitative Analysis
Quantitative analysis deals with measuring the statistical features of the
test items: their validity, reliability, level of difficulty, discrimination
power, and item distractors.
9.6.1.1. Validity
To determine the validity of each test item, we can use the product-moment
formula below:
\[ r_{xy} = \frac{N\sum XY - (\sum X)(\sum Y)}{\sqrt{\left\{N\sum X^2 - (\sum X)^2\right\}\left\{N\sum Y^2 - (\sum Y)^2\right\}}} \]
(Arikunto, 2006:72; Bachman, 2004:86; Tuckman, 1978:163)
Note:
rxy = correlation coefficient between variable X and Y
N = number of test-takers
ΣX = sum of the scores on the item
ΣY = sum of the total scores
ΣXY = sum of the products of item score and total score
ΣX² = sum of the squared item scores
ΣY² = sum of the squared total scores
At the 5% significance level, if r_measured ≥ r_table, the test item is
considered significant, or valid. If r_measured < r_table, the test item is
considered not significant, or not valid.
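The validity check above can be sketched in Python, assuming the standard raw-score form of the product-moment correlation built from N, ΣX, ΣY, ΣXY, ΣX², and ΣY². The function name and the sample scores are hypothetical, and the critical value r_table must be looked up for the actual number of test-takers; the 0.878 used here is the usual table value for N = 5 at the 5% level.

```python
from math import sqrt


def item_validity(item_scores, total_scores):
    """Product-moment correlation between one item's scores (X) and total scores (Y)."""
    n = len(item_scores)
    sum_x = sum(item_scores)
    sum_y = sum(total_scores)
    sum_xy = sum(x * y for x, y in zip(item_scores, total_scores))
    sum_x2 = sum(x * x for x in item_scores)
    sum_y2 = sum(y * y for y in total_scores)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    if denominator == 0:
        # Every test-taker has the same item score or the same total score.
        raise ValueError("correlation is undefined when X or Y does not vary")
    return numerator / denominator


# Hypothetical data: five test-takers, item scored 1 (correct) or 0 (incorrect).
item = [1, 1, 0, 1, 0]
totals = [9, 8, 4, 7, 3]
r_measured = item_validity(item, totals)
r_table = 0.878  # assumed table value for N = 5 at the 5% level
print(f"{r_measured:.3f}", "valid" if r_measured >= r_table else "not valid")
```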
9.6.1.2. Reliability
Reliability is consistency. A test is said to be reliable if it yields
consistent results regardless of who takes it and when it is given. To measure
reliability we can use the K-R 20 (Kuder-Richardson) formula as follows:
\[ r_{11} = \left(\frac{k}{k-1}\right)\left(\frac{S^2 - \sum pq}{S^2}\right) \]
(Arikunto, 2006:100; Bachman, 2004:164)
Note:
r11 = reliability
p = proportion of subjects who answer the item correctly
q = proportion of subjects who answer the item incorrectly (q = 1 – p)
k = number of items
Σpq = sum of the products of p and q
S = standard deviation of the total scores
Variance formula:
\[ S^2 = \frac{\sum X^2 - \frac{(\sum X)^2}{N}}{N} \]
where X is a test-taker’s total score and N is the number of test-takers.
The reliability of essay test items can be measured using the Alpha formula
below:
\[ r_{11} = \left(\frac{n}{n-1}\right)\left(1 - \frac{\sum \sigma_i^2}{\sigma_t^2}\right) \]
Note:
r11 : reliability of the test
Σσi² : sum of the variances of each test item
σt² : variance of the total test scores
n : number of test items
(Arikunto, 2006:178)
The classification of reliability is:
0.00 < r11 ≤ 0.20 : very low
0.20 < r11 ≤ 0.40 : low
0.40 < r11 ≤ 0.60 : medium
0.60 < r11 ≤ 0.70 : high
0.70 < r11 ≤ 1.00 : very high
At the 5% significance level, if r11 ≥ r_table, the test instrument is
considered significant, or reliable. If r11 < r_table, the test instrument is
considered not significant, or not reliable.
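The two reliability computations can be sketched in Python under stated assumptions: the usual K-R 20 form (k/(k−1))·(S² − Σpq)/S², the usual Alpha form (n/(n−1))·(1 − Σσi²/σt²), and a population variance (dividing by N). The function names and the response matrix are hypothetical; the classification bands are the ones listed above, with values at or below 0.20 all treated as very low.

```python
def population_variance(values):
    """Variance with N (not N - 1) in the denominator."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def kr20(responses):
    """K-R 20 for rows of 0/1 item scores, one row per test-taker."""
    k = len(responses[0])
    n_takers = len(responses)
    total_variance = population_variance([sum(row) for row in responses])
    sum_pq = 0.0
    for j in range(k):
        p = sum(row[j] for row in responses) / n_takers  # proportion correct
        sum_pq += p * (1 - p)
    return (k / (k - 1)) * ((total_variance - sum_pq) / total_variance)


def alpha(scores):
    """Alpha for rows of item scores (e.g. essay items), one row per test-taker."""
    n_items = len(scores[0])
    item_variances = [
        population_variance([row[j] for row in scores]) for j in range(n_items)
    ]
    total_variance = population_variance([sum(row) for row in scores])
    return (n_items / (n_items - 1)) * (1 - sum(item_variances) / total_variance)


def classify_reliability(r11):
    """Map r11 to the bands listed in the text."""
    if r11 <= 0.20:
        return "very low"
    if r11 <= 0.40:
        return "low"
    if r11 <= 0.60:
        return "medium"
    if r11 <= 0.70:
        return "high"
    return "very high"


# Hypothetical answers of four test-takers on three multiple-choice items.
responses = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
r11 = kr20(responses)
print(round(r11, 2), classify_reliability(r11))
```

For items scored 0 or 1, the item variance equals pq, so the two functions return the same value on this data; that equality is a convenient sanity check.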
9.6.1.3. Level of difficulty
The number that shows how difficult or easy a test item is is known as the
difficulty index. The formula used to measure it is:
\[ P = \frac{B}{JS} \]
(Arikunto, 2006:208; Brown, 2004:59)
Note:
P = level of difficulty
B = number of test-takers answering the item correctly
JS = number of test-takers responding to that item
The classification of the level of difficulty is:
P = 0.00 : the test item is too difficult
0.00 < P ≤ 0.30 : the test item is difficult
0.30 < P ≤ 0.70 : the test item is medium
0.70 < P < 1.00 : the test item is easy
P = 1.00 : the test item is too easy
There is no absolute P value that must be met to determine if an item should
be included in the test as is, modified, or thrown out, but appropriate test
items will generally have Ps that range between 0.15 and 0.85 (Brown, 2004:59).
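A minimal Python sketch of the difficulty index and its classification, assuming P is the share of respondents who answer the item correctly (B divided by JS). The function names and the sample counts are hypothetical.

```python
def difficulty_index(correct, respondents):
    """P = B / JS: share of respondents who answered the item correctly."""
    if respondents <= 0:
        raise ValueError("at least one respondent is required")
    return correct / respondents


def classify_difficulty(p):
    """Map P to the classification listed in the text."""
    if p == 0:
        return "too difficult"
    if p == 1:
        return "too easy"
    if p <= 0.30:
        return "difficult"
    if p <= 0.70:
        return "medium"
    return "easy"


# Hypothetical item: 12 of 20 test-takers answered correctly.
p = difficulty_index(12, 20)
print(p, classify_difficulty(p))
```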
9.6.1.4. Discrimination Power
Test discrimination power is a technique to distinguish smart test-takers
(high intelligence) from less smart test-takers (low intelligence) (Arikunto,
2006:211). The number that shows the degree of discrimination power is known as
the discrimination index. In this report, to find the discrimination power, the
test-takers are split into two halves: the smart group as the top group and the
less smart group as the bottom group.
The formula that can be used to measure the discrimination power of
multiple-choice test items is:
\[ D = \frac{B_A}{J_A} - \frac{B_B}{J_B} \]
(Arikunto, 2006:213)
Note:
D = test discrimination power
B_A = number of top-group test-takers who answer correctly
B_B = number of bottom-group test-takers who answer correctly
J_A = total number of test-takers in the top group
J_B = total number of test-takers in the bottom group
The classification of discrimination power is:
D = 0.00 – 0.20: poor discrimination power
D = 0.20 – 0.40: sufficient discrimination power
D = 0.40 – 0.70: good discrimination power
D = 0.70 – 1.00: very good discrimination power
D = negative: the item is not good. Items with a negative D should be
discarded.
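The discrimination index can be sketched in Python, assuming it is the difference between the proportion of correct answers in the top group and in the bottom group. The function names and sample counts are hypothetical. Because the listed ranges share their boundary values, the sketch assigns each boundary to the lower category, which is an assumption.

```python
def discrimination_index(ba, ja, bb, jb):
    """D = BA/JA - BB/JB.

    ba, bb -- number of correct answers in the top and bottom groups
    ja, jb -- number of test-takers in the top and bottom groups
    """
    return ba / ja - bb / jb


def classify_discrimination(d):
    """Map D to the categories in the text (boundaries go to the lower category)."""
    if d < 0:
        return "not good (discard)"
    if d <= 0.20:
        return "poor"
    if d <= 0.40:
        return "sufficient"
    if d <= 0.70:
        return "good"
    return "very good"


# Hypothetical item: 9 of 10 top-group and 4 of 10 bottom-group students are correct.
d = discrimination_index(9, 10, 4, 10)
print(round(d, 2), classify_discrimination(d))
```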
The discrimination power of essay test items can be measured using the t-test
as stated in Arifin (2009:278) below:
\[ t = \frac{M_H - M_L}{\sqrt{\dfrac{\sum x_1^2 + \sum x_2^2}{n_i (n_i - 1)}}} \]
Note:
MH = mean of the high group
ML = mean of the low group
Σx1² = sum of squared individual deviations of the high group
Σx2² = sum of squared individual deviations of the low group
ni = number of test-takers in each of the high and low groups
ni = 27% x N
N = total number of test-takers
Next, t_measured is compared with t_table with dk = (n1 - 1) + (n2 - 1) and
α = 5%, using the following criterion:
If t_measured > t_table, the discrimination power is significant.
A practical use for discrimination power indices is to select items from a
test bank that includes more items than we need (Brown, 2004:60).
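The t-test for an essay item can be sketched in Python under stated assumptions: the high and low groups are the top and bottom 27% of test-takers ranked by total score, the statistic is the difference of the group means divided by the square root of the pooled squared deviations over ni(ni − 1), and the degrees of freedom are (ni − 1) + (ni − 1). The function name and the scores are hypothetical, and t_table must be taken from a t table for the resulting degrees of freedom.

```python
from math import sqrt


def essay_discrimination_t(records, fraction=0.27):
    """Return (t, degrees_of_freedom) for one essay item.

    records  -- list of (total_score, item_score) pairs, one per test-taker
    fraction -- share of test-takers placed in each of the high and low groups
    """
    ranked = sorted(records, key=lambda rec: rec[0], reverse=True)
    n_i = max(2, round(fraction * len(ranked)))
    high = [item for _, item in ranked[:n_i]]
    low = [item for _, item in ranked[-n_i:]]
    mean_high = sum(high) / n_i
    mean_low = sum(low) / n_i
    ss_high = sum((x - mean_high) ** 2 for x in high)  # squared deviations, high group
    ss_low = sum((x - mean_low) ** 2 for x in low)     # squared deviations, low group
    if ss_high + ss_low == 0:
        raise ValueError("t is undefined when neither group's scores vary")
    t = (mean_high - mean_low) / sqrt((ss_high + ss_low) / (n_i * (n_i - 1)))
    return t, (n_i - 1) + (n_i - 1)


# Hypothetical data: (total score, score on the essay item) for ten test-takers.
records = [(90, 9), (85, 8), (80, 10), (70, 6), (65, 7),
           (60, 5), (55, 6), (40, 3), (35, 4), (30, 2)]
t_measured, dk = essay_discrimination_t(records)
t_table = 2.776  # assumed two-tailed table value for dk = 4 at the 5% level
print(round(t_measured, 3), dk, "significant" if t_measured > t_table else "not significant")
```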
9.6.2 Qualitative analysis
Qualitative analysis deals with the analysis and study of the non-statistical
features of the test items. The writer is going to study three aspects in this
subchapter: analysis of instructional materials, analysis of test construction,
and analysis of language use.
9.6.2.1 Analysis of Instructional Materials
Analysis of instructional materials deals with the appropriateness of the test
items to the instructional materials of the teaching-learning process stated in
the curriculum as Standard and Basic Competences. In this subchapter, the test
items will be reviewed to see whether or not they match the Standard and Basic
Competences, especially for elementary school. For this purpose, the writer
will present the Standard and Basic Competences of elementary school for Grade
V so that the test items can be matched against them.
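The matching step can be pictured with a small Python sketch. It is an illustration only: the basic competence codes are those of the grade V, semester I table presented earlier, while the function name and the item-to-competence mapping are hypothetical (in practice the mapping would come from the item analysis cards).

```python
# Basic competence codes for grade V, semester I.
BASIC_COMPETENCES = ["1.1", "1.2", "2.1", "2.2", "2.3", "2.4",
                     "3.1", "3.2", "4.1", "4.2"]


def competence_coverage(item_to_competence):
    """Summarize how test items map onto the basic competences.

    item_to_competence -- {item number: competence code, or None if no match}
    Returns (covered, uncovered, unmatched_items).
    """
    covered = {}
    unmatched_items = []
    for item, code in sorted(item_to_competence.items()):
        if code in BASIC_COMPETENCES:
            covered.setdefault(code, []).append(item)
        else:
            unmatched_items.append(item)  # item matches no listed competence
    uncovered = [code for code in BASIC_COMPETENCES if code not in covered]
    return covered, uncovered, unmatched_items


# Hypothetical mapping for five items.
mapping = {1: "3.2", 2: "3.2", 3: "4.1", 4: "2.3", 5: None}
covered, uncovered, unmatched = competence_coverage(mapping)
print("covered:", covered)
print("not covered:", uncovered)
print("items outside the competences:", unmatched)
```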
9.6.2.2 Analysis of Test Construction
Test construction analysis deals with the appropriateness of test items’
construction making by test makers with principles of good multiple-choice
questions. In this analysis, the test items will be analyze whether or not they fulfill
characteristics of a good test as principles of a good test stated in previous chapter.
The analysis covers several aspects, such as whether or not a question and its
answer options are effective, that is, whether the answer is too easy or,
conversely, too difficult to find. Another aspect examined in this sub-chapter is
whether the questions are easy or difficult to understand. Furthermore, if a
question includes a picture, the analysis considers whether the picture is easy to
read, and so on.
9.6.2.3 Analysis of Language Use
Analysis of language use examines the language used in constructing the questions
and answer options of the test items. Test makers may sometimes use difficult
words, or grammatical features that are hard for students to understand at their
level of knowledge.
10. ORGANIZATION OF THE STUDY
To make this study report easier for readers to understand, the writer is going
to organise this research paper as follows:
Chapter I is the Introduction. It includes the background of the study, reasons
for choosing the topic, statements of the problem, objectives of the study,
significance of the study, and the outline of the study report.
Chapter II presents a review of related literature, covering theoretical sources
on language testing and assessment; measurement, assessment, and evaluation;
types of assessment; forms of assessment; and theories on how to design, make,
and analyze a good test.
Chapter III deals with the method of investigation. It presents the methodology of
investigation, including the object of the study, population and sample, method and
instrument, method of collecting the data, method of analyzing the data, and
technique of reporting the results.
Chapter IV presents the findings and interpretation. It consists of analysis and
discussion of the research findings.
Chapter V, as the end of the discussion, includes the conclusions and
suggestions.
11. BIBLIOGRAPHY
Arikunto, S. 2006. Dasar-Dasar Evaluasi Pendidikan. Jakarta: Bumi Aksara.
Bachman, L.F. 2004. Statistical Analyses for Language Assessment. London:
Cambridge University Press.
Best, J. W. 1977. Research in Education. New Zealand: Prentice Hall, Inc.
Brown, H. D. 2002. Principles of Language Learning and Teaching (4th Ed).
New York: Addison Wesley Longman Inc.
Brown, H. D. 2004. Language Assessment: Principles and Classroom Practices.
San Francisco: Longman, Inc.
BSNP. 2006. Standar Isi dan Standar Kompetensi Lulusan Tingkat Sekolah
Menengah Pertama dan Madrasah Tsanawiyah. Jakarta: PT. Binatama
Raya.
Celce-Murcia et al. 2000. Discourse and Context in Language Teaching. London:
Cambridge University Press.
Davies, A. 1997. Demands of being professional in language testing. Language
Testing, 14(3), 328-39 in Alderson, J.C. and Banerjee, J. (Ed) 2008.
Gronlund, N. E. 1998. Assessment of Student Achievement. 6th Edition. Boston:
Allyn and Bacon in Brown, H.D. (Ed) 2004.
Hatch, E. and Farhady, H. 1982. Research Design and Statistics for Applied
Linguistics. London: Newbury House Publishers, Inc.
Hughes, A. 2005. Testing for Language Teachers. 2nd Ed. London: Cambridge
University Press.
Jumadi. ___. Pengertian KTSP dan Pengembangan Silabus dalam KTSP. A
journal presented on Training and Implementation of KTSP in SD
Wedomartini.
Mehrens, W. and Lehmann, I.J. 1984. Measurement and Evaluation in Education
and Psychology. New York: Holt, Rinehart and Winston.
Meizaliana. 2009. Teaching Structure Through Games to The Students of
Madrasah Aliyah Negeri I Kapahiang Bengkulu. A Thesis. Semarang:
Diponegoro University.
Nitko, A. J. 1983. Educational Tests and Measurement: An Introduction. New York:
Harcourt Brace Jovanovich, Inc.
Nurulia, L. 2011. An Analysis of Multiple-choice English Formative Test for
Grade VIII of MTsN 1 and MTsN 2 Semarang. A Thesis. Semarang:
Semarang State University.
Richards. 2001. Curriculum Development in Language Teaching. London:
Cambridge University Press.
Tuckman, B. W. 1975. Measuring Educational Outcomes: Fundamentals of
Testing. New York: Harcourt Brace Jovanovich Inc.
Valette, R.M. 1967. Modern Language Testing. 2nd Ed. New York: Harcourt
Brace Jovanovich Publishers.
__________. 2011. Understanding Assessment: A Guide for Foreign Language
Educators. Accessed at http://www.cal.org/flad/tutorial/