AN ANALYSIS OF THE ENGLISH SUMMATIVE TEST
ITEMS IN TERMS OF DIFFICULTY LEVEL (A Case Study Of The Second Year Students Of Mts Darul Ma’arif Jakarta)
A “Skripsi”
Presented to the Faculty of Tarbiyah and Teachers’ Training in Partial Fulfillment of the Requirements
for the Degree of S.Pd. (Bachelor of Arts) in English Language Education
By:
Rika Amelia NIM. 105014000357
DEPARTMENT OF ENGLISH EDUCATION
FACULTY OF TARBIYAH AND TEACHERS’ TRAINING
SYARIF HIDAYATULLAH STATE ISLAMIC UNIVERSITY
JAKARTA
2010
“do your best.”
“Search for the things you are good at. Work at them until you are the best.
You are guaranteed to succeed.”
The Procedure of the Research
The steps in conducting the research are as follows:
1) Collecting the answer sheets and the test.
2) Checking the answer key of the test to see whether it has been made
correctly by the teacher. The result of this check then becomes the
reference for scoring the students’ responses.
3) Arranging and tabulating the answer sheets from the highest score to the
lowest.
4) Taking the top 27% of the ranking as the upper group and the bottom 27% as
the lower group.
5) Counting and tabulating the students in the upper and lower groups who
responded to each item correctly, and entering the counts in the item
analysis tabulation format.
6) Finding the index of difficulty of the items.
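The steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the 0/1 response layout are hypothetical, the formula FV = (Correct U + Correct L) / 2n is assumed to be the upper-lower form of the facility value attributed to J.B. Heaton, and the category ranges are those quoted in the abstract.

```python
def split_groups(scores, fraction=0.27):
    """Rank students by total score (highest first) and return the
    indices of the upper and lower groups (steps 3 and 4)."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    size = max(1, round(len(scores) * fraction))
    return ranked[:size], ranked[-size:]


def facility_value(responses, upper, lower, item):
    """Index of difficulty for one item (steps 5 and 6).

    Assumed formula: FV = (correct in upper + correct in lower) / 2n,
    where n is the size of each group.
    """
    correct_upper = sum(responses[s][item] for s in upper)
    correct_lower = sum(responses[s][item] for s in lower)
    return (correct_upper + correct_lower) / (len(upper) + len(lower))


def classify(fv):
    """Label an index using the ranges quoted in the abstract."""
    fv = round(fv, 2)
    if fv <= 0.30:
        return "difficult"
    if fv <= 0.70:
        return "moderate"
    return "easy"


def analyze(responses):
    """responses[s][i] is 1 if student s answered item i correctly, else 0."""
    scores = [sum(row) for row in responses]
    upper, lower = split_groups(scores)
    results = []
    for item in range(len(responses[0])):
        fv = round(facility_value(responses, upper, lower, item), 2)
        results.append((fv, classify(fv)))
    return results
```

Calling `analyze` on a table with one row per student and one column per item returns, for each item, its rounded index and its category. Students with equal scores keep their original order in the ranking, so the group boundaries depend on how ties happen to be listed.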
AN ANALYSIS OF THE ENGLISH SUMMATIVE TEST
ITEMS IN TERMS OF DIFFICULTY LEVEL (A Case Study Of The Second Year Students Of Mts Darul Ma’arif Jakarta)
A “Skripsi” Presented to the Faculty of Tarbiyah and Teachers’ Training
In Partial Fulfillment of the Requirements For the Degree of S.Pd. in English Language Education
Approved by:
Dr. M.M. Farkhan, M. Pd
NIP. 150 299 480
DEPARTMENT OF ENGLISH EDUCATION
FACULTY OF TARBIYAH AND TEACHERS’ TRAINING
SYARIF HIDAYATULLAH STATE ISLAMIC UNIVERSITY
JAKARTA
2010
ENDORSEMENT SHEET
The examination committee of the Faculty of Tarbiyah and Teachers’
Training certifies that the “Skripsi” (Scientific Paper) entitled “An Analysis of
The English Summative Test Items in terms of Difficulty Level (A Case Study at
the Second Year Students of MTs. Darul Ma’arif Jakarta),” written by Rika
Amelia, student’s registration number 105014000357 was examined in the
examination session on June 25th, 2010. The “skripsi” has been accepted and
declared to have fulfilled one of the requirements for the Degree of S.Pd.
(Bachelor of Arts) in English Language Education at English Education
Department.
Jakarta, June 26th 2010
The Examination Committee
Chairman : Drs. Syauki, M.Pd. ………………………
NIP. 19641212 199103 1 002
Secretary : Neneng Sunengsih, M.Pd. ………………………
NIP. 19730625 199903 200 1
Examiner I :Drs. Nasrun Mahmud, M.Pd ………………………
NIP. 150 041 070
Examiner II :Dr. Zaenal Arifin Toy, M.Sc ………………………
NIP. 150 031 215
Acknowledged by
Dean Faculty of Tarbiyah and Teachers’ Training
Prof. Dr. Dede Rosyada, MA
NIP. 19571005 198703 1 003
ABSTRACT
RIKA AMELIA. “An Analysis of the English Summative Test Items in terms of Difficulty Level for the Second Year Students of MTs Darul Ma’arif Jakarta”. A Paper, Study Program of English Education, Faculty of Tarbiyah and Teachers’ Training, ‘Syarif Hidayatullah’ State Islamic University Jakarta, 2010. The research aims to measure the difficulty level of the English summative test items by calculating the students’ correct responses in the upper and lower groups with J.B. Heaton’s formula, taken from his book “Writing English Language Tests”. The result is interpreted using Suharsimi Arikunto’s item criteria, taken from his book “Dasar-dasar Evaluasi Pendidikan”: 20 items are regarded as difficult items because they are at the difficult level, ranging from 0.01 up to 0.30; 21 items are regarded as good items because they are at the moderate level, ranging from 0.31 up to 0.70; and 9 items are regarded as easy items because they are at the easy level, ranging from 0.71 up to 1.00. From this information, the difficulty level of all items, obtained by dividing the total of the items’ difficulty levels by the total number of students, is 0.45. So, it can be said that the English summative test for the second year students of MTs. Darul Ma’arif qualifies as a good test in terms of the difficulty level of all items, which is at the moderate level because it lies between 0.30 and 0.70.
Key terms: Test - Item Analysis - Difficulty Level
ABSTRAK
RIKA AMELIA. “An Analysis of the English Summative Test Items in terms of Difficulty Level for the Second Year Students of MTs Darul Ma’arif Jakarta”. Skripsi, Department of English Education, Faculty of Tarbiyah and Teachers’ Training, Syarif Hidayatullah State Islamic University Jakarta, 2010. This research aims to measure the difficulty level of the items of the English summative test for class VIII (eight) of MTs. Darul Ma’arif Jakarta by calculating the correct responses of the upper and lower groups using the formula from J.B. Heaton’s book “Writing English Language Tests”. The results are interpreted using the item criteria from Suharsimi Arikunto’s book “Dasar-dasar Evaluasi Pendidikan”: 20 items are difficult items because they are at the difficult level, between 0.00 and 0.30; 21 items are moderate items because they are at the moderate level, between 0.31 and 0.70; and 9 items are easy items because they are at the easy level, between 0.71 and 1.00. From this information, the difficulty level of all the items can be calculated by dividing the total of the difficulty values of the individual items by the total number of students, which gives a value of 0.45. Thus, the English test is said to be good in terms of its difficulty level, which lies between 0.30 and 0.70.
Keywords: Test - Item Analysis - Difficulty Level
ACKNOWLEDGEMENT
Bismillahirahmanirrahim,
In the name of Allah, the Most Beneficent and the Most Merciful. All praise
be to Allah SWT, Lord of the universe. Peace and blessings be upon the Prophet
Muhammad SAW, his family, his companions, and all of his followers.
In finishing this paper, the writer received much valuable help from many
people who are too numerous to be mentioned. In particular, the writer is very
grateful to:
1. The writer’s family, especially her beloved mother Misnawati and her beloved
father Ali Amran, her sisters (Ranti Novitasari Royani Afriyani, Rahmi Sri
Wahyuni, Rani Asmawati and Ridhatulfahmi), her brothers (K’ Yan and K’
Rari), and her nieces (Wafa azzahhiyah, Abdurrahma Faiz and Aisha hilma
Abiya), who have prayed for and supported the writer.
2. Dr. M. M. Farkhan, M.Pd, the writer’s advisor, who has guided the writer
during the process of writing this paper.
3. The lecturers of the Department of English Education, Faculty of Tarbiyah
and Teachers’ Training, Syarif Hidayatullah Jakarta, who have given
knowledge which is very useful for the writer.
4. The chairman of the English Education Department, Syauki, M.Pd., and his
secretary, Neneng Sunengsih, M.Pd.
5. Prof. Dr. Dede Rosyada, MA., the dean of the Faculty of Tarbiyah and
Teachers’ Training.
6. The headmaster of MTs Darul Ma’arif, H. Antung Abdullah, and the English
teacher, Mrs. Ida, S.Pd, who allowed the writer to do the research.
7. The staff of all the libraries: the main library of State Islamic University
‘Syarif Hidayatullah’, the library of the Faculty of Tarbiyah and Teachers’
Training, the British Council Library, Balai Pustaka, the Aminef Library, the
library of the Catholic University of Atmajaya, and PKBB Atmajaya. Thanks
for providing the sources to complete the references for this writing.
8. My inspiring friends, Nadiyah, Irka, Sri Rizki, Reni, Ucha, Yuli, Cyifa, Ida
and Anita, thanks for your kindness in sharing ideas and time to accompany
the writer in finishing this “skripsi”, and to all PBI B friends and English
Department 2005 friends for their cheerfulness, support and prayers.
9. All the people who have helped in writing this paper whom the writer could
not mention one by one. May Allah bless you all.
Jakarta, May 17th 2010
The writer
TABLE OF CONTENTS
COVER PAGE
APPROVAL SHEET
ENDORSEMENT SHEET
STATEMENT SHEET
ABSTRACT
ACKNOWLEDGEMENT
TABLE OF CONTENTS
LIST OF TABLES
CHAPTER I : INTRODUCTION
A. The Background of the Study
B. The Limitation of the Study
C. The Formulation of the Problem
D. The Significance of the Study
E. The Organization of the Paper
CHAPTER II : THEORETICAL FRAMEWORK
A. Evaluation
B. Test
C. Type of Tests
D. The Characteristics of a Good Test
1. Validity
2. Reliability
3. Practicality
E. Item Analysis
1. Difficulty Level
CHAPTER III : RESEARCH METHODOLOGY
A. The Objectives of the Research
B. The Method of Study
C. Time and Place
D. The Respondents
E. The Procedure of the Research
CHAPTER IV : RESEARCH FINDINGS
A. The Data Description
B. The Data Analysis
CHAPTER V : CONCLUSION AND SUGGESTION
A. Conclusion
B. Suggestion
BIBLIOGRAPHY
APPENDICES
LIST OF TABLES
Table 4.1 : The Students’ Group Position
Table 4.2 : The Category of FV of the English summative test items
LIST OF CHARTS
Chart 4.1 : The result of difficulty level of each item
Chart 4.2 : Pie-chart of the difficulty level percentage
LIST OF APPENDICES
Appendix 1 : Tabulation of the students’ correct answer from upper group
Appendix 2 : Tabulation of students’ correct answer from lower group
Appendix 3 : Table of the result of difficulty level of the items
Appendix 4 : The procedure of the research
Appendix 5 : The English Summative Test Paper
OUTLINE
CHAPTER I INTRODUCTION
A. Background of Study
B. Significance of the Study
C. Limitation of Problem
D. Formulation of Problem
E. Research Methodology
F. Organization of Writing
CHAPTER II THEORETICAL FRAMEWORK
A. Evaluation
B. The definition of test
C. Testing role
D. Types of test
a. Function
1. The placement test
2. The diagnostic test
3. The achievement test
4. The proficiency test
b. Way of scoring
1. Objective test
2. Subjective test
E. The Characteristic of a Good Test
1. Validity
2. Reliability
3. Practicality
F. Item Analysis
1. Level of difficulty
2. Discriminating power
3. Distracter
G. The importance of item analysis
CHAPTER III PROFILE OF SCHOOL
A. History of School
B. Vision and Mission of School
C. Facilities of School
D. Organization Structure of School
E. Teachers, Staffs, and Students
CHAPTER IV RESEARCH FINDINGS
A. Population and Sample
B. Time of Research
C. The Data Description
D. The Data Analysis
E. The Data Interpretation
CHAPTER V CONCLUSION AND SUGGESTION
A. Conclusion
B. Suggestion
CHAPTER I
INTRODUCTION
A. Background of Study
Evaluation is an important part of every teaching and learning experience.
It contributes greatly to teaching and provides information about the
students’ progress which teachers can use to manage the learning tasks and
the students. As stated by Pauline Rea-Dickins and Kevin Germaine,
“Evaluation is important for the teacher because it provides a wealth
of information to use for the future direction of classroom practice, for the
planning of courses and for the management of learning tasks and students.”1
Evaluation can also be described as the process of making sound decisions
about teaching and learning based on information that has been collected,
synthesized, and reflected on. Lyle F. Bachman states, “Evaluation can be
defined as the systematic gathering of information for the purpose of making
decision”.2
Depending on the decision being made and the information a teacher
needs in order to inform that decision, testing often contributes to the
process as the implementation of evaluation. Indeed, a test is one kind of
evaluation instrument for collecting data. “A test is defined as a systematic
procedure for observing and describing one or more characteristics of a person
with the aid of either a numerical scale or category system”.3 In other words,
a test measures a person’s ability or knowledge with a number of tasks or
questions. According to Henning, “. . . tests in general is to pinpoint strengths and
1 Pauline Rea-Dickins and Kevin Germaine, Evaluation, (New York: Oxford University Press, 1992), p. 3
2 Lyle F. Bachman, Fundamental Considerations in Language Testing, (Oxford: Oxford University Press, 1990), p. 22
3 Anthony J. Nitko, Educational Test and Measurement, An Introduction, (New York: Harcourt Brace Jovanovich, Inc, 1983), p. 6
weakness in the learned abilities of students”.4 Teachers need to administer
tests because through them they are able to find out the students’
achievement in mastering the lessons that have been taught and to evaluate the
effectiveness of the method used and the teaching material. Rebecca M. Valette
states, “…through tests the teacher can evaluate the effectiveness of a new
teaching method, of a different approach to a difficult pattern, or of a new
materials”.5
To measure the students’ learning progress at school, a teacher
commonly administers two kinds of test: the formative test and the summative
test. The former is held earlier than the latter, which is held at the end of
the semester. Through both tests, a teacher can measure the students’
achievement level and the degree to which they have accomplished the
instructional objectives. As Gronlund states:
“Formative test is used to monitor learning progress during instruction. Its purpose to provide continuous feedback to both pupil and teacher concerning learning successes and failures ………..And summative test typically comes at the end of a course of instruction. It is designed to determine the extent to which the instructional objectives have been achieved and is used primarily for assigning course grades or for certifying pupil mastery of the intended learning outcomes”.6
To obtain accurate measures, a test must be of good quality, because
a good test not only influences the students’ learning but also helps
teachers improve the teaching and learning process. J.B. Heaton supports this:
“Test may be constructed primary as device to reinforce learning and to
motivate the students’ performance in language”7. In addition, Lyle F.
Bachman also states that “Test are often used for pedagogical purposes, either
4 Grant Henning, A Guide to Language Testing, (U.S.A: Newbury House Publishers, 1987), p. 1
5 Rebecca M. Valette, Modern Language Testing, (U.S.A: Harcourt Brace Jovanovich, 1977), p. 5
6 Norman E. Gronlund, Measurement and Evaluation in Teaching, 4th edition, (Macmillan Publishing Company, 1976), p. 18
7 J.B. Heaton, Writing English Language Tests, (New Delhi: Tata McGraw-Hill Publishing Company, 1998), p. 13
as a means of motivating students to study or as means of reviewing material
taught”.8
As the accuracy of a test result influences the students’ motivation to
learn, the test administered must be a good test. A good test is one which
meets the criteria of validity, reliability, and practicality. Besides that, it
must have discriminating power and an appropriate difficulty level.9 A test is
valid if it measures what it is supposed to measure. It is reliable if its
results remain the same when it is administered again to students of the same
level. And it is practical if it is easy to take and to administer.
A matter which teachers often forget is the follow-up to the test
administration pertaining to the test items themselves. In fact, they do not
examine whether or not all items have fulfilled the criteria above. Therefore,
an analysis of the test items, namely “item analysis”, is really required.
By analyzing the test items, teachers can identify the good and the poor
items and differentiate between students who have done well and those who
have done poorly. J. Stanley Ahmann and Marvin D. Glock describe item
analysis as follows:
“Re-examining each test item to discover its strengths and flaws is known as item analysis. Item analysis usually concentrates on two vital features; level of difficulty and discriminating power. The former means the percentage of pupils who answer correctly each item; the latter the ability of the test item to differentiate between pupils who have done well and those who have done poorly”.10
8 Lyle F. Bachman, Fundamental Considerations in Language Testing, (Toronto: Oxford University Press, 1990), p. 22
9 J.B. Heaton, Writing English Language Tests, (New Delhi: Tata McGraw-Hill Publishing Company, 1998), p. 152-156
10 J. Stanley Ahmann and Marvin D. Glock, Evaluating Pupil Growth, Principles of Tests and Measurements, (Boston: Allyn and Bacon, Inc, 1967), p. 184
In addition, Ngalim Purwanto states, “The specific purpose of item analysis is
to find out which test items are good and which are not, and why an item is
said to be good or not good.”11
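The two features named in the Ahmann and Glock quotation can be illustrated with a short Python sketch. The function names are hypothetical, and the discrimination index shown here, (U - L) / n, is a commonly used upper-lower form that is assumed for illustration only; the quotation itself does not give a formula.

```python
def percent_correct(item_responses):
    """Level of difficulty: percentage of pupils who answered the item
    correctly. item_responses holds 1 (correct) or 0 (incorrect) per pupil."""
    return 100 * sum(item_responses) / len(item_responses)


def discrimination_index(upper_correct, lower_correct, group_size):
    """Assumed upper-lower index: (U - L) / n.

    upper_correct and lower_correct are the numbers of pupils in the
    upper and lower groups who answered the item correctly; group_size
    is the number of pupils in each group.
    """
    return (upper_correct - lower_correct) / group_size
```

An index near zero would suggest that pupils who did well and pupils who did poorly on the whole test answered the item about equally often, while a negative value would suggest that the item favours the weaker group.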
The latest English summative test at MTs. Darul Ma’arif was held on
June 19, 2009. According to the result of a pre-survey during teaching
practice at MTs. Darul Ma’arif, the writer was informed that the English
teacher had never analyzed the second-semester test items, so it is
difficult to say whether it is a good test or not. In addition, the test
results show that the students’ scores are poor.
Considering this fact, the writer is interested in carrying out an item
analysis of the English summative test at MTs. Darul Ma’arif Jakarta in the
second term of the 2008/2009 academic year.
B. Limitation of the Problem
The writer limits the study to the item analysis of the English summative
test administered to the second year students of MTs. Darul Ma’arif Jakarta
in the 2008/2009 academic year, in terms of difficulty level or facility value.
C. Formulation of Problem
Based on the background of the study described above, the writer would
like to seek the answer to the following problem: “Do the English summative
test items for the second year students of MTs. Darul Ma’arif Jakarta have
good quality in terms of difficulty level?”
11 Drs. Ngalim Purwanto, Prinsip-prinsip dan Tehnik Evaluasi Pengajaran, (Bandung: Remaja Rosda Karya, 1991), p. 118
D. Significance of the Study
Firstly, it provides feedback to the writer especially, and to the
English teacher, on how to analyze test items in terms of difficulty level.
Secondly, it informs the English teacher about the quality of the test
items in terms of difficulty level. Through this research, the English teacher
can identify the good items for future use and the students’ achievement in
mastering the materials taught, in order to evaluate the teacher’s competence
in teaching.
E. Organization of Writing
In discussing the topic, the writer divides this study into five chapters,
as follows:
Chapter one is the introduction, covering the background of the study,
the limitation of the problem, the formulation of the problem, the
significance of the study, and the organization of writing.
Chapter two is the theoretical framework, which discusses evaluation,
the test and its types, the criteria of a good test, and item analysis.
Chapter three is the research methodology, which includes the
objective of the research, the method of study, the time and place, the
population and sample, the instrument, and the procedure of the research.
Chapter four presents the research findings, which consist of the data
description and the data analysis.
Chapter five is devoted to the conclusion of what has been discussed
and analyzed in the preceding chapters, and also the writer’s suggestions
based on the research.
CHAPTER II
THEORETICAL FRAMEWORK
A. The Definition of Evaluation
Evaluation is important in every process that has been carried out,
because through evaluation we can find out the weaknesses which should be
revised and the strengths which should be built on. Likewise, in the
teaching-learning process evaluation plays an important role in providing
information for making judgments about what is good or desirable in
order to improve the students’ knowledge in learning and the teacher’s
competence in teaching. This is in line with Peter W. Airasian’s definition:
“Evaluation is the process of judging the quality or value of a performance or
a course of action”.1 In the same sense, Lyle F. Bachman states,
“Evaluation can be defined as the systematic gathering of information for the
purpose of making decision”.2 Evaluation also includes “the making
judgments about the value, for some purpose, of ideas works, solutions,
methods, materials, etc”.3 Benjamin S. Bloom et al. state that
“Evaluation is a system of quality control in which it may be determined at
each step in the teaching-learning process whether the process is effective or
not, and if not what changes must be made to its effectiveness before it is too
late”.4
Basically, the purpose of evaluation is to judge the worth of a program
or procedure, usually in terms of how well it has achieved its objectives, and
1 Peter W. Airasian, Classroom Assessment: Concepts and Applications, 5th edition, (New York: McGraw-Hill, 2005), p. 9
2 Lyle F. Bachman, Fundamental Considerations..., p. 22
3 Julian C. Stanley, Measurement in Today’s Schools, (Englewood Cliffs: Prentice-Hall, Inc, 1964), p. 16
4 Benjamin S. Bloom, Handbook on Formative and Summative of Students Learning, (London: Longman, 1971), p. 8
for this purpose all appropriate techniques of gathering evidence may be
used.5 “Evaluation goes beyond the statement of how much to concern itself
with the question what value. It seeks to answer the pupil’s and teacher’s
question of what progress am I making?”6 Richard I. Arends states that
“An important purpose of testing and evaluation is to provide students with
feedback on how they are doing”.7
Finally, considering all the opinions above about evaluation, the
writer can summarize that evaluation is a systematic process of providing
information in order to make judgments and sound decisions about whether the
objectives are in line with the curriculum used, and to find out the
students’ improvement in the teaching-learning process, the teacher’s
competence in teaching, and the classroom climate.
B. The Definition of Test
When people hear the words assessment and evaluation, they often
think right away of tests, because a test is one of the instruments of
evaluation for collecting data. A test is a formal, systematic, usually
paper-and-pencil procedure for gathering information about pupils’
performance.8 Paper-and-pencil tests are thus one important tool for
gathering assessment information.
A test is composed of a number of tasks or questions for students to
respond to. By analyzing the responses, the teacher can measure the students’
achievement in the teaching-learning process. Lyle F. Bachman states
that “A test is a procedure designed to elicit certain behavior from which one
can make inferences about certain characteristics of an individual”.9 Similarly,
5 Victor H. Noll, Introduction to Educational Measurement, 2nd edition, (Boston: Houghton Mifflin Company, 1965), p. 14
6 H.H. Remmers, N.L. Gage, J. Francis Rummel, A Practical Introduction to Measurement and Evaluation, (USA: Harper and Brothers Publishers, 1960), p. 7
7 Richard I. Arends, Learning to Teach, (New York: McGraw-Hill International Edition, 1989), p. 312
8 Peter W. Airasian, Classroom Assessment..., p. 9
9 Lyle F. Bachman, Fundamental Considerations..., p. 20
Wilmar Tinambunan states that “A test is a set of questions, each of which
has a correct answer, that examinees usually answer orally or in writing”.10
From those views, it can be concluded that a test can be an
instrument, technique, or procedure for eliciting the students’ responses
through tasks or performance in the form of a set of questions that must be
answered in order to achieve the teaching-learning objectives. In short, a
test is a measurement instrument designed to assess a specific sample of an
individual’s behavior.
A test is also a way to deliver information which is very useful for
many practitioners of education. “A test is a formal systematic procedure for
gathering information”.11 Therefore, a test as an educational device is
necessary in the teaching process, since testing and teaching cannot be
separated. Heaton states that “both testing and teaching are so closely
interrelated that is virtually impossible to work in either field without
being constantly concerned with the other”.12 The reason for this
interrelation between testing and teaching is that the material tested must
be based on the material taught in order to find out how far the students
have comprehended it.
C. Type of Tests
There are many kinds of tests that can be used to measure students’
achievement in an evaluation process. Tests can be classified on two bases,
namely function and way of scoring.
1. Function
According to Andrew Harrison, tests can be categorized by function
into four types: placement test, diagnostic test, achievement
test, and proficiency test.
10 Wilmar Tinambunan, Evaluation of Students Achievement, (Jakarta: Depdikbud, 1988), p. 310
11 Julian C. Stanley, Measurement in Today’s..., p. 3
12 J.B. Heaton, Writing English..., p. 1
a. The Placement test
A placement test is used to place a student in the appropriate level or
section of a language curriculum or school. It is usually given at the
beginning of a course. According to Wilmar Tinambunan:
A placement test is designed to determine pupil performance at the beginning of instruction. Thus, it is designed to sort new students into teaching groups, so that they can start a course at approximately the same level as the other students in the class. It is concerned with the student’s present standing, and so relates to general ability rather than specific points of learning. As a rule the result are needed quickly so that the teaching may begin.13
b. The Diagnostic Test
A diagnostic test is designed to diagnose a particular aspect of a
language. “Diagnostic tests are also achievement test, but they are
characterized by one distinctive feature, namely that they are designed
to show specific weakness and strengths within the skills or elements
measured”.14
It can also be used to check the students’ progress in learning
particular elements of the course. It is used, for example, at the end of a
unit in the course book or after a lesson designed to teach one particular
point.15 “A diagnostic test is designed to determine the degree to which
the specific instructional objectives of the course have been
accomplished”.16 And J.B. Heaton states that “Diagnostic test is
widely used, few tests are constructed solely as diagnostic tests. Note
that diagnostic testing is frequently carried out of groups of students
rather for individuals”.17
13 Wilmar Tinambunan, Evaluation of Students..., p. 7
14 Robert Lado, Language Testing, (Hong Kong: Wing Tai Cheung Printing Co Ltd, 1961), p. 369
15 Andrew Harrison, A Language Testing Handbook, (London: Macmillan Press, 1983), p. 6
16 James Dean Brown, Testing in Language Programs, (New Jersey: Prentice Hall Regents, 1996), p. 15
17 J.B. Heaton, Writing English..., p. 173
Thus, a diagnostic test is more comprehensive and detailed
because it searches for the underlying causes of learning difficulties
and then formulates a plan for remedial action.
c. Achievement Test
These tests are used to find out what students have actually learnt
or what has actually been taught. “Achievement tests are designed
to measure relative accomplishment in specified areas of work”.18 The
purpose of achievement tests, as the name reflects, is to establish how
successful individual students, groups of students, or the courses
themselves have been in achieving objectives.19 From another point of
view, Wilmar Tinambunan says that “the degree purpose of achievement test is
designed to indicate degree of students’ success in some past learning
activities”.20 Also, “Achievement tests relate to the past in that they
measure, what language the students have learned as a result of
teaching”.21
Based on the views above, the writer can conclude that achievement
tests are intended to measure how effectively students have mastered the
lesson and how far they have reached the instructional objectives. Thus, an
achievement test must be designed with very specific reference to a
particular course. This link with a specific program usually means that
achievement tests will be directly based on the course objectives and will
therefore be criterion-referenced. Such tests will typically be administered
at the end of a course to determine how effectively students have mastered
the instructional objectives.
At the implementation level, the achievement test takes two forms:
the formative test and the summative test.
18 H.H. Remmers, N.L. Gage, J. Francis Rummel, A Practical Introduction..., p. 19
19 Arthur Hughes, Testing for Language Teachers, (Cambridge: Cambridge University Press, 2003), p. 13
20 Wilmar Tinambunan, Evaluation of Students..., p. 19
21 Tim McNamara, Language Testing, (Hong Kong: Oxford University Press, 2000), p. 7
1) Formative test
A formative test is administered by the teacher during the
learning process with the aim of using the result to improve
instruction and to provide continuous feedback to both students and
teacher. Rebecca M. Valette states, “The formative test is given
during the course of instruction; its purpose is to show which
aspects of the chapter the student has mastered and where remedial
work is necessary”.22 Hence, the formative test is part of the
instructional process. When incorporated into classroom practice, it
provides the information needed to adjust teaching and learning
while they are happening. In this sense, the formative test informs both
teachers and students about student understanding at a point when
timely adjustments can be made. These adjustments help to ensure
students achieve targeted standards-based learning goals within a
set time frame.23
2) Summative test
A summative test is a test that is usually administered at the end of
the course. Rebecca M. Valette states, “the summative test, on the
other hand, is usually given at the end of a marking period and
measures the “sum” total of the material covered. On this type of a
test, students are usually ranked and graded”. Moreover, the
summative test is given periodically to determine at a particular
point in time what students know and do not know. The summative test
at the district/classroom level is an accountability measure that is
generally used as part of the grading process. Arthur Hughes states
that “the content of summative test should be based directly on a
detailed course syllabus or on the books and other material used”.24
22 Rebecca M. Valette, Modern Language..., p. 6
23 http://www.nmsa.org/Publications/WebExclusive/Assessment/tabid/1120/Default.aspx
24 Arthur Hughes, Testing for Language…, p. 11
Finally, the writer can conclude that a summative test is a test that is
usually administered at the end of a course of study.
d. Proficiency Test
The proficiency test is also used to measure what students have
learned, but “the aim of the proficiency test is to determine whether this
language ability corresponds to specific language requirements”.25
According to J.B. Heaton, “the proficiency test is concerned
simply with measuring a student’s control of the language in the light
of what he or she will be expected to do with it in the future
performance of a particular task”.26 James Dean Brown also
states that “A proficiency test assesses the general knowledge or skills
commonly required or prerequisite to entry into (or exemption from) a
group of similar institutions”.27
Decisions based on proficiency tests should therefore never be
undertaken lightly. Instead, these decisions must be based on the best
obtainable proficiency test scores as well as other information about the
student. “The content of a proficiency test, therefore, is not based on the
content or objectives of language courses that people taking the test may
have followed. Rather, it is based on a specification of what candidates
have to be able to do in the language in order to be considered
proficient”.28
25 Rebecca M. Valette, Modern Language..., p. 6
26 J.B. Heaton, Writing English..., p. 172
27 James Dean Brown, Testing In Language..., p. 10
28 Arthur Hughes, Testing For Language..., p. 9
2. Way of Scoring.
Based on the manner of scoring, test items are divided
into two general types: objective and subjective. J.B. Heaton states
that “Subjective and objective test are terms used to refer to the scoring of
tests”.29
a. Objective test
An objective test item is any test item for which there is only a single
correct answer. In this type of test, the students must select one option from
several alternatives. According to Valette, “An objective test item is any
item for which there is a single predictable correct answer”.30
Hence, this item type is referred to as an objective test item because
it can be scored objectively. That is, equally competent scorers can
score it independently and obtain the same result. Therefore,
whether the item is scored by one teacher or another, today or last
week, it will yield the same score. The advantage of
objective test items is that scoring is objective, quick, easy, and
consistent.
The objective test items commonly used in classroom testing are
true-false, multiple-choice, and matching. “These test
items include all of the selection-type items: multiple choice, true-false,
and matching.”31
1) True-False
A true-false item is simply a declarative statement which the
students must judge as true or false. J. Stanley Ahmann and Marvin D. Glock
explain that the “true-false item is referred to as an alternative response item; the
29 J.B. Heaton, Writing English..., p. 25
30 Rebecca M. Valette, Modern Language..., p. 6
31 Norman E. Gronlund, Constructing Achievement Tests, (New Jersey: Prentice-Hall, Inc., 1968), p. 25
item asks the students to answer “true” if it conforms to
the truth or “false” if it is essentially incorrect”.32
Thus, the item provides the students with a choice of two
alternatives, so the students can guess the answer
and will sometimes guess correctly. In other words, students
indicate whether a statement is true or false.
Example:
T F True-false items are classified as supply-type items.
2) Multiple-choice item
The multiple-choice item consists of a stem, which presents
a problem situation, and several alternatives, which provide
possible solutions to the problem. The stem may be a question or
an incomplete statement. The alternatives include the correct
answer and several plausible wrong answers, called distracters.
Their function is to distract those students who are uncertain of the
answer. “A multiple-choice item consists of one or more
introductory sentences followed by a list of two or more suggested
responses from which the examinee chooses one as the correct
answer”.33
Example:
In objective testing, the term objective refers to the method of …
a. identifying the learning outcomes
b. selecting the test content
c. presenting the problem
d. scoring the answers
3) Matching
The matching test item consists of two parallel columns,
with each word, number, or symbol in one column being
matched to a word, sentence, or phrase in the other column. This type
32 J. Stanley Ahmann and Marvin D. Glock, Evaluating Pupil Growth..., p. 17
33 Anthony J. Nitko, Educational Test..., p. 190
of item is employed widely in situations where relationships between more
or less similar ideas, facts, and principles are to be examined or
judged. In this type, students indicate the relationship between a set of
premises and a set of responses.
Example:
1. The …. drives a car            a. doctor
2. The …. checks the patient      b. driver
This kind of test is an effective way to test students’ recognition
of the relationships between words, definitions, events, dates,
categories, examples, and so on.
b. Subjective Test item
A subjective test is a test whose scoring requires judgment
and evaluation by the scorer. Valette states that a “Subjective item is
one that does not have a single right answer”.34 This means that the
scoring may vary between scorers, and the answer takes the form of a
composition in which the students are given a chance to express their ideas
or arguments in their own words. “Subjective tests,
like translation and essay, have the advantage of measuring language
skill naturally, almost the way English is used in real life”.35
The subjective test items that are commonly used in the classroom are
completion, short-answer, and essay items.
1) Completion
The completion item is a written statement that requires the
examinee to supply the correct word or short phrase in response to
an incomplete sentence, a question or a word association.
34 Rebecca M. Valette, Modern Language..., p. 10
35 Harold S. Madsen, Technique In Testing, (Oxford: Oxford University Press, 1983), p. 8
Completion test can be used effectively to measure the recall of
terms, dates, and names.36
The completion item and short answer item are both supply-
type test items, but in the short answer type the blank is nearly
always at the end, whereas in the completion type the blank
may occur anywhere in the statement.37
2) Short- answer Item
The short answer item consists of a question, which can be
answered with a word or short phrase.38 A student provides a short
response to a direct question or direction.
Generally, teachers prefer to use the short answer type of
question, probably because they think it has some advantages. It is
relatively easy to construct, it gives the teacher some
opportunity to see how well students can express their thoughts, and
it is less difficult to score or mark than the essay question.39
However, it is difficult to phrase a short answer question so that
only one answer is correct, and this type of question is
useful mainly for testing knowledge of facts and quite specific
information.
3) Essay test.
The most notable characteristic of the essay test is the freedom
of response it provides. The student is asked a question which
requires him to produce his own answer. He is relatively free to
decide how to approach the problem, what factual information to
use, how to organize his reply, and what degree of emphasis to
give each aspect of the answer. Thus, the essay question places a
36 Wilmar Tinambunan, Evaluation of Students..., p. 61
37 Victor H. Noll, Introduction to Educational..., p. 140
38 Victor H. Noll, Introduction to Educational..., p. 138
39 Victor H. Noll, Introduction to Educational..., p. 138
premium on the ability to produce, integrate, and express ideas.
As Norman E. Gronlund states:
“Essay tests are inefficient for measuring knowledge outcomes . . . but they provide a freedom of response which is needed for measuring certain complex outcomes . . . . These include the ability to create . . . . to organize . . . . to integrate . . . . to express . . . and similar behaviors that call for the production and synthesis of ideas”.40
Finally, from the explanation above of objective tests and
subjective tests, in particular the essay test, the writer concludes that for the
measurement of most knowledge outcomes we would use objective test items
to take advantage of their more extensive sampling and greater reliability. For
the measurement of such complex learning outcomes as the ability to create,
organize, and evaluate ideas, however, the teacher would use essay questions
despite their limitations.
Of the types of test item above, the writer is concerned only with the
multiple-choice test items in the English summative test for the second year
students of MTs. Darul Ma’arif, administered at the end of the second semester
of the 2008/2009 academic year.
D. Criteria Of A Good Test
Many considerations enter into the evaluation of a test. A good
test provides useful information for evaluation, that is, for measuring
students’ mastery of the instructional objectives. The writer considers
them under three main headings: validity, reliability, and
practicality. Validity refers to the extent to which a test measures what we
actually wish to measure. According to Brown, “Validity is the degree to
which the test actually measures what it is intended to measure… Reliability is
40 Norman E. Gronlund, Constructing Achievement..., p. 65
consistent and dependable… And practicality is within the means of financial
limitations, time constraints, ease of administration, and scoring and
interpretation”.41
1. Validity
The single most important characteristic of a good test is its ability
to help the teacher make correct decisions by measuring what it is intended to measure.
This characteristic is called validity. “Validity is concerned with whether
the information being gathered is relevant to the decision that needs to be
made”.42
A test has validity if it appropriately measures what it is supposed
to measure. According to Heaton, “The validity of a test is the extent to
which it measures what it is supposed to measure and nothing else”.43 Finocchiaro and
Sako also state, “A test is valid when it measures effectively what it is
intended to measure”.44 In the same sense, Wilmar Tinambunan states that “The
validity of a test is the extent to which the test measures what it is intended
to measure”.45 Also, Norman E. Gronlund states that “test scores are valid
to the extent to which they serve the use for which they are intended”.46
J. Stanley Ahmann and Marvin D. Glock point out, “In educational
measurement, validity is often defined as the degree to which a measuring
instrument actually serves the purposes for which it is intended”.47
Based on these definitions, the writer concludes that the validity of a test
is important for knowing whether the test is of good quality in testing
someone’s capacity.
41 H. Douglas Brown, Teaching by Principles: An Interactive Approach to Language Pedagogy, (San Francisco: Longman, 2nd edition), p. 386-387
42 Peter W. Airasian, Classroom Assessment..., p. 16
43 J.B. Heaton, Writing English..., p. 159
44 Mary Finocchiaro and Sydney Sako, Foreign Language Testing: A Practical Approach, (New York: Regents Publishing Company, 1983), p. 24
45 Wilmar Tinambunan, Evaluation of Students..., p. 11
46 Norman E. Gronlund, Constructing Achievement..., p. 105
47 J. Stanley Ahmann and Marvin D. Glock, Evaluating Pupil Growth..., p. 285
As validity is one of the most important characteristics of test
scores, the constructor of the test should know the various aspects of
validity and the various procedures by which they are determined.
“The two most important characteristics of test scores are validity and reliability…Anyone working with tests-whether constructing them or using published tests-should understand the meaning of these concepts…and should know the various procedures by which they are determined”.48
According to Heaton, the validity of a test can be seen from the
aspects mentioned below.
a. Face validity
A test has face validity if it has a good “face”, that is, if the
test looks right. According to Heaton, “if a test item looks right to other
testers, teachers, moderators, and testees, it can be described as having
at least face validity”.49 Mary Finocchiaro and Sydney Sako
define it as “A judgment about a test based on the way the test looks to
educators, students, and the general public. The test should not only
‘be right’, it should also ‘look right’”.50
b. Content Validity
A test has content validity if it contains the material that the
students have been taught. To fulfill this, the teacher should also refer to
the instructional objectives of the teaching-learning process.
Finocchiaro and Sako state, “Content validity is assured by checking
all items in the test to make certain that they correspond to the
instructional objective of the course”.51 In the same sense, Victor
H. Noll explains, “when a teacher gives a test which deals with the
48 Norman E. Gronlund, Constructing Achievement..., p. 105
49 J.B. Heaton, Writing English..., p. 159
50 Mary Finocchiaro and Sydney Sako, Foreign Language..., p. 28
51 Mary Finocchiaro and Sydney Sako, Foreign Language..., p. 25
material and with the objectives of instruction in particular class, his
test is said to have curricular (content) validity”.52
c. Construct Validity
A test is said to have construct validity if it can be demonstrated
that it measures just the ability which it is supposed to measure.
According to Heaton, “if a test has construct validity, it is capable of
measuring certain specific characteristics in accordance with a theory
of language behavior and learning”.53
d. Empirical Validity
A fourth type of validity is usually referred to as statistical or
empirical validity. This validity is obtained as a result of comparing
the result of the test with the result of some criterion measure.54
2. Reliability
The second criterion of a good test is reliability. Reliability has to
do with the accuracy and precision of a measurement procedure. Indices of
reliability give an indication of the extent to which a particular
measurement is consistent and reproducible.55 A test should be reliable as
a measuring instrument.
According to Finocchiaro and Sako, “the reliability or stability of a
language test is concerned with the degree to which it can be trusted to
produce the same result upon repeated administration to the same
individual, or to give consistent information about the value of a learning
variable being measured”.56 J. Stanley Ahmann and Marvin D.
Glock state that “Reliability means consistency of results. This is
equivalent to saying that a highly reliable instrument can be used
52 Victor H. Noll, Introduction to Educational..., p. 79
53 J.B. Heaton, Writing English..., p. 161
54 J.B. Heaton, Writing English..., p. 161
55 Robert L. Thorndike and Elizabeth Hagen, Measurement and Evaluation in Psychology and Education, (London: John Wiley and Sons, Inc., 1961), p. 127
56 Mary Finocchiaro and Sydney Sako, Foreign Language..., p. 28
repeatedly in an unchanging situation and produce constant or near
constant results.”57
Based on the above statements, a test is reliable if it consistently yields
the same or nearly the same ranks over repeated administrations.
3. Practicality
Thorndike and Hagen state that “Practicality is concerned with a wide
range of factors of economy, convenience, and interpretability that determine
whether a test is practical for widespread use”.58
A test may be a highly reliable and valid instrument but still be
beyond our means or facilities. The teacher or someone who makes the test
should keep in mind a number of very practical considerations. There are
several factors of practicality: economy, scorability, and administrability.
Finocchiaro and Sako state that “the criteria for
practicality normally will be based upon such factors as economy,
scorability, and administrability”.59 Harrison states that “tests
should be as economical as possible in time (preparation, sitting, and
marking) and in cost (material and hidden costs of time spent)”.60
In short, the criteria of a good test are validity, reliability and
practicality. However, besides those three criteria, a good test as a whole is
also determined by the quality of each item that makes up the test. If the
quality of each item is good, it adds to the strength and accuracy of the
scores obtained from the test. The quality of each item individually can be
analyzed by doing item analysis. According to Robert Lado, “item analysis
is the study of validity, reliability, and difficulty of test items taken
57 J. Stanley Ahmann and Marvin D. Glock, Evaluating Pupil..., p. 311
58 Robert L. Thorndike and Elizabeth Hagen, Measurement and Evaluation..., p. 127
59 Mary Finocchiaro, Foreign Language Testing..., p. 30
60 Andrew Harrison, A Language Testing..., p. 13
individually as if they were separate tests”.61 Through this analysis, the
evaluator can get information about which items are good enough for future use.
D. Item Analysis
After a test has been administered and scored it is usually desirable to
evaluate the effectiveness of the items. This is done by studying the students’
responses to each item. When formalized, the procedure is called item
analysis. Anthony J. Nitko states, “item analysis refers to the process of
collecting, summarizing, and using information about pupils’ responses to
items”.62
Meanwhile, Harold S. Madsen explains that:
“The selection of appropriate language items is not enough by itself to ensure a good test. Each question needs to function properly; otherwise, it can weaken the exam. Fortunately, there are some rather simple statistical ways of checking individual items. This procedure is called ‘item analysis’.”63
Item analysis is also a systematic procedure which provides
information about the quality of each test item, concerning the
following points:
1. The difficulty of the item
2. The discriminating power of the item
3. The effectiveness of each alternative or distracter.
Thus, item analysis information can tell the evaluator or constructor if
an item was too easy or too hard, how well it discriminated between high and
low scorers on the test, and whether all of the alternatives functioned as
intended. According to Suharsimi Arikunto, “Item analysis aims, among other
things, to help us identify poor test items, to obtain information that will
be used to improve the items for further use, and to obtain a quick picture
of the condition of what we have constructed”.64
61 Robert Lado, Ph. D, Language..., p. 342
62 Anthony J. Nitko, Educational Test..., p. 342.
63 Harold S. Madsen, Technique In..., p. 180
Item analysis data also aid in detecting specific technical flaws and
thus provide further information for improving test items. As J. Stanley
Ahmann and Marvin D. Glock state, “item analysis is re-examining each test to
discover its strength and flaws”.65
Item analysis has several benefits. First, it provides useful information
for class discussion of the test. Second, it provides data for helping the students
improve their learning. Third, it provides insights and skills which lead to the
preparation of better tests on future occasions.66
Finally, the writer concludes that item analysis is very important
for obtaining information about the quality of each test item, that is,
whether it is a good item or a poor item.
1. Difficulty Level of The Item
The difficulty level of an item is the percentage of pupils who
answer the item correctly. “The item difficulty is the fraction of the
persons taking an item who answer it correctly”.67 Heaton states that “The
index of difficulty (or facility value) of an item simply shows how easy or
difficult the particular item proved in the test. The index of difficulty
(facility value) is generally expressed as the fraction (percentage) of the
students who answered the item correctly”.68
A good test item should have a certain degree of difficulty. It should
not be too easy or too difficult, because a test that is too easy or too
difficult will yield a score distribution that makes it hard to identify
64 Suharsimi Arikunto, Dasar-dasar Evaluasi Pendidikan, (Jakarta: Bina Aksara, 1987), p. 205
65 J. Stanley Ahmann and Marvin D. Glock, Evaluating Pupil Growth..., p. 184
66 Norman E. Gronlund, Constructing Achievement..., p. 85-86.
67 Anthony J. Nitko, Educational Test..., p. 288
68 J.B. Heaton, Writing English..., p. 178
reliable differences in achievement between the pupils who have done well
and those who have done poorly. Suharsimi Arikunto says:
“A good item is one that is neither too easy nor too
difficult. An item that is too easy does not stimulate students to
increase their effort to solve it. An item that is too
difficult will cause students to become discouraged and lose the
motivation to try again because it is beyond their reach”.69
By analyzing the students’ responses to the items, the level of
difficulty of each item can be known, and the information will be helpful
for the teacher in identifying concepts and study material to re-teach. In
addition, by analyzing the facility value, the teacher will know whether the item
is easy, moderate, or difficult. M. Chobib Thoha states:
“A good item is one whose difficulty level is known to be
neither too difficult nor too easy, because the difficulty level is
correlated with discriminating power. When an item has maximal
difficulty, its discriminating power will be low; likewise,
an item that is too easy will have no discriminating power”.70
To measure the difficulty level of each item, the writer uses
Heaton’s formula:71

$$FV = \frac{\text{Correct U} + \text{Correct L}}{2n}$$

Explanation:
FV : Facility value or index of difficulty that we are looking for
Correct U (CU) : Number of students from the upper group who answer correctly
Correct L (CL) : Number of students from the lower group who answer correctly
2n : Total number of the students from upper and lower group.
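As a minimal sketch of the facility value calculation described above, the short Python function below divides the correct answers from both groups by the total number of students in the two groups. The function name `facility_value` and the sample figures are illustrative assumptions and are not taken from the test data.

```python
def facility_value(correct_upper: int, correct_lower: int, group_size: int) -> float:
    """Return FV = (Correct U + Correct L) / 2n.

    group_size is n, the number of students in ONE group
    (the upper and lower groups are assumed to be the same size).
    """
    if group_size <= 0:
        raise ValueError("group_size must be positive")
    return (correct_upper + correct_lower) / (2 * group_size)


# Hypothetical item: 7 of 10 upper-group students and 3 of 10
# lower-group students answer correctly.
example_fv = facility_value(7, 3, 10)
print(example_fv)  # 0.5
```

A value of 0.5 means that half of the students in the two groups answered the item correctly.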
69 Suharsimi Arikunto, Dasar-dasar..., p. 207
70 M. Chobib Thoha, Teknik Evaluasi Pendidikan, (Jakarta: PT. Raja Grafindo Persada, 2003), p. 145
71 J.B. Heaton, Writing English..., p. 178
After calculating the difficulty level of each item, the writer calculates the
index of difficulty of all items by this formula:

$$P = \frac{\sum B}{N}$$

P : Difficulty level of all items
B : Difficulty level of each item
∑ : Sigma (total)
N : Total number of test items.
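The overall index is simply the arithmetic mean of the item-level values. A hedged sketch follows; the name `overall_difficulty` and the three sample values are assumptions made for illustration only.

```python
def overall_difficulty(item_values: list[float]) -> float:
    """Return P = (sum of B) / N, the mean difficulty level of all items."""
    if not item_values:
        raise ValueError("at least one item value is required")
    return sum(item_values) / len(item_values)


# Hypothetical facility values for a three-item test.
example_p = overall_difficulty([0.25, 0.50, 0.75])
print(example_p)  # 0.5
```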
To interpret the difficulty level of each item and of all items,
the writer uses the criteria given in Suharsimi Arikunto’s
book.72 If the FV is:
Difficult : 0.00 – 0.30
Moderate : 0.31 – 0.70
Easy : 0.71 – 1.00
The facility value shows how easy or difficult the test
items are for a particular group. The facility value is therefore influenced by the
students’ competence, and the result will be different if the test is given to
another group of learners or students.
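The three ranges above can be sketched as a small classification function. This is only an illustration: the name `classify_difficulty` is an assumption, and the value is rounded to two decimals before comparison (also an assumption) so that it lines up with the two-decimal boundaries of the criteria.

```python
def classify_difficulty(fv: float) -> str:
    """Map a facility value to a category using the ranges quoted above:
    0.00-0.30 difficult, 0.31-0.70 moderate, 0.71-1.00 easy."""
    if not 0.0 <= fv <= 1.0:
        raise ValueError("facility value must lie between 0 and 1")
    fv = round(fv, 2)  # assumption: compare at two-decimal precision
    if fv <= 0.30:
        return "difficult"
    if fv <= 0.70:
        return "moderate"
    return "easy"


print(classify_difficulty(0.30))  # difficult
print(classify_difficulty(0.55))  # moderate
print(classify_difficulty(0.75))  # easy
```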
E. The Importance of Item Analysis
Item analysis is very important for teachers: it helps them prepare better test
items and supports the teaching-learning process. “Item analysis is an
important and necessary step in the preparation of good multiple-choice
tests”.73
72 Suharsimi Arikunto, Dasar-dasar..., p. 210
73 John W. Oller, Language Tests at School: A Pragmatic Approach, (London: Longman, 1979), p. 245
“For teacher made test, the following are among the important uses of
item analysis: determining whether an item functions as teacher intends,
feedback to the teacher about pupil difficulties, area for curriculum
improvement, revising the item and improving item writing skills”.74
1. Determining whether an item functions as the teacher intends.
The item functions properly if it is able to
distinguish those who have mastered the learning objectives from those who have
not. To differentiate between them, the test item should have an appropriate level
of difficulty, discriminating power, and effective distracters.
Therefore, item analysis should be done.
2. Feedback on students’ performance and a basis for class discussion.
Once the students’ responses to the items are known, the students’
performance can be assessed, their errors can be corrected, and the
test items that most of them found difficult can be discussed in
class.
3. Feedback to the teacher about pupils’ difficulties
The result of item analysis is useful for teachers to identify the
major types of difficulty pupils have in learning, so that they know which material
needs to be reviewed in the next lesson.
4. Area for curriculum improvement.
Item analysis shows which kinds of items students find
difficult or where certain errors occur often; this may indicate that the material is not
suitable to be taught in the school program, so the curriculum may need
to be revised.
74 Anthony J. Nitko, Educational Test..., p. 284
CHAPTER III
RESEARCH METHODOLOGY
1. The Objective of The Research
The research is done to find out the difficulty level of the English
summative test items for the second year of MTs. Darul Ma’arif Jakarta in the
second term of the 2008/2009 academic year, using the calculation given in J.B.
Heaton’s book, “Writing English Language Tests”.
2. The Method of Study
The method used in this study can be categorized as descriptive
analysis. This descriptive analysis is quantitative: the score data are
analyzed with simple statistical tabulation to determine whether each test
item is good or not.
3. The Time and Place
The research was held during teaching practice from March to June
2009 at MTs. Darul Ma’arif, which is located at Jl. RS. Fatmawati No. 45,
Cipete, South Jakarta.
4. The Respondents
The writer took the results of the English summative test of the second
grade at MTs. Darul Ma’arif Cipete, South Jakarta, which consists of 50 English
multiple-choice items. The respondents of this research are the 36 second year
students of MTs. Darul Ma’arif Jakarta.
5. The Instrument of the Research
The research instrument is the English summative test paper for the second
year students of MTs. Darul Ma’arif Jakarta.
CHAPTER IV
RESEARCH FINDINGS
A. The Data Description
The English summative test consists of 50 multiple-choice items. As
noted in the procedure of the research, the items are analyzed after ranking the
students by their number of correct answers, from the highest to the lowest score.
After correcting the students’ answer sheets, the writer listed the scores of the
students from the highest to the lowest. The scores make it easier
to divide the students into three groups. Each correct answer is worth
two points because there are 50 items in the test, so the maximum score
is 100. The following table shows the students’ scores
and their groups.
Table 4.1
Group position of English summative test for 36 of the second year students of MTs. Darul Ma’arif Jakarta in the second term 2009/2010 academic year

No   Name                     Score   Group
1.   Nabil F.Q                70      Upper
2.   Kinanti Restian Putri    68      Upper
3.   Hikmah                   58      Upper
4.   Fuad ismail              58      Upper
5.   M. Bachruri. A.F         54      Upper
6.   Aulia Ulfa               52      Upper
7.   Lis saodah               52      Upper
8.   Syifa fauziah            52      Upper
9.   Khalidah khairurizky     50      Upper
10.  M. Yusuf                 48      Upper
11.  Rahmat Febrianti         48      Middle
12.  M. Chaidir Rafsanjani    48      Middle
13.  Ismaul Husna             48      Middle
14.  Indah Permata sari       46      Middle
15.  Wahyu H                  46      Middle
16.  Nyai ratih K             46      Middle
17.  Ahmad Akhirussa’ban      46      Middle
18.  Rahmawati                44      Middle
19.  Siti Istiannah           44      Middle
20.  Layla                    44      Middle
21.  Erna Tihana              42      Middle
22.  Lisa Umami               42      Middle
23.  Ahmad Jazuli             42      Middle
24.  Nurhasanah               42      Middle
25.  Nu’mansyah               40      Middle
26.  Fauziah                  40      Middle
27.  Qois Abdul Azis          40      Lower
28.  Mustafa Hari Pratama     38      Lower
29.  Uti Safitri              38      Lower
30.  Junita R                 38      Lower
31.  Uswatun Hassanah         34      Lower
32.  Ahmad Sofwat             34      Lower
33.  Imam Buchori             32      Lower
34.  Salman Alfarisi          32      Lower
35.  Chairuddin               32      Lower
36.  Deni Arsita              22      Lower
Table 4.1 lists the students from those who got the highest score to those
who got the lowest score. The scores make it easier to
divide the students into three groups: upper, middle, and lower. The 27%
of students with the highest scores form the UPPER group, and the 27% with the lowest
scores form the LOWER group (27% of 36 students is 9.72, rounded to 10 students
in each group). For the analysis, the MIDDLE group is set
aside.
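The grouping step can be sketched in Python as follows. The scores are those listed in Table 4.1; the function name `split_groups` and the use of rounding to turn 27% of 36 into 10 students are assumptions made for this sketch. Because several students share the boundary scores (48 and 40), which of the tied students falls in each group depends on the order of the ranked list.

```python
def split_groups(scores: list[int], fraction: float = 0.27):
    """Rank scores from highest to lowest and return (upper, middle, lower)."""
    ranked = sorted(scores, reverse=True)
    k = round(len(ranked) * fraction)  # assumption: round to nearest student
    upper = ranked[:k]
    middle = ranked[k:len(ranked) - k]
    lower = ranked[len(ranked) - k:]
    return upper, middle, lower


# Scores of the 36 students as listed in Table 4.1.
scores = [70, 68, 58, 58, 54, 52, 52, 52, 50, 48, 48, 48, 48, 46, 46, 46,
          46, 44, 44, 44, 42, 42, 42, 42, 40, 40, 40, 38, 38, 38, 34, 34,
          32, 32, 32, 22]

upper, middle, lower = split_groups(scores)
print(len(upper), len(middle), len(lower))  # 10 16 10
```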
B. The Data Analysis
The English summative test consists of 50 multiple-choice items with
four options each. The first step in the difficulty analysis is listing the
students’ responses to each test item. The lists can be seen in the
appendices, labelled Table 1 and Table 2, one for each group. The
distribution of the correct responses from the upper
and lower groups can be summarized as follows:
1. The answers from the Upper Group (10 students)
No student answered all items correctly. One student got
35 items correct; 1 student got 34 items; 2 students got 29 items; 1
student got 27 items; 3 students got 26 items; 1 student got 25 items; and 1
student got 24 items correct. The responses are as follows:
a. 10 students answer correctly numbers; 5, 6, 7, 8, 11, 17, 19, 20, 32, 37
b. 9 students answer correctly numbers; 43
c. 8 students answer correctly numbers; 18, 25, 35, 36, 46
d. 7 students answer correctly numbers; 3, 22, 33
e. 6 students answer correctly numbers; 2, 10, 15, 27, 32
f. 5 students answer correctly numbers; 1, 4, 21, 34, 45
g. 4 students answer correctly numbers; 23, 26, 38, 40, 47
h. 3 students answer correctly numbers; 14, 16, 24, 30, 39, 44, 48
i. 2 students answer correctly numbers; 9, 12, 13, 28, 41, 49
j. 1 student answers correctly numbers; 29, 42, 50
2. The answers from the Lower Group (10 students)
Meanwhile, in the lower group, one student got 20 items correct; 3
students got 19 items; 2 students got 17 items; 3 students got 16 items; and
1 student got 11 items correct. The responses are as follows:
a 10 students answer correctly number; 17
b 9 students answer correctly number; 8
c 8 students answer correctly numbers; 5, 44
d 7 students answer correctly numbers; 11, 20, 25, 32
e 6 students answer correctly numbers; 19, 43, 46
f 5 students answer correctly numbers; 18, 22, 26, 31, 35
g 4 students answer correctly numbers; 4, 36, 37
h 3 students answer correctly numbers; 6, 7, 10, 15, 16, 27, 30, 34, 41,
50.
i 2 students answer correctly numbers; 21, 42, 45
j 1 students answer correctly numbers; 1, 2, 3, 9, 12, 13, 14, 23, 24, 28,
29, 33, 40, 47, 49
k 0 students answer correctly numbers; 38, 39, 48
Then, as noted earlier, the data for upper and lower group are
calculated by using Heaton’s formula to get the difficulty level (FV) of each
item.

$$FV = \frac{\text{Correct U} + \text{Correct L}}{2n}$$

Explanation:
FV : Facility value or index of difficulty that we are looking for
Correct U (CU) : Number of students from the upper group who answer correctly
Correct L (CL) : Number of students from the lower group who answer correctly
2n : Total number of the students from upper and lower group.
Afterwards, the result of the calculation is interpreted by using
Arikunto’s criteria. If the FV is:
Difficult : 0.00 – 0.30
Moderate : 0.31 – 0.70
Easy : 0.71 – 1.00
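As an illustration only, the calculation and the interpretation above can be sketched in Python. The function names `facility_value` and `classify` are hypothetical and not part of the thesis; the sketch also assumes that FV is handled at two decimals, so that no value falls between the bands. The three example items use the counts from the upper and lower group tabulations above (10 students per group).

```python
def facility_value(correct_upper, correct_lower, group_size):
    """Return FV = (CU + CL) / 2n, where n is the size of each group."""
    if group_size <= 0:
        raise ValueError("group_size must be positive")
    return (correct_upper + correct_lower) / (2 * group_size)


def classify(fv):
    """Map a facility value to the category bands quoted in the text."""
    fv = round(fv, 2)  # assumption: compare at two decimals, like the bands
    if fv <= 0.30:
        return "difficult"
    if fv <= 0.70:
        return "moderate"
    return "easy"


# item number -> (CU, CL), read from the tabulations above
examples = {17: (10, 10), 37: (10, 4), 38: (4, 0)}
for item, (cu, cl) in examples.items():
    fv = facility_value(cu, cl, group_size=10)
    print(item, fv, classify(fv))
```

With these counts, item 17 falls in the easy band, item 37 in the moderate band, and item 38 in the difficult band, which agrees with the categories reported in this chapter.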
The result of the writer’s calculation of the index of difficulty of
each item can be seen in the appendices, labelled Table 3. The writer then
summarizes it in chart form:
Chart 4.1 The result of difficulty level of each item
[Bar chart: number of items (axis from 0 to 25) in each category: difficult, moderate, easy]
Chart 4.1 above shows the distribution of the items of the English
summative test across the difficulty level criteria. The detailed
distribution is as follows:
a There are 20 items categorized as difficult, because they are in the range
between 0.00 and 0.30. Those are numbers 1, 9, 12, 13, 14, 16, 23, 24,
28, 29, 30, 38, 39, 40, 41, 42, 47, 48, 49, 50.
b There are 21 items categorized as moderate, because they are in the
range between 0.31 and 0.70. Those are numbers 2, 3, 4, 6, 7, 10, 15, 18,
21, 22, 26, 27, 31, 33, 34, 35, 36, 37, 44, 45, 46.
c There are 9 items categorized as easy, because they are in the range
between 0.71 and 1.00. Those are numbers 5, 8, 11, 17, 19, 20, 25, 32,
43.
From the difficulty level (FV) of each item, the writer
calculates the percentage of items that fall into each category and presents
it in table form as follows:
Table 4.2
The category of difficulty level of the English summative test items.
No.   Range of Difficulty Level   Category    Frequency   Percentage
1.    0.00 – 0.30                 Difficult   20          40%
2.    0.31 – 0.70                 Moderate    21          42%
3.    0.71 – 1.00                 Easy        9           18%
      TOTAL                                   50          100%
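The percentages in Table 4.2 follow directly from the frequencies. A minimal sketch of that arithmetic is given here; the helper name `percentage_distribution` is hypothetical and used only for illustration.

```python
def percentage_distribution(frequencies):
    """Convert category frequencies into percentages of the total."""
    total = sum(frequencies.values())
    if total == 0:
        raise ValueError("at least one item is required")
    return {category: 100 * count / total
            for category, count in frequencies.items()}


# Frequencies as reported in Table 4.2 (50 items in total)
frequencies = {"difficult": 20, "moderate": 21, "easy": 9}
distribution = percentage_distribution(frequencies)
for category, percent in distribution.items():
    print(f"{category}: {percent:.0f}%")
```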
Based on the facility value (difficulty level) data in the table for the
English summative test at MTs Darul Ma’arif Jakarta, it can be said that the
items are not evenly balanced across the categories: the easy items take
the smallest portion. Ideally, however, the spread of the items should be
balanced, meaning that forty percent of the items are in the medium category,
thirty percent are in the easy category, and thirty percent are in the
difficult category. Sudjana states that “the number of items for the three
categories…that is, easy, medium and difficult items, is
balanced…The ratio of easy, medium and difficult items can be set at
3-4-4….another ratio of a similar kind in proportion….for example
3-5-2”1
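To make the imbalance concrete, the observed percentages can be compared with the 30-40-30 spread (easy, medium, difficult) described in the paragraph above. The sketch below is illustrative only; the dictionary names are hypothetical, and the observed values are those of Table 4.2.

```python
# Observed percentages from Table 4.2
observed = {"difficult": 40, "moderate": 42, "easy": 18}
# Balanced spread described in the text: 30% easy, 40% medium, 30% difficult
ideal = {"difficult": 30, "moderate": 40, "easy": 30}

# Positive gap: more items than the balanced spread; negative gap: fewer
gaps = {category: observed[category] - ideal[category] for category in ideal}
for category, gap in gaps.items():
    print(f"{category}: {gap:+d} percentage points")
```

Under these assumptions the difficult category is over-represented by 10 percentage points and the easy category is under-represented by 12.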
Afterwards, the writer summarizes the percentage distribution, or
proportion, of each category in chart form.
Chart 4.2 Pie chart of the difficulty level percentage (English summative test items of MTs Darul Ma’arif)
[Pie chart with slices: difficult, moderate, easy]
The last step is to calculate the difficulty level of all items by using this
formula:

$$P = \frac{\sum b}{N}$$

P : Difficulty level of all items
b : Difficulty level of each item
∑ : Sigma (total)
N : Total number of test items
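The formula above is the arithmetic mean of the item facility values. A minimal sketch follows; the function name `overall_difficulty` is hypothetical, and the three sample values are the FVs of items 1, 5 and 17 as they follow from the group tabulations earlier in this chapter (6/20, 18/20 and 20/20). It illustrates the formula only and does not reproduce the result for all 50 items.

```python
from statistics import fmean


def overall_difficulty(item_fvs):
    """Return P = (sum of b) / N for a sequence of item facility values."""
    if not item_fvs:
        raise ValueError("at least one item is required")
    return fmean(item_fvs)


# Illustrative subset: FV of items 1, 5 and 17
sample = [0.30, 0.90, 1.00]
print(round(overall_difficulty(sample), 3))
```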
1 Dr. Nana Sudjana, Penilaian Hasil Proses Belajar Mengajar (Bandung: PT. Remaja Rosdakarya, 2005), pp. 135-136.
After calculating the difficulty level of all items with that formula, the
writer obtained a result of 0.451. The detailed difficulty level of each item
and of all items can be seen in the appendices, labelled Table 3, where the
result for each item is given as a decimal. As noted earlier, the writer
interprets the difficulty level (FV) of all items according to Arikunto’s
criteria.
Therefore, the writer interprets the difficulty level of the English
summative test administered at MTs Darul Ma’arif as MODERATE, based on the
FV (difficulty level/facility value) of all items: the overall FV of 0.45
lies in the range between 0.31 and 0.70.
CHAPTER V
CONCLUSION AND SUGGESTION
A. Conclusion
Based on the data analysis and interpretation in the previous chapter, the
writer concludes that the difficulty levels of the English Summative
Test items for the second year students of MTs Darul Ma’arif are as follows:
1. 21 items (42%) are regarded as good test items because they are at the
moderate level, ranging from 0.31 to 0.70.
2. 20 items (40%) are regarded as difficult items because they are at the
difficult level, ranging from 0.00 to 0.30.
3. The other 9 items (18%) are regarded as easy items because they are at the
easy level, ranging from 0.71 to 1.00.
Overall, this analysis shows that the English Summative Test for the
second grade students at MTs Darul Ma’arif Jakarta has a moderate level of
difficulty. This means the test qualifies as a good enough test in terms of
the difficulty level of all items.
B. Suggestion
Based on the conclusion above, the writer would like to give a
suggestion concerning the item analysis result:
1. For further research, an analysis of the discriminating power and content
validity of the English summative test items is necessary in order to find
the poor items to be revised.
BIBLIOGRAPHY
Ahmann, J. Stanley, and Glock, Marvin D. 1967. Evaluating Pupil Growth: Principles of Tests and Measurements. Allyn and Bacon Inc. Boston.
Airasian, Peter W. 2008. Classroom Assessment: Concepts and Applications 6th Edition. McGraw-Hill. New York.
Airasian, Peter W. 2008. Classroom Assessment: Concepts and Applications 5th Edition. McGraw-Hill. New York.
Arends, Richard I. 1989. Learning to Teach. McGraw-Hill. New York.
Arikunto, Suharsimi, Dr. 1997. Dasar-dasar Evaluasi Pendidikan. Bumi Aksara. Jakarta.
Bachman, Lyle F. 1990. Fundamental Considerations in Language Testing. Oxford University Press. Toronto.
Bloom, Benjamin S. 1971. Handbook on Formative and Summative Evaluation of Student Learning. Longman. London.
Brown, James Dean. 1996. Testing in Language Programs. Prentice Hall Regents. Upper Saddle River.
Brown, H. Douglas. Teaching by Principles: An Interactive Approach to Language Pedagogy. Longman. London.
Finocchiaro, Mary, and Sako, Sydney. 1983. Foreign Language Testing: A Practical Approach. Regents Publishing Company. New York.
Gronlund, Norman E. 1968. Constructing Achievement Tests. Prentice Hall. Englewood Cliffs.
Harrison, Andrew. 1983. A Language Testing Handbook. Macmillan Press. London.
Heaton, J.B. 1977. Writing English Language Tests. Longman. London.
Henning, Grant. 1987. A Guide to Language Testing. Newbury House Publishers. Cambridge.
Hughes, Arthur. 1991. Testing for Language Teachers. Cambridge University Press. Cambridge.
Knapp, Thomas R. 1970. Statistics for Educational Measurement. Intext Educational Publishers. New York.
Lado, Robert. 1961. Language Testing. Wing Tai Cheung Printing Co Ltd. Hong Kong.
Madsen, Harold S. 1983. Techniques in Testing. Oxford University Press. Oxford.
McNamara, Tim. 2000. Language Testing. Oxford University Press. New York.
Nitko, Anthony J. 1983. Educational Tests and Measurement: An Introduction. Harcourt Brace Jovanovich, Inc. New York.
Noll, Victor H. 1965. Educational Measurement 2nd Edition. Houghton Mifflin Company. Boston.
Oller, John W. 1979. Language Tests at School: A Pragmatic Approach. Longman. London.
Purwanto, Ngalim. 1991. Prinsip dan Teknik Evaluasi Pengajaran. Remaja Rosda Karya. Bandung.
Rea-Dickins, Pauline, and Germaine, Kevin. 1992. Evaluation. Oxford University Press. New York.
Rummel, J. Francis, Remmers, H.H., and Gage, N.L. 1960. A Practical Introduction to Measurement and Evaluation. Harper and Brothers Publishers. New York.
Stanley, Julian C. 1964. Measurement in Today’s Schools. Prentice Hall Inc. Englewood Cliffs.
Sudjana, Nana, Drs. 2001. Penilaian Hasil Proses Belajar Mengajar. Remaja Rosda Karya. Bandung.
Thoha, M. Chobib. 2003. Teknik Evaluasi Pendidikan. PT. Raja Grafindo Persada. Jakarta.
Thorndike, Robert L., and Hagen, Elizabeth. 1961. Measurement and Evaluation in Psychology and Education 2nd Edition. John Wiley & Sons, Inc. London.
Valette, Rebecca M. 1997. Modern Language Testing. Harcourt Brace Jovanovich Publishers. New York.
http://www.nmsa.org/Publications/WebExclusive/Assessment/tabid/1120/Default.aspx