AN ANALYSIS OF THE ENGLISH SUMMATIVE TEST

ITEMS IN TERMS OF DIFFICULTY LEVEL (A Case Study of the Second Year Students of MTs Darul Ma’arif Jakarta)

A “Skripsi”

Presented to the Faculty of Tarbiyah and Teachers’ Training in Partial Fulfillment of the Requirements

for the Degree of S.Pd. (Bachelor of Arts) in English Language Education

By:

Rika Amelia NIM. 105014000357

DEPARTMENT OF ENGLISH EDUCATION

FACULTY OF TARBIYAH AND TEACHERS’ TRAINING

SYARIF HIDAYATULLAH STATE ISLAMIC UNIVERSITY

JAKARTA

2010

“Do your best.”

“Search for the things you are good at. Work at them until you are the best.

You are guaranteed to succeed.”

The Procedure of the Research

The steps in conducting the research are as follows:

1) Collecting the answer sheets and the test.

2) Checking the answer key of the test to see whether it has been made correctly by the teacher. The result of this check then becomes the reference for scoring the students’ responses.

3) Arranging and tabulating the answer sheets from the highest score to the lowest.

4) Taking the top 27% of the ranking as the upper group and the bottom 27% as the lower group.

5) Counting, for each item, the students in the upper and lower groups who answered it correctly, and entering the counts in the item analysis tabulation format.

6) Finding the index of difficulty of the items.
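
Steps 3 to 6 above can be sketched in Python as follows. This is only an illustration under stated assumptions: each answer sheet is taken to be already scored against the checked answer key (steps 1 and 2), the names `Sheet` and `facility_values` are hypothetical, and the size of each group is rounded to the nearest whole student. The formula FV = (Correct U + Correct L) / 2n is the upper/lower-group facility value commonly attributed to Heaton; it is not quoted from this section.

```python
from collections import namedtuple

# Hypothetical record: a student's total score plus one 0/1 flag per item.
Sheet = namedtuple("Sheet", ["score", "correct"])


def facility_values(sheets, fraction=0.27):
    """Return one facility value (index of difficulty) per item."""
    # Step 3: rank the answer sheets from the highest score to the lowest.
    ranked = sorted(sheets, key=lambda s: s.score, reverse=True)
    # Step 4: size of each group; rounding to the nearest whole student
    # is an assumption of this sketch.
    n = max(1, round(len(ranked) * fraction))
    upper, lower = ranked[:n], ranked[-n:]
    n_items = len(ranked[0].correct)
    values = []
    for i in range(n_items):
        # Step 5: count the correct responses to item i in each group.
        correct_upper = sum(s.correct[i] for s in upper)
        correct_lower = sum(s.correct[i] for s in lower)
        # Step 6: FV = (Correct U + Correct L) / 2n.
        values.append((correct_upper + correct_lower) / (2 * n))
    return values
```

With ten students, each group holds three sheets, so an item answered correctly by all three upper-group students and by none of the lower group has a facility value of 3/6 = 0.5.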

AN ANALYSIS OF THE ENGLISH SUMMATIVE TEST

ITEMS IN TERMS OF DIFFICULTY LEVEL (A Case Study of the Second Year Students of MTs Darul Ma’arif Jakarta)

A “Skripsi” Presented to the Faculty of Tarbiyah and Teachers’ Training

In Partial Fulfillment of the Requirements For the Degree of S.Pd. in English Language Education

Approved by:

Dr. M.M. Farkhan, M. Pd

NIP. 150 299 480

DEPARTMENT OF ENGLISH EDUCATION

FACULTY OF TARBIYAH AND TEACHERS’ TRAINING

SYARIF HIDAYATULLAH STATE ISLAMIC UNIVERSITY

JAKARTA

2010

ENDORSEMENT SHEET

The examination committee of the Faculty of Tarbiyah and Teachers’ Training certifies that the “Skripsi” (Scientific Paper) entitled “An Analysis of the English Summative Test Items in terms of Difficulty Level (A Case Study at the Second Year Students of MTs. Darul Ma’arif Jakarta),” written by Rika Amelia, student registration number 105014000357, was examined in the examination session on June 25th, 2010. The “skripsi” has been accepted and declared to have fulfilled one of the requirements for the Degree of S.Pd. (Bachelor of Arts) in English Language Education at the English Education Department.

Jakarta, June 26th 2010

The Examination Committee

Chairman : Drs. Syauki, M.Pd. ………………………

NIP. 19641212 199103 1 002

Secretary : Neneng Sunengsih, M.Pd. ………………………

NIP. 19730625 199903 200 1

Examiner I : Drs. Nasrun Mahmud, M.Pd ………………………

NIP. 150 041 070

Examiner II : Dr. Zaenal Arifin Toy, M.Sc ………………………

NIP. 150 031 215

Acknowledged by

Dean Faculty of Tarbiyah and Teachers’ Training

Prof. Dr. Dede Rosyada, MA

NIP. 19571005 198703 1 003

ABSTRACT

RIKA AMELIA. “An Analysis of the English Summative Test Items in terms of Difficulty Level for the Second Year Students of MTs Darul Ma’arif Jakarta”. A Paper, Study Program of English Education, Faculty of Tarbiyah and Teachers’ Training, ‘Syarif Hidayatullah’ State Islamic University Jakarta, 2010.

The research aims to measure the difficulty level of the English summative test items by calculating the students’ correct responses in the upper and lower groups with J.B. Heaton’s formula, taken from his book “Writing English Language Tests”. The result is interpreted with Suharsimi Arikunto’s item criteria, taken from his book “Dasar-dasar Evaluasi Pendidikan”. Twenty items are regarded as difficult because their difficulty level ranges from 0.00 to 0.30. Twenty-one items are regarded as good because they are at the moderate level, ranging from 0.31 to 0.70. Nine items are regarded as easy because their difficulty level ranges from 0.71 to 1.00. From this information, the difficulty level of the test as a whole is obtained by dividing the total of the items’ difficulty levels by the total number of students, which gives 0.45. It can therefore be said that the English summative test for the second year students of MTs. Darul Ma’arif qualifies as a good test in terms of difficulty, because its overall difficulty level is moderate, falling between 0.30 and 0.70.

Key terms: Test, Item Analysis, Difficulty Level
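
The interpretation criteria quoted in the abstract can be illustrated with a short Python sketch. The function names `classify` and `count_levels` are hypothetical, and treating a value that falls between two published bands (for example 0.305) as belonging to the higher band is an assumption of this sketch, not something stated in the source.

```python
from collections import Counter


def classify(fv):
    """Map a facility value in [0, 1] to the category named in the abstract."""
    if not 0.0 <= fv <= 1.0:
        raise ValueError("facility value must lie between 0 and 1")
    if fv <= 0.30:
        return "difficult"
    if fv <= 0.70:
        return "moderate"
    return "easy"


def count_levels(values):
    """Count how many items fall into each difficulty category."""
    return Counter(classify(v) for v in values)
```

Applied to the facility values of all the items, `count_levels` would reproduce a tally of the kind reported in the abstract (difficult, moderate and easy items).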

ABSTRAK

RIKA AMELIA. “An Analysis of the English Summative Test Items in terms of Difficulty Level for the Second Year Students of MTs Darul Ma’arif Jakarta”. Skripsi, Department of English Education, Faculty of Tarbiyah and Teachers’ Training, Syarif Hidayatullah State Islamic University Jakarta, 2010. This research aims to measure the difficulty level of the items of the English summative test for class VIII (eight) of MTs. Darul Ma’arif Jakarta by calculating the correct responses of the upper and lower groups with the formula of J.B. Heaton, from his book “Writing English Language Tests”. The result of the research is interpreted with the item criteria of Suharsimi Arikunto, from his book “Dasar-dasar Evaluasi Pendidikan”: 20 items are difficult because they fall at the difficult level, between 0.00 and 0.30; 21 items are moderate because they fall at the moderate level, between 0.31 and 0.70; and 9 items are easy because they fall at the easy level, between 0.71 and 1.00. From this information, the difficulty level of all the items can be calculated by dividing the total of the difficulty values of the items by the total number of students, which gives 0.45. The English test is therefore said to be good in terms of its difficulty level, which lies between 0.30 and 0.70.

Key words: Test, Item analysis, Difficulty level

ACKNOWLEDGEMENT

Bismillahirrahmanirrahim,

In the name of Allah, the Most Beneficent and the Most Merciful. All praise be to Allah SWT, Lord of the universe. Peace and blessings be upon the Prophet Muhammad SAW, his family, his companions and all of his followers.

In finishing this paper the writer received much valuable help from many people, too numerous to mention. In particular, the writer is very grateful to:

1. The writer’s family, especially her beloved mother Misnawati and her beloved father Ali Amran, her sisters (Ranti Novitasari Royani Afriyani, Rahmi Sri Wahyuni, Rani Asmawati and Ridhatulfahmi), her brothers (K’ Yan and K’ Rari), and her nieces (Wafa Azzahhiyah, Abdurrahma Faiz and Aisha Hilma Abiya), who have prayed for and supported the writer.

2. Dr. M. M. Farkhan, M.Pd, the writer’s advisor, who guided the writer during the process of writing this paper.

3. The lecturers of the Department of English Education, Faculty of Tarbiyah and Teachers’ Training, Syarif Hidayatullah Jakarta, who have given the writer very useful knowledge.

4. The chairman of the English Education Department, Syauki, M.Pd., and his secretary, Neneng Sunengsih, M.Pd.

5. Prof. Dr. Dede Rosyada, MA, the Dean of the Faculty of Tarbiyah and Teachers’ Training.

6. The headmaster of MTs Darul Ma’arif, H. Antung Abdullah, and the English teacher, Mrs. Ida, S.Pd, who allowed the writer to do the research.

7. The staff of all the libraries: the main library of State Islamic University ‘Syarif Hidayatullah’, the library of the Faculty of Tarbiyah and Teachers’ Training, the British Council Library, Balai Pustaka, the Aminef Library, the library of the Catholic University of Atmajaya and PKBB Atmajaya. Thank you for providing the sources for the references of this paper.

8. My inspiring friends Nadiyah, Irka, Sri Rizki, Reni, Ucha, Yuli, Cyifa, Ida and Anita, for their kindness in sharing ideas and time to accompany the writer in finishing this “skripsi”, and all PBI B friends and English Department 2005 friends for their cheerfulness, support and prayers.

9. All the people who have helped in the writing of this paper, whom the writer could not mention one by one. May Allah bless you all.

Jakarta, May 17th 2010

The writer

TABLE OF CONTENTS

COVER PAGE

APPROVAL SHEET

ENDORSEMENT SHEET

STATEMENT SHEET

ABSTRACT

ACKNOWLEDGEMENT

TABLE OF CONTENTS

LIST OF TABLES

CHAPTER I : INTRODUCTION

A. The Background of the Study

B. The Limitation of the Study

C. The Formulation of the Problem

D. The Significance of the Study

E. The Organization of the Paper

CHAPTER II : THEORETICAL FRAMEWORK

A. Evaluation

B. Test

C. Type of Tests

D. The Characteristics of a Good Test

1. Validity

2. Reliability

3. Practicality

E. Item Analysis

1. Difficulty Level

CHAPTER III : RESEARCH METHODOLOGY

A. The Objectives of the Research

B. The Method of Study

C. Time and Place

D. The Respondents

E. The Procedure of the Research

CHAPTER IV : RESEARCH FINDINGS

A. The Data Description

B. The Data Analysis

CHAPTER V : CONCLUSION AND SUGGESTION

A. Conclusion

B. Suggestion

BIBLIOGRAPHY

APPENDICES

LIST OF TABLES

Table 4.1 : The Students’ Group Position

Table 4.2 : The Category of FV of the English Summative Test Items

LIST OF CHARTS

Chart 4.1 : The result of difficulty level of each item

Chart 4.2 : Pie-chart of the difficulty level percentage

LIST OF APPENDICES

Appendix 1 : Tabulation of the students’ correct answer from upper group

Appendix 2 : Tabulation of students’ correct answer from lower group

Appendix 3 : Table of the result of difficulty level of the items

Appendix 4 : The procedure of the research

Appendix 5 : The English Summative Test Paper

OUTLINE

CHAPTER I INTRODUCTION

A. Background of Study

B. Significance of the Study

C. Limitation of Problem

D. Formulation of Problem

E. Research Methodology

F. Organization of Writing

CHAPTER II THEORETICAL FRAMEWORK

A. Evaluation

B. The Definition of Test

C. Testing Role

D. Types of Test

a. Function

1. The placement test

2. The diagnostic test

3. The achievement test

4. The proficiency test

b. Way of scoring

1. Objective test

2. Subjective test

E. The Characteristics of a Good Test

1. Validity

2. Reliability

3. Practicality

F. Item Analysis

1. Level of difficulty

2. Discriminating power

3. Distracter

G. The importance of item analysis

CHAPTER III PROFILE OF SCHOOL

A. History of School

B. Vision and Mission of School

C. Facilities of School

D. Organization Structure of School

E. Teachers, Staff, and Students

CHAPTER IV RESEARCH FINDINGS

A. Population and Sample

B. Time of Research

C. The Data Description

D. The Data Analysis

E. The Data Interpretation

CHAPTER V CONCLUSION AND SUGGESTION

A. Conclusion

B. Suggestion

CHAPTER I

INTRODUCTION

A. Background of Study

Evaluation is an important part of every teaching and learning experience. It makes a large contribution to teaching and provides information about the students’ progress which teachers can use to manage learning tasks and students. As stated by Pauline Rea-Dickins and Kevin Germaine, “Evaluation is important for the teacher because it provides a wealth of information to use for the future direction of classroom practice, for the planning of courses and for the management of learning tasks and students.”1 Evaluation can also be described as the process of making sound decisions about teaching and learning based on information that has been collected, synthesized, and reflected on. Lyle F. Bachman states, “Evaluation can be defined as the systematic gathering of information for the purpose of making decision”.2

Depending on the decision being made and the information a teacher needs in order to inform that decision, testing often contributes to the process as the implementation of evaluation. Indeed, a test is one kind of evaluation instrument for collecting data. “A test is defined as a systematic procedure for observing and describing one or more characteristics of a person with the aid of either a numerical scale or category system”.3 In other words, a test measures a person’s ability or knowledge with a number of tasks or questions. According to Henning, “. . . tests in general is to pinpoint strengths and weakness in the learned abilities of students”.4 Teachers need to administer tests because through them they can find out the students’ achievement in mastering the lessons that have been taught and evaluate the effectiveness of the method and the teaching material used. Rebecca M. Valette states, “…through tests the teacher can evaluate the effectiveness of a new teaching method, of a different approach to a difficult pattern, or of a new materials”.5

1 Pauline Rea-Dickins and Kevin Germaine, Evaluation, (New York: Oxford University Press, 1992), p. 3

2 Lyle F. Bachman, Fundamental Considerations in Language Testing, (Oxford: Oxford University Press, 1990), p. 22

3 Anthony J. Nitko, Educational Test and Measurement, An Introduction, (New York: Harcourt Brace Jovanovich, Inc, 1983), p. 6

To measure the students’ learning progress at school, a teacher commonly administers two kinds of test: the formative test and the summative test. The former is held earlier than the latter, which is held at the end of the semester. Through both tests, a teacher can measure the students’ level of achievement and how far they have accomplished the instructional objectives. In this connection, Gronlund states:

“Formative test is used to monitor learning progress during instruction. Its purpose to provide continuous feedback to both pupil and teacher concerning learning successes and failures ………..And summative test typically comes at the end of a course of instruction. It is designed to determine the extent to which the instructional objectives have been achieved and is used primarily for assigning course grades or for certifying pupil mastery of the intended learning outcomes”.6

To yield accurate measures, a test must be of good quality, because a good test influences not only the students’ learning but also the teachers’ efforts to improve the teaching and learning process. J.B. Heaton maintains that “Test may be constructed primary as device to reinforce learning and to motivate the students’ performance in language”.7 In addition, Lyle F. Bachman states that “Test are often used for pedagogical purposes, either as a means of motivating students to study or as means of reviewing material taught”.8

4 Grant Henning, A Guide to Language Testing, (U.S.A: Newbury House Publishers, 1987), p. 1

5 Rebecca M. Valette, Modern Language Testing, (U.S.A: Harcourt Brace Jovanovich, 1977), p. 5

6 Norman E. Gronlund, Measurement and Evaluation in Teaching, 4th edition, (Macmillan Publishing Company, 1976), p. 18

7 J.B. Heaton, Writing English Language Tests, (New Delhi: Tata McGraw-Hill Publishing Company, 1998), p. 13

Because the accuracy of a test result influences the students’ motivation to learn, the test administered must be a good test. A good test is one which meets the criteria of validity, reliability, and practicality. Besides that, it must have discriminating power and an appropriate difficulty level.9 A test is valid if it measures what it is supposed to measure. It is reliable if its result is the same when it is administered again to students of the same level. And it is practical if it is easy to do and to administer.

A matter often forgotten by teachers is the follow-up to the test, namely the examination of the test items themselves. In practice, teachers do not check whether or not all the items have fulfilled the criteria above. An analysis of the test items, known as “item analysis”, is therefore required. By analyzing test items, a teacher can identify the good and the poor items and differentiate between students who have done well and those who have done poorly. According to J. Stanley Ahmann and Marvin D. Glock, the purpose of item analysis is as follows:

“Re-examining each test item to discover its strengths and flaws is known as item analysis. Item analysis usually concentrates on two vital features; level of difficulty and discriminating power. The former means the percentage of pupils who answer correctly each item; the latter the ability of the test item to differentiate between pupils who have done well and those who have done poorly”.10

In addition, Ngalim Purwanto states, “The specific aim of item analysis is to find out which test items are good and which are not, and why an item is said to be good or not good.”11

8 Lyle F. Bachman, Fundamental Considerations in Language Testing, (Toronto: Oxford University Press, 1990), p. 22

9 J.B. Heaton, Writing English Language Tests, (New Delhi: Tata McGraw-Hill Publishing Company, 1998), pp. 152-156

10 J. Stanley Ahmann and Marvin D. Glock, Evaluating Pupil Growth, Principles of Tests and Measurements, (Boston: Allyn and Bacon, Inc., 1967), p. 184

The latest English summative test at MTs. Darul Ma’arif was held on June 19, 2009. During a pre-survey conducted while on teaching practice at MTs. Darul Ma’arif, the writer was informed that the English teacher had never analyzed the second-semester test items, so it is difficult to say whether the test is a good one or not. In addition, the test results show that the students’ scores are poor.

Considering this fact, the writer is interested in carrying out an item analysis of the English summative test at MTs. Darul Ma’arif Jakarta in the second term of the 2008/2009 academic year.

B. Limitation of the Problem

The writer limits the study to an item analysis of the English summative test administered to the second year students of MTs. Darul Ma’arif Jakarta in the 2008/2009 academic year, focusing on the aspect of difficulty level, or facility value.

C. Formulation of Problem

Based on the background of the study described above, the writer seeks to answer the following question: “Do the English summative test items for the second year students of MTs. Darul Ma’arif Jakarta have a good quality in terms of difficulty level?”

11 Drs. Ngalim Purwanto, Prinsip-prinsip dan Tehnik Evaluasi Pengajaran, (Bandung: Remaja Rosda Karya, 1991), p. 118

D. Significance of the Study

Firstly, it provides feedback to the writer in particular, and to the English teacher, on how to analyze test items in terms of difficulty level.

Secondly, it informs the English teacher about the quality of the test items in terms of difficulty level. Through this research, the English teacher can identify the good items for future use and the students’ achievement in mastering the materials taught, in order to evaluate the teacher’s competence in teaching.

E. Organization of Writing

In discussing the topic, the writer divides this study into five chapters, as follows.

Chapter one is the introduction, comprising the background of the study, the limitation of the problem, the formulation of the problem, the significance of the study and the organization of writing.

Chapter two is the theoretical framework, which discusses evaluation, the test and its types, the criteria of a good test and item analysis.

Chapter three discusses the research methodology, which includes the objective of the research, the method of study, the time and place, the population and sample, the instrument and the procedure of the research.

Chapter four presents the research findings, which consist of the data description and the data analysis.

Chapter five is devoted to the conclusion of what has been discussed and analyzed in the preceding chapters, and to the writer’s suggestions arising from the research.

CHAPTER II

THEORETICAL FRAMEWORK

A. The Definition of Evaluation

Evaluation is important in every process, because through evaluation we can find out the weaknesses which should be corrected and the strengths which should be developed. Likewise, in the teaching and learning process evaluation plays an important role in providing information for making judgments about what is good or desirable, in order to improve the students’ knowledge in learning and the teacher’s competence in teaching. This is in line with Peter W. Airasian’s definition: “Evaluation is the process of judging the quality or value of a performance or a course of action”.1 In the same sense, Lyle F. Bachman states, “Evaluation can be defined as the systematic gathering of information for the purpose of making decision”.2 Evaluation also includes “the making judgments about the value, for some purpose, of ideas, works, solutions, methods, materials, etc”.3 Furthermore, Benjamin S. Bloom et al. state that “Evaluation is a system of quality control in which it may be determined at each step in the teaching-learning process whether the process is effective or not, and if not what changes must be made to its effectiveness before it is too late”.4

Basically, the purpose of evaluation is to judge the worth of a program or procedure, usually in terms of how well it has achieved its objectives, and for this purpose all appropriate techniques of gathering evidence may be used.5 “Evaluation goes beyond the statement of how much to concern itself with the question what value. It seeks to answer the pupil’s and teacher question of what progress am I making?”6 Richard I. Arends states that “An important purpose of testing and evaluation is to provide students with feedback on how they are doing”.7

1 Peter W. Airasian, Classroom Assessment: Concepts and Applications, 5th edition, (New York: McGraw-Hill, 2005), p. 9

2 Lyle F. Bachman, Fundamental Considerations..., p. 22

3 Julian C. Stanley, Measurement in Today’s Schools, (Englewood Cliffs: Prentice-Hall, Inc, 1964), p. 16

4 Benjamin S. Bloom, Handbook on Formative and Summative Evaluation of Student Learning, (London: Longman, 1971), p. 8

Finally, considering all the opinions above, the writer can summarize that evaluation is a systematic process of providing information in order to make judgments and sound decisions: to determine whether the objectives are in line with the curriculum used, and to find out the students’ improvement in the teaching and learning process, the teacher’s competence in teaching, and the classroom climate.

B. The Definition of Test

When people hear the words assessment and evaluation, they often think right away of tests, because a test is one of the instruments of evaluation for collecting data. A test is a formal, systematic, usually paper-and-pencil procedure for gathering information about pupils’ performance.8 Paper-and-pencil tests are thus one important tool for gathering assessment information.

A test is composed of a number of tasks or questions for students to respond to. By analyzing the responses, the teacher can measure the students’ achievement in the teaching and learning process. Lyle F. Bachman states that “A test is a procedure designed to elicit certain behavior from which one can make inferences about certain characteristics of an individual”.9 Meanwhile, Wilmar Tinambunan states that “A test is a set of questions, each of which has a correct answer, that examinees usually answer orally or in writing”.10

5 Victor H. Noll, Introduction to Educational Measurement, 2nd edition, (Boston: Houghton Mifflin Company, 1965), p. 14

6 H.H. Remmers, N.L. Gage, J. Francis Rummel, A Practical Introduction to Measurement and Evaluation, (USA: Harper and Brothers Publishers, 1960), p. 7

7 Richard I. Arends, Learning to Teach, (New York: McGraw-Hill International Edition, 1989), p. 312

8 Peter W. Airasian, Classroom Assessment..., p. 9

9 Lyle F. Bachman, Fundamental Considerations..., p. 20

From these views, it can be concluded that a test is an instrument, technique, or procedure for obtaining students’ responses through tasks or performance, in the form of a set of questions that must be answered, in order to achieve the teaching-learning objectives. In short, a test is a measurement instrument designed to assess a specific sample of an individual’s behavior.

A test is also a way of delivering information which is very useful for many practitioners of education. “A test is a formal systematic procedure for gathering information”.11 Therefore, the test as an educational device is necessary in the teaching process, since testing and teaching cannot be separated. Heaton states that “both testing and teaching are so closely interrelated that it is virtually impossible to work in either field without being constantly concerned with the other”.12 The reason for this interrelation is that the material tested must be based on the material taught, in order to find out the extent of the students’ comprehension.

C. Type of Tests

There are many kinds of tests that can be used to measure students’ achievement in an evaluation process. Tests can be classified on two bases, namely function and way of scoring.

1. Function

According to Andrew Harrison, tests can be categorized by function into four types: placement test, diagnostic test, achievement test, and proficiency test.

10 Wilmar Tinambunan, Evaluation of Students Achievement, (Jakarta: Depdikbud, 1988), p. 310

11 Julian C. Stanley, Measurement in Today’s..., p. 3

12 J.B. Heaton, Writing English..., p. 1

a. The Placement test

A placement test is used to place a student at the appropriate level or section of a language curriculum or school. It is usually given at the beginning of a course. According to Wilmar Tinambunan:

A placement test is designed to determine pupil performance at the beginning of instruction. Thus, it is designed to sort new students into teaching groups, so that they can start a course at approximately the same level as the other students in the class. It is concerned with the student’s present standing, and so relates to general ability rather than specific points of learning. As a rule the result are needed quickly so that the teaching may begin.13

b. The Diagnostic Test

A diagnostic test is designed to diagnose a particular aspect of a language. “Diagnostic tests are also achievement test, but they are characterized by one distinctive feature, namely that they are designed to show specific weakness and strengths within the skills or elements measured”.14

It can also be used to check the students’ progress in learning particular elements of the course. It is used, for example, at the end of a unit in the course book or after a lesson designed to teach one particular point.15 “A diagnostic test is designed to determine the degree to which the specific instructional objectives of the course have been accomplished”.16 J.B. Heaton states that “Diagnostic test is widely used, few tests are constructed solely as diagnostic tests. Note that diagnostic testing is frequently carried out for groups of students rather than for individuals”.17

Thus, a diagnostic test is more comprehensive and detailed, because it searches for the underlying causes of learning difficulties and then formulates a plan for remedial action.

13 Wilmar Tinambunan, Evaluation of Students..., p. 7

14 Robert Lado, Language Testing, (Hong Kong: Wing Tai Cheung Printing Co Ltd, 1961), p. 369

15 Andrew Harrison, A Language Testing Handbook, (London: Macmillan Press, 1983), p. 6

16 James Dean Brown, Testing in Language Programs, (New Jersey: Prentice Hall Regents, 1996), p. 15

17 J.B. Heaton, Writing English..., p. 173

c. Achievement Test

These tests are used to find out what students have actually learnt, or what has actually been taught. “Achievement tests are designed to measure relative accomplishment in specified areas of work”.18 The purpose of an achievement test, as its name reflects, is to establish how successful individual students, groups of students, or the courses themselves have been in achieving objectives.19 From another point of view, Wilmar Tinambunan says that “the degree purpose of achievement test is designed to indicate degree of students’ success in some past learning activities”.20 Likewise, “Achievement tests relate to the past in that they measure, what language the students have learned as a result of teaching”.21

Based on the views above, the writer concludes that achievement tests are intended to measure how effectively students have mastered the lesson and how far they have reached the instructional objectives. Thus, an achievement test must be designed with very specific reference to a particular course. This link with a specific program usually means that achievement tests are based directly on the course objectives and are therefore criterion-referenced. Such tests are typically administered at the end of a course to determine how effectively students have mastered the instructional objectives.

At the implementation level, the achievement test appears in two forms: the formative test and the summative test.

18 H.H. Remmers, N.L. Gage, J. Francis Rummel, A Practical Introduction..., p. 19

19 Arthur Hughes, Testing for Language Teachers, (Cambridge: Cambridge University Press, 2003), p. 13

20 Wilmar Tinambunan, Evaluation of Students..., p. 19

21 Tim McNamara, Language Testing, (Hong Kong: Oxford University Press, 2000), p. 7

1) Formative test

Formative test is administered by the teacher during the

learning progress with the aim of using the result to improve

instruction and to provide continuous feedback to both students and

teacher. Rebecca M. Valette states “The formative test is given

during the course of instruction; its purpose is to show which

aspects of the chapter the student has mastered and where remedial

work is necessary”.22 Hence, formative test is part of the

instructional process. When incorporated into classroom practice, it

provides the information needed to adjust teaching and learning

while they are happening. In this sense, formative test informs both

teachers and students about student understanding at a point when

timely adjustments can be made. These adjustments help to ensure

students achieve, targeted standards-based learning goals within a

set time frame.23

2) Summative test

A summative test is a test that is usually administered at the end of the course. Rebecca M. Valette states, “the summative test, on the other hand, is usually given at the end of a marking period and measures the ‘sum’ total of the material covered. On this type of test, students are usually ranked and graded”. Moreover, a summative test is given periodically to determine at a particular point in time what students know and do not know. A summative test at the district or classroom level is an accountability measure that is generally used as part of the grading process. Arthur Hughes states that “the content of summative test should be based directly on a detailed course syllabus or on the books and other material used”.24

22 Rebecca M. Valette, Modern Language..., p. 6
23 http://www.nmsa.org/Publications/WebExclusive/Assessment/tabid/1120/Default.aspx
24 Arthur Hughes, Testing for Language…, p. 11


Finally, the writer concludes that a summative test is a test that is usually administered at the end of a course of study.

d. Proficiency Test

The proficiency test is also used to measure what students have learned, but the aim of the proficiency test is to determine whether this language ability corresponds to specific language requirements.25 According to J.B. Heaton, “the proficiency test is concerned simply with measuring a student’s control of the language in the light of what he or she will be expected to do with it in the future performance of a particular task”.26 James Dean Brown also states: “A proficiency test assesses the general knowledge or skills commonly required or prerequisite to entry into (or exemption from) a group of similar institutions”.27

Decisions based on a proficiency test should never be undertaken lightly. Instead, these decisions must be based on the best obtainable proficiency test scores as well as other information about the student. The content of a proficiency test, therefore, is not based on the content or objectives of language courses that people taking the test may have followed. Rather, it is based on a specification of what candidates have to be able to do in the language in order to be considered proficient.28

25 Rebecca M. Valette, Modern Language..., p. 6
26 J.B. Heaton, Writing English..., p. 172
27 James Dean Brown, Testing in Language..., p. 10
28 Arthur Hughes, Testing for Language..., p. 9


2. Way of Scoring.

Based on the manner of scoring, test items are divided into two general types: objective and subjective. J.B. Heaton states that “Subjective and objective test are terms used to refer to the scoring of tests”.29

a. Objective test

An objective test item is any test item for which there is only a single correct answer. In this kind of test, the students must select one option from several alternatives. According to Valette, “An objective test item is any item for which there is a single predictable correct answer”.30

Hence, this item type is referred to as an objective test item because it can be scored objectively. That is, equally competent scorers can score the items independently and obtain the same result. Therefore, whether the item is scored by one teacher or another, today or last week, it will yield the same score. The advantage of objective test items is thus objective scoring, which is quick, easy, and consistent.

The objective test items commonly used in classroom testing are true-false, multiple-choice, matching, and short-answer items. “These test items include all of the selection-type items: multiple choice, true false, and matching.”31

1) True-False

A true-false item is simply a declarative statement which the students must judge as true or false. As J. Stanley Ahmann and Marvin D. Glock explain, “a true-false item is referred to as an alternative-response item; the

29 J.B. Heaton, Writing English..., p. 25
30 Rebecca M. Valette, Modern Language..., p. 6
31 Norman E. Gronlund, Constructing Achievement Tests, (New Jersey: Prentice-Hall, Inc., 1968), p. 25


item asks the students to answer with “true” if it conforms to the truth or “false” if it is essentially incorrect”.32

Thus, the item provides the students with a choice of two alternatives, so the students can guess the answer, and sometimes the guess will be right. In other words, students indicate whether a statement is true or false.

Example:

T F True-false items are classified as supply-type items.

2) Multiple-choice item

The multiple-choice item consists of a stem, which presents

a problem situation, and several alternatives, which provide

possible solutions to the problem. The stem may be a question or

an incomplete statement. The alternatives include the correct

answer and several plausible wrong answers, called distracters.

Their function is to distract those students who are uncertain of the

answer. “A multiple-choice item consists of one or more

introductory sentences followed by a list of two or more suggested

responses from which the examinee chooses one as the correct

answer”.33

Example:

In objective testing, the term objective refers to the method of …
a. identifying the learning outcomes
b. selecting the test content
c. presenting the problem
d. scoring the answers

3) Matching

The matching test item consists of two parallel columns, with each word, number, or symbol in one column being matched to a word, sentence, or phrase in the other column. This type

32 J. Stanley Ahmann and Marvin D. Glock, Evaluating Pupil Growth..., p. 17
33 Anthony J. Nitko, Educational Test..., p. 190


of item is employed widely in situations where relationships among more or less similar ideas, facts, and principles are to be examined or judged. In this type, students indicate the relationship between a set of premises and a set of responses.

Example:
1. The …. drives a car          a. doctor
2. The …. checks the patient    b. driver

This kind of test is an effective way to assess students’ recognition of the relationships between words, definitions, events, dates, categories, examples, and so on.

b. Subjective Test item

A subjective test is a test whose scoring requires judgment and evaluation on the part of the scorer. Valette states that a “Subjective item is one that does not have a single right answer”.34 This means that the scoring may be inconsistent and that the answer to the question takes the form of a composition in which the students are given a chance to express their ideas or arguments in their own words. In other words, the answer is commonly in the form of a composition or statement. “Subjective tests, like translation and essay, have the advantage of measuring language skill naturally, almost the way English is used in real life”.35

The subjective tests that are commonly used in the classroom are completion, short-answer, and essay items.

1) Completion

The completion item is a written statement that requires the

examinee to supply the correct word or short phrase in response to

an incomplete sentence, a question or a word association.

34 Rebecca M. Valette, Modern Language..., p. 10
35 Harold S. Madsen, Techniques in Testing, (Oxford: Oxford University Press, 1983), p. 8


Completion tests can be used effectively to measure the recall of terms, dates, and names.36

The completion item and the short-answer item are both supply-type test items, but in the short-answer type the blank is nearly always at the end, whereas in the completion type the blank may occur anywhere in the statement.37

2) Short- answer Item

The short-answer item consists of a question which can be answered with a word or short phrase.38 A student provides a short response to a direct question or direction.

Generally, teachers prefer to use the short-answer question, probably because they think it has some advantages. It is relatively easy to construct, it gives the teacher some opportunity to see how well students can express their thoughts, and it is less difficult to score or mark than the essay question.39 However, it is difficult to phrase a short-answer question so that only one answer is correct. This type of question is therefore most useful in testing knowledge of facts and quite specific information.

3) Essay test.

The most notable characteristic of the essay test is the freedom of response it provides. The student is asked a question which

requires him to produce his own answer. He is relatively free to

decide how to approach the problem, what factual information to

use, how to organize his reply, and what degree of emphasis to

give each aspect of the answer. Thus, the essay question places a

36 Wilmar Tinambunan, Evaluation of Students..., p. 61
37 Victor H. Noll, Introduction to Educational..., p. 140
38 Victor H. Noll, Introduction to Educational..., p. 138
39 Victor H. Noll, Introduction to Educational..., p. 138


premium on the ability to produce, integrate, and express ideas. As Norman E. Gronlund states:

“Essay tests are inefficient for measuring knowledge outcomes . . . but they provide a freedom of response which is needed for measuring certain complex outcomes . . . . These include the ability to create . . . . to organize . . . . to integrate . . . . to express . . . and similar behaviors that call for the production and synthesis of ideas”.40

Finally, from the explanation above of objective and subjective tests, particularly the essay test, the writer concludes that for the measurement of most knowledge outcomes we would use objective test items to take advantage of their more extensive sampling and greater reliability. For the measurement of such complex learning outcomes as the ability to create, organize, and evaluate ideas, however, the teacher would use essay questions despite their limitations.

Of the types of test item above, the writer is concerned only with the multiple-choice items in the English summative test for the second year students of MTs. Darul Ma’arif, administered at the end of the second semester of the 2008/2009 academic year.

D. Criteria Of A Good Test

Many considerations enter into the evaluation of a test. A good test provides useful information for evaluation, so that students’ comprehension of the instructional objectives can be measured. The writer considers these under three main headings: validity, reliability, and practicality. Validity refers to the extent to which a test measures what we actually wish to measure. According to Brown, “Validity is the degree to which the test actually measures what it is intended to measure … Reliability is

40 Norman E. Gronlund, Constructing Achievement..., p. 65


consistent and dependable … And practicality is within the means of financial limitations, time constraints, ease of administration, and scoring and interpretation”.41

1. Validity

The single most important characteristic of a good test is its ability to help the teacher make a correct decision about what it is intended to measure. This characteristic is called validity. “Validity is concerned with whether the information being gathered is relevant to the decision that needs to be made”.42

A test has validity if it measures appropriately what it is supposed to measure. According to Heaton, “The validity of a test is the extent to which it measures what it is supposed to measure and nothing else”.43 Finocchiaro and Sako also state: “A test is valid when it measures effectively what it is intended to measure”.44 In the same sense, Wilmar Tinambunan states that “The validity of a test is the extent to which the test measures what it is intended to measure”.45 Norman E. Gronlund also states that “test scores are valid to the extent to which they serve the use for which they are intended”.46 J. Stanley Ahmann and Marvin D. Glock point out, “In educational measurement, validity is often defined as the degree to which a measuring instrument actually serves the purposes for which it is intended”.47

Based on these definitions, the writer concludes that the validity of a test is important for knowing whether the test is of good quality in measuring someone’s ability.

41 H. Douglas Brown, Teaching by Principles: An Interactive Approach to Language Pedagogy, (San Francisco: Longman, 2nd edition), p. 386-387
42 Peter W. Airasian, Classroom Assessment..., p. 16
43 J.B. Heaton, Writing English..., p. 159
44 Mary Finocchiaro and Sydney Sako, Foreign Language Testing: A Practical Approach, (New York: Regents Publishing Company, 1983), p. 24
45 Wilmar Tinambunan, Evaluation of Student..., p. 11
46 Norman E. Gronlund, Constructing Achievement..., p. 105
47 J. Stanley Ahmann and Marvin D. Glock, Evaluating Pupil Growth..., p. 285


As validity is one of the most important characteristics of test scores, the constructor of the test should know the various aspects of validity and the various procedures by which they are determined.

“The two most important characteristics of test scores are validity and reliability … Anyone working with tests, whether constructing them or using published tests, should understand the meaning of these concepts … and should know the various procedures by which they are determined”.48

According to Heaton, the validity of a test can be seen from the aspects mentioned below.

a. Face validity

A test has face validity if it has a good “face”, that is, if the way the test looks is right. According to Heaton, “if a test item looks right to other testers, teachers, moderators, and testees, it can be described as having at least face validity”.49 Mary Finocchiaro and Sydney Sako define it as “A judgment about a test based on the way the test looks to educators, students, and the general public. The test should not only ‘be right’, it should also ‘look right’”.50

b. Content Validity

A test has content validity if it contains materials that the students have been taught. To fulfill this, the teacher should also refer to the instructional objectives of the teaching-learning process. Finocchiaro and Sako state, “Content validity is assured by checking all items in the test to make certain that they correspond to the instructional objectives of the course”.51 In the same sense, Victor H. Noll explains, “when a teacher gives a test which deals with the

48 Norman E. Gronlund, Constructing Achievement..., p. 105
49 J.B. Heaton, Writing English..., p. 159
50 Mary Finocchiaro and Sydney Sako, Foreign Language..., p. 28
51 Mary Finocchiaro and Sydney Sako, Foreign Language..., p. 25


material and with the objectives of instruction in particular class, his

test is said to have curricular (content) validity”.52

c. Construct Validity

A test is said to have construct validity if it can be demonstrated that it measures just the ability which it is supposed to measure. According to Heaton, “if a test has construct validity, it is capable of measuring certain specific characteristics in accordance with a theory of language behavior and learning”.53

d. Empirical Validity

A fourth type of validity is usually referred to as statistical or

empirical validity. This validity is obtained as a result of comparing

the result of the test with the result of some criterion measure.54

2. Reliability

The second criterion of a good test is reliability. Reliability has to do with the accuracy and precision of a measurement procedure. Indices of reliability give an indication of the extent to which a particular measurement is consistent and reproducible.55 A test should be reliable as a measuring instrument.

According to Finocchiaro and Sako, “the reliability or stability of a language test is concerned with the degree to which it can be trusted to produce the same result upon repeated administration to the same individual, or to give consistent information about the value of a learning variable being measured”.56 J. Stanley Ahmann and Marvin D. Glock state that “Reliability means consistency of results. This is equivalent to saying that a highly reliable instrument can be used

52 Victor H. Noll, Introduction to Educational..., p. 79
53 J.B. Heaton, Writing English..., p. 161
54 J.B. Heaton, Writing English..., p. 161
55 Robert L. Thorndike and Elizabeth Hagen, Measurement and Evaluation in Psychology and Education, (London: John Wiley and Sons, Inc., 1961), p. 127
56 Mary Finocchiaro and Sydney Sako, Foreign Language..., p. 28


repeatedly in an unchanging situation and produce constant or near

constant results.”57

Based on the above statements, a test is reliable if it consistently yields the same or nearly the same ranks over repeated administrations.

3. Practicality

According to Thorndike and Hagen, “Practicality is concerned with a wide range of factors of economy, convenience, and interpretability that determine whether a test is practical for widespread use”.58

A test may be a highly reliable and valid instrument but still be beyond our means or facilities. The teacher or whoever constructs the test should keep in mind a number of very practical considerations. The main factors of practicality are economy, scorability, and administrability. Finocchiaro and Sako state that “the criteria for practicality normally will be based upon such factors as economy, scorability, and administrability”.59 Harrison states that “tests should be as economical as possible in time (preparation, sitting, and marking) and in cost (material and hidden costs of time spent)”.60

In short, the criteria of a good test are validity, reliability, and practicality. However, besides those three criteria, a good test as a whole is also determined by the quality of each item that makes up the test. If the quality of each item is good, it strengthens the accuracy of the scores obtained from the test. The quality of each item can be analyzed individually by doing item analysis. According to Robert Lado, “item analysis is the study of validity, reliability, and difficulty of test items taken

57 J. Stanley Ahmann and Marvin D. Glock, Evaluating Pupil..., p. 311
58 Robert L. Thorndike and Elizabeth Hagen, Measurement and Evaluation..., p. 127
59 Mary Finocchiaro and Sydney Sako, Foreign Language Testing..., p. 30
60 Andrew Harrison, A Language Testing..., p. 13


individually as if they were separate tests”.61 Through this analysis, the evaluator can get information about which items are good for future use.

D. Item Analysis

After a test has been administered and scored it is usually desirable to

evaluate the effectiveness of the items. This is done by studying the students’

responses to each item. When formalized, the procedure is called item

analysis. Anthony J. Nitko states, “item analysis refers to the process of

collecting, summarizing, and using information about pupils’ responses to

items”.62

Meanwhile, Harold S. Madsen explains that:

“The selection of appropriate language items is not enough by itself to ensure a good test. Each question needs to function properly; otherwise, it can weaken the exam. Fortunately, there are some rather simple statistical ways of checking individual items. This procedure is called ‘item analysis’.”63

Item analysis is also a systematic procedure which provides information about the quality of the test items concerning each of the following points:

1. The difficulty of the item

2. The discriminating power of the item

3. The effectiveness of each alternative or distracter.

Thus, item analysis information can tell the evaluator or constructor whether an item was too easy or too hard, how well it discriminated between high and low scorers on the test, and whether all of the alternatives functioned as intended. According to Suharsimi Arikunto, “Item analysis aims, among other things, to help us identify poor test items, to obtain information that can be used to improve the items for future use, and to obtain a quick overview of the state of what we have constructed”.64

61 Robert Lado, Language..., p. 342
62 Anthony J. Nitko, Educational Test..., p. 342
63 Harold S. Madsen, Techniques in..., p. 180

Item analysis data also aid in detecting specific technical flaws and thus provide further information for improving test items. As J. Stanley Ahmann and Marvin D. Glock state, “item analysis is re-examining each test to discover its strength and flaws”.65

Item analysis has several benefits. First, it provides useful information for class discussion of the test. Second, it provides data for helping the students improve their learning. Third, it provides insights and skills which lead to the preparation of better tests on future occasions.66

Finally, the writer concludes that item analysis is very important for obtaining information about the quality of each test item, that is, whether it is a good item or a poor one.

1. Difficulty Level of The Item

The difficulty level of an item is the percentage of pupils who answer the item correctly. “The item difficulty is the fraction of the persons taking an item who answer it correctly”.67 Heaton states that “The index of difficulty (or facility value) of an item simply shows how easy or difficult the particular item proved in the test. The index of difficulty (facility value) is generally expressed as the fraction (percentage) of the students who answered the item correctly”.68

A good test item should have a certain degree of difficulty. It should be neither too easy nor too difficult, because a test that is too easy or too difficult yields a score distribution that makes it hard to identify

64 Suharsimi Arikunto, Dasar-dasar Evaluasi Pendidikan, (Jakarta: Bina Aksara, 1987), p. 205
65 J. Stanley Ahmann and Marvin D. Glock, Evaluating Pupil Growth..., p. 184
66 Norman E. Gronlund, Constructing Achievement..., p. 85-86
67 Anthony J. Nitko, Educational Test..., p. 288
68 J.B. Heaton, Writing English..., p. 178


reliable differences in achievement between the pupils who have done well and those who have done poorly. Suharsimi Arikunto says:

“A good item is one that is neither too easy nor too difficult. An item that is too easy does not stimulate students to increase their effort to solve it. An item that is too difficult will cause students to become discouraged and lose the motivation to try again because it is beyond their reach”.69

By analyzing the students’ responses to the items, the level of difficulty of each item can be known, and this information helps the teacher identify concepts that need to be re-taught. In addition, by analyzing the facility value, the teacher will know whether the item is easy, moderate, or difficult. M. Chobib Thoha states:

“A good item is one whose difficulty level is known to be neither too difficult nor too easy, because the difficulty level correlates with discriminating power. When an item has maximal difficulty, its discriminating power will be low; likewise, when an item is too easy, it will have no discriminating power”.70

To measure the difficulty level of each item, the writer uses Heaton’s formula:71

$$FV = \frac{CU + CL}{2n}$$

Explanation:

FV : Facility value or index of difficulty that we are looking for

CU : Sum of the students from the upper group who answer correctly

CL : Sum of the students from the lower group who answer correctly

2n : Total number of the students from upper and lower group.
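As a minimal sketch of the formula above, the following Python function computes the facility value of one item from the two group tallies. The function name `facility_value`, its parameter names, and the sample numbers are illustrative assumptions; they do not come from Heaton's text.

```python
def facility_value(correct_upper: int, correct_lower: int, group_size: int) -> float:
    """Return FV = (CU + CL) / 2n for a single test item.

    correct_upper -- CU, upper-group students who answered correctly
    correct_lower -- CL, lower-group students who answered correctly
    group_size    -- n, number of students in each of the two groups
    """
    if group_size <= 0:
        raise ValueError("group_size must be positive")
    if not (0 <= correct_upper <= group_size and 0 <= correct_lower <= group_size):
        raise ValueError("correct counts must lie between 0 and group_size")
    return (correct_upper + correct_lower) / (2 * group_size)


# Illustrative values: 9 of 10 upper-group and 6 of 10 lower-group students correct.
fv = facility_value(9, 6, 10)
print(f"FV = {fv:.2f}")  # FV = 0.75
```

The function assumes that the upper and lower groups are the same size, as the term 2n in the formula implies.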

69 Suharsimi Arikunto, Dasar-dasar..., p. 207
70 M. Chobib Thoha, Teknik Evaluasi Pendidikan, (Jakarta: PT. Raja Grafindo Persada, 2003), p. 145
71 J.B. Heaton, Writing English..., p. 178


After calculating the difficulty level of each item, the writer calculates the index of difficulty of all items using this formula:

$$P = \frac{\sum B}{N}$$

P : difficulty level of all items
B : difficulty level of each item
∑ : sigma (total)
N : total number of test items
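The overall index is simply the mean of the item indices. The short Python sketch below illustrates this; the function name `overall_difficulty` and the four sample values are hypothetical and serve only to show the arithmetic.

```python
from typing import Sequence


def overall_difficulty(item_fvs: Sequence[float]) -> float:
    """Return P = (sum of B) / N, the mean facility value of all items."""
    if not item_fvs:
        raise ValueError("at least one item is required")
    return sum(item_fvs) / len(item_fvs)


# Hypothetical facility values for a four-item test.
sample = [0.25, 0.50, 0.75, 1.00]
print(overall_difficulty(sample))  # 0.625
```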

To determine the category of the difficulty level of each item and of all items, the writer uses the criteria given in Suharsimi Arikunto’s book.72 If the FV is:

Difficult : 0.00 – 0.30

Moderate : 0.31 – 0.70

Easy : 0.71 – 1.00
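The three bands above can be expressed as a small lookup function. This is a sketch only: the name `difficulty_category` is an assumption, and so is the choice to round the facility value to two decimals before comparing, since the criteria are stated to two decimals and leave no band for values such as 0.305.

```python
def difficulty_category(fv: float) -> str:
    """Map a facility value to 'difficult', 'moderate', or 'easy'."""
    if not 0.0 <= fv <= 1.0:
        raise ValueError("facility value must lie between 0 and 1")
    fv = round(fv, 2)  # assumption: compare at two-decimal precision
    if fv <= 0.30:
        return "difficult"
    if fv <= 0.70:
        return "moderate"
    return "easy"


for value in (0.15, 0.55, 0.85):
    print(f"{value:.2f} -> {difficulty_category(value)}")
```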

The facility value shows how easy or difficult the test items are for that particular group. Thus, the facility value is influenced by the students’ competence, and the result will be different if the test is given to another group of learners.

E. The Importance of Item Analysis

Item analysis is very important for teachers in preparing better test items, and it helps teachers in the teaching-learning process. “Item analysis is an important and necessary step in the preparation of good multiple-choice tests”.73

72 Suharsimi Arikunto, Dasar-dasar..., p. 210
73 John W. Oller, Language Tests at School: A Pragmatic Approach, (London: Longman, 1979), p. 245


“For teacher-made tests, the following are among the important uses of item analysis: determining whether an item functions as the teacher intends, feedback to the teacher about pupil difficulties, area for curriculum improvement, revising the item, and improving item-writing skills”.74

1. Determining whether an item functions as the teacher intends.

The item functions properly if it is able to distinguish those who have mastered the learning objectives from those who have not. To differentiate between them, the test item should have an appropriate level of difficulty, discriminating power, and effective distracters. Therefore, item analysis should be done.

2. Feedback on students’ performance and a basis for class discussion.

Once the students’ responses to the items are known, the students’ performance can be assessed, their errors can be corrected, and the test items that most of them found difficult can be discussed in class.

3. Feedback to the teacher about pupils’ difficulties.

The result of item analysis is useful for teachers in identifying the major types of difficulty pupils have in learning, so that they know which material needs to be reviewed in subsequent lessons.

4. Area for curriculum improvement.

Item analysis shows which kinds of items students find difficult and which errors occur often. The material concerned may not be suitable for teaching in the school program, so the curriculum may need to be revised.

74 Anthony J. Nitko, Educational Test..., p. 284

CHAPTER III

RESEARCH METHODOLOGY

1. The Objective of The Research

The research was done to find out the difficulty level of the English summative test items for the second year of MTs. Darul Ma’arif Jakarta in the second term of the 2008/2009 academic year, using the calculation given in J.B. Heaton’s book, “Writing English Language Tests”.

2. The Method of Study

The method used in this study can be categorized as descriptive analysis with a quantitative approach. Quantitative analysis of the score data, using simple statistical tabulation, is used to determine whether each test item is good or not.

3. The Time and Place

The research was held during teaching practice from March to June 2009 at MTs. Darul Ma’arif, which is located at Jl. RS. Fatmawati No. 45, Cipete, South Jakarta.

4. The Respondents

The writer took the results of the English summative test of the second grade at MTs. Darul Ma’arif, Cipete, South Jakarta, which consists of 50 English multiple-choice items. The respondents of this research are the 36 second year students of MTs. Darul Ma’arif Jakarta.

5. The Instrument of the Research

The research instrument is the English summative test paper for the second year students of MTs. Darul Ma’arif Jakarta.


CHAPTER IV

RESEARCH FINDINGS

A. The Data Description

The English summative test consists of 50 multiple-choice items. As noted in the procedure of the research, the items are analyzed by arranging the students’ scores from the highest to the lowest. After correcting the students’ answer sheets, the writer listed the scores of the students from the highest to the lowest. The scores make it easier to divide the students into three groups. Each score is obtained by multiplying the number of correct answers by two points, because there are 50 items in the test. The following table shows the scores and the groups.

Table 4.1

Group position in the English summative test for the 36 second year students of MTs. Darul Ma’arif Jakarta in the second term of the 2008/2009 academic year

No.  Name                     Score  Group
1.   Nabil F.Q                70     Upper
2.   Kinanti Restian Putri    68     Upper
3.   Hikmah                   58     Upper
4.   Fuad ismail              58     Upper
5.   M. Bachruri. A.F         54     Upper
6.   Aulia Ulfa               52     Upper
7.   Lis saodah               52     Upper
8.   Syifa fauziah            52     Upper
9.   Khalidah khairurizky     50     Upper
10.  M. Yusuf                 48     Upper
11.  Rahmat Febrianti         48     Middle
12.  M. Chaidir Rafsanjani    48     Middle
13.  Ismaul Husna             48     Middle
14.  Indah Permata sari       46     Middle
15.  Wahyu H                  46     Middle
16.  Nyai ratih K             46     Middle
17.  Ahmad Akhirussa’ban      46     Middle
18.  Rahmawati                44     Middle
19.  Siti Istiannah           44     Middle
20.  Layla                    44     Middle
21.  Erna Tihana              42     Middle
22.  Lisa Umami               42     Middle
23.  Ahmad Jazuli             42     Middle
24.  Nurhasanah               42     Middle
25.  Nu’mansyah               40     Middle
26.  Fauziah                  40     Middle
27.  Qois Abdul Azis          40     Lower
28.  Mustafa Hari Pratama     38     Lower
29.  Uti Safitri              38     Lower
30.  Junita R                 38     Lower
31.  Uswatun Hassanah         34     Lower
32.  Ahmad Sofwat             34     Lower
33.  Imam Buchori             32     Lower
34.  Salman Alfarisi          32     Lower
35.  Chairuddin               32     Lower
36.  Deni Arsita              22     Lower

Table 4.1 lists the students from those who got the highest score to those who got the lowest score. The scores make it easier to divide the students into three groups: upper, middle, and lower. The top 27% of the scores form the UPPER group and the bottom 27% form the LOWER group; for the analysis, the MIDDLE group is set aside.
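The grouping step can be sketched in Python as follows, using the 36 scores listed in Table 4.1. The function name `split_groups` is hypothetical, and rounding 27% of 36 (9.72) to the nearest whole number is an assumption chosen because it gives the 10 students per group reported in the analysis.

```python
# Scores of the 36 students as listed in Table 4.1 (number correct x 2).
scores = [70, 68, 58, 58, 54, 52, 52, 52, 50, 48, 48, 48, 48, 46, 46, 46, 46, 44,
          44, 44, 42, 42, 42, 42, 40, 40, 40, 38, 38, 38, 34, 34, 32, 32, 32, 22]


def split_groups(all_scores, fraction=0.27):
    """Rank scores from highest to lowest and cut off the top and bottom fraction."""
    ranked = sorted(all_scores, reverse=True)
    k = round(fraction * len(ranked))  # assumption: round to the nearest student
    if k < 1 or 2 * k > len(ranked):
        raise ValueError("fraction does not give valid upper and lower groups")
    upper = ranked[:k]
    middle = ranked[k:len(ranked) - k]
    lower = ranked[len(ranked) - k:]
    return upper, middle, lower


upper, middle, lower = split_groups(scores)
print(len(upper), len(middle), len(lower))  # 10 16 10
```

Note that the cut-off for the upper group falls among four students tied at 48, so which of them enters the upper group depends on the order in which tied students are listed.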

B. The Data Analysis

The English summative test consists of 50 multiple-choice items with four options each. The first step in the difficulty analysis is listing the students’ responses to each item of the test. The lists can be seen in the appendices, labeled Table 1 and Table 2, one for each group. The distribution of the correct responses from the upper and lower groups can be summarized as follows:


1. The answers from the Upper Group (10 students)

No student answered all items correctly. One student answered 35 items correctly; 1 student, 34 items; 2 students, 29 items; 1 student, 27 items; 3 students, 26 items; 1 student, 25 items; and 1 student, 24 items. The responses are as follows:

a. 10 students answer correctly numbers; 5, 6, 7, 8, 11, 17, 19, 20, 32, 37

b. 9 students answer correctly numbers; 43

c. 8 students answer correctly numbers; 18, 25, 35, 36, 46

d. 7 students answer correctly numbers; 3, 22, 33

e. 6 students answer correctly numbers; 2, 10, 15, 27, 31

f. 5 students answer correctly numbers; 1, 4, 21, 34, 45

g. 4 students answer correctly numbers; 23, 26, 38, 40, 47

h. 3 students answer correctly numbers; 14, 16, 24, 30, 39, 44, 48

i. 2 students answer correctly numbers; 9, 12, 13, 28, 41, 49

j. 1 student answers correctly numbers; 29, 42, 50

2. The answers from the Lower Group (10 students)

In the lower group, one student answered 20 items correctly; 3 students, 19 items; 2 students, 17 items; 3 students, 16 items; and 1 student, 11 items. The responses are as follows:

a. 10 students correctly answered number 17

b. 9 students correctly answered number 8

c. 8 students correctly answered numbers 5, 44

d. 7 students correctly answered numbers 11, 20, 25, 32

e. 6 students correctly answered numbers 19, 43, 46

f. 5 students correctly answered numbers 18, 22, 26, 31, 35

g. 4 students correctly answered numbers 4, 36, 37

h. 3 students correctly answered numbers 6, 7, 10, 15, 16, 27, 30, 34, 41, 50

i. 2 students correctly answered numbers 21, 42, 45

j. 1 student correctly answered numbers 1, 2, 3, 9, 12, 13, 14, 23, 24, 28, 29, 33, 40, 47, 49

k. No student correctly answered numbers 38, 39, 48

Then, as noted earlier, the data for the upper and lower groups are entered into Heaton’s formula to obtain the difficulty level (FV) of each item.

$$FV = \frac{CU + CL}{2n}$$

Explanation:

FV : Facility value, or index of difficulty, that we are looking for

CU : Number of students from the upper group who answer correctly

CL : Number of students from the lower group who answer correctly

2n : Total number of students in the upper and lower groups
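As a minimal sketch of how this formula applies to the data, the snippet below computes FV for three items whose correct-answer counts appear in the upper and lower group lists above. The function name `facility_value` is an illustrative choice, not part of the original study.

```python
def facility_value(correct_upper, correct_lower, group_size):
    """FV = (CU + CL) / 2n, where n is the number of students in each group."""
    return (correct_upper + correct_lower) / (2 * group_size)


# (CU, CL) counts for items 17, 44 and 38, read from the group lists above.
counts = {17: (10, 10), 44: (3, 8), 38: (4, 0)}

for item, (cu, cl) in counts.items():
    print(item, round(facility_value(cu, cl, group_size=10), 2))
# 17 1.0
# 44 0.55
# 38 0.2
```

Because each group has 10 students, every FV in this study is a multiple of 0.05.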

Afterwards, the result of the calculation is interpreted using Arikunto’s criteria for the FV:

Difficult : 0.00 – 0.30

Moderate : 0.31 – 0.70

Easy : 0.71 – 1.00
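The criteria above can be expressed as a small lookup. This is a sketch under one assumption: facility values are rounded to two decimals before comparison, so that no value falls between the printed ranges. The function name `classify` is hypothetical.

```python
def classify(fv):
    """Map a facility value to a category using the ranges listed above."""
    fv = round(fv, 2)  # assumed: FVs are reported to two decimals
    if not 0.0 <= fv <= 1.0:
        raise ValueError("FV must lie between 0.00 and 1.00")
    if fv <= 0.30:
        return "difficult"
    if fv <= 0.70:
        return "moderate"
    return "easy"


print(classify(0.30))  # difficult
print(classify(0.55))  # moderate
print(classify(0.75))  # easy
```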

The writer’s calculation of the index of difficulty for each item can be seen in the appendices, labeled table 3. The writer then summarizes it in chart form:


Chart 4.1 The result of difficulty level of each item

[Bar chart: number of items in each category (difficult, moderate, easy); vertical axis from 0 to 25]

Chart 4.1 above shows how the items of the English summative test are distributed across the difficulty level criteria. The detailed distribution is as follows:

a. There are 20 items categorized as difficult, because they are in the range 0.00 to 0.30. Those are numbers 1, 9, 12, 13, 14, 16, 23, 24, 28, 29, 30, 38, 39, 40, 41, 42, 47, 48, 49, 50.

b. There are 21 items categorized as moderate, because they are in the range 0.31 to 0.70. Those are numbers 2, 3, 4, 6, 7, 10, 15, 18, 21, 22, 26, 27, 31, 33, 34, 35, 36, 37, 44, 45, 46.

c. There are 9 items categorized as easy, because they are in the range 0.71 to 1.00. Those are numbers 5, 8, 11, 17, 19, 20, 25, 32, 43.

From the difficulty level (FV) of each item, the writer calculates the percentage of items in each category, presented in table form as follows:

Table 4.2

The category of difficulty level of the English summative test items.

No.  Range of Difficulty Level  Category   Frequency  Percentage
1.   0.00 – 0.30                Difficult  20         40%
2.   0.31 – 0.70                Moderate   21         42%
3.   0.71 – 1.00                Easy       9          18%
     TOTAL                                 50         100%
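The frequencies and percentages in Table 4.2 follow directly from the item lists given earlier. The sketch below recomputes them; the variable names are illustrative, and the item numbers are copied from those lists.

```python
# Item numbers per category, copied from the detailed distribution above.
categories = {
    "difficult": [1, 9, 12, 13, 14, 16, 23, 24, 28, 29,
                  30, 38, 39, 40, 41, 42, 47, 48, 49, 50],
    "moderate": [2, 3, 4, 6, 7, 10, 15, 18, 21, 22, 26,
                 27, 31, 33, 34, 35, 36, 37, 44, 45, 46],
    "easy": [5, 8, 11, 17, 19, 20, 25, 32, 43],
}

total = sum(len(items) for items in categories.values())

for name, items in categories.items():
    share = 100 * len(items) / total
    print(f"{name}: {len(items)} items, {share:.0f}%")
# difficult: 20 items, 40%
# moderate: 21 items, 42%
# easy: 9 items, 18%
```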


Based on the table of facility values for the English summative test at MTs Darul Ma’arif Jakarta, the items are not evenly balanced across the categories: the easy items take the smallest share. Ideally, the spread of the items should be balanced, meaning that forty percent of the items are in the moderate category, thirty percent are in the easy category, and thirty percent are in the difficult category. Sudjana states that “the number of questions for the three categories … means that the easy, moderate, and difficult questions are balanced in number … The ratio of easy, moderate, and difficult questions can be set at 3-4-3 … another ratio of the same kind … is, for example, 3-5-2.”1
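To make the imbalance concrete, the observed percentages from Table 4.2 can be set against the 30-40-30 split described above. This is an illustrative comparison only; the dictionary names are hypothetical.

```python
observed = {"easy": 18, "moderate": 42, "difficult": 40}  # from Table 4.2
ideal = {"easy": 30, "moderate": 40, "difficult": 30}     # 30-40-30 split

gaps = {name: observed[name] - ideal[name] for name in observed}

for name, gap in gaps.items():
    print(f"{name}: {gap:+d} percentage points")
# easy: -12 percentage points
# moderate: +2 percentage points
# difficult: +10 percentage points
```

The moderate share is close to the ideal, while the test has too few easy items and too many difficult ones.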

Afterwards, the writer summarizes the percentage distribution for each category in chart form.

Chart 4.2 Pie chart of the difficulty level percentage (English summative test items of MTs Darul Ma’arif)

[Pie chart: shares of difficult, moderate, and easy items]

The last step is to calculate the difficulty level of all items using this formula:

$$P = \frac{\sum b}{N}$$

P : difficulty level of all items

b : difficulty level of each item

∑ : Sigma (total)

N : Total number of test items
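The formula is simply the mean of the per-item difficulty levels. The sketch below applies it to items 1 to 5 only, using values computed from the upper and lower group counts listed earlier in this chapter. It is not the full 50-item calculation, which relies on table 3 in the appendices, and the function name `overall_difficulty` is hypothetical.

```python
from statistics import fmean


def overall_difficulty(item_values):
    """P = (sum of b) / N: the mean of the per-item difficulty levels."""
    return fmean(item_values)


# Difficulty levels of items 1 to 5, derived from the group counts above.
first_five = [0.30, 0.35, 0.40, 0.45, 0.90]
print(round(overall_difficulty(first_five), 2))  # 0.48
```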

1 Dr. Nana Sudjana, Penilaian Hasil Proses Belajar Mengajar (Bandung: PT. Remaja Rosdakarya, 2005), pp. 135-136.


After calculating the difficulty level of all items with that formula, the writer obtained a result of 0.451. The detailed results for the difficulty level of each item and of all items can be seen in the appendices, labeled table 3, where each result is given as a decimal. As noted earlier, the writer interprets the difficulty level (FV) of all items according to Arikunto’s criteria.

Therefore, the writer concludes that the difficulty level of the English summative test administered at MTs Darul Ma’arif is MODERATE, judged by the FV (difficulty level/facility value) of all items: the overall FV of 0.45 lies in the range 0.31 to 0.70.

CHAPTER V

CONCLUSION AND SUGGESTION

A. Conclusion

Based on the data analysis and interpretation in the previous chapter, the writer concludes that the difficulty levels of the English summative test items for the second year students of MTs Darul Ma’arif are as follows:

1. 21 items (42%) are regarded as good test items because they are at the moderate level, ranging from 0.31 to 0.70.

2. 20 items (40%) are regarded as difficult items because they are at the difficult level, ranging from 0.00 to 0.30.

3. The other 9 items (18%) are regarded as easy items because they are at the easy level, ranging from 0.71 to 1.00.

Overall, this analysis shows that the English summative test for the second year students of MTs Darul Ma’arif Jakarta has a moderate level of difficulty. This means that the test qualifies as a good enough test in terms of the difficulty level of all its items.

B. Suggestion

Based on the conclusion above, the writer would like to give some suggestions concerning the item analysis result:

1. For further research, an analysis of the discriminating power and content validity of the English summative test items is necessary in order to find the poor items to be revised.


BIBLIOGRAPHY

Ahmann, J. Stanley, and Glock, Marvin D. 1967. Evaluating Pupil Growth, Principle of Test and Measurements. Allyn and Bacon Inc. Boston.

Airasian, Peter W. 2008. Classroom Assessment; Concept and Application 6th Edition. McGraw-Hill. New York.

Airasian, Peter W. 2008. Classroom Assessment; Concept and Application 5th Edition. McGraw-Hill. New York.

Arends, Richard I. 1989. Learning To Teach. McGraw-Hill. New York.

Arikunto, Suharsimi, Dr. 1997. Dasar-dasar Evaluasi Pendidikan. Bumi Aksara. Jakarta.

Bachman, Lyle F. 1990. Fundamental Considerations in Language Testing. Oxford University Press. Toronto.

Bloom, Benjamin S. 1971. Handbook on Formative and Summative of Students’ Learning. Longman. London.

Brown, Dean, James. 1996. Testing in Language Programs. Prentice Hall Regents. Upper Saddle River.

Brown, H. Douglas. Teaching by Principles: An Interactive Approach to Language Pedagogy. Longman. London.

Finocchiaro, Mary, and Sako, Sydney. 1983. Foreign Language Testing, A Pragmatical Approach. Regents Publishing Company. New York.

Gronlund, Norman E. 1968. Constructing Achievement Tests. Prentice Hall. Englewood Cliffs.

Harrison, Andrew. 1983. A Language Testing Handbook. Macmillan Press. London.

Heaton, J.B. 1977. Writing English Language Tests. Longman. London.

Henning, Grant. 1987. A Guide to Language Testing. Newbury House Publishers. Cambridge.

Hughes, Arthur. 1991. Testing for Language Teachers. Cambridge University Press. Cambridge.

Knapp, Thomas R. 1970. Statistic for Educational Measurement. Intext Educational Publishers. New York.

Lado, Robert. 1961. Language Testing. Wing Tai Cheung Printing Co Ltd. Hong Kong.

Madsen, Harold S. 1983. Techniques in Testing. Oxford University Press. Oxford.

McNamara, Tim. 2000. Language Testing. Oxford University Press. New York.

Nitko, Anthony J. 1983. Educational Test and Measurement, an Introduction. Harcourt Brace Jovanovich, Inc. New York.

Noll, Victor H. 1965. Educational Measurement 2nd Edition. Houghton Mifflin Company. Boston.

Oller, John W. 1979. Language Tests at School. A Pragmatic Approach. Longman. London.

Purwanto, Ngalim. 1991. Prinsip dan Teknik Evaluasi Pengajaran. Remaja Rosda Karya. Bandung.

Rea-Dickins, Pauline and Kevin Germaine. 1992. Evaluation. Oxford University Press. New York.

Rummel, J. Francis, H.H. Remmers, N.L. Gage. 1960. A Practical Introduction to Measurement and Evaluation. Harper and Brother Publishers. New York.

Stanley, Julian C. 1964. Measurement In Today’s School. Prentice Hall Inc. Englewood Cliffs.

Sudjana, Nana, Drs. 2001. Penilaian Hasil Proses Belajar Mengajar. Remaja Rosda Karya. Bandung.

Thoha, M. Chobib. 2003. Teknik Evaluasi Pendidikan. PT. Raja Grafindo Persada. Jakarta.

Thorndike, Robert L. & Elizabeth Hagen. 1961. Measurement and Evaluation in Psychology and Education 2nd Edition. John Wiley & Sons, Inc. London.

Valette, Rebecca M. 1997. Modern Language Testing. Harcourt Brace Jovanovich Publishers. New York.

http://www.nmsa.org/Publications/WebExclusive/Assessment/tabid/1120/Default.aspx
