Development of Mathematics Diagnostic Tests


Development of Mathematics Diagnostic Test for DORSHS Second Year High School Students Using Item Response Theory

    Jeremias C. Ceniza

    Donnell C. Cereno

    Abstract

The study developed a diagnostic test designed to measure learning in Intermediate Algebra among second year high school students of Davao Oriental Regional Science High School (DORSHS). The test was evaluated for validity, discrimination indices, difficulty indices and reliability. The construction of the test adapted the stages in Quantitative Research Methods in Educational Planning. Content validity was established by three experts in the content area of secondary mathematics. Primary data for item analysis were extracted through two test tryouts: one on 59 third year students for grammatical checking and compatibility benchmarking, and another on 78 second year students for discrimination indices, difficulty indices and reliability. Analyses were carried out with the IRT modeling software ConQuest, a spreadsheet application and the statistical software SPSS. The study used the IRT two-parameter logistic model. The test was found to be valid and highly reliable. Experts guaranteed that the instrument can gauge all content in the learning competencies of Mathematics II and that nothing was left untargeted. The reliability coefficient of the test is 0.84. The difficulty of the items ranged from -1.37 to 1.35, and the discrimination indices were within 0.23 to 0.51. This standardized test is a tool that identifies the zone of proximal development and the mastered and least learned content areas in Mathematics II among DORSHS second year students.

  • 7/27/2019 Development of Mathematics Diagnostic Tests

    2/24

The Davao Oriental Regional Science High School (DORSHS) of Region XI offers additional subjects in Math, Science and English to its students as early as first year. For the past five years of its existence, the DORSHS National Achievement Test (NAT) results in Mathematics have fallen far below the 75% Mean Percentage Score (MPS) target. The researcher became interested in creating a valid and reliable diagnostic test for DORSHS second year students since the school does not yet have one. This instrument would be used to detect the strengths and weaknesses of second year students before they undergo a review program for the NAT. In general, this instrument intends to help DORSHS improve its Mathematics results during the NAT review.

The study aimed to construct a standardized diagnostic test in Mathematics for second year students in Davao Oriental Regional Science High School (DORSHS). Specifically, it sought to answer the following questions:

1. Is the diagnostic test valid?
2. To what extent do the test items illustrate difficulty?
3. To what degree does the test exhibit discrimination level?
4. How reliable is the entire test according to the IRT model?


    Conceptual Framework

Figure 1. The Conceptual Paradigm of the Study

    Item Response Theory (IRT)

After the popular (or classical) measurement models for constructing tests and interpreting test scores had served their purpose well for quite a long time, a new test theory was developed over the past forty years that is conceptually more powerful than classical test theory. Based upon items rather than test scores, the new approach became known as item response theory (Baker, 2001).

[Figure 1 depicts the flow of the study: diagnostic test (content areas, learning areas in Mathematics II, test objectives) → test construction → test development (content validity; IRT analysis: item difficulty and discrimination indices; test reliability via KR-20) → diagnostic test results (students' zone of proximal development; least learned and mastered competencies; settings of item thinking level).]


In contrast to the limitations of classical test models, item response theory has many desirable features. These include (a) item characteristics that are not group-dependent, (b) scores describing examinee proficiency that are not test-dependent, (c) a model that is expressed at the item level rather than at the test level, (d) a model that does not require strictly parallel tests for assessing reliability, and (e) a model that provides a measure of precision for each ability score (Hambleton, Swaminathan & Rogers, 1991). With this advancement in educational and psychological measurement, it is now safe to administer test tryouts to different groups of various abilities before giving the test to the target examinees. Kim, Cohen and Park (1995) illustrated further that IRT allows researchers to conduct rigorous tests of measurement equivalence across experimental groups. IRT methods can distinguish item bias from true differences on the attribute measured, which classical test theory (CTT) cannot.

The point biserial, or point-biserial correlation coefficient (r_pb), is one common metric used to assess item quality. The "pt bis," as it is sometimes called, is the correlation between an item score (1/0) and the total score on a test. Positive values are desirable and indicate that the item is good at differentiating between high-ability and low-ability examinees (Bontempo, 2009).
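To make the metric concrete, here is a minimal sketch (in Python with NumPy, not part of the study's toolchain) of how a point biserial can be computed from scored responses; the arrays and values are illustrative assumptions:

```python
import numpy as np

def point_biserial(item_scores: np.ndarray, total_scores: np.ndarray) -> float:
    """Correlation between a dichotomous item score (1/0) and the total test score."""
    p = item_scores.mean()                      # proportion of examinees answering correctly
    q = 1 - p
    mean_correct = total_scores[item_scores == 1].mean()
    mean_wrong = total_scores[item_scores == 0].mean()
    sd_total = total_scores.std()               # population standard deviation of totals
    return (mean_correct - mean_wrong) / sd_total * np.sqrt(p * q)

# Hypothetical data: 6 examinees, one item scored right/wrong, and their totals
item = np.array([1, 1, 0, 1, 0, 0])
total = np.array([52, 47, 30, 44, 25, 28])
print(round(point_biserial(item, total), 2))    # positive: high scorers tend to get it right
```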

The Zone of Proximal Development, or ZPD, as established by the Russian psychologist Lev Vygotsky, refers to the distance between what a child can do with assistance and what the child can accomplish without assistance (Vygotsky, 1978). In other words, it is the learning of the students on a certain competency at a 50% mastery level. From the IRT perspective, test items that the child can answer correctly with a 50% chance are the items within the child's zone of proximal development. The child, at this point, has a 50-50 chance of achieving the learning task without the teacher's assistance.
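Read through the IRT lens, a student's ZPD can be located by listing the items whose predicted success probability sits near 0.5 for that student's ability. A small illustrative sketch (the parameter values are assumptions, not the study's estimates):

```python
import math

def p_correct(theta: float, a: float, b: float) -> float:
    """Two-parameter logistic probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def items_in_zpd(theta: float, items: dict, lo: float = 0.4, hi: float = 0.6) -> list:
    """Items whose success probability for this student is near 50%."""
    return [name for name, (a, b) in items.items() if lo <= p_correct(theta, a, b) <= hi]

# Hypothetical (discrimination, difficulty) pairs for three items
items = {"item_5": (0.46, -0.12), "item_25": (0.33, 1.35), "item_32": (0.37, -1.37)}
print(items_in_zpd(theta=0.0, items=items))     # -> ['item_5']
```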

    Assessment

The main purpose of assessment is to improve the learning outcomes of students. In assessing students' learning, a systematic process is followed: the diagnostic test, an assessment before a learning program; the formative test, an assessment during a learning program; and the summative test, an assessment after a learning program (Davies, Arbuckle & Bonneau, 2005).

    Conducting a Diagnostic Test

The main purpose of a diagnostic test is to pinpoint specific strengths and weaknesses of the learner in accordance with his or her grade level requirements. Diagnostic tests are scored using true test score criteria; this means that they are not averaged or normed (Educational Diagnostic Prescriptive Services, 2009). Izard (2005) says that scores in a criterion-referenced test are interpreted as the individual performance of each student in the group, on what he can or cannot do, rather than by comparing the results with other groups of students.

    Stages in Test Construction

Izard (2005), in his module on Quantitative Research Methods in Educational Planning, laid out an overview of the test construction steps. His module describes the different stages of developing a test that could be used by teachers in the classroom and even nationwide as an achievement test. The methods revealed important details on how a test should be constructed so that it yields results that are valid, fair and reliable.

    Validity, Reliability and Usability

Whatever the test is, it should possess the qualities of a good measuring instrument: validity, reliability and usability (Calmorin, 2004).

Validity is the extent to which a test measures what it claims to measure. One type of validity is content validity. It refers to the extent to which the test reflects the content represented in curriculum statements and the skills implied by that content.

Reliability is another important characteristic of a good test. It refers to the consistency of a measure. A test is considered reliable if we get the same result repeatedly; that is, each time the test is administered to a subject, the results should be approximately the same (Cherry, 2009). A reliability coefficient within the range of 0.81 to 1.00 indicates high reliability; 0.61 to 0.80 signifies moderate reliability; 0.41 to 0.60, fair reliability; 0.10 to 0.40, slight reliability; and less than 0.10, virtually no reliability (Shrout, 1998).
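Shrout's bands reduce to a simple lookup; a sketch of that interpretation rule (the thresholds are copied from the passage above):

```python
def reliability_label(r: float) -> str:
    """Verbal interpretation of a reliability coefficient after Shrout (1998)."""
    if r >= 0.81:
        return "high reliability"
    if r >= 0.61:
        return "moderate reliability"
    if r >= 0.41:
        return "fair reliability"
    if r >= 0.10:
        return "slight reliability"
    return "virtually no reliability"

print(reliability_label(0.84))  # the study's KR-20 of 0.84 -> "high reliability"
```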


    Methods

    Research Locale and Duration

Table 1. Timescale and Resource Requirements for the Test Development

Stage                                                    Time      Resources
I. Planning the Test
  a. Developing test item objectives                     1 week    Basic Education Curriculum (BEC) list of competencies
  b. Constructing test grid or table of specifications   1 week    DepEd memo on the official number of school days
II. Preparing the Test
  a. Content analysis                                    1 week    Learning competencies, textbooks
  b. Item writing and preparation of answer key          1 month   Guidelines on test construction, Mathematics textbooks
III. Review and Testing Process
  a. Item review I                                       2 weeks   Test construction team, research adviser
  b. Planning of item scoring                            2 days    Answer key, spreadsheet software
  c. Production of trial tests I                         1 day     Word processing
  d. Trial testing I                                     1 day     DORSHS 3rd year students
  e. Item review II                                      3 days    Researcher
  f. Production of trial tests II                        1 day     Word processing
  g. Trial testing II                                    1 day     DORSHS 2nd year students
  h. Item analysis                                       2 weeks   IRT software, encoder
IV. Assembly of Final Test
  a. Revising test items                                 5 days    IRT concepts and interpretations
  b. Finalization of the test                            2 days    Data findings and results
  c. IRT model fitting                                   1 day     IRT ConQuest modeling software
  d. Identification of the zone of proximal development  1 day     IRT ConQuest modeling software, learning competencies
  e. Test difficulty settings                            1 day     IRT results, test grid

Table 1 shows the timescale and resource requirements for test development, patterned after Module 6 of Quantitative Research Methods in Educational Planning (Izard, 2005). The study was conducted on the Davao Oriental Regional Science High School (DORSHS) campus in Mati City during the month of February of School Year 2010-2011.


    Statistical Treatment

This study used the IRT model in describing the data. Item analysis and test reliability (as explained by Brannick, 2006; Baker, 2001; Hambleton, Swaminathan & Rogers, 1991) were computed through modeling software and interpreted as discussed below.

Item Analysis. Item analysis under the IRT two-parameter model was the last part of the item review; it verified how each item performed in the final test tryout. The researcher, together with an IRT expert, did the following:

i. Processing test responses through the IRT model

In determining the difficulty and discrimination indices of the test items, software known as ConQuest: Generalised Item Response Modelling Software was used to construct the Item Characteristic Curve (ICC) of each test item. Here, the responses of all test takers for each item were recorded in a spreadsheet, converted into text format through SPSS, and then run through the ConQuest software. The software then generated the summary of statistical results for each item and the corresponding ICC, as shown in Figure 2.

    Figure 2. Sample Item Characteristic Curve (ICC)


The ICC, as shown in Figure 2, was the basis of the item estimations for the two parameters: item difficulty and item discrimination. The horizontal axis (θ) is the latent ability of the examinees, while the vertical axis P(θ) is the probability that an examinee chooses the correct answer. The ability θ at which P(θ) equals 0.5 is the value of the difficulty parameter, and the value describing the steepness of the curve is the value of the item discrimination parameter.
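The curve in Figure 2 follows the standard 2PL form P(θ) = 1 / (1 + e^(−a(θ − b))). A brief sketch of how the two parameters shape the ICC (the a and b values below are borrowed from one item in Table 3 purely for illustration):

```python
import numpy as np

def icc(theta: np.ndarray, a: float, b: float) -> np.ndarray:
    """2PL item characteristic curve: P(theta) = 1 / (1 + exp(-a * (theta - b)))."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

theta = np.linspace(-4, 4, 9)       # ability scale
a, b = 0.46, -0.12                  # illustrative discrimination and difficulty
probs = icc(theta, a, b)

# The difficulty b is the ability at which P(theta) crosses 0.5
closest = theta[np.argmin(np.abs(probs - 0.5))]
print(probs.round(2), "P = 0.5 nearest theta:", closest)
```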

    ii. Difficulty and Item Discrimination

With the item difficulty and item discrimination parameter values given by the ICC, each item of the test was interpreted as follows:

Labels for item discrimination parameter (a) values

Verbal label   Range of values
Negative       less than zero
Zero           0
Low            0.01 to 0.20
Moderate       0.21 to 0.60
High           above 0.60

The discrimination parameters are sometimes called slope parameters. A jumpy curve means that the expected test score responds to true ability unevenly. A flat curve means that the expected score is not very sensitive to differences in true ability. A steeper S-curve (like the one in Figure 2) means that the expected score is more sensitive to differences in ability. In other words, the test discriminates, or distinguishes, better between persons of different ability, which explains the term discrimination parameter. Thus, the discrimination parameter describes how well an item can differentiate between examinees having abilities below the item location and those having abilities above the item location.

Labels for item difficulty parameter (b) values

Verbal label     Range of values
Very easy        less than -2.00
Easy             -2.00 to -0.50
Average          -0.49 to 0.49
Difficult        0.50 to 2.00
Very difficult   greater than 2.00

The item difficulty parameter (b) value tells us how easy or how difficult an item is. Under item response theory, an item's difficulty is a point on the ability scale where the probability of a correct response is 0.5. One can find the value of b on the common ability axis at the point for which the predicted probability equals 0.5.
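Both label tables reduce to range checks. A sketch (the band edges are copied from the two tables above; the Easy band is read as -2.00 to -0.50):

```python
def discrimination_label(a: float) -> str:
    """Verbal label for a discrimination parameter value."""
    if a < 0:
        return "Negative"
    if a == 0:
        return "Zero"
    if a <= 0.20:
        return "Low"
    if a <= 0.60:
        return "Moderate"
    return "High"

def difficulty_label(b: float) -> str:
    """Verbal label for a difficulty parameter value."""
    if b < -2.00:
        return "Very easy"
    if b <= -0.50:
        return "Easy"
    if b < 0.50:
        return "Average"
    if b <= 2.00:
        return "Difficult"
    return "Very difficult"

print(difficulty_label(-1.08), discrimination_label(0.34))  # item 2 -> Easy, Moderate
```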

Test reliability. In classical test theory, assessing reliability requires another (parallel) test. In IRT, there is local reliability, that is, an amount of information at each point of the underlying continuum. Under the IRT model, each item of the test contributes information. For the set of parameters associated with each term in a model, ConQuest computed a separation reliability index, an index of the equality of the parameters. In the case of dichotomous data like the test conducted, the Coefficient Alpha given by ConQuest is equal to KR-20 (Wu, Adams, et al., 2007).
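For a 0/1 response matrix, the KR-20 computation is short; a minimal sketch (rows are examinees, columns are items; the data here are random placeholders, so the coefficient will come out near zero rather than the study's 0.84):

```python
import numpy as np

def kr20(responses: np.ndarray) -> float:
    """KR-20: (k / (k - 1)) * (1 - sum(p * q) / variance of total scores)."""
    k = responses.shape[1]                    # number of items
    p = responses.mean(axis=0)                # proportion correct per item
    q = 1 - p
    total_var = responses.sum(axis=1).var()   # population variance of total scores
    return (k / (k - 1)) * (1 - (p * q).sum() / total_var)

rng = np.random.default_rng(0)
data = (rng.random((78, 60)) < 0.6).astype(int)   # 78 examinees x 60 items, random
print(round(kr20(data), 2))
```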

Revising Test Items. The selection of the items suited for inclusion in the final version of the test was determined through the verbal interpretation of each item, as illustrated by Table 2 below.


Table 2. Decision Table for Difficulty and Discrimination Indices

Difficulty Level   Discrimination Level   Decision
Easy               Low                    Revise
Easy               Moderate               Retain
Easy               High                   Retain
Average            Low                    Revise
Average            Moderate               Retain
Average            High                   Retain
Difficult          Low                    Revise
Difficult          Moderate               Retain
Difficult          High                   Retain

An item was rejected when either of the following was observed: (a) its discrimination parameter value was negative or zero, or (b) its difficulty parameter value was very easy or very difficult.
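Taken together, Table 2 and the rejection criteria amount to the following decision logic (a sketch over the verbal labels defined earlier):

```python
def item_action(difficulty_level: str, discrimination_level: str) -> str:
    """Decision rule combining Table 2 with the rejection criteria."""
    if discrimination_level in ("Negative", "Zero"):
        return "Reject"
    if difficulty_level in ("Very Easy", "Very Difficult"):
        return "Reject"
    if discrimination_level == "Low":
        return "Revise"
    return "Retain"   # Moderate or High discrimination on Easy/Average/Difficult items

print(item_action("Easy", "Negative"))      # item 1  -> Reject
print(item_action("Difficult", "Low"))      # item 4  -> Revise
print(item_action("Average", "Moderate"))   # item 5  -> Retain
```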

To preserve the validity of the entire test and to maintain the total number of items, all rejected items were automatically replaced. The revised and replacement items were still subjected to IRT item analysis in accordance with the test grid.

Revised and replacement items were re-administered to the target examinees, the second year students, and the test responses were re-run through the IRT software for final verification of whether these items would be retained. Retesting of selected items was done as necessary in preparation for the finalization of the test.

    Evaluators of the Constructed Test

A group of experts and experienced persons in the field of test construction was in charge of the development of the test. This group served as consultants on the content validity of the test. Another group, composed of students, took the test for item review and test reliability purposes. These two teams were called the test evaluators.

Test Construction Team. The first set of evaluators was the group of persons who took charge of examining the content validity of the test. This group was made up of experts in test development, or individuals with experience or training related to testing. More importantly, these persons were familiar with the content areas in secondary mathematics and were at least graduates of Master of Arts in Mathematics or Master of Science Teaching in Mathematics.

Try-Out Test Examinees. The second set of evaluators was the group of students in Davao Oriental Regional Science High School (DORSHS). Specifically, this team involved all third year (first group) and all second year (second group) students of the said school. The administration of the test to the first group was vital in making the test structurally ready, while the second group's responses were used to determine the difficulty and discrimination indices of the test items and the reliability coefficient of the overall test.

    Validation of the Diagnostic Test

Validation of the diagnostic test did not utilize statistical analysis. It relied on matching the test items to the objectives and presenting the whole test to the group of experts in the content areas of secondary mathematics for item review. The team guaranteed that the instrument had strong content validity, in which each item represented at least one topic actually taught to the students, rather than asking unrelated questions.


    Test Administration

The administration of the test tryouts was done after securing permission from the heads of the offices concerned. The conduct of the test was officially approved by the OIC - Schools Division Superintendent. Likewise, the principal of DORSHS posed no objection to proceeding with the testing process. As an ethical consideration, examinees involved in this study who were minors were given proper information about the research.

The first tryout was administered to 59 third year students of DORSHS. The purpose of administering this test was to determine the structural readiness of the test and to ensure its compatibility with the examinees' thinking level. The next tryout was administered to 78 second year high school students of DORSHS for the interpretation of test results; specifically, the test was conducted for item analysis purposes.

The researcher introduced to the test takers certain guidelines for taking the test. Examinees were told to use the answer sheets provided and instructed to shade on the answer sheet the letter corresponding to the best answer for every test question. They were also told to mark an X over a previous answer if they decided to change it, and that they must give only one answer for each item; otherwise, the item would be marked wrong.


    Results and Discussion

    Test Validation

Validation of the test was carried out through revisions of some items as suggested by the test construction team, in line with the following test attributes: grammatical structure, proper use of punctuation, principles of item construction, setting of item difficulty, typographical precision and, more importantly, the content criterion.

    Difficulty and Discrimination Indices of Test Items

    Table 3 shows the result of item analysis from the final tryout using IRT through a

    computer program called ConQuest: Generalized Item Response Modeling Software.

Table 3. Difficulty and Discrimination Indices from the Final Tryout

Item   Difficulty   Difficulty       Discrimination   Discrimination   Action
No.    Index        Level            Index            Level
1      -1.08        Easy             -0.01            Negative         Reject
2      -1.08        Easy              0.34            Moderate         Retain
3      -1.41        Easy              0.20            Low              Revise
4       0.58        Difficult         0.08            Low              Revise
5      -0.12        Average           0.46            Moderate         Retain
6      -0.18        Average           0.35            Moderate         Retain
7       0.58        Difficult         0.30            Moderate         Retain
8      -1.29        Easy              0.34            Moderate         Retain
9       0.57        Difficult         0.23            Moderate         Retain
10     -0.58        Easy              0.27            Moderate         Retain
11     -0.95        Easy              0.36            Moderate         Retain
12      0.52        Difficult         0.13            Low              Revise
13     -2.53        Very Easy         0.14            Low              Reject
14     -0.24        Average           0.46            Moderate         Retain
15      0.66        Difficult         0.46            Moderate         Retain
16      1.11        Difficult         0.24            Moderate         Retain
17      0.83        Difficult         0.32            Moderate         Retain
18      0.51        Difficult         0.33            Moderate         Retain
19     -0.29        Average           0.33            Moderate         Retain
20     -0.29        Average           0.34            Moderate         Retain
21      0.27        Average           0.34            Moderate         Retain
22      0.33        Average           0.50            Moderate         Retain
23     -0.18        Average           0.27            Moderate         Retain
24      0.45        Average           0.30            Moderate         Retain
25      1.35        Difficult         0.33            Moderate         Retain
26     -0.70        Easy              0.34            Moderate         Retain
27     -0.76        Easy              0.49            Moderate         Retain
28      0.16        Average           0.23            Moderate         Retain
29      0.57        Difficult         0.34            Moderate         Retain
30     -0.18        Average           0.30            Moderate         Retain
31     -0.82        Easy              0.41            Moderate         Retain
32     -1.37        Easy              0.37            Moderate         Retain
33     -0.12        Average           0.47            Moderate         Retain
34      0.33        Average           0.17            Low              Revise
35      0.10        Average           0.39            Moderate         Retain
36     -0.70        Easy              0.28            Moderate         Retain
37     -0.18        Average           0.48            Moderate         Retain
38      0.45        Average           0.17            Low              Revise
39     -0.95        Easy              0.34            Moderate         Retain
40     -2.12        Very Easy         0.14            Low              Reject
41     -1.37        Easy              0.14            Low              Revise
42     -0.01        Average           0.35            Moderate         Retain
43     -1.22        Easy              0.35            Moderate         Retain
44      0.44        Average           0.33            Moderate         Retain
45     -0.46        Average           0.38            Moderate         Retain
46      0.57        Difficult         0.44            Moderate         Retain
47      1.27        Difficult         0.38            Moderate         Retain
48      0.51        Difficult         0.13            Low              Revise
49     -0.20        Average           0.31            Moderate         Retain
50     -0.07        Average           0.41            Moderate         Retain
51      1.19        Difficult         0.01            Low              Revise
52      1.27        Difficult        -0.02            Negative         Reject
53      2.19        Very Difficult    0.30            Moderate         Reject
54      0.45        Average           0.28            Moderate         Retain
55      0.45        Average           0.51            Moderate         Retain
56     -0.76        Easy              0.39            Moderate         Retain
57      0.88        Difficult         0.46            Moderate         Retain
58     -0.12        Average           0.23            Moderate         Retain
59      2.90        Very Difficult    0.42            Moderate         Reject
60     -0.18        Average           0.37            Moderate         Retain


The IRT discrimination and difficulty parameters for each item suggested that item numbers 1 (negative discrimination), 13 (very easy), 40 (very easy), 52 (negative discrimination), 53 (very difficult) and 59 (very difficult) be rejected. The items subjected to revision according to Table 3 were numbers 3, 4, 12, 34, 38, 41, 48 and 51.

    Revisions and Replacements of Weak Items

    Data generated by Conquest facilitated the revisions or replacements of some items.

    Items with negative or low discrimination like numbers 1, 3, 4, 12, 34, 38, 41, 48 and 52 were

    treated with the aid of an IRT attribute called point biserial. The following table shows the IRT

    Generalized Item Analysis Result of item number 1. Other items subject for revisions had similar

    attributes with this table.

    Table 4.IRT Generalized Item Analysis Result for Item No. 1

item: 1 (1)
Cases for this item: 78      Discrimination: -0.01
Item Threshold(s): -1.08     Weighted MNSQ: 1.14     Item Delta(s): -1.08
-----------------------------------------------------------------------
Label   Score   Count   % of tot   Pt Bis   t (p)          PV1Avg:1   PV1 SD:1
-----------------------------------------------------------------------
1       1.00    56      71.79      -0.01    -0.06 (.952)   -0.00      0.72
2       0.00     2       2.56      -0.02    -0.21 (.833)   -0.31      0.05
3       0.00     2       2.56      -0.00    -0.00 (.000)    0.11      1.29
4       0.00    18      23.08       0.02     0.14 (.885)    0.02      0.64
=======================================================================

As observed in Table 4, item number 1, like the other weak items, had a low or negative discrimination. The point biserial of the correct answer was negative or, if not, very close to zero, while those of some wrong options were positive when, ideally, they should be negative. This simply means that the wrong options were attractive as correct answers to examinees with high ability. These options were reviewed and then replaced for the improvement and ease of the test takers. Other items were also improved by restructuring the manner of questioning to lessen confusion in answering.
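The distractor check described above can be expressed as a point biserial per option: score each option as chosen/not chosen and correlate that with the total score. A hypothetical sketch (the option letters, key and data are assumed):

```python
import numpy as np

def option_point_biserials(choices, total_scores, options=("A", "B", "C", "D")):
    """Point biserial per option; a positive value on a wrong option flags a misleading distractor."""
    choices = np.asarray(choices)
    totals = np.asarray(total_scores, dtype=float)
    result = {}
    for opt in options:
        picked = (choices == opt).astype(float)   # 1 if the examinee chose this option
        result[opt] = float(np.corrcoef(picked, totals)[0, 1])
    return result

choices = ["A", "B", "A", "D", "B", "B", "C", "A"]   # assume the key is "B"
totals = [30, 52, 28, 25, 49, 47, 33, 41]
print({k: round(v, 2) for k, v in option_point_biserials(choices, totals).items()})
```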

The IRT generalized item analysis results also revealed that the delta values for item numbers 40, 53 and 59 were -2.12, 2.19 and 2.90, respectively; these correspond to very easy and very difficult difficulty levels. Thus, the level of questioning in these items was rephrased to fit the students' level.

    Finalization of the Test

    Table 5 shows the item difficulty and discrimination results after retesting of the revised

    and replacement items.

Table 5. Retesting Results on Difficulty and Discrimination Indices

Item   Difficulty   Difficulty   Discrimination   Discrimination   Action
No.    Index        Level        Index            Level
1      -0.76        Easy         0.39             Moderate         Retain
3       1.27        Difficult    0.38             Moderate         Retain
4       1.35        Difficult    0.33             Moderate         Retain
12      0.51        Difficult    0.33             Moderate         Retain
13     -1.37        Easy         0.37             Moderate         Retain
34     -0.18        Average      0.27             Moderate         Retain
38      1.27        Difficult    0.38             Moderate         Retain
40     -0.70        Easy         0.34             Moderate         Retain
41      0.45        Average      0.30             Moderate         Retain
48      0.57        Difficult    0.23             Moderate         Retain
51      0.83        Difficult    0.32             Moderate         Retain
52      0.88        Difficult    0.46             Moderate         Retain
53      0.57        Difficult    0.44             Moderate         Retain
59      0.57        Difficult    0.23             Moderate         Retain

    Test Reliability

As shown in the summary results in Table 6 below, the Coefficient Alpha is 0.84. This is the Kuder-Richardson Formula 20 (KR-20) reliability coefficient (Wu et al., 2007).


Table 6. Summary Statistics from Item Analysis Results

The following results are scaled to assume that a single response was provided for each item.

N                    78
Mean                 28.77
Standard Deviation   8.56
Variance             73.30
Skewness             0.93
Kurtosis             1.05
Coefficient Alpha    0.84

    Zone of Proximal Development

The following table translates the items into the corresponding learning areas in Mathematics II in which the students have shown 50% mastery, which is, in fact, known as the Zone of Proximal Development.


Table 7. Students' Achievement on Different Learning Areas in Mathematics II

Upper Top Group (UTG):
- use systems of linear equations to solve problems
- represent the solution set of a system of linear inequalities by graphing
- simplify complex rational algebraic expressions
- solve problems involving expressions with exponents
- describe an arithmetic sequence by giving the formula for the nth term
- solve problems involving geometric mean

Upper Middle Group (UMG):
- represent the solution set of a system of linear inequalities by graphing
- find the solution set of a quadratic equation
- identify rational algebraic expressions
- perform operations on rational algebraic expressions
- solve rational equations and check for extraneous solutions
- solve problems involving rational algebraic expressions
- solve equations involving variations
- demonstrate understanding of expressions
- rewrite algebraic expressions with zero and negative exponents
- name two rational numbers where n x lies in between
- simplify expressions containing rational exponents using laws of exponents
- solve problems involving radical equations
- solve problems involving arithmetic means
- solve problems involving arithmetic sequences
- describe a geometric sequence given the first few terms
- derive the formula for the sum of the terms of a geometric sequence
- derive the formula for an infinite geometric series
- solve problems involving geometric sequences
- find the solution set of a quadratic equation

Lower Middle Group (LMG):
- perform operations on radical expressions
- define a system of linear equations in two variables
- translate certain real-life situations to linear inequalities
- draw the graph of a linear inequality in two variables
- solve rational equations which can be reduced to quadratic equations
- use quadratic equations to solve problems
- simplify rational algebraic expressions
- identify variation relationships in real life
- represent variation relationships as equations
- evaluate numerical expressions involving integral exponents
- identify expressions which are perfect squares or perfect cubes
- find the square root or cube root of expressions
- rewrite expressions with rational exponents as radical expressions
- simplify the radical expression
- solve radical equations
- list the next few terms of a sequence given several consecutive terms
- derive a mathematical rule for generating the sequence
- given a few terms of an arithmetic sequence, find the common difference
- given two terms of an arithmetic sequence, find the specified nth term
- derive the formula for the sum of the n terms of an arithmetic sequence
- describe a geometric sequence given the first few terms
- find the sum of the terms of a geometric sequence
- define a system of linear inequalities
- describe an arithmetic sequence by giving the first few terms

Lower Bottom Group (LBG):
- solve systems of linear equations in two variables
- distinguish a quadratic equation from a linear equation
- translate verbal expressions into rational algebraic expressions
- rationalize a fraction whose denominator contains square roots
- define the sum of an arithmetic sequence

Table 7 maps the students' scores to their zone of proximal development (ZPD), that is, the learning competencies at which they showed a 50% mastery level. The competencies above a group's ZPD were the group's least learned learning areas, while those below its ZPD were the competencies the group had mastered.
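The grouping in Table 7 can be thought of as comparing each competency's item difficulty b with a group's ability estimate θ: items well below θ are mastered, items near θ fall in the ZPD, and items well above θ are least learned. A sketch of that partition (the band width, θ and difficulties are assumptions for illustration):

```python
def partition_by_zpd(theta: float, item_difficulties: dict, band: float = 0.3):
    """Split competencies into mastered / ZPD / least learned relative to ability theta."""
    mastered, zpd, least_learned = [], [], []
    for competency, b in item_difficulties.items():
        if b < theta - band:
            mastered.append(competency)        # well below ability: high chance of success
        elif b > theta + band:
            least_learned.append(competency)   # well above ability: low chance of success
        else:
            zpd.append(competency)             # near ability: roughly 50% chance
    return mastered, zpd, least_learned

difficulties = {"systems of linear equations": -1.08,
                "geometric mean": 0.57,
                "complex rational expressions": 1.35}
print(partition_by_zpd(theta=0.6, item_difficulties=difficulties))
```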

    Settings on Thinking Hierarchy

Table 8 shows the thinking level measured by each item as set by the researcher's intuition and as calculated through IRT. The table reveals that, out of 60 items, only 23 had the same difficulty level setting between the researcher's intuition and the test response results.


Table 8. Item Difficulty Level (Intuition versus IRT Perspective)

Item   By Intuition   By Test Responses      Item   By Intuition   By Test Responses
1      1st level      1st level              31     3rd level      1st level
2      2nd level      1st level              32     2nd level      1st level
3      3rd level      1st level              33     2nd level      2nd level
4      2nd level      3rd level              34     3rd level      2nd level
5      1st level      2nd level              35     3rd level      2nd level
6      2nd level      2nd level              36     2nd level      1st level
7      3rd level      3rd level              37     3rd level      2nd level
8      1st level      1st level              38     1st level      2nd level
9      2nd level      3rd level              39     2nd level      1st level
10     2nd level      1st level              40     2nd level      1st level
11     3rd level      1st level              41     3rd level      1st level
12     1st level      3rd level              42     3rd level      2nd level
13     1st level      1st level              43     3rd level      1st level
14     2nd level      2nd level              44     3rd level      2nd level
15     2nd level      3rd level              45     2nd level      2nd level
16     3rd level      3rd level              46     2nd level      3rd level
17     3rd level      3rd level              47     3rd level      3rd level
18     3rd level      3rd level              48     3rd level      3rd level
19     1st level      2nd level              49     3rd level      2nd level
20     2nd level      2nd level              50     1st level      2nd level
21     3rd level      2nd level              51     3rd level      3rd level
22     1st level      2nd level              52     3rd level      3rd level
23     2nd level      2nd level              53     2nd level      3rd level
24     2nd level      2nd level              54     2nd level      2nd level
25     3rd level      3rd level              55     2nd level      2nd level
26     1st level      1st level              56     2nd level      1st level
27     2nd level      1st level              57     2nd level      3rd level
28     2nd level      2nd level              58     1st level      2nd level
29     2nd level      3rd level              59     2nd level      3rd level
30     2nd level      2nd level              60     2nd level      2nd level


    Conclusions and Recommendations

    Conclusion

In view of the findings of this study, the following conclusions were drawn:

1. The results of this study reflect that the diagnostic test developed is valid. It is an instrument that can measure the desired trait of second year students in Mathematics II.

2. The final output of the test showed that the overall test difficulty was within the range of -1.37 to 1.35. The diagnostic test therefore has an average level of difficulty.

3. The discrimination indices were within the range of 0.23 to 0.51. Thus, the test is a moderately discriminating instrument.

4. With the KR-20 coefficient at 0.84, the developed diagnostic test indicates high reliability, following Shrout (1998).

5. Based on the previous items, the diagnostic test developed in this study is valid, highly reliable and fair. Hence, this test is standardized and can now be used in assessing the Mathematics learning of second year students in DORSHS.

6. The test identified the least learned competencies, the zone of proximal development and the mastered competencies of DORSHS second year high school students in Mathematics II.

7. The study revealed a considerable difference between the intuitive item difficulty settings and the actual item difficulty results from the students.

    Recommendations

Based on the findings and conclusions of this study, the following are the general recommendations:

1. The standardized diagnostic test can already be used in assessing the Mathematics learning of second year students in DORSHS prior to any Achievement Test review program.

2. Other research, such as determining the coherence of the test with the second year National Achievement Test (NAT), can be undertaken.

3. Further validation of the test using IRT, with second year students of other schools or with another batch of second year students in DORSHS, is highly encouraged.

4. The test and its findings can also be utilized for any comparative study on item analysis between the classical test method and IRT.

5. The results of this study can be utilized for any possible research on students' assessment.


    References

Baker, F. (2001). The Basics of Item Response Theory (2nd ed.). United States of America: ERIC Clearinghouse on Assessment and Evaluation.

Bontempo, B. D. (2009). Measurement Art: The Point-Biserial Correlation Coefficient. Retrieved June 30, 2009, from http://www.mountainmeasurement.com/blog/?p=148

Brannick, M. (2006). Concepts from IRT that Move Beyond Classical Test Theory. Multiple Regression and Research Methods. Retrieved November 22, 2009, from http://luna.cas.usf.edu/~mbrannic/files/pmet/irt.htm

Calmorin, L. P. (2004). Educational Research Measurement and Evaluation (3rd ed.). Manila, Philippines: National Book Store, Inc.

Cherry, K. (2009). Reliability - What Is Reliability? Retrieved November 20, 2009, from http://psychology.about.com/od/researchmethods/

Davies, A., Arbuckle, M., & Bonneau, D. (2005). Assessment for Learning: Planning for Professional Development. Retrieved October 14, 2009, from http://electronicportfolios.org/afl/Assessment4learning.pdf

Educational Diagnostic Prescriptive Services (2009). Educational Diagnostic Prescriptive. Retrieved October 20, 2009, from http://homeschoolcreations.blogspot.com/2009/09/educational.

Hambleton, R., Swaminathan, H., & Rogers, H. J. (1991). Fundamentals of Item Response Theory. United States of America: SAGE Publications, Inc.

Izard, J. (2005). Quantitative Research Methods in Educational Planning: Overview of Test Construction. Paris, France: International Institute for Educational Planning/UNESCO.

Kim, S. H., Cohen, A. S., & Park, T. H. (1995). Detection of Differential Item Functioning in Multiple Groups. Journal of Educational Measurement, 32, 261-276.

Shrout, P. E. (1998). Measurement Reliability and Agreement in Psychiatry. Statistical Methods in Medical Research. United States of America: SAGE Publications, Inc.

Vygotsky, L. S. (1978). Mind and Society: The Development of Higher Psychological Processes. Cambridge, MA: Harvard University Press.