
Capitol University

Cagayan de Oro City

COLLEGE OF EDUCATION
Center of Excellence for Teacher Education

ASSESSMENT NOTES

Dr. Ma. Jessica P. Campano

Assessment

A process where test results are subject to critical study according to established measurement principles.

Assessment decisions can substantially improve student performance, guide teachers in enhancing the teaching-learning process, and assist policy makers in improving the educational system.

Testing and Assessment

Testing

1. Tests are developed or selected, administered to the class, and scored.

2. Test results are then used to make decisions about a pupil (to assign a grade, recommend for an advanced program), instruction (repeat, review, move on), curriculum (replace, revise), or other educational factors.

Assessment

1. Information is collected from tests and other measurement instruments (portfolios and performance assessments, rating scales, checklists, and observations).

2. This information is critically evaluated and integrated with relevant background and contextual information.

3. The integration of critically analyzed test results and other information results in a decision about a pupil.

Types of Educational Decisions

Instructional Decisions

Grading Decisions

Diagnostic Decisions

Selection Decisions

Placement Decisions

Counseling and Guidance Decisions

Program or Curriculum Decisions

Administrative Policy Decisions

Roles of Assessment

Summative - tries to determine the extent to which the learning objectives for a course are met, and why.

Diagnostic - determines the gaps in learning or learning processes, hopefully to be able to bridge these gaps.

Formative - allows the teacher to redirect and refocus the course of teaching a subject matter.

Placement - plays a vital role in determining the appropriate placement of a student, both in terms of achievement and aptitude.

Approaches in Assessment

Assessment of Learning

Assessment for Learning

Assessment as Learning

Modes of Assessment

Traditional

Alternative

Authentic

Comparing NRTs and CRTs

Dimension | NRT | CRT
Average number of students who get an item right | 50% | 80%
Compares a student's performance to | The performance of other students | Standards indicative of mastery
Breadth of content sampled | Broad; covers many objectives | Narrow; covers a few objectives
Comprehensiveness of content sampled | Shallow; usually one or two items per objective | Comprehensive; usually three or more items per objective
Variability | The more spread out the scores, the better | Variability may be minimal
Item construction | Items are chosen to promote variance or spread; one aim is to produce good distracter options | Items are chosen to reflect the criterion behavior
Reporting and interpreting | Percentile ranks and standard scores are used | Number succeeding or failing, or range of acceptable performance, is used

1.1 Cognitive Targets

Knowledge (remembering) - refers to the acquisition of facts, concepts, and theories.

Comprehension (understanding) - a step higher than the mere acquisition of facts; it involves cognition or awareness of the interrelationships of facts and concepts.

Application (applying) - refers to the transfer of knowledge from one field of study to another, or from one concept to another concept in the same discipline.

Analysis (analyzing) - refers to the breaking down of a concept or idea into its components and explaining the concept as a composition of these components.

Evaluation (evaluating) - refers to valuing and judgment, or putting worth on a concept or principle.

Synthesis (creating) - the opposite of analysis; entails putting together the components in order to summarize the concept.

Development of Assessment Tools

Planning a Test and Construction of TOS

1. Identifying test objectives

2. Deciding on the type of objective test to be prepared.

3. Preparing a TOS.

4. Constructing the draft of the test items.

5. Try-out and validation.

Table of Specification (TOS)

A map that guides the teacher in constructing a test.

It ensures that there is a balance between items that test lower thinking skills and those which test higher order thinking skills

It conveys to the teacher the number of items to be constructed per objective, their level in the taxonomy, and whether the test represents a balanced picture based on what was taught.

Objective Test Format

True-False

Make statements clearly True or False.

Avoid specific determiners.

Do not arrange responses in a pattern.

Do not use textbook jargon.

Use relatively short statements and eliminate extraneous materials.

Keep true and false statements approximately the same length, and include approximately equal numbers of true and false items.

Avoid using double-negative statements.

Avoid the following: verbal clues, absolutes, complex sentences, broad general statements, and terms denoting indefinite degree.

Matching Type Items

Use a homogeneous topic.

Put longer options in the left column.

Provide clear directions.

Use an unequal number of entries in the two columns.

The matching lists should be located on one page.

Completion Items

Provide clear focus for desired answer.

Avoid grammatical clues.

Put blanks at the end.

Restrict the number of the blanks to one or two.

Blanks for answers should be equal in length.

Essay Items

Use several short essay questions rather than a long one.

Provide a clear focus for students in each question.

Indicate limitations or scoring criteria to pupils.

ITEM ANALYSIS AND VALIDATION

Item Analysis - a numerical method for analyzing test items employing student response alternatives or options.

Criteria in determining the desirability and undesirability of an item:

a. Difficulty of an item

b. Discriminating power of an item

c. Measures of attractiveness

Difficulty Index (P)

The proportion of students in the upper and lower groups who answered an item correctly:

P = (UL + LL) / 2n

where UL and LL are the numbers of students in the upper and lower groups, respectively, who answered the item correctly, and n is the number of students in each group.
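As a quick illustration, here is a minimal Python sketch of the formula (the function name and numbers are hypothetical, not from the notes):

# Difficulty index: P = (UL + LL) / 2n, assuming equal-sized
# upper and lower groups of n students each.
def difficulty_index(upper_correct, lower_correct, group_size):
    return (upper_correct + lower_correct) / (2 * group_size)

# Example: 18 of 25 upper-group and 12 of 25 lower-group students
# answer the item correctly, so P = 30 / 50 = 0.60.
print(difficulty_index(18, 12, 25))  # 0.6

A P of 0.60 falls in the moderately difficult range of the table below.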

Level of Difficulty of an Item

Index Range | Difficulty Level | Recommendation
0.00 - 0.20 | Very Difficult | NA
0.21 - 0.40 | Difficult | LA
0.41 - 0.60 | Moderately Difficult | VA
0.61 - 0.80 | Easy | LA
0.81 - 1.00 | Very Easy | NA

Discrimination Index (D)

Measure of the extent to which a test item discriminates or differentiates between students who do well on the overall test and those who do not.

There are 3 types of Discrimination Indexes:

1. Positive

2. Negative

3. Zero

D = (UL - LL) / n

where UL and LL are as defined above and n is the number of students in each group.
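A companion Python sketch, under the same hypothetical naming and equal-group assumption as the difficulty example above:

# Discrimination index: D = (UL - LL) / n, with n students
# in each of the upper and lower groups.
def discrimination_index(upper_correct, lower_correct, group_size):
    return (upper_correct - lower_correct) / group_size

# Example: 18 of 25 upper-group vs. 12 of 25 lower-group correct,
# so D = (18 - 12) / 25 = 0.24, a marginal item per the table below.
print(discrimination_index(18, 12, 25))  # 0.24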

Discrimination Index | Item Evaluation | Recommendation
0.40 and above | Very good item | VA
0.30 - 0.39 | Reasonably good item, but possibly subject to improvement | LA
0.20 - 0.29 | Marginal item, usually needing and being subject to improvement | LA
Below 0.19 | Poor item | NA

Criteria for Item Analysis

Option Analysis | P | D | Evaluation
A | VA | VA | VGI (RET)
A | A | A | GI (RET)
A | LA | A | AF (RET)
A | A | LA | AF (RET)
NA | LA | LA | BF (REV)
NA | NA | NA | P (REJ)

Properties of Assessment Methods

Validity

Reliability

Fairness

Practicality and Efficiency

Ethics in Assessment

Validity

The degree to which a test or measuring instrument measures what it intends to measure.

soundness (what the test measures and how well it could be applied)

Types of Validity

Content Validity - the extent to which the content or topics of the test are truly representative of the course.

Depends on the relevance of the individual's responses to the behavior area under consideration, rather than on the apparent relevance of the item content.

Commonly used in evaluating achievement tests.

Appropriate for criterion-referenced measures.

Concurrent Validity - the degree to which the test agrees or correlates with a criterion set up as an acceptable measure.

Applicable to tests employed for the diagnosis of existing status rather than for the prediction of future outcomes.

E.g., validating a test made by the teacher by correlating it with a previously proven valid test.

Predictive Validity - determined by showing how well predictions made from the test are confirmed by evidence gathered at some subsequent time.

Construct Validity - the extent to which the test measures a theoretical trait.

Reliability

The extent to which a test is dependable, self-consistent and stable.

It is concerned with the consistency of responses from moment to moment.

A reliable test may not always be valid.

Methods in Testing the Reliability of a Good Measuring Instrument

Test-Retest Method - the same measuring instrument is administered twice to the same group of students, and the correlation coefficient is determined.

Limitations: time interval, environmental conditions

The Spearman rank correlation coefficient, or Spearman rho, is a statistical tool used to measure the relationship between paired ranks assigned to individual scores on two variables, X and Y.

rs = 1 - (6ΣD²) / (N³ - N)

where

rs = Spearman rho
ΣD² = the sum of the squared differences between ranks
N = the total number of cases
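A minimal Python sketch of the six steps listed next, assuming no tied scores (ties would require average ranks, which this sketch omits); the score lists are illustrative:

def spearman_rho(x, y):
    # Rank each score set from highest to lowest (rank 1 = highest score).
    def ranks(scores):
        order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
        r = [0] * len(scores)
        for rank, i in enumerate(order, start=1):
            r[i] = rank
        return r
    rx, ry = ranks(x), ranks(y)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))  # ΣD²
    n = len(x)
    return 1 - (6 * sum_d2) / (n ** 3 - n)

# Two administrations that rank the students identically give rho = 1.0.
print(spearman_rho([90, 85, 70, 60], [88, 80, 75, 50]))  # 1.0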

Steps

Step 1. Rank the scores of respondents from highest to lowest in the first set of administration (X) and mark this rank as Rx. The highest score receives the rank of 1.

Step 2. Rank the second set of scores (Y) in the same manner as in Step 1 and mark this rank as Ry.

Step 3. Determine the difference in ranks for every pair of ranks.

Step 4. Square each difference to get D².

Step 5. Sum the squared differences to find ΣD².

Step 6. Compute Spearman rho (rs).

Frequency Distributions

Any arrangement of the data that shows the frequency of occurrence of different values of the variable falling within defined ranges of a class interval.

Applicable if the total number of cases (N) is 30 or more.

Steps:

1. Find the absolute range by subtracting the lowest score from the highest score:

R = HS - LS

2. Find the class size (C) by dividing the range by 10 and by 20, so that the number of class intervals is not less than 10 and not more than 20. In choosing the class size, an odd number is preferable.

3. Set up the classes by adding C/2 to the highest score to get the upper class limit of the highest class, and subtracting C/2 from the highest score to get the lower class limit of the highest class. Set up the real and integral limits.

4. Tally the scores.

5. Determine the cumulative frequency and the cumulative percentage frequency distributions.

6. Present the frequency polygon and histogram.
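A minimal Python sketch of steps 1, 4, and 5, with an illustrative score list and a hand-picked class size C = 5 (both hypothetical, not from the notes):

from collections import Counter

scores = [62, 68, 71, 75, 75, 78, 80, 83, 85, 88, 90, 94]
C = 5
r = max(scores) - min(scores)        # Step 1: absolute range, R = HS - LS

# Build classes upward from a convenient lower limit (illustrative choice).
lower = min(scores) - (min(scores) % C)
classes = []
while lower <= max(scores):
    classes.append((lower, lower + C - 1))
    lower += C

tally = Counter()                    # Step 4: tally each score into its class
for s in scores:
    for lo, hi in classes:
        if lo <= s <= hi:
            tally[(lo, hi)] += 1
            break

cf = 0                               # Step 5: cumulative frequency per class
for lo, hi in classes:
    cf += tally[(lo, hi)]
    print(f"{lo}-{hi}: f={tally[(lo, hi)]}, cf={cf}")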

Measures of Central Tendency

A single value that is used to identify the center of the data; it is thought of as the typical value in a set of scores.

Mean - the most common measure of center, also known as the arithmetic average.

Population mean

Sample mean
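The notes list the two means without their formulas; the standard definitions, added here for completeness (notation assumed, in the plain-text style of the other formulas in these notes), are:

Population mean: μ = ΣX / N, where N is the number of scores in the population.
Sample mean: x̄ = Σx / n, where n is the number of scores in the sample.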

Properties of Mean

Easy to compute

May not be an actual observation in the data set.

Can be subjected to numerous mathematical computations

Most widely used

Each data contributes to the mean value

Easily affected by extreme values.

Median

A point that divides the scores in a distribution into two equal parts when the scores are arranged according to magnitude.

Properties of Median

Not affected by extreme values

Applied to ordinal level of data

The middlemost score in the distribution

Most appropriate when there are extreme scores

Mode - refers to the score or scores that occur most often in the distribution.

Unimodal

Bimodal

Multimodal

Properties of Mode

It is the score/s occurring most frequently

Nominal average

It can be used for qualitative and quantitative data

Not affected by extreme values
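A minimal sketch of all three averages using Python's standard library (the score list is illustrative, not from the notes):

import statistics

scores = [70, 75, 80, 80, 85, 90, 98]
print(statistics.mean(scores))    # arithmetic average; pulled up by the extreme 98
print(statistics.median(scores))  # 80, the middlemost score
print(statistics.mode(scores))    # 80, the most frequent score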

Measures of Variability

Refers to a single value that is used to describe the spread of the scores in a distribution, that is, how far the scores fall above or below the measure of central tendency.

Range, quartile deviation, and standard deviation.
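A minimal Python sketch of the three measures (the data are illustrative; quartiles are computed with statistics.quantiles, one of several common conventions):

import statistics

scores = [60, 65, 70, 75, 80, 85, 90, 95]
rng = max(scores) - min(scores)                # range = HS - LS
q1, _, q3 = statistics.quantiles(scores, n=4)  # first and third quartiles
qd = (q3 - q1) / 2                             # quartile deviation
sd = statistics.stdev(scores)                  # sample standard deviation
print(rng, qd, round(sd, 2))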
