Assessment Notes 2014
Capitol University
Cagayan de Oro City
COLLEGE OF EDUCATION
Center of Excellence for Teacher Education
ASSESSMENT NOTES
Dr. Ma. Jessica P. Campano
Assessment
A process in which test results are subjected to critical study according to established measurement principles.
Assessment decisions can substantially improve student performance, guide teachers in enhancing the teaching-learning process, and assist policy makers in improving the educational system.
Testing and Assessment
Testing
1. Tests are developed or selected, administered to the class, and scored.
2. Test results are then used to make decisions about a pupil (to assign a grade, recommend for an advanced program), instruction (repeat, review, move on), curriculum (replace, revise), or other educational factors.

Assessment
1. Information is collected from tests and other measurement instruments (portfolios and performance assessments, rating scales, checklists, and observations).
2. This information is critically evaluated and integrated with relevant background and contextual information.
3. The integration of critically analyzed test results and other information results in a decision about a pupil.
Types of Educational Decisions
Instructional Decisions
Grading Decisions
Diagnostic Decisions
Selection Decisions
Placement Decisions
Counseling and Guidance Decisions
Program or Curriculum Decisions
Administrative Policy Decisions
Roles of Assessment
Summative - tries to determine the extent to which the learning objectives for a course are met and why.
Diagnostic - determines the gaps in learning or learning processes, hopefully to be able to bridge these gaps.
Formative - allows the teacher to redirect and refocus the course of teaching a subject matter.
Placement - plays a vital role in determining the appropriate placement of a student both in terms of achievement and aptitude.
Approaches in Assessment
Assessment of Learning
Assessment for Learning
Assessment as Learning
Modes of Assessment
Traditional
Alternative
Authentic
Comparing NRTs and CRTs
Dimension: NRT / CRT
Average number of students who get an item right: 50% / 80%
Compare a student's performance to: performance of other students / standards indicative of mastery
Breadth of content sampled: broad, covers many objectives / narrow, covers a few objectives
Comprehensiveness of content sampled: shallow, usually one or two items per objective / comprehensive, usually three or more items per objective
Variability: the more spread of scores, the better / variability may be minimal
Item construction: items are chosen to promote variance or spread, one aim being to produce good distracter options / items are chosen to reflect the criterion behavior
Reporting and interpreting: percentile ranks and standard scores used / number succeeding or failing, or range of acceptable performance, used
1.1 Cognitive Targets
Knowledge (remembering) - refers to the acquisition of facts, concepts, and theories.
Comprehension (understanding) - a step higher than mere acquisition of facts; involves a cognition or awareness of the interrelationships of facts and concepts.
Application (applying) - refers to the transfer of knowledge from one field of study to another, or from one concept to another concept in the same discipline.
Analysis (analyzing) - refers to the breaking down of a concept or idea into its components and explaining the concept as a composition of these components.
Evaluation (evaluating) - refers to valuing and judging, or putting worth on, a concept or principle.
Synthesis (creating) - the opposite of analysis; entails putting together the components in order to summarize the concept.
Development of Assessment Tools
Planning a Test and Construction of TOS
1. Identifying test objectives
2. Deciding on the type of objective test to be prepared.
3. Preparing a TOS.
4. Constructing the draft of the test items.
5. Try out and validation.
Table of Specification (TOS)
A map that guides the teacher in constructing a test.
It ensures that there is a balance between items that test lower-order thinking skills and those that test higher-order thinking skills.
It conveys to the teacher the number of items to be constructed per objective, their level in the taxonomy, and whether the test represents a balanced picture based on what was taught.
Objective Test Format
True-False
Make statements clearly True or False.
Avoid specific determiners.
Do not arrange responses in a pattern.
Do not use textbook jargon.
Use relatively short statements and eliminate extraneous materials.
Keep true and false statements approximately the same length, and see to it that there are equal numbers of true and false items.
Avoid using double-negative statements.
Avoid the following: verbal clues, absolutes, complex sentences, broad general statements, and terms denoting indefinite degree or absolutes.
Matching Type Items
Use a homogeneous topic.
Put longer options in the left column.
Provide clear directions.
Use an unequal number of entries in the two columns.
The matching lists should be located on one page.
Completion Items
Provide clear focus for desired answer.
Avoid grammatical clues.
Put blanks at the end.
Restrict the number of the blanks to one or two.
Blanks for answers should be equal in length.
Essay Items
Use several short essay questions rather than one long essay question.
Provide a clear focus for the students' answers.
Indicate limitations or scoring criteria to pupils.
ITEM ANALYSIS AND VALIDATION
Item Analysis - a numerical method for analyzing test items employing student response alternatives or options.
Criteria in determining the desirability and undesirability of an item:
a. Difficulty of an item
b. Discriminating power of an item
c. Measures of attractiveness
Difficulty Index (P)
The proportion of the students in the upper and lower groups who answered an item correctly:
P = (UL + LL) / (2n)
where UL and LL are the numbers of students in the upper and lower groups who answered the item correctly, and n is the number of students in each group.
Level of Difficulty of an Item
Index Range     Difficulty Level        Recommendation
0.00 - 0.20     Very difficult          NA
0.21 - 0.40     Difficult               LA
0.41 - 0.60     Moderately difficult    VA
0.61 - 0.80     Easy                    LA
0.81 - 1.00     Very easy               NA
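As an illustration (not part of the original notes), the difficulty formula and the table above can be expressed in Python; the function names and example figures are hypothetical:

```python
# Difficulty index per the notes: P = (UL + LL) / (2n), where UL and LL
# are the numbers of correct answers in the upper and lower groups and
# n is the number of students in each group.
def difficulty_index(upper_correct, lower_correct, group_size):
    return (upper_correct + lower_correct) / (2 * group_size)

# Map P to the difficulty labels in the table above.
def difficulty_level(p):
    if p <= 0.20:
        return "Very difficult"
    if p <= 0.40:
        return "Difficult"
    if p <= 0.60:
        return "Moderately difficult"
    if p <= 0.80:
        return "Easy"
    return "Very easy"

# Example: 18 of 20 upper-group and 6 of 20 lower-group students answered correctly.
p = difficulty_index(18, 6, 20)   # (18 + 6) / 40 = 0.6
print(p, difficulty_level(p))     # 0.6 Moderately difficult
```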
Discrimination Index (D)
Measure of the extent to which a test item discriminates or differentiates between students who do well on the overall test and those who do not.
There are 3 types of Discrimination Indexes:
1. Positive
2. Negative
3. Zero
D = (UL - LL) / n
Discrimination Index    Item Evaluation                                                   Recommendation
0.40 & above            Very good item                                                    VA
0.30 - 0.39             Reasonably good item, but possibly subject to improvement         LA
0.20 - 0.29             Marginal item, usually needing and being subject to improvement   LA
0.19 & below            Poor item                                                         NA
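In the same hypothetical sketch style, the formula D = (UL - LL) / n and the evaluation labels above can be coded as:

```python
# Discrimination index per the notes: D = (UL - LL) / n.
# Positive D: more high scorers than low scorers got the item right;
# zero D: no discrimination; negative D: low scorers did better.
def discrimination_index(upper_correct, lower_correct, group_size):
    return (upper_correct - lower_correct) / group_size

# Map D to the evaluation labels in the table above.
def discrimination_rating(d):
    if d >= 0.40:
        return "Very good item"
    if d >= 0.30:
        return "Reasonably good item"
    if d >= 0.20:
        return "Marginal item"
    return "Poor item"

d = discrimination_index(18, 6, 20)   # (18 - 6) / 20 = 0.6
print(d, discrimination_rating(d))    # 0.6 Very good item
```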
Criteria for Item Analysis

Option Analysis   P    D    Evaluation
A                 VA   VA   VGI (retain)
A                 A    A    GI (retain)
A                 LA   A    F (retain)
A                 A    LA   F (retain)
NA                LA   LA   BF (revise)
NA                NA   NA   P (reject)

(A = acceptable, VA = very acceptable, LA = less acceptable, NA = not acceptable; VGI = very good item, GI = good item, F = fair, BF = barely fair, P = poor.)
Properties of Assessment Methods
Validity
Reliability
Fairness
Practicality and Efficiency
Ethics in Assessment
Validity
The degree to which a test or measuring instrument measures what it intends to measure.
soundness (what the test measures and how well it could be applied)
Types of Validity
Content Validity- the extent to which the content or topic of the test is truly representative of the course.
Depends on the relevance of the individuals' responses to the behavior area under consideration rather than on the apparent relevance of item content.
Commonly used in evaluating achievement tests.
Appropriate for criterion-referenced measures.
Concurrent Validity - the degree to which the test agrees or correlates with a criterion set up as an acceptable measure.
Applicable to tests employed for the diagnosis of existing status rather than for the prediction of future outcomes.
E.g., validating a test made by the teacher by correlating it with a previously proven valid test.
Predictive Validity - determined by showing how well predictions made from the test are confirmed by evidence gathered at some subsequent time.
Construct Validity - the extent to which the test measures a theoretical trait.
Reliability
The extent to which a test is dependable, self-consistent and stable.
It is concerned with the consistency of responses from moment to moment.
A reliable test may not always be valid.
Methods in Testing the Reliability of Good Measuring Instruments
Test-Retest Method the same measuring instrument is administered twice to the same group of students and the correlation coefficient is determined.
Limitations: time interval, environmental conditions
Spearman rank correlation coefficient, or Spearman rho, is a statistical tool used to measure the relationship between paired ranks assigned to individual scores on two variables, X and Y.
rs = 1 - (6ΣD²) / (N³ - N)
where:
rs = Spearman rho
ΣD² = sum of the squared differences between ranks
N = total number of cases
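As a hypothetical sketch (not from the notes), the formula can be checked in Python; the helper assumes no tied scores, which would otherwise require average ranks:

```python
# Spearman rho via rs = 1 - 6*sum(D^2) / (N^3 - N).
def spearman_rho(x, y):
    def ranks(scores):
        # Rank 1 goes to the highest score, as in the notes.
        order = sorted(scores, reverse=True)
        return [order.index(s) + 1 for s in scores]
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - (6 * sum_d2) / (n ** 3 - n)

# Hypothetical scores from two administrations of the same test:
x = [85, 78, 92, 70, 60]
y = [75, 80, 95, 62, 68]
print(spearman_rho(x, y))   # 0.8
```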
Steps
Step 1. Rank the scores of respondents from highest to lowest in the first administration (X) and mark this rank as Rx. The highest score receives the rank of 1.
Step 2. Rank the second set of scores (Y) in the same manner as in Step 1 and mark as Ry.
Step 3. Determine the difference in ranks (D) for every pair of ranks.
Step 4. Square each difference to get D².
Step 5. Sum the squared differences to find ΣD².
Step 6. Compute Spearman rho (rs).

Frequency Distributions
Any arrangement of the data that shows the frequency of occurrence of different values of the variable falling within defined ranges of a class interval.
Applicable if the total number of cases (N) is 30 or more.
Steps:
1. Find the absolute range by subtracting the lowest score from the highest score: R = HS - LS.
2. Find the class interval (C) by dividing the range by 10 and by 20, so that the number of classes is not less than 10 and not more than 20. In choosing the class interval, an odd number is preferable.
3. Set up the classes by adding C/2 to the highest score as the upper class limit of the highest class and subtracting C/2 from the highest score as the lower class limit of the highest class. Set up the real and integral limits.
4. Tally the scores.
5. Determine the cumulative frequency and the cumulative percentage frequency distributions.
6. Present the frequency polygon and histogram.
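The tallying steps can be sketched in Python (a hypothetical example; for brevity it uses fewer than 30 scores and builds classes upward from the lowest score rather than downward from the highest with C/2):

```python
scores = [12, 15, 22, 27, 31, 33, 35, 38, 41, 44, 45, 47, 52, 55, 58, 61]

# Step 1: absolute range R = HS - LS.
r = max(scores) - min(scores)   # 61 - 12 = 49

# Step 2: class interval; C = 5 (an odd number) gives about R/C = 10 classes.
c = 5

# Steps 3-4: set up the classes and tally the scores into them.
classes = []
lower = min(scores)
while lower <= max(scores):
    upper = lower + c - 1
    f = sum(lower <= s <= upper for s in scores)
    classes.append((lower, upper, f))
    lower += c

# Step 5: cumulative frequency per class.
cum = 0
for lo, hi, f in classes:
    cum += f
    print(f"{lo}-{hi}: f={f} cf={cum}")
```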
Measures of Central Tendency
A single value that is used to identify the center of the data; it is thought of as the typical value in a set of scores.
Mean - Most common measure of center and it is also known as the arithmetic average.
Population mean
Sample mean
Properties of the Mean
Easy to compute.
May not be an actual observation in the data set.
Can be subjected to numerous mathematical computations.
Most widely used.
Each data value contributes to the mean value.
Easily affected by extreme values.
Median
A point that divides the scores in a distribution into two equal parts when the scores are arranged according to magnitude.
Properties of the Median
Not affected by extreme values
Applied to ordinal level of data
The middle most score in the distribution
Most appropriate when there are extreme scores
Mode - Refers to the score or scores that occur most frequently in the distribution.
Unimodal
Bimodal
Multimodal
Properties of Mode
It is the score/s occurring most frequently
Nominal average
It can be used for qualitative and quantitative data
Not affected by extreme values
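Python's statistics module computes all three measures; a hypothetical score set illustrates the properties listed above (the mean is pulled by an extreme value, while the median is not):

```python
import statistics

scores = [60, 75, 75, 80, 85, 90, 95]

print(statistics.mean(scores))     # 80
print(statistics.median(scores))   # 80
print(statistics.mode(scores))     # 75 (unimodal: 75 occurs twice)

# Replace the top score with an extreme value:
outlier = scores[:-1] + [195]
print(statistics.mean(outlier))    # jumps to about 94.3
print(statistics.median(outlier))  # still 80
```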
Measures of Variability
Refers to a single value that is used to describe how spread out the scores in a distribution are, that is, how far they lie above or below the measures of central tendency.
Range, quartile deviation, standard deviation (SD)
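A hypothetical sketch of the three measures in Python, taking the quartile deviation as half the interquartile range (the statistics module's default exclusive quantile method is assumed):

```python
import statistics

scores = [60, 65, 70, 75, 80, 85, 90, 95]

# Range: highest score minus lowest score.
r = max(scores) - min(scores)                  # 35

# Quartile deviation: (Q3 - Q1) / 2.
q1, _, q3 = statistics.quantiles(scores, n=4)
qd = (q3 - q1) / 2                             # 11.25

# Standard deviation (sample).
sd = statistics.stdev(scores)

print(r, qd, round(sd, 2))                     # 35 11.25 12.25
```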