DOCUMENT RESUME ED 055 111 TM 000 842 AUTHOR Klein, … · 2013-11-15 · DOCUMENT RESUME. ED 055...
Transcript of DOCUMENT RESUME ED 055 111 TM 000 842 AUTHOR Klein, … · 2013-11-15 · DOCUMENT RESUME. ED 055...
DOCUMENT RESUME
ED 055 111 TM 000 842AUTHOR Klein, Stephen P.; And OthersTITLE Procedures for Needs-Assessment Evaluation: A
Symposium.INSTITUTION California Univ., Los Angeles. Center for the Study
of Evaluation.SPONS AGENCY Office of Education (DHEW) , Washington, D.C.
Cooperative Research Program.PUB DATE May 71NOTE 63p.; From symposium delivered at the Annual Meeting
of the American Educational Research Association, NewYork, New York, February 1971
EDRS PRICE MF-$0.65 HC-$3.29DESCRIPTORS -*Decision Making; *Educational Needs; Educational
Objectives; Elementry Schools; Evaluation Criteria;*Evaluation Techniques; Models; Norms; Principals;*Resource Allocations; Symposia; Test Reliability;*Test Selection; Test Validity
IDENTIFIERS *CSE Elementary School Evaluation KIT
ABSTRACTSymposium topics and speakers include "Choosing Needs
for Needs Assessment" (Stephen P. Klein) ; "Selecting Tests to Assessthe /4.-eds" (Ralph Hoepfner) ; "Making Better Decisions OD AssessedNeeds: Differentiated School Norms" (Paul A. Bradley and DaleWoolley); and "Allocating Resources by Subject Area" (James S. Dyerand Guy P. Strickland). A list of 145 goals of elementary education.from the CSE Elementary School Evaluation KIT is appended. (AG)
U.S. DEPARTMENT OF HEALTH.EDUCATION & WELFAREOFFICE OF EDUCATIDN
THIS DOENT HAS BEEN REPRO-DUCED :=XACTLY AS RECEiVED FROMTHE PERLON OR ORGANIZATION ORIG-INATING IT POINTS OF VIEW OR OPIN-IONS STATED DO NOT NECESSARILYREPRESENT OFFICIAL OFFICE OF MU-CATION POSITION OR POLICY
2e.re
PROCEDURES FOR NEEDS-ASSESSMENT
EVALUATION: A SYVIPOSIUM
Stephen P. Klein,Ralph:Hoelpfher
Paul'A.J31744ley &A)ale WpolleyJatilesS.Dyer:404- P. S:trickland'
CSE Report No. 67May 1971
CENTER FOR THESTUDY OF
EVALUATION
k1CATh,Iro$0 "44
vIC4*'AND
UCLA Graduate School of Education
Marvin C. Alkin, Director
The CENTER FOR THE STUDY OF EVALUATION is one of nine centerfor educational research and development sponsored hy the United States Depart-ment of Health, Education and Welfare, Office of Education. The research anddevelopment reported herein was performed pursuant to a contract with the U.S.O.E.under ,.he provisions of the Cooperative Research Program.
Established at UCLA in June, 1966, CSE is devoted exclusively to findingnew theories and methods of analyzing educational systems and programs andgauging their effects.
The Center serves its unique functions with an inter-disciplinary staffwhose specialties combine for a broad, versatile approach to the complex prob-lems of evaluation. Study projects are conducted in three major program areas:Evaluation of Instructional Programs, Evaluation of Educational Systems, andEvaluation Theory and Methodology.
This publication is one of many produced by the Center toward its goals.Iuformation on CSE and its publications may be obtained by writing:
-Office of DisseminationCenter,for the Study of Evaluation'UCLA Gradnate School of EducationLos Angeles, California 90024
PROCEDURES FDR NEEDS-ASSESSVENT
EVPLUATION: A SYMFOSIUM*
by
Stephen P. KleinRalph Hoepfner
Paul A. Bradley & Dale WoolleyJames S. Dyer & Guy P. Strickland
CSE Report No. 67May 1971
Center for the Study of EvaluationUCLA Graduate School of Education
Los Angeles, California
,*This report is based on a symposium delivered at the AmericanEducational Research Association Annual Convention, New York.February, 1971.
TABLE OF CONTENTS
Page
Choosing Needs for Needs AssessmentBy Stephen P. Klein 1
Selecting Tests to Assess the NeedsBy Ralph Hoepfner 9
Mhking Better Decisions on Assessed Needs:Differentiated School Norms
By Paul A. Bradley and Dale Woolley 20
Allocating Resources by Subject AreaBy James S. Dyer and Guy P. Strickland 33
References 52
Appendix: List of 145 Goals of Elementary Education from theCSE Elementary SChool Evaluation KIT
CHOOSING NEEDS FOR NEEDS ASSESSMENT
There are three major reasons for determining educational needs.
The first of these is to ascertain which needs have the highest priority.
Since this information helps to focus the attention of the program planners
on the salient problems, it can be used to facilitate planning decisions
regarding the modification and development of educational programs. Needs
assessment data can thus be used to ensure more efficient utilization and
allocation of personnel time and resources. The second reason for con-
dur-.ting a needs assessment is that it justifies focusing attention on
some needs and not others. Such justification must often be made in pro-
posals and in reports to school boards and parents. Finally, needs
assessment data provides valuable baseline information against whiCh to
assess subsequent chbinges in student performance.
scope and focus of a needs assessment are, of course, determined
by the purposes for which the data will be used. For example, in a school
district a needs assessment might be conducted to determine what goals
the district should focus on in developing new programs. At a different
level, of a particular school, for instance, a needs assessment might be
conducted to determine what objectives its ninth-grade mathematics program
should try to achieve. Although there are a number of important proced-
ural differences between conducting needs assessments at different levels
in the educational system, all needs assessments should include four basic
activities:
1. Listing the full range of possible goals (ar objectives) that
might be involved in the needs assessment.
-2-
2. Determining the relative importance of the goals (or objectives).
3. Assessing the degree to which the important goals (or objectives)
are being achieved by the program (i.e., identifying discre-
pancies between desired and actual performance).
4. Determining which of the discrepancies between present and
desired performance are the ones most important to ,:orrect.
The papers presented in this symposium will discuss these four com-
ponents. This paper will discuss the first two: listing the range of
goals that could be involved in needs assessment and determining their
relative importance.
Preparing Sets of Goals
Many of us have experienced the confusion, frustration, and arguments
-)ciated with trying to construct educational goals and objectives. We
have also learned that it is better to construct these goals in cooperation
with parents, teachers, students, and others, so that in the end the goals
are more readily accepted. The inclusion of such groups, however, almost
always seems to increase the frustration and conflict associated with the
goal construction process. In fact, it often seems that by the time the
goals have been constructed, the energy and rapport that might have been
directed at constructing programs to meet these goals has already been
spent. This situation has led the Center to suggest a somewhat different
approach to goal selection. The first step of this approach is to have a
team of experts construct a set of the full range of goals and objectives
that might be included in a needs assessment. Once this set is prepared,
however, all the people who ought to be involved in goal selection should
be asked to participate. In other words, the strategy we are proposing
6
is to have experts construct the full range of potential goals that might
be included, and then to achieve community, student, and teacher involve-
ment by having these groups participate in the selection of the goals
which will be examined in the needs assessment. The total list of goals
is not, therefore, a prescription as to which goals the school should
try to achieve; rather, it is a varied bill of fare from which the appro-
priate judg.es can pick and choose the goals they feel are the most rele-
vant. Thus, the emphasis is placed ul_in schools selecting which goals
they wish to assess rather than on trying to construct just those in
whiCh they are most interested.
This approaCh of using experts to develop the full range of goals
also speeds up the construction process. It does this by eliminating
many arguments about which goals should or should not be included in a
given school's program since the construction process is now limited to
describing what mieltlae accompliShed by any school as opposed to what
should be achieved by a particular school.
If only one school in the country wanted to conduct a needs assess-
ment, this procedure might not be very efficient. In reality, however,
almost all sChools conduct needs assessments. Thus, it appears that
unless this approach is adopted schools will continue to spend consider-
able time, energy, and funds only to reinvent slightly different wheels.
The Center's Elementary School Evaluation KIT (Hoepfner, Klein,
Bradley, 1970) is a working example of this goal selection approach.
This KIT contains a comprehensive set of 106 goals to help the principle
select those he wishes to assess. (These goal areas are listed in the
Appendix.) The rationale for developing this set was as follows:
-4-
(1) it is a waste of valuable time and resources for every principal to
review the relevant literature and write his own set of goals uhen goals
overlap so much between schools; and (2) a single, comprehensive set
facilitates determining the utility of potential evaluation measures as
well as interpreting the data they provide. The set used in the KIT was
compiled from a wide variety of sources, including curriculum guides
from different parts of the country, recently published elementary school
textbooks, national and statewide evaluation studies, basic research
studies of psychologists and educators, and reports of various researCh
centers and laboratories. As one might anticipate, these important
sources USG different classification systems. The goals of the KIT,
therefore, are not presented in terms of a single theoretical position,
but are organized to permit continued revision and expansion when addi-
tional goals are needed.
In constructing the total set of goals for elementary schools, we
were forced to comprmise with regard to the specificity of the stated
goals. Very specific, operationally stated behavioral objectives have
the advantage of being easily understood, defined, and measured. Many
feel, however, that they tend to be so specific as to limit their use-
fulness. Further, maintaining comprehensiveness at such a level of
precision would result in such an unwieldly list that even to read
through it (let alone make relative value judgments) would be a dis-
couraging prospect and totally unrealistic for regular use in the field.
On the other hand, very generally stated goals are often equivocal
or vague; they lead quite easily to different interpretations among
different individuals. It is difficult, therefore, to find measurement
-5-
instruments to mateh them precisely. For example, no one would deny that
students "should have the abilities and skills necessary to engage in the
process of science," but this is a rather useless statement unless it is
further defined.
These considerations led us to adopt a set of 106 goals at a level
of specificity between behavioral objectives and vague intentions. We
then printed these goals and a brief description of them on cards,
samples of which appear in Figure 1 on the next page.
Goal Selection
Once a comprehensive set of goals has been constructed or obtained
from some other source, the next step is to select those which are most
relevant to the particular school or program. In other words, the total
set of goals is not a dogmatic prescription as to what a school should
try to achieve, but rather a varied listing from Which to select those
of primary concern.
"Who should be involved in tit-as selection process?" is, of course,
a critical question. As noted earlier, there is general agreement that
involving more people in making a decision will enhance the likelihood
of its acceptance. This involvement must be structured, however, in a
way that does not inhibit arriving at a final decision. For example,
although some schools might be successful in holding something like a
town meeting to select goals, others would find this approadh too un-
wieldy.
This situation has led the Center to suggest that schools use packs
of goal cards (perhaps printed on IBM cards for ease in subsequent data
processing) and have each person involved in the selection process go
Figure 1:
Sample Goal Cards
RESEARCH SKILLS
IN SOCIAL STUDIES
Uses reference materials, maps, globes, and encyclope-
dias.
Uses the library, reading,writing, and problem-
solving skills to research and write reports on social
studies topics, issues, problems, current events,
points of view, etc.
DANCE (RHYTHMIC RESPONSE)
Has poise, muscular control, coordination, and rhythm.
Responds to the mood, beat, and rhythm of a selection
through movement.
Expresses himself freely through
movement.
Learns popular and folk dances.
SOCIALIZATION
REBELLIOUSNESS
Has a healthy balance between conformity, acceptance,
obedience, rigidity, and non-conformity, criticism,
and disrespect.
Is open-minded and tolerant to new
ideas, non-conformity in others.
Respects public
and private property, shares, cooperates, is respectful
and courteous.
INDEPENDENT APPLICATION
OF WRITING SKILLS
Appreciates the importance of good grammar to clear
com-
munication.
Appreciates writing as a means of self-
expression, as a creative endeavor, and as an important
means of communication.
Enjoys writing activities.
Finds satisfaction in having written something well.
Takes pride in turning in neat work.
-7-
through his deck and indicate those goals he feels are most important.
In the Elementary School Evaluation KIT, this process is facilitated by
the provision of five envelopes into which the cards may be sorted. These
envelmes are labeled to denote the relative importance of goals. The re-
lative importance ranges from "1. Unimportant, Irrelevant" to "5. Most
Important." Since each person rates his OUT) deck of 106 cards by placing
each card in the appropriate envelope, one can involve as many people in
the selection process as there are decks of cards. In fact, since the
decks are reusable, there is really no limit on the number of raters other
than that imposed by the available time and resources which can be allocated
for collecting and analyzing the data in this step of the needs assessment.
Once the ratings are gathered froM all of the people involved in the
goal selection process, there are a number of ways of organizing and sum-
marizing the results. These methods range from taking the simple average
rating among all the raters to computing weighted averages for subsets
of raters, such as parents and teachers. The particular technique chosen
will, of course, be a function of the number of raters involved, the
political context in which the needs assessment is being done, available
time and facilities, etc. Whatever the technique chosen, however, the
end result should provide a score for each goal in the whole set. This
score indicates that goal's relative priority to the other goals. In short,
the scores reflect the value system of the people who rated the goals.
Summary and Conclusions
This paper has presented a new technique for conducting the initial
steps of a needs assessment. The essence of this technique is that com-
prehensive sets of goals or objectives should be constructed by experts
-8-
who have the time, knowledge, and resources to fully cover the field of
potentially relevant goals and objectives. The second step in this process
is to have appropriate individuals select from an appropriate set those
goals which are most relevant for their particular situation. In other
words, the total set of goals or objectives does not prescribe what a
school or program should do, but rather provides a catalog from whith
selections can be made. The major ad-vantars of this approach at-e that
it can involve many more people than tEe traditional committee aoproadh
of constructing goals and objectives, and it can accomplish this goal
selection task qutcker and at significant:I less cost and frustration.
The other papers presented in this symposium will discuss the actions
to be taken following the identification of important goals.
-9-
SELECTING TESTS TO ASSESS THU NEEDS
Tests and questionnaires are used to gather etivation data because
they generally are the most efficient means for doing so. They provide
more and higher quality information at lower cos do other assess-
ment techniques. The decision as to which test to e is often a di±ficult
one, however, since existing tests and measures difFar. tAL...del:-.An the
quality and quantity of evaluation information they Tyrcv-ide. The problem
is often compounded further by misleading claims of tes7 publt_shers and
by complicated technical manuals.
To minimize the difficulty of selecting tests, it _l_rstv7as necessary
to have independent-test experts evaluate essentially- aLl the existing
published tests for elementary school pupils. The four basic criteria
used in this analysis were as follows:
1. How well does the test measure the educational. goal?-
.2. To what extent is the test appropriate for the students?
3. To what degree can the test be easily utilized in the school?
4. Is tho test sufficiently reliable and refined in measurement?
The complete set of test reviews, organized by grade level and by
objective, is presented in CSE Elementary School Test Evaluations
(Hoepfner, et al., 1970).
In order to appraise equably-the output measures used in elementary
schools today (mostly tests Of achievement and aptitude), a critical
method of test evaluation was developed: Preparatory to the evaluation,
all those tests presently available were locatea and compiled. The tests
were then evaluat d in order to identify and endorse :Lose :lost appro-
priate, effective, and useful in assessing schools or -.,--n-zuderts.
13
Four evaluation criteria comprise the test evaluation system labeled.
the "NffiAN" method, an acronym for the four criteria:
1. Measurement validity,
2. Examinee appropriateness,
3. Administrative usability and
4. Normed technical excellence.
All the output measures prepared.for, or potentially useful for, evalua-
tions within the elementary schools that are generally available to
educators and researchers were evaluated on the above four critical
assessment criteria. Also, each subscale of a measure was evaluated
separately if it had been normed or was recommended for use in decision
making. The four criteria comprising the MEAN system will be described in
this paper.
Measurement Validity
This criterion is essentially a measure of psychological validity.
Empirical measurements of such validities were most desired, but indirect
evidence was also taken into account. In addition, evidence for corre-
lative validity was weighted.
Evaluators were trained to use the Center's list of educational goals
designed to categorize elementary school outputs meaningfully and exhaus-
tively; each test was then judged according to its capacity to assess
the particular goal that was determined most appropriate to it. Decisions
as to which goal was most appropriate to a test were not based merely upon
the goal implied by the test name or on the stated objectives usually
given in the test manual. The evaluators went to che individual items
to determine which goal the plurality of the items reflected. A consensus
among -che evaluators then determined the educational goal 17 which the
test would be evaluated.
This procedure may, of course, unjustly penalize some otherwise ex-
cellent test instruments, particularly those constructed or_ a model of
educational goals that differs substantially from those adopted for the
this test evaluation program. It appeared, however, that such situations
were not common and, in fact, that very few tests of educational output
are based upon any explicit model of education or evaluation. It also
appeared that evaluation could not logically proceed on a global level,
since concepts like "has developed social skills" must be analyzed and
refined into reasonably small units in order for them to have much
meaning in any evaluation program.
Examinee Appropriateness
The second evaluation criterion is designed to assess how appropri-
ate the test is for the students who will be taking it. Concern was
directed toward the appropriateness of the test's comprehension level(s),
the physical format, and the manner in which a student records his
answers.
The test's comprehension level included two aspects: content and
instructions. Evaluation of the appropriateness of test content
centered upon,the difficulty of the semantic or numerical items, and
also upon the relevance or interest-arousing aspects of the items.
Instructions were evaluated on clarity, completeness, and complexity.
The'second major area where appropriateness is felt to be important
is that of test format. The visual principles employed in a test-page
layout were evaluated in terms of effective use of visual principles
and design. The eluators looked for specific format features such as
sufficiency of white space between items, visual coherence of itc,Ani stems
and alternatives, and effective use of color as an aid in separa-ing
items.
In addition to the hole-page format, the evaluation considered
the quality of illustrations and print. Pictorial and geometric item
material was evaluated according to meaningfulness and ease of decoding
for young Children. Evaluations of print were made on the basis of
clarity, size, and type-face, at all times considering the limitations
of the examinees.
The psychometric problem of speededness vs. power of a test mas
also considered in the evaluation of appropriateness. For each scale,
pacing or time limits were judged as to their appropriateness for the
subject matter and for the examinees. In almost all cases, power
(i.e., relatively unhurried conditions) was preferred to speed as an
attribute of tests of educational output.
The last aspect of appropriateness considered was the type of
response recording. The more simple and direct the connections were
between the item stem and the recording of a response, the more credit
was given. Complicated conversions from item stems to alternatives
to unusual or hovel answer sheets were given less credit as being
generally too complicated, especially for the lower grades.
Administrative Usability
After asking questions such as "What will it measure?" and "is it
designed for my students", the next logical question should he concerned
-13-
with how usable the test is in terms of administration, scoring, inter-
pretal-ion, and decision making. These utilization questions comprise
the third evaluation criterion of the MEAN method: Administrative
usability.
It was assumed that for general assessment of educational output,
a test that can be administered to a large group is desirable. Small-
group and individually administered instruments, although having their
unique advantages, were judged to be less efficient for educational
evaluation. It should be noted that all individually administered tests
therefore suffer from this evaluative decision, and consequently their
ratings indicate less usability.
A second variable strongly affecting a test's utility is the train-
ing necessary to administer the test appropriately. Since few schools
have resident psychometrists and district psychometrists generally
focus their attentions on individual student problems, a test has more
utility if it can be administered by the school staff, preferably the
students' teacher. The time necessary for test administration also
affects its utility. Under the assumption that the average class "unit"
of time in elementary schools is about 40 to 45 minutes, tests were
credited if they fit into one such time unit, but were not credited
on this aspect if their lengths necessitated special scheduling.
The utility of a test is further affected by its scoring procedure.
Simple and objective hand or machine scoring of tests was considered
optimal for utility, while difficult and subjective scoring received
respectively less credit. Although the general utility of tests is
not much altered by slight variations in scoring difficulty, it was
-14-
decided that tests scored on a purely subjective basis, i.e., many pro-
jective techniques, should not even be considered as reasonable candidates
for educational evaluation instruments.
From a pragmatic viewpoint, while ease of administration and scoring
are desirable, they are dwarfed by the importance of being able to inter-
pret the scores and then reach a decision. Scores can only be interpreted
normatively through some type of score standardization or conversion. If
the score conversion is to be a trustworthy one, the procedures must be
empirical. The empirical conversions are obtained through normed samples
which have been given the test under standard conditions.
The samples used in test norming were evaluated according to two
criteria: breadth and representativeness. A broad normative sample is
one which ranges over a measured dimension greater than the group to
which the test is directed. One could then know about extreme perfor-
mances, either high or low.
Representativeness of the normative sample is concerned with the
procedures used in obtaining the comparison group. While purely local
tests can be quite adequate measurement devices, the trend in educational
evaluation is not. in that direction. With national questions being asked,
federal support for education and related research being given, and
national problems to be solved, a representative national normative
sample becomes a most desirable quality of educational tests. The
criteria valued for a normative sample were currer,cy, representation of
geographic regions, ages, racial and ethnic origin, population density,
and variety of schools and school districts. It might be important to
note here that few test publishers have done their normative sampling
-15--
very well, and that the technical manuals abound with confusing if not
downright misleading sampling techniques.
After the test has been administered to its normative sample, the
raw scores from that sample are isomorphically mapped into some standard-
ized score conversion system. The normative score conversions were eval-
uated according to three criteria. If the derived scale is common and
generally understood, the test is given credit. If the conversion to the
derived, normed scores is clear, with unaMbiguous tables presented and
described, the test earns credit over those with complicated, multi-stage
conversions. These two aspects of the derived scores determine in part
Who can interpret them. Tests yielding scores interpretable by the school
staff were preferred to those demanding the skills of a psychometrist.
The final practical consideration of a test's usefulness was whether
or not decisions, either individual or group, can be made. Tests which
have manuals describing well both score interpretation and subsequent
decisions that might be made were evaluated as better than those that
have doubtful decision-making utility. The decisions that were considered
ranged from selection of the next textbook for a class to whether or not
the child should be referred to a specialist for remedial instruction or
psychiatric help.
Normed TeChnical Excellence
The last major criterion of the MEAN evaluation procedure, Normed
teChnical excellence, is concerned with the reliability, replicability,
and refinement of measurement of the tests. The standard approadhes to
test reliability are not vitally relevant to tests of educational achieve-
ment, although the underlying concepts of reliability theory are.
-16--
While test-retest reliability, assessing the long-range stability
of a measure (and the examinee), is important for long-range prediction,
the notion of long-term stability of examinees' achievements is dia-
metrically opposed to the goals of education. The fatalistic concept
of relative stability within groups of students, bi.hile not in contra-
diction to educational goals (although perhaps in contradiction to many
educational philosophies), is perhaps the most relevant aspect of
stability.measurement for long-range prediction. In other words,
stability measures are betting on no real change in relative scores
over time.
Internal-consistency reliability estimates indicate how coherently
the test items assess the same dimension(s) of behavior. This type of
reliability also has marginal value in the assessment of educational
achievement, sinc :?. the more internally consistent a test is, the more
coherent, and therefore similar, are the test items. Typically, how-
ever, achievement tests must assess a broad range of specific educa-
tional objectives. It is concluded from the technical manuals of
tests that most test publishers do feel that internal consist6hcy
among test items is desirable. Whether this decision of the publishers
rests upon psychometric judgments or the fact that internal-consistency
reliability estimates can be inflated easily by numerous extraneous
test qualities remains unknown.
A third esti-a:ate of test reliability evaluated is the alternate-
form type, when altornate forms are available. When instructional
treatment effects are studied, alternate forms of a test are particularly
desirable.
-17-
Since all three types of reliability estimates are more or less rele-
vant to questions of educational achievement to an equal degree, they were
all given equal credit as aspects of the NEAN evaluation procedure. This
tactic was necessitated by the fact that selection of any one of the esti-
mates with omission of the remaining two would do violence to the fourth-
cr' terion rating for many of the test instruments.
Closely related to the concept of test reliability is that of repli-
cability of procedures to obtain the achievement scores. If procedures
described in test manuals are complicated, subjective, or based upon
abnormal samples the test is clearly not replicable in its findings
and therefore is less useful for the educator.
The range of coverage is also an important aspect of a test's
teChnical excellence. A restricted range of assessment, i.e., measure-
ment of a narrow band of achievement like the second month of third-
grade geography, limits the test's interpretability. A test which is
appropriate for one level of assessment but can also be applied to
students from one to two years above and below that level has Obvious
advantages since both advanced and retarded students can be compared
with the normative sample.
Related to range of coverage is the refinement of gradation of the
inter-individual comparison scores. Tests yielding scores graduated
into centiles or grade placements were rated as well graduated; deciles,
stanines, and similar scales as poorly graduated or uncommon; pass-fail,
quartiles and novel scales as poorly graduated and uncommon.
The primary concerns of applying the MEAN system were the objec-
tivity and consistency of the evaluations. To maximize both the
-18-
objectivity for any one test evaluator and the consistency with which
several test evaluators would evaluate, specific guidelines for evalua-
tion of each aspect of each criteria and for letter-grade assignment
were developed. These appear in Figure 1.
At least two psychometrically trained educational researchers inde-
pendently evaluated each test or subscale published or normed. Each
measure was independently placed into an educational goal category and
then. rated by the NEAN system. A third and sometimes a fourth trained
researcher then adjudicated the goal assignment and the MEAN ratings.
Figure 1
NEAN TEST EVALUATION FORM
Test Name FormEvaluation Crite.i..
Rater Date
Rating (circle one number in each row)I. Measurement Validities
a. Co ent and Constructnt 0 (only in name) 2 (a few) 4 (some) 6 (fair job)....
8 (best available) 10 (hit nail onthe head) M Total
Gradeb. Concurrent and Predictive 0 (none reported) (very little) 2 (some) 3 (not enough) 4 (considerable) 5 (exhaustive)2. Exarninee Appropriateness
a. Comprehension: contentinappropriate
0doubtful
Ipossibly appropriate
2probably appropriate
3
...exactly right
4instructions 0 I 2 3 4
b. FormatI. Visual principles 0 (complicated) 1 (probably good) L 2 (outstanding aids)2 Quality of illustration (print) 0 (not good) j 1 (helpful)
J2 (excellent)
IE Total13. Time and pacing 0 (bad)
I1 (appropriate for broad range)
Gradec. Recording answers 0 (complicated) 1 (standard) 2 (especially easy)3. Administrative Usability
a. Administration1. Test administration 0 ( individual) 1 (small groups) 2 (large groups)2. Training of administrators 0 (psychometrist) 1 (school staff)3. Administration 0 (43+ minutes) 1 (42 minutes or less)
b. Scoring 0 (subjective) i 1 (difficult) 2 (simple)C. Interpretation
I. Normsa. Norm range 0 (restricted) I (broad)b. Score interpretation 0 (uncommon, abstruse) 1 (common, simple)c. Score conversion 0 (complicated) 1 (s'mple) 2 (clear, tables)d. Norm groups 0 (local, cutdated, or poorly sampled) 1 (national, well sampled) A Total
d. Score Interpreter 0 (psychometrist) 1 (school staff)deGra
1e. Can Decisions Be Made 0 doubtful 1 possible 2 probable 3 yes charts and graphs4. Norrned Technical Excellence
a. Stabilitynot reported or less than .70
0.70 to .80
1
...80 to .00
2.90+
3b. Internal Consistency 0 1 2 3C. Alternate form
d. ReplicabilityLI Total Ie. Range of Coverage 0 no information 1 floor or ceiling reached 2 adequate 3 more than adequate .....!Grade IIf. Scores 0 poorly graduated and uncommon I 1 poorly graduated or uncommon 2 well graduated and standard
-19-
Although the MEAN criteria are relatively complex, within each one
of the four evaluative categories a total letter grade was determined
to reflect the desired assessment aspects in proportion to their de-
sirability. Points were assigned to each aspect of each criterion in
such a way that there would be discrimination for each aspect. The
total letter grade, assigned for each major criterion, only indirectly
reflects the separate-aspect evaluations.
Each measure earned four letter grades by the MEAN system. The
four-letter combination serves as the Center's official evaluation of
the test. Should the goals of the user not coincide with those of
the CSE Elementary School Evaluation Project, then the MEAN evaluations
may be interpreted with different emphasis.
-20-
MAKING BETTER DECISIONS ON ASSESSED NEEDS: DIFFERENTIATED SCHOOL NORMS1
The procedures in a needs assessment evaluation which have already
been described include the selection of educational goals that are to
be evaluated and the selection of the best available instruments to
assess student performance on these goals. After their selection, the
assessment instruments are then administered and scored. The informa-
tion provided by the assessment devices becomes one of the inputs to
the final phase of needs assessment evaluation: the selection of the
one or more educational goal areas in which revisions in the instruc-
tional program will be made so as to improve student performance.
This last phase in the needs assessment evaluation is the critical
one, obviously, as it pinpoints where a school is going to devote some
time, effort and, probably, money to correct a deficiency in its instruc-
tional program. Making a bad decision at this phase would have dire
consequences; expenditures of time, effort, and money in behalf of the
selected goal(s) would be wasted, and another goal which better deserved
attention would have been neglected. It is imperative, therefore, that
a school have the best information possible before it decides which educa-
tional goal to select as the target area for improving student perfor-
mance.
One type of information that is an input to the last phase of a
needs assessment evaluation is the data obtained from the assessment of
student performance. This p.iper is concerned with ways in which this
An expanded version of this paper, including a review of previousefforts to obtain differentiated school norms, will appear in a forth-
coming issue of Evaluation Comment.
-21-
information can be improved.so that it iS maximally Useful to the school
which is involved in a needs assessment evaluation.
What is the outcome when a school assesses student performance with
standardized instruments? First of all, the outcome depends on what the
publisher of the instrument makes available. Since all, or nearly all,
publishers provide tables of norms, a school could prepare a roster of
the raw and scaled scores achieved by every pupil. It is to be under-
stood- that from this point on we are talking about student nerformancz
within a given grade level. At no time are we looking at LIT comparing
the performance of students in different grades. This notizm follows
the comwn practice of interpreting test results relative 7- the grade
level of the student. The most frequently repo7ted scalec :7,:cres are
centiles, grade equivalents, and stanines. An aMbitious --peon could
take this roster and compute averages for each grade level, and if
this information were available for other schools in the district then
he could compare averages across schools. If a battery of tests had
been adminiStered it would then be possible to prepare a profile of
achievement for every pupil. However, very infrequently is there a
person in an elementary school who either has the time or the experience
to undertake such an endeavor.
The 1F.,rger publishing houses make available several services
that aid the school in the interpretation of test results. These
services include providing information similar to that described
above: that is, rosters of scaled scores, 7arious descriptive sta-
tistics for grades, schools, or systems, and individual pupil pro-
files. In addition, a publisher may indicate the procedures to be
followed if a school wished to develop percentile scores for students
-22-
either within a school building or within a school system. However,
most of the information that can now be provided by test publishers
is useful only for evaluating the current status of individual
pupils. That is, the various scaled scores that publishers provide
indicate the goodness of a student's performance relative to the
performance of all students who took the same test.
There are two reasons why the information that is tyDically
available neither is the best information possible nor is maximaily
useful to a school that is engaged in a needs assessment evaluation.
Ore reason is that virtually all currently available test norms are
woil norms; that is, they indicate the relative goodness of an in-
dividual student's raw =:ore. The second reason is that, again,
virtually all test norms are national norms based on samT. es cr..= stu-
dents that are intended to be representative of all students in the
country. Why do these reasons make the typical test norms inappro-
priate for a needs assessment evaluation?
School Norms
In a needs assessment evaluation, the unit being evaluated is
the school, not a single student. One aspect of a needs assessment
evaluation is determining how well a school is producing appropriate
student achievement in the chosen educational goal areas. That is,
once a school has identified the most important educational objec-
tives, it must determine its level of achievement on these educa-
tional objectives. It is not possible to determine the school's
level of achievement from pupil norms, as these norms are inappro-
priate. What is needed instead, for the purpose of a needs assess-
ment evaluation, are norms that would give the relative goodness of
-23-
the school't performance on a standardized test. Percentile norms
could be derived for schools very easily since the process would
be the same as that used in deriving pupil norms. The one neces-
sary change is that the :school's mean raw scores on the test rather
than the pupils raw sccres would_ he used tc cL,Apute percentiles.
One night, at this 3oint, wonder why a straool cannot determine
its level of performancE- by looking up its mean raw score on a
standardized test in a 7Ab1e of pupil percentile norms. It is not
appropriate to do this -:7,ocause a schoc-1 woulc: get an incorrect indf_ca-
tion of its level of pe_formance. The differ7:mce between pupil and
school norms is based 311 the fact that there is less variation in
sthool means than in pupil raw scores. Figures 1 and 2 illustrate
this difference. Figure 1 shows hypothetical normal frequency distri-
butions of pupil raw scores (A) and school scores (B). It is seen
that there is less variation in the school scores than in the pupil
raw scores.
A normal frequency distribution was chosen for convenience only.
No implication is intended that actual pupil or school frequency dis-
tributions have the characteristics of a normal distribution. It is
also a convenience that the means of the distributions are the same.
The standard deviation of the pupil scores is 10 while that of the
sthool scores is 5. No generalization is possible regarding the ratio
of standard deviations of pupil scores and school scores other than
that the former is larger than the latter. Again, the standard devia-
tions of 10 and 5 were chosen for their graphical and conceptual im-
pact.
-24-
Figure 2 shows The cumulative proportions of the frecuency distri-
butions in Figure 1, The curves in this figure can he used to read the
pupil and school percentile scores. Curve A gives pupil percentile
scores; curve B gives school percentile scores. For examDle, if a pu-
pil's score is 24, then-his percentile score is 27. But if a school's
score is also 24, t_aen its percentile score is 11. If ori-3 looks at a
score of 37, however, it is seen that the pupil percentile is 75 and the
school percentile is 92. Thus, looking at a raw score that would fall
below the SOth percentile, a school's percentile score is lower than a
pupil's percentile score, but looking at a raw score that would fall
above the 50th percentile, a school's percentile score is higher than a
pupil's percentile score.
Differentiated Norms
There is yet another way in which the norm tables provided by most
publishers can be less than adequate for needs assessment evaluation.
In most instances, for better or worse, the norm tables are based on a
national sample of schools. Thus, even if a publisher did produce
school percentile norms, a school's performance would be compared to
the performance of all schools in the sample. But what if there were
certain characteristics of schools, characteristics outside the students'
cognitive and affective skills, that were found to be related to their
level of performance? The characteristics of a school that are pertinent
here are often referred to as input variables. An example of an output
variable would be school performance on a standardized achievement test.
The input variables could be such things as the number of volumes in the
2 -8
-25-
sChool 1-lbrary, the ave-Tage expenditure per student, the occupational
level r)f parents, .aLld. e racial mixture of the students. It is not
even :-ecessary tc at-out this situation; the Coleman study,
foT has shown that suCh relationships do exist. Under these
condf.tions, then, the e of national norms can lead to an unfair and
biased comparision if a school is atypical in its characteristics.
Me bias can 1.zwE 1:o both over- and underestimation of a school's
level of performance. This bias can be easily illustrated through
the use of cumulative :Proportion curves. Suppose it were found that
there were three different "types" of schools which had markedly dif-
ferent performance on a 5tandardized test. Figure 3 shows the cumula-
tive proportion curves (A, B, C) for the three hypothetical types of
schools as well as a cumulative proportion curve for all the schools
(D). The three cumulative proportion curves A, B, and C correspond to
normal frequency distributions with means of 20, 30, and 40, respectively.
All have a standard deviation of 4. The cumulative proportion curve D
corresponds to a normal frequency distribution with mean 30 and standard
deviation 8. Again, these means and standard deviations were selected
for convenience and graphical impact. No implication is intended that
these curves represent distinct possibilities. It is not known how
much separation of schools can be achieved, how school scores are distri-
buted, and what the relationship among standard deviations is.
Case 1. Consider the four outcomes indicated in Figure 3. First of
all, suppose a school's score was 14. If that school found its percen-
tile rank from curve D on Figure 3, it would find that its performance
fell at the 2nd percentile. It is possible that a principal confronted
-26-
with this -::es7.1_ :.ould immediately begin a litany of alibis to account
for his schcc low performance. Many of these alibis are likely to
refer to the L to the school, such as a low SES level, a variable
ethnic compos:-poor tax support, low teacher salaries, etc. How-
ever, suppose- curve A in Figure 3 gives the percentile ranks for
schools that '27- imilar to the principal's school in terms of input
variables. Usfr-- this curve (A), the principal would find that his
school's perf7ce fell at the 6th percentile. That is, when compared
to schools tha77 .:ave similar resources, his school did betnr than only
6% of the schocis This result should indicate, rather unequivocally,
to the principal that his school is not doing a good job in producing
student performale in the area that the test measures.
Case 2. Now consider a school whose score was 21. With reference
to curve D, which represents a table of national norms, a score of 21
corresponds to a 7ercentile rank of 13. Again, it is likely that the
principal of thi school would echo the sentiments of the principal whose
sthool's perfc=mance fell at the 2nd percentile. However, when this
principal compo-res his school's score of 21 with similar schools, he
learns that hilz level of performance falls at the 60th percentile, not
the 13th percentile. Needless to say, this principal is not likely to
be disappointed with such an outcome, and may even be mildly pleased
to outrank 60% of the schools.
Case 3. The third outcome indicated in Figure 3 is a score of 38.
The percentile rank of this score is 84, with reference to curve D
(national norms:. A principal who finds that his school falls at the
84th percentil!? ir possibly engage in some strutti,:.g and issue
-27-
proclamations attesting to the superiority of his school. Suppose, however,
that this school is of a type that is characterized by favorable values
on the input variables that are related to student performance. For
example, some of the characteristics of this type might be high SES
level, high teacher salaries, and an all white student body. The
percentile ranks for this type of school are given by curve C in Figure
3. Using this curve, a score of 38 corresponds to a percentile rank
of 31. A markedly different picture of this school now emerges. Rather
than doing a good job with the students, the school appears to be some-
what deficient in producing student output commensurate with its input
characteristics.
Case 4. Lastly, consider a school whose score is 46. On the
national norms this school falls at the 97th percentile, and the
principal of this school would probably be jubilant. Being aware
of the quality of the input Characteristics of his schools, he may
wonder if his school's performance is really as good as it seems.
This principal, at least, is in a good position, because his score
of 46 falls at the 93rd percentile with reference to curve C.
What are the consequences for needs assessment evaluation when
a school uses national norms rather than differentiated school norms?
In virtually all cases a school would have an incorrect indication of
its relative success. In many of these cases, a sChool would reach
the same decision to select a particular goal area for curriculum re-
vision no matter what norms were used. But in some of these instances,
using the national norms rather than differentiated school norms will
lead to one of two types of errors: selecting a goal area for
-28-
curriculum revision that in fact does not need it, and not selecting
a goal area that in fact does need curriculum revision. It is impossible
to predict when these errors will occur because there are inputs other
than test scores to the decision-making process of selecting a goal
area for curriculum revision. (rhis phase of needs assessment evalua-
tion will be discussed in the last paper in this symposium.)
It should be reiterated at this point that the notion of having
different norms for different types of schools is one whose feasibility
depends on finding types of schools that differ in their performance on
standardized tests. This latter point is important, because while it
may be possible to group schools into a small number of categories
based on similarities of input characteristics, it may not be the case
that there are significant differences in level of student performance
If there is no difference between the groups in level of performance,
then there is no need to have three separate norm tables which are essen-
tially identical to each other. It should be remembered, 0-lough, that
the situation is different with regard to the notion of having tables of
school norms as well as tables of pupil norms. This notion is not clnly
very feasible and plausible, it has been done by some publishers.
To summarize, improvements need to be made in the information that
results from an assessment of student performance. It is important
to improve this information becausa it is an input to the last phase of
a needs assessment evaluation: the selection of the one or more educa-
tional goal areas in which revision in the instructional program will be
made so as to improve student performance. Specifically, it was proposed
that improvements can be made by altering the types of norms that
-29-
accompany standardized tests. The two alterations suggested were to
provide school norms as well as pupil norms and to provide, if feasible,
norms for different "types" of sthools as well as national norms.
30
10 20 30
RAW SCORL
FIGURE 1. HYPOTHETICAL FREQUENCY DISTRIBUTIONS OF
40
PUPIL SCORES (A) AND SCHOOL SCORES (B).
50 60
100
92
-31--
20
24
RAW SCORE
FIGURE 2. HYPOTHETICAL CUMULATIVE PROPORTIONS OF
30
37
40 50
PUPIL SCORES (A) AND SCHOOL SCORES (B).
0097
93
84
31
13
-32-
14 21
RAW SCORE
38
FIGURE 3. HYPOTHETICAL CUMULATIVE PROPORTIONS OFSCHOOL SCORES FOR THREE DIFFERENT TYPES OF SCHOOLS(A, B,C) AND FOR ALL SCHOOLS (D).
46
ALLOCATiNG RESOURCES BY SUBJECT AREA
This paper describes a procedure designed to asist elementary
school principals in the process of selecting educational subject areas
which should command their attention, resources, or support. For each
subject area, the model produces an index number which represents the
expected "value" which will accrue to the school from the adoption of
an instructional program appropriate for strengthening that area. Al-
though the procedure will be explained in terms of this specific appli-
cation., the approach proposed in this paper could also be used to
structure similar decision problems at the district or state levels, or
in secondary and pre-school educational systems.
The calculation .of the index number for a particular area depends on
the .following factors: (1) the relative importance of that area; (2) the
"utility," or "value" to the decision maker, of making an improvement in
that area, given the current'level of performance; and (3) the probability
distribution of the results of implementing a particular type of improve-
ment program for that area, given the current level of performance. The
first two factors will be discussed in the first section of this paper.
In the second section the probability of various results will be consid-
ered, and all three will be combined in a formula yielding the desired
index nuMber. The use of these indices as an aid to decision making will
then be explained.
-34-
THE ESTIMATION OF UTILITY
This section describes how the "utility"' (of the decision maker)
for the current state of the system is estimated. In our particular
application, the system is an elementary school, the decision maker is its
principal, and the state of the system is represented by the level of
educational achievement of the school. As we shall see, the process of
estimating utility presupposes the existence of (1) a well-defined
hierarchy of system objectives, and (2) adequate devices for measuring the
degree of achievement of these objectives. We are then left with the
problem of transforming performance measurements into a single (utility)
nuMber; this number reflects the decision maker's "satisfaction" with the
state of the system as represented by these performance measures.
The formal structuring of a decision problem (in our case, the selec-
tion of educational subject areas to be emphasized or strengthened) must
begin with the statement of a general goal -7 perhaps one as vague as
"promote the good life". By asking how a giVen system contributes to the
achievement of this "meta-objective," a hierarchy of primary objectives,
secondary objectives, goals, and subgoals can be identified.
However, the performance of an educational system is not usually
determined by a direct measurement of how well it achieves its primary or
secondary objectives. These objectives are too broad in nature for the
development of valid and reliable measuring instruments. Instead, the
1For a discussion of the meaning of "utility," see the classic workby J. Vbn Neumann and 0. Mbrgenstern (1953) or that of Schlaifer(1959).
-35-
state of the system may be assessed indirectly by measuring its per-
formance in each terminal goal area for which adequate measuring devices
exist. As discussed earlier in this symposium, the School Evaluation
Project has investigated the existence of test instruments for each of
the goals listed in the Appendix. The results of this study indicate
that numerous instruments are available in the skill and cognitive areas,
with their number and validity decreasing in the affective areas. However,
continuing improvement in the number and validity of tests in all areas can
be expected.
The results achieved on the various test instruments are generally
expressed in terms of percentile scores or rankings on a national basis.
Consequently, there is no absolute, invariant scale against which to
measure performance, and we must take these scores to be our "raw"
system performance measurements. It seems reasonable to assume that the
"worth" or "value" (to the principal) of a given percentile score depends
strongly pn both (1) the particular goal area involved and (2) his aspira-
tion level fur that area. Since past achievements of a school depend to
some extent on such exognous input factors as the socioeconomic status
of the parents, the location of the school (urban vs. rural), the region
of the country, etc., one can reasonably expect these factors to influence
the principal's aspiration level (and hence his utility for performance
measurements) in each goal area.2 So that the model will be sensitive to
2This will be discussed in a forthcoming Center Report on the possibleimpact of such factors on the shape of the utility function for a given goalarea.
-36-
these environmental factors, performance data is currently being col-
lected on schools categorized according to such environment-2 chaacter-
istics. (See Elementary School Evaluation KIT: Needs Assessment, Booklet
IV.)
If we accept the (percentile) scores obtained on standardized tests
as adequate measures of the school's erformance in the associated goal
areas, it remains to be determined if these scores can be used directly
in estimating the principal's utility. The contents of the previous
paragraph and the following observations suggest that they cannot; i.e.,
the scores must be transformed before they can be used for that purpose:
1. Given results in a particular goal area and ignoring all other
results, it is clear that a score of 80 may not be considered
twice as'"good" as a score of 40. Also, it may not be true that
a score increase from 40 to 50 has the same "value" as an increase
from 80 to 90.
2. The "worth" of an increase from, say, 40 to 50 percentile points
in two different goal areas may not be the same, (i.e., the principal
may not be indifferent between these two outcomes.)
The three words, "worth," "good," and "value" are often used synony-
mously with the term utility. However, in the last two paragraphs, these
words have been used to express the decision maker's utility for only one
goal area, and not the system as a whole. A key assumption of this model
is that the principal's utility for a set of n percentile scores (one for
each goal area, thus characterizing the state of the educational system)
is simply the sum of his utility3 for each of the individual sccres;
3When these utilities are measured on appropriate scales.
4 0
-37-
therefore, it is evident that our next task is to transform the area
scores into their associated "area utility values."
The above assumption about the decision maker's utility is equiva-
lent to saying that his utility function is additively separable.4
If
we let
n E the number of terminal goal areas;
a- E the (percentile) score obtained on a standardized test appro-i
priate for measuring performance in area i;
f. (-) E the principals' standard5 utility function for area i, i.e.,
ittransformsthescoreforareai(a.) into a nuMber between1
0 and 1 (f.(a.));
E the "weighting factor" for area i. If we requirc that E w. == 1, thisil
weight expresses the relative importance of area i with respect.to the
wholesetofareas.Thenumberw.can also be viewed as the proper
"scaling" factor for the standard utility function fi (-) such that
the principal's utility for the score ai is given by the scaled utility
function fi(ai) = wifi(li); and
f(a1,.. n) E the principal's utility for the set of n scores ai; i=1,---,n;
then the additive utility assumption says that:
(1) f(a1,a2,---,an) = fl(q1) f2(a2)
or equivalently:
(2) f(a1'a2,-Than) wlfl (al) +142f2 (a2) -+Urnfn (an)
4See Appendix II in Amor and Dyer (1970).
-The utility of a score of "0" is 0; the utility of a score of '100"is 1.
-38-
To transfoiia the area score, ai, into its associated area utility value,
f.(a.)' wenowneedtodeterminetheconstantwiandthefunctionf.(-)1 1
1
Thevaluesofthew.'s may be obtained in various ways. A procedure1
which illustrates one particular approach in terms of the 106 goals
listed in the Appendix was described in the first paper of this symposium.
This method allows the principal to gather information from several groups,
including parents and teachers, regarding their priorities for student
achievement in these 106 goal areas. Alternative methods are described in
Fishburn (1967). The function fi(-) can only be approximated through the
analysis of empirical data.6 However, a few statements can be made about
its expected share. It seems reasonable that ai, defined in terms of a
percentile score, will be considered to be of greater value as it increases;
1 2 1 2that is, if a.>a., then f.(a.)>f.(a.). Consequently, we may assume that
1 1 1111the function fi () is monotonically increasing on the closed interval
[0, 100j, where the numbers in that interval refer to percentile scores.
Additionally, there appear to be two particular percentile scores on a
standardized test which serve as aspiration levels for the school prin-
cipal: (1) the national norm (50th percentile score) and (2) the norms
for schools of a particular "type," as characterized by the various
environmental factors discussed earlier. Interviews with principals
and the other individuals associated with elementary school systems
indicated tlyAl an increase (in percentile score) of ajgiven amJunt from
a point below the national norm is considered to be of significantly
greater value than the same_amount of increase from a point above the
(.For a discussion of several alternative approaches and a complete
bibliography, see Fishburn (1967). For a discussion of how this was done
in this particular project, see a forthcoming, Center Report,
-39--
national norm. This suggests that the slope of fi (-) is steeper
at points below the 50th percentile score than at points above this
score. A similar behavior may be "expected" with respect to the
"environmental" norm; however, current data limitations prevent us from
verifying this hypothesis. This prediction of a decreasing slope for
the utility function as the percentile score increases is also consistent
with the "law of diminishing marginal utility" which has been empirically
verified in numerous studies in an economic context. One possible form
of f () is shown in Fig. 1.
EXPECTED CHANGE IN UTILITY ASA DECISION CRITERION
This section describes how estimates of the decision maker's utility
for various performance levels of the system may be combined with estimates
of the effects on system performance of implementing various "programs" to
provide a guide for decisions. In our particular application, the decision
problem is to select educational goal areas which should receive more em-
phasis. The key assumption implicit in this discussion is that the decision
maker prefers actions which maximize his expected utility.
It is reasonable to assume that a particular goal area will be selected
for increased emphasis because (1) there exists an educational program (e.g.,
a new set of workbooks, the Sullivan reading program, a computer-assisted
arithmetic program, etc., which has a "reasonably good chance" of improving
the student's performance in that area, and (2) a "significant" increase kn
the percentile score in that area will result in a "significant" increase
in the decision maker's utility.
(-)
1
.5
-40-
Figure 1 A Possible Form of fi (-)
100 a.
-41-
Because of the interaction among the exogenous input factors, it is
impossible to state that "the adoption of a program in area i will increase
theperformancefroma°to a + d.." Instead, the potential results of
adopting a program in area i should be described by the conditional (or pos-
terior) probability distribution of the scores which would be obtained upon
retesting the students after implementation of the program. Using this dis-
tribution, one could calculate the probability ot achieving a specified result.
Weassumethattherandomvariablea'.(representing the score to be obtained
upon retesting) has a probability density function which depends on 2 factors:
(7) the particular goal area (i), and (2) the current level of achievement
and we denote this conditional probability distribution by g. (a'.1a.o).
For example, the goal area may be "operations with integers" and the current
level of achievement may be "40th percentile ranking." A possible form of
gi (a'i 140) is shown in Figure 2.
cf- (a'. 140) A
40 50 100 a.
Figure 2
A. Possible Form of a Posterior Probability Distribution of Scores
-42-
Unfortunately, it is extremely difficult and costly to obtain objec-
tive and generalizable information which would yield an estimate of this
probability distribution. However, subjective information may be used to
approximate the form of gi (a'i la). Persons who have observed the ijr-
pact of new educational programs may be asked for their estimate of the
probability of achieving specific aange in performance as a reult of
adopting a new program in area i, when the current level of performance is
a-. For example, these "experts" may estimate the probability of an in-
crease of from three to five percentile points in area i, given a and
the adoption of a new program, to be .3. This implies that
oa
(3)
S gi
I a?)da'.
a.+3
We will denote a subjective estimate of the value of the integral of gi (a'i la)
over a particular interval, Ak, 7by Pi (Ak I a7). By obtaining similar
estimates of K non-overlapping intervals which cover the range of a'i, we
can obtain a discrete approximation to the dasired distribution. By
varying the size and number of the intervals we can make the approximation
as "close" to the continuous function as we desire. 8In addition, we
naturally require that E P. (Ak I ai)=1.k=1 1
7Note that the symbol Ak carries information about both the location
and the size of the interval of interest.
8See Schlaifer (1959) for a further discussion of this topic.
The theoretical probability distribution may be used to determine the
expected change in utility, resulting from the implementation of a new program
in area i given that the current level of performance is ai, as follows:
(4) E (ALli)
-43-
100
f-Ca'-)g.(a'. a?)da'. f.(a:jo 9
0iii
where E E expected value operator,
E change in utility in area i,
and the other variables are identical to those defined in the previous section.
Ifthediscreteapproximationtog.(a'.12) is used, expressicn (4) simplifies to1
(5) E(AU i) = r--{E f,(Ak).picAk aT)} fi(q..)]
1 LwherethevalueoffiGAIdmaybeapprmthmtedbythevalueoff.(-) at the
midpoint of the interval Ak.
This number is the product of the measure of relative importance, wi, the
change in utility associated with a given change in performance, fi (Ak)
and the subjective probability of that change, Pi(Ak al). If the probabilities
that the resulting score will fall in each of several disjoint intervals Ak
are
estimated, their product with their associated measures Of change in utility
must be summed. In words, expression (5) for'the expected change in utility 'is
approximately equivalent to
a measure ofthe importanceof area i
the change in utilityassoriated with achange in performancein area i
t e pro ability-1of achicvingthat change
a-See Appendix II of Almr and Dyer (1970) for a theoretical discussion of
the assumptions implicit in this formulation.
47
-44-.
IMPLEMENTING THE DECISIONMODEL
This section will describe how the decision model could be used by an
elementary school principal. This use would be encouraged by the provision of
a series of tables containing the values computed from the expression in brackets
in (S) for each area, i, and for a series of scores, a7. The principal would
merely be required to determine his own measures of the relative importance of
eacharea(thew.'s), and obtain the remainder of the information directly from
the appropriate table.
The computation of the "index numbers" or priority values for educational
subject areas is quite simple and is outlined in a step-by-step procedure below.
Figure 3 contains sample computations of priority values that are i_rovided as
an example.
Step 1. The names of educational =i7..bject or "goal" areas for which priority
values are desired are listed. This list should include all of the goal areas
in which student performance has been assessed with a standardized testing in-
strument. The first column of Figure 3 shows ten goal areas for which priority
values will be computed.
Step 2. For each of the goal areas listed in column 1., the current performance
level (in school or class percentiles) is entered in column 2. For example, in
column 2 of Figure 3 the current performance level for Creativity is the 43rd
percentile.
10The authors wish to thank Dr. Paul Bradley for providing the example.
4
-45-
Step 3. The numerical value of the average rated importance of the goal area is
entered in column 3. These values may be the averages obtained from a collective
viewpoints rating of the goal areas as performed in Booklet II of the Elementary
School Evaluation KIT described earlier, and correspond to the wits. In the
example shown on Figure 3, the average rated importance of Creativity is 3.7,
indicating that Creativity is a goal area which is moderately important.
Step 4. This is the first of the two steps that result in an entry for column 4,
expected increase in utility. The first step is to select one of the s
tables which are provided and in which the unscaled (standard) expected in-
a?"creases in utility (i.e., {E fk)°Pi(Ak1 1
)) are tabulated. Thisk=1
choice is made on the basis of typical level of student performance. The
school type can be determined from a chart similar to-the ore presented in
Figure 4.
Step 5. After the school type is determined, the corresponding table is used.
That is, if a school is a type 1 se.lool with typical achievement in the lower
percentile range, then the principal should consult Table 1 to obtain esti-
mates of unscaled expected increases in utility. The apprOpriate column in
Table 1 is selected to correspond to the current level of student performance.
If, for example, the cur-mt performance is at the 55th percentile, the sixth
column of the table headed 51-60 would be used.
The expected unscaled increases in utility for the particular goal areas
in question are found by searching down the appropliate columns. The compu-
tations in Figure 3 are based on a school whose typical level of student
performance is low, so -chat the values in column 4 are taken fram the figures
presented in Table 1. Since the current level of studem _formance
Creativity is the 43rd -rcenAle, the appropriate column in Tab 3 the
4 0
-46--
Figure 3
Sample computations of priority values
1 2 3 4 6
Goal AreaCurrent
PerformanceLevel (%ile)
AverageRated
Importance
ExpectedIncrease
in Utility
PriorityValue
Rank
Temperament: Social 57
21
43
34
48
22
52
39
17
28
3.2 017 54.4 10
3
9
4
5
6
7
8
1
2
Reasoning 3.3 035 115.5
Creativity 3.7
-3.6
017
026
62.9
Language Construction 93.6
Arithmetic Operations 3.9
2.5
4.4
023 89.7
Health & Safety 032
017
80.0
Reading Comprehension 74.8
Scienl-ific Knowledge 2.8 024 -67.2
History 4 Civics 4.2
4.1
0S7
032
239.4
Sociology 1 '.51.2
ool Type
-47-
Figure 4
Typical Levels of Student Performance
Characteristic
1 A school whose students are Characteristically
poor performers on standardized achievement tests.
Many and/cr most of the students' test scores fall
in the bottom 25 to 30 per cent of a distribution
based on a national sample. That is, the scores
range from the 1st percentile to the 25th or 30th
percentile.
2 A school whose students are characteristicall,
average performers on standardized ach5evement
tests. Many and/or most of the students' test
scores fall in the middle 40 to 50 per cent of
a distribution based on a national sample. That
is, the scores range from the 25th to 30th per-
centile to the 70th or 75th percentile.
3 A school whose students are characteristically
good performers on standardized achievement tests.
Mhny and/or most of the student31 test scores fall
in th top 75 to 3C per cent of a distribution
based on a national sample. That is,the scores range
f- AR the 70th or 75th percentile to the 99th percentile.
-48-
TABLE 1*
Expected increases in utility fo- school type 1
Current Performance Level in Percentiles
Goal Area1-
1011-20
21-
3031-40
41-50
51-60
61-70_
71-80
81-
99
I. Temperament: Personal 091 056 036 025 020 017 011 011 0102. Temperament: Social 091 056 036 025 020 017 014 011 0103. Attitudes 091 056 036 025 020 017 014 011 0104. Needs and InterPst 091 056 036 025 020 017 014 011 010S. Valuing Arts and Crafts 091 056 036 025 020 017 014 011 0106. Producing Arts and Crafts 091 056 03u OZS 020 017 OL4 011 0107. Understanding Arts and Crafts 091 056 036 025 020 017 j14 C11 0108. Reasoning 061 055 035 025 020 016 014 012 0109. Creativity- 076 047 030 022 017 014 012 010 008
10. Memory 068 051 032 0'23 018 015 013 011 00911. Foreign Language Skills 103 064 041 029 020 017 015 012 01012. Foreign Language Assimilation 103 064 041 029 020 017 015 012 01013. Language Construction 094 058 0-,7 026 020 017 015 012 01014. Reference Skills 094 058 037 026 020 017 015 012 01015. Arithmetic Concepts 103 064 041 029 023 019 016 013 01116. Arithmetic.Operations 103 064 041 029 023 019 016 013 01117, Mathematical Applications 103 064 041 029 023 019 016 013 01118. Geometry 103 064 041 029 023 019 016 013 01119. Measurement 103 064 041 029 023 019 016 013 01120. Music Apprecation and Interest 091 056 036 025 020 017 014 011 01021. Music Performance 091 056 036 025 020 017 014 011 C70
Milsic Understanding 091 056 036 025 020 017 014 011 01023. Health and Safety 081 050 032 023 018 015 013 010 00824. Physical Skills 081 050 032 023 018 015 013 010 00825. Sportsmanship 081 050 032 023 018 015 013 010 00826. Physical Education 081 050 032 023 0,18 015 013 010 30827. Oral-Aural Skills 094 058 037 026 021 017 015 012 01028. Word Recognition 094 058 037 026 021 017 015 012 01029. Reading Mechanics 094 058 037 026 021 017 015 012 01030. Reading Comprehension 094 058 037 026 021 017 013 012 01031. Reading Interpretation 094 058 037 026 021 017 015 012 01032. Reading Appreciation and Response 094 058 037 026 021 017 015 012 01033. Religious Knowledge 087 053 034 024 019 016 013 011 00934. Religious Belief 087 053 034 024 019 016 013 011 00935. Scientific Processes 087 054 034 024 019 016 013 011 00936. L-itific Knowledge 087 054 034 024 01',J 01_6 013 011 00937. Scientific Approach 087 054 034 024 019 016 013 011 00938. History and Civics 092 057 036 026 320 017 014 012 01039. Geography 087 053 034 024 019 016 013 rll 00c.)40. Sociology 081 050 032 023 018 015 013 010 (-941. Applicacion of Social Skills 087 053 034 024 019 016 01: 011 009
*Based on hypnthetical data
-49-
one with 41-50 at the top. The expected unsealed increase in utility is
found by going down this column until the goal area Creativity is reached.
This number is 017, and it is seen that this is the number which appears
in column 4 of Figure 3.
Step 6. Now the priority value for a goal area can b computed. It is
obtained by multiplying tao numbers -- the rated importance and the ill,-
scaled expected increase in utility. These two numbers -e found in
columns 3 and 4 of Figure 3 respectively, and the product f the two numbers
is entered in column 5. For the example of Creativity, its priority value
is 3.7 X 017 = 62.9. This number corresponds to the results from expression
(5) in the previous section.
Step 7. This last step is not performed until all the priority values have
been computed. When this Is accomplished, it is time to rank the goal areas
on the basis of their priority values. The goal area with the highest pri-
ority value is given a rank of 1, the next highest is 2, and so cn until
all the goal areas have been ranked. In Figure 3 it is seen that the goal
area of Creativity has a rank of 9, whereas the highest ranked goal area is
History and Civics.
Now that the decision model has been used to compute priority values for
some educational goal areas, and the educational goal areas have been ranked in
terms of priority value, the next, and final, step is to imploment the decision
rule. The decision rule for the principal is elegantly 3implet plan to revise
the instructional program in the goal area that has the highest priority value.
It is clear that some error could incurred by the suggestion of this rule.
Ideally, the principal should estimate his available resources, the resource
requirements cssociated with each program, and solve the classical "knapsack
-50-
problem" by using these priority values in the objective function of an
integer programming formulation. However, this process would require an
expertise wtlich cannot be assumed. Therefore, the simpler "rule of --jAumb"
is suggested with certain caveats.
It is crlite clear in the sample compute.tions of Figure 3 that the
priority value of His.tory and Civics is easily the highest and that there
is nc other goal area which is close to History and Civics in priority
value. This may not happen in all cases. For instance, if History and
Civics were not included in Figure 3, then the highest ranked goal a,-ea
would be Sociology, with a priority value of 131.2. Notice, however, -Aat
the priority value for Reasoning (the third ranked goal area) is 115.5,
and the diff:,,rence between it and Sociology is rather small compared to
the difference in priority values for History and Civics and Sociology.
When such a situation occurs, that is, when there are two or more goal
areas with similar priority values, may be best to suggest the tempor-
ary postponement of the decision to plan to revise the instructional proc- ,
in a particular goal area. In lieu of making a final decision at this
point, the principal shou]d wait until a program evaluation is performed
for each of the two or more goal areas that are similar in priority value.
On the basis of these evaluations, he may then decide which one of the goal
areas will receive .. new instructional program. An alterrative, and typical,
solution would be to decide to plan revisions in the instructional programs
of the two or three goal areas which are similar in rriority value. This
would be ecially deirable if sufficient resources exist.
A further consideration in implementing the decision rule is the
extent to which revising the instructioual prnvam in one goal area will
-51-
detract from achievement in other goal areas. There are nc rules by which
such deleterious side-effects can be detemined, as they are unique to
each school. Suffice it to say that the selection of one goal area does
not imply that lesser efforts should be made in the remaining goal areas.
CONCLUSION
The model which was described in this paper provides the decision maker
(an elementary school principal) with index nuMbers which represent estimates
of the expected changts in his utility for the performance of his school
which would result from Lhe adoption of programs in particular areas. It is
felt that these index numbers would provide valuable information to -tile
docisior maker for dealing with the problem of identifying areas in which
action should be considered, and identifying the types of programs which
would prmide the greatest expected contribution to the achievement of his
instructional goals. The model provides the basis for Booklet V of the
Elementary School Evaluation Kit: Needs Assessment.
-52-
REFERENCES
1. AMOY, J. P., & Dyer, J. S. A Decision Model for Evaluating PotentialChange in Instructional Programs. CSE Report No. 62. Los Angeles:Center for the Study of Evaluation, University of California, 1970.
2. Hoepfn7,-, R., Klein, S. P., & Bradley', P. A. Elementary shool evalua-tion KIT: Needs assessment. Los Angeles: Conter for the Studyof Evaation, UCLA, 1970.
3. Hoepfner, R., Strickland, G., Stangel, Jansen, P., & Patalino, M.CSE elementary sJ:hool test evaluatl, Los Angeles: Centerfor the Study of Evaluation, 1970.
4. Fishburn, P. C. Methods of estimating a6;ditive utilities. ManagementScience, 1967, 13 (7).
S. Schlaifer, R. Probability and statistics for business decisions.New York: MtGraw-Hill, 1959.
6. Von Neumann, J., & Morgenstern, 0. Theory of games and economicbehavior. Princeton, N. J.: Princeton, University Press, 1953.
APPENDIX
OUTLINE OF 145 GOALSof Elementary School Educationfrom the CSE Elementary School
Evaluation KIT
OUTLINE OF 145 GOALS
OF ELEWENEAW SCHOOL EDUCATION
AFFECTIVE
1. TEMPERAMENT: PEP7ONALA. Shyness-BoldnessB. Neuroticism-AdjustmentC. General Activity-Lethargy
2. TEMPERAMENT: SOCIALA. Dependence-IndependenceB. Hostility-FriendlinessC. Socialization-Rebelliousness
3. ATTITUDESA. School OrientationB. Self Esteem
4. NEEDS AND INTERESTSA. Need AchievementB. Interest Areas
ARTS-CRAFTS
5. VALUING ARTS AND CRAFTSA. Appreciation of Arts and CraftsB. Involvement in Arts and Crafts
6. PRODUCING ARTS AND CRAFTSA. Representational Skill in Arts and CraftsB. Expressive Skill in Arts and Crafts
7. UNDERSTANDING ARTS AND CRAFTSA. Arts and Crafts ComprehensionB. Developmental Understanding of Arts and Crafts
COGNITIVE
8. REASONINGA. Classificatory ReasoningB. Relational-Implicational ReasoningC. Systematic ReasoningD. Spatial Reasoning
9. CREATrVITYA. Creative Flexibility
B. Creative Fluency
58
10. MEMORYA. Span and Serial MemoryB. Meaningful MemoryC. Spatial Memory
FOREIGN LANGUAGE
11. FOREIGN LANGUAGE SKILLSA. Reading Comprehension of a Foreign LanguageB. Oral Comprehension of a Foreign LanguageC. Speaking Fluency in a Foreign LanguageD. Writing Fluency in a Foreign Language
12. FOREIGN LANGUAGE ASSIMILATIONA. Cultural Insight through a Foreign LanguageB. Interest in and Application of a Foreign Language
LANGUAGE ARTS
13. ISNGUAGE CONSTRUCTIONA. SpellingB. PunctuationC. CapitalizationD. Grammar and UsageE. PenmanshipF. Written ExpressionG. Independent Application of Writing Skills
14. REFERENCE SKILLSA. Use of Data Sources as Reference SkillsB. Summarizing Information for Reference
MATHEMATICS
15. ARITHMETIC CONCEPTSA. Comprehension of Numbers and Sets in MathematicsB. Comprehension of Positional Notation in MathematicsC. Comprehension of Equations and InequalitiesD. Comprehension of Number Principles
16. ARITHMETIC OPERATIONSA. Operations with IntegersB. Operations with FractionsC. Operations with Decimals and Percents
17. MATHEMATICAL APPLICATIONSA. Mathematical Problem SolvingB. Independent Application of Mathematical Skills
18. GEOMETRYA. Geometric FacilityB. Geonetric Vocabulary
19. MEASUREMENTA. Measurement Reading and MakingB. Statistics
MUSIC
20. MUSIC APPRECIATICN AND INTERESTA. Music AppreciationB. Music Interest and Enjoyment
21. MUSIC PERFORMANCEA. SingingB. Musical Instrument PlayingC. Dance (Rhythmic Response)
ZZ. MUSIC UNDERSTANDINGA. Aural Identification of MusicB. Music Knowledge
PHYSICAL EDUCATION-- HEALTH - SAFETY
23. HEALTH AND SAFETYA. Practicing Health and Safety PrinciplesB. Understanding Health and Safety PrinciplesC. Sex Education
24. PHYSICAL SKILLSA. MUscle Control CPhysical Education)B. PhySical Development and Well-Being (Physical Education)
25. SPORTSMANSHIPA. Group Activi-ty - SportsmanshipB. Interest in and Independent Participation in Sports and Games
26. PHYSICAL EDUCATIONA. Understanding of Rules and Strategies of Sports and GamesB. Knowledge of Physical Education Apparatus and Equipment
READING
27. ORAL-AURAL SKILLSA. Listening Reaction and ResponseB. Speaking
28. WORD RECOGNITIONA. Phonetic Recognition-B. Structural Recognition
29. READING MECHANICSA. Oral ReadingB. Silent Reading Efficiency
30. READING CCMPREBENSIONA. Recognition of Word MeaningsB. Understanding Ideational ComplexesC. Remembering Information Read
31. READING INIERPRETATIONA. Inference Making from Reading SelectionsB. Recognition of Literary DevicesC. Critical Reading
32. READING APPRECIATION AND RESPONSEA. Attitude toward ReadingB. Attitude and Behavior Modification from ReadingC. Familiarity with Standard Children's Literature
RELIGION
33. RELIGIOUS KNOWLEDGE
34. RELIGIOUS BELIEF
SCIENCE
35. SCIENTIFIC PROCESSESA. Observation and Description in ScienceB. Use of Numbers and Measures in ScienceC. Classification and.Generalization in Science
D. Hypothesis Formation in ScienceE. OPerational Definitions in ScienceF. Experimentation in ScienceG. Formulation of Generalized Conclusions in Se ce
36. SCIENTIFIC KNOWLEDGEA. Knowledge of Scientific Facts and TorminoloB. The Nature and Purpose of Science
37. SCIENTIFIC APPRIOACHA. Science Interest and AppreciationB. Application of Scientific Methods to Everyday Life
SOCIAL STUDIES
38. HISTORY AND CIVICSA. Knowledge of HistoryB. Knowledge of Governments
39. GEOGRAPHYA. Knowledge of Physical GeographyB. Knowledge of Socio-Economic Geography
40. SOCIOLOGYA. Cultural KnowledgeB. Social Organization Knowledge
61
41. APPLICATION OF SOCIAL STUDIESA. Research Skills in Social StudiesB. CitizenshipC. Interest in Social Studies
62
This publication is published pursuant to a contractwith the U.S. Office of Education, Department of Health,Education and Welfare. Points of view or opinions stateddo not necessarily represent official U.S.O.E. positionor policy.