
An Assessment Primer:

A Guide to Conducting Assessment Projects

August 2003

Metropolitan Community College

Produced by the Office of Research, Evaluation and Assessment

An Assessment Primer:

A Guide to Conducting Assessment Projects

A Resource Developed for the

District Steering Committee for Institutional Assessment General Education Outcomes Assessment Subcommittee

Subcommittee for Assessment of Occupational Program Outcomes Student Development Assessment Subcommittee

The Metropolitan Community College District

Business & Technology College – Blue River Community College Longview Community College – Maple Woods Community College

Penn Valley Community College – Administrative Center

Produced by the Office of Research, Evaluation and Assessment

August 22, 2003

TABLE OF CONTENTS

INTRODUCTION
    Why is Assessment Important?
    What is Assessment?
    Expectations of an Assessment Program
    Measures of Learning
    Expectations and Demands for Accountability
DEVELOPMENT OF ASSESSMENT PROJECTS
    Spend Time to Review the Literature and Discuss Issues about the Topic
    Develop Question(s) that Form the Basis of Inquiry
    Develop a Research Plan to Obtain the Data
    Develop Answers Based on the Data
METHODOLOGICAL ISSUES
    What to Assess
    Define Components
    Examine Component Intricacies
    Review Measurement Options
    Scales
    Rubrics
    How Many Subjects?
    Sampling
    Random Sampling
    Stratified Sample
    Representative Sample
    Living with Something Called Error
DESIGN ISSUES
    Experimental vs. Control Group
    Collection of Students
    Longitudinal Design
    Mixed Model
METHODOLOGICAL CONSTRUCT
    Written Data
    Objective Data
    Rating Scales
    Survey Data
IDENTIFYING MEANS OF MEASUREMENT
    Fixed Scale Data
    Sliding Scale Data
    Using Proportions
    Norm Referenced Measurements
    Criterion Referenced Measurements
    Qualitative vs. Quantitative Data
LOOKING AT DIFFERENCES
    Significant vs. Meaningful Differences
    Control for Effect
    Pre-Test to Post-Test Differences
    Reliability vs. Validity
IMPLEMENTATION ISSUES
    Establishing Analytical Credibility
    Answer the Question but First Define the Question
    Provide Information According to its Purpose
    Match your Information with its Recipients
    Beware of the Perils of Printout Worship
    Keep Focused on what you are Attempting
    Develop a Research Plan and Follow It
    Take Time to Summarize
    Make an Effort to "Do It Right"
    How Much Data is Enough?
HOW GENERALIZABLE ARE THE SCORES
    Do the Scores Suggest Learning has Occurred?
    Does the Learning Relate to the Larger Question of Assessment?
    Does the Activity Producing the Score Suggest General or Specific Learning?
GATHERING AND MANAGING RESOURCES FOR ASSESSMENT
    Whose Responsibility is Assessment Anyway?
    Obtaining Human Resources for Assessment
    Consultants
    Institutional Employee
    Assessment Committee
    Assessment Consortium
    External Agency
    Avoiding the HINIBUS Syndrome
A COLLECTION OF SPECIFIC ASSESSMENT TECHNIQUES
    Embedded Course Assessment
    Using Norm-Referenced Tests
    Commercially Produced Tests
    Criterion-Referenced Tests
    Portfolio
    Scoring Rubrics
    Classroom Assessment Techniques (CAT)
    Capstone Courses
SUMMARY
REFERENCES

Preface This document was developed to assist MCC personnel engaged in assessment or

research activities. The contents are the result of the author’s consulting with district

faculty, staff and administrators and mirror the questions and concerns voiced by those

persons as they engage in their research/assessment activities.

Any questions or comments regarding this document should be directed to its author:

Dr. Charles L. Van Middlesworth District Director, Research, Evaluation and Assessment

Metropolitan Community College 3200 Broadway

Kansas City, MO 64111

Telephone: (816) 759-1085 Fax: (816) 759-1158

Email: [email protected]

An Assessment Primer

INTRODUCTION

Why is Assessment Important?

Since the late 1980s, much of higher education has been focused, with varying degrees of

success, on assessing what students know. External mandates for institutional

accountability have made colleges and universities shift their focus from teaching to

learning. This is not to say that the assessment of student learning has not been taking

place within the walls of academe; rather, the emphasis has not been on what students

learn and the validation that students learn what we think they learn. Assessment is far

more than external accountability. It is the process of gathering and using information

about student learning to improve the institution, curriculum, pedagogy, faculty and

students.

What is Assessment?

Assessment has been defined as the systematic gathering, interpretation and use of

information about student learning for the purposes of improvement. Assessment has

also been defined as a means for focusing our collective attention, examining our

assumptions and creating a shared academic culture dedicated to continuously improving

the quality of higher learning. Assessment requires making expectations and standards

for quality explicit and public by systematically gathering evidence on how well

performance matches those expectations and standards. It also includes analyzing and

interpreting the evidence of assessment and using the information to document, explain,

and improve performance. Thus, the single unifying notion of this discussion is that

assessment is a process not a product.

Expectations of an Assessment Program

For an assessment program to be considered "effective" it should contain the following

features:


• Structured; that is, it should be organized and have a recognizable

framework;

• Systematic; it is conceived and implemented according to an assessment plan

developed by the institution;

• Ongoing; assessment activities and feedback are continuing rather than

episodic;

• Sustainable; that is, the institution is able to maintain the assessment program

with the structures, processes, and resources in place;

• A Process exists that uses assessment results to improve student learning.

The assessment process will be “framed” through the questions an institution wants to

answer about its teaching and learning. This cannot be emphasized enough – questions

will influence every decision made concerning the assessment process. Many assessment

plan developers attempt shortcuts by beginning with how the data is to be collected,

rather than discussing and questioning what they want to know. I have learned from years of

experience that it is nearly impossible to solve a problem that has not been defined! It is

important to query what students have learned, but it is equally important to provide

students with the option of “reflecting on their learning”.

Assessment can refer to two different activities: the gathering of information and the

utilization of information for improvement. From a practitioner point of view, the most

important description of assessment is simply: 1) what do students know; 2) how do

you know they know; and 3) what do you do with the information? As a term,

assessment data has various meanings within higher education. The term can refer to a

student’s course grade, a college placement test score, or a score on some form of

standardized instrument produced by an external agency or organization. The

information for assessment may be numerical, written reflection on previous work,

results from internally or externally developed instruments or examinations, or

assessments embedded within course work, to name a few. The principal goal of a

program for the assessment of student academic achievement is to improve teaching and

learning with evidence provided from an institution’s assessment program. Assessment


information can stimulate changes in curriculum. A common misconception in higher

education is that assessment consists of administering exams, assigning a course grade,

scoring an assignment with a rubric or having a student demonstrate learning.

Assessment is not an act but rather a process that includes developing a conceptual

framework, identifying learning outcomes, developing a methodology for

implementation, evaluating design issues, administration of the activity, analyzing the

results, with the final step being using the information learned from the process to

improve teaching and learning. This primer is designed to assist the reader as they

become involved with their institution’s assessment program.

Measures of Learning

Data collection methods include paper and pencil testing, essays and writing samples,

portfolio collections of student work, exit interviews, surveys, focused group interviews,

the use of external evaluators, logs and journals, behavioral observations, and many other

research tools. Research methods should be tailored to the type of data to be gathered,

the degree of reliability required, and the most appropriate measurement for the activity

being conducted.

In practical terms, there are two different types of learning measurement: direct

measures of learning and indirect measures of learning. Direct Measures of Learning

include pre- and post-testing; capstone courses; oral examinations; internships; portfolio

assessments; evaluation of capstone projects; theses or dissertations; standardized

national exams; locally developed tests; performance on licensure, certification, or

professional exams; and juried reviews and performances. Indirect Measures of

Learning might include information gathered from alumni, employers, and students;

graduation rates; retention and transfer studies; graduate follow-up studies; success of

students in subsequent institutional settings; and job placement data. The preference for

assessment programs is to use direct measures of learning.


What can institutions of higher education do to prepare themselves to meet the expectations or demands for accountability through the assessment of student learning?

Institutions should create “an assessment-oriented perspective”. “An assessment-oriented

perspective” exists when all levels of the institution become advocates for “doing what is

right” and commit time, energy and resources to seriously examine student learning.

Advocating an emphasis toward student learning enables the institution to place “at the

head of the table” the single most important aspect of higher education, and that is

learning. The role of faculty in this endeavor is paramount. The comment made most

often by faculty is that they are content specialists, not assessment specialists; however, faculty need to

realize it is through their knowledge of the content area that learning questions are posed,

discussed, defined and assessed. One method to use when engaging in the process of

assessment is triage. Triage has been defined as the sorting of and assigning priority

order to projects on the basis of where time, energy and resources are best used or most

needed. In this case, triage refers to identifying a learning need, such as general

education, discussing its attributes, defining its context, developing its component and

learning outcomes and developing assessment strategies to answer those questions. First

and foremost, all should recognize that assessment is research. Assessment is research

because through the multi-stage research process judgments are made, and judgment

translates into meaning.

DEVELOPMENT OF ASSESSMENT PROJECTS

For the last 10 years, institutions of higher education have spent considerable time and

energy refining their “Plan for the Assessment of Student Academic Achievement”.

When an institution’s plan was developed, submitted and accepted, there were probably

some that did not think revisions would be necessary until the next accreditation visit.

This is not true and by now it is widely known that some plans will require significant

changes. A method of reviewing assessment plans and making adjustments that are time-

wise and appropriate is:

1. Spend time to review the literature and discuss issues about the topic


2. Develop question(s) that form the basis of inquiry

3. Develop a research plan to obtain the data

4. Develop answers based on the data.

Spend Time to Review the Literature and Discuss Issues about the Topic

In general terms, the first step to building an effective assessment project is to spend time

to review the literature and discuss issues about the topic. Without question, discussions

regarding learning topics are more productive when efforts are taken to review pertinent

literature. The literature review allows members of the "learning topic group"

to obtain a “theoretical perspective” of the topic as well as examine implementation of

similar projects at like institutions. Having and discussing background information tends

to keep the learning topic “in focus”.

Develop Question(s) that Form the Basis of Inquiry

Once the literature review and subsequent discussion have taken place, the learning topic

group needs to develop a series of questions that will form the basis of inquiry. As

mentioned previously, assessment is research and research is an activity that is employed

to answer questions. During the question development stage steps need to be taken to

insure the question and its components are adequately defined or specified. Poorly

framed questions will generate data of little value.

Develop a Research Plan to Obtain the Data

The third step in this process is to create a research plan that becomes the operational

document for examining the learning topic. A research plan is the methodological

roadmap to successful and useful assessment activities. The development of the research

plan occurs following the first two steps of this process, noted above. Earlier narrative

provides the basis from which the learning topic will be framed, structured and assessed.


Develop Answers Based on the Data

The last step in this process seems logical and/or obvious. In fact, this

aspect of the process can be the most difficult because the data provided might show that

earlier steps were not “fleshed-out” to the extent they should have been. At this stage if

any short cuts or other “less defined” activities were implemented, the data will provide a

clear statement that the process needs to be refined or some aspects of the project need

revisiting.

METHODOLOGICAL ISSUES

What to Assess

Within the context of the aforementioned steps, it is critical that appropriate

methodological decisions be made about the project and its associated activities. Several

issues have been identified to provide a “framework” to assist with the development of

the methodological component of the learning topic project. The first issue is to

determine what to assess. Earlier, a four-step process asked the learning topic group to

review the literature and develop questions that formed the basis of the assessment

activity.

Define Components

In methodological terms, it is time to define the components of the question in

operational terms. “In operational terms” suggests that each component of the learning

topic be defined in a unique and identifiable way.

Examine Component Intricacies

Once the learning topic components are defined, members of the group need to focus on

the measurement possibilities. Examination of component intricacies provides a basis for

determining if the learning topic is to be assessed as a whole and/or through the

individual components. If the “individual component” option is chosen then the

measurement option chosen needs to afford the learning topic group the ability to

individually assess each component as well as its contribution to the whole.


Review Measurement Options

Measurement options for components include the use of fixed or sliding scales or scoring

rubrics. Scales or rubrics become numerical representations of learning activities,

experiences and breadth of learning.

Scales

A scale has been defined as a set of items “arranged in some order of intensity or

importance” (Vogt, 1993). Scales associated with research and assessment activities can

be generally categorized as fixed or sliding. A fixed scale refers to a set of points that

identify a particular social or learning event. The points on a fixed scale are integers, that

is, on a scale of 1 to 7 each case (or person’s) behavior and/or learning is associated with

a component score that is either a 1, 2, 3, 4, 5, 6 or 7. Fixed scale points are defined in terms

of “agreement”, “satisfaction” or other, but the scale definition and value are the same for

each item. Fixed scales are sometimes misconstrued as sliding scales because during the

analysis the mean for an individual item may be 3.7; thus, many think of the scale as

sliding. This is not so, because 3.7 is a summary of item responses rather than an

individual item score. All scores for the item create the item summary mean. A sliding

scale may use the same 7-point scale but a person or case has the option of placing

himself or herself at any point along the scale. Sliding scales are typically used to

determine where a person would place himself or herself in response to a set of questions

or items that pertain to a defined knowledge or opinion set. Although the scale is a 7-

point scale, the scale attributes are different for each item.
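
As a concrete illustration of the distinction drawn above, the short Python sketch below (a hypothetical example with invented responses, not part of the primer) computes an item summary mean from integer fixed-scale responses and contrasts it with sliding-scale responses that may fall anywhere along the same 1-to-7 range.

```python
# Hypothetical illustration: fixed-scale vs. sliding-scale responses on a 1-to-7 scale.

# Fixed scale: each respondent's score for the item is an integer from 1 to 7.
fixed_responses = [4, 5, 7, 3, 6, 5, 4, 7, 2, 5]

# The item summary mean may be fractional (e.g., 4.8), but no individual
# response is; the 4.8 describes the group, not any single respondent.
item_summary_mean = sum(fixed_responses) / len(fixed_responses)
print(f"Fixed-scale item summary mean: {item_summary_mean:.1f}")

# Sliding scale: a respondent may place himself or herself at any point
# along the same 1-to-7 continuum, including fractional values.
sliding_responses = [4.5, 6.2, 3.8, 5.0, 6.9]
print(f"Sliding-scale responses: {sliding_responses}")
```

The point of the sketch is simply that a fractional mean does not turn a fixed scale into a sliding one.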

Rubrics

A rubric, on the other hand, utilizes an attribute scale designed to signify the level or rate

of learning that has taken place. The idea behind a rubric is for members of the learning

topic group to identify competencies or attributes that enable specific distinctions

between performance or knowledge levels for each person completing the assessment

activity. A rubric may not lend itself to utilizing a mean to describe performance or

differences between participant scores. Rubrics are typically discrete scales that identify


participants as achieving a score of 1, 2, 3, 4, 5 or 6, rather than identifying participants

as at the 3.46 level. If the learning topic group wants to utilize a sliding scale, those

discussions need to occur during the project development stage prior to implementation.
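
The sketch below (hypothetical rubric levels and scores, assumed purely for illustration) shows one way to honor the discrete nature of rubric scores: report how many participants reached each level rather than an averaged value such as 3.46.

```python
from collections import Counter

# Hypothetical six-level scoring rubric for a writing sample.
rubric_levels = {
    1: "No evidence of the competency",
    2: "Minimal evidence",
    3: "Developing",
    4: "Competent",
    5: "Proficient",
    6: "Exemplary",
}

# Discrete rubric scores assigned to participants (integers only).
scores = [4, 3, 5, 4, 6, 2, 4, 5, 3, 4]

# Report the distribution of levels instead of a mean.
distribution = Counter(scores)
for level in sorted(rubric_levels):
    print(f"Level {level} ({rubric_levels[level]}): {distribution.get(level, 0)} participants")
```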

How Many Subjects?

One of the most frequently asked questions of methodologists is "how many people do we

need to make this a legitimate study/process?" The answer to this question has

antecedents in the discussion above. The number of students is determined by the nature

of the project and the level of implementation (pilot study or major assessment

component).

Sampling

The first question that should be asked is whether or not all students should receive the

assessment or only a sample. This decision is largely determined by the size of the student

population, availability of human and financial resources, and relevance to the project

intent. Small institutions should carefully scrutinize the use of sampling because of the

small numbers of students that represent the target group. Readers should be cautioned

that not all members of a campus community necessarily endorse the use of sampling.

There are many reasons for the reluctance to accept sampling as a viable component of

the assessment process. This writer believes much of the reluctance stems from a

misunderstanding of sampling, its attributes, and rules regarding the appropriate use of

sampling. When properly applied, sampling is a powerful tool and should be in every

institution’s methodological toolbox.

There are several types of sampling: random, stratified and representative. By far the

easiest sampling technique to implement is the random sample.

Random Sampling

A random sample is determined by “selecting a group of subjects for study from a larger

group (population) so that each individual is chosen entirely by chance” (Vogt, 1993).


The important aspect of the random sample is that every member of the population has an

equal opportunity of being chosen.
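
A minimal sketch of a simple random sample, assuming a hypothetical roster of student IDs, is shown below; Python's standard library draws each member with equal chance.

```python
import random

# Hypothetical population: student ID numbers for the target group.
population = list(range(1000, 1500))  # 500 students

random.seed(42)  # fixed seed so the draw can be reproduced

# Draw 50 students; every member of the population has an equal
# chance of being chosen, which is the defining feature of a random sample.
sample = random.sample(population, k=50)
print(sample[:10])
```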

Stratified Sample

A stratified (random) sample involves selecting a group of subjects from particular

categories (or strata, or layers) within the population (Vogt, 1993). Examples of a

stratified sample are using females, or males 30 years of age or older, or students

completing Composition I, as the population for the study.
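
One common way to operationalize a stratified random sample, sketched below with hypothetical strata, is to group the population by the stratum of interest and then draw randomly within each stratum.

```python
import random
from collections import defaultdict

random.seed(42)

# Hypothetical population: (student_id, stratum) pairs, where the stratum
# here is whether the student has completed Composition I.
population = [(i, "Comp I completed" if i % 3 == 0 else "Comp I not completed")
              for i in range(1, 301)]

# Group students by stratum.
strata = defaultdict(list)
for student_id, stratum in population:
    strata[stratum].append(student_id)

# Draw a random sample of 20 students within each stratum.
stratified_sample = {name: random.sample(ids, k=20) for name, ids in strata.items()}
for name, ids in stratified_sample.items():
    print(name, ids[:5], "...")
```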

Representative Sample

Representative samples are not mentioned in the literature to the extent of random and

stratified samples. A representative sample is the selection of a group of subjects that are

similar to the population from which it was drawn. In the case of a representative sample,

the operative word is similar. In its complete form, similar refers to matching the

characteristics of a subject population in terms of the criteria that identify that population.

For instance, if the learning topic group wants to assess a representative sample of

students at an institution the criteria for determining membership within the sample must

be through “demographic analysis”. Prior to selecting the sample the learning topic

group would identify the attributes of the student population that are determined

“demographic”. In most cases the demographic attributes of a subject population would

be gender, age, racial/ethnic affiliation, marital status, socio-economic level and so forth.

To obtain a representative sample subjects would be selected in the direct proportion of

their membership within the student population. For instance, if 25 percent of the student

population is female, over 25 years of age, white, single and lower middle class, then 25

percent of the study subjects must also meet these criteria. Representative samples are

not used at a frequency as great as random or stratified because of the complex nature of

developing such samples.
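
The sketch below illustrates the proportional idea with hypothetical demographic profiles: the share of each profile in the sample is made to match its share of the population.

```python
import random
from collections import Counter

random.seed(42)

# Hypothetical population: each student is tagged with a demographic profile.
profiles = ["profile_A"] * 250 + ["profile_B"] * 500 + ["profile_C"] * 250
students = [(i, p) for i, p in enumerate(profiles)]

sample_size = 100
population_size = len(students)

# Select subjects from each profile in direct proportion to its share of
# the population (25% A, 50% B, 25% C -> 25, 50, 25 subjects).
representative_sample = []
for profile, count in Counter(profiles).items():
    quota = round(sample_size * count / population_size)
    members = [s for s in students if s[1] == profile]
    representative_sample.extend(random.sample(members, k=quota))

print(Counter(p for _, p in representative_sample))
```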

Living with Something Called “Error”

Aside from the mechanics involved with selecting a sample of subjects is the realization

that every form of design has some sampling error. The typical amount of sampling error


associated with surveys varies from 3 to 5 percent. The amount of error a project is

allowed to tolerate is proportional to the number of persons associated with the project

and the amount of "infrastructure" support that is available.
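
The 3-to-5 percent figure cited above corresponds to the familiar margin of error for a survey proportion; the sketch below (a standard textbook formula, not taken from the primer) shows how that figure shrinks as the sample grows.

```python
import math

def margin_of_error(p: float, n: int, z: float = 1.96) -> float:
    """Approximate 95% margin of error for a sample proportion p with n respondents."""
    return z * math.sqrt(p * (1 - p) / n)

# A proportion near 0.5 gives the widest (most conservative) margin.
for n in (100, 400, 1000):
    moe = margin_of_error(0.5, n)
    print(f"n = {n:4d}: margin of error is about {moe * 100:.1f} percent")
```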

DESIGN ISSUES

There are several types of research designs that lend themselves to assessment projects:

experimental versus control groups, pre- post-test, collection of students, longitudinal or

mixed model. Each design will be briefly discussed.

Experimental vs. Control Group

The experimental versus control group design is one of the most widely known research designs.

The underlying principle of the experimental design is the ability to assess the

intervention(s), or the lack thereof, with a group of subjects. As is commonly known, the

control group does not receive the intervention whereas the experimental group does.

The pre-/post-test design gives campus personnel the opportunity to evaluate the

intervention results by using a test at the beginning of the semester as a comparison with

data acquired at the conclusion of a semester.

Collection of Students

The design using a collection of students involves identifying a unique characteristic,

such as students who have completed Comp I, have completed 50 credit

hours, or have an ACT score of 24 and above. Students meeting the selection

requirements become project subjects.

Longitudinal Design

An increasingly popular design is the longitudinal study. Longitudinal designs involve

identifying subjects on the basis of participation in a particular course, program, and/or

course of study. Data elements collected for the longitudinal design involve those aspects

that directly relate to the learning topic being examined. The length of a longitudinal

study is determined based on the needs of the project.


Mixed Model

On the other hand, the mixed model involves using a combination of both qualitative and

quantitative data. Mixed models are used with many of the designs previously discussed.

Mixed models provide an excellent opportunity to link the qualitative and quantitative

data being collected in this process.

METHODOLOGICAL CONSTRUCT

The term methodological construct may appear to be misleading, but the writer uses the

term to refer to “how the data is collected”. There are many different means for

collecting data: written, objective, rating scales, sliding scale data, proportional data,

norm referenced measurements, and qualitative versus quantitative data.

Written Data

Written data refers to data collected through the use of “controlled prompts”, open-ended

responses to assessment surveys or classroom assignments linked with larger assessment

projects. For written assessment data to be of value, considerable “up-front” time must

be used to develop scoring criteria and appropriate score values. Categorization of

written data, also relevant when using qualitative analysis, requires rigorous training in

order to “norm” responses to scale values. Inter-rater reliability is obtained from making

a concerted effort to insure that multiple readers/raters assign a score to a subject’s

writing that falls within the same scale value or does not vary more than one score value.

Without a control for inter-rater reliability, written assessment data becomes suspect for

use institution-wide.
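
A minimal sketch of the within-one-score-value check described above, using hypothetical scores from two raters, is shown below.

```python
# Hypothetical scores assigned by two raters to the same ten writing samples
# on a 1-to-6 scale.
rater_1 = [4, 3, 5, 2, 4, 6, 3, 5, 4, 2]
rater_2 = [4, 4, 5, 2, 3, 6, 3, 4, 4, 3]

pairs = list(zip(rater_1, rater_2))

# Proportion of samples where raters agree exactly, and where they differ
# by no more than one score value.
exact_agreement = sum(a == b for a, b in pairs) / len(pairs)
within_one = sum(abs(a - b) <= 1 for a, b in pairs) / len(pairs)

print(f"Exact agreement:      {exact_agreement:.0%}")
print(f"Agreement within one: {within_one:.0%}")
```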

Objective Data

Objective data are those data that are presented with a multiple-choice theme, and are

typically used to identify the amount of knowledge possessed. This format is frequently

used in the development of local assessment instruments. These instruments may be

associated with specific courses and/or faculty that are using these data as a link with

larger assessment projects. Prior to using locally developed instruments such as these, it


is necessary to conduct the appropriate reliability and validity checks. It should be noted

that some commercially produced assessment instruments also use the objective format.

Rating Scales

Rating scales refer to the use of specific-point scale values to distinguish subject opinion

or knowledge level. Likert-type scales are “fixed point” scales when subjects complete

the instrument; however, during analysis an item receives a score value that appears to be

sliding or continuous. Readers should note that Likert-type scales are constructed so

subject opinion or perception is determined from the mid-point of the scale (neutral or

undecided), rather than the anchorage points (extreme lower and upper score). Scores

obtained from a rubric need to be recorded as an integer, unless provisions have been

made to account for “sliding” scores. As has been mentioned, significant “up-front”

discussion needs to occur prior to implementation, lest the assessment instrument yield data

values that create more confusion than answers.

Survey Data

Surveys as a means to collect assessment data have as many advocates as they have critics.

The prevailing wisdom is to view survey data as supplemental to other forms of

assessment data. Unfortunately, surveys have received a great deal of criticism that is

unwarranted when viewing surveys as a methodological design. In this writer’s

experience, most surveys developed and administered fail to meet the lowest level of test

for appropriateness. For some persons a survey consists of the best questions they can

think of, typed using the latest word processing "gee whiz" format to make it look

professional. If surveys are to be used as an assessment tool, considerable time and

energy needs to be expended to insure: 1) the information desired can be obtained by

using a survey design, 2) questions/statements have been carefully thought out and clearly

written, 3) the survey has been field-tested on a group of respondents that mirror those for

whom the survey is written, 4) appropriate reliability and validity tests have been

administered, and 5) the data collected can be linked with and support larger sets of

assessment data.


IDENTIFYING MEANS OF MEASUREMENT

As mentioned previously, defining the means of measurement for an assessment project

is one of the most important steps in the process. The following four types of

measurements are possible options for most assessment projects.

Fixed Scale Data

The first type is fixed scale data, or data that have numerical points that are integers.

A discussion topic for this type of scale is whether or not an individual score can have a

partial value; e.g., decimal point. Typically, fixed scale data points should be viewed as

having meaning or value unto themselves; that is, a 1 is a 1, and multiple scores are

summed rather than averaged. This rule is violated frequently.

Sliding Scale Data

The sliding scale is the second type of measurement. As mentioned previously, sliding

scales allow subjects to place their opinion or knowledge level at any point, fraction or

otherwise, along the scale.

Using Proportions

Proportions as a measurement provide a simple way of determining the knowledge or

opinion of a group of subjects. The precision of the proportion is determined by the sum

of what the proportion represents. Many faculty prefer this because of current grading

practices.

Norm Referenced Measurements

Norm referenced measurements represent a score that allows local subjects to be

compared with a larger population regionally or nationally. These scores are derived

externally and are generally very reliable.


Criterion Referenced Measurements

Criterion referenced measurements are questions or items that are written to a specific or

predetermined set of criteria. The criteria become the standard against which students

are measured.

Qualitative vs. Quantitative Data

Quantitative data refers to numerical information that explains events, abilities and/or

behavior that requires precision. Qualitative data refers to observational information such

as color, texture or clarity of some object or place. Qualitative data is desirable when

describing or measuring something that lacks inherent precision. Many times qualitative

data is used to construct instruments from which quantitative data is eventually collected.

Both have a place in assessment program measurement.

LOOKING AT DIFFERENCES

Once a process has been developed, instruments implemented, and data collected,

researchers have the task of examining the results. The research plan provides an outline

of the analytical steps necessary to demonstrate the degree to which learning has

occurred.

Significant vs. Meaningful Differences

One method of determining "learning growth" is the use of significance tests. There are a

variety of tests that can be used and numerous reference materials that explain their

meaning and use. What is of importance to the current topic is the distinction between what is

significant and what is meaningful. Identification of significance is established during

initial discussion about the assessment project and is stated within the research plan.

Significant difference is a difference between two statistics, e.g., means, proportions,

such that the magnitude of the difference is most likely not due to chance alone (Wilson,

1980). Values associated with significant differences are normally thought of as .05

(the probability the result occurred by chance is less than 5 in 100), .01 (less than one chance in 100), or

.001 (occurs by chance less than 1 in 1,000). Many tend to view assessment results

strictly in terms of significant difference and are disappointed if those results are not


obtained. What should be recommended is viewing what researchers categorize as

"meaningful differences". Meaningful differences refer to differences that fall within the

range of .15 to .06 (occurring by chance between 6 and 15 times in 100). The differences are worthy

of note and represent a meaningful change among and/or between subjects. Meaningful

differences are especially noteworthy for assessment projects that examine growth or

change within short periods of time; e.g., one semester, year or two years. Change that

occurs through short periods of time is less dramatic and must be measured with a

"learning ruler" printed with large numbers. Viewing assessment through smaller

increments of change is more realistic and more reflective of what occurs with shorter

intervals of time.
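
As a hedged illustration of the distinction above, the sketch below runs an ordinary two-sample t-test on hypothetical scores and labels the result "significant" at p < .05 or "meaningful" in the roughly .06-to-.15 band described in this primer; the data and the labeling rule are illustrative, not a prescribed procedure.

```python
from scipy import stats

# Hypothetical assessment scores for two groups of students.
group_a = [72, 75, 78, 70, 74, 77, 73, 76, 71, 79]
group_b = [70, 73, 72, 69, 74, 71, 70, 75, 68, 72]

t_stat, p_value = stats.ttest_ind(group_a, group_b)

if p_value < 0.05:
    label = "statistically significant difference"
elif p_value <= 0.15:
    # The primer treats p values in roughly the .06-to-.15 range as "meaningful".
    label = "meaningful, though not statistically significant, difference"
else:
    label = "no notable difference"

print(f"t = {t_stat:.2f}, p = {p_value:.3f}: {label}")
```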

Control for Effect

Another aspect of "looking at differences" is qualifying change by "controlling for

effect". Controlling for effect is an important step in the interpretation of results from an

experimental/control group or evaluation study. "Effect size" is a measure used to see

how much of the standardized mean difference is attributable to the intervention as

compared with differences generally found for this intervention or activity.
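
One widely used way to express the standardized mean difference mentioned above is Cohen's d; the sketch below computes it for hypothetical experimental and control scores using the standard pooled-standard-deviation formula, which is not specific to this primer.

```python
import statistics

def cohens_d(treatment, control):
    """Standardized mean difference using the pooled standard deviation."""
    n1, n2 = len(treatment), len(control)
    s1, s2 = statistics.stdev(treatment), statistics.stdev(control)
    pooled_sd = (((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2)) ** 0.5
    return (statistics.mean(treatment) - statistics.mean(control)) / pooled_sd

# Hypothetical scores: the experimental group received the intervention.
experimental = [78, 82, 75, 80, 85, 79, 81, 77]
control = [74, 76, 72, 75, 78, 73, 77, 71]

print(f"Effect size (Cohen's d): {cohens_d(experimental, control):.2f}")
```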

Pre-Test to Post-Test Differences

Assessment projects look for differences between groups of subjects by testing prior to

the intervention and after the intervention; e.g., pre- and post-testing. The use of pre- and

post-testing can serve an assessment project in several ways: 1) pre-tests provide the

basis from which future tests are compared, 2) the legitimacy of the intervention is

framed from pre- and post-test data; 3) differences between the two test scores form the

basis of reporting change or learning, and 4) change can be identified through differences

in test score means or the growth attributed to the distance between scores.
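
A minimal sketch of a pre-/post-test comparison, using hypothetical paired scores and a paired t-test, is shown below.

```python
from scipy import stats

# Hypothetical pre- and post-test scores for the same ten students.
pre = [62, 58, 70, 65, 60, 68, 55, 63, 59, 66]
post = [68, 63, 74, 70, 64, 72, 61, 69, 62, 71]

# Mean gain from pre-test to post-test.
gains = [b - a for a, b in zip(pre, post)]
mean_gain = sum(gains) / len(gains)

# Paired test: each student serves as his or her own comparison.
t_stat, p_value = stats.ttest_rel(post, pre)

print(f"Mean gain: {mean_gain:.1f} points (t = {t_stat:.2f}, p = {p_value:.3f})")
```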

Reliability vs. Validity

Construct Validity is the degree to which inferences can be made from

operationalizations in a study to the theoretical constructs on which those

operationalizations were based (Vogt, 1993). In other words, if your study is based on a


theoretical notion about some aspect of student learning, it is assumed that as the study

was developed the context of the study is defined operationally (specific aspects of

learning to be studied). At the conclusion of the study if inferences or predictions about

learning from the operational aspects of the assessment can be linked to the theoretical

constructs, then the project has construct validity. On the other hand, content validity

refers to the degree to which items accurately represent the “thing” (behavior, learning,

etc.) being measured. Reliability refers to the consistency or stability of a measure

from one administration to the next. If the results are similar the instrument is said to be

reliable. Two common techniques for estimating reliability are KR-20 and Cronbach's Alpha.

Reliability rates are reported as a two-decimal value, with .60 being considered

minimal for most studies. A reliability of .60 means that 6 out of 10 persons view

attributes of the study similarly.
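
For readers curious how a Cronbach's Alpha figure such as .60 is actually produced, the sketch below computes it from a small hypothetical item-response matrix using the standard formula (respondents in rows, items in columns); the data are invented for illustration.

```python
import statistics

def cronbach_alpha(item_scores):
    """Cronbach's Alpha for a matrix of responses (rows = respondents, columns = items)."""
    k = len(item_scores[0])                       # number of items
    columns = list(zip(*item_scores))             # scores grouped by item
    item_variances = [statistics.variance(col) for col in columns]
    total_scores = [sum(row) for row in item_scores]
    total_variance = statistics.variance(total_scores)
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)

# Hypothetical responses: 6 respondents answering 4 items on a 1-to-5 scale.
responses = [
    [4, 5, 4, 4],
    [3, 3, 2, 3],
    [5, 5, 4, 5],
    [2, 2, 3, 2],
    [4, 4, 5, 4],
    [3, 2, 3, 3],
]

print(f"Cronbach's Alpha: {cronbach_alpha(responses):.2f}")
```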

IMPLEMENTATION ISSUES

Establishing Analytical Credibility

During the analytical phase of an assessment project two questions resonate: 1) does the

design have analytical credibility?; and 2) how is the data (or score) generalizable to the

fundamental questions of assessment? To answer question one, nine points are presented

to provide the framework for determining analytical credibility:

• Answer the Question but First Define the Question

Making the statement to define questions before seeking answers probably seems like

questionable advice. If assessment projects “get into trouble” it typically is a result of

varying from the identified question or focus with an excursion to “it would be nice to

know land”. Defining the question before seeking to answer the question, especially

during the project development stage, allows the linking of literature review, colleague

discussion, question definition and agreement to occur prior to implementation.


• Provide Information According to its Purpose

Research and assessment projects have the potential for voluminous and complex

analytical reports containing facts, figures, data tables and graphs, ad nauseam. A general

rule is to provide the minimum of information necessary to fulfill a request or answer the

assessment question, uncluttered with excessive verbiage and unnecessary analysis.

Without question members of the assessment project group should have detailed

information as well as synthesis about the project in which they are involved. Providing

the “right fit”, regarding the amount of information needed to make judgments about the

project, needs to be part of the routine business discussed at assessment committees or

project teams.

• Match your Information with its Recipients

The discussion of who receives information is as important as purpose and content.

The question of who is to be excluded from the distribution list is as important as that of

who should be included. When communicating the results of assessment it is essential to

build a data portrait of assessment recipients. Most faculty and administrators have

preferences for particular types of data; some prefer data in tabular form, graphs,

narrative, and/or executive summaries. It is clear that most assessment initiatives lack the

resources to produce five different versions of a report to fit the preferences of different

persons. Knowing how strong particular preferences are for information and adjusting

accordingly will enable the assessment “feedback loop” to be more effective.

• Beware of the Perils of Printout Worship

When many persons think of research and assessment projects a picture comes into focus

of several people carefully examining several inches of computer output while engaged in

an intense discussion. While it is true many forms of analysis produce large stacks of

computer output, it would be unwise for any learning topic group to distribute the

computer output without some synthesis. Regardless of the significance of the work at

hand, very few decision-makers, faculty or administrators, will sift through reams of

output to find whether the project produced significant results.


• Keep Focused on what you are Attempting

The origin of this phrase comes from years of experience watching research and

assessment committees lose focus on their project, consequently spending inordinate

amounts of time discussing issues long since agreed upon. When a project team loses focus

on its goals, work stalls and the subsequent inertia provides the catalyst for individuals to

re-engage academic debate that attempts to redefine a project as implementation is taking

place. The best analogy for this is a dog chasing its tail. Assessment projects should

never be implemented as long as there are questions about their intent, theoretical

framework, design or implementation procedures.

• Develop a Research Plan and Follow It

Projects that have a research plan demonstrate to the campus community that thought and

deliberate action were principal agents of its creation. Likewise, following the

established research plan demonstrates professional and methodological credibility.

Research plans are dynamic and they can be changed to adjust to circumstances and

events that necessitate change. It is imperative that if a change in the plan does occur

those changes should reflect sound methodological judgment. Research plans that are

continually in a state of flux provide little data that is worthwhile or effective. The rules

of reliability and validity must always be the plan's fundamental tenets.

• Take Time to Summarize

For research and assessment information to be helpful it is necessary for it to be

summarized in some manner. It would be helpful to the campus audience if the

summary contained narrative as well as tabular data. For instance, if a summary report is

distributed and the principal text is a table highlighting a mean (average) score of 3.543,

several questions need to be answered. First, what does a mean (average) score of 3.543

mean, and from what context does this score emerge? Issues pertaining to the score can

be defined by explaining the “context” for the score from the section in the assessment

project plan that pertains to score range. Second, when viewing the score 3.543, how is

this score different from 3.5 or 3.54? It is important to have an agreed upon style for


reporting assessment data that does not suggest or infer artificial precision. An example

of artificial precision is using percent figures containing decimal points, such as, 86.3

percent. Is it necessary to provide data points that contain two proportions, e.g., 86 with

regard to 100 and .3 with regard to 10?

• Make an Effort to "Do It Right"

No one doubts research and assessment projects take time, financial resources, as well as

physical and emotional energy. If an activity consumes this much energy it would seem

logical for participants to insure assessment meets the “do it right” criterion. “Doing it

right” should take on a practical orientation that is supportive of multi-methodologies and

design philosophy. Assessment projects that are part of an overall institutional

assessment initiative and support the “assessment-oriented perspective” produce

information about learning that assists with the evaluation of curriculum.

Another aspect of “doing it right” is to follow sound methodological practices. Earlier it

was stressed that discussion about the learning topic, literature review and identification

of assessment goals provide the basis for determining the methodologies used. Following

sound or accepted practices sends a message throughout the institution that assessment

projects have credibility. Credibility also occurs when the assessment initiative has

faculty buy-in. If faculty do not legitimize the assessment process, it is unlikely that

anyone else will view the efforts as meaningful. The campus assessment initiative needs

to be inclusive; that is, involve faculty at all levels. Critics as well as

“believers” should have equal seating at the assessment table. It is important for

assessment committees to meet “head-on” issues or questions raised by the critics of

assessment. Excluding critics from assessment discussions only strengthens their resolve

and intensifies attacks. If the issues raised by critics have merit then those issues should

be examined and discussed by the campus community. An action that lessened tension at

my institution was to encourage the district assessment steering committee to develop a

statement on assessment. A statement was developed and endorsed by the faculty senate

at one of their business meetings. The statement on Ethical Conduct and Assessment

provides the institution’s philosophy on assessment as well as an understanding of how


assessment data will be used. Developing and following the research plan, involving

faculty in its development and implementation, making curriculum decisions based on the

assessment data, and providing feedback to all levels of the institution meets the criteria

for an institution “doing it right”.

• How Much Data is Enough? Or When does Baseline Data Cease being Baseline Data?

All assessment initiatives, regardless of institutional size, must make the decision

regarding “how much data is enough”? Prior to answering this question there are several

issues that must be discussed. First, there are statistical concerns that include replication,

sampling error, and methodological viability. The second issue is manageability.

Manageability includes what is practical in terms of human resources, burnout, and

financial costs. As mentioned in the section, “How Many Subjects”, a plan for consistent

assessment of student learning is of greater value to an institution than massive testing

that runs for several semesters, stops and lacks applicability to the overall assessment

initiative.

HOW GENERALIZABLE ARE THE SCORES

Asking the question about how generalizable the scores are to the fundamental questions

of assessment (reference step 2, above) provides the basis for the project’s external

validity. Establishing the relationship between assessment scores and the assessment

instrument answers “to what population(s), settings, treatment variables, and

measurement variables can the effect (outcome) of the activity be generalized to other

learning” (Leedy, 1974:150)? This section poses three questions as a way to explain the

link between generalizable scores and assessment questions.

• Do the Scores Suggest Learning has Occurred?

During the development phase of the assessment program faculty engage in discussions

that establish the theoretical framework of the project as well as defining what constitutes

learning. A majority of the time evidence of learning is attributed to an assessment score.


The score must be viewed within the context in which it was created as well as through the

rationale used to create the meaning of each score value.

• Does the Learning Relate to the Larger Question of [component]

Assessment?

This question seeks to establish whether or not an assessment activity (or score) can be

used to provide evidence of more global learning. For instance, if a group of faculty

develops a project to examine an attribute of critical thinking, will the score obtained

through its assessment provide credible evidence or data that links the project's activity

with the established components for critical thinking institution-wide? Administrators

and faculty need to recognize that assessment projects scattered throughout the institution

with little or no linkage to the overall assessment initiative cannot be classified as an

effective assessment program.

• Does the Activity Producing the Score Suggest General or Specific

Learning?

This question tries to determine whether, when a group of subjects completing an assessment

activity produces a set of scores, the basis for those scores is a series of questions that

produced evidence of general learning or specific learning. For example, when a group

of subjects complete an assessment activity for critical thinking, do the scores reflect the

subject’s general knowledge about critical thinking or can specific evidence be extracted

from the assessment to identify knowledge of deduction, inference, and so forth?

Distinguishing general knowledge from specific component knowledge provides the

basis for what could be termed a “robust” assessment program.

GATHERING AND MANAGING RESOURCES FOR ASSESSMENT

Whose Responsibility is Assessment Anyway?

Several months ago there was considerable discussion on an assessment listserv

regarding the responsibility for assessment. In the course of the dialogue, the tally was

roughly even between faculty having the responsibility and administrators being

responsible. However, most agreed that it is all our responsibility. Faculty should be


responsible for developing the institutional assessment program, with the administrators

responsible for obtaining the financial resources. In addition to financial resources, it is

also the responsibility of the administration to provide technical assistance for the

assessment initiative. It is understood that not all institutions have the ability to support

assessment in a multi-level fashion; that is, provide financial, human and methodological

resources.

Obtaining Human Resources for Assessment

Institutions not able to hire an assessment director or coordinator may have to look at

other options for the technical and human resources needed to support the assessment

program. Several options exist for institutions with limited resources.

Consultants

Institutions can hire a consultant to visit the campus periodically to monitor assessment

progress. Consultants are able to bring a considerable amount of knowledge and

experience to an assessment program in fairly short order. However, the limitation of a

consultant, especially if he or she is not local, is that they are not always available when

assessment questions or problems arise. If a consultant is used, institutions must insure

that the momentum for assessment does not wax and wane with consultant visits.

Institutional Employee

Another option is identifying an existing institutional employee to coordinate the

assessment program. If an institution chooses to use a current employee, he or she should

have several characteristics: 1) must be a faculty member; 2) must have the respect of

his/her colleagues; and 3) he or she should be freed from a majority of their teaching

assignment. The institution should be willing to invest some of its resources to provide

the faculty member with fundamental knowledge about assessment as well as funds to

bring in experts from time to time. Of course, the advantage of using a current employee

is that they are on campus every day. There is one caution about naming an employee as the

“assessment person”. It is too easy for campus personnel to assume the “assessment

person” is responsible for administering, analyzing and writing assessment project


reports. Therefore, it becomes too easy for faculty and others to assume their role is to

respond to the work of others rather than actively engaging in the assessment program.

Assessment Committee

Many institutions create an assessment committee that has representatives from the

faculty, staff and administration to monitor the assessment program. Providing a

“charge” that outlines expectations for the assessment committee insures the work of the

group is meaningful. For instance, the assessment committee is charged with:

• Determining how the assessment program functions;

• Clarifying the role faculty play in its operation;

• Identifying what measures and standards have been proposed and adopted for

assessing student learning; and

• Stating how the results of assessment are used to identify changes that may be

needed if student learning is to improve.

Assessment Consortium

A different option is to network with a group of colleges and establish an "assessment

consortium". A consortium has several advantages in that expenses are shared and a variety

of personnel may be available as an assessment resource rather than a single consultant.

Collective knowledge and collaboration are primary benefits of assessment through a

consortium. A limitation of a consortium is that its strength is only as great as the

commitment of the institutions involved.

External Agency

A fifth option is to have a testing organization/company analyze institutional data. If a

testing organization option is chosen, it may necessitate an institution modifying its

assessment strategies to use commercially produced assessments. Testing organizations

can provide a considerable amount of technical expertise in the field in very short order;

however, affiliation with a “for profit” business may create obligations for using their


products. Technical expertise is available but may be by telephone or email rather than

on-site.

Avoiding the HINIBUS Syndrome

One shortcoming of using persons who are not employees of your institution is that it may result in

a “lukewarm” commitment to assessment because of comments like, “Dr. Smith is not a

member of our faculty”, “it’s not our data”, or “the data are more supportive of the needs

of outsiders than our needs", and so forth. Institutions should be careful not to fall into the

"it's not our data" trap when "outsiders" analyze institutional assessment data.

It is too easy for assessment momentum to be lowered by questions of doubt or

allegations of inappropriate use of institutional assessment data. This is what can be

called the HINIBUS Syndrome, or “Horrible If Not Invented By Us”. Faculties need

to be cautioned that it is not necessary to independently invent all assessment procedures

or activities. Using a combination of locally developed and norm referenced assessments

complements most assessment programs. Readers should note these are issues that need to

be discussed during the steps reported in the section on Development of Assessment

Projects.

A COLLECTION OF SPECIFIC ASSESSMENT TECHNIQUES

Embedded Course Assessment

The term "course embedded assessment" refers to linking classroom activities and

assignments to the assessment of a common learning outcome. The outcome is linked to

what students are already learning in class, thereby taking advantage of existing

[curricular offerings] that instructors collect or by introducing new assessment measures

into courses. To successfully embed assessment measures into existing assignments, the

following sequence of activities is recommended:

• Specify intended outcomes;
• Identify related courses;
• Select measurements and techniques;
• Assign techniques to course and embed measures;
• Specify assessment criteria;


• Evaluate student performance on exams, papers, projects, etc., for course grades;

• Evaluate student performance on course embedded measures.

(Larry H. Kelley, Workshop on Embedded Assessment. Kelley Planning and Educational Services, LLC)

The most commonly used embedded assessment methods involve the gathering of

student data based on questions placed within course assignments. These questions are

intended to assess student outcomes and are incorporated into course assignments or

requirements, such as, final examinations, research reports, course projects or some type

of demonstration. Student responses are typically graded by at least two faculty members

in order to determine whether or not the students are achieving the prescribed learning

goals and objectives. It should be noted that embedded assessment is a different process

from that used by the course instructor to grade the course assignments, exams, or papers.

There are several advantages to using course embedded assessments; they are:

• Student information gathered from embedded assessment draws on accumulated educational experiences and familiarity with specific areas or disciplines.

• Embedded assessment often does not require additional time for data collection, since instruments used to produce student-learning information can be derived from course assignments that are currently part of the requirements.

• The presentation of feedback to faculty and students can occur quickly, creating an environment conducive to ongoing programmatic improvement.

• Course embedded assessment is part of the curricular structure, and students have a tendency to respond seriously to this method.

(Blue Ridge Community College Student Outcomes Assessment Manual: A Guide for the College Community)

Using Norm-referenced Tests

Norm-referenced tests refer to instruments that are designed and administered to large

groups of students. The collective responses of these students represent the learning associated with the student sample and the test, typically summarized as a mean (average) score. After the test has been administered many times, and after each administration has subjected the instrument to rigorous checks of item quality, content validity, and reliability, the test is


considered “normed” and is the reference point for all students taking the test. The

mean scores for students at your campus can then be compared with those of all students who have taken the test. Norm-referenced tests are often developed by testing companies that employ subject-matter "experts" to write the test items. The assumption underlying norm-referenced tests is that the content of the specific test, subtest, or module represents

what all students should know about a given topic. Tests, subtests or modules are

normally categorized or named in general terms, such as: Reading or Reading

Comprehension, Critical Thinking, Scientific Reasoning, and so forth.
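As a concrete illustration of how local results might be set against a published norm, the sketch below (not from the original primer) computes a campus mean and expresses it as a standardized difference from a hypothetical national norm; the local scores and the norm mean and standard deviation are made-up placeholder values that would, in practice, come from the test publisher's norming tables.

# Minimal sketch (illustrative only): comparing a campus mean with a published norm.
# The local scores and the national norm values below are hypothetical placeholders;
# actual norm means and standard deviations come from the test publisher.

from statistics import mean

local_scores = [61, 58, 72, 66, 59, 70, 63, 68, 65, 60]  # scores for a campus sample

NATIONAL_NORM_MEAN = 62.0  # hypothetical published norm mean
NATIONAL_NORM_SD = 8.0     # hypothetical published norm standard deviation

local_mean = mean(local_scores)
# Effect-size style comparison: how far the campus mean sits from the norm,
# expressed in national standard deviation units.
standardized_difference = (local_mean - NATIONAL_NORM_MEAN) / NATIONAL_NORM_SD

print(f"Campus mean: {local_mean:.1f}")
print(f"National norm mean: {NATIONAL_NORM_MEAN:.1f}")
print(f"Standardized difference: {standardized_difference:+.2f} SD units")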

Commercially Produced Tests [similar to norm-referenced]

Commercially produced tests and examinations are used to measure student competencies

under controlled conditions. Tests are typically developed by professional organizations

and companies to determine the level of learning a student should acquire in a specific field of study. Commercially produced tests generally consist of multiple-choice

questions whose results can be used to compare local students with other students from

institutions across the country. If properly chosen, the results from these tests can be

used to improve teaching and learning. The most notable advantages of commercially

produced tests are:

• Institutional comparisons of student learning;
• Little professional time is needed beyond faculty efforts to analyze examination results and develop appropriate curricular changes that address the findings;
• Nationally developed tests are devised by experts in the respective field;
• Tests are typically administered to students in large groups and do not require faculty involvement when students are taking the exam.

The strongest criticism of commercially produced tests is that they may not be reflective of

the institution’s curriculum. Test design and content should be reflective of an

institution’s curriculum in order for the results to be helpful.


Criterion-Referenced Tests

Criterion-referenced tests are designed to measure how well a student has learned a

specific body of knowledge and skills. Multiple-choice tests, similar to a driver's license test, are examples of criterion-referenced tests. Criterion-referenced tests are usually

made to determine whether or not a student has learned the material taught in a specific

course or program. Criterion-referenced tests that are used within a course are designed

to test the information learned from the course as well as the instruction that prepared

students for the test. The principal use of criterion-referenced tests comes from using a

pre- and post-test design to determine how much students know prior to the beginning of

instruction and after it has finished. The test measures specific skills which make up the

designated curriculum. Each skill is expressed as an instructional objective, and each skill

is tested using at least four items in order to obtain an adequate sample of student

performance and to minimize the effect of guessing. The items which test any given skill

are parallel in difficulty. Each student’s score, normally expressed as a percentage, is

compared with a preset standard for acceptable achievement with little regard for the

performance of other students. [Source: The National Center for Fair & Open Testing

and Educational Psychology Interactive].
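To make the scoring logic concrete, the sketch below (an illustrative addition, not drawn from the cited sources) tallies a student's percentage correct on each skill, where each skill is measured by at least four items, and compares the result with a preset standard; the skill names, item results, and 80% cutoff are hypothetical.

# Minimal sketch (illustrative only): criterion-referenced scoring by skill.
# Each skill is measured by at least four items; a student's percentage correct
# on each skill is compared with a preset mastery standard, with no reference
# to how other students performed. All names and numbers are hypothetical.

MASTERY_STANDARD = 0.80  # hypothetical preset standard for acceptable achievement

# 1 = item answered correctly, 0 = answered incorrectly (hypothetical results)
student_items_by_skill = {
    "Identify main idea": [1, 1, 1, 0, 1],
    "Interpret a graph":  [1, 0, 1, 1],
    "Apply a formula":    [0, 1, 0, 1],
}

for skill, items in student_items_by_skill.items():
    proportion_correct = sum(items) / len(items)
    status = "meets standard" if proportion_correct >= MASTERY_STANDARD else "below standard"
    print(f"{skill}: {proportion_correct:.0%} correct ({status})")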

Portfolio

A portfolio is normally considered to be a collection of work that represents an

individual's cumulative work (at a given point in time). In assessment terms, a portfolio represents a collection of student work that exhibits to faculty a student's

progress and achievement in specified areas. For instance, a student's portfolio could include written papers (term papers, reports, etc.) accompanied by a reflective

piece written by the student; results from a comprehensive or capstone examination;

norm-referenced exam results, such as the WGCTA, Cornell X, CAAP-Critical Thinking, or CollegeBASE. If the student is in a vocational program, the portfolio may consist of pieces

of machinery a student designed and built; a collection of computer programs; field

reports about specific activities or procedures; or a set of drawings that demonstrate

student knowledge.


A portfolio can be collected over the student’s experience at the institution, e.g., one year,

several semesters, etc., so faculty can evaluate the full scope of a student’s work. In

particular, the longitudinal aspect of evaluating portfolios allows faculty to "see" the

academic growth of students as they progress through the institution. Central to the

evaluation of a student portfolio is a scale or rubric from which to grade the material or

artifacts. The criteria for grading the portfolio need to be in place prior to the formal

evaluation of a student's material or artifacts. The proliferation of modern technology has provided new ways to store written and visual information, such as on a disk,

CD or webpage.

Scoring Rubrics

Rubrics are a popular means for assessing student learning because, with proper training,

they can be a consistently reliable means to assess essays, papers, projects, and

performances. A rubric contains descriptions of the traits a piece of student work must exhibit to be considered "high-level", "acceptable", or "poor quality". A rubric can contain several categories of student ability, such as comprehensibility, usage, risk taking, and variety, to

name a few. Within each rubric category (e.g., risk taking) there are multiple “levels” of

learning a student can display. An analytic rubric measures each part of a student’s work

separately, whereas a holistic rubric combines the parts into a single overall judgment.
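The difference between analytic and holistic scoring can be illustrated with a small sketch (not part of the original primer): an analytic approach reports a separate score for each rubric category, while a holistic approach collapses them into a single rating. The category names, the 1-4 scale, and the equal weighting below are hypothetical choices.

# Minimal sketch (illustrative only): analytic vs. holistic rubric scoring.
# Category names, scores (on a 1-4 scale), and equal weighting are hypothetical.

analytic_scores = {
    "comprehensibility": 3,
    "usage": 4,
    "risk taking": 2,
    "variety": 3,
}

# Analytic rubric: each part of the work is reported separately.
for category, score in analytic_scores.items():
    print(f"{category}: {score}/4")

# Holistic rubric: the parts are combined into one overall judgment
# (here, a simple average rounded to the nearest whole level).
holistic_score = round(sum(analytic_scores.values()) / len(analytic_scores))
print(f"Holistic score: {holistic_score}/4")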

Classroom Assessment Techniques (CAT)

Classroom Assessment Techniques (CAT) are the means by which faculty obtain useful

information, or feedback on what, how much and how well their students are learning.

The feedback can be as simple as: 1) list the most important thing you learned today; 2)

what is it that you are most confused about; or 3) what additional information would you

like to know about what was discussed today? This process was popularized by Angelo

and Cross in their work Classroom Assessment Techniques, published by Jossey-Bass in

1993.
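As a simple illustration of turning such quick feedback into something usable (an added sketch, not drawn from Angelo and Cross's text), the responses to a "muddiest point" prompt can be tallied to see which topics confuse the most students; the responses below are hypothetical.

# Minimal sketch (illustrative only): tallying "muddiest point" responses from a class.
# The responses are hypothetical; in practice they might be collected on index cards
# or through a course management system at the end of a class session.

from collections import Counter

responses = [
    "sampling error",
    "reliability vs. validity",
    "sampling error",
    "stratified sampling",
    "sampling error",
    "reliability vs. validity",
]

tally = Counter(responses)
for topic, count in tally.most_common():
    print(f"{topic}: {count} student(s)")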


Capstone Courses

Capstone courses are designed to integrate knowledge, concepts, and skills associated

with a complete sequence of study in a program. The method of assessment is to use the

courses themselves as the instrument and basis for assessing teaching and student

learning. The evaluation of a student’s work in capstone courses is used as a means of

assessing student outcomes. The capstone course becomes the forum in which a student displays the knowledge gained through various aspects of his or her programmatic experiences. It varies from program to program whether a single capstone course or several are necessary to adequately assess a student's learning. Generally

speaking, capstone courses are the final experiences for students within a discipline or

program.

SUMMARY

The purpose of this paper was to provide an overview of practical approaches to

conducting learning assessment in a campus community. The comments and suggestions

are derived from years of working with campus communities as they attempt to put their

“assessment house in order”. As mentioned previously, encouraging an institution to

create “an assessment-oriented perspective” is the first step in creating a campus climate

that is assessment friendly. The majority of comments made within this paper can be

summarized as a "back-to-the-basics" approach to fostering the assessment initiative and

developing assessment projects. Hopefully the information contained within this paper

will provide readers with suggestions or techniques that will enable their experience with

assessment to be more rewarding, exciting and productive.


REFERENCES

1993 Angelo, Thomas A., and Cross, K. Patricia. Classroom Assessment Techniques: A Handbook for College Teachers. Jossey-Bass Publishers: San Francisco.

1987 Kraemer, Helena Chmura, and Thiemann, Sue. How Many Subjects? Sage Publications: Newbury Park, CA.

1991 Miller, Delbert C. Handbook of Research Design and Social Measurement. Sage Publications: Newbury Park, CA.

1983 Norris, Donald M. "Triage and the Art of Institutional Research." The AIR Professional File, Number 16, Spring-Summer 1983. The Association for Institutional Research.

1993 Vogt, W. Paul. Dictionary of Statistics and Methodology: A Nontechnical Guide for the Social Sciences. Sage Publications: Newbury Park, CA.

1980 Wilson, Terry C. Researcher's Guide to Statistics: Glossary and Decision Map. University Press of America: Lanham, MD.
