Examining Rubric Design and Inter-rater Reliability: A Fun Grading Project

Transcript of the presentation given at the Third Annual Association for the Assessment of Learning in Higher Education (AALHE) Conference, Lexington, Kentucky, June 3, 2013.

Page 1

Examining Rubric Design and Inter-rater Reliability: a Fun Grading Project

Presented at the Third Annual Association for the Assessment of Learning in Higher Education

(AALHE) Conference, Lexington, Kentucky, June 3, 2013

Dr. Yan Zhang Cooksey, University of Maryland University College

Page 2

Outline of Today’s Presentation

• Background and purposes of the full-day grading project

• Procedural methods of the project

• Results and decisions informed by the assessment findings

• Lessons learned through the process

Page 3

Purposes of the Full-day Grading Project

• To simplify the current assessment process

• To validate the newly developed common rubric measuring four core student learning areas (written communication, critical thinking, technology fluency, and information literacy)

Page 4

UMUC Graduate School Previous Assessment Model: 3-3-3 Model

Page 5

Previous Assessment Model: 3-3-3 Model (Cont.)

Page 6

Previous Assessment Model: 3-3-3 Model (Cont.)

Strengths:

• Tested rubrics

• Reasonable collection points

• Larger samples (more data for analysis)

Weaknesses:

• Added faculty workload

• Lack of consistency in assignments

• Variability in applying scoring rubrics

Page 7

C2 Model: Common Activity & Combined Rubric

Page 8

Comparing the 3-3-3 Model to the New C2 Model

Current 3-3-3 Model:

• Multiple rubrics: one for each of the 4 SLEs

• Multiple assignments across the graduate school

• One to multiple courses for the 4 SLEs

• Multiple raters for the same assignment/course

• Untrained raters

Combined Activity/Rubric (C2) Model:

• Single rubric for all 4 SLEs

• Single assignment across the graduate school

• Single course for the 4 SLEs

• Same raters for each assignment/course

• Trained raters

Page 9

Purposes of the Full-day Grading Project

• To simplify the current assessment process

• To validate the newly developed common rubric measuring four core student learning areas (written communication, critical thinking, technology fluency, and information literacy)

Page 10

Procedural Methods of the Grading Project

• Data source

• Rubric

• Experimental design for data collection

• Inter-rater reliability

Page 11

Procedural Methods of the Grading Project (Cont.)

• Data source: 121 student papers (redacted)

Course name    # of Papers
BTMN9040       27
BTMN9041       29
BTMN9080       7
DETC630        9
MSAF670        20
MSAS670        13
TMAN680        16
Total          121

Page 12

Procedural Methods of the Grading Project (Cont.)

• Common assignment

• Rubric (rubric design and refinement)

• 18 raters (faculty members)

Page 13

Procedural Methods of the Grading Project (Cont.)

• Experimental design for data collection:
  - Randomized trial (Groups A and B)
  - Raters' norming and training
  - Grading instruction
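The slides do not specify how papers were randomized, so the following is only a minimal illustrative sketch of a reproducible two-group split; the function name, the paper IDs, the 50/50 ratio, and the fixed seed are all assumptions, not details from the project.

```python
import random

def assign_groups(paper_ids, seed=2013):
    """Randomly split redacted papers into experiment and control groups.

    Hypothetical helper: the slides do not give the unit of randomization
    or the split ratio, so both are assumed here.
    """
    rng = random.Random(seed)      # fixed seed so the split is reproducible
    shuffled = list(paper_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"Group A (experiment)": shuffled[:half],
            "Group B (control)": shuffled[half:]}

# e.g., the 121 redacted papers from the data-source slide
groups = assign_groups([f"paper_{i:03d}" for i in range(1, 122)])
```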

Page 14

Procedural Methods of the Grading Project (Cont.)

• Inter-rater reliability (literature): Stemler (2004) notes that in any situation involving judges (raters), the degree of inter-rater reliability is worth investigating, because inter-rater reliability has significant implications for the validity of the subsequent study results.

Intraclass Correlation Coefficients (ICC) were used in this study.
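For readers who want to reproduce the statistic, here is a minimal Python sketch of the one-way random-effects ICC from Shrout & Fleiss (1979), the model named on the results slides; the function name and the example scores are our own illustrative assumptions.

```python
import numpy as np

def icc_oneway(ratings):
    """One-way random-effects ICC (Shrout & Fleiss, 1979, Case 1).

    ratings: (n_targets, k_raters) array of scores.
    Returns (single-measures ICC(1,1), average-measures ICC(1,k)).
    """
    n, k = ratings.shape
    grand_mean = ratings.mean()
    target_means = ratings.mean(axis=1)

    # One-way ANOVA mean squares: between targets vs. within targets.
    ms_between = k * ((target_means - grand_mean) ** 2).sum() / (n - 1)
    ms_within = ((ratings - target_means[:, None]) ** 2).sum() / (n * (k - 1))

    icc_single = (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)
    icc_average = (ms_between - ms_within) / ms_between
    return icc_single, icc_average

# Made-up example: five papers, each scored by two raters.
ratings = np.array([[3, 4], [2, 2], [4, 5], [1, 2], [3, 3]], dtype=float)
single, average = icc_oneway(ratings)
```

Single measures estimates the reliability of one rater's scores; average measures estimates the reliability of the mean across the k raters, which is why the average-measures values on the later slides are higher.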

Page 15

Results and Findings

• Two-sample t-test: Group Statistics (Differ_Rater1and2)

Group                         N     Mean   Std. Deviation   Std. Error Mean
Group A (Experiment Group)    483   .249   1.0860           .0494
Group B (Control Group)       540   .024   1.2463           .0536

Page 16

Results and Findings (Cont.)

Independent Samples Test (Differ_Rater1and2)

Levene's Test for Equality of Variances: F = 11.311, Sig. = .001

t-test for Equality of Means:

                              t      df        Sig. (2-tailed)  Mean Diff.  Std. Error Diff.  95% CI of the Diff. (Lower, Upper)
Equal variances assumed       3.056  1021      .002             .2246       .0735             .0804, .3688
Equal variances not assumed   3.080  1020.315  .002             .2246       .0729             .0815, .3677
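A hedged sketch of how these statistics could be reproduced with SciPy; the synthetic arrays below are stand-ins (their sizes, means, and SDs mimic the Group Statistics slide), not the project's actual rater-difference scores.

```python
import numpy as np
from scipy import stats

# Illustrative stand-ins for the per-paper rater-difference scores.
rng = np.random.default_rng(0)
diff_a = rng.normal(0.249, 1.0860, size=483)   # Group A (experiment)
diff_b = rng.normal(0.024, 1.2463, size=540)   # Group B (control)

# Levene's test for equality of variances (mean-centered).
levene_f, levene_p = stats.levene(diff_a, diff_b, center="mean")

# Pooled-variance t-test ("equal variances assumed") and Welch's
# t-test ("equal variances not assumed"); the significant Levene
# result on the slide (F = 11.311, p = .001) favors the Welch row.
pooled = stats.ttest_ind(diff_a, diff_b, equal_var=True)
welch = stats.ttest_ind(diff_a, diff_b, equal_var=False)
```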

Page 17

Results and Findings (Cont.)

• Inter-rater reliability: Intraclass Correlation Coefficients (ICC)

Overall Intraclass Correlation Coefficient

                    Group A   Group B
Single Measures     .288      .132
Average Measures    .447      .233

One-way random effects model where people effects are random.
Group A = Experiment Group; Group B = Control Group.

Page 18

Results and Findings (Cont.)

• Intraclass Correlation Coefficient by Criterion (Group A, Average Measures)

Criterion                              Area     Intraclass Correlation
1. Conceptualization/Content/Ideas     [THIN]   .461
2. Analysis/Evaluation                 [THIN]   .372
3. Synthesis/Support                   [THIN]   .459
4. Conclusion/Implications             [THIN]   .163
5. Selection/Retrieval                 [INFO]   .461
6. Organization                        [COMM]   .532
7. Writing Mechanics                   [COMM]   .648
8. APA Compliance                      [COMM]   .450
9. Technology Application              [TECH]   .303
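Per-criterion values like these can be produced by applying the same one-way ICC to each rubric row separately; here is a short sketch that reuses the icc_oneway helper from the earlier block (the 3-D array layout is an assumption for illustration).

```python
import numpy as np

def icc_by_criterion(scores):
    """Average-measures ICC(1,k) for each rubric criterion.

    scores: (n_papers, k_raters, n_criteria) array -- an assumed layout.
    Reuses icc_oneway() from the earlier sketch.
    """
    return [icc_oneway(scores[:, :, c])[1] for c in range(scores.shape[2])]
```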

Page 19

Results and Findings (Cont.)

• Inter-Item Correlation for Group A

Reliability Statistics (Group A, Experiment)

Cronbach's Alpha                                .895
Cronbach's Alpha Based on Standardized Items    .900
N of Items                                      9
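Cronbach's alpha is simple to compute directly from the criterion-score matrix; a minimal sketch of the standard formula (the function name is ours):

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for an (n_papers, k_items) score matrix.

    alpha = k/(k-1) * (1 - sum of item variances / variance of total score)
    """
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1).sum()
    total_variance = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_variances / total_variance)
```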

Page 20

Results and Findings (Cont.)

Inter-Item Correlation Matrix (Group A, Experiment; C1-C9 = Criteria 1-9)

             C1     C2     C3     C4     C5     C6     C7     C8     C9
C1 [THIN]  1.000   .707   .575   .811   .296   .687   .518   .319   .397
C2 [THIN]   .707  1.000   .868   .788   .198   .788   .478   .325   .403
C3 [THIN]   .575   .868  1.000   .743   .344   .843   .494   .541   .424
C4 [THIN]   .811   .788   .743  1.000   .314   .820   .500   .344   .379
C5 [INFO]   .296   .198   .344   .314  1.000   .301   .444   .523   .241
C6 [COMM]   .687   .788   .843   .820   .301  1.000   .540   .555   .428
C7 [COMM]   .518   .478   .494   .500   .444   .540  1.000   .510   .081
C8 [COMM]   .319   .325   .541   .344   .523   .555   .510  1.000   .445
C9 [TECH]   .397   .403   .424   .379   .241   .428   .081   .445  1.000
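The matrix above is the pairwise Pearson correlation of the nine criterion scores; given the same score matrix, NumPy reproduces the layout in one call (the data below are random stand-ins, not the study's scores).

```python
import numpy as np

rng = np.random.default_rng(0)
items = rng.normal(size=(121, 9))   # stand-in for the Group A criterion scores

# 9 x 9 inter-item correlation matrix, same layout as the slide above.
corr_matrix = np.corrcoef(items, rowvar=False)
```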

Page 21

Lessons Learned through the Process

• Get faculty excited about assessment!

• Strategies to improve inter-rater agreement:
  - More training
  - Clear rubric criteria
  - Map assignment instructions to rubric criteria

• Make decisions based on the assessment results:
  - Further refined the rubric and the common assessment activity

Page 22

Resources

• McGraw, K. O., & Wong, S. P. (1996). Forming inferences about some intraclass correlation coefficients. Psychological Methods, 1(1), 30-46 (Correction, 1(1), 390).

• Nunnally, J. (1978). Psychometric theory (2nd ed.). New York: McGraw-Hill.

• Stemler, S. E. (2004). A comparison of consensus, consistency, and measurement approaches to estimating interrater reliability. Practical Assessment, Research & Evaluation, 9(4). Retrieved from http://pareonline.net/getvn.asp?v=9&n=4

• Shrout, P. E., & Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86(2), 420-428. Retrieved from http://www.hongik.edu/~ym480/Shrout-Fleiss-ICC.pdf

Page 23

Stay Connected…

• Dr. Yan Zhang Cooksey
  Director for Outcomes Assessment
  The Graduate School, University of Maryland University College
  Email: [email protected]
  http://assessment-matters.weebly.com