Evaluation of Admission Process: Written Communication
Kate Kaiser, BS, PA-S
Christine Reichart, BS, PA-S
Jennifer Snyder, MPAS, PA-C
Larry Vandermolen, BS, MM, PA-S
Jennifer Zorn, MS, PA-C
Today’s Agenda
Cognitive and non-cognitive factors in the admission process
Review current admission process
Introduce automated essay scoring
Current findings utilizing an automatic essay scoring system
Our ongoing research
Recommendations for improvement to the current admission process
Cognitive and Non-Cognitive Factors in the Admission Process
Cognitive or quantitative variables, such as pre-professional grade point average (GPA) and standardized test scores, are known predictors of success for applicants seeking admission into, and graduating from, many healthcare programs
Non-cognitive abilities, such as oral and written communication skills, are less consistent in predicting success
Notwithstanding, many admission committees believe that both cognitive and non-cognitive factors are important
Our Current Admission Process
Individuals are required to submit an application to the Central Application Service for Physician Assistants (CASPA)
This includes:
Candidate’s demographics
Academic record
Experience in healthcare
Personal essay
Today, our focus is on the personal essay and the program’s most recent admission process
Admission Process: The Personal Essay
Personal Essay Evaluation
A pool of community physician assistants was recruited to participate in the review and evaluation of the candidates’ CASPA essays
Two physician assistants evaluate each essay utilizing a program-developed Likert scale rubric
Personal Essay Evaluation
The program’s idiosyncratic rubric defines a basis for scoring the essay in three categories:
1. Spelling and grammar
2. Organization and readability
3. The ability of the applicant to answer the CASPA essay topic, “describe the motivation towards becoming a PA”
The PA evaluators independently assign scores to the three categories; the scores are then totaled, and the two evaluators’ totals are averaged
The Coordination and Effort is Significant
In the most recent admission cycle, more than 1,000 essay evaluations were performed, including those essays reviewed by a third evaluator
Our Current Admission Process
Together, the personal essays, GPA, and healthcare experience points are totaled and ranked from highest to lowest
Invitations for interviews to assess oral communication skills are offered to approximately the top 90 ranked candidates
After completion of the interviews, the interview scores are combined with previous subtotals and offers of admission are extended to approximately the top 50 candidates
Limitations of Personal Essay Evaluation
The essays are prepared in advance by the applicants and submitted to CASPA
This raises an important question: to what extent does this system actually assess the applicant’s writing ability?
Limitations of Personal Essay Evaluation
Community PAs are not trained to analyze sample essays and may themselves be incapable of accurately evaluating written work
The process relies on a program-developed rubric
Two evaluators must disagree on the score for a writing sample by 50% of the total score before a third evaluation of the writing sample takes place
Unlimited time to reflect on essays is not congruent with the thinking process required of a PA in professional practice
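The two-rater scoring and third-review rules above can be sketched as follows; the 10-point maximum (3 + 3 + 4 domain points) and the example scores are assumptions for illustration, not the program's actual data.

```python
# Sketch of the two-rater scoring rule: each rater's domain scores are
# totaled, the two totals are averaged, and a third review is triggered
# when the raters disagree by 50% or more of the assumed 10-point maximum.

MAX_SCORE = 10  # assumed rubric maximum (3 + 3 + 4 domain points)

def score_essay(rater1_domains, rater2_domains, max_score=MAX_SCORE):
    """Return (average score, needs_third_review) for one essay."""
    total1, total2 = sum(rater1_domains), sum(rater2_domains)
    needs_third = abs(total1 - total2) >= 0.5 * max_score
    return (total1 + total2) / 2, needs_third
```

For example, `score_essay([3, 2, 4], [3, 3, 3])` averages two totals of 9 and does not flag a third review, while totals of 10 and 4 would.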
Limitations of Personal Essay Evaluation
Resource depleting
Paper trail
Internal delays
Deadline crunch
Essays are not de-identified
Interrater reliability; validity?
As consumers, applicants can be considerably more demanding, increasingly requesting detailed information regarding their performance in the entire admissions process rather than simply accepting an “admit,” “wait list,” or “non-admit” decision
Often, the applicant requests specific information to guide future direction if admission is not initially offered
Admission Process: Personal Essay Evaluation
And Yet We Have Been Successful…
Our program has been successful in selecting very capable students who regularly achieve above-average national board scores and who, in the past, have received positive reviews from the physicians employing them
What Can the Program Do?
Rudimentary Computer Scoring
By 1997, Microsoft Office® incorporated grammar checking as a tool for users of the software
Its readability statistics determine the Flesch Reading Ease, the Flesch-Kincaid Grade Level, and the word count
Grammar-checking software effectively evaluates essays of 500 to 1,000 words covering a wide range of topics
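The two readability measures just mentioned can be computed from their published formulas, shown in the sketch below; the syllable counter is a crude vowel-group heuristic for illustration, not Word's actual algorithm.

```python
# Flesch Reading Ease and Flesch-Kincaid Grade Level, computed from
# the published formulas over total words, sentences, and syllables.

def count_syllables(word):
    """Rough vowel-group syllable estimate (illustrative only)."""
    word = word.lower().strip(".,;:!?")
    groups, prev_vowel = 0, False
    for ch in word:
        is_vowel = ch in "aeiouy"
        if is_vowel and not prev_vowel:
            groups += 1
        prev_vowel = is_vowel
    return max(groups, 1)

def flesch_scores(words, sentences, syllables):
    """Return (Flesch Reading Ease, Flesch-Kincaid Grade Level)."""
    wps = words / sentences   # average words per sentence
    spw = syllables / words   # average syllables per word
    ease = 206.835 - 1.015 * wps - 84.6 * spw
    grade = 0.39 * wps + 11.8 * spw - 15.59
    return ease, grade
```

A 100-word passage in 5 sentences with 150 syllables, for instance, scores near the study sample's mean Reading Ease of roughly 59 and a grade level of about 9.9.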
Automated Essay Scoring (AES)
Evaluation and scoring of written prose via computer technology, based on a set of pre-scored essays
Perfect test-retest reliability
Used to overcome time, cost, reliability, and generalizability issues in writing assessments
The automated system applies the scoring criteria uniformly and mechanically, avoiding the fluctuations found in untrained graders
Works well on short descriptive essays of 500 to 1,000 words encompassing a wide range of topics
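The idea of training on pre-scored essays can be illustrated with a deliberately tiny, hypothetical sketch: extract surface features from scored training essays, fit a linear model, then score new essays uniformly. Real systems such as IntelliMetric® use far richer NLP features; every feature and value here is an invented stand-in.

```python
import numpy as np

# Toy AES sketch: surface features -> least-squares linear model.
# Not IntelliMetric®'s method; purely an illustration of the concept.

def features(text):
    words = text.split()
    sentences = max(text.count("."), 1)
    return np.array([
        1.0,                                                       # intercept
        float(len(words)),                                         # length
        len(set(w.lower() for w in words)) / max(len(words), 1),   # vocabulary diversity
        len(words) / sentences,                                    # words per sentence
    ])

def train(essays, scores):
    """Fit feature weights to a set of pre-scored training essays."""
    X = np.vstack([features(e) for e in essays])
    y = np.asarray(scores, dtype=float)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def predict(coef, essay):
    """Apply the learned weights uniformly to any new essay."""
    return float(features(essay) @ coef)
```

Because the model applies the same weights to every essay, the "perfect test-retest reliability" noted above falls out for free: the same input always yields the same score.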
Vantage Learning IntelliMetric® Software Automated Essay Scoring
AES product that utilizes artificial intelligence, natural language processing, and statistical analyses to score and evaluate written prose
Vantage Learning IntelliMetric® Rubric Domains
Domain | Area of Evaluation
Focus and Unity | Is there a main idea, and is it consistently supported?
Development and Elaboration | Are the supporting ideas varied, well developed, and elaborative?
Organization and Structure | Does the essay logically transition ideas from introduction, through supporting paragraphs, to conclusion?
Sentence Structure | Is there syntactic complexity and variety?
Mechanics and Conventions | Does the essay follow the rules of standard American English?
Limitations of AES
There are some common criticisms of AES software like IntelliMetric®
First, it is possible to respond to an essay question using appropriate keywords and synonyms while the essay still lacks a comprehensible answer
Second, a great deal of effort is required to write multiple model answers to essay topics in order to “train” the software so that it properly grades the writing samples
Finally, some critics question whether it is possible for a computer to “artificially think” in order to generate domain and holistic scores
Study Methods
The study protocol was reviewed by Butler University’s institutional review board for research involving human subjects and approved as exempt
Of the 521 applicants in the most recent admission cycle, the top 90 were selected for interviews using the program’s standard evaluation process
A twenty-five minute, onsite written essay was then required of each candidate as part of the interview process
The topic chosen for the onsite essay was non-medical and pre-developed by Vantage Learning
Study Methods
Two of the 90 candidates did not submit an onsite essay
Completed onsite essays were reviewed by a faculty member to excise any identifying names or dates, and were assigned random identification numbers
To ensure uniformity, all essays were reduced to single-spaced documents
Controls: Fabricated Essays
These fabricated essays included:
Two essays that were well written but responded to a different essay topic
One essay consisting of a simple repetition of the topic
One essay of four sentences written on the topic and then simply repeated in subsequent paragraphs in a different sequence
One essay whose first half was a well-written response to the essay topic and whose second half was a simple repetition of the essay topic rather than a response to it
One essay that responded to the topic and was considered of good quality
Study Methods
While the IntelliMetric® license fee was reduced, the study was conducted independently of Vantage Learning, the licensor of IntelliMetric®
Study Methods
For consistency, the PAs who assessed the onsite and fabricated essays were from a group of community PA volunteers who reviewed CASPA essays in the past
Each onsite essay was evaluated by two community PA volunteers using the programmatic rubric and by two other community PAs using a hard copy of the IntelliMetric® rubric
Study Methods
As a means of rudimentary comparative analysis, onsite and fabricated essays were evaluated with Microsoft Word® version 2003 to obtain the Flesch Reading Ease, Flesch-Kincaid Grade Level, and word count
De-identified, random numbered, onsite and fabricated essays were electronically submitted for automated scoring utilizing the IntelliMetric® systems to Vantage Learning
Once results were received, data were maintained on an Excel spreadsheet and statistical analyses performed using Statistical Package for the Social Sciences (SPSS), version 15
Null Hypotheses
1. Utilizing the programmatic rubric, there is no difference between the scores of the CASPA and onsite written essays
2. There is no rater agreement and no correlation in corresponding domain scores of the CASPA essays
3. There is no correlation in the scores between the methods of evaluation of onsite essays
4. There is no correlation in the community PA scores between the programmatic and IntelliMetric® rubric of onsite essays
5. Utilizing the IntelliMetric® rubric, there is no difference between the scores of onsite essays evaluated by the AES system and community PAs
6. There is no correlation between the candidates’ totaled scores evaluated by the seven methods of onsite essay evaluation and GPA
Descriptive Statistics for Methods of Evaluation, N = 88

Method of Evaluation | Possible Range | Mean | S.D. (±) | Range
CASPA Essay^ | 0 - 10 | 8.48 | 1.26 | 2 - 10
Word Count | ∞ | 357.93 | 107.61 | 142 - 687
Flesch Reading Ease | 0 - 100 | 58.90 | 9.27 | 40.4 - 78
Flesch-Kincaid Level | Grade Level | 9.46 | 1.93 | 5.9 - 14.2
AES | 5 - 30 | 15.44 | 4.11 | 5 - 25
Onsite Community PA, Programmatic Rubric | 0 - 10 | 7.15 | 1.61 | 3 - 10
Onsite Community PA, IntelliMetric® Rubric | 5 - 30 | 21.57 | 3.86 | 8 - 30
^ N = 78
Null Hypothesis 1: Utilizing the programmatic rubric, there is no difference between the scores of the CASPA and onsite written essays
To determine if there was a statistically significant difference between the ranked difference scores, a Wilcoxon Signed Rank test was utilized
There was a statistically significant difference z = -5.025, p < 0.01
Therefore, the hypothesis of no difference is rejected
Utilizing the programmatic rubric, the community PAs scored the onsite essay lower than the CASPA essay in 57 of 78 cases
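A Wilcoxon Signed Rank comparison of paired scores can be run as in the sketch below; the score pairs are invented for demonstration and are not the study's data.

```python
# Paired Wilcoxon Signed Rank test, as used for CASPA vs. onsite scores.
# The score pairs below are fabricated for illustration.
from scipy.stats import wilcoxon

caspa  = [9, 8, 10, 7, 9, 8, 10, 9, 8, 9]   # prepared-essay scores
onsite = [7, 6,  8, 7, 8, 6,  9, 7, 6, 8]   # onsite-essay scores

stat, p = wilcoxon(caspa, onsite)
# A small p-value indicates the paired scores differ systematically.
```

With one-sided disagreement like this (every onsite score at or below its CASPA pair), the test rejects the no-difference hypothesis, mirroring the study's finding.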
The students may have been unable to compose a written response to the onsite essay as well as they had for the essay prepared in advance for CASPA because they felt pressured or constrained by time
As found in previously reported studies, it is unclear if, or to what extent, applicants received help in developing the prepared essay’s content, grammar, or spelling
Discussion, Null Hypothesis 1: Utilizing the programmatic rubric, there is no difference between the scores of the CASPA and onsite written essays
An onsite essay significantly eliminates doubt regarding the origin of the essay and is an essential step in actually assessing the applicant’s writing ability
Null Hypothesis 2: There is no rater agreement and no correlation in corresponding domain scores of the CASPA essays
To evaluate the consistency of the community PA scores for each domain of the programmatic rubric for the CASPA essay, the corresponding scores were examined by agreement statistics with perfect, adjacent, discrepant, and perfect + adjacent agreement percentages
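The exact/adjacent/discrepant percentages described above reduce to a simple tally over the paired scores, sketched below; the sample scores are invented.

```python
# Agreement statistics for paired domain scores: "exact" = identical,
# "adjacent" = one point apart, "discrepant" = two or more points apart.

def agreement(rater1, rater2):
    n = len(rater1)
    diffs = [abs(a - b) for a, b in zip(rater1, rater2)]
    exact = sum(d == 0 for d in diffs)
    adjacent = sum(d == 1 for d in diffs)
    pct = lambda k: round(100.0 * k / n, 1)
    return {
        "exact": pct(exact),
        "adjacent": pct(adjacent),
        "exact+adjacent": pct(exact + adjacent),
        "discrepant": pct(n - exact - adjacent),
    }
```

For example, paired scores [3, 2, 3, 1, 4] and [3, 3, 1, 1, 2] give 40% exact, 20% adjacent, and 40% discrepant agreement.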
CASPA Essay* Rater 1 Versus Rater 2: Agreement Statistics for the Domain Scores Using the Programmatic Rubric

Domain (Total Points) | Exact (%) | Adjacent (%) | Exact + Adjacent (%) | Discrepant (%) | Rater 1 Mean | S.D. (±) | Rater 2 Mean | S.D. (±)
Grammar and Spelling (3) | 56.4 | 42.3 | 98.7 | 0.013 | 2.67 | 0.54 | 2.59 | 0.55
Organization & Readability (3) | 43.5 | 53.8 | 97.3 | 0.025 | 2.57 | 0.56 | 2.62 | 0.53
Motivation to Become a PA (4) | 38.4 | 39.7 | 78.1 | 21.8 | 3.17 | 0.88 | 3.38 | 0.80
* N = 78
Null Hypothesis 2 Results: There is no rater agreement and no correlation in corresponding domain scores of the CASPA essays
Further, the agreement between the corresponding domain scores for the CASPA essays was examined by intraclass correlation at the 0.05 level of significance by two-way random, average measures with absolute agreement
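The two-way random, average-measures, absolute-agreement intraclass correlation (ICC(A,k) in McGraw and Wong's notation) can be computed directly from the ANOVA mean squares of a subjects-by-raters matrix, as in the numpy sketch below; the score matrices in the usage example are invented.

```python
import numpy as np

# ICC for two-way random effects, average measures, absolute agreement,
# implemented from the mean squares of an n-subjects x k-raters matrix.

def icc_2way_avg(scores):
    """Return ICC(A,k) for an (n subjects x k raters) score matrix."""
    scores = np.asarray(scores, dtype=float)
    n, k = scores.shape
    grand = scores.mean()
    row_means = scores.mean(axis=1)   # per-subject means
    col_means = scores.mean(axis=0)   # per-rater means
    msr = k * np.sum((row_means - grand) ** 2) / (n - 1)   # subjects MS
    msc = n * np.sum((col_means - grand) ** 2) / (k - 1)   # raters MS
    sse = np.sum((scores - row_means[:, None]
                  - col_means[None, :] + grand) ** 2)
    mse = sse / ((n - 1) * (k - 1))                        # error MS
    return (msr - mse) / (msr + (msc - mse) / n)
```

Perfect agreement (e.g. `[[1, 1], [2, 2], [3, 3]]`) yields an ICC of 1.0, while a constant one-point offset between raters lowers it, because absolute agreement penalizes systematic rater differences.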
Null Hypothesis 2 Results: CASPA Essay Intraclass Correlation for Rater 1 Versus Rater 2 Domain Scores Using the Programmatic Rubric, N = 78

Domain | ICC | 95% CI Lower Bound | 95% CI Upper Bound | Significance
Grammar and Spelling | 0.378 | 0.026 | 0.603 | 0.019*
Organization and Readability | -0.069 | -0.685 | 0.321 | 0.613
Motivation to Become a PA | 0.166 | -0.291 | 0.464 | 0.208
*p is significant at < 0.05
While the Grammar and Spelling ICC is statistically significant, too many external sources may be confounding the findings, and the low ICC value indicates that no meaningful relationship exists
Therefore, community PA evaluation of the CASPA essays produced unreliable scoring outcomes
Null Hypothesis 2 Discussion: There is no rater agreement and no correlation in corresponding domain scores of the CASPA essays
Null Hypothesis 3: There is no correlation in the scores between the methods of evaluation of onsite essays
The six methods of evaluation of onsite essays were normalized using Z scores
The ICC (1, 6) was calculated to compare the reliabilities of the methods
The ICC (1, 6) = 0.410, p < 0.01 (two-way random, average measures with absolute agreement)
Because the result is statistically significant, the null hypothesis is rejected; however, the correlation is so low that no meaningful relationship exists between the methods of evaluation of onsite essays
Null Hypothesis 4: There is no correlation in the community PA scores between the programmatic and IntelliMetric® rubric of onsite essays
Onsite essays were evaluated by ICC comparing the programmatic and IntelliMetric® rubrics used by the community PA evaluators
ICC (1, 2) = 0.567, p < 0.01
While the results are statistically significant, only a minimal meaningful relationship exists between the scores of the community PAs utilizing the programmatic and IntelliMetric® rubrics for onsite essays
Null Hypothesis 5: Utilizing the IntelliMetric® rubric, there is no difference between the scores of onsite essays evaluated by the AES system and community PAs
There was a statistically significant difference of the totaled scores between the onsite essays evaluated by the community PAs utilizing the IntelliMetric® rubric and the AES totaled outcome by the Wilcoxon Signed Rank test with a z = -7.542, p < 0.01.
The community PAs’ average rating was higher for 82 of the 88 essays
Null Hypothesis 6: There is no correlation between the candidates’ totaled scores evaluated by the seven methods of onsite essay evaluation and GPA
Spearman Rank Correlation Coefficient of Essay Scores Evaluated by Different Methods and GPA, N = 88

Method | Spearman Coefficient | Significance
CASPA Essay^ | -0.260 | 0.022*
Community PA Programmatic Rubric | 0.076 | 0.479
Community PA IntelliMetric® Rubric | 0.170 | 0.112
AES Scoring | 0.307 | 0.004*
Word Count | 0.237 | 0.026*
Flesch Reading Ease | -0.067 | 0.536
Flesch-Kincaid | 0.122 | 0.257
^ N = 78; *p is significant at < 0.05
Hypothesis 6 Discussion:
The Spearman Rank correlation was used to evaluate a possible relationship between GPA and the candidates’ individual totaled essay scores
As previously reported, essay length matters up to a certain number of words, allowing concepts and ideas to be developed; beyond that point, additional length does not improve the essay’s outcome
It seems reasonable to assume that an individual with a higher GPA is likely able to write an essay more effectively than one with a lower GPA
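The rank correlation used above can be run as in the sketch below; all paired GPA and score values are fabricated for demonstration.

```python
# Spearman Rank correlation between essay scores and GPA
# (all paired values below are fabricated).
from scipy.stats import spearmanr

gpa = [3.9, 3.5, 3.7, 3.2, 3.8, 3.4, 3.6, 3.1]
aes = [24, 16, 20, 12, 22, 15, 18, 10]

rho, p = spearmanr(gpa, aes)
# Here the AES ranks mirror the GPA ranks exactly, so rho = 1.0.
```

Spearman operates on ranks rather than raw values, so it captures any monotone relationship between GPA and essay score, not just a linear one.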
Post Hoc Power Analysis

Comparison (Wilcoxon) | N | Power (%) | Effect Size (Cohen’s d)
CASPA vs. Onsite Community PA Programmatic Rubric | 78 | 100 | 0.92
AES vs. Community PA IntelliMetric® Rubric | 88 | 100 | 1.54

Correlation with GPA (Spearman) | N | Power (%) | Effect Size (r)
CASPA Essay | 78 | 100 | 0.94
Word Count | 88 | 100 | 0.92
Flesch Reading Ease | 88 | 100 | 0.97
Flesch-Kincaid Level | 88 | 100 | 0.91
AES | 88 | 100 | 0.90
Onsite Community PA Programmatic Rubric | 88 | 100 | 0.84
Onsite Community PA IntelliMetric® Rubric | 88 | 100 | 0.96
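The Cohen's d effect sizes in the table above can be computed from two groups of scores via the pooled standard deviation, as sketched here with invented values (this is the independent-groups form of d, shown only to make the statistic concrete).

```python
import math

# Cohen's d from two groups of scores, using the pooled standard
# deviation (sample values invented for illustration).

def cohens_d(group1, group2):
    n1, n2 = len(group1), len(group2)
    m1, m2 = sum(group1) / n1, sum(group2) / n2
    v1 = sum((x - m1) ** 2 for x in group1) / (n1 - 1)   # sample variances
    v2 = sum((x - m2) ** 2 for x in group2) / (n2 - 1)
    pooled = math.sqrt(((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2))
    return (m1 - m2) / pooled
```

Values near 0.8 or above, like those in the table, are conventionally considered large effects.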
Fabricated Essays
Five of the six fabricated essays were identified by the Vantage Learning IntelliMetric® system
The same was not true of the community PA evaluators
Limitations of the Study
Generalizability of results is limited to this program
The analysis compares AES scoring from IntelliMetric® to a known flawed system, so validation is limited
Ongoing Study
Outcome data will determine the correlation between the onsite essay AES score and the first-semester GPA of the candidates who matriculate into our program
Future Studies
Challenge all of the methods of evaluation for intrarater reliability by submitting two of the same essays with different identification numbers to determine if the grading outcome would be the same
Consider fixing raters to specific groups in the random evaluation of essays
Consider utilizing two twenty-five-minute timed essays for reasons of reliability and construct validity
Consider investigating students’ comfort levels and test anxiety with computerized versus paper-and-pencil writing tests by age, gender, and ethnicity
Conclusion
The purpose of this study is to show that there may be a much more effective and reliable way to evaluate the writing skills of candidates for admission to the PA program than the utilization of community PAs
Questions exist as to whether the current, labor-intensive process of essay review by volunteer community PAs is a reliable process
Not only is there uncertainty about the source of the essay itself; there is also uncertainty about the consistency and quality of the essay review skills of the community PAs
Serious consideration should be given to incorporating AES into the admission process
This would reduce the time spent waiting for community PAs to evaluate the essays, reduce the cost of postage, and potentially increase the reliability of essay scoring
References
Accreditation Review Commission on Education for the Physician Assistant Standards of Accreditation A2.05b. http://www.arc-pa.org/Standards/standards.html. Accessed July 7, 2008.
Campbell A, Dickson C. Predicting student success: a 10-year review using integrative review and meta-analysis. J Prof Nurs. 1996; 12(1): 47 – 59.
Platt L, Turocy P, McGlumphy B. Preadmission criteria as predictors of academic success in entry level athletic training and other allied health educational programs. Journal of Athletic Training. 2001; 36(2): 141 – 144.
Sandow P, Jones A, Peek C, Courts F, Watson R. Correlation of admission criteria with dental school performance and attrition. J Dent Educ. 2002; 66(3): 385 – 392.
Hardigan P, Lai L, Arneson D, Robeson A. Significance of academic merit, test scores, interviews, and the admission process: a case study. American Journal of Pharmaceutical Education. 2002; 65: 40 – 43.
References
Salvatori P. Reliability and validity of admissions tools used to select students for the health professions. Advances in Health Sciences Education. 2001; 6:159 – 175.
Sadler J. Effectiveness of student admission essays in identifying attrition. Nurse Education Today. 2003; 23(8): 620 - 627.
Ferguson E, James D, O’Hehir F, Sanders A. Learning in practice. BMJ. 2003; 326: 429 – 432.
Kulatunga-Moruzi C, Norman G. Validity of admissions measures in predicting performance outcomes: the contribution of cognitive and non-cognitive dimensions. Teaching and Learning in Medicine. 2002; 14(1): 34-42.
Dieter P, Carter R, Rabold J. Automating the complex school admission process to improve screening and tracking of applicants and decision making outcomes. Perspective on Physician Assistant Education. 2000; 11(1): 25 – 34.
Skaff K, Rapp D, Fahringer D. Predictive connections between admissions criteria and outcomes assessment. Perspective on Physician Assistant Education. 1998; 9(2): 75-78.
References
Hanson M, Dore K, Reiter H, Eva K. Medical school admissions: revisiting the veracity and independence of completion of an autobiographical screening tool. Acad Med. 2007; 82(10): S8 - S11.
Chestnut R, Phillips C. Current practices and anticipated changes in academic and nonacademic admission sources for entry-level PharmD programs. American Journal of Pharmaceutical Education. 2000; 64: 251-259.
https://portal.caspaonline.org/#
Albanese M, Snow M, Skochelak S, Huggett K, Farrell P. Assessing personal qualities in medical school admissions. Acad Med. 2003; 78(3): 313 – 321.
Powers D, Fowles M. Balancing test user needs and responsible professional practice: a case study involving assessment of graduate level writing skills. Applied Measurement in Education. 2002; 15(3): 217 – 247.
Bill Gates 1997 Annual report letter to shareholders. http://www.microsoft.com/msft/reports/ar97/bill_letter/bill_letter.htm. Accessed July 7, 2008.
Flesch R. A new readability yardstick. J Appl Psychol. 1948; 32(3): 221 – 233.
Shermis M, Koch C, Page E, Keith T, Harrington S. Trait ratings for automated essay grading. Educational and Psychological Measurement. 2002; 62(5): 5 – 18.
References
Shermis M, Barrera F. Automated essay scoring for electronic portfolios. Assessment Update. 2002; 14(4): 1-4.
Rudner L, Garcia V. An evaluation of IntelliMetric® essay scoring system. Journal of Technology Learning and Assessment. 2006; 4(4): 1 – 21.
Shermis M, Burstein J, Leacock C. Applications of computers in assessment and analysis of writing, chapter 27. In: Handbook of Writing Research. Guilford Press; 2006: 403 – 416.
Dikli S. An overview of automated scoring essays. Journal of Technology, Learning, and Assessment. 2006; 5(1): 4-35.
Vantage Learning. IntelliMetric® scoring accuracy across genres and grade levels. 2006. www.vantagelearning.com. Accessed July 7, 2008.
Korbin J. Forecasting the predictive validity of the new SAT I writing section. Available at the College Board webpage: www.collegeboard.com/prod_downloads/sat/newsat_pred_val.pdf. Accessed June 15, 2008.
Breland H, Bridgeman B, Fowles M. Writing assessment in admission to higher education: review and framework. College Entrance Examination Board and Educational Testing Service; New York, 1999.