Evaluation of Admission Process: Written Communication
Kate Kaiser, BS, PA-S
Christine Reichart, BS, PA-S
Jennifer Snyder, MPAS, PA-C
Larry Vandermolen, BS, MM, PA-S
Jennifer Zorn, MS, PA-C
Today’s Agenda
Cognitive and non-cognitive factors in the admission process
Review current admission process
Introduce automated essay scoring
Current findings utilizing an automatic essay scoring system
Our ongoing research
Recommendations for improvement to the current admission process
Cognitive and Non-Cognitive Factors in the Admission Process
Cognitive or quantitative variables, such as pre-professional grade point average (GPA) and standardized test scores, are known predictors of success for applicants seeking admission into, and graduating from, many healthcare programs
Non-cognitive abilities, such as oral and written communication skills, are less consistent in predicting success
Notwithstanding, many admission committees believe that both cognitive and non-cognitive factors are important
Our Current Admission Process
Individuals are required to submit an application to the Central Application Service for Physician Assistants (CASPA)
This includes:
Candidate’s demographics
Academic record
Experience in healthcare
Personal essay
Today, our focus is on the personal essay and the program’s most recent admission process
Admission Process: The Personal Essay
Personal Essay Evaluation
A pool of community physician assistants was recruited to participate in the review and evaluation of the candidates’ CASPA essays
Two physician assistants evaluate each essay utilizing a program-developed Likert scale rubric
Personal Essay Evaluation
The program’s idiosyncratic rubric defines a basis for scoring the essay in three categories:
1. Spelling and grammar
2. Organization and readability
3. The ability of the applicant to answer the CASPA essay topic, “describe the motivation towards becoming a PA”
The PA evaluators independently assign scores to the three categories; the scores are then totaled, and the two evaluators’ totals are averaged
The Coordination and Effort is Significant
In the most recent admission cycle, more than 1,000 essay evaluations were performed, including those essays reviewed by a third evaluator
Our Current Admission Process
Together, the personal essays, GPA, and healthcare experience points are totaled and ranked from highest to lowest
Invitations for interviews to assess oral communication skills are offered to approximately the top 90 ranked candidates
After completion of the interviews, the interview scores are combined with previous subtotals and offers of admission are extended to approximately the top 50 candidates
Limitations of Personal Essay Evaluation
The essays are prepared in advance by the applicants and submitted to CASPA
This raises an important question: to what extent does this system actually assess the applicant’s writing ability?
Limitations of Personal Essay Evaluation
Community PAs are not trained to analyze sample essays and may themselves be incapable of accurately evaluating written work
The process relies on a program-developed rubric
Two evaluators must disagree on the score for a writing sample by 50% of the total score before a third evaluation of the writing sample takes place
Unlimited time to reflect on essays is not congruent with the thinking process required of a PA in professional practice
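The two-rater scoring and third-review rules above can be sketched as follows; the 10-point maximum (3 + 3 + 4 domain points) and the example scores are assumptions for illustration, not the program's actual data.

```python
# Sketch of the two-rater scoring rule: each rater's domain scores are
# totaled, the two totals are averaged, and a third review is triggered
# when the raters disagree by 50% or more of the assumed 10-point maximum.

MAX_SCORE = 10  # assumed rubric maximum (3 + 3 + 4 domain points)

def score_essay(rater1_domains, rater2_domains, max_score=MAX_SCORE):
    """Return (average score, needs_third_review) for one essay."""
    total1, total2 = sum(rater1_domains), sum(rater2_domains)
    needs_third = abs(total1 - total2) >= 0.5 * max_score
    return (total1 + total2) / 2, needs_third
```

For example, `score_essay([3, 2, 4], [3, 3, 3])` averages two totals of 9 and does not flag a third review, while totals of 10 and 4 would.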
Limitations of Personal Essay Evaluation
Resource depleting
Paper trail
Internal delays
Deadline crunch
Essays are not de-identified
Interrater reliability; validity?
As consumers, applicants can be considerably more demanding, increasingly requesting detailed information regarding their performance in the entire admissions process rather than simply accepting an “admit,” “wait list,” or “non-admit” decision
Often, the applicant requests specific information to guide future direction if admission is not initially offered
Admission Process: Personal Essay Evaluation
And Yet We Have Been Successful…
Our program has been successful in selecting very capable students who regularly achieve above-average national board scores and who, in the past, have received positive reviews from the physicians employing them
What Can the Program Do?
Rudimentary Computer Scoring
By 1997, Microsoft Office® incorporated grammar checking as a tool for users of the software
Its readability statistics determine the Flesch Reading Ease, the Flesch-Kincaid Grade Level, and the word count
Grammar-checking software effectively evaluates essays of 500 to 1,000 words covering a wide range of topics
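The two readability measures just mentioned can be computed from their published formulas, shown in the sketch below; the syllable counter is a crude vowel-group heuristic for illustration, not Word's actual algorithm.

```python
# Flesch Reading Ease and Flesch-Kincaid Grade Level, computed from
# the published formulas over total words, sentences, and syllables.

def count_syllables(word):
    """Rough vowel-group syllable estimate (illustrative only)."""
    word = word.lower().strip(".,;:!?")
    groups, prev_vowel = 0, False
    for ch in word:
        is_vowel = ch in "aeiouy"
        if is_vowel and not prev_vowel:
            groups += 1
        prev_vowel = is_vowel
    return max(groups, 1)

def flesch_scores(words, sentences, syllables):
    """Return (Flesch Reading Ease, Flesch-Kincaid Grade Level)."""
    wps = words / sentences   # average words per sentence
    spw = syllables / words   # average syllables per word
    ease = 206.835 - 1.015 * wps - 84.6 * spw
    grade = 0.39 * wps + 11.8 * spw - 15.59
    return ease, grade
```

A 100-word passage in 5 sentences with 150 syllables, for instance, scores near the study sample's mean Reading Ease of roughly 59 and a grade level of about 9.9.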
Automated Essay Scoring (AES)
Evaluation and scoring of written prose via computer technology, based on a set of pre-scored essays
Perfect test-retest reliability
Used to overcome time, cost, reliability, and generalizability issues in writing assessments
The automated system applies the scoring criteria uniformly and mechanically, avoiding the fluctuations found in untrained graders
Works well on short descriptive essays of 500 to 1,000 words encompassing a wide range of topics
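The idea of training on pre-scored essays can be illustrated with a deliberately tiny, hypothetical sketch: extract surface features from scored training essays, fit a linear model, then score new essays uniformly. Real systems such as IntelliMetric® use far richer NLP features; every feature and value here is an invented stand-in.

```python
import numpy as np

# Toy AES sketch: surface features -> least-squares linear model.
# Not IntelliMetric®'s method; purely an illustration of the concept.

def features(text):
    words = text.split()
    sentences = max(text.count("."), 1)
    return np.array([
        1.0,                                                       # intercept
        float(len(words)),                                         # length
        len(set(w.lower() for w in words)) / max(len(words), 1),   # vocabulary diversity
        len(words) / sentences,                                    # words per sentence
    ])

def train(essays, scores):
    """Fit feature weights to a set of pre-scored training essays."""
    X = np.vstack([features(e) for e in essays])
    y = np.asarray(scores, dtype=float)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def predict(coef, essay):
    """Apply the learned weights uniformly to any new essay."""
    return float(features(essay) @ coef)
```

Because the model applies the same weights to every essay, the "perfect test-retest reliability" noted above falls out for free: the same input always yields the same score.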
Vantage Learning IntelliMetric® Software Automated Essay Scoring
AES product that utilizes artificial intelligence, natural language processing, and statistical analyses to score and evaluate written prose
Vantage Learning IntelliMetric® Rubric Domains
Domain | Area of Evaluation
Focus and Unity | Is there a main idea, and is it consistently supported?
Development and Elaboration | Are the supporting ideas varied, well developed, and elaborative?
Organization and Structure | Does the essay logically transition ideas from introduction, through supporting paragraphs, to conclusion?
Sentence Structure | Is there syntactic complexity and variety?
Mechanics and Conventions | Does the essay follow the rules of standard American English?
Limitations of AES
There are some common criticisms of AES software like IntelliMetric®
First, it is possible to respond to an essay question using appropriate keywords and synonyms while the essay still lacks a comprehensible answer
Second, a great deal of effort is required to write multiple model answers to essay topics in order to “train” the software so that it properly grades the writing samples
Finally, some critics question whether it is possible for a computer to “artificially think” in order to generate domain and holistic scores
Study Methods
The study protocol was reviewed by Butler University’s institutional review board for research involving human subjects and approved as exempt
Of the 521 applicants in the most recent admission cycle, the top 90 were selected for interviews using the program’s standard evaluation process
A twenty-five minute, onsite written essay was then required of each candidate as part of the interview process
The topic chosen for the onsite essay was non-medical and pre-developed by Vantage Learning
Study Methods
Two of the 90 candidates did not submit an onsite essay
Completed onsite essays were reviewed by a faculty member to excise any identifying names or dates, and were assigned random identification numbers
To ensure uniformity, all essays were reduced to single-spaced documents
Controls: Fabricated Essays
These fabricated essays included:
Two essays that were well written but responded to a different essay topic
One essay consisting of a simple repetition of the topic
One essay of four sentences written on the topic and then simply repeated in subsequent paragraphs in a different sequence
One essay whose first half was a well-written response to the essay topic and whose second half was a simple repetition of the essay topic rather than a response to it
One essay that responded to the topic and was considered of good quality
Study Methods
While the IntelliMetric® license fee was reduced, the study was conducted independently of Vantage Learning, the licensor of IntelliMetric®
Study Methods
For consistency, the PAs who assessed the onsite and fabricated essays were from a group of community PA volunteers who reviewed CASPA essays in the past
Each onsite essay was evaluated by two community PA volunteers using the programmatic rubric and by two other community PAs using a hard copy of the IntelliMetric® rubric
Study Methods
As a means of rudimentary comparative analysis, onsite and fabricated essays were evaluated with Microsoft Word® version 2003 to obtain the Flesch Reading Ease, Flesch-Kincaid Grade Level, and word count
De-identified, random numbered, onsite and fabricated essays were electronically submitted for automated scoring utilizing the IntelliMetric® systems to Vantage Learning
Once results were received, data were maintained on an Excel spreadsheet and statistical analyses performed using Statistical Package for the Social Sciences (SPSS), version 15
Null Hypotheses
1. Utilizing the programmatic rubric, there is no difference between the scores of the CASPA and onsite written essays
2. There is no rater agreement and no correlation in corresponding domain scores of the CASPA essays
3. There is no correlation in the scores between the methods of evaluation of onsite essays
4. There is no correlation in the community PA scores between the programmatic and IntelliMetric® rubric of onsite essays
5. Utilizing the IntelliMetric® rubric, there is no difference between the scores of onsite essays evaluated by the AES system and community PAs
6. There is no correlation between the candidates’ totaled scores evaluated by the seven methods of onsite essay evaluation and GPA
Descriptive Statistics for Methods of Evaluation, N = 88

Method of Evaluation | Possible Range | Mean | S.D. (±) | Range
CASPA Essay^ | 0 - 10 | 8.48 | 1.26 | 2 - 10
Word Count | ∞ | 357.93 | 107.61 | 142 - 687
Flesch Reading Ease | 0 - 100 | 58.90 | 9.27 | 40.4 - 78
Flesch-Kincaid Level | Grade Level | 9.46 | 1.93 | 5.9 - 14.2
AES | 5 - 30 | 15.44 | 4.11 | 5 - 25
Onsite Community PA, Programmatic Rubric | 0 - 10 | 7.15 | 1.61 | 3 - 10
Onsite Community PA, IntelliMetric® Rubric | 5 - 30 | 21.57 | 3.86 | 8 - 30
^ N = 78
Null Hypothesis 1: Utilizing the programmatic rubric, there is no difference between the scores of the CASPA and onsite written essays
To determine if there was a statistically significant difference between the ranked difference scores, a Wilcoxon Signed Rank test was utilized
There was a statistically significant difference z = -5.025, p < 0.01
Therefore, the hypothesis of no difference is rejected
Utilizing the programmatic rubric, the community PAs scored the onsite essay lower than the CASPA essay in 57 of 78 cases
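A Wilcoxon Signed Rank comparison of paired scores can be run as in the sketch below; the score pairs are invented for demonstration and are not the study's data.

```python
# Paired Wilcoxon Signed Rank test, as used for CASPA vs. onsite scores.
# The score pairs below are fabricated for illustration.
from scipy.stats import wilcoxon

caspa  = [9, 8, 10, 7, 9, 8, 10, 9, 8, 9]   # prepared-essay scores
onsite = [7, 6,  8, 7, 8, 6,  9, 7, 6, 8]   # onsite-essay scores

stat, p = wilcoxon(caspa, onsite)
# A small p-value indicates the paired scores differ systematically.
```

With one-sided disagreement like this (every onsite score at or below its CASPA pair), the test rejects the no-difference hypothesis, mirroring the study's finding.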
The students may have been unable to compose a written response to the onsite essay as well as they had for the essay prepared in advance for CASPA because they felt pressured or constrained by time
As found in previously reported studies, it is unclear if, or to what extent, applicants received help in developing the prepared essay’s content, grammar, or spelling
Discussion, Null Hypothesis 1: Utilizing the programmatic rubric, there is no difference between the scores of the CASPA and onsite written essays
An onsite essay significantly eliminates doubt regarding the origin of the essay and is an essential step in actually assessing the applicant’s writing ability
Null Hypothesis 2: There is no rater agreement and no correlation in corresponding domain scores of the CASPA essays
To evaluate the consistency of the community PA scores for each domain of the programmatic rubric for the CASPA essay, the corresponding scores were examined by agreement statistics with perfect, adjacent, discrepant, and perfect + adjacent agreement percentages
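The exact/adjacent/discrepant percentages described above reduce to a simple tally over the paired scores, sketched below; the sample scores are invented.

```python
# Agreement statistics for paired domain scores: "exact" = identical,
# "adjacent" = one point apart, "discrepant" = two or more points apart.

def agreement(rater1, rater2):
    n = len(rater1)
    diffs = [abs(a - b) for a, b in zip(rater1, rater2)]
    exact = sum(d == 0 for d in diffs)
    adjacent = sum(d == 1 for d in diffs)
    pct = lambda k: round(100.0 * k / n, 1)
    return {
        "exact": pct(exact),
        "adjacent": pct(adjacent),
        "exact+adjacent": pct(exact + adjacent),
        "discrepant": pct(n - exact - adjacent),
    }
```

For example, paired scores [3, 2, 3, 1, 4] and [3, 3, 1, 1, 2] give 40% exact, 20% adjacent, and 40% discrepant agreement.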
CASPA Essay* Rater 1 Versus Rater 2: Agreement Statistics for the Domain Scores Using the Programmatic Rubric

Domain (Total Points) | Exact (%) | Adjacent (%) | Exact + Adjacent (%) | Discrepant (%) | Rater 1 Mean | S.D. (±) | Rater 2 Mean | S.D. (±)
Grammar and Spelling (3) | 56.4 | 42.3 | 98.7 | 0.013 | 2.67 | 0.54 | 2.59 | 0.55
Organization & Readability (3) | 43.5 | 53.8 | 97.3 | 0.025 | 2.57 | 0.56 | 2.62 | 0.53
Motivation to Become a PA (4) | 38.4 | 39.7 | 78.1 | 21.8 | 3.17 | 0.88 | 3.38 | 0.80
* N = 78
Null Hypothesis 2 Results: There is no rater agreement and no correlation in corresponding domain scores of the CASPA essays
Further, the agreement between the corresponding domain scores for the CASPA essays was examined by intraclass correlation at the 0.05 level of significance by two-way random, average measures with absolute agreement
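The two-way random, average-measures, absolute-agreement intraclass correlation (ICC(A,k) in McGraw and Wong's notation) can be computed directly from the ANOVA mean squares of a subjects-by-raters matrix, as in the numpy sketch below; the score matrices in the usage example are invented.

```python
import numpy as np

# ICC for two-way random effects, average measures, absolute agreement,
# implemented from the mean squares of an n-subjects x k-raters matrix.

def icc_2way_avg(scores):
    """Return ICC(A,k) for an (n subjects x k raters) score matrix."""
    scores = np.asarray(scores, dtype=float)
    n, k = scores.shape
    grand = scores.mean()
    row_means = scores.mean(axis=1)   # per-subject means
    col_means = scores.mean(axis=0)   # per-rater means
    msr = k * np.sum((row_means - grand) ** 2) / (n - 1)   # subjects MS
    msc = n * np.sum((col_means - grand) ** 2) / (k - 1)   # raters MS
    sse = np.sum((scores - row_means[:, None]
                  - col_means[None, :] + grand) ** 2)
    mse = sse / ((n - 1) * (k - 1))                        # error MS
    return (msr - mse) / (msr + (msc - mse) / n)
```

Perfect agreement (e.g. `[[1, 1], [2, 2], [3, 3]]`) yields an ICC of 1.0, while a constant one-point offset between raters lowers it, because absolute agreement penalizes systematic rater differences.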
Null Hypothesis 2 Results: CASPA Essay Intraclass Correlation for Rater 1 Versus Rater 2 Domain Scores Using the Programmatic Rubric, N = 78

Domain | ICC | 95% CI Lower Bound | 95% CI Upper Bound | Significance
Grammar and Spelling | 0.378 | 0.026 | 0.603 | 0.019*
Organization and Readability | -0.069 | -0.685 | 0.321 | 0.613
Motivation to Become a PA | 0.166 | -0.291 | 0.464 | 0.208
*p is significant at < 0.05
While the Grammar and Spelling ICC is statistically significant, too many external sources may be confounding the findings, and the low ICC value indicates that no meaningful relationship exists
Therefore, community PA evaluation of the CASPA essays produced unreliable scoring outcomes
Null Hypothesis 2 Discussion: There is no rater agreement and no correlation in corresponding domain scores of the CASPA essays
Null Hypothesis 3: There is no correlation in the scores between the methods of evaluation of onsite essays
The six methods of evaluation of onsite essays were normalized using Z scores
The ICC (1, 6) was calculated to compare the reliabilities of the methods
The ICC (1, 6) = 0.410, p < 0.01 (two-way random, average measures with absolute agreement)
Because the result is statistically significant, the null hypothesis is rejected; however, the correlation is so low that no meaningful relationship exists between the methods of evaluation of onsite essays
Null Hypothesis 4: There is no correlation in the community PA scores between the programmatic and IntelliMetric® rubric of onsite essays
Onsite essays were evaluated by ICC comparing the programmatic and IntelliMetric® rubrics used by the community PA evaluators
ICC (1, 2) = 0.567, p < 0.01
While the results are statistically significant, only a minimal meaningful relationship exists between the scores of the community PAs utilizing the programmatic and IntelliMetric® rubrics for onsite essays
Null Hypothesis 5: Utilizing the IntelliMetric® rubric, there is no difference between the scores of onsite essays evaluated by the AES system and community PAs
There was a statistically significant difference of the totaled scores between the onsite essays evaluated by the community PAs utilizing the IntelliMetric® rubric and the AES totaled outcome by the Wilcoxon Signed Rank test with a z = -7.542, p < 0.01.
The community PAs’ average rating was higher for 82 of the 88 essays
Null Hypothesis 6: There is no correlation between the candidates’ totaled scores evaluated by the seven methods of onsite essay evaluation and GPA
Spearman Rank Correlation Coefficient of Essay Scores Evaluated by Different Methods and GPA, N = 88

Method | Spearman Coefficient | Significance
CASPA Essay^ | -0.260 | 0.022*
Community PA Programmatic Rubric | 0.076 | 0.479
Community PA IntelliMetric® Rubric | 0.170 | 0.112
AES Scoring | 0.307 | 0.004*
Word Count | 0.237 | 0.026*
Flesch Reading Ease | -0.067 | 0.536
Flesch-Kincaid | 0.122 | 0.257
^ N = 78; *p is significant at < 0.05
Hypothesis 6 Discussion:
The Spearman Rank correlation was used to evaluate a possible relationship between GPA and the candidates’ individual totaled essay scores
As previously reported, essay length matters up to a certain number of words, allowing concepts and ideas to be developed; beyond that point, additional length does not improve the essay’s outcome
It seems reasonable to assume that an individual with a higher GPA is likely able to write an essay more effectively than one with a lower GPA
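The rank correlation used above can be run as in the sketch below; all paired GPA and score values are fabricated for demonstration.

```python
# Spearman Rank correlation between essay scores and GPA
# (all paired values below are fabricated).
from scipy.stats import spearmanr

gpa = [3.9, 3.5, 3.7, 3.2, 3.8, 3.4, 3.6, 3.1]
aes = [24, 16, 20, 12, 22, 15, 18, 10]

rho, p = spearmanr(gpa, aes)
# Here the AES ranks mirror the GPA ranks exactly, so rho = 1.0.
```

Spearman operates on ranks rather than raw values, so it captures any monotone relationship between GPA and essay score, not just a linear one.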
Post Hoc Power Analysis

Comparison (Wilcoxon) | N | Power (%) | Effect Size (Cohen’s d)
CASPA vs. Onsite Community PA Programmatic Rubric | 78 | 100 | 0.92
AES vs. Community PA IntelliMetric® Rubric | 88 | 100 | 1.54

Correlation with GPA (Spearman) | N | Power (%) | Effect Size (r)
CASPA Essay | 78 | 100 | 0.94
Word Count | 88 | 100 | 0.92
Flesch Reading Ease | 88 | 100 | 0.97
Flesch-Kincaid Level | 88 | 100 | 0.91
AES | 88 | 100 | 0.90
Onsite Community PA Programmatic Rubric | 88 | 100 | 0.84
Onsite Community PA IntelliMetric® Rubric | 88 | 100 | 0.96
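The Cohen's d effect sizes in the table above can be computed from two groups of scores via the pooled standard deviation, as sketched here with invented values (this is the independent-groups form of d, shown only to make the statistic concrete).

```python
import math

# Cohen's d from two groups of scores, using the pooled standard
# deviation (sample values invented for illustration).

def cohens_d(group1, group2):
    n1, n2 = len(group1), len(group2)
    m1, m2 = sum(group1) / n1, sum(group2) / n2
    v1 = sum((x - m1) ** 2 for x in group1) / (n1 - 1)   # sample variances
    v2 = sum((x - m2) ** 2 for x in group2) / (n2 - 1)
    pooled = math.sqrt(((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2))
    return (m1 - m2) / pooled
```

Values near 0.8 or above, like those in the table, are conventionally considered large effects.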
Fabricated Essays
Five of the six fabricated essays were identified by the Vantage Learning IntelliMetric® system
The same was not true of the community PA evaluators
Limitations of the Study
Generalizability of results is limited to this program
The analysis compares AES scoring from IntelliMetric® to a known flawed system, so validation is limited
Ongoing Study
Outcome data will determine the correlation between the onsite essay AES score and the first-semester GPA of the candidates who matriculate into our program
Future Studies
Challenge all of the methods of evaluation for intrarater reliability by submitting two of the same essays with different identification numbers to determine if the grading outcome would be the same
Consider fixing raters to specific groups in the random evaluation of essays
Consider utilizing two twenty-five-minute timed essays for reasons of reliability and construct validity
Consider investigating students’ comfort levels and test anxiety with computerized versus paper-and-pencil writing tests by age, gender, and ethnicity
Conclusion
The purpose of this study is to show that there may be a much more effective and reliable way to evaluate the writing skills of candidates for admission to the PA program than the utilization of community PAs
Questions exist as to whether the current, labor-intensive process of essay review by volunteer community PAs is a reliable process
Not only is there uncertainty about the source of the essay itself; there is also uncertainty about the consistency and quality of the essay review skills of the community PAs
Serious consideration should be given to incorporating AES into the admission process
This would reduce the time spent waiting for community PAs to evaluate the essays, reduce the cost of postage, and potentially increase the reliability of essay scoring
References
Accreditation Review Commission on Education for the Physician Assistant Standards of Accreditation A2.05b. http://www.arc-pa.org/Standards/standards.html. Accessed July 7, 2008.
Campbell A, Dickson C. Predicting student success: a 10-year review using integrative review and meta-analysis. J Prof Nurs. 1996; 12(1): 47 – 59.
Platt L, Turocy P, McGlumphy B. Preadmission criteria as predictors of academic success in entry level athletic training and other allied health educational programs. Journal of Athletic Training. 2001; 36(2): 141 – 144.
Sandow P, Jones A, Peek C, Courts F, Watson R. Correlation of admission criteria with dental school performance and attrition. J Dent Educ. 2002; 66(3): 385 – 392.
Hardigan P, Lai L, Arneson D, Robeson A. Significance of academic merit, test scores, interviews, and the admission process: a case study. American Journal of Pharmaceutical Education. 2002; 65: 40 – 43.
References
Salvatori P. Reliability and validity of admissions tools used to select students for the health professions. Advances in Health Sciences Education. 2001; 6:159 – 175.
Sadler J. Effectiveness of student admission essays in identifying attrition. Nurse Education Today. 2003; 23(8): 620 - 627.
Ferguson E, James D, O’Hehir F, Sanders A. Learning in practice. BMJ. 2003; 326: 429 – 432.
Kulatunga-Moruzi C, Norman G. Validity of admissions measures in predicting performance outcomes: the contribution of cognitive and non-cognitive dimensions. Teaching and Learning in Medicine. 2002; 14(1): 34-42.
Dieter P, Carter R, Rabold J. Automating the complex school admission process to improve screening and tracking of applicants and decision making outcomes. Perspective on Physician Assistant Education. 2000; 11(1): 25 – 34.
Skaff K, Rapp D, Fahringer D. Predictive connections between admissions criteria and outcomes assessment. Perspective on Physician Assistant Education. 1998; 9(2): 75-78.
References
Hanson M, Dore K, Reiter H, Eva K. Medical school admissions: revisiting the veracity and independence of completion of an autobiographical screening tool. Acad Med. 2007; 82(10): S8 - S11.
Chestnut R, Phillips C. Current practices and anticipated changes in academic and nonacademic admission sources for entry-level PharmD programs. American Journal of Pharmaceutical Education. 2000; 64: 251-259.
https://portal.caspaonline.org/#
Albanese M, Snow M, Skochelak S, Huggett K, Farrell P. Assessing personal qualities in medical school admissions. Acad Med. 2003; 78(3): 313 – 321.
Powers D, Fowles M. Balancing test user needs and responsible professional practice: a case study involving assessment of graduate level writing skills. Applied Measurement in Education. 2002; 15(3): 217 – 247.
Bill Gates 1997 Annual report letter to shareholders. http://www.microsoft.com/msft/reports/ar97/bill_letter/bill_letter.htm. Accessed July 7, 2008.
Flesch R. A new readability yardstick. J Appl Psychol. 1948; 32(3): 221 – 233.
Shermis M, Koch C, Page E, Keith T, Harrington S. Trait ratings for automated essay grading. Educational and Psychological Measurement. 2002; 62(5): 5 – 18.
References
Shermis M, Barrera F. Automated essay scoring for electronic portfolios. Assessment Update. 2002; 14(4): 1-4.
Rudner L, Garcia V. An evaluation of IntelliMetric® essay scoring system. Journal of Technology Learning and Assessment. 2006; 4(4): 1 – 21.
Shermis M, Burstein J, Leacock C. Applications of computers in assessment and analysis of writing, chapter 27. In: Handbook of Writing Research. Guilford Press; 2006: 403 – 416.
Dikli S. An overview of automated scoring essays. Journal of Technology, Learning, and Assessment. 2006; 5(1): 4-35.
Vantage Learning. IntelliMetric® scoring accuracy across genres and grade levels. 2006. www.vantagelearning.com. Accessed July 7, 2008.
Korbin J. Forecasting the predictive validity of the new SAT I writing section. Available at the College Board webpage: www.collegeboard.com/prod_downloads/sat/newsat_pred_val.pdf. Accessed June 15, 2008.
Breland H, Bridgeman B, Fowles M. Writing assessment in admission to higher education: review and framework. College Entrance Examination Board and Educational Testing Service; New York, 1999.