
Perspective on the use of GRE scores for PIBS admissions

We are a group of faculty members in the University of Michigan Medical School [1]. In recent weeks, we participated in a series of discussions about a proposal to remove the GRE requirement in PIBS admissions. These discussions, including a live-streamed town hall discussion (video: https://www.youtube.com/playlist?list=PL8eFiHZnfRDoSbeeClC4I1GLZoGCkMkF6) on August 3, 2017, uncovered a wide range of views. Here we attempt to synthesize the main arguments on both sides and, in doing so, create an initial record of the evolving dialog on this important topic. The goal of this write-up is to bring together multiple ongoing discussions into a living document that can be modified further by many others, with questions re-framed in a collaborative way, and their answers solicited widely and updated continuously.

For many, the GRE may be considered "biased" because some ethnic groups do better than others. Therefore, as was often argued, abolishing the GRE may result in higher diversity among the students we admit. While we discuss this presumed link between GRE and diversity below (see 3.2), we would like to highlight one statistic [2]: for the 2016 class, 29% of PIBS applicants were under-represented minorities, and this group made up 30% of our admitted students. This fact signifies the remarkable success of our dedicated effort to enhance diversity in PIBS, and suggests that further discussion is needed as to whether PIBS has a serious diversity problem, especially when we hold the GRE responsible for that problem.

In the following, we critically evaluate the evidence for and against the utility of the GRE in making admissions decisions. We have organized this document into four broad categories that encompass different aspects of considering the GRE in graduate admissions.

1. Interested parties, and their heterogeneity

2. Sources of opinion
2.1 Personal experience as a test-taker
2.2 Personal experience in graduate education and/or in evaluating applicants
2.3 Quantitative studies of the correlation between GRE and multiple measures of outcome
2.4 Societal concerns

3. Main arguments against the use of GRE and their broader context
3.1. Lack of predictive power
3.2. Potential bias in relation to women and minority
3.3. Costs as a burden: how big is it?
3.4. Outside trends

4. Recommendations
4.1 To committee chairs and members
4.2 To PIBS

[1] This document was prepared by members of the pro-GRE team, with input from Steve Ragsdale and additional faculty members. Given the time limit, not all co-authors have provided final input. As of August 15 its signatories include Jun Li, Alon Kahana, Aaron King and JoAnn Sekiguchi.

[2] Source: information shared at the town hall meeting.


1. Interested parties, and their heterogeneity [3]

To understand the many perspectives on the use of GRE in admissions, it is useful to outline the main stakeholder groups in order to provide a frame of reference. In general, four groups are interested in this discussion:

1. Admission committees in the fourteen PIBS programs, including the committee chairperson(s) and committee members.

2. Faculty members who are involved in research and graduate education, including mentoring students as their PhD advisors and teaching in classrooms.

3. Students themselves, with several sub-groups: those who applied, those who were offered admission, those who accepted our offer and are in various stages of progression to PhD, and those who have graduated.

4. The general public, including the families and teachers of our prospective students.

In addition to the diversity between the four groups, there exists extensive heterogeneity within groups. Some PIBS programs have a higher requirement of quantitative competency than others. Many programs recruit international students, whose language skills need to be assessed with a reliable measure. Underrepresented minority students are essential for enhancing diversity and inclusivity of PIBS; thus they represent an important demographic stratum. Individual committee members have distinct skills and may bring personal habits in their use of GRE scores, overlaid with both conscious and subconscious preferences.

Given such heterogeneity, it stands to reason that a fruitful discussion needs to recognize the varying characteristics of our fourteen programs, the many student strata, and the actual implementation by individual evaluators. Our deliberation would be incomplete if we did not hear from all the stakeholder groups and carefully consider the ramifications from their viewpoints. We therefore encourage all those involved in this discussion to treat the process as complex and iterative, rather than black-and-white or needing only one round of conversation. While some of us may have gone through many discussions, many have not: many stakeholders have not heard from other groups, or from colleagues with a different source of opinion; and many feel they have not been heard by PIBS or their program's admissions committee.

The heterogeneity among stakeholders underscores the challenge of implementing a PIBS-wide uniform policy, as it will affect different programs differently. For example, removing GRE would have an especially large impact on evaluating (1) international students, whose language skills may be difficult to gauge without a standardized verbal test, (2) quantitative competency, especially for students who wish to pursue highly competitive research in biostatistics or bioinformatics, (3) students with low UGPA in earlier college years, some of whom may have shown a late surge of commitment to pursue graduate study, and (4) students without access to research opportunities or coming from an education system where GPA and reference letters are more difficult to interpret than those from well-known institutions.

[3] This document is written in Word and contains numbered sections. Readers are encouraged to insert tracked edits, comments, or links to online resources. It is meant to be authored by many.


In sum, before we harden our opinion on the question of retaining or abolishing GRE, it is crucial that we have the patience to fully appreciate the heterogeneous demands in different situations. If GRE is "useless" in some programs but useful in others, we need to carefully weigh the expected gain in some scenarios against the potential harm in others. If one program finds little use for GRE scores, it may not have the authority to speak for the need in another program, especially before the two have listened to each other. The burden upon all of us, before we decide it is wise to abolish GRE, is to demand clear evidence that GRE is useless in nearly all programs and nearly all student strata, that it is uniquely problematic when compared to other measures (such as GPA and reference letters), and that there is no other solution to address potential mis-use of all these measures, which include GRE as well as other metrics: GPA, selectivity of undergraduate institution, reference letters, and research experience.

2. Sources of opinion

In recent discussions we noticed that most of our colleagues see the pros and cons of GRE as a matter with many shades of gray, and they are open to hearing new data and new perspectives. Almost no one gave simple answers. Few answers can be captured by yes-no polling. By listening to what matters most to them, we came to realize that it is instructive to trace one's opinion to its many possible sources.

2.1 Personal experience as a test-taker

Some students and some faculty had performed poorly on the GRE in the past, and offered their own subsequent success as a reason in favor of abolishing GRE. Other colleagues saw themselves as a counterexample, having entered science partly on their strong performance in standardized tests. While some students spoke of GRE as a flawed instrument that can be gamed, others recalled using GRE as a goalpost to motivate sustained efforts to prepare for professional success, an experience marked by real personal growth that went well beyond simply gaming a test.

Personal experiences like these exert a powerful influence on one's opinion. We think sharing and respecting each other's anecdotes represents a key component of a healthy discussion. We also think that individual stories need to be considered alongside other components of deliberation, including systematic quantitative studies and the actual process of evaluating hundreds of student applicants. For those who did not receive high GRE scores, their very presence in our program (and their success today) strongly suggests that our admissions process did a good job of recognizing their merit. This testifies to the power of holistic use of multiple measures.

2.2 Personal experience in graduate education and/or in evaluating applicants

Faculty members who devote significant amounts of their time to graduate education, and further, the subset of these who provide the departmental service of evaluating PIBS applicants, are central stakeholders of any policy change in PIBS. Yet the working experience they bring differs from one person to the next, does not always align with their own history as test takers, and tends to be mysterious to our students and the general public.


While it is natural for an "outsider" to fear that our evaluators adopt simple cutoff scores due to ignorance of the between-group differences in GRE score distribution, to the best of our knowledge our faculty are aware of such between-group differences, do not apply simplistic cutoffs, and are generally highly sophisticated in evaluating multiple noisy and often conflicting measures. It is therefore extremely valuable to explain to prospective students and the public that their worst fears are unfounded.

Still, suspicions linger, sometimes even among our faculty members. At the town hall meeting, for example, it was stated that the uneven distribution of GRE scores among ethnic groups is, by itself, an argument that the GRE should not be used. It was further argued, also at the town hall meeting, that even glancing at the GRE score might bias the mind of the evaluator, so that the only solution is to withhold GRE, either forever or until after the ranking has concluded. While we think these views are to be respected, not least because those who hold them are no less sincere than others, we encourage our colleagues to consider the alternative: that perhaps we should not categorically doubt the basic competence of our faculty evaluators in the simple task of treating all pieces of information objectively. These suspicions stem from a presumptive fear, especially when many of us may be unfamiliar with the actual implementation in another program [4]. For instance, those who have only evaluated domestic candidates may be unaware of the value of verbal tests in assisting the appraisal of a foreign applicant's language skills.

Having said that, we recognize that how individual evaluators and programs use or over-use GRE is not fully known. Without a careful survey we are left with mere impressions and second-guessing. A major contribution that PIBS can make is to conduct formal reviews of each program's evaluation procedure and identify situations that lead to overweighting or underweighting of GRE. Lessons learned can be shared across PIBS programs and with graduate programs around the country.

2.3 Quantitative studies of the correlation between GRE and multiple measures of outcome

For many of us, systematic studies of the predictive value of GRE are an important factor in deciding whether to use it, or how to use it. Not surprisingly, the literature on this matter is complex and often contentious. Our own conversations with our colleagues uncovered many instances of misinformation or careless interpretation of available data. For example, the existence of inconsistent reports does not mean that GRE's predictive value has been proven inconsistent. The existence of between-group differences in GRE score distribution does not mean that the scores measure nothing else, or cannot be trusted to measure anything else. Despite reports to the contrary, several large studies and most of the largest meta-analyses have shown a moderate correlation of GRE with multiple outcome measures. Such correlations are consistent and significant, even though they vary by situation, as we explain in 3.1 below. Importantly, the predictive power of GRE is on par with most of the other measures in use, including UGPA, and sometimes higher.

[4] Members of the pro-GRE team would be happy to discuss actual examples of a multi-variate analysis. The metrics considered include: 1. GPA; 2. major GPA; 3. in-class rank; 4. the university; 5. whether there is a master's degree and, if so, the GPA, in-class rank, and rigor of the program; 6. TOEFL scores; 7-9. GRE-V, Q, W percentiles, and how they are viewed differently for foreign students or in quantitative programs; 10. research statement; 11. personal statement; 12. reference letters; 13. additional factors. Not everyone is familiar with how an evaluator identifies notable positives and notable negatives, and how such ratings are combined, corroborated, and debated among the committee.


We think a good approach to sorting through the literature is for PIBS to commission a study group to conduct a systematic review of relevant data. However, for many reasons this has not happened, despite the often-repeated statement that such discussions have gone on every year and there is nothing new to learn. Given this gap, we hope to contribute to the discussion by highlighting key findings in 3.1 and 3.2 below. We will cover reports both for and against GRE and discuss their limitations. While this summary is neither a perfect solution nor the final word, we hope it represents a strong effort given the time limit.

2.4 Societal concerns

Many stakeholders see graduate admissions as an instrument to enhance societal good, such as diversity, equality, inclusion (DEI). Students from disadvantaged backgrounds, in particular, often carry fresh memories of past injustices, sometimes related to naive use of common academic metrics. As such, some see sending the right message as a driving force for changing the GRE policy, and hope PIBS can be seen as a national leader for that message.

We understand this reasoning. Most of us care about these values, and share the concern about public perceptions of PIBS. We also think U-M has a strong history in promoting DEI, and PIBS enjoys a highly positive reputation. Such a reputation can be further enhanced by thoughtful changes: through a carefully designed process that pulls in the diverse perspectives of all stakeholders and reflects all sources of opinion. As we discuss in 3.2, abolishing GRE may do little to enhance DEI, or may even undermine it. PIBS has an excellent opportunity to lead the nation both in reaching the right decision with regard to GRE, and in following the right process to build wide support for that decision. Conversely, there is the risk of reaching a decision through careless reasoning or an incomplete and un-inclusive process.

We have recently heard the view that people will never change their minds, that it is impossible to reach consensus, or that this is a never-ending process. To the contrary, we think there are ways to bring closure to this discussion within weeks, but the right process needs to be constructed. As it happens, the process has just started. Cynics and pessimists have already withdrawn from this discussion. Those who remain continue to have faith in the process, and continue to trust the patience and good judgement of our peers. Asking for a greater understanding of those with whom we disagree is not demanding a unanimous outcome, but an outcome where people feel they have been heard and that we have a consensus-building process. We recommend that PIBS adopt a deliberate course of action with a transparent timeline; and we propose that PIBS not only seek input widely, but also seek advice on how to design a better procedure to elicit input.

3. Main arguments against the use of GRE and their broader context

There have been four main arguments against the use of GRE in admissions evaluation.

3.1. Lack of predictive power

In recent discussions, those who advocate abolishing GRE cited three studies [5]. Below is an excerpt of the con-GRE whitepaper:

[5] Additional studies, beyond the three cited, likely exist.


Burton and Wang (2) reported that 1) students who withdrew from programs had mean quantitative GRE averages actually higher than those completing degrees or remaining in training, and 2) the GRE is at best a predictor of first-year graduate school grades, but undergraduate GPA does this as well. A larger study (5) using data from Vanderbilt’s PIBS-like graduate program (2003-2011) reached similar conclusions and highlighted a lack of correlation in predicting who would graduate with a PhD, time to defense, number of conference presentations, number of first author papers, or ability to receive an individual grant/fellowship. Additionally, study of the Tetrad graduate program at UCSF concluded that GRE scores varied by fewer than 6 points among students ranked by faculty from lowest to highest using multiple criteria; most importantly, some of the highest-ranked students had scores below the 30th percentile on verbal and analytical portions of the GRE (6).

We would like to call attention to several finer details in the interpretation of these results, and the larger context surrounding the objective reading of relevant literature.

3.1A. References 1-9 from the con-GRE white paper actually contain two pro-GRE reports. They have much larger sample sizes than the three con-GRE studies. Kuncel 2001 (reference 4 from the con-GRE white paper) concluded:

Data from 1,753 independent samples were included in the meta-analysis, yielding 6,589 correlations for 8 different criteria and 82,659 graduate students. The results indicated that the GRE and UGPA are generalizably valid predictors of graduate grade point average, 1st-year graduate grade point average, comprehensive examination scores, publication citation counts, and faculty ratings. GRE correlations with degree attainment and research productivity were consistently positive; however, some lower 90% credibility intervals included 0. Subject Tests tended to be better predictors than the Verbal, Quantitative, and Analytical tests.

Tables 3 and 5 from this study, copied below, illustrate the generally positive correlations as well as variability across disciplines and across different test scores. The third column, obs, is the correlation between test scores and graduate GPA (Table 3) and faculty rating (Table 5).


Table 9 shows that a combined use of multiple measures has higher predictive power than using single measures in isolation. This is because when A explains 30% of variance in the outcome and B explains 30%, it doesn’t mean A and B are redundant. When they explain different sets of 30%, they complement each other, and warrant their joint use.
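The complementarity argument can be made concrete with the standard identity for the variance explained by two predictors in a linear regression, written in terms of their pairwise correlations. The sketch below is illustrative only: the function name and the example correlations are assumptions chosen to match the 30% figure above, not values taken from Kuncel 2001.

```python
import math

def joint_r_squared(r_a: float, r_b: float, r_ab: float) -> float:
    """Variance in the outcome explained by predictors A and B together.

    r_a, r_b: correlation of each predictor with the outcome.
    r_ab:     correlation between the two predictors.
    Uses the standard two-predictor multiple-regression identity.
    """
    if abs(r_ab) >= 1.0:
        raise ValueError("predictors must not be perfectly correlated")
    return (r_a**2 + r_b**2 - 2.0 * r_a * r_b * r_ab) / (1.0 - r_ab**2)

# Hypothetical predictors that each explain 30% of the variance on their own.
r = math.sqrt(0.30)

independent = joint_r_squared(r, r, 0.0)   # A and B carry different information
overlapping = joint_r_squared(r, r, 0.5)   # A and B partly overlap
near_copies = joint_r_squared(r, r, 0.95)  # A and B are nearly the same measure

print(f"uncorrelated predictors: {independent:.2f}")
print(f"correlation 0.5:         {overlapping:.2f}")
print(f"correlation 0.95:        {near_copies:.2f}")
```

Under these assumed numbers the jointly explained variance is about 0.60 when the two measures do not overlap, 0.40 when they correlate at 0.5, and about 0.31 when they are near copies of each other. Only in the last case is the second measure close to redundant.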

Burton and Wang (reference 2 from the con-GRE white paper) concluded:

The results indicate that the combination of GRE scores and undergraduate grade point average strongly predicts cumulative graduate grade point average and faculty ratings. These results hold in each discipline and appear to hold in the small subgroups.

3.1B. Other studies, not cited by the con-GRE white paper, also support GRE's predictive value. Kuncel and Hezlett (Science, 2007), with an overall sample of 259,640, concluded:

Four consistent findings emerged: (i) Standardized tests are effective predictors of performance in graduate school. (ii) Both tests and undergraduate grades predict important academic outcomes beyond grades earned in graduate school. (iii) Standardized admissions tests predict most measures of student success better than prior college academic records do (1–5, 7, 8). (iv) The combination of tests and grades yields the most accurate predictions of success (1–4, 7, 8).

While three letters to the editor appeared following Kuncel 2007, none challenged its data, analysis method, or results. The debate was on the implications of the findings. Another report, https://www.ets.org/s/research/pdf/gre_compendium.pdf, has this table:


We acknowledge that these studies may have limitations that have yet to come to light. For this reason, there is value in allowing the vetting process and meta-analysis to continue. We also note that some faculty members may find that these population-level trends have limited impact in evaluating individual candidates.

3.1C. The UNC study (con-GRE white paper reference 3) has a small sample size (n=280). It led to this conclusion: "We found no correlations of test scores, grades, amount of previous research experience, or faculty interview ratings with high or low productivity". If we follow the logic of eliminating GRE on the basis of no significant correlations, we should eliminate not just GRE, but also GPA, research experience, and faculty interview.

3.1D. The Vanderbilt study (con-GRE white paper reference 5), which was cited as "the best study" (see YouTube video) by the con-GRE team during the town hall meeting, reported "a lack of correlation in predicting who would graduate with a PhD, time to defense, number of conference presentations, number of first author papers, or ability to receive an individual grant/fellowship." (quotes from con-GRE paper)


This study used nine input variables (including GRE and UGPA) to predict multiple measures of graduate student success by using regression models. The nuanced outcomes of each result are illuminating:

Graduation with PhD (Table 2; N = 495) – The GRE-only model explains 28% of the variation in this measure. Adding UGPA or any of the other input variables only increases it by one percentage point, to 29%. Further, UGPA has a positive regression coefficient and underrepresented minority status has a negative regression coefficient. If we followed this study to optimize outcome, we would literally pick those who have high UGPA and are not minorities. Similar trends were seen in other measures [6].

Subjective mentor ratings after the defense (Table 5; N = 210) – UGPA was significantly associated with classwork and writing ratings. GRE has broader predictive power: it was significantly associated with classwork, reading literature, writing, and leadership ratings. Notably, the only input variable significantly associated with the overall mentor rating is undergraduate institution selectivity. Access to selective institutions reflects many factors including socioeconomic status. This warrants concern about using this metric over other measures like GRE. GRE would help us select candidates from less selective and less well-known places.

3.1E. Range compression. Both the UNC study and the Vanderbilt study used only students already admitted, who have a higher and narrower range of GRE scores than the original pool of applicants. Correlation analysis in this narrower range tends to underestimate the real predictive effect in the overall admissions process.

In summary, prevailing evidence supports a moderate and consistent correlation between GRE and academic success in graduate school. The effect is highly variable: it is more pronounced in some disciplines and varies by (1) which GRE component (V, Q, W), (2) which outcome measures, (3) which cohort, and (4) the analytical method used.
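The range-compression point in 3.1E can be quantified with the textbook formula for direct range restriction (often called Thorndike's Case 2). The sketch below is a minimal illustration under stated assumptions: a linear relationship with constant residual variance, selection made directly on the test score, and a hypothetical applicant-pool correlation of 0.40. None of these numbers come from the UNC or Vanderbilt data.

```python
import math

def restricted_correlation(rho: float, u: float) -> float:
    """Correlation expected after direct selection on the predictor.

    rho: predictor-outcome correlation in the full applicant pool.
    u:   ratio of the predictor's standard deviation among those selected
         to its standard deviation in the full pool (0 < u <= 1).
    Assumes a linear relationship with constant residual variance.
    """
    if not 0.0 < u <= 1.0:
        raise ValueError("u must be in (0, 1]")
    return rho * u / math.sqrt(1.0 - rho**2 + (rho * u) ** 2)

pool_rho = 0.40  # hypothetical correlation across all applicants
for u in (1.0, 0.75, 0.5):
    observed = restricted_correlation(pool_rho, u)
    print(f"SD ratio {u:.2f}: observed correlation {observed:.2f}")
```

With these assumed inputs, a pool-level correlation of 0.40 would be observed as roughly 0.31 when the admitted group has three quarters of the pool's score spread, and roughly 0.21 when it has half. Real admissions select on many measures at once, so this single-variable formula is only a first approximation.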

[6] Results for the other outcome measures:
Time to defense (Table B; N = 318) – The GRE model explains 8% of the variation, with GRE-V having a statistically significant contribution. The full nine-input-variable model (including GPA) explains 9% of the variation, again increasing a single percentage point. In the full model, no single input variable has a statistically significant contribution.
Presentation count (Table C; N = 271) – The GRE model explains 2% of the variation, while the full nine variable model explains 3% of the variation. No single variable in the full model has a statistically significant contribution.
First author publication count (Table D; N = 271) – The GRE model explains 0% of the variation, while the full nine variable model explains 1% of the variation. No single input variable in the full model has a statistically significant contribution.
Individual grant or fellowship (Table E; N = 271) – The GRE model explains 2% of the variation, while the full nine variable model explains 4% of the variation. The single input variable in the full model that has a statistically significant contribution is being from an underrepresented minority group (who are more likely to receive a grant or fellowship).


3.2. Potential bias in relation to women and minority

The following figure, showing between-group differences in GRE score distribution, played a central role in arguing that GRE is biased and could contribute to a biased outcome.

The August 2 pro-GRE white paper addressed this point by stating "Diversity concerns arise only when GRE scores are used to compare individuals from different groups, because of the different ranges of scores within these groups." We think the interpretation of this figure needs to consider the following:

3.2A. Other measures are also biased; some are harder to demonstrate. Removal of the GRE would put a greater emphasis on the remaining areas, including greater weight on research experience. Opportunities to do research are not readily accessible to many foreign applicants, nor to domestic applicants from a disadvantaged socioeconomic background. Prior experience and publication, prestige of the referees, etc., could lead to a bias towards students from large research institutions, which may be even more discriminatory than what the GRE is purported to be doing. Currently, the GRE is being used holistically to offset the lack of experience in some candidates. We therefore strongly encourage PIBS to consider potential new biases that will emerge from the omission of GRE.

3.2B. A data point itself does not dictate how it will be used. A biased measure does not automatically or inevitably lead to a biased outcome. The focus of discussion in PIBS should be on the actual algorithm of using imperfect measures, especially when all measures are biased in some way yet capture useful information, often in complementary fashion. At the simplest level, a well-known between-group difference can be adjusted for, and is in fact being adjusted for in practice [7]. For example, high Quantitative scores from east Asian applicants are routinely seen as uninformative. URMs are typically ranked against their within-group peers. As stated in the August 2 pro-GRE white paper:

[7] It is too simplistic to claim "GRE burdens disadvantaged groups" or "it predicts race and gender more than it predicts ability".

In many cases, a high GRE score has contributed to admission of successful students who have weaker performance on undergraduate GPA or lacked research experience. Similarly, students that were admitted despite low GRE scores may be equally successful because they possessed other positive attributes. Thus, the lack of correlation between GRE scores and success in the admitted subset is an indication that the holistic approach is working effectively, not that the GRE scores are without value. … GRE scores can therefore be very useful to identify areas where specific students could benefit from targeted support. This can help mentors tailor course selection and training to the student according to their discipline. Omitting the GRE scores will unnecessarily decrease the diversity of metrics that inform the holistic review process, and will undercut our current process of identifying a diverse and talented pool of students.
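The claim that a weak correlation among admitted students is what holistic review would be expected to produce can be checked with a small simulation. The sketch below is a toy model under loud assumptions: two independent standardized measures (labelled `gre` and `other`), an outcome that depends equally on both plus noise, and admission of the top 20% by the simple sum of the two measures. All names, weights, and the admission rule are hypothetical and are not drawn from PIBS data or practice.

```python
import random

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

rng = random.Random(0)  # fixed seed so the sketch is repeatable
n_applicants = 20_000

# Hypothetical applicant pool: two independent standardized measures, and an
# outcome that depends equally on both plus unexplained noise.
pool = []
for _ in range(n_applicants):
    gre = rng.gauss(0, 1)
    other = rng.gauss(0, 1)  # stands in for GPA, letters, research experience
    outcome = 0.5 * gre + 0.5 * other + rng.gauss(0, 1)
    pool.append((gre, other, outcome))

# Compensatory admission: admit the top 20% by the sum of the two measures,
# so a high score on one can offset a low score on the other.
pool.sort(key=lambda a: a[0] + a[1], reverse=True)
admitted = pool[: n_applicants // 5]

r_pool = pearson([a[0] for a in pool], [a[2] for a in pool])
r_admitted = pearson([a[0] for a in admitted], [a[2] for a in admitted])
r_measures = pearson([a[0] for a in admitted], [a[1] for a in admitted])

print(f"GRE vs outcome, all applicants: {r_pool:.2f}")
print(f"GRE vs outcome, admitted only:  {r_admitted:.2f}")
print(f"GRE vs other measure, admitted: {r_measures:.2f}")
```

In this toy setup the two measures become negatively correlated among admitted students, because each admitted student with a low score on one measure was admitted on the strength of the other. As a result the correlation between `gre` and the outcome is clearly lower in the admitted group than in the full pool, even though `gre` contributes to the outcome by construction.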

In summary, we question the assumptions linking GRE with reduced diversity, and we feel it is too simplistic to hope that the diversity of admitted students would be substantially improved by removing the GRE requirement. Further, there is a real risk that omitting the GRE scores will unnecessarily decrease the diversity of metrics that inform the holistic review process, and will undercut our current process of identifying a diverse and talented pool of students.

3.3. Costs as a burden: how big is it?

Today, the ~$200 cost allows a test taker to send scores to four institutions, or $50 per institution. After the first four, each additional score report costs $27. Since most programs require GRE, if PIBS abolishes the requirement, most applicants will still take the test simply because other programs require it (we acknowledge that this may change if most programs stop requiring it). If PIBS is one of the 4 programs an applicant applies to, he/she will save $50 when we stop requiring it, or $27 if he/she applies to more than 4.

This burden is small when compared to the overall cost for both our program and the applicants. The annual cost of attending graduate school for master's degree students in 2007–2008 ranged from an average of $28,375 to $38,665, and the cost to educate doctoral students ranged from an average of $32,966 to $46,029 (Wendler et al., 2010). In comparison, PIBS has an application fee of $75 for domestic students and $90 for international students. The burden can be reduced by waiving the fee; PIBS has this option. ETS also has a program for waiving the GRE fee. In general, our public perception is a positive one, as we have many mechanisms to support financially disadvantaged applicants, including dedicated fellowships and earmarked slots in various training grants.

3.4. Outside trends

While NIH training grants and individual NRSAs do not require GRE, these opportunities are limited to domestic students who have already advanced for several years in a graduate program. For them, many other measures have become available that make GRE less important.


PIBS has many ways to demonstrate its national leadership, including organizing a dispassionate and measured analysis of the best available data, and an inclusive, unbiased process of soliciting perspectives. We have a golden opportunity to do both.

4. Recommendations

4.1 To committee chairs and members

We suggest that each committee consider a series of decisions in sub-areas rather than a single binary decision. This is in keeping with the complexity of the matter itself and reflects the nuanced thinking that already exists among our faculty. For example, based on the specifics of your program, do you want to keep or abolish GRE for international candidates? For domestic applicants, do you want to explore the idea of waiving GRE for some applicants, such as those who already hold a US-based master's degree in a relevant field, or those for whom you are otherwise confident that many strong measures are available? Individual programs can also share their view on how to design the best process to make PIBS-wide decisions.

4.2 To PIBS

We don't think now is the right time to make a hasty decision, as the threshold of "First do no harm" is high. Rather, the following ideas need to be given time for discussion.

1. Lead the nation in commissioning a comprehensive review of how different measures are used in practice, covering the granular level of individual programs and individual evaluators.

2. Institute regular updates of best practices in holistic evaluation; and mandate training in DEI and unconscious biases.

3. Compile and analyze in-house statistics to identify problem areas, so that any policy change has a clearly measurable objective. The analysis needs to differentiate among those who apply, those who receive an offer, and those who accept the offer, and to stratify by program and ethnic group.

4. Study how multiple metrics are used in conjunction with each other in practice, rather than the hypothetical shortcomings of using one in isolation. Seek help from experts in quantitative analysis of observational data.

5. Form a faculty advisory committee to provide counsel on a wide range of policy matters.
