
Developing professional standards for EFL testing in China: Contexts, considerations, and challenges

Presenter: Jinsong Fan, Ph.D.

Fudan University

Email: [email protected]


Overview

1. Standards in language testing and assessment

2. Language testing in China: What’s special?

3. Standards development and implementation in other testing contexts

4. Fundamental considerations for developing professional standards in China

5. Challenges facing standards development and validation

6. Conclusions and future studies


1. Standards in language testing and assessment

A dictionary definition of ‘standard’:

‘Standard’ refers to a level of quality, skill, ability or achievement by which someone or something is judged, that is considered necessary or acceptable in a particular situation. (Longman Advanced American Dictionary, 2000, p. 146)


Two interpretations of ‘standards’ in language testing

Definition 1: The skills and/or knowledge required to achieve mastery and proficiency levels leading to mastery, along with the measures that operationalize these skills and/or knowledge and the grades indicative of mastery at each level (Davies, 2008, p. 437) – e.g. cut-score, CEFR, ACTFL

Definition 2: An agreed set of guidelines which should be consulted and, as far as possible, heeded in the construction or evaluation of a test (Alderson, Clapham, & Wall, 1995, p. 236) – e.g. Standards for Educational and Psychological Testing (AERA, APA, & NCME, 1999), ETS Standards for Quality and Fairness (ETS, 2002)

Why are standards important?

1. Developing a language test and ensuring that it is valid or useful is extremely difficult (see e.g. Bachman, 1990; Alderson, Clapham, & Wall, 1995; Bachman & Palmer, 1996; Henning, 2001; Fulcher & Davidson, 2007, etc.).

2. The important role that language tests play in modern society: Results on language tests are often used to make a variety of high-stakes decisions such as admissions, employment, promotion, immigration, and citizenship (see, e.g. Spolsky, 1995; Shohamy, 2001a, 2001b; McNamara, 2005, etc.).

3. The call for better accountability, transparency and fairness in language testing practices (see e.g. Kunnan, 2000, 2004; Shohamy, 2001a, 2001b; Bachman, 2005; Bachman & Palmer, 2010; Xi, 2010, etc.).


Accountability

Responsible professionalism (Boyd & Davies, 2002)

The need for those involved in the testing act to assume responsibility for the tests and their uses (Shohamy, 2001b)

[Accountability] requires shared authority, collaboration, involvement of different stakeholders – test takers included – as well as meeting the various criteria of validity… The cost of this approach is high… it takes more time, it involves more people, it requires greater resources… But…the cost is worth paying in order to demonstrate the ethicality of the profession (Shohamy, 2001b, pp. 161-162)

Transparency

Is test-related information equally accessible to all test candidates?

Can test users access relevant information to make informed decisions about test candidates?

Is information about the test quality available to all stakeholders?

Can independent researchers access relevant test data to investigate the quality of language tests?

Fairness

Traditional conceptualization of test fairness: Fairness as absence of bias

The relationship between test validity and test fairness: ‘A test has to be valid to be fair’ or ‘A test has to be fair to be valid’?

Theoretical frameworks of test fairness (Kunnan, 2000, 2004) and approaches to investigating test fairness (AERA, APA, & NCME, 1999; Xi, 2010)

2. Language testing in China: What’s special?

1) A huge testing country with a large number and variety of English language tests developed, administered and used at different levels and for different purposes


2) Large scale, high stakes, and strong washback effects on English teaching and learning

Large scale: Huge test population

High stakes: The results of English language tests are often used to make important decisions such as admission to higher education, employment, the awarding of academic degrees, applications for permanent residence permits in major cities, etc.

Strong washback effects: What is tested becomes what is taught: “the assessment tail wagging the educational dog” (Briggs, 1992, p. 11)

3) A highly centralized testing system

Almost all tests are developed and administered by the relevant examination authorities.

The quality and authority of language tests are seldom questioned or challenged.

Language testing is regarded as more an administrative than an academic activity (Yang & Gui, 2007).

Test developers are believed to be solely responsible for the fairness of all testing operations.

Stakeholders such as test takers and teachers cannot participate in the testing process as equal partners (see also Shohamy, 2001a).

3. Standards developed and implemented in other testing contexts

Report of the Task Force on Testing Standards (TFTS) to the International Language Testing Association (ILTA) (ILTA TFTS, 1995): A survey of language testing standards worldwide

A collection of 110 standards from around the world, 58 of which were identified as guidelines for good testing practice.

Many standards in the collection have been revised or updated since the completion of the project (e.g. AERA, APA, & NCME, 1985).


Language testing standards in use: A few examples

International Language Testing Association (ILTA): ILTA Code of Ethics (ILTA, 2000; see also Davies, 1997; Boyd & Davies, 2002); ILTA Guidelines for Practice (ILTA, 2007)

Standards for Educational and Psychological Testing (AERA, APA, & NCME, 1999)

Code of Fair Testing Practices in Education (JCTP, 2004)

ETS Standards for Quality and Fairness (ETS, 2002)


Association of Language Testers in Europe: The ALTE Code of Practice (ALTE, 1994; see also Avermaet, Kuijper, & Saville, 2004)

European Association of Language Testing and Assessment: EALTA Guidelines for Good Practice in Language Testing and Assessment (EALTA, 2006)

Japan Language Testing Association: The JLTA Code of Practice (JLTA, 2007; see also Thrasher, 2004)

Why reinvent the wheel, particularly if there is an excellent wheel already? (McNamara & Roever, 2006, p. 146; see also Alderson, Clapham, & Wall, 1995).


4. Considerations for developing standards in the Chinese context

Consideration 1: Why a new set of standards?


Towards a new set of standards

1) The special features of language testing in China: Applicability and validity of other sets of standards

2) The product and process perspectives

The product perspective: A set of guidelines for language testing in the Chinese context

The process perspective: Awareness-raising and more discussion of quality and professionalism, transparency and fairness, etc.

Consideration 2: Who are the targeted audiences/users of the standards?

In any testing context, be it centralized or decentralized, test validity and fairness will eventually be compromised without the collaboration of all stakeholders in the testing process, including test design, administration, and use (Fan & Jin, 2013).


Figure 1: The targeted audiences of the standards (the testing standards address test developers and other stakeholders, namely test users, EFL teachers, and test takers)

Consideration 3: What purpose(s) do the standards serve?

Figure 2: Purposes, audiences, and expected outcomes

Primary purposes (educational and aspirational): To enhance the awareness of quality among test developers; to promote the basics of language testing among stakeholders

Targeted audiences: Test developers; test takers; EFL teachers; test users; educational officials, administrators, and other stakeholders

Expected outcomes: Provide a benchmark of good testing practices; enhance professional awareness; promote interaction between test developers and the other stakeholders; pursue better washback effects

Consideration 4: How to generate the standards?

Three possible approaches to standards development:

The top-down approach: An organization develops the standards and imposes them on all.

The bottom-up approach: The standards reflect the consensus of the individual test developers.

The interactive approach: A combination of the top-down and bottom-up approaches

Theoretical frameworks: Good practices in language testing; validity and validation; test administration and use; test fairness, etc.

Contextual features: Identify the features of the local context; investigate the current situation of language testing practices

Both inputs link to the language testing standards through considerations for validity.

5. Challenges facing standards development and validation

Challenge 1: The collection, review, and critique of the standards in existence

What standards to include, and what standards to exclude: The criteria for standards selection

How to review the standards in the collection: Theoretical frameworks and research methodologies

How to build upon the existing research on language testing standards in the generation of our own standards?

How to determine the macro-structure of the standards?

Challenge 2: The reflection of local features in the standards

If some qualities (e.g. practicality) are more relevant to the local context, shall these qualities be prioritized over others in the standards?

If some areas are identified as particularly problematic in current language testing practices, shall these areas be emphasized in the guidelines?

If requirements arising from local features conflict with what is generally held to be proper in language testing (e.g. protecting the privacy of test candidates’ scores), how shall these requirements be reflected in the guidelines?

Challenge 3: Enforceable or not enforceable?

The ILTA Code of Ethics (ILTA, 2000): Failure to uphold the Code may result in serious penalties, such as withdrawal of ILTA membership on the advice of the ILTA Ethics Committee.

ETS Standards for Quality and Fairness (ETS, 2002): The audit requirements help to ensure that ETS products and services are evaluated against a uniform, rigorous set of standards through a well-documented process.

Are enforcement mechanisms feasible?

Are standards without enforcement mechanisms meaningless or without value?

Can standards without enforcement mechanisms be adopted and adapted by individual test developers so that effective local enforcement mechanisms can be built?

How to develop the enforcement mechanisms and to ensure their implementation in language testing practices?

Challenge 4: Standards validation

Are the standards applicable to the testing context?

How to validate the standards?

How to continuously improve the standards based on the validation research?

Shall standards validation include the investigation of the impact and consequences of the standards?

Who shall be held responsible for investigating the validity of the standards?

Who shall get involved in the process of standards validation?

6. Conclusion and future research

The language testing standards developed and implemented in different parts of the world clearly indicate the pursuit of better quality and professionalism in language testing and assessment (see also Davies, 1997, 2004; Boyd & Davies, 2002).

The salient features of language testing in China call for a set of professional standards that can cater to the needs and circumstances of language testing in the Chinese context (see also Yang & Gui, 2007; Fan & Jin, 2011, 2013).

A new set of standards will help to raise the awareness of professionalism among test developers and enhance the involvement of other stakeholders in the process of language testing.

The standards shall be targeted at both test developers and the other stakeholders, including test takers, EFL teachers, test users, publishers, curriculum designers, and officials with educational or examination authorities.

An interactive model shall be adopted in generating the standards, with a view to reflecting both the theoretical frameworks in language testing and local features.

Future studies

Building a corpus of language testing standards: Standards selection, review and critique; distilling useful experience and avoiding pitfalls (see Fan & Jin, 2010, 2012).

Investigating the current language testing practices in the Chinese context: Identifying the gap between the current practices and the practices as reflected in the professional standards (see Fan & Jin, 2013).

Investigating stakeholders’ views and perceptions of good testing practices: Involving stakeholders in standards development.

Investigating the validity of the standards: Applicability, usefulness, impact and consequences, etc.

7. References

1. Alderson, J. C., Clapham, C., & Wall, D. (1995). Language test construction and evaluation. Cambridge: Cambridge University Press.

2. Avermaet, P. V., Kuijper, H., & Saville, N. (2004). A code of practice and quality management system for international language examinations. Language Assessment Quarterly, 1(2&3), 137-150.

3. Bachman, L. F. (1990). Fundamental considerations in language testing. Oxford: Oxford University Press.

4. Bachman, L. F. (2005). Building and supporting a case for test use. Language Assessment Quarterly, 2(1), 1-34.

5. Bachman, L. F., & Palmer, A. S. (1996). Language testing in practice. Oxford: Oxford University Press.

6. Bachman, L. F., & Palmer, A. S. (2010). Language assessment in practice. Oxford: Oxford University Press.

7. Boyd, K., & Davies, A. (2002). Doctor’s orders for language testers: The origin and purpose of ethical codes. Language Testing, 19(3), 296-322.

8. Briggs, J. (1992). The psychology of educational assessment and the Hong Kong scene. Bulletin of the Hong Kong Psychological Society, 28/29(4), 5-26.

9. Davies, A. (Guest Ed.). (1997). Special issue: Ethics in language testing. Language Testing, 14.

10. Davies, A. (Guest Ed.). (2004). Special issue: Ethics in language testing. Language Assessment Quarterly, 1(2/3).

11. Davies, A. (2008). Ethics, professionalism, rights and codes. In E. Shohamy & N. H. Hornberger (Eds.), Encyclopedia of language and education (2nd ed.), Vol. 7: Language testing and assessment (pp. 429-433).

12. Fan, J., & Jin, Y. (2011). The way towards a code of practice: A survey of EFL testing in China. Research paper presented at the 33rd Language Testing Research Colloquium (LTRC), Ann Arbor: The University of Michigan.

13. Fan, J., & Jin, Y. (2012). Developing a code of practice for China’s EFL tests: A data-based approach. Research paper presented at the 34th Language Testing Research Colloquium (LTRC), Princeton, New Jersey: Educational Testing Service.

14. Fan, J., & Jin, Y. (2013). A survey of EFL testing in China: The case of six examination boards. Language Testing in Asia, 3, 7.

15. Fulcher, G., & Davidson, F. (2007). Language testing and assessment: An advanced resource book. London and New York: Routledge.

16. Henning, G. (2001). A guide to language testing: Development, evaluation and research. Beijing: Foreign Language Teaching and Research Press.

17. ILTA-TFTS. (1995). Report of the task force on testing standards (TFTS) to the International Language Testing Association (ILTA). Retrieved from http://www.iltaonline.com/images/pdfs/tfts_report.pdf.

18. Kunnan, A. J. (Ed.). (2000). Fairness and validation in language assessment. Cambridge: Cambridge University Press.

19. Kunnan, A. J. (2004). Test fairness. In M. Milanovic & C. Weir (Eds.), European language testing in a global context: Proceedings of the ALTE Barcelona Conference (pp. 27-48). Cambridge: Cambridge University Press.

20. Longman. (2000). Longman advanced American dictionary. Author.

21. McNamara, T. (2005). 21st century Shibboleth: Language tests, identity and intergroup conflict. Language Policy, 4(4), 351-370.

22. McNamara, T., & Roever, C. (2006). Language testing: The social dimension. Oxford: Blackwell Publishing.

23. Shohamy, E. (2001a). The power of tests: A critical perspective of the uses of language tests. London: Pearson Education.

24. Shohamy, E. (2001b). Democratic assessment as an alternative. Language Testing, 18(4), 373-391.

25. Spolsky, B. (1995). Measured words. Oxford: Oxford University Press.

26. Thrasher, R. (2004). The role of a language testing code of ethics in the establishment of a code of practice. Language Assessment Quarterly, 1(2&3), 151-160.

27. Xi, X. (2010). How do we go about investigating test fairness? Language Testing, 27(2), 147-170.

28. Yang, H., & Gui, S. (2007). The social dimensions of language testing. Modern Foreign Languages, 4, 368-378.

8. Appendix: Standards in this presentation

1. AERA, APA, & NCME. (1985). Standards for educational and psychological testing. Washington, DC: AERA.

2. AERA, APA, & NCME. (1999). Standards for educational and psychological testing. Washington, DC: AERA.

3. ALTE. (1994). The ALTE code of practice. Retrieved from http://www.alte.org/attachments/files/code_practice_eng.pdf.

4. EALTA. (2006). EALTA guidelines for good practice in language testing and assessment. Retrieved from http://www.ealta.eu.org/documents/archive/guidelines/English.pdf.

5. ETS. (2002). ETS standards for quality and fairness. Princeton, New Jersey: Author.

6. ILTA. (2000). Code of ethics. Retrieved from http://www.iltaonline.com/images/pdfs/ILTA_Code.pdf.

7. ILTA. (2007). The ILTA guidelines for practice. Retrieved from http://www.iltaonline.com/images/pdfs/ILTA_Guidelines.pdf.

8. JCTP. (2004). Code of fair testing practices in education. Washington, DC: Author.

9. JLTA. (2007). The JLTA code of good testing practices. Retrieved from http://www.avis.ne.jp/~youichi/COP.html.

9. Acknowledgement

The preparation of this presentation was supported by the National Social Sciences Fund (Guojia Sheke Jijin) under the project “The development and validation of standards in language testing” (Project No. 13CYY032).