
Japan Society of English Language Education


Foreign Language Grammatical Carefulness Scale:

Scale Development and Its Initial Validation

Kunihiro KUSANAGI

Junya FUKUTA
Graduate School, Nagoya University / Japan Society for the Promotion of Science

Yusaku KAWAGUCHI

Yu TAMURA

Aki GOTO

Akari KURITA

Daisuke MUROTA
Graduate School, Nagoya University

Abstract

This study aimed to develop and validate a scale to measure the Grammatical Carefulness (GC) of foreign language learners. GC, by its definition, refers to psychological, behavioral, and meta-cognitive traits of a learner, and it entails highly controlled, cautious, analytical, and time-consuming language use. By conducting a set of questionnaire surveys targeting Japanese junior high school, high school, and university students (N = 2,288), a Foreign Language Grammatical Carefulness Scale (FLGCS) with 14 items, written in Japanese, was developed and tested for its factorial structure, reliability, convergent, content, and criterion validity. The results demonstrated that FLGCS yields three factors: (a) phonological, (b) lexical-syntactic, and (c) pragmatic carefulness, with a high reliability for each. The factorial validity was also supported by using both exploratory and confirmatory factor analyses. Further, a set of analyses confirmed various types of validity. The evidence for the validity is as follows: (a) the linguistic experts (n = 10) consistently judged that all the items properly referred to each factor in an appropriate linguistic sense, (b) FLGCS showed correlations with learner beliefs, consistent with theoretical expectations, and (c) FLGCS correlated with the scores of a C-test, and with the time to finish the C-test. The applicability of FLGCS in EFL teaching and research will also be discussed.

1. Background

Undoubtedly, grammatical performance of a second/foreign language shows a relatively large variance among learners in comparison with that of their first language. Researchers in second/foreign language learning and teaching have thus attempted for a long time to discover which factors explain this large variance in grammatical performance among individuals. Second language acquisition (SLA) theories, for instance, have offered various frameworks relating to the development of grammatical performance (e.g., Segalowitz & Segalowitz, 1993). Task-related factors serve as another important contributor to the variance (e.g., Tarone, 1985). Also, behavioral and psychological traits within individuals, such as aptitudes, attitudes, motivation, beliefs, and anxiety are yet more important factors. It is self-evident that such factors interact in a complex manner to predict one's grammatical performance, and they may jointly affect the acquisition of a second/foreign language.

It is a considerable challenge to cover all of these factors using just a single framework. However, a concept which has been commonly adopted in the field of cognitive psychology and psychological measurement is possibly one which captures the inter-learner variance of grammatical performance; that is, Speed-Accuracy Tradeoff (SAT). SAT generally proposes that task performance, with regard to various aspects, shows a very similar pattern, whereby faster actions result in lower accuracy, while slower actions have a higher accuracy (e.g., Dennis & Evans, 1996; Goldhammer & Kroehne, 2014; van der Linden, 2007). See plot (a) in Figure 1, which graphically represents this concept. Taking an example in the case of grammatical performance, it can be said that if a task is speeded up or timed, or the test taker is in a hurry, they may exhibit reduced accuracy. On the other hand, if the person doing the task can take enough time to accomplish the task, he/she can plan and monitor his/her language use deliberately. This basically leads to higher accuracy. This tendency of SAT may be common in many aspects of language use and language assessment.

Theoretically, the SAT pattern exhibits functions as in (a) of Figure 1. However, it is hypothesized that, in parallel to skill development, the functions for SAT will also change, as in (b) in Figure 1. The changes of the functions may correspond to some of the SLA theories, such as skill-acquisition theory or automatization (e.g., Segalowitz & Segalowitz, 1993).

[Figure 1: three panels, (a), (b), and (c), each with Speed on the horizontal axis.]

Figure 1. Schematic plots of the concepts in SAT. Plot (a) shows the basic tradeoff pattern. Plot (b) explains the changes of the functions caused by development. Plot (c) shows the inter-learner variance of the compromise points.





There is yet another important viewpoint in this framework. Irrespective of development, inter-learner variance still remains. This can be captured by considering the compromise point of SAT functions of individuals. Assume that one person, at least in a specific situation, tends to prioritize accuracy, and another opts not for accuracy but for speed. A part of such a variance of compromise points among individuals can be determined by his/her psychological and behavioral traits. This may be responsible for the rest of the variance, as in (c) in Figure 1.

The present study calls this hypothetical trait Grammatical Carefulness (GC), which we will view as a construct in a psychometric sense.¹ The next section will introduce the concepts of GC and review some of the relevant studies in the literature. This article will report the development and initial validation of a scale to measure this new construct, GC, in the latter sections.

2. Grammatical Carefulness in Foreign Language

2.1 Definition of Grammatical Carefulness

GC in foreign language refers to a behavioral, psychological, and meta-cognitive trait of individuals which is characterized by the following: (a) it entails highly cautious, careful, deliberate, intentional, and analytical language use, (b) it promotes relatively slow, time-consuming, and cognitively demanding language use and learning, which leads to a higher accuracy of learners with some grammatical tasks, and (c) it complexly links to other inter-learner variables, such as aptitudes, attitudes, motivation, beliefs, and anxiety.

The SAT framework regards GC as a moderator of the compromise points. In other words, it is hypothesized that someone with a higher GC tends to achieve higher accuracy at the expense of speed, and another person with a lower GC tends to perform speedily and less accurately.

2.2 Grammatical Performance and Inter-Learner Variables in the Literature

A couple of previous studies attempted to reveal the relationships between inter-learner variables and grammatical performance. For instance, Krashen (1978), in his early theoretical work, suggested that there are two types of second language learners: monitor-under-users and monitor-over-users (see also Seliger, 1980). Kormos (1999) extended this idea, and empirically investigated the effects of the two different speaking styles of individuals (accuracy-centered and fluency-centered) on their self-correction behaviors by observing L1-Hungarian English learners' speech production and questionnaire answers. Kormos looked at the interplays among the speaking styles and the frequency of self-correction behavior; the accuracy-centered participants showed higher frequencies of self-correction behavior than those with a fluency-centered style.

One other case is a recent classroom-based study conducted by Kartchava and Ammar (2014), which investigated the effect of learner beliefs about corrective feedback on noticing behaviors and learning outcomes. They reported that some beliefs mediated the frequencies of noticing behaviors, but not the learning outcomes.





These studies focused only on very specific behaviors and situations of learners' performance; Kormos (1999) is concerned with speaking, especially self-correction, while Kartchava and Ammar (2014) are concerned with noticing, for instance. The studies picked up very limited learners' traits (speaking styles and beliefs about corrective feedback). More critically, some studies did not consider the measurement as constructs. For instance, Kormos' study only used five questionnaire items for determining the learners' speaking styles, of which reliability and validity remained unclear. Kartchava and Ammar's study, on the other hand, reported the reliability and factorial structures, but of course further validation would be desirable.

The present study takes a broader view, with some methodological sophistication regarding the relationships between grammatical performance and inter-learner variables. GC is a trait of individuals which directly links to grammatical performance in general, unlike beliefs regarding some specific behaviors. GC also has value with regard to its relationships with other types of inter-learner variables, such as beliefs. In the latter section, for initial validation of the developed scale, we will report that the scales of GC are actually correlated to a part of grammatical performance, and we show the theoretically plausible relationships with certain types of beliefs.

2.3 Significance of GC in Theory and Practice

Establishing GC as a psychological construct and developing its reliable measures would be a promising way to shed light on various fields of study in the future. For instance, in psycholinguistic experiments, it can be considered that controlling and establishing the variables of SAT-related inter-learner variance such as GC will be both theoretically and methodologically important.

In classroom-based studies, GC can be applied to measure moderators of the outcome of students' learning directly. Also, the impacts on GC of certain teaching methods or treatments would be interesting research topics. More specifically in practice, understanding students' traits, such as GC, can provide much information to teachers in the context of curriculum design, choice of teaching materials, and everyday teaching practice. A reliable, validated, and also easy-to-use psychological scale is hence strongly desired.

3. Scale Development

3.1 The Preliminary Survey

In order to develop a psychological scale to measure GC, we conducted two questionnaire surveys. The main purpose of the first (preliminary) survey was the initial item selection for the scale. In total, 169 students in two private universities participated. All of the participants were first year students who took English classes. Their academic majors included economics and education. The survey was carried out at the beginning of April, 2014. The participants answered the questionnaire during their English classes. The questionnaire consisted of (a) a face sheet, (b) GC items written in Japanese (k = 40), as detailed below, and (c) anchor scales (k = 14, Tanaka & Ellis, 2003). The surveys were conducted in the style of a Likert scale, from one to seven points.

The initial item pool (k = 40) was created by the authors. By referring to learners' retrospective data about language use in the literature, the authors composed the items in consultation with each other. All the items in the initial item pool are available online at the first author's website (see Appendix).

All the data were typed and were verified twice by the authors. The response of one participant was excluded because of some missing values; the number of valid responses was 168. Before the initial item selection by factor analysis, we excluded 18 items which obviously violated the normal distribution. Since the goodness of fit indices of the initial exploratory factor analysis (22 items were submitted), which extracted three factors, were unfavorable, 7 items which caused a misfit were excluded, using a step-wise exploratory factor analysis (SEFA). Hence, 15 items out of 40 were selected for the secondary study. These 15 items can be seen in the Appendix.
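The screening step above removes items whose response distributions depart clearly from normality, but the source does not state the criterion that was applied. The following is a minimal sketch that assumes, purely for illustration, a cutoff on skewness and excess kurtosis; the function names, the cutoff of 2.0, and the example responses are all hypothetical.

```python
def _central_moment(values, order):
    """Central moment of the given order (population form, divides by n)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** order for v in values) / len(values)


def skewness(values):
    """Third standardized moment; 0 for a symmetric distribution."""
    return _central_moment(values, 3) / _central_moment(values, 2) ** 1.5


def excess_kurtosis(values):
    """Fourth standardized moment minus 3; 0 for a normal distribution."""
    return _central_moment(values, 4) / _central_moment(values, 2) ** 2 - 3.0


def violates_normality(values, cutoff=2.0):
    """Flag an item when either index exceeds the (assumed) cutoff."""
    return abs(skewness(values)) > cutoff or abs(excess_kurtosis(values)) > cutoff


# Hypothetical seven-point Likert responses to two items.
balanced = [1, 2, 3, 4, 5, 6, 7]
floor_heavy = [1, 1, 1, 1, 1, 1, 2, 7]

flags = {
    "balanced": violates_normality(balanced),
    "floor_heavy": violates_normality(floor_heavy),
}
```

Under these assumptions only the floor-heavy item would be flagged. A real screening would also need to handle items with zero variance, which this sketch does not.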

3.2 The Secondary Survey

The secondary survey was undertaken from May to June, 2014, using the selected items (k = 15). In total 2,288 participants took part in the survey, and 2,098 answers, with no missing values or extraordinary responses, were analyzed. The participants consisted of junior high school students sampled from two public schools (n = 216), high school students (n = 1,078) from two normal public schools, and university students from 11 national, public, and private universities (n = 804). Almost all of the university students were first year students and had various academic majors. Junior high and high school students, on the other hand, were sampled in a well-balanced way in terms of their academic years.

As in the preliminary survey, the participants answered the questionnaire in their English classes. The questionnaire consisted of the face sheet and the 15 items related to GC. The secondary survey used a computer-readable questionnaire. The data were automatically processed using scanners and computers. Then, the authors validated the responses twice by hand.

Firstly, descriptive statistics of all the valid answers (n = 2,098) were calculated. Before conducting factor analyses, we confirmed the distributions of all the responses (k = 15). Item No. 7 showed a strongly biased distribution, which may negatively affect the factorial structure. Hence, the item was excluded. Then, we conducted an exploratory factor analysis to determine the constructs of GC. This study also performed confirmatory factor analyses for the model in order to confirm its factorial validity.

The distributions of the responses are graphically represented in the multiple histograms in Figure 2. Table 1 summarizes correlation coefficients and the variance/covariance matrix of the item responses (k = 14, excluding item 7).

[Figure 2: histogram panels for Items 1 to 6 and 8 to 15, each with Rate (1 to 7) on the horizontal axis.]

Figure 2. Histograms representing the distributions of the responses.

Table 1.
Correlation Coefficients and Variance/Covariance Matrix of the Item Responses

Item No.     1     2     3     4     5     6     8     9    10    11    12    13    14    15
 1        2.01  1.06  1.01   .99   .97   .79   .83   .82   .82   .81   .79   .81   .84   .76
 2         .49  2.35   .81  1.09   .73   .74   .83   .67   .64   .95   .84   .87   .92   .86
 3         .55   .41  1.66  1.06  1.00   .92   .81  1.07  1.06   .59   .69   .81   .81   .62
 4         .50   .51   .59  1.93   .79   .72   .75   .87   .80   .57   .72   .90   .86   .65
 5         .47   .32   .53   .39  2.14   .70   .75   .85   .93   .66   .81   .82   .80   .67
 6         .43   .37   .55   .40   .37  1.72   .86   .99   .99   .74   .60   .61   .69   .55
 8         .40   .37   .43   .37   .35   .45  2.13  1.05  1.08   .82   .78   .80   .92   .86
 9         [?]   .33   .63   .48   [?]   .57   .55  1.73  1.22   .68   .69   .76   .79   .72
10         .44   .32   .63   .44   .48   .58   .56   .71  1.72   .61   .74   .72   .77   .63
11         .36   .39   .29   .26   .29   .36   .36   .33   .30  2.45  1.50  1.21  1.33  1.59
12         .36   .36   .35   .34   .36   .30   .35   .34   .37   .63  2.32  1.46  1.56  1.54
13         .40   .39   .44   .45   .39   .32   .38   .40   .38   .54   .67  2.06  1.55  1.34
14         .40   .40   .42   .41   .37   .35   .42   .40   .39   .57   .68   .72  2.25  1.40
15         .35   .37   .32   .31   .30   .28   .39   .36   .31   .66   .66   .61   .61  2.32

Note. Values on the left side represent correlation coefficients, right for covariance. [?] marks a value that is illegible in the source.

Table 2 presents the descriptive statistics, the summary of the factor analysis, and the reliability for each factor. The factor analysis extracted three factors. The goodness of fit indices were demonstrated to be not favorable, but at an acceptable level, χ²(52) = 285.44, p < .01, TLI = .93, RMSEA = .08, and with a 90 % confidence interval (CI) [.07, .08]. Items No. 11 to 15 showed higher loadings for Factor 1, and these items were all related to the carefulness towards the phonological aspects of grammatical performance. Items No. 6 to 10 loaded Factor 2 heavily, and these corresponded to the lexical-syntactic aspects. The rest of the items (from Item No. 1 to 5) showed relevance for the pragmatic aspects. Hence, this study named the factors phonological carefulness, lexical-syntactic carefulness, and pragmatic carefulness respectively.

Table 2.
Descriptive Statistics and the Results of the Exploratory Factor Analysis

Item      M     SD  Skewness  Kurtosis  Communality
13     3.57   1.44      0.24     -0.40          .70
14     3.82   1.50      0.15     -0.53          .65
12     3.84   1.52      0.18     -0.59          .57
15     4.08   1.52      0.00     -0.63          .67
11     3.10   1.56     -0.02     -0.78          .66
10     3.26   1.31      0.30     -0.09          .74
 9     3.26   1.31      0.30     -0.11          .69
 6     3.30   1.31      0.24     -0.16          .47
 8     3.96   1.46      0.06     -0.45          .44
 4     3.46   1.39      0.22     -0.31          .60
 2     4.09   1.53     -0.06     -0.65          [?]
 1     3.55   1.42      0.27     -0.32          .48
 3     3.20   1.29      0.25     -0.08          .68
 5     3.21   1.46      0.40     -0.28          .37

Pattern/structure matrix. Each cell is pattern/structure. The cells are listed in the reading order of the source, because the assignment of some cells to items and factors cannot be recovered; [?] marks an illegible value.

Run 1: .87/.80, .86/.81, .80/.80, .75/.83, .70/.75, -.01/.44, .01/.49, .04/.41, -.02/.44, -.08/.40, -.05/.49, -.10/.58, -.11/.44, .12/.49, .22/.42
Run 2: -.03/.44, -.01/.46, -.01/.40, .18/.48, .92/.86, .82/.83, .60/.68, .57/.65
Run 3: -.07/.59, .02/.62, .11/.55, -.04/.50
Run 4: -.07/.44, .16/.48, .04/.47, -.12/.45, .06/.43, -.04/.64, -.13/.42, .09/.55, .42/.75, .29/.56, .84/.77, .62/.62, .59/.77, .54/.77, .32/.57

                                     Factor 1  Factor 2  Factor 3
Factor correlations with Factor 2         .55
Factor correlations with Factor 3         .63       .73
Reliability (α coefficients)              .90       .84       .82
Average correlation coefficients          .64       .57       .48
Sums of squares of loadings              3.28      2.68      2.16
Proportion of variance                    .23       .19       .15
Cumulative proportion of variance         .23       .43       .58

Note. The factor analysis was conducted using the maximum likelihood estimation method and Promax rotation, with the number of factors, three, as suggested by the parallel analysis, and we judged that this model was also theoretically the most plausible.




The reliability of each factor was calculated with Cronbach's α coefficients and average inter-correlation coefficients. The reliability coefficients of all the factors were sufficient, as can be seen in Table 2.
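The two reliability indices are closely related. As a rough check, the sketch below uses the standardized form of α, which depends only on the number of items k and the average inter-item correlation. This is an approximation: Cronbach's α proper is computed from item variances and covariances, and the source does not say which form was used. The item counts assume that the lexical-syntactic factor has four items, since Item 7 was excluded; the function name is hypothetical.

```python
def standardized_alpha(k, mean_r):
    """Standardized alpha from k items with average inter-item correlation mean_r."""
    return k * mean_r / (1 + (k - 1) * mean_r)


# (number of items, average inter-item correlation) as read from Table 2.
# k = 4 for the lexical-syntactic factor is an assumption (Items 6, 8, 9, 10).
factors = {
    "phonological": (5, 0.64),
    "lexical-syntactic": (4, 0.57),
    "pragmatic": (5, 0.48),
}

alphas = {name: round(standardized_alpha(k, r), 2) for name, (k, r) in factors.items()}
```

Under these assumptions the standardized values round to .90, .84, and .82, which agree with the α coefficients listed in Table 2.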

The model was then submitted to confirmatory factor analyses. All the paths to the observed variables were statistically significant at p < .01, and the goodness of fit indices showed acceptable levels. In order to observe the differences in this factorial structure among the three groups of participants, confirmatory factor analyses, using the same model, were conducted by dividing the three groups (see Figure 3 for its path diagram). Table 3 summarizes the comparison of the goodness of fit indices among the groups. The indices showed almost equal goodness of fit among the groups. Also, multiple sample structural equation modeling was used to detect the differences among the groups. We tested four models: (a) "configural", of which the paths are equal among the groups, (b) "weak measurement invariance", of which loadings are invariant, (c) a "strong measurement invariance", of which loadings and intercepts are invariant, and (d) another type of the former, of which loadings, intercepts, and means are invariant. As the results demonstrate, all of the models showed an acceptable goodness of fit, as in Table 4. The fourth model, which was under the strongest constraints, was the best model. Hence, it can be safely stated that at least the factorial structure and its loadings were invariant among the groups. This suggests that the scale which the present study developed can measure the GC of various levels of learners.

Table 3.
Comparison of Goodness of Fit Indices among the Groups

Group            n        χ²   df     p   CFI   TLI  RMSEA            SRMR
All          2,098  1,121.27   74  <.01   .94   .92  .08 [.07, .08]    .05
Junior high    216    206.98   74  <.01   .93   .92  .09 [.08, .11]    .05
High         1,078    567.61   74  <.01   .94   .92  .08 [.07, .09]    .05
University     804    537.29   74  <.01   .92   .91  .09 [.08, .11]    .06
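The RMSEA values in Table 3 can be recomputed from χ², df, and n. The source does not give the formula it used; the sketch below assumes the common point estimate sqrt(max(χ² − df, 0) / (df (N − 1))), and the function name is hypothetical. Some software divides by N rather than N − 1, which makes no difference at two decimals here.

```python
import math


def rmsea(chi2, df, n):
    """RMSEA point estimate (assumed form, with N - 1 in the denominator)."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


# (chi-square, df, n) for each group, as listed in Table 3.
table3 = {
    "All": (1121.27, 74, 2098),
    "Junior high": (206.98, 74, 216),
    "High": (567.61, 74, 1078),
    "University": (537.29, 74, 804),
}

estimates = {group: round(rmsea(*values), 2) for group, values in table3.items()}
```

With this formula the four estimates round to .08, .09, .08, and .09, matching the RMSEA column of Table 3.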

Table 4.
Summary of Measurement Invariance among the Multiple Samples

Model                                                                              χ²   df     p   CFI  RMSEA        BIC
Configural model                                                             1,311.87  222  <.01   .93    .05  90,127.45
Weak measurement invariance model (equal loadings)                           1,333.18  244  <.01   .93    .05  89,980.58
Strong measurement invariance model (equal loadings + intercepts)            1,345.09  266  <.01   .93    .04  89,824.28
Strong measurement invariance model (equal loadings + intercepts + means)    1,380.45  272  <.01   .93    .04  89,813.85

[Figure 3: path diagram of the three-factor model for Items 1 to 6 and 8 to 15, with standardized estimates.]

Figure 3. Path diagram representing the model in question, with standardized estimates. The standardized estimates for each group were shown in the form of "junior high school / high school / university / (all)". N = 2,098.

The results so far provided sufficient empirical evidence for establishing the new psychological scale, Foreign Language Grammatical Carefulness Scale (FLGCS, hereafter), which yields the three factors: phonological carefulness, lexical-syntactic carefulness, and pragmatic carefulness. For reference, the descriptive statistics of the summated scale scores are summarized in Table 5. Phonological carefulness exhibited relatively higher scores than the others. All the scores exhibited a normal distribution.

Table 5.
Descriptive Statistics of the Summated Scale Scores

                                                   k     M    SD  Skewness  Kurtosis
Phonological carefulness (item No. 11 to 15)       5  3.88  1.27      0.14     -0.35
Lexical-Syntactic carefulness (item No. 6 to 10)   4  3.44  1.11      0.22      0.06
Pragmatic carefulness (item No. 1 to 5)            5  3.50  1.11      0.14     -0.35
All                                               14  3.62  0.98      0.13     -0.13

Note. The summated scale scores here were the mean scores for the responses of the items for each. The factor scores were not used here. n = 2,098.
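As the note to Table 5 states, each summated scale score is the mean of the responses to the items of a factor. A minimal sketch of that scoring rule follows; the item-to-factor mapping is taken from the text (Item 7 excluded), while the function name and the example respondent are hypothetical.

```python
# Item numbers belonging to each factor, following the text (Item 7 was excluded).
FACTOR_ITEMS = {
    "phonological": [11, 12, 13, 14, 15],
    "lexical_syntactic": [6, 8, 9, 10],
    "pragmatic": [1, 2, 3, 4, 5],
}


def summated_scores(responses):
    """Mean response per factor and over all 14 items.

    responses: dict mapping item number to a rating from 1 to 7.
    """
    scores = {
        factor: sum(responses[i] for i in items) / len(items)
        for factor, items in FACTOR_ITEMS.items()
    }
    all_items = [i for items in FACTOR_ITEMS.values() for i in items]
    scores["all"] = sum(responses[i] for i in all_items) / len(all_items)
    return scores


# Hypothetical respondent.
example = {1: 3, 2: 4, 3: 3, 4: 4, 5: 1, 6: 2, 8: 4, 9: 3, 10: 3,
           11: 5, 12: 4, 13: 4, 14: 3, 15: 4}
scores = summated_scores(example)
```

Because means rather than sums are used, the three subscale scores stay on the original one-to-seven metric even though the factors differ in their number of items.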





4. Initial Validation

For the initial validation procedure, the present study further examined the content and criterion validity of FLGCS by conducting multiple follow-up analyses. We tested the three hypotheses below (Hypotheses I to III).

Hypothesis I: The contents of all the items in FLGCS match the theoretical concepts of each factor. For instance, it was hypothesized that the items for phonological carefulness actually refer to the phonological aspects of grammatical performance in linguistic terms.

Hypothesis II: Each type of grammatical carefulness is correlated to learner beliefs with a medium level of strength. More specifically, GCs are correlated to analytic beliefs (Tanaka & Ellis, 2003) more strongly than to experiential beliefs.

Hypothesis III: GCs are correlated to the accuracy of a C-test, which is supposed to measure general language performance, and the time which test-takers take to complete the test. As discussed in the Background section of this study, GC was considered as a type of moderator of compromise points in the SAT framework; thus, it is assumed that someone with a higher GC should exhibit higher accuracy and lower speed in the task.

4.1 Content Validity

Ten linguistics experts voluntarily participated in this part of the study. Using an online version of the questionnaire, we asked the participants to read the questionnaire items (k = 14) carefully, then to select which type of grammatical performance the item refers to, in linguistic terms, by choosing from one of four alternatives: (a) phonological, (b) lexical-syntactic, (c) pragmatic, and (d) none of them. It was not allowed to skip an item.

The result was that all the participants answered that the items No. 1 to 5 referred to the pragmatic aspects of grammatical performance, 6 to 10 the lexical and syntactic aspects, and 11 to 15 the phonological aspects. This provides us with empirical support for the content validity of FLGCS on a certain point.

4.2 Criterion Validity

4.2.1 Relationship with Learner Beliefs

In order to confirm a part of the criterion validity (especially convergent and discriminant validity) of FLGCS, this study investigated the correlation patterns between FLGCS and two types of learner beliefs, analytic and experiential beliefs (Hypothesis II). Analytic and experiential beliefs (AB and EB for each) were established by Tanaka and Ellis (2003). The former type refers to learners' beliefs which support analytical types of learning methods and their benefits, and consisted of 7 questionnaire items (e.g., I can learn well by writing down everything in my notebook), while the other type supports experiential ones, with 7 questionnaire items (e.g., I can learn well by speaking with others in English). All of the items in the Japanese-translated version are also available online (see Appendix). Theoretically, it can be expected that GC and beliefs show some correlations, and GCs are related to analytic beliefs more strongly than experiential ones in terms of their conceptual relevance.

The data in this section were drawn from the preliminary study, which included both the GC questionnaire items and the learner belief items. Thus, all the participants (168 answers were used) were first year university students.

Firstly, the descriptive statistics and the reliability coefficients for each of the summated scale scores were calculated (see Table 6). The sample showed relatively higher experiential beliefs, and a lower level of GCs than the results of the secondary study. It is possible to infer that the subsample had a tendency to support experiential beliefs preferably and be less grammatically careful. We judged that the reliability of each score was sufficient (.73 to .91).

Then, a correlation analysis among the five summated scale scores was conducted. Figure 4 graphically summarizes the correlation pattern and the distributions of the scores. We also used classical multi-dimensional scaling (CMDS, also known as principal coordinate analysis; see Coxon, 1982). CMDS is a statistical method to visualize the similarity of variables. Based on the correlation coefficients matrix, CMDS can place each variable on a two-dimensional scale. Thus, it can be interpreted that a pair of closer variables in the plot means that they have a higher correlation, and more distant variables show lower correlations. Figure 5 shows the results of CMDS.
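The CMDS procedure described above can be sketched in a few steps: convert correlations to dissimilarities, double-center the squared dissimilarities, and take the leading eigenvectors as coordinates. The source does not say how correlations were converted or which software was used, so the conversion d = 1 − r, the power-iteration eigen-solver, the function names, and the small correlation matrix below are all illustrative assumptions.

```python
import math


def correlation_to_distance(corr):
    """Assumed conversion: dissimilarity d = 1 - r."""
    return [[1.0 - r for r in row] for row in corr]


def classical_mds(dist, dims=2, iterations=500):
    """Classical MDS via double centering and power iteration with deflation."""
    n = len(dist)
    sq = [[dist[i][j] ** 2 for j in range(n)] for i in range(n)]
    row_mean = [sum(row) / n for row in sq]
    grand_mean = sum(row_mean) / n
    # B = -1/2 * J D^2 J for a symmetric distance matrix (J is the centering matrix).
    b = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + grand_mean)
          for j in range(n)] for i in range(n)]
    coords = [[0.0] * dims for _ in range(n)]
    for d in range(dims):
        vec = [math.sin(i + d + 1.0) for i in range(n)]  # arbitrary fixed start
        for _ in range(iterations):
            new = [sum(b[i][j] * vec[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in new))
            if norm < 1e-12:
                break
            vec = [x / norm for x in new]
        eigenvalue = sum(vec[i] * sum(b[i][j] * vec[j] for j in range(n))
                         for i in range(n))
        if eigenvalue <= 1e-12:
            break  # no further positive dimension to extract
        for i in range(n):
            coords[i][d] = vec[i] * math.sqrt(eigenvalue)
        # Deflate so the next pass finds the next eigenvector.
        for i in range(n):
            for j in range(n):
                b[i][j] -= eigenvalue * vec[i] * vec[j]
    return coords


def euclidean(p, q):
    return math.sqrt(sum((a - c) ** 2 for a, c in zip(p, q)))


# Sanity check on a 3-4-5 triangle: the layout should reproduce the distances.
triangle = classical_mds([[0, 3, 4], [3, 0, 5], [4, 5, 0]])

# Hypothetical correlations among three variables (not taken from the study).
hypothetical_corr = [[1.0, 0.6, 0.3], [0.6, 1.0, 0.2], [0.3, 0.2, 1.0]]
layout = classical_mds(correlation_to_distance(hypothetical_corr))
```

In the resulting layout, variables with higher correlations sit closer together, which is the reading applied to Figure 5. Power iteration assumes the leading eigenvalues of the centered matrix are positive, which holds when the dissimilarities can be embedded in a Euclidean space.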

Table 6.
Descriptive Statistics and Reliability of the Summated Scale Scores of FLGCS and Learner Beliefs

                                     k     M    SD  Skewness  Kurtosis    α
Phonological carefulness (PH)        5  3.49  1.21      0.19      0.04  .87
Lexical-Syntactic carefulness (LS)   4  2.98  1.33      0.54      0.16  .91
Pragmatic carefulness (P)            5  3.30  1.29     -0.57      0.67  .89
Analytic beliefs (AB)                7  4.15  0.95      0.13      0.46  .73
Experiential beliefs (EB)            7  4.59  0.97     -0.33      0.52  .74

Note. n = 168.

The results of the correlation analysis clearly supported Hypothesis II; all of the GCs showed low to middle levels of correlation coefficients, but more specifically they were more strongly related to analytical beliefs, PH: r = .63, with its 95% CI being [.53, .71], LS: r = .51 [.39, .61], P: r = .61 [.51, .70], than to experiential ones, PH: r = .33, with its 95% CI being [.19, .46], LS: r = .22 [.07, .36], P: r = .39 [.25, .51]. Also, as Figure 5 presents, all of the GCs were located closer to analytic beliefs than to experiential ones. This links perfectly to the conceptual relevance among them.
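The source does not say how the confidence intervals for the correlations were obtained. A standard choice is the Fisher z transformation; the sketch below assumes that method and n = 168, and the function name is hypothetical.

```python
import math
from statistics import NormalDist


def pearson_ci(r, n, level=0.95):
    """Confidence interval for a Pearson correlation via Fisher's z transformation."""
    z = math.atanh(r)
    se = 1.0 / math.sqrt(n - 3)
    crit = NormalDist().inv_cdf(0.5 + level / 2.0)
    return math.tanh(z - crit * se), math.tanh(z + crit * se)


# Correlations with analytic beliefs reported in the text (n = 168).
ph_low, ph_high = pearson_ci(0.63, 168)
ls_low, ls_high = pearson_ci(0.51, 168)
```

Under this assumption the intervals round to [.53, .71] for PH and [.39, .61] for LS, in line with the values reported above.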

[Figure 4: scatter plot matrix of the five summated scale scores.]

Figure 4. Scatter plot matrix representing the correlation coefficients on the upper side, the histograms in the middle columns, and the scatter plots with linear regressions on the lower side.

[Figure 5: two-dimensional CMDS plot of the variables.]

Figure 5. Plot representing the distances between the variables, based on their correlation coefficients matrix.

4.2.2 Relationship with the Performance of a C-test

This section will report the results of the experiment which investigated the relationship between GC and language performance. We assumed that GC as a moderator in SAT will show a correlation with both the accuracy (score) and the speed (time to complete) of a language test. The present study focused on the performance of a C-test.

The number of participants was 77. All of the participants were first year university students. The participants overlapped with the secondary study. After the secondary study, they participated in a C-test (detailed below) as a part of the learning activity of their English classes, in June, 2014. All the participants were women. We also used the data about their GC, as determined in the secondary study.

The C-test was created by the authors (also available on the authors' web page). The text type was narrative (a letter to a writer's friend). The length of the text was 249 words, including some blank words. The number of items (blanks) was 17, which is equal to almost 7% of the whole text. The readability scores of the text, ignoring the blanks, were 91 at Flesch Reading Ease, 2.6 at Flesch-Kincaid Grade Level; these levels are usually regarded to be easy in foreign language reading studies. The examiner asked the participants to fill in the blanks in the untimed condition (there was no time limit), but also asked the participants to report the time when they had completed answering. A digital count-up timer was displayed on the monitor of the classrooms, and the participants could note the time when they finished answering, using this.
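The two readability indices cited above are fixed formulas over word, sentence, and syllable counts. The sketch below states them in their standard form. The counts for the actual C-test text are not given in the source, so the counts used here are hypothetical; only the 17 blanks and the 249 words come from the text.

```python
def flesch_reading_ease(words, sentences, syllables):
    """Flesch Reading Ease: higher scores indicate easier text."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def flesch_kincaid_grade(words, sentences, syllables):
    """Flesch-Kincaid Grade Level: approximate US school grade."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


# Hypothetical counts for illustration only.
ease = flesch_reading_ease(words=100, sentences=10, syllables=130)
grade = flesch_kincaid_grade(words=100, sentences=10, syllables=130)

# Share of blanks in the C-test, from the figures reported in the text.
blank_share = 17 / 249
```

Short sentences and short words push the first index up and the second down, which is why an easy narrative text can score around 91 and 2.6 at the same time.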

Table 7 presents the descriptive statistics and the reliability for the summated scale scores of GCs, the scores of the C-test, and the time to complete the test. This subsample may have shown lower GCs in comparison to the whole data of the secondary study. The reliability coefficients were acceptable. Figures 6 and 7 summarize the correlation patterns, as in the previous section.

Table 7.
Description of the Summated Scale Scores of GCs, the Scores, and the Time to Complete the C-Test

                                       k       M      SD  Skewness  Kurtosis     α
Phonological carefulness (PH)          5    3.18    0.97      0.03      0.03   .81
Lexical-Syntactic carefulness (LS)     4    2.62    0.91      0.20     -0.14   .79
Pragmatic carefulness (P)              5    2.79    0.87     -0.08     -0.74   .79
Score                                 17    5.45    2.35      0.58      0.42   .63
Time to complete (sec)              n.a.  491.34  138.36      0.54      1.45  n.a.

Note. n = 77.

[Figure 6: scatter plot matrix of the three GC scores, the C-test score, and the time to complete.]

Figure 6. Scatter plot matrix representing the distributions and the correlations between the variables for Hypothesis III.

[Figure 7: two-dimensional CMDS plot of the variables.]

Figure 7. Plot graphically representing the results of CMDS.

The results supported Hypothesis III. GCs are correlated to both the scores, PH: r = .35 [.21, .48], LS: r = .41 [.28, .53], P: r = .38 [.24, .50], and the times, PH: r = .27 [.12, .41], LS: r = .36 [.22, .49], P: r = .31 [.17, .44], with low to middle levels for the coefficients. The results of CMDS also suggest that GCs have a correlation with the score and the time, with almost the same magnitudes. This means that GC links to both the accuracy and speed of language performance, exactly as the framework of SAT expected.

4.3 Summary of the Initial Validation

Our initial validation provided information regarding both the content and criterion validity of the scale. The summary of the results of our initial validation, using a hypothesis testing procedure, is shown in Table 8.





Table 8.

The Summary of the Results of the Initial Validation

Hypotheses  Content                                         Results    Evidence
I           The contents of all the items in FLGCS match    Supported  Properly judged by 10
            the theoretical concepts of each factor.                   linguistic experts
II          GCs are correlated to analytical beliefs more   Supported  Showed the correlation
            strongly than to experiential beliefs.                     pattern exactly as expected
III         GCs are correlated to both the score of a       Supported  Showed the correlation
            C-test and the time to finish the test.                    pattern exactly as expected

5. General Discussion

The results presented above lead us to conclude that the new psychological scale, FLGCS, with its three factors (k = 14), is a statistically reliable measure. Its structural, content, and criterion validity were also supported by conducting multiple analyses. Furthermore, multiple sample structural equation modeling demonstrated that FLGCS showed measurement invariance among the groups. However, FLGCS has a couple of potential limitations, as noted below.

Most importantly, regardless of its high reliability, FLGCS covers only a small area of GC as a construct. As the inter-correlations and inter-factor correlation coefficients were quite strong, the questionnaire items may measure very similar behaviors and characteristics of individuals. This phenomenon is called the bandwidth-fidelity dilemma. However, since GC is a new concept, our preliminary aim was to establish a reliable scale at the expense of its coverage, in order to provide a basis for further research. As described in the Background section and the literature review of the present study, the rationale for GC rests on the concepts of SAT, and the scale was mainly designed to be applied in psycholinguistic studies, classroom-based studies, and teaching practice. Needless to say, less reliable measures lead to attenuation problems, statistically. Hence, we judged that a more reliable scale was preferable in this case.
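The attenuation problem mentioned above can be illustrated with Spearman's classical correction for attenuation, which divides an observed correlation by the square root of the product of the two reliabilities. The following is a minimal sketch, not an analysis reported in this study: the function name `disattenuate` is hypothetical, the inputs are the PH-score correlation and the two reliability coefficients as read from Table 7, and treating α as the reliability estimate is an assumption.

```python
import math


def disattenuate(r_xy, rel_x, rel_y):
    """Spearman's correction for attenuation: r_xy / sqrt(rel_x * rel_y)."""
    if not (0.0 < rel_x <= 1.0 and 0.0 < rel_y <= 1.0):
        raise ValueError("reliabilities must lie in (0, 1]")
    return r_xy / math.sqrt(rel_x * rel_y)


# Illustrative inputs only: observed r between PH and the C-test score,
# with alpha used as the reliability of each measure (an assumption).
observed = 0.35
corrected = disattenuate(observed, rel_x=0.81, rel_y=0.63)
print(f"observed r = {observed:.2f}, corrected r = {corrected:.2f}")
```

The lower the reliability of either measure, the more the observed coefficient understates the underlying relationship, which is why a more reliable scale was preferred here.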

Obviously, validation is not a dichotomous judgment, and further validation is always strongly desired. Part of the evidence that the initial validation provided may cover only a very small range of validity. Future studies should confirm the links between GC and other types of individual differences, the development of GC, and relationships with other types of grammatical performance (e.g., grammaticality judgment, sentence verification, and impromptu speech). Additionally, although the present study was a large-scale survey, this does not rule out sampling error. Data from larger and more varied samples will also be needed.

It should be noted that the present study did not assess the feasibility and the consequential aspects of validity. It will be important to analyze the washback effects on the learning/teaching behaviors of learners/teachers in practice.





6. Conclusion

This study developed and validated a psychological scale to measure GC, which is related to inter-learner variance on SAT. SAT, a sophisticated framework concerning human behaviors, may explain a large part of language performance, and GC, as an individual's trait, will be key to capturing the dynamics of numerous variables concerned with language performance. However, the importance of this is not limited to theories of second/foreign language acquisition and use.

In teaching practice, FLGCS can also provide teachers with much information about their students. FLGCS will enable us to understand students' traits. It will also help teachers specify what kind of grammatical carefulness (phonological, lexical-syntactic, or pragmatic) of a particular student is (in)sufficient. The information will contribute to the everyday teaching practice of English by playing various roles in the work of teachers. Likewise, FLGCS will be useful even for learners to understand their own traits. This may promote learners' self-regulated learning.

Notes

1. The term GC has numerous similar terms such as meta-linguistic awareness, language awareness, language sensitivity, and grammatical sensitivity. However, these generally refer to one's knowledge, or certain types of language-related skills, which are mainly measured by language tests. We intended to refer to GC only as a psychological and behavioral trait, which we consider to be fundamentally separate from language knowledge or skills. However, we also assume that they may be correlated to each other to some extent.

References

Coxon, A. P. M. (1982). The user's guide to multidimensional scaling: With special reference to the MDS. London: Heinemann Educational Books.

Dennis, I., & Evans, J. St. B. T. (1996). The speed-error trade-off problem in psychometric testing. British Journal of Psychology, 87, 105-129.

Goldhammer, F., & Kroehne, U. (2014). Controlling individuals' time spent on task in speeded performance measures: Experimental time limits, posterior time limits, and response time modeling. Applied Psychological Measurement, 38, 255-267.

Kartchava, E., & Ammar, A. (2014). Learners' beliefs as mediators of what is noticed and learned in the language classroom. TESOL Quarterly, 48, 86-109.

Krashen, S. D. (1978). Individual variation in the use of the monitor. In W. Ritchie (Ed.), Second language acquisition research (pp. 175-183). New York, NY: Academic Press.

Kormos, J. (1999). The effect of speaker variables on the self-correction behaviour of L2 learners. System, 27, 207-221.


Segalowitz, N., & Segalowitz, S. J. (1993). Skilled performance, practice, and the differentiation of speed-up from automatization effects: Evidence from second language word recognition. Applied Psycholinguistics, 14, 369-385.

Seliger, H. (1980). Utterance planning and correction behavior: Its function in the grammar construction process for second language learners. In H. W. Dechert & M. Raupach (Eds.), Towards a cross-linguistic assessment of speech production (pp. 87-99). Frankfurt, Germany: Lang.

Tanaka, K., & Ellis, R. (2003). Study abroad, language proficiency, and learner beliefs about language learning. JALT Journal, 25, 63-85.

Tarone, E. (1985). Variability in interlanguage use: A study of style-shifting in morphology and syntax. Language Learning, 35, 373-404.

van der Linden, W. J. (2007). A hierarchical framework for modeling speed and accuracy on test items. Psychometrika, 73, 287-308.

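Appendix

Foreign Language Grammatical Carefulness Scale (FLGCS)

Item 1 (P): 外国語を使うとき，会話の流れの不自然さについてよく考える
    [When using a foreign language, I often think about unnaturalness in the flow of the conversation.]

Item 2 (P): 外国語を使うとき，表現が文脈にあわないと考えこんでしまう
    [When using a foreign language, I get caught up in thought when an expression does not fit the context.]

Item 3 (P): 外国語を使うとき，一貫してない表現や曖昧な表現にはよく気がつく
    [When using a foreign language, I often notice inconsistent or ambiguous expressions.]

Item 4 (P): 外国語を使うとき，一貫していない表現があると考えこんでしまう
    [When using a foreign language, I get caught up in thought when there is an inconsistent expression.]

Item 5 (P): 外国語を使うとき，失礼な表現や丁寧過ぎる表現がよく気になる
    [When using a foreign language, impolite or overly polite expressions often bother me.]

Item 6 (LS): 外国語を使うとき，語の形の変化の誤りにはよく気がつく方だ
    [When using a foreign language, I tend to notice errors in word inflection.]

Item 8 (LS): 外国語を使うとき，単語のつづりが間違っているとよく気になる
    [When using a foreign language, misspelled words often bother me.]

Item 9 (LS): 外国語を使うとき，文章の中で間違った単語があるとよく気がつく
    [When using a foreign language, I often notice when there is a wrong word in a text.]

Item 10 (LS): 外国語を使うとき，単語の間違いにはよく気づく方だ
    [When using a foreign language, I tend to notice errors in words.]

Item 11 (PH): 外国語を使うとき，発音が正確か考えることが多い
    [When using a foreign language, I often think about whether the pronunciation is accurate.]

Item 12 (PH): 外国語を使うとき，いつも発音が正しいかどうか気になる
    [When using a foreign language, I am always concerned about whether the pronunciation is correct.]

Item 13 (PH): 外国語を使うとき，発音が正確でないと考えこんでしまう
    [When using a foreign language, I get caught up in thought when the pronunciation is not accurate.]

Item 14 (PH): 外国語を使うとき，発音が誤っていると気になってしまうことが多い
    [When using a foreign language, mispronunciations often bother me.]

Item 15 (PH): 外国語を使うとき，発音が本当に正確か確認することがある
    [When using a foreign language, I sometimes check whether the pronunciation is really accurate.]

Note. Item 7 was deleted (see the section concerning scale development). For reference, Item 7: "外国語を使うとき，文法規則に合わない表現によく気がつく" [When using a foreign language, I often notice expressions that do not conform to grammatical rules]. P = pragmatic carefulness, LS = lexical-syntactic carefulness, PH = phonological carefulness. Supplementary data including (a) the initial item pool, (b) the C-test, and (c) the final version of the questionnaire used in the present study are available at the first author's website: https://sites.google.com/site/kusanagikuni/home/projects/gc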
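The item-to-factor assignment listed above could be applied to a respondent's answers roughly as follows. This is a hedged sketch rather than the authors' scoring procedure: the name `score_flgcs`, the use of the item mean for each factor, and the 1-5 rating range in the example are assumptions, and the example ratings are made up.

```python
from statistics import mean

# Item-to-factor assignment as listed in the Appendix (Item 7 was deleted).
FACTOR_ITEMS = {
    "P": [1, 2, 3, 4, 5],
    "LS": [6, 8, 9, 10],
    "PH": [11, 12, 13, 14, 15],
}


def score_flgcs(responses):
    """Return the mean item rating for each factor.

    `responses` maps item number -> numeric rating. Averaging (rather than
    summing) the items is an assumption of this sketch.
    """
    scores = {}
    for factor, items in FACTOR_ITEMS.items():
        missing = [i for i in items if i not in responses]
        if missing:
            raise KeyError(f"missing responses for items {missing}")
        scores[factor] = mean(responses[i] for i in items)
    return scores


# Hypothetical respondent: made-up ratings on an assumed 1-5 scale.
example = {1: 3, 2: 2, 3: 4, 4: 3, 5: 3, 6: 2, 8: 3, 9: 2, 10: 3,
           11: 4, 12: 5, 13: 3, 14: 4, 15: 4}
print(score_flgcs(example))
```

Keeping the mapping in a single dictionary makes the deletion of Item 7 explicit and lets the function reject incomplete response sheets instead of silently averaging over fewer items.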
