
Proceedings of the Second APSIPA Annual Summit and Conference, pages 482–485, Biopolis, Singapore, 14-17 December 2010.

Grammatical error detection from English utterances spoken by Japanese

Takuya Anzai∗, Seongjun Hahm∗, Akinori Ito∗, Masashi Ito† and Shozo Makino‡

∗ Tohoku University, Sendai 980-8579 Miyagi Japan
E-mail: {takuya,branden65,aito}@makino.ecei.tohoku.ac.jp

† Tohoku Institute of Technology, Sendai 982-8577 Miyagi Japan
E-mail: [email protected]

‡ Tohoku Bunka Gakuen University, Sendai 981-0943 Miyagi Japan
E-mail: [email protected]

Abstract—This paper describes a method to recognize English utterances by Japanese learners as accurately as possible and to detect grammatical errors from the transcriptions of those utterances. The method is a building block for a voice-interactive Computer-Assisted Language Learning (CALL) system that enables a learner to practice conversation with a computer. A difficulty in developing such a system is that the utterances made by learners contain grammatical mistakes, which an ordinary speech recognizer does not assume. To generate accurate transcriptions that include grammatical mistakes, we employed a language model based on an N-gram trained on generated text. The text generation is based on grammatical error rules that reflect the tendencies of mistakes made by Japanese learners. The experimental results showed that the proposed method improved recognition accuracy compared with the conventional recognition and error detection method.

I. INTRODUCTION

With the progress of globalization in recent years, the population of English learners has increased. Among the various English learning methods, Computer-Assisted Language Learning (CALL) systems are among the most promising. Most CALL systems focus on training for reading, writing and listening, and few commercial CALL systems provide training for speaking. Recently, CALL systems for speaking practice have been developed; however, conventional CALL systems with speaking practice focus on training pronunciation or intonation [1,2,3].

To improve conversation skills, it is necessary for learners to practice conversation in a real dialogue. Consequently, several voice-interactive CALL systems for conversation practice have been developed [4,5].

One of the biggest problems in realizing such CALL systems is how to recognize learners' utterances correctly. In most cases, utterances spoken by a foreign-language learner contain pronunciation errors and grammatical mistakes. Nevertheless, a voice-interactive CALL system must recognize a learner's utterance correctly, including its grammatical errors, in order to keep the dialogue moving smoothly and to give pertinent feedback. Therefore, to develop a CALL system for grammar or expression practice, we have to improve the recognition accuracy of utterances containing grammatical errors.

In this paper, we propose a method for recognizing learners' utterances and detecting grammatical errors. This work improves on a recognition method based on text generation and an N-gram trained on the generated text [6]. We revised the rules for generating sentences with errors in order to improve recognition performance.

In developing the CALL system, we assume that learners first do a pre-exercise on the fundamental vocabulary and grammatical expressions used in a specific situation, and then converse with the system on those topics. The learner's utterance is recognized by the system using a speech recognizer. If the utterance contains any errors, the system points them out and demonstrates a correct utterance. In this way, learners carry out conversation practice [6].

Because learners already know the keywords from the pre-exercise before the conversation with the system, we can expect them to respond to the system using the same expressions as those appearing in the pre-exercise [5]. We call a correct sentence expected to be uttered by learners “a target sentence”. However, in reality, not all utterances made by a learner match the target sentences. A sentence actually uttered by a learner is referred to as “an uttered sentence”. Furthermore, the results from the speech recognizer often differ from the uttered sentences because of recognition errors. We refer to a sentence obtained from the speech recognizer as “a recognized sentence”.

II. RECOGNITION OF LEARNERS' UTTERANCE USING AN N-GRAM FROM GENERATED TEXT

In this section, we describe the approach by Ito et al. [6], which is our previous work. An N-gram LM is powerful and flexible with regard to unpredictable errors, but it requires a large amount of training data. To train an N-gram without a large training corpus, we developed a method that trains the N-gram on artificial sentences generated from the target sentences. First, we prepare grammatical error rules reflecting errors frequently made by Japanese learners. Then, the rules are applied to the target sentences, and a sufficient number of sentences are generated. Finally, a back-off N-gram is trained on the generated sentences.
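As an illustration of this pipeline, the following sketch applies a small table of error rules to a target sentence many times and counts N-grams over the result. It is only our assumption of how such a generator could be organized, not the authors' implementation; the rule table is a toy in the spirit of Table I, and in practice the back-off N-gram would be estimated with a standard LM toolkit rather than from raw counts.

    import random
    from collections import Counter

    # Toy error rules: correct word -> list of (erroneous form, error type).
    # An empty string stands for the symbol "ε" (the word is absent).
    ERROR_RULES = {
        "a":   [("", "DEL"), ("the", "SUB")],   # drop "a" or replace it with "the"
        "the": [("", "DEL")],                   # drop "the"
        "to":  [("", "DEL")],                   # drop "to"
    }

    def corrupt(sentence, p_error=0.2):
        """Apply an error rule to each word with probability p_error."""
        out = []
        for word in sentence.split():
            rules = ERROR_RULES.get(word.lower())
            if rules and random.random() < p_error:
                replacement, _type = random.choice(rules)
                if replacement:               # substitution
                    out.append(replacement)
                # deletion: append nothing
            else:
                out.append(word)
        return " ".join(out)

    target = "I work at a car company"
    generated = {corrupt(target) for _ in range(10000)}   # distinct errorful variants

    # Count trigrams over the generated text; a back-off N-gram would normally be
    # estimated from such text with an LM toolkit rather than from raw counts.
    trigrams = Counter()
    for sent in generated:
        words = ["<s>"] + sent.split() + ["</s>"]
        for i in range(len(words) - 2):
            trigrams[tuple(words[i:i + 3])] += 1
    print(trigrams.most_common(5))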



TABLE I: GRAMMATICAL ERRORS IN THE SST CORPUS

Rank | Correct | Uttered | Type
1    | a       | ε       | DEL
2    | the     | ε       | DEL
3    | ε       | the     | INS
4    | a       | the     | SUB
5    | to      | ε       | DEL

In this text generation method, two kinds of error rules are used: corpus-based error rules and generic error rules. The former are extracted from transcriptions of English utterances by native Japanese speakers. The SST (Standard Speaking Test) corpus was used as the source of the error rules [7]. The SST corpus consists of transcriptions of interviews conducted in English with native Japanese speakers. The grammatical and lexical errors in this corpus are manually annotated together with the correct expressions. Table I shows the errors most frequently observed in the corpus. The symbol ε denotes the absence of a corresponding word. The error types DEL, INS and SUB indicate a deletion error, an insertion error and a substitution error, respectively. The latter rules are manually created to cover a wider range of error types; they include insertion of articles, singular/plural confusion, and errors in adjective comparatives and verb tenses.

When applying these two kinds of rules, the generic error rules were applied first, and the corpus-based rules were applied next. The probability of choosing a corpus-based rule is proportional to the frequency of the corresponding error in the SST corpus. In contrast, the probabilities of applying the generic error rules are given a priori. The final recognition performance was not greatly affected by the probabilities of the generic error rules.
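A minimal sketch of this frequency-weighted selection of corpus-based rules is shown below; the rule list and counts are invented for illustration, and only the weighting scheme follows the description above.

    import random

    # Corpus-based rules for the word "a" and their (hypothetical) SST frequencies.
    rules_for_a = [("", "DEL"), ("the", "SUB")]
    freqs_for_a = [120, 35]

    # random.choices draws a rule with probability proportional to its frequency,
    # mirroring how corpus-based rules are weighted by their counts in the corpus.
    chosen = random.choices(rules_for_a, weights=freqs_for_a, k=1)[0]
    print(chosen)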

In the corpus-based rules, only substitution and deletion rules applied to a single word were used. However, errors spanning more than one word (e.g. Correct: office worker, Uttered: work man) and insertion errors also exist in the SST corpus. Therefore, error rules concerning multiple words should be treated properly.

III. REVISION OF SENTENCE GENERATION RULES

A. Contraction rules

In the previous work, the generic error rules included contraction rules such as “I am → I’m” or “He’s → He is”. However, since a contraction is not a mistake, it is inappropriate to regard it as a grammatical error. Therefore, we generate sentences both with and without contractions, and all of the generated sentences are used as new target sentences.

B. Corpus-based rules

Ito et al. [6] did not use insertion errors as corpus-based error rules because contextual information is indispensable for applying such rules; instead, only the insertion of articles was included in the generic error rules. In this paper, we handle various insertion errors, including article insertions, by considering the POS (part of speech) of the words before and after the inserted word.

To extract the insertion error rules, it is necessary to determine the POS of every word that appears in the SST corpus.

TABLE II: INSERTION ERRORS IN THE SST CORPUS

Rank | Preceding POS                                | Insertion | Following POS
1    | preposition                                  | the       | singular noun
2    | base form verb                               | the       | singular noun
3    | non-3rd-person-singular present tense verb   | the       | singular noun
4    | preposition or infinitive “to”               | the       | singular noun
5    | non-3rd-person-singular present tense verb   | a         | singular noun

TABLE III: MULTIPLE WORD SUBSTITUTIONS IN THE SST CORPUS

Rank | Correct      | Uttered
1    | like it      | like
2    | a lot of     | lot of
2    | a little bit | little bit
3    | will have    | have
3    | see it       | it

We estimated the POS tags of all words using Brill's tagger [8], and counted the insertion errors together with the POS tags around the insertions. Table II shows the rules most frequently observed in the corpus. These results show that most of the insertion errors made by native Japanese speakers are insertions of an article before a noun; the remaining ones are insertions of conjunctions or prepositions.
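The extraction step can be approximated as in the sketch below. It uses NLTK's default POS tagger as a stand-in for Brill's tagger and assumes, purely for illustration, that the corpus annotations have already been turned into (sentence, insertion position, inserted word) records; neither the data format nor the tagger choice is taken from the paper.

    from collections import Counter
    import nltk  # requires the NLTK 'averaged_perceptron_tagger' data package

    # Hypothetical pre-processed annotations: the learner's sentence, the index of
    # the wrongly inserted word, and the inserted word itself.
    annotations = [
        ("I go to the school every day", 3, "the"),
        ("She is a engineer", 2, "a"),
    ]

    contexts = Counter()
    for sentence, idx, inserted in annotations:
        tokens = sentence.split()
        tags = [tag for _, tag in nltk.pos_tag(tokens)]
        prev_tag = tags[idx - 1] if idx > 0 else "<s>"
        next_tag = tags[idx + 1] if idx + 1 < len(tags) else "</s>"
        # Count (preceding POS, inserted word, following POS) triples, as in Table II.
        contexts[(prev_tag, inserted, next_tag)] += 1

    for (prev, word, nxt), freq in contexts.most_common():
        print(freq, prev, word, nxt)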

Error rules applied to more than one word were not used in the previous work. Table III shows the multiple-word substitutions most frequently observed in the corpus. These rules are also included in the revised error rules.

C. Generic error rules

The generic error rules of the previous work included insertion of articles, contraction of words, singular/plural confusion of nouns, comparative forms of adjectives and misuse of verb tense. Among these, the article insertion and contraction rules are excluded from the generic error rules because they are now handled elsewhere (by the corpus-based insertion rules and by the new target sentences, respectively). In addition, the following error rules concerning adjectives are added.

• Rules for generating the base form, “more + base form” (e.g. more good) and “more + comparative” (e.g. more better) from the comparative form of adjectives.

• Rules for generating the base form, “most + base form” (e.g. most good) and “most + superlative” (e.g. most biggest) from the superlative form of adjectives.

These errors were frequently observed in the SST corpus.
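A small sketch of how such adjective rules could be expressed is given below; the helper functions and the example adjective forms are ours and are only meant to mirror the variants listed above.

    def comparative_error_variants(base, comparative):
        """Erroneous variants generated from a comparative form, e.g. 'better'."""
        return [base,                       # "good"  (base form instead of comparative)
                "more " + base,             # "more good"
                "more " + comparative]      # "more better"

    def superlative_error_variants(base, superlative):
        """Erroneous variants generated from a superlative form, e.g. 'biggest'."""
        return [base,                       # "big"
                "most " + base,             # "most big"
                "most " + superlative]      # "most biggest"

    print(comparative_error_variants("good", "better"))   # ['good', 'more good', 'more better']
    print(superlative_error_variants("big", "biggest"))   # ['big', 'most big', 'most biggest']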

D. Thesaurus-based word substitution rules

If a learner utters a wrong word and that word does not appear in the SST corpus, it is impossible to recognize it correctly because it is not included in the recognizer's vocabulary. To mitigate this OOV (out-of-vocabulary) problem, we used WordNet [9] to acquire synonyms and hypernyms and thereby reduce the number of OOV words. We refer to these rules as the thesaurus-based word substitution rules.
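Collecting such substitution candidates can be sketched with NLTK's WordNet interface (assuming the NLTK 'wordnet' data package is installed); how the candidates are filtered and weighted is a detail not specified above.

    from nltk.corpus import wordnet as wn   # requires the NLTK 'wordnet' data package

    def thesaurus_candidates(word, pos=wn.NOUN):
        """Synonyms and hypernyms of `word` usable as substitution candidates."""
        candidates = set()
        for synset in wn.synsets(word, pos=pos):
            candidates.update(lemma.name() for lemma in synset.lemmas())    # synonyms
            for hyper in synset.hypernyms():                                # hypernyms
                candidates.update(lemma.name() for lemma in hyper.lemmas())
        candidates.discard(word)
        return sorted(c.replace("_", " ") for c in candidates)

    print(thesaurus_candidates("company"))   # e.g. 'business', 'institution', ...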

E. Generation of training text using the error rules

Fig. 1 shows an overview of N-gram training using the error rules. We apply all of the error rule types according to the output probability of error words.



Fig. 1. Training of an N-gram from generated text

TABLE IV: EXAMPLE OF UTTERED SENTENCES

Freq. | Uttered sentence
6     | I’m an office worker. I work at a car company.
2     | I’m an office worker. I worked at a car company.
1     | I’m an office worker. I work at car company.
1     | I’m a worker. I work at a car company.
1     | I’m a office worker. I work at a car company.
1     | I’m a office worker. I worked at an car company.

The error rules are applied word by word in the following order.

• Insertion errors considering the POS context
• Multiple-word substitution errors
• Corpus-based substitution and deletion errors
• Generic error rules or thesaurus-based word substitution rules

If a word remains unchanged after the corpus-based error rules have been applied, either the generic error rules or the thesaurus-based word substitution rules are applied to it, chosen according to the probability of applying the thesaurus-based word substitution rules.
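The per-word control flow described above can be summarized by the following sketch. The rule tables and probabilities are placeholders; only the ordering and the fallback to the generic or thesaurus-based rules reflect the description, and the insertion and multiple-word steps are omitted.

    import random

    def corrupt_word(word, corpus_rules, generic_rules, thesaurus_rules,
                     p_error=0.2, p_thesaurus=0.4):
        """Apply error rules to a single word in the order described in the text."""
        # 1) Corpus-based substitution/deletion rules (insertion and multiple-word
        #    rules are handled before this per-word step and are not shown here).
        rules = corpus_rules.get(word)
        if rules and random.random() < p_error:
            return random.choice(rules)          # may be "" for a deletion

        # 2) The word survived the corpus-based rules: fall back to either the
        #    thesaurus-based substitutions or the generic rules, chosen by p_thesaurus.
        if random.random() < p_error:
            if thesaurus_rules.get(word) and random.random() < p_thesaurus:
                return random.choice(thesaurus_rules[word])
            if generic_rules.get(word):
                return random.choice(generic_rules[word])
        return word

    # Example call with toy rule tables.
    print(corrupt_word("company",
                       corpus_rules={},                          # no corpus rule here
                       generic_rules={"company": ["companies"]},
                       thesaurus_rules={"company": ["firm", "business"]}))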

IV. EXPERIMENT

A. Data collection

To carry out an experiment under conditions similar to those of the assumed CALL system, we collected utterances from native Japanese speakers as follows.

1) English sentences (the target sentences) containing keywords were prepared.

2) Learners were asked to remember the target sentences. No time limitation was set for the memorization task.

3) Japanese translations of the target sentences were presented to the learners, and the learners were asked to utter the English translations of those sentences.

TABLE V: EXPERIMENTAL CONDITIONS

Acoustic model                    | 512-mixture 5-state HMM
Acoustic features                 | MFCC, ∆MFCC, ∆∆MFCC, ∆pow, ∆∆pow
Decoder                           | Julius 4.1.2
Output probability of error words | 0.2
Number of generated texts         | 100,000 per target sentence

Prompts for the utterances were given using a synthesized voice. In addition to the memorized sentences, the learners were also asked to translate Japanese sentences that were not included in the previously presented target sentences.

Table IV shows examples of uttered sentences corresponding to the target sentence “I’m an office worker. I work at a car company.”

B. Effect of the contraction rules

We carried out an experiment to examine the effect of the revised contraction rules. In this experiment, we used two sets of target sentences: one contains the original target sentences, and the other includes the sentences generated by the contraction rules. The system performance was measured by the word accuracy of the recognized sentences. The test data consisted of 126 sentences spoken by 15 students, each of which includes a contraction or a contractible expression. Table V shows the experimental conditions.

The word accuracy obtained with the sentences generated by the proposed contraction rules was 84.88%, while that of the conventional method was 83.46%. The proposed method, which uses multiple target sentences, thus performed better than the method in which the contraction rules were included in the generic error rules.

C. Effect of revision of error rules

Next, we investigated the effect of the revision of the other types of error rules: the corpus-based rules, the generic rules and the thesaurus-based rules.

According to the experiment, word accuracy was not greatly affected by the revision of the corpus-based and generic rules. However, applying the revised corpus-based rules reduced the number of OOV words by six, owing to the new insertion rules and the multiple-word substitution rules.

Next, we examined the effect of the thesaurus-based word substitution rules on word accuracy and the number of OOV words. The test data included 441 sentences spoken by 15 students, and the other experimental conditions were the same as those shown in Table V. Applying the thesaurus-based word substitution decreased word accuracy by 0.2 points; however, the number of OOV words decreased by four.

D. Comparison of total performance

Next, we compared the conventional error rules with the revised error rules. The performance was compared in terms of word accuracy as well as the recall and precision of grammatical error detection. The probability of applying the thesaurus-based word substitution rules was 0.4 when the revised rules were used.




Fig. 2. Word accuracy of the methods with respect to error probability

The experimental results are shown in Fig. 2, which plots word accuracy against the output probability of error words. The best performance was obtained when the revised rules were applied and the output probability of error words was 0.05. Fig. 2 also shows that the optimum probability for applying the error rules differed between the rule sets; the optimum probability for the revised rules was smaller than that for the conventional rules. This difference seems to be caused by the fact that the proposed method employs a larger variety of error rules than the previous method. Fig. 3 shows the number of distinct sentences in the generated text. Because the variation of the sentences generated by the conventional rules was narrower than that of the revised rules, more errors had to be generated to obtain a sufficient number of distinct sentences when the conventional rules were used. When the output probability of error words was 0.05, the number of distinct sentences produced by the revised rules was about 200 larger than that produced by the conventional ones. However, we observed a deterioration of word accuracy, which seems to be caused by the overly large vocabulary size.

Next, we evaluated the same results in terms of the recall and precision of error detection. A detection result was regarded as a misdetection when any of the following was wrong: the word position, the recognized word, or the type of error. The relationship between the recall and precision rates is shown in Fig. 4. The results show that the revised rules produced better performance than the conventional rules.
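Under this strict criterion, a detected error counts as correct only if its position, word and error type all match a reference annotation. The sketch below computes precision and recall accordingly; the triple-based data structure is ours, chosen only to illustrate the matching condition.

    def precision_recall(detected, reference):
        """Each error is a (word position, word, error type) triple; a detection is
        counted as correct only when all three fields match a reference error."""
        detected, reference = set(detected), set(reference)
        hits = detected & reference
        precision = len(hits) / len(detected) if detected else 0.0
        recall = len(hits) / len(reference) if reference else 0.0
        return precision, recall

    ref = {(3, "a", "DEL"), (7, "the", "INS")}
    det = {(3, "a", "DEL"), (5, "to", "DEL")}
    print(precision_recall(det, ref))   # (0.5, 0.5)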

V. CONCLUSIONS

We have developed a method to recognize English utterances spoken by Japanese learners, toward the development of a voice-interactive CALL system. We prepared target sentences with and without contractions. Then we applied the revised error rules, which include insertion errors and rules applied to multiple words. The text generated using the proposed method improved the recognition performance and the recall/precision of error detection. Although we could improve the performance of recognition and error detection, the absolute performance of the method is still too low for application to an actual CALL system. We need to further improve the performance of the system by using more corpus data and by improving the error rules.


Fig. 3. Number of distinct sentences with respect to error probability


Fig. 4. Recall and precision results


ACKNOWLEDGMENT

The present study was supported in part by a JSPS Grant-in-Aid for Scientific Research (B) 20320075.

REFERENCES

[1] M. Eskenazi, “Detection of foreign speakers’ pronunciation errors for second language training - preliminary results”, Proc. ICSLP, Philadelphia, 1465-1468, 1996.

[2] S. Nakagawa, N. Nakamura and K. Mori, “A statistical method of evaluating pronunciation proficiency for English words spoken by Japanese”, IEICE Trans. Inf. and Syst., E87-D(7), 1917-1922, 2004.

[3] A. Ito, Y. Lim, M. Suzuki and S. Makino, “Pronunciation error detection for computer-assisted language learning system based on error rule clustering using a decision tree”, Acoust. Sci. and Tech., 28(2), 131-133, 2007.

[4] M. Eskenazi, “Using automatic speech processing for foreign language pronunciation tutoring: some issues and prototype”, Language Learning and Technology, 2(2), 62-76, 1999.

[5] O. P. Kweon, A. Ito, M. Suzuki and S. Makino, “A grammatical error detection method for dialogue-based CALL system”, Journal of Natural Language Processing, 12(4), 137-156, 2005.

[6] A. Ito, R. Tsutsui, M. Ito and S. Makino, “Recognition of English utterances with grammatical and lexical mistakes for dialogue-based CALL system”, Proc. Interspeech, 2819-2822, 2008.

[7] Y. Tono, T. Kaneko, H. Isahara, T. Saiga and E. Izumi, “A 1 million-word spoken corpus of Japanese learners of English and its implications for L2 lexicography”, Proc. 2nd Asialex International Congress, 257-262, 2001.

[8] E. Brill, “A simple rule-based part of speech tagger”, Proc. ANLP-92, 3rd Conf. on Applied Natural Language Processing, 152-155, 1992.

[9] Princeton University, “WordNet”, http://wordnet.princeton.edu/
