Transcript of Seven ways to make CALL more intelligent. Towards the effective integration of NLP techniques Piet...
- Slide 1
- Seven ways to make CALL more intelligent. Towards the effective
integration of NLP techniques Piet Desmet in collaboration with
Frederik Cornillie, Ruben Lagatie, Sien Moens, Maribel Montero
Perez, Hans Paulussen & Serge Verlinde
- Slide 2
- 0. Introduction 0.1. Terminology Parser-based CALL (Holland et
al. 1993) NLP-enhanced CALL (Nerbonne 2005) Intelligent CALL or
iCALL ICALL Intelligent CALL is a field within CALL which applies
concepts, techniques, algorithms and technologies from AI to CALL
(). Most relevant to CALL is research in four branches of AI: (1)
natural language processing, (2) user modelling, (3) expert systems
and (4) intelligent tutoring systems. (Schulze 2010: 65) ->
progressively larger scope
- Slide 3
- 0.2. Challenges and opportunities Most of the CALL environments
do not use NLP -> use of NLP in classrooms is hardly mainstream
practice The development of systems using NLP technology is not on
the agenda of most CALL experts, and interdisciplinary research
projects integrating computational linguists and foreign language
teachers remain very rare (Amaral & Meurers 2011: 6). cf. also
report by the Dutch Language Union: Onderzoek taal- en
spraaktechnologie en onderwijs (Van den Heuvel, TSas & Verberne
2012) Technological concerns Pedagogical concerns
- Slide 4
- Technological concerns Can NLP account for the full complexity
of natural human languages I am pessimistic about the possibility
of ICALL (Nyns 1989: 46) + in educational settings, no room for
erroneous analyses -> need for nearly error-free applications
But: too limited accuracy of NLP-tools -> risk of mislearning of
L2 http://www.flickr.com/photos/maphutha/1303854829 /
- Slide 5
- It is therefore of the utmost importance to warn the users of
the limitations of the tools. Ideally, inadequate outputs provided
by the NLP tools should make the learners reflect even more on the
language they are learning and on its structure. Therefore, even
NLP tool errors should help the learners master the target language
because they require more thinking on their part. Thus, NLP tools
can be useful for CALL and usefully used in CALL. The present,
imperfect state of technology should not be a hindrance, although
there is obviously room for improvement. However, improvements are
often triggered by remarks and studies on how tools effectively
work in context. Thus, if one waits to see improvements before
using NLP tools in CALL software, one might have to wait for a long
time, while using the tools in their present state will encourage
improvements to be made according to the needs of the language
learners (Vandeventer 2003 Linguistik online 17 Learning and
Teaching (in) Computational Linguistics) -> users are
intelligent and undemanding (BUT: for all language levels?) + ICALL
stimulates reflective practice -> Perhaps the best is yet to
come, but lets choose not to wait
- Slide 6
- The Penrose impossible stairs Computerpower doubles every 2
years Technological progress vs Pedagogical progress! Pedagogical
concerns
- Slide 7
- Initial NLP-enhanced CALL is mainly form-focused Also focus on
meaning & meaning-based activities in Foreign language learning
and teaching (FLLT) ICALL allows for the use of authentic materials
and skills-oriented activities (cf. communicative approach) BUT:
still appropriate task design needed in ICALL From a computational
perspective, a well-defined task design with its clear set of
relevant language constructions facilitates the restriction to a
linguistic domain which is manageable for a systems natural
language processing modules (Schulze 2010: 79) In FLLT focus on
authentic language tasks with fully open & unpredictable
interaction in real life settings (cf. task-based language
teaching) -> challenges for ICALL!
- Slide 8
- 0.3. Towards a typology (a)Linguistic perspective: type of
language involved Written vs spoken, native vs learner language,
etc. (b)Technological perspective: NLP & AI-technologies
involved Parsing, NER, topic detection, sentiment mining, text
summarization, etc. (c)Pedagogical perspective: type of learning
activities involved Reception, production, interaction, mediation
(d)CALL perspective: type of technology-based learning or teaching
activities
- Slide 9
- CALL perspective: type of technology-based learning or teaching
activities From receptive to productive activities with focus on
written language input & output 1.Input provider 2.Reading
companion 3.Exercise and test generator 4.Error detector, feedback
generator and automatic scoring tool 5.Writing aid 6.Adaptive item
sequencer 7.Resource generator -> Conceptual outline +
applications from academic R&D
- Slide 10
- 1. Input provider What? (Semi-)automated selection of
comprehensible & authentic text material How? a. analysis of
readability and formal complexity + syntactic and lexical text
simplification/elaboration b. analysis of meaning or text
categorization (e.g. subject categorization, topic detection, text
categorization, etc.) Seven roles for ICALL applications
- Slide 11
- a. Text retrieval on the basis of readability evaluation of
input REAP-project (Carnegie Mellon) http://reap.cs.cmu.edu
Reader-Specific Lexical Practice for Improved Reading
Comprehension: support learners in searching for texts that are
well-suited for providing vocabulary and reading practice. +
analysis of formal complexity of input SATO-project (Franois
Daoust) http://www.ling.uqam.ca/sato Systme danalyse de texte par
ordinateur + SATO-Calibrage Automated formal analysis of a corpus
(existing or personal)
- Slide 12
- http://cental.uclouvain.be/amesure/
- Slide 13
- b. analysis of meaning & text categorization Sien Moens
(2009)
- Slide 14
- 2. Reading companion What? helping learners understand
foreign-language input How? annotation layers, both formal and
semantic GLOSSER-project (John Nerbonne)
http://www.let.rug.nl/glosser/Glosser lemmatization of the
inflected forms dictionary entry (cf. Van Dale) examples of the
word from corpora iRead+ project (ITEC) Lemmatization &
POS-tagging Named entity recognition (persons, organisations,
locations)
- Slide 15
- GLOSSER-project Lemmatization of inflected forms Dictionary
entry (Van Dale) Examples from corpora
- Slide 16
- Slide 17
- Vb Anderlecht Taalk. /3
- Slide 18
- Slide 19
- 3. Exercise and test generator What?(semi-)automated generation
of exercise and test items How?based on the analysis of L1 text
materials and/or on the analysis of learner errors -
morpho-syntactical activities: iRead+ project (ITEC) VIEW-project
(Detmar Meurers) http://sifnos.sfs.uni-tuebingen.de/VIEW
http://sifnos.sfs.uni-tuebingen.de/VIEW Visual Input Enhancement of
the Web - lexical activities: Alfalex-project (Serge Verlinde)
http://ilt.kuleuven.be/publicaties/tools.php - semantic activities:
cf. Sien Moens - semantic frame labeling
- Slide 20
- iRead+ Exercise generator: Three step model Explore
1.Application highlights target items 2.Learner recognizes target
items and clicks on target items Practice Fill gaps or multiple
choice activities Feedback Play Target items, drill & practice
Time constraint
- Slide 21
- Slide 22
- Alfalex (KU Leuven Serge Verlinde)
- Slide 23
- Slide 24
- Semantic frame labeling Sien Moens (2009)
- Slide 25
- 4. Error detector, feedback generator and automatic scoring
tool What? Automated analysis of learner output and generation of
feedback Not limited to closed exercises (MC, fill-in-the-blank,
etc.) but also (semi-)open practice tasks (translation, correction,
rephrasing, etc.) How? 1) Approximate (or fuzzy) string matching
(ASM) -> anticipate all potential well-formed and ill-formed
learner responses (with inclusion of regular expressions) 2)
Parser-based (malrules, constraint relaxation, etc.) and/or
statistical and machine learning methods
- Slide 26
- Slide 27
- Looking at the number of publications shown in this figure,
parser-based CALL has dominated research up until 2005, but there
seems to have been a shift in research interest and statistical
error detection systems have since then taken the upper hand. PhD
Ruben Lagatie
- Slide 28
- BUT: fully open-ended learner output hard to manage ->
constrain learner output (reduce possible answers) by reducing the
search space of the processing tools How? task: structured towards
specific type of output - require the learner to use certain
language material e-tutor-project (Trude Heift)
http://www.e-tutor.sfu.ca translation Tagarela-project (Detmar
Meurers) http://sifnos.sfs.uni-tuebingen.de/tagarela answer should
include certain words - the task design offers clues towards a
limited set of possible answers Dialog Dungeon (ITEC) gamified
written dialog tasks correction: focus on specific set of problems
(e.g. past tenses, subject-verb agreement, etc.)
- Slide 29
- E-tutor project
- Slide 30
- Tagarela-project
- Slide 31
- social log-in Dialog Dungeon gamified written dialogue tasks
@Frederik Cornillie semi-open written exercise learning support:
link to responses of peers
- Slide 32
- learning support: responses of peers
- Slide 33
- focused exercise on tenses
- Slide 34
- typical error
- Slide 35
- contrastive analysis learner reponse closest correct match
(based on approximate string matching, POS tagging,
lemmatization)
- Slide 36
- supportive prompt with metalinguistic hints (based on POS
tagging)
- Slide 37
- 5. Writing aid What?Support the learner in writing a
functional, well-formed tekst How? No automated correction, but
suggestions to help the learner improve himself his output
Post-writing checks (or on-the-fly prompts) Assistance on
lower-order skills (spelling) but also on lexical (including MWE),
lexico-grammatical and grammatical skills Interactive Language
Toolbox-project (Serge Verlinde) https://ilt.kuleuven.be/inlato Bon
patron, SpellCheckPlus & SpanishChecker (Nadaclair Language
Technologies) http://bonpatron.com; http://spellcheckplus.com;
http://spanishchecker.comhttp://bonpatron.comhttp://spellcheckplus.comhttp://spanishchecker.com
+ semantic and pragmatic analysis Glosser-project (Univ. of Sydney)
http://www.glosserproject.org/en/
- Slide 38
- Interactive Language Toolbox
- Slide 39
- Slide 40
- Slide 41
- Glosser project (Sydney)
- Slide 42
- 6. Adaptive item sequencer (a)Adaptivity: to what? - difficulty
level of the tasks - learner profile (prior knowledge, motivation,
cognitive load, interests & preferences) - context (time and
place, device, etc.) (b) What to adapt? - adaptive form
representation (form in which content is presented to the learner,
e.g. dynamically generated hypermedia pages) - adaptive content
representation (e.g. with or without learner support) - adaptive
curriculum sequencing (e.g. selection of items in function of
difficulty level, learner profile, etc.) (c) How to implement
adaptivity? - full program control (via reasoning component if..
then rules) - full learner control - shared control Wauters, K.,
Desmet, P., Van Den Noortgate, W. (2010). Adaptive Item-Based
Learning Environments Based on the Item Response Theory:
Possibilities and Challenges. Journal of Computer Assisted
Learning, 26 (6), 549-562.
- Slide 43
- What about AI? (a)Adaptivity: to what? - linguistic complexity
-> language agent (expert system) - item difficulty -> IRT -
learner profile (prior knowledge, motivation, cognitive load,
interests & preferences) -> student agent (b) What to adapt?
- adaptive curriculum sequencing (e.g. selection of items in
function of difficulty level, learner profile, etc.) -> tutor
agent (c) How to implement adaptivity? full or shared control ->
tutor agent Beuls, Katrien (2013) Processing, learning and tutoring
of Spanish verb morphology. Brussels: VUB. (PhD)
- Slide 44
- 7. Resource generator What? creating reference materials:
concordancing on bilingual corpora REBECA-project (ITEC) Ressources
lectroniques bilingues extraites de corpus aligns & more
advanced search engines DPC-project (ITEC)
http://www.kuleuven-kulak.be/DPC/http://www.kuleuven-kulak.be/DPC/
Dutch Parallel Corpus corpus-enriched learner dictionaries &
grammars BLF-project (Serge Verlinde)
http://ilt.kuleuven.be/blf/http://ilt.kuleuven.be/blf/ Base
Lexicale du Franais How?Annotation & exploitation of corpora
For whom? (Intermediate or) advanced learners with well developed
linguistic awareness
- Slide 45
- REBECA-project
- Slide 46
- DPC-project
- Slide 47
- 16 June 201247 Example: en ralit French-Dutch
- Slide 48
- 16 June 201248 Example: en ralit French - Dutch
- Slide 49
- Extension: video search Exploitation of XML-based captions
(transcriptions and translations) to render video selection more
versatile and attractive: lemmas are time tags indexes which form
reference points to video selections
- Slide 50
- 00:00:09.6800 a
00:00:24.8900|00:00:28.8400|00:00:59.9000|00:01:12.7000 alors
00:00:24.8900 assez 00:00:18.0300|00:00:31.9000 au 00:00:38.3000
aussi 00:00:38.3000 aux 00:00:09.6800|00:01:05.8100 bas
00:00:35.6500 beaucoup 00:00:24.8900|00:00:55.0000|00:00:59.9000
bonnes 00:00:31.9000 calais 00:00:51.1200 ce 00:01:05.8100 celle
00:00:42.0800 cest 00:00:09.6800|00:00:24.8900 cette
00:00:06.6400|00:01:05.8100|00:01:12.7000 chefs 00:01:16.0000
chercher 00:00:18.0300|00:00:31.9000 chou 00:00:04.0000 choux
00:01:05.8100 club 00:01:16.0000 comme 00:00:06.6400 comment
00:00:18.0300 composer 00:00:42.0800 cuisine
00:00:04.0000|00:00:14.1800|00:01:12.7000 curiosit 00:00:38.3000
dans 00:00:18.0300|00:00:18.0300|00:00:24.8900|00:00:28.8400|
00:00:31.9000|00:00:35.6500|00:00:47.2800|00:01:16.0000|00:01:19.9500
Viewing video extracts based on lexical search in transcription
and/or translation (challenge: search on semantically related
words)
- Slide 51
- Conclusion Digital (learning) is the new normal a must trivial
mainstream Its not about technology. Its about usage & added
value! Usage: ICALL should enter our classrooms + be integrated
into applications Added value: ICALL is a (potentially) powerful
means to realise the main objective: improve learning ->
qualified optimism about NLP-enhanced CALL