Supporting e-learning with automatic glossary extraction Experiments with Portuguese

Rosa Del Gaudio, António Branco

RANLP, Borovets 2007

Presentation Plan

● LT4eL project
● ILIAS
● Corpus
● Tool
● Grammars
  ● Copula
  ● Other Verbs
  ● Punctuation
● Results
● Conclusion

LT4eL

● Improve retrieval and accessibility of learning objects (LO) in learning management systems.

● Employ language technology resources and tools for the semi-automatic generation of descriptive metadata.

● Develop new functionalities, such as a keyword extractor, a glossary candidate detector, and semantic search, tuned for the various languages addressed in the project (Bulgarian, Czech, Dutch, English, German, Maltese, Polish, Portuguese, Romanian).

Objective

● Build a glossary automatically to support the e-learning process. In practice, this means extracting definitions from unstructured text (scientific papers, encyclopedias, web pages).

● Better access to information for students

● Faster work for the tutor

ILIAS: Glossary Candidate Detector

The Corpus

• 274,000 tokens
• Tutorials
• PhD theses
• Scientific papers
• 3 domains evenly represented
  • e-learning
  • Technology for non-experts
  • Calimera

XML format

<definingText continue="y" def="m147" def_type1="is_def" id="d5">
  <markedTerm dt="y" id="m147" kw="y">
    <tok base="intranet" class="word" ctag="PNM" id="t9032" sp="y">Intranet</tok>
  </markedTerm>
  <tok base="ser" class="word" ctag="V" id="t9033" msd="pi-3s" sp="y">é</tok>
  <tok base="uma" class="word" ctag="UM" id="t9034" msd="fs" sp="y">uma</tok>
  <tok base="rede" class="word" ctag="CN" id="t9035" msd="fs" sp="y">rede</tok>
  <tok base="desenvolver,desenvolvido" class="word" ctag="PPA" id="t9036" msd="fs" sp="y">desenvolvida</tok>
  <tok base="para" class="word" ctag="PREP" id="t9037" sp="y">para</tok>
  <tok base="processamento" class="word" ctag="CN" id="t9038" msd="ms" sp="y">processamento</tok>
  <tok base="de" class="word" ctag="PREP" id="t9039" sp="y">de</tok>
  <tok base="informação" class="word" ctag="CN" id="t9040" msd="fp" sp="y">informações</tok>
  <tok base="em" class="word" ctag="PREP" id="t9041" sp="y">em</tok>
  <tok base="uma" class="word" ctag="UM" id="t9042" msd="fs" sp="y">uma</tok>
  <tok base="empresa" class="word" ctag="CN" id="t9043" msd="fs" sp="y">empresa</tok>
  <tok base="ou" class="word" ctag="CJ" id="t9044" sp="y">ou</tok>
  <tok base="organização" class="word" ctag="CN" id="t9045" msd="fs">organização</tok>
  <tok class="punctuation" ctag="PNT" id="t9046" sp="y">.</tok>
</definingText>
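A minimal Python sketch of how an annotation like the one above could be read back: it collects the definition type, the marked term, and the surface tokens of the defining sentence. The shortened sample string and the helper name `read_definition` are assumptions made for illustration; only the element and attribute names come from the example.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the annotated sentence (first tokens plus final punctuation).
SAMPLE = (
    '<definingText continue="y" def="m147" def_type1="is_def" id="d5">'
    '<markedTerm dt="y" id="m147" kw="y">'
    '<tok base="intranet" class="word" ctag="PNM" id="t9032" sp="y">Intranet</tok>'
    '</markedTerm>'
    '<tok base="ser" class="word" ctag="V" id="t9033" msd="pi-3s" sp="y">é</tok>'
    '<tok base="uma" class="word" ctag="UM" id="t9034" msd="fs" sp="y">uma</tok>'
    '<tok base="rede" class="word" ctag="CN" id="t9035" msd="fs" sp="y">rede</tok>'
    '<tok class="punctuation" ctag="PNT" id="t9046" sp="y">.</tok>'
    '</definingText>'
)


def read_definition(xml_text):
    """Return (definition type, marked term, list of surface tokens)."""
    root = ET.fromstring(xml_text)
    # The term is the text of the tok elements nested inside markedTerm.
    term = " ".join(t.text for t in root.findall("./markedTerm/tok"))
    # iter() walks all tok elements in document order, nested or not.
    tokens = [t.text for t in root.iter("tok")]
    return root.get("def_type1"), term, tokens


def_type, term, tokens = read_definition(SAMPLE)
print(def_type, term, tokens)
```

The standard-library parser is used here only to show the shape of the data; it is not the tool used in the project.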

LxTransduce

• Input: plain text or XML

• Regular expressions

• Substitution and markup

• Output the same file with changes

• Match tree using elements

• Quick

• Unicode friendly

• Freeware

• Easy to integrate into other tools (Java)

Rules in lxtransduce

<rule name="Conj">
  <query match="tok[@ctag = 'CJ']"/>
</rule>
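To illustrate what a query such as the one in rule Conj selects, here is a rough Python equivalent that uses the limited XPath support of the standard library rather than lxtransduce itself. The wrapping `<s>` element and the function name `find_by_ctag` are hypothetical; the token attributes are taken from the corpus example.

```python
import xml.etree.ElementTree as ET

# Three tokens from the corpus example, wrapped in a made-up <s> element.
FRAGMENT = (
    '<s>'
    '<tok base="empresa" ctag="CN" id="t9043">empresa</tok>'
    '<tok base="ou" ctag="CJ" id="t9044">ou</tok>'
    '<tok base="organização" ctag="CN" id="t9045">organização</tok>'
    '</s>'
)


def find_by_ctag(xml_text, ctag):
    """Return the ids of all tok elements whose ctag attribute equals ctag."""
    root = ET.fromstring(xml_text)
    return [t.get("id") for t in root.findall(f".//tok[@ctag='{ctag}']")]


conjunctions = find_by_ctag(FRAGMENT, "CJ")
print(conjunctions)
```

In lxtransduce the named rule can then be reused inside larger patterns through `<ref name="Conj"/>`, which this sketch does not attempt to reproduce.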

First development phase

● Less than 50% of the corpus
● Focus on the verb
● Precision: correct automatic / all automatic
● Recall: correct automatic / manually marked
● F2: 3 * (precision * recall) / (2 * precision + recall)

        Precision  Recall  F2
Gr 00   0.14       0.44    0.26
Gr 01   0.31       0.20    0.22
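The F2 score on this slide can be recomputed from precision and recall. The sketch below assumes the intended formula is the weighted F-measure (1 + b) * P * R / (b * P + R) with b = 2, which is what the factors 3 and 2 on the slide suggest; the function and parameter names are illustrative.

```python
def f_measure(precision, recall, beta_sq=2.0):
    """Weighted F-measure: (1 + b) * P * R / (b * P + R), with b = beta_sq."""
    denominator = beta_sq * precision + recall
    if denominator == 0:
        return 0.0  # both scores are zero, so the measure is defined as zero
    return (1 + beta_sq) * precision * recall / denominator


# Precision and recall as reported for the two grammars.
gr00 = f_measure(0.14, 0.44)
gr01 = f_measure(0.31, 0.20)
print(round(gr00, 3), round(gr01, 3))
```

Because the reported precision and recall are themselves rounded to two decimals, the recomputed scores may differ from the reported F2 values in the last digit.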

Second development phase

• 75% of the corpus for development

• 25% of the corpus for testing

• Specific grammar/rules for each definition type

Copula baseline grammar

Verb “to be” (ser) in the third person singular or plural of the present indicative
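A minimal sketch of this baseline in Python, assuming each token is available as a dictionary with the corpus attributes `base`, `ctag` and `msd`. It also assumes that an `msd` value starting with `pi-3` encodes the third person of the present indicative, as the annotated example (`é`, `msd="pi-3s"`) suggests; the function name is hypothetical.

```python
def is_copula_candidate(tokens):
    """True if any token is the verb 'ser', third person, present indicative."""
    return any(
        t.get("ctag") == "V"
        and t.get("base") == "ser"
        and t.get("msd", "").startswith("pi-3")  # assumed tag encoding
        for t in tokens
    )


# Tokens taken from the corpus example.
definition = [
    {"base": "intranet", "ctag": "PNM"},
    {"base": "ser", "ctag": "V", "msd": "pi-3s"},
    {"base": "uma", "ctag": "UM", "msd": "fs"},
    {"base": "rede", "ctag": "CN", "msd": "fs"},
]
no_copula = [
    {"base": "empresa", "ctag": "CN", "msd": "fs"},
    {"base": "ou", "ctag": "CJ"},
    {"base": "organização", "ctag": "CN", "msd": "fs"},
]
print(is_copula_candidate(definition), is_copula_candidate(no_copula))
```

A test this permissive accepts every sentence that contains the verb form, which is consistent with the precision problem noted for the baseline.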

Copula base result

• Sentence level results

• Problem with precision

Copula Grammar

Rules for is_type

<rule name="Serdef">
  <query match="tok[@ctag = 'V' and @base='ser' and
                (@msd[starts-with(.,'fi-3')] or @msd[starts-with(.,'pi-3')])]"/>
</rule>
....

<rule name="copula1">
  <seq>
    <ref name="SERdef"/>
    <best>
      <seq>
        <ref name="Art"/>
        <ref name="adj|adv|prep|" mult="*"/>
        <ref name="Noun" mult="+"/>
      </seq>
      ....
    </best>
    <ref name="tok" mult="*"/>
    <end/>
  </seq>
</rule>

Comparing Results

Include the patterns that were excluded

Try to collect the syntactic patterns of non-definitions and compare them with the syntactic patterns of definitions.

Other_Verbs grammar

• Collect verbs in a lexicon
• Three different categories: reflexive, active, passive
• 22 different verbs

<rule name="Vpas">
  <seq>
    <ref name="tok"/>
    <not><ref name="not"/></not>
    <ref name="tok" mult="?"/>
    <query match="tok[mylex(@base) and (@ctag='PPA')]"
           constraint="mylex(@base)/cat='pas'"/>
  </seq>
</rule>
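The lexicon check in the last query of this rule can be sketched in Python as follows. The lexicon entries and the category labels `refl` and `act` are hypothetical placeholders, since the 22 verbs are not listed here; only the label `pas` and the tag `PPA` come from the rule. The sketch covers the final query only, not the preceding tokens or the negation test.

```python
# Hypothetical lexicon: lemma -> category. Not the project's actual verb list.
LEXICON = {
    "definir": "pas",
    "chamar": "refl",
    "significar": "act",
}


def lexicon_category(base):
    """Look up a base attribute; ambiguous bases are comma-separated in the corpus."""
    for lemma in base.split(","):
        if lemma in LEXICON:
            return LEXICON[lemma]
    return None


def is_passive_definition_verb(token):
    """True for a past participle (PPA) whose lemma is a 'pas' lexicon entry."""
    return (
        token.get("ctag") == "PPA"
        and lexicon_category(token.get("base", "")) == "pas"
    )


print(is_passive_definition_verb({"base": "definir,definido", "ctag": "PPA"}))
print(is_passive_definition_verb({"base": "desenvolver,desenvolvido", "ctag": "PPA"}))
```

Keeping the verbs in a lexicon rather than in the rule itself means that the list can grow without touching the grammar.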

Results for verb_type

• Analyze each verb separately, as with is_type

• Richer syntactic patterns

Punctuation Grammar

<rule name="punct_def">
  <seq>
    <start/>
    <ref name="CompmylexSN" mult="+"/>
    <query match="tok[.~'^:\$']"/>
    <ref name="tok" mult="+"/>
    <end/>
  </seq>
</rule>

● Preliminary work

● Definitions introduced by a colon (the most frequent case)
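The colon pattern can be approximated in Python as below. This is a simplified sketch: the rule requires one or more `CompmylexSN` matches before the colon, whereas the sketch accepts any non-empty token sequence. The function name and the sample sentence are hypothetical.

```python
def split_on_colon(tokens):
    """Return (term tokens, definition tokens), or None if the pattern fails."""
    if ":" not in tokens:
        return None
    i = tokens.index(":")  # first colon token in the sentence
    term, definition = tokens[:i], tokens[i + 1:]
    # Both sides must be non-empty, mirroring mult="+" on each side of the colon.
    if not term or not definition:
        return None
    return term, definition


# Made-up sentence, used only to exercise the function.
sentence = ["Intranet", ":", "rede", "interna", "de", "uma", "empresa"]
result = split_on_colon(sentence)
print(result)
```

Since any sentence with a colon passes this test, a real grammar needs the constraint on what precedes the colon to keep precision acceptable.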

All-in-one

• Combination of the previous grammars

• The definition type is not taken into account when calculating precision and recall
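A small Python sketch of such a type-agnostic, sentence-level evaluation. The sentence ids and the type labels other than `is_def` are made up for the example, and the function name is hypothetical.

```python
def sentence_level_scores(gold, predicted):
    """gold, predicted: dicts mapping sentence id -> definition type.

    A predicted sentence counts as correct when its id is also in gold,
    whatever type either side assigns to it.
    """
    correct = len(gold.keys() & predicted.keys())
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical annotations: d2 is found, but under a different type.
gold = {"d1": "is_def", "d2": "verb_def", "d3": "punct_def"}
predicted = {"d1": "is_def", "d2": "is_def", "d7": "punct_def", "d8": "is_def"}
precision, recall = sentence_level_scores(gold, predicted)
print(precision, recall)
```

In this example sentence d2 still counts as correct even though the predicted type differs from the annotated one.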

Conclusions and Future Work

• Overall results: Recall 86%, Precision 14%

• Differences among domains: the style of a document influences the results.

• Improve the rules for verb_type and punc_type

• Combine with other techniques such as machine learning (ML)
