Tips and Tricks … with INTEX/NOOJ

Post on 11-Jan-2016


Tamás Váradi
Institute for Linguistics Research
Hungarian Academy of Sciences

varadi@nytud.hu

Max Silberztein
University of Franche-Comté

max.silberztein@univ-fcomte.fr

Outline

● Why INTEX/NOOJ should be a tool of choice
● raising language awareness
● studying linguistics
  – lexical analysis
    ● morphology
      – paradigms
      – word formation
    ● automatic lexical acquisition
  – syntax
    ● local grammars
  – semantic tagging

List of useful features

● instant lexical lookup
● linguistically sophisticated lexicon
● intuitive graphical interface
● fast, robust, finite-state technology
● corpus, lexicon, grammar handled uniformly
● instant confirmation from corpus
● can be used at different levels of competence
  – simple corpus query tool
  – grammar development environment
  – research tool for NLP projects

Morphology I - Inflection

paradigms are handled in the form of finite-state transducers (FSTs)

Morphology I - Inflection

stem variants are processed with operations on strings

L = move left, erasing a character
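
As a rough illustration of the idea, the Python sketch below applies such string operations to a stem. The command format (a list in which the token "L" erases one character and any other token is appended as literal text) and the name `apply_operations` are assumptions made for this example; they are not INTEX/NooJ syntax.

```python
def apply_operations(stem, operations):
    """Apply a sequence of string operations to a stem.

    Hypothetical format: the token "L" moves left, erasing the last
    character; any other token is appended as literal suffix text.
    """
    form = stem
    for op in operations:
        if op == "L":
            form = form[:-1]   # move left, erasing one character
        else:
            form = form + op   # append suffix material
    return form


# Stem variant: erase the final "y" of "city", then add "ies".
print(apply_operations("city", ["L", "ies"]))
# Regular case: no erasing needed.
print(apply_operations("walk", ["ed"]))
```

Encoding the stem change as operations rather than listing every form lets one paradigm description cover all stems that behave the same way.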

Morphology II - Derivation

● All the forms derived from the root ‘fran-’

● Ideal to learn and experiment with morphological segmentation

Automatic lexical extraction

Store any sequence of letters that is followed by -ize or -ify in the variable $Root

Produce the lexical entry:
wordform: $Root+$Suf
lemma: $Root
part of speech: V
synsem: +V
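
The extraction rule above can be approximated outside the tool with a regular expression. This is only a sketch in Python: the named groups `Root` and `Suf` mirror the slide's variables, while the dictionary layout and the function name `extract_entry` are assumptions for illustration and do not reflect how INTEX/NooJ stores entries.

```python
import re

# Any sequence of letters (Root) followed by the suffix -ize or -ify.
PATTERN = re.compile(r"^(?P<Root>[A-Za-z]+)(?P<Suf>ize|ify)$")


def extract_entry(wordform):
    """Return a lexical entry for a matching word form, else None."""
    match = PATTERN.match(wordform)
    if match is None:
        return None
    root, suf = match.group("Root"), match.group("Suf")
    return {
        "wordform": root + suf,   # $Root+$Suf
        "lemma": root,            # $Root
        "pos": "V",
        "synsem": "+V",
    }


for word in ["americanize", "solidify", "table"]:
    print(word, extract_entry(word))
```

A pattern this loose also accepts words such as "size" (with the root "s"), so the raw output needs filtering.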

Lexical constraints

check if the string stored in $Root is in the lexicon as an A, with feature +Nation

Produce the lexical entry:
wordform: $Root+$Suf
lemma: $Root
part of speech: V
synsem: +V
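
The effect of the lexical constraint can be sketched in Python as a lookup performed before the entry is produced. The toy lexicon, its layout (word mapped to a set of category/feature pairs) and the function name `extract_constrained` are all hypothetical; only the condition itself (the root must be listed as an A with the feature +Nation) comes from the slide.

```python
import re

# Toy lexicon (hypothetical): word -> set of (category, feature) pairs.
LEXICON = {
    "american": {("A", "+Nation")},
    "modern": {("A", None)},
}

# Any sequence of letters (Root) followed by the suffix -ize or -ify.
PATTERN = re.compile(r"^(?P<Root>[A-Za-z]+)(?P<Suf>ize|ify)$")


def extract_constrained(wordform, lexicon=LEXICON):
    """Produce an entry only if the root is an A with +Nation."""
    match = PATTERN.match(wordform)
    if match is None:
        return None
    root, suf = match.group("Root"), match.group("Suf")
    # Lexical constraint: look the root up before producing anything.
    if ("A", "+Nation") not in lexicon.get(root, set()):
        return None
    return {"wordform": root + suf, "lemma": root, "pos": "V", "synsem": "+V"}


for word in ["americanize", "modernize", "size"]:
    print(word, extract_constrained(word))
```

With the constraint in place, "americanize" still yields an entry, while "modernize" (an adjective without +Nation) and "size" (root not in the lexicon) are rejected.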

Syntax

● grammars are defined as graphs that rely on information stored in the lexicon (minimally lemma and POS)

Instant feedback from corpus

Labelled bracketing

● hit strings may be tagged (merge mode)
● [NP a soft, slow step NP]

● or replaced with bracketing
● [NP NP]
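
The difference between the two output modes can be shown with a small Python sketch. The regular expression below is only a toy stand-in for a noun-phrase grammar, built from a tiny hard-coded word list; it and the function names `merge_mode` and `replace_mode` are assumptions for illustration, not part of INTEX/NooJ.

```python
import re

# Toy stand-in for an NP grammar (an assumption): a determiner, optional
# comma-separated adjectives, then a noun, all from tiny word lists.
NP = re.compile(r"\b(?:a|the) (?:(?:soft|slow),? )*(?:step|voice)\b")


def merge_mode(text, label="NP"):
    """Keep each hit string and wrap it in labelled brackets."""
    return NP.sub(lambda m: f"[{label} {m.group(0)} {label}]", text)


def replace_mode(text, label="NP"):
    """Replace each hit string with the empty labelled bracket pair."""
    return NP.sub(f"[{label} {label}]", text)


sentence = "He heard a soft, slow step behind him."
print(merge_mode(sentence))
print(replace_mode(sentence))
```

Merge mode keeps the matched words visible inside the brackets, which suits inspection; replace mode abstracts them away, which can simplify the input for a later grammar.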

Disambiguation

● Very: adjective or adverb?

Recursion – embedded graphs

An exercise in semantic tagging

● Expressions of time


Finally, not for the faint-hearted …

● the big picture

Conclusions

● Teaching linguistic analysis by doing it
● INTEX/NooJ is [det THE] technology to use

honestly…

All welcome to have a go at it

Thank you for your attention!