CLARIN-PL
Language Technology for Polish in Practice: Systems supporting development of resources
Maciej Piasecki, Marek Maziarz, Michał Marcińczuk, Marcin Oleksy
Wrocław University of Science and Technology G4.19 Research Group
[email protected] 2017-01-17
Systems supporting development
§ Inforex
  § system for corpus construction, editing and verification
  § linguistic team management
§ Complex system of tools for corpus-based, semi-automatic wordnet development
  § Morpho-syntactic preprocessing
  § Extraction of Multiword Expressions with MeWeX (Maziarz et al., 2015), (Piasecki et al., 2015)
  § SuperMatrix – extraction of lemmas and statistics from corpora, and Measures of Semantic Relatedness (Broda & Piasecki, 2013)
  § LexCSD – identification and extraction of usage examples (Broda & Piasecki, 2011)
  § Corpus browsing, e.g. NoSketch https://nlp.fi.muni.cz/trac/noske
  § WordnetLoom 2.0 – wordnet editing, verification, group work (Piasecki et al., 2013b)
  § WordnetWeaver – semi-automatic wordnet expansion (Piasecki et al., 2013a)
Samsung R&D Institute
Invit. Lecture 2017-01-17
plWordNet development process
plWordNet Corpus 7.0 2 billion tokens
plWordNet Corpus – a merged corpus:
• available Polish corpora:
  • IPI PAN Corpus
  • Rzeczpospolita Corpus
  • Wikipedia (2015)
• Texts on open licence
• Texts collected from the Internet
  • larger texts
  • Max. 20% of tokens not recognised by Morfeusz
• Version 7.0: ~2 billion tokens
• Version 10.0: >4 billion tokens (for plWordNet 4.0)
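The 20% limit on unrecognised tokens can be read as a simple document filter. A minimal sketch, assuming the analyser is available as a yes/no lookup; `is_recognised` and the toy vocabulary are hypothetical stand-ins and do not reproduce the Morfeusz interface.

```python
from typing import Callable, Iterable, List


def unrecognised_ratio(tokens: Iterable[str],
                       is_recognised: Callable[[str], bool]) -> float:
    """Share of tokens the analyser does not recognise (0.0 for an empty text)."""
    tokens = list(tokens)
    if not tokens:
        return 0.0
    unknown = sum(1 for tok in tokens if not is_recognised(tok))
    return unknown / len(tokens)


def keep_text(tokens: List[str],
              is_recognised: Callable[[str], bool],
              max_unknown: float = 0.20) -> bool:
    """Keep a text only if at most `max_unknown` of its tokens are unrecognised."""
    return unrecognised_ratio(tokens, is_recognised) <= max_unknown


# Toy stand-in for a morphological analyser: a fixed vocabulary (assumption).
TOY_VOCAB = {"ala", "ma", "kota", "i", "psa"}


def toy_recogniser(token: str) -> bool:
    return token.lower() in TOY_VOCAB
```

With the toy vocabulary, a five-token text containing one unknown token sits exactly at the limit and is kept.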
Cf. (Maziarz et al., 2013)
plWordNet development process
plWordNet Corpus 7.0 2 billion tokens
List of entries (most frequent lemmas)
plWordNet development process
Identification of meanings
plWordNet Corpus 7.0 2 billion tokens
List of entries (most frequent lemmas)
plWordNet development process
software tools Identification of meanings
plWordNet Corpus 7.0 2 billion tokens
List of entries (most frequent lemmas)
plWordNet development process
corpus concordancer
software tools Identification of meanings
plWordNet Corpus 7.0 2 billion tokens
List of entries (most frequent lemmas)
plWordNet development process
plWordNet Corpus, 2 billion tokens
List of entries (most frequent words)
Identification of meanings; corpus concordancer
software tools
automated usage examples
NoSketch Engine
Inforex
plWordNet development process
automated extraction of usage examples
corpus concordancer
software tools Identification of meanings
plWordNet Corpus 7.0 2 billion tokens
List of entries (most frequent lemmas)
plWordNet development process
n.a. - usage examples -> identification of meanings, typical examples, 10 meanings (Marek)
Usage examples for kąsać ‘to bite’
‘of animals: to bite with teeth, causing wounds’
‘of weather phenomena (e.g. frost): to bite, to nip’
‘of insects: to bite’
‘of worries, pangs of conscience: to gnaw’
‘of people: to annoy, to harm someone’
[Figure: items numbered 1 to 10]
plWordNet development process
plWordNet Team guidelines
automated extraction of usage examples
corpus concordancer
software tools Identification of meanings
plWordNet Corpus 7.0 2 billion tokens
List of entries (most frequent lemmas)
plWordNet development process
dictionaries, encyclopaedias, lexicons…
automated extraction of usage examples
corpus concordancer
software tools Identification of meanings
plWordNet Corpus 7.0 2 billion tokens
List of entries (most frequent lemmas)
plWordNet Team guidelines
plWordNet development process
software tools Identification of meanings
plWordNet Corpus 7.0 2 billion tokens
List of entries (most frequent lemmas)
defining lexical units
dictionaries, encyclopaedias, lexicons…
plWordNet Team guidelines
plWordNet development process
relation assignment = linking to network
WordnetWeaver
software tools Identification of meanings
plWordNet Corpus 7.0 2 billion tokens
List of entries (most frequent lemmas)
defining lexical units
dictionaries, encyclopaedias, lexicons…
plWordNet Team guidelines
plWordNet development process
plWordNet Corpus, 2 billion tokens
List of entries (most frequent words)
Identification of meanings; software tools
dictionaries, encyclopaedias, lexicons…
plWordNet Team guidelines
defining a lexical unit
relation assignment = linking to the network
WordnetWeaver
plWordNet development process
Measure of Semantic Relatedness
relation assignment = linking to network
WordnetWeaver
software tools Identification of meanings
plWordNet Corpus 7.0 2 billion tokens
List of entries (most frequent lemmas)
defining lexical units
dictionaries, encyclopaedias, lexicons…
plWordNet Team guidelines
plWordNet development process
Measure of Semantic Relatedness: results (generated by SuperMatrix) (Piasecki & Wendelberger, 2014)
[Figure legend: antonym, hypernym, hyponym, co-hyponym, closely related, holonym]
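To illustrate what a Measure of Semantic Relatedness returns, here is a minimal sketch that ranks lemmas by cosine similarity of co-occurrence vectors. Cosine over raw counts is a swapped-in textbook technique, not the measure computed by SuperMatrix, and the toy vectors are invented for illustration.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def cosine(u: Dict[str, float], v: Dict[str, float]) -> float:
    """Cosine similarity of two sparse feature vectors (0.0 if either is empty)."""
    dot = sum(w * v.get(feature, 0.0) for feature, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def most_related(lemma: str,
                 vectors: Dict[str, Dict[str, float]],
                 k: int = 3) -> List[Tuple[str, float]]:
    """Return the k lemmas most similar to `lemma`, best first."""
    scores = [(other, cosine(vectors[lemma], vec))
              for other, vec in vectors.items() if other != lemma]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:k]


# Toy co-occurrence counts (hypothetical data, for illustration only).
vectors = {
    "ocean": Counter({"deep": 3, "water": 5, "sail": 2}),
    "sea": Counter({"deep": 2, "water": 4, "sail": 3}),
    "car": Counter({"drive": 5, "fast": 2}),
}
```

A list produced this way mixes relation types (hypernyms, co-hyponyms, antonyms and so on), which is why the linguist still has to decide which wordnet relation, if any, holds.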
plWordNet development process
relation assignment = linking to network
software tools Identification of meanings
plWordNet Corpus 7.0 2 billion tokens
List of entries (most frequent lemmas)
defining lexical units
dictionaries, encyclopaedias, lexicons…
plWordNet Team guidelines
+ stylistic register + gloss + usage examples
concordancer, extracted usage examples, WordnetWeaver, Measure of Semantic Relatedness
plWordNet development process
relation assignment = linking to network
Identification of meanings
plWordNet Corpus 7.0 2 billion tokens
List of entries (most frequent lemmas)
defining lexical units
+ stylistic register + gloss + usage examples
• Intuition: linguist, team
• But controlled by:
  • guidelines
  • and substitution tests
Definition of relations
Substitution Test for Hypernymy
Condition: The stylistic register of Y must not be lower in the register hierarchy than the register of X.
Testing expressions:
§ If she/it is X, then she/it must be Y
§ If she/it is Y, then she/it need not be X
§ If she/it is not Y, then she/it cannot be X
(Maziarz, Piasecki, Szpakowicz, Rabiega-Wiśniewska 2010)
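The test can be mechanised as template filling plus a register check. A minimal sketch: the three templates come from the slide, while `REGISTER_RANK` is a hypothetical three-level ranking used only to make the condition executable; the actual plWordNet register hierarchy is not given here.

```python
from typing import Dict, List

# Hypothetical register ranking (higher value = higher in the hierarchy).
REGISTER_RANK: Dict[str, int] = {"vulgar": 0, "colloquial": 1, "general": 2}


def register_condition(register_x: str, register_y: str) -> bool:
    """True if the register of Y is not lower than the register of X."""
    return REGISTER_RANK[register_y] >= REGISTER_RANK[register_x]


def hypernymy_test_expressions(x: str, y: str) -> List[str]:
    """Fill the three testing expressions for a hyponym candidate X and hypernym candidate Y."""
    return [
        f"If she/it is {x}, then she/it must be {y}",
        f"If she/it is {y}, then she/it need not be {x}",
        f"If she/it is not {y}, then she/it cannot be {x}",
    ]


# Example pair; a linguist still judges whether each sentence is acceptable.
expressions = hypernymy_test_expressions("ocean", "water basin")
```

The code only generates the sentences; accepting or rejecting them remains a human judgement.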
Definition of relations
Applying Hypernymy Test to a Pair
Condition: Both ocean ‘ocean’ and zbiornik wodny ‘water basin’ are of the general stylistic register.
Testing expressions:
§ If she/it is oceanem ‘ocean’, then she/it must be zbiornikiem wodnym ‘water basin’
§ If she/it is zbiornikiem wodnym ‘water basin’, then she/it need not be oceanem ‘ocean’
§ If she/it is not zbiornikiem wodnym ‘water basin’, then she/it cannot be oceanem ‘ocean’
WordnetLoom
Cf. (Piasecki et al., 2013b)
plWordNet ‘Big Brother’
WordnetWeaver
§ Semi-automated wordnet expansion method
§ For new lemmas – not yet described in a wordnet
  § possible attachment synsets are automatically identified
  § and visually presented on the screen as wordnet subgraphs
§ Wordnet editors are free to take any action
§ Implemented as an extension to WordnetLoom
Paintball: knowledge sources
§ Knowledge sources K_1, …, K_s extracted by different methods from the corpus
§ K_i = { ⟨l_n, l_j, w⟩ }, where:
  § l_n – a new word (not in the wordnet)
  § l_j – a wordnet word
  § w – local weight (for the pair)
§ weight(K_i) ∈ (0,1] – global weight (for the knowledge source)
(Piasecki et al., 2013a)
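A knowledge source as defined above maps naturally onto a small data structure. A minimal sketch, assuming that the support for a wordnet word is the sum over sources of global weight times local weight; that combination rule, the class name and the global weights are illustrative assumptions, not taken from the slides.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# (new word l_n, wordnet word l_j, local weight w)
Triple = Tuple[str, str, float]


@dataclass
class KnowledgeSource:
    name: str
    weight: float          # global weight, must lie in (0, 1]
    triples: List[Triple]

    def __post_init__(self) -> None:
        if not 0.0 < self.weight <= 1.0:
            raise ValueError("global weight must be in (0, 1]")


def support(new_word: str, sources: List[KnowledgeSource]) -> Dict[str, float]:
    """Sum weighted evidence per wordnet word (assumed rule: global * local)."""
    totals: Dict[str, float] = {}
    for source in sources:
        for l_n, l_j, w in source.triples:
            if l_n == new_word:
                totals[l_j] = totals.get(l_j, 0.0) + source.weight * w
    return totals


# Toy sources; the global weights 1.0 and 0.5 are hypothetical.
hyper = KnowledgeSource("hypernymy classifier", 1.0,
                        [("feminism", "movement", 1.0), ("feminism", "idea", 0.951)])
cousin = KnowledgeSource("cousin classifier", 0.5,
                         [("feminism", "socialism", 0.204)])
hits = support("feminism", [hyper, cousin])
```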
Knowledge sources
§ Methods
  § Measure of Semantic Relatedness
  § Lexico-syntactic Patterns
    § specific – manually constructed
    § generic – automatically extracted
  § Classifiers based on Machine Learning
§ Only some of them produce probability values
§ Results: heterogeneous, partial, and imperfect – substantial error level
Knowledge sources: used in experiments
§ Hypernymy classifier (Snow et al. 2004)
  § trained on patterns in the corpus parsed by Minipar (Lin, 1993)
  § e.g. 〈feminism, movement, 1.0〉, 〈feminism, idea, 0.951〉, 〈feminism, study, 0.951〉, 〈feminism, theory, 0.948〉, 〈feminism, politics, 0.867〉, 〈feminism, relationship, 0.867〉
§ Cousin classifier
  § logistic regression applied to a Measure of Semantic Relatedness
  § e.g. 〈feminism, socialism, 0.204〉, 〈feminism, humanism, 0.207〉, 〈feminism, nationalism, 0.208〉, 〈feminism, liberalism, 0.207〉, 〈feminism, pacifism, 0.208〉, 〈feminism, anarchism, 0.205〉
Paintball algorithm
§ Input: a wordnet, a new word and a set of Knowledge Sources
§ Output: a set of subgraphs – attachment areas – with one synset marked in each
§ Idea
  § each knowledge source expresses some error level
  § knowledge source triples are not precise in pointing to particular synsets
  § hits cover regions
  § spreading activation helps to analyse and combine the delivered information
Paintball Metaphor: initial state
[Figure label: new lemma]
Paintball Metaphor: hits from the knowledge sources
[Figure label: new lemma]
Paintball Metaphor: attachment area
[Figure label: new lemma]
Paintball: algorithm
Step 0: Setting up the initial state
1. Converting the synset graph into a graph of lexical units –> table Q
2. ∀j∈J. Q[j] = supp(j, x)
3. for each j∈J: if Q[j] > τ0 then T = append(T, j)
4. T = sort_descendingly(T)
§ where:
  § J – a set of lexical units (word + senses)
  § Q – graph nodes, supp() – sum of weights (support)
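Step 0 can be sketched in a few lines. The sketch assumes that `supp` is supplied as a function and that `sort_descendingly` orders T by support; the slide does not state the sort key, so that ordering is an assumption, as are the function and variable names.

```python
from typing import Callable, Dict, List, Tuple


def initial_state(units: List[str],
                  supp: Callable[[str, str], float],
                  x: str,
                  tau0: float) -> Tuple[Dict[str, float], List[str]]:
    """Build table Q and the queue T of units whose support exceeds tau0."""
    Q = {j: supp(j, x) for j in units}            # Q[j] = supp(j, x)
    T = [j for j in units if Q[j] > tau0]         # keep strongly supported units
    T.sort(key=lambda j: Q[j], reverse=True)      # assumed: descending by support
    return Q, T


# Toy support values for the new word "feminism" (hypothetical).
toy_support = {"idea#1": 0.951, "state#1": 0.2, "movement#1": 1.0}
Q, T = initial_state(list(toy_support), lambda j, x: toy_support[j],
                     "feminism", tau0=0.4)
```

With τ0 = 0.4 only two of the three units enter T, and the best supported one is processed first.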
Paintball: algorithm
Step 1: Spreading support across the graph
1. k = head(T) and T = tail(T)
2. fitRep(k, x, supp(k, x)) – spreading support for x from the node k to linked nodes
3. if not empty(T) then goto Step 1
Paintball: algorithm: Step 1
§ fitReplication(j, x, M, T)
  1. if M < ε then return
  2. for each p ∈ dsc(j): fitRepTrans(p, x, f_T(p, µ ∗ M), [j])
§ fitRepTrans(p, x, M, T)
  1. if M < ε then return
  2. for each p’ ∈ dsc(p|1): if not (p’|1 ∈ T) then fitRepTrans(p’, x, f_I(p, p’, f_T(p’, µ ∗ M)), [p’|1|T])
  3. Q[p|1] = Q[p|1] + M
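The recursion above is a form of spreading activation with decay. A simplified sketch: it works on a plain adjacency dictionary, treats the transmittance functions f_T and f_I as identity (they are not defined on the slides), and tracks the whole current path to avoid cycles. All names are illustrative.

```python
from typing import Dict, List, Tuple

Graph = Dict[str, List[str]]


def spread(graph: Graph, Q: Dict[str, float], start: str,
           M: float, mu: float, eps: float) -> None:
    """Spread support M from `start` to linked nodes, updating Q in place."""

    def visit(node: str, amount: float, path: Tuple[str, ...]) -> None:
        if amount < eps:                      # support too weak: stop spreading
            return
        for nxt in graph.get(node, []):
            if nxt not in path:               # do not walk back along the path
                visit(nxt, mu * amount, path + (nxt,))
        Q[node] = Q.get(node, 0.0) + amount   # Q[p|1] = Q[p|1] + M

    if M < eps:
        return
    for nxt in graph.get(start, []):
        visit(nxt, mu * M, (start, nxt))


# Toy chain a - b - c with decay 0.65 and stop threshold 0.14.
graph = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
Q = {"a": 1.0}
spread(graph, Q, "a", 1.0, mu=0.65, eps=0.14)
```

Each step multiplies the support by µ, so the hit on `a` reaches `b` with 0.65 and `c` with 0.4225 before fading below ε further out.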
Paintball: algorithm
Step 2: Identifying attachment areas
1. Calculating synset support matrix F from Q
2. Identifying connected wordnet subgraphs (activation areas), such that G_m = {s ∈ Synsets : F[s] > τ3}
3. for each G_m: score(G_m) = F[j_m], where j_m = argmax_{j∈G_m} F[j]
4. Return G_m, such that score(G_m) > τ4, as attachment areas
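Step 2 amounts to finding connected components among sufficiently activated synsets and keeping the well-scored ones. A minimal sketch, assuming a symmetric adjacency dictionary and a precomputed F; the function name and the output tuple layout are illustrative.

```python
from typing import Dict, List, Tuple

Graph = Dict[str, List[str]]
Area = Tuple[List[str], str, float]   # (synsets in area, marked synset, score)


def attachment_areas(graph: Graph, F: Dict[str, float],
                     tau3: float, tau4: float) -> List[Area]:
    """Return connected groups of synsets with F > tau3 whose best score exceeds tau4."""
    active = {s for s, value in F.items() if value > tau3}
    seen = set()
    areas: List[Area] = []
    for s in sorted(active):
        if s in seen:
            continue
        component, stack = [], [s]
        seen.add(s)
        while stack:                              # depth-first walk inside `active`
            cur = stack.pop()
            component.append(cur)
            for nxt in graph.get(cur, []):
                if nxt in active and nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        best = max(component, key=lambda n: F[n])  # j_m = argmax F[j]
        if F[best] > tau4:
            areas.append((sorted(component), best, F[best]))
    return areas


graph = {"a": ["b"], "b": ["a", "c"], "c": ["b"], "d": []}
F = {"a": 0.9, "b": 0.5, "c": 0.3, "d": 0.6}
areas = attachment_areas(graph, F, tau3=0.4, tau4=0.8)
```

Here `c` is not activated, `d` forms an area that scores too low, and only the area {a, b} with `a` marked is returned.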
Evaluation: method
§ Evaluation by reconstruction
  § a word sample is removed from the wordnet
  § Paintball is applied to reattach the words
§ Data collected
  § histogram of path lengths between suggested synsets and the original positions in a wordnet
  § paths of up to 5 links, including hyper/hyponymy links with at most one final meronymic link, were considered
Evaluation: method
§ Criteria
  § closest path: attachment proposition that is closest to the original location
  § strongest suggestion: top scored
  § all suggestions
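The three criteria can be computed from a list of scored suggestions once the distance of each from the original location is known. A minimal sketch, assuming each suggestion is a (score, distance) pair with `None` when no admissible path exists; this representation is an assumption made for illustration.

```python
from typing import Dict, List, Optional, Tuple

Suggestion = Tuple[float, Optional[int]]   # (score, path length to original synset)


def evaluate(suggestions: List[Suggestion]) -> Dict[str, object]:
    """Return hit distances under the closest, strongest and all-suggestions criteria."""
    if not suggestions:
        return {"closest": None, "strongest": None, "all": []}
    reachable = [d for _, d in suggestions if d is not None]
    closest = min(reachable) if reachable else None
    strongest = max(suggestions, key=lambda s: s[0])[1]   # distance of top-scored item
    return {"closest": closest,
            "strongest": strongest,
            "all": [d for _, d in suggestions]}


result = evaluate([(0.9, 3), (0.7, 1), (0.5, None)])
```

In the toy case the top-scored suggestion is three links away, while a weaker one lies only one link from the original position.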
Evaluation: experiment setup
§ Wikipedia corpus, including almost 1 billion words
§ Word sample
  § corpus frequency threshold for words: 200
  § words that have at least 3 hypernymy links to the top synset
  § 1064 test words selected
  § margin of error 3% and 95% confidence level
  § frequent words ≥ 1000
  § infrequent words ≤ 999
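The sample size and the stated margin of error are consistent under the usual normal approximation for a proportion. A minimal check, assuming the worst case p = 0.5 and z = 1.96 for 95% confidence; the slides do not say how the figure was derived, so the formula is an assumption.

```python
import math


def margin_of_error(n: int, z: float = 1.96, p: float = 0.5) -> float:
    """Half-width of the confidence interval for a proportion (normal approximation)."""
    return z * math.sqrt(p * (1.0 - p) / n)


# For the 1064 test words this gives roughly 0.03, i.e. about 3%.
moe = margin_of_error(1064)
```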
Evaluation: baseline
§ Baseline: Probabilistic Wordnet Expansion (Snow, Jurafsky, & Ng, 2006)
  § lack of procedure for setting the values of parameters
  § selected experimentally:
    § minimal probability of evidence: 0.1
    § inverse odds of the prior: k = 4
    § maximum size of the cousins neighbourhood: (m, n) ≤ (3,3)
    § maximum links in hypernym graph: 10
    § penalization factor: = 0.9
Evaluation: Paintball parameters
§ Spreading start (τ0): 0.4
§ Spreading stop (ε): 0.14
§ Threshold for synset activation (τ3): 0.4
§ Threshold for attachment areas (τ4): 0.8
§ Spreading decay factor (µ): 0.65
Results: straight path strategy
PWE – Probabilistic Wordnet Expansion (baseline), PB – Paintball; C – closest path, S – strongest suggestion, A – all suggestions

Method  Words  Crit.  Hit distance
                      0     1     2     3     4     5     6     [0-2]  ∑
PWE     Rare   C      3.7   21.7  16.2  9.6   6.9   3.4   0.1   41.6   61.5
PWE     Rare   S      0.5   5.9   9.7   10.9  8.9   4.5   0.5   16.1   40.9
PWE     Rare   A      0.8   4.9   5.0   4.5   3.8   2.0   0.4   10.7   21.5
PWE     Freq.  C      0.8   14.8  24.2  21.0  15.1  5.5   0.2   39.8   81.6
PWE     Freq.  S      0.1   2.7   9.4   16.1  15.7  13.2  0.8   12.2   58.0
PWE     Freq.  A      0.2   3.2   7.0   10.0  9.8   7.3   0.5   10.4   38.0
PB      Rare   C      9.2   21.7  12.6  6.7   4.2   1.0   0.6   43.5   56.1
PB      Rare   S      4.8   13.1  10.0  6.5   3.4   1.2   0.4   27.9   39.4
PB      Rare   A      2.9   6.9   4.8   3.5   2.2   1.0   0.2   14.6   21.5
PB      Freq.  C      6.3   20.5  15.0  11.9  6.7   2.6   0.5   41.8   63.3
PB      Freq.  S      1.9   9.1   8.4   8.1   4.8   1.9   0.3   19.4   34.7
PB      Freq.  A      1.4   4.9   4.4   4.4   3.1   1.6   0.2   10.7   20.0
Results: folded path strategy
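PWE – Probabilistic Wordnet Expansion (baseline), PB – Paintball; C – closest path, S – strongest suggestion, A – all suggestions

Method  Words  Crit.  Hit distance
                      0     1     2     3     4     ∑
PWE     Rare   C      3.7   21.7  18.4  11.8  2.5   58.2
PWE     Rare   S      0.5   5.9   10.7  12.6  2.3   32.0
PWE     Rare   A      0.8   4.9   6.6   6.9   1.5   20.7
PWE     Freq.  C      0.8   14.8  25.2  22.9  4.0   67.7
PWE     Freq.  S      0.1   2.7   9.6   17.0  3.4   32.8
PWE     Freq.  A      0.2   3.2   7.9   12.2  2.9   26.4
PB      Rare   C      9.2   21.7  21.9  10.7  1.9   65.5
PB      Rare   S      4.8   13.1  15.3  13.1  1.5   47.9
PB      Rare   A      2.9   6.9   14.7  13.2  1.7   39.4
PB      Freq.  C      6.3   20.5  20.7  18.6  2.8   68.8
PB      Freq.  S      1.9   9.1   11.5  13.5  3.1   39.2
PB      Freq.  A      1.4   4.9   8.4   11.6  2.3   28.5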
Results: coverage
§ For the straight path strategy
§ Coverage for words
  § PWE: propositions for 100% of words (freq. 100%)
  § Paintball: 63.15% of words (freq. 91.93%)
§ Recall for senses
  § PWE: 44.79% (freq. 43.93%)
  § Paintball: 24.66% (freq. 26.62%)
Results: example
§ PWE suggestions for feminism: {abstraction, abstract entity}, {entity}, {communication}, {group, grouping}, {state}
§ Paintball suggestions: {causal agent, cause, causal agency}, {change}, {political orientation, ideology, political theory}, {discipline, subject, subject area, subject field, field, field of study, study, bailiwick}, {topic, subject, issue, matter}
Semi-automated Wordnet Expansion: WordnetWeaver in Use
[Figure labels: climbing, speedway, recreation]
Inforex History
Inforex – a system for construction, annotation and searching of text corpora (Marcińczuk et al., 2012) http://nlp.pwr.wroc.pl/inforex/
History:
§ Developed at WUST (G4.19) since 2010
§ Used:
  § In research projects: NEKST, SyNaT, CLARIN-PL
  § Individual research: M. Zaśko-Zielińska (linguistics - suicide notes), Ł. Damurski (urban planning - documents on EU territorial policy)
  § PhD theses: B. Broda (WSD), M. Marcińczuk (NER, semantic relations), A. Radziszewski (syntactic phrases), J. Kocoń (temporal expressions, situation markers)
  § Other research tasks: E. Kaczmarz (Facebook conversations), T. Bernaś (texts in Hebrew)
§ Interface to several corpora:
  § KPWr - Korpus Politechniki Wrocławskiej (Corpus of Wrocław University of Technology)
  § CEN - corpus of economic news from Wikinews
  § PCSN - Polski korpus listów pożegnalnych samobójców (Polish Corpus of Suicide Notes)
Inforex Main features
§ http://inforex.clarin-pl.eu/ – access for users with an account in DSpace
§ Accessible via web browser (Firefox is suggested) – does not require installation by the user, needs permanent access to the Internet
§ Integrated with DSpace (import/export of data)
§ Enables sharing data among users
§ Access to data on the basis of authorisation related to corpora and annotation layers
§ Supports work on documents that are tagged (assumed segmentation into tokens and sentences) and non-tagged
§ Provides visualisation of the document structure during annotation
Inforex Visualisation of the document structure (1/2)
[Figures: KPWr; Facebook conversations (E. Kaczmarz)]
Inforex Visualisation of the document structure (2/2)
[Figures: PCSN (M. Zaśko-Zielińska); texts in Hebrew (T. Bernaś)]
KPWr Controlled state of the work (1/2)
KPWr Controlled state of the work (2/2)
Inforex Metadata
Inforex Content editing history
Inforex Annotation, annotation schema
Inforex Adding annotation to text
Inforex Verification of annotation
Inforex Lemmatisation
Inforex Translation of phrases
Inforex Normalisation of temporal expressions
Inforex Adding relation links
Inforex Relations – co-reference
Inforex Word Sense Disambiguation
Inforex Statistics – word frequency
Inforex Browsing annotations
Inforex Browsing annotations (translations)
Inforex Browsing relation links
Bibliography
§ Maziarz, M.; Szpakowicz, S. & Piasecki, M. (2015) A Procedural Definition of Multi-word Lexical Units. In Mitkov, R.; Angelova, G. & Boncheva, K. (Eds.) Proceedings of the International Conference Recent Advances in Natural Language Processing -- RANLP'2015, INCOMA Ltd. Shoumen, BULGARIA, 2015, 427-435 http://aclweb.org/anthology/R15-1056
§ Piasecki, M.; Wendelberger, M. & Maziarz, M. (2015) Extraction of the Multi-word Lexical Units in the Perspective of the Wordnet Expansion. In Mitkov, R.; Angelova, G. & Boncheva, K. (Eds.) Proceedings of the International Conference Recent Advances in Natural Language Processing -- RANLP'2015, INCOMA Ltd. Shoumen, BULGARIA, 2015, 512-520 http://aclweb.org/anthology/R15-1067
§ Broda, B. & Piasecki, M. (2013) Parallel, Massive Processing in SuperMatrix -- a General Tool for Distributional Semantic Analysis of Corpora. International Journal of Data Mining, Modelling and Management, 2013, 5, pp. 1-19
§ Maziarz, M.; Piasecki, M.; Rudnicka, E. & Szpakowicz, S. (2013) Beyond the Transfer-and-Merge Wordnet Construction: plWordNet and a Comparison with WordNet. In Mitkov, R.; Angelova, G. & Boncheva, K. (Eds.) Proceedings of the International Conference Recent Advances in Natural Language Processing RANLP 2013, INCOMA Ltd. Shoumen, BULGARIA, 2013, 443-452 http://aclweb.org/anthology/R13-1058
§ Piasecki, M. & Wendelberger, M. (2014) Partial Measure of Semantic Relatedness Based on the Local Feature Selection. In Sojka, P.; Horák, A.; Kopecek, I. & Pala, K. (Eds.) Text, Speech and Dialogue - 17th International Conference, TSD 2014, Brno, Czech Republic, September 8-12, 2014. Proceedings, Springer, 2014, 8655, 336-343
§ Piasecki, M.; Ramocki, R. & Kaliński, M. (2013a) Information Spreading in Expanding Wordnet Hypernymy Structure. In Mitkov, R.; Angelova, G. & Boncheva, K. (Eds.) Proceedings of the International Conference Recent Advances in Natural Language Processing RANLP 2013, INCOMA Ltd. Shoumen, BULGARIA, 2013, 553-561, http://aclweb.org/anthology/R13-1073
§ Piasecki, M.; Marcińczuk, M.; Ramocki, R. & Maziarz, M. (2013b) WordnetLoom: a Wordnet Development System Integrating Form-based and Graph-based Perspectives. International Journal of Data Mining, Modelling and Management, 2013, 5, 210-232
§ Broda, B. & Piasecki, M. (2011) Evaluating LexCSD in a Large Scale Experiment Control and Cybernetics, Vol. 40, 419-436.
§ Maciej Piasecki, Łukasz Burdka, Marek Maziarz, Michał Kaliński. (2016) In Zygmunt Vetulani, Hans Uszkoreit, Marek Kubis (Eds.)Human Language Technology. Challenges for Computer Science and Linguistics. Volume 9561 of the series Lecture Notes in Computer Science pp 255-273. http://link.springer.com/chapter/10.1007/978-3-319-43808-5_20
§ Marcińczuk, M., Kocoń, J. & Broda, B (2012). Inforex — a web-based tool for text corpus management and semantic annotation. In Calzolari, N., et al (editors), Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC-2012), pages 224-230. Istanbul, Turkey : European Language Resources Association (ELRA). https://www.researchgate.net/publication/308886657_Inforex_-_a_web-based_tool_for_text_corpus_management_and_semantic_annotation
Thank you very much for your attention! www.clarin-pl.eu
Supported by the Polish Ministry of Science and Higher Education [CLARIN-PL]