NLP for Semantic Web


  • 8/3/2019 Nlp for Semantic Web

    1/78

    Copyright 2009 Digital Enterprise Research Institute. All rights reserved

    Digital Enterprise Research Institute www.deri.ie

    Natural Language Processing

    - for the Semantic Web -

    Paul Buitelaar

  • 8/3/2019 Nlp for Semantic Web

    2/78

    Semantic Web Challenge: Legacy Data

    Linked Data
    Legacy Data: Unstructured, Un-Linked

  • 8/3/2019 Nlp for Semantic Web

    3/78

    Overview of the Tutorial

    Semantic Analysis of Unstructured Legacy Data
      Examples in: Semantic Search, Ontology-based Information Extraction, Ontology Learning
    NLP Layer Cake with Pointers
      Part of Speech Tagging, Morphology, Phrase Structure
      Semantic Tagging
        WordNet, FrameNet, Word Sense Disambiguation, Named Entities, Terms, Thesauri, Ontologies
        Advanced Topic: Ontology-Lexicon Interface
      Grammatical Functions, Dependency Structure, Discourse Analysis
    Further Relevant Pointers
      General Tools, Organizations, Conferences, Journals, Sites, Lists, ...

  • 8/3/2019 Nlp for Semantic Web

    4/78

    What the Tutorial will not address

    Machine Learning in/for Text Mining: Feature Extraction in Clustering, Classification, ...
    Text Mining in/for Information Retrieval
      See the Tutorial on Information Mining by Conor Hayes
      The DERI Stream on Semantic Information Mining brings together Natural Language Processing and Information Mining

  • 8/3/2019 Nlp for Semantic Web

    5/78

    Overview of the Tutorial

    Semantic Analysis of Unstructured Legacy Data
      Examples in: Semantic Search, Ontology-based Information Extraction, Ontology Learning
    NLP Layer Cake with Pointers
      Part of Speech Tagging, Morphology, Phrase Structure
      Semantic Tagging
        WordNet, FrameNet, Word Sense Disambiguation, Named Entities, Terms, Thesauri, Ontologies
        Advanced Topic: Ontology-Lexicon Interface
      Grammatical Functions, Dependency Structure, Discourse Analysis
    Further Relevant Pointers
      Conferences, Journals, Websites, Mailing Lists, ...

  • 8/3/2019 Nlp for Semantic Web

    6/78

    Some Example Applications

    Semantic Annotation & Search
      MuchMore (2000-2003)
      Semantic annotation of a set of medical scientific abstracts & patient records as queries across languages (English, German)
    Ontology Learning
      OntoLT (2004-2005)
      Extraction of classes, subclasses and relations (object properties) from a linguistically annotated document set
    Ontology-based Information Extraction
      SmartWeb (2005-2008)
      Extraction of entities and events from a set of football match reports
    Semantic Video Browsing
      K-Space (2007-2008)
      Extraction of entities and events from a set of football match reports, aligned with football video, enabling semantic-level video indexing and browsing
    OpenCalais, GIST
      Industrial strength open source/commercial semantic annotation & retrieval

  • 8/3/2019 Nlp for Semantic Web

    7/78

    Semantic Annotation & Search: MuchMore


  • 8/3/2019 Nlp for Semantic Web

    13/78

    Ontology Learning: OntoLT

    Linguistic Structure to Ontology Mapping Rules

  • 8/3/2019 Nlp for Semantic Web

    14/78

    Ontology Learning: OntoLT

    Extraction & Inspect Contexts

  • 8/3/2019 Nlp for Semantic Web

    15/78

    Ontology Learning: OntoLT

    Extract Ontology Fragments

  • 8/3/2019 Nlp for Semantic Web

    16/78

    Ontology Learning: OntoLT

    German Clinical Report:
    An 40 Kniegelenkpräparaten wurden mittlere Patellarsehnendrittel mit einer neuen Knochenverblockungstechnik in einem zweistufigen Bohrkanal bzw. mit konventioneller Interferenzschraubentechnik femoral fixiert.

    English Translation:
    In 40 human cadaver knees, mid patellar ligament thirds were fixed with a trapezoid bone block on one side on the femoral side in a two-level drill hole, or with a conventional interference screw.

    Linguistic Annotation (fragments)

  • 8/3/2019 Nlp for Semantic Web

    17/78

    Ontology-based Information Extraction: SmartWeb

    Oliver Kahn konnte den Schuss von Beto halten.
    (Oliver Kahn could stop the shot by Beto.)

    Information Extraction, Ontology Population:

    semistruct#Deutschland_vs_Brasilien_30_Juni_2002_18:00[
        sportevent#matchEvents -> soba#ID11 ].
    soba#ID11:sportevent#Parry[
        sportevent#committedBy ->
            semistruct#Deutschland_vs_Brasilien_30_Juni_2002_18:00_Oliver_Kahn_PFP ].

  • 8/3/2019 Nlp for Semantic Web

    18/78

    Ontology-based Information Extraction: SmartWeb

  • 8/3/2019 Nlp for Semantic Web

    19/78

    Semantic Video Browsing: K-Space
    http://keg.vse.cz/wf/kspace/smil/

    A/V Feature Analysis
    Extracted Entities and Events
    Minute-by-Minute Match Reports
    Non-Linear Event and Entity Browsing

  • 8/3/2019 Nlp for Semantic Web

    20/78

    Industrial Applications: GIST (CALAIS)

  • 8/3/2019 Nlp for Semantic Web

    21/78


    Open Calais Extracts Entities, Facts

  • 8/3/2019 Nlp for Semantic Web

    22/78

    Open Calais Extracts Entities, Facts

    With a split decision in the final two primaries and a flurry of superdelegate endorsements, Sen. Barack Obama sealed the Democratic presidential nomination Tuesday night after a grueling, history-making campaign that will make him the first African American to head a major-party ticket.

    Before a chanting, cheering audience in St. Paul, Minn., the first-term Illinois senator savored what once seemed an unlikely outcome to the Democratic race against Sen. Hillary Rodham Clinton. He now faces another hard-fought battle, against Sen. John McCain, the presumptive Republican candidate.

  • 8/3/2019 Nlp for Semantic Web

    23/78

    Overview of the Tutorial

    Semantic Analysis of Unstructured Legacy Data
      Examples in: Semantic Search, Ontology-based Information Extraction, Ontology Learning
    NLP Layer Cake with Pointers
      Part of Speech Tagging, Morphology, Phrase Structure
      Semantic Tagging
        WordNet, FrameNet, Word Sense Disambiguation, Named Entities, Terms, Thesauri, Ontologies
        Advanced Topic: Ontology-Lexicon Interface
      Grammatical Functions, Dependency Structure, Discourse Analysis
    Further Relevant Pointers
      Conferences, Journals, Websites, Mailing Lists, ...

  • 8/3/2019 Nlp for Semantic Web

    24/78

    NLP: A Complete Example

    He booked the large table in the corner. ... It was still available.

    Sentence 1 (S): He booked the large table in the corner.
      he: NP; Subject, Agent; X
        he: Pronoun; 3rd Person; Animate
      booked the large table in the corner: VP
        book: Verb; Past, 3rd Person; Head; Predicate
        the large table in the corner: NP; Direct Object, Patient; Definite; Y
          the large table: NP
            large: Adjective; Modifier
            table: Noun; Singular; Head; furniture_01
          in the corner: PP
            in: Preposition; Head; Predicate
            the corner: NP; Definite; Z

    Sentence 2 (S): It was still available.
      it: NP; Subject, Patient; Y
        it: Pronoun; 3rd Person; Inanimate
      was still available: VP
        is: Verb; Past, 3rd Person; Head; Predicate
        still available: AdvP

  • 8/3/2019 Nlp for Semantic Web

    25/78

    NLP Layer Cake

    Tokenization: [table] He booked the large table in the corner ...
    PoS Tagging: [table:noun]
    Morphological Analysis: [table~s] [work~ing] [Sommer~schule]
    Semantic Tagging: [table:ARTIFACT, furniture_01]
    Phrase Structure: [[[the] [large] [table] NP] [[in] [the] [corner] PP] NP]
    Dependency Structure: [[the:SPEC] [large:MOD] [table:HEAD] NP] [[He:SUBJ] [booked:PRED] [[this] [table:HEAD] NP:DOBJ] S]
    Discourse Analysis: [He:SUBJ] [booked:PRED] this [[table:HEAD] NP:DOBJ:X1] [[It:SUBJ:X1] [was:PRED] available]

  • 8/3/2019 Nlp for Semantic Web

    26/78

    NLP Layers

    Tokenization: Where are the words?
    Part of Speech Tagging: Is this word a verb or a noun or something else?
    Morphology: Can I split this word up?
    Phrase Structure: Do these words go together?
    Semantic Tagging: What objects are expressed by the words/phrases in the sentence?
    Grammatical Functions & Dependency Structure: Which objects do what? And in relation to which others?
    Discourse Analysis: Which events are expressed throughout a text/discourse? How do they interact? And which objects are involved?
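The lowest layer, tokenization, can be sketched in a few lines. This is only a minimal illustration, assuming a regular expression is good enough for simple English text; the names `TOKEN_PATTERN` and `tokenize` are made up for this example and do not come from the tutorial.

```python
import re

# Hypothetical pattern: a word (optionally with an apostrophe part, e.g. "I've")
# or a single non-space punctuation character becomes one token.
TOKEN_PATTERN = re.compile(r"\w+(?:'\w+)?|[^\w\s]")


def tokenize(text: str) -> list[str]:
    """Answer the question 'Where are the words?' for simple English text."""
    return TOKEN_PATTERN.findall(text)


tokens = tokenize("He booked the large table in the corner.")
print(tokens)
```

Real tokenizers also have to deal with abbreviations, dates, numbers and names, which is where tokenization starts to overlap with named entity recognition.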

  • 8/3/2019 Nlp for Semantic Web

    27/78

    Part-of-Speech Tagging
    Adapted from: http://www.cs.utexas.edu/~mooney/cs388/slides/pos-tagging.ppt

    Annotate each word in a sentence with a part-of-speech (PoS) tag; useful for subsequent syntactic parsing
    The most common PoS tag set for English is the Penn Treebank set of 45 tags, e.g.

      John saw the saw and decided to take it  to the table.
      NNP  VBD DT  NN  CC  VBD     TO VB   PRP IN DT  NN

    Other tag sets are in use for other languages, e.g. the Stuttgart-Tübingen Tag Set (STTS) for German
    The challenge in Part-of-Speech Tagging is ambiguity
      "like" can be
        Verb: I like/VBP candy.
        Preposition: Time flies like/IN an arrow.
      "around" can be
        Preposition: I bought it at the shop around/IN the corner.
        Particle: I never got around/RP to getting a car.
        Adverb: A new Prius costs around/RB $25K.
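To make the ambiguity problem concrete, here is a small sketch of a most-frequent-tag baseline. It is not one of the taggers discussed in the tutorial; the training sample `TAGGED` is a hand-made assumption that reuses the words and tags from the examples above, and the function names are hypothetical.

```python
from collections import Counter, defaultdict

# Tiny hand-tagged sample (illustrative only, not a real corpus).
TAGGED = [
    ("John", "NNP"), ("saw", "VBD"), ("the", "DT"), ("saw", "NN"),
    ("I", "PRP"), ("saw", "VBD"), ("the", "DT"), ("table", "NN"),
    ("I", "PRP"), ("like", "VBP"), ("candy", "NN"),
    ("flies", "VBZ"), ("like", "IN"), ("an", "DT"), ("arrow", "NN"),
    ("I", "PRP"), ("like", "VBP"), ("it", "PRP"),
]


def train_unigram_tagger(tagged_words):
    """Map each word to the tag it carries most often in the sample."""
    counts = defaultdict(Counter)
    for word, tag in tagged_words:
        counts[word.lower()][tag] += 1
    return {word: tags.most_common(1)[0][0] for word, tags in counts.items()}


def tag(words, model, default="NN"):
    """Tag every word independently of its context."""
    return [(w, model.get(w.lower(), default)) for w in words]


model = train_unigram_tagger(TAGGED)
result = tag(["John", "saw", "the", "saw"], model)
print(result)
```

Because this baseline ignores context, both occurrences of "saw" receive the same tag, so the noun reading is missed. Context-sensitive models such as HMMs or decision trees exist to resolve exactly this kind of case.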

  • 8/3/2019 Nlp for Semantic Web

    28/78

    Part-of-Speech Tagging: PoS English
    Adapted from: http://www.cs.utexas.edu/~mooney/cs388/slides/pos-tagging.ppt

    Noun (person, place or thing)
      Singular (NN): dog, fork
      Plural (NNS): dogs, forks
      Proper (NNP, NNPS): John, Springfields
      Personal pronoun (PRP): I, you, he, she, it
      Wh-pronoun (WP): who, what
    Verb (actions and processes)
      Base, infinitive (VB): eat
      Past tense (VBD): ate
      Gerund (VBG): eating
      Past participle (VBN): eaten
      Non-3rd person singular present tense (VBP): eat
      3rd person singular present tense (VBZ): eats
      Modal (MD): should, can
      To (TO): to (to eat)
    Adjective (modify nouns)
      Basic (JJ): red, tall
      Comparative (JJR): redder, taller
      Superlative (JJS): reddest, tallest
    Adverb (modify verbs)
      Basic (RB): quickly
      Comparative (RBR): quicker
      Superlative (RBS): quickest
    Preposition (IN): on, in, by, to, with
    Determiner
      Basic (DT): a, an, the
      WH-determiner (WDT): which, that
    Coordinating Conjunction (CC): and, but, or
    Particle (RP): off (took off), up (put up)

  • 8/3/2019 Nlp for Semantic Web

    29/78

    Part-of-Speech Tagging: Closed vs. Open
    Adapted from: http://www.cs.utexas.edu/~mooney/cs388/slides/pos-tagging.ppt

    Closed class categories are composed of a small, fixed set of grammatical function words for a given language:
      Pronouns (it, he, she, ...)
      Prepositions (on, for, from, to, ...)
      Modals (will, can, may, ...)
      Determiners (a, the)
      Particles (to, up, off)
      Conjunctions (and, or)
    Open class categories are composed of large sets of content words and are open to new additions:
      Nouns (a googler)
      Verbs (to google)
      Adjectives (geeky)

  • 8/3/2019 Nlp for Semantic Web

    30/78

    Part-of-Speech Tagging: State of the Art

    Overview of available PoS taggers:
      http://www-nlp.stanford.edu/links/statnlp.html#Taggers
    Many are widely used (often retrainable); a very small selection:
      TreeTagger, decision trees, free research license
        http://www.ims.uni-stuttgart.de/projekte/corplex/TreeTagger
      TnT, Thorsten Brants, HMM, free research license
        http://www.coli.uni-saarland.de/~thorsten/tnt/
      ENGCG, lexicon & rules, commercial (LingSoft)
        http://www2.lingsoft.fi/cgi-bin/engcg

  • 8/3/2019 Nlp for Semantic Web

    31/78

    Morphological Analysis
    Adapted from: http://courses.washington.edu/ling570/fei_fall07/10_15_morph.ppt

    Some definitions
      Morphological Analysis: split up words into component morphemes and build a (formal) representation of word-internal structure
      Morpheme: minimal meaning-bearing unit in a language
      Stem: morpheme that forms the central meaning unit in a word
      Affix: word element that can only occur attached to a stem
        Prefix: specific -> unspecific (English)
        Suffix: wonder -> wonderful (English)
        Infix: hingi -> humingi (Tagalog)
        Circumfix: sagen -> gesagt (German)
    Morphological complexity varies between languages
      Isolating languages (no morphology): e.g., Chinese
      Morphologically poor languages: e.g., English
      Morphologically complex languages: e.g., Turkish

  • 8/3/2019 Nlp for Semantic Web

    32/78

    Morphological Analysis: Overview, Ambiguity
    Adapted from: http://courses.washington.edu/ling570/fei_fall07/10_15_morph.ppt

    Inflection: stem + morpheme (same PoS class)
      writing -> write + V + Progressive
      books -> book + N + Plural
      writes -> write + V + 3rd Person + Singular
      flies -> fly + N + Plural
               fly + V + 3rd Person + Singular
    Derivation: stem + morpheme (different PoS class)
      civil, civilized, civilization
    Compounding: multiple stems
      cabdriver -> cab + driver
      doghouse -> dog + house
      Flachbildschirm (flat screen) -> flach + Bildschirm (flat screen)
                                       Flachbild + Schirm (flat view screen)
                                       flach + Bild + Schirm (flat picture screen)
    Cliticization: stem + clitic
      I've -> I + have
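The compounding ambiguity can be reproduced with a small sketch that enumerates every segmentation of a word into known stems. The stem lexicon `STEMS` is a toy assumption built from the examples above, and `split_compound` is a hypothetical helper, not a tool referenced in the tutorial.

```python
# Toy stem lexicon (assumption for illustration; real analyzers use large lexicons).
STEMS = {"flach", "bild", "schirm", "flachbild", "bildschirm",
         "cab", "driver", "dog", "house"}


def split_compound(word: str, stems=STEMS) -> list[list[str]]:
    """Return every way to segment `word` into two or more known stems."""
    word = word.lower()

    def segmentations(rest: str) -> list[list[str]]:
        if not rest:
            return [[]]  # one way to segment the empty string: no stems
        results = []
        for i in range(1, len(rest) + 1):
            head = rest[:i]
            if head in stems:
                for tail in segmentations(rest[i:]):
                    results.append([head] + tail)
        return results

    return [seg for seg in segmentations(word) if len(seg) > 1]


for analysis in split_compound("Flachbildschirm"):
    print(" + ".join(analysis))
```

The sketch returns all three analyses of Flachbildschirm listed above. Choosing among them needs additional knowledge, for example stem frequencies or the semantic layers discussed later.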

  • 8/3/2019 Nlp for Semantic Web

    33/78

  • 8/3/2019 Nlp for Semantic Web

    34/78

    Phrase Structure Analysis: Definitions
    With input from: http://www.kwary.net/linguistics/gl/GLSyntax01.ppt

    Phrase: group of words that functions as a single unit in syntax (Wikipedia)
      NP: Noun Phrase (the car, a clever student)
      VP: Verb Phrase (study hard, play the guitar)
      PP: Prepositional Phrase (in the class, above the earth)
      AP: Adjective Phrase (very tall, incredibly large)
    Phrase Structure Analysis
      Breaking up a sentence into recursively defined coherent units (constituent parts), e.g., an NP consisting of several NPs
      First step in sentence parsing (see also further NLP layers)
    Chunks
      Non-recursive phrases, as introduced by the shallow parsing approach
    Chunking
      Also known as shallow parsing (without overall sentence structure & grammatical functions; see also further NLP layers)

  • 8/3/2019 Nlp for Semantic Web

    35/78

    Phrase Structure Analysis: NP, PP Example
    Adapted from: http://www.kwary.net/linguistics/gl/GLSyntax01.ppt

    Rules:
      NP -> (Det) N
      PP -> P NP
      NP -> (Det) N (PP)

    Example: the bus in the yard
      [NP [Det the] [N bus] [PP [P in] [NP [Det the] [N yard]]]]

  • 8/3/2019 Nlp for Semantic Web

    36/78

    Phrase Structure Analysis: VP Example
    Adapted from: http://www.kwary.net/linguistics/gl/GLSyntax01.ppt

    Rule:
      VP -> V (NP) (PP)

    Example: took the money from the bank
      [VP [V took] [NP [Det the] [N money]] [PP [P from] [NP [Det the] [N bank]]]]
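The three rules NP -> (Det) N (PP), PP -> P NP and VP -> V (NP) (PP) can be turned into a small recursive-descent parser. This is a sketch under stated assumptions: the input is already tagged with the coarse categories Det, N, P and V, each function greedily takes optional constituents, and the function names are invented for this example.

```python
# Each function takes tagged tokens [(word, category), ...] and a start position.
# It returns (tree, next_position), or None if the rule does not apply there.

def parse_np(tokens, i):
    """NP -> (Det) N (PP)"""
    children = []
    if i < len(tokens) and tokens[i][1] == "Det":
        children.append(("Det", tokens[i][0]))
        i += 1
    if i >= len(tokens) or tokens[i][1] != "N":
        return None
    children.append(("N", tokens[i][0]))
    i += 1
    pp = parse_pp(tokens, i)  # optional PP, taken greedily
    if pp is not None:
        children.append(pp[0])
        i = pp[1]
    return ("NP", children), i


def parse_pp(tokens, i):
    """PP -> P NP"""
    if i >= len(tokens) or tokens[i][1] != "P":
        return None
    np = parse_np(tokens, i + 1)
    if np is None:
        return None
    return ("PP", [("P", tokens[i][0]), np[0]]), np[1]


def parse_vp(tokens, i):
    """VP -> V (NP) (PP)"""
    if i >= len(tokens) or tokens[i][1] != "V":
        return None
    children = [("V", tokens[i][0])]
    i += 1
    for parse in (parse_np, parse_pp):  # both constituents are optional
        result = parse(tokens, i)
        if result is not None:
            children.append(result[0])
            i = result[1]
    return ("VP", children), i


np_tokens = [("the", "Det"), ("bus", "N"), ("in", "P"), ("the", "Det"), ("yard", "N")]
vp_tokens = [("took", "V"), ("the", "Det"), ("money", "N"),
             ("from", "P"), ("the", "Det"), ("bank", "N")]
np_tree, np_end = parse_np(np_tokens, 0)
vp_tree, vp_end = parse_vp(vp_tokens, 0)
print(np_tree)
print(vp_tree)
```

For "the bus in the yard" the sketch builds the NP with an embedded PP. For "took the money from the bank" the greedy strategy attaches "from the bank" inside the object NP, whereas the VP example attaches it to the VP. Both analyses are licensed by the rules, which is a small illustration of why full parsers need a strategy for attachment ambiguity.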

  • 8/3/2019 Nlp for Semantic Web

    37/78

    Phrase Structure Analysis: State of the Art

    Overview on parsers (including shallow parsing) for English
      http://www.aclweb.org/aclwiki/index.php?title=Parsers_for_English
    Overview for other languages
      http://www.aclweb.org/aclwiki/index.php?title=List_of_resources_by_language
    Some shallow parser demos on the web:
      TreeTagger (PoS, Chunking), decision trees, free research license
        http://www.ims.uni-stuttgart.de/projekte/corplex/TreeTagger
      CNTS Memory Based Shallow Parser (Univ. of Antwerpen), classifier, license?
        http://www.cnts.ua.ac.be/cgi-bin/jmeyhi/MBSP-instant-webdemo.cgi
      Univ. of Illinois at Urbana-Champaign, classifier, license?
        http://l2r.cs.uiuc.edu/~cogcomp/shallow_parse_demo.php

  • 8/3/2019 Nlp for Semantic Web

    38/78

    Semantic Tagging

    Definition and History
      Classification of words, phrases with a semantically defined category
      Nowadays associated with the Semantic Web (semantic annotation, knowledge markup) and Web 2.0 tagging
      In NLP refers to assigning a sense to a word or phrase
    Sense sets defined by
      Originally, machine readable dictionaries, e.g., LDOCE
      In recent years, wordnets (nouns), framenets (verbs)
      Increasingly, general & domain ontologies

  • 8/3/2019 Nlp for Semantic Web

    39/78

    Semantic Tagging: WordNet

    WordNet is a Semantic Lexicon & Lexical Database
      Organized around meaning rather than word forms
      Maps words to meanings/interpretations or senses
      Senses are represented by synsets (sets of synonyms), e.g.,
        {board, plank}: piece of lumber
        {board, committee}: group of people
      Machine readable (has a formal structure)
      Freely downloadable: http://wordnet.princeton.edu/
    Integrated wordnets for several European languages
      EuroWordNet: http://www.illc.uva.nl/EuroWordNet/
    Wordnets for many languages with interoperable format
      http://www.globalwordnet.org/gwa/wordnet_table.htm
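The idea of organizing a lexicon around meanings rather than word forms can be sketched with a plain data structure. The two synsets are the ones given above; the identifiers such as `board.n.01` and the helper names are assumptions for illustration and are not taken from the WordNet distribution.

```python
# Toy synset inventory (the two examples from the slide, not real WordNet data).
SYNSETS = [
    {"id": "board.n.01", "words": {"board", "plank"}, "gloss": "piece of lumber"},
    {"id": "board.n.02", "words": {"board", "committee"}, "gloss": "group of people"},
]


def senses(word: str) -> list[dict]:
    """All synsets that contain the word; more than one means the word is ambiguous."""
    return [synset for synset in SYNSETS if word in synset["words"]]


def synonyms(word: str) -> set[str]:
    """Words that share at least one synset with the given word."""
    return {w for synset in senses(word) for w in synset["words"]} - {word}


for synset in senses("board"):
    print(synset["id"], sorted(synset["words"]), "-", synset["gloss"])
print(sorted(synonyms("board")))
```

Here "board" belongs to two synsets and is therefore ambiguous, while "plank" and "committee" each have a single sense in this toy inventory.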

  • 8/3/2019 Nlp for Semantic Web

    40/78

    Semantic Tagging: WordNet Origin

    In 1985 a group of psychologists and linguists at Princeton University undertook to develop a lexical database
    The initial idea was to provide an aid to use in searching dictionaries conceptually, rather than merely alphabetically
    WordNet instantiates hypotheses based on results of psycholinguistic research
      "In anomic aphasia, there is a specific inability to name objects. When confronted with an apple, say, patients may be unable to utter 'apple', even though they will reject such suggestions as 'shoe' or 'banana', and will recognize that 'apple' is correct when it is provided." (Caramazza/Berndt 1978)
      expose such hypotheses to the full range of the common vocabulary

    Miller, George A., Richard Beckwith, Christiane Fellbaum, Derek Gross and Katherine J. Miller. Introduction to WordNet: an on-line lexical database. In: International Journal of Lexicography 3 (4), 1990, pp. 235-244.

  • 8/3/2019 Nlp for Semantic Web

    41/78

    Semantic Tagging: Synsets, Senses

    Synsets represent different Senses
      Words that occur in several synsets have a corresponding number of senses, i.e. are ambiguous:

  • 8/3/2019 Nlp for Semantic Web

    42/78

    Semantic Tagging: Ambiguity cont.

    Homonymy: Unrelated Senses, e.g.
      The ball went over the fence. - artifact
      The ball went on into the late hours. - event
    Systematic Polysemy: Related Senses, e.g.
      The Boston office has been newly decorated. - building
      The Boston office was founded in 1985. - organization
      The Boston office called. - group-of-people
    Also referred to in the literature as regular polysemy (Apresjan 1973) or logical polysemy (Pustejovsky 1991, 1995); systematic polysemy introduced by (Nunberg & Zaenen 1992); see also Bierwisch 1983 (school example), Hobbs et al 1993 (office example)

  • 8/3/2019 Nlp for Semantic Web

    43/78

    Semantic Tagging: Synset Hierarchy

    Synsets are organized in hierarchies, defining:
      generalization (hypernymy)
      specialization (hyponymy)
    Example (hypernymy reads upwards, hyponymy downwards):
      {entity}
        {whole, unit}
          {building material}
            {lumber, timber}
              {board, plank}
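Walking such a hierarchy is a simple chain lookup. The sketch below encodes only the five synsets from the example, as an assumption for illustration; the full WordNet hierarchy may contain additional intermediate levels, and the function names are hypothetical.

```python
# Hypernym links copied from the example chain (child synset -> parent synset).
HYPERNYM = {
    "board, plank": "lumber, timber",
    "lumber, timber": "building material",
    "building material": "whole, unit",
    "whole, unit": "entity",
}


def hypernym_path(synset: str) -> list[str]:
    """Generalization: follow hypernym links from a synset up to the root."""
    path = [synset]
    while path[-1] in HYPERNYM:
        path.append(HYPERNYM[path[-1]])
    return path


def is_a(synset: str, ancestor: str) -> bool:
    """True if `ancestor` lies on the hypernym path of `synset`."""
    return ancestor in hypernym_path(synset)


def hyponyms(synset: str) -> list[str]:
    """Specialization: direct children of a synset."""
    return [child for child, parent in HYPERNYM.items() if parent == synset]


print(" -> ".join(hypernym_path("board, plank")))
print(is_a("board, plank", "building material"))
print(hyponyms("lumber, timber"))
```

A semantic tagger can use such paths to back off from a specific sense to a more general class when the specific one is too fine-grained for the task.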

  • 8/3/2019 Nlp for Semantic Web

    44/78

    Semantic Tagging: Hierarchy Example

  • 8/3/2019 Nlp for Semantic Web

    45/78

  • 8/3/2019 Nlp for Semantic Web

    46/78

    Semantic Tagging: Frame Ambiguity

    frame placing: Agent places a Theme at a location, the Goal
      David arranged his briefcase on the floor.
      archive.v, arrange.v, bag.v, bestow.v, billet.v, bin.v, bottle.v, box.v, brush.v, cage.v, cram.v, crate.v, dab.v, daub.v, deposit.v, drape.v, drizzle.v, dust.v, embed.v, emplace.v, file.v, garage.v, hang.v, heap.v, immerse.v, implant.v, inject.v, insert.v, insertion.n, jam.v, lay.v, lean.v, load.v, lodge.v, mount.v, pack.v, package.v, park.v, perch.v, pile.v, place.v, placement.n, plant.v, plunge.v, pocket.v, position.v, pot.v, put.v, rest.v, rub.v, set.v, sheathe.v, shelve.v, shoulder.v, shower.v, sit.v, situate.v, smear.v, sow.v, stable.v, stand.v, stash.v, station.v, stick.v, stow.v, stuff.v, tuck.v, warehouse.v, wrap.v

    frame arranging: Agent puts a complex Theme into a particular Configuration
      David arranged the stones in a circle.
      arrange.v, arrangement.n, array.v, deploy.v, deployment.n, format.v, setup.v

  • 8/3/2019 Nlp for Semantic Web

    47/78

    Semantic Tagging: Word Sense

    Word Sense Disambiguation
      Assignment of the correct sense to a word
      Based on wordnets & similar resources for many languages
      Sense-annotated corpora enable classifier training
      No longer a very active area of research in the NLP community
      Annotated corpora, tools, evaluation data sets available from the SenseVal (1-4) evaluation campaigns: http://www.senseval.org/
      Recently attention turned to Semantic Role Labelling and a variety of other tasks in Computational Lexical Semantics; see the SemEval evaluation campaign: http://semeval2.fbk.eu/
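As a rough illustration of what a disambiguation step does, the sketch below uses a dictionary-overlap heuristic (a simplified Lesk approach), which is a different technique from the corpus-trained classifiers mentioned above. The glosses in `SENSES`, the stopword list and the function names are all assumptions made up for this example.

```python
import re

STOPWORDS = {"a", "an", "the", "of", "in", "on", "for", "to", "that",
             "is", "was", "and", "or", "he", "she", "it"}

# Hypothetical glosses for the two senses of "board" (not real WordNet glosses).
SENSES = {
    "board": {
        "piece of lumber": "a flat piece of lumber or wood sawn from a tree",
        "group of people": "a committee of people that directs a company or organization",
    }
}


def content_words(text: str) -> set[str]:
    """Lowercased alphabetic words minus stopwords."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def simplified_lesk(word: str, context: str) -> str:
    """Pick the sense whose gloss shares the most content words with the context."""
    context_words = content_words(context)
    best_sense, best_overlap = None, -1
    for sense, gloss in SENSES[word].items():
        overlap = len(content_words(gloss) & context_words)
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense


print(simplified_lesk("board", "He cut the board from a piece of wood."))
print(simplified_lesk("board", "The board of the company voted yesterday."))
```

Overlap heuristics like this need no training data but are brittle; supervised classifiers trained on sense-annotated corpora replace the gloss overlap with learned context features.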

  • 8/3/2019 Nlp for Semantic Web

    48/78

    Semantic Tagging: Semantic Roles

    Semantic Role Labelling
      Assign the correct frame category (sense) to a verb & assign semantic roles to its syntactic arguments
      Based on the availability of FrameNet and similar resources, e.g.
        PropBank: http://verbs.colorado.edu/~mpalmer/projects/ace.html
        NomBank: http://nlp.cs.nyu.edu/meyers/NomBank.html
        VerbNet: http://verbs.colorado.edu/~mpalmer/projects/verbnet.html
        SemLink: http://verbs.colorado.edu/semlink/
        OntoNotes: http://www.bbn.com/ontonotes/
        German FrameNet: http://www.coli.uni-saarland.de/projects/salsa
      Frame-annotated corpora enable classifier training
      Recently a very active area of research in the NLP community

  • 8/3/2019 Nlp for Semantic Web

    49/78

    Semantic Tagging: State of the Art

    Word Sense Disambiguation tools, selection
      WSD tools by Ted Pedersen (University of Minnesota, Duluth), free
        http://sourceforge.net/projects/wsdgate/ & others
      SenseLearner, Rada Mihalcea (Univ. of North Texas), free
        http://www.cse.unt.edu/~rada/downloads.html#senselearner
      SuperSenseTagger, SemTechLab Rome?, license?
        http://sourceforge.net/projects/supersensetag/
    Semantic Role Labelling tools, selection
      Shalmaneser (Saarland Univ.), pluggable parsing & classifiers, free license
        http://www.coli.uni-saarland.de/projects/salsa/shal/
      Univ. of Illinois at Urbana-Champaign, parsing & classifiers, license?
        http://l2r.cs.uiuc.edu/~cogcomp/srl-demo.php
      SWIRL (Universitat Politecnica de Catalunya), parsing & classifiers, GPL license
        http://www.surdeanu.name/mihai/swirl

  • 8/3/2019 Nlp for Semantic Web

    50/78

    Semantic Tagging: Lexical Inference

    Metonymy (part stands for whole)
      The Boston office called.
      "to call" expects an object of type Human in Agent position
      coerce "office" into an object of type (Group-of) Person > Human
      lexical semantic inference: Person Work-at Office

    [Figure: the word "office" linked to the classes Office, Organization, Building and Person via the relations Has-address, Located-at, Representation-of, Work-at and Work-for]
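The coercion step can be sketched as a lookup over a small set of typed relations. Everything in this block is an assumption for illustration: the mini-ontology, the reading of "Person > Human" as "a Person counts as a Human", and the helper names are not taken from a real system.

```python
# Hypothetical mini-ontology: (subject type, relation, object type).
RELATIONS = [("Person", "Work-at", "Office")]
SUBTYPE_OF = {"Person": "Human"}   # assumed reading of "Person > Human"
AGENT_TYPE = {"call": "Human"}     # type the verb expects in Agent position
WORD_TYPE = {"office": "Office"}   # literal type of the noun


def is_subtype(type_name, expected):
    """True if `type_name` equals `expected` or inherits from it."""
    while type_name is not None:
        if type_name == expected:
            return True
        type_name = SUBTYPE_OF.get(type_name)
    return False


def coerce_agent(verb, noun):
    """Return (agent type, supporting relation); the relation is None if no coercion is needed."""
    expected = AGENT_TYPE[verb]
    literal = WORD_TYPE[noun]
    if is_subtype(literal, expected):
        return literal, None
    for subject, relation, obj in RELATIONS:
        if obj == literal and is_subtype(subject, expected):
            return subject, (subject, relation, obj)
    return None, None


print(coerce_agent("call", "office"))
```

For "The Boston office called" the sketch finds that Office is not a Human, then uses the relation Person Work-at Office to reinterpret the agent as the people working there.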

  • 8/3/2019 Nlp for Semantic Web

    51/78

    Semantic Tagging: Lexical Inference

    Metonymy in Bridging (of discourse referents)
      Peter bought a car. The engine runs well.
      "the engine" refers to an already introduced object (discourse referent)
      lexical semantic inference: Engine Part-of Car

    [Figure: the word "car" linked to the class Car; classes Car and Engine connected by the relations Part-of and Has-part]

  • 8/3/2019 Nlp for Semantic Web

    52/78

    Semantic Tagging: Ontologies

    [Figure: ontology fragment with the classes University, School, Campus, Student and Staff, the relations is_part_of, located_at, studies_at and works_at, and the labels "school" and "staff"]

  • 8/3/2019 Nlp for Semantic Web

    53/78

    Semantic Tagging: Classes, Terms

    RDF(S) & OWL current status

    [Figure: ontology fragment with the classes University, School, Campus, Student and Staff and the relations is_part_of, located_at, studies_at and works_at; terms attached via has_German_term (Fakultät), has_US-English_term (School) and has_Dutch_term (Faculteit)]

  • 8/3/2019 Nlp for Semantic Web

    54/78

    Semantic Tagging: Lexicalized Ontologies
    http://olp.dfki.de/LingInfo/
    http://ontoware.org/projects/lexonto/

    [Figure: LingInfo model. Classes OralMucosa and Mucosa carry linguistic information (hasLingInfo, instanceOf LingInfo). Term-1 has the orthographic form "Mundschleimhaut" (hasOrthographicForm), language DE (hasLang) and morpho-syntactic information (hasMorphSynInfo) WordForm-1, an instance of WordForm with part of speech N (hasPoS). Its stems (hasStem) Term-2 and Term-3 have the orthographic forms "Mund" and "Schleimhaut".]

  • 8/3/2019 Nlp for Semantic Web

    55/78

    Semantic Tagging: Terms, Classes

    Semantic tagging beyond word senses & semantic roles
      Terms, Classes, Relations, Properties/Attributes
      Names
    Terms, Classes, Relations, Properties/Attributes
      Semantic annotation on the basis of a thesaurus or ontology
      Term recognition & extraction
        terms are domain-specific phrases
      Relation extraction
        relations are domain-specific semantic roles
      Ontology-based information extraction

  • 8/3/2019 Nlp for Semantic Web

    56/78

    Semantic Tagging: Terms, Relations
    With input from: http://www.lrec-conf.org/proceedings/lrec2008/slides/496.ppt

    Terms/Classes & Relations in the genetics domain
      GENIA corpus: http://www-tsujii.is.s.u-tokyo.ac.jp/GENIA/home/wiki.cgi?page=GENIA+corpus
    Examples
      Interleukin 1 beta inhibits insulin secretion
      IL-1beta is known to inhibit insulin secretion
      Insulin secretion is inhibited by IL-1 beta

    GENIA Relation: inhibit
      GENIA Term (Class)              Semantic Role   Grammatical Function
      interleukin 1 beta / IL-1beta   Agent           Subject
      insulin secretion               Target          Direct Object

    Term recognition & extraction
    Grammatical function annotation: subject, direct object, etc. (see further NLP layers)
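The point of the three examples is that different surface forms express one relation. A minimal sketch, assuming hand-written patterns in place of real term recognition and grammatical function annotation: the variant table, the two regular expressions and the function name are hypothetical and cover only these three sentences.

```python
import re

# Term variants mapped to one canonical term (variants taken from the examples).
TERM_VARIANTS = {
    "interleukin 1 beta": "interleukin 1 beta",
    "il-1beta": "interleukin 1 beta",
    "il-1 beta": "interleukin 1 beta",
    "insulin secretion": "insulin secretion",
}
# Longest variants first so that the alternation prefers the longest match.
TERM = "|".join(re.escape(t) for t in sorted(TERM_VARIANTS, key=len, reverse=True))

# Active voice: the Agent is the subject. Passive voice: the Agent follows "by".
ACTIVE = re.compile(rf"(?P<agent>{TERM}) (?:inhibits|is known to inhibit) (?P<target>{TERM})")
PASSIVE = re.compile(rf"(?P<target>{TERM}) is inhibited by (?P<agent>{TERM})")


def extract_inhibit(sentence: str):
    """Return the normalized inhibit relation, or None if no pattern matches."""
    text = sentence.lower()
    match = ACTIVE.search(text) or PASSIVE.search(text)
    if match is None:
        return None
    return {
        "relation": "inhibit",
        "Agent": TERM_VARIANTS[match["agent"]],
        "Target": TERM_VARIANTS[match["target"]],
    }


for example in (
    "Interleukin 1 beta inhibits insulin secretion",
    "IL-1beta is known to inhibit insulin secretion",
    "Insulin secretion is inhibited by IL-1 beta",
):
    print(extract_inhibit(example))
```

All three sentences yield the same Agent and Target. A real system would derive these roles from parsed grammatical functions rather than from fixed word patterns.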

  • 8/3/2019 Nlp for Semantic Web

    57/78

    Semantic Tagging: Terms, Classes

    Eurovoc Thesaurus
      Terminology in all EU languages on all EU areas: politics, trade, law, science, energy, agriculture, ...
      MT   3606 natural and applied sciences
      UF   gene pool
           genetic resource
           genotype
           heredity
      BT1  biology
      BT2  life sciences
      NT1  DNA
      RT   genetic engineering (6411)

    Medical Subject Headings (MeSH)
      Thesaurus with taxonomy of ~ 250,000 terms, representing medical subjects for retrieval purposes
      MeSH Heading: Databases, Genetic
      Entry Term: Genetic Databases
      Entry Term: Genetic Sequence Databases
      Entry Term: OMIM
      Entry Term: Mendelian Inheritance in Man
      Entry Term: Genetic Data Banks
      Entry Term: Genetic Data Bases
      Entry Term: Genetic Information Databases
      See Also: Genetic Screening

    Gene Ontology
      Accession: GO:0009292
      Synonyms: broad: genetic exchange
      Term Lineage:
        all : all (164142)
          GO:0008150 : biological process (115947)
            GO:0007275 : development (11892)
              GO:0009292 : genetic transfer (69)

  • 8/3/2019 Nlp for Semantic Web

    58/78

    Semantic Tagging: Names

    Semantic tagging beyond word senses & semantic roles
      Terms, Classes, Relations, Properties/Attributes
      Names
    Semantic annotation of names
      Named Entity Recognition
        Originally intended as an extension of Tokenization, e.g. in recognizing Names and other specific tokens such as Dates, Times
        Evolved into a more general identification and classification of names of People, Organisations, Companies, Countries, Cities, etc.
        Currently merging with ontology-based information extraction

  • 8/3/2019 Nlp for Semantic Web

    59/78

    Semantic Tagging: State of the Art cont.

    Named Entity Recognition
      Good overview of many available tools
        http://en.wikipedia.org/wiki/Named_entity_recognition
    Semantic annotation with thesauri, ontologies in various domains, e.g.,
      Annotate biomedical text with the UMLS Metathesaurus
        MetaMap (US National Library of Medicine), free license
        http://mmtx.nlm.nih.gov/
      Annotate business text with the KIM ontology
        KIM (Ontotext), free research license
        http://www.ontotext.com/kim/
      Annotate football (soccer) text with the SWIntO ontology
        SProUT (DFKI), free research license
        http://www.dfki.de/sw-lt/heartofgold/ (web demo)

  • 8/3/2019 Nlp for Semantic Web

    60/78

    Parsing: Overview

    Parsing
      "parsing, or syntactic analysis, is the process of analyzing a sequence of tokens to determine their grammatical structure with respect to a given grammar" (Wikipedia)
    Shallow parsing (discussed above) provides
      Part of Speech tags
      Non-recursive phrases (chunks)
    Full (or deep) parsing provides on top of this
      Constituent structure
        complete syntactic structure in terms of interconnected recursive phrases
      and/or Clause structure
        predicate (mostly a verb) and one or more syntactic arguments (phrases)
        grammatical functions for predicate arguments: subject, direct object, ...
      and/or Dependency structure
        head-modifier analysis, semantic roles

  • 8/3/2019 Nlp for Semantic Web

    61/78

    Parsing: Full Parse Example

    He booked the large table in the corner.

    S
        NP: he (Subject, Agent)
            he: Pronoun, 3rd person, Animate
        VP: booked the large table in the corner
            book: Verb, Past, 3rd person, Head, Predicate
            NP: the large table in the corner (Direct Object, Patient)
                NP: the large table
                    large: Adjective, Modifier
                    table: Noun, Singular, Head, furniture_01
                PP: in the corner
                    in: Preposition, Head, Predicate
                    NP: the corner
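One way to work with such a full parse programmatically is to store the constituent structure as a nested data structure and walk it recursively. The sketch below encodes the example parse by hand as `(label, children...)` tuples; the `Det` label for "the" and the helper names `leaves` and `phrases` are my own additions, not taken from the slide.

```python
# Constituent structure of the example sentence as nested (label, children...) tuples.
# Phrase labels follow the slide; "Det" is an added label for "the".
TREE = ("S",
        ("NP", ("Pronoun", "he")),
        ("VP",
         ("Verb", "booked"),
         ("NP",
          ("NP", ("Det", "the"), ("Adjective", "large"), ("Noun", "table")),
          ("PP", ("Preposition", "in"),
           ("NP", ("Det", "the"), ("Noun", "corner"))))))

def leaves(node):
    """Return the words covered by a node, left to right."""
    if isinstance(node, str):
        return [node]
    words = []
    for child in node[1:]:
        words.extend(leaves(child))
    return words

def phrases(node, label):
    """Yield the text of every constituent with the given label, outermost first."""
    if isinstance(node, str):
        return
    if node[0] == label:
        yield " ".join(leaves(node))
    for child in node[1:]:
        yield from phrases(child, label)

noun_phrases = list(phrases(TREE, "NP"))
print(noun_phrases)
```

Because phrases are recursive, the query returns both the complex NP "the large table in the corner" and the NPs nested inside it.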

  • 8/3/2019 Nlp for Semantic Web

    62/78

    Parsing: Full Parse Example (same parse of "He booked the large table in the corner." as on slide 61)

    Highlighted layers: Part of Speech (Pronoun, Verb, Adjective, Noun, Preposition) and Morphology (3rd person, Past, Singular)

  • 8/3/2019 Nlp for Semantic Web

    63/78

    Parsing: Full Parse Example (same parse of "He booked the large table in the corner." as on slide 61)

    Highlighted layer: Phrases (S, NP, VP, PP)

  • 8/3/2019 Nlp for Semantic Web

    64/78

    Parsing: Full Parse Example (same parse of "He booked the large table in the corner." as on slide 61)

    Highlighted layers: Predicates (book, in) and Grammatical Functions (Subject, Direct Object)

  • 8/3/2019 Nlp for Semantic Web

    65/78

    Parsing: Full Parse Example (same parse of "He booked the large table in the corner." as on slide 61)

    Highlighted layers: Semantic Tags (Animate, furniture_01) and Semantic Roles (Agent, Patient)
    The PP "in the corner" is now also labeled Modifier.

  • 8/3/2019 Nlp for Semantic Web

    66/78

    Parsing: Full Parse Example (same parse of "He booked the large table in the corner." as on slide 61)

    Highlighted layer: Head-Modifier Analysis (Head: book, table, in; Modifier: large, in the corner)

  • 8/3/2019 Nlp for Semantic Web

    67/78

    Parsing: Dependency Structure

    He booked the large table in the corner.

    book: Verb, Past, 3rd person, Head, Predicate
        Agent: he (Pronoun, 3rd person, Animate)
        Patient: table (Noun, Singular, Head, furniture_01)
            Size: large (Adjective, Modifier, Size)
            Location: corner (Noun, Singular, Modifier, Location)
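A dependency structure like this one is commonly stored as a list of head-relation-dependent triples. The sketch below encodes the example by hand, reading the arc labels as Agent and Patient on "book" and Size and Location on "table"; the helper names `dependents` and `root` are illustrative.

```python
# Dependency structure of the example as (head, relation, dependent) triples.
DEPENDENCIES = [
    ("book", "Agent", "he"),
    ("book", "Patient", "table"),
    ("table", "Size", "large"),
    ("table", "Location", "corner"),
]

def dependents(head, deps=DEPENDENCIES):
    """Map each relation of `head` to its dependent."""
    return {rel: dep for h, rel, dep in deps if h == head}

def root(deps=DEPENDENCIES):
    """The root is the only head that never occurs as a dependent."""
    heads = {h for h, _, _ in deps}
    dependent_words = {d for _, _, d in deps}
    (only_root,) = heads - dependent_words
    return only_root

print(root())
print(dependents("book"))
print(dependents("table"))
```

Unlike the constituent tree, this representation has no phrase nodes: every word is attached directly to its head, which makes predicate-argument queries very short.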

  • 8/3/2019 Nlp for Semantic Web

    68/78

    Parsing: Dependency Structure for IE

    He booked the large table in the corner.

    book: Verb, Past, 3rd person, Head, Predicate
        Agent: he (Pronoun, 3rd person, Animate)
        Patient: table (Noun, Singular, Head, furniture_01)
            Size: large (Adjective, Modifier, Size)
            Location: corner (Noun, Singular, Modifier, Location)

    Class + Properties    Extracted Objects & Values                         Source
    Booking               x, y                                               Predicate
    Booking-sponsor       Male(x)                                            Agent
    Booking-order         Table(y), Size(large), Location(y,z), Corner(z)    Patient
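The mapping from dependency structure to ontology class and properties can be sketched as a simple rule: the predicate selects the class, and each semantic role selects a property whose filler is described by the dependent and its modifiers. The lexicon and role mapping below are hypothetical stand-ins for a real ontology-lexicon; only the class and property names come from the slide's table.

```python
DEPENDENCIES = [
    ("book", "Agent", "he"),
    ("book", "Patient", "table"),
    ("table", "Size", "large"),
    ("table", "Location", "corner"),
]

# Hypothetical lexicon: lemma -> ontology class.
LEXICON = {"book": "Booking", "he": "Male", "table": "Table", "corner": "Corner"}
# Hypothetical mapping: semantic role of "book" -> property of the Booking class.
ROLE_TO_PROPERTY = {"Agent": "Booking-sponsor", "Patient": "Booking-order"}

def extract_booking(deps):
    """Fill a Booking frame from Agent/Patient arcs and the fillers' own modifiers."""
    frame = {}
    for head, rel, dep in deps:
        if LEXICON.get(head) == "Booking" and rel in ROLE_TO_PROPERTY:
            frame["class"] = "Booking"
            filler = {"class": LEXICON[dep]}
            # Attach modifiers of the filler, e.g. Size and Location of the table.
            for head2, rel2, dep2 in deps:
                if head2 == dep:
                    filler[rel2] = LEXICON.get(dep2, dep2)
            frame[ROLE_TO_PROPERTY[rel]] = filler
    return frame

frame = extract_booking(DEPENDENCIES)
print(frame)
```

The nested dictionary mirrors the rows of the table: the sponsor is a Male individual, and the order is a Table with size "large" located in a Corner.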

  • 8/3/2019 Nlp for Semantic Web

    69/78

    Parsing: State of the Art

    Widely-used parsers
        MINIPAR, Dekang Lin, free research license
            Download: http://www.cs.ualberta.ca/~lindek/minipar.htm
            Web demo: http://dbis.nankai.edu.cn/miniparweb/
        Stanford Parser, Klein/Manning, free research license
            http://nlp.stanford.edu/software/lex-parser.shtml
            Web demo: http://nlp.stanford.edu:8080/parser/
        RASP Parser (Sussex Univ.), Briscoe/Carroll, free research license
            http://www.informatics.susx.ac.uk/research/groups/nlp/rasp/
        Link Grammar Parser (CMU), Temperley et al., free license
            http://www.link.cs.cmu.edu/link/
            Web demo: http://nlp.stanford.edu:8080/parser/

  • 8/3/2019 Nlp for Semantic Web

    70/78

    Discourse Analysis: Anaphora Resolution

    Linking event participants (Semantic Role fillers) within and across sentences, i.e.,
    an anaphor can be linked back to a discourse referent that serves as its antecedent, e.g.,

        "He bought a bottle of wine, sat down on a stone, and drank it."

        "he" and "it" are anaphors
        "a bottle of wine" and "a stone" introduce discourse referents
        "it" can be linked back to the antecedent "a bottle of wine" or "a stone"

    With input from: http://www.ling.su.se/DaLi/education/courses/ngslt_nlp06/PragmaticsGSLTLecture06.ppt
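A common first step in anaphora resolution is to filter candidate antecedents by agreement features. The sketch below is a toy version under stated assumptions: the feature values are hand-assigned, and the names `REFERENTS`, `PRONOUNS` and `candidates` are invented for this example rather than taken from any resolver.

```python
# Discourse referents introduced by the example sentence, with hand-assigned features.
REFERENTS = [
    {"text": "a bottle of wine", "animate": False, "number": "singular"},
    {"text": "a stone", "animate": False, "number": "singular"},
]

# Agreement features required by each pronoun (assumed values).
PRONOUNS = {
    "he": {"animate": True, "number": "singular"},
    "it": {"animate": False, "number": "singular"},
}

def candidates(pronoun, referents=REFERENTS):
    """Return the referents whose features agree with those of the pronoun."""
    required = PRONOUNS[pronoun]
    return [r["text"] for r in referents
            if all(r[feature] == value for feature, value in required.items())]

print(candidates("it"))
print(candidates("he"))
```

Agreement alone leaves both "a bottle of wine" and "a stone" as candidates for "it", which is the ambiguity noted above. Choosing between them needs further knowledge, for example that the object of "drank" should be drinkable. "he" finds no antecedent in this sentence, so it must be resolved against earlier discourse.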

  • 8/3/2019 Nlp for Semantic Web

    71/78

    Discourse Analysis: Anaphora Resolution

    He booked the large table in the corner. ... It was still available.

    S: He booked the large table in the corner.
        NP: he (Subject, Agent; referent X)
            he: Pronoun, 3rd Person, Animate
        VP: booked the large table in the corner
            book: Verb, Past, 3rd Person, Head, Predicate
            NP: the large table in the corner (Direct Object, Patient; Definite, referent Y)
                NP: the large table
                    large: Adjective, Modifier
                    table: Noun, Singular, Head, furniture_01
                PP: in the corner
                    in: Preposition, Head, Predicate
                    NP: the corner (Definite, referent Z)

    S: It was still available.
        NP: it (Subject, Patient; referent Y)
            it: Pronoun, 3rd Person, Inanimate
        VP: was still available
            is: V, Past, 3rd Person, Head, Predicate
            AdvP: still available

    The shared referent Y links the anaphor "it" to its antecedent "the large table in the corner".

  • 8/3/2019 Nlp for Semantic Web

    72/78

    Discourse Analysis: Discourse Structure

    Linking events in terms of temporal sequence, causality, etc., e.g.,

        "John bought a Mercedes, so Bill leased a BMW." (temporal sequence)
        "John hid Bill's car keys as he had drunk too much." (causality)

    With input from: http://www.ling.su.se/DaLi/education/courses/ngslt_nlp06/PragmaticsGSLTLecture06.ppt

  • 8/3/2019 Nlp for Semantic Web

    73/78

    Discourse Analysis: State of the Art

    No readily available black-box tools
    Anaphora resolution is often built-in functionality in NER, parsing, etc.
    To experiment with discourse referents, anaphora resolution, etc., try out e.g. Boxer
        Johan Bos, Univ. of Rome
        http://svn.ask.it.usyd.edu.au/trac/candc/wiki/boxer

  • 8/3/2019 Nlp for Semantic Web

    75/78

    Further Relevant Pointers: General Tools

    GATE, Univ. of Sheffield
        "Eclipse of Natural Language Engineering"
        http://gate.ac.uk/
    UIMA, IBM / Open Source
        "Open, Industrial-Strength Platform for Unstructured Information Analysis and Search"
        http://incubator.apache.org/uima/
    NLTK (Natural Language Toolkit), Melbourne Univ. ?
        Open source Python modules for research and development in natural language processing
        Book (June 2009): Natural Language Processing with Python
        http://www.nltk.org/
    MBT: Memory-based tagger-generator and tagger, Univ. of Tilburg/Antwerpen
        can generate a sequence tagger on the basis of a training set of tagged sequences
        http://ilk.uvt.nl/mbt/
    SProUT, DFKI
        platform for development of multilingual shallow text processing and information extraction systems
        http://sprout.dfki.de/

  • 8/3/2019 Nlp for Semantic Web

    76/78

    Further Relevant Pointers: Publications

    Conferences
        Association for Computational Linguistics
            ACL (Int.), EACL (Europe), NAACL (North America), IJCNLP (AFNLP - Asia)
            http://www.aclweb.org/
            ACL SIGs: http://aclweb.org/aclwiki/index.php?title=Special_interest_groups
        International Conference on Computational Linguistics
            COLING: http://nlp.shef.ac.uk/iccl/
        International Conference on Language Resources and Evaluation
            LREC: http://www.lrec-conf.org/
        Other NLP conferences: EMNLP, CoNLL, RANLP, CICLing, ...

    Journals
        Computational Linguistics, MIT Press
        Natural Language Engineering, Cambridge University Press
        Journal of Logic, Language and Information, Springer
        Language Resources and Evaluation, Springer

  • 8/3/2019 Nlp for Semantic Web

    77/78

    Further Relevant Pointers: More Reading

    Handbooks
        Handbook of Natural Language Processing, CRC Press, 2000; new edition in progress (2009)
        Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition, Prentice Hall, 2008
        The Oxford Handbook of Computational Linguistics, Oxford University Press, 2005
        Foundations of Statistical Natural Language Processing, MIT Press, 2003

    Relevant Mailing Lists
        Corpora list: http://gandalf.aksis.uib.no/corpora/
        Linguist list: http://linguistlist.org/

    Other NLP sites - broad overviews of tools, resources, people
        ACL Wiki: http://aclweb.org/aclwiki
        LT World: http://www.lt-world.org/

  • 8/3/2019 Nlp for Semantic Web

    78/78

    Thanks!

    Further Questions:

    [email protected]