Introduction to JAPE

Mark A. Greenwood

University of Sheffield NLP

Recap

• Installed and run GATE
• Understand the idea of
  - LR – Language Resources
  - PR – Processing Resources
• ANNIE
  - Understand the goals of information extraction
  - Loaded ANNIE into GATE
  - Constructed one or more gazetteer lists

Overview

• Limitations of Gazetteer Lists
• High Level Overview of Pattern Matching
• What is JAPE?
• Learn JAPE by Example
  - Input Specifications
  - Left Hand Side
  - Macros
  - Right Hand Side
  - Phases
  - Loading JAPE into GATE
• Hands On – Extending the IE example

Limitations of Gazetteer Lists

• Gazetteer lists are designed for annotating simple, regular features
  - Some flexibility is provided by matching
    • Word roots
    • Whole/part words
• For example, recognising e-mail addresses using just a gazetteer would be impossible

High Level Overview of Pattern Matching

• The early components in the ANNIE pipeline produce simple annotations
  - Token, Sentence, Lookup
• These annotations have features
  - Token kind, part of speech, major type...
• Patterns in these annotations and features can suggest more complex information
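
As a concrete illustration of how simple annotations can suggest something more complex, here is a minimal sketch written in JAPE, the rule language the rest of this tutorial describes. It treats a gazetteer title followed by capitalised words as a possible person name. The phase, rule and output annotation names are invented for this example, and it assumes the gazetteer marks titles with majorType "title" and the tokeniser sets an orth feature on each Token.

```jape
/* Illustrative sketch only: PersonCandidate and TitleThenName are made-up names,
   and the gazetteer majorType "title" is an assumption about the loaded lists. */
Phase: PersonCandidate
Input: Token Lookup
Options: control = appelt

Rule: TitleThenName
(
  // a gazetteer hit such as "Dr" or "Mrs"
  {Lookup.majorType == title}
  // optional full stop after the title
  ({Token.string == "."})?
  // one or more capitalised words
  ({Token.orth == upperInitial})+
):person
-->
  :person.PersonCandidate = {rule = "TitleThenName"}
```

Neither the Lookup nor the Token annotations identify a person on their own; it is the combination, in this order, that hints at one.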

What is JAPE?

• JAPE provides pattern matching in GATE
• Each JAPE rule consists of the
  - LHS which contains patterns to match
  - RHS which details the annotations (and optionally features) to be created
• JAPE rules combine to create a phase
• Phases combine to create a grammar

Learn JAPE By Example

Phase: EMail
Input: Token SpaceToken
Options: control = appelt

Macro: WORD_OR_NUMBER
(
  ({Token.kind == word}|{Token.kind == number})
)

Rule: emailaddress
Priority: 50
(
  (WORD_OR_NUMBER)+
  ({Token.string == "."}(WORD_OR_NUMBER)+)*
  {Token.string == "@"}
  (WORD_OR_NUMBER)+
  ({Token.string == "."}(WORD_OR_NUMBER)+)*
)
:email -->
  :email.EMail = {rule = "emailaddress"}

Learn JAPE By Example: Input Specifications

• Each JAPE file defines a phase of the grammar
• The header specifies how the rules within the phase will be applied to the documents
• The input to the rules within this phase is the subset of annotations specified in the header
• The rules within a single phase compete based on the control option
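
To make the role of the header concrete, the following minimal sketch shows a complete phase whose Input line lists only Token. The phase, rule and annotation names are hypothetical and chosen purely for illustration.

```jape
/* Sketch: NumberRuns, NumberRun and the output annotation are made-up names. */
Phase: NumberRuns
// Only Token annotations are visible to the rules in this phase.
Input: Token
// Rules in this phase compete using the Appelt style.
Options: control = appelt

Rule: NumberRun
(
  ({Token.kind == number})+
):run
-->
  :run.NumberRun = {rule = "NumberRun"}
```

Because SpaceToken is not listed in the Input line, the whitespace in a text such as "12 34" is invisible to the rule, so both numbers should end up inside a single NumberRun annotation. Adding SpaceToken to the Input line would make the space interrupt the match, which is why the e-mail grammar lists SpaceToken as an input.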

Learn JAPE By Example: Input Specifications

• 5 different control styles:
  - Appelt (use of priorities)
  - Once (as soon as a rule fires, matching stops)
  - First (a rule fires on the first, and so shortest, match)
  - Brill (fire every rule that applies)
  - All (all possible matches)
• Appelt priority is applied in the following order
  - Longest match
  - Highest explicit priority (default = -1)
  - First defined rule
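
The Appelt ordering can be sketched with two rules that both match the same single token. All names here are hypothetical, and the example assumes the tokeniser stores a Token's length feature as a string.

```jape
/* Sketch: PriorityDemo, GenericNumber and Year are made-up names. */
Phase: PriorityDemo
Input: Token
Options: control = appelt

Rule: GenericNumber
Priority: 10
(
  {Token.kind == number}
):n
-->
  :n.Number = {rule = "GenericNumber"}

Rule: Year
Priority: 20
(
  // a number token exactly four characters long
  {Token.kind == number, Token.length == "4"}
):n
-->
  :n.Year = {rule = "Year"}
```

For a token such as "1999" both rules match a span of the same length, so the explicit priority decides and only Year fires. For "42" only GenericNumber matches, so priority never comes into play.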

Learn JAPE By Example: Input Specifications

[Figure: matches produced by the pattern {A}+ on the input A A A under the Appelt, Once, First, Brill and All control styles]

Learn JAPE By Example: Left Hand Side Patterns

• LHS is expressed in terms of existing annotations, and optionally features and their values
• Any annotation to be used must be included in the input header
• Any annotation not included in the input header will be ignored (e.g. whitespace)
• Each annotation is enclosed in curly braces
• Annotations may be combined using traditional Kleene operators: | * + ?
• Each pattern to be matched is enclosed in round brackets and can have a label attached
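
The points above can be sketched in one small rule that uses alternation, an optional element, repetition and a label. The names are hypothetical, and the orth values assume the features produced by the ANNIE tokeniser.

```jape
/* Sketch: OperatorDemo, CodeWithNumber and Code are made-up names. */
Phase: OperatorDemo
Input: Token
Options: control = appelt

Rule: CodeWithNumber
(
  // | : either an all-capitals word or a capitalised word
  ({Token.orth == allCaps} | {Token.orth == upperInitial})
  // ? : an optional hyphen
  ({Token.string == "-"})?
  // + : one or more number tokens
  ({Token.kind == number})+
):code
-->
  :code.Code = {rule = "CodeWithNumber"}
```

The label code is attached to the outermost round brackets, so the new annotation spans everything matched inside them. Since SpaceToken is not an input, intervening whitespace is ignored.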

Learn JAPE By Example: Left Hand Side Patterns

• As well as matching against the presence of an annotation, JAPE rules can access annotation features
  {Token.kind == "number"}
• Features can be compared with ==, !=, >, <, =~, !~, ==~ and !=~
• Ranges can be specified
  ({Token})[1,3] or ({Token})[3]
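
A short sketch combining a regular expression comparison with a range follows. As far as the GATE user guide describes them, =~ succeeds if the expression is found anywhere in the feature value, while ==~ requires the whole value to match. The phase, rule and annotation names are invented for this example.

```jape
/* Sketch: FeatureDemo, CapitalisedRun and the output annotation are made-up names. */
Phase: FeatureDemo
Input: Token
Options: control = appelt

Rule: CapitalisedRun
(
  // ==~ : the whole token string must match the regular expression
  // [1,3] : between one and three such tokens in a row
  ({Token.string ==~ "[A-Z][a-z]+"})[1,3]
):run
-->
  :run.CapitalisedRun = {rule = "CapitalisedRun"}
```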

Learn JAPE By Example: Left Hand Side Patterns

• Contextual information can be specified in the same way, but has no label
• Contextual information will be consumed by the rule
  ({Annotation1})
  ({Annotation2}):match
  ({Annotation3})
• There are other constructs that can be used. For details see the user guide.
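
As a minimal sketch of context, the rule below annotates a four-digit number only when it follows the word "in". The names are hypothetical, and the length test again assumes the tokeniser stores length as a string.

```jape
/* Sketch: ContextDemo, YearInContext and Year are made-up names. */
Phase: ContextDemo
Input: Token
Options: control = appelt

Rule: YearInContext
(
  // left context: part of the match, but it carries no label
  {Token.string == "in"}
)
(
  {Token.kind == number, Token.length == "4"}
):year
-->
  :year.Year = {rule = "YearInContext"}
```

Only the labelled part receives the Year annotation. The word "in" is still consumed by the rule, so it is not available as the start of another match in the same phase.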

Learn JAPE By Example: Macros

• Macros look like the LHS of a rule but they never have a label
• They are used in rules by enclosing the macro name in round brackets
• It is conventional to name macros in uppercase letters
• Macros hold across an entire set of grammar phases
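
A small sketch of macros in use: two macros are defined once and then reused several times inside one rule to recognise clock times such as 10:30 or 10:30:15. All names are hypothetical.

```jape
/* Sketch: MacroDemo, TWO_DIGIT, COLON, ClockTime and Time are made-up names. */
Phase: MacroDemo
Input: Token SpaceToken
Options: control = appelt

Macro: TWO_DIGIT
(
  {Token.kind == number, Token.length == "2"}
)

Macro: COLON
(
  {Token.string == ":"}
)

Rule: ClockTime
(
  (TWO_DIGIT) (COLON) (TWO_DIGIT)
  // optional seconds
  ((COLON) (TWO_DIGIT))?
):time
-->
  :time.Time = {rule = "ClockTime"}
```

Neither macro has a label; the label is attached in the rule that uses them.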

Learn JAPE By Example: Right Hand Side Annotations

• LHS and RHS are separated by -->
• Label matches that on the LHS
• Annotation to be created follows the label
  ({Annotation1}):match -->
    :match.NewAnnotName = {feature1 = value1, feature2 = value2}
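
Feature values on the RHS do not have to be constants. The sketch below, modelled on the style of example found in the GATE user guide, copies a feature from the matched annotation onto the new one. The phase name and the majorType value "location" are assumptions about the gazetteer in use.

```jape
/* Sketch: RhsDemo is a made-up phase name; "location" is an assumed majorType. */
Phase: RhsDemo
Input: Lookup
Options: control = appelt

Rule: LocationType
(
  {Lookup.majorType == location}
):loc
-->
  // rule is a constant; kind is copied from the matched Lookup's minorType
  :loc.Location = {rule = "LocationType", kind = :loc.Lookup.minorType}
```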

Learn JAPE By Example

Phase: EMail
Input: Token SpaceToken
Options: control = appelt

Macro: WORD_OR_NUMBER
(
  ({Token.kind == word}|{Token.kind == number})
)

Rule: emailaddress
Priority: 50
(
  (WORD_OR_NUMBER)+
  ({Token.string == "."}(WORD_OR_NUMBER)+)*
  {Token.string == "@"}
  (WORD_OR_NUMBER)+
  ({Token.string == "."}(WORD_OR_NUMBER)+)*
)
:email -->
  :email.EMail = {rule = "emailaddress"}

Learning JAPE By Example: Multiple Phases

• Grammars usually consist of several phases which are run sequentially
• A definition phase (conventionally called main.jape) lists the phases to be used, in order
• Only the definition phase needs to be loaded
• Temporary annotations may be created in early phases and used as input for later phases
• Annotations from earlier phases may need to be combined or modified
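
A definition file can be sketched as follows. The grammar name and the two phase file names are hypothetical; the sketch assumes files called email.jape and shares.jape sit in the same directory as main.jape.

```jape
/* main.jape: hypothetical definition file.
   Each entry under Phases names a .jape file (without the extension),
   and the phases run in the order listed. */
MultiPhase: Example
Phases:
  email
  shares
```

Loading this one file should be enough to run both phases in sequence.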

Learning JAPE By Example: Loading Grammars into GATE

• Load a JAPE Transducer, giving the .jape file you have created as its parameter
• Add it to the application and run
• Inspect the results


Hands On: Extending the IE Example

• The best way to learn JAPE is to try writing rules yourself
• In the previous session you should have added a new gazetteer to look for words that might signify a change in share price

Hands On: Extending the IE Example

• Use the Lookup annotations from your gazetteer along with named entities annotated by ANNIE
  - Organization
  - Money
  - Percent
  - ...
• Annotate the documents to associate a company with a change in share price:
  - Shares in Scoot rose 9 per cent on the announcement...
  - Whitbread shares closed up 2p at 645p.
  - ...

Your Turn!
Feel Free To Refer To The User Guide And To Ask For Help

Hands On: Extending the IE Example

Phase: Shares
Input: Token Organization Lookup Money Percent
Options: control = appelt

Rule: ShareChange
(
  {Organization}
  ({Token})[0,3]
  {Lookup.majorType == "change"}
  ({Token})[0,3]
  ({Money}|{Percent})
)
:change -->
  :change.ShareChange = {rule = "ShareChange"}