Introduction to JAPE

Mark A. Greenwood

University of Sheffield NLP

Recap

• Installed and run GATE
• Understand the idea of
  - LR – Language Resources
  - PR – Processing Resources
• ANNIE
  - Understand the goals of information extraction
  - Loaded ANNIE into GATE
  - Constructed one or more gazetteer lists

Overview

• Limitations of Gazetteer Lists
• High Level Overview of Pattern Matching
• What is JAPE?
• Learn JAPE by Example
  - Input Specifications
  - Left Hand Side
  - Macros
  - Right Hand Side
  - Phases
  - Loading JAPE into GATE
• Hands On – Extending the IE example

Limitations of Gazetteer Lists

• Gazetteer lists are designed for annotating simple, regular features
  - Some flexibility is provided by matching
    • Word roots
    • Whole/part words
• For example, recognising e-mail addresses using just a gazetteer would be impossible

High Level Overview of Pattern Matching

• The early components in the ANNIE pipeline produce simple annotations
  - Token, Sentence, Lookup
• These annotations have features
  - Token kind, part of speech, major type...

• Patterns in these annotations and features can suggest more complex information

What is JAPE?

• JAPE provides pattern matching in GATE
• Each JAPE rule consists of the
  - LHS which contains patterns to match
  - RHS which details the annotations (and optionally features) to be created
• JAPE rules combine to create a phase
• Phases combine to create a grammar

Learn JAPE By Example

    Phase: EMail
    Input: Token SpaceToken
    Options: control = appelt

    Macro: WORD_OR_NUMBER
    (
      ({Token.kind == word}|{Token.kind == number})
    )

    Rule: emailaddress
    Priority: 50
    (
      (WORD_OR_NUMBER)+
      ({Token.string == "."}(WORD_OR_NUMBER)+)*
      {Token.string == "@"}
      (WORD_OR_NUMBER)+
      ({Token.string == "."}(WORD_OR_NUMBER)+)*
    )
    :email -->
      :email.EMail = {rule = "emailaddress"}

Learn JAPE By Example: Input Specifications

• Each JAPE file defines a phase of the grammar.
• The header specifies how the rules within the phase will be applied to the documents
• The input to the rules within this phase is the subset of annotations specified in the header

• The rules within a single phase compete based on the control option

Learn JAPE By Example: Input Specifications

• 5 different control styles:
  - Appelt (use of priorities)
  - Once (as soon as a rule fires, matching stops)
  - First (shortest rule fires)
  - Brill (fire every rule that applies)
  - All (all possible matches)
• Appelt priority is applied in the following order
  - Longest pattern
  - Explicit priority (default = -1)
  - First defined rule
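
To make the Appelt ordering more concrete, here is a minimal sketch of two rules competing within one phase. The phase name, the rule names, and the Number and Year annotation types are hypothetical, chosen only for illustration. It also assumes the ANNIE tokeniser's Token.length feature is available.

```jape
// Hypothetical phase: names and output annotation types are illustrative only.
Phase: Numbers
Input: Token
Options: control = appelt

// Matches any single number token.
Rule: AnyNumber
Priority: 10
(
  {Token.kind == number}
):num
-->
  :num.Number = {rule = "AnyNumber"}

// Matches only four-digit number tokens. On such a token both rules
// match a span of the same length, so the higher explicit priority
// should make this rule fire instead of AnyNumber.
Rule: FourDigitYear
Priority: 20
(
  {Token.kind == number, Token.length == "4"}
):year
-->
  :year.Year = {rule = "FourDigitYear"}
```

If both rules had the same priority, the rule defined first in the file (AnyNumber) would be expected to win, following the ordering listed above.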

Learn JAPE By Example: Input Specifications

[Figure: the annotation sequence A A A matched by the pattern {A}+ under each control style (Appelt, Once, First, Brill, All)]

Learn JAPE By Example: Left Hand Side Patterns

• LHS is expressed in terms of existing annotations, and optionally features and their values
• Any annotation to be used must be included in the input header
• Any annotation not included in the input header will be ignored (e.g. whitespace)
• Each annotation is enclosed in curly braces
• Annotations may be combined using traditional Kleene operators: | * + ?
• Each pattern to be matched is enclosed in round brackets and can have a label attached

Learn JAPE By Example: Left Hand Side Patterns

• As well as matching against the presence of an annotation, JAPE rules can access annotation features
    {Token.kind == "number"}
• Features can be compared with ==, !=, >, <, =~, !~, ==~ and !=~
• Ranges can be specified
    ({Token})[1,3] or ({Token})[3]
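
The feature tests and ranges above can be combined in ordinary rules. The following is a sketch only: the phase name, rule names and output annotation types are hypothetical, and it assumes the Token features (kind, orth, string) produced by the ANNIE tokeniser.

```jape
// Hypothetical phase illustrating feature comparisons and ranges.
Phase: LhsExamples
Input: Token
Options: control = appelt

// A range: between one and three consecutive capitalised word tokens.
Rule: CapitalisedRun
(
  ({Token.kind == word, Token.orth == upperInitial})[1,3]
):run
-->
  :run.CapitalisedRun = {rule = "CapitalisedRun"}

// =~ succeeds if the regular expression is found anywhere in the
// feature value; ==~ would require the whole value to match.
Rule: ContainsDigit
(
  {Token.string =~ "[0-9]"}
):tok
-->
  :tok.HasDigit = {rule = "ContainsDigit"}
```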

Learn JAPE By Example: Left Hand Side Patterns

• Contextual information can be specified in the same way, but has no label

• Contextual information will be consumed by the rule

({Annotation1})

({Annotation2}):match

({Annotation3})

• There are other constructs that can be used. For details see the user guide.
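
As an illustration of unlabelled context, the sketch below annotates only the middle element of a three-part pattern. The phase name, rule name and the TitleMention annotation type are hypothetical. It assumes a gazetteer that produces Lookup annotations with majorType "title", as the default ANNIE lists do.

```jape
// Hypothetical phase illustrating left and right context.
Phase: Context
Input: Token Lookup
Options: control = appelt

// Only the labelled Lookup receives the new annotation. The Token
// groups before and after it are context: they carry no label, but
// they are still consumed by the match.
Rule: TitleInContext
(
  {Token.string == "said"}
)
(
  {Lookup.majorType == title}
):match
(
  {Token.kind == word}
)
-->
  :match.TitleMention = {rule = "TitleInContext"}
```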

Learn JAPE By Example: Macros

• Macros look like the LHS of a rule but they never have a label
• They are used in rules by enclosing the macro name in round brackets
• Conventional to name macros in uppercase letters

• Macros hold across an entire set of grammar phases
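
A small sketch of a macro and a rule that uses it follows. The macro name, rule name and the DollarAmount annotation type are hypothetical. The sketch assumes the tokeniser splits a figure such as 9.50 into a number token, a "." token and another number token.

```jape
// Hypothetical phase illustrating macro definition and use.
Phase: Prices
Input: Token
Options: control = appelt

// The macro has no label; it simply names a reusable pattern:
// a number, optionally followed by "." and another number.
Macro: DECIMAL_NUMBER
(
  {Token.kind == number}
  ({Token.string == "."} {Token.kind == number})?
)

// The macro is used by enclosing its name in round brackets.
Rule: DollarAmount
(
  {Token.string == "$"}
  (DECIMAL_NUMBER)
):amount
-->
  :amount.DollarAmount = {rule = "DollarAmount"}
```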

Learn JAPE By Example: Right Hand Side Annotations

• LHS and RHS are separated by -->
• Label matches that on the LHS
• Annotation to be created follows the label
    ({Annotation1}):match -->
    :match.NewAnnotName = {feature1 = value1, feature2 = value2}
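
As a sketch of a complete right hand side, the rule below sets one constant feature and copies another from the matched annotation. The phase name, the rule name and the "kind" feature name are hypothetical. It assumes a gazetteer whose Lookup annotations carry majorType "location" and a minorType feature.

```jape
// Hypothetical phase illustrating RHS feature assignment.
Phase: Locations
Input: Lookup
Options: control = appelt

Rule: LocationType
(
  {Lookup.majorType == location}
):loc
-->
  // "rule" is a constant value; "kind" is copied from the minorType
  // feature of the Lookup annotation bound to the label.
  :loc.Location = {rule = "LocationType", kind = :loc.Lookup.minorType}
```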

Learn JAPE By Example

    Phase: EMail
    Input: Token SpaceToken
    Options: control = appelt

    Macro: WORD_OR_NUMBER
    (
      ({Token.kind == word}|{Token.kind == number})
    )

    Rule: emailaddress
    Priority: 50
    (
      (WORD_OR_NUMBER)+
      ({Token.string == "."}(WORD_OR_NUMBER)+)*
      {Token.string == "@"}
      (WORD_OR_NUMBER)+
      ({Token.string == "."}(WORD_OR_NUMBER)+)*
    )
    :email -->
      :email.Email = {rule = "emailaddress"}

Learning JAPE By Example: Multiple Phases

• Grammars usually consist of several phases which are run sequentially

• A definition phase (conventionally called main.jape) lists the phases to be used, in order

• Only the definition phase needs to be loaded

• Temporary annotations may be created in early phases and used as input for later phases

• Annotations from earlier phases may need to be combined or modified
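
A definition file might look like the sketch below. The multiphase name and the listed phase file names are hypothetical. It assumes that email.jape and shares.jape sit alongside main.jape and that each defines one phase.

```jape
/*
 * main.jape (hypothetical): lists the phase files to run, in order.
 * Each entry names a .jape file without its extension.
 */
MultiPhase: Main
Phases:
email
shares
```

Loading this one file would then be expected to run the email phase first and the shares phase second.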

Learning JAPE By Example: Loading Grammars into GATE

• Load a JAPE transducer, giving it the .jape file you have created as its parameter
• Add to application and run
• Inspect results

Hands On: Extending the IE Example

• The best way to learn JAPE is to try writing rules yourself

• In the previous session you should have added a new gazetteer to look for words that might signify a change in share price

Hands On: Extending the IE Example

• Use the Lookup annotations from your gazetteer along with named entities annotated by ANNIE
  - Organization
  - Money
  - Percent
  - ...
• Annotate the documents to associate a company with a change in share price:
  - Shares in Scoot rose 9 per cent on the announcement...
  - Whitbread shares closed up 2p at 645p.
  - ...

Your Turn!

Feel Free To Refer To The User Guide And To Ask For Help

Hands On: Extending the IE Example

    Phase: Shares
    Input: Token Organization Lookup Money Percent
    Options: control = appelt

    Rule: ShareChange
    (
      {Organization}
      ({Token})[0,3]
      {Lookup.majorType == "change"}
      ({Token})[0,3]
      ({Money}|{Percent})
    ):change -->
      :change.ShareChange = {rule = "ShareChange"}