Page 1: Leveraging XLT: (Web-Enabled) Validation of Terminology Collections

Lee Gillam, University of Surrey

SALT Workshop, Antwerp

31 January 2001

Page 2: Surrey-EU project history

Terminology Extraction and Management Projects: TWB, TWBII

Management of Text Collections: TRANSTERM

Term Resources: POINTER

Terminology Validation: INTERVAL

Convergence in SALT?

Page 3: XLT ‘opportunities’

Complete terminology collections available in XML – enhancement/reuse of other collections

Large number of (multilingual) terms – difficult for humans to appraise

Terminology relates to usage – document collections highly relevant

Quantity of terms – no guarantee of quality

Page 4: (Web-Enabled) Validation

Relevant documents on the web – contextual information

Relevant documents on the ‘corporate internet’ – contextual information

Term usage in other organisations (glossaries)/as understood by Joe E.C. Taxpayer

Resource enrichment

Page 5: System Description

For a given (D)XLT collection of terminology:
– Partition collection by specific criteria
– Collect documents relevant to criteria
– Analyse documents against the partitioned collection
– Report results

Page 6: System Description

Partition collection by specific criteria:
– Use of ‘XPath’
– “Give me all terms in English”

//dxlt/text/body/termEntry/langSet[@lang = 'en']/ntig/termGrp/term/text()

– Alternative example: “Give me all subjectFields”

//dxlt/text/body/termEntry/descrip[@type='subjectField']/text() [check!]
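
The two queries above can be tried out with Python's standard `xml.etree.ElementTree`. This is only a sketch: ElementTree supports a subset of XPath with no `text()` step, so the code selects the elements and reads their `.text` instead. The sample document is a cut-down entry modelled on the DHydro sample shown later in these slides; the French term is a placeholder, and the function names are illustrative rather than part of the prototype.

```python
import xml.etree.ElementTree as ET

# Cut-down DXLT-style document. The French term is a made-up placeholder.
SAMPLE = """
<dxlt><text><body>
  <termEntry id="HR-7">
    <descrip type="subjectField">200</descrip>
    <langSet lang="en">
      <ntig><termGrp><term id="HR-7-en-1">aberration of light</term></termGrp></ntig>
    </langSet>
    <langSet lang="fr">
      <ntig><termGrp><term id="HR-7-fr-1">PLACEHOLDER-FR</term></termGrp></ntig>
    </langSet>
  </termEntry>
</body></text></dxlt>
"""

root = ET.fromstring(SAMPLE)  # root is the <dxlt> element


def terms_in_language(root, lang):
    """'Give me all terms in <lang>': one string per matching <term>."""
    path = f"./text/body/termEntry/langSet[@lang='{lang}']/ntig/termGrp/term"
    return [term.text for term in root.findall(path)]


def subject_fields(root):
    """'Give me all subjectFields': text of each entry-level subjectField."""
    path = "./text/body/termEntry/descrip[@type='subjectField']"
    return [descrip.text for descrip in root.findall(path)]


if __name__ == "__main__":
    print(terms_in_language(root, "en"))
    print(subject_fields(root))
```

A full XPath engine would accept the slide's expressions unchanged; the relative `./text/body/...` form is used here only because the search starts from the `<dxlt>` root element.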

Page 7: System Description

Collect documents relevant to criteria
– For terms, try internet/intranet searching
– For subject field classifications, classification documents will be relevant
– For definitions, comparisons with other glossaries may provide useful validating information
– …
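
For the term-searching case, one way to prepare the document collection step is to turn each term into a phrase query. The sketch below only builds the query URLs and does not fetch anything. The endpoint `https://search.example.org/search`, the `q` parameter, and the use of double quotes for phrase search are all assumptions; the slides do not say which search engine or syntax the prototype used.

```python
from urllib.parse import urlencode

# Hypothetical search endpoint; substitute an internet or intranet search service.
SEARCH_ENDPOINT = "https://search.example.org/search"


def search_url(term, endpoint=SEARCH_ENDPOINT):
    """Build a phrase-query URL for one term.

    Wrapping the term in double quotes is an assumed phrase-search syntax.
    """
    return endpoint + "?" + urlencode({"q": f'"{term}"'})


def search_urls(terms, endpoint=SEARCH_ENDPOINT):
    """Map each term in the partition to the URL that would be queried."""
    return {term: search_url(term, endpoint) for term in terms}


if __name__ == "__main__":
    for term, url in search_urls(["circuit switching", "packet switching"]).items():
        print(term, "->", url)
```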

Page 8: System Description

Analyse documents against the partitioned collection
– Are the terms contained in the documents?
– Are the terms in the documents now used as parts of compounds?
– What are the contexts in which the terms are used?
– Are there a number of potential other definitions for a particular term?
– Does this fit in with a specific classification?
– …
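
The first three questions above can be approximated with plain string matching. The following is a deliberately naive sketch, not the prototype's method: it counts case-insensitive whole-phrase matches, treats "term plus the next non-stopword" as a candidate compound, and returns a fixed-width window of text as context. The stopword list and the sample sentences are invented for illustration.

```python
import re
from collections import Counter

# Tiny illustrative stopword list; a real system would need a fuller one.
STOPWORDS = {"a", "an", "and", "in", "is", "of", "the", "to"}


def _phrase(term):
    """Case-insensitive whole-phrase pattern for a (possibly multiword) term."""
    return re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)


def term_frequency(term, texts):
    """Are the terms contained in the documents? Count all occurrences."""
    pattern = _phrase(term)
    return sum(len(pattern.findall(text)) for text in texts)


def candidate_compounds(term, texts):
    """Is the term used as part of a compound? Naively: term + following word."""
    pattern = re.compile(
        r"\b" + re.escape(term) + r"\s+([A-Za-z-]+)", re.IGNORECASE
    )
    counts = Counter()
    for text in texts:
        for match in pattern.finditer(text):
            following = match.group(1).lower()
            if following not in STOPWORDS:
                counts[f"{term} {following}"] += 1
    return counts


def contexts(term, texts, width=30):
    """What are the contexts? Return `width` characters either side of each hit."""
    pattern = _phrase(term)
    found = []
    for text in texts:
        for match in pattern.finditer(text):
            start = max(0, match.start() - width)
            end = min(len(text), match.end() + width)
            found.append(text[start:end])
    return found


if __name__ == "__main__":
    docs = [
        "Circuit switching is older than packet switching.",
        "A circuit switching network reserves a path.",
    ]
    print(term_frequency("circuit switching", docs))
    print(candidate_compounds("circuit switching", docs))
    print(contexts("circuit switching", docs, width=10))
```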

Page 9: System Description

Report Results
– Term frequency – Zero?
– Potential compounds
– Contexts
– Definitions
– Correctly classified
– …
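
As an illustration of the term-frequency part of the report, the sketch below turns a frequency into a one-line message and singles out the zero case. The wording of the non-zero message imitates the result shown on the later "CIRCUIT SWITCHING" slide; the wording of the zero case and the function name are assumptions.

```python
def report_line(term, frequency):
    """Format one report line for a term and its frequency in the collected texts."""
    label = term.upper()
    if frequency == 0:
        # Zero frequency: the term was never seen, so flag it for review.
        return f"{label}: not found in collected texts. Review?"
    times = "time" if frequency == 1 else "times"
    return f"{label}: found in collected texts {frequency} {times}. Valid term?"


if __name__ == "__main__":
    for term, freq in {"circuit switching": 43, "aberration of light": 0}.items():
        print(report_line(term, freq))
```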

Page 10: Prototype prototype

[Screenshot of the prototype. Callout labels: ‘XML’, XML attributes, ‘Results Area’, Indicative Actions]

Page 11: Prototype prototype

Page 12: Prototype prototype

Indicative XPaths

Page 13: Prototype prototype

Page 14: Prototype prototype

Page 15: Prototype prototype

Recall this term…

Page 16: Prototype prototype

CIRCUIT SWITCHING

Found in collected texts 43 times. Valid term?

PACKET SWITCHING also exists in this resource.
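
The slide does not say how the prototype decided that PACKET SWITCHING was worth mentioning alongside CIRCUIT SWITCHING. One plausible heuristic, shown here purely as an assumption, is to list other terms in the collection that share the same final word (the head of the compound).

```python
def terms_sharing_head(term, collection):
    """Return other terms in `collection` whose last word matches that of `term`.

    Comparison is case-insensitive; the shared-head heuristic is an assumption,
    not a description of the prototype.
    """
    words = term.lower().split()
    if not words:
        return []
    head = words[-1]
    related = []
    for other in collection:
        other_words = other.lower().split()
        if other_words and other.lower() != term.lower() and other_words[-1] == head:
            related.append(other)
    return related


if __name__ == "__main__":
    terms = ["circuit switching", "packet switching", "aberration of light"]
    print(terms_sharing_head("circuit switching", terms))
```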

Page 17: DHydro Sample

<termEntry id="HR-7">
  + <transacGrp>
  <descrip type="subjectField">200</descrip>
  + <langSet lang="fr">
  <langSet lang="en">
    <descripGrp>
      <descrip type="definition">The apparent displacement in position of a heavenly body caused by the combination of the velocity of light and that of an observer on the surface of the earth. Aberration of light due to the rotation of the earth on its axis is termed diurnal aberration. That due to the revolution of the earth around the sun is termed annual aberration.</descrip>
    </descripGrp>
    <ntig>
      <termGrp>
        <term id="HR-7-en-1">aberration of light</term>
        <termNote type="termType">main entry</termNote>
        <termNote type="partOfSpeech">n</termNote>
      </termGrp>
    </ntig>
  </langSet>
  + <langSet lang="es">
</termEntry>
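
To compare a definition with other glossaries, the relevant fields first have to be pulled out of an entry like the one above. The sketch below does this with `xml.etree.ElementTree`. The collapsed elements (marked `+` in the browser view) are omitted, the definition is cut to its first sentence, and `summarise_entry` is an illustrative name, not part of the prototype.

```python
import xml.etree.ElementTree as ET

# The expanded parts of the DHydro sample; collapsed elements are left out
# and the definition is shortened to its first sentence.
ENTRY = """
<termEntry id="HR-7">
  <descrip type="subjectField">200</descrip>
  <langSet lang="en">
    <descripGrp>
      <descrip type="definition">The apparent displacement in position of a heavenly body caused by the combination of the velocity of light and that of an observer on the surface of the earth.</descrip>
    </descripGrp>
    <ntig>
      <termGrp>
        <term id="HR-7-en-1">aberration of light</term>
        <termNote type="termType">main entry</termNote>
        <termNote type="partOfSpeech">n</termNote>
      </termGrp>
    </ntig>
  </langSet>
</termEntry>
"""


def summarise_entry(entry, lang):
    """Collect term, definition, subject field and term notes for one language.

    Returns None when the entry has no langSet (or no termGrp) for `lang`.
    """
    lang_set = entry.find(f"./langSet[@lang='{lang}']")
    if lang_set is None:
        return None
    term_grp = lang_set.find("./ntig/termGrp")
    if term_grp is None:
        return None
    return {
        "id": entry.get("id"),
        "subjectField": entry.findtext("./descrip[@type='subjectField']"),
        "term": term_grp.findtext("term"),
        "definition": lang_set.findtext("./descripGrp/descrip[@type='definition']"),
        "notes": {note.get("type"): note.text for note in term_grp.findall("termNote")},
    }


if __name__ == "__main__":
    print(summarise_entry(ET.fromstring(ENTRY), "en"))
```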

Page 18: Lenoch (GMT)

<struct type="classification">
  <feat type="name">AD2</feat>
  <feat type="documentation">public and private organisations</feat>
  <feat type="subclass-of">AD</feat>
</struct>
<struct type="classification">
  <feat type="name">AD3</feat>
  <feat type="documentation">publications and documentary search</feat>
  <feat type="subclass-of">AD</feat>
</struct>
<struct type="classification">
  <feat type="name">AD31</feat>
  <feat type="documentation">documentation and information systems</feat>
  <feat type="subclass-of">AD3</feat>
</struct>
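
Comparing this GMT fragment with the XOL version of the same Lenoch classes suggests a mechanical mapping: each `<struct type="classification">` becomes a `<class>`, and each `<feat>` becomes an element named after its `type` attribute. The sketch below applies that mapping to one class. It is inferred only from the two slides, and `gmt_struct_to_xol` is an illustrative name.

```python
import xml.etree.ElementTree as ET

GMT_STRUCT = """
<struct type="classification">
  <feat type="name">AD2</feat>
  <feat type="documentation">public and private organisations</feat>
  <feat type="subclass-of">AD</feat>
</struct>
"""


def gmt_struct_to_xol(struct):
    """Turn one GMT classification struct into an XOL-style <class> element."""
    xol_class = ET.Element("class")
    for feat in struct.findall("feat"):
        # The feat's type attribute becomes the element name; its text is kept.
        child = ET.SubElement(xol_class, feat.get("type"))
        child.text = feat.text
    return xol_class


if __name__ == "__main__":
    converted = gmt_struct_to_xol(ET.fromstring(GMT_STRUCT))
    print(ET.tostring(converted, encoding="unicode"))
```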

Page 19: Lenoch (XOL)

<class>
  <name>AD2</name>
  <documentation>public and private organisations</documentation>
  <subclass-of>AD</subclass-of>
</class>
<class>
  <name>AD3</name>
  <documentation>publications and documentary search</documentation>
  <subclass-of>AD</subclass-of>
</class>
<class>
  <name>AD31</name>
  <documentation>documentation and information systems</documentation>
  <subclass-of>AD3</subclass-of>
</class>
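
A classification in this form can support the "correctly classified" check by making the class hierarchy explicit. The sketch below reads the three classes above and walks the `subclass-of` links upwards. The `<classes>` wrapper element is added only so the fragment parses as a single document, and the function names are illustrative.

```python
import xml.etree.ElementTree as ET

# The three Lenoch classes from the slide, wrapped in a hypothetical root element.
XOL = """
<classes>
  <class><name>AD2</name><documentation>public and private organisations</documentation><subclass-of>AD</subclass-of></class>
  <class><name>AD3</name><documentation>publications and documentary search</documentation><subclass-of>AD</subclass-of></class>
  <class><name>AD31</name><documentation>documentation and information systems</documentation><subclass-of>AD3</subclass-of></class>
</classes>
"""


def parent_map(root):
    """Map each class name to the name of its direct superclass."""
    parents = {}
    for cls in root.findall("class"):
        fields = {child.tag: child.text for child in cls}
        if "name" in fields and "subclass-of" in fields:
            parents[fields["name"]] = fields["subclass-of"]
    return parents


def ancestors(name, parents):
    """Follow subclass-of links upwards, nearest ancestor first."""
    chain = []
    seen = {name}
    while name in parents:
        name = parents[name]
        if name in seen:  # guard against cyclic data
            break
        seen.add(name)
        chain.append(name)
    return chain


if __name__ == "__main__":
    parents = parent_map(ET.fromstring(XOL))
    print(ancestors("AD31", parents))
```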

Page 20: Outlook

Initial Results show promise for Validation of Terminological Resources
– significant development work is still required.
– XPath generation needs tailoring to specific formats (DXLT), but provides useful power
– Development to merge ‘Web glossaries’ – pre-terminological validation stage

Provide a powerful prototype of the capabilities for the (Web-Enabled) Validation of Terminology Collections – with DXLT-related formats.

DXLT as the de facto standard format for Terminology Validation?