XML and General Dutch Dictionary (ANW)

Peter van der Kamp

www.inl.nl

kamp@inl.nl

Van der Kamp, Lexical databases and digital tools, April 29th, 2005

Topics

• Characteristics

• Schema

• XML Dictionary Editor

• Problems to be solved

Characteristics

Online dictionary, no printed version

Dutch language (incl. Flanders) from 1970 to 2018

Based on a corpus of 100 million words

Elaborate microstructure

XML

Schema characteristics

• Divided into 12 subschemas

• Currently all elements may occur zero or more times, except headword

• Currently 186 atomic elements

• Many enumerations (378, to be used as controlled vocabulary)

• Some elements allowed at different levels
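The two schema rules above (a required headword, and enumerations used as controlled vocabulary) can be illustrated with a minimal Python sketch. The element names `entry`, `headword` and `pos`, the allowed values, and the reading of "except headword" as "exactly one headword" are assumptions made for illustration; the slides do not list the actual ANW element names or enumerations.

```python
import xml.etree.ElementTree as ET

# Hypothetical controlled vocabulary for a hypothetical <pos> element;
# the real ANW enumerations are not given in the slides.
ALLOWED_POS = {"noun", "verb", "adjective"}


def check_entry(entry):
    """Return a list of problems found in one <entry> element."""
    problems = []
    # Assumption: the headword must occur exactly once; all else is optional.
    headwords = entry.findall("headword")
    if len(headwords) != 1 or not (headwords[0].text or "").strip():
        problems.append("entry needs exactly one non-empty <headword>")
    # Enumerated elements may only carry values from the predefined list.
    for pos in entry.iter("pos"):
        value = (pos.text or "").strip()
        if value not in ALLOWED_POS:
            problems.append(f"<pos> value {value!r} is not in the controlled vocabulary")
    return problems


good = ET.fromstring("<entry><headword>fiets</headword><pos>noun</pos></entry>")
bad = ET.fromstring("<entry><pos>nuon</pos></entry>")
print(check_entry(good))
print(check_entry(bad))
```

In a real setup these constraints would normally live in the XML Schema itself and be enforced by a validating editor; the sketch only shows what the constraints amount to.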

Schema

[Schema diagram; recoverable node labels: Entry, PoS, Sense]

XML Dictionary Editor

User requirements:

• Don’t want to work with tags

• Tags invisible

XML Dictionary Editor (cont’d)

User requirements:

• Form-like input

• Use of predefined lists (controlled vocabulary)

XML Dictionary Editor (cont’d)

User requirements:

• Inserting, adding and removing elements must be easy

XML Dictionary Editor (cont’d)

User requirements:

• Hide/show elements

Technical requirements:

• Subschema enabled

XML Dictionary Editor (cont’d)

XML editor, but…

…which one?

XMLWriter

XML Dictionary Editor (cont’d)

Currently the best possible solution:

Authentic (free XML content editor from Altova)

StyleVision (e-forms and stylesheet designer from Altova)

(http://www.altova.com)

XML Dictionary Editor: problems

Problem: hiding an element currently amounts to deleting it

Hiding elements is important because entries are large

Solution (to be implemented):

• Extra element <hide> in schema

• Checkbox as ‘data entry device’

• When unchecked: perform hide

Disadvantage:

<hide> is noise in dictionary entry
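The proposed `<hide>` element can be sketched in Python as follows. This is only an illustration under stated assumptions: `<hide>` is modelled as a child element whose text is `true` when the checkbox is unchecked, and the element names `entry`, `sense` and `definition` are hypothetical. The second function shows one possible way to deal with the noise: strip every `<hide>` from a copy of the entry before it is published.

```python
import copy
import xml.etree.ElementTree as ET


def is_hidden(element):
    """True if the element has a <hide> child set to 'true' (assumed encoding)."""
    flag = element.find("hide")
    return flag is not None and (flag.text or "").strip() == "true"


def visible_children(parent):
    """Children an editor form would display: no <hide> flags, no hidden elements."""
    return [child for child in parent
            if child.tag != "hide" and not is_hidden(child)]


def strip_hide(root):
    """Return a copy of the tree without any <hide> elements; the original is untouched."""
    clean = copy.deepcopy(root)
    # Materialise the element list first so removals do not disturb iteration.
    for parent in list(clean.iter()):
        for flag in parent.findall("hide"):
            parent.remove(flag)
    return clean


entry = ET.fromstring(
    "<entry><headword>fiets</headword>"
    "<sense><hide>true</hide><definition>tweewieler</definition></sense>"
    "<sense><hide>false</hide><definition>rijwiel</definition></sense>"
    "</entry>"
)
print([child.tag for child in visible_children(entry)])
print(ET.tostring(strip_hide(entry), encoding="unicode"))
```

Hidden content stays in the stored entry, so hiding no longer deletes anything; the price is the extra `<hide>` markup in the working copy.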

XML Dictionary Editor: problems

Problem: visualize difference between container elements and atomic elements.

Current implementation requires some schema knowledge
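One way to make the distinction visible is to mark the two kinds of element differently in an outline view. The Python sketch below is a rough illustration, not the Authentic implementation: it treats any element with child elements as a container and any other element as atomic, and the element names are hypothetical.

```python
import xml.etree.ElementTree as ET


def outline(element, depth=0):
    """Yield one line per element, marking containers and atomic elements differently."""
    indent = "  " * depth
    if len(element):  # has child elements: treated as a container
        yield f"{indent}[+] {element.tag}"
        for child in element:
            yield from outline(child, depth + 1)
    else:  # no child elements: treated as atomic
        yield f"{indent}- {element.tag}: {(element.text or '').strip()}"


entry = ET.fromstring(
    "<entry><headword>fiets</headword>"
    "<sense><definition>rijwiel</definition></sense></entry>"
)
for line in outline(entry):
    print(line)
```

Because almost every element is optional, an empty container would be misclassified as atomic here; a real tool would have to consult the schema rather than the instance document.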

Conclusion / future work

Developing forms is easy

Current implementation is satisfactory

Database solution (relational vs. xml)

Retrieval

Easy use of (X)query language
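As a rough idea of what such retrieval could look like, the sketch below uses the limited XPath subset in Python's standard library as a stand-in for XQuery. The `dictionary`, `entry`, `headword` and `pos` element names and the sample data are assumptions for illustration only.

```python
import xml.etree.ElementTree as ET

# Tiny hypothetical dictionary; not actual ANW data.
dictionary = ET.fromstring(
    "<dictionary>"
    "<entry><headword>fiets</headword><pos>noun</pos></entry>"
    "<entry><headword>fietsen</headword><pos>verb</pos></entry>"
    "<entry><headword>huis</headword><pos>noun</pos></entry>"
    "</dictionary>"
)


def headwords_with_pos(root, pos):
    """Return the headwords of all entries whose <pos> child equals the given value."""
    matches = root.findall(f"entry[pos='{pos}']")
    return [entry.findtext("headword") for entry in matches]


print(headwords_with_pos(dictionary, "noun"))
```

A full XQuery engine or a native XML database would support far richer queries than this subset.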
