Thai-language.com Glenn Slayden October 14, 2009.
![Page 1: Thai-language.com Glenn Slayden October 14, 2009.](https://reader034.fdocuments.us/reader034/viewer/2022052509/56649d625503460f94a441cc/html5/thumbnails/1.jpg)
thai-language.com
Glenn Slayden
October 14, 2009
![Page 2: Thai-language.com Glenn Slayden October 14, 2009.](https://reader034.fdocuments.us/reader034/viewer/2022052509/56649d625503460f94a441cc/html5/thumbnails/2.jpg)
Agenda
• Background and history
• Site surface demonstration
• Database ontology
• Database technology
• Data entry demonstration
• Future directions
• Q&A: throughout, please
![Page 3: Thai-language.com Glenn Slayden October 14, 2009.](https://reader034.fdocuments.us/reader034/viewer/2022052509/56649d625503460f94a441cc/html5/thumbnails/3.jpg)
Overarching Motivation
• Long-term objectives:
– Increase linguistic rigor
– Publish any new work
– Maintain popular accessibility
– Build community
![Page 4: Thai-language.com Glenn Slayden October 14, 2009.](https://reader034.fdocuments.us/reader034/viewer/2022052509/56649d625503460f94a441cc/html5/thumbnails/4.jpg)
Historical Parchment - 1997
![Page 5: Thai-language.com Glenn Slayden October 14, 2009.](https://reader034.fdocuments.us/reader034/viewer/2022052509/56649d625503460f94a441cc/html5/thumbnails/5.jpg)
More Parchment - 2001
![Page 6: Thai-language.com Glenn Slayden October 14, 2009.](https://reader034.fdocuments.us/reader034/viewer/2022052509/56649d625503460f94a441cc/html5/thumbnails/6.jpg)
Site Demonstration
![Page 7: Thai-language.com Glenn Slayden October 14, 2009.](https://reader034.fdocuments.us/reader034/viewer/2022052509/56649d625503460f94a441cc/html5/thumbnails/7.jpg)
Database? What Database?
• How big is a monolingual dictionary?
• 100,000 words x 300 bytes/entry = 30 MB
• How much memory in a modern server? 32 GB.
• That’s about 1/10th of 1% (.00094)
• SQL? MySQL? PostgreSQL? Not indicated.
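The arithmetic on this slide can be checked with a short sketch. It assumes about 300 bytes per entry, which is the value that makes the slide's 30 MB total and .00094 ratio consistent with each other; the variable names are illustrative only.

```python
# Back-of-envelope check of the slide's figures.
# ASSUMPTION: ~300 bytes per entry, chosen so the total matches the slide's 30 MB.
WORDS = 100_000
BYTES_PER_ENTRY = 300
SERVER_RAM_BYTES = 32 * 10**9  # 32 GB, using decimal units

dictionary_bytes = WORDS * BYTES_PER_ENTRY
fraction_of_ram = dictionary_bytes / SERVER_RAM_BYTES

print(f"dictionary size: {dictionary_bytes / 10**6:.0f} MB")
print(f"fraction of RAM: {fraction_of_ram:.5f}")
```

Under these assumptions the dictionary occupies a little under 0.1% of server memory, which is the point the slide makes against reaching for a database server.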
![Page 8: Thai-language.com Glenn Slayden October 14, 2009.](https://reader034.fdocuments.us/reader034/viewer/2022052509/56649d625503460f94a441cc/html5/thumbnails/8.jpg)
Case Study
October 13, 2009 – 64-bit web server – 32 GB RAM
![Page 9: Thai-language.com Glenn Slayden October 14, 2009.](https://reader034.fdocuments.us/reader034/viewer/2022052509/56649d625503460f94a441cc/html5/thumbnails/9.jpg)
Server Memory Utilization
n.b. this entire pie chart represents 10% of total memory
![Page 10: Thai-language.com Glenn Slayden October 14, 2009.](https://reader034.fdocuments.us/reader034/viewer/2022052509/56649d625503460f94a441cc/html5/thumbnails/10.jpg)
In-memory is the way to go
• For performance
• For ease and speed of development
• Easy refactoring
• LINQ – C# “language-integrated query”
• Have a flexible and powerful object-model without worrying about relational mapping
• Completely avoid OR/M (object-relational mapping) “impedance mismatch” issues
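As a rough illustration of the in-memory approach, the sketch below queries plain objects directly, with no mapping layer. The site itself is described as using C# and LINQ; this is only a Python analogue, and the class names, fields, and placeholder data are all hypothetical, not the site's actual object model.

```python
from dataclasses import dataclass, field
from typing import List


# HYPOTHETICAL object model, for illustration only.
@dataclass
class Sense:
    gloss: str
    part_of_speech: str


@dataclass
class Entry:
    headword: str
    senses: List[Sense] = field(default_factory=list)


# The whole "database" is an ordinary in-memory collection (placeholder data).
entries = [
    Entry("word-a", [Sense("first gloss", "noun")]),
    Entry("word-b", [Sense("second gloss", "verb"), Sense("third gloss", "noun")]),
    Entry("word-c", [Sense("fourth gloss", "verb")]),
]


def headwords_with_pos(all_entries: List[Entry], pos: str) -> List[str]:
    """Return sorted headwords that have at least one sense with the given part of speech."""
    return sorted(
        e.headword
        for e in all_entries
        if any(s.part_of_speech == pos for s in e.senses)
    )


nouns = headwords_with_pos(entries, "noun")
verbs = headwords_with_pos(entries, "verb")
print(nouns, verbs)
```

Because the query runs over the same objects the rest of the program uses, renaming a field or nesting a new object type is an ordinary refactoring rather than a schema migration.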
![Page 11: Thai-language.com Glenn Slayden October 14, 2009.](https://reader034.fdocuments.us/reader034/viewer/2022052509/56649d625503460f94a441cc/html5/thumbnails/11.jpg)
thai-language.com Ontology
• Disclaimer and warning
– Internal names of programming objects are not (any longer) intended to have any relationship to corresponding linguistic terms. On the following slides, please consider these names to be opaque monikers.
![Page 12: Thai-language.com Glenn Slayden October 14, 2009.](https://reader034.fdocuments.us/reader034/viewer/2022052509/56649d625503460f94a441cc/html5/thumbnails/12.jpg)
thai-language.com Ontology
These colors correspond (roughly) to data-entry screen colors in DBEdit
![Page 13: Thai-language.com Glenn Slayden October 14, 2009.](https://reader034.fdocuments.us/reader034/viewer/2022052509/56649d625503460f94a441cc/html5/thumbnails/13.jpg)
The most basic
![Page 14: Thai-language.com Glenn Slayden October 14, 2009.](https://reader034.fdocuments.us/reader034/viewer/2022052509/56649d625503460f94a441cc/html5/thumbnails/14.jpg)
Lucky Decision
• ...that turned out to be incredibly valuable:
– Heterogeneous objects are assigned ID numbers within mutually exclusive ranges
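One consequence of mutually exclusive ID ranges is that an object's kind can be recovered from its ID alone. The sketch below shows the idea; the range boundaries and kind names are hypothetical, since the slides do not give the actual values.

```python
from bisect import bisect_right

# HYPOTHETICAL ranges and kind names; each range starts at the listed ID
# and runs up to (but not including) the next start.
RANGES = [
    (1, "entry"),
    (1_000_000, "sense"),
    (2_000_000, "example"),
]
STARTS = [start for start, _ in RANGES]


def kind_of(object_id: int) -> str:
    """Map an ID to the kind of object whose range contains it."""
    index = bisect_right(STARTS, object_id) - 1
    if index < 0:
        raise ValueError(f"ID {object_id} is below every known range")
    return RANGES[index][1]


print(kind_of(42), kind_of(1_000_000), kind_of(2_500_000))
```

With this layout a single integer can serve as a reference to any kind of object, so links between heterogeneous objects need no separate type tag.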
![Page 15: Thai-language.com Glenn Slayden October 14, 2009.](https://reader034.fdocuments.us/reader034/viewer/2022052509/56649d625503460f94a441cc/html5/thumbnails/15.jpg)
Scary Picture with Clouds In It
![Page 16: Thai-language.com Glenn Slayden October 14, 2009.](https://reader034.fdocuments.us/reader034/viewer/2022052509/56649d625503460f94a441cc/html5/thumbnails/16.jpg)
Data Entry Demonstration
![Page 17: Thai-language.com Glenn Slayden October 14, 2009.](https://reader034.fdocuments.us/reader034/viewer/2022052509/56649d625503460f94a441cc/html5/thumbnails/17.jpg)
Future directions
• Track provenance of entries and changes
• Separate out meta-information in English senses
• Move towards community curatorship while maintaining asset value
– Requires reputation-granting authority
• Refine and formalize dictionary statement of purpose (i.e. to prevent hijacking)
![Page 18: Thai-language.com Glenn Slayden October 14, 2009.](https://reader034.fdocuments.us/reader034/viewer/2022052509/56649d625503460f94a441cc/html5/thumbnails/18.jpg)
Technology Changes
• In 2009, optimizing a language dictionary database for size is not necessary
• Detailed fields should be generously deployed
• Exception to the in-memory model:
– Comprehensive change version tracking may warrant database storage
– This is necessary for community curatorship
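A minimal sketch of what database-backed change tracking could look like, using Python's built-in `sqlite3` module. The table layout, column names, and sample values are assumptions made for illustration; the slides do not specify a schema or a database product.

```python
import sqlite3

# An in-memory SQLite database keeps the sketch self-contained;
# a real deployment would use persistent storage.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE change_log (
        revision   INTEGER PRIMARY KEY AUTOINCREMENT,
        object_id  INTEGER NOT NULL,
        field_name TEXT    NOT NULL,
        old_value  TEXT,
        new_value  TEXT,
        author     TEXT    NOT NULL
    )
    """
)


def record_change(db, object_id, field_name, old_value, new_value, author):
    """Append one change record and return its revision number."""
    cursor = db.execute(
        "INSERT INTO change_log (object_id, field_name, old_value, new_value, author) "
        "VALUES (?, ?, ?, ?, ?)",
        (object_id, field_name, old_value, new_value, author),
    )
    db.commit()
    return cursor.lastrowid


def history(db, object_id):
    """Return all recorded changes for one object, oldest first."""
    return db.execute(
        "SELECT revision, field_name, old_value, new_value, author "
        "FROM change_log WHERE object_id = ? ORDER BY revision",
        (object_id,),
    ).fetchall()


# HYPOTHETICAL sample edits.
first = record_change(conn, 42, "gloss", "old gloss", "new gloss", "editor-1")
second = record_change(conn, 42, "gloss", "new gloss", "newer gloss", "editor-2")
record_change(conn, 77, "gloss", None, "initial gloss", "editor-1")

print(history(conn, 42))
```

Records are only ever appended, so the log preserves who changed what and in which order, which is the kind of provenance that community curatorship would rely on.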
![Page 19: Thai-language.com Glenn Slayden October 14, 2009.](https://reader034.fdocuments.us/reader034/viewer/2022052509/56649d625503460f94a441cc/html5/thumbnails/19.jpg)
An integrated DELPH-IN style computational-analytical grammar
• Associate a rigorous HPSG feature structure with each sense
• Display MRS and tree on dictionary page for compounds and sentences.
• Ability to designate gold standard parse trees and attestation provenance
• Live interface for LKB/PET-style parser to provide arbitrary parsing
![Page 20: Thai-language.com Glenn Slayden October 14, 2009.](https://reader034.fdocuments.us/reader034/viewer/2022052509/56649d625503460f94a441cc/html5/thumbnails/20.jpg)
Thanks for Coming!