Sofia April 27. 2006
Language Technology in the Information Society - Hot issues and open questions -
Walther v.Hahn
University of Hamburg • Computer Science Department
Natural Language Systems Group
WWW: http://nats-www.informatik.uni-hamburg.de/view/User/WaltherVHahn
E-Mail: vhahn@informatik.uni-hamburg.de
Language Processing is more than a Text Processor
Text Processing
What is „Real“ Text Technology?
Diagram labels: Text Processing, Gestures, Corpora, Images, Tools, Web Applications, Cultures, Linguistics, Workflow, Technologies, Domains, User profiles, Ontologies, Languages, Models and Society, System Type, Methods
Language Technology is only useful if …
• „text“ comprises spoken and written utterances of the language (the linguistic definition):
– Borders between the two types blur in WWW texts.
• the processes have access to the semantics of texts:
– the meaning structure with lexical semantics and syntax, including reference,
• the processes have access to the pragmatics of texts:
– the intended action (plan) behind the utterance
Focus of Interest in Texts
• We are interested in the semantics and pragmatics of utterances, not only in linguistic features or logical description
• Semantics and pragmatics depend on the users' intentions, not only on literal, word-centered contents
• The content, not the wording, counts
– paraphrases are equivalent
• Utterances are planned, hearer-dependent actions
– only helpful answers count
• Utterances are situated
– reference is crucial
• The coherence of utterances reveals the whole truth
– Without deictic resolution a system is blind
Comparing and Translating Languages
Multilingual text processes of any sort are never a linguistic problem alone; you can write rules about languages, but not about reference, ontologies or work flows. Many processing problems are still unsolved:
• Language is ambiguous
– I saw a man with a hat vs. I saw a man with a telescope
• Differences among languages
– Няколко момчета и момичета отидоха на кино, но само на двама от тях им хареса филма.
– Some boys and girls went to the cinema, but only two of them (the boys!) liked the movie.
• Writing
– It is even difficult to recognize names in different writings (Galja or Galia?), depending on the transliteration or conceptualization
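The name-matching problem mentioned above (Galja or Galia?) can be illustrated with a small sketch. It is only a toy under stated assumptions: the table of spelling variants is made up for this example and covers just two letters, and the function names `name_key` and `same_name` are hypothetical.

```python
import re

# Assumed, hand-made table: Latin spellings that are often used for the
# same Cyrillic letter. It is NOT a complete transliteration standard.
VARIANTS = [
    (r"ja|ya|ia", "ja"),  # spellings of the letter "я"
    (r"ju|yu|iu", "ju"),  # spellings of the letter "ю"
]


def name_key(name: str) -> str:
    """Reduce a Latin-script name to a rough comparison key."""
    key = name.lower()
    for pattern, canonical in VARIANTS:
        key = re.sub(pattern, canonical, key)
    return key


def same_name(a: str, b: str) -> bool:
    """Treat two spellings as the same name if their keys agree."""
    return name_key(a) == name_key(b)


if __name__ == "__main__":
    print(same_name("Galja", "Galia"))  # both reduce to the key "galja"
    print(same_name("Galja", "Gala"))   # keys differ
```

Such a key merges spellings blindly and will also conflate names that are really different, which is exactly why the slide calls name recognition difficult: the right decision depends on knowing the source language and the transliteration convention, not on the letters alone.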
Multilingual Communication
10 Emerging Technologies That Will Change Your World (Technology Review, Feb. 2004)
– Universal Translation
– Synthetic Biology
– Nanowires
– Bayesian Machine Learning
– T-Rays
– Distributed Storage
– RNA Interference
– Power Grid Control
– Microfluidic Optical Fibers
– Personal Genomics
It is hopeless to provide FAHQMT (fully automatic high-quality machine translation) of any text, of any text type, for any purpose, in any situation and for any person by one single system, even if only for the 380 European language pairs. But the need is real. What is the solution? Who is responsible for what?
Modern Methods
The case of Machine Translation shows:
• Homogeneous systems have systematic problems
– Rule based (problem: crashes with unknown phenomena of any sort)
– Statistical (problem: the millions of rare examples, extralinguistic parameters)
• Hybrid systems
– A) hybrid technologies: statistical / example based / rule based / menu based etc. translation
– B) hybrid in/out channels: text, sound, images, gestures, lip reading
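The idea of hybrid technologies can be sketched as a chain of components in which a later method takes over when an earlier one gives up. This is a minimal toy, not the architecture of any actual system: the lexicon, the example memory and all function names are invented for illustration.

```python
from typing import Callable, Optional, Sequence, Tuple

# Made-up toy data for illustration only.
RULE_LEXICON = {"guten": "good", "morgen": "morning"}
EXAMPLE_MEMORY = {"wie geht's": "how are you"}


def rule_based(text: str) -> Optional[str]:
    """Word-by-word rule component: gives up on any unknown word."""
    words = text.lower().split()
    if all(word in RULE_LEXICON for word in words):
        return " ".join(RULE_LEXICON[word] for word in words)
    return None


def example_based(text: str) -> Optional[str]:
    """Example component: only knows whole utterances it has seen."""
    return EXAMPLE_MEMORY.get(text.lower())


def hybrid_translate(
    text: str,
    components: Sequence[Callable[[str], Optional[str]]] = (rule_based, example_based),
) -> Tuple[Optional[str], str]:
    """Try each component in order; report which one produced the result."""
    for component in components:
        result = component(text)
        if result is not None:
            return result, component.__name__
    return None, "none"


if __name__ == "__main__":
    for utterance in ("Guten Morgen", "Wie geht's", "Tschüss"):
        print(utterance, "->", hybrid_translate(utterance))
```

The rule component alone fails on the second utterance and the example component alone fails on the first; only the combination covers both, and the third utterance shows that a hybrid still has gaps.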
Real-Life Background of Texts
• Text analysis is crucial for cooperation in a vast variety of communication types:
• Oral communication (politics, economy, tourism, private interests, ...)
• Written communication (publications, lyrics, e-mail, tourism, bills, legal texts, ...)
• Text and images (WWW, catalogues, museums, menus, advertising of any type, ...)
• Text and gestures (road information, slander, tourism, explanations of any sort, ...)
• Text and facial expression (slander, irony, first aid, ...)
Why natural language at all?
1. Coverage
NL is complete. Whatever you want to express can, in principle, be expressed in NL.
2. Vagueness
The use of vague expressions is a highly efficient method in human interaction and the basis for innovative thinking. It must be tackled by language technology in a pragmatic way, not only by logical mechanisms.
3. Abbreviations
In most realistic and complex query settings, NL is shorter than formal languages. NL is coherent over whole paragraphs without repeating given information again and again. Elliptical expressions are unambiguous by virtue of the situation and shared knowledge.
The Value of „Soft“ Fields
• Computer application (and thus computer science) is shifting in some areas from numerical operations to intentional support of human action. The specification, however, is conveyed exclusively by texts.
• This requires the understanding of utterances as communicative and cooperative problem solving (J. L. Austin's „How to Do Things with Words“).
• Introducing computer technology in real life increasingly requires psychology, linguistics, sociology and other humanities instead of mathematics, logic and statistics, which today are included anyway in tools, class libraries or plug-ins.
Metalanguage and Evaluation
• Any occurrence of natural language is a mixture of object language and metalanguage. Error messages by the machine or by the user have to cope with both levels. This is why users cannot correct a system by giving natural language examples or helpful hints. Moreover, very often the level distinction is implicit („rubbish!“).
• Evaluation modules for quality checks within hybrid systems cannot be better than the modules of the system itself. Otherwise a programmer would change the system's operation directly.
Verbmobil
International Cooperation
• Language processing is always a mixture of transnational interests, languages and (commercial, political, ethical, ...) agents, according to import and export of knowledge.
• Some tasks are international (political, economic organizations),
• others are national (language description, analysis of utterances, corpora, culture, ethics).
• In the future, the EU will support pivot languages instead of 380 language pairs, and every country will need to define its knowledge import and export budget and who pays for it (taxpayers, companies, foundations, sponsors).
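The arithmetic behind the pivot argument can be sketched as follows. The sketch rests on two assumptions that the slides do not state: that the 380 counts ordered (source, target) pairs among 20 languages, and that a single pivot language is itself one of those 20. The function names are made up for this example.

```python
def direct_pairs(n_languages: int) -> int:
    """Ordered (source, target) pairs when every language is translated
    directly into every other language."""
    return n_languages * (n_languages - 1)


def pivot_pairs(n_languages: int) -> int:
    """Ordered pairs when every other language is translated only into
    and out of one pivot language that is among the n languages."""
    return 2 * (n_languages - 1)


if __name__ == "__main__":
    n = 20  # assumed number of languages behind the figure 380
    print(direct_pairs(n))  # 20 * 19 = 380
    print(pivot_pairs(n))   # 2 * 19 = 38
```

Under these assumptions the number of translation directions grows quadratically without a pivot and only linearly with one, at the price of translating twice for every pair that does not include the pivot.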
Obsolete Approaches
• FAHQ Analysis
• Isolated Systems
• Systems for all Languages
• Systems for all Domains
• Monomethodic Systems
• „The unknown trick“
The Sustainability Issue
Example:
• The digital satellite images of the South American rain forest from the 1970s are no longer readable and cannot be reconstructed by any method. They are lost forever.
Maintenance and Conservation of Digital Documents
means
• „Refreshing“: Copying the document from medium to medium to keep the bit sequence. This is technically trivial.
• „Migration of content“: Conservation of the contents independently of the original perception.
• „Migration of perception“: Conservation of the identical or very similar visual/acoustic surface.
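Of the three notions above, only refreshing is simple enough to show in a few lines. The following is a minimal sketch, assuming ordinary files stand in for storage media; the function names `sha256_of` and `refresh` are invented for this example.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Checksum of a file's bit sequence, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def refresh(source: Path, target: Path) -> str:
    """Copy the document to a new location and verify that the bit
    sequence survived unchanged. Returns the verified checksum."""
    before = sha256_of(source)
    shutil.copyfile(source, target)
    after = sha256_of(target)
    if before != after:
        raise OSError(f"bit sequence changed while copying {source} to {target}")
    return after
```

A verified copy guarantees only that the bits are preserved. It says nothing about whether any future machine can still interpret them, which is where the two kinds of migration come in.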
Maintenance of Perception
Processing Environment 1 (Display 2006, Operating System 2006, Machine 2006, Document 2006)
=
Processing Environment 2 (Display 2106, Operating System 2106, Machine 2106, Document 2106)
Possible Solutions
• Producing paper copies, but losing programs, games, MM presentations, etc.
• Museum Approach: Keeping a machine of every new type with OS, programming languages etc. Not realistically viable (spare parts, operating knowledge, ...), extremely expensive.
• Emulation Approach: software emulation of software and OS, a long-term task.
• Universal Virtual Approach: Constructing a virtual machine for the simulation of all existing machines and operating systems, extremely expensive.
• „Stone of Rosetta“ Approach: Keeping the bit sequence and producing an exact specification of the formal features of the document, the operating system and the machine. The local effort is small, but reconstruction is extremely expensive.
Emulation Approach
Processing Environment 1 (Display 2006, Operating System 2006, Machine 2006, Document 2006)
=
Processing Environment 2 (Display 2006, Operating System 2006, 2006-Emulator 2106, OS 2106, Machine 2106, Document 2006)
Conclusion
• We have gained a lot of partial knowledge about language and language use, which may be sufficient for rather specific applications, however,
• we are taking the parts for the whole,
• we expect users to adapt to our technologies,
• we define the processing tasks by the cases we can handle,
• we still define text analysis in the paradigm of the 1980s.