Vishal Vachhani, CFILT and DIL, IIT Bombay. CS 671 ICT For Development, 19th Sep 2008.
Agro Explorer
A Meaning Based Multilingual Search Engine
Web site for Indian farmers
Farmers can submit problems related to their crops
Queries are answered by agricultural experts at KVK, Baramati
Languages supported: Marathi, Hindi, English
Why We Need Multilingual Search
Vast Amount of Information available on the Web
Almost 70% of the Information is in English
The Indian rural populace is not English-Literate
“A Big Language Barrier”
Information has to be made available to them in their local languages.
Why We Need Meaning Based Search
Most current search engines are keyword based.
They do not consider the semantics of the query.
As a result, the result set contains a large number of extraneous documents.
Search based on the meaning of the query helps narrow down to the desired information quickly.
[Figure: a query in Hindi is submitted to the system, which searches English and Marathi documents and returns the result in Hindi.]
Same Keywords, Different Semantics
◦ Query “Moneylenders Exploit Farmers”: Found 1 Result
◦ Query “Farmers Exploit Moneylenders”: Found 0 Result
Provides both:
◦ Meaning Based Search
◦ Cross-Lingual Information Access
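A minimal sketch of why the two queries behave differently, assuming documents and queries are stored as sets of (relation, head, dependent) triples in the spirit of UNL. The triple layout and the names `keyword_match` and `meaning_match` are hypothetical and are not taken from the AgroExplorer system.

```python
# Hypothetical mini-index: one document, stored as relation triples.
DOC = {("agt", "exploit", "moneylender"), ("obj", "exploit", "farmer")}

# "Moneylenders exploit farmers"
QUERY_A = {("agt", "exploit", "moneylender"), ("obj", "exploit", "farmer")}
# "Farmers exploit moneylenders"
QUERY_B = {("agt", "exploit", "farmer"), ("obj", "exploit", "moneylender")}


def keywords(triples):
    """Collect the bag of words mentioned in a set of triples."""
    return {word for _, head, dep in triples for word in (head, dep)}


def keyword_match(query, document):
    """Keyword search: every query word occurs somewhere in the document."""
    return keywords(query) <= keywords(document)


def meaning_match(query, document):
    """Meaning-based search: every query relation occurs in the document."""
    return query <= document


for name, query in (("A", QUERY_A), ("B", QUERY_B)):
    print(name, keyword_match(query, DOC), meaning_match(query, DOC))
```

Both queries pass the keyword test because they share the same words, but only the first one matches on relations, which mirrors the "1 result versus 0 results" contrast on the slide.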
System Architecture
Conclusion
Provides two independent features: Multi-Linguality and Meaning Based Search.
Because of UNL, both the multilingual and the meaning-based properties can be incorporated together, rather than using separate language translators in search engines. The scheme lends itself to integration of multiple languages in a seamless, scalable manner.
UNL: Universal Networking Language
[Figure: UNL at the centre, linked to English, French, Tamil, Marathi and Hindi.]
Direct translation: each language is translated directly into every other language, so N*(N-1) translators are needed for N languages.
Intermediate language: translation goes through an intermediate language, so only 2*N translators are required.
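The two counts can be compared with a short calculation. This is only an illustration of the formulas above; the function names are hypothetical.

```python
def direct_translators(n):
    """One translator per ordered language pair: N * (N - 1)."""
    return n * (n - 1)


def interlingua_translators(n):
    """One converter into and one out of the intermediate language: 2 * N."""
    return 2 * n


for n in (3, 5, 10, 20):
    print(n, direct_translators(n), interlingua_translators(n))
```

For the five languages in the figure this gives 20 direct translators against 10 for the intermediate-language scheme. The two counts are equal at N = 3, and the gap widens quickly as languages are added.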
UNL is an acronym for “Universal Networking Language”.
UNL is a computer language that enables computers to process information and knowledge across language barriers.
UNL is a language for representing information and knowledge provided by natural languages.
Unlike natural languages, UNL expressions are unambiguous.
Although the UNL is a language for computers, it has all the components of a natural language.
It is composed of Universal Words (UWs), Relations and Attributes.
Knowledge is represented as a semantic graph:
◦ Nodes: concepts
◦ Arcs: relations between concepts
A UW represents simple or compound concepts. There are two classes of UWs:
◦ unit concepts
◦ compound structures of binary relations grouped together (indicated with Compound UW-IDs)
A UW is made up of a character string (an English-language word) followed by a list of constraints.
◦ <UW> ::= <Head Word>[<Constraint List>]
◦ Examples: state(icl>express), state(icl>country)
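A small sketch of splitting a UW into its head word and constraint list, following the `<Head Word>[<Constraint List>]` pattern above. The `UniversalWord` type and `parse_uw` function are hypothetical names, and the comma split is a simplification that assumes constraints are not nested.

```python
import re
from typing import List, NamedTuple


class UniversalWord(NamedTuple):
    head_word: str
    constraints: List[str]


# Head word, optionally followed by a parenthesised constraint list.
_UW_PATTERN = re.compile(r"^(?P<head>[^()]+?)(?:\((?P<cons>.*)\))?$")


def parse_uw(text: str) -> UniversalWord:
    """Split a UW string such as 'state(icl>country)' into its parts."""
    match = _UW_PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"not a well-formed UW: {text!r}")
    cons = match.group("cons")
    # Simplification: assumes constraints contain no nested commas.
    constraints = [c.strip() for c in cons.split(",")] if cons else []
    return UniversalWord(match.group("head"), constraints)


print(parse_uw("state(icl>express)"))
print(parse_uw("state(icl>country)"))
```

The two example UWs share the head word `state` and differ only in their constraint, which is how the same English word is tied to two distinct concepts.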
◦ A relation label is a string of three characters or fewer.
◦ Relations between UWs are binary: rel(UW1, UW2).
◦ Relations have different labels according to the roles they play.
◦ At present, there are 46 relations in UNL.
◦ For example: agt (agent), ins (instrument), pur (purpose), etc.
Attribute labels express additional information about the Universal Words that appear in a sentence.
◦ They show what is said from the speaker’s point of view, that is, how the speaker views what is said (time, reference, emphasis, attitude, etc.).
◦ Examples: @entry, @present, @progressive, @topic, etc.
Example: Ram eats rice.
{unl}
agt(eat.@entry.@present, Ram)
obj(eat.@entry.@present, rice(icl>eatable))
{/unl}
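The expression above can be read into relation triples with a few lines of code. This is a minimal sketch, assuming one relation per line between the {unl} and {/unl} markers; the `Relation` type and `parse_unl` function are hypothetical names, not part of any UNL toolkit.

```python
import re
from typing import List, NamedTuple, Optional, Tuple


class Relation(NamedTuple):
    label: str            # e.g. "agt"
    scope: Optional[str]  # e.g. "01" in "agt:01(...)", else None
    source: str
    target: str


_REL_PATTERN = re.compile(
    r"^(?P<label>[a-z]{1,3})(?::(?P<scope>\w+))?\((?P<args>.*)\)$"
)


def split_top_level(args: str) -> Tuple[str, str]:
    """Split 'a, b(c>d)' at the comma that is outside any parentheses."""
    depth = 0
    for i, ch in enumerate(args):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == "," and depth == 0:
            return args[:i].strip(), args[i + 1:].strip()
    raise ValueError(f"expected two arguments in: {args!r}")


def parse_unl(text: str) -> List[Relation]:
    """Parse the relation lines of a {unl} ... {/unl} block."""
    relations = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line in ("{unl}", "{/unl}"):
            continue
        match = _REL_PATTERN.match(line)
        if match is None:
            raise ValueError(f"not a relation: {line!r}")
        source, target = split_top_level(match.group("args"))
        relations.append(
            Relation(match.group("label"), match.group("scope"), source, target)
        )
    return relations


EXAMPLE = """{unl}
agt(eat.@entry.@present, Ram)
obj(eat.@entry.@present, rice(icl>eatable))
{/unl}"""

for relation in parse_unl(EXAMPLE):
    print(relation)
```

Each parsed relation is one arc of the semantic graph: the source is the parent node and the target is the child.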
[Figure: UNL graph for “Ram eats rice”, with the nodes eat, Ram and rice.]
Example: The boy who works here went to school.
{unl}
agt(go(icl>move).@entry.@past, :01)
plt(go(icl>occur).@entry.@past, school(icl>institution))
agt:01(work(icl>do), boy(icl>person).@entry)
plc:01(work(icl>do), here)
{/unl}
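[Figure: UNL graph for “The boy who works here went to school”. go has an agt arc to the scope :01 and a plt arc to school; inside :01, work has an agt arc to boy and a plc arc to here.]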
[Figure: source language → EnConverter → intermediate language → DeConverter → target language.]
The DeConverter is a language-independent generator.
It can deconvert UNL expressions into a variety of native languages, using linguistic data such as the word dictionary and the grammatical rules of each language.
The DeConverter transforms a sentence represented by a UNL expression into a natural-language sentence.
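[Figure: Hindi generation pipeline. A UNL document passes through the UNL Parser, Case Marking Module, Morphology Module and Syntax Planning Module to produce a Hindi document. Resources shown: Dictionary, Case Marking Rules, Morphology Rules and Syntax Planning Rules. Legend: language-dependent module, language-independent module.]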
• The UNL parser module performs the following tasks:
– Check the input format of the UNL document
– Separate attributes from UWs
– Separate attributes from dictionary entries
– Replace UWs with Hindi root words
Case: the category of morpho-syntactic properties that distinguish the various relations a noun phrase may bear to a governing head.
Hindi case markers: ने, पर, के, से, etc.
Case marking uses a rule base built on:
◦ UNL attributes
◦ lexical attributes from the dictionary
Case marking is implemented using rules.
We analyze all UNL attributes as well as dictionary attributes to decide the next and previous case markers.
We also use the relation with the parent node to select the right case marker.
Example rule: agt:null:null:null:ने:@past#V:VINT:N:null
Structure (fields separated by “:”):
◦ relation name
◦ parent previous case marker
◦ parent next case marker
◦ child previous case marker
◦ child next case marker
◦ the remaining four fields are of the form attr'REL'relationname; attributes are separated by #, and relation names are also separated by #
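A rule in this format can be split into its fields with a few lines of code. This is a sketch under assumptions: the first five field names follow the structure listed above, while the last four fields are kept as unnamed lists of #-separated conditions because the slide does not spell out their individual meanings. `parse_case_rule` and the key names are hypothetical.

```python
from typing import Dict

# Names for the first five fields, taken from the structure on the slide.
FIELD_NAMES = [
    "relation",
    "parent_prev_marker",
    "parent_next_marker",
    "child_prev_marker",
    "child_next_marker",
]


def parse_case_rule(rule: str) -> Dict[str, object]:
    """Split a colon-separated case-marking rule into named fields."""
    fields = rule.split(":")
    if len(fields) != 9:
        raise ValueError(f"expected 9 fields, got {len(fields)}: {rule!r}")
    parsed: Dict[str, object] = {
        name: (None if value == "null" else value)
        for name, value in zip(FIELD_NAMES, fields[:5])
    }
    # Remaining four fields: '#'-separated conditions, 'null' means none.
    parsed["conditions"] = [
        [] if value == "null" else value.split("#") for value in fields[5:]
    ]
    return parsed


rule = parse_case_rule("agt:null:null:null:ने:@past#V:VINT:N:null")
print(rule["relation"], rule["child_next_marker"], rule["conditions"])
```

For the example rule, the only marker set is the child's next case marker, ने, attached to the child of an agt relation.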
What is Morphology
◦ Study of morphemes
◦ Their formation into words, including inflection, derivation and composition
Noun, Verb and Adjective Morphology
◦ Depends on the phonetic properties of the Hindi word
Noun Morphology
◦ Depends on gender, number and vowel ending of the noun
Adjective Morphology
◦ अच्छा लड़का, अच्छी लड़की, अच्छे लड़के
◦ the adjective अच्छा changes; lexical attribute “AdjA”
Verb Morphology
◦ Depends upon tense, gender, number, person, etc.
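The adjective example above (अच्छा, अच्छी, अच्छे) follows a regular pattern that is easy to sketch. The function below is an illustration only: it assumes the adjective belongs to the inflecting class whose citation form ends in the vowel sign ा (presumably what the “AdjA” attribute marks), and it ignores case, which also affects the ending in real Hindi. The function name is hypothetical.

```python
AA = "\u093e"  # vowel sign aa, masculine singular ending
II = "\u0940"  # vowel sign ii, feminine ending
E = "\u0947"   # vowel sign e, masculine plural ending


def inflect_adjective(adjective: str, gender: str, number: str) -> str:
    """Inflect an aa-ending adjective for gender ('m'/'f') and number ('sg'/'pl')."""
    if not adjective.endswith(AA):
        return adjective  # adjectives outside this class are left unchanged
    stem = adjective[:-1]
    if gender == "f":
        return stem + II
    if number == "pl":
        return stem + E
    return adjective


for gender, number in (("m", "sg"), ("f", "sg"), ("m", "pl")):
    print(gender, number, inflect_adjective("अच्छा", gender, number))
```

A lexical attribute is still needed in practice because not every adjective ending in ा inflects this way, so the ending alone is not a reliable signal.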
Verbs are categorized by
◦ Tense (past, present, future)
◦ Gender (male, female)
◦ Person (1st, 2nd, 3rd)
◦ Number (sg, pl)
Example
◦ Ladaka khana kha raha hai.
It contains present continuous tense, male, sg, and 3rd person.
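As an illustration of how these categories select a verb form, the sketch below builds the romanized present continuous for the third person only. The function name and the restriction to third person are assumptions made to keep the example small; the actual morphology rules of the system are not shown on the slides.

```python
def present_continuous_3rd(root: str, gender: str, number: str) -> str:
    """Romanized Hindi present continuous, third person only."""
    if gender not in ("male", "female") or number not in ("sg", "pl"):
        raise ValueError("gender must be male/female, number must be sg/pl")
    if gender == "female":
        aspect = "rahi"  # feminine, singular and plural
    else:
        aspect = "raha" if number == "sg" else "rahe"
    auxiliary = "hai" if number == "sg" else "hain"
    return f"{root} {aspect} {auxiliary}"


print(present_continuous_3rd("kha", "male", "sg"))
print(present_continuous_3rd("kha", "female", "pl"))
```

With the root `kha`, male and singular, this yields the verb group of the example sentence, `kha raha hai`.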
Syntax planning arranges words according to the structure of the target language.
It is a rule-based module.
It is a priority-based graph traversal.
Algorithm for Syntax Planning:
1) Start traversing the UNL graph from the entry node.
2) If the node has no children, add it to the final string.
3) If a node has more than one child, sort the children by the priority of their relations. The relation with the highest priority is traversed first.
4) Mark the node as visited.
5) Repeat steps 3 and 4 until all the children of the node have been visited.
6) Once all the children of the node have been visited, add the node to the final string.
7) Repeat steps 2 to 4 until all the nodes have been traversed.
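Example: Also, spray 5% Neemark solution.
[Figure: UNL graph for the sentence. spray (entry node) has an obj arc to solution and a man arc to also; solution has two mod arcs, to percent and to Neemark; percent has a qua arc to 5.]
Relation priorities: obj:17, man:9, mod:5, qua:5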
Traversal of the graph, step by step:
◦ Start at the entry node, spray.
◦ spray has two children, reached by obj (priority 17) and man (priority 9); obj is traversed first, leading to solution.
◦ solution has two children, both reached by mod (priority 5); percent is traversed first.
◦ percent has one child, 5, reached by qua (priority 5).
◦ 5 has no children, so it is added to the final string. Output: 5
◦ All children of percent are visited. Output: 5 percent
◦ Neemark has no children. Output: 5 percent Neemark
◦ All children of solution are visited. Output: 5 percent Neemark solution
◦ also has no children. Output: 5 percent Neemark solution also
◦ All children of spray are visited. Output: 5 percent Neemark solution also spray
Output: 5 percent Neemark solution also spray
Hindi: 5 प्रतिशत नीमअर्क घोल भी छिड़कें।
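The priority-based traversal walked through above can be sketched as a depth-first walk that emits each node after its children. This is a minimal illustration, not the system's implementation: the graph encoding and the names `GRAPH`, `PRIORITY` and `plan_syntax` are hypothetical, and ties between equal priorities (the two mod arcs) are resolved by input order, which is an assumption chosen to reproduce the order shown in the example.

```python
from typing import Dict, List, Tuple

# Relation priorities from the example.
PRIORITY: Dict[str, int] = {"obj": 17, "man": 9, "mod": 5, "qua": 5}

# Parent node -> list of (relation, child) arcs for "Also, spray 5% Neemark solution."
GRAPH: Dict[str, List[Tuple[str, str]]] = {
    "spray": [("obj", "solution"), ("man", "also")],
    "solution": [("mod", "percent"), ("mod", "Neemark")],
    "percent": [("qua", "5")],
}


def plan_syntax(graph, entry, priority) -> List[str]:
    """Return the nodes in output order: children by priority, then the parent."""
    output: List[str] = []
    visited = set()

    def visit(node: str) -> None:
        if node in visited:
            return
        visited.add(node)  # marked early so a cyclic graph cannot loop forever
        # sorted() is stable, so arcs with equal priority keep their input order.
        arcs = sorted(graph.get(node, []), key=lambda arc: -priority[arc[0]])
        for _, child in arcs:
            visit(child)
        output.append(node)  # the node is emitted once all its children are done

    visit(entry)
    return output


print(" ".join(plan_syntax(GRAPH, "spray", PRIORITY)))
```

Emitting every parent after its children places the verb last, which suits a verb-final language such as Hindi. A fuller planner would need rules for children that must follow their parent.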
Input sentence: Its roots are affected by bacterial infection.
Module: Output
UNL parser: जड़ प्रभावित जीवाण्विक संक्रमण
Case marking: जड़ प्रभावित जीवाण्विक संक्रमण से
Morphology: इसकी जड़ें जीवाण्विक प्रभावित होती हैं संक्रमण से।
Syntax Planning: जीवाण्विक संक्रमण से इसकी जड़ें प्रभावित होती हैं।
Output: जीवाण्विक संक्रमण से इसकी जड़ें प्रभावित होती हैं।
UNL 2005 Specifications: http://www.undl.org/unlsys/unl/unl2005/
S. Singh, M. Dalal, V. Vachhani, P. Bhattacharyya and O. Damani, “Hindi generation from interlingua”, MT Summit 2007 (www.cse.iitb.ac.in/~vishalv)
Mrugank Surve, Sarvjeet Singh, Satish Kagathara, Venkatasivaramasastry K, Sunil Dubey, Gajanan Rane, Jaya Saraswati, Salil Badodekar, Akshay Iyer, Ashish Almeida, Roopali Nikam, Carolina Gallardo Perez, Pushpak Bhattacharyya, AgroExplorer Group, “AgroExplorer: a Meaning Based Multilingual Search Engine”, International Conference on Digital Libraries (ICDL), New Delhi, India, Feb 2004.
Agro Explorer: http://agro.mlasia.iitb.ac.in
aAQUA: http://www.aaqua.org