
Consortium Project on Development of Dravidian WordNet: An Integrated WordNet for Telugu, Tamil, Kannada and Malayalam

Objective
• Develop an integrated WordNet in four major Dravidian languages, viz. Tamil, Telugu, Kannada and Malayalam
  o Linked with Hindi and English WordNets

30-April-2013, 2nd PRSG Meeting

[Figure: diagram of the linked WordNets: Hindi, English, Malayalam, Kannada, Telugu, Tamil]

Consortium Members
• Consortium Leader
  ▫ Prof. Pushpak Bhattacharyya, IIT Bombay
• Consortium Members
  ▫ Dr. S. Baskaran, Tamil University (Tamil)
  ▫ Prof. K.P. Soman, Amrita Vishwa Vidyapeetham (Malayalam)
  ▫ Prof. C.S. Ramachandra, University of Mysore (Kannada)
  ▫ Dr. S. Arulmozi, Dravidian University (Co-Consortium Leader & Telugu)


Project Details
• Total Outlay of the Project:
  o 150.43 lakhs
• Date of Commencement:
  o 26 Dec 2011
• Duration of the Project:
  o 24 months


Project Deliverables
• The integrated Dravidian WordNet will be linked with the Hindi and English WordNets, so that users will be able to:
  ▫ Look up language-specific words to obtain lexico-semantic relations such as synonymy, hypernymy and meronymy
  ▫ Query for cross-lingual lexical information
  ▫ Design and implement complex natural language applications such as machine translation and cross-lingual search
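The lookup and cross-lingual query deliverables can be illustrated with a minimal Python sketch. The slides do not describe the linking mechanism, so this assumes that a concept carries the same synset ID in every language. The `Synset` class, the function names, the IDs and the toy entries are all hypothetical and are not taken from the project's data or tools.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Synset:
    synset_id: int                # hypothetical ID, assumed shared by all languages
    lemmas: Dict[str, List[str]]  # language code -> synonymous words
    hypernyms: List[int] = field(default_factory=list)  # IDs of more general synsets


# Toy entries for illustration only; not taken from the project's data.
SYNSETS = {
    1: Synset(1, {"en": ["plant", "flora"], "ta": ["தாவரம்"]}),
    2: Synset(2, {"en": ["tree"], "ta": ["மரம்"], "te": ["చెట్టు"]}, hypernyms=[1]),
}


def synsets_for(word: str, lang: str) -> List[Synset]:
    """Return every synset that lists `word` as a lemma in language `lang`."""
    return [s for s in SYNSETS.values() if word in s.lemmas.get(lang, [])]


def cross_lingual(word: str, src: str, tgt: str) -> List[str]:
    """Map a word in `src` to the lemmas of the same synset in `tgt`."""
    return [w for s in synsets_for(word, src) for w in s.lemmas.get(tgt, [])]


def hypernym_lemmas(word: str, lang: str, out_lang: str) -> List[str]:
    """Lemmas (in `out_lang`) of the direct hypernyms of `word`."""
    return [
        w
        for s in synsets_for(word, lang)
        for h in s.hypernyms
        for w in SYNSETS[h].lemmas.get(out_lang, [])
    ]


print(cross_lingual("tree", "en", "ta"))    # Tamil lemmas for English "tree"
print(hypernym_lemmas("மரம்", "ta", "en"))  # English hypernyms of the Tamil word
```

Under this assumption a cross-lingual query is just a lookup through the shared ID, and a relation such as hypernymy, stored once per synset, is available to every linked language.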


Organization and Distribution of Tasks
• IIT-B
  ▫ Overall coordination of the project
  ▫ Providing guidance on the architecture and technology
  ▫ Making available existing tools and interfaces
  ▫ Computational tasks; algorithms on WordNets


Organization & Distribution of Tasks
• Other Partners
  ▫ Creation of 20000 synsets
  ▫ Validation of synsets
  ▫ Adaptation of semantic relations and validation
  (each in Tamil, Telugu, Malayalam and Kannada)


Tamil WordNet
• Commencement Date: 24 April 2012
• Principal Investigator: Dr. S. Baskaran
• Senior Linguist
  ▫ G. Vasuki, M.A. M.Phil (Ling.)
• Computer Scientist
  ▫ G. Biju, MCA, M.Phil
• Lexicographers
  ▫ D. Yoga, M.A. M.Phil (Ling.), M.A. (Tamil)
  ▫ M. Ramasundari, M.A. M.Phil, Ph.D (Ling.)
  ▫ D. Vinodha, M.A. (Hindi), Dip. in Translation
  ▫ K. Bakkiyaraj, M.A. M.Phil (Ling.)


Malayalam WordNet
• Commencement Date: 24 April 2012
• Principal Investigator: Prof. K.P. Soman
• Senior Linguist
  o N. Rajendran, M.A. Ph.D (Ling.)
• Computer Scientist
  o K. Krishnakumar, MA, M.Phil, Ph.D (Ling.)
• Lexicographers
  o S. Veera Alagiri, M.A. M.Phil, Ph.D (Ling.)
  o Jyothi Ratnam, M.A. (Hindi)


Telugu WordNet
• Commencement Date: 2 July 2012
• Principal Investigator: Dr. S. Arulmozi
• Co-PI: Dr. M.C. Kesava Murty
• Senior Linguist
  ▫ Dr. S. Chandra Kiran, M.A. M.Phil (Tel.) Ph.D (Comp. Lit.)
• Computer Scientist
  ▫ T. Swathi, MCA
• Lexicographers
  ▫ S. Sravanti, M.A. (Telugu)
  ▫ K. Sukanya, M.A. (Telugu)
  ▫ K. Sampoorna, M.A. (Telugu)
  ▫ N. Silparani, M.A. (Telugu)


Kannada WordNet
• Commencement Date: 23 July 2012
• Principal Investigator: Prof. C.S. Ramachandra
• Co-PI: Prof. G. Hemanthakumar
• Senior Linguist
  o Dr. B.P. Hemananda, M.A. Ph.D (Ling.)
• Lexicographers
  o Chaya Devi, M.A. Linguistics
  o R M Ramya, M.A. Kannada


Status of synset creation

Universal
Language    Nouns  Verbs  Adjectives  Adverbs  Total Synsets
Kannada      4365    252        1016       75           5708
Malayalam    3235    497        1399      127           5258
Tamil        4376    811        1811      170           7168
Telugu       4376    811        1811      170           7168

Pan-Indian
Language    Nouns  Verbs  Adjectives  Adverbs  Total Synsets
Kannada       715     48         108       33            904
Malayalam     721    192         371       63           1347
Tamil         721    192         371       63           1347
Telugu        721    192         371       63           1347


Total Synsets Developed

Language     Noun   Verb  Adjective  Adverb  Total
Kannada      8090    430       1562     133  10215
Malayalam    7487   1143       3060     418  12109
Tamil        5097   2801       5787     442  14127
Telugu      10591   2366       4122     455  17534

Includes Pan-Indian, Universal and remaining synsets

Status on Tasks
• Synset Creation
  o Pan-Indian, Universal – completed
  o Nouns – 40% completed
  o Verbs – 70% completed
  o Adjectives – completed
  o Adverbs – 70% completed
• Language & Culture Specific synsets – initiated
• Named Entity – to start
• Web tool – Telugu is completed; the others are in line.


Manpower Trained

Manpower                   Number
Consortium Leader               1
Co-Consortium Leader            1
Principal Investigator          5
Co-Principal Investigator       2
Project Manager                 1
Senior Linguist                 5
Lexicographer                  12
Computer Scientist              5
Total                          32


Equipment Purchased

Equipment  Number
Desktop        10
Laptop         11
Scanner         1
Printer         3
Hard Disk       1
Total          26


Financial Details (in lakhs)

Sr. No.  Name of Institute  1st Year  2nd Year   Total
1        IIT Bombay            13.97     13.50   27.47
2        DU, Kuppam            16.39     14.35   30.74
3        TU, Thanjavur         16.39     14.35   30.74
4        UoM, Mysore           16.39     14.35   30.74
5        AU, Coimbatore        16.39     14.35   30.74
         Total                 79.53     70.90  150.43


Institute-wise Project Budget


Head-wise Fund Distribution (in lakhs)

Head                   Amount
Capital Equipment       14.25
Consumable Stores       10.00
Manpower                74.04
Travel                  12.00
Workshop and Training   10.52
Contingencies           10.00
Overheads (15%)         19.62
Total                  150.43
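The head-wise figures can be cross-checked with a few lines of Python. This is only a sketch: it assumes the 15% overhead rate is applied to the sum of the other six heads, which the slide does not state explicitly, and the variable names are illustrative.

```python
from decimal import Decimal

# Head-wise amounts in lakhs, copied from the table above (overheads excluded).
heads = {
    "Capital Equipment": Decimal("14.25"),
    "Consumable Stores": Decimal("10.00"),
    "Manpower": Decimal("74.04"),
    "Travel": Decimal("12.00"),
    "Workshop and Training": Decimal("10.52"),
    "Contingencies": Decimal("10.00"),
}

# Assumption: the 15% rate applies to the sum of the heads listed above.
OVERHEAD_RATE = Decimal("0.15")

subtotal = sum(heads.values())
overheads = (subtotal * OVERHEAD_RATE).quantize(Decimal("0.01"))
total = subtotal + overheads

print(subtotal, overheads, total)  # prints: 130.81 19.62 150.43
```

Under that assumption the computed overheads and total match the reported 19.62 and 150.43 lakhs.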


Amount Received & Expenditure (up to 28 Feb 2013, in rupees)

Sr. No.  Name of Institute  Amount Received  Interest  Expenditure  Balance
1        IIT Bombay                 1397000         -       989653   407347
2        DU, Kuppam                 1639000         -      1075219   563781
3        TU, Thanjavur              1639000     25042       739294   924748
4        UoM, Mysore                1639000         -       694046   944954
5        AU, Coimbatore             1639000     28020      1628322    38698
         Total                      7953000     53062      5126534  2879528
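The balances in this table are consistent with balance = amount received + interest - expenditure. The short Python check below is a sketch under two assumptions that the slide does not state: a blank interest cell is treated as 0, and the fused digits in the AU, Coimbatore row are read as interest 28020 and expenditure 1628322. The names `rows` and `balance` are illustrative.

```python
# (amount received, interest, expenditure, reported balance), all in rupees.
# Assumption: a blank interest cell in the slide is treated as 0.
rows = {
    "IIT Bombay": (1397000, 0, 989653, 407347),
    "DU, Kuppam": (1639000, 0, 1075219, 563781),
    "TU, Thanjavur": (1639000, 25042, 739294, 924748),
    "UoM, Mysore": (1639000, 0, 694046, 944954),
    # Assumption: fused digits read as interest 28020, expenditure 1628322.
    "AU, Coimbatore": (1639000, 28020, 1628322, 38698),
}


def balance(received: int, interest: int, expenditure: int) -> int:
    """Assumed relation between the columns of the table."""
    return received + interest - expenditure


# Every reported balance should follow from the other three columns.
for name, (received, interest, expenditure, reported) in rows.items():
    assert balance(received, interest, expenditure) == reported, name

# Column totals: received, interest, expenditure, balance.
totals = tuple(sum(column) for column in zip(*rows.values()))
print(totals)  # prints: (7953000, 53062, 5126534, 2879528)
```

With these readings, each row balance and all four column totals agree with the figures reported on the slide.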


Project commenced 5 months after administrative approval

Man-power Details


Papers Published
• 'Tamil WordNet', Proceedings of the Fifth Global WordNet Conference, IIT-Bombay, 31 Jan-4 Feb 2010 (S. Rajendran)
• 'Building a WordNet for Dravidian Languages', Proceedings of the Fifth Global WordNet Conference, IIT-Bombay, 31 Jan-4 Feb 2010 (S. Rajendran, S. Gopakumar, V. Dhanalakshmi)
• 'Representation of Kinship in WordNet', Proceedings of the 9th International Tamil Internet Conference, Coimbatore, 23-27 June 2010 (S. Arulmozi)
• 'Polysemy in Tamil and other Indian Languages', Proceedings of the Fifth Global WordNet Conference, IIT-Bombay, 31 Jan-4 Feb 2010 (S. Arulmozi & Panchanan Mohanty)
• 'Telugu WordNet', Proceedings of the Fifth Global WordNet Conference, IIT-Bombay, 31 Jan-4 Feb 2010 (S. Arulmozi)
• 'Augmenting IndoWordNet with Context', Proceedings of ICON 2010 (S. Rajendran & S. Arulmozi)


Workshops Conducted
• First Dravidian WordNet Workshop
  o 16-17 March 2012
  o Amrita Vishwa Vidyapeetham
• Second Dravidian WordNet Workshop
  o 5-6 October 2012
  o Dravidian University


Action Plan
• Hosting the Web version
• Completion of synset creation
• Internal validation of synsets


Thank you.
