Document Ontology Extractor (DOE) Research Team: Govind R Maddi, Jun Zhao Chakravarthi S Velvadapu...

Document Ontology Extractor (DOE) Research Team: Govind R Maddi, Jun Zhao Chakravarthi S Velvadapu Faculty: Dr.Sadanand Srivastava Dr.James Gil De Lamadrid Joint Project of University of Maryland, Baltimore County Bowie State University Sponsored by Department Of Defense



Document Ontology Extractor (DOE)

Research Team: Govind R Maddi, Jun Zhao, Chakravarthi S Velvadapu

Faculty: Dr. Sadanand Srivastava, Dr. James Gil De Lamadrid

Joint Project of University of Maryland, Baltimore County and Bowie State University

Sponsored by Department of Defense


OVERVIEW

1. The system takes text documents as its input

2. Performs semantic analysis on these documents

3. Generates useful ontology

4. Represents it graphically


GOAL

To build an Ontology utilizing

• Statistical methods

• A small amount of user feedback

• Automation


Architecture of DOE

[Figure: DOE pipeline with the components Text Document, Pre-processing, Normalization, Latent Semantic Indexing (SVD), Document Ontology, Graph Construction, GUI]


INPUT

• Text documents


Pre-processing

Read in the text file

Extract meaningful terms

Count their frequencies
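The three pre-processing steps above can be sketched in Python as follows. This is a minimal illustration, not the DOE implementation: the slides do not say how "meaningful" terms are chosen, so the stop-word list, the minimum term length and the function names here are all assumptions.

```python
import re
from collections import Counter

# Hypothetical stop-word list; the slides do not define "meaningful" terms.
STOP_WORDS = {"a", "an", "and", "are", "as", "in", "is", "it", "of", "on", "the", "to"}


def extract_term_frequencies(text):
    """Return a Counter mapping each retained term to its frequency."""
    tokens = re.findall(r"[a-z]+", text.lower())
    # Assumption: drop stop words and very short tokens.
    terms = [t for t in tokens if t not in STOP_WORDS and len(t) > 2]
    return Counter(terms)


def read_term_frequencies(path):
    """Read a text file and count its retained terms."""
    with open(path, encoding="utf-8") as handle:
        return extract_term_frequencies(handle.read())


counts = extract_term_frequencies(
    "The ontology links terms to concepts; terms carry weights."
)
print(counts.most_common(1))  # [('terms', 2)]
```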


Normalization

Calculate the weight of each term using

$$w_{i,k} = \frac{\mathrm{frequency}_{i,k}}{\sum_{j=1}^{n_k} \mathrm{frequency}_{j,k}}$$

Here i indexes terms, k indexes documents, and the sum runs over the n_k terms of document k.


Normalization (contd)

Calculate the normalized weight using

$$W_{i,k} = \frac{w_{i,k}}{\sqrt{\sum_{j=1}^{n_k} w_{j,k}^2}}$$

where w_{i,k} is the unnormalized weight of term i in document k.
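A small sketch of the two normalization steps, assuming each document is held as a dictionary of term counts. The function names and the sample counts are illustrative and do not come from the slides.

```python
import math


def relative_weights(frequencies):
    """Unnormalized weight: term frequency divided by the document's total frequency."""
    total = sum(frequencies.values())
    return {term: count / total for term, count in frequencies.items()}


def normalized_weights(weights):
    """Normalized weight: each weight divided by the Euclidean length of the vector."""
    length = math.sqrt(sum(w * w for w in weights.values()))
    return {term: w / length for term, w in weights.items()}


doc_k = {"ontology": 3, "term": 1}  # made-up counts for one non-empty document
w = relative_weights(doc_k)         # ontology: 0.75, term: 0.25
W = normalized_weights(w)

# After normalization the document's weight vector has unit length.
print(math.isclose(sum(v * v for v in W.values()), 1.0))  # True
```

Each normalized document becomes one column of the term-document matrix, so documents of different lengths are compared on the same scale.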


Latent Semantic Indexing (LSI)

Statistical method representing documents by statistically independent concepts

Based on Singular Value Decomposition (SVD)


Singular Value Decomposition (SVD)

A technique that decomposes a given matrix into three components – U, S and V.


SVD (contd)

An m x n term-document matrix A of rank r can be expressed as the product:

$$A = U S V^T$$

U is the m x r term matrix

S is the r x r diagonal matrix

V^T is the r x n document matrix (V itself is n x r)


SVD (contd)

The diagonal of S contains the singular values of A in descending order.
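The decomposition can be checked numerically with NumPy's `numpy.linalg.svd`. The slides do not name a library, so NumPy is an assumption here, and the small matrix is made up for illustration. With `full_matrices=False` NumPy returns min(m, n) columns, which equals the rank r only when A has full rank, as this example does.

```python
import numpy as np

# Illustrative 4-term x 3-document matrix (values are made up, rank 3).
A = np.array([
    [1.0, 0.0, 1.0],
    [0.0, 1.0, 0.0],
    [1.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
])

# Reduced SVD: U is m x 3, singular_values has 3 entries, Vt is 3 x n.
U, singular_values, Vt = np.linalg.svd(A, full_matrices=False)
S = np.diag(singular_values)

print(U.shape, S.shape, Vt.shape)             # (4, 3) (3, 3) (3, 3)
print(np.allclose(A, U @ S @ Vt))             # True: the product rebuilds A
print(np.all(np.diff(singular_values) <= 0))  # True: descending order
```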


SVD (contd)

LSI approximates A by keeping only the s largest singular values:

$$A_s = U_s S_s V_s^T \approx A$$

U_s: derived from U by keeping only the s corresponding columns

S_s: derived from S by keeping only the s largest singular values

V_s^T: derived from V^T by keeping only the s corresponding rows
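The truncation can be sketched as below, again assuming NumPy. `truncate_svd` is a hypothetical helper name and the matrix is illustrative. Because the singular values come back in descending order, keeping the largest s of them means keeping the first s columns of U and the first s rows of V^T.

```python
import numpy as np


def truncate_svd(A, s):
    """Return the rank-s factors (U_s, S_s, Vt_s) of A."""
    U, singular_values, Vt = np.linalg.svd(A, full_matrices=False)
    U_s = U[:, :s]                      # first s columns of U
    S_s = np.diag(singular_values[:s])  # s largest singular values
    Vt_s = Vt[:s, :]                    # first s rows of V^T
    return U_s, S_s, Vt_s


A = np.array([
    [2.0, 0.0, 1.0],
    [0.0, 1.0, 0.0],
    [1.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
])

U_s, S_s, Vt_s = truncate_svd(A, s=2)
A_s = U_s @ S_s @ Vt_s  # rank-2 approximation with the same shape as A

print(A_s.shape)                   # (4, 3)
print(np.linalg.matrix_rank(A_s))  # 2
```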


SVD (contd)

[Figure: A (m x n) = U (m x r) S (r x r) V^T (r x n), showing the truncated factors U_s, S_s and V_s^T]


Document Ontology

Build Concept Nodes and Term Nodes using the document matrix (V) and term matrix (U).


Building concept nodes from term matrix (U)

A concept node contains information about

• Concept name

• Terms that belong to that concept

• Respective weights of terms in that concept


Building concept nodes from term matrix (U) (contd)

Naming convention:

• Generated automatically

• A hyphenated string of the five most frequent terms in that concept


Building concept nodes from term matrix (U) (contd)

A concept node represents a document

Each column in U corresponds to a concept node
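One way to picture a concept node is the sketch below, which builds one node per column of U and names it with a hyphenated string of five terms. The class and function names are hypothetical, the matrix values are made up, and ranking terms by absolute weight within the column is an assumption about how the naming terms are picked.

```python
from dataclasses import dataclass, field


@dataclass
class ConceptNode:
    name: str
    term_weights: dict = field(default_factory=dict)  # term -> weight in this concept


def build_concept_nodes(U, terms, name_size=5):
    """Create one ConceptNode per column of the term matrix U (rows are terms)."""
    nodes = []
    for col in range(len(U[0])):
        weights = {terms[row]: U[row][col] for row in range(len(terms))}
        # Assumption: rank terms by the magnitude of their weight in this column.
        top = sorted(weights, key=lambda t: abs(weights[t]), reverse=True)[:name_size]
        nodes.append(ConceptNode(name="-".join(top), term_weights=weights))
    return nodes


terms = ["graph", "node", "matrix", "weight", "term", "concept"]
U = [  # made-up 6-term x 2-concept matrix
    [0.9, 0.1],
    [0.8, 0.2],
    [0.1, 0.7],
    [0.3, 0.6],
    [0.5, 0.5],
    [0.2, 0.4],
]

concepts = build_concept_nodes(U, terms)
print(concepts[0].name)  # graph-node-term-weight-concept
```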


Building term nodes from term matrix (U)

A term node contains information about

• Term name

• Concepts to which it belongs

• Its respective weight in each concept


Building term nodes from term matrix (U) (contd)

Naming convention:

• Generated automatically

• Simply named using the term name


Building term nodes from term matrix (U) (contd)

A term node represents a term

Each row in U corresponds to a term node
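Term nodes can be sketched the same way, reading U by rows instead of by columns. The names below are hypothetical, and the concept names are placeholders rather than names generated by DOE.

```python
from dataclasses import dataclass, field


@dataclass
class TermNode:
    name: str
    concept_weights: dict = field(default_factory=dict)  # concept name -> weight


def build_term_nodes(U, terms, concept_names):
    """Create one TermNode per row of the term matrix U (columns are concepts)."""
    nodes = []
    for row, term in enumerate(terms):
        weights = {
            concept_names[col]: U[row][col] for col in range(len(concept_names))
        }
        nodes.append(TermNode(name=term, concept_weights=weights))
    return nodes


terms = ["graph", "matrix"]
concept_names = ["concept-1", "concept-2"]  # placeholder concept names
U = [[0.9, 0.1], [0.2, 0.7]]                # made-up 2-term x 2-concept matrix

term_nodes = build_term_nodes(U, terms, concept_names)
print(term_nodes[0].name, term_nodes[0].concept_weights["concept-1"])  # graph 0.9
```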


Graph Construction

A bipartite graph is constructed with concept nodes and term nodes

A concept node is connected to all term nodes that belong to it.

A term node is connected to all concept nodes to which it belongs.
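The bipartite structure can be held as two adjacency maps, one per side of the graph. This is a sketch under assumed names, and the memberships in the example are invented for illustration.

```python
from collections import defaultdict


def build_bipartite_graph(memberships):
    """Build both adjacency maps from a mapping of concept name -> member terms."""
    concept_to_terms = defaultdict(set)
    term_to_concepts = defaultdict(set)
    for concept, terms in memberships.items():
        for term in terms:
            # Every edge joins one concept node and one term node.
            concept_to_terms[concept].add(term)
            term_to_concepts[term].add(concept)
    return dict(concept_to_terms), dict(term_to_concepts)


memberships = {  # invented example edges
    "Concept 1": ["Term 1", "Term 2", "Term 3"],
    "Concept 2": ["Term 3", "Term 4", "Term 5"],
}

concept_to_terms, term_to_concepts = build_bipartite_graph(memberships)
print(sorted(term_to_concepts["Term 3"]))  # ['Concept 1', 'Concept 2']
```

Keeping both maps makes the two lookups symmetric: terms of a concept and concepts of a term are each a single dictionary access.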


Graph Construction (contd)

[Figure: example bipartite graph with concept nodes Concept 1 and Concept 2 and term nodes Term 1 to Term 5]


Graphical User Interface (GUI)


GUI (contd)

GUI consists of

• Concepts list

• Terms list

• Display for bipartite graph

• Display for list of files in ontology


GUI

To view terms related to a concept, user selects that concept from concepts list

To view concepts related to a term, user selects that term from terms list


GUI (contd)

To view only terms related to a specific concept:

• Select that concept from concepts list

• Select checkbox “Display Selected Ones Only”

Result:

• GUI displays ONLY relations between selected terms and concepts


GUI (contd)

To view only concepts related to a term:

• Select that term from terms list

• Select checkbox “Display Selected Ones Only”

Result:

• GUI displays ONLY relations between selected terms and concepts


GUI (contd)

To highlight relationship between a term and a concept:

• Select that term or concept from terms or concepts list

• Click on line connecting term and concept


GUI – File Operations

New

Open

Save

saveAs

Close

Exit


GUI – Ontology Updates

Add

Delete

changeSVDThreshold

changeConcThreshold

foldInDoc

defaultBuild


GUI – Ontology Updates

Add:

• Click on Add

• Select the file to be added from the file chooser popup menu

• Choose whether to build now or not

• If yes, the document is added and displayed

• If no, the GUI remains unchanged


GUI – Ontology Updates

Delete:

• Click on Delete

• Select the file to be deleted from the file chooser popup menu

• Choose whether to build now or not

• If yes, the document is deleted and the ontology display is updated

• If no, the GUI remains unchanged


GUI – Ontology Updates

changeSVDThreshold:

• SVDThreshold controls how many of the largest singular values (s) are selected from S

• Default value is 70%, i.e. only the singular values higher than 70% of the highest singular value are selected

• User can change this default value
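The threshold rule can be written as a short function. This is a sketch of the rule as stated on the slide, with a hypothetical function name and made-up singular values; the comparison is strict because the slide says "higher than".

```python
def count_selected_singular_values(singular_values, threshold=0.70):
    """Return s, the number of singular values above threshold * the largest one."""
    if not singular_values:
        return 0
    cutoff = threshold * max(singular_values)
    return sum(1 for value in singular_values if value > cutoff)


values = [10.0, 8.0, 6.5, 2.0]  # made-up singular values in descending order

print(count_selected_singular_values(values))                 # 2 (cutoff is 7.0)
print(count_selected_singular_values(values, threshold=0.5))  # 3 (cutoff is 5.0)
```

Lowering the threshold keeps more singular values, which yields more concept nodes.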


GUI – Ontology Updates

changeConcThreshold:

• Controls the number of terms related to a concept based upon term weight

• Default value is 70%, i.e. only the terms with weights higher than 70% of the highest term weight are selected

• User can change this default value
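The same kind of rule applied to the terms of one concept might look like the sketch below. The function name and weights are illustrative. Comparing absolute values is an assumption made here because entries of U can be negative; the slides do not say how signs are handled.

```python
def select_concept_terms(term_weights, threshold=0.70):
    """Keep the terms whose weight exceeds threshold * the highest weight."""
    if not term_weights:
        return {}
    # Assumption: compare magnitudes, since entries of U can be negative.
    highest = max(abs(w) for w in term_weights.values())
    cutoff = threshold * highest
    return {t: w for t, w in term_weights.items() if abs(w) > cutoff}


weights = {"graph": 0.9, "node": 0.8, "term": 0.5, "weight": 0.3}  # made-up weights

selected = select_concept_terms(weights)
print(sorted(selected))  # ['graph', 'node']
```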


GUI – Ontology Modifications

Rename

• Renames a selected concept

DelTerm

• Deletes a selected term

Undo

• Discards the last modification and returns to the previous state


Future Work

To investigate less expensive methods for adding new documents:

• Fold-In

• SVD update


Future Work

Fold-In:

• A method to add new document(s) to an existing ontology

• Uses the existing data in document addition process

• Less expensive process than the regular build method
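For reference, the standard LSI folding-in step projects a new document's term-weight vector d into the existing concept space as d_hat = d^T U_s S_s^{-1}, without recomputing the SVD. The slides do not give a formula, so whether DOE uses exactly this form is an assumption; the sketch below assumes NumPy and uses a made-up matrix.

```python
import numpy as np


def fold_in_document(d, U_s, singular_values_s):
    """Project a term-weight vector d (length m) into the s-dimensional concept space.

    Standard LSI folding-in: d_hat = d^T U_s S_s^{-1}.
    """
    return (d @ U_s) / singular_values_s


A = np.array([  # made-up 4-term x 3-document matrix
    [1.0, 0.0, 1.0],
    [0.0, 1.0, 0.0],
    [1.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
])

U, singular_values, Vt = np.linalg.svd(A, full_matrices=False)
s = 2
U_s, sv_s, Vt_s = U[:, :s], singular_values[:s], Vt[:s, :]

# Sanity check: folding in an existing document reproduces its column of V_s^T.
d_hat = fold_in_document(A[:, 0], U_s, sv_s)
print(np.allclose(d_hat, Vt_s[:, 0]))  # True
```

Folding in costs one matrix-vector product per new document, which is why it is cheaper than a full rebuild; the existing U_s and S_s stay fixed, so the new document does not influence the concepts themselves.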


Acknowledgements

We express our appreciation to

• Department of Defense

• University of Maryland, Baltimore County

• Advisors, Bowie State University