Semantic Web in Action Ontology-driven information search, integration and analysis NASA Virtual...

39
Semantic Web in Action Ontology-driven information search, integration and analysis NASA Virtual Iron Bird Workshop, NASA Ames, March 31, 2004 Amit Sheth Semagix , Inc. and LSDIS Lab , University of Georgia

Transcript of Semantic Web in Action Ontology-driven information search, integration and analysis NASA Virtual...

Page 1: Semantic Web in Action Ontology-driven information search, integration and analysis NASA Virtual Iron Bird Workshop, NASA Ames, March 31, 2004 Amit Sheth.

Semantic Web in ActionOntology-driven information search, integration and analysis

NASA Virtual Iron Bird Workshop, NASA Ames, March 31, 2004

Amit Sheth Semagix, Inc. and LSDIS Lab, University of Georgia

Page 2: Semantic Web in Action Ontology-driven information search, integration and analysis NASA Virtual Iron Bird Workshop, NASA Ames, March 31, 2004 Amit Sheth.

Some actual caveats that have been expressed by researchers

“As a constituent technology, ontology work of this sort is defensible. As the basis for programmatic research and implementation, it is a speculative and immature technology of uncertain promise.”

“Users will be able to use programs that can understand semantics of the data to help them answer complex questions … This sort of hyperbole is characteristic of much of the genre of semantic web conjectures, papers, and proposals thus far. It is reminiscent of the AI hype of a decade ago and practical systems based on these ideas are no more in evidence now than they were then.”

“Such research is fashionable at the moment, due in part to support from defense agencies, in part because the Web offers the first distributed environment that makes even the dream seem tractable.”

“It (proposed research in Semantic Web) pre-supposes the availability of semantic information extracted from the base documents -an unsolved problem of many years, …”

“Google has shown that huge improvements in search technology can be made without understanding semantics. Perhaps after a certain point, semantics are needed for further improvements, but a better argument is needed.”

Oblivious to recent progress? Lack of pragmatic approach?Narrow vision of semantics and approaches to semantics?

Page 3: Semantic Web in Action Ontology-driven information search, integration and analysis NASA Virtual Iron Bird Workshop, NASA Ames, March 31, 2004 Amit Sheth.

Paradigm shift over time: Syntax -> Semantics

Increasing sophistication in applying semantics Relevant Information (Semantic Search & Browsing) Semantic Information Interoperability and Integration Semantic Correlation/Association, Analysis, Early Warning

Page 4: Semantic Web in Action Ontology-driven information search, integration and analysis NASA Virtual Iron Bird Workshop, NASA Ames, March 31, 2004 Amit Sheth.

Empirical observations based on real-world efforts (presented in the Data Engineering paper)

Applications validate the importance of ontology in the current semantic approaches.

Ontology population is critical.

Two of the most fundamental “semantic” techniques are named entity, and semantic ambiguity resolution.

Semi-formal ontologies that may be based on limited expressive power are most practical and useful. Formal or semi-formal ontologies represented in very expressive languages (compared to moderately expressive ones) have in practice, yielded little value in real-world applications.

Page 5: Semantic Web in Action Ontology-driven information search, integration and analysis NASA Virtual Iron Bird Workshop, NASA Ames, March 31, 2004 Amit Sheth.

Empirical observations based on real-world efforts (presented in the Data Engineering paper) … continued

Large scale (also high quality) metadata extraction and semantic annotation is possible.

Support for heterogeneous data is key – it is too hard to deploy separate products within a single enterprise to deal with structured and unstructured data/content management.

Semantic query processing with the ability to query both ontology and metadata to retrieve heterogeneous content is highly valuable.

A vast majority of the Semantic (Web) applications that have been developed or envisioned rely on three crucial capabilities namely ontology creation, semantic annotation and querying/inferencing.

Page 6: Semantic Web in Action Ontology-driven information search, integration and analysis NASA Virtual Iron Bird Workshop, NASA Ames, March 31, 2004 Amit Sheth.

Ontology at the heart of the Semantic Web

Ontology provides underpinning for semantic techniques in information systems.

A model/representation of the real world (relevant set of interconnected concepts, entities, attributes, relationships, domain vocabulary and factual knowledge).

Basis of capturing agreement, and of applying knowledge

Enabler for improved information systems functionalities and the Semantic Web

Ontology = Schema (Description) + Knowledge Base (Description Base)

i.e, both T-nodes and A-nodes

Page 7: Semantic Web in Action Ontology-driven information search, integration and analysis NASA Virtual Iron Bird Workshop, NASA Ames, March 31, 2004 Amit Sheth.

Gen. Purpose,Broad Based

Scope of AgreementTask/ App

Domain Industry

CommonSense

Degre

e o

f Ag

reem

en

t

Info

rmal

Sem

i-Form

al

Form

al

Agreement About

Data/Info.

Function

Execution

Qos

Broad Scope of Semantic (Web) Technology

Oth

er d

ime

nsio

ns:

how

ag

reem

ents

are

re

ach

ed

,…

Current Semantic Web Focus

Semantic Web Processes

Lots of Useful

SemanticTechnology

(interoperability,Integration)

Cf: Guarino, Gruber

Page 8: Semantic Web in Action Ontology-driven information search, integration and analysis NASA Virtual Iron Bird Workshop, NASA Ames, March 31, 2004 Amit Sheth.

Types of Ontologies (or things close to ontology)

Upper ontologies: modeling of time, space, process, etc

Broad-based or general purpose ontology/nomenclatures: Cyc, CIRCA ontology (Applied Semantics), WordNet

Domain-specific or Industry specific ontologies News: politics, sports, business, entertainment

Financial Market

Terrorism

(GO (a nomenclature), UMLS inspired ontology, …)

Application Specific and Task specific ontologies Anti-money laundering

Equity Research

Page 9: Semantic Web in Action Ontology-driven information search, integration and analysis NASA Virtual Iron Bird Workshop, NASA Ames, March 31, 2004 Amit Sheth.

Ontology-driven Information Systems are becoming reality

Software and practical tools to support key capabilities and requirements for such a system are now available:

Ontology creation and maintenance

Knowledge-based (and other techniques) supporting Automatic Classification

Ontology-driven Semantic Metadata Extraction/Annotation

Semantic applications utilizing semantic metadata and ontology

Achieved in the context of successful technology transfer from academic research

(LSDIS lab, UGA’s SCORE technology) into commercial product (Semagix’s Freedom)

Page 10: Semantic Web in Action Ontology-driven information search, integration and analysis NASA Virtual Iron Bird Workshop, NASA Ames, March 31, 2004 Amit Sheth.

Practical Experiences on Ontology Management today

What types of ontologies are needed and developed for semantic applications today?

Is there a typical ontology?

How are such ontologies built?

Who builds them? How long it takes? How are ontologies maintained?

People (expertise), time, money

How large ontologies become (scalability)?

How are ontologies used and what are computational issues?

Page 11: Semantic Web in Action Ontology-driven information search, integration and analysis NASA Virtual Iron Bird Workshop, NASA Ames, March 31, 2004 Amit Sheth.

Semagix Freedom Architecture (a platform for building ontology-driven information system)

© Semagix, Inc.

Page 12: Semantic Web in Action Ontology-driven information search, integration and analysis NASA Virtual Iron Bird Workshop, NASA Ames, March 31, 2004 Amit Sheth.

Practical Ontology Development Observation by Semagix

Ontologies Semagix has designed:

Few classes to many tens of classes and relationships (types); very small number of designers/knowledge experts; descriptional component (schema) designed with GUI

Hundreds of thousands to several million entities and relationships (instances/assertions/description base)

Few to tens of knowledge sources; populated mostly automatically by knowledge extractors

Primary scientific challenges faced: entity ambiguity resolution and data cleanup

Total effort: few person weeks

Page 13: Semantic Web in Action Ontology-driven information search, integration and analysis NASA Virtual Iron Bird Workshop, NASA Ames, March 31, 2004 Amit Sheth.

SWETO

Semantic Web Technology Evaluation Ontology

“semantic test-bed”

“ontology with large instances dataset”

Public (non-commercial) use ontology built by the LSDIS lab in a NSF funded project using commercial product Semagix Freedom

Page 14: Semantic Web in Action Ontology-driven information search, integration and analysis NASA Virtual Iron Bird Workshop, NASA Ames, March 31, 2004 Amit Sheth.

Creating SWETO

Data Sources Selection

semi-structured format

with interconnected instances

with entities having rich metadata

public and open sources preferred

Page 15: Semantic Web in Action Ontology-driven information search, integration and analysis NASA Virtual Iron Bird Workshop, NASA Ames, March 31, 2004 Amit Sheth.

Big Picture

Ontology /knowledge base

MassiveMetadata

Store

API

Ope

n/pr

oprie

tary

Het

erog

eneo

us D

ata

So

urce

s

documents

databases

Html pages

emails

XMLfeeds

TrustedSources

populates

Semi-structured

data

populates

Knowledge Discovery Algorithms

SWETO WebService Browsing

Page 16: Semantic Web in Action Ontology-driven information search, integration and analysis NASA Virtual Iron Bird Workshop, NASA Ames, March 31, 2004 Amit Sheth.

Statistics

Subset of classes in the ontology # Instances

Cities, countries, and states 2,902

Airports 1,515

Companies, and banks 30,948

Terrorist attacks, and organizations 1,511

Persons and researchers 307,417

Scientific publications 463,270

Journals, conferences, and books 4,256

TOTAL (as of January 2004) 811,819

Page 17: Semantic Web in Action Ontology-driven information search, integration and analysis NASA Virtual Iron Bird Workshop, NASA Ames, March 31, 2004 Amit Sheth.

Statistics …

relationships (subset)

# Explicit relations

located in 30,809

responsible for (event) 1,425

Listed author in 1,045,719

(paper) published in 467,367

Page 18: Semantic Web in Action Ontology-driven information search, integration and analysis NASA Virtual Iron Bird Workshop, NASA Ames, March 31, 2004 Amit Sheth.

Browsing

Page 19: Semantic Web in Action Ontology-driven information search, integration and analysis NASA Virtual Iron Bird Workshop, NASA Ames, March 31, 2004 Amit Sheth.

Entity Disambiguation

Disambiguation type # Times used

Automatic (Freedom) 248,151

Manual 210

Unresolved (Removed) 591

Page 20: Semantic Web in Action Ontology-driven information search, integration and analysis NASA Virtual Iron Bird Workshop, NASA Ames, March 31, 2004 Amit Sheth.

SWETO on the Web

http://lsdis.cs.uga.edu/Projects/SemDis/Sweto/

Ontology (schema) in OWL

Visualization of the ontology

Quality, Size

Sources

API

Access via Web Service (to be released)

Check it out!

http://lsdis.cs.uga.edu/Projects/SemDis/Sweto/

Page 21: Semantic Web in Action Ontology-driven information search, integration and analysis NASA Virtual Iron Bird Workshop, NASA Ames, March 31, 2004 Amit Sheth.

WWW, EnterpriseRepositories

METADATAMETADATA

EXTRACTORSEXTRACTORS

Digital Maps

NexisUPIAPFeeds/

Documents

Digital Audios

Data Stores

Digital Videos

Digital Images. . .

. . . . . .

Create/extract as much (semantics)metadata automatically as possible, from: Any format (HTML, XML, RDB, text, docs)Many mediaPush, pullProprietary, Deep Web, Open Source

Metadata extraction from heterogeneous content/data

Page 22: Semantic Web in Action Ontology-driven information search, integration and analysis NASA Virtual Iron Bird Workshop, NASA Ames, March 31, 2004 Amit Sheth.

Automatic Semantic Annotation of Text:Entity and Relationship Extraction

KB, statistical and linguistic

techniques

Page 23: Semantic Web in Action Ontology-driven information search, integration and analysis NASA Virtual Iron Bird Workshop, NASA Ames, March 31, 2004 Amit Sheth.

Automatic Semantic Annotation

Limited tagging(mostly syntactic)

COMTEX Tagging

Content‘Enhancement’Rich Semantic

Metatagging

Value-added Semagix Semantic Tagging

Value-addedrelevant metatagsadded by Semagixto existing COMTEX tags:

• Private companies • Type of company• Industry affiliation• Sector• Exchange• Company Execs• Competitors

© Semagix, Inc.

Page 24: Semantic Web in Action Ontology-driven information search, integration and analysis NASA Virtual Iron Bird Workshop, NASA Ames, March 31, 2004 Amit Sheth.

• Solution: In-memory semantic querying (semantic querying in

RAM)

• Complex queries involving Ontology and Metadata

• Incremental indexing

• Distributed indexing

• High performance: 10M queries/hr; less than 10ms for typical

search queries

• 2 orders of magnitude faster than RDBMS for complex

analytical queries

• Knowledge APIs provide a Java, JSP or an HTTP-based interface

for querying the Ontology and Metadata

Semantic Query Processing and Analytics

Page 25: Semantic Web in Action Ontology-driven information search, integration and analysis NASA Virtual Iron Bird Workshop, NASA Ames, March 31, 2004 Amit Sheth.

What can current semantic technology do? (sample application)

Semi-automated (mostly automated) annotation of resources of various, heterogeneous sources (unstructured%, semi-structured, structured data; media content)*

Creation of large knowledge bases (ontology population) from the trusted sources *

Semantic Search and Browsing @, &

Unified access to multiple sources (semantic integration) *,#

Inferenceing #

Relationship/knowledge discovery among the annotated resources and entities; analytics*%

Both implicit^ and explicit* relationships

* Commercial: Semagix; @ Commercial: Taalee Semantic Search (Sheth)# Commercial: Network Inference; %: Near-commercial: IBM/SemTAP& IBM: McCool, Guha, ^ LSDIS-UGA Research

Page 26: Semantic Web in Action Ontology-driven information search, integration and analysis NASA Virtual Iron Bird Workshop, NASA Ames, March 31, 2004 Amit Sheth.

BLENDED BROWSING & QUERYING INTERFACEBLENDED BROWSING & QUERYING INTERFACE

ATTRIBUTE & KEYWORDQUERYING

ATTRIBUTE & KEYWORDQUERYING

uniform view of worldwide distributed assets of similar type

SEMANTIC BROWSINGSEMANTIC BROWSING

Targeted e-shopping/e-commerce

assets access

VideoAnywhere and Taalee Semantic Search Engine

Page 27: Semantic Web in Action Ontology-driven information search, integration and analysis NASA Virtual Iron Bird Workshop, NASA Ames, March 31, 2004 Amit Sheth.
Page 28: Semantic Web in Action Ontology-driven information search, integration and analysis NASA Virtual Iron Bird Workshop, NASA Ames, March 31, 2004 Amit Sheth.

Focused relevantcontent

organizedby topic

(semantic categorization)

Automatic ContentAggregationfrom multiple

content providers and feeds

Related relevant content not

explicitly asked for (semantic

associations)

Competitive research inferred

automatically

Automatic 3rd party content

integration

Equity Research Dashboard with Blended Semantic Querying and Browsing

Page 29: Semantic Web in Action Ontology-driven information search, integration and analysis NASA Virtual Iron Bird Workshop, NASA Ames, March 31, 2004 Amit Sheth.

Blended Semantic Browsing and Querying (Intelligence Analyst Workbench)

Page 30: Semantic Web in Action Ontology-driven information search, integration and analysis NASA Virtual Iron Bird Workshop, NASA Ames, March 31, 2004 Amit Sheth.

Application to semantic analysis/intelligence

Documentary content and factual evidence are integrated semantically via semantic metadata Intelligence sub-domain ontology

Group

Alias

Person

Country

Bank Account in

Has alias

Has email

Involved in

Occurred at

Works for/leads

Location Time

Email Add

Event

Occurred at

Originated in

Is funded by/works with

Watch-list

Appears on Watch-list

Appears on

Has position

RoleCocaine scandal sets society hearts fluttering Investigators are attempting to establish whether the suspect, Palermo businessman Alessandro Martello, was bluffing when he claimed to work as Mr Micciche's assistant and to have the use of an office in the ministry's Rome headquarters.

Mr Martello's arrest warrant, signed last week along with 10 others, alleged that he had not hesitated to deliver a consignment of cocaine inside the ministry itself, confident in the knowledge that his influential connections would protect him from suspicion.

The cocaine scandal has been a gift for the opposition, which promptly tabled a parliamentary question for the economics minister, Giulio Tremonti, asking how many times the alleged pusher had visited his ministry and whether it was true that he had the use of an office there.

The minister has yet to reply, but it has emerged that Mr Martello's frequent visits were the result of his work as a consultant for a company promoting investment in southern Italy. "What he does in his private life has nothing to do with us," his now ex-employer said.

The businessman, who is now in prison, asked the junior minister to intercede on his behalf with a bank where he was having trouble opening an account. Mr Micciche, addressed familiarly as "Gianfrancuccio", said he would see what he could do.

Prime minister Silvio Berlusconi is already engaged in an extenuating personal battle with Milan's anti-corruption magistrates, so the alleged drug entanglements of a junior Sicilian minister are the last thing he needs. Italian government Italian parliament Italian president Italian government

Classification Metadata: Cocaine seizure investigation

Semantic Metadata extracted from the article:Person is “Giulio Tremonti”Position of “Giulio Tremonti” is “Economics Minister”“Guilio Tremonti” appears on Watchlist “PEP”Group is Political party “Integrali”“Integrali” is the “Italian Government”“Italian Government” is based in “Rome”

Corroborating Evidence

Corroborating Evidence

Evidence

© Semagix, Inc.

Page 31: Semantic Web in Action Ontology-driven information search, integration and analysis NASA Virtual Iron Bird Workshop, NASA Ames, March 31, 2004 Amit Sheth.

Mechanisms for querying about and retrieving complex relationships between entities.

AB

C

1. A is related to B by x.y.z

x y

z

z’

y’ 2. A is related to C by i. x.y’.z’

u

v

ii. u.v (undirected path)

3. A is “related similarly” to B as it is to C (y’ y and z’ z x.y.z x.y’.z’) So are B and C related?

?

Semantic Associations: Beyond simple relationships

Page 32: Semantic Web in Action Ontology-driven information search, integration and analysis NASA Virtual Iron Bird Workshop, NASA Ames, March 31, 2004 Amit Sheth.

Context: Why, What, How?

Context => Relevance; Reduction in computation space

Context captures the users’ interest to provide the user with the relevant knowledge within numerous relationships between the entities

By defining regions (or sub-graphs) of the ontology we are capturing the areas of interest of the user

Page 33: Semantic Web in Action Ontology-driven information search, integration and analysis NASA Virtual Iron Bird Workshop, NASA Ames, March 31, 2004 Amit Sheth.

Context Weight - Example

Region1: Financial Domain, weight=0.50

Region2: Terrorist Domain, weight=0.75

e7:TerroristOrganization

e4:TerroristOrganization

e8:TerroristAttack

e6:FinancialOrganization

e2:FinancialOrganization

e1:Person

e9:Location

e5:Person

friend Of

member Of

located In

e3:Organization supportshas Account

located Inworks For

member Of

involved In

at location

Page 34: Semantic Web in Action Ontology-driven information search, integration and analysis NASA Virtual Iron Bird Workshop, NASA Ames, March 31, 2004 Amit Sheth.

Anti Money Laundering – Know Your Customer

Risk Profiles are developed for individuals or companies. If therisk profile changes based onnew information the individuals Risk Profile and Branch Aggregate Risk Profile is automatically updated

R

Page 35: Semantic Web in Action Ontology-driven information search, integration and analysis NASA Virtual Iron Bird Workshop, NASA Ames, March 31, 2004 Amit Sheth.

View Risk Scores for a specific company or customer

Page 36: Semantic Web in Action Ontology-driven information search, integration and analysis NASA Virtual Iron Bird Workshop, NASA Ames, March 31, 2004 Amit Sheth.

Semantic Querying, Browsing, Integration to find potential antifungal drug targets

Databases fordifferent organisms

Is this or similar gene in other organism? (most Antifungals are associated with Sterol mechanism)

Services: BLAST, Co-expression analysis, Phylogeny; If annotated, directly access DB, else use BLAST to normalize

FGDB

Page 37: Semantic Web in Action Ontology-driven information search, integration and analysis NASA Virtual Iron Bird Workshop, NASA Ames, March 31, 2004 Amit Sheth.

Future: Geospatial Semantic Analytics: Thematic + Space + Time

Page 38: Semantic Web in Action Ontology-driven information search, integration and analysis NASA Virtual Iron Bird Workshop, NASA Ames, March 31, 2004 Amit Sheth.

Future: 3D Semantic Geo-visualization of Movements in Space and Time

Page 39: Semantic Web in Action Ontology-driven information search, integration and analysis NASA Virtual Iron Bird Workshop, NASA Ames, March 31, 2004 Amit Sheth.

Conclusion

Great progress from work in semantic information interoperability/integration of early 90s until now, re-energized by the vision of Semantic Web, related standards and technological advances

Technology beyond proof of concept

But lots of difficult research and engineering challenges ahead More:

(Technology) http://www.semagix.com/downloads/downloads.shtml(Research) http://lsdis.cs.uga.edu/proj/SemDis/

Demos available