Semantic Web in Action: Ontology-driven information search, integration and analysis
NASA Virtual Iron Bird Workshop, NASA Ames, March 31, 2004
Amit Sheth Semagix, Inc. and LSDIS Lab, University of Georgia
Some actual caveats that have been expressed by researchers
“As a constituent technology, ontology work of this sort is defensible. As the basis for programmatic research and implementation, it is a speculative and immature technology of uncertain promise.”
“Users will be able to use programs that can understand semantics of the data to help them answer complex questions … This sort of hyperbole is characteristic of much of the genre of semantic web conjectures, papers, and proposals thus far. It is reminiscent of the AI hype of a decade ago and practical systems based on these ideas are no more in evidence now than they were then.”
“Such research is fashionable at the moment, due in part to support from defense agencies, in part because the Web offers the first distributed environment that makes even the dream seem tractable.”
“It (proposed research in Semantic Web) pre-supposes the availability of semantic information extracted from the base documents -an unsolved problem of many years, …”
“Google has shown that huge improvements in search technology can be made without understanding semantics. Perhaps after a certain point, semantics are needed for further improvements, but a better argument is needed.”
Oblivious to recent progress? Lack of pragmatic approach? Narrow vision of semantics and approaches to semantics?
Paradigm shift over time: Syntax -> Semantics
Increasing sophistication in applying semantics:
  Relevant Information (Semantic Search & Browsing)
  Semantic Information Interoperability and Integration
  Semantic Correlation/Association, Analysis, Early Warning
Empirical observations based on real-world efforts (presented in the Data Engineering paper)
Applications validate the importance of ontology in the current semantic approaches.
Ontology population is critical.
Two of the most fundamental “semantic” techniques are named entity identification and semantic ambiguity resolution.
Semi-formal ontologies that may be based on limited expressive power are the most practical and useful. Formal or semi-formal ontologies represented in very expressive languages (compared to moderately expressive ones) have, in practice, yielded little value in real-world applications.
Empirical observations based on real-world efforts (presented in the Data Engineering paper) … continued
Large scale (also high quality) metadata extraction and semantic annotation is possible.
Support for heterogeneous data is key – it is too hard to deploy separate products within a single enterprise to deal with structured and unstructured data/content management.
Semantic query processing with the ability to query both ontology and metadata to retrieve heterogeneous content is highly valuable.
A vast majority of the Semantic (Web) applications that have been developed or envisioned rely on three crucial capabilities, namely ontology creation, semantic annotation, and querying/inferencing.
Ontology at the heart of the Semantic Web
Ontology provides underpinning for semantic techniques in information systems.
A model/representation of the real world (relevant set of interconnected concepts, entities, attributes, relationships, domain vocabulary and factual knowledge).
Basis of capturing agreement, and of applying knowledge
Enabler for improved information systems functionalities and the Semantic Web
Ontology = Schema (Description) + Knowledge Base (Description Base)
i.e., both T-nodes and A-nodes
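The split between schema and knowledge base can be sketched as a small data structure. This is only an illustration and not the Freedom data model: the `Ontology` class, its field names, and the example instances (`ExampleCorp`, `Athens`) are assumptions made for this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class Ontology:
    # Schema (description): classes and typed relationships between classes.
    classes: set = field(default_factory=set)
    relationship_types: dict = field(default_factory=dict)  # name -> (domain, range)
    # Knowledge base (description base): instances and asserted facts.
    instances: dict = field(default_factory=dict)  # entity name -> class
    assertions: list = field(default_factory=list)  # (subject, relationship, object)

    def add_instance(self, name, cls):
        if cls not in self.classes:
            raise ValueError(f"unknown class: {cls}")
        self.instances[name] = cls

    def assert_fact(self, subject, relationship, obj):
        # The schema constrains which facts the knowledge base may hold.
        domain, range_ = self.relationship_types[relationship]
        if (self.instances[subject], self.instances[obj]) != (domain, range_):
            raise ValueError(f"{relationship} expects {domain} -> {range_}")
        self.assertions.append((subject, relationship, obj))


# Hypothetical example: one relationship type and two instances.
onto = Ontology(
    classes={"Company", "City"},
    relationship_types={"located_in": ("Company", "City")},
)
onto.add_instance("ExampleCorp", "Company")
onto.add_instance("Athens", "City")
onto.assert_fact("ExampleCorp", "located_in", "Athens")
print(len(onto.instances), len(onto.assertions))  # 2 1
```

The schema part stays small (a handful of classes and relationship types), while the instance part is what grows when the ontology is populated.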
Broad Scope of Semantic (Web) Technology
[Figure: the space of semantic technology along several dimensions. Scope of Agreement: Task/App; Domain/Industry; Gen. Purpose, Broad Based; Common Sense. Degree of Agreement: Informal; Semi-Formal; Formal. Agreement About: Data/Info.; Function; Execution; QoS. Other dimensions: how agreements are reached, … Marked regions: Current Semantic Web Focus; Semantic Web Processes; Lots of Useful Semantic Technology (interoperability, integration). Cf. Guarino, Gruber]
Types of Ontologies (or things close to ontology)
Upper ontologies: modeling of time, space, process, etc
Broad-based or general purpose ontology/nomenclatures: Cyc, CIRCA ontology (Applied Semantics), WordNet
Domain-specific or industry-specific ontologies
  News: politics, sports, business, entertainment
  Financial Market
  Terrorism
  (GO (a nomenclature), UMLS inspired ontology, …)
Application-specific and task-specific ontologies
  Anti-money laundering
  Equity Research
Ontology-driven Information Systems are becoming reality
Software and practical tools to support key capabilities and requirements for such a system are now available:
Ontology creation and maintenance
Knowledge-based (and other) techniques supporting Automatic Classification
Ontology-driven Semantic Metadata Extraction/Annotation
Semantic applications utilizing semantic metadata and ontology
Achieved in the context of successful technology transfer from academic research (LSDIS lab, UGA’s SCORE technology) into a commercial product (Semagix’s Freedom)
Practical Experiences on Ontology Management today
What types of ontologies are needed and developed for semantic applications today?
Is there a typical ontology?
How are such ontologies built?
Who builds them? How long does it take? How are ontologies maintained?
People (expertise), time, money
How large do ontologies become (scalability)?
How are ontologies used and what are computational issues?
Semagix Freedom Architecture (a platform for building ontology-driven information systems)
© Semagix, Inc.
Practical Ontology Development Observation by Semagix
Ontologies Semagix has designed:
Few classes to many tens of classes and relationships (types); very small number of designers/knowledge experts; descriptional component (schema) designed with GUI
Hundreds of thousands to several million entities and relationships (instances/assertions/description base)
Few to tens of knowledge sources; populated mostly automatically by knowledge extractors
Primary scientific challenges faced: entity ambiguity resolution and data cleanup
Total effort: a few person-weeks
SWETO
Semantic Web Technology Evaluation Ontology
“semantic test-bed”
“ontology with large instances dataset”
Public (non-commercial) use ontology built by the LSDIS lab in an NSF-funded project using the commercial product Semagix Freedom
Creating SWETO
Data Sources Selection
semi-structured format
with interconnected instances
with entities having rich metadata
public and open sources preferred
Big Picture
[Figure labels: Ontology/knowledge base; Massive Metadata Store; API; open/proprietary heterogeneous data sources (documents, databases, HTML pages, emails, XML feeds); Trusted Sources (populates); Semi-structured data (populates); Knowledge Discovery Algorithms]
SWETO WebService Browsing
Statistics
Subset of classes in the ontology        # Instances
Cities, countries, and states                  2,902
Airports                                       1,515
Companies, and banks                          30,948
Terrorist attacks, and organizations           1,511
Persons and researchers                      307,417
Scientific publications                      463,270
Journals, conferences, and books               4,256
TOTAL (as of January 2004)                   811,819
Statistics … relationships (subset)
Relationship                    # Explicit relations
located in                                    30,809
responsible for (event)                        1,425
Listed author in                           1,045,719
(paper) published in                         467,367
Browsing
Entity Disambiguation
Disambiguation type             # Times used
Automatic (Freedom)                  248,151
Manual                                   210
Unresolved (Removed)                     591
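As a worked check on these counts, the share of each disambiguation type can be computed directly. The snippet assumes the three rows are mutually exclusive and together cover all disambiguation cases; the slide does not state this explicitly.

```python
# Counts copied from the Entity Disambiguation table.
counts = {"automatic": 248_151, "manual": 210, "unresolved": 591}

total = sum(counts.values())
# Percentage of all disambiguation cases handled by each type.
shares = {kind: 100 * n / total for kind, n in counts.items()}

for kind, pct in shares.items():
    print(f"{kind}: {pct:.2f}%")
```

Under that assumption, roughly 99.7% of the cases were resolved automatically, which is consistent with the claim that population is mostly automatic.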
SWETO on the Web
http://lsdis.cs.uga.edu/Projects/SemDis/Sweto/
Ontology (schema) in OWL
Visualization of the ontology
Quality, Size
Sources
API
Access via Web Service (to be released)
Check it out!
http://lsdis.cs.uga.edu/Projects/SemDis/Sweto/
[Figure: metadata extractors fed by WWW and enterprise repositories, Nexis/UPI/AP feeds and documents, data stores, digital maps, digital audios, digital videos, digital images, ...]
Create/extract as much (semantic) metadata automatically as possible, from:
  Any format (HTML, XML, RDB, text, docs)
  Many media
  Push, pull
  Proprietary, Deep Web, Open Source
Metadata extraction from heterogeneous content/data
Automatic Semantic Annotation of Text: Entity and Relationship Extraction
KB, statistical and linguistic techniques
Automatic Semantic Annotation
[Figure: a news item with limited (mostly syntactic) COMTEX tagging next to value-added Semagix semantic tagging; content 'enhancement' through rich semantic metatagging]
Value-added relevant metatags added by Semagix to existing COMTEX tags:
• Private companies
• Type of company
• Industry affiliation
• Sector
• Exchange
• Company Execs
• Competitors
• Solution: In-memory semantic querying (semantic querying in RAM)
• Complex queries involving Ontology and Metadata
• Incremental indexing
• Distributed indexing
• High performance: 10M queries/hr; less than 10ms for typical search queries
• 2 orders of magnitude faster than RDBMS for complex analytical queries
• Knowledge APIs provide a Java, JSP or an HTTP-based interface for querying the Ontology and Metadata
Semantic Query Processing and Analytics
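A query that spans both ontology and metadata can be sketched with plain in-memory indexes. This is a minimal illustration, not the Freedom Knowledge API: the entity names, relationship names, document ids, and the `docs_about` function are all hypothetical.

```python
from collections import defaultdict

# Ontology assertions as (subject, relationship, object); names are hypothetical.
ASSERTIONS = [
    ("AcmeBank", "in_sector", "Finance"),
    ("GlobexOil", "in_sector", "Energy"),
    ("InitechBank", "in_sector", "Finance"),
    ("AcmeBank", "competitor_of", "InitechBank"),
]

# Metadata: document id -> entities annotated in that document.
METADATA = {
    "doc1": {"AcmeBank"},
    "doc2": {"GlobexOil"},
    "doc3": {"InitechBank", "GlobexOil"},
}

# Build the indexes once in RAM; queries then reduce to dictionary lookups.
subjects_by_rel_obj = defaultdict(set)  # (relationship, object) -> subjects
docs_by_entity = defaultdict(set)       # entity -> document ids
for subject, relationship, obj in ASSERTIONS:
    subjects_by_rel_obj[(relationship, obj)].add(subject)
for doc_id, entities in METADATA.items():
    for entity in entities:
        docs_by_entity[entity].add(doc_id)


def docs_about(relationship, obj):
    """Documents annotated with any entity that has `relationship` to `obj`."""
    result = set()
    for entity in subjects_by_rel_obj.get((relationship, obj), set()):
        result |= docs_by_entity.get(entity, set())
    return sorted(result)


print(docs_about("in_sector", "Finance"))  # ['doc1', 'doc3']
```

The query never mentions a company by name: the ontology resolves "Finance" to its companies, and the metadata index resolves those companies to documents.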
What can current semantic technology do? (sample application)
Semi-automated (mostly automated) annotation of resources from various heterogeneous sources (unstructured%, semi-structured, structured data; media content)*
Creation of large knowledge bases (ontology population) from the trusted sources *
Semantic Search and Browsing @, &
Unified access to multiple sources (semantic integration) *,#
Inferencing #
Relationship/knowledge discovery among the annotated resources and entities; analytics*%
Both implicit^ and explicit* relationships
* Commercial: Semagix
@ Commercial: Taalee Semantic Search (Sheth)
# Commercial: Network Inference
% Near-commercial: IBM/SemTAP
& IBM: McCool, Guha
^ LSDIS-UGA Research
[Figure: blended browsing and querying interface combining attribute and keyword querying with semantic browsing; uniform view of worldwide distributed assets of similar type; targeted e-shopping/e-commerce; assets access]
VideoAnywhere and Taalee Semantic Search Engine
Focused relevant content organized by topic (semantic categorization)
Automatic content aggregation from multiple content providers and feeds
Related relevant content not explicitly asked for (semantic associations)
Competitive research inferred automatically
Automatic 3rd party content integration
Equity Research Dashboard with Blended Semantic Querying and Browsing
Blended Semantic Browsing and Querying (Intelligence Analyst Workbench)
Application to semantic analysis/intelligence
Documentary content and factual evidence are integrated semantically via semantic metadata
[Figure: Intelligence sub-domain ontology. Classes: Group, Alias, Person, Country, Bank, Location, Time, Email Add, Event, Watch-list, Role. Relationships: Account in, Has alias, Has email, Involved in, Occurred at, Works for/leads, Originated in, Is funded by/works with, Appears on Watch-list, Has position]
Cocaine scandal sets society hearts fluttering
Investigators are attempting to establish whether the suspect, Palermo businessman Alessandro Martello, was bluffing when he claimed to work as Mr Micciche's assistant and to have the use of an office in the ministry's Rome headquarters.
Mr Martello's arrest warrant, signed last week along with 10 others, alleged that he had not hesitated to deliver a consignment of cocaine inside the ministry itself, confident in the knowledge that his influential connections would protect him from suspicion.
The cocaine scandal has been a gift for the opposition, which promptly tabled a parliamentary question for the economics minister, Giulio Tremonti, asking how many times the alleged pusher had visited his ministry and whether it was true that he had the use of an office there.
The minister has yet to reply, but it has emerged that Mr Martello's frequent visits were the result of his work as a consultant for a company promoting investment in southern Italy. "What he does in his private life has nothing to do with us," his now ex-employer said.
The businessman, who is now in prison, asked the junior minister to intercede on his behalf with a bank where he was having trouble opening an account. Mr Micciche, addressed familiarly as "Gianfrancuccio", said he would see what he could do.
Prime minister Silvio Berlusconi is already engaged in an extenuating personal battle with Milan's anti-corruption magistrates, so the alleged drug entanglements of a junior Sicilian minister are the last thing he needs.
Italian government; Italian parliament; Italian president; Italian government
Classification Metadata: Cocaine seizure investigation
Semantic Metadata extracted from the article:
  Person is “Giulio Tremonti”
  Position of “Giulio Tremonti” is “Economics Minister”
  “Giulio Tremonti” appears on Watchlist “PEP”
  Group is Political party “Integrali”
  “Integrali” is the “Italian Government”
  “Italian Government” is based in “Rome”
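The extracted statements map naturally onto subject-relationship-object triples. The sketch below is one possible encoding, not the product's storage format: the relationship identifiers (`is_a`, `has_position`, and so on) and the `facts_about` helper are names chosen for illustration.

```python
# The six statements from the slide as (subject, relationship, object) triples.
triples = [
    ("Giulio Tremonti", "is_a", "Person"),
    ("Giulio Tremonti", "has_position", "Economics Minister"),
    ("Giulio Tremonti", "appears_on_watchlist", "PEP"),
    ("Integrali", "is_a", "Political party"),
    ("Integrali", "is", "Italian Government"),
    ("Italian Government", "based_in", "Rome"),
]


def facts_about(entity):
    """All (relationship, object) pairs asserted for one entity."""
    return [(rel, obj) for subj, rel, obj in triples if subj == entity]


for rel, obj in facts_about("Giulio Tremonti"):
    print(rel, "->", obj)
```

Once the article's metadata is in this form, it can be matched against the intelligence ontology and against facts from other sources, which is what makes corroboration possible.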
Corroborating Evidence
Mechanisms for querying about and retrieving complex relationships between entities.
[Figure: graph in which A reaches B through edges x, y, z, and A reaches C through edges x, y’, z’ and through edges u, v]
1. A is related to B by x.y.z
2. A is related to C by
   i. x.y’.z’
   ii. u.v (undirected path)
3. A is “related similarly” to B as it is to C (y’ ⊂ y and z’ ⊂ z ⇒ x.y.z ≅ x.y’.z’)
So are B and C related?
Semantic Associations: Beyond simple relationships
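Items 1 and 2 amount to enumerating labelled paths between two entities. The sketch below is a plain depth-first search over a small edge list, offered as an illustration rather than the algorithm used in the SemDis project. The intermediate nodes `n1`, `n2`, `n3`, `m` and the edge directions are assumptions, since the slide names only A, B, C and the edge labels. The similarity test of item 3 is not covered.

```python
# Labelled, directed edges: (source, label, target). Intermediate nodes are hypothetical.
EDGES = [
    ("A", "x", "n1"),
    ("n1", "y", "n2"),
    ("n2", "z", "B"),
    ("n1", "y'", "n3"),
    ("n3", "z'", "C"),
    ("A", "u", "m"),
    ("C", "v", "m"),
]


def associations(src, dst, max_len=3, directed=True):
    """Label sequences of simple paths from src to dst with at most max_len edges."""
    neighbours = {}
    for s, label, o in EDGES:
        neighbours.setdefault(s, []).append((label, o))
        if not directed:
            # An undirected search may also follow an edge against its direction.
            neighbours.setdefault(o, []).append((label, s))

    found = []

    def walk(node, labels, seen):
        if node == dst:
            found.append(".".join(labels))
            return
        if len(labels) == max_len:
            return
        for label, nxt in neighbours.get(node, []):
            if nxt not in seen:
                walk(nxt, labels + [label], seen | {nxt})

    walk(src, [], {src})
    return found


print(associations("A", "B"))                  # ['x.y.z']
print(associations("A", "C", directed=False))  # ["x.y'.z'", 'u.v']
```

The u.v association appears only in the undirected search, because in this sketch the v edge points from C to m and therefore cannot be followed from A in a directed traversal.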
Context: Why, What, How?
Context => Relevance; Reduction in computation space
Context captures the user’s interest so that, out of the numerous relationships between entities, the user is given only the relevant knowledge
By defining regions (or sub-graphs) of the ontology we are capturing the areas of interest of the user
Context Weight - Example
Region1: Financial Domain, weight=0.50
Region2: Terrorist Domain, weight=0.75
[Figure: ontology graph with the two regions highlighted. Entities: e1:Person, e2:FinancialOrganization, e3:Organization, e4:TerroristOrganization, e5:Person, e6:FinancialOrganization, e7:TerroristOrganization, e8:TerroristAttack, e9:Location. Relationship labels: friend Of, member Of, located In, supports, has Account, works For, involved In, at location]
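One simple way to turn region weights into a ranking is to score each candidate path by the average weight of the regions its entities fall into. This is a sketch under stated assumptions, not the ranking formula from the LSDIS work: the mapping from classes to regions, the averaging rule, and the two candidate paths are all made up for illustration. Only the entity classes and the two weights come from the slide.

```python
# Region weights from the slide.
REGION_WEIGHT = {"Financial": 0.50, "Terrorist": 0.75}

# Assumed mapping from ontology class to region; other classes lie outside both.
CLASS_REGION = {
    "FinancialOrganization": "Financial",
    "TerroristOrganization": "Terrorist",
    "TerroristAttack": "Terrorist",
}

# Entity classes as labelled in the figure.
ENTITY_CLASS = {
    "e1": "Person", "e2": "FinancialOrganization", "e3": "Organization",
    "e4": "TerroristOrganization", "e5": "Person", "e6": "FinancialOrganization",
    "e7": "TerroristOrganization", "e8": "TerroristAttack", "e9": "Location",
}


def context_weight(path):
    """Average region weight over the entities of a path (0 outside any region)."""
    weights = [
        REGION_WEIGHT.get(CLASS_REGION.get(ENTITY_CLASS[entity]), 0.0)
        for entity in path
    ]
    return sum(weights) / len(weights)


# Two hypothetical associations between e1 and e9.
path_a = ["e1", "e2", "e3", "e9"]
path_b = ["e1", "e5", "e7", "e8", "e9"]
ranked = sorted([path_a, path_b], key=context_weight, reverse=True)
print([round(context_weight(p), 3) for p in ranked])  # [0.3, 0.125]
```

With these weights, the path through the terrorist region ranks above the path through the financial region, which matches the intent of giving Region2 the higher weight.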
Anti Money Laundering – Know Your Customer
Risk Profiles are developed for individuals or companies. If the risk profile changes based on new information, the individual's Risk Profile and the Branch Aggregate Risk Profile are automatically updated.
View Risk Scores for a specific company or customer
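The automatic update of the branch aggregate can be sketched by deriving the aggregate from the individual scores on every read. The `Branch` class, the numeric score scale, and the use of a simple mean are assumptions for illustration; the slides do not say how scores are computed or combined.

```python
class Branch:
    """Holds individual risk scores and derives the branch aggregate from them."""

    def __init__(self, name):
        self.name = name
        self._scores = {}  # customer id -> current risk score

    def update_risk(self, customer, score):
        # Called whenever new information changes a customer's risk profile.
        self._scores[customer] = score

    @property
    def aggregate_risk(self):
        # Recomputed on access, so it always reflects the latest individual scores.
        if not self._scores:
            return 0.0
        return sum(self._scores.values()) / len(self._scores)


branch = Branch("Downtown")          # hypothetical branch and customers
branch.update_risk("customer-1", 20)
branch.update_risk("customer-2", 40)
before = branch.aggregate_risk       # 30.0
branch.update_risk("customer-1", 80)  # new information raises one profile
after = branch.aggregate_risk        # 60.0
print(before, after)
```

Because the aggregate is never stored separately, it cannot drift out of step with the individual profiles.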
Semantic Querying, Browsing, Integration to find potential antifungal drug targets
Databases for different organisms
Is this or a similar gene present in other organisms? (Most antifungals are associated with the sterol mechanism.)
Services: BLAST, Co-expression analysis, Phylogeny; If annotated, directly access DB, else use BLAST to normalize
FGDB
Future: Geospatial Semantic Analytics: Thematic + Space + Time
Future: 3D Semantic Geo-visualization of Movements in Space and Time
Conclusion
Great progress from the work on semantic information interoperability/integration of the early 90s until now, re-energized by the vision of the Semantic Web, related standards, and technological advances
Technology beyond proof of concept
But lots of difficult research and engineering challenges ahead
More:
(Technology) http://www.semagix.com/downloads/downloads.shtml
(Research) http://lsdis.cs.uga.edu/proj/SemDis/
Demos available