Semantic Search: different meanings
![Page 1: Semantic Search: different meanings](https://reader036.fdocuments.us/reader036/viewer/2022081505/568165d6550346895dd8e2d0/html5/thumbnails/1.jpg)
Semantic Search: different meanings
![Page 2: Semantic Search: different meanings](https://reader036.fdocuments.us/reader036/viewer/2022081505/568165d6550346895dd8e2d0/html5/thumbnails/2.jpg)
Semantic search: different meanings
• Definition 1: Semantic search as the problem of searching documents beyond the syntactic level of matching keywords
– Hakia, PowerSet, SearchMonkey
• Definition 2: Semantic search as the problem of searching large semantic web datasets
– Watson, PowerAqua, Swoogle, Sindice, SWSE
![Page 3: Semantic Search: different meanings](https://reader036.fdocuments.us/reader036/viewer/2022081505/568165d6550346895dd8e2d0/html5/thumbnails/3.jpg)
Facing keyword-based search problems
• Relations between search terms:
– “books about recommender systems” vs. “systems that recommend books”
• Polysemy
– “mouth” as part of the body vs. “mouth” as part of a stream
• Synonymy
– “movies” vs. “films”
• Documents about individuals where query keywords do not appear:
– “English banks”, individual “Abbey”
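The problems listed above can be reproduced with a few lines of Python. This is only a minimal sketch of bag-of-words matching: the stopword list, the one-entry stem table, and the sample document are assumptions made up for the illustration, not part of any real search engine.

```python
# Hand-made helpers for this example only (assumptions, not a real IR pipeline).
STOPWORDS = {"about", "that"}
STEMS = {"recommender": "recommend"}

def bag(text):
    """Turn a text into a set of normalized keywords (order is lost)."""
    words = (w.strip(".,").lower() for w in text.split())
    return {STEMS.get(w, w) for w in words if w and w not in STOPWORDS}

q1 = "books about recommender systems"
q2 = "systems that recommend books"

# Relations between terms: both queries collapse to the same keyword set.
same_bag = bag(q1) == bag(q2)

# Synonymy: "movies" and "films" share no keyword at all.
shared_synonyms = bag("movies") & bag("films")

# Individuals: a document about Abbey never mentions "English" or "banks".
shared_individual = bag("English banks") & bag("Abbey opens a new branch")

print(same_bag, shared_synonyms, shared_individual)
```

The two queries become indistinguishable once word order is dropped, while the synonym pair and the individual "Abbey" produce no overlap with their queries.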
![Page 4: Semantic Search: different meanings](https://reader036.fdocuments.us/reader036/viewer/2022081505/568165d6550346895dd8e2d0/html5/thumbnails/4.jpg)
Several attempts from the IR community
• Early 80s: elaboration of conceptual frameworks and their introduction in IR models
– Taxonomies (categories + hierarchical relations), e.g., the ODP (Open Directory Project)
– Thesauri (categories + fixed hierarchical & associative relations), e.g., WordNet (used by linguistic approaches)
– Algebraic methods such as LSA
• Limitations: the level of conceptualization is often shallow (especially at the level of relations)
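As a rough illustration of how a thesaurus can be plugged into keyword retrieval, the sketch below expands a query with synonyms before matching. The two-entry `THESAURUS` dictionary is a made-up stand-in for a resource such as WordNet; it does not use any WordNet API.

```python
# Toy thesaurus (assumption): a real system would consult a resource like WordNet.
THESAURUS = {"movies": {"films"}, "films": {"movies"}}

def expand(query):
    """Return the query terms plus their synonyms from the toy thesaurus."""
    terms = set(query.lower().split())
    for term in list(terms):
        terms |= THESAURUS.get(term, set())
    return terms

def matches(query, document):
    """True if any expanded query term occurs in the document."""
    return bool(expand(query) & set(document.lower().split()))

print(matches("movies", "classic films of the 1950s"))
```

Expansion of this kind helps with synonymy, but it carries no information about how the query terms relate to each other, which is the shallowness the slide points to.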
![Page 5: Semantic Search: different meanings](https://reader036.fdocuments.us/reader036/viewer/2022081505/568165d6550346895dd8e2d0/html5/thumbnails/5.jpg)
The emergence of the SW
• Late 90s: introduction of ontologies as conceptual framework (classes + instances (KBs) + arbitrary semantic relations + rules)
– Semantic search: Exploiting ontologies as richer conceptualizations & formal languages to enhance traditional keyword-based document retrieval
– Semantic search: Need to search this emergent and continuously growing structured information space (the Web of Data)
• DBLP, Geonames, DBpedia, BBC Music, ... (http://esw.w3.org/TaskForces/CommunityProjects/LinkingOpenData/DataSets)
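To make "classes + instances + arbitrary semantic relations" concrete, here is a small sketch of a knowledge base stored as subject-predicate-object triples, queried by pattern matching. The triples, the `locatedIn` property name, and the helper functions are assumptions for illustration; a real deployment would use an RDF store and a query language such as SPARQL.

```python
# Hypothetical facts, written as (subject, predicate, object) triples.
TRIPLES = {
    ("Abbey", "rdf:type", "Bank"),
    ("Abbey", "locatedIn", "England"),
    ("Bank", "rdfs:subClassOf", "Organization"),
    ("Louvre", "rdf:type", "Museum"),
    ("Louvre", "locatedIn", "France"),
}

def match(pattern, triples=TRIPLES):
    """Return all triples matching a pattern; None acts as a wildcard."""
    return [t for t in triples
            if all(p is None or p == v for p, v in zip(pattern, t))]

def instances_of(cls, located_in):
    """Instances of a class that are located in a given place."""
    typed = {s for s, _, _ in match((None, "rdf:type", cls))}
    placed = {s for s, _, _ in match((None, "locatedIn", located_in))}
    return sorted(typed & placed)

print(instances_of("Bank", "England"))  # ['Abbey']
```

With explicit class and relation statements, a request for "English banks" can reach the individual "Abbey" even though neither keyword appears in its name.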
![Page 6: Semantic Search: different meanings](https://reader036.fdocuments.us/reader036/viewer/2022081505/568165d6550346895dd8e2d0/html5/thumbnails/6.jpg)
The Web of Data: 2007, 2008, 2009
Extracted from: Linked Data Tutorial (Florianópolis) http://www.slideshare.net/ocorcho/linked-data-tutorial-florianpolis
![Page 7: Semantic Search: different meanings](https://reader036.fdocuments.us/reader036/viewer/2022081505/568165d6550346895dd8e2d0/html5/thumbnails/7.jpg)
LOD cloud May 2007
Figure from [4]
Facts:
• Focal points:
– DBpedia: RDFized version of Wikipedia; many ingoing and outgoing links
– Music-related datasets
• Big datasets include FOAF, US Census data
• Size approx. 1 billion triples, 250k links
![Page 8: Semantic Search: different meanings](https://reader036.fdocuments.us/reader036/viewer/2022081505/568165d6550346895dd8e2d0/html5/thumbnails/8.jpg)
LOD cloud September 2008
Facts:
• More than 35 datasets interlinked
• Commercial players joined the cloud, e.g., BBC
• Companies began to publish and host datasets, e.g., OpenLink, Talis, or Garlik
• Size approx. 2 billion triples, 3 million links
![Page 9: Semantic Search: different meanings](https://reader036.fdocuments.us/reader036/viewer/2022081505/568165d6550346895dd8e2d0/html5/thumbnails/9.jpg)
LOD cloud March 2009
Facts:
• A large part comes from the Linking Open Drug cloud and the BIO2RDF project
• Notable new datasets: Freebase, OpenCalais, ACM/IEEE
• Size > 10 billion triples
![Page 10: Semantic Search: different meanings](https://reader036.fdocuments.us/reader036/viewer/2022081505/568165d6550346895dd8e2d0/html5/thumbnails/10.jpg)
The LOD clouds
![Page 11: Semantic Search: different meanings](https://reader036.fdocuments.us/reader036/viewer/2022081505/568165d6550346895dd8e2d0/html5/thumbnails/11.jpg)
Commercial interest by publishers
![Page 12: Semantic Search: different meanings](https://reader036.fdocuments.us/reader036/viewer/2022081505/568165d6550346895dd8e2d0/html5/thumbnails/12.jpg)
Commercial interest by search engines
• 2007 Yahoo! presents SearchMonkey
![Page 13: Semantic Search: different meanings](https://reader036.fdocuments.us/reader036/viewer/2022081505/568165d6550346895dd8e2d0/html5/thumbnails/13.jpg)
Commercial interest by search engines
• July-2008 Microsoft buys Powerset
![Page 14: Semantic Search: different meanings](https://reader036.fdocuments.us/reader036/viewer/2022081505/568165d6550346895dd8e2d0/html5/thumbnails/14.jpg)
Commercial interest by search engines
• April 2010 Facebook announced the use of the Open Graph protocol
![Page 15: Semantic Search: different meanings](https://reader036.fdocuments.us/reader036/viewer/2022081505/568165d6550346895dd8e2d0/html5/thumbnails/15.jpg)
Commercial interest by search engines
• May-2009 Google announces Rich Snippets and its official use of RDFa and Microformats
![Page 16: Semantic Search: different meanings](https://reader036.fdocuments.us/reader036/viewer/2022081505/568165d6550346895dd8e2d0/html5/thumbnails/16.jpg)
Commercial interest by search engines
• July-2010 Google buys Metaweb (the company behind Freebase)
![Page 17: Semantic Search: different meanings](https://reader036.fdocuments.us/reader036/viewer/2022081505/568165d6550346895dd8e2d0/html5/thumbnails/17.jpg)
Commercial interest by search engines
• November-2010 Google announced the support of the GoodRelations vocabulary for Google Rich Snippets.
![Page 18: Semantic Search: different meanings](https://reader036.fdocuments.us/reader036/viewer/2022081505/568165d6550346895dd8e2d0/html5/thumbnails/18.jpg)
Challenges
• Exploiting this new information space for semantic search purposes opens new research challenges:
– Scalability
– Heterogeneity
– Uncertainty
![Page 19: Semantic Search: different meanings](https://reader036.fdocuments.us/reader036/viewer/2022081505/568165d6550346895dd8e2d0/html5/thumbnails/19.jpg)
Scalability
Effective exploitation of linked data requires infrastructure that scales to a large and ever-growing collection of interlinked data!
![Page 20: Semantic Search: different meanings](https://reader036.fdocuments.us/reader036/viewer/2022081505/568165d6550346895dd8e2d0/html5/thumbnails/20.jpg)
Heterogeneity
Figure (recovered labels):
• Data level (reconcile, combine): Dbpedia:Rudi_Studer, Dblp:Studer:Rudi.html, SW:/en/rudi_studer
• Schema level (align): Dblp:~ley/db/../author, SW:Person, Dbpedia:Professor
Effective exploitation of the data web requires an effective mechanism for
• finding the relevant data sources
• integrating data sources
• combining elements from different data sources
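One common way to reconcile data-level identifiers is to group them through equivalence links (for example `owl:sameAs` statements). The sketch below does this with a small union-find structure. The three identifiers are the ones shown on the slide, but the links between them, and the `ex:` identifiers, are assumed here purely for illustration.

```python
def cluster(same_as_pairs):
    """Group identifiers connected by equivalence links (union-find)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in same_as_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    groups = {}
    for x in list(parent):
        groups.setdefault(find(x), set()).add(x)
    return list(groups.values())

# Assumed equivalence links between the identifiers from the slide.
links = [
    ("Dbpedia:Rudi_Studer", "Dblp:Studer:Rudi.html"),
    ("Dblp:Studer:Rudi.html", "SW:/en/rudi_studer"),
    ("ex:SomeoneElse", "ex:SomeoneElseAlias"),  # hypothetical unrelated entity
]
groups = cluster(links)
print(groups)
```

Once identifiers are grouped, statements about any member of a group can be combined into a single description of the entity. Schema-level alignment (for example, relating SW:Person to Dbpedia:Professor) is a separate problem that this sketch does not address.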
![Page 21: Semantic Search: different meanings](https://reader036.fdocuments.us/reader036/viewer/2022081505/568165d6550346895dd8e2d0/html5/thumbnails/21.jpg)
Uncertainty
• Incomplete representation of user’s needs and content meanings
– User cannot completely specify the need
– The semantic information in the search space is incomplete
Effective exploitation requires the ability to
• match user’s needs to data in an imprecise way
• rank the results
• be flexible enough to adjust to changes in constraints!
“Find action films directed by some Hong Kong film director and starring Chinese martial actors”
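A simple way to picture imprecise matching and ranking is to treat the example query as a set of constraints and to score each candidate by the fraction of constraints it satisfies. The sketch below does exactly that and nothing more. The film records, attribute names, and threshold are invented for illustration; incomplete records (missing attributes) are still returned, only with a lower score.

```python
# Constraints derived by hand from the example query (attribute names are assumptions).
CONSTRAINTS = {
    "genre": "action",
    "director_origin": "Hong Kong",
    "lead_actor_profile": "Chinese martial arts actor",
}

# Hypothetical records; "Film B" has no information about its actors.
FILMS = {
    "Film A": {"genre": "action", "director_origin": "Hong Kong",
               "lead_actor_profile": "Chinese martial arts actor"},
    "Film B": {"genre": "action", "director_origin": "Hong Kong"},
    "Film C": {"genre": "drama", "director_origin": "France",
               "lead_actor_profile": "stage actor"},
}

def score(record, constraints):
    """Fraction of constraints the record is known to satisfy."""
    satisfied = sum(1 for key, value in constraints.items()
                    if record.get(key) == value)
    return satisfied / len(constraints)

def rank(films, constraints, threshold=0.5):
    """Return (name, score) pairs above the threshold, best first."""
    scored = [(score(rec, constraints), name) for name, rec in films.items()]
    ordered = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
    return [(name, s) for s, name in ordered if s >= threshold]

ranking = rank(FILMS, CONSTRAINTS)
print(ranking)
```

Changing or dropping a constraint only changes the scores, so the same ranking function keeps working when the user adjusts the query.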
![Page 22: Semantic Search: different meanings](https://reader036.fdocuments.us/reader036/viewer/2022081505/568165d6550346895dd8e2d0/html5/thumbnails/22.jpg)
The Search Space: different representations
![Page 23: Semantic Search: different meanings](https://reader036.fdocuments.us/reader036/viewer/2022081505/568165d6550346895dd8e2d0/html5/thumbnails/23.jpg)
The search space: different representations
• Unstructured search space
– The Web of documents (textual and multimedia content)
• Structured search space
– The Web of data (ontologies + Knowledge Bases)
• Hybrid search space
– Unstructured content is enriched with metadata
• Embedded annotations
• Not embedded annotations
![Page 24: Semantic Search: different meanings](https://reader036.fdocuments.us/reader036/viewer/2022081505/568165d6550346895dd8e2d0/html5/thumbnails/24.jpg)
The unstructured search space
• The Web of human-understandable content.
• The Web of documents and links
– <a href="http://creativecommons.org/licenses/by/3.0/">CC License</a>
Figure labels: Documents; Search space
![Page 25: Semantic Search: different meanings](https://reader036.fdocuments.us/reader036/viewer/2022081505/568165d6550346895dd8e2d0/html5/thumbnails/25.jpg)
Search engines
![Page 26: Semantic Search: different meanings](https://reader036.fdocuments.us/reader036/viewer/2022081505/568165d6550346895dd8e2d0/html5/thumbnails/26.jpg)
The structured search space
• The Web of machine-understandable content.
• The Web of objects and relations
– <a rel="license" href="http://creativecommons.org/licenses/by/3.0/">Creative Commons License</a>
Figure labels: Objects; Search space
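The difference between the plain link and the link carrying `rel="license"` can be shown with Python's standard `html.parser` module. The sketch below turns every anchor that has both `rel` and `href` into a (page, relation, target) triple. It is a simplification loosely in the spirit of RDFa, not a conforming RDFa parser, and the page URI `http://example.org/page` is an assumption.

```python
from html.parser import HTMLParser

class LinkRelations(HTMLParser):
    """Collect (page, relation, target) triples from <a rel=... href=...>."""

    def __init__(self, page_uri):
        super().__init__()
        self.page_uri = page_uri
        self.triples = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        rel, href = attributes.get("rel"), attributes.get("href")
        if rel and href:
            # rel may hold several space-separated relation names.
            for relation in rel.split():
                self.triples.append((self.page_uri, relation, href))

def extract(html, page_uri="http://example.org/page"):
    parser = LinkRelations(page_uri)
    parser.feed(html)
    return parser.triples

plain = '<a href="http://creativecommons.org/licenses/by/3.0/">CC License</a>'
typed = ('<a rel="license" href="http://creativecommons.org/licenses/by/3.0/">'
         'Creative Commons License</a>')

print(extract(plain))
print(extract(typed))
```

The plain link yields no statement a machine can interpret, whereas the typed link yields one explicit relation between the page and its license.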
![Page 27: Semantic Search: different meanings](https://reader036.fdocuments.us/reader036/viewer/2022081505/568165d6550346895dd8e2d0/html5/thumbnails/27.jpg)
Search engines
![Page 28: Semantic Search: different meanings](https://reader036.fdocuments.us/reader036/viewer/2022081505/568165d6550346895dd8e2d0/html5/thumbnails/28.jpg)
The hybrid search space
• Enriching documents with metadata
Figure labels: Objects; Documents; Search space
How to interlink documents and data?
![Page 29: Semantic Search: different meanings](https://reader036.fdocuments.us/reader036/viewer/2022081505/568165d6550346895dd8e2d0/html5/thumbnails/29.jpg)
Two ways of interlinking metadata and documents
• Information Extraction
• By relying on Web publishers
– More in the section “Data on the (Semantic) Web”
![Page 30: Semantic Search: different meanings](https://reader036.fdocuments.us/reader036/viewer/2022081505/568165d6550346895dd8e2d0/html5/thumbnails/30.jpg)
Search engines