The problem of constructing general-purpose semantic search engines
Cambios en el Sistema
Department of Artificial Intelligence
Introduction
26 June 2009
Towards a true Semantic Web
Towards a true semantic search
Conclusions
The problem
Rafael Martínez Tomás and Luis Criado Fernández
Introduction
In September 1998, Tim Berners-Lee published “Semantic Web Road Map” and “What the Semantic Web can represent” [Berners-Lee 1; 1998].
The first version of OWL appeared in 2004 [Smith et al; 2004].
SPARQL followed in 2006 [Prud'hommeaux et al; 2006].
Therefore, since 2006 the tools needed to begin constructing the Semantic Web have been available.
Since approximately 2007, search engines presented as semantic have appeared: Hakia, Powerset and True Knowledge.
But are these Semantic Web search engines better?
Do they work? Are they better than Google and Yahoo?
The purpose of this presentation is to answer these questions.
Introduction
According to Netcraft [1], there were 216 million web sites in February 2009.
[1] http://news.netcraft.com/
Introduction
Bergman conducted a study in which he estimated the size of the static Web at around $4\cdot10^{9}$ static web pages.
In the worst case, this is equivalent to 14 million books [Bergman; 2001].
Taking into account Netcraft's chart and Bergman's work, we deduce that approximately:
$35\cdot10^{6}$ web sites correspond to $4\cdot10^{9}$ static web pages.
Applying the same ratio to February 2009, the estimated number of static web pages would be:
$\approx 24.6\cdot10^{9}$ static web pages, equivalent to more than 86.4 million books (worst case).
It has been calculated that the deep Web may contain several hundred times more information than the static Web and that it is growing even faster [O'Neill et al; 2003]. The deep Web is the part of the Web that is generated dynamically from database content.
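The proportional reasoning above can be checked with a few lines of Python. This is only a sketch. Like the slide, it assumes that static pages and their book equivalent grow linearly with the number of web sites; the constant names are ours. The exact quotient is about 24.69·10^9 pages, which the slide truncates to 24.6·10^9.

```python
# Linear extrapolation of the static Web's size from the figures on the slide.
# Assumption (same as the slide): pages and book equivalents scale with site count.
SITES_REFERENCE = 35e6   # web sites corresponding to Bergman's page estimate
PAGES_REFERENCE = 4e9    # static web pages estimated by Bergman (2001)
BOOKS_REFERENCE = 14e6   # worst-case book equivalent of those pages
SITES_FEB_2009 = 216e6   # web sites reported by Netcraft in February 2009


def extrapolate(sites: float) -> tuple[float, float]:
    """Return (static pages, book equivalents) for a given number of web sites."""
    ratio = sites / SITES_REFERENCE
    return PAGES_REFERENCE * ratio, BOOKS_REFERENCE * ratio


pages, books = extrapolate(SITES_FEB_2009)
print(f"static pages ~ {pages / 1e9:.2f}e9, books ~ {books / 1e6:.1f} million")
# prints: static pages ~ 24.69e9, books ~ 86.4 million
```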
The problem
What is the current problem of the Web?
Its current volume and its expected growth.
Why?
Every day it becomes more difficult to find accurate, high-quality information. As a result, users have to spend more time filtering the links returned by search engines.
Moreover, much of the information on the Web is never found: there is more data on the Web than current search engines reveal.
How can we fix it?
With the Semantic Web, through the use of ontologies and their population.
The Semantic Web has to overcome two major difficulties in order to be implemented.
1st. Ontology learning. The aim is to build ontologies that represent the knowledge of the entire Web. There are several proposals, such as [Gómez-Pérez et al; 2003], [Valencia Garcia; 2005] and [Cimiano et al; 2006], and projects working in this area, such as Neon, SEKT, Dot.Kom, X-Media and Abraxas. Even so, the results are not yet mature enough for industry to adopt.
2nd. Ontology population. In this research area, the ontologies are assumed to be already available, so the task is to fill them with instances. The aim is to semantically annotate every web site according to those ontologies. As with ontology learning, current research focuses on automatic and semi-automatic methods. However, we are still very far from being able to transform the existing Web.
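A toy model makes the difference between the two problems concrete. In the sketch below, the class vocabulary is fixed in advance, which corresponds to the ontology-learning side. Population only adds instances to classes that already exist. The class names are hypothetical and loosely based on the vertebrate ontology used later in this presentation. Real OWL ontologies carry far more than a flat set of classes; the sketch only illustrates how the work is divided.

```python
from dataclasses import dataclass, field


@dataclass
class Ontology:
    """Toy ontology: a fixed set of class names plus the instances of each class."""
    classes: set[str]
    instances: dict[str, set[str]] = field(default_factory=dict)

    def populate(self, individual: str, cls: str) -> None:
        # Population never creates classes; defining them is ontology learning's job.
        if cls not in self.classes:
            raise KeyError(f"unknown class: {cls}")
        self.instances.setdefault(cls, set()).add(individual)


# Hypothetical class names, loosely modelled on a vertebrate ontology.
onto = Ontology(classes={"vertebrados", "aves", "mamiferos"})
onto.populate("PERIQUITO", "aves")
print(onto.instances)  # {'aves': {'PERIQUITO'}}
```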
[Figure: Annotation tools. Taxonomy of semantic annotation tools. Node labels: semantic annotation; manual / semi-automated; static (non-learning) / dynamic (learning); static-mixed; discovery; induction; probabilistic; rules; embedded / non-embedded.]
These are still research tools. In other words, none of them can be used in real-world conditions or environments.
Towards a true Semantic Web
Compatibility and external annotation
External annotation has the advantage that it is only used by applications designed to interpret it. This avoids any unwanted side effects on current applications such as search engines.
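One way to picture external annotation is as a separate store that sits beside the Web rather than inside it. The sketch below is a hypothetical design, not the authors' implementation. Annotations are keyed by page URL plus a SHA-256 digest of the HTML they were produced from. The original page is therefore never modified, and a consumer can tell when an annotation has gone stale. Applications that do not know about the store, such as current crawlers, are unaffected.

```python
import hashlib
from typing import Optional


class ExternalAnnotationStore:
    """Hypothetical store that keeps semantic annotations outside the pages they describe.

    Each entry is keyed by page URL and remembers a SHA-256 digest of the HTML it
    was produced from, so stale annotations can be detected when the page changes.
    """

    def __init__(self) -> None:
        self._store: dict[str, tuple[str, str]] = {}

    @staticmethod
    def _digest(html: str) -> str:
        return hashlib.sha256(html.encode("utf-8")).hexdigest()

    def annotate(self, url: str, html: str, owl_document: str) -> None:
        # Only the store is written; the page HTML is read but never rewritten.
        self._store[url] = (self._digest(html), owl_document)

    def lookup(self, url: str, html: str) -> Optional[str]:
        """Return the annotation for url, or None if it is missing or out of date."""
        entry = self._store.get(url)
        if entry is None or entry[0] != self._digest(html):
            return None
        return entry[1]


# Hypothetical usage: a conventional crawler simply never calls lookup().
store = ExternalAnnotationStore()
page = "<p>el periquito es un ave</p>"
store.annotate("http://example.org/periquito.htm", page, "<rdf:RDF>...</rdf:RDF>")
print(store.lookup("http://example.org/periquito.htm", page))
```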
Automating the external annotation (sw2sws)
[Figure: sw2sws pipeline. Labelled components: extraction and morphosyntactic analysis; ontology selection (hereafter, identification); annotation (hereafter, interpretation); two <<Client>> components.]
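Example input, an excerpt from a Spanish web page about budgerigars. In English: "These animals like company, so if you decide to acquire one, it is best to place it in the living room. In addition, this bird can be left loose, flying freely around the home. Finally, it should be noted that the budgerigar is a bird that is most comfortable living in pairs and that is able to pronounce some words."
... <p align="left">A estos animales les gusta <strong>estar en compañía </strong>por lo que, si se decide adquirir uno, conviene ubicarle en la sala de estar o el salón de la casa. Además, este pájaro puede estar suelto, volando libremente por el hogar. Por último, hay que señalar que el periquito es un ave que se encuentra más cómoda viviendo <strong>en</strong> <strong>pareja</strong> y que es capaz de pronunciar algunas <a href="Wcacadbeca16af.htm" class="txt-facilisimo11"><u>palabras</u></a>.</p> ....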
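Extracted sentence (field names: oracion = sentence, sujeto = subject, verbo = verb, CD = direct object, CI = indirect object, origen = source text):
... <oracion sujeto="periquito" verbo="es" CD="ave" CI=" " origen="hay que señalar que el periquito es un ave que se encuentra má"/> ....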
Generated external annotation (OWL in RDF/XML):
<rdf:RDF xmlns:j.0="http://www.criado.info/owl/vertebrados_es.owl#"
    xmlns:protege="http://protege.stanford.edu/plugins/owl/protege#"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:owl="http://www.w3.org/2002/07/owl#"
    xmlns="http://www.criado.info/websejemplo/mascotasyhogar/mascotas/aves/Wcac51f4ad0f89_3F972AA91EEE3451B64A67FCDA136ACB.owl#"
    xml:base="http://www.criado.info/websejemplo/mascotasyhogar/mascotas/aves/Wcac51f4ad0f89_3F972AA91EEE3451B64A67FCDA136ACB.owl">
  <owl:Ontology rdf:about="">
    <owl:imports rdf:resource="http://www.criado.info/owl/vertebrados_es.owl#"/>
  </owl:Ontology>
  <j.0:aves rdf:ID="PERIQUITO"/>
  <owl:AllDifferent>
    <owl:distinctMembers rdf:parseType="Collection">
      <aves rdf:about="#PERIQUITO"/>
    </owl:distinctMembers>
  </owl:AllDifferent>
</rdf:RDF>
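The last step of this example, going from the extracted sentence to the OWL individual, can be approximated as follows. This is a minimal sketch under stated assumptions, not the sw2sws code:

- It only handles copular sentences of the form "X es Y".
- It maps the noun to an ontology class through a hypothetical `LEXICON` dictionary.
- It uses the prefix `vert` instead of `j.0`.
- It omits the `owl:Ontology`/`owl:imports` header and the `owl:AllDifferent` block shown above.

```python
import xml.etree.ElementTree as ET
from typing import Optional

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
VERT = "http://www.criado.info/owl/vertebrados_es.owl#"  # ontology namespace from the slide
ET.register_namespace("rdf", RDF)
ET.register_namespace("vert", VERT)  # hypothetical prefix (the slide's output uses j.0)

# Hypothetical lexicon mapping Spanish nouns to class names in the ontology.
LEXICON = {"ave": "aves", "mamifero": "mamiferos"}


def annotate_sentence(oracion_xml: str) -> Optional[str]:
    """Turn a copular sentence 'X es Y' into an RDF/XML individual X of class Y."""
    oracion = ET.fromstring(oracion_xml)
    subject = oracion.get("sujeto", "").strip()
    verb = oracion.get("verbo", "").strip()
    cls = LEXICON.get(oracion.get("CD", "").strip())
    if verb != "es" or not subject or cls is None:
        return None  # only 'is-a' sentences with a known class are handled here
    rdf = ET.Element(f"{{{RDF}}}RDF")
    ET.SubElement(rdf, f"{{{VERT}}}{cls}", {f"{{{RDF}}}ID": subject.upper()})
    return ET.tostring(rdf, encoding="unicode")


print(annotate_sentence('<oracion sujeto="periquito" verbo="es" CD="ave" CI=" "/>'))
```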
What is a semantic search engine?
It is a search engine that exploits Semantic Web technology and therefore has some degree of understanding of the questions the user asks.
A semantic search engine first answers the question and then provides links that justify its answer. To do so, it needs two fundamental elements:
• Semantic information content, for example in OWL (ontologies and instances).
• NLP (natural language processing).
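The "answer first, then justify" behaviour can be sketched on top of annotated facts like the one above. In this hypothetical example, the NLP front end is assumed to have already reduced the question to a (subject, class) pair. `Fact`, `answer_is_a` and the example.org URLs are illustrative names only. OWL works under an open-world assumption, so the sketch answers "unknown" rather than "no" when it finds no supporting fact.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    """One annotated assertion: subject is an instance of cls, found at source_url."""
    subject: str
    cls: str
    source_url: str


def answer_is_a(facts: list[Fact], subject: str, cls: str) -> tuple[str, list[str]]:
    """Answer 'Is <subject> a <cls>?' first, then return the links that justify it.

    Assumes the NLP step has already turned the user's question into (subject, cls).
    """
    links = sorted({f.source_url for f in facts if f.subject == subject and f.cls == cls})
    return ("yes" if links else "unknown"), links


facts = [
    Fact("PERIQUITO", "aves", "http://example.org/periquito.htm"),
    Fact("GATO", "mamiferos", "http://example.org/gato.htm"),
]
print(answer_is_a(facts, "PERIQUITO", "aves"))  # ('yes', ['http://example.org/periquito.htm'])
print(answer_is_a(facts, "GATO", "aves"))       # ('unknown', [])
```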
Do Hakia, Powerset and True Knowledge work like semantic search engines? Are they better than Google or Yahoo?
Under this definition of a semantic search engine, Hakia, Powerset and True Knowledge cannot be considered true semantic search engines. They simply have nowhere to look, because there is not enough semantic information available, let alone enough to support general-purpose search engines.
Towards a true semantic search
What features should a semantic search engine incorporate?
1.- Lexicographic analysis
2.- Context selection
3.- OWL-DL inference engine
4.- The ability to answer
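Feature 3 is what lets such an engine answer questions that no page states explicitly. A real OWL-DL reasoner does much more, including property restrictions, consistency checking and classification. The sketch below shows only the simplest inference such a reasoner performs, class subsumption. It works over a hypothetical fragment of a vertebrate taxonomy: an individual asserted to be in `aves` is also inferred to belong to every superclass of `aves`.

```python
def superclasses(cls: str, subclass_of: dict[str, set[str]]) -> set[str]:
    """Return every direct or indirect superclass of cls (transitive closure)."""
    seen: set[str] = set()
    stack = [cls]
    while stack:
        for parent in subclass_of.get(stack.pop(), set()):
            if parent not in seen:  # also guards against cycles in the hierarchy
                seen.add(parent)
                stack.append(parent)
    return seen


def inferred_types(asserted: set[str], subclass_of: dict[str, set[str]]) -> set[str]:
    """Asserted classes of an individual plus everything they entail via subsumption."""
    types = set(asserted)
    for cls in asserted:
        types |= superclasses(cls, subclass_of)
    return types


# Hypothetical fragment of a vertebrate taxonomy (class names are illustrative).
SUBCLASS_OF = {
    "aves": {"vertebrados"},
    "mamiferos": {"vertebrados"},
    "vertebrados": {"animales"},
}
print(sorted(inferred_types({"aves"}, SUBCLASS_OF)))  # ['animales', 'aves', 'vertebrados']
```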
Vissem
Conclusions
We need to develop semantic web sites
Our procedure for transforming the Web into the Semantic Web is general-purpose
Compatibility with the current Web
Flexibility to obtain different interpretations according to different ontologies
For simplicity, our prototype and examples have only been tested on static content with a few ontologies. The web sites we transformed were about vertebrates, so as to obtain enough instances for Vissem. The web sites used were downloaded from the Internet and were not altered. They were, however, chosen for their simple language, since the tool is clearly limited by its NLP module and by access to existing ontologies.
• Thank you very much for your attention
Questions?
Bibliography
[Bergman; 2001] Michael K. Bergman. The Deep Web: Surfacing Hidden Value. Journal of Electronic Publishing, vol. 7, no. 1, 2001. Available online: http://quod.lib.umich.edu/cgi/t/text/text-idx?c=jep;view=text;rgn=main;idno=3336451.0007.104, http://hdl.handle.net/2027/spo.3336451.0007.104
[O'Neill et al; 2003] Edward T. O'Neill, Brian F. Lavoie, Rick Bennett. Trends in the Evolution of the Public Web. D-Lib Magazine, vol. 9, no. 4, 2003. ISSN 1082-9873. Available online: http://www.dlib.org/dlib/april03/lavoie/04lavoie.html
[Berners-Lee 1; 1998] Tim Berners-Lee. Semantic Web Road Map. 1998. Available online: http://www.w3.org/DesignIssues/Semantic.html
[Criado Fernández; 2009] Luis Criado Fernández. Procedimiento semi-automático para transformar la Web en Web Semántica [Semi-automatic procedure for transforming the Web into the Semantic Web]. PhD thesis, supervised by Rafael Martínez Tomás. Madrid: Universidad Nacional de Educación a Distancia, Escuela Técnica Superior de Ingeniería Informática, 2009.
[Smith et al; 2004] Michael K. Smith, Chris Welty, Deborah L. McGuinness. OWL Web Ontology Language Guide. W3C, 2004. Available online: http://www.w3.org/TR/owl-guide/
[Prud'hommeaux et al; 2006] Eric Prud'hommeaux, Andy Seaborne. SPARQL Query Language for RDF. W3C, 2006. Available online: http://www.w3.org/TR/rdf-sparql-query/