The problem of constructing general-purpose semantic search engines

Transcript of The problem of constructing general-purpose semantic search engines

Page 1: The problem of constructing general-purpose semantic search engines

Changes in the System

Department of Artificial Intelligence

Introduction

26 June 2009

Towards a true Semantic Web

Towards a true semantic search

Conclusions

The problem

The problem of constructing general-purpose semantic search engines

Rafael Martínez Tomás and Luis Criado Fernández

Page 2: The problem of constructing general-purpose semantic search engines

Introduction

In September 1998, Tim Berners-Lee published “Semantic Web Road Map” and “What the Semantic Web can represent” [Berners-Lee 1; 1998].

The first version of OWL appeared in 2004 [Smith et al; 2004].

SPARQL followed in 2006 [Prud'hommeaux et al; 2006].

Therefore, since 2006 it has been possible to begin constructing the Semantic Web.

Since approximately 2007: Hakia, Powerset and True Knowledge.

But are these Semantic Web search engines better?

Do they work? Are they better than Google and Yahoo?

The purpose of this presentation is to answer these questions.

Page 3: The problem of constructing general-purpose semantic search engines

Introduction

According to Netcraft [1], there were 216 million web sites in February 2009

[1] http://news.netcraft.com/

Page 4: The problem of constructing general-purpose semantic search engines

Introduction

Bergman conducted a study in which he estimated the size of the Web at around 4·10^9 static web pages. In the worst case, this is equivalent to 14 million books [Bergman; 2001].

Taking into account Netcraft's chart and Bergman's work, it can be deduced that approximately 35·10^6 web sites correspond to 4·10^9 static web pages.

Assuming the same ratio, the estimated number of static web pages in February 2009 would be 24.6·10^9, equivalent to more than 86.4 million books (worst case).

It has been calculated that the deep web, the part of the Web that is generated dynamically from database content, may hold several hundred times more information than the static web, and it is growing at an even greater rate [O'Neill et al; 2003].
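The estimate for February 2009 is a simple proportion. The following is a minimal sketch of that arithmetic, assuming the reading 35·10^6 web sites for 4·10^9 static pages (the exponents are lost in the slide text, so this reading is an assumption) together with Netcraft's 216 million sites. The constant and function names are illustrative only.

```python
# Figures taken from the slide; SITES_AT_STUDY is the assumed reading
# of the garbled "35*10^6 web sites correspond to 4*10^9 pages".
SITES_AT_STUDY = 35e6      # web sites at the time of Bergman's estimate
PAGES_AT_STUDY = 4e9       # static web pages estimated by Bergman
BOOKS_AT_STUDY = 14e6      # worst-case equivalent in books
SITES_FEB_2009 = 216e6     # Netcraft, February 2009


def extrapolate(sites):
    """Scale pages and books linearly with the number of web sites."""
    pages_per_site = PAGES_AT_STUDY / SITES_AT_STUDY
    pages = sites * pages_per_site
    books = pages * BOOKS_AT_STUDY / PAGES_AT_STUDY
    return pages, books


pages_2009, books_2009 = extrapolate(SITES_FEB_2009)
print(f"static pages: {pages_2009:.3e}")
print(f"books (worst case): {books_2009:.3e}")
```

Under these assumptions the result is roughly 24.7·10^9 static pages and 86.4 million books, in line with the figures quoted on the slide.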

Page 5: The problem of constructing general-purpose semantic search engines

The problem

What is the problem with the current Web?

The volume it already has and its expected growth.

Why?

Every day it becomes more difficult to find accurate, high-quality information. Consequently, users have to devote more time to filtering the links provided by search engines.

Moreover, there is much information on the web that is never found. There is more data on the Web than what the current search engines tell us.

Page 6: The problem of constructing general-purpose semantic search engines

The problem

How can we fix it? With the Semantic Web, through the use of ontologies and their population.

The Semantic Web has to overcome two major difficulties in order to be implemented.

1st. Ontology learning. The aim is to build ontologies that represent the knowledge of the entire Web. Although there are several proposals, such as [Gómez-Pérez et al, 2003], [Valencia Garcia, 2005] and [Cimiano et al, 2006], and projects that develop this area, such as Neon, SEKT, Dot.Kom, X-Media, Abraxas, etc., the results are still not mature enough for industry to adopt.

2nd. Ontology population. In this research area the ontologies are already available, so the task is to fill them with instances. The aim is to semantically annotate all web sites according to those ontologies. As in ontology learning, current research is directed toward automatic and semi-automatic construction methods. However, we are currently very far from being able to transform the existing Web.

Page 7: The problem of constructing general-purpose semantic search engines

The problem

Annotation tools

[Figure: classification of semantic annotation. Labels in the diagram: manual; semi-automated; static (no learning); dynamic (learning); static-mixed; discovery; induction; probabilistic; rules. Each branch is further split into embedded and non-embedded annotation.]

These are still research tools. In other words, none of them can be used under real-world conditions.

Page 8: The problem of constructing general-purpose semantic search engines

Towards a true Semantic Web

Compatibility and external annotation

External annotation has the advantage that it will only be used by applications designed to interpret it, thereby avoiding unwanted consequences of any kind for current applications like search engines.
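One way to picture external annotation is as a separate file published next to a page that is itself left untouched. The sketch below derives a location for such a file from the page URL and the ontology used. The naming scheme (page name plus an MD5 digest of the ontology URI) and the function name `annotation_url` are assumptions made purely for illustration; the deck's RDF example shows a page name followed by a hexadecimal suffix but does not say how that suffix is produced.

```python
import hashlib
import posixpath
from urllib.parse import urlsplit, urlunsplit


def annotation_url(page_url, ontology_uri):
    """Return a hypothetical location for the external annotation of a page.

    The page itself is never modified, so applications that do not know
    about the annotation file simply never request it.
    """
    parts = urlsplit(page_url)
    directory, filename = posixpath.split(parts.path)
    stem, _extension = posixpath.splitext(filename)
    # Assumption: one annotation file per (page, ontology) pair.
    digest = hashlib.md5(ontology_uri.encode("utf-8")).hexdigest().upper()
    sidecar_path = posixpath.join(directory, f"{stem}_{digest}.owl")
    return urlunsplit((parts.scheme, parts.netloc, sidecar_path, "", ""))


example = annotation_url(
    "http://example.org/pets/birds/page1.htm",
    "http://example.org/owl/vertebrates.owl#",
)
print(example)
```

Keeping one file per ontology means the same page can carry several independent interpretations without any of them touching the HTML that current search engines index.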

Page 9: The problem of constructing general-purpose semantic search engines

Towards a true Semantic Web

Automating the external annotation (sw2sws)

[Diagram: sw2sws components, each labelled <<Client>>. Stages shown: extraction and morphosyntactic analysis; annotation (hereafter "interpretation"); ontology selection (hereafter "identification").]

... <p align="left">A estos animales les gusta <strong>estar en compa&ntilde;&iacute;a </strong>por lo que, si se decide adquirir uno, conviene ubicarle en la sala de estar o el sal&oacute;n de la casa. Adem&aacute;s, este p&aacute;jaro puede estar suelto, volando libremente por el hogar. Por &uacute;ltimo, hay que se&ntilde;alar que el periquito es un ave que se encuentra m&aacute;s c&oacute;moda viviendo <strong>en</strong> <strong>pareja</strong> y que es capaz de pronunciar algunas <a href="Wcacadbeca16af.htm" class="txt-facilisimo11"><u>palabras</u></a>.</p> ....

... <oracion sujeto="periquito" verbo="es" CD="ave" CI=" " origen="hay que se&ntilde;alar que el periquito es un ave que se encuentra m&aacute"/> ....

<rdf:RDF
    xmlns:j.0="http://www.criado.info/owl/vertebrados_es.owl#"
    xmlns:protege="http://protege.stanford.edu/plugins/owl/protege#"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:owl="http://www.w3.org/2002/07/owl#"
    xmlns="http://www.criado.info/websejemplo/mascotasyhogar/mascotas/aves/Wcac51f4ad0f89_3F972AA91EEE3451B64A67FCDA136ACB.owl#"
    xml:base="http://www.criado.info/websejemplo/mascotasyhogar/mascotas/aves/Wcac51f4ad0f89_3F972AA91EEE3451B64A67FCDA136ACB.owl">
  <owl:Ontology rdf:about="">
    <owl:imports rdf:resource="http://www.criado.info/owl/vertebrados_es.owl#"/>
  </owl:Ontology>
  <j.0:aves rdf:ID="PERIQUITO"/>
  <owl:AllDifferent>
    <owl:distinctMembers rdf:parseType="Collection">
      <aves rdf:about="#PERIQUITO"/>
    </owl:distinctMembers>
  </owl:AllDifferent>
</rdf:RDF>
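The step from the analysed sentence to the OWL instance can be illustrated with a small sketch. It reads an `oracion` element like the one shown on this slide, treats the verb "es" as a class-membership statement, and emits an individual of the matching ontology class. This is not the sw2sws implementation: the rule for "es", the lookup table `CLASS_FORMS` and the function names are assumptions introduced for illustration.

```python
import xml.etree.ElementTree as ET

# Ontology namespace taken from the slide's RDF example.
ONTOLOGY_URI = "http://www.criado.info/owl/vertebrados_es.owl#"

# Hypothetical lookup: ontology class -> Spanish word forms that name it.
CLASS_FORMS = {"aves": {"ave", "aves", "pájaro", "pájaros"}}


def interpret(sentence_xml):
    """Map an 'X es Y' sentence to an (individual, class) pair, or None."""
    sentence = ET.fromstring(sentence_xml)
    subject = (sentence.get("sujeto") or "").strip()
    complement = (sentence.get("CD") or "").strip().lower()
    if sentence.get("verbo") != "es" or not subject:
        return None
    for cls, forms in CLASS_FORMS.items():
        if complement in forms:
            return subject.upper(), cls
    return None


def to_rdf(individual, cls):
    """Serialise the individual as a minimal RDF/XML fragment."""
    return (
        '<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"\n'
        f'         xmlns:j.0="{ONTOLOGY_URI}">\n'
        f'  <j.0:{cls} rdf:ID="{individual}"/>\n'
        "</rdf:RDF>"
    )


pair = interpret('<oracion sujeto="periquito" verbo="es" CD="ave" CI=" "/>')
if pair is not None:
    print(to_rdf(*pair))
```

A sentence whose verb or complement does not match anything in the table yields no annotation, which reflects how strongly the result depends on the NLP step and on the ontologies available.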

Page 10: The problem of constructing general-purpose semantic search engines

Towards a true semantic search

What is a semantic search engine?

It is a search engine that exploits Semantic Web technology and therefore incorporates a degree of understanding of the questions the user asks.

A semantic search engine first answers the question and then provides links to justify its answer. As a result, it needs two fundamental elements:

• Semantic information content, for example in OWL (ontologies and instances).

• NLP (natural language processing).

Do Hakia, Powerset and True Knowledge work like semantic search engines? Are they better than Google or Yahoo?

Under this definition, Hakia, Powerset and True Knowledge cannot be considered true semantic search engines. They simply have nowhere to look, because there is not enough semantic information available, and even less so for them to be considered general-purpose search engines.

Page 11: The problem of constructing general-purpose semantic search engines

Towards a true semantic search

What features should a semantic search engine incorporate?

1.- Lexicographic analysis

2.- Context selection

3.- OWL-DL inference engine

4.- The capacity to respond
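The four features can be read as stages of a pipeline. The toy sketch below wires them together on a hand-written knowledge base, so that a question is answered first and the supporting link is returned second. It is not Vissem: the data in `CONTEXTS`, the English vocabulary and all function names are hypothetical, and the walk up a chain of subclass axioms merely stands in for a real OWL-DL reasoner.

```python
import re

# Hypothetical knowledge base: one entry per context (ontology).
CONTEXTS = {
    "vertebrates": {
        "subclass_of": {"bird": "vertebrate", "vertebrate": "animal"},
        "instance_of": {"budgerigar": "bird"},
        "sources": {"budgerigar": "http://example.org/pets/birds/budgerigar.htm"},
    },
    "vehicles": {
        "subclass_of": {"car": "vehicle"},
        "instance_of": {"beetle": "car"},
        "sources": {"beetle": "http://example.org/cars/beetle.htm"},
    },
}


def analyse(question):
    """1. Lexicographic analysis: split the question into lower-case words."""
    return re.findall(r"[a-z]+", question.lower())


def vocabulary(kb):
    """All class and individual names known to one context."""
    return set(kb["subclass_of"]) | set(kb["subclass_of"].values()) | set(kb["instance_of"])


def select_context(tokens):
    """2. Context selection: pick the ontology sharing most terms with the question."""
    scores = {name: len(vocabulary(kb) & set(tokens)) for name, kb in CONTEXTS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


def inferred_classes(kb, individual):
    """3. Inference: asserted class plus every superclass reached transitively."""
    classes = []
    cls = kb["instance_of"].get(individual)
    while cls is not None and cls not in classes:
        classes.append(cls)
        cls = kb["subclass_of"].get(cls)
    return classes


def answer(question):
    """4. Response: the answer first, then the links that justify it."""
    tokens = analyse(question)
    context = select_context(tokens)
    if context is None:
        return None, []
    kb = CONTEXTS[context]
    for individual in kb["instance_of"]:
        if individual in tokens:
            for cls in inferred_classes(kb, individual):
                if cls in tokens:
                    text = f"Yes: {individual} is classified as {cls}."
                    return text, [kb["sources"][individual]]
    return None, []


print(answer("Is a budgerigar a vertebrate?"))
```

The answer to the sample question is not stored anywhere: only "budgerigar is a bird" is asserted, and the vertebrate classification follows from the subclass chain. When no context matches, the sketch returns no answer rather than guessing.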

Page 12: The problem of constructing general-purpose semantic search engines

Towards a true semantic search

Vissem

Page 13: The problem of constructing general-purpose semantic search engines

Conclusions

We need to develop semantic web sites.

Our procedure for transforming the Web into the Semantic Web is general-purpose.

Compatibility with the current Web.

Flexibility to obtain different interpretations according to different ontologies.

In our prototype and examples, for simplicity, we have only tested the procedure on static content with a few ontologies, and the web sites we transformed were about vertebrates, in order to obtain enough instances for Vissem. The web sites used were downloaded from the Internet and were not altered, although they were chosen for their simple language, since the tool is clearly restricted by the NLP module and by access to existing ontologies.

Page 14: The problem of constructing general-purpose semantic search engines

Conclusions

Page 15: The problem of constructing general-purpose semantic search engines

• Thank you very much for your attention

Questions?

Page 16: The problem of constructing general-purpose semantic search engines

Bibliography

[Bergman; 2001] Michael K. Bergman. The Deep Web: Surfacing Hidden Value. Journal of Electronic Publishing, vol. 7, no. 1, 2001. Available at: http://quod.lib.umich.edu/cgi/t/text/text-idx?c=jep;view=text;rgn=main;idno=3336451.0007.104, http://hdl.handle.net/2027/spo.3336451.0007.104

[O'Neill et al; 2003] Edward T. O'Neill, Brian F. Lavoie, Rick Bennett. Trends in the Evolution of the Public Web. D-Lib Magazine, vol. 9, no. 4, 2003. ISSN 1082-9873. Available at: http://www.dlib.org/dlib/april03/lavoie/04lavoie.html

[Berners-Lee 1; 1998] Tim Berners-Lee. Semantic Web Road map. 1998. Available at: http://www.w3.org/DesignIssues/Semantic.html

[Criado Fernández; 2009] Luis Criado Fernández. Procedimiento semi-automático para transformar la Web en Web Semántica. Doctoral thesis; Rafael Martínez Tomás (supervisor). Madrid: Universidad Nacional de Educación a Distancia, Escuela Técnica Superior de Ingeniería Informática, 2009.

Page 17: The problem of constructing general-purpose semantic search engines

Bibliography

[Smith et al; 2004] Michael K. Smith, Chris Welty and Deborah L. McGuinness. OWL Web Ontology Language Guide. W3C, 2004. Available at: http://www.w3.org/TR/owl-guide/

[Prud'hommeaux et al; 2006] Eric Prud'hommeaux, Andy Seaborne. SPARQL Query Language for RDF. W3C, 2006. Available at: http://www.w3.org/TR/rdf-sparql-query/