How to get your data into Sindice and Google with sitemap4rdf


Page 1: How to get your data into Sindice and Google with sitemap4rdf

Copyright 2009 Digital Enterprise Research Institute. All rights reserved.

Digital Enterprise Research Institute www.deri.ie

How to get your data into Sindice and Google with sitemap4rdf

Boris Villazón-Terrazas (OEG), Richard Cyganiak (DERI)

Page 2: How to get your data into Sindice and Google with sitemap4rdf

Publishing Linked Data from a triple store

Page 3: How to get your data into Sindice and Google with sitemap4rdf

Linked Data frontends for triple stores

Source: Pubby website, http://www4.wiwiss.fu-berlin.de/pubby/

Page 4: How to get your data into Sindice and Google with sitemap4rdf

Search engines

Page 5: How to get your data into Sindice and Google with sitemap4rdf

Sindice: the best RDF search engine

Page 6: How to get your data into Sindice and Google with sitemap4rdf

Sindice: the best RDF search engine

120M+ documents

Continuously updating since 2006

Low-latency search API

RDF/XML, Turtle, RDFa, microformats

Page 7: How to get your data into Sindice and Google with sitemap4rdf

The Sitemap protocol

Page 8: How to get your data into Sindice and Google with sitemap4rdf

Sitemap Protocol

Used by web crawlers

Efficiently find all your content & discover what has been updated

http://sitemaps.org/

Page 9: How to get your data into Sindice and Google with sitemap4rdf

Sitemap Protocol: Simple example

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://yoursite/</loc>
  </url>
  <url>
    <loc>http://yoursite/products/53546</loc>
  </url>
  <url>
    <loc>http://yoursite/products/98421</loc>
  </url>
  <url>
    <loc>http://yoursite/products/41003</loc>
  </url>
</urlset>
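A file like this one can be produced with a few lines of code. The following is a minimal sketch in Python using only the standard library; the function name `build_sitemap` is made up for illustration and is not part of sitemap4rdf.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a sitemap document (as UTF-8 bytes) listing the given URLs.

    Hypothetical helper for illustration only.
    """
    # The namespace is written as a plain xmlns attribute on the root element.
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for address in urls:
        url = ET.SubElement(urlset, "url")
        # ElementTree escapes characters such as '&' in the URL text.
        ET.SubElement(url, "loc").text = address
    return ET.tostring(urlset, encoding="UTF-8", xml_declaration=True)


if __name__ == "__main__":
    document = build_sitemap([
        "http://yoursite/",
        "http://yoursite/products/53546",
    ])
    print(document.decode("utf-8"))
```

The output is a single unindented line of XML, which crawlers should accept just as well as the indented form shown on the slide.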

Page 10: How to get your data into Sindice and Google with sitemap4rdf

Sitemap Protocol: Optional parts

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://yoursite/</loc>
    <lastmod>2010-06-24</lastmod>
    <changefreq>daily</changefreq>
  </url>
</urlset>
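To see why `<lastmod>` matters, here is a rough sketch of how a crawler might use it to decide which pages to fetch again. It is written in Python with the standard library; the function name `updated_since` is invented for this example, and it assumes every `<lastmod>` value starts with a full `YYYY-MM-DD` date. Real crawlers will apply their own policies.

```python
import xml.etree.ElementTree as ET
from datetime import date

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def updated_since(sitemap_xml, cutoff):
    """Return the URLs whose <lastmod> is on or after the cutoff date.

    Hypothetical helper. URLs without <lastmod> are returned as well,
    because nothing is known about when they last changed.
    """
    root = ET.fromstring(sitemap_xml)
    changed = []
    for url in root.findall("sm:url", NS):
        loc = url.findtext("sm:loc", namespaces=NS)
        lastmod = url.findtext("sm:lastmod", namespaces=NS)
        # Assumption: lastmod begins with YYYY-MM-DD (a time part is ignored).
        if lastmod is None or date.fromisoformat(lastmod.strip()[:10]) >= cutoff:
            changed.append(loc)
    return changed
```

Called on the example above with a cutoff of `date(2010, 6, 1)`, the function would return `http://yoursite/`, since that page was last modified on 2010-06-24.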

Page 11: How to get your data into Sindice and Google with sitemap4rdf

Sitemap Protocol: Huge sitemaps

Gzip-compress your sitemap

Limit: 50k URLs or 10MB

  split into multiple sitemap files

  add a sitemap index file
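The splitting step can be sketched as follows in Python. The function names and the file names `sitemap1.xml.gz`, `sitemap2.xml.gz`, ... are assumptions made for this example; sitemap4rdf may name its output differently. The sketch only enforces the URL count and does not check the 10MB size limit.

```python
import gzip
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS = 50000  # per-file URL limit quoted on the slide


def _document(root_tag, item_tag, locations):
    """Serialize <root_tag> with one <item_tag><loc>…</loc></item_tag> per location."""
    root = ET.Element(root_tag, xmlns=SITEMAP_NS)
    for location in locations:
        item = ET.SubElement(root, item_tag)
        ET.SubElement(item, "loc").text = location
    return ET.tostring(root, encoding="UTF-8", xml_declaration=True)


def split_sitemaps(urls, base_url, max_urls=MAX_URLS):
    """Return ({file name: gzipped sitemap bytes}, sitemap index bytes).

    Hypothetical helper; file names are illustrative only.
    """
    files = {}
    for number, start in enumerate(range(0, len(urls), max_urls), start=1):
        name = f"sitemap{number}.xml.gz"
        chunk = urls[start:start + max_urls]
        files[name] = gzip.compress(_document("urlset", "url", chunk))
    # The index lists the public location of every sitemap file.
    index = _document("sitemapindex", "sitemap",
                      [base_url + name for name in files])
    return files, index
```

The index document uses a `<sitemapindex>` root with `<sitemap><loc>` entries, as defined by the Sitemap protocol; it is the index file, not the individual sitemaps, that is then advertised to crawlers.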

Page 12: How to get your data into Sindice and Google with sitemap4rdf

Sitemap Protocol: Discovery

Publish the sitemap file

Add a line to http://yoursite/robots.txt

Sitemap: http://yoursite/sitemap.xml
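From the crawler's side, discovery amounts to reading robots.txt and collecting every `Sitemap:` line. A minimal Python sketch of that step is shown below; the function name `sitemap_urls` is hypothetical, and the parsing is deliberately simplified (whole-line comments are skipped, field names are matched case-insensitively).

```python
def sitemap_urls(robots_txt):
    """Return the sitemap locations declared in the text of a robots.txt file.

    Hypothetical helper with simplified parsing rules.
    """
    found = []
    for line in robots_txt.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and whole-line comments
        # Split on the first colon only, so the "http://" in the value survives.
        field, separator, value = line.partition(":")
        if separator and field.strip().lower() == "sitemap":
            found.append(value.strip())
    return found


if __name__ == "__main__":
    example = "User-agent: *\nDisallow: /private/\nSitemap: http://yoursite/sitemap.xml\n"
    print(sitemap_urls(example))
```

A robots.txt file may contain several `Sitemap:` lines, which is why the function returns a list.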

Page 13: How to get your data into Sindice and Google with sitemap4rdf

sitemap4rdf

Generate Sitemap files from a SPARQL endpoint

Page 14: How to get your data into Sindice and Google with sitemap4rdf

sitemap4rdf

Simple command line tool

Sends a SPARQL query to list all URIs

Generates sitemap

sitemap4rdf http://yoursite/sparql http://yoursite/resource/
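The slides do not show the SPARQL query that sitemap4rdf sends, so the following Python sketch is only a guess at what "list all URIs" could look like: it selects every subject IRI that starts with the given prefix. The function names, the paging parameters, and the query itself are assumptions, not the tool's actual implementation. The request URL follows the standard SPARQL protocol, which passes the query in a `query` parameter.

```python
from urllib.parse import urlencode


def list_uris_query(prefix, limit=1000, offset=0):
    """Build a SPARQL query listing subject IRIs that start with the prefix.

    Hypothetical query; assumes the prefix contains no double quotes.
    """
    return (
        "SELECT DISTINCT ?s WHERE { ?s ?p ?o . "
        f'FILTER(isIRI(?s) && STRSTARTS(STR(?s), "{prefix}")) }} '
        f"ORDER BY ?s LIMIT {limit} OFFSET {offset}"
    )


def request_url(endpoint, query):
    """Return the GET URL for sending the query to a SPARQL endpoint."""
    return endpoint + "?" + urlencode({"query": query})


if __name__ == "__main__":
    query = list_uris_query("http://yoursite/resource/")
    print(query)
    print(request_url("http://yoursite/sparql", query))
```

Paging with LIMIT and OFFSET is one way to keep each response small on large datasets; each page of results could then be written into the sitemap as `<loc>` entries.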

Page 15: How to get your data into Sindice and Google with sitemap4rdf

Submit the sitemap location - Sindice

http://sindice.com/main/submit

Page 16: How to get your data into Sindice and Google with sitemap4rdf

Submit the sitemap location - Google

https://www.google.com/webmasters/tools/

Page 17: How to get your data into Sindice and Google with sitemap4rdf

Summary

Sitemap protocol informs search engines about available pages

Supported by Sindice!

sitemap4rdf generates Sitemap files by listing URIs in a SPARQL endpoint

Open source, Java

http://lab.linkeddata.deri.ie/2010/sitemap4rdf/