How to get your data into Sindice and Google with sitemap4rdf
Copyright 2009 Digital Enterprise Research Institute. All rights reserved.
Digital Enterprise Research Institute www.deri.ie
How to get your data into Sindice and Google with sitemap4rdf
Boris Villazón-Terrazas (OEG), Richard Cyganiak (DERI)
Publishing Linked Data from a triple store
Linked Data frontends for triple stores
Source: Pubby website, http://www4.wiwiss.fu-berlin.de/pubby/
Search engines
Sindice: the best RDF search engine
120M+ documents
Continuously updating since 2006
Low-latency search API
RDF/XML, Turtle, RDFa, microformats
The Sitemap protocol
Sitemap Protocol
Used by web crawlers
Efficiently find all your content & discover what has been updated
http://sitemaps.org/
Sitemap Protocol: Simple example
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://yoursite/</loc>
  </url>
  <url>
    <loc>http://yoursite/products/53546</loc>
  </url>
  <url>
    <loc>http://yoursite/products/98421</loc>
  </url>
  <url>
    <loc>http://yoursite/products/41003</loc>
  </url>
</urlset>
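The simple example above can also be generated programmatically. A minimal Python sketch using only the standard library, assuming the list of URLs is already known; the function name build_sitemap is hypothetical and not part of sitemap4rdf:

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
# Serialize the Sitemap namespace as the default namespace (no "ns0:" prefix).
ET.register_namespace("", SITEMAP_NS)


def build_sitemap(urls):
    """Return a minimal Sitemap document (UTF-8 bytes) with one <url><loc> per URL."""
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for url in urls:
        url_el = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        # ElementTree escapes characters such as & in the text automatically.
        ET.SubElement(url_el, f"{{{SITEMAP_NS}}}loc").text = url
    # The xml_declaration argument requires Python 3.8 or later.
    return ET.tostring(urlset, encoding="UTF-8", xml_declaration=True)


if __name__ == "__main__":
    doc = build_sitemap(["http://yoursite/", "http://yoursite/products/53546"])
    print(doc.decode("utf-8"))
```

Building the tree with ElementTree rather than concatenating strings keeps the output well-formed even when a URL contains characters that must be escaped in XML.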
Sitemap Protocol: Optional parts
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://yoursite/</loc>
    <lastmod>2010-06-24</lastmod>
    <changefreq>daily</changefreq>
  </url>
</urlset>
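A sketch of emitting the optional <lastmod> and <changefreq> elements with Python's standard library. The helper url_entry is hypothetical; it assumes lastmod is given as a datetime.date (written as YYYY-MM-DD) and checks changefreq against the values the protocol allows (always, hourly, daily, weekly, monthly, yearly, never):

```python
import datetime
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
ET.register_namespace("", SITEMAP_NS)

# Values the Sitemap protocol permits for <changefreq>.
CHANGEFREQ_VALUES = {"always", "hourly", "daily", "weekly", "monthly", "yearly", "never"}


def url_entry(parent, loc, lastmod=None, changefreq=None):
    """Append a <url> with <loc> and the optional <lastmod>/<changefreq> children."""
    # Validate before touching the tree so a bad value leaves `parent` unchanged.
    if changefreq is not None and changefreq not in CHANGEFREQ_VALUES:
        raise ValueError(f"invalid changefreq: {changefreq!r}")
    url_el = ET.SubElement(parent, f"{{{SITEMAP_NS}}}url")
    ET.SubElement(url_el, f"{{{SITEMAP_NS}}}loc").text = loc
    if lastmod is not None:
        # date.isoformat() gives YYYY-MM-DD, a valid W3C Datetime value.
        ET.SubElement(url_el, f"{{{SITEMAP_NS}}}lastmod").text = lastmod.isoformat()
    if changefreq is not None:
        ET.SubElement(url_el, f"{{{SITEMAP_NS}}}changefreq").text = changefreq
    return url_el


if __name__ == "__main__":
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    url_entry(urlset, "http://yoursite/", datetime.date(2010, 6, 24), "daily")
    print(ET.tostring(urlset, encoding="unicode"))
```

The protocol treats changefreq as a hint to crawlers rather than a command, so the fields are optional and can simply be omitted when the information is not known.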
Sitemap Protocol: Huge sitemaps
Gzip-compress your sitemap
Limit: 50k URLs or 10MB per sitemap file
Larger? Split into multiple sitemap files and add a sitemap index file
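A sketch of the splitting step in Python, assuming the URLs fit in memory and that the split files will be served under a hypothetical base_url with hypothetical names sitemap1.xml.gz, sitemap2.xml.gz and so on. It enforces only the 50k-URL limit; a complete tool would also check each file against the size limit:

```python
import gzip
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_FILE = 50_000  # per-file URL limit from the Sitemap protocol
ET.register_namespace("", SITEMAP_NS)


def _urlset_bytes(urls):
    """Serialize one <urlset> document containing the given URLs."""
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for url in urls:
        url_el = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url_el, f"{{{SITEMAP_NS}}}loc").text = url
    return ET.tostring(urlset, encoding="UTF-8", xml_declaration=True)


def split_sitemaps(urls, base_url, max_urls=MAX_URLS_PER_FILE):
    """Split a sequence of URLs into gzipped sitemap files plus a sitemap index.

    Returns (files, index_bytes): files maps each (hypothetical) file name to
    its gzip-compressed content; index_bytes is an uncompressed <sitemapindex>.
    """
    files = {}
    index = ET.Element(f"{{{SITEMAP_NS}}}sitemapindex")
    for n, start in enumerate(range(0, len(urls), max_urls), start=1):
        name = f"sitemap{n}.xml.gz"  # hypothetical naming scheme
        files[name] = gzip.compress(_urlset_bytes(urls[start:start + max_urls]))
        # Each <sitemap> entry in the index points at one split file.
        sitemap_el = ET.SubElement(index, f"{{{SITEMAP_NS}}}sitemap")
        ET.SubElement(sitemap_el, f"{{{SITEMAP_NS}}}loc").text = base_url + name
    index_bytes = ET.tostring(index, encoding="UTF-8", xml_declaration=True)
    return files, index_bytes
```

Only the index file then needs to be announced to search engines; crawlers follow its <loc> entries to the individual sitemap files.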
Sitemap Protocol: Discovery
Publish the sitemap file
Add a line to http://yoursite/robots.txt:
Sitemap: http://yoursite/sitemap.xml
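Crawlers discover the sitemap by reading this Sitemap line from robots.txt. A Python sketch of both sides, using the standard library's urllib.robotparser (RobotFileParser.site_maps() requires Python 3.8 or later); the helper names sitemaps_from_robots and add_sitemap_line are hypothetical:

```python
from urllib.robotparser import RobotFileParser


def sitemaps_from_robots(robots_txt):
    """Return the Sitemap URLs declared in a robots.txt body (empty list if none)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # site_maps() returns None when no Sitemap line is present.
    return parser.site_maps() or []


def add_sitemap_line(robots_txt, sitemap_url):
    """Append a 'Sitemap:' directive unless robots.txt already declares that URL."""
    if sitemap_url in sitemaps_from_robots(robots_txt):
        return robots_txt
    if robots_txt and not robots_txt.endswith("\n"):
        robots_txt += "\n"
    return robots_txt + f"Sitemap: {sitemap_url}\n"
```

The Sitemap directive is independent of any User-agent group, so appending it at the end of an existing robots.txt is sufficient.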
sitemap4rdf: Generate Sitemap files from a SPARQL endpoint
sitemap4rdf
Simple command line tool
Sends a SPARQL query to list all URIs
Generates sitemap
sitemap4rdf http://yoursite/sparql http://yoursite/resource/
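A rough Python sketch of the query step such a tool performs, using the standard SPARQL protocol (HTTP GET with a query parameter) and the SPARQL JSON results format. The query is an assumption, not sitemap4rdf's actual query: it treats the second argument as a URI prefix, selects distinct subject URIs starting with it, and uses SPARQL 1.1's STRSTARTS, which older endpoints may not support. All function names are hypothetical:

```python
import json
import urllib.parse
import urllib.request


def build_query(prefix):
    """Hypothetical query: all distinct subject URIs that start with `prefix`."""
    # Escape backslashes and quotes so the prefix is a valid SPARQL string literal.
    escaped = prefix.replace("\\", "\\\\").replace('"', '\\"')
    return ("SELECT DISTINCT ?s WHERE { ?s ?p ?o . "
            f'FILTER(isIRI(?s) && STRSTARTS(STR(?s), "{escaped}")) }}')


def uris_from_results(results_json):
    """Extract the URI values bound to ?s from a SPARQL JSON results document."""
    data = json.loads(results_json)
    return [b["s"]["value"] for b in data["results"]["bindings"]
            if b.get("s", {}).get("type") == "uri"]


def fetch_uris(endpoint, prefix):
    """Send the query to `endpoint` via HTTP GET and return the matching URIs."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": build_query(prefix)})
    req = urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"})
    with urllib.request.urlopen(req) as resp:
        return uris_from_results(resp.read().decode("utf-8"))
```

For example, fetch_uris("http://yoursite/sparql", "http://yoursite/resource/") would return a URI list that can be written out as a Sitemap. Large stores may cap the number of results per query, in which case a real tool would need to page through them, for instance with LIMIT and OFFSET.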
Submit the sitemap location - Sindice
http://sindice.com/main/submit
Submit the sitemap location - Google
https://www.google.com/webmasters/tools/
Summary
Sitemap protocol informs search engines about available pages
Supported by Sindice!
sitemap4rdf generates Sitemap files by listing URIs in a SPARQL endpoint
Open source, Java: http://lab.linkeddata.deri.ie/2010/sitemap4rdf/